XBI - Xie-Beni Index
The Xie-Beni Index (XBI) [35] is an internal clustering validation metric that measures the ratio of the total within-cluster variance (compactness) to the minimum squared distance between cluster centroids (separation).
Originally introduced for fuzzy clustering, it is widely adapted for hard clustering evaluations. Intuitively, it answers the question: “How compact are the clusters relative to the distance between the two closest clusters?” A smaller XBI value indicates a better clustering partition, implying that clusters are highly compact and well-separated.
Where:
\(N\) is the total number of data points.
\(K\) is the number of clusters.
\(C_k\) is the set of data points assigned to the \(k\)-th cluster.
\(c_k\) and \(c_j\) are the centroids of clusters \(k\) and \(j\) respectively.
The numerator represents the mean squared error (WGSS / N).
The denominator represents the minimum squared Euclidean distance between any two cluster centroids.
Handling Edge Cases (Finite Values)
The Xie-Beni index requires calculating the distance between at least two distinct cluster centroids. It is mathematically undefined when there is only one cluster (\(K = 1\)).
force_finite (bool): If
True, the function catches the undefined operation and returns a safe, finite number instead of raising aValueError. Default isTrue.finite_value (float): The specific fallback value returned when
force_finite=Trueand the clustering has only 1 cluster. Since smaller is better for XBI, the default fallback is a large penalty value (1e10).
Properties
Best possible score:
0.0(Smaller value is better).Worst possible score:
+inf(or the defined penaltyfinite_value).Range:
[0.0, +inf)
Example Usage
from permetrics.clustering import ClusteringMetric
import numpy as np
# ==============================================================================
# SCENARIO 1: Normal Clustering Evaluation
# ==============================================================================
print("--- 1. BASIC XIE-BENI INDEX EXAMPLE ---")
X_data = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
y_pred_labels = np.array([0, 0, 0, 1, 1, 1])
cm = ClusteringMetric(X=X_data, y_pred=y_pred_labels)
xbi_score = cm.XBI()
print(f"Xie-Beni Index: {xbi_score}")
# ==============================================================================
# SCENARIO 2: Edge Case with 1 Cluster (Demonstrating force_finite)
# ==============================================================================
print("\n--- 2. EDGE CASE (1 CLUSTER) EXAMPLE ---")
# All data points are predicted to be in the same single cluster (label 0)
y_pred_single = np.array([0, 0, 0, 0, 0, 0])
cm_single = ClusteringMetric(X=X_data, y_pred=y_pred_single)
# Returns the penalty finite_value (1e10) instead of crashing
xbi_safe = cm_single.XBI(force_finite=True, finite_value=1e10)
print(f"XBI with 1 cluster (Safe Mode): {xbi_safe}")