XBI - Xie-Beni Index

The Xie-Beni Index (XBI) [35] is an internal clustering validation metric. It is the ratio of the mean squared distance between each point and its cluster centroid (compactness) to the minimum squared distance between cluster centroids (separation).

Originally introduced for fuzzy clustering, it is widely adapted for hard clustering evaluations. Intuitively, it answers the question: “How compact are the clusters relative to the distance between the two closest clusters?” A smaller XBI value indicates a better clustering partition, implying that clusters are highly compact and well-separated.

\[\text{XBI} = \frac{\frac{1}{N} \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - c_k \rVert^2}{\min_{j \neq k} \lVert c_j - c_k \rVert^2}\]

Where:

  • \(N\) is the total number of data points.

  • \(K\) is the number of clusters.

  • \(C_k\) is the set of data points assigned to the \(k\)-th cluster.

  • \(c_k\) and \(c_j\) are the centroids of clusters \(k\) and \(j\) respectively.

  • The numerator is the mean squared distance from each point to its own cluster centroid, i.e. the within-group sum of squares divided by \(N\) (WGSS / N).

  • The denominator represents the minimum squared Euclidean distance between any two cluster centroids.
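The formula can be checked by hand with a few lines of NumPy. The sketch below is an illustrative re-implementation of the hard-clustering formula, not the library's own code; the function name `xie_beni` and the sample data are assumptions made for the example.

```python
import numpy as np

def xie_beni(X, labels):
    """Hard-clustering Xie-Beni index (illustrative sketch, hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # inverse[i] is the row of `centroids` that point i belongs to
    cluster_ids, inverse = np.unique(np.asarray(labels), return_inverse=True)
    if cluster_ids.size < 2:
        raise ValueError("XBI needs at least two clusters")
    # Centroid c_k of each cluster C_k
    centroids = np.array([X[inverse == k].mean(axis=0)
                          for k in range(cluster_ids.size)])
    # Numerator: (1/N) * sum of squared distances to the own centroid (WGSS / N)
    compactness = np.sum((X - centroids[inverse]) ** 2) / X.shape[0]
    # Denominator: minimum squared distance over all pairs of distinct centroids
    diff = centroids[:, None, :] - centroids[None, :, :]
    pair_sq = np.sum(diff ** 2, axis=-1)
    separation = pair_sq[np.triu_indices(cluster_ids.size, k=1)].min()
    return compactness / separation

X_demo = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
labels_demo = np.array([0, 0, 0, 1, 1, 1])
score = xie_beni(X_demo, labels_demo)
print(score)
```

For this data the centroids are (1, 2) and (10, 2), the within-group sum of squares is 16, and the squared centroid distance is 81, so the result is (16 / 6) / 81 ≈ 0.0329.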


Handling Edge Cases (Finite Values)

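The Xie-Beni index needs at least two cluster centroids, because its denominator is the minimum squared distance over pairs of centroids. With a single cluster (\(K = 1\)) no such pair exists and the index is mathematically undefined. Two parameters control how this case is handled: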

  • force_finite (bool): If True, the function returns finite_value when the index is undefined instead of raising a ValueError. Default is True.

  • finite_value (float): The fallback value returned when force_finite=True and the clustering has only one cluster. Since smaller is better for XBI, the default fallback is a large penalty value (1e10).
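The fallback logic described by these two parameters can be sketched as a small guard around any metric function. This is only an illustration under stated assumptions: `guard_single_cluster` and `dummy_metric` are hypothetical names, and the library's internal implementation may differ.

```python
import numpy as np

def guard_single_cluster(metric_fn, X, labels, force_finite=True, finite_value=1e10):
    """Hypothetical guard: return a penalty value when only one cluster exists."""
    n_clusters = np.unique(np.asarray(labels)).size
    if n_clusters < 2:
        # No pair of centroids exists, so the XBI denominator is undefined
        if force_finite:
            return finite_value
        raise ValueError("Xie-Beni index is undefined for a single cluster")
    return metric_fn(X, labels)

def dummy_metric(X, labels):
    # Stand-in for a real XBI computation; only used to exercise the guard
    return 0.5

X_demo = np.array([[1, 2], [1, 4], [10, 2], [10, 4]])
single = np.zeros(4, dtype=int)

# One cluster with force_finite=True: the penalty value is returned
penalty = guard_single_cluster(dummy_metric, X_demo, single)
print(penalty)

# One cluster with force_finite=False: the undefined case is surfaced as an error
try:
    guard_single_cluster(dummy_metric, X_demo, single, force_finite=False)
except ValueError as err:
    print(f"ValueError: {err}")
```

Because smaller XBI values are better, returning a large penalty keeps a single-cluster partition from looking attractive to an optimizer that minimizes the score.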


Properties

  • Best possible score: 0.0 (Smaller value is better).

  • Worst possible score: +inf (or the defined penalty finite_value).

  • Range: [0.0, +inf)


Example Usage

from permetrics.clustering import ClusteringMetric
import numpy as np

# ==============================================================================
# SCENARIO 1: Normal Clustering Evaluation
# ==============================================================================
print("--- 1. BASIC XIE-BENI INDEX EXAMPLE ---")

X_data = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
y_pred_labels = np.array([0, 0, 0, 1, 1, 1])

cm = ClusteringMetric(X=X_data, y_pred=y_pred_labels)
xbi_score = cm.XBI()
print(f"Xie-Beni Index: {xbi_score}")

# ==============================================================================
# SCENARIO 2: Edge Case with 1 Cluster (Demonstrating force_finite)
# ==============================================================================
print("\n--- 2. EDGE CASE (1 CLUSTER) EXAMPLE ---")

# All data points are predicted to be in the same single cluster (label 0)
y_pred_single = np.array([0, 0, 0, 0, 0, 0])
cm_single = ClusteringMetric(X=X_data, y_pred=y_pred_single)

# Returns the penalty finite_value (1e10) instead of crashing
xbi_safe = cm_single.XBI(force_finite=True, finite_value=1e10)
print(f"XBI with 1 cluster (Safe Mode): {xbi_safe}")