BHI - Ball-Hall Index

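The Ball-Hall Index (BHI) is an internal clustering evaluation metric: it needs only the data and the predicted cluster labels, not ground-truth labels. It measures the mean of the within-cluster dispersions, where each cluster's dispersion is its Sum of Squared Errors (SSE) around its centroid. It was introduced by Ball and Hall in 1965.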

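Intuitively, the Ball-Hall Index evaluates how compact the clusters are. It answers the question: “Averaged over the clusters, how large is the total squared distance between a cluster's data points and their centroid?” A smaller value indicates denser, more compact clusters. In the formula as written here, the inner sum is not divided by the cluster size, so larger clusters contribute more to the index.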

\[\text{BHI} = \frac{1}{K} \sum_{k=1}^{K} \sum_{x_i \in C_k} \| x_i - \bar{x}_k \|^2\]

Where:

  • \(K\) is the total number of clusters.

  • \(C_k\) is the set of data points assigned to the \(k\)-th cluster.

  • \(\bar{x}_k\) is the centroid (mean) of cluster \(C_k\).

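  • \(x_i\) is a data point belonging to cluster \(C_k\).

  • \(\| \cdot \|\) is the Euclidean norm, so each term is a squared Euclidean distance to the centroid.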
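The formula can be sketched directly in NumPy to make the two nested sums concrete. This is a minimal illustration only, not the library's implementation: the function name `ball_hall_sketch` and the two toy datasets are hypothetical, and the code follows the formula exactly as written in this section (a per-cluster SSE, averaged over the clusters).

```python
import numpy as np

def ball_hall_sketch(X, labels):
    """Hypothetical helper: mean over clusters of the within-cluster SSE."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    sse_per_cluster = []
    for k in np.unique(labels):
        members = X[labels == k]            # points assigned to cluster k
        centroid = members.mean(axis=0)     # cluster mean
        sse_per_cluster.append(np.sum((members - centroid) ** 2))
    return float(np.mean(sse_per_cluster))  # the 1/K average over clusters

labels = np.array([0, 0, 1, 1])
tight = np.array([[0, 0], [0, 1], [5, 5], [5, 6]])  # points 1 unit apart
loose = np.array([[0, 0], [0, 2], [5, 5], [5, 7]])  # points 2 units apart

print(ball_hall_sketch(tight, labels))  # 0.5
print(ball_hall_sketch(loose, labels))  # 2.0
```

The tighter grouping gives the smaller value, in line with the reading that lower scores mean more compact clusters.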


Properties


Example Usage

from permetrics.clustering import ClusteringMetric
import numpy as np

# ==============================================================================
# SCENARIO 1: Basic Evaluation (Internal Metric requires X and y_pred)
# ==============================================================================
print("--- 1. BASIC BALL-HALL INDEX EXAMPLE ---")

# Features (X) and predicted cluster labels (y_pred)
X_data = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
y_pred_labels = np.array([0, 0, 0, 1, 1, 1])

# Initialize the metric object
cm = ClusteringMetric(X=X_data, y_pred=y_pred_labels)

# Calculate the Ball-Hall Index
bhi_score = cm.BHI()
print(f"Ball-Hall Index: {bhi_score}")

# ==============================================================================
# SCENARIO 2: Using the static method directly
# ==============================================================================
print("\n--- 2. STATIC METHOD USAGE ---")

# If you prefer to bypass object instantiation
bhi_static = ClusteringMetric.ball_hall_index(X=X_data, y_pred=y_pred_labels)
print(f"Ball-Hall Index (Static): {bhi_static}")
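
As a hand check of the example data, the formula in this section can be evaluated without any code. The cluster labelled 0 contains (1, 2), (1, 4) and (1, 0), so its centroid is (1, 2); the cluster labelled 1 contains (10, 2), (10, 4) and (10, 0), so its centroid is (10, 2). In both clusters the squared distances to the centroid are 0, 4 and 4:

\[\text{SSE}_{0} = 0 + 4 + 4 = 8, \qquad \text{SSE}_{1} = 0 + 4 + 4 = 8, \qquad \text{BHI} = \frac{1}{2}\,(8 + 8) = 8\]

If the library follows the formula exactly as stated here, both calls above should report this value.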