HI - Hartigan Index

The Hartigan Index (HI) is an internal clustering evaluation metric, meaning it needs only the data and the predicted labels, not ground-truth labels. For each cluster, it takes the ratio of the within-cluster sum of squares (measured around the cluster's own centroid) to the sum of squared distances from the same points to the centroid of the nearest neighboring cluster, and then sums these ratios over all clusters.

Intuitively, HI answers the question: “Does the internal compactness of a cluster justify its existence compared to the next closest cluster?” A lower HI value indicates a better clustering partition, as it implies that the intra-cluster dispersion is small relative to the distance to the nearest competing cluster.

\[\text{HI} = \sum_{k=1}^{K} \left( \frac{\sum_{x_i \in C_k} ||x_i - c_k||^2}{\sum_{x_i \in C_k} ||x_i - c_{\text{nearest}}||^2} \right)\]

Where:

\(K\) is the total number of clusters.
\(c_k\) is the centroid of cluster \(k\).
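\(c_{\text{nearest}}\) is the centroid of the cluster closest to cluster \(k\) (excluding \(k\) itself), so it is chosen separately for each \(k\).
The numerator is the within-cluster dispersion (SSE) of cluster \(k\).
The denominator is the sum of squared distances from the points of cluster \(k\) to \(c_{\text{nearest}}\), i.e. the dispersion of cluster \(k\) relative to the nearest neighboring cluster.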
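
The formula above can be sketched directly in NumPy. This is a minimal illustration of the equation as written on this page, not the library's internal implementation: the function name hartigan_index_sketch is hypothetical, and "closest cluster" is assumed to mean the other cluster whose centroid has the smallest Euclidean distance to \(c_k\).

```python
import numpy as np


def hartigan_index_sketch(X, labels):
    """Evaluate the HI formula from this page (illustrative helper, not permetrics code)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    cluster_ids = np.unique(labels)
    if cluster_ids.size < 2:
        # With a single cluster there is no nearest neighbor to compare against.
        raise ValueError("HI is undefined for fewer than two clusters")

    # One centroid per cluster, in the same order as cluster_ids.
    centroids = np.array([X[labels == k].mean(axis=0) for k in cluster_ids])

    total = 0.0
    for idx, k in enumerate(cluster_ids):
        points = X[labels == k]

        # Assumption: the nearest cluster is the one with the closest centroid.
        centroid_dists = np.sum((centroids - centroids[idx]) ** 2, axis=1)
        centroid_dists[idx] = np.inf  # exclude the cluster itself
        nearest = centroids[np.argmin(centroid_dists)]

        numerator = np.sum((points - centroids[idx]) ** 2)  # SSE of cluster k
        denominator = np.sum((points - nearest) ** 2)        # dispersion around c_nearest
        total += numerator / denominator
    return total
```

For two clusters with points (1, 2), (1, 4), (1, 0) and (10, 2), (10, 4), (10, 0), each cluster has an SSE of 8 and a dispersion of 251 around the other centroid, so the sketch evaluates to 8/251 + 8/251 ≈ 0.0637. The value reported by the library may differ if its implementation departs from the formula as stated here.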

Handling Edge Cases (Finite Values)

The Hartigan Index involves comparing clusters and their neighbors. It is mathematically undefined when there is only one cluster (\(K = 1\)), as there are no “nearest neighbors” to compare against.

force_finite (bool): If True, the function catches the undefined operation and returns a safe, finite number instead of raising a ValueError. Default is True.
finite_value (float): The fallback value returned when force_finite=True and the clustering has only 1 cluster. Since a smaller score is better for HI, the default fallback is a large penalty value (1e10).
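
The fallback behavior described by these two parameters can be pictured as a small guard around the score computation. The sketch below is only an illustration of that logic under stated assumptions: guarded_score and score_fn are hypothetical names, and the library's own control flow may be organized differently.

```python
import numpy as np


def guarded_score(labels, score_fn, force_finite=True, finite_value=1e10):
    """Return score_fn() unless the labeling has fewer than two clusters.

    Hypothetical helper: mirrors the force_finite / finite_value behavior
    described above, without calling into permetrics.
    """
    n_clusters = np.unique(np.asarray(labels)).size
    if n_clusters < 2:
        if force_finite:
            # Smaller is better for HI, so the fallback is a large penalty.
            return finite_value
        raise ValueError("HI is undefined when there is only one cluster")
    return score_fn()
```

With force_finite=True, a single-cluster labeling yields the penalty value instead of an exception, which keeps the metric usable inside loops that compare many candidate partitions.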

Properties 

Best possible score: 0.0 (Smaller value is better).
Worst possible score: +inf (or the defined penalty finite_value).
Range: [0.0, +inf)

Example Usage 

from permetrics.clustering import ClusteringMetric
import numpy as np

# ==============================================================================
# SCENARIO 1: Basic Evaluation
# ==============================================================================
print("--- 1. BASIC HARTIGAN INDEX EXAMPLE ---")

X_data = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
y_pred_labels = np.array([0, 0, 0, 1, 1, 1])

cm = ClusteringMetric(X=X_data, y_pred=y_pred_labels)
hi_score = cm.HI()
print(f"Hartigan Index: {hi_score}")

# ==============================================================================
# SCENARIO 2: Edge Case with 1 Cluster
# ==============================================================================
print("\n--- 2. EDGE CASE (1 CLUSTER) EXAMPLE ---")

y_pred_single = np.array([0, 0, 0, 0, 0, 0])
cm_single = ClusteringMetric(X=X_data, y_pred=y_pred_single)

# Returns the penalty finite_value (1e10)
hi_safe = cm_single.HI(force_finite=True, finite_value=1e10)
print(f"HI with 1 cluster (Safe Mode): {hi_safe}")
