KDI - Ksq-DetW Index

The Ksq-DetW Index (KDI), closely related to Marriott’s criterion, is an internal clustering validation metric. It evaluates the quality of a partition by multiplying the determinant of the pooled within-cluster scatter matrix by the square of the number of clusters.

Intuitively, the determinant of the within-cluster scatter matrix (\(|W|\)) measures the internal “volume” or dispersion of the clusters. Because \(|W|\) tends to shrink as more clusters are added, multiplying it by \(K^2\) introduces a penalty for adding clusters. The index answers the question: “Are the clusters tightly packed enough to justify the complexity of using K clusters?” When comparing different values of \(K\), a local minimum or a sharp drop in this index often indicates the optimal number of clusters.

\[\text{KDI} = K^2 \times |W|\]

Where:

  • \(K\) is the total number of clusters.

  • \(W\) is the pooled within-cluster scatter matrix (the sum of the scatter matrices of all individual clusters).

  • \(|W|\) is the determinant of matrix \(W\).
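As a rough illustration of the formula, the sketch below computes \(W\) and \(K^2 \times |W|\) with plain NumPy. The helper names pooled_within_scatter and kdi_raw are hypothetical and are not part of permetrics; the library's own implementation may differ in details.

```python
import numpy as np

def pooled_within_scatter(X, labels):
    """Sum of the per-cluster scatter matrices (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    W = np.zeros((X.shape[1], X.shape[1]))
    for k in np.unique(labels):
        # Center each cluster on its own mean, then accumulate its scatter.
        centered = X[labels == k] - X[labels == k].mean(axis=0)
        W += centered.T @ centered
    return W

def kdi_raw(X, labels):
    """K^2 * det(W) without any normalization (hypothetical helper)."""
    K = np.unique(labels).size
    return K ** 2 * np.linalg.det(pooled_within_scatter(X, labels))

# Two small triangles of points, far apart from each other.
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0],
              [10.0, 10.0], [12.0, 10.0], [10.0, 12.0]])
labels = np.array([0, 0, 0, 1, 1, 1])

W = pooled_within_scatter(X, labels)
score = kdi_raw(X, labels)
print(W)
print(score)
```

Here \(W\) is a 2 x 2 matrix because the data has two features; its size depends on the number of features, not on \(K\).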


Normalization Strategy

The function provides an internal scaling mechanism to prevent the determinant from exploding (overflow) or vanishing (underflow) when dealing with features of vastly different magnitudes:

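  • use_normalized (bool): If True (default), the pooled scatter matrix \(W\) is Min-Max normalized element-wise before its determinant is calculated: \(W_{norm} = \frac{W - \min(W)}{\max(W) - \min(W)}\). This bounds the matrix elements between 0 and 1, which keeps the determinant in a numerically manageable range. Because the shift changes the matrix itself, normalized and raw scores are not comparable; use the same setting for every value of \(K\) being compared. If False, the determinant of the raw matrix \(W\) is used.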
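The element-wise Min-Max step can be sketched as follows. The function minmax_normalize is a hypothetical stand-in for illustration only; how the library treats edge cases such as a constant matrix is not stated here, so the guard below is an assumption.

```python
import numpy as np

def minmax_normalize(W):
    """Element-wise Min-Max scaling of a matrix (hypothetical helper)."""
    W = np.asarray(W, dtype=float)
    span = W.max() - W.min()
    if span == 0:
        # Assumption: a constant matrix cannot be rescaled; fail loudly.
        raise ValueError("all entries of W are equal; cannot normalize")
    return (W - W.min()) / span

# A small positive definite matrix standing in for W.
W = np.array([[1.0, 2.0],
              [2.0, 5.0]])
W_norm = minmax_normalize(W)

det_raw = np.linalg.det(W)
det_norm = np.linalg.det(W_norm)
print(W_norm)
print(det_raw, det_norm)
```

In this small case the raw matrix has a positive determinant while the normalized one has a negative determinant, which shows that the sign of the score is not constrained when normalization is on.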


Properties

  • Best possible score: No strictly defined “best” score for a single evaluation. It is typically used comparatively across different values of \(K\) to find an elbow or minimum.

  • Worst possible score: No strictly defined worst score.

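  • Range: (-inf, +inf) with the default normalization, because the Min-Max shifted matrix is not guaranteed to be positive semi-definite and its determinant can be negative. With use_normalized=False, \(W\) is positive semi-definite, so the score is non-negative.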

  • References: Marriott, F. H. C. (1971). Practical problems in a method of cluster analysis. Biometrics.


Example Usage

from permetrics.clustering import ClusteringMetric
import numpy as np

# ==============================================================================
# SCENARIO 1: Normal Evaluation (with Default Normalization)
# ==============================================================================
print("--- 1. BASIC KSQ-DETW INDEX EXAMPLE ---")

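# Note: in this toy data the first feature is constant within each cluster,
# so W as defined above has an all-zero first row and column and is singular.
# Expect a determinant of (near) zero; use data that varies in every feature
# within each cluster to obtain informative scores.
X_data = np.array([[1, 2, 1], [1, 4, 2], [1, 0, 1], [10, 2, 8], [10, 4, 9], [10, 0, 7]])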
y_pred_labels = np.array([0, 0, 0, 1, 1, 1])

cm = ClusteringMetric(X=X_data, y_pred=y_pred_labels)
# Calculates KDI with use_normalized=True by default
kdi_score_norm = cm.KDI()
print(f"Ksq-DetW Index (Normalized): {kdi_score_norm}")

# ==============================================================================
# SCENARIO 2: Evaluation Without Normalization
# ==============================================================================
print("\n--- 2. RAW MATRIX DETERMINANT EXAMPLE ---")

# If your data is already strictly standardized and you want the raw mathematical determinant
kdi_score_raw = cm.KDI(use_normalized=False)
print(f"Ksq-DetW Index (Raw Matrix): {kdi_score_raw}")
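
The comparative use across different values of \(K\) can be sketched in plain NumPy. This is a minimal illustration under stated assumptions: kdi_raw is a hypothetical helper that applies \(K^2 \times |W|\) without normalization (it is not the permetrics implementation), and the candidate labelings are hand-made rather than produced by a clustering algorithm.

```python
import numpy as np

def kdi_raw(X, labels):
    """K^2 * det(W) without normalization (hypothetical helper)."""
    W = np.zeros((X.shape[1], X.shape[1]))
    for k in np.unique(labels):
        centered = X[labels == k] - X[labels == k].mean(axis=0)
        W += centered.T @ centered
    return np.unique(labels).size ** 2 * np.linalg.det(W)

# Two 4x4 grids of points, the second shifted 20 units along the first axis.
grid = np.array([[i, j] for i in range(4) for j in range(4)], dtype=float)
X = np.vstack([grid, grid + [20.0, 0.0]])

# Hand-made partitions: one cluster, the two grids, and the two grids
# with the second grid split in half along the first axis.
in_second = np.arange(len(X)) >= len(grid)
candidates = {
    1: np.zeros(len(X), dtype=int),
    2: in_second.astype(int),
    3: in_second.astype(int) + (X[:, 0] >= 22).astype(int),
}

scores = {K: kdi_raw(X, labels) for K, labels in candidates.items()}
best_K = min(scores, key=scores.get)
for K, value in scores.items():
    print(f"K={K}: {value:.1f}")
print(f"Lowest index at K={best_K}")
```

On this constructed data the index is lowest for the two-grid partition: splitting one grid further shrinks \(|W|\), but not by enough to offset the larger \(K^2\) factor.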