MSEI - Mean Squared Error Index

The Mean Squared Error Index (MSEI) is derived directly from the Sum of Squared Error Index (SSEI). SSEI computes the total within-cluster dispersion; MSEI divides that total by the number of data points to give the average dispersion per point.

Intuitively, it answers the question: “On average, what is the squared distance from a data point to its assigned cluster centroid?” A lower MSEI indicates more compact clusters, with data points grouped more tightly around their respective centers.

\[\text{MSEI} = \frac{1}{N} \sum_{k=1}^{K} \sum_{x_i \in C_k} ||x_i - c_k||^2\]

Where:

  • \(N\) is the total number of data points (samples).

  • \(K\) is the total number of clusters.

  • \(C_k\) is the set of data points assigned to the \(k\)-th cluster.

  • \(c_k\) is the centroid (mean) of cluster \(C_k\).

  • \(x_i\) is a data point belonging to cluster \(C_k\).
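The formula can be evaluated term by term, one cluster at a time. The following is a minimal NumPy sketch written from the definition above, not the library's own code; the function name `msei_from_definition` is hypothetical and used only for illustration.

```python
import numpy as np

def msei_from_definition(X, labels):
    """Hypothetical helper: evaluate MSEI term by term from the formula."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for k in np.unique(labels):                     # outer sum over clusters
        members = X[labels == k]                    # points in C_k
        centroid = members.mean(axis=0)             # c_k
        total += np.sum((members - centroid) ** 2)  # squared distances to c_k
    return total / X.shape[0]                       # divide by N

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
labels = np.array([0, 0, 0, 1, 1, 1])
score = msei_from_definition(X, labels)
print(score)
```

For this data the centroids are (1, 2) and (10, 2). Each cluster contributes squared distances 0 + 4 + 4 = 8, so the result is 16 / 6 ≈ 2.667.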


Algorithmic Variations (Performance Note)

Like the SSEI metric, this implementation computes the mean dispersion with a vectorized approach. Indexing the precomputed centroids by the label array, centers[y_pred], pairs every sample with its own centroid in a single operation, so no nested loops over clusters and points are needed. For a fixed number of features the running time is linear in the number of samples, \(O(N)\), which keeps the metric practical for large datasets.
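The centers[y_pred] indexing idea can be sketched in plain NumPy as follows. This is an illustration under stated assumptions rather than the library's actual code: the function name `msei_vectorized` is hypothetical, and the label re-encoding step is an assumption added so that arbitrary label values can be used as row indices.

```python
import numpy as np

def msei_vectorized(X, y_pred):
    """Hypothetical sketch of a loop-free MSEI computation."""
    X = np.asarray(X, dtype=float)
    # Re-encode labels as 0..K-1 so they can index rows of the centroid array
    _, idx = np.unique(np.asarray(y_pred), return_inverse=True)
    idx = idx.ravel()
    n_clusters = idx.max() + 1

    # Per-cluster sums and counts give the centroids without a Python loop
    counts = np.bincount(idx, minlength=n_clusters)
    sums = np.zeros((n_clusters, X.shape[1]))
    np.add.at(sums, idx, X)            # unbuffered sums[idx] += X
    centers = sums / counts[:, None]

    # centers[idx] has shape (N, d): row i is the centroid of sample i
    diffs = X - centers[idx]
    return np.sum(diffs ** 2) / X.shape[0]

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
vectorized_score = msei_vectorized(X, [5, 5, 5, 9, 9, 9])
print(vectorized_score)
```

Note that centers[idx] materializes a temporary array with the same shape as the data, so the approach trades some memory for the absence of explicit loops.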


Properties

  • Best possible score: 0.0 (smaller is better). A score of 0 means that every data point coincides with its cluster centroid.

  • Worst possible score: No strict upper bound.

  • Range: [0.0, +inf)
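Both ends of the range can be checked numerically. The sketch below uses a small hypothetical helper, `msei`, written from the formula rather than taken from the library. It shows that the score is 0 when every point is its own cluster, and that multiplying the data by a factor s multiplies the score by s squared, which is why no finite upper bound exists.

```python
import numpy as np

def msei(X, labels):
    """Hypothetical helper: mean squared distance to the cluster centroid."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for k in np.unique(labels):
        pts = X[labels == k]
        total += np.sum((pts - pts.mean(axis=0)) ** 2)
    return total / len(X)

X = np.array([[1.0, 2.0], [1.0, 4.0], [10.0, 2.0], [10.0, 4.0]])
labels = np.array([0, 0, 1, 1])

# Lower bound: each point is its own cluster, so it equals its centroid
zero_score = msei(X, np.arange(len(X)))

# No upper bound: scaling the data by 100 scales the score by 100**2
base_score = msei(X, labels)
scaled_score = msei(100 * X, labels)
print(zero_score, base_score, scaled_score)
```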


Example Usage

from permetrics.clustering import ClusteringMetric
import numpy as np

# ==============================================================================
# SCENARIO 1: Basic Evaluation
# ==============================================================================
print("--- 1. BASIC MEAN SQUARED ERROR INDEX EXAMPLE ---")

# Features (X) and predicted cluster labels (y_pred)
X_data = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
y_pred_labels = np.array([0, 0, 0, 1, 1, 1])

# Initialize the metric object
cm = ClusteringMetric(X=X_data, y_pred=y_pred_labels)
msei_score = cm.MSEI()
print(f"Mean Squared Error Index: {msei_score}")

# ==============================================================================
# SCENARIO 2: Using the static method directly
# ==============================================================================
print("\n--- 2. STATIC METHOD USAGE ---")

# Bypass object instantiation if you only need a single calculation
msei_static = ClusteringMetric.calculate_mean_squared_error_index(X=X_data, y_pred=y_pred_labels)
print(f"Mean Squared Error Index (Static): {msei_static}")