RSI - R-Squared Index
=====================

.. toctree::
   :maxdepth: 3

.. contents:: Table of Contents
   :local:
   :depth: 2

The **R-Squared Index (RSI)** (also known as the **Coefficient of Determination for Clustering**) is an internal
evaluation metric. It measures the proportion of the total variance in the dataset that is explained by the
clustering partition.

Intuitively, RSI plays the same role as the R-squared score in linear regression. It answers the question:
*"How much of the total dispersion of the data is captured by grouping the points into these clusters?"*

A higher RSI value indicates a better partition: the clusters are tightly cohesive and capture most of the
dataset's variation.

.. math::

   \text{RSI} = \frac{\text{TSS} - \text{WGSS}}{\text{TSS}} = \frac{\text{BGSS}}{\text{TSS}}

Where:

* :math:`\text{TSS}` is the Total Sum of Squares (total dispersion of all data points around the global dataset centroid).
* :math:`\text{WGSS}` is the Within-Group Sum of Squares (total dispersion of data points around their respective cluster centroids).
* :math:`\text{BGSS}` is the Between-Group Sum of Squares (:math:`\text{TSS} - \text{WGSS}`).

-------------------------------------------------------------------------------

Properties
----------

* **Best possible score:** ``1.0`` (higher is better; 100% of the data variance is explained by the cluster centroids).
* **Worst possible score:** ``0.0`` (the clustering explains none of the variance, performing no better than a single global mean).
* **Range:** ``[0.0, 1.0]``

-------------------------------------------------------------------------------

Example Usage
-------------

.. code-block:: python
   :emphasize-lines: 12,13,22,23

   from permetrics.clustering import ClusteringMetric
   import numpy as np

   # ==============================================================================
   # SCENARIO 1: Basic Evaluation
   # ==============================================================================
   print("--- 1. BASIC R-SQUARED INDEX EXAMPLE ---")
   X_data = np.array([[1, 2], [1, 4], [1, 0],
                      [10, 2], [10, 4], [10, 0]])
   y_pred_labels = np.array([0, 0, 0, 1, 1, 1])

   cm = ClusteringMetric(X=X_data, y_pred=y_pred_labels)
   rsi_score = cm.RSI()
   print(f"R-Squared Index: {rsi_score}")

   # ==============================================================================
   # SCENARIO 2: Single Cluster Evaluation (Zero variance explained)
   # ==============================================================================
   print("\n--- 2. SINGLE CLUSTER EXAMPLE ---")
   y_pred_single = np.array([0, 0, 0, 0, 0, 0])

   cm_single = ClusteringMetric(X=X_data, y_pred=y_pred_single)
   rsi_single = cm_single.RSI()
   print(f"RSI with 1 cluster: {rsi_single}")
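
To make the formula concrete, the following is a minimal sketch that computes RSI directly from the
definitions of TSS and WGSS using only NumPy. The helper name ``r_squared_index_manual`` is hypothetical
and is not part of permetrics; the sketch assumes TSS is non-zero (the points are not all identical) and
that cluster centroids are the per-cluster means.

.. code-block:: python

   import numpy as np


   def r_squared_index_manual(X, labels):
       """Hypothetical helper (not a permetrics function): RSI = (TSS - WGSS) / TSS."""
       X = np.asarray(X, dtype=float)
       labels = np.asarray(labels)

       # TSS: squared distances of every point to the global centroid.
       tss = np.sum((X - X.mean(axis=0)) ** 2)

       # WGSS: squared distances of every point to its own cluster centroid.
       wgss = 0.0
       for k in np.unique(labels):
           members = X[labels == k]
           wgss += np.sum((members - members.mean(axis=0)) ** 2)

       # Assumes tss > 0; identical points would cause a division by zero.
       return (tss - wgss) / tss


   X_data = np.array([[1, 2], [1, 4], [1, 0],
                      [10, 2], [10, 4], [10, 0]])
   y_pred_labels = np.array([0, 0, 0, 1, 1, 1])

   print(r_squared_index_manual(X_data, y_pred_labels))
   print(r_squared_index_manual(X_data, np.zeros(6, dtype=int)))

For this dataset the global centroid is ``(5.5, 2)``, which gives TSS = 137.5, and the two cluster
centroids ``(1, 2)`` and ``(10, 2)`` give WGSS = 16, so the sketch yields 121.5 / 137.5, approximately
``0.8836``. With a single cluster, WGSS equals TSS and the result is ``0.0``.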