MIS - Mutual Information Score
The Mutual Information Score (MIS) [44] is an external clustering evaluation metric that quantifies the “amount of information” (in nats) shared between the ground truth labels (\(y_{true}\)) and the predicted labels (\(y_{pred}\)).
Intuitively, MIS measures the reduction in uncertainty about one clustering partition given knowledge of the other. If the two partitions are identical, MIS reaches its maximum, which equals the entropy of the partition; if they are independent, MIS is zero. Unlike pair-counting metrics (such as the Rand Score), MIS is derived from information theory. Like them, it is invariant to permutations of cluster labels.
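It is defined as:
\[ \text{MIS}(Y, P) = \sum_{y \in Y} \sum_{p \in P} P(y, p) \log \frac{P(y, p)}{P(y)\, P(p)} \]
Where: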
\(Y\) is the set of ground truth classes.
\(P\) is the set of predicted clusters.
\(P(y, p)\) is the joint probability of a sample belonging to class \(y\) and cluster \(p\).
\(P(y)\) and \(P(p)\) are the marginal probabilities.
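The quantities defined above can be estimated directly from label counts. The following is a minimal sketch, not the library's implementation: it assumes the probabilities are taken as relative frequencies and uses the natural logarithm so the result is in nats. The function name `mutual_information` is a hypothetical helper introduced only for illustration.

```python
from collections import Counter
from math import log

def mutual_information(y_true, y_pred):
    """Plug-in estimate of mutual information (in nats) between two labelings.

    Hypothetical helper for illustration; probabilities are relative frequencies.
    """
    n = len(y_true)
    joint = Counter(zip(y_true, y_pred))   # counts of (class, cluster) pairs
    class_counts = Counter(y_true)         # counts per ground truth class
    cluster_counts = Counter(y_pred)       # counts per predicted cluster
    mi = 0.0
    for (y, p), n_yp in joint.items():
        # P(y, p) * log( P(y, p) / (P(y) * P(p)) ), written with raw counts
        mi += (n_yp / n) * log(n * n_yp / (class_counts[y] * cluster_counts[p]))
    return mi

# Three equally sized classes, perfectly recovered: the score equals log(3).
print(mutual_information([0, 0, 1, 1, 2, 2], [0, 0, 1, 1, 2, 2]))
```

Only (class, cluster) pairs that actually occur contribute to the sum, which avoids evaluating the logarithm at zero.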
Properties
Best possible score: No fixed upper bound. For a given dataset, the score is bounded above by the smaller of the two partitions' entropies, \(\min(H(Y), H(P))\).
Worst possible score: 0.0 (indicates the two partitions are completely independent).
Range: [0.0, +inf)
References: Scikit-Learn Mutual Information
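These properties can be checked numerically. The sketch below is an illustration under stated assumptions, not the library's code: `entropy` and `mutual_information` are hypothetical helpers that use relative frequencies and the natural logarithm. It shows a zero score for independent partitions, invariance to relabeling, and the entropy upper bound.

```python
from collections import Counter
from math import log, isclose

def entropy(labels):
    """Shannon entropy (in nats) of a labeling; hypothetical helper."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def mutual_information(y_true, y_pred):
    """Plug-in mutual information (in nats); hypothetical helper."""
    n = len(y_true)
    joint = Counter(zip(y_true, y_pred))
    n_y, n_p = Counter(y_true), Counter(y_pred)
    return sum((c / n) * log(n * c / (n_y[y] * n_p[p]))
               for (y, p), c in joint.items())

y_true = [0, 0, 1, 1]
independent = [0, 1, 0, 1]   # every (class, cluster) pair is equally likely
relabeled = [1, 1, 0, 0]     # same partition as y_true, labels swapped

# Independent partitions share no information.
print(mutual_information(y_true, independent))  # 0.0

# Swapping label names does not change the score; here it equals H(Y).
print(isclose(mutual_information(y_true, relabeled), entropy(y_true)))  # True

# The score never exceeds the smaller of the two entropies.
bound = min(entropy(y_true), entropy(independent))
print(mutual_information(y_true, independent) <= bound)  # True
```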
Example Usage
from permetrics.clustering import ClusteringMetric
import numpy as np
# ==============================================================================
# SCENARIO 1: Basic Evaluation
# ==============================================================================
print("--- 1. BASIC MUTUAL INFORMATION SCORE EXAMPLE ---")
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 1, 2, 2]
# Initialize the metric object
cm = ClusteringMetric(y_true=y_true, y_pred=y_pred)
# Calculate the Mutual Information Score
mis_score = cm.MIS()
print(f"Mutual Information Score: {mis_score}")
# ==============================================================================
# SCENARIO 2: Perfect Agreement vs. Partial Disagreement
# ==============================================================================
print("\n--- 2. AGREEMENT ANALYSIS ---")
# Perfect match: both labelings describe the same partition
print(f"Perfect Agreement: {cm.MIS(y_true=[0, 0, 1], y_pred=[0, 0, 1])}")
# Mismatched partition: the labelings only partly agree, so the score drops
print(f"Partial Disagreement: {cm.MIS(y_true=[0, 0, 1], y_pred=[1, 0, 0])}")