MIS - Mutual Information Score

The Mutual Information Score (MIS) [44] is an external clustering evaluation metric that quantifies the “amount of information” (in nats) shared between the ground truth labels (\(y_{true}\)) and the predicted labels (\(y_{pred}\)).

Intuitively, MIS measures the reduction in uncertainty about one partition given knowledge of the other. If the two partitions are identical, MIS reaches its maximum, which equals the entropy of that partition; if they are statistically independent, MIS is zero. Unlike pair-counting metrics (such as the Rand Score), MIS is grounded in information theory. It is symmetric in its two arguments and invariant to permutations of the cluster labels.

\[\text{MIS}(Y, P) = \sum_{y \in Y} \sum_{p \in P} P(y, p) \log \left( \frac{P(y, p)}{P(y)P(p)} \right)\]

Where:

  • \(Y\) is the set of ground truth classes.

  • \(P\) is the set of predicted clusters (not to be confused with the probability function \(P(\cdot)\) used in the formula).

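  • \(P(y, p)\) is the joint probability of a sample belonging to class \(y\) and cluster \(p\), estimated as the fraction of samples that carry both labels.

  • \(P(y)\) and \(P(p)\) are the marginal probabilities, estimated as the fractions of samples in class \(y\) and in cluster \(p\) respectively.

  • \(\log\) is the natural logarithm, so the score is expressed in nats; pairs with \(P(y, p) = 0\) contribute nothing to the sum.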
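
The formula can be sketched in a few lines of standard-library Python. This is an illustrative plug-in estimate written for this page, not the permetrics implementation; the function name `mutual_information` is a hypothetical helper, and the natural logarithm is assumed so that the result is in nats.

```python
from collections import Counter
from math import log


def mutual_information(y_true, y_pred):
    """Plug-in estimate of mutual information (in nats) between two labelings.

    Hypothetical helper for illustration; not part of permetrics.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    n = len(y_true)
    joint = Counter(zip(y_true, y_pred))   # counts for each (class, cluster) pair
    true_counts = Counter(y_true)          # counts for each class
    pred_counts = Counter(y_pred)          # counts for each cluster
    mi = 0.0
    # Only pairs that actually occur are visited, so zero-probability
    # pairs contribute nothing to the sum.
    for (y, p), n_yp in joint.items():
        # P(y, p) / (P(y) * P(p)) simplifies to n_yp * n / (n_y * n_p)
        mi += (n_yp / n) * log(n_yp * n / (true_counts[y] * pred_counts[p]))
    return mi


identical = mutual_information([0, 0, 1, 1, 2, 2], [0, 0, 1, 1, 2, 2])
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
print(f"Identical partitions:   {identical:.6f}")
print(f"Independent partitions: {independent:.6f}")
```

With three equally sized classes, identical partitions give \(\log 3\) nats, while the second pair of labelings is exactly independent and gives zero.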


Properties

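  • Best possible score: No fixed value. For a given pair of partitions the score is at most \(\min(H(Y), H(P))\), the smaller of the two partition entropies, and it reaches this bound when the partitions are identical.

  • Worst possible score: 0.0 (the two partitions are completely independent).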

  • Range: [0.0, +inf)

  • References: Scikit-Learn Mutual Information
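
The properties above can be checked numerically. The sketch below uses only the standard library; `entropy` and `mutual_information` are hypothetical helpers written for this illustration (not permetrics functions), and the label lists are made-up toy data.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a labeling. Hypothetical helper."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(y_true, y_pred):
    """Plug-in mutual information (in nats). Hypothetical helper."""
    n = len(y_true)
    joint = Counter(zip(y_true, y_pred))
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    return sum(
        (n_yp / n) * log(n_yp * n / (true_counts[y] * pred_counts[p]))
        for (y, p), n_yp in joint.items()
    )


# Toy labelings, chosen only for illustration
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [1, 1, 0, 0, 2, 2]

mi = mutual_information(y_true, y_pred)
bound = min(entropy(y_true), entropy(y_pred))

# Renaming the clusters (0 -> 2, 1 -> 0, 2 -> 1) must not change the score
relabel = {0: 2, 1: 0, 2: 1}
y_pred_permuted = [relabel[p] for p in y_pred]
mi_permuted = mutual_information(y_true, y_pred_permuted)

print(f"MI:                 {mi:.6f}")
print(f"min(H(Y), H(P)):    {bound:.6f}")
print(f"MI after relabeling: {mi_permuted:.6f}")
print(f"MI(y_true, y_true): {mutual_information(y_true, y_true):.6f}")
print(f"H(y_true):          {entropy(y_true):.6f}")
```

Because the upper bound depends on the entropies of the partitions, raw MIS values are not directly comparable across datasets with different numbers or sizes of classes.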


Example Usage

from permetrics.clustering import ClusteringMetric
import numpy as np

# ==============================================================================
# SCENARIO 1: Basic Evaluation
# ==============================================================================
print("--- 1. BASIC MUTUAL INFORMATION SCORE EXAMPLE ---")

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 1, 2, 2]

# Initialize the metric object
cm = ClusteringMetric(y_true=y_true, y_pred=y_pred)
# Calculate the Mutual Information Score
mis_score = cm.MIS()
print(f"Mutual Information Score: {mis_score}")

# ==============================================================================
# SCENARIO 2: Perfect Agreement vs. Mismatched Partition
# ==============================================================================
print("\n--- 2. AGREEMENT ANALYSIS ---")

# Perfect match: the score equals the entropy of the partition
print(f"Perfect Agreement:    {cm.MIS(y_true=[0, 0, 1], y_pred=[0, 0, 1])}")
# Mismatched partition: the score is lower, but not exactly zero,
# because these two labelings are not statistically independent
print(f"Mismatched Partition: {cm.MIS(y_true=[0, 0, 1], y_pred=[1, 0, 0])}")
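
As a hand check of Scenario 2, the formula above can be applied directly to the two calls. This assumes the library evaluates the formula exactly as stated, with the natural logarithm; the values below are derived from the formula, not copied from program output.

For y_true = [0, 0, 1] and y_pred = [1, 0, 0], each of the three observed (class, cluster) pairs has joint probability \(1/3\), and the marginals are \(P(y{=}0) = P(p{=}0) = 2/3\) and \(P(y{=}1) = P(p{=}1) = 1/3\):

\[\text{MIS} = \tfrac{1}{3}\log\tfrac{1/3}{(2/3)(1/3)} + \tfrac{1}{3}\log\tfrac{1/3}{(2/3)(2/3)} + \tfrac{1}{3}\log\tfrac{1/3}{(1/3)(2/3)} = \tfrac{1}{3}\log\tfrac{27}{16} \approx 0.1744\]

For the perfect match y_true = y_pred = [0, 0, 1], the score reduces to the entropy of the partition:

\[\text{MIS} = -\tfrac{2}{3}\log\tfrac{2}{3} - \tfrac{1}{3}\log\tfrac{1}{3} \approx 0.6365\]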