NMIS - Normalized Mutual Information Score

The Normalized Mutual Information Score (NMIS) is an external clustering evaluation metric that rescales the Mutual Information Score (MIS) to the range [0, 1]. The upper bound of the unnormalized MIS depends on the entropies of the partitions being compared, so its raw values are hard to interpret or compare across datasets. Normalization removes that dependence.
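
As a small worked illustration of that dependence, assume natural logarithms and a partition \(Y\) with \(k\) equally sized clusters compared against itself. Mutual information of a partition with itself equals its entropy, so the raw score for perfect agreement grows with \(k\):

\[\text{MIS}(Y, Y) = \text{H}(Y) = \ln k, \qquad k = 2:\ \ln 2 \approx 0.693, \qquad k = 4:\ \ln 4 \approx 1.386\]

Both cases are perfect agreement, yet the raw values differ. Dividing by the entropy maps each of them to 1.0.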

Intuitively, NMIS answers: “How much information is shared between the ground truth and the prediction, relative to the average entropy of the two partitions?” A value of 1.0 indicates perfect agreement between the partitions, while 0.0 indicates that they are statistically independent.

\[\text{NMIS}(Y, P) = \frac{\text{MIS}(Y, P)}{\text{mean}(\text{H}(Y), \text{H}(P))}\]

Where:

  • \(\text{MIS}(Y, P)\) is the Mutual Information Score.

  • \(\text{H}(Y)\) and \(\text{H}(P)\) are the entropy of the true labels and predicted labels, respectively.

  • The denominator is the arithmetic mean of the two entropies, which normalizes the score.
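
The definition above can be sketched from scratch in a few lines. This is a minimal illustration, not the library's implementation: the function names (`entropy`, `mutual_information`, `nmis_sketch`) are hypothetical, natural logarithms are assumed, and the zero-entropy edge case is deliberately not handled.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(y_true, y_pred):
    """Mutual information (natural log) between two labelings of the same items."""
    n = len(y_true)
    joint = Counter(zip(y_true, y_pred))
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    mi = 0.0
    for (t, p), n_tp in joint.items():
        # p(t, p) * log(p(t, p) / (p(t) * p(p))), written with raw counts
        mi += (n_tp / n) * math.log(n * n_tp / (true_counts[t] * pred_counts[p]))
    return mi


def nmis_sketch(y_true, y_pred):
    """MIS divided by the arithmetic mean of the two entropies (no edge-case handling)."""
    denom = (entropy(y_true) + entropy(y_pred)) / 2.0
    return mutual_information(y_true, y_pred) / denom


y_true = [0, 0, 1, 1, 2, 2]
print(nmis_sketch(y_true, [0, 0, 1, 1, 2, 2]))  # identical partitions: approximately 1.0
print(nmis_sketch(y_true, [2, 2, 0, 0, 1, 1]))  # same grouping, relabeled: approximately 1.0
print(nmis_sketch([0, 0, 1, 1], [0, 1, 0, 1]))  # independent partitions: 0.0
```

The second call shows that only the grouping matters, not the label values themselves.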


Handling Edge Cases (Finite Values)

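NMIS is problematic when the ground truth or the predicted labels consist of only a single cluster. That partition's entropy is zero, the mutual information is then zero as well, and if both partitions are single-cluster the denominator vanishes and the ratio becomes 0/0. Two parameters control how this single-cluster case is handled: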

  • force_finite (bool): If True, the function catches the undefined operation and returns a safe, finite number instead of raising a ValueError. Default is True.

  • finite_value (float): The specific fallback value returned when force_finite=True and the clustering has only 1 cluster. Default is 0.0.
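
The fallback logic described by these two parameters might look roughly like the sketch below. The helper `normalized_or_fallback` is hypothetical and takes precomputed values; it mirrors the description above and is not taken from the library's source.

```python
import math


def normalized_or_fallback(mis, h_true, h_pred, force_finite=True, finite_value=0.0):
    """Hypothetical helper: normalize MIS, falling back when a partition has one cluster."""
    if h_true == 0.0 or h_pred == 0.0:
        # Zero entropy means a single-cluster partition: the edge case described above
        if force_finite:
            return finite_value
        raise ValueError("NMIS is undefined when a partition has only one cluster.")
    # Regular case: divide by the arithmetic mean of the two entropies
    return mis / ((h_true + h_pred) / 2.0)


# Single-cluster input with the default settings returns the fallback value
print(normalized_or_fallback(0.0, 0.0, 0.0))
# A custom fallback can flag the edge case explicitly
print(normalized_or_fallback(0.0, 0.0, 0.0, finite_value=-1.0))
```

Choosing a `finite_value` outside [0, 1] is one way to tell a fallback result apart from a genuine score of 0.0.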


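Properties

  • Range: [0, 1].

  • Best value: 1.0 (perfect agreement between partitions); higher is better.

  • External metric: requires ground-truth labels (y_true) in addition to the predicted labels (y_pred).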


Example Usage

from permetrics.clustering import ClusteringMetric

# ==============================================================================
# SCENARIO 1: Basic Evaluation
# ==============================================================================
print("--- 1. BASIC NORMALIZED MUTUAL INFORMATION SCORE EXAMPLE ---")

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 1, 2, 2]

# Initialize the metric object
cm = ClusteringMetric(y_true=y_true, y_pred=y_pred)
# Calculate the Normalized Mutual Information Score
nmis_score = cm.NMIS()
print(f"Normalized Mutual Information Score: {nmis_score}")

# ==============================================================================
# SCENARIO 2: Edge Case with 1 Cluster
# ==============================================================================
print("\n--- 2. EDGE CASE (1 CLUSTER) EXAMPLE ---")

# Both partitions have only 1 cluster, so their entropies are zero and NMIS is undefined
cm_single = ClusteringMetric(y_true=[0, 0, 0], y_pred=[0, 0, 0])

# Returns the finite_value (0.0) instead of raising an error
nmis_safe = cm_single.NMIS(force_finite=True, finite_value=0.0)
print(f"NMIS with 1 cluster (Safe Mode): {nmis_safe}")