MNS - McNemar Score

The McNemar Score (MNS) is an external clustering evaluation metric derived from McNemar’s non-parametric hypothesis test. In the context of partition comparison, it evaluates clustering quality by measuring the asymmetry between True Negatives (\(nn\)) and False Positives (\(ny\)).

Intuitively, MNS answers the question: “When the model decides to separate two points into different clusters, does it do so correctly (\(nn\)), or is it erroneously breaking apart points that belong to the same true class (\(ny\))?” A higher score indicates a favorable balance where correct separations vastly outnumber false-positive co-clusterings.

\[\text{MNS} = \frac{nn - ny}{\sqrt{nn + ny}}\]

Where across all pairs of distinct data points:

  • \(nn\) (True Negatives): Number of pairs placed in different clusters in both the ground truth (\(y_{true}\)) and the prediction (\(y_{pred}\)).

  • \(ny\) (False Positives): Number of pairs placed in different classes in \(y_{true}\), but incorrectly grouped into the same cluster in \(y_{pred}\).


Algorithmic Optimizations (Performance Note)

Explicitly constructing pairwise contingency tables across all \(\binom{N}{2}\) combinations imposes an \(O(N^2)\) computational bottleneck.

This implementation extracts the exact pair totals (\(nn\) and \(ny\)) directly from the algebraic dot products of the Contingency Matrix marginals. This reduces the time complexity to :math:`O(N)`, enabling rapid benchmarking on massive datasets.


Handling Edge Cases (Finite Values)

The calculation of MNS involves division by \(\sqrt{nn + ny}\). If the predicted partition consists of a single universal cluster containing all data points (\(K = 1\)), the model separates zero pairs. Both \(nn\) and \(ny\) evaluate to zero, triggering an undefined mathematical division.

  • force_finite (bool): If True, catches the zero-division error and returns a safe fallback value instead of raising a ZeroDivisionError. Default is True.

  • finite_value (float): The fallback value returned when force_finite=True and the calculation fails. Since zero separated pairs yield zero net asymmetry, the default fallback is 0.0.


Properties

  • Best possible score: Depends on total pairs \(N_T\); approaches \(\sqrt{N_T}\) when False Positives (\(ny\)) approach zero.

  • Worst possible score: Negative values (when false co-clusterings \(ny\) severely overwhelm correct separations \(nn\)).

  • Permutation Invariance: Completely invariant to permutations of cluster labels.

  • Asymmetric: Unlike Jaccard or Rand, McNemar is strictly directional:

    \[\text{MNS}(y_{true}, y_{pred}) \neq \text{MNS}(y_{pred}, y_{true})\]
  • References: Desgraupes, Bernard. “Clustering indices.” University of Paris Ouest-Lab Modal’X 1.1 (2013): 34.


Example Usage

from permetrics.clustering import ClusteringMetric

# ==============================================================================
# SCENARIO 1: Basic Evaluation
# ==============================================================================
print("--- 1. BASIC MCNEMAR SCORE EXAMPLE ---")

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2]

cm = ClusteringMetric(y_true=y_true, y_pred=y_pred)
mns_score = cm.MNS()
print(f"McNemar Score: {mns_score}")

# ==============================================================================
# SCENARIO 2: Demonstrating Asymmetry
# ==============================================================================
print("\n--- 2. ASYMMETRY CHECK ---")

mns_reverse = cm.MNS(y_true=y_pred, y_pred=y_true)
print(f"Forward MNS: {mns_score:.4f} | Reverse MNS: {mns_reverse:.4f}")