FMS - Fowlkes-Mallows Score

The Fowlkes-Mallows Score (FMS) is an external clustering evaluation metric. It evaluates the similarity between two clustering partitions by calculating the geometric mean of the pairwise Precision and Recall.

Intuitively, FMS answers the question: “When comparing all pairs of data points, what is the geometric mean of our clustering precision and clustering recall?” Because it relies on the geometric mean, FMS severely penalizes clusterings where either precision or recall is exceptionally low.

\[\text{FMS} = \frac{yy}{\sqrt{(yy + yn) \times (yy + ny)}}\]

Where across all pairs of distinct data points:

  • \(yy\) (True Positives): Number of pairs assigned to the same cluster in both \(y_{true}\) and \(y_{pred}\).

  • \(yn\) (False Negatives): Number of pairs assigned to the same cluster in \(y_{true}\), but in different clusters in \(y_{pred}\).

  • \(ny\) (False Positives): Number of pairs assigned to the same cluster in \(y_{pred}\), but in different clusters in \(y_{true}\).

Expressed directly via the pairwise Precision (\(\mathcal{P}\)) and Recall (\(\mathcal{R}\)):

\[\text{FMS} = \sqrt{\mathcal{P} \times \mathcal{R}}\]

Algorithmic Optimizations (Performance Note)

Iterating over all possible combinations of pairs to evaluate \(yy\), \(yn\), and \(ny\) incurs an expensive time and space complexity of \(O(N^2)\).

This implementation bypasses explicit pair enumeration entirely. By leveraging the dot products of the Contingency Matrix and its marginal sums, it computes the exact pairwise totals in :math:`O(N)` time complexity. This allows benchmarking on large datasets (e.g., \(N > 100,000\)) without triggering memory spikes.


Handling Edge Cases (Finite Values)

The Fowlkes-Mallows Score involves division by the square root of the grouped pair products. If either partition contains only isolated singletons (every cluster has exactly 1 data point), the number of intra-cluster pairs evaluates to zero (\(yy + yn = 0\) or \(yy + ny = 0\)), causing a zero-division error.

  • force_finite (bool): If True, the function catches the undefined division operation and returns a safe fallback value instead of raising a ZeroDivisionError. Default is True.

  • finite_value (float): The fallback value returned when force_finite=True and the calculation fails. Since the worst possible score is 0.0, the default fallback is 0.0.


Properties


Example Usage

from permetrics.clustering import ClusteringMetric

# ==============================================================================
# SCENARIO 1: Basic Evaluation
# ==============================================================================
print("--- 1. BASIC FOWLKES-MALLOWS SCORE EXAMPLE ---")

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 1, 2]

cm = ClusteringMetric(y_true=y_true, y_pred=y_pred)
fms_score = cm.FMS()
print(f"Fowlkes-Mallows Score: {fms_score}")

# ==============================================================================
# SCENARIO 2: Perfect Match vs Completely Independent Partitions
# ==============================================================================
print("\n--- 2. EXTREME CASES EXAMPLE ---")

# Perfect correspondence
print(f"Perfect Match FMS:  {cm.FMS(y_true=[0, 0, 1], y_pred=[1, 1, 0])}")
# No shared intra-cluster pairs
print(f"Disjoint Match FMS: {cm.FMS(y_true=[0, 0, 0], y_pred=[0, 1, 2])}")