FMS - Fowlkes-Mallows Score
The Fowlkes-Mallows Score (FMS) is an external clustering evaluation metric. It evaluates the similarity between two clustering partitions by calculating the geometric mean of the pairwise Precision and Recall.
Intuitively, FMS answers the question: “When comparing all pairs of data points, what is the geometric mean of our clustering precision and clustering recall?” Because it relies on the geometric mean, FMS severely penalizes clusterings where either precision or recall is exceptionally low.
Where across all pairs of distinct data points:
\(yy\) (True Positives): Number of pairs assigned to the same cluster in both \(y_{true}\) and \(y_{pred}\).
\(yn\) (False Negatives): Number of pairs assigned to the same cluster in \(y_{true}\), but in different clusters in \(y_{pred}\).
\(ny\) (False Positives): Number of pairs assigned to the same cluster in \(y_{pred}\), but in different clusters in \(y_{true}\).
Expressed directly via the pairwise Precision (\(\mathcal{P}\)) and Recall (\(\mathcal{R}\)):
Algorithmic Optimizations (Performance Note)
Iterating over all possible combinations of pairs to evaluate \(yy\), \(yn\), and \(ny\) incurs an expensive time and space complexity of \(O(N^2)\).
This implementation bypasses explicit pair enumeration entirely. By leveraging the dot products of the Contingency Matrix and its marginal sums, it computes the exact pairwise totals in :math:`O(N)` time complexity. This allows benchmarking on large datasets (e.g., \(N > 100,000\)) without triggering memory spikes.
Handling Edge Cases (Finite Values)
The Fowlkes-Mallows Score involves division by the square root of the grouped pair products. If either partition contains only isolated singletons (every cluster has exactly 1 data point), the number of intra-cluster pairs evaluates to zero (\(yy + yn = 0\) or \(yy + ny = 0\)), causing a zero-division error.
force_finite (bool): If
True, the function catches the undefined division operation and returns a safe fallback value instead of raising aZeroDivisionError. Default isTrue.finite_value (float): The fallback value returned when
force_finite=Trueand the calculation fails. Since the worst possible score is 0.0, the default fallback is0.0.
Properties
Best possible score:
1.0(Indicates identical partitions where precision and recall are both 100%).Worst possible score:
0.0(Indicates zero agreement; no pair of points shares a cohesive grouping in both partitions).Permutation Invariance: The metric is completely invariant to permutations of cluster labels.
Range:
[0.0, 1.0]References:
Example Usage
from permetrics.clustering import ClusteringMetric
# ==============================================================================
# SCENARIO 1: Basic Evaluation
# ==============================================================================
print("--- 1. BASIC FOWLKES-MALLOWS SCORE EXAMPLE ---")
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 1, 2]
cm = ClusteringMetric(y_true=y_true, y_pred=y_pred)
fms_score = cm.FMS()
print(f"Fowlkes-Mallows Score: {fms_score}")
# ==============================================================================
# SCENARIO 2: Perfect Match vs Completely Independent Partitions
# ==============================================================================
print("\n--- 2. EXTREME CASES EXAMPLE ---")
# Perfect correspondence
print(f"Perfect Match FMS: {cm.FMS(y_true=[0, 0, 1], y_pred=[1, 1, 0])}")
# No shared intra-cluster pairs
print(f"Disjoint Match FMS: {cm.FMS(y_true=[0, 0, 0], y_pred=[0, 1, 2])}")