RTS - Rogers-Tanimoto Score
The Rogers-Tanimoto Score (RTS) (also known as the Rogers-Tanimoto Index) is an external clustering evaluation metric belonging to the pair-counting family. It evaluates the similarity between two partitions by measuring the ratio of concordant pairs to the total pairs, but assigns double penalty weight to discordant (mismatched) pairs.
Intuitively, RTS acts as a stricter version of the standard Rand Score. It answers the question: “If we double the penalty points for every mistake the model makes—whether erroneously grouping separate points or splitting cohesive classes—what is our net pairwise accuracy?”
Where across all \(N_T = \binom{N}{2}\) possible pairs of distinct data points:
\(yy\) (True Positives): Pairs co-clustered in both partitions.
\(nn\) (True Negatives): Pairs separated in both partitions.
\(yn\) (False Negatives) and \(ny\) (False Positives): Discordant pairs (disagreements).
Algorithmic Optimizations (Performance Note)
Standard pairwise comparison requires evaluating all \(\binom{N}{2}\) sample combinations, resulting in an expensive \(O(N^2)\) runtime.
This implementation bypasses explicit pair enumeration. By deriving the exact pair counts directly from the algebraic dot products of the Contingency Matrix marginals, it computes the Rogers-Tanimoto index in :math:`O(N)` time complexity, guaranteeing optimal memory footprint.
Handling Edge Cases (Finite Values)
The calculation involves division by \(yy + nn + 2(yn + ny)\). Because this denominator is algebraically equal to the total number of sample pairs \(N_T\) plus the discordant pairs (\(N_T + yn + ny\)), it can only evaluate to zero if the dataset contains fewer than 2 samples (\(N < 2\)).
force_finite (bool): If
True, catches the zero-division error when \(N < 2\) and returns a safe fallback value instead of raising aZeroDivisionError. Default isTrue.finite_value (float): The fallback value returned when
force_finite=Trueand the calculation fails. Since an empty or single-point dataset contains zero meaningful similarity, the default fallback is0.0.
Properties
Best possible score:
1.0(Indicates identical partitions; zero discordant pairs).Worst possible score:
0.0(Indicates absolute disagreement; zero concordant pairs).Permutation Invariance: Strictly invariant to permutations of cluster labels.
Symmetry: Strictly symmetric: \(\text{RTS}(y_{true}, y_{pred}) = \text{RTS}(y_{pred}, y_{true})\).
Relationship with Rand Score (RaS): Because RTS inflates the denominator by adding an extra set of discordant pairs, it is always strictly bounded by the Rand Score:
\[\text{RTS} \le \text{RaS}\]Range:
[0.0, 1.0]References: Desgraupes, Bernard. “Clustering indices.” University of Paris Ouest-Lab Modal’X 1.1 (2013): 34.
Example Usage
from permetrics.clustering import ClusteringMetric
# ==============================================================================
# SCENARIO 1: Basic Evaluation
# ==============================================================================
print("--- 1. BASIC ROGERS-TANIMOTO SCORE EXAMPLE ---")
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2]
cm = ClusteringMetric(y_true=y_true, y_pred=y_pred)
rts_score = cm.RTS()
print(f"Rogers-Tanimoto Score: {rts_score}")
# ==============================================================================
# SCENARIO 2: RTS vs Rand Score Stricter Penalty
# ==============================================================================
print("\n--- 2. STRICTER PENALTY COMPARISON ---")
cm_noisy = ClusteringMetric(y_true=[0, 0, 0, 1, 1], y_pred=[0, 1, 0, 1, 0])
print(f"Standard Rand Score: {cm_noisy.RaS():.4f}")
print(f"Rogers-Tanimoto (Strict): {cm_noisy.RTS():.4f}")