RTS - Rogers-Tanimoto Score
===========================

.. toctree::
   :maxdepth: 3

.. contents:: Table of Contents
   :local:
   :depth: 2

The **Rogers-Tanimoto Score (RTS)** (also known as the **Rogers-Tanimoto Index**) is an external clustering evaluation metric belonging to the pair-counting family. It measures the similarity between two partitions as the ratio of concordant pairs to all pairs, with discordant (mismatched) pairs counted at **double weight** in the denominator.

Intuitively, RTS acts as a stricter version of the standard Rand Score. It answers the question: *"If we double the penalty for every mistake the model makes, whether it erroneously groups separate points or splits cohesive classes, what is our pairwise accuracy?"*

.. math::

   \text{RTS} = \frac{yy + nn}{yy + nn + 2(yn + ny)}

Where, across all :math:`N_T = \binom{N}{2}` possible pairs of distinct data points (:math:`N` is the number of samples):

* :math:`yy` (True Positives): Pairs co-clustered in both partitions.
* :math:`nn` (True Negatives): Pairs separated in both partitions.
* :math:`yn` (False Negatives) and :math:`ny` (False Positives): Discordant pairs (disagreements).

-------------------------------------------------------------------------------

Algorithmic Optimizations (Performance Note)
--------------------------------------------

Standard pairwise comparison evaluates all :math:`\binom{N}{2}` sample combinations, which costs :math:`O(N^2)` time.

This implementation bypasses explicit pair enumeration. It derives the exact pair counts directly from the **Contingency Matrix** and its marginals, so the Rogers-Tanimoto index is computed in :math:`O(N)` time without materializing the individual pairs.

-------------------------------------------------------------------------------

Handling Edge Cases (Finite Values)
-----------------------------------

The calculation involves division by :math:`yy + nn + 2(yn + ny)`.
Because this denominator is algebraically equal to the total number of sample pairs plus the discordant pairs (:math:`N_T + yn + ny`), it can only be zero when the dataset contains fewer than 2 samples (:math:`N < 2`).

* **force_finite (bool):** If ``True``, catches the zero-division error when :math:`N < 2` and returns a safe fallback value instead of raising a ``ZeroDivisionError``. Default is ``True``.
* **finite_value (float):** The fallback value returned when ``force_finite=True`` and the calculation fails. Since an empty or single-point dataset contains no pairs to compare, the default fallback is ``0.0``.

-------------------------------------------------------------------------------

Properties
----------

* **Best possible score:** ``1.0`` (identical partitions; zero discordant pairs).
* **Worst possible score:** ``0.0`` (absolute disagreement; zero concordant pairs).
* **Range:** ``[0.0, 1.0]``
* **Permutation Invariance:** Invariant to permutations of cluster labels.
* **Symmetry:** Symmetric in its arguments: :math:`\text{RTS}(y_{true}, y_{pred}) = \text{RTS}(y_{pred}, y_{true})`.
* **Relationship with Rand Score (RaS):** Because RTS counts the discordant pairs a second time in the denominator, it never exceeds the Rand Score:

  .. math::

     \text{RTS} = \frac{\text{RaS}}{2 - \text{RaS}} \le \text{RaS}

  Equality holds only at the extremes, when there are no discordant pairs (both scores are ``1.0``) or no concordant pairs (both scores are ``0.0``).

* **References:** Desgraupes, Bernard. "Clustering indices." University of Paris Ouest-Lab Modal’X 1.1 (2013): 34.

-------------------------------------------------------------------------------

Example Usage
-------------

.. code-block:: python
   :emphasize-lines: 11,12,22,23

   from permetrics.clustering import ClusteringMetric

   # ==============================================================================
   # SCENARIO 1: Basic Evaluation
   # ==============================================================================
   print("--- 1. BASIC ROGERS-TANIMOTO SCORE EXAMPLE ---")
   y_true = [0, 0, 1, 1, 2, 2]
   y_pred = [0, 0, 1, 1, 1, 2]

   cm = ClusteringMetric(y_true=y_true, y_pred=y_pred)
   rts_score = cm.RTS()
   print(f"Rogers-Tanimoto Score: {rts_score}")

   # ==============================================================================
   # SCENARIO 2: RTS vs Rand Score Stricter Penalty
   # ==============================================================================
   print("\n--- 2. STRICTER PENALTY COMPARISON ---")

   cm_noisy = ClusteringMetric(y_true=[0, 0, 0, 1, 1],
                               y_pred=[0, 1, 0, 1, 0])

   print(f"Standard Rand Score: {cm_noisy.RaS():.4f}")
   print(f"Rogers-Tanimoto (Strict): {cm_noisy.RTS():.4f}")
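
The pair-counting shortcut described in the performance note can be illustrated in plain Python. The sketch below is not the library's actual implementation; the helper names ``pair_counts``, ``rogers_tanimoto`` and ``rogers_tanimoto_bruteforce`` are hypothetical and exist only for this illustration. It assumes the standard identities that express :math:`yy`, :math:`yn`, :math:`ny` and :math:`nn` through binomial sums over the contingency-matrix cells and their row and column totals, and it compares the result against a naive :math:`O(N^2)` enumeration.

```python
from collections import Counter
from itertools import combinations
from math import comb


def pair_counts(y_true, y_pred):
    """Return (yy, yn, ny, nn) from contingency counts, without listing pairs."""
    n = len(y_true)
    cells = Counter(zip(y_true, y_pred))  # contingency matrix entries
    rows = Counter(y_true)                # row marginals (true classes)
    cols = Counter(y_pred)                # column marginals (predicted clusters)

    yy = sum(comb(c, 2) for c in cells.values())         # together in both
    same_true = sum(comb(c, 2) for c in rows.values())   # together in y_true
    same_pred = sum(comb(c, 2) for c in cols.values())   # together in y_pred
    yn = same_true - yy                                  # together only in y_true
    ny = same_pred - yy                                  # together only in y_pred
    nn = comb(n, 2) - yy - yn - ny                       # separated in both
    return yy, yn, ny, nn


def rogers_tanimoto(y_true, y_pred, finite_value=0.0):
    """Hypothetical helper: RTS with a fallback when there are no pairs."""
    yy, yn, ny, nn = pair_counts(y_true, y_pred)
    denom = yy + nn + 2 * (yn + ny)
    if denom == 0:  # only possible when fewer than 2 samples are given
        return finite_value
    return (yy + nn) / denom


def rogers_tanimoto_bruteforce(y_true, y_pred):
    """Naive O(N^2) reference that enumerates every pair (needs N >= 2)."""
    agree = disagree = 0
    for i, j in combinations(range(len(y_true)), 2):
        if (y_true[i] == y_true[j]) == (y_pred[i] == y_pred[j]):
            agree += 1
        else:
            disagree += 1
    return agree / (agree + 2 * disagree)


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2]
print(pair_counts(y_true, y_pred))                          # (2, 1, 2, 10)
print(f"{rogers_tanimoto(y_true, y_pred):.4f}")             # 0.6667
print(f"{rogers_tanimoto_bruteforce(y_true, y_pred):.4f}")  # 0.6667
```

For these labels the Rand Score is :math:`12/15 = 0.8`, and :math:`0.8 / (2 - 0.8) \approx 0.6667` matches the value printed above, consistent with the relationship between the two scores.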