KLDL - Kullback-Leibler Divergence Loss
=======================================

.. toctree::
    :maxdepth: 3

.. contents:: Table of Contents
    :local:
    :depth: 2

The **Kullback-Leibler Divergence Loss (KLDL)** :cite:`kullback1951information` (also known as **Relative Entropy**) measures how a reference probability distribution :math:`P` diverges from a candidate probability distribution :math:`Q`.

.. image:: /_static/images/CLS_KLDL.png
    :align: center
    :alt: Kullback Leibler Divergence Relative Entropy Illustration

In machine learning classification, it quantifies the information lost when the predicted probability distribution :math:`\hat{Y}` is used to approximate the ground truth target distribution :math:`Y`.

.. math::

    D_{\text{KL}}(Y \parallel \hat{Y}) = \sum_{k=1}^{K} Y_k \log\left(\frac{Y_k}{\hat{Y}_k}\right)

-------------------------------------------------------------------------------

Architectural Design: Polymorphic Target Ingestion
--------------------------------------------------

Unlike implementations restricted to discrete integer targets, ``permetrics`` inspects the structure of the supplied ground truth array :math:`Y` and handles two cases:

1. **Hard Target Binarization (Standard Classification):** If discrete class indices are passed (e.g., ``[0, 2, 1]``), they are projected internally onto one-hot distributions. *(Note: for strict one-hot targets the target entropy is zero, so KLDL reduces to Cross-Entropy / Log Loss.)*
2. **Soft Target Preservation (Knowledge Distillation):** If continuous target probability matrices are passed (e.g., ``[[0.8, 0.2], [0.1, 0.9]]`` generated by a teacher LLM), the target entropy is preserved and the true asymmetric relative divergence is computed.

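
The two ingestion paths above, and the reduction to Log Loss on one-hot targets, can be illustrated with a minimal NumPy sketch. This is not the ``permetrics`` source: ``as_distribution`` and ``kl_per_sample`` are hypothetical helper names, and the sketch assumes that 1-D input means class indices and 2-D input means probabilities.

.. code-block:: python

    import numpy as np

    def as_distribution(y_true, n_classes):
        """Hypothetical helper: return targets as an (n_samples, n_classes) matrix."""
        y = np.asarray(y_true)
        if y.ndim == 1:
            # Hard labels: project class indices onto one-hot rows.
            return np.eye(n_classes)[y.astype(int)]
        # Soft labels: keep the supplied probabilities untouched.
        return y.astype(float)

    def kl_per_sample(y, y_hat):
        """Row-wise KL divergence; terms with y == 0 contribute 0."""
        ratio = np.where(y > 0, y / y_hat, 1.0)
        return np.sum(y * np.log(ratio), axis=1)

    y_pred = np.array([[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]])
    labels = [0, 1, 1]

    # Hard path: integer labels become one-hot rows.
    y_hard = as_distribution(labels, n_classes=2)
    kl_hard = kl_per_sample(y_hard, y_pred)

    # Per-sample Log Loss: negative log-probability of the true class.
    log_loss = -np.log(y_pred[np.arange(len(labels)), labels])

    # On one-hot targets the two quantities coincide sample by sample.
    print(np.allclose(kl_hard, log_loss))

    # Soft path: a probability matrix passes through unchanged.
    y_soft = as_distribution([[0.8, 0.2], [0.1, 0.9]], n_classes=2)
    print(y_soft.shape)

The equality holds because a one-hot row has zero entropy, so the divergence and the cross-entropy differ by nothing.
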

-------------------------------------------------------------------------------

Numerical Stability Strategy
----------------------------

To avoid the classic floating-point hazard where zero-probability ground truths trigger undefined operations (:math:`0 \times (-\infty) = \text{NaN}`), the implementation applies a **conditional logarithmic mask**. Wherever the target class probability :math:`y_{ik} = 0`, the log-ratio is evaluated as :math:`\log(1.0) = 0`, so the term contributes exactly zero and the reference target distribution is not distorted by arbitrary epsilon clipping.

-------------------------------------------------------------------------------

Properties
----------

* **Best possible score:** ``0.0`` (lower is better; indicates two statistically indistinguishable distributions).
* **Worst possible score:** Unbounded (:math:`+\infty`).
* **Range:** :math:`[0, +\infty)`
* **Optimizer Note:** KLDL is a **Loss** metric. Hyperparameter search engines must be configured to *minimize* it.

-------------------------------------------------------------------------------

Example Usage
-------------

.. code-block:: python
    :emphasize-lines: 11,12,23,24

    from permetrics.classification import ClassificationMetric

    # ==============================================================================
    # SCENARIO 1: Standard Discrete Targets (Hard Labels)
    # ==============================================================================
    print("--- 1. HARD LABEL DIVERGENCE ---")

    y_true_hard = [0, 1, 1]
    y_pred_prob = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]]

    cm_hard = ClassificationMetric(y_true_hard, y_pred_prob)
    print(f"Hard KLDL (Identical to Log Loss): {cm_hard.KLDL()}")

    # ==============================================================================
    # SCENARIO 2: Knowledge Distillation (Soft Targets)
    # y_true is a continuous probability distribution from a Teacher Model
    # ==============================================================================
    print("\n--- 2. SOFT LABEL DIVERGENCE (DISTILLATION) ---")

    y_true_soft = [[0.85, 0.15], [0.10, 0.90], [0.25, 0.75]]
    y_pred_student = [[0.80, 0.20], [0.15, 0.85], [0.40, 0.60]]

    cm_soft = ClassificationMetric(y_true_soft, y_pred_student)
    print(f"Teacher-Student KLDL : {cm_soft.KLDL()}")
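
The conditional logarithmic mask described under *Numerical Stability Strategy* can also be illustrated outside the library. The following is a minimal NumPy sketch under stated assumptions, not the ``permetrics`` implementation: ``kl_naive`` and ``kl_masked`` are hypothetical names, and the per-sample (row-wise) reduction is chosen only for illustration.

.. code-block:: python

    import numpy as np

    def kl_naive(y, y_hat):
        """Unmasked KL: yields NaN whenever a target probability is 0."""
        y = np.asarray(y, dtype=float)
        y_hat = np.asarray(y_hat, dtype=float)
        # Silence the warnings so the NaN result can be inspected directly.
        with np.errstate(divide="ignore", invalid="ignore"):
            return np.sum(y * np.log(y / y_hat), axis=1)

    def kl_masked(y, y_hat):
        """Masked KL: the ratio is fixed to 1 wherever y == 0, so log(1) = 0."""
        y = np.asarray(y, dtype=float)
        y_hat = np.asarray(y_hat, dtype=float)
        ratio = np.ones_like(y)
        # Only positions with y > 0 are overwritten; the rest stay at 1.
        np.divide(y, y_hat, out=ratio, where=y > 0)
        return np.sum(y * np.log(ratio), axis=1)

    # Targets containing exact zeros (a one-hot row and a sparse soft row).
    y = np.array([[1.0, 0.0, 0.0], [0.5, 0.5, 0.0]])
    y_hat = np.array([[0.7, 0.2, 0.1], [0.4, 0.4, 0.2]])

    naive = kl_naive(y, y_hat)
    masked = kl_masked(y, y_hat)
    print("naive :", naive)
    print("masked:", masked)

    # Identical distributions give a divergence of zero.
    print("self  :", kl_masked(y, y))

    # The divergence is asymmetric: swapping the arguments changes the value.
    p = np.array([[0.85, 0.15]])
    q = np.array([[0.80, 0.20]])
    print("P||Q  :", kl_masked(p, q), " Q||P:", kl_masked(q, p))

The masked variant leaves the target rows untouched, whereas epsilon clipping would replace the zeros with small positive values and slightly alter the reference distribution.
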