KLDL - Kullback-Leibler Divergence Loss

The Kullback-Leibler Divergence Loss (KLDL) [30], also known as relative entropy, measures how a reference probability distribution \(P\) diverges from a second, candidate probability distribution \(Q\).

Figure: Illustration of the Kullback-Leibler divergence (relative entropy).

In classification, it quantifies the information lost when the predicted probability distribution \(\hat{Y}\) is used to approximate the ground-truth target distribution \(Y\). For a single sample over \(K\) classes:

\[D_{\text{KL}}(Y \parallel \hat{Y}) = \sum_{k=1}^{K} Y_k \log\left(\frac{Y_k}{\hat{Y}_k}\right)\]
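As a concrete reading of the formula, here is a minimal NumPy sketch for one sample. It assumes both vectors are valid, strictly positive probability distributions over the same \(K\) classes and uses the natural logarithm (so the result is in nats); zero targets need the masking described under Numerical Stability Strategy. The name `kl_divergence` is illustrative and not part of the permetrics API.

```python
import numpy as np

def kl_divergence(y, y_hat):
    """Return D_KL(y || y_hat) for one pair of strictly positive probability vectors."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    # Sum over the K classes of Y_k * log(Y_k / Yhat_k), natural log (nats)
    return float(np.sum(y * np.log(y / y_hat)))

# Worked example with K = 2: 0.5*log(0.5/0.9) + 0.5*log(0.5/0.1) = log(5/3)
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))  # approx 0.5108
```

Identical inputs give a log-ratio of \(\log(1) = 0\) in every class, so the divergence is exactly zero.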

Architectural Design: Polymorphic Target Ingestion

Rather than accepting only discrete integer labels, permetrics inspects the structure of the supplied ground-truth array \(Y\) and handles two cases:

  1. Hard Target Binarization (Standard Classification): If discrete class indices are passed (e.g., [0, 2, 1]), they are converted internally into one-hot distributions. Because a one-hot target has zero entropy, KLDL then reduces to cross-entropy (log loss).

  2. Soft Target Preservation (Knowledge Distillation): If continuous target probability matrices are passed (e.g., [[0.8, 0.2], [0.1, 0.9]] produced by a teacher model), the targets are used as given. The target entropy term is retained, so the full, asymmetric relative divergence is computed.
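Splitting the logarithm shows why the hard-label case collapses to cross-entropy while the soft-label case does not. Write \(H(Y) = -\sum_{k} Y_k \log Y_k\) for the target entropy and \(H(Y, \hat{Y}) = -\sum_{k} Y_k \log \hat{Y}_k\) for the cross-entropy, with the convention \(0 \log 0 = 0\):

\[
\begin{aligned}
D_{\text{KL}}(Y \parallel \hat{Y})
  &= \sum_{k=1}^{K} Y_k \log Y_k - \sum_{k=1}^{K} Y_k \log \hat{Y}_k \\
  &= H(Y, \hat{Y}) - H(Y).
\end{aligned}
\]

For a one-hot target whose true class is \(c\) (\(Y_c = 1\), all other \(Y_k = 0\)), \(H(Y) = 0\) and the divergence reduces to \(D_{\text{KL}} = -\log \hat{Y}_c\), the per-sample log loss. For soft targets, KLDL and cross-entropy differ by \(H(Y)\), which does not depend on \(\hat{Y}\).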


Numerical Stability Strategy

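To avoid the classic floating-point hazard in which zero-probability targets produce an undefined product (\(\log(0 / \hat{y}_{ik}) = -\infty\), and \(0 \times (-\infty) = \text{NaN}\)), the implementation applies a conditional logarithmic mask, which amounts to the standard convention \(0 \log 0 = 0\). Wherever the target probability \(y_{ik} = 0\), the log-ratio is evaluated as \(\log(1.0) = 0\), so these terms contribute exactly zero without distorting the reference target distribution through arbitrary epsilon clipping. The mask concerns zero targets only: mathematically, a predicted probability \(\hat{y}_{ik} = 0\) where \(y_{ik} > 0\) still makes the divergence infinite.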
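The masking idea can be sketched in NumPy as follows. This is an illustration under our own assumptions, not the permetrics source; the helper name `masked_kl` is hypothetical. `np.divide(..., out=..., where=...)` writes the ratio only where \(y_{ik} > 0\) and leaves 1.0 elsewhere, so those entries contribute \(\log(1) = 0\), whereas the naive expression turns every row containing a zero target into NaN.

```python
import numpy as np

def masked_kl(y, y_hat):
    """Per-sample D_KL(y || y_hat) for (N, K) arrays, treating 0 * log(0 / q) as 0."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    # Start from a ratio of 1.0 everywhere; overwrite it only where the target is positive
    ratio = np.ones_like(y)
    np.divide(y, y_hat, out=ratio, where=y > 0)
    # log(1.0) == 0, so zero-probability targets add nothing to the sum
    return np.sum(y * np.log(ratio), axis=1)

y_true = np.array([[1.0, 0.0], [0.0, 1.0]])  # one-hot targets contain zeros
y_pred = np.array([[0.9, 0.1], [0.2, 0.8]])

# Naive version: log(0) = -inf, and 0 * -inf = nan poisons each row
with np.errstate(divide="ignore", invalid="ignore"):
    naive = np.sum(y_true * np.log(y_true / y_pred), axis=1)
print(naive)                      # [nan nan]
print(masked_kl(y_true, y_pred))  # finite: -log(0.9), -log(0.8)
```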


Properties

  • Best possible score: 0.0 (lower is better; the divergence is zero only when the predicted distribution equals the target distribution).

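  • Worst possible score: Unbounded (\(+\infty\)); the loss grows without limit as the predicted probability of a class with nonzero target probability approaches 0.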

  • Range: \([0, +\infty)\)

  • Optimizer Note: KLDL is a loss metric, so hyperparameter search should be configured to minimize it.
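The properties listed above can be checked numerically. The sketch below uses plain NumPy and a hypothetical helper `kl` (strictly positive inputs, natural log). It shows that identical distributions score exactly 0, that the loss keeps growing as the predicted probability of a target class shrinks towards 0, and that the divergence is asymmetric, so swapping \(Y\) and \(\hat{Y}\) generally changes the value.

```python
import numpy as np

def kl(p, q):
    """D_KL(p || q) for strictly positive probability vectors (natural log)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

target = [0.7, 0.3]

# Best score: identical distributions give exactly 0
print(kl(target, target))  # 0.0

# Unbounded: shrink the predicted probability of the second class towards 0
for eps in (1e-1, 1e-3, 1e-6, 1e-9):
    print(eps, kl(target, [1.0 - eps, eps]))

# Asymmetry: D(P || Q) != D(Q || P) in general
p, q = [0.9, 0.1], [0.5, 0.5]
print(kl(p, q), kl(q, p))
```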


Example Usage

from permetrics.classification import ClassificationMetric

# ==============================================================================
# SCENARIO 1: Standard Discrete Targets (Hard Labels)
# ==============================================================================
print("--- 1. HARD LABEL DIVERGENCE ---")

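y_true_hard = [0, 1, 1]                               # integer class indices (one-hot encoded internally)
y_pred_prob = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]]  # predicted class probabilities, one row per sample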

cm_hard = ClassificationMetric(y_true_hard, y_pred_prob)
print(f"Hard KLDL (Identical to Log Loss): {cm_hard.KLDL()}")

# ==============================================================================
# SCENARIO 2: Knowledge Distillation (Soft Targets)
# y_true is a continuous probability distribution from a Teacher Model
# ==============================================================================
print("\n--- 2. SOFT LABEL DIVERGENCE (DISTILLATION) ---")

y_true_soft = [[0.85, 0.15], [0.10, 0.90], [0.25, 0.75]]     # teacher's soft target distributions
y_pred_student = [[0.80, 0.20], [0.15, 0.85], [0.40, 0.60]]  # student's predicted probabilities

cm_soft = ClassificationMetric(y_true_soft, y_pred_student)
print(f"Teacher-Student KLDL             : {cm_soft.KLDL()}")
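For a library-independent sanity check of both scenarios, the NumPy sketch below computes the same quantities by hand. It assumes the loss is the mean of the per-sample divergences with the natural logarithm; that reduction is our assumption rather than something stated above, so treat any agreement with the printed permetrics values as a cross-check, not a specification. The helper `mean_kl` is hypothetical.

```python
import numpy as np

def mean_kl(y_true, y_pred):
    """Mean over samples of D_KL(y_true_i || y_pred_i); zero targets contribute 0."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.ndim == 1:
        # Hard labels: one-hot encode the integer class indices
        y_true = np.eye(y_pred.shape[1])[y_true.astype(int)]
    ratio = np.ones_like(y_true)
    np.divide(y_true, y_pred, out=ratio, where=y_true > 0)
    return float(np.mean(np.sum(y_true * np.log(ratio), axis=1)))

# Scenario 1 (hard labels): matches the mean log loss -mean(log p_true_class)
hard = mean_kl([0, 1, 1], [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]])
print(hard, -np.mean(np.log([0.9, 0.8, 0.9])))

# Scenario 2 (soft targets from a teacher model)
soft = mean_kl([[0.85, 0.15], [0.10, 0.90], [0.25, 0.75]],
               [[0.80, 0.20], [0.15, 0.85], [0.40, 0.60]])
print(soft)
```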