KLD - Kullback-Leibler Divergence ================================= .. toctree:: :maxdepth: 3 .. contents:: Table of Contents :local: :depth: 2 The **Kullback-Leibler Divergence (KLD)** :cite:`hershey2007approximating`, also known as relative entropy, is a foundational statistical measure originating from information theory. It quantifies how much one probability distribution (the predictions, :math:`\hat{y}`) differs from a reference probability distribution (the ground truth, :math:`y`). .. math:: D_{KL}(y || \hat{y}) = \sum_{i=1}^{N} y_i \ln\left(\frac{y_i}{\hat{y}_i}\right) Note: :math:`\ln` denotes the natural logarithm. The formula calculates the expectation of the logarithmic difference between the probabilities. ------------------------------------------------------------------------------- Description ----------- **Advantages:** * **Information Loss Measurement:** KLD is exceptional at measuring the exact amount of "information lost" when you use the predicted distribution to approximate the true distribution. * **Optimization Standard:** It is the core mathematical engine behind Cross-Entropy Loss (minimizing Cross-Entropy is directly tied to minimizing KLD), making it ubiquitous in machine learning and neural network training. **Disadvantages:** * **Asymmetry (Crucial Limitation):** KLD is **not** a true statistical distance metric because it is inherently asymmetric. :math:`D_{KL}(A || B)` does not equal :math:`D_{KL}(B || A)`. It also does not satisfy the triangle inequality. If you need a symmetric distance, use the Jensen-Shannon Divergence (JSD). * **Zero-Probability Crash:** The formula divides by :math:`\hat{y}_i`. If your model predicts exactly ``0.0`` for an event that actually occurs in the ground truth (:math:`y_i > 0`), the formula involves division by zero and will explode to infinity. *(Implementation note: Always add a tiny epsilon to the denominator).* * **Strict Domain Constraint:** Both arrays must strictly contain non-negative values (ideally representing valid probability distributions where the sum equals 1). ------------------------------------------------------------------------------- Properties ---------- * **Best possible score:** ``0.0`` (Indicates the two distributions are perfectly identical). * **Range:** ``[0.0, +inf)`` (By Gibbs' inequality, KLD is always non-negative). * **Mathematical Reference:** `Machine Learning Mastery `_ ------------------------------------------------------------------------------- Example Usage ------------- *Note: Ensure inputs are strictly non-negative, ideally structured as valid probability distributions.* .. code-block:: python :emphasize-lines: 10, 18 from numpy import array from permetrics.regression import RegressionMetric ## 1. For 1-D array (Single-output) y_true = array([0.1, 0.4, 0.2, 0.3]) y_pred = array([0.15, 0.35, 0.25, 0.25]) evaluator = RegressionMetric(y_true, y_pred) # Calculate Kullback-Leibler Divergence print("KLD: ", evaluator.KLD()) ## 2. For > 1-D array (Multi-output) y_true = array([[0.5, 0.5], [0.8, 0.2], [0.1, 0.9]]) y_pred = array([[0.4, 0.6], [0.7, 0.3], [0.2, 0.8]]) evaluator = RegressionMetric(y_true, y_pred) # Return an array of scores for each column print("KLD (Multi-output): ", evaluator.KLD(multi_output="raw_values"))