KLD - Kullback-Leibler Divergence

The Kullback-Leibler Divergence (KLD) [19], also known as relative entropy, is a foundational statistical measure originating from information theory. It quantifies how much one probability distribution (the predictions, \(\hat{y}\)) differs from a reference probability distribution (the ground truth, \(y\)).

\[D_{KL}(y || \hat{y}) = \sum_{i=1}^{N} y_i \ln\left(\frac{y_i}{\hat{y}_i}\right)\]

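Note: \(\ln\) denotes the natural logarithm, so the result is expressed in nats, and \(N\) is the number of outcomes. The formula calculates the expectation, taken under the reference distribution \(y\), of the logarithmic difference \(\ln y_i - \ln \hat{y}_i\).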
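The formula above can be sketched directly in plain Python. This is a minimal illustration, not the library's implementation: the function name `kl_divergence` is hypothetical, and the handling of zeros (skipping terms with \(y_i = 0\), returning infinity when \(\hat{y}_i = 0\) but \(y_i > 0\)) follows the usual mathematical convention rather than anything stated for permetrics.

```python
import math


def kl_divergence(y_true, y_pred):
    """Sketch of D_KL(y_true || y_pred) in nats (hypothetical helper)."""
    total = 0.0
    for y_i, p_i in zip(y_true, y_pred):
        if y_i == 0.0:
            # Convention: 0 * ln(0 / p) contributes nothing to the sum.
            continue
        if p_i == 0.0:
            # The prediction rules out an event that actually occurs.
            return math.inf
        total += y_i * math.log(y_i / p_i)
    return total


y_true = [0.1, 0.4, 0.2, 0.3]
y_pred = [0.15, 0.35, 0.25, 0.25]

kld = kl_divergence(y_true, y_pred)
print("KLD (manual sketch):", round(kld, 4))      # approximately 0.0229
print("KLD of a distribution with itself:", kl_divergence(y_true, y_true))
```

A library routine may apply clipping or a different reduction, so its output need not match this sketch exactly.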


Description

Advantages:
  • Information Loss Measurement: KLD measures the expected amount of information lost (in nats, since the natural logarithm is used) when the predicted distribution is used to approximate the true distribution.

  • Optimization Standard: KLD underlies Cross-Entropy Loss. Cross-entropy differs from KLD only by the entropy of the ground truth, which does not depend on the predictions, so minimizing Cross-Entropy is equivalent to minimizing KLD. This makes it ubiquitous in machine learning and neural network training.
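The link between Cross-Entropy and KLD can be written out in one line. The symbols \(H(y, \hat{y})\) for cross-entropy and \(H(y)\) for the entropy of the ground truth are notation introduced here for illustration; they do not appear elsewhere on this page.

\[H(y, \hat{y}) = -\sum_{i=1}^{N} y_i \ln \hat{y}_i = -\sum_{i=1}^{N} y_i \ln y_i + \sum_{i=1}^{N} y_i \ln\left(\frac{y_i}{\hat{y}_i}\right) = H(y) + D_{KL}(y || \hat{y})\]

Because \(H(y)\) is fixed by the ground truth, any change in \(\hat{y}\) that lowers \(H(y, \hat{y})\) lowers \(D_{KL}(y || \hat{y})\) by the same amount.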

Disadvantages:
  • Asymmetry (Crucial Limitation): KLD is not a true statistical distance metric because it is inherently asymmetric: in general, \(D_{KL}(A || B)\) does not equal \(D_{KL}(B || A)\). It also does not satisfy the triangle inequality. If you need a symmetric measure, use the Jensen-Shannon Divergence (JSD), whose square root is a true metric.

  • Zero-Probability Crash: The formula divides by \(\hat{y}_i\). If your model predicts exactly 0.0 for an event that actually occurs in the ground truth (\(y_i > 0\)), the term involves division by zero and the divergence becomes infinite. Terms with \(y_i = 0\) are conventionally treated as contributing 0. (Implementation note: add a tiny epsilon to the denominator, or clip the predictions away from zero, before computing the logarithm.)

  • Strict Domain Constraint: Both arrays must strictly contain non-negative values (ideally representing valid probability distributions where the sum equals 1).
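The asymmetry and zero-probability issues listed above can be seen in a small plain-Python sketch. All helper names (`kl_divergence`, `js_divergence`, `smooth`) and the epsilon value are hypothetical choices for illustration and are not part of permetrics; clipping followed by renormalization is one common workaround, not the only one.

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q) in nats; terms with p_i == 0 contribute 0."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue
        if q_i == 0.0:
            return math.inf
        total += p_i * math.log(p_i / q_i)
    return total


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KLD to the midpoint distribution."""
    m = [(p_i + q_i) / 2.0 for p_i, q_i in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def smooth(q, eps=1e-12):
    """Clip values below eps, then renormalize so the result sums to 1."""
    clipped = [max(q_i, eps) for q_i in q]
    total = sum(clipped)
    return [c / total for c in clipped]


a = [0.9, 0.1, 0.0]
b = [0.5, 0.25, 0.25]

forward = kl_divergence(a, b)            # finite: a's zero term is skipped
reverse = kl_divergence(b, a)            # infinite: a gives 0 to an event b allows
repaired = kl_divergence(b, smooth(a))   # large but finite after smoothing

print("D_KL(a || b):", forward)
print("D_KL(b || a):", reverse)
print("JSD(a, b), JSD(b, a):", js_divergence(a, b), js_divergence(b, a))
print("D_KL(b || smooth(a)):", repaired)
```

Note that the smoothed result depends strongly on the chosen epsilon, so the value should be reported alongside the score.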


Properties

  • Best possible score: 0.0 (Indicates the two distributions are perfectly identical).

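  • Range: [0.0, +inf) (By Gibbs’ inequality, KLD is always non-negative when both inputs are valid probability distributions that sum to 1. For unnormalized arrays the sum can be negative, for example \(y = [0.1]\) and \(\hat{y} = [0.2]\) give \(0.1 \ln 0.5 < 0\).)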

  • Mathematical Reference: Machine Learning Mastery


Example Usage

Note: Ensure inputs are strictly non-negative, ideally structured as valid probability distributions.

from numpy import array
from permetrics.regression import RegressionMetric

## 1. For 1-D array (Single-output)
y_true = array([0.1, 0.4, 0.2, 0.3])
y_pred = array([0.15, 0.35, 0.25, 0.25])

evaluator = RegressionMetric(y_true, y_pred)
# Calculate Kullback-Leibler Divergence
print("KLD: ", evaluator.KLD())

## 2. For > 1-D array (Multi-output)
y_true = array([[0.5, 0.5], [0.8, 0.2], [0.1, 0.9]])
y_pred = array([[0.4, 0.6], [0.7, 0.3], [0.2, 0.8]])

evaluator = RegressionMetric(y_true, y_pred)
# Return an array of scores for each column
print("KLD (Multi-output): ", evaluator.KLD(multi_output="raw_values"))