KLD - Kullback-Leibler Divergence
The Kullback-Leibler Divergence (KLD) [19], also known as relative entropy, is a foundational statistical measure originating from information theory. It quantifies how much one probability distribution (the predictions, \(\hat{y}\)) differs from a reference probability distribution (the ground truth, \(y\)).
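In its standard discrete form, with the ground truth \(y\) as the reference distribution and \(\hat{y}\) as the prediction, the divergence is usually written as follows (the index \(i\) and the outcome count \(N\) are notational assumptions here):

\[
\text{KLD}(y, \hat{y}) = D_{KL}(y || \hat{y}) = \sum_{i=1}^{N} y_i \ln\left(\frac{y_i}{\hat{y}_i}\right)
\]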
Note: \(\ln\) denotes the natural logarithm, so the result is measured in nats. The formula is the expectation, taken under the ground-truth distribution \(y\), of the logarithmic difference \(\ln y_i - \ln \hat{y}_i\).
Description
- Advantages:
Information Loss Measurement: KLD measures the information lost when the predicted distribution is used to approximate the true distribution. Concretely, it is the expected number of extra nats needed to encode events drawn from \(y\) with a code optimized for \(\hat{y}\).
Optimization Standard: Cross-entropy decomposes as \(H(y, \hat{y}) = H(y) + D_{KL}(y || \hat{y})\). Because the entropy \(H(y)\) of the ground truth does not depend on the model, minimizing Cross-Entropy Loss is equivalent to minimizing KLD, which makes KLD ubiquitous in machine learning and neural network training.
- Disadvantages:
Asymmetry (Crucial Limitation): KLD is not a true statistical distance metric because it is inherently asymmetric: in general, \(D_{KL}(A || B)\) does not equal \(D_{KL}(B || A)\). It also does not satisfy the triangle inequality. If you need a symmetric measure, use the Jensen-Shannon Divergence (JSD), whose square root is a true metric.
Zero-Probability Crash: The formula divides by \(\hat{y}_i\). If your model predicts exactly 0.0 for an event that actually occurs in the ground truth (\(y_i > 0\)), the formula involves division by zero and will explode to infinity. (Implementation note: Always add a tiny epsilon to the denominator.)
Strict Domain Constraint: Both arrays must strictly contain non-negative values (ideally representing valid probability distributions where the sum equals 1).
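The link to cross-entropy, the asymmetry, and the zero-probability problem described above can be checked numerically. The following is a minimal sketch using only the Python standard library. The helpers `kl_divergence`, `cross_entropy`, and `entropy`, and the `eps` floor, are hypothetical names written for illustration; they are not taken from permetrics, whose internal handling may differ.

```python
import math


def kl_divergence(y_true, y_pred, eps=1e-12):
    """Hypothetical helper: sum of y_i * ln(y_i / y_hat_i), in nats.

    Predictions are raised to at least `eps` (an illustrative choice) so a
    predicted 0.0 cannot cause a division by zero. Terms with y_i == 0 are
    skipped, following the usual convention 0 * ln(0) = 0.
    """
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        if y > 0.0:
            total += y * math.log(y / max(y_hat, eps))
    return total


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Hypothetical helper: H(y, y_hat) = -sum of y_i * ln(y_hat_i)."""
    return -sum(
        y * math.log(max(y_hat, eps))
        for y, y_hat in zip(y_true, y_pred)
        if y > 0.0
    )


def entropy(y_true):
    """Hypothetical helper: H(y) = -sum of y_i * ln(y_i)."""
    return -sum(y * math.log(y) for y in y_true if y > 0.0)


p = [0.1, 0.4, 0.2, 0.3]
q = [0.15, 0.35, 0.25, 0.25]

# Asymmetry: swapping the arguments changes the score.
forward = kl_divergence(p, q)
reverse = kl_divergence(q, p)
print("D_KL(p || q):", forward)
print("D_KL(q || p):", reverse)

# Cross-entropy identity: H(p, q) - H(p) equals D_KL(p || q).
gap = cross_entropy(p, q) - entropy(p)
print("H(p, q) - H(p):", gap)

# Zero-probability case: the first event has y_i = 0.1 but is predicted as 0.0.
# Without the eps floor this would divide by zero; with it the score is
# finite but very large.
q_zero = [0.0, 0.5, 0.25, 0.25]
clipped = kl_divergence(p, q_zero)
print("D_KL(p || q_zero) with eps floor:", clipped)
```

The size of the penalty in the last case depends directly on the chosen `eps`, so the floor should be treated as a tuning decision rather than a neutral fix.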
Properties
Best possible score: 0.0 (Indicates the two distributions are perfectly identical).
Range: [0.0, +inf) (By Gibbs’ inequality, KLD is always non-negative).
Mathematical Reference: Machine Learning Mastery
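These properties can be illustrated with a short standard-library sketch. The helper names `kl_divergence` and `random_distribution`, the seed, and the sample sizes are all assumptions chosen for illustration; the sketch does not call permetrics and applies no epsilon, so it assumes every prediction is strictly positive.

```python
import math
import random


def kl_divergence(y_true, y_pred):
    """Hypothetical helper: sum of y_i * ln(y_i / y_hat_i), in nats.

    No clipping is applied; y_pred must be positive wherever y_true > 0.
    """
    return sum(
        y * math.log(y / y_hat)
        for y, y_hat in zip(y_true, y_pred)
        if y > 0.0
    )


def random_distribution(rng, size):
    """Draw positive weights and normalize them so they sum to 1."""
    weights = [rng.random() + 1e-9 for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
p = [0.1, 0.4, 0.2, 0.3]

# Best possible score: identical distributions give exactly 0.0,
# because every term is y_i * ln(1).
identical_score = kl_divergence(p, p)
print("identical:", identical_score)

# Lower bound: over many random pairs the score never drops below zero
# (up to floating-point rounding).
scores = [
    kl_divergence(random_distribution(rng, 4), random_distribution(rng, 4))
    for _ in range(1000)
]
smallest = min(scores)
print("smallest of 1000 random pairs:", smallest)

# No upper bound: the score keeps growing as the prediction for an event
# that does occur (y_i = 0.1) shrinks toward zero.
growth = []
for tiny in (1e-2, 1e-4, 1e-8):
    q = [tiny, 0.4, 0.2, 0.4 - tiny]
    growth.append(kl_divergence(p, q))
print("scores as the prediction shrinks:", growth)
```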
Example Usage
Note: Ensure inputs are strictly non-negative, ideally structured as valid probability distributions.
from numpy import array
from permetrics.regression import RegressionMetric
## 1. For 1-D array (Single-output)
y_true = array([0.1, 0.4, 0.2, 0.3])
y_pred = array([0.15, 0.35, 0.25, 0.25])
evaluator = RegressionMetric(y_true, y_pred)
# Calculate Kullback-Leibler Divergence
print("KLD: ", evaluator.KLD())
## 2. For > 1-D array (Multi-output)
y_true = array([[0.5, 0.5], [0.8, 0.2], [0.1, 0.9]])
y_pred = array([[0.4, 0.6], [0.7, 0.3], [0.2, 0.8]])
evaluator = RegressionMetric(y_true, y_pred)
# Return an array of scores for each column
print("KLD (Multi-output): ", evaluator.KLD(multi_output="raw_values"))