KLD - Kullback-Leibler Divergence

The Kullback-Leibler Divergence (KLD) [19], also known as relative entropy, is a foundational statistical measure originating from information theory. It quantifies how much one probability distribution (the predictions, \(\hat{y}\)) differs from a reference probability distribution (the ground truth, \(y\)).

\[D_{KL}(y || \hat{y}) = \sum_{i=1}^{N} y_i \ln\left(\frac{y_i}{\hat{y}_i}\right)\]

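Note: \(\ln\) denotes the natural logarithm, so the result is expressed in nats. The formula is the expectation, taken under the reference distribution \(y\), of the log-ratio \(\ln(y_i / \hat{y}_i)\). By convention, terms with \(y_i = 0\) contribute zero.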
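The formula can be evaluated directly with NumPy. The following is a minimal sketch, not the library's implementation: the helper name `kl_divergence` is hypothetical, and the value returned by `RegressionMetric.KLD()` may differ if the library clips or aggregates its inputs differently.

```python
import numpy as np

def kl_divergence(y_true, y_pred):
    """Hypothetical helper: D_KL(y || y_hat) = sum_i y_i * ln(y_i / y_hat_i), in nats."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Skip terms where y_i == 0; they contribute 0 by convention.
    mask = y_true > 0
    return float(np.sum(y_true[mask] * np.log(y_true[mask] / y_pred[mask])))

y_true = [0.1, 0.4, 0.2, 0.3]
y_pred = [0.15, 0.35, 0.25, 0.25]

kld = kl_divergence(y_true, y_pred)
print("KLD (direct formula):", kld)  # roughly 0.023 nats
```

The small value reflects how close the two example distributions are; identical inputs would give exactly 0.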

Description 

Advantages:

Information Loss Measurement: KLD measures the information lost when the predicted distribution is used to approximate the true distribution. Concretely, it is the expected number of extra nats needed to encode samples from \(y\) with a code optimized for \(\hat{y}\).
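Optimization Standard: KLD underlies Cross-Entropy Loss. The cross-entropy between \(y\) and \(\hat{y}\) equals the entropy of \(y\) plus \(D_{KL}(y || \hat{y})\), and the entropy of \(y\) does not depend on the model, so minimizing cross-entropy also minimizes KLD. This makes it ubiquitous in machine learning and neural network training.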
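The link between cross-entropy and KLD can be checked numerically. This is an illustrative sketch using plain NumPy; the variable names are chosen for this example and are not part of the library.

```python
import numpy as np

y = np.array([0.1, 0.4, 0.2, 0.3])          # reference distribution
y_hat = np.array([0.15, 0.35, 0.25, 0.25])  # predicted distribution

entropy = -np.sum(y * np.log(y))            # H(y): fixed by the data
cross_entropy = -np.sum(y * np.log(y_hat))  # H(y, y_hat): depends on the model
kld = np.sum(y * np.log(y / y_hat))         # D_KL(y || y_hat)

# Cross-entropy minus entropy should equal the KL divergence.
gap = cross_entropy - entropy
print("H(y, y_hat) - H(y):", gap)
print("D_KL(y || y_hat):  ", kld)
print("Equal up to rounding:", np.isclose(gap, kld))
```

Because `entropy` is a constant with respect to the model, any change in `y_hat` that lowers the cross-entropy lowers the KLD by the same amount.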

Disadvantages:

Asymmetry (Crucial Limitation): KLD is not a true statistical distance metric because it is asymmetric: in general, \(D_{KL}(A || B)\) does not equal \(D_{KL}(B || A)\). It also does not satisfy the triangle inequality. If you need a symmetric measure, use the Jensen-Shannon Divergence (JSD), which is symmetric and bounded; its square root is a true metric.
Zero-Probability Crash: The formula divides by \(\hat{y}_i\). If your model predicts exactly 0.0 for an event that actually occurs in the ground truth (\(y_i > 0\)), the corresponding term involves division by zero and the divergence becomes \(+\infty\). (Implementation note: a common workaround is to add a tiny epsilon to the denominator, or to clip \(\hat{y}\) away from zero, at the cost of a small bias.)
Strict Domain Constraint: Both arrays must contain only non-negative values, and should represent valid probability distributions (each summing to 1). For inputs that are not normalized, properties such as non-negativity are not guaranteed.
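The asymmetry and the zero-probability problem can both be seen in a few lines of NumPy. This is a sketch under stated assumptions: the helpers `kl` and `js` and the `eps` parameter are hypothetical names for this example, and the epsilon value is an arbitrary choice rather than the library's default.

```python
import numpy as np

def kl(p, q, eps=0.0):
    """Hypothetical helper: D_KL(p || q) in nats, with optional epsilon in the denominator."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with p_i == 0 contribute 0
    # Suppress NumPy's divide-by-zero warning so an infinite result is returned quietly.
    with np.errstate(divide="ignore"):
        return float(np.sum(p[mask] * np.log(p[mask] / (q[mask] + eps))))

def js(p, q):
    """Hypothetical helper: Jensen-Shannon divergence, a symmetric alternative."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)  # mixture distribution
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

a = [0.1, 0.4, 0.2, 0.3]
b = [0.15, 0.35, 0.25, 0.25]

# Asymmetry: swapping the arguments changes the value.
kl_ab, kl_ba = kl(a, b), kl(b, a)
print("D_KL(a || b):", kl_ab)
print("D_KL(b || a):", kl_ba)

# JSD gives the same value in both directions.
js_ab, js_ba = js(a, b), js(b, a)
print("JSD(a, b):", js_ab, "JSD(b, a):", js_ba)

# Zero-probability case: the prediction assigns 0 to an event that occurs.
y_true = [0.5, 0.5]
y_pred = [1.0, 0.0]
kl_raw = kl(y_true, y_pred)             # infinite
kl_eps = kl(y_true, y_pred, eps=1e-12)  # finite, but large and slightly biased
print("Without epsilon:", kl_raw)
print("With epsilon:   ", kl_eps)
```

The epsilon keeps the result finite, but the value it produces depends heavily on the epsilon chosen, so it is better read as a flag for a zero-probability prediction than as a precise divergence.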

Properties 

Best possible score: 0.0 (Indicates the two distributions are perfectly identical).
Range: [0.0, +inf) (By Gibbs’ inequality, KLD is always non-negative when both inputs are valid probability distributions).
Mathematical Reference: Machine Learning Mastery
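The non-negativity guarantee depends on both inputs being normalized. The sketch below applies the raw formula to three cases to show this; `kl_sum` is a hypothetical helper, and the library may handle unnormalized inputs differently.

```python
import numpy as np

def kl_sum(y, y_hat):
    """Hypothetical helper: raw sum of y_i * ln(y_i / y_hat_i), with no normalization."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.sum(y * np.log(y / y_hat)))

# Both inputs sum to 1: the result is non-negative.
valid = kl_sum([0.2, 0.8], [0.5, 0.5])

# Identical distributions: the best possible score, exactly 0.
identical = kl_sum([0.2, 0.8], [0.2, 0.8])

# y sums to 0.2, not 1: Gibbs' inequality no longer applies and the sum is negative.
unnormalized = kl_sum([0.1, 0.1], [0.5, 0.5])

print("Valid distributions:", valid)
print("Identical inputs:   ", identical)
print("Unnormalized y:     ", unnormalized)
```

A negative score is therefore a sign that the inputs were not valid probability distributions, rather than evidence of an unusually good fit.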

Example Usage 

Note: Ensure inputs are strictly non-negative, ideally structured as valid probability distributions.

from numpy import array
from permetrics.regression import RegressionMetric

## 1. For 1-D array (Single-output)
y_true = array([0.1, 0.4, 0.2, 0.3])
y_pred = array([0.15, 0.35, 0.25, 0.25])

evaluator = RegressionMetric(y_true, y_pred)
# Calculate Kullback-Leibler Divergence
print("KLD: ", evaluator.KLD())

## 2. For > 1-D array (Multi-output)
y_true = array([[0.5, 0.5], [0.8, 0.2], [0.1, 0.9]])
y_pred = array([[0.4, 0.6], [0.7, 0.3], [0.2, 0.8]])

evaluator = RegressionMetric(y_true, y_pred)
# Return an array of scores for each column
print("KLD (Multi-output): ", evaluator.KLD(multi_output="raw_values"))
