GINI - Regression Gini
======================

.. toctree::
   :maxdepth: 3
   :caption: GINI - Gini coefficient

In regression analysis, the term "Gini" can refer to two fundamentally different quantities, depending on whether one evaluates the **ranking capability** of the predictions or the **dispersion of the prediction errors**. To prevent misinterpretation, ``permetrics`` explicitly separates these into two distinct metrics:

.. contents:: Table of Contents
   :local:
   :depth: 2

-------------------------------------------------------------------------------

1. Normalized Gini Coefficient (Ranking Power)
----------------------------------------------

The **Normalized Gini Coefficient** :cite:`frees2011summarizing` measures the *actuarial ranking capability* of a regression model. Derived from the Lorenz curve in economics and widely used in insurance pricing, credit scoring, and algorithmic trading, it quantifies how well the predicted values :math:`y_{\text{pred}}` rank the actual continuous targets :math:`y_{\text{true}}`.

.. math::

   G_{\text{norm}} = \frac{\text{Gini}(y_{\text{true}}, y_{\text{pred}})}{\text{Gini}(y_{\text{true}}, y_{\text{true}})}

where the numerator is the raw covariance Gini of the model, and the denominator is the raw Gini of an *optimal model* (the ground truth ranked by itself).

Properties
~~~~~~~~~~

* **Best possible score:** ``1.0`` (perfect ranking: the model sorts the targets in exactly the correct order).
* **Worst possible score:** ``-1.0`` (perfectly inverted ranking). A score of ``0.0`` corresponds to a random ranking, i.e. no ranking power.
* **Range:** ``[-1, 1]``
* **Function call:** ``evaluator.normalized_gini_coefficient()``

-------------------------------------------------------------------------------

2. Residual Gini Index (Error Dispersion)
-----------------------------------------

The **Residual Gini Index** :cite:`yitzhaki2012gini` applies the classic economic Gini index of inequality to the **absolute regression residuals** :math:`E = \lvert y_{\text{true}} - y_{\text{pred}} \rvert`. Instead of measuring ranking, it answers an econometric question: *"Is the model's total error distributed equally across all samples, or is 90% of the total error caused by 3 extreme outliers?"*

.. math::

   G_{\text{residual}} = \frac{2 \sum_{i=1}^{n} i \cdot e_{(i)}}{n \sum_{i=1}^{n} e_{(i)}} - \frac{n+1}{n}

where :math:`e_{(i)}` denotes the elements of :math:`E` (the absolute errors) sorted in **non-decreasing order** (:math:`e_{(1)} \le e_{(2)} \le \dots \le e_{(n)}`) and :math:`n` is the number of samples.

Properties
~~~~~~~~~~

* **Best possible score:** ``0.0`` (complete equality: every sample in the dataset has exactly the same error magnitude).
* **Worst possible score:** approaching ``1.0`` (extreme disparity: the model predicts almost all samples perfectly but fails catastrophically on a tiny fraction).
* **Range:** ``[0, 1)``. For :math:`n` samples the formula attains at most :math:`(n-1)/n`, which is reached when a single sample carries the entire error.
* **Function call:** ``evaluator.residual_gini_index()``

-------------------------------------------------------------------------------

Example Usage
-------------

.. code-block:: python

   import numpy as np
   from permetrics.regression import RegressionMetric

   y_true = np.array([10, 20, 30, 40, 50])
   y_pred = np.array([12, 18, 33, 39, 55])

   evaluator = RegressionMetric(y_true, y_pred)

   # 1. Evaluate ranking capability
   gini_rank = evaluator.normalized_gini_coefficient()
   print(f"Ranking Gini: {gini_rank:.4f}")
   # Expected: 1.0 (the predictions are in exactly the same order as the targets)

   # 2. Evaluate error dispersion
   gini_err = evaluator.residual_gini_index()
   print(f"Residual Gini: {gini_err:.4f}")
   # Expected: closer to 0.0 than to 1.0 (about 0.28 by the formula above),
   # because the absolute errors [2, 2, 3, 1, 5] are fairly evenly spread
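
To check these numbers by hand, the following is a minimal NumPy sketch of both quantities. It is not the ``permetrics`` implementation, and the helper names ``residual_gini``, ``raw_gini`` and ``normalized_gini`` are hypothetical. The residual version follows the formula in section 2. The normalized version assumes one common Lorenz-curve formulation of the raw Gini (sort the targets by descending prediction, then accumulate), which may differ in detail from what the library computes.

.. code-block:: python

   import numpy as np


   def residual_gini(y_true, y_pred):
       """Gini index of the absolute residuals (formula from section 2)."""
       errors = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
       e_sorted = np.sort(errors)          # e_(1) <= e_(2) <= ... <= e_(n)
       n = e_sorted.size
       total = e_sorted.sum()
       if total == 0:
           # Assumption: a perfect fit is treated as complete equality.
           return 0.0
       ranks = np.arange(1, n + 1)         # i = 1, ..., n
       return 2.0 * np.sum(ranks * e_sorted) / (n * total) - (n + 1) / n


   def raw_gini(y_true, y_score):
       """Raw Gini of y_true when ranked by y_score (assumed Lorenz-style form)."""
       y_true = np.asarray(y_true, dtype=float)
       # Sort the targets by descending score; a stable sort keeps ties in input order.
       order = np.argsort(-np.asarray(y_score, dtype=float), kind="stable")
       lorenz = np.cumsum(y_true[order]) / y_true.sum()
       n = y_true.size
       return (lorenz.sum() - (n + 1) / 2.0) / n


   def normalized_gini(y_true, y_pred):
       """Raw Gini of the model divided by the raw Gini of the ground truth."""
       return raw_gini(y_true, y_pred) / raw_gini(y_true, y_true)


   y_true = np.array([10, 20, 30, 40, 50])
   y_pred = np.array([12, 18, 33, 39, 55])

   print(f"Ranking Gini:  {normalized_gini(y_true, y_pred):.4f}")   # 1.0000
   print(f"Inverted:      {normalized_gini(y_true, -y_true):.4f}")  # -1.0000
   print(f"Residual Gini: {residual_gini(y_true, y_pred):.4f}")     # 0.2769

For the residual index, the sorted absolute errors are ``[1, 2, 2, 3, 5]``, so the formula gives :math:`2 \cdot 48 / (5 \cdot 13) - 6/5 \approx 0.2769`.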