COR - Correlation Coefficient ============================= .. toctree:: :maxdepth: 3 .. contents:: Table of Contents :local: :depth: 2 **Correlation (COR)** :cite:`joreskog1978structural` (specifically Pearson's Correlation Coefficient) measures the strength and direction of the linear relationship between the actual values (:math:`y`) and predicted values (:math:`\hat{y}`). It is calculated by dividing the covariance of the variables by the product of their standard deviations. This normalization scales the output to a specific range, making it dimensionless and entirely independent of the units of measurement. .. math:: \text{COR}(y, \hat{y}) = \frac{\text{cov}(y, \hat{y})}{\sigma_y \sigma_{\hat{y}}} = \frac{\sum_{i=1}^{N} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2 \sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})^2}} ------------------------------------------------------------------------------- Description ----------- **Advantages:** * **Scale-invariant:** Because it is dimensionless, you can use COR to compare the predictive behavior of models across entirely different datasets and domains. * **Trend evaluation:** Highly effective at identifying whether the model correctly captures the directional trend of the data (e.g., when the actual value goes up, the prediction also goes up). **Disadvantages:** * **Ignores magnitude (Crucial limitation):** COR measures linear tracking, *not* absolute accuracy. If actual values are `[1, 2, 3]` and predictions are `[1000, 2000, 3000]`, the COR will be a perfect `1.0`, even though the absolute errors are massive. It should always be paired with an error metric like RMSE or MAE. * **Linearity assumption:** It only evaluates linear relationships. A model might capture a perfect non-linear relationship (e.g., quadratic) but still score a low or zero Pearson correlation. * **Outlier sensitivity:** A single massive outlier can heavily distort the covariance, drastically shifting the correlation score (as demonstrated in Anscombe's quartet). ------------------------------------------------------------------------------- Properties ---------- * **Best possible score:** ``1.0`` (Perfect positive correlation). A score of ``-1.0`` indicates perfect negative correlation, and ``0.0`` indicates no linear correlation. * **Range:** ``[-1.0, 1.0]`` * **Mathematical Reference:** `Corporate Finance Institute `_ ------------------------------------------------------------------------------- Example Usage ------------- .. code-block:: python :emphasize-lines: 10, 18 from numpy import array from permetrics.regression import RegressionMetric ## 1. For 1-D array (Single-output) y_true = array([3, -0.5, 2, 7]) y_pred = array([2.5, 0.0, 2, 8]) evaluator = RegressionMetric(y_true, y_pred) # Calculate Correlation Coefficient print("COR: ", evaluator.COR()) ## 2. For > 1-D array (Multi-output) y_true = array([[0.5, 1], [-1, 1], [7, -6]]) y_pred = array([[0, 2], [-1, 2], [8, -5]]) evaluator = RegressionMetric(y_true, y_pred) # Return an array of scores for each column print("COR (Multi-output): ", evaluator.COR(multi_output="raw_values"))