COV - Covariance

Covariance (COV) is a statistical measure of the extent to which two variables, the actual values (\(y\)) and the predicted values (\(\hat{y}\)), change together.

It determines the directional relationship between the predictions and the ground truth, revealing whether they tend to increase or decrease in tandem.

Population Covariance:

\[\text{COV}_{\text{pop}}(y, \hat{y}) = \frac{1}{N} \sum_{i=1}^{N} (y_i - \bar{y}) (\hat{y}_i - \bar{\hat{y}})\]

Sample Covariance (Bessel’s correction):

\[\text{COV}_{\text{samp}}(y, \hat{y}) = \frac{1}{N-1} \sum_{i=1}^{N} (y_i - \bar{y}) (\hat{y}_i - \bar{\hat{y}})\]

Note: \(\bar{y}\) and \(\bar{\hat{y}}\) are the means of the actual and predicted values, respectively, and \(N\) is the number of samples.
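
The two formulas above can be sketched directly in NumPy. This is a minimal illustration, not the library's implementation: the helper name `covariance` and its `sample` flag are hypothetical, and `numpy.cov` is used only as a cross-check.

```python
import numpy as np

def covariance(y_true, y_pred, sample=False):
    """Covariance between actual and predicted values (illustrative helper).

    sample=False divides by N (population); sample=True divides by N - 1.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = y_true.size
    # Products of paired deviations from each mean
    products = (y_true - y_true.mean()) * (y_pred - y_pred.mean())
    return products.sum() / (n - 1 if sample else n)

y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])

cov_pop = covariance(y_true, y_pred)
cov_samp = covariance(y_true, y_pred, sample=True)
print("Population:", cov_pop, "Sample:", cov_samp)

# Cross-check against NumPy: ddof=0 -> population, ddof=1 -> sample
assert np.isclose(cov_pop, np.cov(y_true, y_pred, ddof=0)[0, 1])
assert np.isclose(cov_samp, np.cov(y_true, y_pred, ddof=1)[0, 1])
```

The two results differ only by the factor \(N / (N-1)\), so the gap shrinks as the number of samples grows.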


Description

Advantages:
  • Directional insight: A positive covariance indicates that as actual values increase, predicted values also tend to increase (moving in the same direction). A negative covariance reveals an inverse relationship.

  • Mathematical foundation: It serves as the core building block for calculating more advanced and interpretable metrics, such as Pearson’s Correlation Coefficient (COR).
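
As a rough illustration of the "building block" point, Pearson's correlation can be obtained by dividing the covariance by the product of the two standard deviations. The sketch below uses plain NumPy with population (divide-by-N) statistics throughout; it is an assumption-free textbook identity, not a description of how the library computes COR.

```python
import numpy as np

y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])

# Population covariance and population standard deviations (both divide by N)
cov_pop = np.mean((y_true - y_true.mean()) * (y_pred - y_pred.mean()))
cor = cov_pop / (y_true.std() * y_pred.std())
print("Pearson correlation:", cor)

# Same value as the off-diagonal entry of NumPy's correlation matrix
assert np.isclose(cor, np.corrcoef(y_true, y_pred)[0, 1])
```

Dividing by the standard deviations removes the units, which is why the result always falls in [-1, 1].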

Disadvantages:
  • Scale-dependency (Crucial limitation): Unlike Correlation, Covariance is not normalized. Its magnitude depends on the units of the data as well as on the strength of the relationship: rescaling both variables by a factor \(c\) multiplies COV by \(c^2\). You cannot meaningfully compare the COV of a dataset measured in millimeters with one measured in kilometers.

  • Ignores magnitude of error: COV only measures whether variables move together, not how close the predictions actually are to the ground truth. For example, adding the same constant bias to every prediction leaves COV unchanged. It does not assess the absolute accuracy of a model.

  • Unbounded: Because it has no upper or lower limits, a standalone covariance score is nearly impossible to interpret without additional context.
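
The first two limitations can be checked numerically. The sketch below is illustrative only: `cov_pop` is a hypothetical helper, the arrays are assumed to be in kilometers for the sake of the example, and the bias of 100 is arbitrary.

```python
import numpy as np

def cov_pop(a, b):
    """Population covariance of two 1-D arrays (illustrative helper)."""
    return np.mean((a - a.mean()) * (b - b.mean()))

y_true_km = np.array([3, -0.5, 2, 7])
y_pred_km = np.array([2.5, 0.0, 2, 8])
base = cov_pop(y_true_km, y_pred_km)

# Scale-dependency: converting both arrays from km to mm (factor 1e6)
# multiplies the covariance by 1e6 ** 2
to_mm = 1e6
scaled = cov_pop(y_true_km * to_mm, y_pred_km * to_mm)
assert np.isclose(scaled, base * to_mm ** 2)

# Ignores error magnitude: a constant bias leaves the covariance unchanged
# even though the mean absolute error becomes much larger
biased_pred = y_pred_km + 100.0
shifted = cov_pop(y_true_km, biased_pred)
mae_before = np.mean(np.abs(y_true_km - y_pred_km))
mae_after = np.mean(np.abs(y_true_km - biased_pred))
assert np.isclose(shifted, base)
print("MAE before:", mae_before, "MAE after:", mae_after)
```

For this reason COV is usually reported alongside an error metric and a normalized measure such as correlation rather than on its own.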


Properties

  • Best possible score: Undefined (There is no absolute “best” value. A larger magnitude reflects both a stronger directional trend and the scale of the data, so values are only comparable within the same dataset and units).

  • Range: (-inf, +inf)

  • Mathematical Reference: Corporate Finance Institute


Example Usage

from numpy import array
from permetrics.regression import RegressionMetric

## 1. For 1-D array (Single-output)
y_true = array([3, -0.5, 2, 7])
y_pred = array([2.5, 0.0, 2, 8])

evaluator = RegressionMetric(y_true, y_pred)
# Calculate Covariance
print("COV: ", evaluator.COV())

## 2. For > 1-D array (Multi-output)
y_true = array([[0.5, 1], [-1, 1], [7, -6]])
y_pred = array([[0, 2], [-1, 2], [8, -5]])

evaluator = RegressionMetric(y_true, y_pred)
# Return an array of scores for each column
print("COV (Multi-output): ", evaluator.COV(multi_output="raw_values"))
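
For the multi-output case, a per-column covariance can be sketched in plain NumPy as an independent cross-check. This section does not state whether the library divides by N or N - 1, so the sketch assumes the population form; the values may therefore differ from the output of `evaluator.COV` by a factor of \(N / (N-1)\).

```python
import numpy as np

y_true = np.array([[0.5, 1], [-1, 1], [7, -6]])
y_pred = np.array([[0, 2], [-1, 2], [8, -5]])

# Deviations from each column's mean (axis=0 runs over the samples)
dev_true = y_true - y_true.mean(axis=0)
dev_pred = y_pred - y_pred.mean(axis=0)

# One population covariance per output column (assumed divide-by-N form)
cov_per_column = (dev_true * dev_pred).mean(axis=0)
print("COV per column:", cov_per_column)
```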