CRM - Coefficient of Residual Mass

The Coefficient of Residual Mass (CRM) [23] is a systemic bias metric widely used in environmental engineering and hydrology (e.g., water quality, sediment transport modeling).

It evaluates the tendency of a model to systematically overestimate or underestimate the dependent variable by comparing the sum of predicted values against the sum of actual (observed) values.

\[\text{CRM}(y, \hat{y}) = \frac{\sum_{i=1}^{N} y_i - \sum_{i=1}^{N} \hat{y}_i}{\sum_{i=1}^{N} y_i}\]

Note: Depending on the specific literature, the numerator is sometimes written as \(\sum \hat{y}_i - \sum y_i\). Both forms evaluate the same bias, just with inverted signs.


Description

Advantages:
  • Systemic Bias Detection: CRM is an excellent diagnostic tool. A positive CRM indicates that the model systematically underestimates the actual values, while a negative CRM indicates systematic overestimation.

  • Domain Specificity: Highly interpretable in mass-balance systems (like fluid dynamics or sediment transport) where knowing the total cumulative volume error is more critical than individual step errors.

Disadvantages:
  • The Cancellation Effect (Crucial Flaw): CRM is NOT a measure of absolute accuracy. Because it simply sums the raw values, massive positive errors and massive negative errors can cancel each other out. A model could have terrible individual predictions but still achieve a perfect CRM of 0.0. CRM must always be used alongside metrics like RMSE or MAE.

  • Zero-Sum Vulnerability: If the sum of the actual values (\(\sum_{i=1}^{N} y_i\)) equals exactly zero, the formula will divide by zero and crash (returning NaN/Undefined).


Properties

  • Best possible score: 0.0 (Indicates no systemic bias; the total mass of predictions equals the total mass of observations).

  • Range: (-inf, +inf)

  • Mathematical Reference: ScienceDirect (CSITE)


Example Usage

from numpy import array
from permetrics.regression import RegressionMetric

## 1. For 1-D array (Single-output)
y_true = array([3, -0.5, 2, 7])
y_pred = array([2.5, 0.0, 2, 8])

evaluator = RegressionMetric(y_true, y_pred)
# Calculate Coefficient of Residual Mass
print("CRM: ", evaluator.CRM())

## 2. For > 1-D array (Multi-output)
y_true = array([[0.5, 1], [-1, 1], [7, -6]])
y_pred = array([[0, 2], [-1, 2], [8, -5]])

evaluator = RegressionMetric(y_true, y_pred)
# Return an array of scores for each column
print("CRM (Multi-output): ", evaluator.CRM(multi_output="raw_values"))