PERMETRICS Library
permetrics.evaluator module
- class permetrics.evaluator.Evaluator(y_true=None, y_pred=None, **kwargs)
Bases: object
This is the base class for all performance metrics.
- EPSILON = 1e-10
- SUPPORT = {}
- get_metric_by_name(metric_name=<class 'str'>, paras=None) → dict
Get a single metric by name, with the metric's parameters passed as a dictionary
- Parameters:
metric_name (str) – Select name of metric
paras (dict) – Dictionary of hyper-parameters for that metric
- Returns:
{ metric_name: value }
- Return type:
result (dict)
- get_metrics_by_dict(metrics_dict: dict) → dict
Get the results of several metrics, given as a dictionary that maps each metric name to its parameters
- Parameters:
metrics_dict (dict) – key is metric name and value is dict of parameters
- Returns:
e.g., { “RMSE”: 0.3524, “MAE”: 0.445263 }
- Return type:
dict
Examples
>>> evaluator.get_metrics_by_dict({
...     "RMSE": {"multi_output": "raw_values"},
...     "MAE": {"multi_output": "raw_values"}
... })
{"RMSE": 0.3524, "MAE": 0.445263}
- get_metrics_by_list_names(list_metric_names=<class 'list'>, list_paras=None) → dict
Get the results of several metrics, given as a list of metric names and a matching list of parameters
- Parameters:
list_metric_names (list) – e.g., [“RMSE”, “MAE”, “MAPE”]
list_paras (list) – e.g., [ {“multi_output”: “raw_values”}, {“multi_output”: “raw_values”}, {“multi_output”: [2, 3]} ]
- Returns:
e.g., { “RMSE”: 0.25, “MAE”: [0.3, 0.6], “MAPE”: 0.15 }
- Return type:
results (dict)
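The list-based call and the dictionary-based call describe the same request in two shapes. A minimal sketch of that correspondence, assuming the two lists are matched by position; `pair_names_with_paras` is a hypothetical helper written for illustration and is not part of PerMetrics:

```python
def pair_names_with_paras(list_metric_names, list_paras=None):
    """Build a {metric_name: paras} dict from two position-matched lists.

    Hypothetical helper for illustration only; not a PerMetrics function.
    """
    if list_paras is None:
        # Assumption: no parameters means "use each metric's defaults".
        list_paras = [{} for _ in list_metric_names]
    if len(list_metric_names) != len(list_paras):
        raise ValueError("list_metric_names and list_paras must have the same length")
    return dict(zip(list_metric_names, list_paras))


metrics_dict = pair_names_with_paras(
    ["RMSE", "MAE", "MAPE"],
    [{"multi_output": "raw_values"}, {"multi_output": "raw_values"}, {"multi_output": [2, 3]}],
)
print(metrics_dict)
```

The resulting dictionary has the shape that get_metrics_by_dict expects as its metrics_dict argument.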
- get_output_result(result=None, n_out=None, multi_output=None, force_finite=None, finite_value=None)
Get the final output result based on the selected parameters
- Parameters:
result – The raw result from the metric
n_out – The number of columns in y_true or y_pred
multi_output – “raw_values” returns one value per output; a list of weights returns a single output based on those weights; anything else returns the mean result
force_finite – Whether to replace a non-finite result with a finite number
finite_value – The value used to replace an infinite or NaN value
- Returns:
Final output results based on selected parameter
- Return type:
final_result
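The interplay of multi_output, force_finite, and finite_value can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: `finalize_result` is a hypothetical name, the weights are assumed to be combined as a weighted sum, non-finite values are assumed to be replaced before aggregation, and the n_out argument is left out for brevity.

```python
import math


def finalize_result(result, multi_output="raw_values", force_finite=True, finite_value=0.0):
    """Hypothetical sketch of the post-processing described above."""
    # Treat a scalar as a one-column result.
    values = list(result) if isinstance(result, (list, tuple)) else [result]
    if force_finite:
        # Replace NaN and +/-Inf by the chosen finite value.
        values = [v if math.isfinite(v) else finite_value for v in values]
    if len(values) == 1:
        return values[0]
    if multi_output == "raw_values":
        return values  # one value per output column
    if isinstance(multi_output, (list, tuple)):
        if len(multi_output) != len(values):
            raise ValueError("need one weight per output column")
        # Assumption: weights are applied as a weighted sum.
        return sum(w * v for w, v in zip(multi_output, values))
    # Anything else: plain mean over the output columns.
    return sum(values) / len(values)


print(finalize_result([0.2, float("nan")]))
print(finalize_result([1.0, 3.0], multi_output=[0.5, 0.5]))
```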
permetrics.regression module
- class permetrics.regression.RegressionMetric(y_true=None, y_pred=None, **kwargs)
Bases: Evaluator
Defines a RegressionMetric class that holds all regression metrics (for both regression and time-series problems)
An extension of scikit-learn metrics section, with the addition of many more regression metrics.
https://scikit-learn.org/stable/modules/model_evaluation.html#classification-metrics
Some scikit-learn methods cannot produce multi-output metrics, so all of them are re-implemented here with multi-output support.
As a result, every method in this class can calculate multi-output metrics.
- Parameters:
y_true (tuple, list, np.ndarray, default = None) – The ground truth values.
y_pred (tuple, list, np.ndarray, default = None) – The prediction values.
- A10(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)
A10 index (A10)
Notes
The a10-index is an engineering index for evaluating artificial intelligence models. It shows the number of samples
whose predicted values fall within a deviation of ±10% of the experimental values.
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
A10 metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- A20(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)
A20 index (A20)
Notes
The a20-index evaluates a model by showing the number of samples whose predicted values fall within a deviation of ±20% of the experimental values.
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
A20 metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- A30(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)
A30 index (A30)
Note: The a30-index evaluates a model by showing the number of samples whose predicted values fall within a deviation of ±30% of the experimental values.
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
A30 metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
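A10, A20, and A30 differ only in the tolerance band. The sketch below shows the idea for a single output column, assuming the deviation is measured as the ratio y_pred / y_true and the result is reported as the fraction of samples inside the band; `a_index` and its `tolerance` argument are hypothetical names, and the library may define the ratio or the edge cases differently.

```python
def a_index(y_true, y_pred, tolerance=0.10):
    """Fraction of samples whose prediction lies within +/- tolerance of the truth.

    Hypothetical single-column sketch; tolerance=0.10, 0.20, 0.30 mirror A10, A20, A30.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    lower, upper = 1.0 - tolerance, 1.0 + tolerance
    # Assumption: a sample with y_true == 0 has no defined ratio and counts as a miss.
    hits = sum(1 for t, p in zip(y_true, y_pred) if t != 0 and lower <= p / t <= upper)
    return hits / len(y_true)


y_true = [100.0, 100.0, 100.0, 100.0]
y_pred = [95.0, 105.0, 115.0, 60.0]
print(a_index(y_true, y_pred, tolerance=0.10))
print(a_index(y_true, y_pred, tolerance=0.20))
```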
- ACOD(y_true=None, y_pred=None, X_shape=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)
Adjusted Coefficient of Determination (ACOD/AR2)
Notes
Scikit-learn and other sources denote COD as R^2 (R squared), which invites confusion with the square of R, where R is the PCC.
COD should be denoted as COD or R2 only.
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
X_shape (tuple, list, np.ndarray) – The shape of X_train dataset
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
AR2 metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
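The role of X_shape can be illustrated with the textbook adjusted coefficient of determination, 1 - (1 - COD) * (n - 1) / (n - k - 1), where n is the number of samples and k the number of features. The sketch assumes X_shape is given as (n_samples, n_features) and handles one output column; the function names are hypothetical and the library's handling of a missing X_shape is not shown.

```python
def cod(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_cod(y_true, y_pred, X_shape):
    """Textbook adjusted COD; X_shape is assumed to be (n_samples, n_features)."""
    n_samples, n_features = X_shape
    dof_total = n_samples - 1
    dof_error = n_samples - n_features - 1
    if dof_error <= 0:
        raise ValueError("need more samples than features + 1")
    return 1.0 - (1.0 - cod(y_true, y_pred)) * dof_total / dof_error


y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.0]
print(adjusted_cod(y_true, y_pred, X_shape=(5, 2)))
```

With more features for the same fit quality, the adjusted value drops, which is the penalty the adjustment is meant to apply.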
- AE(y_true=None, y_pred=None, **kwargs)
Absolute Error (AE) Note: Computes the absolute error between two numbers, or element-wise between a pair of lists, tuples, or NumPy arrays.
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
- Returns:
AE metric
- Return type:
result (np.ndarray)
- APCC(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)
Absolute Pearson’s Correlation Coefficient (APCC or AR)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
AR metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- AR(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)
Absolute Pearson’s Correlation Coefficient (APCC or AR)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
AR metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- AR2(y_true=None, y_pred=None, X_shape=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)
Adjusted Coefficient of Determination (ACOD/AR2)
Notes
Scikit-learn and other sources denote COD as R^2 (R squared), which invites confusion with the square of R, where R is the PCC.
COD should be denoted as COD or R2 only.
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
X_shape (tuple, list, np.ndarray) – The shape of X_train dataset
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
AR2 metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- CE(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=-1.0, **kwargs)
Cross Entropy (CE)
Notes
The greater the value of entropy, the greater the uncertainty of the probability distribution; the smaller the value, the less the uncertainty.
https://datascience.stackexchange.com/questions/20296/cross-entropy-loss-explanation
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = -1.0)
- Returns:
CE metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- CI(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)
Confidence Index (or Performance Index): CI (PI)
Notes
Reference: “Reference evapotranspiration for Londrina, Paraná, Brazil: performance of different estimation methods”. Rating scale:
> 0.85, Excellent
0.76-0.85, Very good
0.66-0.75, Good
0.61-0.65, Satisfactory
0.51-0.60, Poor
0.41-0.50, Bad
< 0.40, Very bad
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
CI (PI) metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
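The rating scale listed in the notes can be turned into a small lookup. This is a sketch under assumptions: the value is rounded to two decimals so it falls into one of the listed bands, and a value of exactly 0.40, which the listed scale does not assign, is treated as "Very bad". `rate_confidence_index` is a hypothetical helper, not a library function.

```python
def rate_confidence_index(ci_value):
    """Map a CI (PI) value to the qualitative rating listed in the notes."""
    c = round(ci_value, 2)  # assumption: bands are meant at two-decimal precision
    if c > 0.85:
        return "Excellent"
    if c >= 0.76:
        return "Very good"
    if c >= 0.66:
        return "Good"
    if c >= 0.61:
        return "Satisfactory"
    if c >= 0.51:
        return "Poor"
    if c >= 0.41:
        return "Bad"
    return "Very bad"  # assumption: also covers exactly 0.40


for value in (0.90, 0.70, 0.45):
    print(value, rate_confidence_index(value))
```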
- COD(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)
Coefficient of Determination (COD/R2)
Notes
https://scikit-learn.org/stable/modules/model_evaluation.html#r2-score
Scikit-learn and other sources denote COD as R^2 (R squared), which invites confusion with the square of R, where R is the PCC.
COD should be denoted as COD or R2 only.
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
R2 metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
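The naming concern in the notes is easy to demonstrate: the coefficient of determination and the squared Pearson correlation can disagree sharply. The sketch below uses the standard definitions (COD = 1 - SS_res / SS_tot) on one output column; the helper names are illustrative and are not the library's functions.

```python
import math


def cod(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pcc(y_true, y_pred):
    """Pearson's correlation coefficient."""
    mean_t = sum(y_true) / len(y_true)
    mean_p = sum(y_pred) / len(y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    ss_t = sum((t - mean_t) ** 2 for t in y_true)
    ss_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(ss_t * ss_p)


# Predictions are perfectly correlated with the truth but twice as large.
y_true = [1.0, 2.0, 3.0]
y_pred = [2.0, 4.0, 6.0]
print("PCC squared:", pcc(y_true, y_pred) ** 2)
print("COD:", cod(y_true, y_pred))
```

Here the squared correlation is 1 while COD is strongly negative, so the two quantities should not share a symbol.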
- COR(y_true=None, y_pred=None, sample=False, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)
Correlation (COR) Links: https://corporatefinanceinstitute.com/resources/data-science/covariance/
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
sample (bool) – sample covariance or population covariance. See the website above for more details
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
COR metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- COV(y_true=None, y_pred=None, sample=False, multi_output='raw_values', force_finite=True, finite_value=-10.0, **kwargs)
Covariance (COV)
Notes
Covariance is a measure of the relationship between two random variables. It evaluates to what extent the variables change together, but it does not assess the dependency between them.
Positive covariance: Indicates that two variables tend to move in the same direction.
Negative covariance: Reveals that two variables tend to move in inverse directions.
- Links:
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
sample (bool) – sample covariance or population covariance. See the website above for more details
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = -10.0)
- Returns:
COV metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
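The sample flag selects the divisor. A minimal single-column sketch using the standard definitions, where population covariance divides by n and sample covariance by n - 1; the function name is illustrative rather than the library's own.

```python
def covariance(y_true, y_pred, sample=False):
    """Population covariance (divide by n) or sample covariance (divide by n - 1)."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    co_moment = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    return co_moment / (n - 1 if sample else n)


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [2.0, 4.0, 6.0, 8.0]
print(covariance(y_true, y_pred, sample=False))
print(covariance(y_true, y_pred, sample=True))
```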
- CRM(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=-1.0, **kwargs)
Coefficient of Residual Mass (CRM) Links: https://doi.org/10.1016/j.csite.2022.101797
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = -1.0)
- Returns:
CRM metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- DRV(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=10.0, **kwargs)
Deviation of Runoff Volume (DRV) Link: https://rstudio-pubs-static.s3.amazonaws.com/433152_56d00c1e29724829bad5fc4fd8c8ebff.html
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 10.0)
- Returns:
DRV metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- EC(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)
Efficiency Coefficient (EC) Links: https://doi.org/10.1016/j.solener.2019.01.037
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
EC metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- EVS(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)
Explained Variance Score (EVS)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
EVS metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- JSD(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=1.0, **kwargs)
Jensen-Shannon Divergence (JSD) Link: https://machinelearningmastery.com/divergence-between-probability-distributions/
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 1.0)
- Returns:
JSD metric (bits) for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- KGE(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)
Kling-Gupta Efficiency (KGE) Link: https://rstudio-pubs-static.s3.amazonaws.com/433152_56d00c1e29724829bad5fc4fd8c8ebff.html
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
KGE metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
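For orientation, the original 2009 formulation of the Kling-Gupta Efficiency combines correlation, variability ratio, and bias ratio as KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2). The sketch below implements that formulation for one output column; it is an assumption that this matches the library, which may use a modified variant, and `kge_2009` is a hypothetical name.

```python
import math


def kge_2009(y_true, y_pred):
    """Kling-Gupta Efficiency, original 2009 formulation (single column)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    ss_t = sum((t - mean_t) ** 2 for t in y_true)
    ss_p = sum((p - mean_p) ** 2 for p in y_pred)
    co_moment = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    r = co_moment / math.sqrt(ss_t * ss_p)  # linear correlation
    alpha = math.sqrt(ss_p / ss_t)          # ratio of standard deviations
    beta = mean_p / mean_t                  # ratio of means
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


y_true = [1.0, 2.0, 3.0, 4.0]
print(kge_2009(y_true, y_true))                     # perfect prediction
print(kge_2009(y_true, [2 * t for t in y_true]))    # correlated but biased
```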
- KLD(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=-1.0, **kwargs)
Kullback-Leibler Divergence (KLD) Link: https://machinelearningmastery.com/divergence-between-probability-distributions/
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = -1.0)
- Returns:
KLD metric (bits) for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
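Both divergences are reported in bits, which corresponds to base-2 logarithms. A minimal sketch of the standard definitions, assuming the inputs are already valid probability distributions of equal length; the library may normalise or clip its inputs, which is not shown, and the function names are illustrative.

```python
import math


def kld_bits(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # convention: 0 * log(0 / q) = 0
        if q_i == 0:
            return math.inf  # p puts mass where q has none
        total += p_i * math.log2(p_i / q_i)
    return total


def jsd_bits(p, q):
    """Jensen-Shannon divergence in bits: mean KL of p and q to their mixture."""
    m = [(p_i + q_i) / 2 for p_i, q_i in zip(p, q)]
    return 0.5 * kld_bits(p, m) + 0.5 * kld_bits(q, m)


print(kld_bits([0.5, 0.5], [0.25, 0.75]))
print(jsd_bits([1.0, 0.0], [0.0, 1.0]))
print(kld_bits([1.0, 0.0], [0.0, 1.0]))
```

The last call yields an infinite divergence, the kind of non-finite result that force_finite and finite_value are there to replace.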
- MAAPE(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=1.0, **kwargs)
Mean Arctangent Absolute Percentage Error (MAAPE)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 1.0)
- Returns:
MAAPE metric for single column or multiple columns (radian values)
- Return type:
result (float, int, np.ndarray)
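The radian-valued result follows from the usual definition of MAAPE as the mean of arctan(|(y_true - y_pred) / y_true|). A single-column sketch of that definition; using `math.atan2` to cope with y_true == 0 is a choice made here for illustration, not necessarily how the library handles it.

```python
import math


def maape(y_true, y_pred):
    """Mean Arctangent Absolute Percentage Error, in radians (0 to pi/2)."""
    # atan2(|error|, |truth|) equals arctan(|error / truth|) and stays
    # defined when the truth is zero (it then returns pi/2, or 0 for 0/0).
    angles = [math.atan2(abs(t - p), abs(t)) for t, p in zip(y_true, y_pred)]
    return sum(angles) / len(angles)


print(maape([1.0, 2.0], [2.0, 2.0]))
```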
- MAE(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=1.0, **kwargs)
Mean Absolute Error (MAE)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 1.0)
- Returns:
MAE metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- MAPE(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=1.0, **kwargs)
Mean Absolute Percentage Error (MAPE)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 1.0)
- Returns:
MAPE metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- MASE(y_true=None, y_pred=None, m=1, multi_output='raw_values', force_finite=True, finite_value=10000000000.0, **kwargs)
Mean Absolute Scaled Error (MASE) Link: https://en.wikipedia.org/wiki/Mean_absolute_scaled_error
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
m (int) – m = 1 for non-seasonal data, m > 1 for seasonal data. (Optional, default = 1)
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 1e10)
- Returns:
MASE metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
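The seasonal period m sets how far back the naive reference forecast looks. A single-column sketch of the usual definition, the MAE of the predictions divided by the MAE of a naive forecast that repeats the value from m steps earlier; it is an assumption that the naive errors are computed on y_true itself rather than on a separate training series, and the function name is illustrative.

```python
def mase(y_true, y_pred, m=1):
    """Mean Absolute Scaled Error relative to a naive lag-m forecast."""
    n = len(y_true)
    if n != len(y_pred) or n <= m:
        raise ValueError("need equally long series with more than m values")
    mae_model = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    # Naive forecast: the value observed m steps earlier.
    mae_naive = sum(abs(y_true[i] - y_true[i - m]) for i in range(m, n)) / (n - m)
    return mae_model / mae_naive


y_true = [1.0, 3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 3.0, 6.0, 7.0, 8.0]
print(mase(y_true, y_pred, m=1))
print(mase(y_true, y_pred, m=2))
```

A constant y_true makes the naive error zero and the ratio undefined, which is one way a non-finite result can arise before finite_value is applied.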
- MBE(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=1.0, **kwargs)
Mean Bias Error (MBE)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 1.0)
- Returns:
MBE metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- ME(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=1.0, **kwargs)
Max Error (ME)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 1.0)
- Returns:
ME metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- MPE(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=1.0, **kwargs)
Mean Percentage Error (MPE) Link: https://www.dataquest.io/blog/understanding-regression-error-metrics/
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 1.0)
- Returns:
MPE metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- MRB(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=1.0, **kwargs)
Mean Relative Error (MRE) - Mean Relative Bias (MRB)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 1.0)
- Returns:
MRE (MRB) metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- MRE(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=1.0, **kwargs)
Mean Relative Error (MRE) - Mean Relative Bias (MRB)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 1.0)
- Returns:
MRE (MRB) metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- MSE(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=1.0, **kwargs)
Mean Squared Error (MSE)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 1.0)
- Returns:
MSE metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- MSLE(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=1.0, **kwargs)
Mean Squared Log Error (MSLE) Link: https://peltarion.com/knowledge-center/documentation/modeling-view/build-an-ai-model/loss-functions/mean-squared-logarithmic-error-(msle)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 1.0)
- Returns:
MSLE metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- MedAE(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=1.0, **kwargs)
Median Absolute Error (MedAE)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 1.0)
- Returns:
MedAE metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- NGINI(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)
Normalized Gini Coefficient for Regression (Actuarial Lorenz / Ranking Power). Measures how well the predictions rank the actual continuous targets. Best possible score is 1.0 (perfect ranking), 0.0 is random ranking. Range = [-1, 1].
References
Frees, Edward W., Glenn Meyers, and A. David Cummings. “Summarizing insurance scores using a Gini index.” Journal of the American Statistical Association 106.495 (2011): 1085-1098.
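One common way to compute a normalized Gini coefficient is to order the actual values by descending prediction, accumulate their share of the total, and divide the resulting Gini by the Gini of a perfect ordering. The sketch below follows that recipe for one output column; it is an assumption that the library computes NGINI this way, ties are broken by original position as a simplification, and the function names are hypothetical.

```python
def _gini(actual, scores):
    """Unnormalized Gini of `actual` when ranked by descending `scores`."""
    n = len(actual)
    total = sum(actual)
    # Highest score first; ties keep their original order (a simplification).
    order = sorted(range(n), key=lambda i: (-scores[i], i))
    cumulative = 0.0
    lorenz_sum = 0.0
    for i in order:
        cumulative += actual[i]
        lorenz_sum += cumulative / total
    return (lorenz_sum - (n + 1) / 2.0) / n


def normalized_gini(y_true, y_pred):
    """Gini of the predicted ranking divided by the Gini of a perfect ranking."""
    perfect = _gini(y_true, y_true)
    if perfect == 0:
        raise ValueError("y_true must not be constant")
    return _gini(y_true, y_pred) / perfect


y_true = [1.0, 2.0, 3.0]
print(normalized_gini(y_true, [10.0, 20.0, 30.0]))  # same ranking as y_true
print(normalized_gini(y_true, [30.0, 20.0, 10.0]))  # reversed ranking
```

Only the ordering of the predictions matters here, not their scale, which is what makes this a ranking measure.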
- NNSE(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)
Normalized Nash-Sutcliffe Efficiency (NNSE) Link: https://agrimetsoft.com/calculators/Nash%20Sutcliffe%20model%20Efficiency%20coefficient
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
NNSE metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
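The documented range of NNSE, (0, 1], matches the commonly used rescaling NNSE = 1 / (2 - NSE). The sketch below assumes that form and is written in plain Python; the function names `nse` and `nnse` are illustrative and are not the library's implementation.

```python
def nse(y_true, y_pred):
    """Nash-Sutcliffe Efficiency: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def nnse(y_true, y_pred):
    """Normalized NSE, assuming the common form 1 / (2 - NSE)."""
    return 1.0 / (2.0 - nse(y_true, y_pred))


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print("NSE :", nse(y_true, y_pred))
print("NNSE:", nnse(y_true, y_pred))
```

Under this assumption a perfect prediction gives NNSE = 1, and predicting the mean of y_true (NSE = 0) gives NNSE = 0.5.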
- NRMSE(y_true=None, y_pred=None, normalization='mean', multi_output='raw_values', force_finite=True, finite_value=10000000000.0, **kwargs)
Normalized Root Mean Square Error (NRMSE).
References
https://www.marinedatascience.co/blog/2019/01/07/normalizing-the-rmse/
https://en.wikipedia.org/wiki/Root-mean-square_deviation#Normalized_root-mean-square_deviation
https://search.r-project.org/CRAN/refmans/hydroGOF/html/nrmse.html
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
normalization (str) – The method to normalize RMSE. Valid values:
- “mean”: Normalizes by the mean of y_true (also known as CV(RMSE)).
- “range”: Normalizes by the difference between max and min of y_true.
- “std”: Normalizes by the standard deviation of y_true.
- “iqr”: Normalizes by the Interquartile Range (Q3 - Q1) of y_true.
multi_output – Can be “raw_values” or list weights of variables.
force_finite (bool) – Replace NaN or Inf results with finite_value.
finite_value (float) – Replacement value for Non-finite errors.
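The four normalization options can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's code: the helper name `nrmse` is hypothetical, the population standard deviation is assumed for “std”, and the inclusive quantile method is assumed for “iqr” (other quantile conventions give slightly different values).

```python
import math
import statistics


def nrmse(y_true, y_pred, normalization="mean"):
    """RMSE divided by a scale taken from y_true."""
    rmse = math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )
    if normalization == "mean":
        scale = statistics.fmean(y_true)
    elif normalization == "range":
        scale = max(y_true) - min(y_true)
    elif normalization == "std":
        scale = statistics.pstdev(y_true)  # assumption: population std
    elif normalization == "iqr":
        # assumption: inclusive quantile method
        q1, _, q3 = statistics.quantiles(y_true, n=4, method="inclusive")
        scale = q3 - q1
    else:
        raise ValueError(f"Unknown normalization: {normalization!r}")
    return rmse / scale


y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [2.0, 3.0, 4.0, 5.0, 6.0]
for method in ("mean", "range", "std", "iqr"):
    print(method, nrmse(y_true, y_pred, method))
```

The same RMSE yields different NRMSE values depending on the scale, so results are only comparable when the same normalization is used.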
- NSE(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)
Nash-Sutcliffe Efficiency (NSE) Link: https://agrimetsoft.com/calculators/Nash%20Sutcliffe%20model%20Efficiency%20coefficient
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
NSE metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- OI(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)
Overall Index (OI) Links: https://doi.org/10.1016/j.solener.2019.01.037
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
OI metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- PCC(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=-1.0, **kwargs)
Pearson’s Correlation Coefficient (PCC or R)
Notes
Reference: “Reference evapotranspiration for Londrina, Paraná, Brazil: performance of different estimation methods”
The equation does not take an absolute value, so the sign of the correlation is preserved.
https://en.wikipedia.org/wiki/Pearson_correlation_coefficient
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = -1.0)
- Returns:
R metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- PCD(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)
Prediction of Change in Direction (PCD)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
PCD metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
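The entry above gives no formula. A common reading of PCD is the fraction of consecutive steps in which the predicted series moves in the same direction as the true series. The sketch below assumes that reading; the helper names are hypothetical and the library may define the metric differently.

```python
def _sign(x):
    """Return -1, 0 or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def pcd(y_true, y_pred):
    """Share of steps where true and predicted changes have the same sign."""
    true_steps = [b - a for a, b in zip(y_true, y_true[1:])]
    pred_steps = [b - a for a, b in zip(y_pred, y_pred[1:])]
    hits = sum(_sign(t) == _sign(p) for t, p in zip(true_steps, pred_steps))
    return hits / len(true_steps)


# True series: up, up, down. Predicted series: up, down, down.
print(pcd([1.0, 2.0, 3.0, 2.0], [1.0, 3.0, 2.0, 1.0]))
```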
- R(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=-1.0, **kwargs)
Pearson’s Correlation Coefficient (PCC or R)
Notes
Reference: “Reference evapotranspiration for Londrina, Paraná, Brazil: performance of different estimation methods”
The equation does not take an absolute value, so the sign of the correlation is preserved.
https://en.wikipedia.org/wiki/Pearson_correlation_coefficient
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = -1.0)
- Returns:
R metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- R2(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)
Coefficient of Determination (COD/R2)
Notes
https://scikit-learn.org/stable/modules/model_evaluation.html#r2-score
Scikit-learn and other websites denote COD as R^2 (or R squared), which leads to confusion with R^2 in the sense of squared PCC (where R is PCC).
It should be denoted as COD or R2 only.
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
R2 metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- R2S(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)
(Pearson’s Correlation Index)^2 = R^2 = R2S = RSQ (R square)
Notes
Do not confuse R2S with R2 (Coefficient of Determination); they are different metrics.
Most online tutorials (articles, Wikipedia, …) and even the scikit-learn library mix up R2S and R2.
R^2 = R2S = R squared should be (Pearson’s Correlation Index)^2
Meanwhile, R2 = Coefficient of Determination
https://en.wikipedia.org/wiki/Pearson_correlation_coefficient
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
R2S metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
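The difference between R2 (Coefficient of Determination) and R2S (squared Pearson correlation) is easiest to see on predictions that are perfectly correlated with the targets but badly scaled. The plain-Python sketch below uses the textbook formulas; the function names are illustrative, not the library's API.

```python
import math


def cod(y_true, y_pred):
    """Coefficient of Determination (R2): 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson(y_true, y_pred):
    """Pearson's correlation coefficient (sign preserved)."""
    mt = sum(y_true) / len(y_true)
    mp = sum(y_pred) / len(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def r2s(y_true, y_pred):
    """Squared Pearson correlation (R2S / RSQ)."""
    return pearson(y_true, y_pred) ** 2


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [2.0, 4.0, 6.0, 8.0]  # perfectly correlated, but twice too large
print("R2S:", r2s(y_true, y_pred))
print("R2 :", cod(y_true, y_pred))
```

Here R2S is 1 because the relationship is perfectly linear, while R2 is strongly negative because the predictions are far from the targets.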
- RAE(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=10000000000.0, **kwargs)
Relative Absolute Error (RAE)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 10000000000.0)
- Returns:
RAE metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
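As an illustration, the usual textbook definition of RAE divides the total absolute error by the total absolute deviation of y_true from its mean. The sketch below assumes that definition; `rae` is a hypothetical helper, not the library's implementation.

```python
def rae(y_true, y_pred):
    """Relative Absolute Error: sum|t - p| / sum|t - mean(t)|."""
    mean_true = sum(y_true) / len(y_true)
    abs_err = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    abs_dev = sum(abs(t - mean_true) for t in y_true)
    return abs_err / abs_dev


print(rae([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```

A value below 1 means the predictions beat the naive model that always predicts the mean of y_true. The denominator is zero when y_true is constant, which is the kind of non-finite case that force_finite and finite_value are meant to handle.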
- RB(y_true=None, y_pred=None, **kwargs)
Relative Error (RE) Note: Computes the relative error between two numbers, or element-wise between a pair of lists, tuples, or numpy arrays.
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
- Returns:
RE metric
- Return type:
result (np.ndarray)
- RE(y_true=None, y_pred=None, **kwargs)
Relative Error (RE) Note: Computes the relative error between two numbers, or element-wise between a pair of lists, tuples, or numpy arrays.
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
- Returns:
RE metric
- Return type:
result (np.ndarray)
- RGINI(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)
Gini Index of Absolute Residuals (Error Dispersion).
References
Yitzhaki, Shlomo, and Edna Schechtman. The Gini methodology: a primer on a statistical methodology. Vol. 272. Springer Science & Business Media, 2012.
- RMSE(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=1.0, **kwargs)
Root Mean Squared Error (RMSE)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 1.0)
- Returns:
RMSE metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- RRSE(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)
Root Relative Squared Error (RRSE)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
RRSE metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- RSE(y_true=None, y_pred=None, X_shape=None, multi_output='raw_values', force_finite=True, finite_value=1.0, **kwargs)
Residual Standard Error (RSE)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
X_shape (tuple, list, np.ndarray) – The shape of X_train dataset
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 1.0)
- Returns:
RSE metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
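The X_shape argument supplies the number of samples and features of the training data, which the residual standard error needs for its degrees of freedom. The sketch below assumes the standard regression formula sqrt(RSS / (n - p - 1)), with X_shape = (n, p); the helper name `rse` is illustrative and the library's exact formula may differ.

```python
import math


def rse(y_true, y_pred, X_shape):
    """Residual Standard Error with n - p - 1 degrees of freedom."""
    n_samples, n_features = X_shape
    dof = n_samples - n_features - 1
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(rss / dof)


y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.5, 2.0, 3.0, 4.0, 4.5]
print(rse(y_true, y_pred, X_shape=(5, 2)))
```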
- RSQ(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)
(Pearson’s Correlation Index)^2 = R^2 = R2S = RSQ (R square)
Notes
Do not confuse R2S with R2 (Coefficient of Determination); they are different metrics.
Most online tutorials (articles, Wikipedia, …) and even the scikit-learn library mix up R2S and R2.
R^2 = R2S = R squared should be (Pearson’s Correlation Index)^2
Meanwhile, R2 = Coefficient of Determination
https://en.wikipedia.org/wiki/Pearson_correlation_coefficient
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
R2S metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- SE(y_true=None, y_pred=None, **kwargs)
Squared Error (SE) Note: Computes the squared error between two numbers, or element-wise between a pair of lists, tuples, or numpy arrays.
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
- Returns:
SE metric
- Return type:
result (np.ndarray)
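Unlike the aggregated metrics, SE returns one value per element. A minimal plain-Python sketch of that behaviour is shown below; it returns a list rather than an np.ndarray, and `squared_error` is an illustrative name, not the library's API.

```python
def squared_error(y_true, y_pred):
    """Element-wise squared error; one value per (true, pred) pair."""
    return [(t - p) ** 2 for t, p in zip(y_true, y_pred)]


errors = squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
print(errors)

# Averaging the element-wise values gives the usual Mean Squared Error.
mse = sum(errors) / len(errors)
print(mse)
```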
- SLE(y_true=None, y_pred=None, **kwargs)
Squared Log Error (SLE) Note: Computes the squared log error between two numbers, or element-wise between a pair of lists, tuples, or numpy arrays.
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
- Returns:
SLE metric
- Return type:
result (np.ndarray)
- SMAPE(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=200.0, **kwargs)
Symmetric Mean Absolute Percentage Error (SMAPE). Original version. Range [0, 200%]. References: Forecasting, Long-Range. “From Crystal Ball to Computer.” Scott Armstrong Robert J. Genetski (1978).
- SMAPE_NP(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=2.0, **kwargs)
Symmetric Mean Absolute Percentage Error (SMAPE_NP). Original version. Range [0, 2]. References: Forecasting, Long-Range. “From Crystal Ball to Computer.” Scott Armstrong Robert J. Genetski (1978).
- SMAPE_S(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=1.0, **kwargs)
Symmetric Mean Absolute Percentage Error Simplified (SMAPE_S). Simplified version of SMAPE with Range [0, 1] (or [0, 100%]), smaller is better. References: Makridakis, Spyros. “Accuracy measures: theoretical and practical concerns.” International journal of forecasting 9.4 (1993): 527-529.
- SMAPE_S_P(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=100.0, **kwargs)
Symmetric Mean Absolute Percentage Error Simplified (SMAPE_S_P). Simplified version of SMAPE expressed as a percentage, with Range [0, 100], smaller is better. References: Makridakis, Spyros. “Accuracy measures: theoretical and practical concerns.” International journal of forecasting 9.4 (1993): 527-529.
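The four SMAPE variants differ only in scaling, which explains their documented ranges. The sketch below assumes the usual formulas (original form with denominator (|y_true| + |y_pred|) / 2, simplified form with denominator |y_true| + |y_pred|). The function names mirror the metric names for readability only and are not the library's implementation. Pairs where both values are zero are not handled.

```python
def smape_s(y_true, y_pred):
    """Simplified SMAPE: mean(|t - p| / (|t| + |p|)), range [0, 1]."""
    terms = [abs(t - p) / (abs(t) + abs(p)) for t, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)


def smape_s_p(y_true, y_pred):
    """Simplified SMAPE as a percentage, range [0, 100]."""
    return 100.0 * smape_s(y_true, y_pred)


def smape_np(y_true, y_pred):
    """Original SMAPE without percentage scaling, range [0, 2]."""
    return 2.0 * smape_s(y_true, y_pred)


def smape(y_true, y_pred):
    """Original SMAPE as a percentage, range [0, 200]."""
    return 100.0 * smape_np(y_true, y_pred)


y_true = [100.0, 200.0]
y_pred = [110.0, 180.0]
for fn in (smape, smape_np, smape_s, smape_s_p):
    print(fn.__name__, fn(y_true, y_pred))
```

The upper bound of each range is reached when y_true and y_pred have opposite signs, for example y_true = 1 and y_pred = -1.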
- SUPPORT = {
'A10': {'best': '1', 'range': '[0, 1]', 'type': 'max'},
'A20': {'best': '1', 'range': '[0, 1]', 'type': 'max'},
'A30': {'best': '1', 'range': '[0, 1]', 'type': 'max'},
'ACOD': {'best': '1', 'range': '(-inf, 1]', 'type': 'max'},
'AE': {'best': '0', 'range': '[0, +inf)', 'type': 'min'},
'APCC': {'best': '1', 'range': '[0, 1]', 'type': 'max'},
'AR': {'best': '1', 'range': '[0, 1]', 'type': 'max'},
'AR2': {'best': '1', 'range': '(-inf, 1]', 'type': 'max'},
'CE': {'best': '0', 'range': '[0, +inf)', 'type': 'min'},
'CI': {'best': '1', 'range': '[-1, 1]', 'type': 'max'},
'COD': {'best': '1', 'range': '(-inf, 1]', 'type': 'max'},
'COR': {'best': '1', 'range': '[-1, 1]', 'type': 'max'},
'COV': {'best': 'unknown', 'range': '(-inf, +inf)', 'type': 'max'},
'CRM': {'best': '0', 'range': '(-inf, +inf)', 'type': 'min'},
'DRV': {'best': '1', 'range': '(-inf, +inf)', 'type': 'unknown'},
'EC': {'best': '1', 'range': '(-inf, 1]', 'type': 'max'},
'EVS': {'best': '1', 'range': '(-inf, 1]', 'type': 'max'},
'JSD': {'best': '0', 'range': '[0, 1]', 'type': 'min'},
'KGE': {'best': '1', 'range': '(-inf, 1]', 'type': 'max'},
'KLD': {'best': '0', 'range': '[0, +inf)', 'type': 'min'},
'MAAPE': {'best': '0', 'range': '[0, +1.5708)', 'type': 'min'},
'MAE': {'best': '0', 'range': '[0, +inf)', 'type': 'min'},
'MAPE': {'best': '0', 'range': '[0, +inf)', 'type': 'min'},
'MASE': {'best': '0', 'range': '[0, +inf)', 'type': 'min'},
'MBE': {'best': '0', 'range': '(-inf, +inf)', 'type': 'unknown'},
'ME': {'best': '0', 'range': '[0, +inf)', 'type': 'min'},
'MPE': {'best': '0', 'range': '(-inf, +inf)', 'type': 'unknown'},
'MRB': {'best': '0', 'range': '[0, +inf)', 'type': 'min'},
'MRE': {'best': '0', 'range': '[0, +inf)', 'type': 'min'},
'MSE': {'best': '0', 'range': '[0, +inf)', 'type': 'min'},
'MSLE': {'best': '0', 'range': '[0, +inf)', 'type': 'min'},
'MedAE': {'best': '0', 'range': '[0, +inf)', 'type': 'min'},
'NGINI': {'best': '1', 'range': '[-1, +1]', 'type': 'max'},
'NNSE': {'best': '1', 'range': '(0, 1]', 'type': 'max'},
'NRMSE': {'best': '0', 'range': '[0, +inf)', 'type': 'min'},
'NSE': {'best': '1', 'range': '(-inf, 1]', 'type': 'max'},
'OI': {'best': '1', 'range': '(-inf, 1]', 'type': 'max'},
'PCC': {'best': '1', 'range': '[-1, 1]', 'type': 'max'},
'PCD': {'best': '1', 'range': '[0, 1]', 'type': 'max'},
'R': {'best': '1', 'range': '[-1, 1]', 'type': 'max'},
'R2': {'best': '1', 'range': '(-inf, 1]', 'type': 'max'},
'R2S': {'best': '1', 'range': '[0, 1]', 'type': 'max'},
'RAE': {'best': '0', 'range': '[0, +inf)', 'type': 'min'},
'RB': {'best': '0', 'range': '[0, +inf)', 'type': 'min'},
'RE': {'best': '0', 'range': '[0, +inf)', 'type': 'min'},
'RGINI': {'best': '0', 'range': '[0, +1]', 'type': 'min'},
'RMSE': {'best': '0', 'range': '[0, +inf)', 'type': 'min'},
'RRSE': {'best': '0', 'range': '[0, +inf)', 'type': 'min'},
'RSE': {'best': '0', 'range': '[0, +inf)', 'type': 'min'},
'RSQ': {'best': '1', 'range': '[0, 1]', 'type': 'max'},
'SE': {'best': '0', 'range': '[0, +inf)', 'type': 'min'},
'SLE': {'best': '0', 'range': '[0, +inf)', 'type': 'min'},
'SMAPE': {'best': '0', 'range': '[0, 200]', 'type': 'min'},
'SMAPE_NP': {'best': '0', 'range': '[0, 2]', 'type': 'min'},
'SMAPE_S': {'best': '0', 'range': '[0, 1]', 'type': 'min'},
'SMAPE_S_P': {'best': '0', 'range': '[0, 100]', 'type': 'min'},
'VAF': {'best': '100', 'range': '(-inf, 100)', 'type': 'max'},
'WI': {'best': '1', 'range': '[0, 1]', 'type': 'max'}
}
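One possible use of the SUPPORT table is deciding which of two scores is better without hard-coding the direction of each metric. The sketch below copies three entries from the table above into a local dictionary; the helper `is_better` is hypothetical and not part of the library.

```python
# Excerpt copied from the SUPPORT table above.
SUPPORT_EXCERPT = {
    "RMSE": {"best": "0", "range": "[0, +inf)", "type": "min"},
    "NSE": {"best": "1", "range": "(-inf, 1]", "type": "max"},
    "MBE": {"best": "0", "range": "(-inf, +inf)", "type": "unknown"},
}


def is_better(metric, candidate, incumbent, support=SUPPORT_EXCERPT):
    """Return True if `candidate` improves on `incumbent` for `metric`."""
    kind = support[metric]["type"]
    if kind == "min":
        return candidate < incumbent
    if kind == "max":
        return candidate > incumbent
    # Metrics of type 'unknown' have no simple ordering.
    raise ValueError(f"No optimisation direction defined for {metric!r}")


print(is_better("RMSE", 0.8, 1.2))  # lower RMSE is an improvement
print(is_better("NSE", 0.8, 0.9))   # lower NSE is not
```

For metrics of type 'unknown', such as MBE, a caller could instead compare distances to the 'best' value; that choice is left open here.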
- VAF(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)
Variance Accounted For between 2 signals (VAF) Link: https://www.dcsc.tudelft.nl/~jwvanwingerden/lti/doc/html/vaf.html
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
VAF metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
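VAF is commonly defined as (1 - var(y_true - y_pred) / var(y_true)) × 100. The sketch below assumes that definition and uses the population variance from the standard library; `vaf` is an illustrative name, not the library's implementation.

```python
import statistics


def vaf(y_true, y_pred):
    """Variance Accounted For, in percent."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ratio = statistics.pvariance(residuals) / statistics.pvariance(y_true)
    return (1.0 - ratio) * 100.0


y_true = [1.0, 2.0, 3.0, 4.0]
y_shifted = [2.0, 3.0, 4.0, 5.0]  # constant offset of +1
print(vaf(y_true, y_shifted))
```

Because only the variance of the residuals enters the formula, a constant offset between the two signals does not lower VAF. It is therefore worth pairing VAF with a bias-sensitive metric.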
- WI(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)
Willmott Index (WI)
Notes
Reference evapotranspiration for Londrina, Paraná, Brazil: performance of different estimation methods
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
WI metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
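Willmott's index of agreement is usually written as d = 1 - Σ(p - t)² / Σ(|p - mean(t)| + |t - mean(t)|)². The plain-Python sketch below assumes this original form; the function name is illustrative and the library may implement a refined variant.

```python
def willmott_index(y_true, y_pred):
    """Willmott's index of agreement (original form)."""
    mean_true = sum(y_true) / len(y_true)
    num = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    den = sum(
        (abs(p - mean_true) + abs(t - mean_true)) ** 2
        for t, p in zip(y_true, y_pred)
    )
    return 1.0 - num / den


print(willmott_index([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect agreement
print(willmott_index([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```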
- a10_index(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)[source]
A10 index (A10)
Notes
The a10-index is an engineering index for evaluating artificial intelligence models. It reports the share of samples
whose predicted values deviate by no more than ±10% from the experimental values
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
A10 metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- a20_index(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)[source]
A20 index (A20)
Notes
The a20-index reports the share of samples whose predicted values deviate by no more than ±20% from the experimental values
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
A20 metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- a30_index(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)[source]
A30 index (A30)
Note: The a30-index reports the share of samples whose predicted values deviate by no more than ±30% from the experimental values
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
A30 metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
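The A10, A20 and A30 indices share one idea with different tolerances. The sketch below assumes the deviation is measured as |y_pred - y_true| relative to |y_true|; the helper `a_index` and its `tolerance` parameter are hypothetical, and the library may test the ratio of the two values instead.

```python
def a_index(y_true, y_pred, tolerance):
    """Share of samples whose prediction is within ±tolerance of the target."""
    within = sum(
        abs(p - t) <= tolerance * abs(t) for t, p in zip(y_true, y_pred)
    )
    return within / len(y_true)


y_true = [100.0, 100.0, 100.0, 100.0]
y_pred = [95.0, 108.0, 125.0, 140.0]
print("A10:", a_index(y_true, y_pred, 0.10))
print("A20:", a_index(y_true, y_pred, 0.20))
print("A30:", a_index(y_true, y_pred, 0.30))
```

Widening the tolerance can only keep the score equal or raise it, so A10 ≤ A20 ≤ A30 for the same data.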
- absolute_pearson_correlation_coefficient(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)[source]
Absolute Pearson’s Correlation Coefficient (APCC or AR)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
AR metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- adjusted_coefficient_of_determination(y_true=None, y_pred=None, X_shape=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)[source]
Adjusted Coefficient of Determination (ACOD/AR2)
Notes
Scikit-learn and other websites denote COD as R^2 (or R squared), which leads to confusion with R^2 in the sense of squared PCC (where R is PCC).
It should be denoted as COD or R2 only.
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
X_shape (tuple, list, np.ndarray) – The shape of X_train dataset
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
AR2 metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
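The adjustment penalises COD for the number of features, which is why X_shape is required. The sketch below assumes the standard formula 1 - (1 - R2)(n - 1) / (n - p - 1) with X_shape = (n, p); `adjusted_cod` is a hypothetical helper that takes an already computed COD value.

```python
def adjusted_cod(cod, X_shape):
    """Adjusted Coefficient of Determination for n samples and p features."""
    n_samples, n_features = X_shape
    return 1.0 - (1.0 - cod) * (n_samples - 1) / (n_samples - n_features - 1)


# Same COD, increasing number of features: the adjusted value drops.
for n_features in (1, 5, 20):
    print(n_features, adjusted_cod(0.9, X_shape=(50, n_features)))
```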
- coefficient_of_determination(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)[source]
Coefficient of Determination (COD/R2)
Notes
https://scikit-learn.org/stable/modules/model_evaluation.html#r2-score
Scikit-learn and other websites denote COD as R^2 (or R squared), which leads to confusion with R^2 in the sense of squared PCC (where R is PCC).
It should be denoted as COD or R2 only.
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
R2 metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- coefficient_of_residual_mass(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=-1.0, **kwargs)[source]
Coefficient of Residual Mass (CRM) Links: https://doi.org/10.1016/j.csite.2022.101797
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = -1.0)
- Returns:
CRM metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- confidence_index(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)[source]
Confidence Index (or Performance Index): CI (PI)
Notes
Reference evapotranspiration for Londrina, Paraná, Brazil: performance of different estimation methods
> 0.85, Excellent
0.76-0.85, Very good
0.66-0.75, Good
0.61-0.65, Satisfactory
0.51-0.60, Poor
0.41-0.50, Bad
< 0.40, Very bad
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
CI (PI) metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
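The rating table in the notes can be turned into a small lookup. The table leaves small gaps between bands (for example between 0.75 and 0.76, and at exactly 0.40), so the sketch below assumes each band extends up to and including its upper bound. The function `classify_ci` is hypothetical and not part of the library.

```python
# (lower bound, label): a score is rated by the first bound it exceeds.
CI_BANDS = [
    (0.85, "Excellent"),
    (0.75, "Very good"),
    (0.65, "Good"),
    (0.60, "Satisfactory"),
    (0.50, "Poor"),
    (0.40, "Bad"),
]


def classify_ci(ci):
    """Map a Confidence Index value to the qualitative rating above."""
    for lower, label in CI_BANDS:
        if ci > lower:
            return label
    return "Very bad"


for value in (0.92, 0.70, 0.55, 0.30):
    print(value, classify_ci(value))
```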
- correlation(y_true=None, y_pred=None, sample=False, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)[source]
Correlation (COR) Links: https://corporatefinanceinstitute.com/resources/data-science/covariance/
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
sample (bool) – sample covariance or population covariance. See the website above for more details
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
COR metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- covariance(y_true=None, y_pred=None, sample=False, multi_output='raw_values', force_finite=True, finite_value=-10.0, **kwargs)[source]
Covariance (COV)
Covariance is a measure of the relationship between two random variables. It evaluates to what extent the variables change together, but it does not assess the dependency between them.
Positive covariance: Indicates that two variables tend to move in the same direction.
Negative covariance: Reveals that two variables tend to move in inverse directions.
- Links:
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
sample (bool) – sample covariance or population covariance. See the website above for more details
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = -10.0)
- Returns:
COV metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
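The sample flag switches between the two usual covariance estimators: dividing by n (population) or by n - 1 (sample). A plain-Python sketch of that distinction follows; the function mirrors the documented parameter name but is an illustration, not the library's implementation.

```python
def covariance(y_true, y_pred, sample=False):
    """Population covariance (divide by n) or sample covariance (n - 1)."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    mean_pred = sum(y_pred) / n
    total = sum(
        (t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred)
    )
    return total / (n - 1 if sample else n)


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [2.0, 4.0, 6.0, 8.0]
print("population:", covariance(y_true, y_pred))
print("sample    :", covariance(y_true, y_pred, sample=True))
```

The two estimators converge as n grows; the difference matters mainly for small samples.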
- cross_entropy(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=-1.0, **kwargs)[source]
Cross Entropy (CE)
Notes
The greater the entropy, the greater the uncertainty of the probability distribution; the smaller the entropy, the lower the uncertainty.
https://datascience.stackexchange.com/questions/20296/cross-entropy-loss-explanation
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = -1.0)
- Returns:
CE metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- deviation_of_runoff_volume(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=10.0, **kwargs)[source]
Deviation of Runoff Volume (DRV) Link: https://rstudio-pubs-static.s3.amazonaws.com/433152_56d00c1e29724829bad5fc4fd8c8ebff.html
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 10.0)
- Returns:
DRV metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- efficiency_coefficient(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)[source]
Efficiency Coefficient (EC) Links: https://doi.org/10.1016/j.solener.2019.01.037
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
EC metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- explained_variance_score(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)[source]
Explained Variance Score (EVS)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
EVS metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- get_processed_data(y_true=None, y_pred=None, **kwargs)[source]
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
- Returns:
y_true_final: y_true used in the evaluation process
y_pred_final: y_pred used in the evaluation process
n_out: Number of outputs
- static get_support(name=None, verbose=True)[source]
- jensen_shannon_divergence(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=1.0, **kwargs)[source]
Jensen-Shannon Divergence (JSD) Link: https://machinelearningmastery.com/divergence-between-probability-distributions/
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 1.0)
- Returns:
JSD metric (bits) for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
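As a rough illustration of a divergence measured in bits, the sketch below computes the Jensen-Shannon divergence from two Kullback-Leibler terms using base-2 logarithms. The function names are hypothetical, the inputs are assumed to be valid probability vectors, and the library's own implementation may differ (for example in how it clips or normalizes inputs).

```python
import numpy as np

def kl_divergence_bits(p, q):
    """Hypothetical helper: KL(p || q) in bits for probability vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with p == 0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def js_divergence_bits(p, q):
    """Hypothetical helper: symmetric JSD in bits, bounded by [0, 1]."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)  # mixture distribution
    return 0.5 * kl_divergence_bits(p, m) + 0.5 * kl_divergence_bits(q, m)
```

With base-2 logarithms, two distributions with disjoint support give a JSD of 1 bit, and identical distributions give 0.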
- kling_gupta_efficiency(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)[source]
Kling-Gupta Efficiency (KGE) Link: https://rstudio-pubs-static.s3.amazonaws.com/433152_56d00c1e29724829bad5fc4fd8c8ebff.html
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
KGE metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
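For orientation, the commonly cited form of KGE combines correlation, a variability ratio, and a bias ratio. The sketch below follows that textbook form; the function name is hypothetical, and whether the library uses exactly these components is an assumption.

```python
import numpy as np

def kge_sketch(y_true, y_pred):
    """Hypothetical helper: KGE = 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    r = np.corrcoef(y_true, y_pred)[0, 1]     # linear correlation
    alpha = np.std(y_pred) / np.std(y_true)   # variability ratio
    beta = np.mean(y_pred) / np.mean(y_true)  # bias ratio
    return float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))
```

A perfect prediction scores 1.0. Doubling every prediction keeps r at 1 but pushes both ratios to 2, so the score drops to 1 - sqrt(2).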
- kullback_leibler_divergence(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=-1.0, **kwargs)[source]
Kullback-Leibler Divergence (KLD) Link: https://machinelearningmastery.com/divergence-between-probability-distributions/
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = -1.0)
- Returns:
KLD metric (bits) for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- max_error(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=1.0, **kwargs)[source]
Max Error (ME)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 1.0)
- Returns:
ME metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- mean_absolute_error(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=1.0, **kwargs)[source]
Mean Absolute Error (MAE)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 1.0)
- Returns:
MAE metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
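The multi_output parameter can be pictured as follows: "raw_values" keeps one score per column, while a weight list collapses the columns into a single number. This is a sketch in plain NumPy with a hypothetical function name; combining the per-column scores with a dot product is an assumption about how the weights are applied.

```python
import numpy as np

def mae_sketch(y_true, y_pred, multi_output="raw_values"):
    """Hypothetical helper: per-column MAE, optionally combined by weights."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    per_column = np.mean(np.abs(y_true - y_pred), axis=0)
    if isinstance(multi_output, str) and multi_output == "raw_values":
        return per_column
    # Assumption: weights are applied as a weighted sum of column scores
    weights = np.asarray(multi_output, dtype=float)
    return float(np.dot(per_column, weights))

y_true = [[1, 2], [3, 4]]
y_pred = [[2, 2], [4, 8]]
raw = mae_sketch(y_true, y_pred)                  # one MAE per column
combined = mae_sketch(y_true, y_pred, [0.5, 0.5]) # single weighted value
```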
- mean_absolute_percentage_error(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=1.0, **kwargs)[source]
Mean Absolute Percentage Error (MAPE)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 1.0)
- Returns:
MAPE metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
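Percentage errors are a typical case where force_finite matters, because a zero in y_true makes the ratio infinite. The sketch below shows the replacement idea only; `replace_non_finite` is a hypothetical helper, not a function of the library.

```python
import numpy as np

def replace_non_finite(result, force_finite=True, finite_value=1.0):
    """Hypothetical helper: swap NaN/Inf entries for finite_value."""
    result = np.asarray(result, dtype=float)
    if not force_finite:
        return result
    return np.where(np.isfinite(result), result, finite_value)

y_true = np.array([0.0, 2.0])
y_pred = np.array([1.0, 1.0])
with np.errstate(divide="ignore", invalid="ignore"):
    # The first entry divides by zero and becomes inf
    ape = np.abs((y_true - y_pred) / y_true)
mape = np.mean(ape)  # inf, because one term is inf
safe_mape = replace_non_finite(mape, finite_value=1.0)
```

With force_finite=False the non-finite result is passed through unchanged, which can be useful when a NaN or Inf should be detected by the caller.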
- mean_absolute_scaled_error(y_true=None, y_pred=None, m=1, multi_output='raw_values', force_finite=True, finite_value=10000000000.0, **kwargs)[source]
Mean Absolute Scaled Error (MASE) Link: https://en.wikipedia.org/wiki/Mean_absolute_scaled_error
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
m (int) – m = 1 for non-seasonal data, m > 1 for seasonal data. (Optional, default = 1)
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 1e10)
- Returns:
MASE metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
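The role of m can be seen in a small NumPy sketch: the forecast MAE is scaled by the MAE of a naive forecast that repeats the value observed m steps earlier. The function name is hypothetical, and computing the naive scale from y_true itself (rather than from a separate training series) is an assumption made because the signature takes only y_true and y_pred.

```python
import numpy as np

def mase_sketch(y_true, y_pred, m=1):
    """Hypothetical helper: MAE scaled by the lag-m naive forecast MAE."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    forecast_mae = np.mean(np.abs(y_true - y_pred))
    # Naive forecast: value at time t - m predicts the value at time t
    naive_mae = np.mean(np.abs(y_true[m:] - y_true[:-m]))
    return float(forecast_mae / naive_mae)

y_true = [1, 2, 3, 4, 5]
y_pred = [1.5, 2.5, 3.5, 4.5, 5.5]
non_seasonal = mase_sketch(y_true, y_pred, m=1)
seasonal = mase_sketch(y_true, y_pred, m=2)
```

A value below 1 means the forecast beats the naive baseline. A constant y_true makes the denominator zero, which is the situation finite_value is meant to cover.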
- mean_arctangent_absolute_percentage_error(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=1.0, **kwargs)[source]
Mean Arctangent Absolute Percentage Error (MAAPE)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 1.0)
- Returns:
MAAPE metric for single column or multiple columns (radian values)
- Return type:
result (float, int, np.ndarray)
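The radian output follows from taking the arctangent of each absolute percentage error before averaging. The sketch below uses the usual textbook definition with a hypothetical function name; it is not taken from the library's source.

```python
import numpy as np

def maape_sketch(y_true, y_pred):
    """Hypothetical helper: mean of arctan(|(y_true - y_pred) / y_true|)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.abs((y_true - y_pred) / y_true)
    # arctan maps [0, inf] onto [0, pi/2], so each term stays bounded
    return float(np.mean(np.arctan(ratio)))
```

Because arctan(inf) is pi/2, a zero in y_true with a non-zero prediction contributes pi/2 instead of an unbounded value.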
- mean_bias_error(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=1.0, **kwargs)[source]
Mean Bias Error (MBE)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 1.0)
- Returns:
MBE metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- mean_percentage_error(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=1.0, **kwargs)[source]
Mean Percentage Error (MPE) Link: https://www.dataquest.io/blog/understanding-regression-error-metrics/
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 1.0)
- Returns:
MPE metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- mean_relative_bias(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=1.0, **kwargs)
Mean Relative Error (MRE) - Mean Relative Bias (MRB)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 1.0)
- Returns:
MRE (MRB) metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- mean_relative_error(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=1.0, **kwargs)[source]
Mean Relative Error (MRE) - Mean Relative Bias (MRB)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 1.0)
- Returns:
MRE (MRB) metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- mean_squared_error(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=1.0, **kwargs)[source]
Mean Squared Error (MSE)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 1.0)
- Returns:
MSE metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- mean_squared_log_error(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=1.0, **kwargs)[source]
Mean Squared Log Error (MSLE) Link: https://peltarion.com/knowledge-center/documentation/modeling-view/build-an-ai-model/loss-functions/mean-squared-logarithmic-error-(msle)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 1.0)
- Returns:
MSLE metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- median_absolute_error(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=1.0, **kwargs)[source]
Median Absolute Error (MedAE)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 1.0)
- Returns:
MedAE metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- nash_sutcliffe_efficiency(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)[source]
Nash-Sutcliffe Efficiency (NSE) Link: https://agrimetsoft.com/calculators/Nash%20Sutcliffe%20model%20Efficiency%20coefficient
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
NSE metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- normalized_gini_coefficient(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)[source]
Normalized Gini Coefficient for Regression (Actuarial Lorenz / Ranking Power). Measures how well the predictions rank the actual continuous targets. Best possible score is 1.0 (perfect ranking), 0.0 is random ranking. Range = [-1, 1].
References
Frees, Edward W., Glenn Meyers, and A. David Cummings. “Summarizing insurance scores using a Gini index.” Journal of the American Statistical Association 106.495 (2011): 1085-1098.
- normalized_nash_sutcliffe_efficiency(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)[source]
Normalized Nash-Sutcliffe Efficiency (NNSE) Link: https://agrimetsoft.com/calculators/Nash%20Sutcliffe%20model%20Efficiency%20coefficient
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
NNSE metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
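The relationship between NSE and its normalized form can be sketched as below. The formulas are the commonly used ones (NSE = 1 - SSE/SST, NNSE = 1 / (2 - NSE)); the function names are hypothetical and the library's exact implementation is not shown in this reference.

```python
import numpy as np

def nse_sketch(y_true, y_pred):
    """Hypothetical helper: 1 - residual sum of squares / total sum of squares."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sse = np.sum((y_true - y_pred) ** 2)
    sst = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - sse / sst)

def nnse_sketch(y_true, y_pred):
    """Hypothetical helper: rescales NSE from (-inf, 1] to (0, 1]."""
    return 1.0 / (2.0 - nse_sketch(y_true, y_pred))
```

Predicting the mean of y_true everywhere gives NSE = 0, which the normalized form maps to 0.5.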
- normalized_root_mean_square_error(y_true=None, y_pred=None, normalization='mean', multi_output='raw_values', force_finite=True, finite_value=10000000000.0, **kwargs)[source]
Normalized Root Mean Square Error (NRMSE).
References
https://www.marinedatascience.co/blog/2019/01/07/normalizing-the-rmse/
https://en.wikipedia.org/wiki/Root-mean-square_deviation#Normalized_root-mean-square_deviation
https://search.r-project.org/CRAN/refmans/hydroGOF/html/nrmse.html
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
normalization (str) – The method to normalize RMSE. Valid values: - “mean”: Normalizes by the mean of y_true (also known as CV(RMSE)). - “range”: Normalizes by the difference between max and min of y_true. - “std”: Normalizes by the standard deviation of y_true. - “iqr”: Normalizes by the Interquartile Range (Q3 - Q1) of y_true.
multi_output – Can be “raw_values” or list weights of variables.
force_finite (bool) – Replace NaN or Inf results with finite_value.
finite_value (float) – Replacement value for Non-finite errors.
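The four normalization options listed above can be sketched in NumPy as follows. The function name is hypothetical, and details such as using the population standard deviation (NumPy's default) and linear-interpolated percentiles for the IQR are assumptions that may not match the library.

```python
import numpy as np

def nrmse_sketch(y_true, y_pred, normalization="mean"):
    """Hypothetical helper: RMSE divided by a scale taken from y_true."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    if normalization == "mean":
        scale = np.mean(y_true)
    elif normalization == "range":
        scale = np.max(y_true) - np.min(y_true)
    elif normalization == "std":
        scale = np.std(y_true)  # assumption: population standard deviation
    elif normalization == "iqr":
        q1, q3 = np.percentile(y_true, [25, 75])
        scale = q3 - q1
    else:
        raise ValueError(f"Unknown normalization: {normalization}")
    return float(rmse / scale)
```

Any of the scales can be zero (for example a constant y_true), in which case the ratio is not finite and the finite_value replacement applies.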
- overall_index(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)[source]
Overall Index (OI) Links: https://doi.org/10.1016/j.solener.2019.01.037
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
OI metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- pearson_correlation_coefficient(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=-1.0, **kwargs)[source]
Pearson’s Correlation Coefficient (PCC or R)
Notes
Reference evapotranspiration for Londrina, Paraná, Brazil: performance of different estimation methods
Note that no absolute values are taken in the equation
https://en.wikipedia.org/wiki/Pearson_correlation_coefficient
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = -1.0)
- Returns:
R metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- pearson_correlation_coefficient_square(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)[source]
(Pearson’s Correlation Index)^2 = R^2 = R2S = RSQ (R square)
Notes
Do not confuse R2s with R2 (Coefficient of Determination); they are different metrics
Many online tutorials (articles, Wikipedia, …) and even the scikit-learn library use the notations R2s and R2 incorrectly
R^2 = R2s = R squared denotes (Pearson’s Correlation Index)^2
Meanwhile, R2 denotes the Coefficient of Determination
https://en.wikipedia.org/wiki/Pearson_correlation_coefficient
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
R2s metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
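A small numerical example may help show why the two quantities in the notes differ. The sketch below computes both in plain NumPy; the function names are hypothetical and this is not the library's code.

```python
import numpy as np

def r2s_sketch(y_true, y_pred):
    """Hypothetical helper: squared Pearson correlation, always in [0, 1]."""
    r = np.corrcoef(np.asarray(y_true, dtype=float),
                    np.asarray(y_pred, dtype=float))[0, 1]
    return float(r ** 2)

def r2_sketch(y_true, y_pred):
    """Hypothetical helper: coefficient of determination, can be negative."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sse = np.sum((y_true - y_pred) ** 2)
    sst = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - sse / sst)

y_true = [1, 2, 3]
y_pred = [2, 4, 6]  # perfectly correlated, but every value is doubled
```

For this data the squared correlation is 1 because the relationship is perfectly linear, while the coefficient of determination is -6 because the predictions are far from the actual values.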
- prediction_of_change_in_direction(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)[source]
Prediction of Change in Direction (PCD)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
PCD metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- relative_absolute_error(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=10000000000.0, **kwargs)[source]
Relative Absolute Error (RAE)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 1e10)
- Returns:
RAE metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- residual_gini_index(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)[source]
Gini Index of Absolute Residuals (Error Dispersion).
References
Yitzhaki, Shlomo, and Edna Schechtman. The Gini methodology: a primer on a statistical methodology. Vol. 272. Springer Science & Business Media, 2012.
- residual_standard_error(y_true=None, y_pred=None, X_shape=None, multi_output='raw_values', force_finite=True, finite_value=1.0, **kwargs)[source]
Residual Standard Error (RSE)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
X_shape (tuple, list, np.ndarray) – The shape of X_train dataset
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 1.0)
- Returns:
RSE metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- root_mean_squared_error(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=1.0, **kwargs)[source]
Root Mean Squared Error (RMSE)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 1.0)
- Returns:
RMSE metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- root_relative_squared_error(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)[source]
Root Relative Squared Error (RRSE)
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
RRSE metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
- single_absolute_error(y_true=None, y_pred=None, **kwargs)[source]
Absolute Error (AE) Note: Computes the absolute error between two numbers, or element-wise between a pair of lists, tuples, or NumPy arrays.
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
- Returns:
AE metric
- Return type:
result (np.ndarray)
- single_relative_bias(y_true=None, y_pred=None, **kwargs)
Relative Error (RE) Note: Computes the relative error between two numbers, or element-wise between a pair of lists, tuples, or NumPy arrays.
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
- Returns:
RE metric
- Return type:
result (np.ndarray)
- single_relative_error(y_true=None, y_pred=None, **kwargs)[source]
Relative Error (RE) Note: Computes the relative error between two numbers, or element-wise between a pair of lists, tuples, or NumPy arrays.
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
- Returns:
RE metric
- Return type:
result (np.ndarray)
- single_squared_error(y_true=None, y_pred=None, **kwargs)[source]
Squared Error (SE) Note: Computes the squared error between two numbers, or element-wise between a pair of lists, tuples, or NumPy arrays.
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
- Returns:
SE metric
- Return type:
result (np.ndarray)
- single_squared_log_error(y_true=None, y_pred=None, **kwargs)[source]
Squared Log Error (SLE) Note: Computes the squared log error between two numbers, or element-wise between a pair of lists, tuples, or NumPy arrays.
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
- Returns:
SLE metric
- Return type:
result (np.ndarray)
- symmetric_mean_absolute_percentage_error(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=200.0, **kwargs)[source]
Symmetric Mean Absolute Percentage Error (SMAPE). Original version. Range [0, 200%]. References: Forecasting, Long-Range. “From Crystal Ball to Computer.” Scott Armstrong Robert J. Genetski (1978).
- symmetric_mean_absolute_percentage_error_np(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=2.0, **kwargs)[source]
Symmetric Mean Absolute Percentage Error (SMAPE_NP). Original version. Range [0, 2]. References: Forecasting, Long-Range. “From Crystal Ball to Computer.” Scott Armstrong Robert J. Genetski (1978).
- symmetric_mean_absolute_percentage_error_simplified(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=1.0, **kwargs)[source]
Symmetric Mean Absolute Percentage Error Simplified (SMAPE_S). Simplified version of SMAPE with Range [0, 1] (or [0, 100%]), smaller is better. References: Makridakis, Spyros. “Accuracy measures: theoretical and practical concerns.” International journal of forecasting 9.4 (1993): 527-529.
- symmetric_mean_absolute_percentage_error_simplified_p(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=100.0, **kwargs)[source]
Symmetric Mean Absolute Percentage Error Simplified (SMAPE_S_P). Simplified version of SMAPE with Range [0, 1] (or [0, 100%]), smaller is better. References: Makridakis, Spyros. “Accuracy measures: theoretical and practical concerns.” International journal of forecasting 9.4 (1993): 527-529.
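The four SMAPE variants differ only in a factor of 2 and in percentage scaling. The sketch below is inferred from the ranges and default finite_value settings listed above, so the exact formulas are an assumption, and the function name and the `variant` argument are hypothetical.

```python
import numpy as np

def smape_sketch(y_true, y_pred, variant="original"):
    """Hypothetical helper covering the four SMAPE scalings."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Base term per sample lies in [0, 1]
    base = np.mean(np.abs(y_true - y_pred) / (np.abs(y_true) + np.abs(y_pred)))
    scale = {
        "original": 200.0,      # percentage form of the original version
        "np": 2.0,              # original version without percentage scaling
        "simplified": 1.0,      # simplified version as a fraction
        "simplified_p": 100.0,  # simplified version as a percentage
    }[variant]
    return float(scale * base)
```

When both y_true and y_pred are zero for a sample, the base term is 0/0, which is where the force_finite replacement comes into play.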
- variance_accounted_for(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)[source]
Variance Accounted For between 2 signals (VAF) Link: https://www.dcsc.tudelft.nl/~jwvanwingerden/lti/doc/html/vaf.html
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
VAF metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
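A minimal sketch of VAF, assuming the usual definition as a percentage, (1 - var(y_true - y_pred) / var(y_true)) * 100. The function name is hypothetical and the library's scaling may differ.

```python
import numpy as np

def vaf_sketch(y_true, y_pred):
    """Hypothetical helper: percentage of variance in y_true explained by y_pred."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float((1.0 - np.var(y_true - y_pred) / np.var(y_true)) * 100.0)
```

Because only the variance of the error enters the formula, a prediction that is off by a constant still scores 100 under this definition, so VAF is best read alongside a bias-sensitive metric.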
- willmott_index(y_true=None, y_pred=None, multi_output='raw_values', force_finite=True, finite_value=0.0, **kwargs)[source]
Willmott Index (WI)
Notes
Reference evapotranspiration for Londrina, Paraná, Brazil: performance of different estimation methods
- Parameters:
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
multi_output – Can be “raw_values” or list weights of variables such as [0.5, 0.2, 0.3] for 3 columns, (Optional, default = “raw_values”)
force_finite (bool) – When result is not finite, it can be NaN or Inf. Their result will be replaced by finite_value (Optional, default = True)
finite_value (float) – The finite value used to replace Inf or NaN result (Optional, default = 0.0)
- Returns:
WI metric for single column or multiple columns
- Return type:
result (float, int, np.ndarray)
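For reference, the classic Willmott index of agreement can be sketched as below. This follows the textbook formula with a hypothetical function name; whether the library implements this exact variant is an assumption.

```python
import numpy as np

def willmott_sketch(y_true, y_pred):
    """Hypothetical helper: 1 - SSE / potential error, in [0, 1]."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_true = y_true.mean()
    sse = np.sum((y_true - y_pred) ** 2)
    # Potential error: distances of both series from the observed mean
    potential = np.sum((np.abs(y_pred - mean_true) + np.abs(y_true - mean_true)) ** 2)
    return float(1.0 - sse / potential)
```

A perfect prediction gives 1.0, while predicting the observed mean everywhere gives 0.0.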
permetrics.classification module
- class permetrics.classification.ClassificationMetric(y_true=None, y_pred=None, **kwargs)[source]
Bases:
Evaluator
A class for evaluating classification metrics.
- Parameters:
y_true (tuple, list, np.ndarray, optional) – The ground truth values. Default is None.
y_pred (tuple, list, np.ndarray, optional) – The predicted values. Default is None.
labels (tuple, list, np.ndarray, optional) – List of labels to index the matrix. This may be used to reorder or select a subset of labels. Default is None.
pos_label (int or str) – Positive label for binary classification.
average (str or None, optional) – Determines the type of averaging performed on the data. Default is “binary”. Options are:
  - ‘binary’: Calculate for a binary classification problem.
  - ‘micro’: Calculate metrics globally by considering each element of the label indicator matrix as a label.
  - ‘macro’: Calculate metrics for each label and find their unweighted mean.
  - ‘weighted’: Calculate metrics for each label and find their average, weighted by support.
  - None: Scores for each class are returned.
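To make the averaging modes concrete, here is a minimal plain-Python sketch that applies ‘micro’, ‘macro’, ‘weighted’, and None to precision. The helper name `precision_by_average` is hypothetical, the ‘binary’ mode is left out, and treating a class with no predictions as precision 0 is an assumption of this example rather than documented library behavior.

```python
from collections import Counter


def precision_by_average(y_true, y_pred, average="macro"):
    """Per-class precision combined according to `average` (illustrative)."""
    labels = sorted(set(y_true) | set(y_pred))
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)  # TP + FP for each class
    support = Counter(y_true)    # number of true samples for each class
    per_class = {
        c: (true_pos[c] / predicted[c] if predicted[c] else 0.0) for c in labels
    }
    if average is None:
        return per_class  # one score per class
    if average == "macro":
        return sum(per_class.values()) / len(labels)  # unweighted mean
    if average == "weighted":
        return sum(per_class[c] * support[c] for c in labels) / len(y_true)
    if average == "micro":
        # Pool all decisions: total TP over total predictions.
        return sum(true_pos.values()) / len(y_pred)
    raise ValueError(f"unsupported average: {average!r}")


y_true = [0, 0, 0, 0, 1, 2]
y_pred = [0, 0, 0, 1, 1, 2]
for mode in (None, "macro", "weighted", "micro"):
    print(mode, precision_by_average(y_true, y_pred, average=mode))
```

With imbalanced classes, ‘weighted’ pulls the result toward the majority class, while ‘macro’ treats every class equally.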
- get_support(name=None, verbose=True)[source]
Retrieve the support information for a specific metric or all metrics.
- get_processed_data(y_true=None, y_pred=None)[source]
Process and format the input data for evaluation.
- get_processed_data2(y_true=None, y_pred=None)[source]
Process and format the input data for ROC and probability-based metrics.
- precision_score(...)[source]
Calculate the precision score.
- negative_predictive_value(...)[source]
Calculate the negative predictive value.
- specificity_score(...)[source]
Calculate the specificity score.
- recall_score(...)[source]
Calculate the recall score.
- f1_score(...)[source]
Calculate the F1 score.
- f2_score(...)[source]
Calculate the F2 score.
- fbeta_score(...)[source]
Calculate the F-beta score.
- matthews_correlation_coefficient(...)[source]
Calculate the Matthews correlation coefficient.
- hamming_loss(...)[source]
Calculate the hamming loss.
- lift_score(...)[source]
Calculate the lift score.
- cohen_kappa_score(...)[source]
Calculate the Cohen’s kappa score.
- jaccard_similarity_index(...)[source]
Calculate the Jaccard similarity index.
- g_mean_score(...)[source]
Calculate the geometric mean score.
- accuracy_score(...)[source]
Calculate the accuracy score.
- confusion_matrix(...)[source]
Generate the confusion matrix.
- roc_auc_score(...)[source]
Calculate the ROC-AUC score.
- gini_index(...)[source]
Calculate the Gini index.
- brier_score_loss(...)[source]
Calculate the Brier score loss.
- crossentropy_loss(...)[source]
Calculate the cross-entropy loss.
- hinge_loss(...)[source]
Calculate the hinge loss.
- kullback_leibler_divergence_loss(...)[source]
Calculate the Kullback-Leibler divergence loss.
- AS(y_true=None, y_pred=None, normalize=True, sample_weight=None, **kwargs)
- Parameters:
y_true (array-like, optional) – Ground truth (correct) target values.
y_pred (array-like, optional) – Estimated target values.
normalize (bool, optional) – If True, return the fraction of correctly classified samples (float). If False, return the number of correctly classified samples (int).
sample_weight (array-like, optional) – Sample weights.
- Returns:
Accuracy score.
- Return type:
float or int
- AUC(y_true=None, y_pred=None, average='macro', **kwargs)
Compute the Area Under the Receiver Operating Characteristic Curve (ROC AUC).
- Parameters:
y_true (array-like, optional) – Ground truth (correct) target values.
y_pred (array-like, optional) – Estimated probabilities or decision function.
average (str, optional) – Averaging method (‘macro’, ‘weighted’, or None).
- Returns:
ROC AUC score.
- Return type:
float or dict
- BSL(y_true=None, y_pred=None, **kwargs)
- Parameters:
y_true (array-like, optional) – Ground truth (correct) target values.
y_pred (array-like, optional) – Predicted probabilities.
- Returns:
Brier score loss.
- Return type:
float
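For orientation, the binary Brier score is the mean squared difference between the predicted probability of the positive class and the 0/1 outcome. The sketch below assumes that binary setting and uses a hypothetical function name; multi-class handling in the library may differ.

```python
def brier_score_sketch(y_true, y_prob):
    """Mean squared error between 0/1 outcomes and predicted probabilities."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Confident, mostly correct probabilities give a score close to 0.
print(brier_score_sketch([0, 1, 1, 0], [0.1, 0.9, 0.8, 0.3]))
```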
- CEL(y_true=None, y_pred=None, **kwargs)
- Parameters:
y_true (array-like, optional) – Ground truth (correct) target values.
y_pred (array-like, optional) – Predicted probabilities.
- Returns:
Cross-entropy loss.
- Return type:
float
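A minimal sketch of cross-entropy loss follows, assuming `y_true` holds integer class indices and each row of `y_prob` is a probability vector over the classes. The function name and the clipping constant `eps` are assumptions for illustration; the clipping only prevents log(0).

```python
import math


def cross_entropy_sketch(y_true, y_prob, eps=1e-15):
    """Average negative log-probability assigned to the true class."""
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        # Clip the probability of the true class away from 0 and 1.
        p = min(max(probs[label], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(y_true)


# A uniform guess over two classes costs ln(2) per sample.
print(cross_entropy_sketch([0, 1], [[0.5, 0.5], [0.5, 0.5]]))
```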
- CKS(y_true=None, y_pred=None, labels=None, pos_label=1, average='binary', **kwargs)
- Parameters:
y_true (array-like, optional) – Ground truth values.
y_pred (array-like, optional) – Predicted values.
labels (list, optional) – List of labels to include in the calculation.
pos_label (int or str, optional) – The positive class label for binary classification.
average (str, optional) – Averaging method (‘binary’, ‘micro’, ‘macro’, ‘weighted’, or None).
- Returns:
Cohen’s kappa score.
- Return type:
float or dict
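Cohen's kappa compares the observed agreement between `y_true` and `y_pred` with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below is a plain-Python illustration under that textbook definition; the function name is hypothetical, and returning NaN when p_e equals 1 is an assumption of this example.

```python
from collections import Counter


def cohen_kappa_sketch(y_true, y_pred):
    """Chance-corrected agreement between two label sequences."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: product of the marginal label frequencies.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return float("nan")  # undefined when both sequences use one label
    return (observed - expected) / (1.0 - expected)


print(cohen_kappa_sketch([0, 0, 1, 1], [0, 0, 1, 1]))  # full agreement
print(cohen_kappa_sketch([0, 1, 0, 1], [0, 0, 1, 1]))  # chance-level agreement
```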
- CM(y_true=None, y_pred=None, labels=None, normalize=None, **kwargs)
Compute the confusion matrix for classification tasks.
- Parameters:
y_true (array-like) – Ground truth (correct) labels.
y_pred (array-like) – Predicted labels.
labels (list, optional) – Subset of labels to include in the matrix. Default is None.
normalize (str, optional) – Normalization method. One of {“true”, “pred”, “all”}. Default is None (no normalization).
  - “true”: Normalize rows (true labels).
  - “pred”: Normalize columns (predicted labels).
  - “all”: Normalize the entire matrix.
- Returns:
matrix (ndarray): Confusion matrix (normalized if specified).
imap (dict): Mapping of labels to matrix indices.
imap_count (dict): Count of true labels for each class.
- Return type:
tuple
- Raises:
ValueError – If specified labels do not exist in y_true or y_pred.
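The three normalization modes can be sketched in plain Python as follows. This is an illustrative re-implementation, not the library code: it returns nested lists instead of an ndarray, silently skips pairs whose labels are outside `labels` instead of raising ValueError, and the function name is hypothetical. The return shape (matrix, imap, imap_count) follows the description above.

```python
def confusion_matrix_sketch(y_true, y_pred, labels=None, normalize=None):
    """Return (matrix, imap, imap_count); rows are true labels, columns predictions."""
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    imap = {label: i for i, label in enumerate(labels)}
    size = len(labels)
    matrix = [[0.0] * size for _ in range(size)]
    for t, p in zip(y_true, y_pred):
        if t in imap and p in imap:  # pairs outside `labels` are skipped
            matrix[imap[t]][imap[p]] += 1.0
    # Row sums before normalization: counted true samples per class.
    imap_count = {label: int(sum(matrix[i])) for label, i in imap.items()}

    def safe_div(a, b):
        return a / b if b else 0.0

    if normalize == "true":    # each row sums to 1
        matrix = [[safe_div(v, sum(row)) for v in row] for row in matrix]
    elif normalize == "pred":  # each column sums to 1
        cols = [sum(row[j] for row in matrix) for j in range(size)]
        matrix = [[safe_div(row[j], cols[j]) for j in range(size)] for row in matrix]
    elif normalize == "all":   # whole matrix sums to 1
        total = sum(sum(row) for row in matrix)
        matrix = [[safe_div(v, total) for v in row] for row in matrix]
    return matrix, imap, imap_count


y_true = ["cat", "cat", "dog", "dog"]
y_pred = ["cat", "dog", "dog", "dog"]
print(confusion_matrix_sketch(y_true, y_pred, normalize="true"))
```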
- F1S(y_true=None, y_pred=None, labels=None, pos_label=1, average='binary', **kwargs)
- Parameters:
y_true (array-like, optional) – Ground truth values.
y_pred (array-like, optional) – Predicted values.
labels (list, optional) – List of labels to include in the calculation.
pos_label (int or str, optional) – The positive class label for binary classification.
average (str, optional) – Averaging method (‘binary’, ‘micro’, ‘macro’, ‘weighted’, or None).
- Returns:
F1 score.
- Return type:
float or dict
- F2S(y_true=None, y_pred=None, labels=None, pos_label=1, average='binary', **kwargs)
- Parameters:
y_true (array-like, optional) – Ground truth values.
y_pred (array-like, optional) – Predicted values.
labels (list, optional) – List of labels to include in the calculation.
pos_label (int or str, optional) – The positive class label for binary classification.
average (str, optional) – Averaging method (‘binary’, ‘micro’, ‘macro’, ‘weighted’, or None).
- Returns:
F2 score.
- Return type:
float or dict
- FBS(y_true=None, y_pred=None, beta=1.0, labels=None, pos_label=1, average='binary', **kwargs)
- Parameters:
y_true (array-like, optional) – Ground truth values.
y_pred (array-like, optional) – Predicted values.
beta (float, optional) – Weight of recall in the F-beta score.
labels (list, optional) – List of labels to include in the calculation.
pos_label (int or str, optional) – The positive class label for binary classification.
average (str, optional) – Averaging method (‘binary’, ‘micro’, ‘macro’, ‘weighted’, or None).
- Returns:
F-beta score.
- Return type:
float or dict
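The role of `beta` is easiest to see from the standard formula F_beta = (1 + beta²) · P · R / (beta² · P + R), where P is precision and R is recall. The sketch below computes it from binary confusion counts; the function name and the zero-division convention are assumptions for this example.

```python
def fbeta_from_counts(tp, fp, fn, beta=1.0):
    """F-beta from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0  # assumed convention when precision and recall are both 0
    return (1 + beta ** 2) * precision * recall / denom


# precision = 0.75, recall = 0.6
print(fbeta_from_counts(6, 2, 4, beta=1.0))  # F1, balances both
print(fbeta_from_counts(6, 2, 4, beta=2.0))  # F2, leans toward recall
```

With beta = 1 the score reduces to F1, and with beta = 2 it matches the F2 score, which moves toward the (lower) recall in this example.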
- GINI(y_true=None, y_pred=None, **kwargs)
Compute the Gini index based on the ROC AUC score.
- Parameters:
y_true (array-like, optional) – Ground truth (correct) target values.
y_pred (array-like, optional) – Estimated probabilities or decision function.
- Returns:
Gini index.
- Return type:
float or dict
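The description above only states that the Gini index is derived from ROC AUC. A common convention, assumed here, is Gini = 2 · AUC - 1, which is consistent with the [-1, 1] range listed in SUPPORT. The sketch computes binary AUC as the fraction of (positive, negative) pairs ranked correctly, counting ties as half; both function names are hypothetical.

```python
def roc_auc_binary_sketch(y_true, scores):
    """AUC as the probability that a positive outranks a negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def gini_from_auc(auc):
    """Assumed convention: rescale AUC from [0, 1] to [-1, 1]."""
    return 2.0 * auc - 1.0


auc = roc_auc_binary_sketch([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc, gini_from_auc(auc))
```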
- GMS(y_true=None, y_pred=None, labels=None, pos_label=1, average='binary', **kwargs)
- Parameters:
y_true (array-like, optional) – Ground truth values.
y_pred (array-like, optional) – Predicted values.
labels (list, optional) – List of labels to include in the calculation.
pos_label (int or str, optional) – The positive class label for binary classification.
average (str, optional) – Averaging method (‘binary’, ‘micro’, ‘macro’, ‘weighted’, or None).
- Returns:
Geometric mean (G-mean) score.
- Return type:
float or dict
- HGL(y_true=None, y_pred=None, **kwargs)
- Parameters:
y_true (array-like, optional) – Ground truth (correct) target values.
y_pred (array-like, optional) – Predicted scores.
- Returns:
Hinge loss.
- Return type:
float
- HML(y_true=None, y_pred=None, labels=None, pos_label=1, average='binary', **kwargs)
- Parameters:
y_true (array-like, optional) – Ground truth values.
y_pred (array-like, optional) – Predicted values.
labels (list, optional) – List of labels to include in the calculation.
pos_label (int or str, optional) – The positive class label for binary classification.
average (str, optional) – Averaging method (‘binary’, ‘micro’, ‘macro’, ‘weighted’, or None).
- Returns:
Hamming loss.
- Return type:
float or dict
- JSC(y_true=None, y_pred=None, labels=None, pos_label=1, average='binary', **kwargs)
- Parameters:
y_true (array-like, optional) – Ground truth values.
y_pred (array-like, optional) – Predicted values.
labels (list, optional) – List of labels to include in the calculation.
pos_label (int or str, optional) – The positive class label for binary classification.
average (str, optional) – Averaging method (‘binary’, ‘micro’, ‘macro’, ‘weighted’, or None).
- Returns:
Jaccard similarity index.
- Return type:
float or dict
- JSI(y_true=None, y_pred=None, labels=None, pos_label=1, average='binary', **kwargs)
- Parameters:
y_true (array-like, optional) – Ground truth values.
y_pred (array-like, optional) – Predicted values.
labels (list, optional) – List of labels to include in the calculation.
pos_label (int or str, optional) – The positive class label for binary classification.
average (str, optional) – Averaging method (‘binary’, ‘micro’, ‘macro’, ‘weighted’, or None).
- Returns:
Jaccard similarity index.
- Return type:
float or dict
- JSS(y_true=None, y_pred=None, labels=None, pos_label=1, average='binary', **kwargs)
- Parameters:
y_true (array-like, optional) – Ground truth values.
y_pred (array-like, optional) – Predicted values.
labels (list, optional) – List of labels to include in the calculation.
pos_label (int or str, optional) – The positive class label for binary classification.
average (str, optional) – Averaging method (‘binary’, ‘micro’, ‘macro’, ‘weighted’, or None).
- Returns:
Jaccard similarity index.
- Return type:
float or dict
- KLDL(y_true=None, y_pred=None, **kwargs)
- Parameters:
y_true (array-like, optional) – Ground truth (correct) target values.
y_pred (array-like, optional) – Predicted probabilities.
- Returns:
Kullback-Leibler divergence loss.
- Return type:
float
- LS(y_true=None, y_pred=None, labels=None, pos_label=1, average='binary', **kwargs)
- Parameters:
y_true (array-like, optional) – Ground truth values.
y_pred (array-like, optional) – Predicted values.
labels (list, optional) – List of labels to include in the calculation.
pos_label (int or str, optional) – The positive class label for binary classification.
average (str, optional) – Averaging method (‘binary’, ‘micro’, ‘macro’, ‘weighted’, or None).
- Returns:
Lift score.
- Return type:
float or dict
- MCC(y_true=None, y_pred=None, labels=None, pos_label=1, average='binary', **kwargs)
- Parameters:
y_true (array-like, optional) – Ground truth values.
y_pred (array-like, optional) – Predicted values.
labels (list, optional) – List of labels to include in the calculation.
pos_label (int or str, optional) – The positive class label for binary classification.
average (str, optional) – Averaging method (‘binary’, ‘micro’, ‘macro’, ‘weighted’, or None).
- Returns:
Matthews correlation coefficient.
- Return type:
float or dict
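For the binary case, the Matthews correlation coefficient can be written directly in terms of the confusion counts: MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below is illustrative only; the function name and the choice to return 0 when the denominator vanishes are assumptions.

```python
import math


def mcc_from_counts(tp, tn, fp, fn):
    """Binary Matthews correlation coefficient from confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column is empty
    return (tp * tn - fp * fn) / denom


print(mcc_from_counts(5, 5, 0, 0))  # perfect prediction
print(mcc_from_counts(0, 0, 5, 5))  # every prediction inverted
print(mcc_from_counts(6, 3, 1, 2))  # somewhere in between
```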
- NPV(y_true=None, y_pred=None, labels=None, pos_label=1, average='binary', **kwargs)
Calculate the negative predictive value.
- Parameters:
y_true (array-like, optional) – Ground truth values.
y_pred (array-like, optional) – Predicted values.
labels (list, optional) – List of labels to include in the calculation.
pos_label (int or str, optional) – The positive class label for binary classification.
average (str, optional) – Averaging method (‘binary’, ‘micro’, ‘macro’, ‘weighted’, or None).
- Returns:
Negative predictive value.
- Return type:
float or dict
- PS(y_true=None, y_pred=None, labels=None, pos_label=1, average='binary', **kwargs)
- Parameters:
y_true (array-like, optional) – Ground truth values.
y_pred (array-like, optional) – Predicted values.
labels (list, optional) – List of labels to include in the calculation.
pos_label (int or str, optional) – The positive class label for binary classification.
average (str, optional) – Averaging method (‘binary’, ‘micro’, ‘macro’, ‘weighted’, or None).
- Returns:
Precision score.
- Return type:
float or dict
- RAS(y_true=None, y_pred=None, average='macro', **kwargs)
Compute the Area Under the Receiver Operating Characteristic Curve (ROC AUC).
- Parameters:
y_true (array-like, optional) – Ground truth (correct) target values.
y_pred (array-like, optional) – Estimated probabilities or decision function.
average (str, optional) – Averaging method (‘macro’, ‘weighted’, or None).
- Returns:
ROC AUC score.
- Return type:
float or dict
- ROC(y_true=None, y_pred=None, average='macro', **kwargs)
Compute the Area Under the Receiver Operating Characteristic Curve (ROC AUC).
- Parameters:
y_true (array-like, optional) – Ground truth (correct) target values.
y_pred (array-like, optional) – Estimated probabilities or decision function.
average (str, optional) – Averaging method (‘macro’, ‘weighted’, or None).
- Returns:
ROC AUC score.
- Return type:
float or dict
- RS(y_true=None, y_pred=None, labels=None, pos_label=1, average='binary', **kwargs)
- Parameters:
y_true (array-like, optional) – Ground truth values.
y_pred (array-like, optional) – Predicted values.
labels (list, optional) – List of labels to include in the calculation.
pos_label (int or str, optional) – The positive class label for binary classification.
average (str, optional) – Averaging method (‘binary’, ‘micro’, ‘macro’, ‘weighted’, or None).
- Returns:
Recall score.
- Return type:
float or dict
- SS(y_true=None, y_pred=None, labels=None, pos_label=1, average='binary', **kwargs)
Calculate the specificity score.
- Parameters:
y_true (array-like, optional) – Ground truth values.
y_pred (array-like, optional) – Predicted values.
labels (list, optional) – List of labels to include in the calculation.
pos_label (int or str, optional) – The positive class label for binary classification.
average (str, optional) – Averaging method (‘binary’, ‘micro’, ‘macro’, ‘weighted’, or None).
- Returns:
Specificity score.
- Return type:
float or dict
- SUPPORT = {'AS': {'best': '1', 'range': '[0, 1]', 'type': 'max'}, 'AUC': {'best': '1', 'range': '[0, 1]', 'type': 'max'}, 'BSL': {'best': '0', 'range': '[0, 1]', 'type': 'min'}, 'CEL': {'best': '0', 'range': '[0, +inf)', 'type': 'min'}, 'CKS': {'best': '1', 'range': '[-1, +1]', 'type': 'max'}, 'F1S': {'best': '1', 'range': '[0, 1]', 'type': 'max'}, 'F2S': {'best': '1', 'range': '[0, 1]', 'type': 'max'}, 'FBS': {'best': '1', 'range': '[0, 1]', 'type': 'max'}, 'GINI': {'best': '1', 'range': '[-1, 1]', 'type': 'max'}, 'GMS': {'best': '1', 'range': '[0, 1]', 'type': 'max'}, 'HGL': {'best': '0', 'range': '[0, +inf)', 'type': 'min'}, 'HML': {'best': '0', 'range': '[0, 1]', 'type': 'min'}, 'JSI': {'best': '1', 'range': '[0, 1]', 'type': 'max'}, 'JSS': {'best': '1', 'range': '[0, 1]', 'type': 'max'}, 'KLDL': {'best': '0', 'range': '[0, +inf)', 'type': 'min'}, 'LS': {'best': 'unknown', 'range': '[0, +inf)', 'type': 'max'}, 'MCC': {'best': '1', 'range': '[-1, +1]', 'type': 'max'}, 'NPV': {'best': '1', 'range': '[0, 1]', 'type': 'max'}, 'PS': {'best': '1', 'range': '[0, 1]', 'type': 'max'}, 'ROC': {'best': '1', 'range': '[0, 1]', 'type': 'max'}, 'ROC-AUC': {'best': '1', 'range': '[0, 1]', 'type': 'max'}, 'RS': {'best': '1', 'range': '[0, 1]', 'type': 'max'}, 'SS': {'best': '1', 'range': '[0, 1]', 'type': 'max'}}
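One way to use this table is to let the ‘type’ field decide whether a larger or smaller score is better when comparing models. The sketch below copies three entries from the dictionary above into a local constant; the helper `better_score` is hypothetical and not part of the library.

```python
# Subset of the SUPPORT table shown above, copied for a self-contained example.
SUPPORT_SUBSET = {
    "AS": {"best": "1", "range": "[0, 1]", "type": "max"},
    "BSL": {"best": "0", "range": "[0, 1]", "type": "min"},
    "MCC": {"best": "1", "range": "[-1, +1]", "type": "max"},
}


def better_score(metric, a, b, support=SUPPORT_SUBSET):
    """Pick the preferable of two scores according to the metric's 'type'."""
    return max(a, b) if support[metric]["type"] == "max" else min(a, b)


print(better_score("AS", 0.80, 0.90))   # accuracy: higher is better
print(better_score("BSL", 0.20, 0.10))  # Brier score loss: lower is better
```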
- accuracy_score(y_true=None, y_pred=None, normalize=True, sample_weight=None, **kwargs)[source]
- Parameters:
y_true (array-like, optional) – Ground truth (correct) target values.
y_pred (array-like, optional) – Estimated target values.
normalize (bool, optional) – If True, return the fraction of correctly classified samples (float). If False, return the number of correctly classified samples (int).
sample_weight (array-like, optional) – Sample weights.
- Returns:
Accuracy score.
- Return type:
float or int
- brier_score_loss(y_true=None, y_pred=None, **kwargs)[source]
- Parameters:
y_true (array-like, optional) – Ground truth (correct) target values.
y_pred (array-like, optional) – Predicted probabilities.
- Returns:
Brier score loss.
- Return type:
float
- cohen_kappa_score(y_true=None, y_pred=None, labels=None, pos_label=1, average='binary', **kwargs)[source]
- Parameters:
y_true (array-like, optional) – Ground truth values.
y_pred (array-like, optional) – Predicted values.
labels (list, optional) – List of labels to include in the calculation.
pos_label (int or str, optional) – The positive class label for binary classification.
average (str, optional) – Averaging method (‘binary’, ‘micro’, ‘macro’, ‘weighted’, or None).
- Returns:
Cohen’s kappa score.
- Return type:
float or dict
- confusion_matrix(y_true=None, y_pred=None, labels=None, normalize=None, **kwargs)[source]
Compute the confusion matrix for classification tasks.
- Parameters:
y_true (array-like) – Ground truth (correct) labels.
y_pred (array-like) – Predicted labels.
labels (list, optional) – Subset of labels to include in the matrix. Default is None.
normalize (str, optional) – Normalization method. One of {“true”, “pred”, “all”}. - “true”: Normalize rows (true labels). - “pred”: Normalize columns (predicted labels). - “all”: Normalize the entire matrix. Default is None (no normalization).
- Returns:
matrix (ndarray): Confusion matrix (normalized if specified).
imap (dict): Mapping of labels to matrix indices.
imap_count (dict): Count of true labels for each class.
- Return type:
tuple
- Raises:
ValueError – If specified labels do not exist in y_true or y_pred.
- crossentropy_loss(y_true=None, y_pred=None, **kwargs)[source]
- Parameters:
y_true (array-like, optional) – Ground truth (correct) target values.
y_pred (array-like, optional) – Predicted probabilities.
- Returns:
Cross-entropy loss.
- Return type:
float
- f1_score(y_true=None, y_pred=None, labels=None, pos_label=1, average='binary', **kwargs)[source]
- Parameters:
y_true (array-like, optional) – Ground truth values.
y_pred (array-like, optional) – Predicted values.
labels (list, optional) – List of labels to include in the calculation.
pos_label (int or str, optional) – The positive class label for binary classification.
average (str, optional) – Averaging method (‘binary’, ‘micro’, ‘macro’, ‘weighted’, or None).
- Returns:
F1 score.
- Return type:
float or dict
- f2_score(y_true=None, y_pred=None, labels=None, pos_label=1, average='binary', **kwargs)[source]
- Parameters:
y_true (array-like, optional) – Ground truth values.
y_pred (array-like, optional) – Predicted values.
labels (list, optional) – List of labels to include in the calculation.
pos_label (int or str, optional) – The positive class label for binary classification.
average (str, optional) – Averaging method (‘binary’, ‘micro’, ‘macro’, ‘weighted’, or None).
- Returns:
F2 score.
- Return type:
float or dict
- fbeta_score(y_true=None, y_pred=None, beta=1.0, labels=None, pos_label=1, average='binary', **kwargs)[source]
- Parameters:
y_true (array-like, optional) – Ground truth values.
y_pred (array-like, optional) – Predicted values.
beta (float, optional) – Weight of recall in the F-beta score.
labels (list, optional) – List of labels to include in the calculation.
pos_label (int or str, optional) – The positive class label for binary classification.
average (str, optional) – Averaging method (‘binary’, ‘micro’, ‘macro’, ‘weighted’, or None).
- Returns:
F-beta score.
- Return type:
float or dict
- g_mean_score(y_true=None, y_pred=None, labels=None, pos_label=1, average='binary', **kwargs)[source]
- Parameters:
y_true (array-like, optional) – Ground truth values.
y_pred (array-like, optional) – Predicted values.
labels (list, optional) – List of labels to include in the calculation.
pos_label (int or str, optional) – The positive class label for binary classification.
average (str, optional) – Averaging method (‘binary’, ‘micro’, ‘macro’, ‘weighted’, or None).
- Returns:
Geometric mean (G-mean) score.
- Return type:
float or dict
- get_processed_data(y_true=None, y_pred=None)[source]
Process and format the input data for evaluation.
- Returns:
y_true_final: y_true used in the evaluation process.
y_pred_final: y_pred used in the evaluation process.
unique_classes: All unique classes from y_true and y_pred.
representor: Whether the labels are numbers or strings.
- Return type:
tuple
- get_processed_data2(y_true=None, y_pred=None)[source]
- Returns:
y_true_final: y_true used in the evaluation process.
y_pred_final: y_pred used in the evaluation process.
binary: Whether the problem is binary or multi-class classification.
representor: Whether the labels are numbers or strings.
- Return type:
tuple
- static get_support(name=None, verbose=True)[source]
Retrieve the support information for a specific metric or all metrics.
- Parameters:
name (str, optional) – Name of the metric to retrieve. Use “all” to retrieve all metrics.
verbose (bool, optional) – Whether to print the metric details.
- Returns:
Support information for the specified metric(s).
- Return type:
dict
- gini_index(y_true=None, y_pred=None, **kwargs)[source]
Compute the Gini index based on the ROC AUC score.
- Parameters:
y_true (array-like, optional) – Ground truth (correct) target values.
y_pred (array-like, optional) – Estimated probabilities or decision function.
- Returns:
Gini index.
- Return type:
float or dict
- hamming_loss(y_true=None, y_pred=None, labels=None, pos_label=1, average='binary', **kwargs)[source]
- Parameters:
y_true (array-like, optional) – Ground truth values.
y_pred (array-like, optional) – Predicted values.
labels (list, optional) – List of labels to include in the calculation.
pos_label (int or str, optional) – The positive class label for binary classification.
average (str, optional) – Averaging method (‘binary’, ‘micro’, ‘macro’, ‘weighted’, or None).
- Returns:
Hamming loss.
- Return type:
float or dict
- hinge_loss(y_true=None, y_pred=None, **kwargs)[source]
- Parameters:
y_true (array-like, optional) – Ground truth (correct) target values.
y_pred (array-like, optional) – Predicted scores.
- Returns:
Hinge loss.
- Return type:
float
- jaccard_similarity_coefficient(y_true=None, y_pred=None, labels=None, pos_label=1, average='binary', **kwargs)
- Parameters:
y_true (array-like, optional) – Ground truth values.
y_pred (array-like, optional) – Predicted values.
labels (list, optional) – List of labels to include in the calculation.
pos_label (int or str, optional) – The positive class label for binary classification.
average (str, optional) – Averaging method (‘binary’, ‘micro’, ‘macro’, ‘weighted’, or None).
- Returns:
Jaccard similarity index.
- Return type:
float or dict
- jaccard_similarity_index(y_true=None, y_pred=None, labels=None, pos_label=1, average='binary', **kwargs)[source]
- Parameters:
y_true (array-like, optional) – Ground truth values.
y_pred (array-like, optional) – Predicted values.
labels (list, optional) – List of labels to include in the calculation.
pos_label (int or str, optional) – The positive class label for binary classification.
average (str, optional) – Averaging method (‘binary’, ‘micro’, ‘macro’, ‘weighted’, or None).
- Returns:
Jaccard similarity index.
- Return type:
float or dict
- jaccard_similarity_score(y_true=None, y_pred=None, labels=None, pos_label=1, average='binary', **kwargs)
- Parameters:
y_true (array-like, optional) – Ground truth values.
y_pred (array-like, optional) – Predicted values.
labels (list, optional) – List of labels to include in the calculation.
pos_label (int or str, optional) – The positive class label for binary classification.
average (str, optional) – Averaging method (‘binary’, ‘micro’, ‘macro’, ‘weighted’, or None).
- Returns:
Jaccard similarity index.
- Return type:
float or dict
- kullback_leibler_divergence_loss(y_true=None, y_pred=None, **kwargs)[source]
- Parameters:
y_true (array-like, optional) – Ground truth (correct) target values.
y_pred (array-like, optional) – Predicted probabilities.
- Returns:
Kullback-Leibler divergence loss.
- Return type:
float
- lift_score(y_true=None, y_pred=None, labels=None, pos_label=1, average='binary', **kwargs)[source]
- Parameters:
y_true (array-like, optional) – Ground truth values.
y_pred (array-like, optional) – Predicted values.
labels (list, optional) – List of labels to include in the calculation.
pos_label (int or str, optional) – The positive class label for binary classification.
average (str, optional) – Averaging method (‘binary’, ‘micro’, ‘macro’, ‘weighted’, or None).
- Returns:
Lift score.
- Return type:
float or dict
- matthews_correlation_coefficient(y_true=None, y_pred=None, labels=None, pos_label=1, average='binary', **kwargs)[source]
- Parameters:
y_true (array-like, optional) – Ground truth values.
y_pred (array-like, optional) – Predicted values.
labels (list, optional) – List of labels to include in the calculation.
pos_label (int or str, optional) – The positive class label for binary classification.
average (str, optional) – Averaging method (‘binary’, ‘micro’, ‘macro’, ‘weighted’, or None).
- Returns:
Matthews correlation coefficient.
- Return type:
float or dict
- negative_predictive_value(y_true=None, y_pred=None, labels=None, pos_label=1, average='binary', **kwargs)[source]
Calculate the negative predictive value.
- Parameters:
y_true (array-like, optional) – Ground truth values.
y_pred (array-like, optional) – Predicted values.
labels (list, optional) – List of labels to include in the calculation.
pos_label (int or str, optional) – The positive class label for binary classification.
average (str, optional) – Averaging method (‘binary’, ‘micro’, ‘macro’, ‘weighted’, or None).
- Returns:
Negative predictive value.
- Return type:
float or dict
- precision_score(y_true=None, y_pred=None, labels=None, pos_label=1, average='binary', **kwargs)[source]
- Parameters:
y_true (array-like, optional) – Ground truth values.
y_pred (array-like, optional) – Predicted values.
labels (list, optional) – List of labels to include in the calculation.
pos_label (int or str, optional) – The positive class label for binary classification.
average (str, optional) – Averaging method (‘binary’, ‘micro’, ‘macro’, ‘weighted’, or None).
- Returns:
Precision score.
- Return type:
float or dict
- recall_score(y_true=None, y_pred=None, labels=None, pos_label=1, average='binary', **kwargs)[source]
- Parameters:
y_true (array-like, optional) – Ground truth values.
y_pred (array-like, optional) – Predicted values.
labels (list, optional) – List of labels to include in the calculation.
pos_label (int or str, optional) – The positive class label for binary classification.
average (str, optional) – Averaging method (‘binary’, ‘micro’, ‘macro’, ‘weighted’, or None).
- Returns:
Recall score.
- Return type:
float or dict
- roc_auc_score(y_true=None, y_pred=None, average='macro', **kwargs)[source]
Compute the Area Under the Receiver Operating Characteristic Curve (ROC AUC).
- Parameters:
y_true (array-like, optional) – Ground truth (correct) target values.
y_pred (array-like, optional) – Estimated probabilities or decision function.
average (str, optional) – Averaging method (‘macro’, ‘weighted’, or None).
- Returns:
ROC AUC score.
- Return type:
float or dict
- specificity_score(y_true=None, y_pred=None, labels=None, pos_label=1, average='binary', **kwargs)[source]
Calculate the specificity score.
- Parameters:
y_true (array-like, optional) – Ground truth values.
y_pred (array-like, optional) – Predicted values.
labels (list, optional) – List of labels to include in the calculation.
pos_label (int or str, optional) – The positive class label for binary classification.
average (str, optional) – Averaging method (‘binary’, ‘micro’, ‘macro’, ‘weighted’, or None).
- Returns:
Specificity score.
- Return type:
float or dict
permetrics.clustering module
- class permetrics.clustering.ClusteringMetric(y_true=None, y_pred=None, X=None, force_finite=True, finite_value=None, **kwargs)[source]
Bases:
Evaluator
Defines a ClusteringMetric class that holds all internal and external metrics for clustering problems.
This class provides a unified interface to compute a wide range of clustering validation indexes (CVIs), including both internal metrics (requiring only data features $X$ and predicted labels) and external metrics (requiring ground truth labels).
- Parameters:
y_true (tuple, list, or np.ndarray, default=None) – The ground truth class labels. Used for calculating external validation metrics.
y_pred (tuple, list, or np.ndarray, default=None) – The predicted cluster labels. Used for both internal and external metrics.
X (tuple, list, or np.ndarray, default=None) – The input feature matrix/dataset of shape (n_samples, n_features). Required for internal validation metrics.
force_finite (bool, default=True) – If True, non-finite values (such as NaN or Inf) resulting from undefined mathematical operations (e.g., division by zero) will be replaced by finite_value.
finite_value (float, default=None) – The specific fallback value used to replace infinite or NaN results when force_finite is True.
- X
Stored feature dataset.
- Type:
np.ndarray or None
- le
The label encoder instance used during internal data formatting.
- Type:
LabelEncoder or None
- force_finite
Flag indicating whether to replace non-finite metrics.
- Type:
bool
- finite_value
Fallback value for non-finite metrics.
- Type:
float or None
- ARS(y_true=None, y_pred=None, **kwargs)
Computes the Adjusted Rand score.
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
- Returns:
The Adjusted Rand score.
- Return type:
result (float)
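As a point of reference, the Adjusted Rand score can be computed from pair counts in the contingency table of true versus predicted labels, corrected for the agreement expected by chance. The sketch below follows that textbook definition in plain Python (it needs Python 3.8+ for `math.comb` and at least two samples); the function name is hypothetical and the library's implementation may be organized differently.

```python
from collections import Counter
from math import comb


def adjusted_rand_sketch(y_true, y_pred):
    """Adjusted Rand score from contingency-table pair counts."""
    n = len(y_true)
    # Pairs of samples that share both the true class and the predicted cluster.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(y_true, y_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(y_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(y_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # chance-level agreement
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster on both sides
    return (sum_cells - expected) / (max_index - expected)


# Cluster ids are arbitrary: a pure relabeling still scores 1.
print(adjusted_rand_sketch([0, 0, 1, 1], [1, 1, 0, 0]))
print(adjusted_rand_sketch([0, 0, 1, 1], [0, 0, 1, 2]))
```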
- BHI(X=None, y_pred=None, **kwargs)
The Ball-Hall Index (1965) is the mean, taken over all clusters, of each cluster's mean dispersion. The largest difference between successive clustering levels indicates the optimal number of clusters. Smaller is better (best = 0); range = [0, +inf).
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
- Returns:
The Ball-Hall index
- Return type:
result (float)
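A minimal plain-Python sketch of this definition follows: for each cluster, compute the mean squared Euclidean distance of its points to the cluster barycenter, then average those values over the clusters. The function name is hypothetical and the code is meant as an illustration, not as the library's implementation.

```python
def ball_hall_sketch(X, y_pred):
    """Mean over clusters of the mean squared distance to the barycenter."""
    clusters = {}
    for point, label in zip(X, y_pred):
        clusters.setdefault(label, []).append(point)
    dispersions = []
    for points in clusters.values():
        dim = len(points[0])
        center = [sum(p[d] for p in points) / len(points) for d in range(dim)]
        squared = sum((p[d] - center[d]) ** 2 for p in points for d in range(dim))
        dispersions.append(squared / len(points))  # mean dispersion of this cluster
    return sum(dispersions) / len(dispersions)


X = [[0, 0], [2, 0], [10, 10], [10, 14]]
print(ball_hall_sketch(X, [0, 0, 1, 1]))  # cluster dispersions 1 and 4
```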
- BI(X=None, y_pred=None, force_finite=True, finite_value=10000000000.0, **kwarg)
Computes the Beale Index
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
force_finite (bool) – If True, replace a non-finite result (Inf or NaN) with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Beale Index
- Return type:
result (float)
- BRI(X=None, y_pred=None, force_finite=True, finite_value=10000000000.0, **kwargs)
Computes the Banfeld-Raftery Index.
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
force_finite (bool) – If True, replace a non-finite result (Inf or NaN) with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Banfeld-Raftery Index
- Return type:
result (float)
- CDS(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)
Computes the Czekanowski-Dice score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – If True, replace a non-finite result (Inf or NaN) with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Czekanowski-Dice score
- Return type:
result (float)
- CHI(X=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)
Computes the Calinski and Harabasz (1974) index, also known as the Variance Ratio Criterion. The score is defined as the ratio of the between-cluster dispersion to the within-cluster dispersion, so a higher value indicates better-separated clusters.
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The resulting Calinski-Harabasz index.
- Return type:
result (float)
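As an illustration of the variance ratio, here is a plain-Python sketch of the textbook Calinski-Harabasz formula, [BGSS / (k - 1)] / [WGSS / (n - k)], with between-cluster dispersion in the numerator. The function name and toy data are assumptions made for this example; the library's implementation may differ in details such as edge-case handling.

```python
def calinski_harabasz(X, labels):
    """Variance ratio criterion: [BGSS / (k - 1)] / [WGSS / (n - k)]."""
    n, dim = len(X), len(X[0])
    overall = [sum(row[j] for row in X) / n for j in range(dim)]
    clusters = {}
    for row, lab in zip(X, labels):
        clusters.setdefault(lab, []).append(row)
    k = len(clusters)
    bgss = wgss = 0.0
    for members in clusters.values():
        center = [sum(r[j] for r in members) / len(members) for j in range(dim)]
        # Between-cluster dispersion: size-weighted squared distance to the overall mean.
        bgss += len(members) * sum((c - o) ** 2 for c, o in zip(center, overall))
        # Within-cluster dispersion: squared distances to the cluster's own center.
        wgss += sum((v - c) ** 2 for r in members for v, c in zip(r, center))
    return (bgss / (k - 1)) / (wgss / (n - k))

X = [[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]]
labels = [0, 0, 1, 1]
print(calinski_harabasz(X, labels))  # 32.0 for this toy data
```

A zero within-cluster dispersion makes the ratio undefined, which is the situation force_finite and finite_value are meant to cover.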
- CS(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)
Computes the Completeness Score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The completeness score.
- Return type:
result (float)
- DBCVI(X=None, y_pred=None, force_finite=True, finite_value=0.0, return_type='global', **kwarg)
Computes the Density-based Clustering Validation Index
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
return_type (str) – The output type. Can be “global”, “per-cluster”, or “both”. Default is “global”.
- Returns:
If “global”: Returns the overall DBCV score (float).
If “per-cluster”: Returns a dictionary mapping valid cluster labels to their individual validity scores.
If “both”: Returns a tuple (global_score, per_cluster_dict).
- Return type:
float or dict or tuple
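The three return shapes selected by return_type can be pictured with the small sketch below. format_dbcv_output is a hypothetical helper, the scores are made up, and raising ValueError for an unknown return_type is an assumption rather than documented behavior.

```python
def format_dbcv_output(global_score, per_cluster, return_type="global"):
    """Hypothetical helper mirroring the three documented return shapes."""
    if return_type == "global":
        return global_score
    if return_type == "per-cluster":
        return per_cluster
    if return_type == "both":
        return global_score, per_cluster
    # Assumption: unknown options are rejected.
    raise ValueError(f"Unsupported return_type: {return_type!r}")

scores = {0: 0.8, 1: 0.4}  # made-up per-cluster validity values
print(format_dbcv_output(0.6, scores))                  # float
print(format_dbcv_output(0.6, scores, "per-cluster"))   # dict
print(format_dbcv_output(0.6, scores, "both"))          # tuple (float, dict)
```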
- DBI(X=None, y_pred=None, force_finite=True, finite_value=10000000000.0, **kwargs)
Computes the Davies-Bouldin index
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Davies-Bouldin index
- Return type:
result (float)
- DHI(X=None, y_pred=None, chunk_size=5000, force_finite=True, finite_value=10000000000.0, **kwargs)
Computes the Duda Index or Duda-Hart index
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
chunk_size (int) – Number of samples processed per chunk; the data is split into chunks of this size to avoid out-of-memory errors.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Duda-Hart index
- Return type:
result (float)
- DI(X=None, y_pred=None, use_modified=True, force_finite=True, finite_value=0.0, **kwargs)
Computes the Dunn Index
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
use_modified (bool) – Whether to use the modified version proposed by the library authors, which reduces the computation time of this metric. Default is True.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Dunn Index
- Return type:
result (float)
- DRI(X=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)
Computes the Det-Ratio index
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Det-Ratio index
- Return type:
result (float)
- EnS(y_true=None, y_pred=None, **kwargs)
Computes the Entropy score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
- Returns:
The Entropy score
- Return type:
result (float)
- FMS(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)
Computes the Fowlkes-Mallows score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Fowlkes-Mallows score
- Return type:
result (float)
- FS(y_true=None, y_pred=None, beta=1.0, force_finite=True, finite_value=0.0, **kwargs)
Computes the F-Measure score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
beta (float) – The weight parameter, default = 1.0
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The F-Measure score
- Return type:
result (float)
- GAS(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)
Computes the Gamma Score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Gamma Score
- Return type:
result (float)
- GPS(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)
Computes the Gplus Score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Gplus Score
- Return type:
result (float)
- HGS(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)
Computes the Hubert Gamma score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Hubert Gamma score
- Return type:
result (float)
- HI(X=None, y_pred=None, force_finite=True, finite_value=10000000000.0, **kwarg)
Computes the Hartigan index for a clustering solution.
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Hartigan index
- Return type:
result (float)
- HS(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)
Computes the Homogeneity Score
It measures the extent to which each cluster contains only data points that belong to a single class or category. In other words, homogeneity assesses whether all the data points in a cluster are members of the same true class or label. A higher homogeneity score indicates better clustering results, where each cluster corresponds well to a single ground truth class.
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Homogeneity Score
- Return type:
result (float)
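Homogeneity is commonly written as 1 - H(class | cluster) / H(class). The sketch below implements that standard definition in plain Python; the function name is illustrative, and returning 1.0 when there is only one true class is an assumption about the edge case.

```python
import math
from collections import Counter

def homogeneity(y_true, y_pred):
    """Homogeneity = 1 - H(class | cluster) / H(class)."""
    n = len(y_true)
    class_counts = Counter(y_true)
    cluster_counts = Counter(y_pred)
    joint_counts = Counter(zip(y_true, y_pred))
    h_class = -sum(c / n * math.log(c / n) for c in class_counts.values())
    if h_class == 0.0:
        return 1.0  # assumption: a single class is trivially homogeneous
    h_class_given_cluster = -sum(
        c / n * math.log(c / cluster_counts[k]) for (_, k), c in joint_counts.items()
    )
    return 1.0 - h_class_given_cluster / h_class

print(homogeneity([0, 0, 1, 1], [0, 0, 1, 1]))  # every cluster is pure
print(homogeneity([0, 0, 1, 1], [0, 0, 0, 0]))  # one cluster mixes both classes
```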
- JS(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)
Computes the Jaccard score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Jaccard score
- Return type:
result (float)
- KDI(X=None, y_pred=None, use_normalized=True, **kwargs)
Computes the Ksq-DetW Index
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
use_normalized (bool) – Whether to normalize the scatter matrix before computing its determinant, which reduces the magnitude of the result. Default is True.
- Returns:
The Ksq-DetW Index
- Return type:
result (float)
- KS(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)
Computes the Kulczynski Score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Kulczynski score
- Return type:
result (float)
- LDRI(X=None, y_pred=None, force_finite=True, finite_value=-10000000000.0, **kwargs)
Computes the Log Det Ratio Index
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Log Det Ratio Index
- Return type:
result (float)
- LSRI(X=None, y_pred=None, force_finite=True, finite_value=-10000000000.0, **kwargs)
Computes the Log SS Ratio Index (LSRI).
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Log SS Ratio Index
- Return type:
result (float)
- MIS(y_true=None, y_pred=None, **kwargs)
Computes the Mutual Information score.
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
- Returns:
The Mutual Information score
- Return type:
result (float)
- MNS(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)
Computes the McNemar score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The McNemar score
- Return type:
result (float)
- MSEI(X=None, y_pred=None, **kwarg)
Computes the Mean Squared Error Index (MSEI). It measures the mean of the squared distances between each data point and its corresponding centroid or cluster center.
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
- Returns:
The Mean Squared Error Index
- Return type:
result (float)
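Read literally, the description above amounts to summing the squared distance from each point to its cluster centroid and dividing by the number of points. A plain-Python sketch under that reading, with an illustrative function name and toy data:

```python
def mean_squared_error_index(X, labels):
    """Mean squared Euclidean distance from each point to its cluster centroid."""
    clusters = {}
    for row, lab in zip(X, labels):
        clusters.setdefault(lab, []).append(row)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(r[j] for r in members) / len(members) for j in range(dim)]
        total += sum((v - c) ** 2 for r in members for v, c in zip(r, centroid))
    return total / len(X)

X = [[0.0, 0.0], [0.0, 2.0], [5.0, 5.0], [7.0, 5.0]]
print(mean_squared_error_index(X, [0, 0, 1, 1]))  # 1.0 for this toy data
```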
- NMIS(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)
Computes the normalized mutual information score. It is a variation of the mutual information score that normalizes the result to the range [0, 1]. It is defined as the mutual information divided by the average entropy of the true and predicted clusterings. Bigger is better (Best = 1), Range = [0, 1].
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The normalized mutual information score.
- Return type:
result (float)
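The normalization described above (mutual information divided by the arithmetic mean of the two entropies) can be sketched in plain Python as follows. The helper names are illustrative, and returning finite_value when both entropies are zero is an assumption modeled on the documented parameter, not confirmed library behavior.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def mutual_info(y_true, y_pred):
    """Mutual information between two labelings, from their contingency counts."""
    n = len(y_true)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    return sum(
        c / n * math.log(c * n / (true_counts[t] * pred_counts[p]))
        for (t, p), c in Counter(zip(y_true, y_pred)).items()
    )

def normalized_mutual_info(y_true, y_pred, finite_value=0.0):
    """MI divided by the arithmetic mean of the two label entropies."""
    mean_entropy = (entropy(y_true) + entropy(y_pred)) / 2.0
    if mean_entropy == 0.0:
        return finite_value  # ratio undefined; fall back to the replacement value
    return mutual_info(y_true, y_pred) / mean_entropy

print(normalized_mutual_info([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, renamed labels
print(normalized_mutual_info([0, 0, 1, 1], [0, 1, 0, 1]))  # independent partitions
```

Label names do not matter: swapping cluster identifiers leaves the score unchanged, because only the contingency counts enter the formula.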
- PhS(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)
Computes the Phi score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Phi score
- Return type:
result (float)
- PrS(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)
Computes the Precision Score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Precision score
- Return type:
result (float)
- PuS(y_true=None, y_pred=None, **kwargs)
Computes the Purity score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
- Returns:
The Purity score
- Return type:
result (float)
- RRS(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)
Computes the Russel-Rao score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Russel-Rao score
- Return type:
result (float)
- RSI(X=None, y_pred=None, **kwarg)
Computes the R-squared index
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
- Returns:
The R-squared index
- Return type:
result (float)
- RTS(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)
Computes the Rogers-Tanimoto score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Rogers-Tanimoto score
- Return type:
result (float)
- RaS(y_true=None, y_pred=None, **kwargs)
Computes the Rand score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
- Returns:
The rand score.
- Return type:
result (float)
- ReS(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)
Computes the Recall Score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Recall score
- Return type:
result (float)
- SI(X=None, y_pred=None, multi_output=False, force_finite=True, finite_value=-1.0, chunk_size=5000, **kwargs)
Computes the Silhouette Index
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
multi_output (bool) – Whether to return the scores for each cluster. Default is False.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
chunk_size (int) – Number of samples processed per chunk; the data is split into chunks of this size to avoid out-of-memory errors.
- Returns:
The Silhouette Index
- Return type:
result (float)
- SS1S(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)
Computes the Sokal-Sneath 1 score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Sokal-Sneath 1 score
- Return type:
result (float)
- SS2S(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)
Computes the Sokal-Sneath 2 score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Sokal-Sneath 2 score
- Return type:
result (float)
- SSEI(X=None, y_pred=None, **kwarg)
Computes the Sum of Squared Error Index
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
- Returns:
The Sum of Squared Error Index
- Return type:
result (float)
- SUPPORT = {'ARS': {'best': '1', 'range': '[-1, 1]', 'type': 'max'}, 'BHI': {'best': '0', 'range': '[0, +inf)', 'type': 'min'}, 'BI': {'best': '0', 'range': '[0, +inf)', 'type': 'min'}, 'BRI': {'best': 'unknown', 'range': '(-inf, +inf)', 'type': 'min'}, 'CDS': {'best': '1', 'range': '[0, 1]', 'type': 'max'}, 'CHI': {'best': 'unknown', 'range': '[0, +inf)', 'type': 'max'}, 'CS': {'best': '1', 'range': '[0, 1]', 'type': 'max'}, 'DBCVI': {'best': '1', 'range': '[-1, 1]', 'type': 'max'}, 'DBI': {'best': '0', 'range': '[0, +inf)', 'type': 'min'}, 'DHI': {'best': '0', 'range': '[0, +inf)', 'type': 'min'}, 'DI': {'best': 'unknown', 'range': '[0, +inf)', 'type': 'max'}, 'DRI': {'best': 'unknown', 'range': '[1, +inf)', 'type': 'max'}, 'EnS': {'best': '0', 'range': '[0, +inf)', 'type': 'min'}, 'FMS': {'best': '1', 'range': '[0, 1]', 'type': 'max'}, 'FS': {'best': '1', 'range': '[0, 1]', 'type': 'max'}, 'GAS': {'best': '1', 'range': '[-1, 1]', 'type': 'max'}, 'GPS': {'best': '0', 'range': '[0, 1]', 'type': 'min'}, 'HGS': {'best': '1', 'range': '[-1, 1]', 'type': 'max'}, 'HI': {'best': '0', 'range': '[0, +inf)', 'type': 'min'}, 'HS': {'best': '1', 'range': '[0, 1]', 'type': 'max'}, 'JS': {'best': '1', 'range': '[0, 1]', 'type': 'max'}, 'KDI': {'best': 'unknown', 'range': '(-inf, +inf)', 'type': 'max'}, 'KS': {'best': '1', 'range': '[0, 1]', 'type': 'max'}, 'LDRI': {'best': 'unknown', 'range': '(-inf, +inf)', 'type': 'max'}, 'LSRI': {'best': 'unknown', 'range': '(-inf, +inf)', 'type': 'max'}, 'MIS': {'best': 'unknown', 'range': '[0, +inf)', 'type': 'max'}, 'MNS': {'best': 'unknown', 'range': '(-inf, +inf)', 'type': 'max'}, 'MSEI': {'best': '0', 'range': '[0, +inf)', 'type': 'min'}, 'NMIS': {'best': '1', 'range': '[0, 1]', 'type': 'max'}, 'PhS': {'best': '1', 'range': '[-1, 1]', 'type': 'max'}, 'PrS': {'best': '1', 'range': '[0, 1]', 'type': 'max'}, 'PuS': {'best': '1', 'range': '[0, 1]', 'type': 'max'}, 'RRS': {'best': '1', 'range': '[0, 1]', 'type': 'max'}, 'RSI': {'best': '1', 'range': '[0, 1]', 'type': 'max'}, 'RTS': {'best': '1', 'range': '[0, 1]', 'type': 'max'}, 'RaS': {'best': '1', 'range': '[0, 1]', 'type': 'max'}, 'ReS': {'best': '1', 'range': '[0, 1]', 'type': 'max'}, 'SI': {'best': '1', 'range': '[-1, 1]', 'type': 'max'}, 'SS1S': {'best': '1', 'range': '[0, 1]', 'type': 'max'}, 'SS2S': {'best': '1', 'range': '[0, 1]', 'type': 'max'}, 'SSEI': {'best': '0', 'range': '[0, +inf)', 'type': 'min'}, 'TauS': {'best': '1', 'range': '[-1, 1]', 'type': 'max'}, 'VMS': {'best': '1', 'range': '[0, 1]', 'type': 'max'}, 'XBI': {'best': '0', 'range': '[0, +inf)', 'type': 'min'}}
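The 'type' field records whether a metric should be minimized or maximized, which is what is needed when comparing candidate clusterings. The sketch below copies three entries of the dictionary into a local variable; pick_best is a hypothetical helper and the scores are made up for illustration.

```python
# A three-entry excerpt of the SUPPORT dictionary.
SUPPORT = {
    "DBI": {"best": "0", "range": "[0, +inf)", "type": "min"},
    "SI": {"best": "1", "range": "[-1, 1]", "type": "max"},
    "CHI": {"best": "unknown", "range": "[0, +inf)", "type": "max"},
}

def pick_best(metric, scores):
    """Hypothetical helper: return the key whose score is best for the metric."""
    chooser = min if SUPPORT[metric]["type"] == "min" else max
    return chooser(scores, key=scores.get)

# Made-up scores keyed by the candidate number of clusters k.
print(pick_best("DBI", {2: 0.9, 3: 0.4, 4: 0.7}))    # 3: lowest DBI wins
print(pick_best("SI", {2: 0.3, 3: 0.6, 4: 0.65}))    # 4: highest SI wins
```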
- TauS(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)
Computes the Tau Score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Tau Score
- Return type:
result (float)
- VMS(y_true=None, y_pred=None, beta=1.0, force_finite=True, finite_value=0.0, **kwargs)
Computes the V Measure Score
It is a combination of two other metrics: homogeneity and completeness. Homogeneity measures whether all the data points in a given cluster belong to the same class. Completeness measures whether all the data points of a certain class are assigned to the same cluster. The V-measure combines these two metrics into a single score.
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
beta (float) – The weight parameter, default = 1.0
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The V measure score
- Return type:
result (float)
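One common way to combine the two components is the beta-weighted harmonic mean used by scikit-learn's v_measure_score, v = (1 + beta) * h * c / (beta * h + c). The sketch below assumes this library follows the same convention, which the text above does not state; the function takes precomputed homogeneity and completeness values.

```python
def v_measure(homogeneity, completeness, beta=1.0):
    """Beta-weighted harmonic mean of homogeneity and completeness."""
    denominator = beta * homogeneity + completeness
    if denominator == 0.0:
        return 0.0  # both components are zero
    return (1.0 + beta) * homogeneity * completeness / denominator

print(v_measure(1.0, 0.5))            # plain harmonic mean of 1.0 and 0.5
print(v_measure(1.0, 0.5, beta=2.0))  # beta > 1 pulls the score toward completeness
```

Under this convention, beta greater than 1 weights completeness more strongly and beta less than 1 weights homogeneity more strongly.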
- XBI(X=None, y_pred=None, force_finite=True, finite_value=10000000000.0, **kwargs)
Computes the Xie-Beni index.
The Xie-Beni index was introduced for fuzzy clustering, but it also applies to crisp clustering. The numerator is the mean of the squared distances of all points to the barycenter of the cluster they belong to. The denominator is the minimum of the squared distances between the points in the clusters. The minimum value of the index indicates the best number of clusters.
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Xie-Beni index
- Return type:
result (float)
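A plain-Python sketch of the crisp form described above. It takes the denominator to be the smallest squared distance between two points assigned to different clusters, which is one reading of the description; other formulations use the distance between cluster centers instead. The function names and toy data are illustrative only.

```python
from itertools import combinations

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def xie_beni(X, labels):
    """Mean squared distance to barycenters over the minimum inter-cluster squared distance."""
    clusters = {}
    for row, lab in zip(X, labels):
        clusters.setdefault(lab, []).append(row)
    numerator = 0.0
    for members in clusters.values():
        dim = len(members[0])
        center = [sum(r[j] for r in members) / len(members) for j in range(dim)]
        numerator += sum(sq_dist(r, center) for r in members)
    numerator /= len(X)
    # Assumption: smallest squared distance between points in different clusters.
    denominator = min(
        sq_dist(a, b)
        for (la, a), (lb, b) in combinations(list(zip(labels, X)), 2)
        if la != lb
    )
    return numerator / denominator

X = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
print(xie_beni(X, [0, 0, 1, 1]))  # 0.0625 for this toy data
```

Compact, well-separated clusters give a small numerator and a large denominator, hence a small index.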
- adjusted_rand_score(y_true=None, y_pred=None, **kwargs)[source]
Computes the Adjusted rand score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
- Returns:
The Adjusted rand score
- Return type:
result (float)
- ball_hall_index(X=None, y_pred=None, **kwargs)[source]
The Ball-Hall Index (1965) is the mean, over all clusters, of each cluster's mean dispersion. The largest difference between successive clustering levels indicates the optimal number of clusters. Smaller is better (Best = 0), Range = [0, +inf)
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
- Returns:
The Ball-Hall index
- Return type:
result (float)
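The "mean of the mean dispersion" can be made concrete with a short plain-Python sketch: compute each cluster's mean squared distance to its own centroid, then average those per-cluster values. The function name and toy data are illustrative, not taken from the library.

```python
def ball_hall(X, labels):
    """Mean over clusters of the mean squared distance to the cluster centroid."""
    clusters = {}
    for row, lab in zip(X, labels):
        clusters.setdefault(lab, []).append(row)
    dispersions = []
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(r[j] for r in members) / len(members) for j in range(dim)]
        sq = sum((v - c) ** 2 for r in members for v, c in zip(r, centroid))
        dispersions.append(sq / len(members))  # mean dispersion of this cluster
    return sum(dispersions) / len(dispersions)

X = [[0.0, 0.0], [0.0, 2.0], [5.0, 5.0], [5.0, 5.0], [5.0, 8.0]]
print(ball_hall(X, [0, 0, 1, 1, 1]))  # (1.0 + 2.0) / 2 = 1.5
```

Because every cluster contributes equally regardless of its size, the result generally differs from a mean taken over all points.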
- banfeld_raftery_index(X=None, y_pred=None, force_finite=True, finite_value=10000000000.0, **kwargs)[source]
Computes the Banfeld-Raftery Index.
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Banfeld-Raftery Index
- Return type:
result (float)
- beale_index(X=None, y_pred=None, force_finite=True, finite_value=10000000000.0, **kwarg)[source]
Computes the Beale Index
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Beale Index
- Return type:
result (float)
- calinski_harabasz_index(X=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)[source]
Computes the Calinski-Harabasz (1974) index, also known as the Variance Ratio Criterion. The score is defined as the ratio of the between-cluster dispersion to the within-cluster dispersion.
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The resulting Calinski-Harabasz index.
- Return type:
result (float)
- check_X(X)[source]
Validate the structural properties of the feature matrix array X.
- Parameters:
X (tuple, list, or np.ndarray, optional) – The input features. If None, uses the instance attribute self.X.
- Returns:
The validated 2D NumPy array representation of the dataset.
- Return type:
np.ndarray
- Raises:
ValueError – If the feature data is missing, not exactly 2-dimensional, or is empty.
- completeness_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)[source]
Computes the Completeness Score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The completeness score.
- Return type:
result (float)
- czekanowski_dice_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)[source]
Computes the Czekanowski-Dice score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Czekanowski-Dice score
- Return type:
result (float)
- davies_bouldin_index(X=None, y_pred=None, force_finite=True, finite_value=10000000000.0, **kwargs)[source]
Computes the Davies-Bouldin index
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Davies-Bouldin index
- Return type:
result (float)
- density_based_clustering_validation_index(X=None, y_pred=None, force_finite=True, finite_value=0.0, return_type='global', **kwarg)[source]
Computes the Density-based Clustering Validation Index
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
return_type (str) – The output type. Can be “global”, “per-cluster”, or “both”. Default is “global”.
- Returns:
If “global”: Returns the overall DBCV score (float).
If “per-cluster”: Returns a dictionary mapping valid cluster labels to their individual validity scores.
If “both”: Returns a tuple (global_score, per_cluster_dict).
- Return type:
float or dict or tuple
- det_ratio_index(X=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)[source]
Computes the Det-Ratio index
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Det-Ratio index
- Return type:
result (float)
- duda_hart_index(X=None, y_pred=None, chunk_size=5000, force_finite=True, finite_value=10000000000.0, **kwargs)[source]
Computes the Duda Index or Duda-Hart index
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
chunk_size (int) – Number of samples processed per chunk; the data is split into chunks of this size to avoid out-of-memory errors.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Duda-Hart index
- Return type:
result (float)
- dunn_index(X=None, y_pred=None, use_modified=True, force_finite=True, finite_value=0.0, **kwargs)[source]
Computes the Dunn Index
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
use_modified (bool) – Whether to use the modified version proposed by the library authors, which reduces the computation time of this metric. Default is True.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Dunn Index
- Return type:
result (float)
- entropy_score(y_true=None, y_pred=None, **kwargs)[source]
Computes the Entropy score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
- Returns:
The Entropy score
- Return type:
result (float)
- f_measure_score(y_true=None, y_pred=None, beta=1.0, force_finite=True, finite_value=0.0, **kwargs)[source]
Computes the F-Measure score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
beta (float) – The weight parameter, default = 1.0
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The F-Measure score
- Return type:
result (float)
- fowlkes_mallows_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)[source]
Computes the Fowlkes-Mallows score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Fowlkes-Mallows score
- Return type:
result (float)
- gamma_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)[source]
Computes the Gamma Score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to replace an infinite or NaN result with finite_value.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Gamma Score
- Return type:
result (float)
- get_processed_external_data(y_true=None, y_pred=None, force_finite=None, finite_value=None)[source]
Validate, prioritize, and format the ground truth and predicted labels for external metrics.
- Parameters:
y_true (tuple, list, or np.ndarray, optional) – The ground truth class values. If None, uses the instance attribute self.y_true.
y_pred (tuple, list, or np.ndarray, optional) – The predicted cluster labels. If None, uses the instance attribute self.y_pred.
force_finite (bool, optional) – Override for the force_finite configuration.
finite_value (float, optional) – Override for the finite_value fallback configuration.
- Returns:
A tuple containing:
y_true_final (np.ndarray) – Formatted ground truth labels.
y_pred_final (np.ndarray) – Formatted predicted cluster labels.
le (LabelEncoder) – The label encoder instance mapping target values.
force_finite (bool) – The final finite-forcing flag applied.
finite_value (float or None) – The final fallback value applied.
- Return type:
tuple
- Raises:
ValueError – If either y_true or y_pred is unavailable.
- get_processed_internal_data(y_pred=None, force_finite=None, finite_value=None)[source]
Validate, prioritize, and format predicted labels for internal metrics.
- Parameters:
y_pred (tuple, list, or np.ndarray, optional) – The predicted cluster labels. If None, uses the instance attribute self.y_pred.
force_finite (bool, optional) – Override for the force_finite configuration.
finite_value (float, optional) – Override for the finite_value fallback configuration.
- Returns:
A tuple containing:
y_pred_final (np.ndarray) – Formatted predicted cluster labels.
le (LabelEncoder) – The label encoder instance.
force_finite (bool) – The final finite-forcing flag applied.
finite_value (float or None) – The final fallback value applied.
- Return type:
tuple
- Raises:
ValueError – If y_pred is unavailable.
- static get_support(name=None, verbose=True)[source]
Get metadata support information for a specific metric or all metrics.
- Parameters:
name (str, default=None) – The abbreviation of the metric (e.g., ‘DBCVI’, ‘SI’). If ‘all’, returns information for all supported metrics.
verbose (bool, default=True) – Whether to print the metric details directly to the console.
- Returns:
A dictionary containing properties (‘type’, ‘range’, ‘best’) of the requested metric(s).
- Return type:
dict
- Raises:
ValueError – If the metric name is not supported by the class.
- gplus_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)[source]
Computes the Gplus Score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to force the result to be a finite number.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Gplus Score
- Return type:
result (float)
- hartigan_index(X=None, y_pred=None, force_finite=True, finite_value=10000000000.0, **kwarg)[source]
Computes the Hartigan index for a clustering solution.
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
force_finite (bool) – Whether to force the result to be a finite number.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Hartigan index
- Return type:
result (float)
- homogeneity_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)[source]
Computes the Homogeneity Score
It measures the extent to which each cluster contains only data points that belong to a single class or category. In other words, homogeneity assesses whether all the data points in a cluster are members of the same true class or label. A higher homogeneity score indicates better clustering results, where each cluster corresponds well to a single ground truth class.
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to force the result to be a finite number.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Homogeneity Score
- Return type:
result (float)
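The homogeneity description above can be illustrated with the standard entropy-based definition, 1 - H(class | cluster) / H(class). This is a minimal NumPy sketch under that assumption; the function names are hypothetical and the library's own implementation may differ in details such as the logarithm base or edge-case handling.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (natural log) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))


def conditional_entropy(y_true, y_pred):
    """H(class | cluster): class entropy inside each cluster, weighted by cluster size."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    total = 0.0
    for k in np.unique(y_pred):
        members = y_true[y_pred == k]
        total += (members.size / y_true.size) * entropy(members)
    return total


def homogeneity_sketch(y_true, y_pred):
    h_class = entropy(y_true)
    if h_class == 0.0:
        # A single true class: every cluster is trivially pure.
        return 1.0
    return 1.0 - conditional_entropy(y_true, y_pred) / h_class


# Each cluster holds one class only, so the score is 1 regardless of label names.
print(homogeneity_sketch([0, 0, 1, 1], [1, 1, 0, 0]))
# One cluster mixing both classes carries no class information, so the score is 0.
print(homogeneity_sketch([0, 0, 1, 1], [0, 0, 0, 0]))
```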
- hubert_gamma_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)[source]
Computes the Hubert Gamma score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to force the result to be a finite number.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Hubert Gamma score
- Return type:
result (float)
- jaccard_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)[source]
Computes the Jaccard score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to force the result to be a finite number.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Jaccard score
- Return type:
result (float)
- ksq_detw_index(X=None, y_pred=None, use_normalized=True, **kwargs)[source]
Computes the Ksq-DetW Index
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
use_normalized (bool) – Whether to normalize the scatter matrix before computing its determinant, which reduces the magnitude of the result. Default is True.
- Returns:
The Ksq-DetW Index
- Return type:
result (float)
- kulczynski_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)[source]
Computes the Kulczynski Score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to force the result to be a finite number.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Kulczynski score
- Return type:
result (float)
- log_det_ratio_index(X=None, y_pred=None, force_finite=True, finite_value=-10000000000.0, **kwargs)[source]
Computes the Log Det Ratio Index
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
force_finite (bool) – Whether to force the result to be a finite number.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Log Det Ratio Index
- Return type:
result (float)
- log_ss_ratio_index(X=None, y_pred=None, force_finite=True, finite_value=-10000000000.0, **kwargs)[source]
Computes the Log SS Ratio Index (LSRI).
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
force_finite (bool) – Whether to force the result to be a finite number.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Log SS Ratio Index
- Return type:
result (float)
- mc_nemar_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)[source]
Computes the Mc Nemar score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to force the result to be a finite number.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Mc Nemar score
- Return type:
result (float)
- mean_squared_error_index(X=None, y_pred=None, **kwarg)[source]
Computes the Mean Squared Error Index (MSEI).
MSEI measures the mean of the squared distances between each data point and the centroid of the cluster it belongs to.
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
- Returns:
The Mean Squared Error Index
- Return type:
result (float)
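The description above translates into a short computation. This is a minimal NumPy sketch, assuming the mean is taken over all samples and that each centroid is the arithmetic mean of its cluster's points; msei_sketch is a hypothetical name, not the library function.

```python
import numpy as np


def msei_sketch(X, y_pred):
    """Mean squared distance from each point to the centroid of its cluster."""
    X = np.asarray(X, dtype=float)
    y_pred = np.asarray(y_pred)
    total = 0.0
    for k in np.unique(y_pred):
        points = X[y_pred == k]
        centroid = points.mean(axis=0)
        # Sum of squared Euclidean distances within this cluster.
        total += np.sum((points - centroid) ** 2)
    return float(total / X.shape[0])


X = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 4.0]]
labels = [0, 0, 1, 1]
# Cluster 0 contributes 1 + 1, cluster 1 contributes 4 + 4, so the mean is 10 / 4.
print(msei_sketch(X, labels))  # 2.5
```

The accumulated total before the division is the within-cluster sum of squared errors, so under the same assumptions the two quantities differ only by the factor n_samples.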
- mutual_info_score(y_true=None, y_pred=None, **kwargs)[source]
Computes the Mutual Information score.
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
- Returns:
The Mutual Information score
- Return type:
result (float)
- normalized_mutual_info_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)[source]
Computes the normalized mutual information score.
It is a variation of the mutual information score that normalizes the result to take values between 0 and 1. It is defined as the mutual information divided by the average entropy of the true and predicted clusterings. Bigger is better (Best = 1), Range = [0, 1]
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to force the result to be a finite number.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The normalized mutual information score.
- Return type:
result (float)
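The definition above (mutual information divided by the average entropy of the two labelings) can be sketched in plain NumPy. This sketch assumes the arithmetic mean of the two entropies and natural logarithms. The function names are hypothetical, and the zero-denominator fallback only mirrors the documented finite_value default rather than the library's actual code path.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (natural log) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))


def mutual_info(y_true, y_pred):
    """Mutual information between two labelings, from their joint counts."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = y_true.size
    mi = 0.0
    for c in np.unique(y_true):
        n_c = np.sum(y_true == c)
        for k in np.unique(y_pred):
            n_k = np.sum(y_pred == k)
            n_ck = np.sum((y_true == c) & (y_pred == k))
            if n_ck > 0:  # empty cells contribute nothing
                mi += (n_ck / n) * np.log(n * n_ck / (n_c * n_k))
    return float(mi)


def nmi_sketch(y_true, y_pred, finite_value=0.0):
    denom = 0.5 * (entropy(y_true) + entropy(y_pred))
    if denom == 0.0:
        # Both labelings have a single group; the ratio is undefined.
        return finite_value
    return mutual_info(y_true, y_pred) / denom


print(nmi_sketch([0, 0, 1, 1], [1, 1, 0, 0]))  # identical partitions
print(nmi_sketch([0, 0, 1, 1], [0, 1, 0, 1]))  # independent partitions
```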
- phi_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)[source]
Computes the Phi score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to force the result to be a finite number.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Phi score
- Return type:
result (float)
- precision_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)[source]
Computes the Precision Score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to force the result to be a finite number.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Precision score
- Return type:
result (float)
- purity_score(y_true=None, y_pred=None, **kwargs)[source]
Computes the Purity score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
- Returns:
The Purity score
- Return type:
result (float)
- r_squared_index(X=None, y_pred=None, **kwarg)[source]
Computes the R-squared index
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
- Returns:
The R-squared index
- Return type:
result (float)
- rand_score(y_true=None, y_pred=None, **kwargs)[source]
Computes the Rand score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
- Returns:
The Rand score
- Return type:
result (float)
- recall_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)[source]
Computes the Recall Score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to force the result to be a finite number.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Recall score
- Return type:
result (float)
- rogers_tanimoto_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)[source]
Computes the Rogers-Tanimoto score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to force the result to be a finite number.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Rogers-Tanimoto score
- Return type:
result (float)
- russel_rao_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)[source]
Computes the Russel-Rao score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to force the result to be a finite number.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Russel-Rao score
- Return type:
result (float)
- silhouette_index(X=None, y_pred=None, multi_output=False, force_finite=True, finite_value=-1.0, chunk_size=5000, **kwargs)[source]
Computes the Silhouette Index
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
multi_output (bool) – Whether to return scores for each cluster. Default is False.
force_finite (bool) – Whether to force the result to be a finite number.
finite_value (float) – The value used to replace an infinite or NaN result.
chunk_size (int) – The size of the chunks the original data is split into, to avoid out-of-memory (OOM) problems.
- Returns:
The Silhouette Index
- Return type:
result (float)
- sokal_sneath1_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)[source]
Computes the Sokal-Sneath 1 score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to force the result to be a finite number.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Sokal-Sneath 1 score
- Return type:
result (float)
- sokal_sneath2_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)[source]
Computes the Sokal-Sneath 2 score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to force the result to be a finite number.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Sokal-Sneath 2 score
- Return type:
result (float)
- sum_squared_error_index(X=None, y_pred=None, **kwarg)[source]
Computes the Sum of Squared Error Index
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
- Returns:
The Sum of Squared Error Index
- Return type:
result (float)
- tau_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0, **kwargs)[source]
Computes the Tau Score
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
force_finite (bool) – Whether to force the result to be a finite number.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Tau Score
- Return type:
result (float)
- v_measure_score(y_true=None, y_pred=None, beta=1.0, force_finite=True, finite_value=0.0, **kwargs)[source]
Computes the V Measure Score
It is a combination of two other metrics: homogeneity and completeness. Homogeneity measures whether all the data points in a given cluster belong to the same class. Completeness measures whether all the data points of a certain class are assigned to the same cluster. The V-measure combines these two metrics into a single score.
- Parameters:
y_true (array-like) – The true labels for each sample.
y_pred (array-like) – The predicted cluster labels for each sample.
beta (float) – The weight parameter, default = 1.0
force_finite (bool) – Whether to force the result to be a finite number.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The V measure score
- Return type:
result (float)
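The way beta weights the two components can be shown with the usual V-measure formula, (1 + beta) * h * c / (beta * h + c). This is a minimal sketch that assumes the method follows that formula; v_measure_from is a hypothetical helper that takes already computed homogeneity and completeness values.

```python
def v_measure_from(homogeneity, completeness, beta=1.0, finite_value=0.0):
    """Weighted harmonic mean of homogeneity and completeness (hypothetical helper)."""
    denom = beta * homogeneity + completeness
    if denom == 0.0:
        # Both components are zero; return the fallback instead of dividing by zero.
        return finite_value
    return (1.0 + beta) * homogeneity * completeness / denom


print(v_measure_from(1.0, 1.0))            # perfect on both components
print(v_measure_from(0.5, 1.0))            # beta = 1 gives the plain harmonic mean
print(v_measure_from(0.5, 1.0, beta=2.0))  # beta > 1 weights completeness more
```

With beta = 1.0 the two components count equally. In this formulation, values above 1 pull the score toward completeness and values below 1 pull it toward homogeneity.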
- xie_beni_index(X=None, y_pred=None, force_finite=True, finite_value=10000000000.0, **kwargs)[source]
Computes the Xie-Beni index.
The Xie-Beni index is an index of fuzzy clustering, but it is also applicable to crisp clustering. The numerator is the mean of the squared distances of all points to the barycenter of the cluster they belong to. The denominator is the minimum squared distance between points in different clusters. The minimum value of the index indicates the best number of clusters.
- Parameters:
X (array-like of shape (n_samples, n_features)) – A list of n_features-dimensional data points. Each row corresponds to a single data point.
y_pred (array-like of shape (n_samples,)) – Predicted labels for each sample.
force_finite (bool) – Whether to force the result to be a finite number.
finite_value (float) – The value used to replace an infinite or NaN result.
- Returns:
The Xie-Beni index
- Return type:
result (float)
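The numerator and denominator described for the Xie-Beni index can be sketched directly in NumPy. This is an illustrative reading of the description, assuming Euclidean distances and that the denominator is the smallest squared distance between two points in different clusters; xie_beni_sketch is a hypothetical name and the library implementation may differ.

```python
import numpy as np


def xie_beni_sketch(X, y_pred):
    X = np.asarray(X, dtype=float)
    y_pred = np.asarray(y_pred)
    labels = np.unique(y_pred)

    # Numerator: mean squared distance of every point to its cluster barycenter.
    within = 0.0
    for k in labels:
        points = X[y_pred == k]
        within += np.sum((points - points.mean(axis=0)) ** 2)
    numerator = within / X.shape[0]

    # Denominator: smallest squared distance between points of two different clusters.
    min_sq = np.inf
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            diff = X[y_pred == a][:, None, :] - X[y_pred == b][None, :, :]
            min_sq = min(min_sq, float(np.min(np.sum(diff ** 2, axis=2))))

    return float(numerator / min_sq)


X = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 4.0]]
labels = [0, 0, 1, 1]
# Compact, well-separated clusters give a small value.
print(xie_beni_sketch(X, labels))
```

The pairwise difference array grows with the product of the two cluster sizes, so this sketch is only suitable for small inputs.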