permetrics.utils.classifier_util module
- permetrics.utils.classifier_util.calculate_accuracy_score(y_true, y_pred, normalize=True, sample_weight=None)[source]
Compute the accuracy score for classification tasks.
Accuracy is the ratio of correctly predicted samples to the total number of samples. It can also compute weighted accuracy if sample weights are provided.
- Parameters:
y_true (array-like) – Ground truth (correct) labels.
y_pred (array-like) – Predicted labels.
normalize (bool, optional) – If True, return the fraction of correctly predicted samples. If False, return the number of correctly predicted samples. Default is True.
sample_weight (array-like, optional) – Sample weights. Default is None.
- Returns:
Accuracy score (normalized or raw count).
- Return type:
float or int
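The relationship between `normalize` and `sample_weight` can be illustrated with a small NumPy sketch. The helper `accuracy_sketch` below is hypothetical and is not the permetrics implementation; it assumes that weights scale each sample's contribution and that the unnormalized result is the (weighted) count of correct predictions. Unlike the documented function, it always returns a float.

```python
import numpy as np

def accuracy_sketch(y_true, y_pred, normalize=True, sample_weight=None):
    """Hypothetical stand-in for illustration only."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # 1.0 where the prediction matches the ground truth, 0.0 otherwise
    correct = (y_true == y_pred).astype(float)
    if sample_weight is None:
        weights = np.ones_like(correct)
    else:
        weights = np.asarray(sample_weight, dtype=float)
    weighted_correct = np.sum(correct * weights)
    if normalize:
        # Fraction of (weighted) correct predictions
        return float(weighted_correct / np.sum(weights))
    # Raw (weighted) count of correct predictions
    return float(weighted_correct)

y_true = [0, 1, 2, 2, 1]
y_pred = [0, 1, 1, 2, 0]
acc = accuracy_sketch(y_true, y_pred)                     # 3 of 5 correct
count = accuracy_sketch(y_true, y_pred, normalize=False)  # number correct
weighted = accuracy_sketch(y_true, y_pred, sample_weight=[1, 1, 2, 1, 1])
```

In this example the misclassified third sample carries double weight, so the weighted accuracy is lower than the unweighted one.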
- permetrics.utils.classifier_util.calculate_class_support(y_true)[source]
Compute the support (number of occurrences) for each class in the ground truth labels.
- Parameters:
y_true (array-like) – Ground truth (correct) labels.
- Returns:
Array of class counts.
- Return type:
ndarray
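As a rough illustration of what "support" means, the counts can be obtained with `np.unique`. The helper name `class_support_sketch` is hypothetical, and returning the labels alongside the counts is a choice made for this sketch; the documented function returns only the array of counts.

```python
import numpy as np

def class_support_sketch(y_true):
    """Hypothetical stand-in: sorted unique labels and their occurrence counts."""
    labels, counts = np.unique(np.asarray(y_true), return_counts=True)
    return labels, counts

labels, counts = class_support_sketch(["cat", "dog", "cat", "bird", "cat"])
```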
- permetrics.utils.classifier_util.calculate_confusion_matrix(y_true=None, y_pred=None, labels=None, normalize=None)[source]
Compute the confusion matrix for classification tasks.
The confusion matrix summarizes the performance of a classification model by comparing the predicted labels with the true labels. It can also normalize the matrix based on the specified normalization method.
- Parameters:
y_true (array-like) – Ground truth (correct) labels.
y_pred (array-like) – Predicted labels.
labels (list, optional) – Subset of labels to include in the matrix. Default is None.
normalize (str, optional) – Normalization method, one of {“true”, “pred”, “all”}: “true” normalizes over rows (true labels), “pred” normalizes over columns (predicted labels), and “all” normalizes over the entire matrix. Default is None (no normalization).
- Returns:
matrix (ndarray): Confusion matrix (normalized if specified).
imap (dict): Mapping of labels to matrix indices.
imap_count (dict): Count of true labels for each class.
- Return type:
tuple
- Raises:
ValueError – If specified labels do not exist in y_true or y_pred.
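The three return values and the normalization modes can be sketched in NumPy as follows. `confusion_matrix_sketch` is a hypothetical helper, not the library's code; it assumes rows index true labels and columns index predicted labels, and it omits the `labels` subset handling and the `ValueError` check.

```python
import numpy as np

def confusion_matrix_sketch(y_true, y_pred, normalize=None):
    """Hypothetical stand-in: rows are true labels, columns are predictions."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    labels = np.unique(np.concatenate([y_true, y_pred])).tolist()
    imap = {label: idx for idx, label in enumerate(labels)}
    imap_count = {label: int(np.sum(y_true == label)) for label in labels}
    matrix = np.zeros((len(labels), len(labels)), dtype=float)
    for t, p in zip(y_true.tolist(), y_pred.tolist()):
        matrix[imap[t], imap[p]] += 1
    # Empty rows or columns would divide by zero; map those cells to 0
    with np.errstate(invalid="ignore", divide="ignore"):
        if normalize == "true":
            matrix = matrix / matrix.sum(axis=1, keepdims=True)
        elif normalize == "pred":
            matrix = matrix / matrix.sum(axis=0, keepdims=True)
        elif normalize == "all":
            matrix = matrix / matrix.sum()
    return np.nan_to_num(matrix), imap, imap_count

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
matrix, imap, imap_count = confusion_matrix_sketch(y_true, y_pred)
row_normalized, _, _ = confusion_matrix_sketch(y_true, y_pred, normalize="true")
```

With `normalize="true"`, each row of `row_normalized` sums to 1, so the diagonal holds the per-class recall.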
- permetrics.utils.classifier_util.calculate_roc_curve(y_true, y_score)[source]
Compute the Receiver Operating Characteristic (ROC) curve.
The ROC curve is a graphical representation of the trade-off between the true positive rate (sensitivity) and the false positive rate (1-specificity) at various threshold settings.
- Parameters:
y_true (array-like) – Ground truth (correct) binary labels.
y_score (array-like) – Predicted scores or probabilities for the positive class.
- Returns:
tpr (ndarray): True positive rates.
fpr (ndarray): False positive rates.
thresholds (ndarray): Thresholds used to compute TPR and FPR.
- Return type:
tuple
Notes
This function assumes y_true contains binary labels (0 and 1).
If only one class is present in y_true, the ROC curve is not defined.
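One way to picture the threshold sweep is the sketch below. `roc_curve_sketch` is a hypothetical helper, not the permetrics implementation; it assumes a sample is predicted positive when its score is at least the threshold, uses each distinct score as a threshold in descending order, and raises `ValueError` when only one class is present (the documented function may handle that case differently).

```python
import numpy as np

def roc_curve_sketch(y_true, y_score):
    """Hypothetical stand-in: TPR and FPR at each distinct score threshold."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    n_pos = np.sum(y_true == 1)
    n_neg = np.sum(y_true == 0)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC curve is undefined when only one class is present.")
    # Distinct scores, highest first, so the curve moves from strict to lenient
    thresholds = np.unique(y_score)[::-1]
    tpr = np.array([np.sum((y_score >= t) & (y_true == 1)) / n_pos for t in thresholds])
    fpr = np.array([np.sum((y_score >= t) & (y_true == 0)) / n_neg for t in thresholds])
    return tpr, fpr, thresholds

tpr, fpr, thresholds = roc_curve_sketch([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Note the return order (TPR first, then FPR) follows the Returns list above, which differs from some other libraries.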
- permetrics.utils.classifier_util.calculate_single_label_metric(matrix, imap, imap_count, beta=1.0)[source]
Compute various classification metrics for single-label classification.
This function calculates metrics such as precision, recall, specificity, F1 score, Matthews correlation coefficient (MCC), and others for each class in the confusion matrix.
- Parameters:
matrix (ndarray) – Confusion matrix.
imap (dict) – Mapping of labels to matrix indices.
imap_count (dict) – Count of true labels for each class.
beta (float, optional) – Weight of recall in the F-beta score. Default is 1.0.
- Returns:
A dictionary where keys are class labels and values are dictionaries of metrics.
- Return type:
dict
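To show how per-class metrics fall out of a confusion matrix, here is a minimal one-vs-rest sketch. The helper `per_class_metrics_sketch` and the metric key names are hypothetical; the sketch assumes rows are true labels and columns are predictions, covers only a few of the metrics mentioned above, and returns 0.0 where a denominator is zero.

```python
import numpy as np

def per_class_metrics_sketch(matrix, imap, beta=1.0):
    """Hypothetical stand-in: one-vs-rest metrics for each class."""
    matrix = np.asarray(matrix, dtype=float)
    total = matrix.sum()
    metrics = {}
    for label, idx in imap.items():
        tp = matrix[idx, idx]
        fn = matrix[idx, :].sum() - tp  # true class is `label`, predicted otherwise
        fp = matrix[:, idx].sum() - tp  # predicted `label`, true class otherwise
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp > 0 else 0.0
        recall = tp / (tp + fn) if tp + fn > 0 else 0.0
        specificity = tn / (tn + fp) if tn + fp > 0 else 0.0
        denom = beta ** 2 * precision + recall
        f_beta = (1 + beta ** 2) * precision * recall / denom if denom > 0 else 0.0
        metrics[label] = {
            "precision": precision,
            "recall": recall,
            "specificity": specificity,
            "f_beta": f_beta,
        }
    return metrics

matrix = [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
imap = {0: 0, 1: 1, 2: 2}
metrics = per_class_metrics_sketch(matrix, imap)
```

A `beta` above 1 shifts the F-beta score toward recall, and a `beta` below 1 shifts it toward precision.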
permetrics.utils.cluster_util module
- permetrics.utils.cluster_util.calculate_banfeld_raftery_index(X=None, y_pred=None, force_finite=True, finite_value=10000000000.0)[source]
- permetrics.utils.cluster_util.calculate_beale_index(X=None, y_pred=None, force_finite=True, finite_value=10000000000.0)[source]
Beale Index (BI).
- permetrics.utils.cluster_util.calculate_calinski_harabasz_index(X=None, y_pred=None, force_finite=True, finite_value=0.0)[source]
Calinski-Harabasz Index (Variance Ratio Criterion).
- permetrics.utils.cluster_util.calculate_completeness_score(y_true=None, y_pred=None, force_finite=True, finite_value=1.0)[source]
- permetrics.utils.cluster_util.calculate_czekanowski_dice_score(y_true=None, y_pred=None, force_finite=True, finite_value=1.0)[source]
- permetrics.utils.cluster_util.calculate_davies_bouldin_index(X=None, y_pred=None, force_finite=True, finite_value=10000000000.0)[source]
- permetrics.utils.cluster_util.calculate_dbcv_score(X=None, y_pred=None, force_finite=True, finite_value=0.0)[source]
Density-Based Clustering Validation (DBCV) - Moulavi et al. (2014). Valid for arbitrarily shaped clusters. Range: [-1, 1].
- permetrics.utils.cluster_util.calculate_det_ratio_index(X=None, y_pred=None, force_finite=True, finite_value=-10000000000.0)[source]
- permetrics.utils.cluster_util.calculate_duda_hart_index(X=None, y_pred=None, chunk_size=5000, force_finite=True, finite_value=10000000000.0)[source]
- permetrics.utils.cluster_util.calculate_dunn_index(X=None, y_pred=None, use_modified=True, force_finite=True, finite_value=0.0)[source]
- permetrics.utils.cluster_util.calculate_entropy_score(y_true=None, y_pred=None)[source]
O(N) Vectorized Cluster Entropy Score (Corrected Formulation).
- permetrics.utils.cluster_util.calculate_f_measure_score(y_true=None, y_pred=None, beta=1.0, force_finite=True, finite_value=1.0)[source]
- permetrics.utils.cluster_util.calculate_fowlkes_mallows_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0)[source]
- permetrics.utils.cluster_util.calculate_gamma_score(y_true=None, y_pred=None, force_finite=True, finite_value=1.0)[source]
- permetrics.utils.cluster_util.calculate_gplus_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0)[source]
- permetrics.utils.cluster_util.calculate_hartigan_index(X=None, y_pred=None, force_finite=True, finite_value=10000000000.0)[source]
- permetrics.utils.cluster_util.calculate_homogeneity_score(y_true=None, y_pred=None, force_finite=True, finite_value=1.0)[source]
- permetrics.utils.cluster_util.calculate_hubert_gamma_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0)[source]
- permetrics.utils.cluster_util.calculate_jaccard_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0)[source]
- permetrics.utils.cluster_util.calculate_ksq_detw_index(X=None, y_pred=None, use_normalized=True, force_finite=True, finite_value=0.0)[source]
- permetrics.utils.cluster_util.calculate_kulczynski_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0)[source]
- permetrics.utils.cluster_util.calculate_log_det_ratio_index(X=None, y_pred=None, force_finite=True, finite_value=-10000000000.0)[source]
- permetrics.utils.cluster_util.calculate_mc_nemar_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0)[source]
- permetrics.utils.cluster_util.calculate_normalized_mutual_info_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0)[source]
- permetrics.utils.cluster_util.calculate_phi_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0)[source]
- permetrics.utils.cluster_util.calculate_precision_score(y_true=None, y_pred=None, force_finite=True, finite_value=1.0)[source]
- permetrics.utils.cluster_util.calculate_purity_score(y_true=None, y_pred=None)[source]
O(N) Vectorized Purity Score. Safe for arbitrary label formats.
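For readers unfamiliar with purity, a vectorized computation might look like the sketch below. `purity_sketch` is a hypothetical helper and not the library's code; it assumes purity is the fraction of samples that belong to the majority true class of their predicted cluster, and it uses `np.unique` to re-encode arbitrary labels as integer indices.

```python
import numpy as np

def purity_sketch(y_true, y_pred):
    """Hypothetical stand-in: share of samples in their cluster's majority class."""
    # Re-encode arbitrary labels (strings, non-contiguous ints) as 0..k-1
    _, true_idx = np.unique(np.asarray(y_true), return_inverse=True)
    _, pred_idx = np.unique(np.asarray(y_pred), return_inverse=True)
    n_true = true_idx.max() + 1
    n_pred = pred_idx.max() + 1
    # contingency[i, j] = number of samples in cluster i with true class j
    contingency = np.zeros((n_pred, n_true), dtype=np.int64)
    np.add.at(contingency, (pred_idx, true_idx), 1)
    return float(contingency.max(axis=1).sum() / len(true_idx))

purity = purity_sketch(["a", "a", "a", "b", "b", "c"], [0, 0, 1, 1, 1, 1])
perfect = purity_sketch(["a", "a", "b", "b"], [5, 5, 9, 9])
```

In the first call, cluster 0 contributes 2 majority samples and cluster 1 contributes 2, so 4 of the 6 samples count toward purity.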
- permetrics.utils.cluster_util.calculate_recall_score(y_true=None, y_pred=None, force_finite=True, finite_value=1.0)[source]
- permetrics.utils.cluster_util.calculate_rogers_tanimoto_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0)[source]
- permetrics.utils.cluster_util.calculate_russel_rao_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0)[source]
- permetrics.utils.cluster_util.calculate_silhouette_index(X, y_pred, chunk_size=5000, multi_output=False, force_finite=True, finite_value=-1.0)[source]
A chunk-based implementation of the Silhouette score that avoids out-of-memory (OOM) errors on large datasets (100K+ samples).
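The chunking idea can be sketched as follows: instead of building the full pairwise distance matrix, distances are computed from one block of rows at a time. `silhouette_chunked_sketch` is a hypothetical helper, not the permetrics implementation; it assumes Euclidean distance, at least two clusters, and a score of 0 for singleton clusters, and it does not handle degenerate cases such as duplicate points.

```python
import numpy as np

def silhouette_chunked_sketch(X, labels, chunk_size=5000):
    """Hypothetical stand-in: mean silhouette, one block of rows at a time."""
    X = np.asarray(X, dtype=float)
    _, idx = np.unique(np.asarray(labels), return_inverse=True)
    n_samples = X.shape[0]
    n_clusters = idx.max() + 1
    sizes = np.bincount(idx, minlength=n_clusters)
    onehot = np.eye(n_clusters)[idx]  # shape (n_samples, n_clusters)
    scores = np.zeros(n_samples)
    for start in range(0, n_samples, chunk_size):
        stop = min(start + chunk_size, n_samples)
        # Distances from this chunk to every point: shape (chunk, n_samples)
        dists = np.linalg.norm(X[start:stop, None, :] - X[None, :, :], axis=2)
        sums = dists @ onehot  # summed distance from each row to each cluster
        rows = np.arange(stop - start)
        own = idx[start:stop]
        own_size = sizes[own]
        # a: mean distance to the other members of the sample's own cluster
        a = sums[rows, own] / np.maximum(own_size - 1, 1)
        # b: smallest mean distance to any other cluster
        means = sums / sizes
        means[rows, own] = np.inf
        b = means.min(axis=1)
        s = (b - a) / np.maximum(a, b)
        s[own_size == 1] = 0.0  # singleton clusters score 0 by convention
        scores[start:stop] = s
    return float(scores.mean())

X = np.array([[0.0], [1.0], [10.0], [11.0]])
labels = [0, 0, 1, 1]
score = silhouette_chunked_sketch(X, labels, chunk_size=2)
score_single_chunk = silhouette_chunked_sketch(X, labels, chunk_size=4)
```

The chunk size only changes peak memory use, so both calls give the same score.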
- permetrics.utils.cluster_util.calculate_sokal_sneath1_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0)[source]
- permetrics.utils.cluster_util.calculate_sokal_sneath2_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0)[source]
- permetrics.utils.cluster_util.calculate_tau_score(y_true=None, y_pred=None, force_finite=True, finite_value=1.0)[source]
- permetrics.utils.cluster_util.calculate_v_measure_score(y_true=None, y_pred=None, beta=1.0, force_finite=True, finite_value=1.0)[source]
- permetrics.utils.cluster_util.calculate_xie_beni_index(X=None, y_pred=None, force_finite=True, finite_value=10000000000.0)[source]
- permetrics.utils.cluster_util.compute_BGSS(X, labels)[source]
Compute the between-group dispersion (BGSS), also known as the between-cluster variance.
- permetrics.utils.cluster_util.compute_WGSS(X, labels)[source]
Compute the pooled within-cluster sum of squares (WGSS), also known as the within-cluster variance.
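The two dispersion terms can be illustrated together, since they sum to the total sum of squares around the overall mean. `dispersion_sketch` is a hypothetical helper that returns both values at once; it assumes the standard definitions (BGSS weights each squared center-to-overall-mean distance by cluster size, and WGSS sums squared distances of points to their own cluster center) and is not the library's code.

```python
import numpy as np

def dispersion_sketch(X, labels):
    """Hypothetical stand-in: returns (BGSS, WGSS) under the standard definitions."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    overall = X.mean(axis=0)
    bgss, wgss = 0.0, 0.0
    for c in np.unique(labels):
        members = X[labels == c]
        center = members.mean(axis=0)
        # Between-group: cluster size times squared distance of center to overall mean
        bgss += len(members) * np.sum((center - overall) ** 2)
        # Within-group: squared distances of members to their own center
        wgss += np.sum((members - center) ** 2)
    return float(bgss), float(wgss)

X = np.array([[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0], [5.0, 5.0]])
labels = np.array([0, 0, 1, 1, 1])
bgss, wgss = dispersion_sketch(X, labels)
tss = float(np.sum((X - X.mean(axis=0)) ** 2))  # total sum of squares
```

Checking that `bgss + wgss` matches `tss` is a quick sanity test for any implementation of these two quantities.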
- permetrics.utils.cluster_util.compute_barycenters(X, labels)[source]
Get the barycenter of each cluster and the barycenter of all observations.
- Parameters:
X (np.ndarray) – The feature matrix of the dataset.
labels (np.ndarray) – The predicted labels.
- Returns:
barycenters (np.ndarray): The barycenter of each cluster, as a matrix.
overall_barycenter (np.ndarray): The barycenter of all observations.
- Return type:
tuple
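A barycenter here is the mean feature vector of a group of points. The sketch below shows one way to compute both outputs; `barycenters_sketch` is a hypothetical helper, and ordering the rows by sorted cluster label is an assumption of this sketch rather than documented behavior.

```python
import numpy as np

def barycenters_sketch(X, labels):
    """Hypothetical stand-in: per-cluster means plus the overall mean."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    # One row per cluster, ordered by sorted cluster label
    barycenters = np.vstack([X[labels == c].mean(axis=0) for c in clusters])
    overall_barycenter = X.mean(axis=0)
    return barycenters, overall_barycenter

X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
barycenters, overall_barycenter = barycenters_sketch(X, [0, 0, 1, 1])
```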
- permetrics.utils.cluster_util.compute_clusters(labels)[source]
Get a dict of clusters and a dict of cluster sizes.
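The shape of these two dicts is not specified here, so the following is only one plausible reading: a hypothetical `clusters_sketch` that maps each label to the indices of its members and each label to its member count.

```python
import numpy as np

def clusters_sketch(labels):
    """Hypothetical stand-in: label -> member indices, and label -> cluster size."""
    labels = np.asarray(labels)
    members = {c: np.where(labels == c)[0] for c in np.unique(labels).tolist()}
    sizes = {c: len(idx) for c, idx in members.items()}
    return members, sizes

members, sizes = clusters_sketch([0, 1, 0, 1, 1])
```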
- permetrics.utils.cluster_util.compute_confusion_matrix(y_true, y_pred, normalize=False)[source]
Compute the confusion matrix for a clustering problem from the true and predicted labels.
permetrics.utils.data_util module
- permetrics.utils.data_util.get_regression_non_zero_data(y_true, y_pred, one_dim=None, rule_idx=0)[source]