permetrics.utils.classifier_util module
- permetrics.utils.classifier_util.calculate_class_weights(y_true, y_pred=None, y_score=None)
- permetrics.utils.classifier_util.calculate_confusion_matrix(y_true=None, y_pred=None, labels=None, normalize=None)
Generate a confusion matrix for multi-class classification.
- Parameters
y_true (tuple, list, np.ndarray) – a list of integers or strings holding the ground-truth class labels
y_pred (tuple, list, np.ndarray) – a list of integers or strings holding the predicted class labels
labels (tuple, list, np.ndarray) – the list of labels used to index the matrix. It may be used to reorder the labels or to select a subset of them.
normalize ('true', 'pred', 'all', None) – normalizes the confusion matrix over the true conditions (rows), the predicted conditions (columns), or the whole population.
- Returns
matrix (np.ndarray) – a 2-dimensional array of pairwise counts
imap (dict) – a map from each label to its index in the confusion matrix
imap_count (dict) – a map from each label to the number of times it appears in y_true
- Return type
tuple of (np.ndarray, dict, dict)
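The matrix layout, the labels selection, and the three normalize modes can be illustrated with a pure-Python sketch. The function confusion_matrix_sketch is hypothetical and is not the library's implementation. It assumes rows index true labels and columns index predicted labels (the convention stated for normalize), that the labels are sorted when labels is omitted, and that pairs involving a label outside labels are dropped.

```python
from collections import Counter


def confusion_matrix_sketch(y_true, y_pred, labels=None, normalize=None):
    """Hypothetical sketch: rows index true labels, columns index predicted labels."""
    if labels is None:
        # Assumption: all labels share one sortable type (all ints or all strings)
        labels = sorted(set(y_true) | set(y_pred))
    imap = {label: idx for idx, label in enumerate(labels)}
    n = len(labels)
    matrix = [[0.0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        # Pairs involving a label outside `labels` are ignored
        if t in imap and p in imap:
            matrix[imap[t]][imap[p]] += 1
    counts = Counter(y_true)
    imap_count = {label: counts[label] for label in labels}

    if normalize == "true":  # each non-empty row sums to 1
        for row in matrix:
            total = sum(row)
            if total:
                row[:] = [v / total for v in row]
    elif normalize == "pred":  # each non-empty column sums to 1
        for j in range(n):
            total = sum(matrix[i][j] for i in range(n))
            if total:
                for i in range(n):
                    matrix[i][j] /= total
    elif normalize == "all":  # all cells together sum to 1
        total = sum(map(sum, matrix))
        if total:
            matrix = [[v / total for v in row] for row in matrix]
    return matrix, imap, imap_count


matrix, imap, imap_count = confusion_matrix_sketch([0, 1, 2, 2, 1], [0, 2, 2, 2, 1])
print(matrix)      # [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0], [0.0, 0.0, 2.0]]
print(imap_count)  # {0: 1, 1: 2, 2: 2}
```

Passing labels=[2, 1] to the same call would both reorder the matrix and drop every pair that involves label 0.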
- permetrics.utils.classifier_util.calculate_single_label_metric(matrix, imap, imap_count, beta=1.0)
Generate a dictionary of the supported metrics for each label.
- Parameters
matrix (np.ndarray) – a 2-dimensional array of pairwise counts (the confusion matrix)
imap (dict) – a map from each label to its index in the confusion matrix
imap_count (dict) – a map from each label to the number of times it appears in y_true
beta (float) – the beta value used to compute the F-beta score
- Returns
dict_metrics – a dictionary of the supported metrics
- Return type
dict
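Per-label metrics are usually derived from one-vs-rest counts read off the confusion matrix. The sketch below is hypothetical: per_label_metrics_sketch computes only the true/false positive and negative counts, precision, recall, and the F-beta score F = (1 + beta^2) * P * R / (beta^2 * P + R); the full set of metrics returned by calculate_single_label_metric is not listed in this reference. It assumes rows are true labels and columns are predicted labels, and it omits imap_count because these particular metrics do not need it.

```python
def per_label_metrics_sketch(matrix, imap, beta=1.0):
    """Hypothetical one-vs-rest metrics for each label of a square confusion matrix."""
    n = len(matrix)
    total = sum(map(sum, matrix))
    b2 = beta ** 2
    results = {}
    for label, k in imap.items():
        tp = matrix[k][k]
        fp = sum(matrix[i][k] for i in range(n)) - tp  # predicted k, true label differs
        fn = sum(matrix[k]) - tp                        # true label k, predicted otherwise
        tn = total - tp - fp - fn
        # Assumed convention: a ratio with a zero denominator is reported as 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = b2 * precision + recall
        fbeta = (1 + b2) * precision * recall / denom if denom else 0.0
        results[label] = {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
                          "precision": precision, "recall": recall, "fbeta": fbeta}
    return results


metrics = per_label_metrics_sketch([[1, 0, 0], [0, 1, 1], [0, 0, 2]], {0: 0, 1: 1, 2: 2})
print(metrics[1]["precision"], metrics[1]["recall"])  # 1.0 0.5
```

With beta greater than 1 the score leans toward recall, and with beta below 1 it leans toward precision.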
permetrics.utils.cluster_util module
- permetrics.utils.cluster_util.calculate_banfeld_raftery_index(X=None, y_pred=None, force_finite=True, finite_value=10000000000.0)
- permetrics.utils.cluster_util.calculate_beale_index(X=None, y_pred=None, force_finite=True, finite_value=10000000000.0)
- permetrics.utils.cluster_util.calculate_calinski_harabasz_index(X=None, y_pred=None, force_finite=True, finite_value=0.0)
- Parameters
X – The feature matrix
y_pred – The predicted cluster labels
force_finite – If True, an infinite or NaN result is replaced with finite_value
finite_value – The value used to replace an infinite or NaN result
- Returns
The Calinski-Harabasz index
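The Calinski-Harabasz index is the ratio of between-cluster to within-cluster dispersion, each divided by its degrees of freedom: CH = [BGSS / (k - 1)] / [WGSS / (n - k)] for n samples in k clusters. The pure-Python sketch below is hypothetical (calinski_harabasz_sketch is not part of permetrics). It assumes that every degenerate case, such as a single cluster or zero within-cluster dispersion, counts as a non-finite result that force_finite replaces with finite_value.

```python
import math


def calinski_harabasz_sketch(X, y_pred, force_finite=True, finite_value=0.0):
    """Hypothetical CH index for a list of feature vectors X and cluster labels y_pred."""
    n, d = len(X), len(X[0])
    groups = {}
    for x, label in zip(X, y_pred):
        groups.setdefault(label, []).append(x)
    k = len(groups)
    overall = [sum(x[j] for x in X) / n for j in range(d)]
    bgss = wgss = 0.0
    for members in groups.values():
        center = [sum(x[j] for x in members) / len(members) for j in range(d)]
        bgss += len(members) * sum((center[j] - overall[j]) ** 2 for j in range(d))
        wgss += sum(sum((x[j] - center[j]) ** 2 for j in range(d)) for x in members)
    try:
        score = (bgss / (k - 1)) / (wgss / (n - k))
    except ZeroDivisionError:
        # k == 1, n == k, or WGSS == 0: the ratio is undefined or infinite
        score = math.nan
    if force_finite and not math.isfinite(score):
        score = finite_value
    return score


X = [[0, 0], [0, 2], [10, 0], [10, 2]]
print(calinski_harabasz_sketch(X, [0, 0, 1, 1]))  # 50.0
print(calinski_harabasz_sketch(X, [0, 0, 0, 0]))  # 0.0 (single cluster, replaced by finite_value)
```

Higher values indicate compact, well-separated clusters, which is consistent with the default finite_value=0.0 acting as a "worst case" fallback.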
- permetrics.utils.cluster_util.calculate_davies_bouldin_index(X=None, y_pred=None, force_finite=True, finite_value=10000000000.0)
- permetrics.utils.cluster_util.calculate_density_based_clustering_validation_index(X=None, y_pred=None, force_finite=True, finite_value=1.0)
- permetrics.utils.cluster_util.calculate_det_ratio_index(X=None, y_pred=None, force_finite=True, finite_value=-10000000000.0)
- permetrics.utils.cluster_util.calculate_duda_hart_index(X=None, y_pred=None, force_finite=True, finite_value=10000000000.0)
- permetrics.utils.cluster_util.calculate_dunn_index(X=None, y_pred=None, use_modified=True, force_finite=True, finite_value=0.0)
- permetrics.utils.cluster_util.calculate_fowlkes_mallows_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0)
- permetrics.utils.cluster_util.calculate_gamma_score(y_true=None, y_pred=None)
See the paper "Cluster Validation for Mixed-Type Data".
- permetrics.utils.cluster_util.calculate_gplus_score(y_true=None, y_pred=None)
See the paper "Cluster Validation for Mixed-Type Data".
- permetrics.utils.cluster_util.calculate_hartigan_index(X=None, y_pred=None, force_finite=True, finite_value=10000000000.0)
- permetrics.utils.cluster_util.calculate_hubert_gamma_score(y_true=None, y_pred=None, force_finite=True, finite_value=-1.0)
- permetrics.utils.cluster_util.calculate_ksq_detw_index(X=None, y_pred=None, use_normalized=True)
- permetrics.utils.cluster_util.calculate_log_det_ratio_index(X=None, y_pred=None, force_finite=True, finite_value=-10000000000.0)
- permetrics.utils.cluster_util.calculate_normalized_mutual_info_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0)
- permetrics.utils.cluster_util.calculate_phi_score(y_true=None, y_pred=None, force_finite=True, finite_value=-10000000000.0)
- permetrics.utils.cluster_util.calculate_silhouette_index(X=None, y_pred=None, multi_output=False, force_finite=True, finite_value=-1.0)
Calculates the silhouette score for a given clustering.
- Parameters
X – A numpy array of shape (n_samples, n_features) representing the data points.
y_pred – A numpy array of shape (n_samples,) containing the cluster label of each data point.
- Returns
The silhouette score, a value between -1 and 1.
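For each point i, let a(i) be its mean distance to the other members of its own cluster and b(i) its smallest mean distance to the members of any other cluster. The point's silhouette is s(i) = (b(i) - a(i)) / max(a(i), b(i)), and the score is the mean of s(i), which is why it lies between -1 and 1. The pure-Python sketch below (hypothetical name silhouette_sketch) uses Euclidean distance and scores points in singleton clusters as 0, following scikit-learn's convention; whether calculate_silhouette_index makes the same choices, and how it uses multi_output, is not stated in this reference.

```python
import math


def silhouette_sketch(X, y_pred):
    """Hypothetical mean silhouette score using Euclidean distance."""
    clusters = {}
    for i, label in enumerate(y_pred):
        clusters.setdefault(label, []).append(i)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")
    scores = []
    for i, label in enumerate(y_pred):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # singleton cluster: scored as 0 (scikit-learn convention)
            continue
        a = sum(math.dist(X[i], X[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(X[i], X[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


X = [[0, 0], [0, 2], [10, 0], [10, 2]]
print(round(silhouette_sketch(X, [0, 0, 1, 1]), 4))  # 0.802
```

Swapping to the labels [0, 1, 0, 1] on the same points groups far-apart points together and drives the score below zero.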
- permetrics.utils.cluster_util.calculate_silhouette_index_ver2(X=None, y_pred=None, multi_output=False, force_finite=True, finite_value=-1.0)
- permetrics.utils.cluster_util.calculate_silhouette_index_ver3(X=None, y_pred=None, multi_output=False, force_finite=True, finite_value=-1.0)
- permetrics.utils.cluster_util.calculate_tau_score(y_true=None, y_pred=None)
See the paper "Cluster Validation for Mixed-Type Data".
- permetrics.utils.cluster_util.calculate_xie_beni_index(X=None, y_pred=None, force_finite=True, finite_value=10000000000.0)
- permetrics.utils.cluster_util.compute_BGSS(X, labels)
Compute the between-group dispersion (BGSS), also called the between-cluster variance.
- permetrics.utils.cluster_util.compute_WGSS(X, labels)
Compute the pooled within-cluster sum of squares (WGSS), also called the within-cluster variance.
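With overall barycenter g, cluster barycenters g_k, and cluster sizes n_k, the two dispersions are BGSS = sum_k n_k * ||g_k - g||^2 and WGSS = sum_k sum_{x in C_k} ||x - g_k||^2, and together they add up to the total sum of squares around g. The pure-Python sketch below uses a hypothetical function dispersions_sketch, not the library's code, and checks that decomposition on a toy dataset.

```python
def dispersions_sketch(X, labels):
    """Hypothetical BGSS and WGSS for a list of feature vectors X and cluster labels."""
    n, d = len(X), len(X[0])
    overall = [sum(x[j] for x in X) / n for j in range(d)]
    groups = {}
    for x, label in zip(X, labels):
        groups.setdefault(label, []).append(x)
    bgss = wgss = 0.0
    for members in groups.values():
        center = [sum(x[j] for x in members) / len(members) for j in range(d)]
        # Between-group term: cluster size times squared distance of its barycenter to the overall one
        bgss += len(members) * sum((center[j] - overall[j]) ** 2 for j in range(d))
        # Within-group term: squared distances of the members to their own barycenter
        wgss += sum(sum((x[j] - center[j]) ** 2 for j in range(d)) for x in members)
    return bgss, wgss


X = [[0, 0], [0, 2], [10, 0], [10, 2]]
bgss, wgss = dispersions_sketch(X, [0, 0, 1, 1])
tss = sum((x[0] - 5) ** 2 + (x[1] - 1) ** 2 for x in X)  # overall barycenter is (5, 1)
print(bgss, wgss, tss)  # 100.0 4.0 104
```

Because BGSS + WGSS is fixed for a given dataset, a labeling that raises one term necessarily lowers the other.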
- permetrics.utils.cluster_util.compute_barycenters(X, labels)
Compute the barycenter of each cluster and the barycenter of all observations.
- Parameters
X (np.ndarray) – The feature matrix of the dataset
labels (np.ndarray) – The predicted labels
- Returns
barycenters (np.ndarray) – the barycenters of all clusters, stacked as a matrix
overall_barycenter (np.ndarray) – the barycenter of all observations
- Return type
tuple of (np.ndarray, np.ndarray)
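A barycenter is the coordinate-wise mean of a group of points. The sketch below is hypothetical (barycenters_sketch is not the library function) and assumes that the rows of the returned matrix follow the sorted order of the cluster labels.

```python
def barycenters_sketch(X, labels):
    """Hypothetical: per-cluster barycenters (rows in sorted label order) and the overall barycenter."""
    d = len(X[0])
    groups = {}
    for x, label in zip(X, labels):
        groups.setdefault(label, []).append(x)
    barycenters = [
        [sum(x[j] for x in groups[label]) / len(groups[label]) for j in range(d)]
        for label in sorted(groups)
    ]
    overall_barycenter = [sum(x[j] for x in X) / len(X) for j in range(d)]
    return barycenters, overall_barycenter


barycenters, overall = barycenters_sketch([[0, 0], [0, 2], [10, 0], [10, 2]], [1, 1, 0, 0])
print(barycenters)  # [[10.0, 1.0], [0.0, 1.0]]
print(overall)      # [5.0, 1.0]
```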
- permetrics.utils.cluster_util.compute_clusters(labels)
Compute a dictionary of clusters and a dictionary of cluster sizes.
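The exact structure of the two returned dictionaries is not documented here. One plausible layout, assumed in the hypothetical sketch below, maps each label to the indices of its samples and each label to its cluster size.

```python
def clusters_sketch(labels):
    """Hypothetical: map each label to its sample indices, and each label to its cluster size."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    sizes = {label: len(members) for label, members in clusters.items()}
    return clusters, sizes


clusters, sizes = clusters_sketch(["a", "b", "a", "c", "a"])
print(clusters)  # {'a': [0, 2, 4], 'b': [1], 'c': [3]}
print(sizes)     # {'a': 3, 'b': 1, 'c': 1}
```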
- permetrics.utils.cluster_util.compute_confusion_matrix(y_true, y_pred, normalize=False)
Computes the confusion matrix for a clustering problem, given the true labels and the predicted labels. See http://cran.nexr.com/web/packages/clusterCrit/vignettes/clusterCrit.pdf
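The clusterCrit vignette builds its external indices from counts over all pairs of points: pairs grouped together in both partitions (yy), in the first only (yn), in the second only (ny), or in neither (nn). Assuming compute_confusion_matrix follows that pair-counting scheme (this reference does not say so explicitly), a hypothetical pure-Python sketch is shown below; pair_confusion_sketch and its [[yy, yn], [ny, nn]] layout are illustrative choices.

```python
from itertools import combinations


def pair_confusion_sketch(y_true, y_pred, normalize=False):
    """Hypothetical 2x2 pair-counting matrix [[yy, yn], [ny, nn]] over all pairs of samples."""
    yy = yn = ny = nn = 0
    for i, j in combinations(range(len(y_true)), 2):
        same_true = y_true[i] == y_true[j]
        same_pred = y_pred[i] == y_pred[j]
        if same_true and same_pred:
            yy += 1  # together in both partitions
        elif same_true:
            yn += 1  # together in y_true only
        elif same_pred:
            ny += 1  # together in y_pred only
        else:
            nn += 1  # separated in both partitions
    matrix = [[yy, yn], [ny, nn]]
    if normalize:
        total = yy + yn + ny + nn  # equals n * (n - 1) / 2
        if total:
            matrix = [[v / total for v in row] for row in matrix]
    return matrix


m = pair_confusion_sketch([0, 0, 1, 1], [0, 0, 0, 1])
print(m)  # [[1, 1], [2, 2]]
```

Pair-based indices follow directly from these four counts; for instance, the Rand index is (yy + nn) / (yy + yn + ny + nn), which is 3 / 6 = 0.5 for this example.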
permetrics.utils.data_util module
- permetrics.utils.data_util.format_classification_data(y_true: numpy.ndarray, y_pred: numpy.ndarray)
- permetrics.utils.data_util.format_external_clustering_data(y_true: numpy.ndarray, y_pred: numpy.ndarray)
Both y_true and y_pred are required for formatting.
- permetrics.utils.data_util.format_regression_data_type(y_true: numpy.ndarray, y_pred: numpy.ndarray)
- permetrics.utils.data_util.get_regression_non_zero_data(y_true, y_pred, one_dim=True, rule_idx=0)
Get the non-zero data according to a rule.
- Parameters
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
one_dim (bool) – Whether y_true is 1-dimensional
rule_idx (int) – Selects the values the rule is checked against; valid values are 0 (y_true), 1 (y_pred), and 2 (both y_true and y_pred)
- Returns
y_true – y_true with the non-zero values kept according to the rule
y_pred – y_pred with the non-zero values kept according to the rule
- Return type
tuple
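Filtering of this kind is typically needed before a metric divides by the target values, such as the mean absolute percentage error, which divides by y_true. The hypothetical sketch below (non_zero_sketch, not the library's code) covers only the 1-dimensional case (one_dim=True) and assumes rule_idx selects which array's zeros cause a sample to be dropped from both arrays.

```python
def non_zero_sketch(y_true, y_pred, rule_idx=0):
    """Hypothetical 1-D filter: drop a sample from both arrays when the selected values are zero."""
    if rule_idx == 0:    # rule checked on y_true only
        keep = [t != 0 for t in y_true]
    elif rule_idx == 1:  # rule checked on y_pred only
        keep = [p != 0 for p in y_pred]
    elif rule_idx == 2:  # both y_true and y_pred must be non-zero
        keep = [t != 0 and p != 0 for t, p in zip(y_true, y_pred)]
    else:
        raise ValueError("rule_idx must be 0, 1 or 2")
    y_true_kept = [t for t, k in zip(y_true, keep) if k]
    y_pred_kept = [p for p, k in zip(y_pred, keep) if k]
    return y_true_kept, y_pred_kept


print(non_zero_sketch([0, 2, 3, 4], [1, 0, 3, 5], rule_idx=0))  # ([2, 3, 4], [0, 3, 5])
print(non_zero_sketch([0, 2, 3, 4], [1, 0, 3, 5], rule_idx=2))  # ([3, 4], [3, 5])
```

Keeping the two arrays aligned is the important design point: a sample is either dropped from both or kept in both.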
- permetrics.utils.data_util.get_regression_positive_data(y_true, y_pred, one_dim=True, rule_idx=0)
Get the positive data according to a rule.
- Parameters
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
one_dim (bool) – Whether y_true is 1-dimensional
rule_idx (int) – Selects the values the rule is checked against; valid values are 0 (y_true), 1 (y_pred), and 2 (both y_true and y_pred)
- Returns
y_true – y_true with the positive values kept according to the rule
y_pred – y_pred with the positive values kept according to the rule
- Return type
tuple
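Positive-only data is typically required by metrics that take the logarithm of the values. The hypothetical sketch below (positive_sketch, not the library's code) also shows one possible reading of one_dim=False: for 2-dimensional inputs, a row is kept only when every selected value in it is positive. That row-wise rule is an assumption, not documented behavior.

```python
def positive_sketch(y_true, y_pred, one_dim=True, rule_idx=0):
    """Hypothetical filter keeping samples whose selected values are strictly positive."""
    def ok(v):
        # 1-D: v is a scalar; 2-D (assumed): v is a row and all entries must be positive
        return v > 0 if one_dim else all(e > 0 for e in v)

    if rule_idx == 0:
        keep = [ok(t) for t in y_true]
    elif rule_idx == 1:
        keep = [ok(p) for p in y_pred]
    elif rule_idx == 2:
        keep = [ok(t) and ok(p) for t, p in zip(y_true, y_pred)]
    else:
        raise ValueError("rule_idx must be 0, 1 or 2")
    return ([t for t, k in zip(y_true, keep) if k],
            [p for p, k in zip(y_pred, keep) if k])


y_true = [[1, 2], [0, 3], [4, 5]]
y_pred = [[1, 1], [2, 2], [-1, 3]]
print(positive_sketch(y_true, y_pred, one_dim=False, rule_idx=2))  # ([[1, 2]], [[1, 1]])
```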