permetrics.utils.classifier_util module

permetrics.utils.classifier_util.calculate_class_weights(y_true, y_pred=None, y_score=None)[source]
permetrics.utils.classifier_util.calculate_confusion_matrix(y_true=None, y_pred=None, labels=None, normalize=None)[source]

Generate a confusion matrix for multi-class classification

Parameters
  • y_true (tuple, list, np.ndarray) – the ground truth (known) class labels, given as integers or strings

  • y_pred (tuple, list, np.ndarray) – the predicted class labels, given as integers or strings

  • labels (tuple, list, np.ndarray) – List of labels to index the matrix. This may be used to reorder or select a subset of labels.

  • normalize ('true', 'pred', 'all', None) – Normalizes the confusion matrix over the true (rows) or predicted (columns) conditions, or over the whole population.

Returns

  • matrix (np.ndarray) – a 2-dimensional array of pairwise counts

  • imap (dict) – a mapping from each label to its index in the confusion matrix

  • imap_count (dict) – a mapping from each label to the number of occurrences of that label in y_true
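
Example (a minimal usage sketch based on the documented signature and return values; the input labels are illustrative, and the label-to-index ordering shown in the comments is an assumption, not documented behavior):

    from permetrics.utils.classifier_util import calculate_confusion_matrix

    y_true = ["cat", "dog", "dog", "bird", "cat"]
    y_pred = ["cat", "dog", "bird", "bird", "dog"]

    # matrix: 2-D array of pairwise counts
    # imap: maps each label to its index in the matrix
    # imap_count: maps each label to its number of occurrences in y_true
    matrix, imap, imap_count = calculate_confusion_matrix(y_true=y_true, y_pred=y_pred)
    print(matrix)
    print(imap)        # e.g. {'bird': 0, 'cat': 1, 'dog': 2} (ordering is an assumption)
    print(imap_count)  # e.g. {'bird': 1, 'cat': 2, 'dog': 2}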

permetrics.utils.classifier_util.calculate_roc_curve(y_true, y_score)[source]
permetrics.utils.classifier_util.calculate_single_label_metric(matrix, imap, imap_count, beta=1.0)[source]

Generate a dictionary of supported metrics for each label

Parameters
  • matrix (np.ndarray) – a 2-dimensional array of pairwise counts

  • imap (dict) – a mapping from each label to its index in the confusion matrix

  • imap_count (dict) – a mapping from each label to the number of occurrences of that label in y_true

  • beta (float) – the beta value used to compute the F-beta score

Returns

a dictionary of supported metrics

Return type

dict_metrics (dict)
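
Example (a sketch that chains calculate_confusion_matrix with calculate_single_label_metric; the exact metric keys inside the returned dictionary are not listed here, so the example only prints the result):

    from permetrics.utils.classifier_util import calculate_confusion_matrix, calculate_single_label_metric

    y_true = [0, 1, 1, 2, 0, 2]
    y_pred = [0, 1, 2, 2, 0, 1]

    matrix, imap, imap_count = calculate_confusion_matrix(y_true=y_true, y_pred=y_pred)
    # beta=1.0 corresponds to the standard F1 score
    dict_metrics = calculate_single_label_metric(matrix, imap, imap_count, beta=1.0)
    print(dict_metrics)  # a dictionary of supported metrics, as described above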

permetrics.utils.cluster_util module

permetrics.utils.cluster_util.calculate_adjusted_rand_score(y_true=None, y_pred=None)[source]
permetrics.utils.cluster_util.calculate_ball_hall_index(X=None, y_pred=None)[source]
permetrics.utils.cluster_util.calculate_banfeld_raftery_index(X=None, y_pred=None, force_finite=True, finite_value=10000000000.0)[source]
permetrics.utils.cluster_util.calculate_beale_index(X=None, y_pred=None, force_finite=True, finite_value=10000000000.0)[source]
permetrics.utils.cluster_util.calculate_calinski_harabasz_index(X=None, y_pred=None, force_finite=True, finite_value=0.0)[source]
Parameters
  • X – The feature matrix X

  • y_pred – The predicted labels

  • force_finite – Whether to force the result to a finite number

  • finite_value – The value used to replace infinite or NaN results.

Returns

The Calinski Harabasz Index
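
Example (a sketch on a small synthetic two-cluster dataset; the data and labels are illustrative only):

    import numpy as np
    from permetrics.utils.cluster_util import calculate_calinski_harabasz_index

    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(0.0, 0.5, (20, 2)),
                   rng.normal(5.0, 0.5, (20, 2))])
    y_pred = np.array([0] * 20 + [1] * 20)

    score = calculate_calinski_harabasz_index(X=X, y_pred=y_pred)
    print(score)  # larger values indicate better-separated clusters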

permetrics.utils.cluster_util.calculate_completeness_score(y_true=None, y_pred=None)[source]
permetrics.utils.cluster_util.calculate_czekanowski_dice_score(y_true=None, y_pred=None)[source]
permetrics.utils.cluster_util.calculate_davies_bouldin_index(X=None, y_pred=None, force_finite=True, finite_value=10000000000.0)[source]
permetrics.utils.cluster_util.calculate_density_based_clustering_validation_index(X=None, y_pred=None, force_finite=True, finite_value=1.0)[source]
permetrics.utils.cluster_util.calculate_det_ratio_index(X=None, y_pred=None, force_finite=True, finite_value=-10000000000.0)[source]
permetrics.utils.cluster_util.calculate_duda_hart_index(X=None, y_pred=None, force_finite=True, finite_value=10000000000.0)[source]
permetrics.utils.cluster_util.calculate_dunn_index(X=None, y_pred=None, use_modified=True, force_finite=True, finite_value=0.0)[source]
permetrics.utils.cluster_util.calculate_entropy_score(y_true=None, y_pred=None)[source]
permetrics.utils.cluster_util.calculate_f_measure_score(y_true=None, y_pred=None)[source]
permetrics.utils.cluster_util.calculate_fowlkes_mallows_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0)[source]
permetrics.utils.cluster_util.calculate_gamma_score(y_true=None, y_pred=None)[source]

Reference: Cluster Validation for Mixed-Type Data (paper)

permetrics.utils.cluster_util.calculate_gplus_score(y_true=None, y_pred=None)[source]

Reference: Cluster Validation for Mixed-Type Data (paper)

permetrics.utils.cluster_util.calculate_hartigan_index(X=None, y_pred=None, force_finite=True, finite_value=10000000000.0)[source]
permetrics.utils.cluster_util.calculate_homogeneity_score(y_true=None, y_pred=None)[source]
permetrics.utils.cluster_util.calculate_hubert_gamma_score(y_true=None, y_pred=None, force_finite=True, finite_value=-1.0)[source]
permetrics.utils.cluster_util.calculate_jaccard_score(y_true=None, y_pred=None)[source]
permetrics.utils.cluster_util.calculate_ksq_detw_index(X=None, y_pred=None, use_normalized=True)[source]
permetrics.utils.cluster_util.calculate_kulczynski_score(y_true=None, y_pred=None)[source]
permetrics.utils.cluster_util.calculate_log_det_ratio_index(X=None, y_pred=None, force_finite=True, finite_value=-10000000000.0)[source]
permetrics.utils.cluster_util.calculate_mc_nemar_score(y_true=None, y_pred=None)[source]
permetrics.utils.cluster_util.calculate_mean_squared_error_index(X=None, y_pred=None)[source]
permetrics.utils.cluster_util.calculate_mutual_info_score(y_true=None, y_pred=None)[source]
permetrics.utils.cluster_util.calculate_normalized_mutual_info_score(y_true=None, y_pred=None, force_finite=True, finite_value=0.0)[source]
permetrics.utils.cluster_util.calculate_phi_score(y_true=None, y_pred=None, force_finite=True, finite_value=-10000000000.0)[source]
permetrics.utils.cluster_util.calculate_precision_score(y_true=None, y_pred=None)[source]
permetrics.utils.cluster_util.calculate_purity_score(y_true=None, y_pred=None)[source]
permetrics.utils.cluster_util.calculate_r_squared_index(X=None, y_pred=None)[source]
permetrics.utils.cluster_util.calculate_rand_score(y_true=None, y_pred=None)[source]
permetrics.utils.cluster_util.calculate_recall_score(y_true=None, y_pred=None)[source]
permetrics.utils.cluster_util.calculate_rogers_tanimoto_score(y_true=None, y_pred=None)[source]
permetrics.utils.cluster_util.calculate_russel_rao_score(y_true=None, y_pred=None)[source]
permetrics.utils.cluster_util.calculate_silhouette_index(X=None, y_pred=None, multi_output=False, force_finite=True, finite_value=-1.0)[source]

Calculates the silhouette score for a given clustering.

Parameters
  • X – A numpy array of shape (n_samples, n_features) representing the data points.

  • y_pred – A numpy array of shape (n_samples,) containing the cluster labels for each data point.

Returns

The silhouette score, a value between -1 and 1.
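
Example (a sketch on a toy two-cluster dataset; values close to 1 indicate well-separated clusters, values near -1 indicate misassigned points):

    import numpy as np
    from permetrics.utils.cluster_util import calculate_silhouette_index

    X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
    y_pred = np.array([0, 0, 1, 1])

    score = calculate_silhouette_index(X=X, y_pred=y_pred)
    print(score)  # a single value between -1 and 1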

permetrics.utils.cluster_util.calculate_silhouette_index_ver1(X=None, y_pred=None)[source]
permetrics.utils.cluster_util.calculate_silhouette_index_ver2(X=None, y_pred=None, multi_output=False, force_finite=True, finite_value=-1.0)[source]
permetrics.utils.cluster_util.calculate_silhouette_index_ver3(X=None, y_pred=None, multi_output=False, force_finite=True, finite_value=-1.0)[source]
permetrics.utils.cluster_util.calculate_sokal_sneath1_score(y_true=None, y_pred=None)[source]
permetrics.utils.cluster_util.calculate_sokal_sneath2_score(y_true=None, y_pred=None)[source]
permetrics.utils.cluster_util.calculate_sum_squared_error_index(X=None, y_pred=None)[source]
permetrics.utils.cluster_util.calculate_tau_score(y_true=None, y_pred=None)[source]

Reference: Cluster Validation for Mixed-Type Data (paper)

permetrics.utils.cluster_util.calculate_v_measure_score(y_true=None, y_pred=None)[source]
permetrics.utils.cluster_util.calculate_xie_beni_index(X=None, y_pred=None, force_finite=True, finite_value=10000000000.0)[source]
permetrics.utils.cluster_util.compute_BGSS(X, labels)[source]

The between-group dispersion (BGSS), also known as the between-cluster variance

permetrics.utils.cluster_util.compute_TSS(X)[source]
permetrics.utils.cluster_util.compute_WG(X)[source]
permetrics.utils.cluster_util.compute_WGSS(X, labels)[source]

Calculate the pooled within-cluster sum of squares (WGSS), also known as the within-cluster variance

permetrics.utils.cluster_util.compute_barycenters(X, labels)[source]

Get the barycenter of each cluster and the barycenter of all observations

Parameters
  • X (np.ndarray) – The feature matrix of the dataset

  • labels (np.ndarray) – The predicted labels

Returns

  • barycenters (np.ndarray) – the barycenter of each cluster, stacked as a matrix

  • overall_barycenter (np.ndarray) – the barycenter of all observations
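
Example (a sketch based on the documented return values; the expected outputs in the comments assume the barycenter of a cluster is the mean of its members):

    import numpy as np
    from permetrics.utils.cluster_util import compute_barycenters

    X = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [4.0, 6.0]])
    labels = np.array([0, 0, 1, 1])

    barycenters, overall_barycenter = compute_barycenters(X, labels)
    print(barycenters)         # per-cluster means, e.g. [[0., 1.], [4., 5.]]
    print(overall_barycenter)  # mean of all observations, e.g. [2., 3.]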

permetrics.utils.cluster_util.compute_clusters(labels)[source]

Get a dict of clusters and a dict of cluster sizes

permetrics.utils.cluster_util.compute_conditional_entropy(y_true, y_pred)[source]
permetrics.utils.cluster_util.compute_confusion_matrix(y_true, y_pred, normalize=False)[source]

Computes the confusion matrix for a clustering problem given the true labels and the predicted labels. http://cran.nexr.com/web/packages/clusterCrit/vignettes/clusterCrit.pdf

permetrics.utils.cluster_util.compute_contingency_matrix(y_true, y_pred)[source]
permetrics.utils.cluster_util.compute_entropy(labels)[source]
permetrics.utils.cluster_util.compute_nd_splus_sminus_t(y_true=None, y_pred=None)[source]

Compute concordant and discordant pair statistics

permetrics.utils.data_util module

permetrics.utils.data_util.format_classification_data(y_true: numpy.ndarray, y_pred: numpy.ndarray)[source]
permetrics.utils.data_util.format_external_clustering_data(y_true: numpy.ndarray, y_pred: numpy.ndarray)[source]

Both y_true and y_pred are required for formatting

permetrics.utils.data_util.format_internal_clustering_data(y_pred: numpy.ndarray)[source]
permetrics.utils.data_util.format_regression_data_type(y_true: numpy.ndarray, y_pred: numpy.ndarray)[source]
permetrics.utils.data_util.format_y_score(y_true: numpy.ndarray, y_score: numpy.ndarray)[source]
permetrics.utils.data_util.get_regression_non_zero_data(y_true, y_pred, one_dim=True, rule_idx=0)[source]

Get non-zero data according to the selected rule

Parameters
  • y_true (tuple, list, np.ndarray) – The ground truth values

  • y_pred (tuple, list, np.ndarray) – The prediction values

  • one_dim (bool) – whether y_true is one-dimensional

  • rule_idx (int) – valid values are [0, 1, 2], corresponding to filtering based on [y_true, y_pred, both y_true and y_pred]

Returns

  • y_true – y_true filtered to non-zero entries according to the selected rule

  • y_pred – y_pred filtered according to the same rule
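
Example (a sketch assuming rule_idx=0 filters on y_true and that both arrays are filtered at the same positions, consistent with the documented return values):

    import numpy as np
    from permetrics.utils.data_util import get_regression_non_zero_data

    y_true = np.array([3.0, 0.0, -2.0, 5.0])
    y_pred = np.array([2.5, 0.1, -1.5, 4.0])

    y_true_nz, y_pred_nz = get_regression_non_zero_data(y_true, y_pred, one_dim=True, rule_idx=0)
    print(y_true_nz, y_pred_nz)  # the pair at index 1 (y_true == 0) is expected to be dropped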

permetrics.utils.data_util.get_regression_positive_data(y_true, y_pred, one_dim=True, rule_idx=0)[source]

Get positive data according to the selected rule

Parameters
  • y_true (tuple, list, np.ndarray) – The ground truth values

  • y_pred (tuple, list, np.ndarray) – The prediction values

  • one_dim (bool) – whether y_true is one-dimensional

  • rule_idx (int) – valid values are [0, 1, 2], corresponding to filtering based on [y_true, y_pred, both y_true and y_pred]

Returns

  • y_true – y_true filtered to positive entries according to the selected rule

  • y_pred – y_pred filtered according to the same rule
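
Example (a sketch assuming rule_idx=2 keeps only positions where both y_true and y_pred are positive):

    import numpy as np
    from permetrics.utils.data_util import get_regression_positive_data

    y_true = np.array([3.0, -1.0, 2.0, 5.0])
    y_pred = np.array([2.5, 0.5, -1.5, 4.0])

    y_true_pos, y_pred_pos = get_regression_positive_data(y_true, y_pred, one_dim=True, rule_idx=2)
    print(y_true_pos, y_pred_pos)  # only indices 0 and 3 are expected to remain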

permetrics.utils.data_util.is_unique_labels_consecutive_and_start_zero(vector)[source]

permetrics.utils.encoder module

class permetrics.utils.encoder.LabelEncoder[source]

Bases: object

fit(y)[source]
fit_transform(y)[source]
inverse_transform(y)[source]
transform(y)[source]
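
Example (a sketch assuming scikit-learn-style semantics for the documented methods; the encoded values shown in the comments are an assumption):

    from permetrics.utils.encoder import LabelEncoder

    le = LabelEncoder()
    encoded = le.fit_transform(["cat", "dog", "dog", "bird"])
    print(encoded)                        # e.g. [1, 2, 2, 0]
    print(le.inverse_transform(encoded))  # back to the original string labels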

permetrics.utils.regressor_util module

permetrics.utils.regressor_util.calculate_absolute_pcc(y_true, y_pred)[source]
permetrics.utils.regressor_util.calculate_ec(y_true, y_pred)[source]
permetrics.utils.regressor_util.calculate_entropy(y_true, y_pred)[source]
permetrics.utils.regressor_util.calculate_mse(y_true, y_pred)[source]
permetrics.utils.regressor_util.calculate_nse(y_true, y_pred)[source]
permetrics.utils.regressor_util.calculate_pcc(y_true, y_pred)[source]
permetrics.utils.regressor_util.calculate_wi(y_true, y_pred)[source]