permetrics.utils.classifier_util module

permetrics.utils.classifier_util.calculate_class_weights(y_true, y_pred=None, y_score=None)[source]
permetrics.utils.classifier_util.calculate_confusion_matrix(y_true=None, y_pred=None, labels=None, normalize=None)[source]

Generate a confusion matrix for multi-class classification

Parameters
  • y_true (tuple, list, np.ndarray) – a list of integers or strings for known classes

  • y_pred (tuple, list, np.ndarray) – a list of integers or strings for predicted classes

  • labels (tuple, list, np.ndarray) – List of labels to index the matrix. This may be used to reorder or select a subset of labels.

  • normalize ('true', 'pred', 'all', None) – Normalizes confusion matrix over the true (rows), predicted (columns) conditions or all the population.

Returns
  • matrix (np.ndarray) – a 2-dimensional array of pairwise counts

  • imap (dict) – a map between each label and its index in the confusion matrix

  • imap_count (dict) – a map between each label and the number of times it occurs in y_true

Return type

(np.ndarray, dict, dict)
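To make the three return values concrete, here is a minimal pure-Python sketch of how such a matrix and its two maps could be built. It is an illustration, not the library's source: the function name `confusion_matrix_sketch`, the nested-list output (the library documents an np.ndarray), and the sorted default label order are all assumptions.

```python
def confusion_matrix_sketch(y_true, y_pred, labels=None, normalize=None):
    """Illustrative only; not the permetrics implementation."""
    if labels is None:
        # Assumption: default order is the sorted union of observed labels.
        labels = sorted(set(y_true) | set(y_pred))
    imap = {label: idx for idx, label in enumerate(labels)}
    imap_count = {label: 0 for label in labels}
    n = len(labels)
    matrix = [[0] * n for _ in range(n)]

    for t, p in zip(y_true, y_pred):
        if t in imap:
            imap_count[t] += 1          # occurrences of each true label
        if t in imap and p in imap:
            matrix[imap[t]][imap[p]] += 1  # rows = true, columns = predicted

    if normalize == "true":             # each row sums to 1
        for i, row in enumerate(matrix):
            s = sum(row)
            matrix[i] = [v / s if s else 0.0 for v in row]
    elif normalize == "pred":           # each column sums to 1
        col = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
        matrix = [[matrix[i][j] / col[j] if col[j] else 0.0 for j in range(n)]
                  for i in range(n)]
    elif normalize == "all":            # whole matrix sums to 1
        total = sum(sum(row) for row in matrix)
        matrix = [[v / total if total else 0.0 for v in row] for row in matrix]
    return matrix, imap, imap_count


y_true = ["cat", "dog", "cat", "bird"]
y_pred = ["cat", "cat", "cat", "bird"]
matrix, imap, imap_count = confusion_matrix_sketch(y_true, y_pred)
# matrix     -> [[1, 0, 0], [0, 2, 0], [0, 1, 0]]
# imap       -> {'bird': 0, 'cat': 1, 'dog': 2}
# imap_count -> {'bird': 1, 'cat': 2, 'dog': 1}
```

Passing `labels` explicitly fixes the row and column order, which is what allows reordering or selecting a subset of labels.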

permetrics.utils.classifier_util.calculate_roc_curve(y_true, y_score)[source]
permetrics.utils.classifier_util.calculate_single_label_metric(matrix, imap, imap_count, beta=1.0)[source]

Generate a dictionary of supported metrics for each label

Parameters
  • matrix (np.ndarray) – a 2-dimensional list of pairwise counts

  • imap (dict) – a map between label and index of confusion matrix

  • imap_count (dict) – a map between label and number of true label in y_true

  • beta (float) – the beta value used to calculate the F-beta score

Returns

a dictionary of supported metrics

Return type

dict_metrics (dict)
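As a rough illustration of what per-label metrics derived from a confusion matrix look like, the sketch below computes precision, recall and F-beta for each label using the standard definitions. The function name and the exact set of keys in the result are hypothetical; the dictionary the library returns may contain different or additional metrics.

```python
def single_label_metrics_sketch(matrix, imap, imap_count, beta=1.0):
    """Illustrative only; keys in the result are assumptions."""
    total = sum(sum(row) for row in matrix)
    metrics = {}
    for label, idx in imap.items():
        tp = matrix[idx][idx]
        fn = sum(matrix[idx]) - tp                 # rest of the true row
        fp = sum(row[idx] for row in matrix) - tp  # rest of the predicted column
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = beta ** 2 * precision + recall
        fbeta = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
        metrics[label] = {
            "tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "n_true": imap_count[label],
            "precision": precision, "recall": recall, "fbeta": fbeta,
        }
    return metrics


matrix = [[1, 0, 0], [0, 2, 0], [0, 1, 0]]   # rows = true, columns = predicted
imap = {"bird": 0, "cat": 1, "dog": 2}
imap_count = {"bird": 1, "cat": 2, "dog": 1}
per_label = single_label_metrics_sketch(matrix, imap, imap_count, beta=1.0)
# per_label["cat"] has precision 2/3, recall 1.0 and an F1 of about 0.8
```

With `beta=1.0` the F-beta score reduces to the familiar F1 score; larger values of `beta` weight recall more heavily than precision.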

permetrics.utils.cluster_util module

permetrics.utils.cluster_util.compute_BGSS(X, labels)[source]

Compute the between-group dispersion (BGSS)

permetrics.utils.cluster_util.compute_TSS(X)[source]
permetrics.utils.cluster_util.compute_WG(X)[source]
permetrics.utils.cluster_util.compute_WGSS(X, labels)[source]

Calculate the pooled within-cluster sum of squares WGSS
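The three dispersion quantities are related by the usual decomposition TSS = WGSS + BGSS. The sketch below checks this on a toy dataset using plain Python lists and the standard textbook definitions; the helper names are hypothetical and the library functions may differ in details such as input types.

```python
def _mean(points):
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dispersion_sketch(X, labels):
    """Return (TSS, WGSS, BGSS) from the textbook definitions (illustrative)."""
    overall = _mean(X)
    clusters = {}
    for point, label in zip(X, labels):
        clusters.setdefault(label, []).append(point)

    tss = sum(_sq_dist(p, overall) for p in X)
    wgss = 0.0
    bgss = 0.0
    for points in clusters.values():
        center = _mean(points)
        wgss += sum(_sq_dist(p, center) for p in points)   # within-cluster scatter
        bgss += len(points) * _sq_dist(center, overall)    # size-weighted center scatter
    return tss, wgss, bgss


X = [[0, 0], [0, 2], [4, 0], [4, 2]]
labels = [0, 0, 1, 1]
tss, wgss, bgss = dispersion_sketch(X, labels)
# tss == 20.0, wgss == 4.0, bgss == 16.0, so tss == wgss + bgss
```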

permetrics.utils.cluster_util.compute_barycenters(X, labels)[source]

Get the barycenter for each cluster and barycenter for all observations

Parameters
  • X (np.ndarray) – The feature matrix of the dataset

  • labels (np.ndarray) – The predicted labels

Returns
  • barycenters (np.ndarray) – the barycenter of each cluster, in matrix form

  • overall_barycenter (np.ndarray) – the barycenter of all observations

Return type

(np.ndarray, np.ndarray)
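A barycenter here is simply the per-feature mean of a group of observations. The following sketch, written with plain lists rather than np.ndarray, shows the two values described above; the function name and the choice to order rows by sorted cluster label are assumptions made for illustration.

```python
def barycenters_sketch(X, labels):
    """Illustrative only; returns nested lists instead of np.ndarray."""
    n_features = len(X[0])
    groups = {}
    for point, label in zip(X, labels):
        groups.setdefault(label, []).append(point)

    def mean(points):
        return [sum(p[d] for p in points) / len(points) for d in range(n_features)]

    # Assumption: one row per cluster, ordered by sorted cluster label.
    barycenters = [mean(groups[label]) for label in sorted(groups)]
    overall_barycenter = mean(X)
    return barycenters, overall_barycenter


X = [[1.0, 1.0], [3.0, 1.0], [10.0, 4.0]]
labels = ["a", "a", "b"]
barycenters, overall_barycenter = barycenters_sketch(X, labels)
# barycenters -> [[2.0, 1.0], [10.0, 4.0]]
```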

permetrics.utils.cluster_util.compute_clusters(labels)[source]

Get a dict of clusters and a dict of cluster sizes

permetrics.utils.cluster_util.compute_conditional_entropy(y_true, y_pred)[source]
permetrics.utils.cluster_util.compute_confusion_matrix(y_true, y_pred, normalize=False)[source]

Computes the confusion matrix for a clustering problem given the true labels and the predicted labels. Reference: http://cran.nexr.com/web/packages/clusterCrit/vignettes/clusterCrit.pdf
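In the pair-counting approach described in the clusterCrit vignette, a clustering "confusion matrix" counts pairs of points rather than individual points: each pair is either grouped together or kept apart in each of the two partitions. The sketch below follows that definition. Treat it as an assumption about the general technique, not as a description of the exact values or layout this function returns.

```python
from itertools import combinations


def pair_confusion_sketch(y_true, y_pred, normalize=False):
    """Count point pairs as (yy, yn, ny, nn); illustrative only."""
    yy = yn = ny = nn = 0
    for i, j in combinations(range(len(y_true)), 2):
        same_true = y_true[i] == y_true[j]
        same_pred = y_pred[i] == y_pred[j]
        if same_true and same_pred:
            yy += 1      # together in both partitions
        elif same_true:
            yn += 1      # together in truth, split in prediction
        elif same_pred:
            ny += 1      # split in truth, together in prediction
        else:
            nn += 1      # split in both partitions
    counts = [yy, yn, ny, nn]
    if normalize:
        total = sum(counts)
        counts = [c / total for c in counts] if total else counts
    return counts


counts = pair_confusion_sketch([0, 0, 1, 1], [0, 0, 1, 2])
# counts -> [1, 1, 0, 4]
```

The four counts always sum to n(n-1)/2, the number of distinct pairs among n points.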

permetrics.utils.cluster_util.compute_contingency_matrix(y_true, y_pred)[source]
permetrics.utils.cluster_util.compute_entropy(labels)[source]
permetrics.utils.cluster_util.compute_homogeneity(y_true, y_pred)[source]
permetrics.utils.cluster_util.get_centroids(X, labels)[source]

Calculates the centroids from the data given, for each class.

Parameters
  • X (pd.DataFrame, np.ndarray) – The original data that was clustered

  • labels (list, np.ndarray) – The predicted cluster assignment values

Returns

The centroids given the input data and labels

Return type

centroids (np.ndarray)

permetrics.utils.cluster_util.get_min_dist(X, centers)[source]

Get the minimum distance from the samples X to the centers
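One plausible reading of this helper is sketched below: for every sample, take the smallest distance to any of the centers. Both the Euclidean metric and the per-sample output shape are assumptions, since the description above specifies neither.

```python
import math


def min_dist_sketch(X, centers):
    """Smallest Euclidean distance from each sample to any center (illustrative)."""
    return [min(math.dist(x, c) for c in centers) for x in X]


X = [[0.0, 0.0], [3.0, 4.0]]
centers = [[0.0, 0.0], [3.0, 0.0]]
distances = min_dist_sketch(X, centers)
# distances -> [0.0, 4.0]
```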

permetrics.utils.data_util module

permetrics.utils.data_util.format_classification_data(y_true: numpy.ndarray, y_pred: numpy.ndarray)[source]
permetrics.utils.data_util.format_external_clustering_data(y_true: numpy.ndarray, y_pred: numpy.ndarray)[source]

Requires both y_true and y_pred to format the data

permetrics.utils.data_util.format_internal_clustering_data(labels: numpy.ndarray)[source]
permetrics.utils.data_util.format_regression_data(y_true: numpy.ndarray, y_pred: numpy.ndarray)[source]
permetrics.utils.data_util.format_regression_data_type(y_true, y_pred)[source]
permetrics.utils.data_util.format_y_score(y_true: numpy.ndarray, y_score: numpy.ndarray)[source]
permetrics.utils.data_util.get_regression_non_zero_data(y_true, y_pred, one_dim=True, rule_idx=0)[source]

Get non-zero data based on rule

Parameters
  • y_true (tuple, list, np.ndarray) – The ground truth values

  • y_pred (tuple, list, np.ndarray) – The prediction values

  • one_dim (bool) – whether y_true is one-dimensional

  • rule_idx (int) – valid values [0, 1, 2] corresponding to [y_true, y_pred, both true and pred]

Returns
  • y_true – y_true with non-zero values based on rule

  • y_pred – y_pred with non-zero values based on rule

Return type

(y_true, y_pred)
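The role of `rule_idx` is easiest to see on a small one-dimensional example. The sketch below keeps only the (y_true, y_pred) pairs that satisfy the chosen rule, so both outputs stay aligned. It is a simplified stand-in with a hypothetical name that handles only the one-dimensional case; how the library treats multi-dimensional input is not covered here.

```python
def non_zero_pairs_sketch(y_true, y_pred, rule_idx=0):
    """Filter aligned pairs by a non-zero rule (illustrative, 1-D only)."""
    if rule_idx == 0:
        keep = lambda t, p: t != 0              # rule on y_true only
    elif rule_idx == 1:
        keep = lambda t, p: p != 0              # rule on y_pred only
    elif rule_idx == 2:
        keep = lambda t, p: t != 0 and p != 0   # rule on both
    else:
        raise ValueError("rule_idx must be 0, 1 or 2")
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if keep(t, p)]
    return [t for t, _ in pairs], [p for _, p in pairs]


y_true = [0, 2, 3, 4]
y_pred = [1, 0, 3, 5]
print(non_zero_pairs_sketch(y_true, y_pred, rule_idx=0))  # ([2, 3, 4], [0, 3, 5])
print(non_zero_pairs_sketch(y_true, y_pred, rule_idx=1))  # ([0, 3, 4], [1, 3, 5])
print(non_zero_pairs_sketch(y_true, y_pred, rule_idx=2))  # ([3, 4], [3, 5])
```

Replacing the `!= 0` tests with `> 0` gives the analogous positive-value filter.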

permetrics.utils.data_util.get_regression_positive_data(y_true, y_pred, one_dim=True, rule_idx=0)[source]

Get positive data based on rule

Parameters
  • y_true (tuple, list, np.ndarray) – The ground truth values

  • y_pred (tuple, list, np.ndarray) – The prediction values

  • one_dim (bool) – whether y_true is one-dimensional

  • rule_idx (int) – valid values [0, 1, 2] corresponding to [y_true, y_pred, both true and pred]

Returns
  • y_true – y_true with positive values based on rule

  • y_pred – y_pred with positive values based on rule

Return type

(y_true, y_pred)

permetrics.utils.data_util.is_consecutive_and_start_zero(vector)[source]

permetrics.utils.encoder module

class permetrics.utils.encoder.LabelEncoder[source]

Bases: object

fit(y)[source]
fit_transform(y)[source]
inverse_transform(y)[source]
transform(y)[source]

permetrics.utils.regressor_util module

permetrics.utils.regressor_util.calculate_absolute_pcc(y_true, y_pred, one_dim)[source]
permetrics.utils.regressor_util.calculate_ec(y_true, y_pred, one_dim)[source]
permetrics.utils.regressor_util.calculate_entropy(y_true, y_pred, one_dim)[source]
permetrics.utils.regressor_util.calculate_mse(y_true, y_pred, one_dim)[source]
permetrics.utils.regressor_util.calculate_nse(y_true, y_pred, one_dim)[source]
permetrics.utils.regressor_util.calculate_pcc(y_true, y_pred, one_dim)[source]
permetrics.utils.regressor_util.calculate_wi(y_true, y_pred, one_dim)[source]