permetrics.utils.classifier_util module
- permetrics.utils.classifier_util.calculate_class_weights(y_true, y_pred=None, y_score=None)[source]
- permetrics.utils.classifier_util.calculate_confusion_matrix(y_true=None, y_pred=None, labels=None, normalize=None)[source]
Generate a confusion matrix for multi-class classification
- Parameters
y_true (tuple, list, np.ndarray) – a list of integers or strings for known classes
y_pred (tuple, list, np.ndarray) – a list of integers or strings for predicted classes
labels (tuple, list, np.ndarray) – List of labels to index the matrix. This may be used to reorder or select a subset of labels.
normalize ('true', 'pred', 'all', None) – Normalizes confusion matrix over the true (rows), predicted (columns) conditions or all the population.
- Returns
matrix (np.ndarray): a 2-dimensional array of pairwise counts
imap (dict): a map between each label and its index in the confusion matrix
imap_count (dict): a map between each label and the number of times it occurs in y_true
- Return type
tuple (np.ndarray, dict, dict)
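The relationship between the matrix, imap, and imap_count can be sketched in plain Python. This is a minimal illustration and not the permetrics implementation: the name confusion_matrix_sketch is hypothetical, the default label order (sorted) is an assumption, and nested lists stand in for np.ndarray.

```python
def confusion_matrix_sketch(y_true, y_pred, labels=None, normalize=None):
    """Illustrative only; assumes labels default to the sorted union of classes."""
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    imap = {label: idx for idx, label in enumerate(labels)}  # label -> row/column index
    imap_count = {label: 0 for label in labels}              # label -> occurrences in y_true
    n = len(labels)
    matrix = [[0.0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        if t in imap:
            imap_count[t] += 1
        if t in imap and p in imap:
            matrix[imap[t]][imap[p]] += 1  # rows are true classes, columns are predictions
    if normalize == "true":   # each row sums to 1
        matrix = [[v / sum(row) if sum(row) else 0.0 for v in row] for row in matrix]
    elif normalize == "pred":  # each column sums to 1
        cols = [sum(row[j] for row in matrix) for j in range(n)]
        matrix = [[row[j] / cols[j] if cols[j] else 0.0 for j in range(n)] for row in matrix]
    elif normalize == "all":   # the whole matrix sums to 1
        total = sum(sum(row) for row in matrix)
        matrix = [[v / total if total else 0.0 for v in row] for row in matrix]
    return matrix, imap, imap_count


y_true = ["cat", "dog", "cat", "bird"]
y_pred = ["cat", "cat", "cat", "bird"]
matrix, imap, imap_count = confusion_matrix_sketch(y_true, y_pred)
```

Passing labels explicitly, for example labels=["dog", "cat"], reorders the rows and columns and drops any class not listed.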
- permetrics.utils.classifier_util.calculate_single_label_metric(matrix, imap, imap_count, beta=1.0)[source]
Generate a dictionary of supported metrics for each label
- Parameters
matrix (np.ndarray) – a 2-dimensional array of pairwise counts
imap (dict) – a map between each label and its index in the confusion matrix
imap_count (dict) – a map between each label and the number of times it occurs in y_true
beta (float) – the beta weight used to calculate the F-beta score
- Returns
a dictionary of supported metrics
- Return type
dict_metrics (dict)
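As a rough illustration of how per-label metrics follow from a confusion matrix, the sketch below derives precision, recall, and F-beta for each label. The function name, the dictionary keys, and the choice of metrics are assumptions made for this example; the set of metrics that permetrics actually returns is not listed here.

```python
def single_label_metrics_sketch(matrix, imap, beta=1.0):
    """Illustrative only; rows of `matrix` are true classes, columns are predictions."""
    n = len(matrix)
    metrics = {}
    for label, idx in imap.items():
        tp = matrix[idx][idx]                             # correctly predicted as `label`
        fn = sum(matrix[idx]) - tp                        # true `label`, predicted otherwise
        fp = sum(matrix[r][idx] for r in range(n)) - tp   # predicted `label`, true otherwise
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = beta ** 2 * precision + recall
        fbeta = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "fbeta": fbeta}
    return metrics


example_matrix = [[1, 0, 0], [0, 2, 0], [0, 1, 0]]
example_imap = {"bird": 0, "cat": 1, "dog": 2}
per_label = single_label_metrics_sketch(example_matrix, example_imap, beta=1.0)
```

With beta=1.0 the F-beta score reduces to the usual F1 score; beta greater than 1 weights recall more heavily, and beta less than 1 weights precision more heavily.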
permetrics.utils.cluster_util module
- permetrics.utils.cluster_util.compute_WGSS(X, labels)[source]
Calculate the pooled within-cluster sum of squares (WGSS)
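The pooled within-cluster sum of squares adds up, over all clusters, the squared Euclidean distances between each point and the barycenter of its cluster. A minimal sketch in plain Python, using the hypothetical name wgss_sketch and nested lists in place of np.ndarray:

```python
def wgss_sketch(X, labels):
    """Illustrative only; not the permetrics implementation."""
    clusters = {}
    for point, label in zip(X, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for points in clusters.values():
        dim = len(points[0])
        # Barycenter (coordinate-wise mean) of this cluster.
        center = [sum(p[d] for p in points) / len(points) for d in range(dim)]
        # Squared distances from each member to the barycenter.
        total += sum((p[d] - center[d]) ** 2 for p in points for d in range(dim))
    return total


X_demo = [[0, 0], [2, 0], [10, 10], [10, 14]]
labels_demo = [0, 0, 1, 1]
wgss_value = wgss_sketch(X_demo, labels_demo)
```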
- permetrics.utils.cluster_util.compute_barycenters(X, labels)[source]
Get the barycenter of each cluster and the barycenter of all observations
- Parameters
X (np.ndarray) – The feature matrix of the dataset
labels (np.ndarray) – The predicted labels
- Returns
barycenters (np.ndarray): the barycenter of each cluster, in matrix form
overall_barycenter (np.ndarray): the barycenter of all observations
- Return type
tuple (np.ndarray, np.ndarray)
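A barycenter is the coordinate-wise mean of a set of points. The sketch below computes one per cluster plus one for the whole dataset. The name barycenters_sketch is hypothetical, and ordering the rows by sorted label is an assumption of this example rather than documented behavior.

```python
def barycenters_sketch(X, labels):
    """Illustrative only; returns (per-cluster barycenters, overall barycenter)."""
    dim = len(X[0])

    def mean(points):
        return [sum(p[d] for p in points) / len(points) for d in range(dim)]

    groups = {}
    for point, label in zip(X, labels):
        groups.setdefault(label, []).append(point)
    # One row per cluster, ordered by sorted label (an assumption of this sketch).
    barycenters = [mean(groups[label]) for label in sorted(groups)]
    overall_barycenter = mean(X)
    return barycenters, overall_barycenter


points_demo = [[0, 0], [2, 0], [10, 10], [10, 14]]
assignments_demo = [0, 0, 1, 1]
centers, overall = barycenters_sketch(points_demo, assignments_demo)
```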
- permetrics.utils.cluster_util.compute_clusters(labels)[source]
Get the dict of clusters and the dict of cluster sizes
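One plausible shape for these two dictionaries is sketched below: the first maps each cluster label to the indices of its members, the second maps each label to its member count. Both the name clusters_sketch and the choice of storing indices are assumptions; the exact values stored by permetrics are not documented here.

```python
def clusters_sketch(labels):
    """Illustrative only; groups sample indices by cluster label."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    sizes = {label: len(members) for label, members in clusters.items()}
    return clusters, sizes


cluster_members, cluster_sizes = clusters_sketch([0, 1, 0, 2, 1, 0])
```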
- permetrics.utils.cluster_util.compute_confusion_matrix(y_true, y_pred, normalize=False)[source]
Computes the confusion matrix for a clustering problem given the true labels and the predicted labels. Reference: http://cran.nexr.com/web/packages/clusterCrit/vignettes/clusterCrit.pdf
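In the clusterCrit vignette, the confusion matrix for comparing two partitions is built from pair counts: every pair of points is classified by whether the two points share a cluster in each partition. Assuming that is the definition intended here, a minimal sketch follows. The name pair_confusion_sketch and the return order (yy, yn, ny, nn) are assumptions, and the normalize option is not modeled.

```python
from itertools import combinations


def pair_confusion_sketch(y_true, y_pred):
    """Illustrative only; counts point pairs by co-membership in both partitions."""
    yy = yn = ny = nn = 0
    for i, j in combinations(range(len(y_true)), 2):
        same_true = y_true[i] == y_true[j]
        same_pred = y_pred[i] == y_pred[j]
        if same_true and same_pred:
            yy += 1  # together in both partitions
        elif same_true:
            yn += 1  # together in y_true, apart in y_pred
        elif same_pred:
            ny += 1  # apart in y_true, together in y_pred
        else:
            nn += 1  # apart in both partitions
    return yy, yn, ny, nn


pair_counts = pair_confusion_sketch([0, 0, 1, 1], [0, 0, 0, 1])
```

The four counts always sum to n(n-1)/2, the number of distinct pairs among n points.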
- permetrics.utils.cluster_util.get_centroids(X, labels)[source]
Calculates the centroid of each cluster from the given data.
- Parameters
X (pd.DataFrame, np.ndarray) – The original data that was clustered
labels (list, np.ndarray) – The predicted cluster assignment values
- Returns
The centroids given the input data and labels
- Return type
centroids (np.ndarray)
permetrics.utils.data_util module
- permetrics.utils.data_util.format_classification_data(y_true: numpy.ndarray, y_pred: numpy.ndarray)[source]
- permetrics.utils.data_util.format_external_clustering_data(y_true: numpy.ndarray, y_pred: numpy.ndarray)[source]
Both y_true and y_pred are required for formatting
- permetrics.utils.data_util.format_regression_data(y_true: numpy.ndarray, y_pred: numpy.ndarray)[source]
- permetrics.utils.data_util.get_regression_non_zero_data(y_true, y_pred, one_dim=True, rule_idx=0)[source]
Get non-zero data based on rule
- Parameters
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
one_dim (bool) – whether y_true has one dimension
rule_idx (int) – one of 0, 1, or 2, selecting whether the rule is applied to y_true, y_pred, or both y_true and y_pred
- Returns
y_true: y_true with non-zero values based on rule
y_pred: y_pred with non-zero values based on rule
- Return type
tuple (y_true, y_pred)
- permetrics.utils.data_util.get_regression_positive_data(y_true, y_pred, one_dim=True, rule_idx=0)[source]
Get positive data based on rule
- Parameters
y_true (tuple, list, np.ndarray) – The ground truth values
y_pred (tuple, list, np.ndarray) – The prediction values
one_dim (bool) – whether y_true has one dimension
rule_idx (int) – one of 0, 1, or 2, selecting whether the rule is applied to y_true, y_pred, or both y_true and y_pred
- Returns
y_true: y_true with positive values based on rule
y_pred: y_pred with positive values based on rule
- Return type
tuple (y_true, y_pred)
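The rule_idx argument shared by the two filtering helpers can be illustrated with a small sketch for one-dimensional inputs. The helper filter_pairs_sketch and its keep predicate are hypothetical and only model the selection logic described above. The sketch also assumes that y_true and y_pred entries are dropped together so the two sequences stay aligned.

```python
def filter_pairs_sketch(y_true, y_pred, keep, rule_idx=0):
    """Illustrative only; keeps the (y_true, y_pred) pairs that satisfy the rule."""
    if rule_idx == 0:    # test y_true only
        mask = [keep(t) for t in y_true]
    elif rule_idx == 1:  # test y_pred only
        mask = [keep(p) for p in y_pred]
    elif rule_idx == 2:  # test both y_true and y_pred
        mask = [keep(t) and keep(p) for t, p in zip(y_true, y_pred)]
    else:
        raise ValueError("rule_idx must be 0, 1, or 2")
    kept_true = [t for t, m in zip(y_true, mask) if m]
    kept_pred = [p for p, m in zip(y_pred, mask) if m]
    return kept_true, kept_pred


def is_positive(value):
    return value > 0


def is_non_zero(value):
    return value != 0


truth = [3, 0, -1, 5]
preds = [2, 1, 4, 0]
positive_both = filter_pairs_sketch(truth, preds, is_positive, rule_idx=2)
non_zero_true = filter_pairs_sketch(truth, preds, is_non_zero, rule_idx=0)
```

Filtering of this kind is typically needed before computing metrics that divide by, or take the logarithm of, the target values.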