HGL - Hinge Loss

The Hinge Loss (HGL) is a maximum-margin optimization metric primarily used for training classifiers such as Support Vector Machines (SVMs) [29].

Unlike probability-based losses (like Log Loss or Brier Score), Hinge Loss penalizes predictions not only when they are incorrect, but also when they are correct but not confident enough. It enforces a strict mathematical boundary: a correct prediction must maintain a safety margin of at least 1.0 distance units from the decision hyper-plane to avoid incurring a penalty.

\[L_{\text{hinge}}(y, \hat{w}) = \max\left(0, \, 1 - y \cdot \hat{w}\right)\]

Architectural Design: Crammer-Singer Multiclass Generalization 

While classical Hinge Loss is restricted to binary labels (\(y \in \{-1, +1\}\)), permetrics implements the generalized Crammer-Singer Multiclass Hinge Loss formulated over discrete integer targets \(y \in \{0, 1, \dots, K-1\}\):

\[L_{\text{CS}}(y_i, \hat{s}_i) = \max\left(0, \, \max_{k \neq y_i} (\hat{s}_{ik}) - \hat{s}_{i, y_i} + 1\right)\]

Where:

\(\hat{s}_{i, y_i}\) is the raw uncalibrated decision score predicted for the true ground truth class.
\(\max_{k \neq y_i} (\hat{s}_{ik})\) is the highest score assigned to any incorrect class.

Intuitively, the model incurs zero loss if and only if the score of the true class exceeds the score of the next closest competing class by a margin of at least 1.0.

Critical Developer Warning: Raw Logits vs. Probabilities 

Do NOT pass standardized probabilities (e.g., outputs from .predict_proba()) to HGL. Hinge Loss assumes an unbounded decision space \((-\infty, +\infty)\). If normalized probabilities bounded in \([0.0, 1.0]\) are supplied, the safety margin condition \((\text{Score}_{\text{true}} - \text{Score}_{\text{false}} \ge 1.0)\) becomes mathematically impossible to satisfy reliably. You must supply uncalibrated linear scores or raw decision function outputs (e.g., .decision_function() in scikit-learn or linear layer outputs before Softmax in PyTorch).

Properties 

Best possible score: 0.0 (Lower value is better; true class score safely dominates all incorrect classes by a margin \(\ge 1.0\)).
Worst possible score: Unbounded (\(+\infty\)).
Range: \([0.0, +\infty)\)
Optimizer Note: HGL is a Loss metric. Ensure automated hyperparameter search engines are explicitly configured to minimize.
References: Scikit-Learn hinge_loss

Example Usage 

from permetrics.classification import ClassificationMetric

# ==============================================================================
# SCENARIO 1: Binary SVM Decision Scores
# y_pred expects raw distances from the hyperplane (negative = Class 0, positive = Class 1)
# ==============================================================================
print("--- 1. BINARY HINGE LOSS EXAMPLES ---")

y_true_bin = [0, 1, 1, 0]
# Confident, correct raw scores
y_decision_good = [-1.5, 2.2, 1.1, -0.8]

cm_bin = ClassificationMetric(y_true_bin, y_decision_good)
print(f"Safe margin HGL : {cm_bin.HGL()}")

# Borderline correct prediction (Score = 0.2 for Class 1 -> Margin violation penalty!)
y_decision_unsafe = [-1.5, 0.2, 1.1, -0.8]
cm_unsafe = ClassificationMetric(y_true_bin, y_decision_unsafe)
print(f"Unsafe margin HGL: {cm_unsafe.HGL()}")

# ==============================================================================
# SCENARIO 2: Multiclass Crammer-Singer Hinge Loss
# y_pred expects a 2D matrix of uncalibrated logits
# ==============================================================================
print("\n--- 2. MULTICLASS HINGE LOSS EXAMPLES ---")

y_true_multi = [0, 1, 2]
y_logits_multi = [
    [3.2, 0.5, -1.0],  # Class 0 leads Class 1 by 2.7 (> 1.0 margin -> Loss = 0)
    [0.1, 1.8, 1.2],   # Class 1 leads Class 2 by 0.6 (< 1.0 margin -> Loss = 0.4)
    [-0.5, 2.1, 0.4]   # Incorrect (Class 1 leads Class 2 by 1.7 -> Loss = 2.7)
]

cm_multi = ClassificationMetric(y_true_multi, y_logits_multi)
print(f"Multiclass CS HGL: {cm_multi.HGL()}")

HGL - Hinge Loss

Architectural Design: Crammer-Singer Multiclass Generalization

Critical Developer Warning: Raw Logits vs. Probabilities

Properties

Example Usage

Architectural Design: Crammer-Singer Multiclass Generalization 

Critical Developer Warning: Raw Logits vs. Probabilities 

Properties 

Example Usage 