Viewpoints of Linear Classifiers

Loss Functions

Multi-Class SVM (Hinge Loss)

The score of the correct class should be higher than every other class score by at least a margin of 1; any violation of that margin contributes to the loss

$$ L_i=\sum_{j\neq y_i}\max(0,\, s_j - s_{y_i}+1) $$
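As a concrete check, here is a minimal sketch of the per-example hinge loss. NumPy and the function name are illustrative choices, not from the notes:

```python
import numpy as np

def multiclass_hinge_loss(scores, correct_class, margin=1.0):
    """L_i = sum over j != y_i of max(0, s_j - s_{y_i} + margin)."""
    margins = np.maximum(0.0, scores - scores[correct_class] + margin)
    margins[correct_class] = 0.0   # skip the j == y_i term
    return margins.sum()

scores = np.array([3.2, 5.1, -1.7])            # example class scores
loss = multiclass_hinge_loss(scores, correct_class=0)
# by hand: max(0, 5.1-3.2+1) + max(0, -1.7-3.2+1) = 2.9 + 0 = 2.9
```

Note that when the correct class is already the highest score by more than the margin, every term is clipped to zero and the loss vanishes.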

Cross Entropy Loss

We want to interpret the raw classifier scores as probabilities over the classes (via the softmax function)

$$ s = f(x_i;W), \qquad P(Y=k\mid X=x_i)=\frac{e^{s_k}}{\sum_j e^{s_j}} $$

|                                        | Class A | Class B | Class C |
|----------------------------------------|---------|---------|---------|
| Unnormalized log-probabilities (logits)| 3.2     | 5.1     | -1.7    |
| Unnormalized probabilities (exp)       | 24.5    | 164.0   | 0.18    |
| Normalized probabilities (softmax)     | 0.13    | 0.87    | 0.00    |
| Correct probabilities                  | 1       | 0       | 0       |

$$ L_i=-\log P(Y=y_i\mid X=x_i) $$
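The table values and the resulting loss can be reproduced numerically; a small NumPy sketch (the names are illustrative):

```python
import numpy as np

def softmax(logits):
    exps = np.exp(logits - logits.max())   # shift by the max for numerical stability
    return exps / exps.sum()

logits = np.array([3.2, 5.1, -1.7])        # unnormalized log-probabilities
probs = softmax(logits)                    # approx. [0.13, 0.87, 0.00]
loss = -np.log(probs[1])                   # cross-entropy if class B (index 1) is correct
```

Subtracting the maximum logit before exponentiating leaves the probabilities unchanged but prevents overflow for large scores.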

Regularization

The full loss averages the per-example data loss over the training set and adds a regularization penalty $\lambda R(W)$ (e.g. L2 regularization, $R(W)=\sum_k\sum_l W_{k,l}^2$) that expresses a preference for simpler weights:

$$ L(W)=\frac{1}{N}\sum^N_{i=1}L_i(f(x_i;W),y_i)+\lambda R(W) $$
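Putting the pieces together, a sketch of the full objective with softmax cross-entropy as the data loss and L2 regularization; both the shapes and the choice of R(W) are assumptions for illustration:

```python
import numpy as np

def regularized_loss(W, X, y, lam):
    """Mean cross-entropy data loss plus L2 penalty lam * sum(W**2).

    X: (N, D) inputs, W: (D, C) weights, y: (N,) correct class indices.
    """
    scores = X @ W                                        # s = f(x_i; W), shape (N, C)
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    exps = np.exp(scores)
    probs = exps / exps.sum(axis=1, keepdims=True)        # row-wise softmax
    data_loss = -np.log(probs[np.arange(len(y)), y]).mean()
    reg_loss = lam * np.sum(W * W)                        # R(W) = sum of squared weights
    return data_loss + reg_loss
```

A quick sanity check: with W = 0 the softmax is uniform, so the data loss starts at log C regardless of the inputs.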