CS231n Notes: Linear Classification
1. Linear Classifier
Mathematically it is the following expression: f(x_i, W, b) = W x_i + b, where each example x_i is a column vector; each row of the matrix W can be viewed as a classifier for one of the classes.
There are (intuitively) two ways to interpret W and b:
(1) hyperplane: W and b define hyperplanes that linearly separate the data points in high-dimensional space.
(2) template matching:
Each row of W corresponds to a template (or sometimes also called a prototype) for one of the classes. The score of each class for an image is then obtained by comparing each template with the image using an inner product (or dot product) one by one to find the one that “fits” best.
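To make the two views concrete, here is a minimal NumPy sketch of the score function f(x_i, W, b) = W x_i + b seen as template matching (my own illustration, not code from the original notes; the shapes assume a CIFAR-10-like setup and all variable names are made up):

```python
import numpy as np

# Minimal sketch of the linear score function f(x_i, W, b) = W x_i + b.
# Shapes assume a CIFAR-10-like setup: 10 classes, 3072-dim flattened images.
num_classes, dim = 10, 3072
W = np.random.randn(num_classes, dim) * 0.01   # each row is one class template
b = np.zeros(num_classes)
x = np.random.randn(dim)                       # one flattened image (column vector)

scores = W.dot(x) + b                # one dot product per template, i.e. template matching
predicted_class = np.argmax(scores)  # the template that "fits" best
```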
2. Loss Function
(1) multiclass support vector machine loss (also known as hinge loss)
This is best understood from the perspective of high-dimensional space and hyperplanes.
The SVM loss is set up so that the SVM "wants" the correct class for each image to have a score higher than the incorrect classes by some fixed margin Δ (delta).
The Multiclass Support Vector Machine "wants" the score of the correct class to be higher than all other scores by at least a margin of Δ. If any incorrect class has a score inside the margin region (within Δ of the correct class's score) or higher, then there will be accumulated loss. Otherwise the loss will be zero. Our objective will be to find the weights that simultaneously satisfy this constraint for all examples in the training data and give a total loss that is as low as possible.
Also note that, as far as the hinge loss itself is concerned, the effect of Δ is to control under what conditions the loss term between the current class j and the target class gets backpropagated, i.e., it controls when max(0, s_j - s_{y_i} + Δ) takes its second argument rather than 0. Once the max takes the second argument, the backpropagated gradient no longer depends on the value of Δ.
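A minimal unvectorized sketch of the per-example SVM loss (my own illustration, not code from the notes; the function name and toy scores are made up):

```python
import numpy as np

def svm_loss_single(scores, y, delta=1.0):
    """Multiclass SVM (hinge) loss for one example.
    scores: 1-D array of class scores; y: index of the correct class."""
    loss = 0.0
    for j in range(len(scores)):
        if j == y:
            continue
        # positive only when class j scores within delta of (or above) the correct class
        loss += max(0.0, scores[j] - scores[y] + delta)
    return loss

# correct class 0 beats the others by more than delta=1, so the loss is 0
print(svm_loss_single(np.array([13.0, -7.0, 11.0]), y=0, delta=1.0))  # 0.0
```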
(2) softmax + cross entropy
Note that the role of softmax is only to turn the scores into a probability distribution, nothing more. Taking the cross-entropy loss between the model's output distribution and the (one-hot) target distribution then simply gives -log p_{y_i}, where p_{y_i} = e^{s_{y_i}} / Σ_j e^{s_j}; this is not elaborated further here.
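A minimal sketch of the softmax + cross-entropy computation for one example (illustrative code, not from the notes; the shift by the max score is the standard numerical-stability trick and does not change the probabilities):

```python
import numpy as np

def softmax_cross_entropy_single(scores, y):
    """Softmax + cross-entropy loss for one example."""
    shifted = scores - np.max(scores)                  # shift for numerical stability
    probs = np.exp(shifted) / np.sum(np.exp(shifted))  # scores -> probability distribution
    return -np.log(probs[y])                           # cross entropy with a one-hot target

print(softmax_cross_entropy_single(np.array([13.0, -7.0, 11.0]), y=0))  # about 0.127
```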
(3) softmax vs SVM
The Softmax classifier is never fully happy with the scores it produces: the correct class could always have a higher probability and the incorrect classes always a lower probability, and the loss would always decrease.
However, the SVM is happy once the margins are satisfied and it does not micromanage the exact scores beyond this constraint.
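To make the contrast concrete, a small sketch with toy numbers (helper functions repeated from the sketches above, all names illustrative): once the margin is satisfied, the SVM loss stays at 0 no matter how much higher the correct score gets, while the softmax loss keeps shrinking toward (but never reaching) 0.

```python
import numpy as np

def svm_loss_single(scores, y, delta=1.0):
    margins = np.maximum(0.0, scores - scores[y] + delta)
    margins[y] = 0.0
    return margins.sum()

def softmax_loss_single(scores, y):
    shifted = scores - np.max(scores)
    probs = np.exp(shifted) / np.sum(np.exp(shifted))
    return -np.log(probs[y])

for correct_score in (12.0, 20.0, 100.0):
    scores = np.array([correct_score, 9.0, 10.0])
    # SVM loss is 0 in every case; softmax loss keeps decreasing but never hits 0
    print(svm_loss_single(scores, y=0), softmax_loss_single(scores, y=0))
```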