Hinge Loss and the Two-Class SVM

Original article: http://breezedeus.github.io/2015/07/12/breezedeus-svm-is-hingeloss-with-l2regularization.html

SVM Equals Hinge Loss + L2 Regularization


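The SVM discussed here is the original two-class SVM; its various extensions are not considered. For simplicity we also restrict attention to the linear SVM. For an SVM with a kernel function, a similar derivation leads to the same conclusion: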

A two-class SVM equals hinge loss + L2 regularization.

Below is the general form of the linear SVM, where the target class is $y \in \{-1, 1\}$ and $C$ is a given penalty coefficient:

$$
\begin{aligned}
\min_{\omega, \gamma, \xi} \quad & \left[ \frac{1}{2} \|\omega\|_2^2 + C \sum_{i=1}^{n} \xi_i \right] \\
\text{s.t.} \quad & (\omega^T x_i + \gamma)\, y_i \ge 1 - \xi_i, \quad \forall i = 1, \ldots, n \\
& \xi_i \ge 0, \quad \forall i = 1, \ldots, n
\end{aligned}
$$

Let $m \triangleq f_\theta(x)\, y$ (where $y \in \{-1, 1\}$; for the linear SVM, $f_\theta(x) = \omega^T x + \gamma$). For a two-class problem, the ideal loss function is the 0/1 loss: when $f_\theta(x)$ and $y$ have the same sign the loss is 0, and when they have different signs the loss is 1. But the 0/1 loss is neither differentiable everywhere nor convex, so minimizing it directly is hard. The hinge loss is an approximation to the 0/1 loss (see the figure below):

$$
J_{\text{hinge}}(m) = \max\{0, 1 - m\}.
$$

[Figure: the hinge loss and the 0/1 loss plotted against $m$]

The hinge loss takes its name from its shape, which looks like a hinge opened to 135 degrees.
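To make the comparison between the two losses concrete, here is a minimal Python sketch that evaluates both at a few margins. The function names and the sample margins are illustrative choices, not part of the original article, and treating $m = 0$ as an error in the 0/1 loss is an assumption, since the text only covers the same-sign and different-sign cases.

```python
def zero_one_loss(m):
    """0/1 loss on the margin m = f(x) * y: 0 if the signs agree, else 1."""
    # Assumption: m == 0 is counted as an error; the article leaves this case open.
    return 0.0 if m > 0 else 1.0


def hinge_loss(m):
    """Hinge loss J_hinge(m) = max{0, 1 - m}."""
    return max(0.0, 1.0 - m)


# Illustrative margins on both sides of the kink at m = 1.
margins = [-2.0, -0.5, 0.0, 0.5, 1.0, 2.0]
for m in margins:
    print(f"m={m:+.1f}  zero_one={zero_one_loss(m):.1f}  hinge={hinge_loss(m):.1f}")
```

At every margin the hinge loss is at least as large as the 0/1 loss, and it stays positive for correctly classified points with $0 < m < 1$, which is what pushes the solution towards a margin.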


The hinge-loss optimization problem with an L2 regularization term is:

$$
\min_{\omega, \gamma} \left[ C \sum_{i=1}^{n} \max\{0, 1 - (\omega^T x_i + \gamma)\, y_i\} + \frac{1}{2} \|\omega\|_2^2 \right].
$$
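The objective above is unconstrained, so one simple way to attack it is subgradient descent. The sketch below is only an illustration under stated assumptions: the article does not prescribe a solver, and the toy data, the step-size schedule, and the function names are all hypothetical.

```python
def hinge_objective(w, gamma, X, y, C):
    """C * sum_i max{0, 1 - (w^T x_i + gamma) y_i} + 0.5 * ||w||_2^2."""
    data_term = 0.0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + gamma
        data_term += max(0.0, 1.0 - score * yi)
    return C * data_term + 0.5 * sum(wj * wj for wj in w)


def subgradient_step(w, gamma, X, y, C, lr):
    """One subgradient-descent step on the objective above."""
    grad_w = list(w)      # gradient of 0.5 * ||w||^2 is w itself
    grad_gamma = 0.0      # gamma is not regularized
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + gamma
        if score * yi < 1.0:
            # Hinge term is active: a subgradient of C * (1 - m_i)
            # is -C * y_i * x_i for w and -C * y_i for gamma.
            for j, xj in enumerate(xi):
                grad_w[j] -= C * yi * xj
            grad_gamma -= C * yi
    new_w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
    return new_w, gamma - lr * grad_gamma


# Hypothetical, linearly separable toy data.
X = [[2.0, 2.0], [1.5, 2.5], [-2.0, -1.0], [-1.0, -2.5]]
y = [1, 1, -1, -1]
C = 1.0

w, gamma = [0.0, 0.0], 0.0
initial = hinge_objective(w, gamma, X, y, C)
for t in range(1, 201):
    # Assumed diminishing step size; other schedules are possible.
    w, gamma = subgradient_step(w, gamma, X, y, C, lr=0.1 / t ** 0.5)
final = hinge_objective(w, gamma, X, y, C)
print("objective before:", initial, "after:", final)
```

Subgradient descent is not guaranteed to decrease the objective at every step, so the sketch only compares the value at the start with the value at the end.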

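To match the SVM formulation above, we have moved the penalty coefficient from the L2 regularization term onto the hinge loss. The hinge loss has the following equivalent definition: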

$$
\begin{aligned}
\max\{0, 1 - m\} = \min_{\xi} \quad & \xi \\
\text{s.t.} \quad & \xi \ge 1 - m \\
& \xi \ge 0
\end{aligned}
$$
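The equivalence holds because $\xi$ must be at least as large as both $0$ and $1 - m$, so the smallest feasible value is the larger of the two. A small Python check of this argument, with illustrative margins and a hypothetical helper name, might look like this:

```python
def is_feasible(xi, m):
    """Check the two constraints xi >= 1 - m and xi >= 0."""
    return xi >= 1.0 - m and xi >= 0.0


# Illustrative margins; the tiny offset eps is an arbitrary choice.
eps = 1e-9
results = []
for m in [-1.5, 0.0, 0.25, 1.0, 3.0]:
    xi_star = max(0.0, 1.0 - m)                   # candidate minimizer
    feasible = is_feasible(xi_star, m)            # the candidate satisfies both constraints
    smaller_ok = is_feasible(xi_star - eps, m)    # anything smaller should violate one
    results.append((m, xi_star, feasible, smaller_ok))
    print(m, xi_star, feasible, smaller_ok)
```

For each margin the candidate $\max\{0, 1 - m\}$ is feasible while a slightly smaller value is not, which is exactly what it means to be the minimum of the constrained problem.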

Using this equivalent definition, we can rewrite the L2-regularized hinge-loss optimization problem as:

$$
\begin{aligned}
\min_{\omega, \gamma, \xi} \quad & \left[ C \sum_{i=1}^{n} \xi_i + \frac{1}{2} \|\omega\|_2^2 \right] \\
\text{s.t.} \quad & \xi_i \ge 1 - (\omega^T x_i + \gamma)\, y_i, \quad \forall i = 1, \ldots, n \\
& \xi_i \ge 0, \quad \forall i = 1, \ldots, n
\end{aligned}
$$

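Rearranging the first constraint gives $(\omega^T x_i + \gamma)\, y_i \ge 1 - \xi_i$, so this is exactly the SVM optimization problem given at the start of this article.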
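As a numerical sanity check of this equivalence, the sketch below evaluates both formulations at the same $(\omega, \gamma)$. With the tightest feasible slack the constrained objective should match the hinge-loss objective, and any looser slack should cost more. The data, parameters, and function names are hypothetical and chosen only for illustration.

```python
import math


def margins(w, gamma, X, y):
    """Return m_i = (w^T x_i + gamma) * y_i for every sample."""
    return [(sum(wj * xj for wj, xj in zip(w, xi)) + gamma) * yi
            for xi, yi in zip(X, y)]


def hinge_form(w, gamma, X, y, C):
    """Unconstrained objective: C * sum_i max{0, 1 - m_i} + 0.5 * ||w||^2."""
    loss = sum(max(0.0, 1.0 - m) for m in margins(w, gamma, X, y))
    return C * loss + 0.5 * sum(wj * wj for wj in w)


def slack_form(w, gamma, xi, X, y, C):
    """Constrained objective: C * sum_i xi_i + 0.5 * ||w||^2.

    Raises ValueError if xi violates xi_i >= 1 - m_i or xi_i >= 0.
    """
    for s, m in zip(xi, margins(w, gamma, X, y)):
        if s < 1.0 - m or s < 0.0:
            raise ValueError("slack vector is infeasible")
    return C * sum(xi) + 0.5 * sum(wj * wj for wj in w)


# Hypothetical data and parameters, chosen only for illustration.
X = [[1.0, 2.0], [-1.0, 0.5], [2.0, -1.0], [0.0, -3.0]]
y = [1, -1, -1, 1]
w, gamma, C = [0.5, -0.25], 0.1, 2.0

# Tightest feasible slack, and a deliberately looser (still feasible) one.
tight = [max(0.0, 1.0 - m) for m in margins(w, gamma, X, y)]
loose = [s + 0.3 for s in tight]

hinge_value = hinge_form(w, gamma, X, y, C)
tight_value = slack_form(w, gamma, tight, X, y, C)
loose_value = slack_form(w, gamma, loose, X, y, C)
print(hinge_value, tight_value, loose_value)
```

Because the inner minimization over $\xi$ always picks the tightest slack, minimizing the constrained form over $(\omega, \gamma, \xi)$ yields the same $(\omega, \gamma)$ as minimizing the hinge form.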