强对偶定理的证明

强对偶性的证明

本博客是参考资料的笔记整理:
https://www.bilibili.com/video/BV1dJ411B7gh?p=11

1. 预备知识

定义 1 凸集:某点集D是凸集,是指对于任意两点x1x_1, x2Dx_2∈ D0λ10 ≤ λ ≤ 1,有:
x=λx1+(1λ)x2D(1) x=\lambda x_{1}+(1-\lambda) x_{2} \in D \tag{1}
以下是凸集的例子

强对偶定理的证明

定理 1 分离超平面定理:假设两个不相交的凸集 CCDD,即CD=C \cap D=\emptyset,则存在
向量a0a \neq 0和常数bb,有
{aTxbxCaTxbxD(2) \left\{\begin{array}{ll} \boldsymbol{a}^{\mathrm{T}} x \leq b & \forall x \in C \\ \boldsymbol{a}^{\mathrm{T}} x \geq b & \forall x \in D \end{array}\right. \tag{2}
强对偶定理的证明

Proof of 定理1:

定义 2:点集的 CCDD 的之间的距离为:
dist(C,D)=infuC,vDuv2(3) \operatorname{dist}(C, D)=\inf _{u \in C, v \in D}\|u-v\|^{2} \tag{3}
​ 假设cC,dDc \in C, d \in D能达到此最小距离,即dist(C,D)=cd2\operatorname{dist}(C, D)=\|c-d\|^{2},令a=cda=c-db=c2d22b=\frac{\|c\|^{2}-\|d\|^{2}}{2}(实际上,(cd)Txc2d22=0(c-d)^{\mathrm{T}} x-\frac{\|c\|^{2}-\|d\|^{2}}{2}=0是点 cc 和点 dd 连线的“中垂面”),下面证明:①对于任意uCu \in C,有aTub0a^{\mathrm{T}} u-b \geq 0; ②对于任意vDv \in D,有aTvb0a^{\mathrm{T}} v-b \leq 0.

反证法:假设存在一个uCu \in C,使
aTub<0(cd)Tuc2d22<0(cd)T(u12(c+d))<0(cd)T((uc)+12(cd))<0(cd)T(uc)+12cd2<0 \begin{array}{l} a^{\mathrm{T}} u-b<0 \\ (c-d)^{\mathrm{T}} u-\frac{\|c\|^{2}-\|d\|^{2}}{2}<0 \\ (c-d)^{\mathrm{T}}\left(u-\frac{1}{2}(c+d)\right)<0 \\ (c-d)^{\mathrm{T}}\left((u-c)+\frac{1}{2}(c-d)\right)<0 \\ (c-d)^{\mathrm{T}}(u-c)+\frac{1}{2}\|c-d\|^{2}<0 \end{array}
因为12cd20-\frac{1}{2}\|c-d\|^{2} \geq 0,所以(cd)T(uc)<0(c-d)^{\mathrm{T}}(u-c)<0。假设另有一点ppuucc的连线上,即p=λu+(1λ)cp=\lambda u+(1-\lambda) c,其中 0λ10 \leq \lambda \leq 1。根据 CC 是凸集,则有pCp \in C。下面计算pd2\|p-d\|^{2}
pd2=λu+(1λ)cd2=(cd)+λ(uc)2=cd2+2λ(cd)T(uc)+λ2uc2=cd2+λ[2(cd)T(uc)+λuc2] \begin{aligned}\|p-d\|^{2} &=\|\lambda u+(1-\lambda) c-d\|^{2} \\ &=\|(c-d)+\lambda(u-c)\|^{2} \\ &=\|c-d\|^{2}+2 \lambda(c-d)^{\mathrm{T}}(u-c)+\lambda^{2}\|u-c\|^{2} \\ &=\|c-d\|^{2}+\lambda\left[2(c-d)^{\mathrm{T}}(u-c)+\lambda\|u-c\|^{2}\right] \end{aligned}
分析(cd)T(uc)<0(c-d)^{\mathrm{T}}(u-c)<0,当 λ\lambda 取一个很小的正数时,即满足
λ<2(cd)T(uc)uc2(4) \lambda<-\frac{2(c-d)^{\mathrm{T}}(u-c)}{\|u-c\|^{2}} \tag{4}
一定有:pd2<cd2\|p-d\|^{2}<\|c-d\|^{2}pCp \in C,这与定义 2 矛盾,故①得证。而②的证明过程,同理。\blacksquare

定理2:若 cc 是一个非零向量,即c2>0\|c\|^{2}>0,即则对任意 ε>0\varepsilon>0,存在一个向量 xx 满足: ①x2ε\|x\|^{2} \leq \varepsilon ;②cTx>0c^{T} x>0.

Proof of 定理2: 取x=εc2cx=\frac{\varepsilon}{\|c\|^{2}} c,则x2=ε\|x\|^{2}=\varepsilon,且cTx=ε>0c^{T} x=\varepsilon>0,同理也存在一个向量 xx,使①x2ε\|x\|^{2} \leq \varepsilon,②cTx>0c^{T} x>0. \blacksquare

2. 对偶问题

原问题(Prime Problem):
minwf(w) s.t. gi(w)0,i=1,2,,Khj(w)=0,j=1,2,,M(5) \begin{aligned} &\min _{w} f(w)\\ \text { s.t. }& g_{i}(w) \leq 0, \quad i=1,2, \dots, K\\ &h_{j}(w)=0, \quad j=1,2, \dots, M \end{aligned} \tag{5}
对偶问题(Dual Problem):

先定义拉格朗日函数

L(w,α,β)=f(w)+i=1Kαigi(w)+j=1Mβjhj(w)(6) \mathcal{L}(w, \alpha, \beta)=f(w)+\sum_{i=1}^{K} \alpha_{i} g_{i}(w)+\sum_{j=1}^{M} \beta_{j} h_{j}(w) \tag{6}
由拉格朗日函数推导出对偶问题的形式:
maxα,βθ(α,β)=infwL(w,α,β) s.t. αi0,i=1,2,K(7) \begin{aligned} \max _{\alpha, \beta} \theta(\alpha, \beta)=\inf _{w} \mathcal{L}(w, \alpha, \beta) \\ \text { s.t. } \alpha_{i} \geq 0, \quad i=1,2, \dots K \end{aligned} \tag{7}
定理3:若WW^{*}是原问题的解,(α,β)\left(\alpha^{*}, \beta^{*}\right)是对偶问题的解,则有:
KaTeX parse error: Can't use function '$' in math mode at position 2: $̲\theta\left(\al…
Proof of 定理3:
θ(α,β)=infwL(w,α,β)L(w,α,β)=f(w)+i=1Kαigi(w)+j=1Mβjhj(w)f(w) \begin{aligned} \theta\left(\alpha^{*}, \beta^{*}\right) &=\inf _{w} \mathcal{L}\left(w, \alpha^{*}, \beta^{*}\right) \\ & \leq \mathcal{L}\left(w^{*}, \alpha^{*}, \beta^{*}\right) \\ &=f\left(w^{*}\right)+\sum_{i=1}^{K} \alpha_{i}^{*} g_{i}\left(w^{*}\right)+\sum_{j=1}^{M} \beta_{j}^{*} h_{j}\left(w^{*}\right) \\ & \leq f\left(w^{*}\right) \end{aligned}
\blacksquare

定义3(凸函数)f(w)f(w)是凸函数是指对 w1,w2,λ[0,1]\forall w_{1}, w_{2}, \quad \forall \lambda \in[0,1],有:
f(λw1+(1λ)w2)λf(w1)+(1λ)f(w2)(9) f\left(\lambda w_{1}+(1-\lambda) w_{2}\right) \leq \lambda f\left(w_{1}\right)+(1-\lambda) f\left(w_{2}\right) \tag{9}
强对偶定理的证明

3. 强对偶性的证明

定理4(强对偶定理):对于f(w),gi(w),hj(w)f(w), g_{i}(w), h_{j}(w),若满足:

f(w)f(w)是凸函数;

gi(w)g_{i}(w)是凸函数;

hj(w)h_{j}(w)是仿射函数,即hj(w)=cjTw+dh_{j}(w)=c_{j}^{\mathrm{T}} w+d

④slater条件:存在一个ww使gi(w)<0g_i(w)<0hj(w)=0h_j(w)=0

ww的取值范围DD是开集,即若 wDw \in D 则存在邻域N(w,ε)DN(w, \varepsilon) \in D

ww的取值范围DD是凸集。

则有:f(w)=θ(α,β)f\left(w^{*}\right)=\theta\left(\alpha^{*}, \beta^{*}\right).

Proof of 强对偶定理

构造点集:
A={(u,v,t)wD, 使 gi(w)ui,hj(w)=vi,f(w)t}(10) A=\left\{(u, v, t) | \exists w \in D, \text { 使 } g_{i}(w) \leq u_{i}, h_{j}(w)=v_{i}, f(w) \leq t\right\} \tag{10}
定义:
g(w)=[g1(w)g2(w)gK(w)],h(w)=[h1(w)h2(w)hM(w)](11) g(w)=\left[\begin{array}{c} g_{1}(w) \\ g_{2}(w) \\ \vdots \\ g_{K}(w) \end{array}\right], \quad h(w)=\left[\begin{array}{c} h_{1}(w) \\ h_{2}(w) \\ \vdots \\ h_{M}(w) \end{array}\right] \tag{11}
注意:①若wDw \in D,则(g(w),h(w),f(w))A(g(w), h(w), f(w)) \in A(证明:至少可以使定义中等号成立);②若wDw \in D,则(+,h(w),+)A(+\infty, h(w),+\infty) \in A(证明:任何数都小于正无穷)。

引理1:若 DD 是凸集,gi(w)g_{i}(w)是凸函数(i=1,2,,K)(i=1,2, \dots, K)hj(w)h_{j}(w)是仿射函数,即hi(w)=cw+dh_{i}(w)=c w+df(w)f(w)是凸函数,则 AA 是凸集.

证明:

(u1,v1,t1),(u2,v2,t2)A\left(u_{1}, v_{1}, t_{1}\right),\left(u_{2}, v_{2}, t_{2}\right) \in A,我们要证当0λ10 \leq \lambda \leq 1时,有
(λu1+(1λ)u2,λv1+(1λ)v2,λt1+(1λ)t2)A(12) \left(\lambda u_{1}+(1-\lambda) u_{2}, \lambda v_{1}+(1-\lambda) v_{2}, \lambda t_{1}+(1-\lambda) t_{2}\right) \in A \tag{12}
①因为(u1,v1,t1)A\left(u_{1}, v_{1}, t_{1}\right) \in A,所以w1D\exists w_{1} \in D,使gi(w1)ui,hj(w1)=vi,f(w1)tg_{i}\left(w_{1}\right) \leq u_{i}, h_{j}\left(w_{1}\right)=v_{i}, f\left(w_{1}\right) \leq t;同理(u2,v2,t2)A\left(u_{2}, v_{2}, t_{2}\right) \in A,所以 w2D\exists w_{2} \in D,使gi(w2)ui,hj(w2)=vi,f(w2)tg_{i}\left(w_{2}\right) \leq u_{i}, h_{j}\left(w_{2}\right)=v_{i}, f\left(w_{2}\right) \leq t.

②设w=λw1+(1λ)w2w^{\prime}=\lambda w_{1}+(1-\lambda) w_{2},因为DD是凸集,所以wDw^{\prime} \in D。由于gi(w)g_i(w)是凸函数,故:gi(w)λgi(w1)+(1λ)gi(w2)λu1,i+(1λ)u2,ig_{i}\left(w^{\prime}\right) \leq \lambda g_{i}\left(w_{1}\right)+(1-\lambda) g_{i}\left(w_{2}\right) \leq \lambda u_{1, i}+(1-\lambda) u_{2, i},同理有f(w)λt1+(1λ)t2f\left(w^{\prime}\right) \leq \lambda t_{1}+(1-\lambda) t_{2}.

hj(w)=cw+dh_{j}\left(w^{\prime}\right)=c w^{\prime}+d
=λ(cw1+d)+(1λ)(cw2+d)=\lambda\left(c w_{1}+d\right)+(1-\lambda)\left(c w_{2}+d\right)
=λhj(w1)+(1λ)hj(w2)=\lambda h_{j}\left(w_{1}\right)+(1-\lambda) h_{j}\left(w_{2}\right)
=λv1,j+(1λ)v2,j=\lambda v_{1, j}+(1-\lambda) v_{2, j}

综上①②③,引理1得证. \blacksquare

根据式子(10)的定义,我们有原问题的解
f(w)=min(0,0,t)At(13) f\left(w^{*}\right)=\min _{(0,0, t) \in A} t \tag{13}
定义另一个点集B={(0,0,s)s<f(w)}B=\left\{(0,0, s) | s<f\left(w^{*}\right)\right\},可以证明 BB 也是凸集,且AB=A \cap B=\emptyset.

根据定理1(分离超平面定理),存在(α,β,η)(\alpha, \beta, \eta)使得:①若(u,v,t)A(u, v, t) \in A,则αTu+βTv+ηtb\alpha^{\mathrm{T}} u+\beta^{\mathrm{T}} v+\eta t \geq b;②若(u,v,t)B(u, v, t) \in B,则αTu+βTv+ηt<b\alpha^{\mathrm{T}} u+\beta^{\mathrm{T}} v+\eta t<b。由于此时,u=0u=0v=0v=0,所以ηt<b-\eta t<b.

引理2:若对(u,v,t)A\forall(u, v, t) \in A,有αTu+βTv+ηtb\alpha^{\mathrm{T}} u+\beta^{\mathrm{T}} v+\eta t \geq b,则有
α=[α1,α2,,αK]0,η0(14) \alpha=\left[\alpha_{1}, \alpha_{2}, \ldots, \alpha_{K}\right] \succcurlyeq 0, \quad \eta \geq 0 \tag{14}
Proof

假设某个αi<0\alpha_{i}<0,则可以取相应ui=+u_{i}=+\infty,此时(u,v,t)(u, v, t)仍然属于AA,但αTu+βTv+ηt=\alpha^{\mathrm{T}} u+\beta^{\mathrm{T}} v+\eta t=-\infty,这与αTu+βTv+ηt0\alpha^{\mathrm{T}} u+\beta^{\mathrm{T}} v+\eta t \geq 0矛盾。同理可证η0\eta \geq 0

根据 AA 的定义和①可得,对wD\forall w \in D,有i=1Kαigi(w)+j=1Mβjhj(w)+ηf(w)b\sum_{i=1}^{K} \alpha_{i} g_{i}(w)+\sum_{j=1}^{M} \beta_{j} h_{j}(w)+\eta f(w) \geq b;根据 BB 的定义和②的ηt<b-\eta t<b可得,ηf(w)b\eta f\left(w^{*}\right) \leq b。因此有:
i=1Kαigi(w)+j=1Mβjhj(w)+ηf(w)bηf(w)(15) \sum_{i=1}^{K} \alpha_{i} g_{i}(w)+\sum_{j=1}^{M} \beta_{j} h_{j}(w)+\eta f(w) \geq b \geq \eta f\left(w^{*}\right) \tag{15}
下面分两种情况讨论:

情况1η0\eta \neq 0,此时有
f(w)i=1Kαiηgi(w)+j=1Mβjηhj(w)+f(w)=L(w,αη,βη)(16) f\left(w^{*}\right) \leq \sum_{i=1}^{K} \frac{\alpha_{i}}{\eta} g_{i}(w)+\sum_{j=1}^{M} \frac{\beta_{j}}{\eta} h_{j}(w)+f(w)=\mathcal{L}\left(w, \frac{\alpha}{\eta}, \frac{\beta}{\eta}\right) \tag{16}
由于ww是任意的,因此有
f(w)infwL(w,αη,βη)=θ(αη,βη)(17) f\left(w^{*}\right) \leq \inf _{w} \mathcal{L}\left(w, \frac{\alpha}{\eta}, \frac{\beta}{\eta}\right)=\theta\left(\frac{\alpha}{\eta}, \frac{\beta}{\eta}\right) \tag{17}
由于α0,η>0\alpha \succ 0, \eta>0,所以αη0\frac{\alpha}{\eta} \succ 0,满足对偶问题的限制条件,因此有:
f(w)θ(α,β)(18) f\left(w^{*}\right) \leq \theta\left(\alpha^{*}, \beta^{*}\right) \tag{18}
在根据定理3,有θ(α,β)f(w)\theta\left(\alpha^{*}, \beta^{*}\right) \leq f\left(w^{*}\right),所以f(w)=θ(α,β)f\left(w^{*}\right)=\theta\left(\alpha^{*}, \beta^{*}\right),得证。

情况2η=0\eta=0,此时对wD\forall w \in D,有
i=1Kαigi(w)+j=1Mβjhj(w)0(19) \sum_{i=1}^{K} \alpha_{i} g_{i}(w)+\sum_{j=1}^{M} \beta_{j} h_{j}(w) \geq 0 \tag{19}
根据定理4中的条件④(slater条件),w\exists w使gi(w)<0g_{i}(w)<0hj(w)=0h_{j}(w)=0,这可以推出αi=0\alpha_{i}=0,因此公式(19)变为
j=1Mβjhj(w)0, 或记为 βTh(w)0(20) \sum_{j=1}^{M} \beta_{j} h_{j}(w) \geq 0, \text { 或记为 } \beta^{\mathrm{T}} h(w) \geq 0 \tag{20}
根据定理4中的条件③,h(w)=cw+dh(w)=c w+d,代入得:
βTh(w)0βTcw+βTd0(21) \begin{array}{l} \beta^{\mathrm{T}} h(w) \geq 0 \\ \beta^{\mathrm{T}} c w+\beta^{\mathrm{T}} d \geq 0 \end{array} \tag{21}
P=βTcP=\beta^{\mathrm{T}} cq=βTdq=\beta^{\mathrm{T}} d,则式子(21)改写为:
Pw+q0(22) P w+q \geq 0 \tag{22}
注意公式(22)对所有的wDw \in D都成立。根据条件④ (slater条件),w\exists w 使 cw+d=0c w+d=0,从而Pw+q=0P w+q=0.

下面证明,存在一个 w=w+Δww^{\prime}=w+\Delta w,其中Δw\Delta www的一个领域N(0,ε)N(0, \varepsilon)中,使Pw+q<0P w^{\prime}+q<0

证明:根据定理1,有β0\beta \neq 0,否则(α,β,η)(\alpha, \beta, \eta)都为0,与分离超平面定理矛盾。则有P=βTc0P=\beta^{\mathrm{T}} c \neq 0;根据定理2,存在一个Δw\Delta w满足w2<ε\|w\|^{2}<\varepsilonPΔw<0P \Delta w<0。因此,w=w+ΔwN(0,ε)w^{\prime}=w+\Delta w \in N(0, \varepsilon)

根据定理4中的条件⑤,wDw^{\prime} \in D,同时,
Pw+q=P(w+Δw)+q=(Pw+q)+PΔw=PΔw<0(23) \begin{aligned} P w^{\prime}+q &=P(w+\Delta w)+q \\ &=(P w+q)+P \Delta w \\ &=P \Delta w<0 \end{aligned} \tag{23}
这与式子(22)矛盾,所以情况2不成立/不存在。

定理4 强对偶定理得证. \blacksquare