Derivation of the Backpropagation (BP) Algorithm for Multilayer Perceptrons

Forward Computation

The network consists of an input layer, one or more hidden layers, and an output layer. Neurons in adjacent layers are fully connected; neurons within the same layer are not connected.
[Figure: multilayer perceptron architecture]

As shown in the figure,

$$z^{(l)}=W^{(l)}\cdot a^{(l-1)}+b^{(l)}, \qquad a^{(l)}=f^{(l)}(z^{(l)})$$
where $f(\cdot)$ is the activation function and $a$ is the output of the layer.
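As a concrete illustration, here is a minimal NumPy sketch of this per-layer forward computation. The sigmoid activation and the list-of-arrays representation (the names `Ws`, `bs`) are assumptions made for the example, not part of the derivation:

```python
import numpy as np

def sigmoid(z):
    # Assumed activation: f(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Ws, bs):
    """Forward pass: z^(l) = W^(l) a^(l-1) + b^(l),  a^(l) = f(z^(l)).

    Ws, bs are hypothetical per-layer lists of weight matrices and bias
    vectors. Returns the cached pre-activations and activations, which
    the backward pass will need.
    """
    a = x
    zs, activations = [], [x]
    for W, b in zip(Ws, bs):
        z = W @ a + b      # z^(l) = W^(l) a^(l-1) + b^(l)
        a = sigmoid(z)     # a^(l) = f(z^(l))
        zs.append(z)
        activations.append(a)
    return zs, activations
```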
Variable relations:
$$\begin{aligned}
z^{1}&=g_{1}(x,W^{1})\\
z^{2}&=g_{2}(z^{1},W^{2})\\
&\;\;\vdots\\
z^{l-1}&=g_{l-1}(z^{l-2},W^{l-1})\\
z^{l}&=g_{l}(z^{l-1},W^{l})\\
z^{l+1}&=g_{l+1}(z^{l},W^{l+1})\\
&\;\;\vdots\\
z^{L}&=g_{L}(z^{L-1},W^{L})\\
y&=f_{L}(z^{L})
\end{aligned}$$

and finally the loss $J(W,y)$ is computed from the output $y$.
Variable dependencies:
The dependence of $J(W,y)$ on $x$:

$$J(W,y)=J\big(f_{L}(g_{L}(\cdots g_{2}(g_{1}(x,W^{1}),W^{2})\cdots,W^{L}))\big)$$
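This nested dependence is exactly what backpropagation exploits: by the chain rule, the gradient of $J$ with respect to an inner layer's parameters factors through every later layer. Sketched in the notation above:

$$\frac{\partial J}{\partial W^{l}}=\frac{\partial J}{\partial z^{L}}\cdot\frac{\partial z^{L}}{\partial z^{L-1}}\cdots\frac{\partial z^{l+1}}{\partial z^{l}}\cdot\frac{\partial z^{l}}{\partial W^{l}}$$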

Backpropagation

The goal is to minimize the loss function via gradient descent:
$$W^{(l)}=W^{(l)}-\alpha\,\frac{\partial J(W,\bm{b})}{\partial W^{(l)}}=W^{(l)}-\alpha\,\frac{\partial}{\partial W^{(l)}}\frac{1}{N}\sum_{i=1}^{N}J(W,\bm{b};\bm{x}^{(i)},y^{(i)})$$

$$\bm{b}^{(l)}=\bm{b}^{(l)}-\alpha\,\frac{\partial J(W,\bm{b})}{\partial \bm{b}^{(l)}}=\bm{b}^{(l)}-\alpha\,\frac{\partial}{\partial \bm{b}^{(l)}}\frac{1}{N}\sum_{i=1}^{N}J(W,\bm{b};\bm{x}^{(i)},y^{(i)})$$

where $\alpha$ is the learning rate and $N$ is the number of training examples.
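Putting the pieces together, here is a minimal sketch of one such update, reusing the `forward` function from the sketch above. It assumes sigmoid activations, a squared-error loss $J=\frac{1}{2}\|a^{(L)}-y\|^{2}$, and a single training example rather than the average over $N$; the delta recursion it implements is the standard chain-rule factorization shown earlier.

```python
def backprop_step(x, y, Ws, bs, lr=0.1):
    """One gradient-descent update W^(l) -= alpha * dJ/dW^(l) (single example)."""
    _, activations = forward(x, Ws, bs)
    # Output-layer error for squared-error loss with sigmoid:
    # delta^(L) = (a^(L) - y) * f'(z^(L)),  where f'(z) = a (1 - a)
    a_L = activations[-1]
    delta = (a_L - y) * a_L * (1.0 - a_L)
    for l in range(len(Ws) - 1, -1, -1):
        grad_W = np.outer(delta, activations[l])  # dJ/dW^(l) = delta (a^(l-1))^T
        grad_b = delta                            # dJ/db^(l) = delta
        if l > 0:
            # Propagate the error one layer back through W^(l) and f':
            a_prev = activations[l]
            delta = (Ws[l].T @ delta) * a_prev * (1.0 - a_prev)
        Ws[l] -= lr * grad_W                      # gradient-descent update
        bs[l] -= lr * grad_b
    return Ws, bs
```

For instance, one update on a toy two-layer network (shapes are illustrative):

```python
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
bs = [np.zeros(4), np.zeros(2)]
Ws, bs = backprop_step(np.array([0.5, -0.1, 0.3]), np.array([1.0, 0.0]), Ws, bs)
```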