Deriving Several Formulas in the Backpropagation Algorithm


Adapted from reference material (see the source link).

$$J(W,b;x,y)=\frac{1}{2}\left\lVert h_{W,b}(x)-y\right\rVert^2$$

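Notation:

$l$: index of a layer; $S_l$: number of neurons in layer $l$; $L$: total number of layers, so layer $L$ is the output layer.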

For the network in the figure below:
$S_1=2$; $S_L=S_3=2$

[Figure: example network diagram]


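Before the derivation, here are a few identities that will be used (the bias term is left out of these sums; it does not change the derivatives with respect to $W$ or $z$):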

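$$
\begin{aligned}
z_j^l &= \sum_{k=1}^{S_{l-1}} f(z_k^{l-1})\,W_{jk}^{l-1} \\
&= f(z_1^{l-1})W_{j1}^{l-1}+f(z_2^{l-1})W_{j2}^{l-1}+\cdots+f(z_{S_{l-1}}^{l-1})W_{jS_{l-1}}^{l-1} && (1) \\
&= a_1^{l-1}W_{j1}^{l-1}+a_2^{l-1}W_{j2}^{l-1}+\cdots+a_{S_{l-1}}^{l-1}W_{jS_{l-1}}^{l-1} && (2)
\end{aligned}
$$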

For example:

$$z_1^2=\sum_{k=1}^{3} f(z_k^1)\,W_{1k}^1=a_1^1W_{11}^1+a_2^1W_{12}^1+a_3^1W_{13}^1$$

$$h(x)_i=a_i^L=f(z_i^L);\qquad z_i^l=\sum_{j=1}^{S_{l-1}} W_{ij}^{l-1}\,a_j^{l-1} \qquad (3)$$
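Identities (1) to (3) amount to a plain forward pass. Here is a minimal sketch of it in pure Python, assuming a sigmoid activation $f$ and no bias terms (as in the identities above). The weight values, the input `x`, and the helper name `forward` are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, weights):
    """Return (zs, activations) for a bias-free fully connected network.

    weights[n][j][k] connects unit k of one layer to unit j of the next.
    """
    activations = [list(x)]          # a^1 is the input itself
    zs = []
    for W in weights:
        a_prev = activations[-1]
        # identity (2): z_j = sum_k a_k * W_jk
        z = [sum(w_jk * a_k for w_jk, a_k in zip(row, a_prev)) for row in W]
        zs.append(z)
        activations.append([sigmoid(v) for v in z])   # a = f(z)
    return zs, activations

# Hypothetical weights: 3 inputs -> 2 hidden units -> 2 outputs
W1 = [[0.1, -0.2, 0.3],
      [0.4, 0.5, -0.6]]
W2 = [[0.7, -0.8],
      [0.9, 1.0]]
x = [1.0, 2.0, 3.0]

zs, activations = forward(x, [W1, W2])
print(zs[0][0])          # z_1^2 = a_1^1 W_11^1 + a_2^1 W_12^1 + a_3^1 W_13^1
print(activations[-1])   # h(x) = a^L
```

The first printed value is the same sum as in the $z_1^2$ example, just with concrete numbers plugged in.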



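Definition: $\delta_i^l=\dfrac{\partial J}{\partial z_i^l}$. This is just an intermediate variable that we define and call the "residual" (error term).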

Derivation:


Formula 1: $\dfrac{\partial J}{\partial W_{ij}^l}=\delta_i^{l+1}a_j^l$

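$$
\begin{aligned}
\frac{\partial J}{\partial W_{ij}^l}
&=\sum_{k=1}^{S_{l+1}}\frac{\partial J}{\partial z_k^{l+1}}\frac{\partial z_k^{l+1}}{\partial W_{ij}^l} \\
&=\frac{\partial J}{\partial z_i^{l+1}}\frac{\partial z_i^{l+1}}{\partial W_{ij}^l}
&& \text{(for } k\neq i,\ \partial z_k^{l+1}/\partial W_{ij}^l=0\text{)} \\
&=\delta_i^{l+1}\frac{\partial z_i^{l+1}}{\partial W_{ij}^l} \\
&=\delta_i^{l+1}\frac{\partial}{\partial W_{ij}^l}\left[a_1^lW_{i1}^l+a_2^lW_{i2}^l+\cdots+a_{S_l}^lW_{iS_l}^l\right]
&& \text{(by (2))} \\
&=\delta_i^{l+1}a_j^l
&& \text{(}a^l\text{ does not depend on }W_{ij}^l\text{; only the }j\text{-th term survives)}
\end{aligned}
$$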

$$\frac{\partial J}{\partial W^l}=\delta^{l+1}(a^l)^T$$

From the network diagram above (with $l=2$):

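$$
\frac{\partial J}{\partial W_{11}^2}=\delta_1^3a_1^2;\quad
\frac{\partial J}{\partial W_{12}^2}=\delta_1^3a_2^2;\quad
\frac{\partial J}{\partial W_{21}^2}=\delta_2^3a_1^2;\quad
\frac{\partial J}{\partial W_{22}^2}=\delta_2^3a_2^2
$$

$$
\frac{\partial J}{\partial W^2}
=\begin{bmatrix}\delta_1^3\\ \delta_2^3\end{bmatrix}\begin{bmatrix}a_1^2\\ a_2^2\end{bmatrix}^T
=\begin{bmatrix}\delta_1^3\\ \delta_2^3\end{bmatrix}\begin{bmatrix}a_1^2 & a_2^2\end{bmatrix}
=\begin{bmatrix}\delta_1^3a_1^2 & \delta_1^3a_2^2\\ \delta_2^3a_1^2 & \delta_2^3a_2^2\end{bmatrix}
=\delta^{l+1}(a^l)^T
$$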
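A quick way to convince yourself of Formula 1 is to compare the outer product $\delta^{l+1}(a^l)^T$ against a finite-difference gradient. The sketch below assumes a single bias-free layer with sigmoid units feeding directly into the squared-error cost, so that $\delta^{l+1}$ can be taken from Formula 2. All numeric values are invented for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(W, a, y):
    """J = 1/2 * ||f(W a) - y||^2 for one bias-free layer."""
    z = [sum(w * ak for w, ak in zip(row, a)) for row in W]
    return 0.5 * sum((sigmoid(zi) - yi) ** 2 for zi, yi in zip(z, y))

# Hypothetical values for a^l, W^l and the target y
a = [0.3, 0.8]
W = [[0.5, -0.4],
     [0.2, 0.7]]
y = [1.0, 0.0]

# delta^{l+1} from Formula 2: -(y - a^{l+1}) * f'(z^{l+1}), sigmoid assumed
z = [sum(w * ak for w, ak in zip(row, a)) for row in W]
out = [sigmoid(zi) for zi in z]
delta = [-(yi - oi) * oi * (1.0 - oi) for yi, oi in zip(y, out)]

# Formula 1: dJ/dW = delta^{l+1} (a^l)^T, an outer product
grad = [[d * ak for ak in a] for d in delta]

# Central finite differences for comparison
eps = 1e-6
numeric = [[0.0, 0.0], [0.0, 0.0]]
for i in range(2):
    for j in range(2):
        W_plus = [row[:] for row in W]
        W_minus = [row[:] for row in W]
        W_plus[i][j] += eps
        W_minus[i][j] -= eps
        numeric[i][j] = (cost(W_plus, a, y) - cost(W_minus, a, y)) / (2 * eps)

max_diff = max(abs(grad[i][j] - numeric[i][j]) for i in range(2) for j in range(2))
print(max_diff)   # should be tiny compared with the gradient entries
```

Entry `grad[i][j]` is exactly $\delta_i^{l+1}a_j^l$, which is the 2x2 matrix written out in the example above.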


Formula 2: $\delta_i^L=-\left[y_i-a_i^L\right]f'(z_i^L)$

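$$
\begin{aligned}
\delta_i^L=\frac{\partial J}{\partial z_i^L}
&=\frac{\partial}{\partial z_i^L}\left[\frac{1}{2}\sum_{k=1}^{S_L}\bigl(y_k-h(x)_k\bigr)^2\right] \\
&=\frac{\partial}{\partial z_i^L}\left[\frac{1}{2}\sum_{k=1}^{S_L}\bigl(y_k-f(z_k^L)\bigr)^2\right]
&& \text{(by (3))} \\
&=\frac{\partial}{\partial z_i^L}\,\frac{1}{2}\Bigl\{\bigl[y_1-f(z_1^L)\bigr]^2+\bigl[y_2-f(z_2^L)\bigr]^2+\cdots+\bigl[y_i-f(z_i^L)\bigr]^2+\cdots+\bigl[y_{S_L}-f(z_{S_L}^L)\bigr]^2\Bigr\} \\
&=-\bigl[y_i-f(z_i^L)\bigr]f'(z_i^L)\frac{\partial z_i^L}{\partial z_i^L}
&& \text{(terms not involving } z_i^L \text{ have derivative } 0\text{)} \\
&=-\bigl[y_i-f(z_i^L)\bigr]f'(z_i^L) \\
&=-\bigl[y_i-a_i^L\bigr]f'(z_i^L)
&& \text{(by (3))}
\end{aligned}
$$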

$$\delta^L=-\left[y-a^L\right]\odot f'(z^L)$$

where $\odot$ denotes the element-wise product.

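This expression was derived for one specific cost function, the squared error. The more general form is given below. (Note also that the other derivations in this article do not depend on any particular cost function.)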

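$$
\delta_i^L=\frac{\partial J}{\partial z_i^L}
=\frac{\partial J}{\partial a_i^L}\frac{\partial a_i^L}{\partial z_i^L}
=\frac{\partial J}{\partial a_i^L}\frac{\partial f(z_i^L)}{\partial z_i^L}
=\frac{\partial J}{\partial a_i^L}f'(z_i^L)
$$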

For example, take the cost function

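$$
J(\Theta)=-\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[y_k^{(i)}\log\bigl((h_\Theta(x^{(i)}))_k\bigr)+\bigl(1-y_k^{(i)}\bigr)\log\bigl(1-(h_\Theta(x^{(i)}))_k\bigr)\right]
+\frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\bigl(\Theta_{j,i}^{(l)}\bigr)^2
$$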

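When differentiating, it is enough to consider a single training example, i.e. $m=1$. The regularization term has zero derivative with respect to $a_i^L$.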

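$$
\begin{aligned}
\delta_i^L&=\frac{\partial J}{\partial a_i^L}f'(z_i^L) \\
&=-\frac{\partial}{\partial a_i^L}\sum_{k=1}^{S_L}\Bigl[y_k\log\bigl(h(x)_k\bigr)+(1-y_k)\log\bigl(1-h(x)_k\bigr)\Bigr]f'(z_i^L) \\
&=-\sum_{k=1}^{S_L}\frac{\partial}{\partial a_i^L}\Bigl[y_k\log\bigl(a_k^L\bigr)+(1-y_k)\log\bigl(1-a_k^L\bigr)\Bigr]f'(z_i^L) \\
&=-\sum_{k=1}^{S_L}\left[y_k\frac{1}{a_k^L}\frac{\partial a_k^L}{\partial a_i^L}+(1-y_k)\frac{1}{1-a_k^L}\left(-\frac{\partial a_k^L}{\partial a_i^L}\right)\right]f'(z_i^L)
&& \text{(here } \log \text{ means } \ln\text{, so } \log e=\ln e=1\text{)} \\
&=-\left[y_i\frac{1}{a_i^L}\cdot 1+(1-y_i)\frac{1}{1-a_i^L}\cdot(-1)\right]f'(z_i^L)
&& \text{(}\partial a_k^L/\partial a_i^L \text{ is } 1 \text{ for } k=i \text{ and } 0 \text{ for } k\neq i\text{)} \\
&=\left[-\frac{y_i}{a_i^L}+\frac{1-y_i}{1-a_i^L}\right]\bigl[a_i^L(1-a_i^L)\bigr]
&& \text{(sigmoid: } f'(z_i^L)=a_i^L(1-a_i^L)\text{)} \\
&=a_i^L-y_i
\end{aligned}
$$

This matches the result given in Andrew Ng's Coursera course.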
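The result $\delta_i^L=a_i^L-y_i$ is easy to check numerically. The following sketch assumes a sigmoid output layer, a single example ($m=1$) and no regularization, and compares $a^L-y$ with a finite-difference derivative of the cross-entropy cost with respect to $z^L$. The values of `z` and `y` are arbitrary.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(z, y):
    """Cost for one example (m = 1), no regularization, as a function of z^L."""
    total = 0.0
    for zk, yk in zip(z, y):
        ak = sigmoid(zk)
        total -= yk * math.log(ak) + (1.0 - yk) * math.log(1.0 - ak)
    return total

z = [0.4, -1.2, 2.0]   # hypothetical z^L
y = [1.0, 0.0, 0.0]    # hypothetical labels

# Closed form derived above: delta^L = a^L - y
delta = [sigmoid(zk) - yk for zk, yk in zip(z, y)]

# Central finite differences dJ/dz_i for comparison
eps = 1e-6
numeric = []
for i in range(len(z)):
    z_plus = list(z)
    z_minus = list(z)
    z_plus[i] += eps
    z_minus[i] -= eps
    numeric.append((cross_entropy(z_plus, y) - cross_entropy(z_minus, y)) / (2 * eps))

max_diff = max(abs(d - n) for d, n in zip(delta, numeric))
print(max_diff)   # should be close to zero
```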


Formula 3: $\delta^l=(W^l)^T\delta^{l+1}\odot f'(z^l)$, where $\odot$ is the element-wise product

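$$
\begin{aligned}
\delta_i^{L-1}=\frac{\partial J}{\partial z_i^{L-1}}
&=\frac{\partial J}{\partial z_1^L}\frac{\partial z_1^L}{\partial z_i^{L-1}}+\frac{\partial J}{\partial z_2^L}\frac{\partial z_2^L}{\partial z_i^{L-1}}+\cdots+\frac{\partial J}{\partial z_{S_L}^L}\frac{\partial z_{S_L}^L}{\partial z_i^{L-1}} \\
&=\sum_{k=1}^{S_L}\frac{\partial J}{\partial z_k^L}\frac{\partial z_k^L}{\partial z_i^{L-1}} \\
&=\sum_{k=1}^{S_L}\delta_k^L\frac{\partial}{\partial z_i^{L-1}}\left[\sum_{j=1}^{S_{L-1}}f(z_j^{L-1})W_{kj}^{L-1}\right]
&& \text{(expanding } z_k^L\text{)} \\
&=\sum_{k=1}^{S_L}\delta_k^LW_{ki}^{L-1}f'(z_i^{L-1})
&& \text{(for } j\neq i \text{ the derivative of } f(z_j^{L-1}) \text{ is } 0\text{)}
\end{aligned}
$$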

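Replacing $L-1$ with $l$ and $L$ with $l+1$, the same argument holds for any two adjacent layers $l$ and $l+1$:

$$\delta_i^l=\sum_{k=1}^{S_{l+1}}\delta_k^{l+1}W_{ki}^lf'(z_i^l)$$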

$$\delta^l=(W^l)^T\delta^{l+1}\odot f'(z^l)$$
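To see the backward recursion of Formula 3 in action, here is a small sketch that propagates $\delta^{l+1}$ back to $\delta^l$ and compares the result with a finite-difference estimate of $\partial J/\partial z^l$. It assumes sigmoid units, no bias, and that layer $l+1$ is the output layer with squared-error cost, so $\delta^{l+1}$ comes from Formula 2. The numbers are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost_from_z(z_l, W, y):
    """Squared-error cost as a function of z^l (bias-free, sigmoid units)."""
    a_l = [sigmoid(v) for v in z_l]
    z_next = [sum(w * ak for w, ak in zip(row, a_l)) for row in W]
    return 0.5 * sum((yk - sigmoid(zk)) ** 2 for zk, yk in zip(z_next, y))

# Hypothetical values: layer l has 3 units, layer l+1 (the output) has 2
z_l = [0.2, -0.7, 1.1]
W = [[0.5, -0.3, 0.8],
     [-0.6, 0.4, 0.1]]
y = [1.0, 0.0]

# Forward pass
a_l = [sigmoid(v) for v in z_l]
z_next = [sum(w * ak for w, ak in zip(row, a_l)) for row in W]
a_next = [sigmoid(v) for v in z_next]

# Formula 2 at the output layer
delta_next = [-(yk - ak) * ak * (1.0 - ak) for yk, ak in zip(y, a_next)]

# Formula 3: delta_i^l = sum_k delta_k^{l+1} W_ki^l f'(z_i^l)
delta_l = [
    sum(delta_next[k] * W[k][i] for k in range(len(W))) * a_l[i] * (1.0 - a_l[i])
    for i in range(len(z_l))
]

# Central finite differences dJ/dz_i^l for comparison
eps = 1e-6
numeric = []
for i in range(len(z_l)):
    z_plus = list(z_l)
    z_minus = list(z_l)
    z_plus[i] += eps
    z_minus[i] -= eps
    numeric.append((cost_from_z(z_plus, W, y) - cost_from_z(z_minus, W, y)) / (2 * eps))

max_diff = max(abs(d - n) for d, n in zip(delta_l, numeric))
print(max_diff)   # should be close to zero
```

The inner sum over `k` walks down column `i` of `W`, which is what the transpose in $(W^l)^T\delta^{l+1}$ expresses.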


Formula 4: $\dfrac{\partial J}{\partial b_i^l}=\delta_i^{l+1}$


Note: this part is important!

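In Michael Nielsen's notes, Neural Networks and Deep Learning, the formula reads $\dfrac{\partial J}{\partial b_i^l}=\delta_i^l$. So which one is right? Both are. They differ only in how the indices on the parameters are defined. In Michael Nielsen's notes, $W_{ij}^l$ is the weight from the $j$-th neuron in layer $l-1$ to the $i$-th neuron in layer $l$, and $b_i^l$ is likewise the bias between layer $l-1$ and layer $l$, so he defines $z^l=w^la^{l-1}+b^l$.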

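Elsewhere, including in this article, $W_{ij}^l$ is the weight from the $j$-th neuron in layer $l$ to the $i$-th neuron in layer $l+1$, and the same goes for $b_i^l$, so the definition is $z^{l+1}=w^la^l+b^l$.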

First, the proof under the first convention:

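$$
\begin{aligned}
\frac{\partial J}{\partial b_i^l}
&=\frac{\partial J}{\partial z_1^l}\frac{\partial z_1^l}{\partial b_i^l}+\frac{\partial J}{\partial z_2^l}\frac{\partial z_2^l}{\partial b_i^l}+\cdots+\frac{\partial J}{\partial z_{S_l}^l}\frac{\partial z_{S_l}^l}{\partial b_i^l} \\
&=\delta_1^l\frac{\partial z_1^l}{\partial b_i^l}+\delta_2^l\frac{\partial z_2^l}{\partial b_i^l}+\cdots+\delta_i^l\frac{\partial z_i^l}{\partial b_i^l}+\cdots+\delta_{S_l}^l\frac{\partial z_{S_l}^l}{\partial b_i^l} \\
&=\delta_1^l\cdot 0+\delta_2^l\cdot 0+\cdots+\delta_i^l\cdot 1+\cdots+\delta_{S_l}^l\cdot 0
&& \text{(since } z^l=W^la^{l-1}+b^l\text{)} \\
&=\delta_i^l
\end{aligned}
$$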

Then the second convention:

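$$
\begin{aligned}
\frac{\partial J}{\partial b_i^l}
&=\frac{\partial J}{\partial z_1^{l+1}}\frac{\partial z_1^{l+1}}{\partial b_i^l}+\frac{\partial J}{\partial z_2^{l+1}}\frac{\partial z_2^{l+1}}{\partial b_i^l}+\cdots+\frac{\partial J}{\partial z_{S_{l+1}}^{l+1}}\frac{\partial z_{S_{l+1}}^{l+1}}{\partial b_i^l} \\
&=\delta_1^{l+1}\frac{\partial z_1^{l+1}}{\partial b_i^l}+\delta_2^{l+1}\frac{\partial z_2^{l+1}}{\partial b_i^l}+\cdots+\delta_i^{l+1}\frac{\partial z_i^{l+1}}{\partial b_i^l}+\cdots+\delta_{S_{l+1}}^{l+1}\frac{\partial z_{S_{l+1}}^{l+1}}{\partial b_i^l} \\
&=\delta_1^{l+1}\cdot 0+\delta_2^{l+1}\cdot 0+\cdots+\delta_i^{l+1}\cdot 1+\cdots+\delta_{S_{l+1}}^{l+1}\cdot 0
&& \text{(since } z^{l+1}=W^la^l+b^l\text{)} \\
&=\delta_i^{l+1}
\end{aligned}
$$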
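Whichever superscript you attach to it, the bias gradient equals the residual of the layer the bias feeds into. The sketch below checks this numerically using the second convention, $z^{l+1}=W^la^l+b^l$. It assumes sigmoid units and a squared-error cost directly on the output of that layer, and all values are invented for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(W, a, b, y):
    """J = 1/2 * ||y - f(W a + b)||^2 for one layer with bias."""
    z = [sum(w * ak for w, ak in zip(row, a)) + bi for row, bi in zip(W, b)]
    return 0.5 * sum((yi - sigmoid(zi)) ** 2 for zi, yi in zip(z, y))

# Hypothetical values for a^l, W^l, b^l and the target y
a = [0.3, 0.8]
W = [[0.5, -0.4],
     [0.2, 0.7]]
b = [0.1, -0.2]
y = [1.0, 0.0]

# Residual of the layer that receives the bias (Formula 2, sigmoid assumed)
z = [sum(w * ak for w, ak in zip(row, a)) + bi for row, bi in zip(W, b)]
out = [sigmoid(zi) for zi in z]
delta = [-(yi - oi) * oi * (1.0 - oi) for yi, oi in zip(y, out)]

# Central finite differences dJ/db_i for comparison
eps = 1e-6
numeric = []
for i in range(len(b)):
    b_plus = list(b)
    b_minus = list(b)
    b_plus[i] += eps
    b_minus[i] -= eps
    numeric.append((cost(W, a, b_plus, y) - cost(W, a, b_minus, y)) / (2 * eps))

max_diff = max(abs(d - n) for d, n in zip(delta, numeric))
print(max_diff)   # should be close to zero
```

Under Michael Nielsen's indexing the same list `delta` would simply be labelled $\delta^l$ instead of $\delta^{l+1}$; the numbers do not change.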


Formula 5: $f'(z)=f(z)\bigl(1-f(z)\bigr)$

$$
f(z)=\frac{1}{1+e^{-z}}\ \Longrightarrow\ f(z)\,e^{-z}=1-f(z)
$$

$$
f'(z)=\frac{e^{-z}}{(1+e^{-z})^2}=f^2(z)\,e^{-z}=f(z)\bigl(1-f(z)\bigr)
$$
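As a last sanity check, the sigmoid identity can be compared against a finite-difference derivative at a few points. This is only a sketch; the sample points and the helper name `sigmoid_prime` are chosen arbitrarily.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    """Formula 5: f'(z) = f(z) * (1 - f(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

eps = 1e-6
sample_points = [-3.0, -0.5, 0.0, 1.0, 4.0]   # arbitrary test points

diffs = []
for z in sample_points:
    numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
    diffs.append(abs(sigmoid_prime(z) - numeric))

max_diff = max(diffs)
print(max_diff)   # should be close to zero
```

At $z=0$ the sigmoid is $0.5$, so the formula gives $f'(0)=0.25$, the steepest point of the curve.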