Backpropagation

  • Some notes (to be continued)

The English descriptions, formulas, and figures in this article are all taken from the programming assignments of Andrew Ng's Deep Learning course.

$$\frac{\partial \mathcal{J}}{\partial z_2^{(i)}} = \frac{1}{m}\left(a^{[2](i)} - y^{(i)}\right)$$

$$\frac{\partial \mathcal{J}}{\partial W_2} = \frac{\partial \mathcal{J}}{\partial z_2^{(i)}}\, a^{[1](i)T}$$

$$\frac{\partial \mathcal{J}}{\partial b_2} = \sum_i \frac{\partial \mathcal{J}}{\partial z_2^{(i)}}$$

$$\frac{\partial \mathcal{J}}{\partial z_1^{(i)}} = W_2^T \frac{\partial \mathcal{J}}{\partial z_2^{(i)}} * \left(1 - a^{[1](i)2}\right)$$

$$\frac{\partial \mathcal{J}}{\partial W_1} = \frac{\partial \mathcal{J}}{\partial z_1^{(i)}}\, X^T$$

$$\frac{\partial \mathcal{J}}{\partial b_1} = \sum_i \frac{\partial \mathcal{J}}{\partial z_1^{(i)}}$$

The figure below shows the gradient computations used in backpropagation: the output layer's activation function is sigmoid, the hidden layer's activation function is tanh(), and the right-hand side of the figure gives the corresponding vectorized implementation.

(Figure: backpropagation gradient computation, with its vectorized implementation; from the course assignment.)
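Below is a minimal NumPy sketch of that vectorized backward pass. It assumes the dictionary layout used in the course assignment (`parameters["W2"]`, and a forward-pass `cache` holding `A1` and `A2`); the $\frac{1}{m}$ factor from the first formula is applied once to each averaged gradient, following the assignment's convention.

```python
import numpy as np

def backward_propagation(parameters, cache, X, Y):
    """Vectorized backward pass for a 2-layer network
    (tanh hidden layer, sigmoid output), following the
    six formulas above."""
    m = X.shape[1]                 # number of training examples

    W2 = parameters["W2"]          # shape (1, n_h)
    A1 = cache["A1"]               # tanh activations, shape (n_h, m)
    A2 = cache["A2"]               # sigmoid outputs,  shape (1, m)

    dZ2 = A2 - Y                                       # dJ/dZ2 (1/m folded in below)
    dW2 = (1 / m) * np.dot(dZ2, A1.T)                  # dJ/dW2
    db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True) # dJ/db2
    dZ1 = np.dot(W2.T, dZ2) * (1 - np.power(A1, 2))    # tanh'(z) = 1 - a^2
    dW1 = (1 / m) * np.dot(dZ1, X.T)                   # dJ/dW1
    db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True) # dJ/db1

    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}
```

A gradient-descent step then updates each parameter with its gradient, e.g. `W2 = W2 - learning_rate * grads["dW2"]`.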

  • Note that * denotes elementwise multiplication.

  • The notation you will use is common in deep learning coding:

    • dW1 = $\frac{\partial \mathcal{J}}{\partial W_1}$
    • db1 = $\frac{\partial \mathcal{J}}{\partial b_1}$
    • dW2 = $\frac{\partial \mathcal{J}}{\partial W_2}$
    • db2 = $\frac{\partial \mathcal{J}}{\partial b_2}$
  • Tips:

    • To compute dZ1 you’ll need to compute $g^{[1]'}(Z^{[1]})$. Since $g^{[1]}(\cdot)$ is the tanh activation function, if $a = g^{[1]}(z)$ then $g^{[1]'}(z) = 1 - a^2$. So you can compute $g^{[1]'}(Z^{[1]})$ using (1 - np.power(A1, 2)).
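As a quick sanity check, the identity $g^{[1]'}(z) = 1 - a^2$ can be compared against a central-difference approximation of the derivative; this snippet is illustrative only and not part of the assignment.

```python
import numpy as np

# Check tanh'(z) = 1 - tanh(z)^2, the identity behind (1 - np.power(A1, 2)).
z = np.linspace(-3.0, 3.0, 7)
a = np.tanh(z)

eps = 1e-6
numeric = (np.tanh(z + eps) - np.tanh(z - eps)) / (2 * eps)  # central difference
analytic = 1 - np.power(a, 2)                                # closed-form derivative

print(np.max(np.abs(numeric - analytic)))  # near zero (~1e-10)
```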