Neural Network Forward and Backward Propagation Explained

Preface:
Although I have been training deep learning models for a while, something has always felt slightly off, so I recently decided to work through the forward and backward propagation of a neural network from scratch; convolution will be added later. This post only gives a relatively easy-to-understand walkthrough of the BP algorithm. Most of it is based on https://blog.****.net/cc514981717/article/details/73832119, with some of my own understanding added on top; if there is any infringement, please let me know and I will remove it.

Below is a simple neural network structure, shown in Figure 1:
Here, i1 and i2 form the input layer, h1 and h2 the hidden layer, o1 and o2 the output layer, b1 and b2 are the biases, and sigmoid is the activation function.

Forward propagation
i1 -> h1:
$net_{h1} = i_1 \times w_1 + i_2 \times w_2 + b_1 \times 1$
h1 -> sigmoid:
$out_{h1} = \mathrm{sigmoid}(net_{h1}) = \frac{1}{1 + e^{-net_{h1}}}$
$net_{h2}$ and $out_{h2}$ are computed in the same way.
out_h1 -> o1:
$net_{o1} = out_{h1} \times w_5 + out_{h2} \times w_6 + b_2 \times 1$
o1 -> sigmoid:
$out_{o1} = \mathrm{sigmoid}(net_{o1}) = \frac{1}{1 + e^{-net_{o1}}}$
$net_{o2}$ and $out_{o2}$ are computed in the same way.
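To make the forward pass concrete, here is a minimal NumPy sketch of the network in Figure 1. All numbers (inputs, weights, biases) are made-up example values, and w3, w4, w7, w8 are assumed names for the remaining input-to-hidden and hidden-to-output weights not written out above:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Made-up example values, not taken from the original figure.
i1, i2 = 0.05, 0.10                       # inputs
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30   # input -> hidden weights (w3, w4 assumed for h2)
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55   # hidden -> output weights (w7, w8 assumed for o2)
b1, b2 = 0.35, 0.60                       # biases

# Hidden layer: net input, then sigmoid activation
net_h1 = i1 * w1 + i2 * w2 + b1 * 1
net_h2 = i1 * w3 + i2 * w4 + b1 * 1
out_h1, out_h2 = sigmoid(net_h1), sigmoid(net_h2)

# Output layer: net input, then sigmoid activation
net_o1 = out_h1 * w5 + out_h2 * w6 + b2 * 1
net_o2 = out_h1 * w7 + out_h2 * w8 + b2 * 1
out_o1, out_o2 = sigmoid(net_o1), sigmoid(net_o2)

print(out_o1, out_o2)   # roughly 0.75 and 0.77 with these example values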

Backward propagation
Total error:
$E_{total} = \sum \frac{1}{2}(target - output)^2$ (summed over the output units)
Updating the weights between the hidden and output layers (take $w_5$ as an example):
We need $\frac{\partial E_{total}}{\partial w_5}$.
Applying the chain rule:
$\frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial out_{o1}} \times \frac{\partial out_{o1}}{\partial net_{o1}} \times \frac{\partial net_{o1}}{\partial w_5}$
where $E_{total} = \frac{1}{2}(gt_{o1} - out_{o1})^2 + \frac{1}{2}(gt_{o2} - out_{o2})^2$
so $\frac{\partial E_{total}}{\partial out_{o1}} = out_{o1} - gt_{o1}$
$\frac{\partial out_{o1}}{\partial net_{o1}} = \frac{e^{-net_{o1}}}{(1 + e^{-net_{o1}})^2} = out_{o1} \times (1 - out_{o1})$
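The last equality can be verified directly from the definition of $out_{o1}$:
$\frac{e^{-net_{o1}}}{(1 + e^{-net_{o1}})^2} = \frac{1}{1 + e^{-net_{o1}}} \times \frac{e^{-net_{o1}}}{1 + e^{-net_{o1}}} = \frac{1}{1 + e^{-net_{o1}}} \times \left(1 - \frac{1}{1 + e^{-net_{o1}}}\right) = out_{o1} \times (1 - out_{o1})$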
$\frac{\partial net_{o1}}{\partial w_5} = out_{h1}$

Putting the three factors together:
$\frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial out_{o1}} \times \frac{\partial out_{o1}}{\partial net_{o1}} \times \frac{\partial net_{o1}}{\partial w_5} = (out_{o1} - gt_{o1}) \times out_{o1}(1 - out_{o1}) \times out_{h1}$
where $gt_{o1}$ denotes the ground truth (target value) of $o1$.
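As a quick sanity check, here is a minimal Python sketch that evaluates the three chain-rule factors for $\frac{\partial E_{total}}{\partial w_5}$; the values of $out_{o1}$, $gt_{o1}$, and $out_{h1}$ are made-up example numbers (roughly what the forward-pass sketch above produces):

out_o1 = 0.75   # output of o1 (example value)
gt_o1  = 0.01   # ground truth for o1 (example value)
out_h1 = 0.59   # output of h1 (example value)

dE_dout   = out_o1 - gt_o1                  # dE_total / dout_o1
dout_dnet = out_o1 * (1 - out_o1)           # dout_o1 / dnet_o1 (sigmoid derivative)
dnet_dw5  = out_h1                          # dnet_o1 / dw5
grad_w5   = dE_dout * dout_dnet * dnet_dw5  # chain rule product
print(grad_w5)                              # about 0.082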

Updating the weights
For $w_5$:
$w_5^{+} = w_5 - \eta \times \frac{\partial E_{total}}{\partial w_5}$
where $\eta$ is the learning rate.
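Continuing with the example numbers from the sketch above, the update for $w_5$ looks like this (the learning rate $\eta = 0.5$ is an arbitrary example value):

eta = 0.5                     # learning rate (arbitrary example value)
w5, grad_w5 = 0.40, 0.082     # current weight and its gradient (example numbers from above)
w5_new = w5 - eta * grad_w5   # gradient descent step
print(w5_new)                 # 0.359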

Python code
Below is a simple neural network written in Python.

import numpy as np

def sigmoid(x, deriv=False):
    # When deriv=True, x is assumed to already be a sigmoid output,
    # so this returns the derivative out * (1 - out).
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

# Training data: 3 input features and 1 binary target per sample.
x = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1],
              [0, 0, 1]])
y = np.array([[0],
              [1],
              [1],
              [0],
              [0]])

np.random.seed(22)
# Randomly initialize the weights in [-1, 1): 3 inputs -> 4 hidden -> 1 output.
w0 = 2 * np.random.random((3, 4)) - 1
w1 = 2 * np.random.random((4, 1)) - 1

for j in range(60000):
    # Forward pass
    l0 = x
    l1 = sigmoid(np.dot(l0, w0))
    l2 = sigmoid(np.dot(l1, w1))

    # Output error
    l2_error = y - l2
    if j % 5000 == 0:
        print('Error: ' + str(np.mean(np.abs(l2_error))))

    # Backward pass: propagate the error through the sigmoid derivatives.
    l2_delta = l2_error * sigmoid(l2, deriv=True)
    l1_error = l2_delta.dot(w1.T)
    l1_delta = l1_error * sigmoid(l1, deriv=True)

    # Weight update (gradient descent with an implicit learning rate of 1).
    w1 += l1.T.dot(l2_delta)
    w0 += l0.T.dot(l1_delta)
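After the loop finishes, the learned weights w0 and w1 can be reused for the same forward pass on new data. A small usage example continuing from the code above (the test input is just an example):

x_test = np.array([[1, 0, 1]])                                  # example input
prediction = sigmoid(np.dot(sigmoid(np.dot(x_test, w0)), w1))   # forward pass with trained weights
print(prediction)   # typically close to 1, matching the [1, 0, 1] training sample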

Finally
This is my first public blog post on ****; comments and suggestions are very welcome.
References:
1. https://blog.****.net/cc514981717/article/details/73832119
2. Tang Yudi (唐宇迪), introductory deep learning course