Principles of Training a Multi-Layer Neural Network Using Backpropagation

Original article: click here

This article describes the teaching process of a multi-layer neural network employing the backpropagation algorithm. To illustrate this process, a three-layer neural network with two inputs and one output, shown in the picture below, is used. (Translator's note: shouldn't this be four layers? By the convention used here the input nodes are not counted as a layer, so the network has three layers of neurons.)

[Figure: a three-layer neural network with two inputs and one output]

Each neuron is composed of two units. The first unit computes the sum of the products of the weight coefficients and the input signals. (Translator's note: "input signals" here does not mean only the network inputs; for any layer l, the outputs of layer l-1 are its input signals.) The second unit realises a nonlinear function, called the neuron activation function. Signal e is the output of the adder (the weighted sum), and y = f(e) is the output signal of the nonlinear element. Signal y is also the output signal of the neuron.

[Figure: a single neuron, with the adder producing e and the activation function producing y = f(e)]
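
As a minimal sketch of the neuron just described (the function names and the choice of a sigmoid activation are illustrative assumptions, not fixed by the article):

```python
import math

def activation(e):
    # A common choice of nonlinear activation function: the logistic sigmoid.
    # (Assumption: the article does not fix a particular f.)
    return 1.0 / (1.0 + math.exp(-e))

def neuron_output(weights, inputs):
    # First unit: the adder computes e, the weighted sum of the input signals.
    e = sum(w * x for w, x in zip(weights, inputs))
    # Second unit: the nonlinear element computes y = f(e).
    return activation(e)

# Example: a neuron with two inputs.
y = neuron_output([0.5, -0.3], [1.0, 2.0])
```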

To teach the neural network we need a training data set. The training data set consists of input signals (x1 and x2) together with the corresponding target (desired output) z. Network training is an iterative process: in each iteration the weight coefficients of the nodes are modified using new data from the training set, following the algorithm described below. Each teaching step starts with applying both input signals from the training set to the network. After this stage the output signal of every neuron in every layer can be determined. The pictures below illustrate how the signal propagates through the network. The symbol w(xm)n represents the weight of the connection between network input xm and neuron n in the first layer; yn represents the output signal of neuron n.

[Figures: propagation of signals through the first layer]
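
Written out in this notation, the first-layer outputs take the form (the neuron numbering follows the figures of the original article, in which the first layer contains three neurons):

y1 = f1( w(x1)1·x1 + w(x2)1·x2 )
y2 = f2( w(x1)2·x1 + w(x2)2·x2 )
y3 = f3( w(x1)3·x1 + w(x2)3·x2 )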

Propagation of signals through the hidden layer. The symbol wmn represents the weight of the connection between the output of neuron m and the input of neuron n in the next layer.

[Figures: propagation of signals through the hidden layer]

Propagation of signals through the output layer.

[Figure: propagation of signals through the output layer]
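
The whole forward pass can be sketched as follows (the layer sizes 3-2-1 follow the figures of the original article; the weight values and helper names are illustrative assumptions):

```python
import math

def activation(e):
    return 1.0 / (1.0 + math.exp(-e))           # logistic sigmoid (assumed f)

def neuron_output(weights, inputs):
    return activation(sum(w * x for w, x in zip(weights, inputs)))

def layer_outputs(weight_rows, inputs):
    # One row of weights per neuron in the layer.
    return [neuron_output(row, inputs) for row in weight_rows]

x = [1.0, 2.0]                                     # network inputs x1, x2
w_layer1 = [[0.1, 0.2], [0.3, -0.1], [0.2, 0.4]]   # w(xm)n for neurons 1..3
w_layer2 = [[0.5, -0.2, 0.3], [0.1, 0.4, -0.3]]    # wmn for neurons 4, 5
w_layer3 = [[0.6, -0.5]]                           # w46, w56 for neuron 6

y123 = layer_outputs(w_layer1, x)       # first-layer outputs y1, y2, y3
y45  = layer_outputs(w_layer2, y123)    # second-layer outputs y4, y5
y    = layer_outputs(w_layer3, y45)[0]  # network output y
```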

In the next step of the algorithm, the output signal y of the network is compared with the desired output value (the target) z, which is found in the training data set. The difference is called the error signal δ of the output layer neuron.

[Figure: the output error signal δ = z - y]
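
Continuing the sketch above, this step is a single subtraction (the value of z is a placeholder standing for the training-set entry):

```python
z = 1.0          # desired output z from the training set (placeholder value)
delta = z - y    # error signal of the output-layer neuron
```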

It is impossible to compute the error signal for internal neurons directly, because the desired output values of these neurons are unknown. For many years no effective method for training multilayer networks was known; only in the mid-eighties was the backpropagation algorithm worked out. Its idea is to propagate the error signal δ (computed in a single teaching step) back to all neurons whose output signals served as inputs to the neuron in question.

[Figure: propagating the error signal δ back toward the hidden neurons]

The weight coefficients wmn used to propagate the errors back are the same as the ones used when computing the output value; only the direction of data flow is changed (signals are propagated from the output towards the inputs, layer by layer). This technique is used for all network layers. If the propagated errors come from several neurons, they are summed, as illustrated below:

[Figures: backpropagation of the error signals through the hidden layers]
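
Continuing the sketch above, the backward pass spreads the output error through the same weights (the neuron numbering 1..6 follows the original figures; all values are illustrative):

```python
delta6 = delta                      # error signal at the output neuron

# Second layer: each neuron receives delta6 through its forward weight.
delta4 = w_layer3[0][0] * delta6    # via w46
delta5 = w_layer3[0][1] * delta6    # via w56

# First layer: errors arriving from several neurons are summed.
delta1 = w_layer2[0][0] * delta4 + w_layer2[1][0] * delta5   # via w14, w15
delta2 = w_layer2[0][1] * delta4 + w_layer2[1][1] * delta5   # via w24, w25
delta3 = w_layer2[0][2] * delta4 + w_layer2[1][2] * delta5   # via w34, w35
```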

When the error signal for each neuron has been computed, the weight coefficients of each neuron's input connections can be modified. In the formulas below, df(e)/de represents the derivative of the activation function of the neuron whose weights are being modified.

[Figures: modification of the weight coefficients]
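
For neuron 1, for example, the updates take the form w'(x1)1 = w(x1)1 + η·δ1·(df1(e)/de)·x1 and w'(x2)1 = w(x2)1 + η·δ1·(df1(e)/de)·x2. A sketch in code, continuing the example (the closed-form derivative below is tied to the sigmoid assumed earlier):

```python
eta = 0.1    # learning rate (illustrative value)

def sigmoid_derivative(y_out):
    # For the logistic sigmoid, df(e)/de = f(e) * (1 - f(e)) = y * (1 - y).
    return y_out * (1.0 - y_out)

# Modify the two input weights of neuron 1 (its inputs are x1 and x2).
w_layer1[0][0] += eta * delta1 * sigmoid_derivative(y123[0]) * x[0]
w_layer1[0][1] += eta * delta1 * sigmoid_derivative(y123[0]) * x[1]

# Deeper weights are updated the same way, using the neuron's own inputs,
# e.g. w14: w_layer2[0][0] += eta * delta4 * sigmoid_derivative(y45[0]) * y123[0]
```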

The coefficient η (the learning rate) affects the network teaching speed. There are a few techniques for selecting this parameter. The first method is to start the teaching process with a large value of the parameter, and to decrease it gradually while the weight coefficients are being established. The second, more complicated, method starts teaching with a small parameter value; as the teaching advances the parameter is increased, and it is decreased again in the final stage. Starting the teaching process with a low parameter value makes it possible to determine the signs of the weight coefficients.
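
A minimal sketch of the first technique (start with a large η and decrease it gradually; the schedule shape and constants are assumptions, not from the article):

```python
eta = 0.5            # start the teaching process with a large learning rate
decay = 0.995        # multiplicative decay per iteration (assumed schedule)

for step in range(1000):
    # ... one teaching iteration: forward pass, error backpropagation,
    #     weight modification using the current eta ...
    eta *= decay     # decrease eta gradually as the weights are established
```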
