Over the past couple of days I read the first chapter of the online book Neural Networks and Deep Learning and watched Stanford's open course on Machine Learning, studying the two main neuron models used in neural networks and an important algorithm in machine learning: stochastic gradient descent. A summary follows.
For a computational model to count as a neural network, it generally needs a large number of interconnected nodes (neurons) with two characteristics:
1. Each neuron processes the weighted inputs it receives from neighboring neurons through some particular output function (also called an activation function).
2. The strength of the signal passed between neurons is defined by so-called weights, and the algorithm keeps learning on its own, adjusting these weights.
On this basis, the neural network model is trained on large amounts of data.
A few concepts:
cost function: quantitatively measures how far the output computed for a given input deviates from the correct value
learning algorithm: self-corrects based on the value of the cost function, so as to find the optimal weights between neurons as quickly as possible
Perceptron neurons:

Figure 1: Perceptron neuron
Here x1, x2, x3 are the inputs, which must be binary (0 or 1), and the output is likewise binary. The weights w are the key, and the hard part, of the design. The perceptron computes its output as follows:

\[
\text{output} =
\begin{cases}
0 & \text{if } \sum_j w_j x_j \le \text{threshold} \\
1 & \text{if } \sum_j w_j x_j > \text{threshold}
\end{cases}
\]

Simplifying, with w and x denoting the weight and input vectors and the bias defined as b = -threshold:

\[
\text{output} =
\begin{cases}
0 & \text{if } w \cdot x + b \le 0 \\
1 & \text{if } w \cdot x + b > 0
\end{cases}
\]
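As a small illustrative sketch (not part of the book's code), the bias form of the perceptron rule can be written directly in Python; the weights, bias, and inputs below are made up for the example:

import numpy as np

def perceptron(x, w, b):
    # Perceptron rule: output 1 if w . x + b > 0, otherwise 0
    return 1 if np.dot(w, x) + b > 0 else 0

# Hypothetical example: two binary inputs, weights 2 and -1, bias b = -1
print(perceptron(np.array([1, 0]), np.array([2.0, -1.0]), -1.0))  # prints 1
print(perceptron(np.array([0, 1]), np.array([2.0, -1.0]), -1.0))  # prints 0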
Sigmoid neurons:

Figure 2: Sigmoid neurons

Comparing perceptron neurons with sigmoid neurons, their structure is the same, but the values they work with differ: the inputs of a sigmoid neuron can take any value between 0 and 1, and the output is not 0 or 1 but \(\sigma(w \cdot x + b)\), where \(\sigma\) is called the sigmoid function, defined as:

\[ \sigma(z) \equiv \frac{1}{1 + e^{-z}} \]

So for inputs x1, x2, ..., weights w1, w2, ..., and bias b, the output of a sigmoid neuron is:

\[ \frac{1}{1 + \exp\!\left(-\sum_j w_j x_j - b\right)} \]

From this formula we can plot the response curve of the sigmoid function, shown below.
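As a quick sketch (not from the book), the sigmoid can be evaluated with NumPy to see the smooth S-shaped curve the plot refers to; the sample points are arbitrary:

import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z)); squashes any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Values near 0 for very negative z, 0.5 at z = 0, and near 1 for large z
z = np.linspace(-6, 6, 5)
print(sigmoid(z))  # approximately [0.0025, 0.0474, 0.5, 0.9526, 0.9975]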
The architecture of neural networks
As shown in the figure above, a neural network consists of an input layer, an output layer, and one or more hidden layers. Such multilayer networks are called multilayer perceptrons, or MLPs.
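As an illustrative sketch, the list of layer sizes determines the shapes of an MLP's weight matrices and bias vectors (this mirrors how the Network class further below initializes itself); the [2, 3, 1] sizes are just an example:

import numpy as np

# A three-layer MLP: 2 input neurons, a hidden layer of 3, and 1 output neuron
sizes = [2, 3, 1]

# One bias vector per non-input layer; one weight matrix per pair of adjacent
# layers (rows = neurons in the later layer, columns = neurons in the earlier one)
biases = [np.random.randn(y, 1) for y in sizes[1:]]
weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]

print([w.shape for w in weights])  # [(3, 2), (1, 3)]
print([b.shape for b in biases])   # [(3, 1), (1, 1)]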
Gradient descent:
To check whether, for all training inputs x, the weights and biases we have chosen give outputs approximately equal to y(x), we use a cost function (also called a loss or objective function):

\[ C(w, b) \equiv \frac{1}{2n} \sum_x \| y(x) - a \|^2 \tag{6} \]

Here w denotes the collection of all weights in the network, b all the biases, n the total number of training inputs, and a the vector of outputs from the network (which depends on x, w, and b).
If C(w, b) ≈ 0, then y(x) is approximately equal to the output a for every training input x, which is exactly what we want.
If C(w, b) is large, then for many inputs y(x) is far from the output a.
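As an illustrative sketch (not part of the book's code), the quadratic cost in equation (6) can be computed directly with NumPy; the column-vector shapes and sample values below are assumptions of this example:

import numpy as np

def quadratic_cost(targets, outputs):
    # C(w, b) = 1/(2n) * sum over inputs of ||y(x) - a||^2
    n = len(targets)
    return sum(np.linalg.norm(y - a) ** 2
               for y, a in zip(targets, outputs)) / (2.0 * n)

# One made-up training input: desired output y and network output a
y = [np.array([[1.0], [0.0]])]
a = [np.array([[0.8], [0.1]])]
print(quadratic_cost(y, a))  # approximately 0.025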
The goal of the training algorithm is to minimize C(w, b); in other words, we want to find a set of weights w and biases b that makes C(w, b) as small as possible.
The algorithm we use for this is gradient descent.
We want to find the lowest point in the figure above. The tool is the gradient from calculus, which points in the direction of greatest rate of change, i.e., it tells us along which direction C(w, b) decreases fastest. This is the core idea of gradient descent. (Here v1 and v2 stand in for w and b.)
Let Δv1 and Δv2 denote small changes in the v1 and v2 directions, and ΔC the resulting change in C(v1, v2):

\[ \Delta C \approx \frac{\partial C}{\partial v_1} \Delta v_1 + \frac{\partial C}{\partial v_2} \Delta v_2 \tag{7} \]

The idea now is to choose Δv1 and Δv2 so that ΔC is negative, so that C keeps moving toward smaller values.
Define the gradient vector:

\[ \nabla C \equiv \left( \frac{\partial C}{\partial v_1}, \frac{\partial C}{\partial v_2} \right)^{T} \]

Equation (7) can then be rewritten as:

\[ \Delta C \approx \nabla C \cdot \Delta v \]

We choose

\[ \Delta v = -\eta \nabla C \]

where η is a small positive parameter (called the learning rate).
Substituting this back gives:

\[ \Delta C \approx -\eta \nabla C \cdot \nabla C = -\eta \|\nabla C\|^2 \le 0 \]

We can then update v over and over:

\[ v \to v' = v - \eta \nabla C \]
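As a minimal sketch of the update rule v → v' = v - η∇C, here it is applied to a toy cost C(v1, v2) = v1^2 + v2^2, whose gradient is known in closed form (this toy cost is only for illustration, not the network's cost):

import numpy as np

def grad_C(v):
    # Gradient of the toy cost C(v1, v2) = v1^2 + v2^2
    return 2 * v

eta = 0.1                    # learning rate
v = np.array([3.0, -2.0])    # arbitrary starting point
for step in range(50):
    v = v - eta * grad_C(v)  # v -> v' = v - eta * grad C

print(v)  # very close to the minimum at [0, 0]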
How is gradient descent applied to a neural network? We use it to repeatedly adjust the weights w and biases b so that the cost function (6) is minimized. The update rules are:

\[ w_k \to w_k' = w_k - \eta \frac{\partial C}{\partial w_k} \]
\[ b_l \to b_l' = b_l - \eta \frac{\partial C}{\partial b_l} \]
Stochastic gradient descent (SGD)
Plain gradient descent learns very slowly when the number of training inputs is large, because every update requires the gradient over all of them. To speed learning up, a new algorithm, stochastic gradient descent, is used instead: it estimates the gradient ∇C from a small, randomly chosen sample of training inputs,

\[ \nabla C \approx \frac{1}{m} \sum_{j=1}^{m} \nabla C_{X_j} \]

where m is the number of randomly chosen training inputs, and the samples X1, X2, ..., Xm are called a mini-batch.
This yields the update rules:

\[ w_k \to w_k' = w_k - \frac{\eta}{m} \sum_j \frac{\partial C_{X_j}}{\partial w_k} \]
\[ b_l \to b_l' = b_l - \frac{\eta}{m} \sum_j \frac{\partial C_{X_j}}{\partial b_l} \]
Code implementing the network above for simple handwritten digit recognition:
"""
network.py
~~~~~~~~~~

A module to implement the stochastic gradient descent learning
algorithm for a feedforward neural network. Gradients are calculated
using backpropagation. Note that I have focused on making the code
simple, easily readable, and easily modifiable. It is not optimized,
and omits many desirable features.
"""

#### Libraries
# Standard library
import random

# Third-party libraries
import numpy as np

class Network(object):

    def __init__(self, sizes):
        """The list ``sizes`` contains the number of neurons in the
        respective layers of the network. For example, if the list
        was [2, 3, 1] then it would be a three-layer network, with the
        first layer containing 2 neurons, the second layer 3 neurons,
        and the third layer 1 neuron. The biases and weights for the
        network are initialized randomly, using a Gaussian
        distribution with mean 0, and variance 1. Note that the first
        layer is assumed to be an input layer, and by convention we
        won't set any biases for those neurons, since biases are only
        ever used in computing the outputs from later layers."""
        self.num_layers = len(sizes)
        self.sizes = sizes
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(sizes[:-1], sizes[1:])]

    def feedforward(self, a):
        """Return the output of the network if ``a`` is input."""
        for b, w in zip(self.biases, self.weights):
            a = sigmoid(np.dot(w, a)+b)
        return a

    def SGD(self, training_data, epochs, mini_batch_size, eta,
            test_data=None):
        """Train the neural network using mini-batch stochastic
        gradient descent. The ``training_data`` is a list of tuples
        ``(x, y)`` representing the training inputs and the desired
        outputs. The other non-optional parameters are
        self-explanatory. If ``test_data`` is provided then the
        network will be evaluated against the test data after each
        epoch, and partial progress printed out. This is useful for
        tracking progress, but slows things down substantially."""
        if test_data: n_test = len(test_data)
        n = len(training_data)
        for j in xrange(epochs):
            random.shuffle(training_data)
            mini_batches = [
                training_data[k:k+mini_batch_size]
                for k in xrange(0, n, mini_batch_size)]
            for mini_batch in mini_batches:
                self.update_mini_batch(mini_batch, eta)
            if test_data:
                print "Epoch {0}: {1} / {2}".format(
                    j, self.evaluate(test_data), n_test)
            else:
                print "Epoch {0} complete".format(j)

    def update_mini_batch(self, mini_batch, eta):
        """Update the network's weights and biases by applying
        gradient descent using backpropagation to a single mini batch.
        The ``mini_batch`` is a list of tuples ``(x, y)``, and ``eta``
        is the learning rate."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        self.weights = [w-(eta/len(mini_batch))*nw
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b-(eta/len(mini_batch))*nb
                       for b, nb in zip(self.biases, nabla_b)]

    def backprop(self, x, y):
        """Return a tuple ``(nabla_b, nabla_w)`` representing the
        gradient for the cost function C_x. ``nabla_b`` and
        ``nabla_w`` are layer-by-layer lists of numpy arrays, similar
        to ``self.biases`` and ``self.weights``."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # feedforward
        activation = x
        activations = [x] # list to store all the activations, layer by layer
        zs = [] # list to store all the z vectors, layer by layer
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation)+b
            zs.append(z)
            activation = sigmoid(z)
            activations.append(activation)
        # backward pass
        delta = self.cost_derivative(activations[-1], y) * \
            sigmoid_prime(zs[-1])
        nabla_b[-1] = delta
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())
        # Note that the variable l in the loop below is used a little
        # differently to the notation in Chapter 2 of the book. Here,
        # l = 1 means the last layer of neurons, l = 2 is the
        # second-last layer, and so on. It's a renumbering of the
        # scheme in the book, used here to take advantage of the fact
        # that Python can use negative indices in lists.
        for l in xrange(2, self.num_layers):
            z = zs[-l]
            sp = sigmoid_prime(z)
            delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
            nabla_b[-l] = delta
            nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
        return (nabla_b, nabla_w)

    def evaluate(self, test_data):
        """Return the number of test inputs for which the neural
        network outputs the correct result. Note that the neural
        network's output is assumed to be the index of whichever
        neuron in the final layer has the highest activation."""
        test_results = [(np.argmax(self.feedforward(x)), y)
                        for (x, y) in test_data]
        return sum(int(x == y) for (x, y) in test_results)

    def cost_derivative(self, output_activations, y):
        """Return the vector of partial derivatives \partial C_x /
        \partial a for the output activations."""
        return (output_activations-y)

#### Miscellaneous functions
def sigmoid(z):
    """The sigmoid function."""
    return 1.0/(1.0+np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid function."""
    return sigmoid(z)*(1-sigmoid(z))
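To actually train this network on MNIST, the usage from chapter 1 of the book looks roughly like the following. It assumes the mnist_loader module that ships with the book's code repository is available alongside network.py, and, like network.py itself, it is written for Python 2 (the code above uses xrange and print statements):

import mnist_loader
import network

# Load MNIST as lists of (input, label) pairs formatted for this network
training_data, validation_data, test_data = mnist_loader.load_data_wrapper()

# 784 input pixels (28x28 images), one hidden layer of 30 neurons, 10 output classes
net = network.Network([784, 30, 10])

# 30 epochs, mini-batch size 10, learning rate eta = 3.0,
# reporting accuracy on the test data after each epoch
net.SGD(training_data, 30, 10, 3.0, test_data=test_data)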
Time to keep charging ahead: object-oriented programming in C++ / Python, and building a systematic picture of neural networks!