Some Intuitive Explanations of Convolutional Neural Networks
I just finished reading a blog post: "An Intuitive Explanation of Convolutional Neural Networks". The post is fairly long, so I have excerpted the parts I found most interesting below.
Original post: (link)
Chinese translation of the post: (link)
The original English post is written in a very readable style; for convenience, a Chinese translation is linked above.
The Convolution Step
In the table below, we can see the effects of convolution of the above image with different filters. As shown, we can perform operations such as Edge Detection, Sharpen and Blur just by changing the numeric values of our filter matrix before the convolution operation [8] – this means that different filters can detect different features from an image, for example edges, curves etc. More such examples are available in Section 8.2.4 here.
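To make the filter idea concrete, here is a minimal NumPy sketch of the convolution step described above; the 3×3 edge-detection, sharpen and box-blur kernels are the usual textbook values, and the 8×8 random array simply stands in for a grayscale image:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a kernel over a 2-D image (stride 1, no padding) and sum the element-wise products."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# Classic 3x3 filters: only the numbers change, the sliding operation stays the same.
edge_detect = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]])
blur = np.full((3, 3), 1 / 9)   # box blur: average of the neighbourhood

image = np.random.rand(8, 8)    # stands in for a grayscale image
for name, k in (("edge detect", edge_detect), ("sharpen", sharpen), ("blur", blur)):
    print(name, conv2d(image, k).shape)  # each filter yields a 6x6 feature map
```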
Another good way to understand the Convolution operation is by looking at the animation in Figure 6 below:
The Pooling Step
The function of Pooling is to progressively reduce the spatial size of the input representation [4]. In particular, pooling
- makes the input representations (feature dimension) smaller and more manageable
- reduces the number of parameters and computations in the network, therefore, controlling overfitting [4]
- makes the network invariant to small transformations, distortions and translations in the input image (a small distortion in input will not change the output of Pooling – since we take the maximum / average value in a local neighborhood).
- helps us arrive at an almost scale invariant representation of our image (the exact term is “equivariant”). This is very powerful since we can detect objects in an image no matter where they are located (read [18] and [19] for details).
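A rough NumPy sketch of max pooling over a single feature map may help here; the 4×4 input and the 2×2 window with stride 2 are just illustrative choices:

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    """Replace each size x size window with its maximum value."""
    h, w = feature_map.shape
    out = np.zeros(((h - size) // stride + 1, (w - size) // stride + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            window = feature_map[y * stride:y * stride + size,
                                 x * stride:x * stride + size]
            out[y, x] = window.max()
    return out

fmap = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool(fmap))  # 4x4 -> 2x2: [[ 5.  7.] [13. 15.]]
# Nudging a value around inside a 2x2 window usually leaves the window maximum
# unchanged, which is the robustness to small translations mentioned above.
```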
Fully Connected Layer
The output from the convolutional and pooling layers represent high-level features of the input image. The purpose of the Fully Connected layer is to use these features for classifying the input image into various classes based on the training dataset. For example, the image classification task we set out to perform has four possible outputs as shown in Figure 14 below (note that Figure 14 does not show connections between the nodes in the fully connected layer).
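As a rough illustration (not the original post's code), the sketch below flattens some made-up pooled feature maps and feeds them through a single fully connected layer followed by a softmax, giving one probability per class; the four classes and the feature-map sizes are assumptions for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend the last pooling layer produced 8 feature maps of size 4x4 (made-up sizes).
pooled = rng.random((8, 4, 4))
flat = pooled.reshape(-1)                      # flatten to a 128-dimensional vector

num_classes = 4                                # four possible outputs, as in the figure
W = rng.normal(scale=0.01, size=(num_classes, flat.size))
b = np.zeros(num_classes)

scores = W @ flat + b                          # one raw score per class
probs = np.exp(scores - scores.max())
probs /= probs.sum()                           # softmax: probabilities that sum to 1
print(probs, probs.sum())
```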
Apart from classification, adding a fully-connected layer is also a (usually) cheap way of learning non-linear combinations of these features. Most of the features from convolutional and pooling layers may be good for the classification task, but combinations of those features might be even better [11].
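To illustrate that point, the sketch below adds one hidden fully connected layer with a ReLU in front of the classifier; the layer sizes (128 → 64 → 4) are arbitrary choices for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
flat = rng.random(128)                         # flattened conv/pool features, as above

# One hidden fully connected layer with a ReLU: the second weight matrix now scores
# non-linear combinations of the pooled features instead of the raw features alone.
W1, b1 = rng.normal(scale=0.01, size=(64, 128)), np.zeros(64)
W2, b2 = rng.normal(scale=0.01, size=(4, 64)), np.zeros(4)

hidden = np.maximum(0.0, W1 @ flat + b1)       # ReLU non-linearity
scores = W2 @ hidden + b2                      # still four class scores
print(scores.shape)
```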