ImageNet Classification with Deep Convolutional Neural Networks - AlexNet Reading Notes

ImageNet Classification with Deep Convolutional Neural Networks-AlexNet

authors: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton


AlexNet

Ⅰ New contributions:

1.    a highly-optimized GPU implementation of 2D convolution

2.    techniques to improve performance and reduce training time

3.    methods for preventing overfitting

 

Ⅱ Dataset: ImageNet

ImageNet is a dataset of over 15 million labeled high-resolution images belonging to roughly 22,000 categories. The images were collected from the web and labeled by human labelers using Amazon’s Mechanical Turk crowd-sourcing tool.

 

Ⅲ Features:

1.    ReLU Nonlinearity

Rectified Linear Units (ReLUs)

non-saturating neurons (my current understanding: their output has no bounded range, unlike saturating neurons such as the sigmoid, whose output is confined to [0, 1]); this is the change that speeds up training the most (a small illustration follows below)

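A minimal numpy illustration of the two activation types (function names are mine, for illustration only):

```python
import numpy as np

def relu(x):
    # Non-saturating: unbounded above, so the gradient does not
    # vanish for large positive inputs
    return np.maximum(0.0, x)

def sigmoid(x):
    # Saturating: output squashed into (0, 1); the gradient goes
    # to zero as |x| grows, which slows down training
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(relu(x))     # [ 0.  0.  0.  1. 10.]
print(sigmoid(x))  # values crowd toward 0 and 1 at the extremes
```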

2.    Training on Multiple GPUs

spread the net across two GPUs

employ a parallelization scheme

the GPUs communicate only in certain layers

3.    Local Response Normalization (LRN)


(Not fully understood:) a form of lateral inhibition, normalizing across adjacent kernel maps, applied after the ReLU? The paper's formula is reproduced below.
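For reference, the paper's normalization: the ReLU output $a^i_{x,y}$ of kernel $i$ at position $(x, y)$ is divided by a term that grows with the squared activations of the $n$ adjacent kernel maps at the same spatial position:

$$ b^i_{x,y} = a^i_{x,y} \Big/ \Big( k + \alpha \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} \big( a^j_{x,y} \big)^2 \Big)^{\beta} $$

where $N$ is the total number of kernels in the layer, and the paper uses $k = 2$, $n = 5$, $\alpha = 10^{-4}$, $\beta = 0.75$ (chosen on a validation set).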

4.    Overlapping Pooling

Models with overlapping pooling are slightly more difficult to overfit.

s = 2, z = 3 (the stride s is smaller than the pooling window z, so neighboring pooling windows overlap)

Overlapping pooling improves performance; a quick geometry check follows below.
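A sketch in PyTorch of the z = 3, s = 2 pooling geometry (the output side length is ⌊(W − z)/s⌋ + 1):

```python
import torch
import torch.nn as nn

# z = 3, s = 2: window larger than stride, so neighboring windows overlap
pool = nn.MaxPool2d(kernel_size=3, stride=2)

x = torch.randn(1, 96, 55, 55)  # e.g., the C1 feature maps
print(pool(x).shape)            # torch.Size([1, 96, 27, 27]): (55 - 3)//2 + 1 = 27
```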

5.    Reducing Overfitting

Data Augmentation: label-preserving transformations; at test time, extract ten patches.

Data augmentation: at training time, cropping and horizontally flipping the original images yields many patches; at test time, ten patches (the four corner crops and the center crop, plus their horizontal reflections) are extracted and the predictions averaged. A sketch follows below.
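A sketch of the test-time ten-patch extraction, assuming the paper's 256×256 stored images and 224×224 patches (the helper name is mine):

```python
import numpy as np

def ten_crops(img, size=224):
    # img: H x W x C array, e.g., 256 x 256 x 3
    h, w = img.shape[:2]
    corners = [(0, 0), (0, w - size), (h - size, 0), (h - size, w - size),
               ((h - size) // 2, (w - size) // 2)]  # 4 corners + center
    crops = [img[y:y + size, x:x + size] for y, x in corners]
    crops += [c[:, ::-1] for c in crops]  # horizontal reflections
    return crops  # 10 patches; the softmax predictions are averaged over them

patches = ten_crops(np.zeros((256, 256, 3)))
assert len(patches) == 10 and patches[0].shape == (224, 224, 3)
```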

Dropout: setting to zero the output of each hidden neuron with probability 0.5. The neurons which are “dropped out” in this way do not contribute to the forward pass and do not participate in back-propagation.

During training, each hidden neuron is switched off with probability p (typically 0.5); at test time all neurons are used, but their outputs are multiplied by 0.5.

Dropout is used in the first two fully-connected layers (a sketch follows below).
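A minimal numpy sketch of this train/test asymmetry (function names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_train(h, p=0.5):
    # Zero each hidden activation independently with probability p;
    # dropped units take no part in the forward or backward pass.
    mask = rng.random(h.shape) >= p
    return h * mask

def dropout_test(h, p=0.5):
    # Use all neurons, but scale outputs so the expected input
    # to the next layer matches training time.
    return h * (1.0 - p)
```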

 

IV  Overall Architecture

● 8 layers: 5 convolutional + 3 fully-connected

● The output of the last fully-connected layer is fed to a 1000-way softmax which produces a distribution over the 1000 class labels.

● Maximizes the multinomial logistic regression objective.

● The kernels of the second, fourth, and fifth convolutional layers are connected only to those kernel maps in the previous layer which reside on the same GPU. The kernels of the third convolutional layer are connected to all kernel maps in the second layer.

● Response-normalization layers (LRN) follow the first and second convolutional layers.

● Max-pooling layers follow both response-normalization layers as well as the fifth convolutional layer.

● The ReLU non-linearity is applied to the output of every convolutional and fully-connected layer.


Network architecture (per-GPU feature-map sizes; ×2 denotes the two-GPU split; a PyTorch sketch follows the list):

Input: 224×224×3

C1: 96 kernels of size 11×11×3, stride = 4 (conv + LRN + pooling), 55×55×48 ×2

response-normalized and pooled: 27×27×48 ×2

C2: 256 kernels of size 5×5×48 (conv + LRN + pooling), 27×27×128 ×2

response-normalized and pooled: 13×13×128 ×2

C3: 384 kernels of size 3×3×256 (conv only), 13×13×192 ×2

C4: 384 kernels of size 3×3×192 (conv only), 13×13×192 ×2

C5: 256 kernels of size 3×3×192 (conv only), 13×13×128 ×2

FC6: 4096 neurons (fully connected), 2048 ×2

FC7: 4096 neurons (fully connected), 2048 ×2

FC8: 1000 neurons (fully connected + softmax over the 1000 classes)
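A single-GPU PyTorch sketch of this architecture, merging the two GPU halves into full channel counts and padding C1 as common reimplementations do, so the feature-map sizes come out as in the paper; an illustration, not the authors' code:

```python
import torch
import torch.nn as nn

class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        lrn = lambda: nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2), nn.ReLU(inplace=True),
            lrn(), nn.MaxPool2d(kernel_size=3, stride=2),   # C1 -> 96 x 27 x 27
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            lrn(), nn.MaxPool2d(kernel_size=3, stride=2),   # C2 -> 256 x 13 x 13
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),  # C3
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),  # C4
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),  # C5
            nn.MaxPool2d(kernel_size=3, stride=2),          # -> 256 x 6 x 6
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),  # FC6
            nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(inplace=True),         # FC7
            nn.Linear(4096, num_classes),  # FC8: 1000-way; softmax is applied in the loss
        )

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

logits = AlexNet()(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000])
```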

 

V  Details of Learning

Training uses stochastic gradient descent with a batch size of 128 examples, momentum of 0.9, and weight decay of 0.0005; the update rule is reproduced below.
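The paper's update rule for a weight $w$ with learning rate $\epsilon$ (note that the weight decay appears directly in the momentum update):

$$ v_{i+1} := 0.9 \, v_i - 0.0005 \, \epsilon \, w_i - \epsilon \, \Big\langle \frac{\partial L}{\partial w} \Big|_{w_i} \Big\rangle_{D_i}, \qquad w_{i+1} := w_i + v_{i+1} $$

where $v$ is the momentum variable and the gradient of the loss $L$ is averaged over the $i$-th batch $D_i$.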


VI  Discussion

1.    depth really is important: the paper notes that removing any single convolutional layer degrades performance

2.    temporal structure? (the paper closes by suggesting very large deep convolutional nets on video sequences, where the temporal structure supplies information that is missing or far less obvious in static images)