ImageNet Evolution Paper Notes (4)

Deep Residual Learning for Image Recognition

Degradation problem: with the network depth increasing, accuracy gets saturated (which might be unsurprising) and then degrades rapidly. [ResNet removes this side effect of increasing depth (the degradation problem), so that network performance can be improved simply by making the network deeper.]

Deep Residual Learning

Identity Mapping by Shortcuts
We explicitly let these layers approximate a residual function F(x) := H(x) − x. The original function thus becomes F(x) + x.
In the residual-network diagrams, shortcut connections with matching dimensions are drawn as solid lines, and those with mismatched dimensions as dashed lines. When the dimensions do not match, there are two options for the identity mapping: (A) increase the dimensionality (channels) directly with zero padding, which adds no parameters; (B) multiply by a projection matrix Ws to map into the new space, implemented as a 1×1 convolution whose number of filters is changed directly. Option B introduces extra parameters.
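A minimal PyTorch sketch of the two options (my own illustration, not the paper's code; the strided-slice downsampling in option A is one possible implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def option_a(x, out_channels, stride=2):
    """Option A: parameter-free identity shortcut with zero padding."""
    x = x[:, :, ::stride, ::stride]          # spatial downsampling
    pad = out_channels - x.size(1)           # extra channels needed
    return F.pad(x, (0, 0, 0, 0, 0, pad))    # zero-pad the channel dim

def option_b(in_channels, out_channels, stride=2):
    """Option B: projection shortcut Ws, a 1x1 conv with more filters."""
    return nn.Conv2d(in_channels, out_channels, kernel_size=1,
                     stride=stride, bias=False)
```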
Network Architectures
1. Only 3×3 convolution kernels are used; convolutions with stride 2 replace pooling for downsampling; Batch Normalization [a normalization layer between convolution and activation] is used.
2. Max pooling, hidden fully connected layers, and Dropout are removed (see the sketch after this list).
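Putting these rules together, a minimal PyTorch sketch of a basic residual block (my own illustration, assuming the projection shortcut for dashed-line connections):

```python
import torch.nn as nn

class BasicBlock(nn.Module):
    """Two 3x3 convs, BN right after each conv and before the ReLU,
    stride-2 conv (not pooling) for downsampling, no Dropout."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, 1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        self.shortcut = nn.Identity()          # solid-line shortcut
        if stride != 1 or in_ch != out_ch:     # dashed-line: option B
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride, bias=False),
                nn.BatchNorm2d(out_ch))

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))  # H(x) = F(x) + x
```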
Deeper networks: the residual mapping is optimized with a bottleneck design [original: 3×3×256×256 → 3×3×256×256; bottleneck: 1×1×256×64 → 3×3×64×64 → 1×1×64×256]. This cuts the weights per block from 2 × (3·3·256·256) ≈ 1.18M down to 16,384 + 36,864 + 16,384 ≈ 70K.
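A sketch of the bottleneck block under the same assumptions:

```python
import torch.nn as nn

class Bottleneck(nn.Module):
    """Bottleneck branch: 1x1 reduces 256->64, 3x3 at 64, 1x1 restores 64->256.
    Weights: 16,384 + 36,864 + 16,384 ~= 70K vs ~1.18M for two 3x3x256x256 convs."""
    def __init__(self, channels=256, mid=64):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(channels, mid, 1, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=1, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1, bias=False),
            nn.BatchNorm2d(channels))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.branch(x) + x)
```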

Experimental settings

1. The image is resized with its shorter side randomly sampled in [256, 480] for scale augmentation. A 224×224 crop is randomly sampled from the image or its horizontal flip, with the per-pixel mean subtracted.
2. Standard color augmentation is used; batch normalization (BN) is adopted right after each convolution and before activation.
3. SGD with a mini-batch size of 256. The learning rate starts from 0.1 and is divided by 10 when the error plateaus, and the models are trained for up to 60 × 10⁴ iterations.
4. A weight decay of 0.0001 and a momentum of 0.9 are used (see the sketch after this list).
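These settings map roughly onto the following PyTorch sketch (my approximation: resnet34 is just a stand-in model, the per-channel Normalize means stand in for the paper's per-pixel mean image, and ReduceLROnPlateau stands in for the hand-tuned learning-rate drops):

```python
import torch
import torchvision
import torchvision.transforms as T

model = torchvision.models.resnet34()

# Scale augmentation: shorter side resized to a random value in [256, 480],
# then a random 224x224 crop with random horizontal flip.
train_transform = T.Compose([
    T.Lambda(lambda im: T.Resize(torch.randint(256, 481, (1,)).item())(im)),
    T.RandomCrop(224),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    # Per-channel means as a stand-in for the paper's per-pixel mean image.
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[1.0, 1.0, 1.0]),
])

optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
# Divide the learning rate by 10 when the validation error plateaus.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1)
```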
Testing
1. Adopt the standard 10-crop testing.
2. Average the scores at multiple scales (images are resized such that the shorter side is in {224, 256, 384, 480, 640}); a sketch follows.
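A possible implementation of this test-time protocol (my own sketch; model is any trained classifier, img a PIL image):

```python
import torch
import torchvision.transforms as T

SCALES = [224, 256, 384, 480, 640]

@torch.no_grad()
def multi_scale_10crop(model, img):
    """Average class scores over 10 crops at each of the five test scales."""
    model.eval()
    per_scale = []
    for s in SCALES:
        resized = T.Resize(s)(img)          # shorter side -> s
        crops = T.TenCrop(224)(resized)     # 4 corners + center, and flips
        batch = torch.stack([T.ToTensor()(c) for c in crops])
        per_scale.append(model(batch).softmax(dim=1).mean(dim=0))
    return torch.stack(per_scale).mean(dim=0)
```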