ImageNet Evolution Paper Notes (4)

Deep Residual Learning for Image Recognition

Degradation problem: with the network depth increasing, accuracy gets saturated (which might be unsurprising) and then degrades rapidly. [ResNet removes this side effect of increasing depth (the degradation problem), so that network performance can be improved simply by making the network deeper.]

Deep Residual Learning

Identity Mapping by Shortcuts
We explicitly let these layers approximate a residual function F(x) := H(x) − x. The original function thus becomes F(x) + x.
In the residual-network diagrams, shortcut connections with matching dimensions are drawn as solid lines, and those with mismatched dimensions as dashed lines. When the dimensions do not match, there are two options for the identity mapping: (A) increase the dimensionality (channels) directly with zero padding, which adds no parameters; (B) multiply by a projection matrix Ws to map into the new space, implemented as a 1×1 convolution whose number of filters is changed directly. Option B introduces extra parameters.
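A minimal PyTorch sketch of the two options (my own illustration, not the paper's code; the strided-slice downsampling in option A is one possible implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def option_a(x, out_channels, stride=2):
    """Option A: parameter-free identity shortcut with zero padding."""
    x = x[:, :, ::stride, ::stride]          # spatial downsampling
    pad = out_channels - x.size(1)           # extra channels needed
    return F.pad(x, (0, 0, 0, 0, 0, pad))    # zero-pad the channel dim

def option_b(in_channels, out_channels, stride=2):
    """Option B: projection shortcut Ws, a 1x1 conv with more filters."""
    return nn.Conv2d(in_channels, out_channels, kernel_size=1,
                     stride=stride, bias=False)
```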
Network Architectures
1. Only 3×3 convolution kernels are used; convolutions with stride 2 replace pooling for downsampling; Batch Normalization [a normalization layer between convolution and activation] is used.
2. Max pooling, hidden fully connected layers, and Dropout are removed (see the sketch after this list).
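Putting these rules together, a minimal PyTorch sketch of a basic residual block (my own illustration, assuming the projection shortcut for dashed-line connections):

```python
import torch.nn as nn

class BasicBlock(nn.Module):
    """Two 3x3 convs, BN right after each conv and before the ReLU,
    stride-2 conv (not pooling) for downsampling, no Dropout."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, 1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        self.shortcut = nn.Identity()          # solid-line shortcut
        if stride != 1 or in_ch != out_ch:     # dashed-line: option B
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride, bias=False),
                nn.BatchNorm2d(out_ch))

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))  # H(x) = F(x) + x
```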
Deeper networks: the residual mapping is optimized with a bottleneck design [original: 3×3×256×256 → 3×3×256×256; bottleneck: 1×1×256×64 → 3×3×64×64 → 1×1×64×256]. This cuts the weights per block from 2 × (3·3·256·256) ≈ 1.18M down to 16,384 + 36,864 + 16,384 ≈ 70K.
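A sketch of the bottleneck block under the same assumptions:

```python
import torch.nn as nn

class Bottleneck(nn.Module):
    """Bottleneck branch: 1x1 reduces 256->64, 3x3 at 64, 1x1 restores 64->256.
    Weights: 16,384 + 36,864 + 16,384 ~= 70K vs ~1.18M for two 3x3x256x256 convs."""
    def __init__(self, channels=256, mid=64):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(channels, mid, 1, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=1, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1, bias=False),
            nn.BatchNorm2d(channels))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.branch(x) + x)
```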

Experimental settings

1. The image is resized with its shorter side randomly sampled in [256, 480] for scale augmentation. A 224×224 crop is randomly sampled from the image or its horizontal flip, with the per-pixel mean subtracted.
2. Standard color augmentation is used; batch normalization (BN) is adopted right after each convolution and before activation.
3. SGD with a mini-batch size of 256. The learning rate starts from 0.1 and is divided by 10 when the error plateaus, and the models are trained for up to 60 × 10⁴ iterations.
4. A weight decay of 0.0001 and a momentum of 0.9 are used (see the sketch after this list).
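These settings map roughly onto the following PyTorch sketch (my approximation: resnet34 is just a stand-in model, the per-channel Normalize means stand in for the paper's per-pixel mean image, and ReduceLROnPlateau stands in for the hand-tuned learning-rate drops):

```python
import torch
import torchvision
import torchvision.transforms as T

model = torchvision.models.resnet34()

# Scale augmentation: shorter side resized to a random value in [256, 480],
# then a random 224x224 crop with random horizontal flip.
train_transform = T.Compose([
    T.Lambda(lambda im: T.Resize(torch.randint(256, 481, (1,)).item())(im)),
    T.RandomCrop(224),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    # Per-channel means as a stand-in for the paper's per-pixel mean image.
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[1.0, 1.0, 1.0]),
])

optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
# Divide the learning rate by 10 when the validation error plateaus.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1)
```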
Testing
1. Adopt the standard 10-crop testing.
2. Average the scores at multiple scales (images are resized such that the shorter side is in {224, 256, 384, 480, 640}); a sketch follows.
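A possible implementation of this test-time protocol (my own sketch; model is any trained classifier, img a PIL image):

```python
import torch
import torchvision.transforms as T

SCALES = [224, 256, 384, 480, 640]

@torch.no_grad()
def multi_scale_10crop(model, img):
    """Average class scores over 10 crops at each of the five test scales."""
    model.eval()
    per_scale = []
    for s in SCALES:
        resized = T.Resize(s)(img)          # shorter side -> s
        crops = T.TenCrop(224)(resized)     # 4 corners + center, and flips
        batch = torch.stack([T.ToTensor()(c) for c in crops])
        per_scale.append(model(batch).softmax(dim=1).mean(dim=0))
    return torch.stack(per_scale).mean(dim=0)
```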