Inception Series 3_Inception-v4: Inception-ResNet and the Impact of Residual Connections on Learning

Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
Authors: Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke

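There is clear empirical evidence that training with residual connections significantly accelerates the training of Inception networks.
The paper proposes two kinds of models: Inception-v4 and Inception-ResNet.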

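The ResNet authors argue that residual connections are essential for training very deep networks. The experiments in this paper do not support that claim, but they do confirm that residual connections greatly speed up training.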

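Inception-v4 is shown in Figures 3 through 9. Convolutions not marked with "V" are padded, so the output feature map keeps the size of the input. Convolutions marked with "V" are not padded (valid), so the feature map shrinks after the convolution.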
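The effect of the "V" marking on feature map size can be checked with a few lines of arithmetic. The following is a minimal sketch, not code from the paper: the function name `output_size` is made up for illustration, the padded case assumes the TensorFlow "same" convention (output size is the input size divided by the stride, rounded up), and the examples assume the 3 × 3, stride-2 windows used in the paper's stem and reduction figures.

```python
import math


def output_size(size, kernel, stride=1, padding="same"):
    """Spatial output size of a convolution or pooling window along one axis.

    Assumption: "same" follows the TensorFlow convention, "valid" means
    no padding (the layers marked "V" in the figures).
    """
    if padding == "same":
        return math.ceil(size / stride)
    if padding == "valid":
        return (size - kernel) // stride + 1
    raise ValueError("padding must be 'same' or 'valid'")


# A padded 3x3 convolution with stride 1 keeps the grid size.
print(output_size(35, 3, stride=1, padding="same"))    # 35
# A "V" 3x3 window with stride 2 shrinks the grid.
print(output_size(299, 3, stride=2, padding="valid"))  # 149
print(output_size(35, 3, stride=2, padding="valid"))   # 17
print(output_size(17, 3, stride=2, padding="valid"))   # 8
```

The last two calls reproduce the 35 × 35 to 17 × 17 and 17 × 17 to 8 × 8 reductions that the reduction modules perform.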

The overall architecture of the pure Inception-v4 network is shown in Figure 9.
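The stem (input module) is shown in Figure 3.

The Inception-A module is shown in Figure 4.

The Inception-B module is shown in Figure 5.

The Inception-C module is shown in Figure 6.

The reduction module that changes the feature map size from 35 × 35 to 17 × 17 is shown in Figure 7.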
The parameters k, l, m, n are listed in Table 1.

The reduction module that changes the feature map size from 17 × 17 to 8 × 8 is shown in Figure 8.

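Residual Inception modules
The authors provide two networks: Inception-ResNet-v1, whose computational cost is similar to that of Inception-v3, and Inception-ResNet-v2, whose computational cost is similar to that of Inception-v4 (although in practice the step time of Inception-v4 proved significantly slower). See Figure 15.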
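One point worth noting: in Inception-ResNet, the authors apply batch normalization (BN) only on top of the traditional layers, not on top of the summations. Using BN everywhere would likely be better, but the authors wanted each model replica to remain trainable on a single GPU. BN on top of the summation layers consumes a great deal of memory, so omitting it there made it possible to substantially increase the number of Inception modules. The authors expect the additional Inception modules to offset the loss from dropping those BN layers.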

Modules used in Inception-ResNet-v1

Modules used in Inception-ResNet-v2

Scaling of the Residuals
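The authors found that once the number of filters exceeds 1000, the residual variants become highly unstable and the network "dies" early in training: the last layer before the average pooling starts to produce only zeros after a few tens of thousands of iterations. Neither lowering the learning rate nor adding an extra BN to this layer prevents this.
The authors found that scaling the residuals down before adding them to the previous layer's activation stabilizes training. The scaling factors were generally chosen between 0.1 and 0.3. The scaled residuals are then added to the accumulated layer activations, as shown in Figure 20.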

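Very deep residual networks show similar instability. He et al. suggest a two-phase schedule: a warm-up phase with a very low learning rate, followed by training with a high learning rate. The authors found, however, that when the number of filters is very large, even a learning rate as low as 0.00001 is not enough to stabilize the network.
Even where the scaling is not strictly necessary, it does not reduce the network's accuracy, and it helps stabilize training.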
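The residual scaling described in this section amounts to multiplying the output of the residual branch by a small constant before the addition. The following is a minimal sketch on plain Python lists rather than real feature maps. The names `relu` and `scaled_residual_add` are made up for illustration, and applying the ReLU after the summation is an assumption about the block layout, not something stated in the text above.

```python
def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in values]


def scaled_residual_add(shortcut, residual, scale=0.1):
    """Add a scaled-down residual branch output to the shortcut activations.

    shortcut: accumulated activations from the previous layer
    residual: output of the residual (Inception) branch
    scale:    scaling factor, between 0.1 and 0.3 per the text above
    """
    if len(shortcut) != len(residual):
        raise ValueError("shortcut and residual must have the same length")
    summed = [s + scale * r for s, r in zip(shortcut, residual)]
    return relu(summed)  # assumption: activation applied after the sum


shortcut = [1.0, -2.0, 0.5]
residual = [10.0, 10.0, -10.0]
print(scaled_residual_add(shortcut, residual, scale=0.2))  # [3.0, 0.0, 0.0]
```

With `scale=1.0` the same inputs give `[11.0, 8.0, 0.0]`, which illustrates how an unscaled residual branch can dominate the accumulated activations.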

Training Methodology
Training used TensorFlow. The optimizer was RMSProp with a decay of 0.9 and ε = 1.0. The learning rate was 0.045, decayed every two epochs at an exponential rate of 0.94.
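The learning rate schedule above can be written as a small function. This is a sketch under one assumption: the decay is applied as a step (staircase) at every second epoch rather than continuously. The function name `learning_rate` and its parameter names are made up for illustration.

```python
def learning_rate(epoch, base_lr=0.045, decay_rate=0.94, epochs_per_decay=2):
    """Learning rate at a given zero-based epoch.

    Assumption: staircase decay, i.e. the rate is multiplied by
    decay_rate once every epochs_per_decay epochs.
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    return base_lr * decay_rate ** (epoch // epochs_per_decay)


for epoch in range(6):
    print(epoch, f"{learning_rate(epoch):.6f}")
```

Epochs 0 and 1 use 0.045, epochs 2 and 3 use 0.045 × 0.94 = 0.0423, and so on.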
