1-by-1 Convolution Layer

Reposted from https://zhuanlan.zhihu.com/p/30182988

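For readers who already understand how a conv layer works, the mechanics and the output of a conv layer with a 1×1 kernel are easy to work out. Understanding why anyone would use such a seemingly useless kernel, however, takes a bit more investigation.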

Review

First, a brief recap of how a conv layer operates when the kernel size is 1×1 and strides=1.

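In the figure, the input layer has size […], there are 3 kernels, each of size […], and the output layer has size […]. As the figure shows, the spatial size of the output feature map is unchanged; what changes is the depth at each spatial position of the output layer, that is, the number of feature channels. In other words, if the number of filters is greater than the depth of the input layer, the layer increases the dimensionality; if the number of filters is smaller than the depth of the input layer, it reduces the dimensionality. The concrete use cases below show why one would do this.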
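As a minimal sketch of the recap above, the following pure-Python snippet applies 1×1 kernels with stride 1 to a small input. The 2×2×4 input, the 3 kernels, and the helper name `conv1x1` are assumptions chosen for illustration, not the sizes from the original figure; biases are omitted.

```python
def conv1x1(x, kernels):
    """Apply a 1x1 convolution with stride 1.

    x:       input as nested lists, shape [H][W][C_in]
    kernels: one weight vector per filter, shape [C_out][C_in]
    returns: output as nested lists, shape [H][W][C_out]
    """
    return [
        [
            # Each output channel is a dot product over the input channels
            # at this single spatial position.
            [sum(w * v for w, v in zip(kernel, pixel)) for kernel in kernels]
            for pixel in row
        ]
        for row in x
    ]


# Hypothetical 2x2 input with depth 4.
x = [
    [[1, 2, 3, 4], [0, 1, 0, 1]],
    [[2, 2, 2, 2], [1, 0, 0, 0]],
]
# Three hypothetical 1x1 filters, each spanning all 4 input channels.
kernels = [
    [1, 0, 0, 0],  # copies channel 0
    [0, 1, 1, 0],  # sums channels 1 and 2
    [1, 1, 1, 1],  # sums all channels
]

y = conv1x1(x, kernels)
height, width, depth = len(y), len(y[0]), len(y[0][0])
print(height, width, depth)
```

The spatial size stays 2×2 while the depth goes from 4 to 3, because the number of filters (3) is smaller than the input depth (4).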

A Concrete Use Case: Inception Module

GoogLeNet won first place in the image classification task of the ImageNet Challenge 2014, and its most important improvement was the use of the Inception Module with Dimension Reduction.

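Let us first introduce the Inception Module. When designing a deep neural network architecture, we often face a question: which kernel size should a convolution layer use, or would it be better to pool first? Someone then asked: why not let the model choose for itself? This led to a structure that applies convolution layers at several scales in parallel and then concatenates their outputs, which is the most intuitive, basic idea behind the Inception Module (as shown in the figure below).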

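However, if we compute with the structure above as it stands, it costs considerable extra computation and parameter storage. Is there a way to solve this problem? Yes: add 1×1 convolution layers to it. The improved structure is shown in the figure below.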

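If you are interested, you can assume input and output layer sizes and compare the two structures by hand. You will find that after adding the 1×1 convolution layers, both the number of floating-point operations and the number of parameters to store drop substantially. The article "Inception modules: explained and implemented" works through the Inception(3a) layer from the original paper and explains the reasons and the process clearly, so I will not repeat them here.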
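The hand calculation suggested above can be sketched in a few lines. The sizes below are assumptions: they follow the Inception(3a) configuration as it is usually quoted from the GoogLeNet paper (a 28×28×192 input, a 5×5 branch with 32 output channels, and a 1×1 reduction to 16 channels), and the helper `conv_cost` is a hypothetical name. Only the 5×5 branch is compared, and biases are ignored.

```python
def conv_cost(height, width, c_in, c_out, k):
    """Return (parameters, multiply-accumulates) for one k x k conv layer.

    Assumes stride 1 and 'same' padding, so the output keeps the input's
    spatial size; biases are ignored.
    """
    params = k * k * c_in * c_out
    macs = params * height * width
    return params, macs


# Assumed sizes for the 5x5 branch of Inception(3a).
H, W, C_IN = 28, 28, 192

# Without reduction: the 5x5 conv runs directly on all 192 channels.
naive_params, naive_macs = conv_cost(H, W, C_IN, 32, k=5)

# With reduction: a 1x1 conv down to 16 channels, then the 5x5 conv.
r_params, r_macs = conv_cost(H, W, C_IN, 16, k=1)
c_params, c_macs = conv_cost(H, W, 16, 32, k=5)
reduced_params = r_params + c_params
reduced_macs = r_macs + c_macs

print("params:", naive_params, "->", reduced_params)
print("MACs:  ", naive_macs, "->", reduced_macs)
```

Under these assumed sizes, the 5×5 branch needs roughly a tenth of the parameters and multiply-accumulates once the 1×1 reduction is in place.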

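Just one personal observation: it is obvious that this reduces computation and storage, yet it does not reduce or impair the model's capability. I think this is because the information across the feature channels at a single spatial position is somewhat redundant, and linearly combining those channels with a smaller number of 1×1 filters can condense the high-dimensional information into fewer dimensions without losing the parts that are actually useful. Of course, my understanding may not be right, and I hope to offer a deeper one after encountering more related material.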
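A toy illustration of the redundancy argument, under strong assumptions: the channel values and all filters below are constructed by hand so that the redundancy is exact. In a real network the filters are learned and the redundancy is at best approximate, so this only shows that a linear 1×1 projection can be lossless when channels are linearly dependent; it does not confirm the conjecture above.

```python
def project(pixel, filters):
    """Apply 1x1 filters to the channel vector of one spatial position."""
    return [sum(w * v for w, v in zip(f, pixel)) for f in filters]


# Hypothetical 4-channel pixels with built-in redundancy:
# channel 2 = channel 0 + channel 1, and channel 3 = 2 * channel 0.
pixels = [[a, b, a + b, 2 * a] for a, b in [(1, 2), (3, -1), (0, 5)]]

# Two 1x1 filters reduce the depth from 4 to 2.
reduce_filters = [[1, 0, 0, 0], [0, 1, 0, 0]]
# Four 1x1 filters map the 2 remaining channels back to 4.
expand_filters = [[1, 0], [0, 1], [1, 1], [2, 0]]

for pixel in pixels:
    compact = project(pixel, reduce_filters)
    restored = project(compact, expand_filters)
    print(pixel, "->", compact, "->", restored)
```

Every pixel is recovered exactly from its 2-channel version, because the two dropped channels carried no information beyond the two that were kept.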

A Related Scenario: Replacing the Fully Connected Layer with a Conv Layer

Why discuss replacing FC layers here? Because Yann LeCun wrote in a Facebook post:

In Convolutional Nets, there is no such thing as "fully-connected layers". There are only convolution layers with 1x1 convolution kernels and a full connection table.
It's a too-rarely-understood fact that ConvNets don't need to have a fixed-size input. You can train them on inputs that happen to produce a single output vector (with no spatial extent), and then apply them to larger images. Instead of a single output vector, you then get a spatial map of output vectors. Each vector sees input windows at different locations on the input.
In that scenario, the "fully connected layers" really act as 1x1 convolutions.

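However, the post does not make clear whether this 1×1 refers to the size of the filter or the size of the output. My understanding is that an FC layer is equivalent to a 1×1 output; as for the filter, its size would have to be […], because only then does it produce a 1×1 output.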

I list this mainly to distinguish it from the GoogLeNet case above. The size is 1×1 in both cases, but in one it refers to the filter and in the other to the output.
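The relationship between an FC layer and a convolution can be checked with a small sketch. Everything below is an assumption made for illustration (the 2×2×2 and 3×3×2 inputs, the weights, and the helper names `conv_valid` and `fully_connected`). It shows one FC unit computed two ways: as a dot product over the flattened input, and as a convolution whose kernel covers the whole input, which yields a 1×1 output. Applied to a larger input, the same kernel yields a spatial map, as the quoted post describes.

```python
def conv_valid(x, kernel):
    """Single-filter 'valid' convolution (cross-correlation), stride 1.

    x:      input, shape [H][W][C]
    kernel: weights, shape [KH][KW][C]
    returns a 2-D map of shape [H-KH+1][W-KW+1]
    """
    kh, kw, depth = len(kernel), len(kernel[0]), len(kernel[0][0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [
            sum(
                kernel[i][j][c] * x[r + i][s + j][c]
                for i in range(kh)
                for j in range(kw)
                for c in range(depth)
            )
            for s in range(out_w)
        ]
        for r in range(out_h)
    ]


def fully_connected(x, kernel):
    """One FC unit: flatten input and weights, then take the dot product."""
    flat_x = [v for row in x for pixel in row for v in pixel]
    flat_w = [v for row in kernel for pixel in row for v in pixel]
    return sum(w * v for w, v in zip(flat_w, flat_x))


# Hypothetical 2x2x2 training-size input and the weights of one FC unit,
# stored in the same [H][W][C] layout as the input.
small = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
weights = [[[1, 0], [0, 1]], [[1, 1], [0, -1]]]

fc_out = fully_connected(small, weights)
conv_out = conv_valid(small, weights)  # kernel covers the whole input

# A larger 3x3x2 input yields a spatial map instead of a single value.
large = [[[r + s, r - s] for s in range(3)] for r in range(3)]
conv_map = conv_valid(large, weights)
print(fc_out, conv_out, len(conv_map), len(conv_map[0]))
```

On the training-size input the convolution output is a 1×1 map holding the FC unit's value; on the larger input the same weights produce a 2×2 map, one value per window position.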

References:

What does 1x1 convolution mean in a neural network?

Inception modules: explained and implemented

https://www.facebook.com/yann.lecun/posts/10152820758292143

How are 1x1 convolutions the same as a fully connected layer?
