Non-local Neural Networks

A CVPR 2018 paper.

Paper: https://arxiv.org/pdf/1711.07971v3.pdf

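Main contribution: earlier CNNs and recurrent networks only process local information. Enlarging the receptive field brings in somewhat less local information, but only to a limited extent. In the paper's words, a "non-local operation computes the response at a position as a weighted sum of the features at all positions", which makes it capable of "capturing long-range dependencies". The authors design the non-local block so that its input and output have the same shape, so the block can be inserted flexibly into any architecture. According to the authors, "the self-attention module [Attention is all you need. In Neural Information Processing Systems (NIPS)] recently presented for machine translation is a special case of non-local operations in the embedded Gaussian version."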

As in the figure below, the response at x_i is a weighted average of the features at all positions around it (in space, in time, or both).

The structure of the non-local block is shown in the figure below.

[Figure: structure of the non-local block]

The formula corresponding to the figure above is

$$z_i = W_z y_i + x_i$$

where W_z corresponds to the topmost 1×1×1 convolution in the figure above, and y_i is given by

$$y_i = \frac{1}{C(x)} \sum_{\forall j} f(x_i, x_j)\, g(x_j)$$

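g corresponds to the rightmost convolution, labeled g, in the figure above. The paper gives several forms for the function f, each with its own normalization factor C(x), for example Gaussian, dot product, and concatenation. See the paper for the exact forms; they are written very clearly, and the same can be said of the paper as a whole. Experiments show that all of these forms end up performing about the same, which indicates that the non-local block is not sensitive to the choice of f. The default f in the figure is the embedded Gaussian, which has the form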

$$f(x_i, x_j) = e^{\theta(x_i)^T \phi(x_j)}$$
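To make the formulas for z_i, y_i, and the embedded Gaussian f concrete, here is a minimal NumPy sketch. It assumes the input has already been flattened to N positions with C channels, models the 1×1×1 convolutions θ, φ, g, and W_z as plain weight matrices, and takes C(x) = Σ_j f(x_i, x_j), so that the normalization becomes a softmax over j. The function name `non_local_block`, the sizes, and the random weights are illustrative only; this is not the authors' code.

```python
import numpy as np

def softmax(a, axis=-1):
    # Subtract the maximum along the axis for numerical stability.
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def non_local_block(x, w_theta, w_phi, w_g, w_z):
    """Embedded Gaussian non-local block on flattened features (a sketch).

    x: (N, C) array, one row per position (space and/or time flattened).
    w_theta, w_phi, w_g: (C, C_inner) matrices standing in for 1x1x1 convs.
    w_z: (C_inner, C) matrix that maps back to C channels.
    """
    theta = x @ w_theta                     # (N, C_inner)
    phi = x @ w_phi                         # (N, C_inner)
    g = x @ w_g                             # (N, C_inner)
    # f(x_i, x_j) / C(x) with C(x) = sum_j f(x_i, x_j) is a softmax over j.
    attn = softmax(theta @ phi.T, axis=1)   # (N, N), each row sums to 1
    y = attn @ g                            # y_i = sum_j attn[i, j] * g(x_j)
    return y @ w_z + x                      # z_i = W_z y_i + x_i

rng = np.random.default_rng(0)
N, C = 6, 8                                 # illustrative sizes
C_inner = C // 2                            # assumed bottleneck width
x = rng.standard_normal((N, C))
w_theta, w_phi, w_g = (rng.standard_normal((C, C_inner)) for _ in range(3))
w_z = rng.standard_normal((C_inner, C))

z = non_local_block(x, w_theta, w_phi, w_g, w_z)
print(z.shape)                              # same shape as x

# With W_z set to zero, the residual form reduces to the identity mapping.
z_identity = non_local_block(x, w_theta, w_phi, w_g, np.zeros((C_inner, C)))
print(np.allclose(z_identity, x))
```

Because the output has the same shape as the input, the block can sit between two existing layers without changing anything else. The zero W_z case shows why the residual form is convenient: the block can start out as an identity mapping and not disturb a pretrained network.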

To reduce the amount of computation, the paper uses two tricks:

1. Set the number of channels represented by Wg, Wθ, and Wφ to be half of the number of channels in x.

2. Subsampling trick: use max pooling to downsample φ and g.
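A rough count shows why the two tricks help. The sketch below assumes a hypothetical feature map of T×H×W = 4×14×14 positions with 1024 channels and a 2×2 spatial max pool on φ and g; these numbers are illustrative and are not taken from the paper's tables. It counts only the multiply-adds of the two large matrix products (θφᵀ and the weighted sum over g), and the helper name `pairwise_cost` is made up for this example.

```python
def pairwise_cost(n_query, n_key, channels):
    """Multiply-adds for theta @ phi.T plus attn @ g.

    n_query: number of positions i (rows of theta).
    n_key: number of positions j (rows of phi and g).
    channels: embedding width of theta, phi, and g.
    """
    affinity = n_query * n_key * channels    # theta @ phi.T
    aggregate = n_query * n_key * channels   # attn @ g
    return affinity + aggregate

T, H, W, C = 4, 14, 14, 1024       # hypothetical sizes
n = T * H * W                      # number of positions

full = pairwise_cost(n, n, C)
half_channels = pairwise_cost(n, n, C // 2)
# A 2x2 spatial max pool on phi and g keeps one key per 2x2 window,
# while the number of query (output) positions stays the same.
n_pooled = T * (H // 2) * (W // 2)
both = pairwise_cost(n, n_pooled, C // 2)

print(full // half_channels)       # 2
print(half_channels // both)       # 4
print(full // both)                # 8
```

Only φ and g are pooled, so every position i still gets a response; the sum over j simply runs over fewer positions.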


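The figure below shows the effect of inserting the non-local block at different stages of ResNet-50. The results at res2, res3, and res4 are about the same, while res5 is slightly worse. The authors' conjecture is that "res5 has a small spatial size (7×7) and it is insufficient to provide precise spatial information." The open-source code appears to place the block at res4.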

[Figure: results of inserting the non-local block at different ResNet-50 stages]
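For a sense of scale, the short sketch below lists the spatial size of each stage and the size of the pairwise matrix a non-local block would build there. It assumes a single 224×224 frame and the standard ResNet output strides (4, 8, 16, 32); the names `STAGE_STRIDES` and `stage_stats` are made up for this example, and the temporal dimension used for video is ignored.

```python
# Output strides of the standard ResNet stages relative to the input.
STAGE_STRIDES = {"res2": 4, "res3": 8, "res4": 16, "res5": 32}

def stage_stats(input_size=224):
    """Return {stage: (side, positions, pairwise_entries)} for a square input."""
    stats = {}
    for stage, stride in STAGE_STRIDES.items():
        side = input_size // stride
        positions = side * side
        # A non-local block compares every position with every other one.
        stats[stage] = (side, positions, positions * positions)
    return stats

stats = stage_stats(224)
for stage, (side, positions, entries) in stats.items():
    print(f"{stage}: {side}x{side} map, {positions} positions, "
          f"{entries} pairwise entries")
```

Under these assumptions res5 has only 49 positions per frame, which fits the authors' explanation quoted above, while res2 has thousands of positions and a far larger pairwise matrix.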

Thoughts:

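A fully connected layer also feels non-local, but it can be viewed as a special case of the non-local operation. The self-attention in "Attention is all you need" (NIPS) is another special case; see the analysis at https://zhuanlan.zhihu.com/p/33345791, which I think is well written. There is also a figure that explains the formula for y.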
Code walkthrough reference: https://blog.****.net/u014380165/article/details/80011785

[Figure: illustration of how y is computed]
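The statement that self-attention is a special case of the embedded Gaussian non-local operation can be checked numerically. In the sketch below, θ(x), φ(x), and g(x) play the roles of queries, keys, and values. The helper names are made up for illustration, and the 1/√d scaling used in "Attention is all you need" is deliberately left out, since the embedded Gaussian f has no such factor.

```python
import numpy as np

def non_local_y(theta, phi, g):
    """y_i = (1 / C(x)) * sum_j exp(theta_i . phi_j) * g_j, written as loops."""
    n_query, n_key = theta.shape[0], phi.shape[0]
    y = np.zeros((n_query, g.shape[1]))
    for i in range(n_query):
        f = np.array([np.exp(theta[i] @ phi[j]) for j in range(n_key)])
        c = f.sum()                        # C(x) = sum_j f(x_i, x_j)
        for j in range(n_key):
            y[i] += f[j] / c * g[j]
    return y

def dot_product_attention(q, k, v):
    """softmax(q k^T) v, i.e. attention without the 1/sqrt(d) scaling."""
    scores = q @ k.T
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
theta = rng.standard_normal((5, 4))        # embeddings of 5 positions
phi = rng.standard_normal((5, 4))
g = rng.standard_normal((5, 3))

y_loop = non_local_y(theta, phi, g)
y_attn = dot_product_attention(theta, phi, g)
print(np.allclose(y_loop, y_attn))
```

The loop version follows the formula for y term by term, while the matrix version is how attention is usually written; both give the same result up to floating-point error.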
