Non-local Neural Networks

A paper from CVPR 2018.

Paper: https://arxiv.org/pdf/1711.07971v3.pdf

Main contribution: previous CNNs (and recurrent networks) only consider local information; enlarging the receptive field brings in somewhat less local context, but the range is still limited. In the paper's words, a "non-local operation computes the response at a position as a weighted sum of the features at all positions", and it can therefore capture long-range dependencies. The authors design the non-local block so that its input and output have the same shape, which lets the block be inserted flexibly into any architecture. They also note that "the self-attention module [Attention is All You Need, NIPS 2017] recently presented for machine translation is a special case of non-local operations in the embedded Gaussian version."

As in the figure below, the response at position x_i is a weighted average of the features at all other positions, which can range over space as well as time.

[Figure: the response at x_i computed from features at all space-time positions]

The structure of the non-local block is shown below.

[Figure: the non-local block (embedded Gaussian version)]

The figure above corresponds to the formula

$$z_i = W_z y_i + x_i$$

where W_z corresponds to the topmost 1×1×1 convolution in the figure, and y_i is defined as

$$y_i = \frac{1}{\mathcal{C}(x)} \sum_{\forall j} f(x_i, x_j)\, g(x_j)$$

Here g corresponds to the rightmost convolution labeled g in the figure above. The paper gives several forms for the pairwise function f, each with its own normalization factor C(x), e.g. Gaussian, Dot product, and Concatenation; the exact forms are spelled out clearly in the paper (the whole paper is, frankly, very clearly written). Experiments show that all of these variants perform about the same, indicating that the non-local behavior is insensitive to this choice. The default f used in the figure is the Embedded Gaussian:

$$f(x_i, x_j) = e^{\theta(x_i)^T \phi(x_j)}, \qquad \theta(x_i) = W_\theta x_i, \quad \phi(x_j) = W_\phi x_j$$
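Below is a minimal PyTorch sketch of the block in its embedded Gaussian form, written for 2D feature maps (so the 1×1×1 convolutions become 1×1 convolutions). Class and variable names such as NonLocalBlock2D and inter_channels are my own, and the official release is in Caffe2, so treat this as an illustration rather than the authors' implementation. Note that dividing by C(x) = Σ_j f(x_i, x_j) in the embedded Gaussian case is exactly a softmax over j, which is where the connection to self-attention comes from.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NonLocalBlock2D(nn.Module):
    """Embedded-Gaussian non-local block for 2D feature maps (assumed name)."""

    def __init__(self, in_channels):
        super().__init__()
        # Trick 1 from the paper: theta/phi/g project to half the channels of x.
        self.inter_channels = in_channels // 2
        self.theta = nn.Conv2d(in_channels, self.inter_channels, kernel_size=1)
        self.phi = nn.Conv2d(in_channels, self.inter_channels, kernel_size=1)
        self.g = nn.Conv2d(in_channels, self.inter_channels, kernel_size=1)
        # W_z maps back to in_channels, so the block's input and output shapes match.
        self.W_z = nn.Conv2d(self.inter_channels, in_channels, kernel_size=1)
        # Zero-initializing W_z makes the block start as an identity mapping,
        # which is convenient when inserting it into a pretrained network.
        nn.init.zeros_(self.W_z.weight)
        nn.init.zeros_(self.W_z.bias)

    def forward(self, x):
        n, c, h, w = x.shape
        theta = self.theta(x).view(n, self.inter_channels, -1).permute(0, 2, 1)  # (N, HW, C/2)
        phi = self.phi(x).view(n, self.inter_channels, -1)                       # (N, C/2, HW)
        g = self.g(x).view(n, self.inter_channels, -1).permute(0, 2, 1)          # (N, HW, C/2)
        # f(x_i, x_j) = exp(theta(x_i)^T phi(x_j)); dividing by C(x) = sum_j f(x_i, x_j)
        # is exactly a softmax over j.
        attn = F.softmax(torch.bmm(theta, phi), dim=-1)   # (N, HW, HW)
        y = torch.bmm(attn, g)                            # (N, HW, C/2)
        y = y.permute(0, 2, 1).reshape(n, self.inter_channels, h, w)
        # z_i = W_z y_i + x_i: the residual connection keeps the output shape equal to x.
        return self.W_z(y) + x
```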

To reduce computation, the paper uses two tricks:

1. Set the number of channels represented by W_g, W_θ, and W_φ to half of the number of channels in x.

2. A subsampling trick: downsample the φ and g branches with max pooling (both tricks are illustrated in the sketch after this list).
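A hypothetical variant of the sketch above that applies both tricks: the channel halving is already in NonLocalBlock2D, and here a 2×2 max pool is added on the φ and g branches, shrinking the pairwise matrix from HW×HW to HW×(HW/4) while leaving the output resolution untouched.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NonLocalBlock2DSub(NonLocalBlock2D):
    """Non-local block with the max-pool subsampling trick (assumed name)."""

    def __init__(self, in_channels):
        super().__init__(in_channels)
        self.pool = nn.MaxPool2d(kernel_size=2)

    def forward(self, x):
        n, c, h, w = x.shape
        theta = self.theta(x).view(n, self.inter_channels, -1).permute(0, 2, 1)     # (N, HW, C/2)
        # Only phi and g are pooled; theta keeps full resolution, so y (and z) do too.
        phi = self.pool(self.phi(x)).view(n, self.inter_channels, -1)               # (N, C/2, HW/4)
        g = self.pool(self.g(x)).view(n, self.inter_channels, -1).permute(0, 2, 1)  # (N, HW/4, C/2)
        attn = F.softmax(torch.bmm(theta, phi), dim=-1)                             # (N, HW, HW/4)
        y = torch.bmm(attn, g).permute(0, 2, 1).reshape(n, self.inter_channels, h, w)
        return self.W_z(y) + x
```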

 


The table below shows the effect of inserting one non-local block at different stages of ResNet-50. The results for res2, res3, and res4 are comparable, while res5 is slightly worse; the authors conjecture that "res5 has a small spatial size (7×7) and it is insufficient to provide precise spatial information." The open-source code appears to place the block at res4.

[Table: one non-local block added to different stages (res2-res5) of ResNet-50]
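For concreteness, here is a hedged sketch of what placing one block in res4 might look like with torchvision's ResNet-50, where layer3 corresponds to res4 and carries 1024 channels; the paper describes adding a single block right before the last residual block of a stage, but this is my own approximation, not the authors' released (Caffe2) code.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

model = resnet50()  # randomly initialized; load pretrained weights as needed

# torchvision's layer3 corresponds to res4; its residual blocks carry 1024 channels.
blocks = list(model.layer3.children())
model.layer3 = nn.Sequential(*blocks[:-1], NonLocalBlock2D(1024), blocks[-1])

# Sanity check: the non-local block preserves shapes, so the rest of the network is unaffected.
out = model(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 1000])
```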

 

Thoughts: