【FPN】《Feature Pyramid Networks for Object Detection》


CVPR 2017




Feature pyramids are a basic component in recognition systems for detecting objects at different scales.

But recent deep learning object detectors have avoided pyramid representations, in part because they are compute and memory intensive.

The authors propose FPN, which, with marginal extra cost and without bells and whistles, surpasses all existing single-model entries, including those from the COCO 2016 challenge winners.

1 Motivation

  • Recognizing objects at vastly different scales is a fundamental challenge in computer vision.

[Figure 1: (a) featurized image pyramid, (b) single feature map, (c) pyramidal feature hierarchy, (d) Feature Pyramid Network]

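The R-CNN family runs detection on a single-scale feature map (Fig. 1(b)). Convolution is already somewhat robust to scale, but not robust enough. Objects that span a wide range of scales remain hard, small objects especially, and many papers work on this problem.

The simplest approach resembles data augmentation: rescale each image to several sizes and feed them all to the network during training. Enlarged images consume a lot of memory, however, so rescaling is usually applied only at test time, which means several passes per image and slower inference (Fig. 1(a)).

Another approach is the one SSD takes (Fig. 1(c)): detect on feature maps of different scales. In principle it could detect on the multi-scale feature maps the network has already computed, but SSD discards the early low-level feature maps. It starts from conv4_3 and appends extra conv layers, producing more high-level semantic feature maps to detect on.

This paper therefore wants both to reuse the multi-scale features that the conv net has already computed and to give the low-level, high-resolution features strong semantics. The natural idea is to fuse the high-level, low-resolution feature maps into them. Similar work includes RON: Reverse Connection with Objectness Prior Networks for Object Detection.

Convolutional networks generally use both kinds of features: the first few layers learn low-level features and the last few layers learn high-level features. The authors combine low-resolution, semantically strong features with high-resolution, semantically weak features.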

2 Notion

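  • Low-level feature: small details in an image, such as edges, corners, colors, pixels, and gradients. These can be extracted with filters, SIFT, or HOG.

  • High-level feature: built on top of low-level features. It can be used to recognize and detect objects or object shapes in an image and carries richer semantic information.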

  • Image pyramid


3 Advantages

  • In ablation experiments, we find that for bounding box proposals, FPN significantly increases the Average Recall (AR) by 8.0 points; for object detection, it improves the COCO-style Average Precision (AP) by 2.3 points and PASCAL-style AP by 3.8 points.

  • In addition, our pyramid structure can be trained end-to-end with all scales and is used consistently at train/test time, which would be memory-infeasible using image pyramids.

4 Feature Pyramid Networks


4.1 Bottom-up pathway

The authors use ResNet as the backbone.

We denote the output of these last residual blocks as {C2, C3, C4, C5} for conv2, conv3, conv4, and conv5 outputs, and note that they have strides of {4, 8, 16, 32} pixels with respect to the input image.
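As a quick check on what these strides imply, the sketch below computes the spatial size of each stage output for a given input size. The 800×1216 input and the round-up rule are illustrative assumptions, not values taken from the paper, and `feature_map_sizes` is a hypothetical helper.

```python
import math

# Strides of C2..C5 with respect to the input image, as stated above.
STRIDES = {"C2": 4, "C3": 8, "C4": 16, "C5": 32}


def feature_map_sizes(height, width):
    """Return the (height, width) of each bottom-up stage output.

    Assumption: padding keeps sizes at ceil(input / stride).
    """
    return {
        name: (math.ceil(height / stride), math.ceil(width / stride))
        for name, stride in STRIDES.items()
    }


sizes = feature_map_sizes(800, 1216)
for name, (h, w) in sizes.items():
    print(f"{name}: {h} x {w}")
```

Each stage halves the resolution of the previous one, which is why a single ×2 upsampling step is enough to align neighbouring levels in the top-down pathway.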


4.2 Top-down pathway and lateral connections

We upsample the spatial resolution by a factor of 2 (using nearest neighbor upsampling for simplicity). The upsampled map is then merged with the corresponding bottom-up map (which undergoes a 1×1 convolutional layer to reduce channel dimensions) by element-wise addition.

Designing better connection modules is not the focus of this paper, so we opt for the simple design described above.
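The merge step can be sketched in plain Python on nested lists. This is only an illustration under stated assumptions: the helper names (`upsample_nearest_2x`, `conv1x1`, `merge_top_down`) are hypothetical, the 1×1 convolution has no bias, and the 3×3 convolution applied to each merged map is left out.

```python
def upsample_nearest_2x(fmap):
    """Nearest-neighbor upsampling by 2. fmap is [channel][row][col]."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]  # repeat each column
            rows.append(wide)
            rows.append(list(wide))                    # repeat each row
        out.append(rows)
    return out


def conv1x1(fmap, weight):
    """1x1 convolution without bias: weight[o][i] mixes channel i into output o."""
    h, w = len(fmap[0]), len(fmap[0][0])
    return [
        [[sum(weight[o][i] * fmap[i][y][x] for i in range(len(fmap)))
          for x in range(w)]
         for y in range(h)]
        for o in range(len(weight))
    ]


def merge_top_down(top, bottom, lateral_weight):
    """Upsample the coarser map and add the 1x1-projected bottom-up map."""
    up = upsample_nearest_2x(top)
    lateral = conv1x1(bottom, lateral_weight)
    return [
        [[u + l for u, l in zip(u_row, l_row)]
         for u_row, l_row in zip(u_ch, l_ch)]
        for u_ch, l_ch in zip(up, lateral)
    ]


# Toy example: a 1-channel 1x1 top map and a 2-channel 2x2 bottom-up map.
top = [[[10.0]]]
bottom = [[[1.0, 2.0], [3.0, 4.0]],
          [[5.0, 6.0], [7.0, 8.0]]]
lateral_weight = [[1.0, 1.0]]  # reduce 2 channels to 1
merged = merge_top_down(top, bottom, lateral_weight)
print(merged)
```

The 1×1 projection is what makes the addition possible: it brings the bottom-up map to the same channel count as the top-down map, while the upsampling matches the spatial size.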

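4.3 Steps for building a Faster R-CNN detector with FPN

[Figure 5: Faster R-CNN detector built with FPN]

  • First, pick an image to process and preprocess it;
  • Then feed the preprocessed image into a pretrained feature network (e.g. ResNet), which builds the bottom-up network;
  • Next, as shown in Figure 5, build the corresponding top-down network: upsample layer 4, reduce the channel dimension of layer 2 with a 1x1 convolution, add the two element-wise, and finally apply a 3x3 convolution;
  • Next, run the RPN on each of layers 4, 5, and 6 in the figure: a 3x3 convolution followed by two branches, each a 1x1 convolution, one for classification and one for regression;
  • Next, feed the candidate RoIs from the previous step to layers 4, 5, and 6 and apply RoI pooling on each (fixed 7x7 output features);
  • Finally, attach two 1024-dimensional fully connected layers on top, followed by two branches: a classification layer and a regression layer.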

5 Applications

5.1 Feature Pyramid Networks for RPN

RPN is a sliding-window class-agnostic object detector.

Because the head slides densely over all locations in all pyramid levels, it is not necessary to have multi-scale anchors on a specific level. Instead, we assign anchors of a single scale to each level.

Before: one level, anchors of multiple scales.
Now: multiple levels, anchors of one scale per level.
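A small sketch of the "one scale per level" rule. The concrete numbers are the ones given in the paper (anchor areas of 32², 64², 128², 256², 512² pixels on P2 to P6, with aspect ratios 1:2, 1:1, 2:1); they are not stated in these notes, so check them against the paper before relying on them. `anchors_for_level` is a hypothetical helper.

```python
import math

# Assumed from the paper: one anchor scale per pyramid level.
ANCHOR_SIZE = {"P2": 32, "P3": 64, "P4": 128, "P5": 256, "P6": 512}
# Height / width ratios for 1:2, 1:1, 2:1.
ASPECT_RATIOS = (0.5, 1.0, 2.0)


def anchors_for_level(level):
    """Return (width, height) anchor shapes for one level.

    Every shape keeps the area size**2 and only varies the aspect ratio.
    """
    size = ANCHOR_SIZE[level]
    shapes = []
    for ratio in ASPECT_RATIOS:
        w = size / math.sqrt(ratio)
        h = size * math.sqrt(ratio)
        shapes.append((w, h))
    return shapes


all_anchors = {level: anchors_for_level(level) for level in ANCHOR_SIZE}
total = sum(len(shapes) for shapes in all_anchors.values())
print(total, "anchor shapes across", len(all_anchors), "levels")
```

Under these assumptions each location on a level carries three anchors, which differ in shape but not in scale; scale variation comes from moving between levels.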


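After the RPN generates RoIs, which pyramid level should each RoI's features be taken from? The paper assigns an RoI of width w and height h (on the input image) to level P_k with

$$k = \left\lfloor k_0 + \log_2\left(\sqrt{wh}/224\right) \right\rfloor$$

Here k_0 is the level from which the original single-scale Faster R-CNN takes its feature map. The ResNet paper takes it from C4, so k_0 = 4 (C5 then plays the role of the fc layers; some variants take features from C5 and append extra fc layers after it). For example, if the RoI is w/2 by h/2 (where \sqrt{wh} = 224), then k = k_0 - 1 = 4 - 1 = 3.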
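The level-assignment rule, k = floor(k0 + log2(sqrt(w·h) / 224)) in the paper, is easy to check numerically. The sketch below is a minimal version with one assumption added: the result is clamped to the range P2 to P5 so that very small or very large RoIs still map to an existing level. `roi_to_level` and its parameter names are hypothetical.

```python
import math


def roi_to_level(w, h, k0=4, k_min=2, k_max=5, canonical=224):
    """Map an RoI of size w x h (in input-image pixels) to a pyramid level k.

    k0 is the level used for an RoI of the canonical 224 x 224 size.
    Clamping to [k_min, k_max] is an assumption of this sketch.
    """
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return max(k_min, min(k_max, k))


print(roi_to_level(224, 224))    # canonical size stays on level k0
print(roi_to_level(112, 112))    # half the side length drops one level
print(roi_to_level(32, 32))      # tiny RoI, clamped to the finest level
print(roi_to_level(1000, 1000))  # huge RoI, clamped to the coarsest level
```

Halving the side length of an RoI lowers k by one, so smaller RoIs are pooled from finer, higher-resolution levels.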

5.2 Feature Pyramid Networks for Fast RCNN

Fast R-CNN is a region-based object detector in which Region-of-Interest (RoI) pooling is used to extract features. Fast R-CNN is most commonly performed on a single-scale feature map. To use it with our FPN, we need to assign RoIs of different scales to the pyramid levels.

6 Experiments on Object Detection

6.1 Region Proposal with RPN

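Table 1 of the paper tests how effective FPN is inside the RPN. All results are based on ResNet-50. The metric is AR (Average Recall): the superscript 100 on AR means 100 proposals per image, and the subscripts s, m, l denote small, medium, and large objects in the COCO dataset. Braces {} in the feature column mean that each level predicts independently.

Comparing (a), (b), and (c) shows that FPN clearly helps. Comparing (a) and (b) also shows that features from a higher layer are not necessarily more effective than features from the layer below.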

6.1.1 How important is top-down enrichment?

Table 1(d)

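This variant keeps only the lateral connections and drops the top-down pathway: each bottom-up level simply goes through a 1×1 lateral connection and a 3×3 convolution to produce the final output, somewhat like Fig. 1(b). The feature column shows that predictions are still made independently per level. The authors conjecture that (d) performs poorly because the semantic gaps between the bottom-up levels are large.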

6.1.2 How important are lateral connections?

Table 1(e)
This variant also performs poorly, because object location information becomes less precise after repeated downsampling and upsampling.

6.1.3 How important are pyramid representations?

Table 1(f)


6.2 Object Detection with Fast/Faster RCNN

[Table: Fast R-CNN detection results]

[Table: Faster R-CNN detection results]

6.3 Comparing with COCO Competition Winners

[Table: comparison with COCO competition winners]

7 Extensions: Segmentation Proposals

Other applications:
Our method is a generic pyramid representation and can be used in applications other than object detection (e.g. to generate segmentation proposals).

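8 On-site Q&A at CVPR

  • Why can feature maps from different depths be added directly after upsampling?

A: The authors explain that this works because training is end-to-end. The parameters of the different layers are not fixed, and all levels receive supervision at once during end-to-end training, so the network learns to make the sum fuse shallow and deep information effectively.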

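  • Why does FPN improve small-object detection so much over the variant without upsampled deep features (the bottom-up pyramid)? (AR at the RPN stage goes from 30.5 to 44.9; AP at the Fast R-CNN stage goes from 24.9 to 33.9.)

A: The authors answer this question on their poster.

[Figure: excerpt from the authors' poster]

For small objects, we need a high-resolution feature map that focuses on small regions. At the same time, as with the handbag in the poster figure, we need more global information to judge more accurately whether the handbag is present and where it is.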

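  • Ignoring runtime, could an image pyramid outperform a feature pyramid?

A: The authors think it could, given careful tuning and training. The main problem with image pyramids is their heavy time and memory cost, whereas a feature pyramid handles multi-scale detection with almost no extra computation.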

