Paper Reading: NAS-FPN

1. Paper Overview

NAS stands for Neural Architecture Search.
This paper applies NAS to the design of the FPN; in essence, it optimizes the FPN. Lately, the object-detection ideas that bring clearly visible gains have mostly been improvements to the FPN, for example the recently proposed DetectoRS (which introduces a recursive FPN) and EfficientDet's BiFPN.

Note that since NAS was first proposed, its overall framework has not changed much; current work mostly applies it to different components, such as the backbone, the FPN, and data augmentation, all of which are now fairly mature research directions. The crucial part of applying NAS is constructing a reasonable search space, and making that space reasonable requires injecting human priors, for example bottom-up and top-down connections, or merging two feature maps into one with a sum operation. Once such a search space is in place, you can throw hundreds or even thousands of TPUs/GPUs at it and let the NAS search algorithm brute-force its way to FPN structures that look unreasonable at first glance but actually work quite well.
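
As a toy illustration of what the controller actually searches over, one candidate architecture can be thought of as a short sequence of merging-cell decisions. The encoding below is my own sketch for intuition, not the paper's actual representation, and the specific connections listed are made up:

```python
# Hypothetical encoding of one candidate NAS-FPN architecture as a sequence of
# merging-cell decisions. Each decision picks two input feature levels, a binary
# op, and the output level. The RL controller emits such sequences and receives
# the detection accuracy of the trained child model as its reward.
candidate_architecture = [
    # (input_a, input_b, binary_op, output_level)  -- illustrative values only
    ("P6", "P4", "sum",             "P4"),
    ("P4", "P3", "sum",             "P3"),
    ("P3", "P4", "global_pool_sum", "P5"),
    ("P5", "P7", "global_pool_sum", "P7"),
    ("P7", "P5", "global_pool_sum", "P6"),
]
```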

The figure below shows an FPN structure found by the NAS search:
[Figure: an FPN architecture discovered by NAS]

The key contribution of our work is in designing the search space that covers all possible cross-scale connections to generate multiscale feature representations. During the search, we aim to discover an atomic architecture that has identical input and output feature levels and can be applied repeatedly. The modular search space makes searching pyramidal architectures manageable. Another benefit of the modular pyramidal architecture is the ability for anytime object detection (or "early exit"). Although such an early-exit approach has been attempted [14], manually designing such an architecture with this constraint in mind is quite difficult.

In our work, instead of manually designing architectures for pyramidal representations, we use a combination of a scalable search space and a Neural Architecture Search algorithm to overcome the large search space of pyramidal architectures. We constrain the search to find an architecture that can be applied repeatedly. The architecture can therefore be used for anytime object detection (or "early exit"). Such an early-exit idea is related to [3, 37], especially in image classification [14].

2. Binary operations

When building the NAS search space, the paper allows two ways to merge two feature maps of the same resolution (if the resolutions differ, one of them is first downsampled or upsampled): one is a plain element-wise sum, and the other first applies something like an attention module and then sums.
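
A minimal PyTorch sketch of the two binary operations as I read them, assuming the attention variant gates one input with a global-average-pooled, sigmoid-activated signal from the other before summing (the exact formulation in the paper may differ):

```python
import torch
import torch.nn.functional as F

def binary_sum(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Plain element-wise sum of two same-resolution feature maps."""
    return a + b

def binary_global_pool_sum(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Attention-style merge: global-average-pool one input into per-channel
    weights, gate the other input with them, then sum. This is my reading of
    the 'global pooling' binary op; the exact formulation may differ."""
    attn = torch.sigmoid(F.adaptive_avg_pool2d(b, 1))  # shape (N, C, 1, 1)
    return b + a * attn
```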

[Figure: the two binary operations (sum and global-pooling attention sum)]

3. Merging cell

Merging cell: the basic building block searched over in NAS-FPN.
[Figure: the merging cell]
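
Putting the pieces together, a merging cell might look roughly like the sketch below: resample the two chosen feature maps to the target resolution, merge them with one of the binary ops, then refine the result with ReLU, a 3x3 conv, and BN. The resampling choices (nearest-neighbour upsampling, max-pooling downsampling) and the post-merge ordering are my assumptions for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MergingCell(nn.Module):
    """Sketch of one merging cell: resample two chosen feature maps to a target
    resolution, merge them with a binary op, then refine with ReLU -> 3x3 conv
    -> BN. Names and ordering are assumptions, not the paper's exact recipe."""

    def __init__(self, channels: int = 256, use_attention: bool = False):
        super().__init__()
        self.use_attention = use_attention
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(channels)

    @staticmethod
    def _resize(x: torch.Tensor, out_size) -> torch.Tensor:
        # Upsample with nearest neighbour or downsample with max pooling,
        # depending on the target (H, W).
        if tuple(x.shape[-2:]) == tuple(out_size):
            return x
        if x.shape[-1] < out_size[-1]:
            return F.interpolate(x, size=out_size, mode="nearest")
        return F.adaptive_max_pool2d(x, out_size)

    def forward(self, a: torch.Tensor, b: torch.Tensor, out_size) -> torch.Tensor:
        a, b = self._resize(a, out_size), self._resize(b, out_size)
        if self.use_attention:  # global-pooling binary op (see sketch above)
            merged = b + a * torch.sigmoid(F.adaptive_avg_pool2d(b, 1))
        else:                   # plain sum binary op
            merged = a + b
        return self.bn(self.conv(F.relu(merged)))
```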

4. The architecture of the FPN can be stacked repeatedly for better accuracy

We use as inputs features in 5 scales {C3, C4, C5, C6, C7} with corresponding feature strides of {8, 16, 32, 64, 128} pixels. C6 and C7 are created by simply applying stride-2 and stride-4 max pooling to C5. The input features are then passed to a pyramid network consisting of a series of merging cells (see below) that introduce cross-scale connections. The pyramid network then outputs augmented multiscale feature representations {P3, P4, P5, P6, P7}. Since both inputs and outputs of a pyramid network are feature layers at identical scales, the architecture of the FPN can be stacked repeatedly for better accuracy. In Section 4, we show that controlling the number of pyramid networks is one simple way to trade off detection speed and accuracy.

Because the pyramid network's input and output feature maps have the same resolutions, the FPN can be stacked repeatedly to further improve performance.
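
A rough sketch of this stacking, where `pyramid_network` stands in for one searched pyramid built from merging cells, and C6/C7 are derived from C5 by max pooling as the quoted paragraph describes (the kernel size is my assumption; the paper only specifies the strides):

```python
import torch.nn.functional as F

def build_pyramid_inputs(c3, c4, c5):
    """C6 and C7 are obtained from C5 with stride-2 and stride-4 max pooling."""
    c6 = F.max_pool2d(c5, kernel_size=1, stride=2)
    c7 = F.max_pool2d(c5, kernel_size=1, stride=4)
    return {"P3": c3, "P4": c4, "P5": c5, "P6": c6, "P7": c7}

def stacked_nas_fpn(features, pyramid_network, num_repeats=5):
    """Because the pyramid network maps {P3..P7} to {P3..P7} at identical
    scales, the same searched architecture can simply be applied num_repeats
    times; more repeats trade speed for accuracy."""
    for _ in range(num_repeats):
        features = pyramid_network(features)
    return features
```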

5. What makes a good feature pyramid architecture?

We hope to shed light on this question by visualizing the discovered architectures. In Figure 7(b-f), we plot NAS-FPN architectures with progressively higher reward during RL training. We find the RNN controller can quickly pick up some important cross-scale connections in the early learning stage. For example, it discovers the connection between high-resolution input and output feature layers, which is critical for generating high-resolution features for detecting small objects. As the controller converges, it discovers architectures that have both top-down and bottom-up connections, which differs from the vanilla FPN in Figure 7(a). We also find better feature reuse as the controller converges. Instead of randomly picking any two input layers from the candidate pool, the controller learns to build connections on newly generated layers to reuse previously computed feature representations.

[Figure 7: NAS-FPN architectures discovered with progressively higher reward during RL training]
Looking at the searched result, it really is not something most people would come up with by hand.

6. How to control the model capacity

In this section, we show how to control the model capacity by adjusting (1) the backbone model, (2) the number of repeated pyramid networks, and (3) the feature dimension in the pyramid network. We discuss how these adjustments trade off detection speed and accuracy. We define a simple notation to indicate the backbone model and NAS-FPN capacity. For example, R-50, 5 @ 256 indicates a model using a ResNet-50 backbone, 5 stacked NAS-FPN pyramid networks, and a feature dimension of 256.
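
The notation maps naturally onto a small config object; the sketch below simply spells out the three capacity knobs for the R-50, 5 @ 256 example:

```python
from dataclasses import dataclass

@dataclass
class NASFPNCapacity:
    """Capacity knobs behind the notation 'R-50, 5 @ 256': ResNet-50 backbone,
    5 stacked NAS-FPN pyramid networks, 256 feature channels."""
    backbone: str = "ResNet-50"
    num_pyramid_networks: int = 5
    feature_dim: int = 256

r50_5_at_256 = NASFPNCapacity()  # the R-50, 5 @ 256 model from the quote above
```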

[Figure: speed/accuracy trade-offs for different backbones, numbers of stacked pyramid networks, and feature dimensions]

7. Further Improvements with DropBlock

Increasing the feature dimension would require model regularization techniques.

Once the number of feature-map channels goes up, the model overfits more easily, so the DropBlock regularization technique is added.

Due to the increased number of new layers introduced in the NAS-FPN architecture, proper model regularization is needed to prevent overfitting. Following the technique in [9], we apply DropBlock with block size 3x3 after the batch normalization layers in the NAS-FPN layers. Figure 10 shows DropBlock improves the performance of NAS-FPN. In particular, the improvement is larger for architectures with more newly introduced filters. Note that by default we do not apply DropBlock in the previous experiments, for a fair comparison with existing works.
