[Paper Walkthrough] CVPR 2019 | FSAF: Feature Selective Anchor-Free Module for Single-Shot Object Detection (automatic feature-level selection)

Paper: Feature Selective Anchor-Free Module for Single-Shot Object Detection
Authors: Chenchen Zhu, Yihui He, Marios Savvides (Carnegie Mellon University)

Abstract:

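The authors argue that conventional anchor-based detection has two limitations. In the paper's words:
The FSAF module addresses two limitations brought up by the conventional anchor-based detection:

1. heuristic-guided feature selection;
2. overlap-based anchor sampling.

The first means that the pyramid level for each instance is picked by a hand-designed rule. The second means that anchors are matched to ground-truth boxes by their IoU overlap.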

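The authors therefore propose an anchor-free module that selects feature levels online, in particular among the multiple levels of an FPN. It can be trained in parallel with the anchor-based branches.
Experiments show that the best model reaches 44.6% mAP on COCO.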

Introduction
One challenging problem for object detection is scale variation.
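The authors see scale variation as the main difficulty in object detection. If the pyramid level is chosen by the formula $l' = l_0 + \log_2(\sqrt{wh}/224)$, there is no guarantee that each ground-truth box is assigned to the most suitable feature map. For example, a 40×40 car and a 50×50 car are assigned to the same feature level, while a similar 60×60 car may land on a different one.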
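To make the heuristic concrete, here is a minimal sketch of the level formula above. Flooring the result to an integer and the default `l0=5` are assumptions made for this illustration, and clamping to the levels that actually exist is left out; `heuristic_level` is a hypothetical helper name.

```python
import math

def heuristic_level(w, h, l0=5, canonical=224):
    """Pyramid level for a w x h box: floor(l0 + log2(sqrt(w*h) / canonical)).

    The floor and the default l0=5 are assumptions of this sketch.
    """
    return math.floor(l0 + math.log2(math.sqrt(w * h) / canonical))

# Three square boxes of similar size.
levels = {size: heuristic_level(size, size) for size in (40, 50, 60)}
print(levels)
```

Under these assumptions the 40×40 and 50×50 boxes share a level while the 60×60 box moves up by one, even though 50 is as close to 60 as it is to 40. The hard threshold is what the online selection is meant to avoid.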
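The FSAF module is proposed to address this. Its core idea is to let each instance select the most suitable feature level for optimizing the network, so anchors should not constrain feature selection in this module.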
Related Work
The paper then describes the FSAF module in terms of four aspects:

how to create the anchor-free branches in the network;
how to generate supervision signals for anchor-free branches;
how to dynamically select feature level for each instance;
how to jointly train and test anchor-free and anchor-based branches.

architecture
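The feature pyramid has five levels, P3 through P7; the figure in the paper shows only three. On each level, just one conv layer is added to each of RetinaNet's box and cls subnets, producing a W × H × K classification output and a W × H × 4 regression output.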
[Figure: network architecture of RetinaNet with the FSAF module]
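As a rough illustration of the output sizes, the sketch below lists the per-level shapes of the two anchor-free outputs. It assumes RetinaNet's usual levels P3 to P7 with stride $2^l$ and rounds the feature-map size up; `fsaf_output_shapes` is a hypothetical helper, and shapes are written as (H, W, channels).

```python
import math

def fsaf_output_shapes(img_h, img_w, num_classes, levels=range(3, 8)):
    """Shapes of the anchor-free cls/reg outputs on each pyramid level.

    Assumes level l has stride 2**l and that sizes are rounded up.
    """
    shapes = {}
    for l in levels:
        h = math.ceil(img_h / 2 ** l)
        w = math.ceil(img_w / 2 ** l)
        shapes[l] = {"cls": (h, w, num_classes), "reg": (h, w, 4)}
    return shapes

shapes = fsaf_output_shapes(512, 512, num_classes=80)
for level, s in shapes.items():
    print(level, s)
```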
ground-truth & loss

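class:
Each pixel predicts which of the K classes is present at its location. For an instance of class k, its box is projected onto the feature map. On the k-th channel, pixels inside the 0.2× shrunk box (the effective box) are positive; pixels inside the 0.5× shrunk box but outside the effective box are ignored and contribute no gradient; everything else is set to 0. The classification output is trained against this target (with focal loss in the paper).
location:
The regression output is trained only inside the 0.2× effective box of the instance, where each pixel (i, j) regresses its distances to the four sides of the box.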
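The target construction described above can be sketched in plain Python for a single instance on a single level. Several things here are assumptions of the sketch rather than details from the post: boxes are given as (x1, y1, x2, y2) corners, i is the column (x) index and j the row (y) index, a pixel counts as inside a box when its integer coordinates fall within it, and -1 is used as the ignore marker. The helper names are hypothetical.

```python
def shrink_box(box, scale):
    """Shrink an (x1, y1, x2, y2) box around its centre by `scale`."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w, half_h = (x2 - x1) * scale / 2, (y2 - y1) * scale / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

def contains(box, x, y):
    """True if point (x, y) lies inside the box (borders included)."""
    return box[0] <= x <= box[2] and box[1] <= y <= box[3]

def build_targets(gt_box, level, width, height, S=4.0):
    """Targets for one instance on one level.

    cls[j][i] is 1 (positive), -1 (ignored) or 0 (negative).
    reg[j][i] is [d_t, d_l, d_b, d_r] / S on positive pixels, else None.
    """
    proj = tuple(v / 2 ** level for v in gt_box)  # box projected onto the level
    effective = shrink_box(proj, 0.2)             # positive region
    ignore = shrink_box(proj, 0.5)                # no-gradient region
    cls = [[0] * width for _ in range(height)]
    reg = [[None] * width for _ in range(height)]
    for j in range(height):
        for i in range(width):
            if contains(effective, i, j):
                cls[j][i] = 1
                reg[j][i] = [(j - proj[1]) / S, (i - proj[0]) / S,
                             (proj[3] - j) / S, (proj[2] - i) / S]
            elif contains(ignore, i, j):
                cls[j][i] = -1
    return cls, reg

# A 128x128 box on level 3 (stride 8) of a 256x256 image.
cls, reg = build_targets((64, 64, 192, 192), level=3, width=32, height=32)
```

In this example only a small patch around the box centre is positive, surrounded by a ring of ignored pixels, which is why the regression branch sees far fewer pixels than the full box covers.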
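In formulas, taking i as the x coordinate and j as the y coordinate of the pixel:
The network outputs are first multiplied by the normalization constant S (4.0 in the paper):
$[p_{t}^{i,j}, p_{l}^{i,j},p_{b}^{i,j},p_{r}^{i,j}] \to S*[p_{t}^{i,j}, p_{l}^{i,j},p_{b}^{i,j},p_{r}^{i,j}]$
They are then converted to (x1, y1, x2, y2) form by subtracting the left and top distances and adding the right and bottom distances:
$[i-S*p_{l}^{i,j}, j-S*p_{t}^{i,j}, i+S*p_{r}^{i,j}, j+S*p_{b}^{i,j}]$
Finally the box is multiplied by the feature-map stride $2^l$ to give the predicted coordinates in the image:
$Predict\_loc = 2^l* [i-S*p_{l}^{i,j}, j-S*p_{t}^{i,j}, i+S*p_{r}^{i,j}, j+S*p_{b}^{i,j}]$
The regression loss is the IoU loss between the prediction and the ground truth:
$loc\_loss = IoU\_Loss(Predict\_loc, ground\_truth\_loc)$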
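The decoding and loss steps can be sketched as follows. The sketch takes i as the x (column) index and j as the y (row) index, applies the left/right offsets to x and the top/bottom offsets to y, and assumes the $-\ln(\mathrm{IoU})$ form of the IoU loss; the function names and the `eps` guard are illustrative choices, not taken from the post.

```python
import math

def decode_box(i, j, offsets, level, S=4.0):
    """Turn offsets [p_t, p_l, p_b, p_r] at pixel (i, j) into an image-space box."""
    p_t, p_l, p_b, p_r = offsets
    stride = 2 ** level
    return (stride * (i - S * p_l), stride * (j - S * p_t),
            stride * (i + S * p_r), stride * (j + S * p_b))

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def iou_loss(pred, gt, eps=1e-7):
    """-ln(IoU); eps avoids log(0) for non-overlapping boxes."""
    return -math.log(max(iou(pred, gt), eps))

pred = decode_box(16, 16, [2.0, 2.0, 2.0, 2.0], level=3)
loss = iou_loss(pred, (64, 64, 192, 192))
```

Here the pixel at (16, 16) on level 3 with all four offsets equal to 2.0 decodes to the box (64, 64, 192, 192), so the loss against that same ground truth is zero.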

Feature Selection
Each ground-truth instance is assigned to the pyramid level whose anchor-free branch gives the smallest loss for that instance.
[Figure: online feature selection across pyramid levels]
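The selection rule amounts to an argmin over levels. A minimal sketch, assuming the per-level classification and regression losses for one instance have already been computed and averaged over its effective region; `select_level` is a hypothetical helper and the loss values are made up for illustration.

```python
def select_level(losses_per_level):
    """Pick the level with the smallest summed anchor-free loss.

    `losses_per_level` maps level -> (cls_loss, reg_loss) for one instance.
    """
    return min(losses_per_level, key=lambda l: sum(losses_per_level[l]))

# Made-up losses for one instance on three levels.
example = {3: (0.9, 1.2), 4: (0.4, 0.5), 5: (0.7, 0.8)}
best = select_level(example)
print(best)
```

Only the selected level then receives the instance as a training target, so the assignment follows the network's current state instead of the box size.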
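As mentioned above, the anchor-free branches are trained jointly with the anchor-based branches: the anchor-free loss is added to the anchor-based loss with weight λ = 0.5. At inference, the boxes from the anchor-free branches are merged with those from the anchor-based branches and NMS is run on the combined set.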
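A small sketch of the two pieces described here: the weighted sum of losses and NMS over the merged detections. The split of the anchor-free loss into classification and regression terms, the greedy single-class NMS, and the 0.5 IoU threshold are assumptions of this sketch; the detections are made up.

```python
def total_loss(loss_anchor_based, loss_af_cls, loss_af_reg, lam=0.5):
    """Anchor-based loss plus lam times the anchor-free losses."""
    return loss_anchor_based + lam * (loss_af_cls + loss_af_reg)

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(detections, iou_thresh=0.5):
    """Greedy NMS over (score, box) pairs of a single class."""
    kept = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        if all(iou(box, kept_box) <= iou_thresh for _, kept_box in kept):
            kept.append((score, box))
    return kept

# Made-up detections from the two kinds of branches.
anchor_free = [(0.9, (10, 10, 50, 50))]
anchor_based = [(0.8, (12, 12, 52, 52)), (0.7, (100, 100, 140, 140))]
merged = nms(anchor_free + anchor_based)
```

In this example the anchor-based box that overlaps the higher-scoring anchor-free box is suppressed, and the distant box is kept.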

Experiments
[Figure: ablation results]
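If the level is chosen by the formula $l' = l_0 + \log_2(\sqrt{wh}/224)$ instead of online selection, the AP of the anchor-free branches alone drops by 1.2 (35.9 → 34.7). With the anchor-based branches added, online feature selection gains 1.1 AP over formula-based selection (36.1 → 37.2).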

The authors also analyze the necessity of the anchor-free branches and the importance of online feature selection; the results are fairly robust.