[paper reading] CenterNet (Triplets)

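GitHub: Notes of Classic Detection Papers

I originally wanted to host these notes on GitHub, but GitHub does not render formulas.
So they are posted on **** instead, although the formatting here is somewhat messy.
I strongly recommend downloading the source files from GitHub and reading those, since that gives the best reading experience.
And if the notes are useful to you, a star would be appreciated!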

| topic | motivation | technique | key element | math | use yourself | relativity |
| --- | --- | --- | --- | --- | --- | --- |
| CenterNet (triple) | Problem to Solve, Idea, Intuition | CenterNet Architecture, Center Pooling, Cascade Corner Pooling, Central Region Exploration | Baseline: CornerNet, Generating BBox, Training, Inferencing, Ablation Experiment, Error Analysis, Metric AP & AR & FD, Small & Medium & Large | Central Region, Loss Function | …… | Related Work |

Motivation

Problem to Solve

Drawbacks of keypoint-based methods (here mainly CornerNet):

Without an additional look at the cropped region, the detector cannot capture the visual patterns of the bounding box region, so it produces a large number of incorrect bounding boxes.

① CornerNet produces many incorrect bounding boxes

Idea

Represent each object with a keypoint triplet (top-left corner & bottom-right corner & center).

That is, while the top-left and bottom-right corners encode the boundary information, the added center keypoint lets the model explore the visual pattern of each predicted bounding box (the internal information of the object).

In practice, capturing the visual patterns within an object is converted into a keypoint detection problem.

② Checking the Central Region identifies the correct predictions

Intuition

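The idea partly follows RoI Pooling: through an efficient discrimination step (the Central Region), a one-stage method gains, to some extent, the resampling ability of two-stage methods.

Specifically: if a predicted bounding box has a high IoU with the ground-truth box, then the center keypoint in its Central Region is likely to be predicted as the same class.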

Technique

CenterNet Architecture

Components

[Center Pooling](#Center Pooling)
[Cascade Corner Pooling](#Cascade Corner Pooling)
[Central Region Exploration](#Central Region Exploration)

Improvement

AP Improvement

AP improves for small, medium, and large objects, with most of the gain coming from small objects.

Reason: center information. The smaller an incorrect bounding box is, the less likely a center keypoint is detected in its Central Region.

small object

medium & large object

AR Improvement

Reason: filtering out incorrect bounding boxes is equivalent to raising the confidence of bounding boxes that are accurately located but have lower scores.

Center Pooling

Both Cascade Corner Pooling and Center Pooling can be implemented by combining Corner Pooling in different directions.

Why

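The geometric center of an object does not necessarily carry a recognizable visual pattern.

Purpose

Better detection of center keypoints!

Specifically, it provides recognizable visual patterns for the Central Region, so that the information at the center of a proposal can be perceived and used to check whether the bounding box is correct.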

Steps

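For the feature map fed into Center Pooling, take the maximum responses in the horizontal and vertical directions and sum them:

1. The backbone outputs a feature map.
2. Find the maximum value in the horizontal direction and in the vertical direction, respectively.
3. Add the two maxima together.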
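
The three steps above can be sketched with NumPy. This is a minimal illustration on a single-channel feature map, not the authors' implementation; the function name `center_pool` and the toy values are assumptions made for the example.

```python
import numpy as np

def center_pool(fmap: np.ndarray) -> np.ndarray:
    """Naive center pooling on a single-channel (H, W) feature map."""
    # Maximum of each row: the response along the horizontal direction.
    row_max = fmap.max(axis=1, keepdims=True)   # shape (H, 1)
    # Maximum of each column: the response along the vertical direction.
    col_max = fmap.max(axis=0, keepdims=True)   # shape (1, W)
    # Broadcasting sums the two maxima at every position (H, W).
    return row_max + col_max

# Toy feature map (hypothetical values).
fmap = np.array([[1.0, 5.0, 2.0],
                 [0.0, 3.0, 9.0],
                 [4.0, 1.0, 1.0]])
pooled = center_pool(fmap)
print(pooled)
```

Every position receives the sum of its row maximum and its column maximum, so a location near the true center can respond strongly even if its own activation is weak.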

Cascade Corner Pooling

Both Cascade Corner Pooling and Center Pooling can be implemented by combining Corner Pooling in different directions.

Why

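Corners lie outside the object and therefore lack local appearance features.

Purpose

Better detection of corners!

Specifically, it enriches the information collected by the top-left and bottom-right corners, so that they perceive both boundary and internal information.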

Steps

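Along the boundary direction and the internal direction of the input feature map, take the max summed response (pooling in two directions is more stable and robust, and improves both precision and recall):

1. Along the boundary direction, find the boundary max.
2. From the position of the boundary max, look along the internal direction to find the internal max.
3. Add the two maxima together (the sum is written at the corner position).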
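
A literal, loop-based reading of these three steps is sketched below. It only covers one case and rests on assumptions: the corner is a top-left corner, the boundary direction is "rightwards along the row", and the internal direction is "downwards from the boundary maximum". The function name `cascade_top_pool` is hypothetical, and the real module is built from corner pooling layers rather than Python loops.

```python
import numpy as np

def cascade_top_pool(fmap: np.ndarray) -> np.ndarray:
    """Naive cascade pooling for the top boundary of a top-left corner, (H, W) input."""
    h, w = fmap.shape
    out = np.zeros_like(fmap)
    for i in range(h):
        for j in range(w):
            # Step 1: scan the boundary direction (row i, from column j to the right).
            j_star = j + int(np.argmax(fmap[i, j:]))
            boundary_max = fmap[i, j_star]
            # Step 2: starting at the boundary maximum, scan the internal direction
            # (column j_star, from row i downwards; the start point is included).
            internal_max = fmap[i:, j_star].max()
            # Step 3: add both maxima and store the sum at the corner position.
            out[i, j] = boundary_max + internal_max
    return out

# Toy feature map (hypothetical values).
fmap = np.array([[1.0, 5.0, 2.0],
                 [0.0, 3.0, 9.0],
                 [4.0, 1.0, 1.0]])
cascaded = cascade_top_pool(fmap)
print(cascaded)
```

In this reading, the internal scan is anchored at the column of the boundary maximum, which is how the corner gets to "see" inside the object instead of only along its edge.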

Central Region Exploration

Scale-Aware Central Region

Reason:

$\text{recall}$ vs. $\text{precision}$
Choice of Central Region:

Generate Central Regions of different sizes for bounding boxes of different sizes:
- small bounding box ==> large central region
  
  Reason: a small central region leads to low recall for small bounding boxes
- large bounding box ==> small central region
  
  Reason: a large central region leads to low precision for large bounding boxes
Two kinds of Central Region are used in the experiments:

Which one is used depends on the scale of the bounding box:
- scale $< 150$: $n = 3$ (left)
- scale $> 150$: $n = 5$ (right)

Exploration

- The center keypoint falls within the Central Region
- The center keypoint has the same class as the bounding box
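
The scale-aware region and the two checks can be sketched as follows. Several things here are assumptions rather than facts from these notes: the central-region formula is the one I recall from the paper for odd $n$, the "scale" of a box is taken to be the square root of its area, a scale of exactly 150 is assigned to $n = 5$, and all function names are hypothetical.

```python
import math

def central_region(tl_x, tl_y, br_x, br_y, n):
    """Corners (ctl_x, ctl_y, cbr_x, cbr_y) of the central region for odd n (assumed formula)."""
    ctl_x = ((n + 1) * tl_x + (n - 1) * br_x) / (2 * n)
    ctl_y = ((n + 1) * tl_y + (n - 1) * br_y) / (2 * n)
    cbr_x = ((n - 1) * tl_x + (n + 1) * br_x) / (2 * n)
    cbr_y = ((n - 1) * tl_y + (n + 1) * br_y) / (2 * n)
    return ctl_x, ctl_y, cbr_x, cbr_y

def pick_n(tl_x, tl_y, br_x, br_y, scale_threshold=150):
    """n = 3 for small boxes, n = 5 for large ones (scale = sqrt(area) is an assumption)."""
    scale = math.sqrt((br_x - tl_x) * (br_y - tl_y))
    return 3 if scale < scale_threshold else 5

def keep_box(box, box_cls, center, center_cls):
    """Keep the box only if a same-class center keypoint lies inside its central region."""
    n = pick_n(*box)
    ctl_x, ctl_y, cbr_x, cbr_y = central_region(*box, n)
    inside = ctl_x <= center[0] <= cbr_x and ctl_y <= center[1] <= cbr_y
    return inside and box_cls == center_cls

box = (0, 0, 90, 90)                              # hypothetical small box, so n = 3
print(central_region(*box, 3))                    # (30.0, 30.0, 60.0, 60.0)
print(keep_box(box, 1, (45, 45), 1))              # True: inside the region, same class
print(keep_box(box, 1, (10, 10), 1))              # False: outside the region
```

With $n = 3$ the region is the middle third of the box in each direction, and with $n = 5$ it is the middle fifth, which matches the "small box gets a relatively large region" rule above.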

Key Element

Baseline：CornerNet

Three outputs

heatmap:
- top-left corner
- bottom-right corner
Each heatmap contains two parts:
1. The locations of keypoints of different categories
2. The confidence score of each keypoint
embedding:

Used to group the corners
offset:

Remaps the corners from the heatmap to the input image

Generate BBox

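1. Take the top-100 top-left corners and the top-100 bottom-right corners, respectively.
2. Group the corners by embedding distance (a pair is matched when its embedding distance $< \text{Threshold}$).
3. Compute the confidence score of each bounding box (the average of the two corner scores).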
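
A small sketch of the grouping and scoring steps is given below. The tuple layout, the function name `group_corners`, the threshold value 0.5, the 1-D embeddings, and the extra same-class and geometry checks are all assumptions made for illustration; the top-100 selection is assumed to have happened already.

```python
def group_corners(tl_corners, br_corners, threshold=0.5):
    """Pair corners into boxes. Each corner is (x, y, score, embedding, class_id)."""
    boxes = []
    for tx, ty, ts, te, tc in tl_corners:
        for bx, by, bs, be, bc in br_corners:
            if tc != bc:
                continue  # assumed: corners of different classes never pair
            if bx <= tx or by <= ty:
                continue  # assumed: bottom-right must lie below and right of top-left
            if abs(te - be) >= threshold:
                continue  # embeddings too far apart: not the same object
            # Box confidence is the average of the two corner scores.
            boxes.append((tx, ty, bx, by, (ts + bs) / 2, tc))
    # Highest-scoring boxes first.
    return sorted(boxes, key=lambda b: b[4], reverse=True)

# Hypothetical corners: two objects with embeddings near 0.1 and near 2.0.
tl = [(10, 10, 1.0, 0.10, 1), (50, 50, 0.75, 2.00, 1)]
br = [(40, 40, 0.5, 0.15, 1), (90, 90, 0.25, 2.10, 1)]
grouped = group_corners(tl, br)
print(grouped)
```

Nothing in this procedure looks inside the box, which is why pairs with close embeddings can still form the incorrect bounding boxes discussed next.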

Drawbacks

CornerNet has a high False Discovery Rate (FD), i.e. it produces a large number of incorrect bounding boxes.

For the meaning of AP & FD, see [Metric AP & AR & FD](#Metric AP & AR & FD)

Generating BBox

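1. Select the top-k center keypoints.
2. Remap the center keypoints to the input image (using the offsets).
3. Define a Central Region in each bounding box.
4. Keep the bounding boxes that satisfy both conditions:
   - The center keypoint falls within the Central Region
   - The center keypoint has the same class as the bounding box
5. Compute the score of each bounding box: the average of the scores of the top-left corner, the bottom-right corner, and the center.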
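
Steps 1, 2 and 5 can be sketched with NumPy as below. The array layouts (a `(C, H, W)` heatmap, a class-agnostic `(2, H, W)` offset map), the stride of 4 between heatmap and input image, and the function names are assumptions for this example, not details confirmed by the notes.

```python
import numpy as np

def top_k_centers(heatmap, offsets, k, stride):
    """Return the k best center keypoints as (x, y, score, class_id) in input-image coordinates."""
    # Indices of the k highest scores over all classes and positions.
    idx = np.argsort(heatmap.reshape(-1))[::-1][:k]
    cls, ys, xs = np.unravel_index(idx, heatmap.shape)
    centers = []
    for c, y, x in zip(cls, ys, xs):
        # Add the predicted sub-pixel offset, then scale back to the input image.
        px = (x + offsets[0, y, x]) * stride
        py = (y + offsets[1, y, x]) * stride
        centers.append((float(px), float(py), float(heatmap[c, y, x]), int(c)))
    return centers

def triplet_score(tl_score, br_score, center_score):
    """Box score: average of the three keypoint scores."""
    return (tl_score + br_score + center_score) / 3.0

# Hypothetical 2-class, 4x4 heatmap with two peaks.
hm = np.zeros((2, 4, 4))
hm[1, 2, 3] = 0.9
hm[0, 1, 1] = 0.6
off = np.zeros((2, 4, 4))
off[0, 2, 3] = 0.5    # x offset at the first peak
off[1, 2, 3] = 0.25   # y offset at the first peak
print(top_k_centers(hm, off, k=2, stride=4))
print(triplet_score(1.0, 0.5, 0.75))
```

The remapped centers are then tested against the Central Region of every candidate box, and only boxes that pass receive a triplet score.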

Training

Input & Output Size

input size: 511×511
output size: 128×128

Data Augmentation

Same as CornerNet

Inferencing

Single-Scale Testing

At the original resolution, feed both the original image and its horizontally flipped copy into the network.

Multi-Scale Testing

At resolution scales of $[0.6, 1.0, 1.2, 1.5, 1.8]$, feed both the original and the flipped images into the network.

Steps

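1. Determine the bounding boxes from the top-70 center keypoints, top-70 top-left corners, and top-70 bottom-right corners. See [Generating BBox](#Generating BBox) for details.
2. Flip the bounding boxes detected on the flipped image back and merge them with those from the original image.
3. Post-processing: Soft-NMS.
4. Keep the top-100 bounding boxes.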
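
For the post-processing step, a compact Soft-NMS sketch is shown below. It uses the Gaussian decay variant, which is an assumption (the notes do not say which variant is used), and the names `iou`, `soft_nms`, `sigma` and `score_thresh` are chosen for this example only.

```python
import math

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001, top_k=100):
    """Gaussian Soft-NMS: decay the scores of overlapping boxes instead of deleting them."""
    candidates = list(zip(boxes, scores))
    kept = []
    while candidates and len(kept) < top_k:
        # Pick the highest-scoring remaining box.
        best = max(range(len(candidates)), key=lambda i: candidates[i][1])
        box, score = candidates.pop(best)
        kept.append((box, score))
        # Decay the remaining scores according to their overlap with the picked box.
        rescored = []
        for other, s in candidates:
            s = s * math.exp(-(iou(box, other) ** 2) / sigma)
            if s > score_thresh:
                rescored.append((other, s))
        candidates = rescored
    return kept

# Hypothetical detections: two identical boxes and one separate box.
boxes = [(0, 0, 10, 10), (0, 0, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = soft_nms(boxes, scores)
print(kept)
```

The duplicate box is not removed outright; its score is decayed so that it ranks below the non-overlapping box, and the `top_k=100` cut corresponds to the final step of the list above.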

Ablation Experiment

Incorrect Bounding Box Reduction

Inference Speed

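The cost of exploring visual patterns is small.

A lighter CenterNet variant can surpass a heavier CornerNet variant in both accuracy and speed (in the paper, CenterNet511-52 versus CornerNet511-104).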

Center Pooling Ablation

Conclusion:

Center Pooling substantially improves the AP of large objects.
Reasons:
- Center Pooling extracts richer internal visual patterns
- Larger objects contain more internal visual patterns

Cascade Corner Pooling Ablation

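Conclusion:
- Because large objects have rich internal visual patterns, Cascade Corner Pooling can see more of the object
- Overly rich internal visual patterns can reduce its sensitivity to boundaries, leading to inaccurate bounding boxes
  - Center Pooling can suppress these incorrect bounding boxes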

Central Region Exploration Ablation

Conclusion:

The overall AP improves, with the largest gain on small objects.
Reason:

The center keypoints of small objects are easier to locate.

Error Analysis

The exploration of visual patterns relies on center keypoints ==> a missed center keypoint makes CenterNet lose the visual pattern of the bounding box.
Center keypoint detection still has much room for improvement.

Metric AP & AR & FD

AP: Average Precision Rate

Computed over all categories at ten IoU thresholds (i.e. $0.5:0.05:0.95$).

Reflects how many high-quality bounding boxes the network can predict (generally IoU $\ge 0.5$).

It is the most important metric on the MS-COCO dataset.

AR: Maximum Recall Rate

Computed over a fixed number of detections per image, averaged over all categories and the ten IoU thresholds.

FD: False Discovery Rate

Reflects the proportion of incorrect bounding boxes.
$\text{FD} = 1 - \text{AP}$
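
To make the bookkeeping concrete, here is a tiny sketch of the ten IoU thresholds, the averaging over them, and the FD relation. The function names and the per-threshold AP values are hypothetical, and the per-threshold AP computation itself (matching detections to ground truth) is deliberately left out.

```python
def coco_iou_thresholds():
    """The ten IoU thresholds 0.5:0.05:0.95."""
    return [round(0.5 + 0.05 * i, 2) for i in range(10)]

def mean_over_thresholds(ap_per_threshold):
    """Average the per-threshold AP values into a single AP."""
    return sum(ap_per_threshold) / len(ap_per_threshold)

def false_discovery_rate(ap):
    """FD = 1 - AP."""
    return 1.0 - ap

thresholds = coco_iou_thresholds()
print(thresholds)

# Hypothetical per-threshold AP values, only to show the averaging step.
fake_ap = [0.5] * 5 + [0.25] * 5
print(mean_over_thresholds(fake_ap))
print(false_discovery_rate(0.75))
```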

Small & Medium & Large

small object: $\text{area} < 32^2$
medium object: $32^2 < \text{area} < 96^2$
large object: $\text{area} > 96^2$
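
The three size buckets can be expressed as a small helper. The name `size_bucket` is hypothetical, and assigning the exact boundary areas ($32^2$ and $96^2$) to the larger bucket is an assumption, since the inequalities above are strict on both sides.

```python
def size_bucket(width, height):
    """Classify an object as small / medium / large by its pixel area."""
    area = width * height
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"

print(size_bucket(20, 20))     # small  (area 400)
print(size_bucket(50, 50))     # medium (area 2500)
print(size_bucket(100, 100))   # large  (area 10000)
```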

Math

Central Region
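
As a reference sketch, the Central Region for an odd $n$ can be written as below. The notation is an assumption recalled from the paper and is not defined elsewhere in these notes: $(tl_x, tl_y)$ and $(br_x, br_y)$ are the top-left and bottom-right corners of the bounding box, and $(ctl_x, ctl_y)$ and $(cbr_x, cbr_y)$ are the corresponding corners of the Central Region.

$$
\begin{aligned}
ctl_x &= \frac{(n+1)\,tl_x + (n-1)\,br_x}{2n}, &
ctl_y &= \frac{(n+1)\,tl_y + (n-1)\,br_y}{2n}, \\
cbr_x &= \frac{(n-1)\,tl_x + (n+1)\,br_x}{2n}, &
cbr_y &= \frac{(n-1)\,tl_y + (n+1)\,br_y}{2n}
\end{aligned}
$$

For example, with $n = 3$ and a box spanning $0$ to $90$ in $x$, this gives $ctl_x = 30$ and $cbr_x = 60$, i.e. the middle third of the box.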

Loss Function

It consists of:

Detection Loss
- Corner Detection Loss $\text{L}_{\text{det}}^{\text{co}}$
- Center Detection Loss $\text{L}_{\text{det}}^{\text{ce}}$
Pull & Push Loss

Applied to corners only
- Pull Loss $\text{L}_{\text{pull}}^{\text{co}}$
- Push Loss $\text{L}_{\text{push}}^{\text{co}}$
Offset Loss
- Corner offset Loss $\text{L}_{\text{off}}^{\text{co}}$
- Center offset Loss $\text{L}_{\text{off}}^{\text{ce}}$

$\alpha = \beta = 0.1$
$\gamma = 1$
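
One way to combine the terms listed above into a single training objective is sketched below. The assignment of weights to terms is an assumption here ($\alpha$ on the pull loss, $\beta$ on the push loss, $\gamma$ on both offset losses), since these notes only give the values of the weights.

$$
\text{L} = \text{L}_{\text{det}}^{\text{co}} + \text{L}_{\text{det}}^{\text{ce}}
+ \alpha\,\text{L}_{\text{pull}}^{\text{co}} + \beta\,\text{L}_{\text{push}}^{\text{co}}
+ \gamma\left(\text{L}_{\text{off}}^{\text{co}} + \text{L}_{\text{off}}^{\text{ce}}\right)
$$

Under this reading, the grouping terms (pull and push) are down-weighted by a factor of 10 relative to the detection and offset terms.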

Use Yourself

……

Related Work

Anchor-Based Method

Introduction

Anchor-based methods have two key steps:

1. Place anchors with predefined sizes and ratios.
2. Regress the positive bounding boxes according to the ground truth.
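
The first step, placing anchors with predefined sizes and ratios, can be illustrated with a short sketch. The function name `make_anchors`, the default sizes and ratios, and the convention that `ratio` means height divided by width are all assumptions for this example and do not come from any particular detector.

```python
import math

def make_anchors(cx, cy, sizes=(32, 64), ratios=(0.5, 1.0, 2.0)):
    """Anchors (x1, y1, x2, y2) centered at (cx, cy); each anchor has area size ** 2."""
    anchors = []
    for size in sizes:
        for ratio in ratios:
            # ratio = height / width (assumed convention); the area stays size ** 2.
            w = size / math.sqrt(ratio)
            h = size * math.sqrt(ratio)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

anchors = make_anchors(0, 0)
print(len(anchors))   # 6 anchors: 2 sizes x 3 ratios at a single location
```

Even this toy setting yields six anchors per location, which hints at why the anchor count and the number of hand-tuned hyperparameters grow quickly.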

drawbacks

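- A large number of anchors is needed (to ensure a sufficiently high IoU with the ground-truth boxes)
- Anchor sizes and ratios must be designed by hand (which introduces many hyperparameters to tune)
- Anchors are not aligned with the ground-truth boxes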

KeyPoint-Based Method

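This mainly refers to CornerNet.

Introduction

That is: an object is represented by a pair of corners.

drawbacks

Its ability to refer to global information is relatively weak.

In other words: it is sensitive to the boundary information of objects.
It cannot tell which pairs of keypoints should represent an object.

See [Problem to Solve](#Problem to Solve) for details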

Two-Stage Method

Steps

Extract RoIs ==> stage-1
classify & regress RoIs ==> stage-2

Models

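RCNN:

RoIs are obtained by selective search
A CNN serves as the classifier

SPP-Net & Fast-RCNN:

Extract RoIs from the feature map

Faster-RCNN:

Uses an RPN to regress anchors, enabling end-to-end training

Mask-RCNN:

Faster-RCNN + mask-prediction branch
Performs detection and segmentation at the same time

R-FCN:

Replaces the FC layers with position-sensitive score maps

Cascade RCNN:

By training a sequence of detectors with increasing IoU thresholds, it addresses two problems:

overfitting during training
quality mismatch during inference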

One-stage Method

A common weakness of one-stage methods: the lack of an additional look at the cropped region.

Steps

Directly classify and regress the anchor boxes.

Models

YOLOv1:

image ==> S×S grid
Uses no anchors and learns the bounding box size directly

YOLOv2:

Goes back to using more anchors
Uses a new bounding box regression method

SSD:

Uses feature maps from different convolutional stages to classify and regress

DSSD:

SSD + deconvolution ==> combines low-level and high-level features

R-SSD:

Applies pooling and deconvolution to different feature layers ==> combines low-level and high-level features

RON:

reverse connection
objectness prior

RefineDet:

Refines location and size twice, inheriting the merits of both one-stage and two-stage methods

CornerNet:

keypoint-based method
Represents an object with a pair of corners

Problems

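In Cascade Corner Pooling, how does the internal-direction step find the maximum along the boundary direction?
What exactly do AP and AR mean?
Why is CornerNet's ability to refer to the global information of an object weak?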
