[paper reading] CenterNet (Triplets)

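GitHub: Notes of Classic Detection Papers

I originally wanted to host these notes on GitHub, but GitHub does not render the formulas.
So they are posted on **** instead, although the formatting is a bit messy here.
I strongly recommend downloading the source files from GitHub to read and study; that gives the best reading experience.
If you find the notes useful, a star would be appreciated.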

Topic: CenterNet (triple)

  • Motivation: Problem to Solve, Idea, Intuition
  • Technique: CenterNet Architecture, Center Pooling, Cascade Corner Pooling, Central Region Exploration
  • Key Element: Baseline: CornerNet, Generating BBox, Training, Inferencing, Ablation Experiment, Error Analysis, Metric AP & AR & FD, Small & Medium & Large
  • Math: Central Region, Loss Function
  • Use Yourself: ……
  • Relativity: Related Work

Motivation

Problem to Solve

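Drawbacks of keypoint-based methods (here this mainly means CornerNet):

Because they never take an additional look at the cropped region, they cannot capture the visual pattern inside the bounding box region, which produces a large number of incorrect bounding boxes.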

[Figure ①: CornerNet produces many incorrect bounding boxes]

Idea

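Represent each object with a keypoint triplet (top-left corner, bottom-right corner, and center).

The top-left and bottom-right corners encode the boundary information, while the added center lets the model explore the visual pattern of each predicted bounding box (that is, obtain the object's internal information).

Concretely, capturing the visual patterns within an object is turned into a keypoint detection problem.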

[Figure ②: Checking the central region identifies the correct predictions]

Intuition

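This idea partly follows RoI Pooling: through an efficient discrimination step (the central region), a one-stage method gains, to some extent, the resampling ability of two-stage methods.

Specifically: if a predicted bounding box has a high IoU with the ground-truth box, then the center keypoint in its central region will also be predicted as the same class.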

Technique

CenterNet Architecture

Components

  • [Center Pooling](#Center Pooling)
  • [Cascade Corner Pooling](#Cascade Corner Pooling)
  • [Central Region Exploration](#Central Region Exploration)

Improvement

  • AP Improvement

    AP improves for small, medium, and large objects, with most of the gain coming from small objects

    Reason (center information): the smaller an incorrect bounding box is, the less likely a center keypoint is detected in its central region

    [Figure: small objects]

    [Figure: medium & large objects]
  • AR Improvement

    Reason: filtering out incorrect bounding boxes effectively raises the confidence of bounding boxes that are accurately located but have lower scores

Center Pooling

Both Cascade Corner Pooling and Center Pooling can be implemented by combining Corner Pooling in different directions.

Why

The geometric center of an object does not necessarily carry a recognizable visual pattern.

Purpose

better detection of center keypoint!!!

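Specifically, it gives the central region a recognizable visual pattern, so that the information at the center of a proposal can be perceived and the correctness of the bounding box can be checked.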

Steps

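For the input feature map of Center Pooling, take the maximum responses in the horizontal and vertical directions and sum them:

  1. The backbone outputs a feature map
  2. Find the maximum value in the horizontal direction and in the vertical direction separately
  3. Add the two together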
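
The three steps above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: the feature map is a single 2D channel stored as a list of lists, the maxima are taken over the whole row and the whole column, and `center_pool` is a hypothetical name, not a function from the paper's code.

```python
def center_pool(fmap):
    """For every location, add its row maximum and its column maximum."""
    row_max = [max(row) for row in fmap]        # horizontal maxima, one per row
    col_max = [max(col) for col in zip(*fmap)]  # vertical maxima, one per column
    return [[row_max[i] + col_max[j] for j in range(len(fmap[0]))]
            for i in range(len(fmap))]


# Toy single-channel feature map (made-up values).
fmap = [[1, 5, 2],
        [0, 3, 9],
        [4, 1, 1]]
pooled = center_pool(fmap)
```

A location that sits on a row and a column with strong responses gets a large pooled value even if its own response is weak, which is how a geometric center without a clear visual pattern can still be detected.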

Cascade Corner Pooling

Both Cascade Corner Pooling and Center Pooling can be implemented by combining Corner Pooling in different directions.

Why

Corners lie outside the object and lack local appearance features.

Purpose

better detection of corners!!!

Specifically, it enriches the information collected by the top-left and bottom-right corners so that they perceive both boundary and internal information.

Steps

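On the input feature map, take the maximum responses along the boundary direction and the internal direction and sum them (pooling in two directions is more stable and more robust, and improves both precision and recall):

  1. Look along the boundary direction to find the boundary maximum
  2. From the location of the boundary maximum, look along the internal direction to find the internal maximum
  3. Add the two maxima together (at the corner location)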
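
A rough sketch of these three steps for one case, the top boundary of a candidate top-left corner, might look as follows. The scanning directions are assumptions made for illustration (rightward along the top edge, then downward into the object), the feature map is a single 2D channel, and `cascade_top_pool` is a hypothetical helper, not the paper's implementation.

```python
def cascade_top_pool(fmap, i, j):
    """Cascade pooling along the top boundary for a candidate top-left corner (i, j)."""
    row = fmap[i]
    # Step 1: boundary maximum, scanning right along the top edge from column j.
    j_star = max(range(j, len(row)), key=lambda k: row[k])
    boundary_max = row[j_star]
    # Step 2: internal maximum, scanning down from the boundary maximum's location.
    internal_max = max(fmap[r][j_star] for r in range(i, len(fmap)))
    # Step 3: the sum is the response written at the corner location.
    return boundary_max + internal_max


# Toy single-channel feature map (made-up values).
fmap = [[1, 2, 6, 3],
        [0, 9, 4, 1],
        [5, 1, 7, 2]]
value = cascade_top_pool(fmap, 0, 0)
```

The left boundary would be handled symmetrically (scan down the left edge, then look rightward from the boundary maximum).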

Central Region Exploration

Scale-Aware Central Region

  • Reason

    Trade-off: recall vs. precision

  • Choosing the central region

    Generate central regions of different sizes for bounding boxes of different sizes

    • small bounding box ==> large central region

      Reason: a small central region leads to low recall for small bounding boxes

    • large bounding box ==> small central region

      Reason: a large central region leads to low precision for large bounding boxes

    The experiments use two kinds of central region:

    [Figure: central regions for n = 3 (left) and n = 5 (right)]

    Which one is used depends on the scale of the bounding box:

    • scale < 150: n = 3 (left)
    • scale > 150: n = 5 (right)
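
The scale-aware choice can be sketched as below. The central-region formula is the one I recall from the CenterNet paper (the region shrinks toward the box center as n grows); how "scale" is measured is not stated in these notes, so taking the longer side of the box is an assumption, as is sending a scale of exactly 150 to n = 5. Both function names are hypothetical.

```python
def choose_n(tl_x, tl_y, br_x, br_y, scale_threshold=150):
    """Pick n from the box scale (assumption: scale = longer side of the box)."""
    scale = max(br_x - tl_x, br_y - tl_y)
    return 3 if scale < scale_threshold else 5


def central_region(tl_x, tl_y, br_x, br_y, n):
    """Return (ctl_x, ctl_y, cbr_x, cbr_y), the corners of the central region."""
    ctl_x = ((n + 1) * tl_x + (n - 1) * br_x) / (2 * n)
    ctl_y = ((n + 1) * tl_y + (n - 1) * br_y) / (2 * n)
    cbr_x = ((n - 1) * tl_x + (n + 1) * br_x) / (2 * n)
    cbr_y = ((n - 1) * tl_y + (n + 1) * br_y) / (2 * n)
    return ctl_x, ctl_y, cbr_x, cbr_y


small_box = (0, 0, 90, 90)
large_box = (0, 0, 300, 300)
small_region = central_region(*small_box, choose_n(*small_box))
large_region = central_region(*large_box, choose_n(*large_box))
```

With n = 3 the central region spans the middle third of the box in each direction; with n = 5 it spans the middle fifth, so larger boxes get a relatively smaller region.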

Exploration

  • The center keypoint falls inside the central region
  • The center keypoint has the same class as the bounding box

Key Element

Baseline: CornerNet

Three outputs

  • heatmap

    • top-left corner
    • bottom-right corner

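    Each heatmap carries two pieces of information:

    1. The locations of the keypoints of each category
    2. The confidence score of each keypoint
  • embedding

    Used to group the corners

  • offset

    Used to remap the corners from the heatmap to the input image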

Generate BBox

  1. Take the top-100 top-left corners and the top-100 bottom-right corners
  2. Group the corners by embedding distance (embedding distance < Threshold)
  3. Compute the confidence score of each bounding box (the average of the two corner scores)
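
A minimal sketch of steps 2 and 3, assuming each corner is a tuple `(x, y, score, embedding)` with a scalar embedding. The threshold value is a placeholder since the notes do not give one, the geometric sanity check is my own addition, and class matching is left out for brevity; `pair_corners` is a hypothetical name.

```python
def pair_corners(tl_corners, br_corners, threshold=0.5):
    """Pair top-left and bottom-right corners whose embeddings are close."""
    boxes = []
    for tl_x, tl_y, tl_score, tl_emb in tl_corners:
        for br_x, br_y, br_score, br_emb in br_corners:
            if abs(tl_emb - br_emb) >= threshold:
                continue  # embeddings too far apart: not the same object
            if br_x <= tl_x or br_y <= tl_y:
                continue  # assumption: drop geometrically impossible pairs
            score = (tl_score + br_score) / 2  # average of the two corner scores
            boxes.append((tl_x, tl_y, br_x, br_y, score))
    return boxes


# Made-up corners: only the first bottom-right corner has a matching embedding.
tl = [(10, 10, 1.0, 0.25)]
br = [(50, 60, 0.5, 0.5), (80, 90, 1.0, 2.0)]
boxes = pair_corners(tl, br)
```

Nothing in this procedure looks inside the box, which is why pairs with similar embeddings can still form incorrect bounding boxes.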

Drawbacks

CornerNet has a high False Discovery Rate (FD), i.e. it produces a large number of incorrect bounding boxes.

For the meaning of AP & FD, see [Metric AP & AR & FD](#Metric AP & AR & FD)

Generating BBox

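  1. Select the top-k center keypoints

  2. Remap the center keypoints to the input image (using the offsets)

  3. Define a central region in each bounding box

  4. Keep the bounding boxes that satisfy both conditions

    • The center keypoint falls inside the central region
    • The center keypoint has the same class as the bounding box
  5. Compute the score of each bounding box

    The average of the scores of the top-left corner, the bottom-right corner, and the center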
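
Steps 4 and 5 can be sketched as follows, assuming the central region from step 3 is already available as `(ctl_x, ctl_y, cbr_x, cbr_y)` and the center keypoints are already remapped. When several center keypoints qualify, taking the highest-scoring one is an assumption; `score_box` and the tuple layouts are hypothetical.

```python
def score_box(box, region, centers):
    """Return the triplet score if a center keypoint validates the box, else None.

    box:     (cls, tl_score, br_score)
    region:  (ctl_x, ctl_y, cbr_x, cbr_y), the central region of the box
    centers: list of (x, y, cls, score), already remapped to the input image
    """
    cls, tl_score, br_score = box
    ctl_x, ctl_y, cbr_x, cbr_y = region
    best = None
    for x, y, c_cls, c_score in centers:
        inside = ctl_x <= x <= cbr_x and ctl_y <= y <= cbr_y
        if inside and c_cls == cls and (best is None or c_score > best):
            best = c_score
    if best is None:
        return None  # no supporting center keypoint: the box is discarded
    return (tl_score + br_score + best) / 3  # average of the three scores


# Made-up example: only the first center is inside the region with the right class.
centers = [(45, 40, "cat", 1.0), (45, 40, "dog", 0.5), (10, 10, "cat", 1.0)]
kept_score = score_box(("cat", 0.75, 0.5), (30, 30, 60, 60), centers)
dropped = score_box(("bird", 0.75, 0.5), (30, 30, 60, 60), centers)
```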

Training

Input & Output Size

  • input size: 511×511
  • output size: 128×128

Data Augmentation

Same as CornerNet

Inferencing

Single-Scale Testing

Feed both the original and the flipped image into the network at the original resolution

Multi-Scale Testing

Feed both the original and the flipped image into the network at resolution scales of $[0.6, 1.0, 1.2, 1.5, 1.8]$

Steps

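  1. Determine 70 bounding boxes from 70 triplets

    See [Generating BBox](#Generating BBox) for details

  2. Flip the detections from the flipped image back and merge them with those from the original image

  3. Post-process with Soft-NMS

  4. Keep the top-100 bounding boxes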
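
Steps 3 and 4 can be illustrated with a small Soft-NMS sketch. The notes do not say which Soft-NMS variant or thresholds are used, so the linear-decay variant and the default values below are assumptions; detections are assumed to be `(x1, y1, x2, y2, score)` tuples of a single class.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2, ...)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(dets, iou_thresh=0.5, score_thresh=0.001, top_k=100):
    """Linear Soft-NMS: decay the scores of overlapping boxes instead of deleting them."""
    remaining = [list(d) for d in dets]
    kept = []
    while remaining:
        best = max(remaining, key=lambda d: d[4])
        remaining.remove(best)
        kept.append(tuple(best))
        for d in remaining:
            overlap = iou(best, d)
            if overlap > iou_thresh:
                d[4] *= 1.0 - overlap  # linear decay by the amount of overlap
        remaining = [d for d in remaining if d[4] >= score_thresh]
    return kept[:top_k]  # kept is already in descending score order


# Made-up detections: the second box duplicates the first, the third is separate.
dets = [(0, 0, 10, 10, 0.9), (0, 0, 10, 10, 0.8), (20, 20, 30, 30, 0.7)]
final = soft_nms(dets)
```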

Ablation Experiment

Incorrect Bounding Box Reduction

Inference Speed

The cost of exploring visual patterns is small.

Certain CenterNet variants outperform certain CornerNet variants in both accuracy and speed.

Center Pooling Ablation

  • Conclusion

    Center Pooling substantially improves AP on large objects

  • Reason

    • Center Pooling extracts richer internal visual patterns
    • Larger objects contain more internal visual patterns

Cascade Corner Pooling Ablation

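  • Conclusion

    • Because large objects have rich internal visual patterns, Cascade Corner Pooling can see more of the object

    • Overly rich internal visual patterns reduce its sensitivity to boundaries, leading to inaccurate bounding boxes

      • Center Pooling can suppress these incorrect bounding boxes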

Central Region Exploration Ablation

  • Conclusion

    Overall AP improves, with the largest gain on small objects

  • Reason

    The center keypoints of small objects are easier to locate

Error Analysis

  1. The exploration of visual patterns relies on center keypoints ==> missing a center keypoint makes CenterNet lose the visual pattern of the bounding box

  2. Center keypoint detection still has much room for improvement

Metric AP & AR & FD

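AP: Average Precision Rate

Computed over all categories at 10 IoU thresholds ($0.5 : 0.05 : 0.95$)

Reflects how many high-quality bounding boxes the network can predict (typically $\text{IoU} \ge 0.5$)

It is the most important metric on the MS-COCO dataset

AR: Maximum Recall Rate

Computed with a fixed number of detections per image, averaged over all categories and the 10 IoU thresholds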
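
To make the threshold range concrete, the 10 IoU thresholds and the final averaging step can be written out as below. This only illustrates the averaging; it is not a full evaluator, and `mean_over_thresholds` is a hypothetical helper that expects one already-computed value per threshold.

```python
# The 10 IoU thresholds 0.5 : 0.05 : 0.95 (rounded to avoid float drift).
iou_thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def mean_over_thresholds(per_threshold_values):
    """Average a metric (AP or AR) that was computed once per IoU threshold."""
    if len(per_threshold_values) != len(iou_thresholds):
        raise ValueError("expected one value per IoU threshold")
    return sum(per_threshold_values) / len(per_threshold_values)
```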

FD: False Discovery Rate

Reflects the proportion of incorrect bounding boxes
$\text{FD} = 1 - \text{AP}$

Small & Medium & Large

  • small object: $\text{area} < 32^2$

  • medium object: $32^2 < \text{area} < 96^2$

  • large object: $\text{area} > 96^2$
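
As a quick illustration, the three ranges map to a small helper like the one below. The definitions above use strict inequalities, so where an area of exactly $32^2$ or $96^2$ falls is an assumption here; `size_bucket` is a hypothetical name.

```python
def size_bucket(area):
    """Classify an object by its pixel area into small / medium / large."""
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"  # assumption: an area of exactly 32^2 counts as medium
    return "large"       # assumption: an area of exactly 96^2 counts as large
```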

Math

Central Region
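
As I recall it from the CenterNet paper, for a bounding box with top-left corner $(\text{tl}_x, \text{tl}_y)$ and bottom-right corner $(\text{br}_x, \text{br}_y)$, the central region with corners $(\text{ctl}_x, \text{ctl}_y)$ and $(\text{cbr}_x, \text{cbr}_y)$ is defined as follows, where $n$ is odd and controls the scale of the region (these symbol names are an assumption, not taken from the notes):

```latex
\begin{cases}
\text{ctl}_x = \dfrac{(n+1)\,\text{tl}_x + (n-1)\,\text{br}_x}{2n} \\[2ex]
\text{ctl}_y = \dfrac{(n+1)\,\text{tl}_y + (n-1)\,\text{br}_y}{2n} \\[2ex]
\text{cbr}_x = \dfrac{(n-1)\,\text{tl}_x + (n+1)\,\text{br}_x}{2n} \\[2ex]
\text{cbr}_y = \dfrac{(n-1)\,\text{tl}_y + (n+1)\,\text{br}_y}{2n}
\end{cases}
```

For example, with $n = 3$ and a box from $(0, 0)$ to $(90, 90)$, the central region runs from $(30, 30)$ to $(60, 60)$.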

Loss Function

The loss consists of:

  • Detection Loss

    • Corner Detection Loss $\text{L}_{\text{det}}^{\text{co}}$
    • Center Detection Loss $\text{L}_{\text{det}}^{\text{ce}}$
  • Pull & Push Loss

    Applied to corners only

    • Pull Loss $\text{L}_{\text{pull}}^{\text{co}}$
    • Push Loss $\text{L}_{\text{push}}^{\text{co}}$
  • Offset Loss

    • Corner offset Loss $\text{L}_{\text{off}}^{\text{co}}$
    • Center offset Loss $\text{L}_{\text{off}}^{\text{ce}}$

  • $\alpha = \beta = 0.1$
  • $\gamma = 1$
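
Assuming the form given in the CenterNet paper (as I recall it), the six terms listed above combine into the total loss with these weights as:

```latex
\text{L} = \text{L}_{\text{det}}^{\text{co}} + \text{L}_{\text{det}}^{\text{ce}}
  + \alpha\, \text{L}_{\text{pull}}^{\text{co}} + \beta\, \text{L}_{\text{push}}^{\text{co}}
  + \gamma \left( \text{L}_{\text{off}}^{\text{co}} + \text{L}_{\text{off}}^{\text{ce}} \right)
```

With $\alpha = \beta = 0.1$ and $\gamma = 1$, the pull and push terms are down-weighted relative to the detection and offset terms.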

Use Yourself

……

Related Work

Anchor-Based Method

Introduction

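Anchor-based methods have two key points:

  • Place anchors with predefined sizes and ratios
  • Regress the positive bounding boxes toward the ground truth

drawbacks

  • A large number of anchors is needed (to keep a sufficiently high IoU with the ground-truth boxes)

  • Anchor sizes and ratios must be designed by hand (introducing many hyperparameters to tune)

  • Anchors are not aligned with the ground truth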

KeyPoint-Based Method

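Here this mainly means CornerNet

Introduction

That is: represent an object with a pair of corners

drawbacks

  • Relatively weak ability to refer to global information

    In other words: it is sensitive to the object's boundary information

  • It cannot tell which pairs of keypoints should represent an object

See [Problem to Solve](#Problem to Solve) for details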

Two-Stage Method

Steps

  • Extract RoIs ==> stage-1
  • classify & regress RoIs ==> stage-2

Models

RCNN

  • Obtains RoIs with selective search
  • Uses a CNN as the classifier

SPP-Net & Fast-RCNN

  • Extract RoIs from the feature map

Faster-RCNN

  • Uses an RPN to regress anchors, enabling end-to-end training

Mask-RCNN

  • Faster-RCNN + mask-prediction branch
  • Performs detection and segmentation simultaneously

R-FCN

  • Replaces the FC layers with position-sensitive score maps

Cascade RCNN

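Trains a sequence of detectors with increasing IoU thresholds, addressing two problems:

  • Overfitting at training time
  • Quality mismatch at inference time

One-stage Method

A common weakness of one-stage methods: no additional look at the cropped region

Steps

Directly classify and regress the anchor boxes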

Models

YOLOv1

  • image ==> S×S grid
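  • Does not use anchors; learns the bounding box size directly

YOLOv2

  • Switches back to using a larger number of anchors
  • Uses a new bounding box regression method

SSD

  • Uses feature maps from different convolutional stages for classification and regression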

DSSD

  • SSD + deconvolution ==> combines low-level and high-level features

R-SSD

  • Applies pooling and deconvolution to different feature layers ==> combines low-level and high-level features

RON

  • reverse connection
  • objectness prior

RefineDet

  • Refines location and size twice, inheriting the advantages of both one-stage and two-stage methods

CornerNet

  • keypoint-based method
  • Represents an object with a pair of corners

Problems

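  • In the internal direction of Cascade Corner Pooling, how is the maximum along the boundary direction found?
  • What exactly do AP and AR mean?
  • Why is CornerNet's ability to refer to an object's global information so weak?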