Paper Reading-YOLOv4

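Paper: YOLOv4
GitHub: Darknet
My impression is that this paper reads more like a summary of many existing methods. My company is holding a sharing session soon, so here are my reading notes.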

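A conventional object detector is made up of roughly the following parts (borrowing the authors' wording): input, backbone, neck, and head.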

For YOLOv4:

[Figure: the components of YOLOv4: CSPDarknet53 backbone, SPP and PANet neck, YOLOv3 head]

The backbone extracts features from the image. The authors compared CSPResNext50 with CSPDarknet53 and chose the latter. In their words:

our numerous studies demonstrate that the CSPResNext50 is considerably better compared to CSPDarknet53 in terms of object classification on the ILSVRC2012 (ImageNet) dataset. However, conversely, the CSPDarknet53 is better compared to CSPResNext50 in terms of detecting objects on the MS COCO dataset.

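Put simply, the former does better on the classification dataset, while the latter does better on the detection dataset. There are other reasons too: compared with classification, detection needs a higher input resolution, a deeper network, and more parameters. The specifics: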

The CSPResNext50 contains only 16 convolutional layers 3 × 3, a 425 × 425 receptive field and 20.6M parameters, while CSPDarknet53 contains 29 convolutional layers 3 × 3, a 725 × 725 receptive field and 27.6M parameters. This theoretical justification, together with our numerous experiments, show that CSPDarknet53 neural network is the optimal model of the two as the backbone for a detector.
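
To convince myself that more 3 × 3 layers really do mean a larger receptive field, here is a small sketch of the standard receptive-field recurrence for a stack of convolutions. The layer lists are made-up toy stacks, not the real CSPResNext50 or CSPDarknet53 configurations, so the results are not meant to reproduce the 425 and 725 quoted above.

```python
def receptive_field(layers):
    """Receptive field of a stack of conv/pool layers.

    `layers` is a list of (kernel_size, stride) tuples ordered from input
    to output. Each layer adds (kernel - 1) * jump, where jump is the
    product of the strides of all earlier layers.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


# Hypothetical toy stacks (NOT the real backbones).
shallow = [(3, 1)] * 4
deep = [(3, 1)] * 8
downsampled = [(3, 2), (3, 1), (3, 2), (3, 1)]

print(receptive_field(shallow))      # 9
print(receptive_field(deep))         # 17
print(receptive_field(downsampled))  # 19
```

Doubling the number of 3 × 3 layers roughly doubles the receptive field, and strided layers make every later layer count for more.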

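Here the authors also summarize the role of the receptive field:

[Figure: the authors' summary of how receptive-field size affects detection]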

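Receptive fields of different sizes play different roles. To enlarge the receptive field, YOLOv4 adds an SPP block, and it uses PANet to aggregate features from different backbone levels. In the authors' words: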

We add the SPP block over the CSPDarknet53, since it significantly increases the receptive field, separates out the most significant context features and causes almost no reduction of the network operation speed. We use PANet as the method of parameter aggregation from different backbone levels for different detector levels, instead of the FPN used in YOLOv3.

The structure of SPP in YOLOv4 is:

[Figure: structure of the SPP block in YOLOv4]
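
Since the figure is hard to describe in words, here is a minimal sketch of how I understand the SPP block: the input is max-pooled at several kernel sizes with stride 1 and "same" padding, and the pooled maps are concatenated with the input along the channel axis. The kernel sizes 5, 9, and 13 are my reading of the Darknet config. The function names, the branch order, and the nested-list layout [channels][height][width] are my own assumptions, chosen so the code runs in plain Python.

```python
def max_pool_same(fmap, k):
    """Stride-1 max pooling with 'same' padding on one 2D channel.

    Out-of-bounds positions are skipped, which is equivalent to padding
    with minus infinity.
    """
    h, w = len(fmap), len(fmap[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [
                fmap[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            ]
            row.append(max(window))
        out.append(row)
    return out


def spp_block(features, kernel_sizes=(5, 9, 13)):
    """Concatenate the input with its pooled copies along the channel axis.

    `features` is a list of channels, each an H x W list of lists.
    """
    out = list(features)  # identity branch
    for k in kernel_sizes:
        out.extend(max_pool_same(ch, k) for ch in features)
    return out


fmap = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
out = spp_block([fmap, fmap])
print(len(out), len(out[0]), len(out[0][0]))  # 8 3 3
```

With a 2-channel input the output has 8 channels and the same height and width, which is why the block enlarges the receptive field without changing the spatial resolution.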

As I said earlier, the paper feels like a summary of many methods. They can be grouped under two headings: Bag of freebies and Bag of specials

The paper explains Bag of freebies as follows:

Therefore, researchers always like to take this advantage and develop better training methods which can make the object detector receive better accuracy without increasing the inference cost. We call these methods that only change the training strategy or only increase the training cost as “bag of freebies.”

Put simply, a method that improves the model's accuracy without adding inference time is a Bag of freebies method.

Bag of freebies: (1) data augmentation; (2) handling semantic distribution bias in the dataset; (3) softening the one-hot hard label representation, which struggles to express the degree of association between different categories; (4) the objective function for Bounding Box (BBox) regression
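
Two of these freebies are easy to write down. Below is a rough sketch of class label smoothing (one answer to item 3) and the CIoU loss (one of the BBox regression objectives the paper discusses under item 4). This is my own plain-Python rendering of the published formulas, not code from Darknet. The function names, the smoothing factor 0.1, and the (x1, y1, x2, y2) box format are assumptions for illustration.

```python
import math


def smooth_labels(one_hot, eps=0.1):
    """Class label smoothing: move `eps` of the mass to a uniform prior."""
    k = len(one_hot)
    return [(1.0 - eps) * y + eps / k for y in one_hot]


def ciou_loss(box, gt, tiny=1e-9):
    """CIoU loss for two boxes given as (x1, y1, x2, y2) with positive size."""
    w, h = box[2] - box[0], box[3] - box[1]
    wg, hg = gt[2] - gt[0], gt[3] - gt[1]

    # Plain IoU.
    iw = max(0.0, min(box[2], gt[2]) - max(box[0], gt[0]))
    ih = max(0.0, min(box[3], gt[3]) - max(box[1], gt[1]))
    inter = iw * ih
    iou = inter / (w * h + wg * hg - inter + tiny)

    # Squared distance between the two box centres.
    rho2 = ((box[0] + box[2] - gt[0] - gt[2]) ** 2
            + (box[1] + box[3] - gt[1] - gt[3]) ** 2) / 4.0

    # Squared diagonal of the smallest box enclosing both.
    cw = max(box[2], gt[2]) - min(box[0], gt[0])
    ch = max(box[3], gt[3]) - min(box[1], gt[1])
    c2 = cw ** 2 + ch ** 2 + tiny

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4.0 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(w / h)) ** 2
    alpha = v / (1.0 - iou + v + tiny)

    return 1.0 - iou + rho2 / c2 + alpha * v


print(smooth_labels([0, 0, 1, 0]))
print(ciou_loss((0, 0, 1, 1), (2, 2, 3, 3)))
```

Both only change how the training target or the training loss is computed, so neither adds anything to inference, which is exactly what makes them freebies.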

Bag of specials, by contrast, are methods that add a small amount of inference time but improve accuracy significantly:

For those plugin modules and post-processing methods that only increase the inference cost by a small amount but can significantly improve the accuracy of object detection, we call them “bag of specials”.

Bag of specials: (1) enhancing the receptive field; (2) attention modules; (3) feature integration; (4) activation functions; (5) post-processing
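
As an example of item 4, here is a small sketch of the Mish activation, one of the activation functions the paper considers. It is defined as x · tanh(softplus(x)). The numerically stable softplus helper is my own addition.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


for x in (-1.0, 0.0, 1.0, 20.0):
    print(x, mish(x))
```

Unlike ReLU, Mish is smooth and lets small negative values through. The extra tanh and softplus are the small inference cost that puts it under specials instead of freebies.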

Additional Improvements:

After surveying these methods, the authors made the following further improvements:

[Figure: the paper's list of additional improvements]

Modified SAM and modified PAN:

[Figure: modified SAM and modified PAN]

SAM is modified by changing spatial-wise attention into point-wise attention, so each point of the feature map gets its own attention weight.
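
Here is how I picture point-wise attention, as a rough sketch only: a convolution followed by a sigmoid produces one weight per element of the feature map, and the input is multiplied by those weights. I assume a 1 × 1 convolution to keep the code short. The function names, the `weights` and `bias` parameters, and the [channels][height][width] list layout are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pointwise_sam(features, weights, bias):
    """Scale every element of `features` by its own attention weight.

    features: [C][H][W] nested lists
    weights:  [C][C] matrix of an assumed 1 x 1 convolution
    bias:     [C] bias vector of that convolution
    """
    channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    out = []
    for c in range(channels):
        plane = []
        for i in range(height):
            row = []
            for j in range(width):
                z = bias[c] + sum(
                    weights[c][k] * features[k][i][j] for k in range(channels)
                )
                row.append(features[c][i][j] * sigmoid(z))
            plane.append(row)
        out.append(plane)
    return out


x = [[[2.0, 4.0]], [[6.0, 8.0]]]          # C=2, H=1, W=2
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(pointwise_sam(x, zeros, [0.0, 0.0]))  # every weight is 0.5
```

The attention map has the same shape as the input. The original spatial-wise SAM instead pools across channels first and shares a single H × W map among all channels.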

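The change to PANet is that addition is replaced by concatenation. My personal guess is that this is meant to increase the number of channels, so that more features are collected and less information is lost.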
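
A toy comparison of the two fusion options, using nested lists shaped [channels][height][width]. The helper names are my own, and this only illustrates the shape difference, not the actual PANet code.

```python
def fuse_add(a, b):
    """Element-wise addition: the channel count stays the same."""
    return [
        [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(a, b)
    ]


def fuse_concat(a, b):
    """Channel-wise concatenation: the channel counts add up."""
    return a + b


a = [[[1, 2]]]    # 1 channel, 1 x 2
b = [[[10, 20]]]  # 1 channel, 1 x 2

print(fuse_add(a, b))     # [[[11, 22]]]
print(fuse_concat(a, b))  # [[[1, 2]], [[10, 20]]]
```

Addition merges the two inputs into one map, so the original values can no longer be told apart. Concatenation keeps both intact and leaves the mixing to the next convolution, at the cost of more channels.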

Finally, here is a diagram I found online that I think is quite good:

[Figure: overview diagram of YOLOv4 found online]