Paper Reading-YOLOv4
paper: YOLOv4
github: Darknet
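My impression is that this paper reads more like a survey that pulls together many existing methods. My company is holding a sharing session soon, so these are my reading notes.
A conventional object detector is roughly composed of the following parts (borrowing the authors' own words):
For YOLOv4:
- The backbone extracts features from the image. After comparing CSPResNext50 and CSPDarknet53, the authors chose the latter. In their words: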
our numerous studies demonstrate that the CSPResNext50 is considerably better compared to CSPDarknet53 in terms of object classification on the ILSVRC2012 (ImageNet) dataset. However, conversely, the CSPDarknet53 is better compared to CSPResNext50 in terms of detecting objects on the MS COCO dataset.
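Put simply, the former performs better on the classification dataset, while the latter performs better on the detection dataset. There are other reasons as well: compared with classification, detection needs a higher input resolution, a deeper network, and more parameters. Specifically: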
The CSPResNext50 contains only 16 convolutional layers 3 × 3, a 425 × 425 receptive field and 20.6M parameters, while CSPDarknet53 contains 29 convolutional layers 3 × 3, a 725 × 725 receptive field and 27.6M parameters. This theoretical justification, together with our numerous experiments, show that CSPDarknet53 neural network is the optimal model of the two as the backbone for a detector.
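The receptive-field figures in the quote come from how stacked convolutions widen the input region that each output pixel can see. Below is a minimal sketch of the standard recurrence, assuming a plain chain of layers with no branches. The function name and the toy layer lists are made up for illustration and are not the actual CSPDarknet53 configuration.

```python
def receptive_field(layers):
    """Theoretical receptive field of a chain of conv/pool layers.

    layers: iterable of (kernel_size, stride) pairs, ordered input to output.
    """
    rf = 1    # before any layer, one output pixel sees one input pixel
    jump = 1  # distance, in input pixels, between adjacent output pixels
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the view by (k - 1) jumps
        jump *= stride             # striding spreads later layers further apart
    return rf


# Hypothetical toy stack: five 3x3 convolutions, each downsampling by 2.
print(receptive_field([(3, 2)] * 5))  # 63
# The same five 3x3 convolutions without downsampling see far less.
print(receptive_field([(3, 1)] * 5))  # 11
```

The comparison shows why both depth (more 3 × 3 layers) and downsampling matter for the receptive field of a detection backbone.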
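- Here the authors also summarize the role of the receptive field:
Receptive fields of different sizes serve different purposes. YOLOv4 adds SPP to enlarge the receptive field and uses PANet to aggregate features from different backbone levels. In their words: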
We add the SPP block over the CSPDarknet53, since it significantly increases the receptive field, separates out the most significant context features and causes almost no reduction of the network operation speed. We use PANet as the method of parameter aggregation from different backbone levels for different detector levels, instead of the FPN used in YOLOv3.
- The structure of SPP in YOLOv4 is:
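A rough, dependency-free sketch of such an SPP block follows. It assumes the variant that concatenates the input with stride-1 max-pooled copies of itself; the kernel sizes 5, 9 and 13 are an assumption based on the public Darknet configuration, not something stated in these notes, and the function names are my own.

```python
def max_pool_same(channel, k):
    """Stride-1 max pooling on one H x W channel, output size unchanged.

    Out-of-bounds positions are ignored, which is equivalent to padding
    with negative infinity.
    """
    h, w = len(channel), len(channel[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [
                channel[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            ]
            row.append(max(window))
        out.append(row)
    return out


def spp_block(feature_map, kernel_sizes=(5, 9, 13)):
    """Concatenate the input with its max-pooled copies along channels.

    feature_map: list of channels, each an H x W list of lists.
    kernel_sizes: assumed values (see the note above).
    """
    out = list(feature_map)  # identity branch
    for k in kernel_sizes:
        out += [max_pool_same(c, k) for c in feature_map]
    return out


# One 4 x 4 channel in, four 4 x 4 channels out (input + three pooled copies).
fmap = [[[float(i * 4 + j) for j in range(4)] for i in range(4)]]
pooled = spp_block(fmap)
print(len(pooled), len(pooled[0]), len(pooled[0][0]))  # 4 4 4
```

Because every branch keeps the spatial size, the block only multiplies the channel count, which is consistent with the authors' remark that it costs almost no speed.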
As mentioned earlier, the paper feels like a summary of many methods. They fall into two groups: Bag of freebies and Bag of specials.
- The paper explains Bag of freebies as follows:
Therefore, researchers always like to take this advantage and develop better training methods which can make the object detector receive better accuracy without increasing the inference cost. We call these methods that only change the training strategy or only increase the training cost as “bag of freebies.”
Put simply, a method that raises the model's accuracy without increasing inference time belongs to the Bag of freebies.
- Bag of freebies
- (1) data augmentation
- (2) semantic distribution bias
- (3) the difficulty of expressing the degree of association between different categories with the one-hot hard representation (addressed by label smoothing)
- (4) Bounding Box (BBox) regression
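To make item (3) concrete, here is a minimal sketch of label smoothing, which turns a hard one-hot target into a soft one. The function name and the value `eps=0.1` are illustrative assumptions, not values taken from the paper.

```python
def smooth_labels(one_hot, eps=0.1):
    """Soften a one-hot target: (1 - eps) * y + eps / K for K classes."""
    k = len(one_hot)
    return [(1.0 - eps) * y + eps / k for y in one_hot]


# The true class keeps most of the probability mass; the rest is shared evenly.
print(smooth_labels([0.0, 1.0, 0.0, 0.0]))
```

This changes only the training targets, so it costs nothing at inference time, which is what makes it a "freebie".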
- Bag of specials, by contrast, covers methods that add a small amount of inference time but improve accuracy significantly:
For those plugin modules and post-processing methods that only increase the inference cost by a small amount but can significantly improve the accuracy of object detection, we call them “bag of specials”.
- Bag of specials
- (1) enhance receptive field
- (2) attention module
- (3) feature integration
- (4) activation function
- (5) post-processing
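Two of these items are small enough to sketch directly. The block below shows the Mish activation for item (4) and plain greedy NMS for item (5). Note that plain IoU-based NMS is a simplification I am substituting here for illustration; the function names, the box format `(x1, y1, x2, y2)` and the threshold `0.5` are assumptions.

```python
import math


def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    # For large x, softplus(x) is numerically x; this also avoids overflow.
    softplus = x if x > 20.0 else math.log1p(math.exp(x))
    return x * math.tanh(softplus)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the third box does not overlap anything and is kept.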
Additional Improvements:
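- After presenting these methods, the authors make some further improvements:
- modified SAM and modified PAN:
SAM is modified by changing spatial-wise attention into point-wise attention.
For PANet, the change is that addition becomes concatenation. My personal guess is that this increases the number of channels so that more features can be gathered and less information is lost.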
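The two modifications can be sketched on nested lists of shape C × H × W. This is only an illustration under stated assumptions: `gate_logits` stands in for the output of the convolution inside the modified SAM (the convolution itself is omitted to keep the sketch dependency-free), and all function names are hypothetical.

```python
import math


def pointwise_attention(feature_map, gate_logits):
    """Scale every element by its own sigmoid gate.

    Both arguments have the same C x H x W shape, so each element gets its
    own weight instead of one weight shared per spatial location.
    """
    return [
        [[v / (1.0 + math.exp(-g)) for v, g in zip(row, gate_row)]
         for row, gate_row in zip(channel, gate_channel)]
        for channel, gate_channel in zip(feature_map, gate_logits)
    ]


def fuse_add(a, b):
    """Addition shortcut: element-wise sum, channel count unchanged."""
    return [
        [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(a, b)
    ]


def fuse_concat(a, b):
    """Concatenation shortcut: stack channels, count becomes len(a) + len(b)."""
    return list(a) + list(b)


a = [[[1.0, 2.0], [3.0, 4.0]]] * 2      # 2 channels of 2 x 2
b = [[[10.0, 20.0], [30.0, 40.0]]] * 2  # 2 channels of 2 x 2
print(len(fuse_add(a, b)), len(fuse_concat(a, b)))  # 2 4
```

Addition merges the two inputs into the same channels, while concatenation keeps both sets of channels side by side and leaves it to later layers to decide how to combine them.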