Paper Notes -- A Strong Baseline for Re-ID

https://zhuanlan.zhihu.com/p/65631409

https://zhuanlan.zhihu.com/p/61831669

Sharing a paper I read recently: Bag of Tricks and A Strong Baseline for Deep Person Re-identification. It gives an excellent summary of the training tricks used in person ReID and proposes a strong baseline that reaches rank-1 = 94.5% on the Market1501 dataset.

Paper: https://arxiv.org/pdf/1903.07071.pdf

Code: https://github.com/michuanhaohao/reid-strong-baseline

Authors: Hao Luo*, Youzhi Gu*, Xingyu Liao*, Shenqi Lai, Wei Jiang (Zhejiang University; Chinese Academy of Sciences; Xi'an Jiaotong University)

Motivation

[1] We surveyed many works published at top conferences and found most of them were built on poor baselines.

The authors surveyed the current best-performing person ReID methods and found that most of them build on relatively weak baselines.

[2] For academia, we hope to provide a strong baseline for researchers to achieve higher accuracies in person ReID.

For academia: provide a strong baseline on which researchers can build.

[3] For the community, we hope to give reviewers some references about which tricks affect the performance of a ReID model. We suggest that when comparing the performance of different methods, reviewers take these tricks into account.

For the community: make reviewers aware of how much these tricks matter when judging performance comparisons.

[4] For the industry, we hope to provide some effective tricks to acquire better models without too much extra cost.

For industry: provide a simple yet effective model at little extra cost.

Contribution

[1] We collect some effective training tricks for person ReID. Among them, we design a new neck structure named BNNeck. In addition, we evaluate the improvements from each trick on two widely used datasets.

The paper collects effective training tricks for person ReID and proposes a new neck structure, BNNeck.

[2] We provide a strong ReID baseline, which achieves 94.5% rank-1 accuracy and 85.9% mAP on Market1501. It is worth mentioning that the results are obtained with global features from a ResNet50 backbone. To the best of our knowledge, it is the best performance acquired by global features in person ReID.

The proposed baseline reaches rank-1 = 94.5% and mAP = 85.9% on Market1501.

[3] As a supplement, we evaluate the influences of the image size and the batch size on the performance of ReID models.

Experiments also examine how image size and batch size affect performance.

Standard Baseline

[1] We initialize the ResNet50 with pre-trained parameters on ImageNet and change the dimension of the fully connected layer to N. N denotes the number of identities in the training dataset.

A ResNet50 pre-trained on ImageNet serves as the backbone, with its fully connected layer resized to N outputs, one per training identity.
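As a minimal sketch (the official repo wraps its own ResNet implementation, so details differ), the backbone setup in PyTorch might look like this, with N = 751 being Market1501's number of training identities:

```python
import torch.nn as nn
import torchvision

# ImageNet-pretrained ResNet50 with the classification head resized to the
# N identities of the training set (N = 751 for Market1501).
model = torchvision.models.resnet50(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, 751)
```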

[2] We randomly sample P identities and K images per person to constitute a training batch. Finally the batch size equals B = P × K. In this paper, we set P = 16 and K = 4.

To make the triplet loss work, each batch contains 16 identities with 4 images each (B = 64).
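A minimal sketch of the P×K sampling, assuming a dict index_by_pid mapping each identity to its image indices (the name and the with-replacement fallback are illustrative; the repo implements this as a dedicated batch sampler):

```python
import random

def sample_pk_batch(index_by_pid, P=16, K=4):
    """Draw one B = P*K batch: P identities, K images each.
    index_by_pid: dict mapping identity id -> list of image indices.
    Identities with fewer than K images are drawn with replacement."""
    batch = []
    for pid in random.sample(list(index_by_pid), P):
        imgs = index_by_pid[pid]
        batch += random.sample(imgs, K) if len(imgs) >= K else random.choices(imgs, k=K)
    return batch
```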

[3] We resize each image to 256 × 128 pixels, pad the resized image with 10 pixels of zero values, and then randomly crop it back to a 256 × 128 rectangle.

Preprocessing uses resizing followed by random cropping.

[4] Each image is flipped horizontally with 0.5 probability.

Random horizontal flipping is also applied during preprocessing.

[5] Each image is decoded into 32-bit floating point raw pixel values in [0, 1]. Then we normalize RGB channels by subtracting 0.485, 0.456, 0.406 and dividing by 0.229, 0.224, 0.225, respectively.

Finally, the RGB channels are normalized with the ImageNet statistics so pixel values are roughly zero-mean with unit variance.
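Putting steps [3]-[5] together, a torchvision version of the training pipeline could look like the sketch below (the exact transform order in the official repo may differ slightly):

```python
import torchvision.transforms as T

train_transform = T.Compose([
    T.Resize((256, 128)),                     # [3] resize to 256x128
    T.RandomHorizontalFlip(p=0.5),            # [4] flip with p = 0.5
    T.Pad(10),                                # [3] zero-pad 10 pixels per side
    T.RandomCrop((256, 128)),                 # [3] random 256x128 crop
    T.ToTensor(),                             # [5] float32 pixels in [0, 1]
    T.Normalize(mean=[0.485, 0.456, 0.406],   # [5] ImageNet statistics
                std=[0.229, 0.224, 0.225]),
])
```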

[6] The model outputs ReID features f and ID prediction logits p.

The model outputs a ReID feature vector f and ID prediction logits p.

[7] ReID features f are used to calculate the triplet loss. ID prediction logits p are used to calculate the cross-entropy loss. The margin m of the triplet loss is set to 0.3.

The feature f feeds the triplet loss and p feeds the cross-entropy (ID) loss.
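A sketch of step [7]. Note the paper actually uses a hard-mining triplet loss; here the anchor/positive/negative indices are assumed to come from a mining step that is not shown:

```python
import torch.nn as nn

ce_loss = nn.CrossEntropyLoss()
tri_loss = nn.TripletMarginLoss(margin=0.3)   # margin m = 0.3

def baseline_loss(f, p, labels, a_idx, p_idx, n_idx):
    """f: ReID features, p: ID logits; a/p/n_idx index the batch and are
    assumed to come from a (batch-hard) miner, not shown here."""
    return ce_loss(p, labels) + tri_loss(f[a_idx], f[p_idx], f[n_idx])
```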

[8] Adam is adopted to optimize the model. The initial learning rate is set to 0.00035 and is decayed by a factor of 0.1 at the 40th and 70th epochs. In total there are 120 training epochs.

Adam is the optimizer here; another paper summarizing video-based ReID [2] also uses Adam.
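Step [8] maps directly onto PyTorch's MultiStepLR. In the sketch below, model is the backbone from the earlier sketch and train_one_epoch is a hypothetical stand-in for the training loop:

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=3.5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[40, 70], gamma=0.1)   # x0.1 at epochs 40 and 70

for epoch in range(120):                         # 120 epochs in total
    train_one_epoch(model, optimizer)            # hypothetical helper
    scheduler.step()
```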

Training Tricks

Warmup Learning Rate

[Figure: learning rate schedule with warmup]

As the figure shows, the learning rate is ramped up gradually over the first few epochs before the usual decay schedule takes over. I have seen the same technique mentioned in other work [1] as well.
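The paper grows the learning rate linearly from 3.5e-5 to 3.5e-4 over the first 10 epochs. A close sketch using LambdaLR multipliers on the base rate of 3.5e-4, reusing the optimizer from the baseline sketch above:

```python
from torch.optim.lr_scheduler import LambdaLR

def lr_factor(epoch):
    if epoch < 10:
        return 0.1 + 0.9 * epoch / 10   # linear warmup: 0.1x -> 1.0x
    if epoch < 40:
        return 1.0
    if epoch < 70:
        return 0.1
    return 0.01                         # after the second decay

scheduler = LambdaLR(optimizer, lr_lambda=lr_factor)
```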

Random Erasing Augmentation

A data augmentation method proposed by Zhun Zhong et al. in [3]. This paper uses p = 0.5, 0.02 < Se < 0.4, r1 = 0.3, r2 = 3.33, i.e. the erased region covers 2%-40% of the image with an aspect ratio between 0.3 and 3.33.
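torchvision ships an implementation of [3], so with the paper's settings this trick is a one-liner; since it operates on tensors, it goes after ToTensor()/Normalize() in the training pipeline:

```python
import torchvision.transforms as T

# Erase a random region with probability 0.5; the region covers 2%-40% of
# the image and has an aspect ratio in [0.3, 3.33].
random_erasing = T.RandomErasing(p=0.5, scale=(0.02, 0.4), ratio=(0.3, 3.33))
```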

Label Smoothing

[Equation: label-smoothed target distribution]

Proposed in [4], label smoothing softens the one-hot targets so the model does not become overconfident on the training identities; ε is set to 0.1 in this paper.
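For reference, the smoothed target distribution for N classes with ground-truth identity y (the standard construction from [4], with ε = 0.1 here):

```latex
q_i =
\begin{cases}
1 - \dfrac{N-1}{N}\,\varepsilon, & i = y \\[4pt]
\dfrac{\varepsilon}{N}, & i \neq y
\end{cases}
```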

Last Stride

The stride of the last spatial down-sampling in ResNet50 is changed from 2 to 1, enlarging the output feature map. This adds very little computation, introduces no new parameters, and noticeably improves performance.
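On a torchvision ResNet50 the change amounts to resetting two strides in the first block of layer4; a sketch (the official repo instead builds its ResNet with a configurable last stride):

```python
import torchvision

model = torchvision.models.resnet50(weights="IMAGENET1K_V1")
# Change the last down-sampling stride from 2 to 1: both the 3x3 conv and
# the 1x1 shortcut conv of the first bottleneck in layer4.
model.layer4[0].conv2.stride = (1, 1)
model.layer4[0].downsample[0].stride = (1, 1)
```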

BNNeck

[Figure: (a) the standard neck shared by ID loss and triplet loss vs. (b) the proposed BNNeck]

Most methods that combine the ID loss and the triplet loss use the structure in figure (a): both losses constrain the same feature f.

The authors argue, citing prior findings, that the ID loss essentially learns several hyperplanes that assign different classes to different subspaces of the feature space, so it works better on features normalized onto a hypersphere. The triplet loss, by contrast, is suited to constraining features in unconstrained Euclidean space.

Hence BNNeck, shown in figure (b). The triplet loss still optimizes the original feature, ft in the figure, while the ID loss optimizes fi, obtained by passing ft through a batch normalization layer; this normalization makes fi approximately distributed on the surface of a hypersphere.
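A minimal sketch of BNNeck as a module (num_classes = 751 for Market1501; the repo additionally freezes the BN bias, reproduced below):

```python
import torch.nn as nn

class BNNeck(nn.Module):
    """Triplet loss sees f_t; ID loss sees the logits computed from
    f_i = BN(f_t). At inference the paper recommends using f_i with
    cosine distance."""
    def __init__(self, feat_dim=2048, num_classes=751):
        super().__init__()
        self.bn = nn.BatchNorm1d(feat_dim)
        self.bn.bias.requires_grad_(False)    # no shift, following the repo
        self.classifier = nn.Linear(feat_dim, num_classes, bias=False)

    def forward(self, f_t):
        f_i = self.bn(f_t)                    # roughly hypersphere-distributed
        return f_t, f_i, self.classifier(f_i)
```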

Center Loss

[Equation: center loss]

One weakness of the triplet loss is that it only constrains the relative distances between sample pairs and ignores absolute distances. The authors therefore add a center loss, given above, which pulls each feature ft toward the center of its class.
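The center loss (from Wen et al.) over a batch of size B, where c_{y_j} is the learned center of class y_j; the paper weights it into the total loss with a small coefficient β (0.0005 in the paper, if I recall correctly):

```latex
\mathcal{L}_{C} = \frac{1}{2} \sum_{j=1}^{B} \left\lVert f_{t_j} - c_{y_j} \right\rVert_2^2
```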

Experimental Results

Influences of Each Trick (Same domain)

[Table: performance gains from each trick, same-domain evaluation]

Performance gains from each trick. Random erasing and BNNeck bring the largest improvements, around 2-3% each. In my own experiments random erasing also clearly helped on image ReID, but it made no difference when I switched to video ReID. BNNeck is worth trying.

Analysis of BNNeck

[Table: ft vs. fi under Euclidean and cosine distance]

This evaluates the two features ft and fi of the BNNeck model under both Euclidean and cosine distance. All four feature/metric combinations perform comparably, and each gains roughly two points over the model without BNNeck.

Influences of Each Trick (Cross domain)

[Table: performance of each trick under cross-domain evaluation]

Same-domain gains can partly come from overfitting and are therefore only moderately convincing, so the authors also run cross-domain experiments.

The results show that warmup, label smoothing, and BNNeck clearly help cross-domain performance. Random erasing, however, hurts it: removing it improves the results, presumably because it causes overfitting to the source domain.

Comparison with the State of the Art

[Table: comparison with state-of-the-art methods]

Compared with the state of the art, the final model reaches rank-1 = 94.5% while using only global features.

Influences of Batch Size

[Table: performance at different batch sizes]

Different batch sizes are also tested. Overall, larger batches perform better, but the gains look marginal once B reaches 64.

Influences of Image Size

[Table: performance at different image sizes]

The effect of image size is also measured, and the results show it has almost no impact on final performance. My own experience is that enlarging the resized image helps as long as it stays below the original resolution; going beyond the original size brings no further gains.

Summary

Overall, the paper is an excellent summary of network design and training tricks for ReID, and the proposed BNNeck is well worth trying.

References

[1] Bag of Tricks for Image Classification with Convolutional Neural Networks

[2] Revisiting Temporal Modeling for Video-based Person ReID

[3] Random Erasing Data Augmentation

[4] Rethinking the inception architecture for computer vision