Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

crop & warp

SPP paper notes

In this paper, we introduce a spatial pyramid pooling layer to remove the fixed-size constraint of the network.

Add an SPP layer on top of the last convolutional layer.

The SPP layer pools the features and generates fixed-length outputs, which are then fed into the fully-connected layers.

In other words, SPP is a way of aggregating information that avoids cropping and warping the input.

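The preceding convolutional layers can accept images of any size, and they output feature maps whose size varies with the input.
The SPP layer then turns these variable-size feature maps into a fixed-size output.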
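
To make the fixed-length property concrete, here is a minimal pure-Python sketch of max pooling over a spatial pyramid. The function name `spp_max_pool`, the helper `make_map`, the pyramid levels `(1, 2, 4)`, and the floor/ceil bin boundaries are assumptions chosen for illustration, not the paper's exact configuration.

```python
def spp_max_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a feature map over a spatial pyramid.

    feature_map: list of C channels, each an H x W list of lists of floats.
    levels: hypothetical pyramid levels; level n splits the map into n x n bins.
    Returns a flat list of length C * sum(n * n for n in levels).
    """
    pooled = []
    for n in levels:
        for channel in feature_map:
            h, w = len(channel), len(channel[0])
            for i in range(n):
                # Bin rows run from floor(i*h/n) to ceil((i+1)*h/n), so a bin is never empty.
                r0 = (i * h) // n
                r1 = ((i + 1) * h + n - 1) // n
                for j in range(n):
                    c0 = (j * w) // n
                    c1 = ((j + 1) * w + n - 1) // n
                    pooled.append(
                        max(channel[r][c] for r in range(r0, r1) for c in range(c0, c1))
                    )
    return pooled


def make_map(channels, h, w):
    """Build a toy C x H x W feature map with distinct values (illustration only)."""
    return [
        [[float(k * 100 + r * w + c) for c in range(w)] for r in range(h)]
        for k in range(channels)
    ]


# Two feature maps with different spatial sizes give vectors of the same length.
small = spp_max_pool(make_map(3, 5, 7))
large = spp_max_pool(make_map(3, 13, 9))
print(len(small), len(large))
```

The output length depends only on the number of channels and the pyramid levels, which is why the fully-connected layers can follow without any fixed input image size.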

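During training, positive samples are windows that overlap a ground-truth box by 0.5 or more; negative samples are those with less overlap. In each mini-batch, about 25% of the samples are positive.
Train for 250k mini-batches with a learning rate of 1e-4,
then for 50k mini-batches with a learning rate of 1e-5.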
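
The labelling rule and the two-stage learning rate can be sketched as follows. This assumes "overlap" means intersection-over-union (IoU) and that boxes are `(x1, y1, x2, y2)` tuples; the function names `iou`, `label_window`, and `learning_rate` are hypothetical, and the notes do not say how the 25% positive ratio is sampled, so that part is left out.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes (assumed overlap measure)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_window(window, ground_truth_boxes, threshold=0.5):
    """Return 1 (positive) if the window overlaps any ground-truth box by >= threshold, else 0."""
    best = max((iou(window, gt) for gt in ground_truth_boxes), default=0.0)
    return 1 if best >= threshold else 0


def learning_rate(step):
    """Schedule from the notes: 1e-4 for the first 250k mini-batches, then 1e-5 for 50k more."""
    if step < 0 or step >= 300_000:
        raise ValueError("step is outside the 300k mini-batch schedule")
    return 1e-4 if step < 250_000 else 1e-5


gt = [(0, 0, 10, 10)]
print(label_window((2, 0, 12, 10), gt))  # IoU = 80 / 120, so positive
print(label_window((5, 0, 15, 10), gt))  # IoU = 50 / 150, so negative
```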

Bounding box regression is used to post-process the prediction windows.
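
These notes do not spell out the regression targets, so the sketch below assumes the common R-CNN-style parametrization: the regressor predicts a center shift scaled by the window size and a log-scale change in width and height. The function name `apply_box_deltas` and the `(dx, dy, dw, dh)` layout are hypothetical.

```python
import math


def apply_box_deltas(box, deltas):
    """Refine an (x1, y1, x2, y2) window with predicted (dx, dy, dw, dh) offsets.

    Assumed parametrization (R-CNN style, not confirmed by these notes):
    the center moves by dx * width and dy * height, and the size is
    scaled by exp(dw) and exp(dh).
    """
    x1, y1, x2, y2 = box
    dx, dy, dw, dh = deltas
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h

    new_cx = cx + dx * w
    new_cy = cy + dy * h
    new_w = w * math.exp(dw)
    new_h = h * math.exp(dh)

    return (
        new_cx - 0.5 * new_w,
        new_cy - 0.5 * new_h,
        new_cx + 0.5 * new_w,
        new_cy + 0.5 * new_h,
    )


# Zero offsets leave the window unchanged; dx = 0.5 shifts it right by half its width.
print(apply_box_deltas((10, 20, 50, 100), (0, 0, 0, 0)))
print(apply_box_deltas((10, 20, 50, 100), (0.5, 0, 0, 0)))
```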