Video Object Detection (VID): Deep Feature Flow for Video Recognition

Deep Feature Flow for Video Recognition
CVPR 2017, from MSRA (Microsoft Research Asia)
Code: https://github.com/msracver/Deep-Feature-Flow

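Single-frame object detection and segmentation are fairly mature, but detection and segmentation on video remain problematic. The main issue is that applying a single-frame algorithm directly to every video frame is computationally expensive and cannot run in real time. This paper runs the CNN feature extractor only on key frames and then propagates the key-frame features to the other frames through a flow field, so the CNN does not have to extract a feature map for every frame. Computing a flow field is comparatively cheap, especially since FlowNet 2.0 was proposed.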

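The figure in the original post (not reproduced here) mainly shows that the idea is feasible: the propagated feature maps are about as effective as the feature maps computed directly by the CNN.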

Deep Feature Flow

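A CNN detection or segmentation network can be split into two sub-networks: a feature network, which extracts the feature maps, and a task network, which produces the detection or segmentation output from them.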
Consecutive video frames are highly similar. The similarity is even stronger in the deep feature maps.

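This similarity between feature maps can be used to reduce computation: the feature maps of non-key frames can be propagated from the key-frame feature map by spatial warping.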
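The per-frame control flow implied by this split can be sketched as follows. This is a minimal illustration, not the code from the repository: `feature_net`, `flow_net`, `task_net` and `warp` are hypothetical callables supplied by the caller, and picking a key frame every `key_interval` frames is an assumed scheduling rule.

```python
def run_video(frames, feature_net, flow_net, task_net, warp, key_interval=10):
    """Run the expensive feature network on key frames only.

    All four callables are placeholders (assumptions for this sketch):
      feature_net(frame)          -> feature map
      flow_net(key_frame, frame)  -> flow field from the current frame to the key frame
      warp(key_feat, flow)        -> key-frame features propagated to the current frame
      task_net(feat)              -> detection / segmentation result
    """
    results = []
    key_frame = None
    key_feat = None
    for index, frame in enumerate(frames):
        if index % key_interval == 0:
            # Key frame: compute features with the full CNN and cache them.
            key_frame = frame
            key_feat = feature_net(frame)
            feat = key_feat
        else:
            # Non-key frame: estimate flow (cheap) and warp the cached features.
            flow = flow_net(key_frame, frame)
            feat = warp(key_feat, flow)
        # The task network runs on every frame, on real or propagated features.
        results.append(task_net(feat))
    return results
```

With `key_interval=10`, the feature network runs on one frame in ten; the remaining nine frames pay only for flow estimation, warping and the task network.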

The features in the deep convolutional layers encode the semantic concepts and correspond to spatial locations in the image
Such spatial correspondence allows us to cheaply propagate the feature maps by the manner of spatial warping
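To make the warping step concrete, here is a small dependency-free sketch of bilinear feature warping. It assumes a single-channel feature map stored as nested lists, a flow field already scaled to the feature-map resolution, and the convention that the flow at a location in the current frame points to the matching location in the key frame; these representation choices are assumptions for illustration. A real implementation would apply the same interpolation to every channel with a vectorized operator.

```python
import math


def bilinear_sample(feat, x, y):
    """Sample feat[row][col] at a real-valued (x, y); outside the map counts as zero."""
    height, width = len(feat), len(feat[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    # Only the four integer neighbours of (x, y) have non-zero bilinear weight.
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < height and 0 <= xx < width:
                weight = (1.0 - abs(x - xx)) * (1.0 - abs(y - yy))
                value += weight * feat[yy][xx]
    return value


def warp(key_feat, flow):
    """Propagate key-frame features to the current frame.

    flow[y][x] is a (dx, dy) pair: location (x, y) in the current frame
    corresponds to (x + dx, y + dy) in the key frame.
    """
    height, width = len(key_feat), len(key_feat[0])
    return [
        [
            bilinear_sample(key_feat, x + flow[y][x][0], y + flow[y][x][1])
            for x in range(width)
        ]
        for y in range(height)
    ]
```

A zero flow field returns the key-frame features unchanged, and a fractional displacement blends neighbouring feature values, which is why the operation is cheap compared with running the feature network again.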
