Embedded Object Detection -- Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection

Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video
https://arxiv.org/abs/1709.05943

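To make CNN-based object detection practical on embedded devices, this paper improves on YOLOv2: at the cost of a slight drop in accuracy, it reduces the number of model parameters and increases inference speed. On video, Fast YOLO achieves an average speedup of ∼3.3X over YOLOv2 and runs at an average of ∼18 FPS on an Nvidia Jetson TX1 embedded system.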

2 Methodology
The Fast YOLO framework consists of two main parts: 1) an optimized YOLOv2 architecture, and 2) motion-adaptive inference.

2.1 Optimized Network Architecture
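Designing an optimal CNN architecture is a hard problem. Typically, an expert tries out many network structures under the constraints of a specific task (accuracy and the number of parameters) to find the best design. The search for an optimal architecture is currently usually treated as a hyper-parameter optimization problem, but solving it is very time-consuming: most methods are either computationally intractable or yield sub-optimal solutions. For example, a common approach to hyper-parameter optimization is grid search, which tries a wide range of network configurations and keeps the best one as the final architecture. However, CNNs for object detection usually have a very large number of parameters, so grid search is not computationally tractable.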
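A rough count helps show why grid search becomes intractable. The sketch below assumes, purely for illustration, that every layer independently picks from the same number of options; the function name and the numbers are hypothetical and do not come from the paper.

```python
def grid_search_configs(options_per_layer: int, num_layers: int) -> int:
    """Count the configurations an exhaustive grid search would have to evaluate.

    Assumes (for illustration only) that each layer chooses independently
    from the same number of options.
    """
    if options_per_layer < 1 or num_layers < 0:
        raise ValueError("options_per_layer must be >= 1 and num_layers >= 0")
    return options_per_layer ** num_layers


if __name__ == "__main__":
    # Hypothetical setting: 8 candidate choices per layer.
    for layers in (5, 10, 20):
        print(layers, grid_search_configs(8, layers))
```

The count grows exponentially with depth, and each configuration would also have to be trained and evaluated before it could be compared with the others.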

The authors instead approach the problem from the angle of improving network efficiency. They use the evolutionary deep intelligence framework [16, 17, 18] to optimize YOLOv2 and obtain an optimized network architecture, O-YOLOv2, which has ∼2.8X fewer parameters than the original YOLOv2.

2.2 Motion-adaptive Inference
Video contains a great deal of redundant information, so not every frame carries unique information and there is no need to run deep inference on every frame. The paper therefore introduces a motion-adaptive inference approach that determines whether deep inference is needed for a particular video frame.
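The idea of skipping deep inference on low-motion frames can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes motion is measured as the mean absolute pixel difference against a reference frame, that frames are flattened grayscale sequences, and that detections from the last deep inference are reused when motion is low. The names `motion_score`, `MotionAdaptiveDetector`, and the `threshold` parameter are all hypothetical.

```python
from typing import Any, Callable, Optional, Sequence

Frame = Sequence[float]  # hypothetical representation: flattened grayscale frame


def motion_score(frame: Frame, reference: Frame) -> float:
    """Mean absolute pixel difference (an assumed stand-in for a motion measure)."""
    if len(frame) == 0 or len(frame) != len(reference):
        raise ValueError("frames must be non-empty and have the same size")
    return sum(abs(a - b) for a, b in zip(frame, reference)) / len(frame)


class MotionAdaptiveDetector:
    """Runs the expensive detector only when the frame differs enough from the reference."""

    def __init__(self, detect: Callable[[Frame], Any], threshold: float) -> None:
        self.detect = detect          # expensive deep-inference function
        self.threshold = threshold    # assumed motion threshold
        self.reference: Optional[Frame] = None
        self.cached: Any = None       # result of the last deep inference
        self.deep_calls = 0           # how many times deep inference actually ran

    def process(self, frame: Frame) -> Any:
        needs_inference = (
            self.reference is None
            or motion_score(frame, self.reference) > self.threshold
        )
        if needs_inference:
            # Enough motion (or first frame): run deep inference and update the reference.
            self.cached = self.detect(frame)
            self.reference = frame
            self.deep_calls += 1
        # Otherwise reuse the cached detections from the reference frame.
        return self.cached
```

In use, `detect` would wrap the actual detection network, and `threshold` trades accuracy against the number of deep inferences that are skipped.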

3 Results & Discussion

Pascal VOC dataset

Running on an Nvidia Jetson TX1 embedded system