【Paper Reading】Pseudo Mask Augmented Object Detection

abstract

In this work, we present a novel and effective framework to facilitate object detection with the instance-level segmentation information that is only supervised by bounding box annotation. Starting from the joint object detection and instance segmentation network, we propose to recursively estimate the pseudo ground-truth object masks from the instance-level object segmentation network training, and then enhance the detection network with top-down segmentation feedbacks. The pseudo ground truth mask and network parameters are optimized alternatively to mutually benefit each other. To obtain the promising pseudo masks in each iteration, we embed a graphical inference that incorporates the low-level image appearance consistency and the bounding box annotations to refine the segmentation masks predicted by the segmentation network. Our approach progressively improves the object detection performance by incorporating the detailed pixel-wise information learned from the weakly-supervised segmentation network. Extensive evaluation on the detection task in PASCAL VOC 2007 and 2012 [12] verifies that the proposed approach is effective.
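
The alternating scheme described above (hold the pseudo masks fixed and train the network, then re-estimate the masks from the network's predictions plus graphical refinement) can be sketched as a simple loop. This is a minimal illustration, not the authors' code: `alternate_optimization` and the three callables it receives are hypothetical placeholders, and how the initial masks are built is left to the caller.

```python
from typing import Callable, List, TypeVar

Mask = TypeVar("Mask")


def alternate_optimization(
    init_masks: List[Mask],
    train_network: Callable[[List[Mask]], None],
    predict_masks: Callable[[], List[Mask]],
    refine_masks: Callable[[List[Mask]], List[Mask]],
    num_rounds: int,
) -> List[Mask]:
    """Alternate between fitting the network and re-estimating pseudo masks.

    All callables are hypothetical stand-ins:
      train_network -- updates network parameters with the pseudo masks held fixed
      predict_masks -- runs the segmentation branch to get per-instance masks
      refine_masks  -- graphical inference using box constraints and image appearance
    """
    masks = init_masks
    for _ in range(num_rounds):
        train_network(masks)             # step 1: update parameters, masks fixed
        predicted = predict_masks()      # step 2a: current network mask estimate
        masks = refine_masks(predicted)  # step 2b: refined masks become new pseudo GT
    return masks
```

Each round feeds the refined masks of the previous round back in as supervision, which is what lets the two quantities improve each other.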

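The method first extracts image features with a shared stack of convolutional layers and then splits into two sub-networks: in the paper's architecture figure, the upper branch is the instance-level object segmentation sub-network and the lower branch is the object detection sub-network.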

instance-level object segmentation sub-network

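The instance segmentation sub-network is a modified version of Instance-sensitive Fully Convolutional Networks (InstanceFCN): the $k^2$ position-sensitive score maps are extended to $k^2 \times C$ maps, one group of $k^2$ maps per class, so that the branch can handle multi-class segmentation.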
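
To make the $k^2 \times C$ layout concrete, here is a minimal InstanceFCN-style assembly sketch: a square window is split into a $k \times k$ grid, and each grid cell copies its pixels from the score map responsible for that relative position, taken from the group belonging to the requested class. The class-major channel ordering and the names `channel_index` and `assemble_instance` are assumptions for illustration, not taken from the paper.

```python
from typing import List

Map2D = List[List[float]]


def channel_index(cls: int, cell_row: int, cell_col: int, k: int) -> int:
    """Hypothetical class-major layout: maps [cls*k*k, (cls+1)*k*k) belong to `cls`."""
    return cls * k * k + cell_row * k + cell_col


def assemble_instance(score_maps: List[Map2D], cls: int, k: int,
                      top: int, left: int, size: int) -> Map2D:
    """Assemble a size x size mask for class `cls` from the window at (top, left).

    The window is split into a k x k grid; each pixel is read from the score
    map assigned to the grid cell it falls in (relative position in the window).
    """
    cell = size / k
    out = [[0.0] * size for _ in range(size)]
    for dy in range(size):
        for dx in range(size):
            r = min(int(dy / cell), k - 1)  # grid row of this pixel
            c = min(int(dx / cell), k - 1)  # grid column of this pixel
            m = score_maps[channel_index(cls, r, c, k)]
            out[dy][dx] = m[top + dy][left + dx]
    return out
```

With $k = 2$ and $C = 2$ there are 8 score maps; assembling class 1 reads only maps 4 to 7, one per quadrant of the window.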
The predicted segmentation is then refined (Pseudo Mask Refinement) with a graphical model, by minimizing the energy

$$
E(y)=\sum_{i} U(y_i)+\sum_{i,j} V(y_i, y_j)
$$
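
Evaluating this energy for a given binary labeling is straightforward. The sketch below assumes the pairwise sum runs over a given list of neighboring pixel pairs, each with a non-negative weight paid only when the two labels differ; the name `total_energy` and this edge-list representation are illustrative choices, not the paper's.

```python
from typing import Sequence, Tuple


def total_energy(labels: Sequence[int],
                 unary: Sequence[Tuple[float, float]],
                 edges: Sequence[Tuple[int, int, float]]) -> float:
    """E(y) = sum_i U(y_i) + sum_(i,j) V(y_i, y_j) for binary labels.

    unary[i] = (cost if y_i = 0, cost if y_i = 1)
    edges    = (i, j, w): pairwise term costing w when y_i != y_j (assumed form)
    """
    e = sum(unary[i][y] for i, y in enumerate(labels))
    e += sum(w for i, j, w in edges if labels[i] != labels[j])
    return e
```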

where $y_i \in \{0, 1\}$ is the label of pixel $x_i$ (1 for foreground), $b^{gt}$ is the ground-truth bounding box, and $\mathrm{prob}_{fg}(x_i)$ is the foreground probability predicted by the segmentation network. The unary term $U(y_i)$ is

$$
U(y_i)=\begin{cases}
0 & \text{if } y_i=0 \text{ and } x_i \notin b^{gt} \\
\infty & \text{if } y_i=1 \text{ and } x_i \notin b^{gt} \\
-\log\left(1-\mathrm{prob}_{fg}(x_i)\right) & \text{if } y_i=0 \text{ and } x_i \in b^{gt} \\
-\log\left(\mathrm{prob}_{fg}(x_i)\right) & \text{if } y_i=1 \text{ and } x_i \in b^{gt}
\end{cases}
$$

Pixels outside the ground-truth box are therefore forced to background, while inside the box the cost is the negative log-likelihood of the network's prediction.
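
A per-pixel version of this unary term, as a minimal sketch: `unary_cost` is a hypothetical name, and the `eps` clamp is an added assumption so that `log` never receives 0 when the network outputs a probability of exactly 0 or 1.

```python
import math


def unary_cost(label: int, inside_box: bool, prob_fg: float,
               eps: float = 1e-8) -> float:
    """Unary term U(y_i) for one pixel x_i.

    label      -- y_i, 1 = foreground, 0 = background
    inside_box -- whether x_i lies inside the ground-truth box b^gt
    prob_fg    -- foreground probability predicted by the segmentation network
    eps        -- hypothetical clamp keeping log() away from 0
    """
    if not inside_box:
        # Outside the box: background is free, foreground is forbidden.
        return 0.0 if label == 0 else math.inf
    p = min(max(prob_fg, eps), 1.0 - eps)
    # Inside the box: negative log-likelihood of the chosen label.
    return -math.log(1.0 - p) if label == 0 else -math.log(p)
```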

and the pairwise term $V(y_i, y_j)$ is

$$
V(y_i, y_j)=[y_i \neq y_j]\,\exp\left\{-\frac{\lVert h_c(x_i)-h_c(x_j)\rVert_2^2}{\delta_c^2}-\frac{\lVert h_t(x_i)-h_t(x_j)\rVert_2^2}{\delta_t^2}\right\}
$$

where $[\cdot]$ is the indicator function, $h_c(\cdot)$ and $h_t(\cdot)$ are low-level per-pixel color and texture features, and $\delta_c$, $\delta_t$ control how fast the penalty decays with feature distance. Assigning different labels to two pixels that look alike is therefore expensive.
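
A sketch of the pairwise cost for one pixel pair, assuming the weight is the exponential of the bracketed expression (the usual contrast-sensitive Potts form, which keeps the weight in $(0, 1]$). Feature vectors are plain sequences here; `sq_dist` and `pairwise_cost` are hypothetical names.

```python
import math
from typing import Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance ||a - b||_2^2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pairwise_cost(y_i: int, y_j: int,
                  color_i: Sequence[float], color_j: Sequence[float],
                  tex_i: Sequence[float], tex_j: Sequence[float],
                  delta_c: float, delta_t: float) -> float:
    """V(y_i, y_j): zero when labels agree; otherwise a weight that shrinks as
    the two pixels' color and texture features become more dissimilar."""
    if y_i == y_j:
        return 0.0
    return math.exp(-sq_dist(color_i, color_j) / delta_c ** 2
                    - sq_dist(tex_i, tex_j) / delta_t ** 2)
```

Identical neighbors cost the maximum of 1 to separate, so label boundaries are pushed toward strong color or texture edges.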

The energy is minimized with a graph-cut solver.
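
Why graph cut applies: with binary labels and a non-negative pairwise penalty that is paid only when labels differ (as with the exponential weight), the energy is submodular, and an s-t minimum cut gives its exact global minimum. The sketch below builds the textbook construction and solves it with a plain Edmonds-Karp max-flow; it is not necessarily the solver used in the paper, and the name `graph_cut_binary` and the "source side = foreground" convention are our own choices. It assumes no pixel has infinite cost for both labels.

```python
import math
from collections import deque
from typing import Dict, List, Sequence, Tuple


def graph_cut_binary(unary: Sequence[Tuple[float, float]],
                     edges: Sequence[Tuple[int, int, float]]) -> List[int]:
    """Exact minimizer of sum_i U_i(y_i) + sum_(i,j) w_ij [y_i != y_j] for
    binary labels and w_ij >= 0, via s-t min cut (Edmonds-Karp max-flow).

    Convention: a pixel left on the source side of the cut gets y = 1.
    """
    n = len(unary)
    s, t = n, n + 1
    cap: Dict[int, Dict[int, float]] = {v: {} for v in range(n + 2)}

    def add(u: int, v: int, c: float) -> None:
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)  # reverse residual edge

    for i, (cost0, cost1) in enumerate(unary):
        add(s, i, cost0)  # cut when i ends on the sink side, i.e. y_i = 0
        add(i, t, cost1)  # cut when i stays on the source side, i.e. y_i = 1
    for i, j, w in edges:
        add(i, j, w)      # cut when y_i = 1, y_j = 0
        add(j, i, w)      # cut when y_j = 1, y_i = 0

    def bfs() -> Dict[int, int]:
        """Parents of all nodes reachable from s in the residual graph."""
        parent = {s: s}
        q = deque([s])
        while q:
            u = q.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        return parent

    while True:
        parent = bfs()
        if t not in parent:
            break
        # Bottleneck capacity along the shortest augmenting path.
        flow, v = math.inf, t
        while v != s:
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        # Push the flow and update residual capacities.
        v = t
        while v != s:
            u = parent[v]
            cap[u][v] -= flow
            cap[v][u] += flow
            v = u

    reachable = bfs()  # source side of the minimum cut
    return [1 if i in reachable else 0 for i in range(n)]
```

For example, with pixel 0 outside the box (unary `(0.0, math.inf)`) and two confident foreground pixels inside it, the solver keeps pixel 0 as background and labels the other two foreground, paying one boundary penalty.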

detection sub-network

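The detection sub-network is Faster R-CNN. The two sub-networks are connected (the red dashed line in the paper's architecture figure) so that the segmentation sub-network feeds top-down information back into detection. After RoI pooling in each sub-network, the per-RoI instance segmentation score maps from the segmentation branch pass through a 1×1 convolution that maps them to the same dimensions as the detection branch's RoI feature maps, and the two feature maps are then combined by element-wise sum.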
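
The fusion step can be illustrated on nested lists laid out as `[channels][height][width]`: a 1×1 convolution is just a per-pixel linear map across channels, after which the projected segmentation maps are added to the detection features. This sketch assumes both branches are RoI-pooled to the same spatial size; `conv1x1` and `fuse_roi_features` are hypothetical names, and a real implementation would use a framework's convolution layer instead.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # [channels][height][width]


def conv1x1(x: Tensor3, weight: List[List[float]], bias: List[float]) -> Tensor3:
    """1x1 convolution: at every pixel, out[o] = bias[o] + sum_c weight[o][c] * x[c]."""
    h, w = len(x[0]), len(x[0][0])
    return [[[bias[o] + sum(weight[o][c] * x[c][i][j] for c in range(len(x)))
              for j in range(w)]
             for i in range(h)]
            for o in range(len(weight))]


def fuse_roi_features(seg_scores: Tensor3, det_feats: Tensor3,
                      weight: List[List[float]], bias: List[float]) -> Tensor3:
    """Project per-RoI segmentation score maps to the detection branch's
    channel count, then combine the two maps by element-wise sum."""
    proj = conv1x1(seg_scores, weight, bias)
    if len(proj) != len(det_feats):
        raise ValueError("1x1 conv must output as many channels as det_feats")
    return [[[p + d for p, d in zip(prow, drow)] for prow, drow in zip(pc, dc)]
            for pc, dc in zip(proj, det_feats)]
```

Summation (rather than concatenation) keeps the detection head's input shape unchanged, so the existing Faster R-CNN layers after RoI pooling can be reused as-is.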