2017-05-28 Salient Object Detection: A Discriminative Regional Feature Integration Approach
Notes on "Salient Object Detection: A Discriminative Regional Feature Integration Approach" (H. Jiang, J. Wang, Z. Yuan, Y. Wu, N. Zheng, and S. Li, in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2013, pp. 2083-2090)
Background
- The contrast descriptor (the difference between a region and its surrounding regions), which is widely used as a saliency (uniqueness) indicator, is not reliable for some images.
- Heuristically hand-crafted saliency features may not achieve very good results.
- Saliency computed from a single image segmentation may not be reliable enough.
Algorithm review
Integrate the regional contrast, regional property and regional backgroundness descriptors together to form the master saliency map, then fuse the saliency scores across multiple levels, yielding the saliency map.
Model
- Multi-level segmentation
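Use the graph-based image segmentation approach to build a multi-level segmentation $S_1, S_2, \dots, S_m$.
Each level consists of superpixels $R_1, R_2, \dots, R_k$; each superpixel is described by a saliency feature vector $x$ and mapped to a saliency value $a = f(x)$.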
- training stage
Some regions from an unsupervised segmentation may lie across the boundary between the salient object and the background, and are thus not reliable for training the random forest regressor.
To address this, replace the pairwise similarity over two adjacent regions $(R_i, R_j)$ with a score $s(a_i, a_j)$, which is learned with a boosted decision tree classifier from a set of positive pairs $\{(R_i, R_j) \mid a_i = a_j\}$ and negative pairs $\{(R_i, R_j) \mid a_i \neq a_j\}$.
Then run the graph-based segmentation algorithm to group similar segments together; a parameter $k$ specifies the allowed minimum size of the generated regions. Finally, remove unconfident regions (a region is regarded as unconfident if the pixels belonging to the salient object and the pixels belonging to the background each make up less than 80% of the region's pixels). The saliency score of each remaining superpixel is set to 1 or 0 accordingly, and the remaining superpixels are used as training examples for the random forest regressor. The training data consists of $N$ pairs $(I, A^*)$, where each entry of $A^*$ is 1 or 0.
In the experiments, 48 levels of segmentation are used in the training phase.
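The 80% rule for selecting confident training regions can be sketched as follows. This is only an illustration of the rule as stated in these notes, not the authors' code; the function names and the `(features, n_salient, n_total)` tuple layout are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def label_region(n_salient: int, n_total: int, threshold: float = 0.8) -> Optional[int]:
    """Return 1 (salient), 0 (background) or None (unconfident region)."""
    if n_total <= 0:
        raise ValueError("a region must contain at least one pixel")
    n_background = n_total - n_salient
    if n_salient >= threshold * n_total:
        return 1
    if n_background >= threshold * n_total:
        return 0
    # Neither class reaches the threshold: the region is unconfident.
    return None


def build_training_set(
    regions: Sequence[Tuple[List[float], int, int]]
) -> List[Tuple[List[float], int]]:
    """Keep only confident regions, paired with their 0/1 saliency label.

    Each input item is (features, n_salient, n_total); this layout is an
    assumption made for the example.
    """
    samples = []
    for features, n_salient, n_total in regions:
        label = label_region(n_salient, n_total)
        if label is not None:
            samples.append((features, label))
    return samples
```

A region with 90 of 100 pixels inside the ground-truth object is labelled 1, one with 5 of 100 is labelled 0, and one with 50 of 100 is dropped.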
- testing stage
In the experiments, 15 levels of segmentation are used in the testing phase. The procedure is the same as in the training stage, except that the score $s(a_i, a_j)$ is no longer needed.
- Features
$x = [x_c^T \; x_b^T \; x_o^T]^T$
$x_c$: contrast descriptor, $x_b$: backgroundness descriptor, $x_o$: objectness descriptor.
- Contrast Descriptor
$x_c = \sum_{R' \in S_m} \alpha(R')\, W(p, p')\, D(v, v')$
Here $p$ ($p'$) is the mean position of the region $R$ ($R'$), $v$ ($v'$) is the feature vector (the concatenation of 9 groups of features) of the region $R$ ($R'$), and $\alpha(R')$ is the normalized area of the region $R'$.
The feature $v$ is built from color and texture features. The color features consist of the average values and a histogram with 256 bins in the RGB, HSV, and L*a*b* color spaces. The texture features consist of 15 absolute responses from the LM filters, a max-response histogram with 15 bins from the LM filters, and a histogram with 256 bins from the LBP feature.
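A minimal sketch of the contrast descriptor as an area- and position-weighted sum of feature differences is given below. The notes do not define $W$, so the Gaussian spatial weight, its `sigma`, and the assumption that positions are normalized to [0, 1] are all illustrative choices; $D$ is simplified to an element-wise absolute difference, and the `Region` tuple layout is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (normalized area, mean position, feature vector); hypothetical layout
Region = Tuple[float, Tuple[float, float], Sequence[float]]


def spatial_weight(p: Tuple[float, float], q: Tuple[float, float], sigma: float = 0.25) -> float:
    """Assumed Gaussian form of W(p, p'); not taken from the notes."""
    dist2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return math.exp(-dist2 / (2 * sigma ** 2))


def feature_difference(v: Sequence[float], w: Sequence[float]) -> List[float]:
    """Simplified D(v, v'): element-wise absolute difference."""
    return [abs(a - b) for a, b in zip(v, w)]


def contrast_descriptor(region: Region, others: Sequence[Region]) -> List[float]:
    """Sum of alpha(R') * W(p, p') * D(v, v') over the regions R' of one level."""
    _, p, v = region
    x_c = [0.0] * len(v)
    for area, q, w in others:
        weight = area * spatial_weight(p, q)
        for i, diff in enumerate(feature_difference(v, w)):
            x_c[i] += weight * diff
    return x_c
```

With this choice of weight, a region that differs in appearance contributes more to the contrast when it is large and spatially close than when it is small or far away.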
- Image-Specific Backgroundness Descriptor
The pixels in the 15-pixel-wide narrow border region of the image, called the pseudo-background, are assumed to belong to the background.
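$x_b = D(v, v')$
$d(a_1, a_2) = \left[\, |a_{11} - a_{21}|, \dots, |a_{1t} - a_{2t}| \,\right]$, where $t$ is the number of elements in the vectors $a_1$ and $a_2$.
$\chi^2(h_1, h_2) = \sum_{i=1}^{b} \frac{(h_{1i} - h_{2i})^2}{h_{1i} + h_{2i}}$, where $b$ is the number of histogram bins.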
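The two distances used to compare region features can be sketched in a few lines. The $\chi^2$ form used here, $\sum_i (h_{1i} - h_{2i})^2 / (h_{1i} + h_{2i})$, is a reading of the formula in these notes; skipping bins that are empty in both histograms is an assumption added to avoid division by zero, and the function names are hypothetical.

```python
from typing import List, Sequence


def abs_difference(a1: Sequence[float], a2: Sequence[float]) -> List[float]:
    """Element-wise absolute difference d(a1, a2) between two vectors."""
    if len(a1) != len(a2):
        raise ValueError("vectors must have the same length")
    return [abs(x - y) for x, y in zip(a1, a2)]


def chi_square(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Chi-square distance between two histograms with the same bins."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    total = 0.0
    for x, y in zip(h1, h2):
        denom = x + y
        if denom > 0:  # assumption: bins empty in both histograms add nothing
            total += (x - y) ** 2 / denom
    return total
```

Identical histograms give a distance of 0, and the distance grows as mass moves to different bins.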
- Objectness Descriptor
A salient object usually lies near the center of the image, while the background usually scatters over the entire image.
The objectness descriptor aims to characterize the common property of the salient object and the background in various images, and consists of geometric and appearance features.
3 Use the random forest regressor to map the discriminative regional features to saliency scores, giving one saliency map $A^m$ per segmentation level.
4 Fuse the saliency scores across multiple levels, and yield the saliency map.
The linear combinator is relatively easy and learnt in a standard way: the weights are estimated with a least-squares estimator, i.e. by minimizing the sum of the losses $\sum_{n=1}^{N} \left\| A_n^* - \sum_{m=1}^{M} \omega_m A_n^m \right\|_F^2$, where $\{A_n^*\}_{n=1}^{N}$ are the ground-truth saliency maps of the $N$ training images.
The final saliency map is $A = \sum_{m=1}^{M} \omega_m A^m$.
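One way to estimate the fusion weights is to solve the normal equations of the least-squares problem. The sketch below uses only the standard library. Saliency maps are flattened to lists, which is valid because the squared Frobenius norm is the sum of squared entries. The function names and data layout are hypothetical, and a numerical library would normally replace the hand-written solver.

```python
from typing import List, Sequence


def solve_linear_system(G: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve G w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(G[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("level maps are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][k] * w[k] for k in range(i + 1, n))
        w[i] = (aug[i][n] - s) / aug[i][i]
    return w


def fit_fusion_weights(level_maps, ground_truths) -> List[float]:
    """level_maps[n][m] is the flattened level-m map of image n;
    ground_truths[n] is the flattened ground-truth map of image n."""
    M = len(level_maps[0])
    G = [[0.0] * M for _ in range(M)]
    b = [0.0] * M
    for maps, target in zip(level_maps, ground_truths):
        for i in range(M):
            b[i] += sum(p * t for p, t in zip(maps[i], target))
            for j in range(M):
                G[i][j] += sum(p * q for p, q in zip(maps[i], maps[j]))
    return solve_linear_system(G, b)


def fuse(maps, weights) -> List[float]:
    """Weighted sum of the flattened level maps of one image."""
    return [sum(w * m[k] for w, m in zip(weights, maps)) for k in range(len(maps[0]))]
```

If the ground truth of a toy image is exactly 0.3 times the first level map plus 0.7 times the second, the fit recovers weights close to 0.3 and 0.7.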
Experimental results
- The random forest regressor with 200 trees, trained with 15 randomly sampled features for node splitting, reaches a balance between efficiency and effectiveness.
- A larger number of segmentations, and thus more training data, yields better validation performance. 48-level segmentations, resulting in around 1.7 million training region samples, are chosen to train the random forest regressor.
- In the testing stage, the AUC score increases when more levels of segmentation are adopted. The reason is that more levels increase the chance that some regions cover most (or even all) of an object. Finally, 15-level segmentations are chosen for testing in the experiments.
- Both unsupervised and supervised segmentation can be used to generate training examples. Salient object detection with supervised segmentation performs slightly better.
- The saliency features consist of three kinds: the contrast descriptor, the image-specific backgroundness descriptor, and the objectness descriptor. The contrast descriptor is the least important one; the objectness descriptor plays the most important role; the backgroundness descriptor is in-between.
- The paper illustrates the most important feature for each kind of descriptor. Taken alone, even the most powerful backgroundness feature provides far less reliable information for salient object detection. By integrating all the weak information, much better saliency maps are achieved.
- Rather than considering all the 93 features, adopt the top 60 features, which occupy around 90% of the energy of total features, for training. This feature vector performs as well as the entire feature descriptor.
- The approach deals well with challenging cases where the background is cluttered. It is also worth pointing out that the approach performs well when the object touches the image border. However, like other algorithms, it may fail on extremely cluttered scenes.
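The feature selection result above (top 60 of 93 features covering around 90% of the "energy") can be illustrated with a cumulative-importance cut-off. Treating "energy" as the share of the total importance score is an assumption, as is the idea that the scores come from the trained random forest; the function name is hypothetical.

```python
from typing import List, Sequence


def select_top_features(importances: Sequence[float], energy: float = 0.9) -> List[int]:
    """Return indices of the most important features whose cumulative
    importance first reaches `energy` times the total importance."""
    total = sum(importances)
    if total <= 0:
        raise ValueError("importances must have a positive sum")
    # Sort feature indices by decreasing importance.
    order = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)
    selected = []
    cumulative = 0.0
    for idx in order:
        selected.append(idx)
        cumulative += importances[idx]
        if cumulative >= energy * total:
            break
    return selected
```

For importance scores `[60, 4, 32, 4]`, the first and third features already cover 92% of the total, so only those two are kept.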
Conclusion
This paper addresses the salient object detection problem using a discriminative regional feature integration approach. The approach integrates a large number of regional descriptors to compute the saliency scores, rather than heuristically computing saliency maps from different types of features and combining them to get the saliency map. It also introduces the novel backgroundness descriptor, which is shown to be quite effective in the experiments.
Analysis
In the random forest training stage, DRFI's supervised selection of training samples across multiple segmentation levels is well worth learning from. In addition, its way of selecting objectness feature vectors is distinctive. HDCT's feature selection method is similar to that of DRFI. HDCT can improve on the accuracy of the DRFI algorithm by applying a second round of processing to the intermediate, uncertain regions.