Object Tracking Paper (7): ACFN -- Attentional Correlation Filter Network for Adaptive Visual Tracking

Time: 1st March 2018, the first week.

The seventh paper: ACFN -- Attentional Correlation Filter Network for Adaptive Visual Tracking
Authors: Jongwon Choi, Hyung Jin Chang, Sangdoo Yun, Tobias Fischer, Yiannis Demiris, Jin Young Choi
Publication information: CVPR 2017

Outline: This paper proposes a new tracking framework that combines a correlation filter network with an attention network. The correlation filter network contains 260 tracking modules (2 features * 2 kernels * 13 scales * 5 delayed updates), each of which produces a validation score. These scores are fed into the attention network, which is split into two sub-networks: the prediction sub-network and the selection sub-network. The current output of the correlation filter network, together with the outputs from previous frames, is fed into the prediction sub-network, whose first layer is an LSTM layer and whose remaining layers are four fully connected layers. The output of the prediction sub-network is passed to the selection sub-network, which produces a binary vector used to pick the good results and the more appropriate modules in the correlation filter network. The final output is computed from the outputs of the correlation filter network, the prediction output and the selection output. The authors observe that modules using HOG features and the Gaussian kernel are often activated.
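
The module count and the binary selection described above can be sketched as follows. This is only an illustration, not the authors' code: the name of the second kernel ("polynomial"), the integer scale and delay indices, the random scores and the top-k rule are all assumptions made for the example. In the paper the binary vector comes from a trained selection sub-network.

```python
from itertools import product

import numpy as np

FEATURES = ("color", "hog")            # the two feature types named in these notes
KERNELS = ("gaussian", "polynomial")   # "polynomial" is an assumed second kernel
SCALE_IDS = range(13)                  # placeholder indices for the 13 scale candidates
DELAY_IDS = range(5)                   # placeholder indices for the 5 delayed updates


def build_module_bank():
    """Return one (feature, kernel, scale_id, delay_id) tuple per tracking module."""
    return list(product(FEATURES, KERNELS, SCALE_IDS, DELAY_IDS))


def select_top_k(predicted_scores, k):
    """Return a boolean vector that marks the k highest predicted scores.

    Hypothetical stand-in for the learned selection sub-network.
    """
    scores = np.asarray(predicted_scores, dtype=float)
    if not 1 <= k <= scores.size:
        raise ValueError("k must be between 1 and the number of modules")
    mask = np.zeros(scores.shape, dtype=bool)
    mask[np.argsort(scores)[-k:]] = True
    return mask


modules = build_module_bank()
rng = np.random.default_rng(0)
predicted = rng.random(len(modules))   # stand-in for the prediction sub-network output
active = select_top_k(predicted, k=20)
active_modules = [m for m, keep in zip(modules, active) if keep]
print(len(modules), len(active_modules))  # 260 20
```

Only the modules marked in the binary vector would need to be evaluated on the next frame.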

Methodology: 1. Features: color + HOG. It does not use deep features; the deep network is used only for attention.

2. Scale changes: the aspect ratio is flexible, and the scale candidates are normalized to the same size in each module.

3. The attentional weight mechanism is a weighted sum of the regression (response) maps computed from the feature maps.

4. It exploits the efficiency of KCF, which is computed with the FFT.

5. The two sub-networks are trained separately, and a long short-term memory (LSTM) layer is used to extract useful information from the previous frames.

6. It includes a mechanism for handling full occlusion and re-detecting the target.
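
The weighted sum in item 3 can be illustrated with a small NumPy sketch. The function name `fuse_responses`, the array shapes and the normalization of the weights are assumptions made for this example; the exact weighting used in the paper is not reproduced here.

```python
import numpy as np


def fuse_responses(response_maps, weights, mask):
    """Weighted sum of the response maps of the selected modules.

    response_maps: array of shape (N, H, W), one map per module.
    weights:       array of shape (N,), e.g. validation scores (assumed non-negative).
    mask:          boolean array of shape (N,), True for selected modules.
    Returns the fused (H, W) map and the (row, col) position of its peak.
    """
    maps = np.asarray(response_maps, dtype=float)
    w = np.where(np.asarray(mask, dtype=bool), np.asarray(weights, dtype=float), 0.0)
    total = w.sum()
    if total <= 0:
        raise ValueError("at least one selected module needs a positive weight")
    w = w / total                           # normalize so the weights sum to 1
    fused = np.tensordot(w, maps, axes=1)   # sum over the module axis -> (H, W)
    peak = np.unravel_index(np.argmax(fused), fused.shape)
    return fused, peak


# Toy usage: three modules, each voting for a different position.
maps = np.zeros((3, 4, 4))
maps[0, 1, 2] = 1.0
maps[1, 3, 3] = 1.0
maps[2, 0, 0] = 1.0
scores = np.array([0.2, 0.9, 0.5])
selected = np.array([False, True, True])
fused, peak = fuse_responses(maps, scores, selected)
print(peak)
```

In the toy usage the first module is excluded by the mask, so the fused peak follows the selected module with the largest weight.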

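For item 4, the reason KCF is fast is that training and detection reduce to element-wise operations in the Fourier domain. The following is a minimal single-channel sketch with a linear kernel, written for these notes. It is not the ACFN implementation (which uses color and HOG features with other kernels), and the function names, the Gaussian label width and the regularization value are assumptions.

```python
import numpy as np


def gaussian_labels(shape, sigma=2.0):
    """Desired response: a Gaussian whose peak sits at index (0, 0), wrapping around."""
    h, w = shape
    ys = np.minimum(np.arange(h), h - np.arange(h))   # circular distance to row 0
    xs = np.minimum(np.arange(w), w - np.arange(w))   # circular distance to column 0
    return np.exp(-(ys[:, None] ** 2 + xs[None, :] ** 2) / (2.0 * sigma ** 2))


def train_filter(x, y, lam=1e-4):
    """Solve the ridge regression over all cyclic shifts of x in the Fourier domain."""
    xf = np.fft.fft2(x)
    kf = xf * np.conj(xf) / x.size          # linear-kernel autocorrelation
    alphaf = np.fft.fft2(y) / (kf + lam)    # element-wise division, no matrix inverse
    return xf, alphaf


def detect(xf, alphaf, z):
    """Return the (row, col) peak of the response for a new patch z."""
    zf = np.fft.fft2(z)
    kzf = zf * np.conj(xf) / z.size         # cross-correlation of z with the template
    response = np.real(np.fft.ifft2(alphaf * kzf))
    return np.unravel_index(np.argmax(response), response.shape)


# Toy usage: the detected peak should follow a cyclic shift of the training patch.
rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32))
xf, alphaf = train_filter(x, gaussian_labels(x.shape))
shifted = np.roll(x, (3, 5), axis=(0, 1))
print(detect(xf, alphaf, x), detect(xf, alphaf, shifted))
```

Each module in the correlation filter network would run a filter of this kind on its own feature, kernel, scale and update delay, which is why evaluating only the selected modules saves time.
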
Advantages: The speed is reasonable given that there are 260 modules. The diverse modules allow dynamic feature selection and adaptation to scale changes. The attention network can learn from previous frames, and the binary selection output accelerates the correlation filter network because only the selected modules need to run. The training procedure is not online.
Disadvantages: The features could be extended to deep features to increase accuracy.