Paper reading notes: From Image-level to Pixel-level Labeling with Convolutional Networks
Introduction
Task
Weakly supervised learning for semantic image segmentation, using only image-level class labels.
Contribution
- Combines MIL (Multiple Instance Learning) with a classification CNN
- The experimental results are state of the art
Framework
Train
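For an input image $I$, pass it through a backbone (i.e. Overfeat + Segmentation Net) to obtain per-class feature maps $s^k_{i,j}$, then apply LSE (Log-Sum-Exp) pooling to obtain one image-level score $s^k$ per class. Finally, compute a softmax cross-entropy loss on $s^k$ against the image-level label, and backpropagate the gradients to train the backbone.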
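The training objective above can be sketched in NumPy. This is only a minimal illustration of the forward pass, not the authors' implementation: the function name `image_level_loss`, the `(K, H, W)` layout of the score maps, and the default `r=5.0` are assumptions made for the example, and the backbone and the backward pass are left out.

```python
import numpy as np

def image_level_loss(score_maps, label, r=5.0):
    """Softmax cross-entropy on LSE-pooled scores.

    score_maps: array of shape (K, H, W), one score map per class
                (hypothetical layout chosen for this sketch).
    label:      integer image-level class index in [0, K).
    r:          LSE sharpness parameter (illustrative default).
    """
    k = score_maps.shape[0]
    flat = r * score_maps.reshape(k, -1)

    # LSE pooling: (1/r) * log(mean(exp(r * s))), with a max shift for stability.
    m = flat.max(axis=1)
    pooled = (m + np.log(np.exp(flat - m[:, None]).mean(axis=1))) / r

    # Log-softmax over the K pooled image-level scores.
    z = pooled - pooled.max()
    log_probs = z - np.log(np.exp(z).sum())

    # Cross-entropy against the single image-level label.
    return -log_probs[label], np.exp(log_probs)

# Usage: three classes, class 1 scores high everywhere in the map.
maps = np.zeros((3, 4, 4))
maps[1] = 10.0
loss, probs = image_level_loss(maps, label=1)
```

Only the image-level label enters the loss, which is what makes the supervision weak: no per-pixel target is ever used.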
Inference
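Let $p_{i,j}(k \mid I)$ be the probability for location $(i,j)$ and class label $k$. The ILP (image-level prior) is the image-level class probability $p(k \mid I)$, obtained by a softmax over the pooled scores.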
Finally, apply interpolation to restore the input image resolution, then apply a threshold (smoothing prior) to get the final segmentation result.
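A rough sketch of the inference steps follows. Several things here are assumptions for illustration: the ILP is applied by multiplying the per-location class probabilities by the image-level probabilities, interpolation is replaced by simple nearest-neighbour upsampling with an integer `scale`, and the threshold / smoothing prior step is omitted entirely. The names `segment` and `softmax` are hypothetical helpers, not from the paper.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def segment(score_maps, r=5.0, scale=2):
    """Label map from (K, H, W) class score maps (assumed layout)."""
    k = score_maps.shape[0]

    # Per-location class probabilities: softmax across the class axis.
    pixel_probs = softmax(score_maps, axis=0)

    # Image-level prior: softmax over LSE-pooled scores.
    flat = r * score_maps.reshape(k, -1)
    m = flat.max(axis=1)
    pooled = (m + np.log(np.exp(flat - m[:, None]).mean(axis=1))) / r
    ilp = softmax(pooled, axis=0)

    # Assumption: weight every location by the image-level prior.
    weighted = pixel_probs * ilp[:, None, None]

    # Assumption: nearest-neighbour upsampling stands in for interpolation.
    upsampled = weighted.repeat(scale, axis=1).repeat(scale, axis=2)

    # Most probable class per pixel (smoothing prior not modelled here).
    return upsampled.argmax(axis=0)

# Usage: class 1 dominates the top-left location, class 0 everywhere else.
maps = np.zeros((2, 2, 2))
maps[0] = 5.0
maps[0, 0, 0] = 0.0
maps[1, 0, 0] = 5.0
labels = segment(maps, scale=2)  # shape (4, 4)
```

The prior mainly suppresses classes that the network considers unlikely for the whole image, so isolated false positives at single locations are less likely to survive.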
Log-Sum-Exp(LSE)
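LSE is a pooling method that maps the feature maps $s^k_{i,j}$ to image-level scores $s^k$; it is a smooth approximation of max pooling:

$$s^k = \frac{1}{r}\log\left[\frac{1}{h^o w^o}\sum_{i,j}\exp\left(r\, s^k_{i,j}\right)\right]$$

where $h^o \times w^o$ is the spatial size of the feature maps. When $r$ is high, LSE behaves like max pooling; when $r$ is low, it behaves like average pooling.
In terms of accuracy, LSE pooling performs better than max pooling and sum pooling.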
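The limiting behaviour of LSE pooling is easy to check numerically. The sketch below is a minimal NumPy version, assuming a `(K, H, W)` score array and a sharpness parameter called `r` here; the function name `lse_pool` is made up for this note. It compares the pooled value with plain average and max pooling for a small and a large `r`.

```python
import numpy as np

def lse_pool(scores, r):
    """LSE pooling over the spatial axes of a (K, H, W) array.

    Computes (1/r) * log(mean(exp(r * s))) for each class, shifting by
    the per-class maximum so the exponentials cannot overflow.
    r must be non-zero.
    """
    k = scores.shape[0]
    flat = r * scores.reshape(k, -1)
    m = flat.max(axis=1, keepdims=True)
    return (m[:, 0] + np.log(np.exp(flat - m).mean(axis=1))) / r

rng = np.random.default_rng(0)
s = rng.normal(size=(3, 4, 5))
flat = s.reshape(3, -1)

near_avg = lse_pool(s, 1e-4)   # small r: close to average pooling
near_max = lse_pool(s, 1e3)    # large r: close to max pooling
print(np.abs(near_avg - flat.mean(axis=1)).max())
print(np.abs(near_max - flat.max(axis=1)).max())
```

For any positive `r` the pooled value lies between the average and the maximum, so `r` acts as a knob between the two: every location contributes to the gradient, but high-scoring locations contribute more.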
Summary
- LSE pooling is smoother than max pooling and average pooling. It may be useful.