Paper Reading: Slimmable Neural Networks


1 What is done

  1. Presents a way to run a single fixed model in different situations with varying computational budgets.
  2. A method called switchable batch normalization is introduced so that such a model can be trained successfully.
  3. Experiments demonstrating that the method works properly.

2 How it works

2.1 Slimmable network

2.1.1 The idea

The general idea is rather simple. Since we would like to run the model under different computational budgets, why not simply drop some of the channels to reduce computation? This is exactly the idea behind network pruning; the difference is that this paper trains a single model that works at several pruning rates at once.
For example, when we drop 10% of its channels, it works well. When we drop 25%, 50% or even 75%, it still works with reasonable accuracy loss.
As the paper's figure shows, the model runs at [1.0x 0.75x 0.5x 0.25x] width. Note that width here refers to the number of channels in the network layers.
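The channel-slicing idea can be sketched in a few lines. This is a toy numpy illustration, not the paper's code: the fully connected layer, the 8-channel size and the name `slim_linear` are made up, but the mechanism is the same one the paper applies to convolutions.

```python
import numpy as np

# Full-width weight matrix of a toy fully connected layer:
# 8 input channels -> 8 output channels, shared by all widths.
rng = np.random.default_rng(0)
W_full = rng.standard_normal((8, 8))

def slim_linear(x, W_full, width_mult):
    """Run the layer at a fraction of its full width by taking
    only the first k input/output channels of the shared weights."""
    k = int(W_full.shape[0] * width_mult)
    return x[:k] @ W_full[:k, :k]

x = rng.standard_normal(8)
full = slim_linear(x, W_full, 1.0)   # uses all 8 channels -> shape (8,)
half = slim_linear(x, W_full, 0.5)   # uses only the first 4 -> shape (4,)
```

All switches share one set of weights; a narrower switch simply ignores the trailing channels, so no extra parameters are stored per width.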

2.1.2 Realization

To realize this seemingly remarkable property, we have to train the network under all of the [1.0x 0.75x 0.5x 0.25x etc.] conditions. Each such width configuration is therefore given a name: a switch. For example, 0.25x means that the width of every layer is scaled to 0.25 of the full model.
Then it is time to train the model. The training procedure (the paper's Algorithm 1) is simple: in every iteration, the losses of all switches are computed on the same minibatch, their gradients are accumulated, and then a single optimizer step is taken.
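Since the original pseudo-code figure is not reproduced here, the loop can be sketched roughly as follows. This is a hedged toy sketch: a hand-written linear "network" with manual gradients stands in for the real CNN and autograd, and all names are made up.

```python
import numpy as np

switches = [1.0, 0.75, 0.5, 0.25]        # the trained widths

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))          # shared weights across switches
grad = np.zeros_like(W)                  # accumulated gradient buffer
lr = 0.1

def forward_loss_grad(W, width_mult, x, y):
    """Run the sub-network at `width_mult` and return its squared-error
    loss plus the gradient w.r.t. the shared weights (toy linear model)."""
    k = int(W.shape[0] * width_mult)
    pred = x[:k] @ W[:k, :k]
    err = pred - y[:k]
    g = np.zeros_like(W)
    g[:k, :k] = np.outer(x[:k], err)     # d(0.5*||err||^2)/dW
    return 0.5 * err @ err, g

x, y = rng.standard_normal(8), rng.standard_normal(8)

# One training iteration, following the paper's Algorithm 1:
grad[:] = 0.0                            # clear gradients
total_loss = 0.0
for m in switches:                       # every switch, same minibatch
    loss, g = forward_loss_grad(W, m, x, y)
    total_loss += loss
    grad += g                            # accumulate across switches
W -= lr * grad                           # single optimizer step
```

The key point is that the weight update happens once per iteration, after all switches have contributed their gradients, so no switch is favored over the others.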

2.2 Switchable batch normalization

This model uses independent batch normalization parameters for each switch. Why is this necessary? The paper's training curves give the answer: the training error rates with and without S-BN (switchable batch normalization) look almost the same, but the validation error shows that training without S-BN can be unstable. The reason is that sub-networks of different widths produce feature statistics with different means and variances, so a single shared set of BN statistics cannot match every switch at test time.
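A minimal sketch of the per-switch statistics, in the same toy numpy setting as above (the class name `SwitchableBN` and its layout are illustrative, not the paper's implementation):

```python
import numpy as np

class SwitchableBN:
    """Shared layer weights, but a private set of BN parameters
    (gamma, beta, running mean/var) for every switch."""

    def __init__(self, num_channels, switches):
        self.params = {
            m: {"gamma": np.ones(int(num_channels * m)),
                "beta": np.zeros(int(num_channels * m)),
                "mean": np.zeros(int(num_channels * m)),
                "var": np.ones(int(num_channels * m))}
            for m in switches
        }

    def __call__(self, x, width_mult, training=True, momentum=0.1):
        p = self.params[width_mult]          # pick this switch's BN only
        if training:
            mean, var = x.mean(axis=0), x.var(axis=0)
            p["mean"] = (1 - momentum) * p["mean"] + momentum * mean
            p["var"] = (1 - momentum) * p["var"] + momentum * var
        else:
            mean, var = p["mean"], p["var"]  # per-switch running stats
        x_hat = (x - mean) / np.sqrt(var + 1e-5)
        return p["gamma"] * x_hat + p["beta"]

bn = SwitchableBN(8, [1.0, 0.5])
x_half = np.random.default_rng(0).standard_normal((4, 4))  # batch at 0.5x
out = bn(x_half, 0.5)   # only the 0.5x statistics are updated
```

Because each switch sees different feature statistics, keeping running means and variances separate is what makes validation behave consistently across widths; the extra parameter cost is tiny compared to the shared weights.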

3 The performance – experiments and results

3.1 Performance on image classification task

On MobileNet v2, ShuffleNet and ResNet-50, the slimmable network achieves results comparable to individually trained models. At comparable accuracy and FLOPs, one slimmable model is thus equivalent to four separate models at the same time.

3.2 Performance on object detection, instance segmentation and keypoint detection

The results here mirror the image classification task: for Faster R-CNN, Mask R-CNN and Keypoint R-CNN with a ResNet-50 backbone on the COCO 2017 dataset, the slimmable network again achieves comparable results.

3.3 Performance under different numbers of switches

To analyze how the number of switches impacts accuracy, the authors trained an 8-switch network. A comparison on MobileNet v1 among individually trained models, the 4-switch model and the 8-switch model shows that the slimmable network is insensitive to the number of switches.