【2】Model Compression As Constrained Optimization, with Application to Neural Nets. 2017
Paper link: https://arxiv.org/pdf/1707.01209.pdf
Part I: general framework.
We give a general formulation of model compression as constrained optimization.
Related work.
Four categories of model compression:
- Direct learning: find the small model h(x; Θ) with the best loss L, regardless of the reference model f(x; w̄).
- Direct compression (DC): find the closest approximation Θ = Π(w̄) to the parameters w̄ of the reference model, regardless of the loss.
- Model compression as constrained optimization: it forces the reference and the compressed model to be models of the same type, by constraining the weights w to be constructed from a low-dimensional parameterization w = Δ(Θ), but it must optimize the loss L(w).
- Teacher-student: find the closest approximation h(x; Θ) to the reference function f(x; w̄), in some norm.
A constrained optimization formulation.
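In the paper's notation, with weights w ∈ R^P, low-dimensional parameters Θ ∈ R^Q (Q < P) and a decompression mapping Δ: R^Q → R^P, the formulation is:

```latex
\min_{\mathbf{w},\,\boldsymbol{\Theta}} \; L(\mathbf{w})
\quad \text{s.t.} \quad \mathbf{w} = \boldsymbol{\Delta}(\boldsymbol{\Theta})
```

The optimal compressed model is then w* = Δ(Θ*): the point in the feasible set of compressible weights that has the lowest loss.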
Types of compression
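Each type of compression (quantization, pruning, low-rank approximation, ...) corresponds to a particular choice of Δ and of the associated projection Π(w) = argmin_Θ ||w − Δ(Θ)||². As a concrete illustration, here is a minimal NumPy sketch of the low-rank case (function names are mine, not from the paper):

```python
import numpy as np

def compress_lowrank(W, r):
    """Pi(W): project a weight matrix onto the rank-r matrices via truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :r] * s[:r], Vt[:r, :]   # Theta = the two low-rank factors

def decompress_lowrank(A, B):
    """Delta(Theta): rebuild the full weight matrix from Theta."""
    return A @ B
```

For an m × n weight matrix this stores Q = r(m + n) parameters instead of P = mn.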
A “Learning-Compression” (LC) algorithm
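The LC algorithm applies a penalty method to the constrained problem, alternating a learning (L) step over w with a compression (C) step over Θ while driving μ → ∞. Below is a schematic quadratic-penalty version using a plain gradient-descent L step and the low-rank Π/Δ from above; it is a sketch of the idea, not the paper's exact pseudocode:

```python
def lc_algorithm(w_ref, loss_grad, r, mus=(1e-3, 1e-2, 1e-1, 1.0),
                 lr=1e-3, l_steps=200):
    """Alternate min_w L(w) + mu/2 ||w - Delta(Theta)||^2 with Theta = Pi(w)."""
    w = w_ref.copy()
    theta = compress_lowrank(w, r)            # DC solution: start of the path
    for mu in mus:                            # drive the penalty mu -> infinity
        for _ in range(l_steps):
            # L step: ordinary training of w, pulled toward the compressed weights
            w = w - lr * (loss_grad(w) + mu * (w - decompress_lowrank(*theta)))
        # C step: project the current weights back onto the compressible set
        theta = compress_lowrank(w, r)
    return decompress_lowrank(*theta)
```

What makes the scheme generic: the C step is exactly the chosen compression routine (SVD, k-means quantization, pruning, ...), and the L step is ordinary training with a quadratic regularizer.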
Direct compression (DC) and the beginning of the path
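In this notation, DC compresses the reference weights w̄ directly, ignoring the loss:

```latex
\boldsymbol{\Theta}^{\mathrm{DC}} = \boldsymbol{\Pi}(\overline{\mathbf{w}})
 = \arg\min_{\boldsymbol{\Theta}} \;
   \lVert \overline{\mathbf{w}} - \boldsymbol{\Delta}(\boldsymbol{\Theta}) \rVert^2
```

As μ → 0⁺ the LC iterates trace a path of solutions (w(μ), Θ(μ)) whose beginning is exactly this DC point, so DC is optimal only in the limit where the compression error is negligible.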
Compression, generalization and model selection
Compression
Compression can also be seen as a way to prevent overfitting, since it aims at obtaining a smaller model with a similar loss to that of a well-trained reference model.
Generalization
If the reference model was not trained well enough, the continued training that happens while compressing reduces the error, so the compressed model can even generalize better than the reference.
Model selection
A good approximate strategy for model selection in neural nets is to train a single large enough reference model and compress it as much as possible, instead of training many models of different sizes from scratch.