【2】Model Compression As Constrained Optimization, with Application to Neural Nets. 2017

Paper link: https://arxiv.org/pdf/1707.01209.pdf

Part I: general framework.

We give a general formulation of model compression as constrained optimization.

Related work.

Four categories of model compression.

  1. Direct learning: $\min_\Theta L(h(x; \Theta))$: find the small model with the best loss, regardless of the reference.
  2. Direct compression (DC): $\min_\Theta \|w - \Delta(\Theta)\|^2$: find the closest approximation to the parameters of the reference model (a sketch follows this list).
  3. Model compression as constrained optimization: it forces $h$ and $f$ to be models of the same type, by constraining the weights $w$ to be constructed from a low-dimensional parameterization $w = \Delta(\Theta)$, while $h$ must still optimize the loss $L$.
  4. Teacher-student: $\min_\Theta \int_{\mathcal{X}} p(x)\, \|f(x; w) - h(x; \Theta)\|^2 \, dx$: find the closest approximation $h$ to the reference function $f$, in some norm.
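
To make category 2 concrete, here is a minimal NumPy sketch of direct compression with a rank-$r$ decompression mapping $\Delta(\Theta) = UV$: the closest approximation to the reference weights has a closed form via truncated SVD (Eckart-Young), and the task loss $L$ is never consulted. All names and sizes are illustrative, not from the paper.

```python
import numpy as np

def direct_compress_lowrank(W, r):
    """Direct compression: min_Theta ||W - Delta(Theta)||^2 with
    Delta(Theta) = U @ V of rank r. Solved in closed form by truncated
    SVD (Eckart-Young); the task loss L plays no role in this step."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :r] * s[:r], Vt[:r, :]   # Theta = (U_r, V_r)

# Illustrative reference weights for one layer.
W = np.random.randn(256, 128)
U_r, V_r = direct_compress_lowrank(W, r=16)
print(np.linalg.norm(W - U_r @ V_r))     # approximation error in w-space
```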

A constrained optimization formulation.
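
Concretely, the paper casts compression as a joint problem over the reference weights $w$ and the low-dimensional parameters $\Theta$:

$$\min_{w,\,\Theta} \; L(w) \qquad \text{s.t.} \qquad w = \Delta(\Theta)$$

Its solution is, by construction, the model with the best loss among all models that can be written as $w = \Delta(\Theta)$.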

Types of compression
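Each compression type corresponds to a choice of the decompression mapping $\Delta$: low-rank decomposition, quantization with a codebook, pruning, and so on. A minimal sketch of two such mappings, with illustrative names:

```python
import numpy as np

def delta_lowrank(U, V):
    """Low rank: Theta = (U, V); the decompressed weights are U @ V."""
    return U @ V

def delta_quantize(codebook, assignments, shape):
    """Scalar quantization: Theta = (codebook, assignments); each weight
    is replaced by its assigned codebook entry."""
    return codebook[assignments].reshape(shape)
```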

A “Learning-Compression” (LC) algorithm

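The LC algorithm handles the constraint with a quadratic penalty $\frac{\mu}{2}\|w - \Delta(\Theta)\|^2$ and alternates two steps while driving $\mu \to \infty$: an L (learning) step that trains $w$ on the task loss plus the penalty, and a C (compression) step that compresses the current $w$. Below is a minimal sketch for the low-rank case; `loss_grad`, the schedule, and the step sizes are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def c_step_lowrank(w, r):
    """C step: Theta <- argmin_Theta ||w - Delta(Theta)||^2 (truncated SVD)."""
    U, s, Vt = np.linalg.svd(w, full_matrices=False)
    return U[:, :r] * s[:r], Vt[:r, :]

def lc_lowrank(w0, loss_grad, r, mu_schedule, inner_steps=100, lr=1e-3):
    """Learning-Compression sketch: alternate L and C steps as mu grows.

    loss_grad(w) must return dL/dw, the gradient of the task loss at w."""
    w = w0.copy()
    U, V = c_step_lowrank(w, r)                  # start from the DC solution
    for mu in mu_schedule:                       # e.g. mu = 1e-3 * 2**k
        for _ in range(inner_steps):             # L step: gradient descent on
            g = loss_grad(w) + mu * (w - U @ V)  # L(w) + mu/2 ||w - Delta||^2
            w -= lr * g
        U, V = c_step_lowrank(w, r)              # C step
    return U, V                                  # compressed parameters Theta
```

As $\mu \to \infty$ the penalty enforces the constraint, so $\Delta(\Theta)$ approaches a local solution of the constrained problem above.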

Direct compression (DC) and the beginning of the path
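In the paper's analysis, the DC solution is exactly the beginning of this path: it is the limit of the penalized solutions as $\mu \to 0^+$, and following the path as $\mu \to \infty$ moves from DC toward an optimal compression. This is why DC (compressing the weights without retraining) is generally suboptimal, except roughly when the reference model is so overparameterized that the loss is nearly flat around its weights.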

Compression, generalization and model selection

Compression
Compression can also be seen as a way to prevent overfitting, since it aims at obtaining a smaller model with a similar loss to that of a well-trained reference model.

Generalization
If the reference model was not trained well enough, the continued training that happens while compressing can itself reduce the error.

Model selection
A good approximate strategy for model selection in neural nets is to train a large enough reference model and compress it as much as possible.