【2】Model Compression As Constrained Optimization, with Application to Neural Nets. 2017

Paper link: https://arxiv.org/pdf/1707.01209.pdf

Part I: general framework.

We give a general formulation of model compression as constrained optimization.

Related work.

Four categories of model compression.

  1. Direct learning: $\min_\Theta L(h(x; \Theta))$: find the small model with the best loss, regardless of the reference.
  2. Direct compression (DC): $\min_\Theta \|w - \Delta(\Theta)\|^2$: find the closest approximation to the parameters of the reference model (a sketch follows this list).
  3. Model compression as constrained optimization: it forces $h$ and $f$ to be models of the same type, by constraining the weights $w$ to be constructed from a low-dimensional parameterization $w = \Delta(\Theta)$, while $h$ must still optimize the loss $L$.
  4. Teacher-student: $\min_\Theta \int_{\mathcal{X}} p(x)\, \|f(x; w) - h(x; \Theta)\|^2 \, dx$: find the closest approximation $h$ to the reference function $f$, in some norm.
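
To make category 2 concrete, here is a minimal NumPy sketch of direct compression with a rank-$r$ decompression mapping $\Delta(\Theta) = UV$: the closest approximation to the reference weights has a closed form via truncated SVD (Eckart-Young), and the task loss $L$ is never consulted. All names and sizes are illustrative, not from the paper.

```python
import numpy as np

def direct_compress_lowrank(W, r):
    """Direct compression: min_Theta ||W - Delta(Theta)||^2 with
    Delta(Theta) = U @ V of rank r. Solved in closed form by truncated
    SVD (Eckart-Young); the task loss L plays no role in this step."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :r] * s[:r], Vt[:r, :]   # Theta = (U_r, V_r)

# Illustrative reference weights for one layer.
W = np.random.randn(256, 128)
U_r, V_r = direct_compress_lowrank(W, r=16)
print(np.linalg.norm(W - U_r @ V_r))     # approximation error in w-space
```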

A constrained optimization formulation.
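
Concretely, the paper casts compression as a joint problem over the reference weights $w$ and the low-dimensional parameters $\Theta$:

$$\min_{w,\,\Theta} \; L(w) \qquad \text{s.t.} \qquad w = \Delta(\Theta)$$

Its solution is, by construction, the model with the best loss among all models that can be written as $w = \Delta(\Theta)$.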

Types of compression
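Each compression type corresponds to a choice of the decompression mapping $\Delta$: low-rank decomposition, quantization with a codebook, pruning, and so on. A minimal sketch of two such mappings, with illustrative names:

```python
import numpy as np

def delta_lowrank(U, V):
    """Low rank: Theta = (U, V); the decompressed weights are U @ V."""
    return U @ V

def delta_quantize(codebook, assignments, shape):
    """Scalar quantization: Theta = (codebook, assignments); each weight
    is replaced by its assigned codebook entry."""
    return codebook[assignments].reshape(shape)
```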

A “Learning-Compression” (LC) algorithm

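The LC algorithm handles the constraint with a quadratic penalty $\frac{\mu}{2}\|w - \Delta(\Theta)\|^2$ and alternates two steps while driving $\mu \to \infty$: an L (learning) step that trains $w$ on the task loss plus the penalty, and a C (compression) step that compresses the current $w$. Below is a minimal sketch for the low-rank case; `loss_grad`, the schedule, and the step sizes are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def c_step_lowrank(w, r):
    """C step: Theta <- argmin_Theta ||w - Delta(Theta)||^2 (truncated SVD)."""
    U, s, Vt = np.linalg.svd(w, full_matrices=False)
    return U[:, :r] * s[:r], Vt[:r, :]

def lc_lowrank(w0, loss_grad, r, mu_schedule, inner_steps=100, lr=1e-3):
    """Learning-Compression sketch: alternate L and C steps as mu grows.

    loss_grad(w) must return dL/dw, the gradient of the task loss at w."""
    w = w0.copy()
    U, V = c_step_lowrank(w, r)                  # start from the DC solution
    for mu in mu_schedule:                       # e.g. mu = 1e-3 * 2**k
        for _ in range(inner_steps):             # L step: gradient descent on
            g = loss_grad(w) + mu * (w - U @ V)  # L(w) + mu/2 ||w - Delta||^2
            w -= lr * g
        U, V = c_step_lowrank(w, r)              # C step
    return U, V                                  # compressed parameters Theta
```

As $\mu \to \infty$ the penalty enforces the constraint, so $\Delta(\Theta)$ approaches a local solution of the constrained problem above.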

Direct compression (DC) and the beginning of the path
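In the paper's analysis, the DC solution is exactly the beginning of this path: it is the limit of the penalized solutions as $\mu \to 0^+$, and following the path as $\mu \to \infty$ moves from DC toward an optimal compression. This is why DC (compressing the weights without retraining) is generally suboptimal, except roughly when the reference model is so overparameterized that the loss is nearly flat around its weights.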

Compression, generalization and model selection

Compression
Compression can also be seen as a way to prevent overfitting, since it aims at obtaining a smaller model with a similar loss to that of a well-trained reference model.

Generalization
If the reference model was not trained well enough, the continued training that happens while compressing can itself reduce the error.

Model selection
A good approximate strategy for model selection in neural nets is to train a large enough reference model and compress it as much as possible.