【1】Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation -- NIPS2014

Paper: https://arxiv.org/pdf/1404.0736.pdf
Code: https://cs.nyu.edu/~denton/compress_conv.zip
Contributions.

  1. A collection of generic methods for exploiting the redundancy inherent in deep CNNs.
  2. Empirical speedups of 2-3x on convolutional layers and a 5-10x reduction in the number of parameters of fully connected layers.

Monochromatic Convolution Approximation.
Let $W\in \mathbb{R}^{C\times X \times Y \times F}$, e.g. $(C,X,Y,F)=(3,7,7,96)$ for a first convolutional layer with 96 filters of size $7\times 7\times 3$.
For every output feature $f$, consider the matrix $W_f \in \mathbb{R}^{C\times (XY)}$.
Compute the SVD $W_f = U_fS_fV_f^{T}$, where $U_f \in \mathbb{R}^{C\times C}$ $(3\times 3)$, $S_f \in \mathbb{R}^{C\times XY}$ $(3\times 49)$, $V_f \in \mathbb{R}^{XY\times XY}$ $(49\times 49)$.
We can take the rank-1 approximation of $W_f$: $\tilde{W}_f = \tilde{U}_f\tilde{S}_f\tilde{V}_f^{T}$, where $\tilde{U}_f\in \mathbb{R}^{C\times 1}$, $\tilde{S}_f\in \mathbb{R}$, $\tilde{V}_f\in \mathbb{R}^{1\times XY}$.

Further cluster the $F$ left singular vectors $\tilde{U}_f$ into $C'$ clusters, $C' < F$, using k-means.
Then $\tilde{W}_f = U_{c_f}\tilde{S}_f\tilde{V}_f^T$, where $U_{c_f}\in\mathbb{R}^{C\times 1}$ is the center of cluster $c_f$.
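The monochromatic approximation above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' released code: the function name, the hand-rolled k-means, and the random initialization are all assumptions made for the example.

```python
import numpy as np

def monochromatic_approx(W, n_clusters, kmeans_iters=50, seed=0):
    """Rank-1 SVD per output filter, then k-means on the left singular
    vectors ("color" components). W has shape (C, X, Y, F) as in the note;
    n_clusters plays the role of C' < F."""
    C, X, Y, F = W.shape
    colors = np.empty((F, C))        # U~_f for each filter
    mono = np.empty((F, X * Y))      # s_f * V~_f^T for each filter
    for f in range(F):
        Wf = W[:, :, :, f].reshape(C, X * Y)
        U, S, Vt = np.linalg.svd(Wf, full_matrices=False)
        colors[f] = U[:, 0]
        mono[f] = S[0] * Vt[0]
    # plain k-means on the F color vectors (illustrative, not tuned)
    rng = np.random.default_rng(seed)
    centers = colors[rng.choice(F, n_clusters, replace=False)]
    for _ in range(kmeans_iters):
        d = ((colors[:, None, :] - centers[None]) ** 2).sum(-1)
        assign = d.argmin(1)
        for c in range(n_clusters):
            if (assign == c).any():
                centers[c] = colors[assign == c].mean(0)
    # reconstruct W~_f = U_{c_f} s_f V~_f^T for every filter f
    W_approx = np.einsum('fc,fp->cpf', centers[assign], mono)
    return W_approx.reshape(C, X, Y, F), assign
```

Each reconstructed filter slice is an outer product of a shared color vector and the filter's spatial component, so it has rank 1 by construction.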

Biclustering Approximation.
Let $W\in \mathbb{R}^{C\times X \times Y \times F}$, and form $W_C\in\mathbb{R}^{C\times (XYF)}$ and $W_F\in\mathbb{R}^{(CXY)\times F}$.
Cluster the rows of $W_C$ into $G$ clusters.
Cluster the columns of $W_F$ into $H$ clusters.
This yields $G\times H$ sub-tensors $W(C_i, :, :, F_j)$, each $W_S\in\mathbb{R}^{\frac{C}{G}\times(XY)\times\frac{F}{H}}$.
Each sub-tensor contains similar elements, and thus is easier to fit with a low-rank approximation.
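The grouping step can be sketched as two independent k-means runs, one on the rows of $W_C$ and one on the columns of $W_F$. A minimal sketch, assuming a simple Lloyd's-iteration k-means; the function name and shapes are illustrative.

```python
import numpy as np

def bicluster(W, G, H, iters=50, seed=0):
    """Cluster input channels into G groups (rows of W_C) and output
    filters into H groups (columns of W_F). W has shape (C, X, Y, F)."""
    C, X, Y, F = W.shape
    WC = W.reshape(C, X * Y * F)       # one row per input channel
    WF = W.reshape(C * X * Y, F).T     # one row per output filter

    def kmeans(data, k):
        rng = np.random.default_rng(seed)
        centers = data[rng.choice(len(data), k, replace=False)]
        for _ in range(iters):
            d = ((data[:, None, :] - centers[None]) ** 2).sum(-1)
            a = d.argmin(1)
            for c in range(k):
                if (a == c).any():
                    centers[c] = data[a == c].mean(0)
        return a

    return kmeans(WC, G), kmeans(WF, H)
```

The two label vectors then index the $G\times H$ sub-tensors on which the low-rank fits below are applied.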

  1. Outer product decomposition (rank-$K$)
    $W^{k+1}\leftarrow W^k-\alpha \otimes \beta\otimes \gamma$,
    where $\alpha\in \mathbb{R}^C$, $\beta\in\mathbb{R}^{XY}$, $\gamma\in\mathbb{R}^{F}$.
  2. SVD decomposition
    View $W$ as a matrix $W\in\mathbb{R}^{m\times nk}$ and approximate it as $\tilde{W}\approx\tilde{U}\tilde{S}\tilde{V}^T$.
    $W$ can be compressed even further by applying a second SVD to $\tilde{V}$.
    Let $K_1$ and $K_2$ denote the ranks used in the first and second SVD.
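The greedy deflation in item 1 can be sketched as follows: fit one rank-1 term $\alpha\otimes\beta\otimes\gamma$ by alternating least squares, subtract it, and repeat $K$ times. This is an illustrative sketch (the paper's actual fitting procedure may differ); the function name and ALS iteration count are assumptions.

```python
import numpy as np

def rank_k_outer(W3, K, als_iters=30):
    """Greedy rank-K outer-product decomposition of a 3-tensor
    W3 in R^{C x XY x F}. Returns the K (alpha, beta, gamma) terms
    and the final residual W^K."""
    R = W3.copy()
    terms = []
    for _ in range(K):
        a = np.ones(R.shape[0])
        b = np.ones(R.shape[1])
        c = np.ones(R.shape[2])
        for _ in range(als_iters):
            # each update is the least-squares optimum with the
            # other two factors held fixed
            a = np.einsum('ijk,j,k->i', R, b, c) / ((b @ b) * (c @ c))
            b = np.einsum('ijk,i,k->j', R, a, c) / ((a @ a) * (c @ c))
            c = np.einsum('ijk,i,j->k', R, a, b) / ((a @ a) * (b @ b))
        terms.append((a.copy(), b.copy(), c.copy()))
        R = R - np.einsum('i,j,k->ijk', a, b, c)   # W^{k+1} <- W^k - a⊗b⊗c
    return terms, R
```

Each deflation step removes the best rank-1 fit found by ALS, so the Frobenius norm of the residual decreases with $K$.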

Settings.