Paper: https://arxiv.org/pdf/1404.0736.pdf
Code: https://cs.nyu.edu/~denton/compress_conv.zip
Contributions.
- A collection of generic methods to exploit the redundancy inherent in deep CNNs.
- Empirical speedups of 2-3x on convolutional layers and a 5-10x reduction in the number of parameters in fully connected layers.
Monochromatic Convolution Approximation.
Let $W \in \mathbb{R}^{C \times X \times Y \times F}$ be a first-layer weight tensor; for the reference network, $(C, X, Y, F) = (3, 7, 7, 96)$.
For every output feature $f$, consider the matrix $W_f \in \mathbb{R}^{C \times (XY)}$.
Find the SVD $W_f = U_f S_f V_f^\top$, where $U_f \in \mathbb{R}^{C \times C}$ ($3 \times 3$), $S_f \in \mathbb{R}^{C \times XY}$ ($3 \times 49$, with $XY = 7 \times 7 = 49$), and $V_f \in \mathbb{R}^{XY \times XY}$ ($49 \times 49$).
We can take the rank-1 approximation $\tilde{W}_f = \tilde{U}_f \tilde{S}_f \tilde{V}_f^\top$, where $\tilde{U}_f \in \mathbb{R}^{C \times 1}$, $\tilde{S}_f \in \mathbb{R}$, and $\tilde{V}_f \in \mathbb{R}^{1 \times XY}$.
Further, cluster the $F$ left singular vectors $\tilde{U}_f$ into $C'$ clusters, $C' < F$, using k-means.
Then $\tilde{W}_f = U_{c_f} \tilde{S}_f \tilde{V}_f^\top$, where $U_{c_f} \in \mathbb{R}^{C \times 1}$ is the center of the cluster $c_f$ to which filter $f$ is assigned.
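The sketch below walks through this monochromatic pipeline in plain Python. It is not the authors' released code; numpy, scikit-learn's `KMeans`, and the function name `monochromatic_approx` are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def monochromatic_approx(W, n_clusters):
    """Rank-1 color approximation per filter, with shared color transforms."""
    C, X, Y, F = W.shape
    U = np.zeros((F, C))       # left singular vector (color transform) per filter
    SV = np.zeros((F, X * Y))  # singular value folded into v^T, per filter
    for f in range(F):
        Wf = W[:, :, :, f].reshape(C, X * Y)
        u, s, vt = np.linalg.svd(Wf, full_matrices=False)
        U[f] = u[:, 0]
        SV[f] = s[0] * vt[0]
    # Cluster the F color transforms into C' shared centers via k-means.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(U)
    centers, labels = km.cluster_centers_, km.labels_
    # Reassemble each filter from its cluster center and its own s * v^T.
    W_approx = np.empty_like(W)
    for f in range(F):
        Wf = np.outer(centers[labels[f]], SV[f])  # C x (XY), rank 1
        W_approx[:, :, :, f] = Wf.reshape(C, X, Y)
    return W_approx, centers, labels

# Hypothetical first-layer weights: (C, X, Y, F) = (3, 7, 7, 96).
W = np.random.randn(3, 7, 7, 96)
W_approx, centers, labels = monochromatic_approx(W, n_clusters=16)
print(np.linalg.norm(W - W_approx) / np.linalg.norm(W))
```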
Biclustering Approximation.
Let $W \in \mathbb{R}^{C \times X \times Y \times F}$, and define the unfoldings $W_C \in \mathbb{R}^{C \times (XYF)}$ and $W_F \in \mathbb{R}^{(CXY) \times F}$.
Cluster the rows of $W_C$ into $G$ clusters.
Cluster the columns of $W_F$ into $H$ clusters.
This yields $G \times H$ sub-tensors $W(C_i, :, :, F_j)$, each of shape $W_S \in \mathbb{R}^{(C/G) \times (XY) \times (F/H)}$.
Each sub-tensor contains similar elements, and thus is easier to fit with a low-rank approximation.
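A minimal sketch of the biclustering step, under the same assumptions as above (numpy + scikit-learn; the function name `bicluster` and the mid-layer shape are hypothetical). Note that k-means does not force equal cluster sizes, so the sub-tensors are only roughly $(C/G) \times (XY) \times (F/H)$.

```python
import numpy as np
from sklearn.cluster import KMeans

def bicluster(W, G, H):
    """Split W (C x X x Y x F) into G x H sub-tensors W(C_i, :, :, F_j)."""
    C, X, Y, F = W.shape
    WC = W.reshape(C, X * Y * F)   # one row per input channel
    WF = W.reshape(C * X * Y, F)   # one column per output feature
    row_labels = KMeans(n_clusters=G, n_init=10, random_state=0).fit(WC).labels_
    col_labels = KMeans(n_clusters=H, n_init=10, random_state=0).fit(WF.T).labels_
    subtensors = {}
    for g in range(G):
        Ci = np.where(row_labels == g)[0]
        for h in range(H):
            Fj = np.where(col_labels == h)[0]
            subtensors[(g, h)] = W[Ci][:, :, :, Fj]
    return subtensors

# Hypothetical mid-layer weights.
W = np.random.randn(96, 5, 5, 256)
for key, sub in bicluster(W, G=2, H=2).items():
    print(key, sub.shape)   # roughly (C/G, X, Y, F/H) each
```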
- Outer product decomposition (rank-$K$): greedily peel off one rank-1 term at a time, updating the residual
$W_{k+1} \leftarrow W_k - \alpha \otimes \beta \otimes \gamma$,
where $\alpha \in \mathbb{R}^C$, $\beta \in \mathbb{R}^{XY}$, $\gamma \in \mathbb{R}^F$, and each term is the best rank-1 fit to the current residual $W_k$. See the first sketch after this list.
- SVD decomposition (see the second sketch after this list)
For $W \in \mathbb{R}^{m \times nk}$, take a truncated SVD $\tilde{W} \approx \tilde{U} \tilde{S} \tilde{V}^\top$.
$W$ can be compressed even further by applying a second SVD to $\tilde{V}$.
Use $K_1$ and $K_2$ to denote the ranks used in the first and second SVD.
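Below is a sketch of the greedy rank-$K$ outer product decomposition from the first bullet, applied to one sub-tensor. The rank-1 fitting subroutine is left open in the notes above; the alternating-least-squares loop `rank1_als` used here is one common choice, not necessarily the authors'.

```python
import numpy as np

def rank1_als(T, n_iter=50):
    """Fit T (C x XY x F) with a single outer product via alternating updates."""
    C, S, F = T.shape
    rng = np.random.default_rng(0)
    alpha, beta, gamma = rng.standard_normal(C), rng.standard_normal(S), rng.standard_normal(F)
    for _ in range(n_iter):
        alpha = np.einsum('csf,s,f->c', T, beta, gamma) / ((beta @ beta) * (gamma @ gamma))
        beta  = np.einsum('csf,c,f->s', T, alpha, gamma) / ((alpha @ alpha) * (gamma @ gamma))
        gamma = np.einsum('csf,c,s->f', T, alpha, beta) / ((alpha @ alpha) * (beta @ beta))
    return alpha, beta, gamma

def outer_product_decompose(W, K):
    """Greedy rank-K decomposition: W_{k+1} <- W_k - alpha (x) beta (x) gamma."""
    residual = W.copy()
    terms = []
    for _ in range(K):
        a, b, g = rank1_als(residual)
        terms.append((a, b, g))
        residual = residual - np.einsum('c,s,f->csf', a, b, g)
    return terms, residual

# Hypothetical sub-tensor of shape (C, XY, F).
W = np.random.randn(48, 25, 128)
terms, residual = outer_product_decompose(W, K=8)
print(np.linalg.norm(residual) / np.linalg.norm(W))
```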
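And a sketch of the two-stage SVD compression from the second bullet, in plain numpy; the matrix shape and the ranks $K_1$, $K_2$ are illustrative assumptions.

```python
import numpy as np

def two_stage_svd(W, K1, K2):
    """Truncate W (m x nk) to rank K1, then truncate V-tilde to rank K2."""
    U1, S1, V1t = np.linalg.svd(W, full_matrices=False)
    U1, S1, V1t = U1[:, :K1], S1[:K1], V1t[:K1, :]   # first SVD, rank K1
    U2, S2, V2t = np.linalg.svd(V1t, full_matrices=False)
    U2, S2, V2t = U2[:, :K2], S2[:K2], V2t[:K2, :]   # second SVD, rank K2
    return U1, S1, U2, S2, V2t

def reconstruct(U1, S1, U2, S2, V2t):
    V1t_approx = U2 @ np.diag(S2) @ V2t
    return U1 @ np.diag(S1) @ V1t_approx

# Hypothetical m x nk weight matrix.
W = np.random.randn(256, 9 * 384)
W_approx = reconstruct(*two_stage_svd(W, K1=64, K2=32))
print(np.linalg.norm(W - W_approx) / np.linalg.norm(W))
```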
Settings.
