AlexNet Architecture in Detail
Because compute was limited when Alex Krizhevsky entered the ImageNet competition, the network was trained in parallel across two GPUs; here the two parallel halves are merged into a single network.
Input: 227 * 227 * 3 images. The paper states the input is 224 * 224 * 3, but that size is inconsistent with the output dimensions below; 227 * 227 * 3 makes the layer arithmetic work out (see the shape check after the layer list).
[55*55*96] CONV1: 96 11*11 filters at stride 4, pad 0
[27*27*96] MAX POOL1: 3*3 filters at stride 2
[27*27*96] NORM1: Normalization layer
[27*27*256] CONV2 SAME: 256 5*5 filters at stride 1, pad 2
[13*13*256] MAX POOL2: 3*3 filters at stride 2
[13*13*256] NORM2: Normalization layer
[13*13*384] CONV3 SAME: 384 3*3 filters at stride 1, pad 1
[13*13*384] CONV4 SAME: 384 3*3 filters at stride 1, pad 1
[13*13*256] CONV5 SAME: 256 3*3 filters at stride 1, pad 1
[6*6*256] MAX POOL3: 3*3 filters at stride 2
[4096] FC6: 4096 neurons
[4096] FC7: 4096 neurons
[1000] FC8: 1000 neurons (class scores)
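As a quick check on the 224 vs. 227 discrepancy, here is a minimal Python sketch applying the standard output-size formula floor((W - F + 2P) / S) + 1 to the layer list above; the helper name conv_out is mine for illustration, not from the paper.

```python
def conv_out(size, kernel, stride, pad):
    """Spatial output size of a conv/pool layer: floor((W - F + 2P) / S) + 1."""
    return (size - kernel + 2 * pad) // stride + 1

# Walk the layer list with a 227x227 input.
s = 227
s = conv_out(s, 11, 4, 0)  # CONV1 -> 55
s = conv_out(s, 3, 2, 0)   # POOL1 -> 27
s = conv_out(s, 5, 1, 2)   # CONV2 -> 27
s = conv_out(s, 3, 2, 0)   # POOL2 -> 13
s = conv_out(s, 3, 1, 1)   # CONV3/4/5 each preserve size -> 13
s = conv_out(s, 3, 2, 0)   # POOL3 -> 6
print(s)  # 6, matching the 6*6*256 input to FC6

# With the 224x224 input quoted in the paper, CONV1 does not divide evenly:
print((224 - 11) / 4 + 1)  # 54.25, not an integer
```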
Details (a PyTorch sketch of the network and training setup follows the list):
- first use of ReLU
- used Norm layers (local response normalization)
- heavy data augmentation
- dropout 0.5
- batch size 128
- SGD Momentum 0.9
- Learning rate 1e-2, divided by 10 manually when validation accuracy plateaus
- L2 weight decay 5e-4
- 7 CNN ensemble reduces top-5 error: 18.2% -> 15.4%
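To make the layer list and the training details concrete, here is a minimal single-GPU sketch in PyTorch, assuming the two parallel columns are merged as described above. The LRN hyperparameters follow the paper's n=5, alpha=1e-4, beta=0.75, k=2, and ReduceLROnPlateau stands in for the manual divide-by-10 schedule rather than reproducing it exactly; this is an illustrative sketch, not the original implementation.

```python
import torch
import torch.nn as nn

class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4),               # CONV1
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                    # POOL1
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),  # NORM1
            nn.Conv2d(96, 256, kernel_size=5, stride=1, padding=2),   # CONV2
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                    # POOL2
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),  # NORM2
            nn.Conv2d(256, 384, kernel_size=3, stride=1, padding=1),  # CONV3
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, stride=1, padding=1),  # CONV4
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, stride=1, padding=1),  # CONV5
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                    # POOL3 -> 6*6*256
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(256 * 6 * 6, 4096),                             # FC6
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),                                    # FC7
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),                             # FC8: class scores
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)  # flatten 6*6*256 feature maps for the FC layers
        return self.classifier(x)

model = AlexNet()
out = model(torch.randn(128, 3, 227, 227))  # batch size 128, as in the Details
print(out.shape)  # torch.Size([128, 1000])

# Training setup from the Details list: SGD with momentum 0.9,
# lr 1e-2, L2 weight decay 5e-4.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2,
                            momentum=0.9, weight_decay=5e-4)
# Modern stand-in for the manual divide-by-10 schedule on plateaus.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer,
                                                       mode="max", factor=0.1)
```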