cs231n-notes-Lecture-9: AlexNet/VGGNet/GoogLeNet/ResNet
Lecture 9 CNN Architectures
AlexNet
VGGNet
- Small filters
Why?
- Stack of three 3x3 conv (stride 1) layers has same effective receptive field as one 7x7 conv layer
- But deeper, more non-linearities
- And fewer parameters: 3 * (3^2 * C^2) vs. 7^2 * C^2 for C channels per layer (see the sketch after this list)
- Deeper networks
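A quick back-of-the-envelope check of the parameter counts above, as a minimal Python sketch (the channel count C = 64 is an illustrative choice, not a value from the lecture):

```python
# Parameter count for a stack of three 3x3 convs vs. one 7x7 conv,
# assuming C input and C output channels per layer and ignoring biases.
def conv_params(kernel_size, in_channels, out_channels):
    return kernel_size * kernel_size * in_channels * out_channels

C = 64  # example channel count (hypothetical)
stacked_3x3 = 3 * conv_params(3, C, C)   # 3 * 9 * C^2 = 27 * C^2
single_7x7  = conv_params(7, C, C)       # 49 * C^2

print(stacked_3x3, single_7x7)  # 110592 vs 200704 for C = 64
```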
GoogLeNet
- Deeper networks: 22 layers.
- Efficient computation: the efficient “Inception” module.
- No FC layers.
- Only 5 million parameters, 12x less than AlexNet. Most parameters in AlexNet or VGGNet are in the FC layers.
- Problem: computational complexity of the naive Inception module.
- Solution: use “bottleneck” layers (1x1 convolutions with fewer filters) to reduce feature depth before the expensive convolutions (a rough cost comparison follows).
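A rough sketch of why the 1x1 bottleneck cuts the cost of an expensive branch; the 28x28 spatial size and the 256 -> 64 -> 128 channel counts are illustrative assumptions, not the exact GoogLeNet figures:

```python
# Rough multiply-add count for a 5x5 conv branch on a 28x28x256 input,
# with and without a 1x1 "bottleneck" that first reduces depth to 64.
H = W = 28

def conv_ops(k, c_in, c_out, h, w):
    # each output position does k*k*c_in multiply-adds per output channel
    return h * w * c_out * k * k * c_in

naive      = conv_ops(5, 256, 128, H, W)
bottleneck = conv_ops(1, 256, 64, H, W) + conv_ops(5, 64, 128, H, W)
print(f"naive 5x5: {naive:,} ops, with 1x1 bottleneck: {bottleneck:,} ops")
```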
ResNet
Deeper models may be harder to optimize, even though in theory they should be able to achieve at least the same performance as shallower ones, because the additional layers could simply learn identity mappings.
But that is only the ideal case; in practice plain deep networks fall short. Here is ResNet's solution.
Residual: learn what should be added to or subtracted from the original input x, i.e. the residual F(x), so the output is H(x) = F(x) + x. Intuitively, it is easier to learn the residual than the full direct mapping, since the residual can be a small change or simply zero (identity).
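A minimal PyTorch-style sketch of a residual block under these assumptions (a basic block with two 3x3 convs; not the exact paper code):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Learns a residual F(x); the block output is F(x) + x."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.relu(self.bn1(self.conv1(x)))
        residual = self.bn2(self.conv2(residual))
        # if the conv weights are near zero, the block is close to an identity mapping
        return self.relu(x + residual)

x = torch.randn(1, 64, 56, 56)
print(ResidualBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```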
Training ResNet in practice:
- Batch Normalization after every CONV layer
- Xavier/2 initialization from He et al.
- SGD + Momentum (0.9)
- Learning rate: 0.1, divided by 10 when validation error plateaus
- Mini-batch size 256
- Weight decay of 1e-5
- No dropout used
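A sketch of that recipe in PyTorch; the model below is only a stand-in, and just the optimizer/scheduler settings mirror the list above:

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)  # placeholder for a full ResNet
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-5)
# divide the learning rate by 10 when the validation error plateaus
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.1)

for epoch in range(2):                 # dummy loop; real training code goes here
    val_error = 1.0 / (epoch + 1)      # placeholder for the measured validation error
    scheduler.step(val_error)
```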
Experimental Results
- Able to train very deep networks without degradation (152 layers on ImageNet, 1202 on CIFAR)
- Deeper networks now achieve lower training error, as expected
- Swept 1st place in all ILSVRC and COCO 2015 competitions