cs231n-notes-Lecture-9: AlexNet/VGGNet/GoogLeNet/ResNet
Lecture 9 CNN Architectures
AlexNet
VGGNet
- Small filters
Why?
- Stack of three 3x3 conv (stride 1) layers has same effective receptive field as one 7x7 conv layer
- But deeper, more non-linearities
- And fewer parameters: 3 * (3^2 * C^2) vs. 7^2 * C^2 for C channels per layer (see the sketch after this list)
- Deeper networks
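A quick back-of-the-envelope check of the parameter counts above, as a minimal Python sketch (the channel count C = 64 is an illustrative choice, not a value from the lecture):

```python
# Parameter count for a stack of three 3x3 convs vs. one 7x7 conv,
# assuming C input and C output channels per layer and ignoring biases.
def conv_params(kernel_size, in_channels, out_channels):
    return kernel_size * kernel_size * in_channels * out_channels

C = 64  # example channel count (hypothetical)
stacked_3x3 = 3 * conv_params(3, C, C)   # 3 * 9 * C^2 = 27 * C^2
single_7x7  = conv_params(7, C, C)       # 49 * C^2

print(stacked_3x3, single_7x7)  # 110592 vs 200704 for C = 64
```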
GoogLeNet
- Deeper networks: 22 layers.
- Efficient computation: the efficient “Inception” module.
- No FC layers.
- Only 5 million parameters, 12x less than AlexNet. Most parameters in AlexNet or VGGNet are in the FC layers.
- Problem: computational complexity of the naive Inception module.
- Solution: use “bottleneck” layers (1x1 convolutions with fewer filters) to reduce feature depth before the expensive convolutions (a rough cost comparison follows).
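A rough sketch of why the 1x1 bottleneck cuts the cost of an expensive branch; the 28x28 spatial size and the 256 -> 64 -> 128 channel counts are illustrative assumptions, not the exact GoogLeNet figures:

```python
# Rough multiply-add count for a 5x5 conv branch on a 28x28x256 input,
# with and without a 1x1 "bottleneck" that first reduces depth to 64.
H = W = 28

def conv_ops(k, c_in, c_out, h, w):
    # each output position does k*k*c_in multiply-adds per output channel
    return h * w * c_out * k * k * c_in

naive      = conv_ops(5, 256, 128, H, W)
bottleneck = conv_ops(1, 256, 64, H, W) + conv_ops(5, 64, 128, H, W)
print(f"naive 5x5: {naive:,} ops, with 1x1 bottleneck: {bottleneck:,} ops")
```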
ResNet
Deeper models may be harder to optimize, even though in theory they should be able to achieve at least the same performance as shallower ones, because the additional layers could simply learn identity mappings.
But that is only the ideal case; in practice plain deep networks fall short. Here is ResNet's solution.
Residual: learn what should be added to or subtracted from the original input x, i.e. the residual F(x), so the output is H(x) = F(x) + x. Intuitively, it is easier to learn the residual than the full direct mapping, since the residual can be a small change or simply zero (identity).
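A minimal PyTorch-style sketch of a residual block under these assumptions (a basic block with two 3x3 convs; not the exact paper code):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Learns a residual F(x); the block output is F(x) + x."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.relu(self.bn1(self.conv1(x)))
        residual = self.bn2(self.conv2(residual))
        # if the conv weights are near zero, the block is close to an identity mapping
        return self.relu(x + residual)

x = torch.randn(1, 64, 56, 56)
print(ResidualBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```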
Training ResNet in practice:
- Batch Normalization after every CONV layer
- Xavier/2 initialization from He et al.
- SGD + Momentum (0.9)
- Learning rate: 0.1, divided by 10 when validation error plateaus
- Mini-batch size 256
- Weight decay of 1e-5
- No dropout used
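A sketch of that recipe in PyTorch; the model below is only a stand-in, and just the optimizer/scheduler settings mirror the list above:

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)  # placeholder for a full ResNet
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-5)
# divide the learning rate by 10 when the validation error plateaus
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.1)

for epoch in range(2):                 # dummy loop; real training code goes here
    val_error = 1.0 / (epoch + 1)      # placeholder for the measured validation error
    scheduler.step(val_error)
```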
Experimental Results
- Able to train very deep networks without degradation (152 layers on ImageNet, 1202 on CIFAR)
- Deeper networks now achieve lower training error, as expected
- Swept 1st place in all ILSVRC and COCO 2015 competitions