cs231n-notes-Lecture-9:AlexNet/VGGNet/GoogleNet/ResNet

Lecture 9 CNN Architectures

AlexNet

VGGNet

  • Small filters

    Why?

    • Stack of three 3x3 conv (stride 1) layers has same effective receptive field as one 7x7 conv layer
    • But deeper, more non-linearities
    • And fewer parameters: 3 * (3^2 C^2) vs. 7^2 C^2 weights for C channels per layer (a parameter-count sketch follows this list)
  • Deeper networks
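
To make the parameter comparison concrete, here is a minimal sketch (PyTorch used purely for illustration; the channel count C = 64 is an arbitrary example) that counts the weights of one 7x7 convolution versus a stack of three 3x3 convolutions, both mapping C input channels to C output channels:

```python
import torch.nn as nn

C = 64  # example channel count (VGG layers use 64-512)

# One 7x7 conv layer, C -> C channels
conv7x7 = nn.Conv2d(C, C, kernel_size=7, padding=3, bias=False)

# Three stacked 3x3 conv layers, C -> C channels, with non-linearities in between
stack3x3 = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False), nn.ReLU(inplace=True),
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False), nn.ReLU(inplace=True),
    nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False),
)

def num_params(m):
    return sum(p.numel() for p in m.parameters())

print(num_params(conv7x7))   # 7^2 * C^2     = 200704
print(num_params(stack3x3))  # 3 * 3^2 * C^2 = 110592
```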

GoogleNet

  • Deeper networks: 22 layers.
  • Efficient computation: the efficient “Inception” module.
  • No FC layers.
  • Only 5 million parameters, 12x fewer than AlexNet; most parameters in AlexNet and VGGNet sit in the FC layers.
  • Problem: the naive Inception module is computationally expensive.
  • Solution: use “bottleneck” 1x1 convolutions (with fewer filters) to reduce feature depth before the expensive 3x3 and 5x5 convolutions (see the sketch after this list).
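
Below is a minimal sketch of an Inception-style module with 1x1 bottlenecks, written in PyTorch for illustration. The branch widths in the example are arbitrary values, not the exact GoogLeNet configuration.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Inception-style module: 1x1 "bottleneck" convs shrink the channel depth
    before the expensive 3x3 and 5x5 convs, then all branches are concatenated."""
    def __init__(self, in_ch, out_1x1, red_3x3, out_3x3, red_5x5, out_5x5, out_pool):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, out_1x1, kernel_size=1)
        self.b2 = nn.Sequential(
            nn.Conv2d(in_ch, red_3x3, kernel_size=1), nn.ReLU(inplace=True),  # bottleneck
            nn.Conv2d(red_3x3, out_3x3, kernel_size=3, padding=1),
        )
        self.b3 = nn.Sequential(
            nn.Conv2d(in_ch, red_5x5, kernel_size=1), nn.ReLU(inplace=True),  # bottleneck
            nn.Conv2d(red_5x5, out_5x5, kernel_size=5, padding=2),
        )
        self.b4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, out_pool, kernel_size=1),  # 1x1 conv after pooling
        )

    def forward(self, x):
        # Concatenate the four branch outputs along the channel dimension
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

x = torch.randn(1, 256, 28, 28)                        # example input
y = InceptionModule(256, 128, 96, 192, 16, 48, 64)(x)  # example branch widths
print(y.shape)                                         # torch.Size([1, 432, 28, 28])
```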

ResNet

Deeper models can be harder to optimize, even though in theory they should be able to match a shallower model's performance: the additional layers could simply learn the identity mapping. In practice that ideal is not reached, which motivates the ResNet solution.

Residual learning: instead of learning the desired mapping H(x) directly, each block learns the residual F(x) = H(x) - x, i.e., what should be added to (or subtracted from) the original input x. Intuitively, it is easier to learn this small correction than a full direct mapping; the residual may be only a small change, or simply zero (recovering the identity). A minimal residual block is sketched below.
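
A minimal residual-block sketch (PyTorch for illustration, not the exact ResNet configuration): the stacked conv layers learn F(x), and the block outputs F(x) + x, so learning F(x) = 0 recovers the identity.

```python
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """Residual block: output = F(x) + x, where F is two 3x3 conv layers."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))   # this is F(x), the learned residual
        return F.relu(out + x)            # add the identity shortcut
```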

Training ResNet in practice (a minimal optimizer setup is sketched after this list):

  • Batch Normalization after every CONV layer
  • Xavier/2 initialization from He et al.
  • SGD + Momentum (0.9)
  • Learning rate: 0.1, divided by 10 when validation error plateaus
  • Mini-batch size 256
  • Weight decay of 1e-5
  • No dropout used
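
A minimal PyTorch sketch of this training setup, using torchvision's ResNet-18 as a stand-in model (the actual training/evaluation loop is omitted):

```python
import torch.optim as optim
import torchvision

# Stand-in model: torchvision ResNets already apply BatchNorm after every conv
# and He ("Xavier/2") initialization to the conv layers.
model = torchvision.models.resnet18(num_classes=1000)

optimizer = optim.SGD(model.parameters(),
                      lr=0.1,             # initial learning rate 0.1
                      momentum=0.9,       # SGD + momentum (0.9)
                      weight_decay=1e-5)  # weight decay as listed above

# Divide the learning rate by 10 when the validation error plateaus;
# train with mini-batches of 256 and no dropout.
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1)
# In the training loop, call scheduler.step(val_error) after each validation pass.
```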

Experimental Results

  • Able to train very deep networks without degradation (152 layers on ImageNet, 1202 on CIFAR)
  • Deeper networks now achieve lower training error, as expected
  • Swept 1st place in all ILSVRC and COCO 2015 competitions
