3. Convolutional Neural Network Architectures and Their Evolution -- Deep Learning EECS498/CS231n
AlexNet
Output size:
The number of output channels equals the number of filters: both are 64.
H/W = (H - K + 2P)/S + 1 = (227 - 11 + 2*2)/4 + 1 = 56 (AlexNet conv1: K=11, S=4, P=2)
Memory(KB):
Number of output elements: C * H * W = 64 * 56 * 56 = 200704; bytes per element = 4 (32-bit floating point). KB = 200704 * 4 / 1024 = 784
Parameters(k):
Weight shape = Cout * Cin * K * K = 64 * 3 * 11 * 11
Bias shape = 64
Number of parameters = 64 * 3 * 11 * 11 + 64 = 23296
FLOPs (M) -- important!
Number of floating point operations, counting one multiply + add as a single op (since they can be done in one cycle)
= (number of output elements) x (ops per element)
= (Cout x H x W) x (Cin x K x K) = (64 x 56 x 56) x (3 x 11 x 11) = 72855552 (~72.9 MFLOP)
(The ReLU immediately following conv1 is omitted here.)
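To sanity-check the conv1 numbers above, here is a minimal pure-Python sketch (the helper name conv_stats is my own; the hyperparameters 227x227x3 input, 64 filters of size 11x11, stride 4, pad 2 are AlexNet's conv1):

```python
def conv_stats(h_in, w_in, c_in, c_out, k, stride, pad, bytes_per_elem=4):
    # Output spatial size: (H - K + 2P) / S + 1
    h_out = (h_in - k + 2 * pad) // stride + 1
    w_out = (w_in - k + 2 * pad) // stride + 1
    # Memory of the output activation in KB (4 bytes per fp32 element)
    mem_kb = c_out * h_out * w_out * bytes_per_elem / 1024
    # Parameters: weights (Cout * Cin * K * K) plus one bias per filter
    params = c_out * c_in * k * k + c_out
    # FLOPs: one multiply-add per (output element, filter tap) pair
    flops = (c_out * h_out * w_out) * (c_in * k * k)
    return h_out, w_out, mem_kb, params, flops

# AlexNet conv1: 227x227x3 input, 64 filters of size 11x11, stride 4, pad 2
print(conv_stats(227, 227, 3, 64, 11, 4, 2))
# (56, 56, 784.0, 23296, 72855552)
```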
For pooling:
- It does not change the number of channels.
- FLOPs(M) = (number of output positions) x (FLOPs per output position) = (Cout x H x W) x (K x K) ≈ 0.4 MFLOP. Note that compared to conv, the computational cost of pooling is negligible (see the sketch below).
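A minimal sketch of that pooling estimate (assuming pool1 is a 3x3 max pool with stride 2 applied to the 64x56x56 output of conv1):

```python
# AlexNet pool1: 3x3 max pool, stride 2, on the 64x56x56 output of conv1
c, h_in, w_in, k, stride = 64, 56, 56, 3, 2
h_out = (h_in - k) // stride + 1             # 27
w_out = (w_in - k) // stride + 1             # 27
pool_flops = (c * h_out * w_out) * (k * k)   # roughly one op per window element
print(pool_flops)   # 419904, i.e. ~0.4 MFLOP vs ~72.9 MFLOP for conv1
```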
How was AlexNet designed? Trial and error.
VGG
Design rules for VGG:
- All conv are 3x3 stride 1 pad 1
- All max pool are 2x2 stride 2
- After pool, double #channels
- The network is built from convolutional stages; VGG-16 has 5 stages (see the sketch after this list):
- Stage 1: conv-conv-pool
- Stage 2: conv-conv-pool
- Stage 3: conv-conv-conv-[conv]-pool
- Stage 4: conv-conv-conv-[conv]-pool
- Stage 5: conv-conv-conv-[conv]-pool ([conv] marks the extra conv that VGG-19 adds in stages 3-5)
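A minimal sketch of these stages, assuming PyTorch (the layer counts 2-2-3-3-3 and channel widths 64/128/256/512/512 follow the VGG-16 configuration; the channel doubling stops at 512 in stage 5):

```python
import torch
import torch.nn as nn

def vgg_stage(c_in, c_out, num_convs):
    # A VGG stage: num_convs 3x3/stride-1/pad-1 convs (+ReLU), then a 2x2/stride-2 pool
    layers = []
    for i in range(num_convs):
        layers += [nn.Conv2d(c_in if i == 0 else c_out, c_out,
                             kernel_size=3, stride=1, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

vgg16_features = nn.Sequential(
    vgg_stage(3,   64,  2),   # Stage 1
    vgg_stage(64,  128, 2),   # Stage 2
    vgg_stage(128, 256, 3),   # Stage 3
    vgg_stage(256, 512, 3),   # Stage 4
    vgg_stage(512, 512, 3),   # Stage 5
)

x = torch.randn(1, 3, 224, 224)
print(vgg16_features(x).shape)   # torch.Size([1, 512, 7, 7])
```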
Why 3x3 conv?
Option 1: conv(5x5, C -> C)
Params: 25C^2; FLOPs: 25C^2HW
Option 2: conv(3x3, C -> C) -> conv(3x3, C -> C)
Params: 18C^2; FLOPs: 18C^2HW. Same receptive field, fewer parameters, and less computation. Moreover, with two convs we can insert a ReLU between them, which adds depth and nonlinear computation.
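A quick check of this comparison in plain Python (C, H, W below are example values I picked, not from the slides; biases are ignored):

```python
C, H, W = 256, 56, 56   # example values

# Option 1: one 5x5 conv, C -> C
params_5x5 = 25 * C * C
flops_5x5  = 25 * C * C * H * W

# Option 2: two stacked 3x3 convs, C -> C -> C (same 5x5 receptive field)
params_3x3 = 2 * 9 * C * C
flops_3x3  = 2 * 9 * C * C * H * W

print(params_5x5, params_3x3)   # 1638400 1179648
print(flops_5x5,  flops_3x3)    # 5138022400 3699376128
```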
Why double the channels after pooling?
After a 2x2 pool the spatial size is halved, so a conv with an unchanged channel count would take 4x fewer FLOPs. Doubling the channel count on the half-sized input keeps the FLOPs the same, so the conv layers at each spatial resolution take the same amount of computation (see the sketch below).
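A minimal sketch of that arithmetic (C, H, W are example values I picked):

```python
C, H, W = 64, 224, 224   # example values

# Conv at stage i: C channels at full resolution
flops_i     = (C * H * W) * (C * 3 * 3)
params_i    = 9 * C * C
# Conv at stage i+1: spatial size halved by the pool, channels doubled
flops_next  = (2 * C * (H // 2) * (W // 2)) * (2 * C * 3 * 3)
params_next = 9 * (2 * C) * (2 * C)

print(flops_i == flops_next)    # True: same FLOPs at every spatial resolution
print(params_next // params_i)  # 4: parameters grow, but compute stays balanced
```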
GoogLeNet: Inception Module
A local unit with parallel branches that is repeated many times throughout the network.
Use 1x1 bottleneck layers to reduce the channel dimension before the expensive conv.
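A minimal sketch of why the 1x1 bottleneck saves computation (the 256 -> 64 channel reduction and the 28x28 spatial size are example values, not GoogLeNet's exact ones):

```python
C_in, C_mid, H, W = 256, 64, 28, 28   # example sizes

# Direct 3x3 conv, 256 -> 256 channels
flops_direct = (C_in * H * W) * (C_in * 3 * 3)

# Bottleneck: 1x1 conv (256 -> 64), then 3x3 conv (64 -> 256)
flops_bottleneck = ((C_mid * H * W) * (C_in * 1 * 1)     # cheap 1x1 reduce
                    + (C_in * H * W) * (C_mid * 3 * 3))  # 3x3 sees fewer input channels

print(flops_direct, flops_bottleneck)   # 462422016 128450560, ~3.6x fewer FLOPs
```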
ResNet
What happens when we go deeper?
This is an optimization problem: deeper models are harder to optimize; in particular, they do not learn the identity functions that would let them emulate shallower models.
-> Change the network so that learning identity functions with the extra layers is easy.
A residual block can easily learn the identity function: if the weights of its two conv layers are set to zero, the block computes the identity, which makes it easy for a deep network to emulate a shallower one. It also improves the gradient flow of deep networks, because the add gate makes a copy of the gradient and passes it back through the shortcut.
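A minimal sketch of a basic residual block, assuming PyTorch (the class name and the conv-BN-ReLU ordering are my own illustrative choices; ResNet has several variants):

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    # Same-channel, stride-1 residual block: out = ReLU(F(x) + x)
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1   = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2   = nn.BatchNorm2d(channels)
        self.relu  = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # If the conv weights (and BN shift) are zero, out is zero and the block
        # is the identity; the addition also passes the gradient straight
        # through the shortcut.
        return self.relu(out + x)

x = torch.randn(1, 64, 56, 56)
print(BasicResidualBlock(64)(x).shape)   # torch.Size([1, 64, 56, 56])
```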
Learn from VGG: stages, 3x3 conv
Learn from GoogLeNet: an aggressive stem to downsample the input before applying residual blocks, and global average pooling to avoid expensive FC layers.
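A minimal sketch, assuming PyTorch, of these two borrowed ideas (the 7x7/stride-2 conv + 3x3/stride-2 max pool stem and the 512-d global-average-pool head follow the common ResNet layout, but treat the exact numbers as illustrative):

```python
import torch
import torch.nn as nn

# Aggressive stem: downsample 224 -> 56 before any residual block
stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),  # 224 -> 112
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),                  # 112 -> 56
)

# Head: global average pool + one small FC instead of VGG-style 4096-d FC layers
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),   # C x 7 x 7 -> C x 1 x 1
    nn.Flatten(),
    nn.Linear(512, 1000),      # 512 = channel count of the last residual stage
)

print(stem(torch.randn(1, 3, 224, 224)).shape)   # torch.Size([1, 64, 56, 56])
print(head(torch.randn(1, 512, 7, 7)).shape)     # torch.Size([1, 1000])
```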