GAN, DCGAN, cGAN Paper Reading
GAN: Generative Adversarial Nets
1. traditional GAN 2014
1.1 Paper:
https://arxiv.org/pdf/1406.2661
1.2 Understanding:
- E.g. mapping noise to a realistic image.
- Generator: G(z;θg)
- Input: noise z;
- Output: generated image G(z).
- Discriminator: D(x; θd)
- Input: generated image G(z) with label 0, and real image x with label 1
- Output: predicted probability
- Loss:
- log(D(x)) + log(1 - D(G(z)))
- D maximizes both terms (classify real as 1, fake as 0); G minimizes the second term, log(1 - D(G(z))).
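The loss above can be sketched numerically (an illustrative NumPy sketch, not code from the paper): given the discriminator's outputs on a real and a generated sample, compute each player's objective.

```python
import numpy as np

def d_loss(d_real, d_fake):
    """D maximizes log D(x) + log(1 - D(G(z))),
    i.e. minimizes the negative of it."""
    return -(np.log(d_real) + np.log(1.0 - d_fake))

def g_loss(d_fake):
    """G minimizes log(1 - D(G(z))), i.e. wants D(G(z)) -> 1."""
    return np.log(1.0 - d_fake)

# A confident, correct D (D(x)=0.9, D(G(z))=0.1) gets a small loss;
# an undecided D (both 0.5) gets a larger one.
print(d_loss(0.9, 0.1), d_loss(0.5, 0.5))
# G's loss drops as it fools D (D(G(z)) closer to 1).
print(g_loss(0.1), g_loss(0.99))
```

This makes the adversarial structure concrete: improving one player's objective worsens the other's.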
1.3 Pic:
1.4 Network:
Simple dense layers
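A minimal sketch of the dense-layer G and D for MNIST (hidden sizes of 256 are hypothetical; the paper only prescribes simple fully connected nets, and the weights here are random, untrained):

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b, act):
    """One fully connected layer with an activation."""
    return act(x @ w + b)

relu = lambda v: np.maximum(v, 0.0)
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

# Generator: noise z (100,) -> hidden -> 784 pixels (28*28 image).
Wg1, bg1 = rng.normal(0, 0.02, (100, 256)), np.zeros(256)
Wg2, bg2 = rng.normal(0, 0.02, (256, 784)), np.zeros(784)
z = rng.normal(size=(1, 100))
fake = dense(dense(z, Wg1, bg1, relu), Wg2, bg2, np.tanh)

# Discriminator: image (784,) -> hidden -> probability of "real".
Wd1, bd1 = rng.normal(0, 0.02, (784, 256)), np.zeros(256)
Wd2, bd2 = rng.normal(0, 0.02, (256, 1)), np.zeros(1)
p = dense(dense(fake, Wd1, bd1, relu), Wd2, bd2, sigmoid)

print(fake.shape, p.shape)  # (1, 784) (1, 1)
```

The sigmoid output of D is the predicted probability fed into the loss above.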
2. Conditional GAN 2014
2.1 Paper: (unpaired cGAN)
https://arxiv.org/pdf/1411.1784
2.2 Key improvement:
Add a condition y to both G and D; y can be image, text, sound info, etc.
2.3 Type: (paired means the input data are dependent, paired samples)
- Un-paired cGAN
- Paired cGAN
2.4 Understanding:
- E.g. MNIST with condition y (class label, 0-9) ---- unpaired cGAN
- Generator: G(z, y; θg)
- Input: noise z (100 units), one-hot label y (10 units)
- z and y map to separate ReLU hidden layers of 200 and 1000 units respectively
- Concatenate these two layers into 1200 units as the input of the following layer
- Output: 784 units (MNIST images are 28*28)
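The conditional generator's forward pass can be sketched in NumPy (random, untrained weights; layer sizes follow the notes: z -> 200, y -> 1000, concat -> 1200 -> 784):

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda v: np.maximum(v, 0.0)
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

z = rng.normal(size=(1, 100))           # noise
y = np.zeros((1, 10)); y[0, 3] = 1.0    # one-hot label "3"

h_z = relu(z @ rng.normal(0, 0.02, (100, 200)))    # noise branch: 200 units
h_y = relu(y @ rng.normal(0, 0.02, (10, 1000)))    # label branch: 1000 units
h = np.concatenate([h_z, h_y], axis=1)             # joint layer: 1200 units
img = sigmoid(h @ rng.normal(0, 0.02, (1200, 784)))  # 28*28 = 784 pixels

print(h.shape, img.shape)  # (1, 1200) (1, 784)
```

Changing the one-hot index selects which digit class the (trained) generator would produce.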
- Discriminator: D(x, y; θd)
- Input: (real image x, label y) with label 1; (generated image G(z, y), label y) with label 0
- x and y map to separate maxout layers: 240 units with 5 pieces, and 50 units with 5 pieces
- Both are concatenated and mapped to a joint maxout layer of 240 units with 4 pieces, which feeds a sigmoid output
- Output: predicted probability
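A maxout unit takes the maximum over several ("pieces") linear projections instead of applying a fixed nonlinearity. The discriminator above can be sketched with random, untrained weights (sizes follow the notes):

```python
import numpy as np

def maxout(x, W, b):
    """Maxout layer: W has shape (in, units, pieces);
    the output is the max over the pieces axis."""
    return np.max(np.einsum('bi,iup->bup', x, W) + b, axis=-1)

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 784))         # flattened image
y = np.zeros((1, 10)); y[0, 7] = 1.0  # one-hot label

hx = maxout(x, rng.normal(0, 0.02, (784, 240, 5)), np.zeros((240, 5)))  # 240 units, 5 pieces
hy = maxout(y, rng.normal(0, 0.02, (10, 50, 5)), np.zeros((50, 5)))     # 50 units, 5 pieces
h = maxout(np.concatenate([hx, hy], axis=1),
           rng.normal(0, 0.02, (290, 240, 4)), np.zeros((240, 4)))      # joint: 240 units, 4 pieces
p = 1.0 / (1.0 + np.exp(-(h @ rng.normal(0, 0.02, (240, 1)))))          # sigmoid probability

print(hx.shape, hy.shape, p.shape)  # (1, 240) (1, 50) (1, 1)
```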
- Loss: log(D(x, y)) + log(1 - D(G(z, y), y))
- As in the original GAN: D maximizes both terms; G minimizes the second term.
2.5 Pic:
2.6 Network:
Simple dense layers
2.7 There are some papers about paired cGAN:
----This part I'll implement later.....
----To be continued!!!!
3. DCGAN: Deep Convolutional GAN 2015
3.1 Paper:
https://arxiv.org/pdf/1511.06434
3.2 Key improvement:
Combine CNNs with the traditional GAN: the layers of G and D are convolutional layers instead of dense layers.
3.3 Key ideas:
- Don't use max or average pooling; downsample with strided convolutions instead.
- Don't use fully connected hidden layers; the generator's input noise is projected and reshaped into a stack of feature maps.
- Use batch normalization after convolutional layers, before the activation.
3.4 Architecture pic
3.5 Training details:
- Pixels are scaled to [-1, 1], matching the range of the tanh activation.
- SGD with mini-batch size 128.
- Weight initialization: normal distribution with mean 0, standard deviation 0.02.
- LeakyReLU with leak slope 0.2.
- Adam optimizer with learning rate 0.0002 and momentum beta_1 = 0.5.
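Two of these details can be sketched directly in NumPy (stand-ins for the framework calls, not the paper's code): the weight initializer and the LeakyReLU activation.

```python
import numpy as np

def init_weights(shape, rng):
    """DCGAN weight init: normal with mean 0, std 0.02."""
    return rng.normal(loc=0.0, scale=0.02, size=shape)

def leaky_relu(x, slope=0.2):
    """LeakyReLU: identity for x > 0, slope * x otherwise."""
    return np.where(x > 0, x, slope * x)

rng = np.random.default_rng(0)
W = init_weights((1024, 512), rng)
print(round(float(W.std()), 3))           # ~0.02
print(leaky_relu(np.array([-1.0, 2.0])))  # elementwise: [-0.2, 2.0]
```

The small nonzero slope keeps gradients flowing through negative pre-activations, which the paper found helped the discriminator in particular.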