GAN, DCGAN, cGAN Paper Reading

GAN: Generative Adversarial Nets

1. Traditional GAN 2014

1.1 Paper: 

https://arxiv.org/pdf/1406.2661

1.2 Understanding:

  1. E.g. generating a realistic image from pure noise.
  2. Generator: G(z; θg)
    1. Input: noise z;
    2. Output: generated image G(z).
  3. Discriminator: D(x; θd)
    1. Input: generated images G(z) with label 0, and real images x with label 1
    2. Output: predicted probability that the input is real
  4. Loss: V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]
    1. D maximizes both terms (push D(x) toward 1 and D(G(z)) toward 0); G minimizes the second term. In practice the paper suggests training G to maximize log D(G(z)) instead, which gives stronger gradients early on. (See the sketch after this list.)
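
A minimal PyTorch sketch of one training step under this objective (my own illustration, not the paper's code; it assumes D ends in a sigmoid, so binary cross-entropy recovers the log terms):

```python
import torch
import torch.nn as nn

# One GAN training step (illustrative sketch, not the paper's code).
# G: noise z -> fake image; D: image -> probability of being real (sigmoid output).
bce = nn.BCELoss()

def train_step(G, D, opt_G, opt_D, real, z):
    ones = torch.ones(real.size(0), 1)    # targets for real images
    zeros = torch.zeros(real.size(0), 1)  # targets for generated images

    # Discriminator: maximize log D(x) + log(1 - D(G(z)))
    # (equivalently, minimize the BCE of both terms)
    opt_D.zero_grad()
    loss_D = bce(D(real), ones) + bce(D(G(z).detach()), zeros)
    loss_D.backward()
    opt_D.step()

    # Generator: maximize log D(G(z)) -- the non-saturating variant from the paper
    opt_G.zero_grad()
    loss_G = bce(D(G(z)), ones)
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()
```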

1.3 Pic: 

(figure omitted)

1.4 Network:

Simple dense (fully connected) layers.
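
A sketch of what such dense networks might look like for 28×28 images (layer sizes are my own choices, not taken from the paper; the paper's MNIST discriminator actually used maxout units):

```python
import torch.nn as nn

# Hypothetical dense generator and discriminator for 28x28 images.
G = nn.Sequential(
    nn.Linear(100, 256), nn.ReLU(),
    nn.Linear(256, 784), nn.Tanh(),   # 784 = 28 * 28 pixels, scaled to [-1, 1]
)
D = nn.Sequential(
    nn.Linear(784, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),  # probability that the input is real
)
```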

2. Conditional GAN 2014

2.1 Paper: (unpaired cGAN)

https://arxiv.org/pdf/1411.1784

2.2 Key improvement:

Add a condition y to both the generator and the discriminator; y can be a class label, an image, text, or sound.
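
Mechanically, conditioning just means concatenating y to the inputs of both networks. A minimal sketch (batch size and shapes are illustrative):

```python
import torch

z = torch.randn(64, 100)                      # a batch of noise vectors
y = torch.eye(10)[torch.randint(10, (64,))]   # hypothetical one-hot class labels
g_in = torch.cat([z, y], dim=1)               # G sees (z, y); D gets (x, y) the same way
```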

2.3 Types (paired vs. un-paired refers to whether the input data come as aligned pairs):

  1. Un-paired cGAN
  2. Paired cGAN

2.4 Understanding:

  1. E.g. MNIST conditioned on y (a class label from 0-9) ---- an unpaired cGAN
  2. Generator: G(z, y; θg)
    1. Input: noise z (100 dims) and one-hot label y (10 dims)
      1. Each is mapped to its own ReLU hidden layer: z to 200 units, y to 1000 units
      2. The two layers are concatenated into 1200 units, which feed the following layers
    2. Output: 784 units (an MNIST image is 28×28)
  3. Discriminator: D(x, y; θd)
    1. Input: (real image x, label y) with target 1, and (generated image G(z, y), label y) with target 0
      1. x and y are mapped to separate maxout layers: 240 units / 5 pieces for x, 50 units / 5 pieces for y
      2. Both are concatenated and fed into a joint maxout layer (240 units, 4 pieces), which feeds a sigmoid output
    2. Output: predicted probability that the (image, label) pair is real
  4. Loss: log D(x, y) + log(1 - D(G(z, y), y))
    1. D maximizes both terms; G minimizes the second term, exactly as in the original GAN, just conditioned on y. (See the sketch after this list.)
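
A PyTorch sketch of these layer sizes (the Maxout module is my own minimal implementation; the unit and piece counts follow the paper's MNIST setup):

```python
import torch
import torch.nn as nn

class Maxout(nn.Module):
    """Maxout layer: k linear pieces, output is their element-wise max."""
    def __init__(self, in_dim, out_dim, pieces):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim * pieces)
        self.out_dim, self.pieces = out_dim, pieces

    def forward(self, x):
        return self.fc(x).view(-1, self.out_dim, self.pieces).max(dim=2).values

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.z_fc = nn.Sequential(nn.Linear(100, 200), nn.ReLU())  # noise branch
        self.y_fc = nn.Sequential(nn.Linear(10, 1000), nn.ReLU())  # label branch
        self.out = nn.Sequential(nn.Linear(1200, 784), nn.Sigmoid())

    def forward(self, z, y):
        h = torch.cat([self.z_fc(z), self.y_fc(y)], dim=1)  # 200 + 1000 = 1200 units
        return self.out(h)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.x_maxout = Maxout(784, 240, 5)  # image branch: 240 units, 5 pieces
        self.y_maxout = Maxout(10, 50, 5)    # label branch: 50 units, 5 pieces
        self.joint = Maxout(290, 240, 4)     # joint layer: 240 units, 4 pieces
        self.out = nn.Sequential(nn.Linear(240, 1), nn.Sigmoid())

    def forward(self, x, y):
        h = torch.cat([self.x_maxout(x), self.y_maxout(y)], dim=1)  # 240 + 50 = 290
        return self.out(self.joint(h))
```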

2.5 Pic:

(figure omitted)

2.6 Network:

Simple dense (fully connected) layers, with maxout units in the discriminator.

2.7 There are some papers about paired cGANs:

---- I'll implement this part later.

---- To be continued!

3. DCGAN: Deep Convolutional GAN 2015

3.1 Paper: 

https://arxiv.org/abs/1511.06434

3.2 Key improvement:

Combines a CNN with the traditional GAN: the layers of both the generator and the discriminator are convolutional instead of dense.

3.3 Key ideas:

  1. Don't use max pooling or average pooling; downsample by increasing the stride of the convolutions (and upsample in the generator with fractionally-strided, i.e. transposed, convolutions), as in the sketch below.
  2. Don't use fully connected hidden layers; the generator projects the noise vector and reshapes it into a small spatial feature map with many channels.
  3. Use batch normalization between each convolution and its activation, in both networks (except at the generator's output and the discriminator's input).

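A PyTorch sketch of these three rules for 32×32 images (channel counts are my own, smaller than the paper's 1024-512-256-128; the stride and batch-norm pattern follows the paper):

```python
import torch.nn as nn

# DCGAN-style generator: project-and-reshape, then fractionally-strided convs.
# Input is the noise vector viewed as a (N, 100, 1, 1) tensor.
G = nn.Sequential(
    nn.ConvTranspose2d(100, 256, 4, stride=1, padding=0, bias=False),  # 1x1 -> 4x4
    nn.BatchNorm2d(256), nn.ReLU(),
    nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1, bias=False),  # 4x4 -> 8x8
    nn.BatchNorm2d(128), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1, bias=False),   # 8x8 -> 16x16
    nn.BatchNorm2d(64), nn.ReLU(),
    nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),                 # 16x16 -> 32x32
    nn.Tanh(),  # output pixels in [-1, 1], no batch norm on the output layer
)

# DCGAN-style discriminator: strided convs instead of pooling; no BN on the input layer.
D = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),       # 32x32 -> 16x16
    nn.Conv2d(64, 128, 4, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(128), nn.LeakyReLU(0.2),                            # 16x16 -> 8x8
    nn.Conv2d(128, 256, 4, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(256), nn.LeakyReLU(0.2),                            # 8x8 -> 4x4
    nn.Conv2d(256, 1, 4, stride=1, padding=0), nn.Sigmoid(),           # 4x4 -> 1x1 score
)
```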

3.4 Architecture pic

(figure omitted)

3.5 Training details:

  1. Pixel values are scaled to [-1, 1], matching the range of the generator's tanh output activation.
  2. Mini-batch SGD with a batch size of 128.
  3. Weights initialized from a zero-mean Gaussian with standard deviation 0.02.
  4. LeakyReLU with a leak slope of 0.2 (in the discriminator).
  5. Adam optimizer with learning rate 0.0002 and momentum term beta_1 = 0.5 (wired up in the sketch below).
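
Wiring these hyperparameters up, assuming G and D are the networks from the sketch above (the batch-norm init values follow common DCGAN practice; the paper only states the zero-mean, std-0.02 rule):

```python
import torch
import torch.nn as nn

def weights_init(m):
    # Conv weights from N(0, 0.02), per the paper.
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.normal_(m.weight, 0.0, 0.02)
    # BN scale from N(1, 0.02), bias 0 -- common DCGAN convention, an assumption here.
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.normal_(m.weight, 1.0, 0.02)
        nn.init.zeros_(m.bias)

G.apply(weights_init)
D.apply(weights_init)

# Adam with lr = 0.0002 and beta_1 = 0.5; batches of 128 come from the data loader.
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
```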