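Analysis of DCGAN Code and Experimental Results

I. Introduction

This post analyzes the TensorFlow implementation of DCGAN. GitHub: https://github.com/carpedm20/DCGAN-tensorflow

The code implements two networks with different architectures: one for labeled datasets and one for unlabeled datasets. Each is described below.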

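II. Network for labeled datasets

To make the explanation concrete, the variables are replaced with specific values.

For the labeled case, MNIST is used as the example. batch_size is 64, MNIST images are 28×28, the number of image channels c_dim is 1, the number of classes y_dim is 10, and the dimension of the random input, z_dim, is 100.

Generator:

1. Obtain the input z

The input samples z (shape=[64,100]), drawn from a uniform distribution, are concatenated with the one-hot labels y (shape=[64,10]). The result is the Generator input z (shape=[64,110]).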
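A minimal pure-Python sketch of building this conditional input. It assumes z is drawn from U(-1, 1) and that labels are integers 0 to 9; the helper names one_hot and generator_input are hypothetical and not part of the repo.

```python
import random

def one_hot(label, num_classes=10):
    """Return a one-hot list of length num_classes."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

def generator_input(labels, z_dim=100, num_classes=10, seed=0):
    """Concatenate uniform noise z with one-hot labels y, row by row.

    Returns batch_size rows, each of length z_dim + num_classes.
    """
    rng = random.Random(seed)
    batch = []
    for label in labels:
        z = [rng.uniform(-1.0, 1.0) for _ in range(z_dim)]  # assumed range [-1, 1]
        batch.append(z + one_hot(label, num_classes))        # concat along the last axis
    return batch

labels = [i % 10 for i in range(64)]  # batch_size = 64
inp = generator_input(labels)
print(len(inp), len(inp[0]))  # 64 110
```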

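2. Obtain h0, the output of the first nonlinear layer

A linear transformation maps z to gfc_dim=1024 dimensions. Batch normalization followed by a ReLU gives h0 (shape=[64,1024]). h0 is then concatenated with y, and the result is the input of the next layer, h0 (shape=[64,1034]).

3. Obtain h1, the output of the second nonlinear layer

A linear transformation maps h0 to 64×2×7×7=6272 dimensions. Batch normalization followed by a ReLU gives h1 (shape=[64,6272]). h1 is reshaped to shape=[64,7,7,128] and concatenated with yb, giving the input of the next layer, h1 (shape=[64,7,7,138]). Here yb is y reshaped so that the last axis is the label axis (shape=[64,1,1,10]); it is broadcast over the 7×7 spatial grid before being concatenated along the channel axis.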
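The repo's ops.py has a helper, conv_cond_concat, that performs this step by multiplying yb by a tensor of ones so it is broadcast over the feature map. The sketch below only mimics that behaviour on nested Python lists laid out as [batch][height][width][channels]; it is an illustration, not the TensorFlow code.

```python
def conv_cond_concat(x, yb):
    """Tile yb ([batch][1][1][y_dim]) over the spatial grid of x ([batch][h][w][c])
    and concatenate it onto the channel axis, giving [batch][h][w][c + y_dim]."""
    out = []
    for xb, yv in zip(x, yb):
        label_vec = yv[0][0]  # the y_dim label values for this sample
        # every pixel gets the same label vector appended to its channels
        out.append([[pixel + label_vec for pixel in row] for row in xb])
    return out

# Toy example: batch 1, a 2x2 feature map with 3 channels, y_dim = 2
x = [[[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
      [[7.0, 8.0, 9.0], [0.0, 1.0, 2.0]]]]
yb = [[[[0.0, 1.0]]]]
res = conv_cond_concat(x, yb)
print(res[0][1][0])  # [7.0, 8.0, 9.0, 0.0, 1.0]
```

With the real shapes, x is [64,7,7,128] and yb is [64,1,1,10], so each of the 7×7 positions gains the same 10 label channels, giving 138.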

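4. Obtain h2, the output of the third nonlinear layer

A deconv2d operation with 128 kernels maps h1 to 64×14×14×128. Batch normalization followed by a ReLU gives h2 (shape=[64,14,14,128]). h2 is concatenated with yb, giving the input of the next layer, h2 (shape=[64,14,14,138]).

5. Obtain the final generated image generated_image

A deconv2d operation with 1 (c_dim) kernel maps h2 to 64×28×28×1. No batch normalization is applied; a Sigmoid gives generated_image (shape=[64,28,28,1]).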
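The shape bookkeeping of generator steps 1 to 5 can be checked with a few lines of plain Python. No tensors are computed; the parameter names mirror those used above, gf_dim = 64 is assumed (consistent with 64×2 = 128 feature maps in step 3), and each deconv2d is assumed to double height and width.

```python
def labeled_generator_shapes(batch=64, z_dim=100, y_dim=10, gf_dim=64,
                             gfc_dim=1024, size=28, c_dim=1):
    """Trace tensor shapes through the conditional (MNIST) generator."""
    s2, s4 = size // 2, size // 4  # 14, 7
    return {
        "z": [batch, z_dim + y_dim],                      # z concat y
        "h0": [batch, gfc_dim + y_dim],                   # linear -> BN -> ReLU, concat y
        "h1_flat": [batch, gf_dim * 2 * s4 * s4],         # linear -> BN -> ReLU
        "h1": [batch, s4, s4, gf_dim * 2 + y_dim],        # reshape, concat yb
        "h2": [batch, s2, s2, gf_dim * 2 + y_dim],        # deconv2d -> BN -> ReLU, concat yb
        "image": [batch, size, size, c_dim],              # deconv2d -> sigmoid
    }

for name, shape in labeled_generator_shapes().items():
    print(name, shape)
```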

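Discriminator:

1. Obtain the input x

The real or generated image, image (shape=[64,28,28,1]), is concatenated with yb (shape=[64,1,1,10]), giving the Discriminator input x (shape=[64,28,28,11]).

2. Obtain h0, the output of the first nonlinear layer

x is convolved with c_dim+y_dim=1+10=11 kernels of size 5×5×11 (11 is the last dimension of the input) with stride 2, which halves 28×28 to 14×14. A LeakyReLU then gives h0 (shape=[64,14,14,11]). h0 is concatenated with the label yb, giving the input of the next layer, h0 (shape=[64,14,14,21]).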
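Why 28×28 becomes 14×14: assuming the defaults of the repo's conv2d wrapper (5×5 kernel, stride 2, 'SAME' padding), the output size is ceil(input / stride). The repo's model.py has a helper named conv_out_size_same for this; the sketch below re-implements the same formula, and downsample_chain is a hypothetical helper.

```python
import math

def conv_out_size_same(size, stride=2):
    """Output size of a 'SAME'-padded convolution: ceil(size / stride)."""
    return int(math.ceil(float(size) / float(stride)))

def downsample_chain(size, steps, stride=2):
    """Spatial sizes after repeatedly applying a stride-2 'SAME' convolution."""
    sizes = [size]
    for _ in range(steps):
        sizes.append(conv_out_size_same(sizes[-1], stride))
    return sizes

print(downsample_chain(28, 2))  # [28, 14, 7]
print(downsample_chain(48, 4))  # [48, 24, 12, 6, 3]
print(downsample_chain(30, 2))  # [30, 15, 8]  (odd sizes round up)
```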

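3. Obtain h1, the output of the second nonlinear layer

h0 is convolved with df_dim+y_dim=64+10=74 kernels of size 5×5×21 (21 is the last dimension of the input). Batch normalization followed by a LeakyReLU gives h1 (shape=[64,7,7,74]). h1 is reshaped to shape=[64,7*7*74] and concatenated with the label y (shape=[64,10]), giving the input of the next layer, h1 (shape=[64,7*7*74+10]=[64,3636]).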
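A shape check for the labeled discriminator makes the 3636 explicit: the flattened h1 holds 7×7×74 = 3626 values, plus 10 label entries. This is a plain-Python sketch using df_dim = 64 and dfc_dim = 1024 as stated above; the function name is hypothetical.

```python
def labeled_discriminator_shapes(batch=64, size=28, c_dim=1, y_dim=10,
                                 df_dim=64, dfc_dim=1024):
    """Trace tensor shapes through the conditional (MNIST) discriminator."""
    s2 = -(-size // 2)        # ceil(28 / 2) = 14
    s4 = -(-s2 // 2)          # ceil(14 / 2) = 7
    h0_c = c_dim + y_dim      # 11 kernels in the first conv
    h1_c = df_dim + y_dim     # 74 kernels in the second conv
    return {
        "x": [batch, size, size, c_dim + y_dim],   # image concat yb
        "h0": [batch, s2, s2, h0_c + y_dim],       # conv -> LReLU, concat yb
        "h1": [batch, s4 * s4 * h1_c + y_dim],     # conv -> BN -> LReLU, flatten, concat y
        "h2": [batch, dfc_dim + y_dim],            # linear -> BN -> LReLU, concat y
        "h3": [batch, 1],                          # linear (logits)
    }

for name, shape in labeled_discriminator_shapes().items():
    print(name, shape)
```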

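4. Obtain h2, the output of the third nonlinear layer

A linear transformation maps h1 to dfc_dim=1024 dimensions. Batch normalization followed by a LeakyReLU gives h2 (shape=[64,1024]). h2 is concatenated with y, giving the input of the next layer, h2 (shape=[64,1034]).

5. Obtain the final output

A linear transformation of h2 gives h3 with shape=[64,1]. The network returns both sigmoid(h3) and h3.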

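III. Network for unlabeled datasets

To make the explanation concrete, the variables are replaced with specific values.

Assume that for the unlabeled dataset batch_size is 64, images are 48×48, the number of image channels c_dim is 3, and the dimension of the random input, z_dim, is 100.

Generator:

1. Obtain the input z

The input samples z (shape=[64,100]), drawn from a uniform distribution, are used directly as the Generator input (shape=[64,100]).

2. Obtain h0, the output of the first nonlinear layer

Since z has only 100 values, a linear transformation first maps it to 3×3×(gf_dim×8) = 3×3×512 = 4608 dimensions (gf_dim=64), which are then reshaped to shape=[64,3,3,512]. Batch normalization followed by a ReLU gives h0 (shape=[64,3,3,512]), the input of the next layer.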
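The size of that first linear layer follows from the target image size: halving 48 four times (once per upsampling step, rounding up) gives 3. A sketch of the arithmetic, assuming gf_dim = 64 and four stride-2 upsampling steps as in this walkthrough; projection_size is a hypothetical name.

```python
import math

def projection_size(output_size=48, gf_dim=64, num_upsamples=4):
    """Return (s16, linear_out_dim) for the first generator layer.

    s16 is output_size halved num_upsamples times (rounding up). The linear
    layer must output s16 * s16 * gf_dim * 8 values so they can be reshaped
    to [batch, s16, s16, gf_dim * 8].
    """
    s = output_size
    for _ in range(num_upsamples):
        s = int(math.ceil(s / 2))
    return s, s * s * gf_dim * 8

s16, dim = projection_size()
print(s16, dim)  # 3 4608
```

For 64×64 output, as in the DCGAN paper, the same arithmetic gives a 4×4 starting grid.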

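3. Obtain h1, the output of the second nonlinear layer

A deconv2d operation with 256 kernels maps h0 to 64×6×6×256. Batch normalization followed by a ReLU gives h1 (shape=[64,6,6,256]), the input of the next layer.

4. Obtain h2, the output of the third nonlinear layer

A deconv2d operation with 128 kernels maps h1 to 64×12×12×128. Batch normalization followed by a ReLU gives h2 (shape=[64,12,12,128]), the input of the next layer.

5. Obtain h3, the output of the fourth nonlinear layer

A deconv2d operation with 64 kernels maps h2 to 64×24×24×64. Batch normalization followed by a ReLU gives h3 (shape=[64,24,24,64]), the input of the next layer.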

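6. Obtain h4, the output of the fifth layer

A deconv2d operation with 3 (c_dim) kernels maps h3 to 64×48×48×3, giving h4 (shape=[64,48,48,3]). As with the output layer of the labeled network, no batch normalization or ReLU is applied here; h4 goes straight to the next step.

7. Obtain the final generated image generated_image

Applying tanh to h4 gives generated_image (shape=[64,48,48,3]).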
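The unlabeled generator can be traced the same way. The sketch also includes a hypothetical tanh_to_pixel helper: tanh outputs lie in [-1, 1] (unlike the sigmoid output of the labeled network, which lies in [0, 1]), so they need rescaling to [0, 1] before being viewed as images. Names and defaults are assumptions matching the values above.

```python
import math

def unlabeled_generator_shapes(batch=64, z_dim=100, size=48, gf_dim=64, c_dim=3):
    """Trace shapes through the unconditional generator (4 stride-2 deconvs)."""
    sizes = [size]
    for _ in range(4):  # 48 -> 24 -> 12 -> 6 -> 3
        sizes.append(int(math.ceil(sizes[-1] / 2)))
    s, s2, s4, s8, s16 = sizes
    return [
        ("z", [batch, z_dim]),
        ("h0", [batch, s16, s16, gf_dim * 8]),  # linear -> reshape -> BN -> ReLU
        ("h1", [batch, s8, s8, gf_dim * 4]),    # deconv2d -> BN -> ReLU
        ("h2", [batch, s4, s4, gf_dim * 2]),    # deconv2d -> BN -> ReLU
        ("h3", [batch, s2, s2, gf_dim]),        # deconv2d -> BN -> ReLU
        ("h4", [batch, s, s, c_dim]),           # deconv2d only, then tanh
    ]

def tanh_to_pixel(h):
    """Apply tanh to a pre-activation value and rescale from [-1, 1] to [0, 1]."""
    return (math.tanh(h) + 1.0) / 2.0

for name, shape in unlabeled_generator_shapes():
    print(name, shape)
```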

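Discriminator:

1. Obtain the input x

The real or generated image, image (shape=[64,48,48,3]), is used directly as the Discriminator input x.

2. Obtain h0, the output of the first nonlinear layer

x is convolved with df_dim=64 kernels of size 5×5×3 (3 is the last dimension of the input), followed by a LeakyReLU (this first layer has no batch normalization), giving h0 (shape=[64,24,24,64]), the input of the next layer.

3. Obtain h1, the output of the second nonlinear layer

h0 is convolved with df_dim×2=128 kernels of size 5×5×64 (64 is the last dimension of the input). Batch normalization followed by a LeakyReLU gives h1 (shape=[64,12,12,128]), the input of the next layer.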

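4. Obtain h2, the output of the third nonlinear layer

h1 is convolved with df_dim×4=256 kernels of size 5×5×128 (128 is the last dimension of the input). Batch normalization followed by a LeakyReLU gives h2 (shape=[64,6,6,256]), the input of the next layer.

5. Obtain h3, the output of the fourth nonlinear layer

h2 is convolved with df_dim×8=512 kernels of size 5×5×256 (256 is the last dimension of the input). Batch normalization followed by a LeakyReLU gives h3 (shape=[64,3,3,512]), the input of the next layer.

6. Obtain the final output h4

h3 is reshaped to shape=[64,3*3*512] and then linearly transformed into h4 with shape=[64,1]. The network returns both sigmoid(h4) and h4.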
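A shape trace for the unlabeled discriminator, in the same plain-Python style. It assumes every convolution is stride 2 with 'SAME' padding and df_dim = 64; the function name is hypothetical.

```python
import math

def unlabeled_discriminator_shapes(batch=64, size=48, c_dim=3, df_dim=64):
    """Trace shapes through the unconditional discriminator."""
    shapes = [("x", [batch, size, size, c_dim])]
    s, c = size, c_dim
    for i, mult in enumerate([1, 2, 4, 8]):           # h0..h3
        s = int(math.ceil(s / 2))                     # stride-2 'SAME' conv
        c = df_dim * mult
        shapes.append(("h%d" % i, [batch, s, s, c]))  # BN is used on h1..h3 only
    shapes.append(("h3_flat", [batch, s * s * c]))    # reshape h3 to [batch, -1]
    shapes.append(("h4", [batch, 1]))                 # linear -> logits
    return shapes

for name, shape in unlabeled_discriminator_shapes():
    print(name, shape)
```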

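IV. Comparison of the two networks

Generator (batch normalization is omitted from the tables):

| Labeled dataset (28×28 samples) | Unlabeled dataset (48×48 samples) |
| --- | --- |
| 1. random z concatenated with y, output z: shape=[64,110] | 1. random z, output z: shape=[64,100] |
| 2. linear + ReLU, concatenated with y, output h0: shape=[64,1034] | 2. linear + reshape + ReLU, output h0: shape=[64,3,3,512] |
| 3. linear + ReLU + reshape, concatenated with yb, output h1: shape=[64,7,7,138] | 3. deconv2d + ReLU, output h1: shape=[64,6,6,256] |
| 4. deconv2d + ReLU, concatenated with yb, output h2: shape=[64,14,14,138] | 4. deconv2d + ReLU, output h2: shape=[64,12,12,128] |
| 5. deconv2d + Sigmoid, output generated_image: shape=[64,28,28,1] | 5. deconv2d + ReLU, output h3: shape=[64,24,24,64] |
| | 6. deconv2d, output h4: shape=[64,48,48,3] |
| | 7. tanh, output generated_image: shape=[64,48,48,3] |

Discriminator:

| Labeled dataset (28×28 samples) | Unlabeled dataset (48×48 samples) |
| --- | --- |
| 1. image concatenated with yb, output x: shape=[64,28,28,11] | 1. image used as input x: shape=[64,48,48,3] |
| 2. conv2d + LeakyReLU, concatenated with yb, output h0: shape=[64,14,14,21] | 2. conv2d + LeakyReLU, output h0: shape=[64,24,24,64] |
| 3. conv2d + LeakyReLU, reshaped and concatenated with y, output h1: shape=[64,3636] | 3. conv2d + LeakyReLU, output h1: shape=[64,12,12,128] |
| 4. linear + LeakyReLU, concatenated with y, output h2: shape=[64,1034] | 4. conv2d + LeakyReLU, output h2: shape=[64,6,6,256] |
| 5. linear, output sigmoid(h3) and h3: shape=[64,1] | 5. conv2d + LeakyReLU, output h3: shape=[64,3,3,512] |
| | 6. reshape + linear, output sigmoid(h4) and h4: shape=[64,1] |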

For the Generator:

Labeled dataset: 3×ReLU + 2×deconv2d + 1×sigmoid

Unlabeled dataset: 4×ReLU + 4×deconv2d + 1×tanh

For the Discriminator:

Labeled dataset: 2×conv2d + 3×LeakyReLU

Unlabeled dataset: 4×conv2d + 4×LeakyReLU

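The Discriminator returns sigmoid(h4) (sigmoid(h3) in the labeled network) as self.D. In the loss computation it is used only for its shape, in self.d_loss_real = tf.reduce_mean(sigmoid_cross_entropy_with_logits(self.D_logits, tf.ones_like(self.D))), where tf.ones_like(self.D) builds a target tensor of ones with the matching shape. The loss itself is computed from the logits h4 (self.D_logits), not from the value of sigmoid(h4); sigmoid_cross_entropy_with_logits applies the sigmoid internally.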
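Passing logits rather than probabilities matters numerically. tf.nn.sigmoid_cross_entropy_with_logits documents a rearranged formula, max(x, 0) - x*z + log(1 + exp(-|x|)), that never takes log(0). The pure-Python comparison below illustrates this; the function names are hypothetical and this is not TensorFlow code.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_ce_with_logits(logit, label):
    """Stable sigmoid cross-entropy: max(x, 0) - x*z + log(1 + exp(-|x|))."""
    x, z = logit, label
    return max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))

def naive_ce(logit, label):
    """Cross-entropy computed from the probability sigmoid(logit)."""
    p = sigmoid(logit)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

print(sigmoid_ce_with_logits(2.0, 1.0), naive_ce(2.0, 1.0))  # agree up to rounding
print(sigmoid_ce_with_logits(40.0, 0.0))                     # about 40.0
# naive_ce(40.0, 0.0) raises ValueError: sigmoid(40.0) rounds to exactly 1.0,
# so it tries to evaluate log(0).
```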

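V. Loss

1. Composition of the loss

The Generator loss is the loss on generated images: the Discriminator's logits for G(z) are compared with a target of 1, so the Generator is rewarded when D classifies its images as real.

The Discriminator loss is the loss on real images (target 1) plus the loss on generated images (target 0).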
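Written out, the losses described above are as follows. Here m = batch_size, σ is the sigmoid, and ℓ is the sigmoid cross-entropy computed from logits; the names d_loss_real and d_loss_fake follow the code, while the notation is introduced here for illustration. In the labeled network, D and G additionally receive y.

```latex
\begin{aligned}
\ell(a, t) &= -t\log\sigma(a) - (1-t)\log\bigl(1-\sigma(a)\bigr) \\
\mathcal{L}_D &= \underbrace{\frac{1}{m}\sum_{i=1}^{m}\ell\bigl(D_{\text{logits}}(x_i),\,1\bigr)}_{\texttt{d\_loss\_real}}
 + \underbrace{\frac{1}{m}\sum_{i=1}^{m}\ell\bigl(D_{\text{logits}}(G(z_i)),\,0\bigr)}_{\texttt{d\_loss\_fake}} \\
\mathcal{L}_G &= \frac{1}{m}\sum_{i=1}^{m}\ell\bigl(D_{\text{logits}}(G(z_i)),\,1\bigr)
\end{aligned}
```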

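2. Optimizer

Adam, with a separate optimizer for each network: d_optim minimizes the Discriminator loss over the Discriminator's variables, and g_optim minimizes the Generator loss over the Generator's variables.

3. Training schedule

In each iteration the Discriminator is updated once and then the Generator is updated twice. According to the comment in the code, the second Generator update keeps d_loss from going to zero; this differs from the original paper.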
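The update order can be sketched as a small loop. d_step and g_step are hypothetical stand-ins for a single optimizer update (in the repo these are sess.run calls on d_optim and g_optim); the counters only record how often each is called.

```python
def train_schedule(num_iterations, d_step, g_step):
    """Run one D update followed by two G updates per iteration.

    Returns the order in which the updates happened.
    """
    order = []
    for it in range(num_iterations):
        d_step()
        order.append(("D", it))
        for _ in range(2):  # the Generator is updated twice
            g_step()
            order.append(("G", it))
    return order

counts = {"D": 0, "G": 0}

def fake_d_step():
    counts["D"] += 1

def fake_g_step():
    counts["G"] += 1

order = train_schedule(3, fake_d_step, fake_g_step)
print(counts)     # {'D': 3, 'G': 6}
print(order[:3])  # [('D', 0), ('G', 0), ('G', 0)]
```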

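VI. Comparison of this implementation with the original paper

The original paper gives the Generator used for the unlabeled LSUN dataset. It is compared with this implementation below.

The paper's network produces 64×64 images. For comparison, both generators are set to produce 48×48 images with a batch size of 64:

| Original paper | This implementation |
| --- | --- |
| 1. random z, output z: shape=[64,100] | 1. random z, output z: shape=[64,100] |
| 2. linear projection + reshape + ReLU, output h0: shape=[64,3,3,1024] | 2. linear projection + reshape + ReLU, output h0: shape=[64,3,3,512] |
| 3. deconv2d + ReLU, output h1: shape=[64,6,6,512] | 3. deconv2d + ReLU, output h1: shape=[64,6,6,256] |
| 4. deconv2d + ReLU, output h2: shape=[64,12,12,256] | 4. deconv2d + ReLU, output h2: shape=[64,12,12,128] |
| 5. deconv2d + ReLU, output h3: shape=[64,24,24,128] | 5. deconv2d + ReLU, output h3: shape=[64,24,24,64] |
| 6. deconv2d, output h4: shape=[64,48,48,3] | 6. deconv2d, output h4: shape=[64,48,48,3] |
| 7. tanh, output generated_image: shape=[64,48,48,3] | 7. tanh, output generated_image: shape=[64,48,48,3] |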
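The table shows the two generators differ only in the number of feature maps. Parameterizing both by the width of the last hidden layer h3 (128 in the paper's table, gf_dim = 64 here; "base" is a name introduced for this sketch) shows that the implementation is uniformly half as wide.

```python
def generator_channels(base, c_dim=3):
    """Feature-map counts of h0..h4 for a 4-upsample DCGAN generator.

    base is the width of the last hidden layer (h3); each earlier layer doubles it.
    """
    return [base * 8, base * 4, base * 2, base, c_dim]

paper = generator_channels(128)  # 1024, 512, 256, 128, 3
impl = generator_channels(64)    # 512, 256, 128, 64, 3  (gf_dim = 64)
print([p / i for p, i in zip(paper[:-1], impl[:-1])])  # [2.0, 2.0, 2.0, 2.0]
```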