Learning Adversarial Neural Networks (5): Generating Clothing Images of Varying Widths and Heights with InfoGAN (a TensorFlow Implementation)
I. Background
I was busy for a while and had not run any GAN experiments for some time; I finally found the time to finish InfoGAN. In my view, InfoGAN is a further improvement on GAN. Because a plain GAN takes randomly generated noise as its input, the images it generates are also random; InfoGAN instead aims to generate images with specified attributes, so it places constraints on the GAN's random input, which is its biggest improvement. InfoGAN was proposed by Xi Chen et al. in June 2016. This experiment uses InfoGAN to generate clothing images of varying widths and heights.
This experiment uses the fashion-mnist dataset and implements InfoGAN with as little code as possible.
[1] Paper: https://arxiv.org/abs/1606.03657
II. InfoGAN Principles
There are already many introductions to InfoGAN online, so I will not repeat them at length here. A recommended article that explains the principles clearly:
[2] An introduction to InfoGAN
As the authors describe it, InfoGAN's main contribution is the introduction of mutual information: by maximizing the mutual information between a small subset of the GAN's noise variables and the observations, the learned representations become interpretable. The authors apply the method to the MNIST, CelebA, and SVHN datasets, and the results show that models augmented with the mutual-information term achieve better results.
In this paper, we present a simple modification to the generative adversarial network objective that encourages it to learn interpretable and meaningful representations. We do so by maximizing the mutual information between a fixed small subset of the GAN’s noise variables and the observations, which turns out to be relatively straightforward. Despite its simplicity, we found our method to be surprisingly effective: it was able to discover highly semantic and meaningful hidden representations on a number of image datasets: digits (MNIST), faces (CelebA), and house numbers (SVHN). The quality of our unsupervised disentangled representation matches previous works that made use of supervised label information [5–9]. These results suggest that generative modelling augmented with a mutual information cost could be a fruitful approach for learning disentangled representations.
In plainer terms, "the GAN model places no restrictions on how the generator uses the noise z, so z is used in a highly entangled way: no single dimension of z clearly represents one feature. During generation we therefore cannot tell what kind of noise z produces a digit 1 and what kind produces a digit 3; this to some extent limits how we can use GANs [2]". InfoGAN's improvement targets exactly this problem: "Info stands for mutual information, which measures how strongly the generated data x is associated with the latent code c. To make x and c closely related, we maximize the mutual information, and accordingly modify the value function of the original GAN, effectively adding a mutual-information regularization term. λ is a hyperparameter; subsequent experiments select an optimal value of 1 [2]".
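Concretely, the mutual information $I(c; G(z,c))$ between the latent code $c$ and the generated sample is hard to maximize directly because it requires the intractable posterior $P(c|x)$. The paper therefore introduces an auxiliary distribution $Q(c|x)$ and maximizes a variational lower bound instead (this is the quantity the classifier network in the code below estimates):

$$L_I(G, Q) = \mathbb{E}_{c \sim P(c),\, x \sim G(z,c)}\left[\log Q(c \mid x)\right] + H(c) \le I(c;\, G(z,c))$$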
Later in the paper, the authors also note that InfoGAN improves on GAN while still using DCGAN as its base model, so the concrete implementation can follow DCGAN. After a series of derivations, the authors reach the final formulation:
Hence, InfoGAN is defined as the following minimax game with a variational regularization of mutual information and a hyperparameter λ:

$$\min_{G,Q} \max_{D} V_{\mathrm{InfoGAN}}(D, G, Q) = V(D, G) - \lambda L_I(G, Q)$$
It is fine if you do not fully understand the formula above; what matters is knowing which aspect of GAN InfoGAN improves. There are plenty of InfoGAN implementations online; a few good ones are:
[3]https://github.com/openai/InfoGAN
[4]https://github.com/AndyHsiao26/InfoGAN-tensorflow
[5]https://github.com/hwalsuklee/tensorflow-generative-model-collections
The goal of this experiment is to implement InfoGAN with the least and simplest code possible. I mainly followed the implementation in [5] and made a few small changes to the original code. This time I kept the class from the original code, whereas my previous GAN implementations were written as plain functions wherever possible.
III. InfoGAN Implementation
1. Data Preparation
This experiment uses the fashion-mnist dataset. As the name suggests, it has the same format as the mnist dataset, except that it contains 10 classes of clothing; the images are still 28×28 grayscale.
The dataset can be downloaded from: https://github.com/zalandoresearch/fashion-mnist
Open the page above, locate the four dataset files listed in the README, and download them.
Place the downloaded files under the './data/fashion-mnist/' folder; there is no need to unzip the archives. The data is then ready, and we can start the experiment.
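As a quick sanity check (a minimal sketch; the four filenames follow the fashion-mnist repository), you can verify that the files are in place:

import os

# the four archives expected by load_mnist() below
files = ['train-images-idx3-ubyte.gz', 'train-labels-idx1-ubyte.gz',
         't10k-images-idx3-ubyte.gz', 't10k-labels-idx1-ubyte.gz']
for f in files:
    path = os.path.join('data', 'fashion-mnist', f)
    print(path, 'OK' if os.path.exists(path) else 'MISSING')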
2. Data Utility Functions (utils.py)
This part prepares the data-handling functions, including loading the dataset, saving images, and creating folders. These functions were also used for DCGAN, so they are copied over directly for later use. The file is named utils.py, and its code is given directly:
"""
Most codes from https://github.com/carpedm20/DCGAN-tensorflow
"""
from __future__ import division
import scipy.misc
import numpy as np
import os, gzip
import tensorflow as tf
import tensorflow.contrib.slim as slim
def load_mnist(dataset_name):
data_dir = 'data/' + dataset_name
def extract_data(filename, num_data, head_size, data_size):
with gzip.open(filename) as bytestream:
bytestream.read(head_size)
buf = bytestream.read(data_size * num_data)
data = np.frombuffer(buf, dtype=np.uint8).astype(np.float)
return data
data = extract_data(data_dir + '/train-images-idx3-ubyte.gz', 60000, 16, 28 * 28)
trX = data.reshape((60000, 28, 28, 1))
data = extract_data(data_dir + '/train-labels-idx1-ubyte.gz', 60000, 8, 1)
trY = data.reshape((60000))
data = extract_data(data_dir + '/t10k-images-idx3-ubyte.gz', 10000, 16, 28 * 28)
teX = data.reshape((10000, 28, 28, 1))
data = extract_data(data_dir + '/t10k-labels-idx1-ubyte.gz', 10000, 8, 1)
teY = data.reshape((10000))
trY = np.asarray(trY)
teY = np.asarray(teY)
X = np.concatenate((trX, teX), axis=0)
y = np.concatenate((trY, teY), axis=0).astype(np.int)
seed = 547
np.random.seed(seed)
np.random.shuffle(X)
np.random.seed(seed)
np.random.shuffle(y)
y_vec = np.zeros((len(y), 10), dtype=np.float)
for i, label in enumerate(y):
y_vec[i, y[i]] = 1.0
return X / 255., y_vec
def check_folder(log_dir):
if not os.path.exists(log_dir):
os.makedirs(log_dir)
return log_dir
def show_all_variables():
model_vars = tf.trainable_variables()
slim.model_analyzer.analyze_vars(model_vars, print_info=True)
def save_images(images, size, image_path):
return imsave(inverse_transform(images), size, image_path)
def merge(images, size):
h, w = images.shape[1], images.shape[2]
if (images.shape[3] in (3,4)):
c = images.shape[3]
img = np.zeros((h * size[0], w * size[1], c))
for idx, image in enumerate(images):
i = idx % size[1]
j = idx // size[1]
img[j * h:j * h + h, i * w:i * w + w, :] = image
return img
elif images.shape[3]==1:
img = np.zeros((h * size[0], w * size[1]))
for idx, image in enumerate(images):
i = idx % size[1]
j = idx // size[1]
img[j * h:j * h + h, i * w:i * w + w] = image[:,:,0]
return img
else:
raise ValueError('in merge(images,size) images parameter ''must have dimensions: HxW or HxWx3 or HxWx4')
def imsave(images, size, path):
image = np.squeeze(merge(images, size))
return scipy.misc.imsave(path, image)
def inverse_transform(images):
return (images+1.)/2.
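As a quick check of load_mnist (a minimal sketch, assuming the archives from step 1 are in place), the loader returns images scaled to [0, 1] and one-hot labels:

from utils import load_mnist

X, y = load_mnist('fashion-mnist')
print(X.shape)  # (70000, 28, 28, 1): the 60000 train and 10000 test images, concatenated and shuffled
print(y.shape)  # (70000, 10): one-hot labels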
3. Layer Functions (layers.py)
This part implements the network layer functions and saves them in the layers.py file. The code for layers.py is given directly:
"""
Most codes from https://github.com/carpedm20/DCGAN-tensorflow
"""
from utils import *
if "concat_v2" in dir(tf):
def concat(tensors, axis, *args, **kwargs):
return tf.concat_v2(tensors, axis, *args, **kwargs)
else:
def concat(tensors, axis, *args, **kwargs):
return tf.concat(tensors, axis, *args, **kwargs)
def bn(x, is_training, scope):
return tf.contrib.layers.batch_norm(x,
decay=0.9,
updates_collections=None,
epsilon=1e-5,
scale=True,
is_training=is_training,
scope=scope)
def conv2d(input_, output_dim, k_h=5, k_w=5, d_h=2, d_w=2, stddev=0.02, name="conv2d"):
with tf.variable_scope(name):
w = tf.get_variable('w', [k_h, k_w, input_.get_shape()[-1], output_dim],
initializer=tf.truncated_normal_initializer(stddev=stddev))
conv = tf.nn.conv2d(input_, w, strides=[1, d_h, d_w, 1], padding='SAME')
biases = tf.get_variable('biases', [output_dim], initializer=tf.constant_initializer(0.0))
conv = tf.reshape(tf.nn.bias_add(conv, biases), conv.get_shape())
return conv
def deconv2d(input_, output_shape, k_h=5, k_w=5, d_h=2, d_w=2, name="deconv2d", stddev=0.02, with_w=False):
with tf.variable_scope(name):
# filter : [height, width, output_channels, in_channels]
w = tf.get_variable('w', [k_h, k_w, output_shape[-1], input_.get_shape()[-1]],
initializer=tf.random_normal_initializer(stddev=stddev))
try:
deconv = tf.nn.conv2d_transpose(input_, w, output_shape=output_shape, strides=[1, d_h, d_w, 1])
# Support for verisons of TensorFlow before 0.7.0
except AttributeError:
deconv = tf.nn.deconv2d(input_, w, output_shape=output_shape, strides=[1, d_h, d_w, 1])
biases = tf.get_variable('biases', [output_shape[-1]], initializer=tf.constant_initializer(0.0))
deconv = tf.reshape(tf.nn.bias_add(deconv, biases), deconv.get_shape())
if with_w:
return deconv, w, biases
else:
return deconv
def lrelu(x, leak=0.2, name="lrelu"):
return tf.maximum(x, leak*x)
def linear(input_, output_size, scope=None, stddev=0.02, bias_start=0.0, with_w=False):
shape = input_.get_shape().as_list()
with tf.variable_scope(scope or "Linear"):
matrix = tf.get_variable("Matrix", [shape[1], output_size], tf.float32,
tf.random_normal_initializer(stddev=stddev))
bias = tf.get_variable("bias", [output_size],
initializer=tf.constant_initializer(bias_start))
if with_w:
return tf.matmul(input_, matrix) + bias, matrix, bias
else:
return tf.matmul(input_, matrix) + bias
4. The InfoGAN Model (infoGAN.py)
This part implements InfoGAN itself, including the parameters, the generator and discriminator, and the loss functions. The code is fairly long; I will explain it in more detail later. The final infoGAN code is:
from __future__ import division
import time
from layers import *
from utils import *


class infoGAN(object):
    model_name = "infoGAN"  # name for checkpoint

    def __init__(self, sess, epoch, batch_size, z_dim, dataset_name, checkpoint_dir, result_dir, log_dir, SUPERVISED=True):
        self.sess = sess
        self.dataset_name = dataset_name
        self.checkpoint_dir = checkpoint_dir
        self.result_dir = result_dir
        self.log_dir = log_dir
        self.epoch = epoch
        self.batch_size = batch_size

        if dataset_name == 'mnist' or dataset_name == 'fashion-mnist':
            # parameters
            self.input_height = 28
            self.input_width = 28
            self.output_height = 28
            self.output_width = 28

            self.z_dim = z_dim  # dimension of noise-vector
            self.y_dim = 12     # dimension of code-vector (label + two continuous features)
            self.c_dim = 1

            self.SUPERVISED = SUPERVISED  # if True, the dataset label is used directly as the discrete code

            # train
            self.learning_rate = 0.0002
            self.beta1 = 0.5

            # test
            self.sample_num = 64  # number of generated images to be saved

            # code
            self.len_discrete_code = 10   # categorical distribution (i.e. label)
            self.len_continuous_code = 2  # gaussian distribution (e.g. rotation, thickness)

            # load mnist
            self.data_X, self.data_y = load_mnist(self.dataset_name)

            # get number of batches for a single epoch
            self.num_batches = len(self.data_X) // self.batch_size
        else:
            raise NotImplementedError

    def classifier(self, x, is_training=True, reuse=False):
        # Network architecture follows infoGAN (https://arxiv.org/abs/1606.03657)
        # The classifier Q shares all layers with the discriminator except these
        # last two fully-connected layers: FC64_BL-FC12_S
        # The number of nodes in the penultimate layer is reduced by half (128 -> 64); it gives better results.
        with tf.variable_scope("classifier", reuse=reuse):
            net = lrelu(bn(linear(x, 64, scope='c_fc1'), is_training=is_training, scope='c_bn1'))
            out_logit = linear(net, self.y_dim, scope='c_fc2')
            out = tf.nn.softmax(out_logit)

            return out, out_logit

    def discriminator(self, x, is_training=True, reuse=False):
        # Network architecture follows infoGAN (https://arxiv.org/abs/1606.03657)
        # Architecture : (64)4c2s-(128)4c2s_BL-FC1024_BL-FC1_S
        with tf.variable_scope("discriminator", reuse=reuse):
            net = lrelu(conv2d(x, 64, 4, 4, 2, 2, name='d_conv1'))
            net = lrelu(bn(conv2d(net, 128, 4, 4, 2, 2, name='d_conv2'), is_training=is_training, scope='d_bn2'))
            net = tf.reshape(net, [self.batch_size, -1])
            net = lrelu(bn(linear(net, 1024, scope='d_fc3'), is_training=is_training, scope='d_bn3'))
            out_logit = linear(net, 1, scope='d_fc4')
            out = tf.nn.sigmoid(out_logit)

            return out, out_logit, net

    def generator(self, z, y, is_training=True, reuse=False):
        # Network architecture follows infoGAN (https://arxiv.org/abs/1606.03657)
        # Architecture : FC1024_BR-FC7x7x128_BR-(64)4dc2s_BR-(1)4dc2s_S
        with tf.variable_scope("generator", reuse=reuse):
            # merge noise and code
            z = concat([z, y], 1)

            net = tf.nn.relu(bn(linear(z, 1024, scope='g_fc1'), is_training=is_training, scope='g_bn1'))
            net = tf.nn.relu(bn(linear(net, 128 * 7 * 7, scope='g_fc2'), is_training=is_training, scope='g_bn2'))
            net = tf.reshape(net, [self.batch_size, 7, 7, 128])
            net = tf.nn.relu(
                bn(deconv2d(net, [self.batch_size, 14, 14, 64], 4, 4, 2, 2, name='g_dc3'), is_training=is_training,
                   scope='g_bn3'))

            out = tf.nn.sigmoid(deconv2d(net, [self.batch_size, 28, 28, 1], 4, 4, 2, 2, name='g_dc4'))

            return out
    def build_model(self):
        # some parameters
        image_dims = [self.input_height, self.input_width, self.c_dim]
        bs = self.batch_size

        """ Graph Input """
        # images
        self.inputs = tf.placeholder(tf.float32, [bs] + image_dims, name='real_images')

        # codes (discrete one-hot label + continuous code)
        self.y = tf.placeholder(tf.float32, [bs, self.y_dim], name='y')

        # noises
        self.z = tf.placeholder(tf.float32, [bs, self.z_dim], name='z')

        """ Loss Function """
        ## 1. GAN Loss
        # output of D for real images
        D_real, D_real_logits, _ = self.discriminator(self.inputs, is_training=True, reuse=False)

        # output of D for fake images
        G = self.generator(self.z, self.y, is_training=True, reuse=False)
        D_fake, D_fake_logits, input4classifier_fake = self.discriminator(G, is_training=True, reuse=True)

        # get loss for discriminator
        d_loss_real = tf.reduce_mean(
            tf.nn.sigmoid_cross_entropy_with_logits(logits=D_real_logits, labels=tf.ones_like(D_real)))
        d_loss_fake = tf.reduce_mean(
            tf.nn.sigmoid_cross_entropy_with_logits(logits=D_fake_logits, labels=tf.zeros_like(D_fake)))

        self.d_loss = d_loss_real + d_loss_fake

        # get loss for generator
        self.g_loss = tf.reduce_mean(
            tf.nn.sigmoid_cross_entropy_with_logits(logits=D_fake_logits, labels=tf.ones_like(D_fake)))

        ## 2. Information Loss
        code_fake, code_logit_fake = self.classifier(input4classifier_fake, is_training=True, reuse=False)

        # discrete code : categorical
        disc_code_est = code_logit_fake[:, :self.len_discrete_code]
        disc_code_tg = self.y[:, :self.len_discrete_code]
        q_disc_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=disc_code_est, labels=disc_code_tg))

        # continuous code : gaussian
        cont_code_est = code_logit_fake[:, self.len_discrete_code:]
        cont_code_tg = self.y[:, self.len_discrete_code:]
        q_cont_loss = tf.reduce_mean(tf.reduce_sum(tf.square(cont_code_tg - cont_code_est), axis=1))

        # get information loss
        self.q_loss = q_disc_loss + q_cont_loss

        """ Training """
        # divide trainable variables into groups for D, G, and Q
        # (Q is trained through the shared D layers, the classifier head, and G)
        t_vars = tf.trainable_variables()
        d_vars = [var for var in t_vars if 'd_' in var.name]
        g_vars = [var for var in t_vars if 'g_' in var.name]
        q_vars = [var for var in t_vars if ('d_' in var.name) or ('c_' in var.name) or ('g_' in var.name)]

        # optimizers
        with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
            self.d_optim = tf.train.AdamOptimizer(self.learning_rate, beta1=self.beta1) \
                .minimize(self.d_loss, var_list=d_vars)
            self.g_optim = tf.train.AdamOptimizer(self.learning_rate * 5, beta1=self.beta1) \
                .minimize(self.g_loss, var_list=g_vars)
            self.q_optim = tf.train.AdamOptimizer(self.learning_rate * 5, beta1=self.beta1) \
                .minimize(self.q_loss, var_list=q_vars)

        """ Testing """
        # for test
        self.fake_images = self.generator(self.z, self.y, is_training=False, reuse=True)

        """ Summary """
        d_loss_real_sum = tf.summary.scalar("d_loss_real", d_loss_real)
        d_loss_fake_sum = tf.summary.scalar("d_loss_fake", d_loss_fake)
        d_loss_sum = tf.summary.scalar("d_loss", self.d_loss)
        g_loss_sum = tf.summary.scalar("g_loss", self.g_loss)

        q_loss_sum = tf.summary.scalar("q_loss", self.q_loss)
        q_disc_sum = tf.summary.scalar("q_disc_loss", q_disc_loss)
        q_cont_sum = tf.summary.scalar("q_cont_loss", q_cont_loss)

        # final summary operations
        self.g_sum = tf.summary.merge([d_loss_fake_sum, g_loss_sum])
        self.d_sum = tf.summary.merge([d_loss_real_sum, d_loss_sum])
        self.q_sum = tf.summary.merge([q_loss_sum, q_disc_sum, q_cont_sum])
    def train(self):
        # initialize all variables
        tf.global_variables_initializer().run()

        # graph inputs for visualizing training results
        self.sample_z = np.random.uniform(-1, 1, size=(self.batch_size, self.z_dim))
        self.test_labels = self.data_y[0:self.batch_size]
        self.test_codes = np.concatenate((self.test_labels, np.zeros([self.batch_size, self.len_continuous_code])),
                                         axis=1)

        # saver to save model
        self.saver = tf.train.Saver()

        # summary writer
        self.writer = tf.summary.FileWriter(self.log_dir + '/' + self.model_name, self.sess.graph)

        # restore check-point if it exists
        could_load, checkpoint_counter = self.load(self.checkpoint_dir)
        if could_load:
            start_epoch = (int)(checkpoint_counter / self.num_batches)
            start_batch_id = checkpoint_counter - start_epoch * self.num_batches
            counter = checkpoint_counter
            print(" [*] Load SUCCESS")
        else:
            start_epoch = 0
            start_batch_id = 0
            counter = 1
            print(" [!] Load failed...")

        # loop for epoch
        start_time = time.time()
        for epoch in range(start_epoch, self.epoch):

            # get batch data
            for idx in range(start_batch_id, self.num_batches):
                batch_images = self.data_X[idx*self.batch_size:(idx+1)*self.batch_size]

                # generate code
                if self.SUPERVISED == True:
                    batch_labels = self.data_y[idx * self.batch_size:(idx + 1) * self.batch_size]
                else:
                    batch_labels = np.random.multinomial(1,
                                                         self.len_discrete_code * [float(1.0 / self.len_discrete_code)],
                                                         size=[self.batch_size])

                batch_codes = np.concatenate((batch_labels, np.random.uniform(-1, 1, size=(self.batch_size, 2))),
                                             axis=1)

                batch_z = np.random.uniform(-1, 1, [self.batch_size, self.z_dim]).astype(np.float32)

                # update D network
                _, summary_str, d_loss = self.sess.run([self.d_optim, self.d_sum, self.d_loss],
                                                       feed_dict={self.inputs: batch_images, self.y: batch_codes,
                                                                  self.z: batch_z})
                self.writer.add_summary(summary_str, counter)

                # update G and Q network
                _, summary_str_g, g_loss, _, summary_str_q, q_loss = self.sess.run(
                    [self.g_optim, self.g_sum, self.g_loss, self.q_optim, self.q_sum, self.q_loss],
                    feed_dict={self.inputs: batch_images, self.z: batch_z, self.y: batch_codes})
                self.writer.add_summary(summary_str_g, counter)
                self.writer.add_summary(summary_str_q, counter)

                # display training status
                counter += 1
                print("Epoch: [%2d] [%4d/%4d] time: %4.4f, d_loss: %.8f, g_loss: %.8f" \
                      % (epoch, idx, self.num_batches, time.time() - start_time, d_loss, g_loss))

                # save training results every 300 steps
                if np.mod(counter, 300) == 0:
                    samples = self.sess.run(self.fake_images,
                                            feed_dict={self.z: self.sample_z, self.y: self.test_codes})
                    tot_num_samples = min(self.sample_num, self.batch_size)
                    manifold_h = int(np.floor(np.sqrt(tot_num_samples)))
                    manifold_w = int(np.floor(np.sqrt(tot_num_samples)))
                    save_images(samples[:manifold_h * manifold_w, :, :, :], [manifold_h, manifold_w],
                                './' + check_folder(self.result_dir + '/' + self.model_dir) + '/' + self.model_name + '_train_{:02d}_{:04d}.png'.format(
                                    epoch, idx))

            # After an epoch, start_batch_id is set to zero
            # non-zero value is only for the first epoch after loading a pre-trained model
            start_batch_id = 0

            # save model
            self.save(self.checkpoint_dir, counter)

            # show temporal results
            self.visualize_results(epoch)

        # save model for final step
        self.save(self.checkpoint_dir, counter)
    def visualize_results(self, epoch):
        tot_num_samples = min(self.sample_num, self.batch_size)
        image_frame_dim = int(np.floor(np.sqrt(tot_num_samples)))

        """ random noise, random discrete code, fixed continuous code """
        y = np.random.choice(self.len_discrete_code, self.batch_size)
        y_one_hot = np.zeros((self.batch_size, self.y_dim))
        y_one_hot[np.arange(self.batch_size), y] = 1

        z_sample = np.random.uniform(-1, 1, size=(self.batch_size, self.z_dim))

        samples = self.sess.run(self.fake_images, feed_dict={self.z: z_sample, self.y: y_one_hot})

        save_images(samples[:image_frame_dim * image_frame_dim, :, :, :], [image_frame_dim, image_frame_dim],
                    check_folder(self.result_dir + '/' + self.model_dir) + '/' + self.model_name + '_epoch%03d' % epoch + '_test_all_classes.png')

        """ specified condition, random noise """
        n_styles = 10  # must be less than or equal to self.batch_size
        np.random.seed()
        si = np.random.choice(self.batch_size, n_styles)

        for l in range(self.len_discrete_code):
            y = np.zeros(self.batch_size, dtype=np.int64) + l
            y_one_hot = np.zeros((self.batch_size, self.y_dim))
            y_one_hot[np.arange(self.batch_size), y] = 1

            samples = self.sess.run(self.fake_images, feed_dict={self.z: z_sample, self.y: y_one_hot})
            # save_images(samples[:image_frame_dim * image_frame_dim, :, :, :], [image_frame_dim, image_frame_dim],
            #             check_folder(self.result_dir + '/' + self.model_dir) + '/' + self.model_name + '_epoch%03d' % epoch + '_test_class_%d.png' % l)

            samples = samples[si, :, :, :]

            if l == 0:
                all_samples = samples
            else:
                all_samples = np.concatenate((all_samples, samples), axis=0)

        """ save merged images to check style-consistency """
        canvas = np.zeros_like(all_samples)
        for s in range(n_styles):
            for c in range(self.len_discrete_code):
                canvas[s * self.len_discrete_code + c, :, :, :] = all_samples[c * n_styles + s, :, :, :]

        save_images(canvas, [n_styles, self.len_discrete_code],
                    check_folder(self.result_dir + '/' + self.model_dir) + '/' + self.model_name + '_epoch%03d' % epoch + '_test_all_classes_style_by_style.png')

        """ fixed noise, vary the two continuous codes on a grid """
        assert self.len_continuous_code == 2

        c1 = np.linspace(-1, 1, image_frame_dim)
        c2 = np.linspace(-1, 1, image_frame_dim)
        xv, yv = np.meshgrid(c1, c2)
        xv = xv[:image_frame_dim, :image_frame_dim]
        yv = yv[:image_frame_dim, :image_frame_dim]

        c1 = xv.flatten()
        c2 = yv.flatten()

        z_fixed = np.zeros([self.batch_size, self.z_dim])

        for l in range(self.len_discrete_code):
            y = np.zeros(self.batch_size, dtype=np.int64) + l
            y_one_hot = np.zeros((self.batch_size, self.y_dim))
            y_one_hot[np.arange(self.batch_size), y] = 1

            y_one_hot[np.arange(image_frame_dim * image_frame_dim), self.len_discrete_code] = c1
            y_one_hot[np.arange(image_frame_dim * image_frame_dim), self.len_discrete_code + 1] = c2

            samples = self.sess.run(self.fake_images,
                                    feed_dict={self.z: z_fixed, self.y: y_one_hot})

            save_images(samples[:image_frame_dim * image_frame_dim, :, :, :], [image_frame_dim, image_frame_dim],
                        check_folder(self.result_dir + '/' + self.model_dir) + '/' + self.model_name + '_epoch%03d' % epoch + '_test_class_c1c2_%d.png' % l)

    @property
    def model_dir(self):
        return "{}_{}_{}_{}".format(
            self.model_name, self.dataset_name,
            self.batch_size, self.z_dim)

    def save(self, checkpoint_dir, step):
        checkpoint_dir = os.path.join(checkpoint_dir, self.model_dir, self.model_name)

        if not os.path.exists(checkpoint_dir):
            os.makedirs(checkpoint_dir)

        self.saver.save(self.sess, os.path.join(checkpoint_dir, self.model_name + '.model'), global_step=step)

    def load(self, checkpoint_dir):
        import re
        print(" [*] Reading checkpoints...")
        checkpoint_dir = os.path.join(checkpoint_dir, self.model_dir, self.model_name)

        ckpt = tf.train.get_checkpoint_state(checkpoint_dir)
        if ckpt and ckpt.model_checkpoint_path:
            ckpt_name = os.path.basename(ckpt.model_checkpoint_path)
            self.saver.restore(self.sess, os.path.join(checkpoint_dir, ckpt_name))
            counter = int(next(re.finditer(r"(\d+)(?!.*\d)", ckpt_name)).group(0))
            print(" [*] Success to read {}".format(ckpt_name))
            return True, counter
        else:
            print(" [*] Failed to find a checkpoint")
            return False, 0
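One detail worth pausing on from train() above: the code vector fed into self.y is 12-dimensional, built from a 10-way one-hot label (the discrete code) plus two values drawn uniformly from [-1, 1] (the continuous codes c1 and c2 that, after training, turn out to control attributes such as width and height). A minimal standalone sketch of the same construction:

import numpy as np

batch_size, len_discrete, len_continuous = 64, 10, 2
labels = np.eye(len_discrete)[np.random.randint(0, len_discrete, size=batch_size)]  # one-hot discrete code
cont = np.random.uniform(-1, 1, size=(batch_size, len_continuous))                  # continuous codes c1, c2
batch_codes = np.concatenate([labels, cont], axis=1)                                # shape (64, 12), fed as self.y
print(batch_codes.shape)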
5. Training the Model with main (main.py)
Finally comes the training itself. The main job of main.py is to set the parameters, create the infoGAN model, and train it. The main code of main.py is:
from infoGAN import infoGAN
import tensorflow as tf

"""main"""
def main():
    with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
        # declare instance for GAN
        infogan = infoGAN(sess,
                          epoch=20,
                          batch_size=64,
                          z_dim=62,
                          dataset_name='fashion-mnist',
                          checkpoint_dir='checkpoint',
                          result_dir='results',
                          log_dir='logs')

        # build graph
        infogan.build_model()

        # show network architecture
        # show_all_variables()

        # launch the graph in a session
        infogan.train()
        print(" [*] Training finished!")

        # visualize learned generator
        infogan.visualize_results(20 - 1)
        print(" [*] Testing finished!")


if __name__ == '__main__':
    main()
6. Running the Model
To run the model, simply execute main.py and wait for training to finish before inspecting the results. Since the code writes summaries to the logs directory, training progress can also be watched in TensorBoard.
IV. Experimental Results
The experiment was run for 20 epochs in total, and training went fairly well. The training results for the clothing images are shown directly below.
The result at epoch 1:
The result at epoch 5:
The result at epoch 10:
The result at epoch 20:
V. Analysis
1. The results are quite good: even at epoch 1 the generated clothing images are already fairly recognizable, and clothing of clearly varying widths can be seen among the generated samples.
2. The structure of all the files is:
-- data (folder containing the raw dataset)
|------ fashion-mnist
|------ t10k-images-idx3-ubyte.gz
|------ t10k-labels-idx1-ubyte.gz
|------ train-images-idx3-ubyte.gz
|------ train-labels-idx1-ubyte.gz
-- utils.py
{
import...
def load_mnist(dataset_name):...
def check_folder(log_dir):...
def show_all_variables():...
def save_images(images, size, image_path):...
def merge(images, size):...
def imsave(images, size, path):...
def inverse_transform(images):...
}
-- layers.py
{
import...
def bn(x, is_training, scope):...
def conv2d(input_, output_dim, k_h=5, k_w=5, d_h=2, d_w=2, stddev=0.02, name="conv2d"):...
def deconv2d(input_, output_shape, k_h=5, k_w=5, d_h=2, d_w=2, name="deconv2d", stddev=0.02, with_w=False):...
def lrelu(x, leak=0.2, name="lrelu"):...
def linear(input_, output_size, scope=None, stddev=0.02, bias_start=0.0, with_w=False):...
}
-- infoGAN.py
{
import...
class infoGAN(object):
def __init__(self, sess, epoch, batch_size, z_dim, dataset_name, checkpoint_dir, result_dir, log_dir, SUPERVISED=True):...
def classifier(self, x, is_training=True, reuse=False):...
def discriminator(self, x, is_training=True, reuse=False):...
def generator(self, z, y, is_training=True, reuse=False):...
def build_model(self):...
def train(self):...
def visualize_results(self, epoch):...
def model_dir(self):...
def save(self, checkpoint_dir, step):...
def load(self, checkpoint_dir):...
}
-- main.py
{
import ...
def main():...
if __name__ == '__main__':...
}