AttGAN: Facial Attribute Editing by Only Changing What You Want(TIP19)

III. ATTRIBUTE GAN (ATTGAN)

Premise: all attributes are assumed to be binary.

A. Testing Formulation

Let the input image be $\mathbf{x^a}$, carrying $n$ binary attributes $\mathbf{a}=\left[a_1, \cdots, a_n\right]$.

The encoder network $G_{enc}$ encodes $\mathbf{x^a}$ into a latent representation $\mathbf{z}$:
$$\mathbf{z} = G_{enc}(\mathbf{x^a}) \qquad(3)$$

The target attributes are denoted $\mathbf{b}=\left[b_1, \cdots, b_n\right]$.

The decoder network $G_{dec}$ takes $\mathbf{z}$ and $\mathbf{b}$ as input and generates the edited image $\mathbf{x^{\hat{b}}}$:
$$\mathbf{x^{\hat{b}}} = G_{dec}(\mathbf{z}, \mathbf{b}) \qquad(4)$$

Combining Eqs. (3) and (4):
$$\mathbf{x^{\hat{b}}} = G_{dec}(G_{enc}(\mathbf{x^a}), \mathbf{b}) \qquad(5)$$
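The encode-then-decode pipeline of Eq. (5) can be sketched with toy stand-ins. The real $G_{enc}$ and $G_{dec}$ are convolutional networks; the shapes and functions below are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def G_enc(x):
    # Toy stand-in for the convolutional encoder: flatten the image and
    # keep the first 8 values as the latent representation z (Eq. 3).
    return x.reshape(-1)[:8]

def G_dec(z, b):
    # Toy stand-in for the decoder: the attribute vector b is concatenated
    # with z before decoding (Eq. 4); here we just return the concatenation
    # instead of an actual decoded image.
    return np.concatenate([z, b])

x_a = rng.random((4, 4))        # input image x^a
b = np.array([1.0, 0.0, 1.0])   # binary target attributes b
x_b_hat = G_dec(G_enc(x_a), b)  # Eq. (5)
```

The key point the sketch captures is the interface: editing is driven entirely by swapping the attribute vector fed to the decoder, while the latent $\mathbf{z}$ stays fixed.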

B. Training Formulation

Training is unsupervised with respect to editing, because the ground-truth edited image $\mathbf{x^b}$ is unknown.

Reconstruction Loss

We want to change only what the edited attributes require while keeping all other attributes intact, so reconstruction learning is introduced (the paper gives two justifications for this, which feel somewhat forced).

b=a\mathbf{b}=\mathbf{a},得到生成图像xa^\mathbf{x^{\hat{a}}}
xa^=Gdec(z,a)(6) \mathbf{x^{\hat{a}}} = G_{dec}(\mathbf{z}, \mathbf{a}) \qquad(6)
那么xa^\mathbf{x^{\hat{a}}}xa\mathbf{x^a}应该比较近似,于是关于生成器GGReconstruction Loss定义如下
minGenc,Gdec Lrec=Exapdataxaxa^1(11) \underset{G_{enc},G_{dec}}{\min}\ \mathcal{L}_{rec}=\mathbb{E}_{\mathbf{x^a}\sim p_{data}} \left \| \mathbf{x^a}-\mathbf{x^{\hat{a}}} \right \|_1 \qquad(11)
使用1\ell_1 loss相较于2\ell_2 loss不容易模糊
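A minimal NumPy version of Eq. (11) (using the per-pixel mean, a scaled form of the summed $\ell_1$ norm):

```python
import numpy as np

def l1_reconstruction(x_a, x_a_hat):
    # Eq. (11): mean absolute difference between the input image and its
    # reconstruction. Unlike the squared l2 penalty, l1 does not smooth
    # away small residuals everywhere, which tends to reduce blurring.
    return float(np.abs(x_a - x_a_hat).mean())

x = np.array([[0.2, 0.8], [0.5, 0.1]])
assert l1_reconstruction(x, x) == 0.0  # perfect reconstruction costs nothing
```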

Attribute Classification Constraint

The generated image $\mathbf{x^{\hat{b}}}$ should actually carry the attributes $\mathbf{b}$, so an attribute classifier $C$ is introduced.

The attribute classification constraint on the generator $G$ is then defined as
$$\underset{G_{enc}, G_{dec}}{\min}\ \mathcal{L}_{cls_g}=\mathbb{E}_{\mathbf{x^a}\sim p_{data},\, \mathbf{b}\sim p_{attr}}\left[ \ell_g\left( \mathbf{x^a}, \mathbf{b} \right) \right] \qquad(7)$$
$$\ell_g(\mathbf{x^a}, \mathbf{b})=\sum_{i=1}^{n}-b_i\log C_i\left( \mathbf{x^{\hat{b}}} \right)-(1-b_i)\log\left( 1-C_i\left( \mathbf{x^{\hat{b}}} \right) \right) \qquad(8)$$

The attribute classifier $C$ itself is trained on real images with their ground-truth attributes:
$$\underset{C}{\min}\ \mathcal{L}_{cls_c}=\mathbb{E}_{\mathbf{x^a}\sim p_{data}}\left[ \ell_r(\mathbf{x^a}, \mathbf{a}) \right] \qquad(9)$$
$$\ell_r(\mathbf{x^a}, \mathbf{a})=\sum_{i=1}^{n}-a_i\log C_i\left( \mathbf{x^a} \right)-(1-a_i)\log\left( 1-C_i\left( \mathbf{x^a} \right) \right) \qquad(10)$$
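Eqs. (8) and (10) are the same multi-label binary cross-entropy, applied to edited and real images respectively. A minimal sketch, assuming $C$ outputs per-attribute sigmoid probabilities:

```python
import numpy as np

def multilabel_bce(probs, labels):
    # Eqs. (8)/(10): summed binary cross-entropy over the n attributes.
    # probs  -- per-attribute sigmoid outputs C_i(x)
    # labels -- binary attribute vector (b in Eq. 8, a in Eq. 10)
    eps = 1e-12  # clip to keep log() finite at 0 and 1
    probs = np.clip(probs, eps, 1.0 - eps)
    return float(-(labels * np.log(probs)
                   + (1.0 - labels) * np.log(1.0 - probs)).sum())
```

With two attributes and uninformative predictions of 0.5 each, the loss is $2\ln 2$; it approaches 0 as the predictions approach the labels.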

Adversarial Loss

The adversarial loss follows WGAN-GP; the objectives for the discriminator $D$ and the generator $G$ are
$$\underset{\left\| D \right\|_L\leqslant 1}{\min}\ \mathcal{L}_{adv_d}=-\mathbb{E}_{\mathbf{x^a}\sim p_{data}}D(\mathbf{x^a})+\mathbb{E}_{\mathbf{x^a}\sim p_{data},\,\mathbf{b}\sim p_{attr}}D\left( \mathbf{x^{\hat{b}}} \right) \qquad(12)$$
$$\underset{G_{enc},G_{dec}}{\min}\ \mathcal{L}_{adv_g}=-\mathbb{E}_{\mathbf{x^a}\sim p_{data},\,\mathbf{b}\sim p_{attr}}D\left( \mathbf{x^{\hat{b}}} \right) \qquad(13)$$
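The two expectations reduce to simple means over critic scores; a sketch under the assumption that `d_real`/`d_fake` are the critic's scalar outputs on a batch:

```python
import numpy as np

def critic_loss(d_real, d_fake):
    # Eq. (12): push scores on real images up and on edited images down.
    # The Lipschitz constraint ||D||_L <= 1 is what WGAN-GP enforces with
    # a gradient penalty term, which this autograd-free sketch omits.
    return float(-d_real.mean() + d_fake.mean())

def generator_adv_loss(d_fake):
    # Eq. (13): the generator raises the critic's score on edited images.
    return float(-d_fake.mean())

d_real = np.array([0.9, 1.1])   # illustrative critic scores on real images
d_fake = np.array([-0.5, 0.5])  # illustrative critic scores on edited images
```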

Overall Objective

The generator $G$ is trained with
$$\underset{G_{enc},G_{dec}}{\min}\ \mathcal{L}_{enc,dec}=\lambda_1\mathcal{L}_{rec}+\lambda_2\mathcal{L}_{cls_g}+\mathcal{L}_{adv_g} \qquad(14)$$

The discriminator $D$ and the attribute classifier $C$ are trained with
$$\underset{D,C}{\min}\ \mathcal{L}_{dis,cls}=\lambda_3\mathcal{L}_{cls_c}+\mathcal{L}_{adv_d} \qquad(15)$$
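The two overall objectives are plain weighted sums; the default weights below are placeholders, not the hyperparameters reported in the paper:

```python
def generator_objective(l_rec, l_cls_g, l_adv_g, lam1=1.0, lam2=1.0):
    # Eq. (14): weighted sum of reconstruction, attribute-classification,
    # and adversarial terms for G_enc/G_dec.
    return lam1 * l_rec + lam2 * l_cls_g + l_adv_g

def discriminator_objective(l_cls_c, l_adv_d, lam3=1.0):
    # Eq. (15): D and C are optimized jointly, so their losses combine
    # into a single objective.
    return lam3 * l_cls_c + l_adv_d
```

In practice $\lambda_1,\lambda_2,\lambda_3$ trade off identity preservation against editing strength: a larger $\lambda_1$ keeps the output closer to the input at the cost of weaker attribute changes.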

C. Why are attribute-excluding details preserved?

AttGAN performs multi-task learning: a face reconstruction task and an attribute editing task.

The authors argue that the two tasks are highly similar and the transferability gap between them is very small, so the detail-preservation ability learned from the face reconstruction task transfers easily to the attribute editing task.

D. Extension for Attribute Style Manipulation

Following [28] and [26], a set of style controllers $\theta=\left[\theta_1, \cdots, \theta_i, \cdots, \theta_n\right]$ is introduced, and the mutual information between the controllers and the output images is maximized to make them highly correlated.
Specifically, as shown in Figure 3, an extra style predictor $Q$ is introduced, and the decoder network $G_{dec}$ additionally takes $\theta$ as input, generating an image $\mathbf{x^{\hat{\theta}\hat{b}}}$ that carries both the target attributes $\mathbf{b}$ and the styles $\theta$:
$$\mathbf{x^{\hat{\theta}\hat{b}}}=G_{dec}\left( G_{enc}(\mathbf{x^a}), \theta, \mathbf{b} \right) \qquad(16)$$

The mutual information between the style controllers $\theta$ and the generated image $x^*$ is lower-bounded variationally through $Q$:
$$I\left( \theta;x^* \right)=\underset{Q}{\max}\ \mathbb{E}_{\theta\sim p(\theta),\, x^*\sim p(x^*|\theta)}\left[ \log Q(\theta|x^*) \right] + const \qquad(17)$$
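In practice the bound in Eq. (17) becomes a predictor loss: $Q$ tries to recover $\theta$ from the generated image. If $Q(\theta|x^*)$ is modeled as a fixed-variance Gaussian (a common InfoGAN-style choice, assumed here rather than stated in the note), maximizing $\mathbb{E}[\log Q(\theta|x^*)]$ reduces to minimizing a mean-squared error:

```python
import numpy as np

def info_loss(theta, theta_pred):
    # Variational surrogate for -E[log Q(theta|x*)] under a fixed-variance
    # Gaussian Q: the MSE between the sampled controllers theta and the
    # predictor's estimate Q(x*). Driving it to zero tightens Eq. (17).
    return float(((theta - theta_pred) ** 2).mean())

theta = np.array([0.3, -0.7])
assert info_loss(theta, theta) == 0.0  # perfect recovery maximizes the bound
```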

The generator $G$ therefore gains an additional objective:
$$\underset{G_{enc}, G_{dec}}{\max}\ I\left( \theta;x^* \right) \qquad(18)$$