Age Progression and Regression with Spatial Attention Modules(AAAI20)

Method

Problem Formulation

Define the young face image as $\mathbf{I}_y$ and its corresponding age as $\bm{\alpha}_y$, a one-hot vector.

Given a target age $\bm{\alpha}_o$ (with $\bm{\alpha}_o > \bm{\alpha}_y$), we want to learn an age progressor $G_p$ that generates the older face image $\mathbf{I}_o$, i.e., $\mathbf{I}_o = G_p(\mathbf{I}_y, \bm{\alpha}_o)$.
Note: the subscript $y$ stands for young and $o$ for old.

Note that the training set consists of unpaired aging data, i.e., young and old face images of the same person are not required.

Conversely, for age regression, an age regressor $G_r$ is introduced to reconstruct the young face image from the generated $\mathbf{I}_o$, i.e., $\mathbf{I}_y' = G_r(\mathbf{I}_o, \bm{\alpha}_y)$.

We integrate $G_p$ and $G_r$ into a single framework, yielding a unified solution for both age progression and regression.
As shown in Figure 3, the framework contains two data-flow cycles, the age progression cycle and the age regression cycle, involving four networks: $G_p$, $D_p$, $G_r$, and $D_r$.

Network Architecture

This section describes the network architecture. Since age progression and age regression are symmetric, we refer to $G_p$ and $G_r$ collectively as $G$, and to $D_p$ and $D_r$ collectively as $D$.
Spatial Attention based Generator
$G_p$ and $G_r$ share the same architecture; taking $G_p$ as an example, the structure is shown in Figure 2.

Existing face aging methods use generators with a single pathway, which cannot guarantee that the generator attends only to aging-related regions; the results therefore contain age-irrelevant changes and ghosting artifacts.

To address this, we introduce a spatial attention mechanism and adopt a multi-pathway (multi-branch) generator structure.

Specifically, as shown in Figure 2(a), one FCN $G_p^A$ produces an attention mask while another FCN $G_p^I$ is a regular generator; the attention mask then blends the two images into the final output $\mathbf{I}_o$:
$$\mathbf{I}_o = G_p^A(\mathbf{I}_y, \bm{\alpha}_o)\cdot\mathbf{I}_y + \left(1 - G_p^A(\mathbf{I}_y, \bm{\alpha}_o)\right)\cdot G_p^I(\mathbf{I}_y, \bm{\alpha}_o) \qquad (1)$$
where $G_p^A(\mathbf{I}_y, \bm{\alpha}_o) \in [0,1]^{H \times W}$.
Note: this spatial attention mechanism is the same one used in GANimation.
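As a concrete illustration, the blending in Eq. (1) can be sketched in NumPy (the array shapes and channel-last layout are my assumptions, not specified by the paper):

```python
import numpy as np

def attention_fuse(mask, input_img, gen_img):
    """Blend the input face and the generated face with a spatial
    attention mask, as in Eq. (1).

    mask:      (H, W) array in [0, 1], output of the attention branch G^A
    input_img: (H, W, C) input face I_y
    gen_img:   (H, W, C) output of the image branch G^I
    """
    m = mask[..., None]  # add a channel axis so the mask broadcasts over C
    return m * input_img + (1.0 - m) * gen_img
```

Where the mask is 1 the input pixel is copied through unchanged, so the image branch only needs to synthesize the aging-related regions.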

Discriminator
The discriminator $D$ distinguishes real images from generated ones and additionally performs age regression on the input image.
Note: this essentially adds an auxiliary predictor to the discriminator, the same trick commonly used in face swapping.

$D$ adopts the PatchGAN structure, consisting of six Conv_4x4 layers, each followed by a LeakyReLU.
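The paper does not state the stride or padding of the six Conv_4x4 layers; assuming the usual PatchGAN setting (stride 2, padding 1) and a 256-pixel input, the spatial size of the patch score map can be worked out with the standard convolution output-size formula:

```python
def conv_out(size, k=4, s=2, p=1):
    # standard convolution output-size formula: floor((n + 2p - k) / s) + 1
    return (size + 2 * p - k) // s + 1

size = 256  # assumed input resolution, not stated in the paper
for _ in range(6):
    size = conv_out(size)
# 256 -> 128 -> 64 -> 32 -> 16 -> 8 -> 4
```

So under these assumptions each image yields a 4x4 grid of real/fake patch scores.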

Loss Function

Adversarial Loss
A least-squares adversarial loss is adopted:
$$\begin{aligned}
\mathcal{L}_{GAN} &= \mathbb{E}_{\mathbf{I}_y}\left[\left(D_p^I\left(G_p(\mathbf{I}_y, \bm{\alpha}_o)\right) - 1\right)^2\right] \\
&+ \mathbb{E}_{\mathbf{I}_o}\left[\left(D_p^I(\mathbf{I}_o) - 1\right)^2\right] + \mathbb{E}_{\mathbf{I}_y}\left[D_p^I\left(G_p(\mathbf{I}_y, \bm{\alpha}_o)\right)^2\right] \\
&+ \mathbb{E}_{\mathbf{I}_o}\left[\left(D_r^I\left(G_r(\mathbf{I}_o, \bm{\alpha}_y)\right) - 1\right)^2\right] \\
&+ \mathbb{E}_{\mathbf{I}_y}\left[\left(D_r^I(\mathbf{I}_y) - 1\right)^2\right] + \mathbb{E}_{\mathbf{I}_o}\left[D_r^I\left(G_r(\mathbf{I}_o, \bm{\alpha}_y)\right)^2\right]
\end{aligned} \qquad (2)$$
Note: $G$ minimizes $\mathcal{L}_{GAN}$, while $D$ maximizes $\mathcal{L}_{GAN}$.
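A minimal sketch of the least-squares terms in Eq. (2), written per network rather than folded into one expression (the function names are mine):

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake):
    # the discriminator pushes scores on real images toward 1
    # and scores on generated images toward 0
    return np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)

def lsgan_g_loss(d_fake):
    # the generator pushes the discriminator's scores on fakes toward 1
    return np.mean((d_fake - 1.0) ** 2)
```

At the optimum of each player, its own loss terms reach 0; Eq. (2) simply stacks the progression-cycle and regression-cycle instances of these terms.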

Reconstruction Loss
The L1 norm is used to avoid blurry results:
$$\begin{aligned}
\mathcal{L}_{recon} &= \mathbb{E}_{\mathbf{I}_y}\left\|G_r\left(G_p(\mathbf{I}_y, \bm{\alpha}_o), \bm{\alpha}_y\right) - \mathbf{I}_y\right\|_1 \\
&+ \mathbb{E}_{\mathbf{I}_o}\left\|G_p\left(G_r(\mathbf{I}_o, \bm{\alpha}_y), \bm{\alpha}_o\right) - \mathbf{I}_o\right\|_1
\end{aligned} \qquad (3)$$
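One cycle term of Eq. (3) as a sketch; I take the mean absolute error as the per-pixel L1 distance, which is an assumption about the reduction:

```python
import numpy as np

def cycle_l1(original, reconstructed):
    # per-pixel L1 distance between an image and its cycle reconstruction,
    # e.g. original = I_y and reconstructed = G_r(G_p(I_y, a_o), a_y)
    return np.mean(np.abs(original - reconstructed))
```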

Attention Activation Loss
In the generator, the attention mask tends to saturate to 1. To address this, an attention activation loss is proposed to constrain the mask:
$$\mathcal{L}_{actv} = \mathbb{E}_{\mathbf{I}_y}\left\|G_p^A(\mathbf{I}_y, \bm{\alpha}_o)\right\|_2 + \mathbb{E}_{\mathbf{I}_o}\left\|G_r^A(\mathbf{I}_o, \bm{\alpha}_y)\right\|_2 \qquad (4)$$
Note: this is simply L2 regularization on the attention mask, exactly as in GANimation.
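Eq. (4) in code. Note the formula states the L2 norm of the mask (not the squared norm), so a saturated all-ones mask incurs the maximal penalty:

```python
import numpy as np

def attn_activation_loss(mask):
    # L2 norm of a single attention mask; penalizing it pulls the mask
    # away from the saturated all-ones solution
    return np.linalg.norm(mask)
```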

Age Regression Loss
Denote the age predictor branch added to the discriminator by $D^\alpha$. For the generator, the age of a generated image should be as close as possible to the target age:
$$\begin{aligned}
\mathcal{L}_{reg} &= \mathbb{E}_{\mathbf{I}_y}\left\|D_p^\alpha\left(G_p(\mathbf{I}_y, \bm{\alpha}_o)\right) - \bm{\alpha}_o\right\|_2 \\
&+ \mathbb{E}_{\mathbf{I}_y}\left\|D_p^\alpha(\mathbf{I}_y) - \bm{\alpha}_y\right\|_2 \\
&+ \mathbb{E}_{\mathbf{I}_o}\left\|D_r^\alpha\left(G_r(\mathbf{I}_o, \bm{\alpha}_y)\right) - \bm{\alpha}_y\right\|_2 \\
&+ \mathbb{E}_{\mathbf{I}_o}\left\|D_r^\alpha(\mathbf{I}_o) - \bm{\alpha}_o\right\|_2
\end{aligned} \qquad (5)$$
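The per-image distance inside Eq. (5) is the L2 distance between the age predictor's output and the one-hot age label; a sketch (the 5-bin age encoding below is hypothetical):

```python
import numpy as np

def age_reg_term(pred, target_onehot):
    # L2 distance between the age predictor's output D^alpha(.)
    # and the one-hot age label
    return np.linalg.norm(pred - target_onehot)

# hypothetical 5-bin one-hot age encoding for illustration
target = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
```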

Overall Loss
$$\mathcal{L} = \mathcal{L}_{GAN} + \lambda_{recon}\mathcal{L}_{recon} + \lambda_{actv}\mathcal{L}_{actv} + \lambda_{reg}\mathcal{L}_{reg} \qquad (6)$$
$$\min_{G_p, G_r}\ \max_{D_p, D_r}\ \mathcal{L} \qquad (7)$$
Question: it seems the loss for training $D_p$ and $D_r$ should not be written directly as $\mathcal{L}$.
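The weighted sum of Eq. (6) as a one-liner; the default lambda values below are placeholders, not the paper's settings:

```python
def total_loss(l_gan, l_recon, l_actv, l_reg,
               lam_recon=10.0, lam_actv=0.1, lam_reg=1.0):
    # Eq. (6): weighted sum of the four loss terms;
    # lambda values here are placeholders, not the paper's
    return l_gan + lam_recon * l_recon + lam_actv * l_actv + lam_reg * l_reg
```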