Age Progression and Regression with Spatial Attention Modules(AAAI20)

Method

Problem Formulation

Define the young face image as $\mathbf{I}_y$ and its corresponding age as $\bm{\alpha}_y$, a one-hot vector.

Given a target age $\bm{\alpha}_o$ (with $\bm{\alpha}_o > \bm{\alpha}_y$), we want to learn an age progressor $G_p$ that generates the older face image $\mathbf{I}_o$, i.e., $\mathbf{I}_o = G_p(\mathbf{I}_y, \bm{\alpha}_o)$.
Note: the subscript $y$ stands for young and $o$ for old.

Note that the training set consists of unpaired aging data, i.e., young and old face images of the same person are not required.

Conversely, for age regression, an age regressor $G_r$ is introduced to reconstruct the young face image from the generated $\mathbf{I}_o$, i.e., $\mathbf{I}_y' = G_r(\mathbf{I}_o, \bm{\alpha}_y)$.

We integrate $G_p$ and $G_r$ into a single framework, yielding a unified solution for both age progression and regression.
As shown in Figure 3, the framework contains two data-flow cycles, the age progression cycle and the age regression cycle, involving four networks: $G_p$, $D_p$, $G_r$, and $D_r$.

Network Architecture

This section describes the network architecture. Since age progression and age regression are symmetric, we refer to $G_p$ and $G_r$ collectively as $G$, and to $D_p$ and $D_r$ collectively as $D$.
Spatial Attention based Generator
$G_p$ and $G_r$ share the same architecture; taking $G_p$ as an example, the structure is shown in Figure 2.

Existing face aging methods use generators with a single pathway, which cannot guarantee that the generator attends only to aging-related regions; the results therefore contain age-irrelevant changes and ghosting artifacts.

To address this, we introduce a spatial attention mechanism and adopt a multi-pathway (multi-branch) generator structure.

Specifically, as shown in Figure 2(a), one FCN $G_p^A$ produces an attention mask while another FCN $G_p^I$ is a regular generator; the attention mask then blends the two images into the final output $\mathbf{I}_o$:
$$\mathbf{I}_o = G_p^A(\mathbf{I}_y, \bm{\alpha}_o)\cdot\mathbf{I}_y + \left(1 - G_p^A(\mathbf{I}_y, \bm{\alpha}_o)\right)\cdot G_p^I(\mathbf{I}_y, \bm{\alpha}_o) \qquad (1)$$
where $G_p^A(\mathbf{I}_y, \bm{\alpha}_o) \in [0,1]^{H \times W}$.
Note: this spatial attention mechanism is the same one used in GANimation.
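As a concrete illustration, the blending in Eq. (1) can be sketched in NumPy (the array shapes and channel-last layout are my assumptions, not specified by the paper):

```python
import numpy as np

def attention_fuse(mask, input_img, gen_img):
    """Blend the input face and the generated face with a spatial
    attention mask, as in Eq. (1).

    mask:      (H, W) array in [0, 1], output of the attention branch G^A
    input_img: (H, W, C) input face I_y
    gen_img:   (H, W, C) output of the image branch G^I
    """
    m = mask[..., None]  # add a channel axis so the mask broadcasts over C
    return m * input_img + (1.0 - m) * gen_img
```

Where the mask is 1 the input pixel is copied through unchanged, so the image branch only needs to synthesize the aging-related regions.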

Discriminator
The discriminator $D$ distinguishes real images from generated ones and additionally performs age regression on the input image.
Note: this essentially adds an auxiliary predictor to the discriminator, the same trick commonly used in face swapping.

$D$ adopts the PatchGAN structure, consisting of six Conv_4x4 layers, each followed by a LeakyReLU.
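The paper does not state the stride or padding of the six Conv_4x4 layers; assuming the usual PatchGAN setting (stride 2, padding 1) and a 256-pixel input, the spatial size of the patch score map can be worked out with the standard convolution output-size formula:

```python
def conv_out(size, k=4, s=2, p=1):
    # standard convolution output-size formula: floor((n + 2p - k) / s) + 1
    return (size + 2 * p - k) // s + 1

size = 256  # assumed input resolution, not stated in the paper
for _ in range(6):
    size = conv_out(size)
# 256 -> 128 -> 64 -> 32 -> 16 -> 8 -> 4
```

So under these assumptions each image yields a 4x4 grid of real/fake patch scores.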

Loss Function

Adversarial Loss
A least-squares adversarial loss is adopted:
$$\begin{aligned}
\mathcal{L}_{GAN} &= \mathbb{E}_{\mathbf{I}_y}\left[\left(D_p^I\left(G_p(\mathbf{I}_y, \bm{\alpha}_o)\right) - 1\right)^2\right] \\
&+ \mathbb{E}_{\mathbf{I}_o}\left[\left(D_p^I(\mathbf{I}_o) - 1\right)^2\right] + \mathbb{E}_{\mathbf{I}_y}\left[D_p^I\left(G_p(\mathbf{I}_y, \bm{\alpha}_o)\right)^2\right] \\
&+ \mathbb{E}_{\mathbf{I}_o}\left[\left(D_r^I\left(G_r(\mathbf{I}_o, \bm{\alpha}_y)\right) - 1\right)^2\right] \\
&+ \mathbb{E}_{\mathbf{I}_y}\left[\left(D_r^I(\mathbf{I}_y) - 1\right)^2\right] + \mathbb{E}_{\mathbf{I}_o}\left[D_r^I\left(G_r(\mathbf{I}_o, \bm{\alpha}_y)\right)^2\right]
\end{aligned} \qquad (2)$$
Note: $G$ minimizes $\mathcal{L}_{GAN}$, while $D$ maximizes $\mathcal{L}_{GAN}$.
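A minimal sketch of the least-squares terms in Eq. (2), written per network rather than folded into one expression (the function names are mine):

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake):
    # the discriminator pushes scores on real images toward 1
    # and scores on generated images toward 0
    return np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)

def lsgan_g_loss(d_fake):
    # the generator pushes the discriminator's scores on fakes toward 1
    return np.mean((d_fake - 1.0) ** 2)
```

At the optimum of each player, its own loss terms reach 0; Eq. (2) simply stacks the progression-cycle and regression-cycle instances of these terms.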

Reconstruction Loss
The L1 norm is used to avoid blurry results:
$$\begin{aligned}
\mathcal{L}_{recon} &= \mathbb{E}_{\mathbf{I}_y}\left\|G_r\left(G_p(\mathbf{I}_y, \bm{\alpha}_o), \bm{\alpha}_y\right) - \mathbf{I}_y\right\|_1 \\
&+ \mathbb{E}_{\mathbf{I}_o}\left\|G_p\left(G_r(\mathbf{I}_o, \bm{\alpha}_y), \bm{\alpha}_o\right) - \mathbf{I}_o\right\|_1
\end{aligned} \qquad (3)$$
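One cycle term of Eq. (3) as a sketch; I take the mean absolute error as the per-pixel L1 distance, which is an assumption about the reduction:

```python
import numpy as np

def cycle_l1(original, reconstructed):
    # per-pixel L1 distance between an image and its cycle reconstruction,
    # e.g. original = I_y and reconstructed = G_r(G_p(I_y, a_o), a_y)
    return np.mean(np.abs(original - reconstructed))
```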

Attention Activation Loss
In the generator, the attention mask tends to saturate to 1. To address this, an attention activation loss is proposed to constrain the mask:
$$\mathcal{L}_{actv} = \mathbb{E}_{\mathbf{I}_y}\left\|G_p^A(\mathbf{I}_y, \bm{\alpha}_o)\right\|_2 + \mathbb{E}_{\mathbf{I}_o}\left\|G_r^A(\mathbf{I}_o, \bm{\alpha}_y)\right\|_2 \qquad (4)$$
Note: this is simply L2 regularization on the attention mask, exactly as in GANimation.
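Eq. (4) in code. Note the formula states the L2 norm of the mask (not the squared norm), so a saturated all-ones mask incurs the maximal penalty:

```python
import numpy as np

def attn_activation_loss(mask):
    # L2 norm of a single attention mask; penalizing it pulls the mask
    # away from the saturated all-ones solution
    return np.linalg.norm(mask)
```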

Age Regression Loss
Denote the age predictor branch added to the discriminator by $D^\alpha$. For the generator, the age of a generated image should be as close as possible to the target age:
$$\begin{aligned}
\mathcal{L}_{reg} &= \mathbb{E}_{\mathbf{I}_y}\left\|D_p^\alpha\left(G_p(\mathbf{I}_y, \bm{\alpha}_o)\right) - \bm{\alpha}_o\right\|_2 \\
&+ \mathbb{E}_{\mathbf{I}_y}\left\|D_p^\alpha(\mathbf{I}_y) - \bm{\alpha}_y\right\|_2 \\
&+ \mathbb{E}_{\mathbf{I}_o}\left\|D_r^\alpha\left(G_r(\mathbf{I}_o, \bm{\alpha}_y)\right) - \bm{\alpha}_y\right\|_2 \\
&+ \mathbb{E}_{\mathbf{I}_o}\left\|D_r^\alpha(\mathbf{I}_o) - \bm{\alpha}_o\right\|_2
\end{aligned} \qquad (5)$$
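The per-image distance inside Eq. (5) is the L2 distance between the age predictor's output and the one-hot age label; a sketch (the 5-bin age encoding below is hypothetical):

```python
import numpy as np

def age_reg_term(pred, target_onehot):
    # L2 distance between the age predictor's output D^alpha(.)
    # and the one-hot age label
    return np.linalg.norm(pred - target_onehot)

# hypothetical 5-bin one-hot age encoding for illustration
target = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
```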

Overall Loss
$$\mathcal{L} = \mathcal{L}_{GAN} + \lambda_{recon}\mathcal{L}_{recon} + \lambda_{actv}\mathcal{L}_{actv} + \lambda_{reg}\mathcal{L}_{reg} \qquad (6)$$
$$\min_{G_p, G_r}\ \max_{D_p, D_r}\ \mathcal{L} \qquad (7)$$
Question: it seems the loss for training $D_p$ and $D_r$ should not be written directly as $\mathcal{L}$.
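The weighted sum of Eq. (6) as a one-liner; the default lambda values below are placeholders, not the paper's settings:

```python
def total_loss(l_gan, l_recon, l_actv, l_reg,
               lam_recon=10.0, lam_actv=0.1, lam_reg=1.0):
    # Eq. (6): weighted sum of the four loss terms;
    # lambda values here are placeholders, not the paper's
    return l_gan + lam_recon * l_recon + lam_actv * l_actv + lam_reg * l_reg
```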