Leveraging the Invariant Side of Generative Zero-Shot Learning【CVPR2019】

PDF: Leveraging the Invariant Side of Generative Zero-Shot Learning
code: implemented in PyTorch

Abstract

Conventional zero-shot learning (ZSL) methods generally learn an embedding, e.g., visual-semantic mapping, to handle the unseen visual samples via an indirect manner. In this paper, we take the advantage of generative adversarial networks (GANs) and propose a novel method, named leveraging invariant side GAN (LisGAN), which can directly generate the unseen features from random noises which are conditioned by the semantic descriptions. Specifically, we train a conditional Wasserstein GANs in which the generator synthesizes fake unseen features from noises and the discriminator distinguishes the fake from real via a minimax game. Considering that one semantic description can correspond to various synthesized visual samples, and the semantic description, figuratively, is the soul of the generated features, we introduce soul samples as the invariant side of generative zero-shot learning in this paper. A soul sample is the meta-representation of one class. It visualizes the most semantically-meaningful aspects of each sample in the same category. We regularize that each generated sample (the varying side of generative ZSL) should be close to at least one soul sample (the invariant side) which has the same class label with it. At the zero-shot recognition stage, we propose to use two classifiers, which are deployed in a cascade way, to achieve a coarse-to-fine result. Experiments on five popular benchmarks verify that our proposed approach can outperform state-of-the-art methods with significant improvements.
This paper uses a conditional WGAN to generate features for the unseen classes, then trains a classifier on the seen-class training features together with the generated features; that classifier performs the zero-shot prediction.
The paper makes two contributions:
1. It proposes soul samples to handle the multi-view quality problem of visual objects (see below); soul samples also constrain the fake features produced by the GAN generator.
2. At classification time it proposes a cascade of classifiers to obtain a coarse-to-fine result: the test features that the first classifier labels with high confidence are added to the first classifier's training data, and a second classifier is then trained on the enlarged set. The added data may include unseen-class features. (Gain of 0.5%–1%.)
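The cascade in point 2 can be sketched as follows. This is a minimal illustration, not the released implementation: a trivial nearest-centroid classifier stands in for the paper's classifiers, and the top-2 margin used as the confidence score is an assumed, illustrative choice.

```python
import numpy as np

class NearestCentroid:
    """Tiny stand-in classifier (not the paper's actual classifier)."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def decision(self, X):
        # negative distance to each class centroid: larger = more confident
        return -np.linalg.norm(X[:, None] - self.centroids_[None], axis=2)
    def predict(self, X):
        return self.classes_[self.decision(X).argmax(axis=1)]

def cascade_predict(X_train, y_train, X_test, conf_margin=1.0):
    # coarse classifier on the original training features
    clf1 = NearestCentroid().fit(X_train, y_train)
    scores = clf1.decision(X_test)
    top2 = np.sort(scores, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]     # confidence = margin between top-2 scores
    pseudo = clf1.predict(X_test)
    keep = margin >= conf_margin         # high-confidence test features only
    # enlarge the training set with pseudo-labelled features, retrain (fine step)
    X2 = np.vstack([X_train, X_test[keep]])
    y2 = np.concatenate([y_train, pseudo[keep]])
    return NearestCentroid().fit(X2, y2).predict(X_test)
```

Since the test features may belong to unseen classes, the pseudo-labelled data added in the fine step can pull the second classifier toward the unseen-class distribution, which is where the reported 0.5%–1% gain comes from.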

Network diagram

(figure: LisGAN network architecture)

Notation:
soul samples: obtained by clustering each seen class and taking the cluster centers (K = 3).
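A minimal sketch of how the soul samples could be computed, assuming plain k-means over each seen class's features with K = 3 as above; `soul_samples` and its arguments are illustrative names, not from the released code.

```python
import numpy as np

def soul_samples(features, labels, k=3, iters=20, seed=0):
    """Per-class k-means centroids used as soul samples.

    features: (N, d) seen-class visual features; labels: (N,) class ids.
    A tiny k-means for illustration, not the paper's exact implementation.
    """
    rng = np.random.default_rng(seed)
    souls = {}
    for c in np.unique(labels):
        X = features[labels == c]
        # initialize centers from random samples of this class
        centers = X[rng.choice(len(X), size=min(k, len(X)), replace=False)]
        for _ in range(iters):
            # assign each feature to its nearest center, then recompute centers
            d = np.linalg.norm(X[:, None] - centers[None], axis=2)
            assign = d.argmin(axis=1)
            for j in range(len(centers)):
                if (assign == j).any():
                    centers[j] = X[assign == j].mean(axis=0)
        souls[c] = centers
    return souls
```

Each class thus gets K meta-representations, and the generator is regularized so that every synthesized feature lies close to at least one soul sample of its class.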

Why use soul samples?

(figure from the paper motivating soul samples)

Training procedure (the released code differs slightly from the paper)

The weights of the WGAN generator and discriminator are updated alternately:
$$L_D = E[D(G(z,a))] - E[D(x)] - \beta\, E\big[(\|\nabla_{\hat{x}} D(\hat{x})\|_2 - 1)^2\big]$$
$$L_G = -E[D(G(z,a))] - \lambda\, E\big[\log P(y \mid G(z,a))\big] + L_{R_1} + L_{R_2}$$
where the last two terms, $L_{R_1}$ and $L_{R_2}$, constrain the generation of the fake features.
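A minimal numpy sketch of the two losses above, assuming the discriminator outputs, the gradient norms at the interpolated points $\hat{x}$, the classification log-probabilities, and the two soul-sample regularizers have already been computed elsewhere (all function and argument names are illustrative, not from the released code):

```python
import numpy as np

def d_loss(d_fake, d_real, grad_norms, beta=10.0):
    # L_D = E[D(G(z,a))] - E[D(x)] - beta * E[(||grad||_2 - 1)^2]
    # (WGAN-GP style critic objective with a gradient penalty; beta=10
    # is an assumed default, not a value from the paper)
    return d_fake.mean() - d_real.mean() - beta * ((grad_norms - 1.0) ** 2).mean()

def g_loss(d_fake, log_p_y, l_r1, l_r2, lam=0.01):
    # L_G = -E[D(G(z,a))] - lambda * E[log P(y | G(z,a))] + L_R1 + L_R2
    # l_r1, l_r2: precomputed soul-sample regularizer values (scalars)
    return -d_fake.mean() - lam * log_p_y.mean() + l_r1 + l_r2
```

In the actual PyTorch training loop, `grad_norms` would come from `torch.autograd.grad` on interpolations between real and fake features, and the two networks would be stepped alternately with these objectives.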

Experimental results

(table/figure: experimental results on the five benchmarks)

Conclusion and future work

In this paper, we propose a novel zero-shot learning method by taking advantage of generative adversarial networks. Specially, we deploy conditional WGAN to synthesize fake unseen samples from random noises. To guarantee that each generated sample is close to real ones and their corresponding semantic descriptions, we introduce soul samples regularizations in the GAN generator. At the zero-shot recognition stage, we further propose to use a cascade classifier to fine-tune the accuracy. Extensive experiments on five popular benchmarks verified that our method can outperform previous state-of-the-art ones with remarkable advances. In our future work, we will explore data augmentation with GAN which can be used to synthesize more semantic descriptions to cover more unseen samples.
The highlighted sentences above clearly state the paper's contributions. Future work is to further constrain the generator so as to alleviate the bias problem.