[The Krusty Krab Opens Today] Paper Reading Notes 10-03
StyleNet: Generating Attractive Visual Captions with Styles[paper]
- factored LSTM module and multi-task learning
Abstract
We propose a novel framework named StyleNet to address the task of generating attractive captions for images and videos with different styles. To this end, we devise a novel model component, named factored LSTM, which automatically distills the style factors in the monolingual text corpus. Then at runtime, we can explicitly control the style in the caption generation process so as to produce attractive visual captions with the desired style. Our approach achieves this goal by leveraging two sets of data: 1) factual image/video-caption paired data, and 2) stylized monolingual text data (e.g., romantic and humorous sentences). We show experimentally that StyleNet outperforms existing approaches for generating visual captions with different styles, measured in both automatic and human evaluation metrics on the newly collected FlickrStyle10K image caption dataset, which contains 10K Flickr images with corresponding humorous and romantic captions.
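As a rough picture of what the factored LSTM could look like, here is a minimal PyTorch sketch (not the authors' implementation): the input-to-hidden weights of each gate are factored as W = U·S·V, where U and V are shared across styles and the style-specific factor S is swapped in at run time. The class name, factor sizes, and initialization below are my own assumptions.

```python
# Minimal sketch of the factored-LSTM idea: style lives only in the factor S.
import torch
import torch.nn as nn

class FactoredLSTMCell(nn.Module):
    def __init__(self, input_size, hidden_size, factor_size, num_styles):
        super().__init__()
        gates = 4 * hidden_size  # i, f, o, g gates stacked
        self.U = nn.Parameter(torch.randn(gates, factor_size) * 0.01)       # shared
        self.V = nn.Parameter(torch.randn(factor_size, input_size) * 0.01)  # shared
        # one style-specific factor per style (e.g. index 0 = factual)
        self.S = nn.ParameterList(
            [nn.Parameter(torch.randn(factor_size, factor_size) * 0.01)
             for _ in range(num_styles)]
        )
        self.W_h = nn.Linear(hidden_size, gates)  # hidden-to-hidden weights, shared
        self.hidden_size = hidden_size

    def forward(self, x, state, style_id):
        h, c = state
        W_x = self.U @ self.S[style_id] @ self.V   # style-dependent input weights
        gates = x @ W_x.t() + self.W_h(h)
        i, f, o, g = gates.chunk(4, dim=-1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)
```

Multi-task training would then alternate between paired factual image-caption data (updating all parameters) and stylized monolingual text (updating mainly the corresponding style factor S), which is how the style could be distilled without paired stylized captions.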
Contributions
• To the best of our knowledge, we are the first to investigate the problem of generating attractive image captions with styles without using supervised style-specific image-caption paired data.
• We propose an end-to-end trainable StyleNet framework, which automatically distills the style factors from monolingual textual corpora. In caption generation, the style factor can be explicitly incorporated to produce attractive captions with the desired style.
• We have collected a new Flickr stylized image caption dataset. We expect that this dataset can help advance the research of image captioning with styles.
• We demonstrate that our StyleNet framework and Flickr stylized image caption dataset can also be used to produce attractive video captions.
Face Aging with Identity-Preserved Conditional Generative Adversarial Networks[paper]
Abstract
Face aging is of great importance for cross-age recognition and entertainment-related applications. However, the lack of labeled faces of the same person across a long age range makes it challenging. Because different persons age at different speeds, our face aging approach aims at synthesizing a face whose target age lies in a given age group rather than a face with one specific age. By grouping faces of the target age together, the objective of face aging becomes transferring the aging patterns of faces within the target age group to the face to be aged, while the synthesized face should keep the same identity as the input face. We therefore propose an Identity-Preserved Conditional Generative Adversarial Networks (IPCGANs) framework, in which a Conditional GANs module generates a face that looks realistic and lies in the target age group, an identity-preserved module preserves the identity information, and an age classifier forces the generated face to have the target age. Both qualitative and quantitative experiments show that our method generates faces that are more realistic in terms of image quality, person identity, and age consistency with human observations.
Introduction(part)
However, the lack of training samples for a given person over a long range of years [15][20][3][21] makes face aging still an extremely challenging task in computer vision.
Traditional face aging methods can roughly be categorized into prototype-based approaches [11] and physical model-based approaches [25]. Prototype-based approaches first compute an average face within each age group, and the differences between the average faces of different groups are treated as aging patterns, which are then used for synthesizing an aged face [11]. Consequently, person-specific information is lost, which makes the synthesized faces look unrealistic. By contrast, physical model-based approaches model the shape and texture changes with age in terms of hair colors, muscles, wrinkles, etc. with a parametric model, which usually requires lots of training samples and is computationally expensive.
Inspired by the success of CGANs, we propose an Identity-Preserved Conditional Generative Adversarial Networks (IPCGANs) framework for face aging. Specifically, IPCGANs consists of three modules: a CGANs module, an identity-preserved module and an age classifier. The generator of the CGANs takes an input image and a target age code and generates a face with the target age; the generated face is expected to be indistinguishable from real faces in the target age group by the discriminator. To keep identity information, we introduce a perceptual loss [4] in the objective of IPCGANs. Finally, to guarantee that the synthesized faces fall into the target age group, we feed the generated aged faces to a pre-trained age classifier and add an age classification loss to the objective. Since all components of IPCGANs are differentiable with respect to the model parameters, the whole network can be trained in an end-to-end fashion.
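To make the three loss terms concrete, here is a minimal sketch of how a generator objective in this spirit could be assembled. The module names (generator, discriminator, age_classifier, feat_extractor) and the loss weights are illustrative assumptions, not the paper's exact settings.

```python
# Sketch: adversarial + identity-preserving perceptual + age classification losses.
import torch
import torch.nn.functional as F

def ipcgan_generator_loss(generator, discriminator, age_classifier, feat_extractor,
                          x, target_age_code, target_age_label,
                          lambda_id=10.0, lambda_age=1.0):
    fake = generator(x, target_age_code)

    # 1) adversarial term: the aged face should be judged real for the target age group
    d_out = discriminator(fake, target_age_code)
    adv = F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))

    # 2) identity-preserving perceptual term: match deep features of input and output
    identity = F.mse_loss(feat_extractor(fake), feat_extractor(x))

    # 3) age term: a pre-trained, frozen classifier should predict the target age group
    age = F.cross_entropy(age_classifier(fake), target_age_label)

    return adv + lambda_id * identity + lambda_age * age
```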
The contributions of this paper are summarized as follows:
1. We propose to impose an identity-preserved term and an age classification term in the objective of our IPCGANs. The former keeps the identity of the aged face consistent with the input face, while the latter ensures that the generated faces fall into the target age group. Extensive experiments validate the effectiveness of both terms for preserving the identity information and making the face aging effect evident.
2. In addition to quantitatively evaluating the quality of the synthesized faces, we also conduct face verification and face age classification on the generated aged faces by means of a user study. Our data augmentation experiment further validates the effectiveness of IPCGANs.
3. IPCGANs is not limited to the face aging problem; it is a general framework. Without any modification, IPCGANs can be applied to multi-attribute generation tasks, such as hair colors and facial expressions, which can be useful in imbalanced-data classification scenarios.
UV-GAN: Adversarial Facial UV Map Completion for Pose-invariant Face Recognition[paper]
Abstract
Recently proposed robust 3D face alignment methods establish either dense or sparse correspondence between a 3D face model and a 2D facial image. The use of these methods presents new challenges as well as opportunities for facial texture analysis. In particular, by sampling the image using the fitted model, a facial UV map can be created. Unfortunately, due to self-occlusion, such a UV map is always incomplete. In this paper, we propose a framework for training a Deep Convolutional Neural Network (DCNN) to complete the facial UV map extracted from in-the-wild images. To this end, we first gather complete UV maps by fitting a 3D Morphable Model (3DMM) to various multi-view image and video datasets, as well as leveraging a new 3D dataset with over 3,000 identities. Second, we devise a meticulously designed architecture that combines local and global adversarial DCNNs to learn an identity-preserving facial UV completion model. We demonstrate that by attaching the completed UV to the fitted mesh and generating instances of arbitrary poses, we can increase pose variations for training deep face recognition/verification models and minimise pose discrepancy during testing, which leads to better performance. Experiments on both controlled and in-the-wild UV datasets prove the effectiveness of our adversarial UV completion model. We achieve state-of-the-art verification accuracy of 94.05% under the CFP frontal-profile protocol simply by combining pose augmentation during training with pose discrepancy reduction during testing. We will release the first in-the-wild UV dataset (referred to as WildUV), which comprises complete facial UV maps of 1,892 identities, for research purposes.
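A minimal sketch of a local + global adversarial completion objective in this spirit is below. The generator, the two discriminators, the identity network, the crop_center helper, and the loss weights are all my own assumptions, just to show how the pieces could fit together.

```python
# Sketch: reconstruction + global/local adversarial + identity terms for UV completion.
import torch
import torch.nn.functional as F

def uv_completion_loss(generator, global_disc, local_disc, id_net, crop_center,
                       incomplete_uv, complete_uv,
                       lambda_adv=0.01, lambda_id=0.1):
    completed = generator(incomplete_uv)

    # pixel-wise reconstruction against the ground-truth complete UV map
    rec = F.l1_loss(completed, complete_uv)

    # global discriminator judges the whole completed UV map
    g_out = global_disc(completed)
    adv_global = F.binary_cross_entropy_with_logits(g_out, torch.ones_like(g_out))

    # local discriminator focuses on the (previously self-occluded) facial region
    l_out = local_disc(crop_center(completed))
    adv_local = F.binary_cross_entropy_with_logits(l_out, torch.ones_like(l_out))

    # identity term: embeddings of completed and ground-truth UVs should agree
    ident = F.mse_loss(id_net(completed), id_net(complete_uv))

    return rec + lambda_adv * (adv_global + adv_local) + lambda_id * ident
```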
A great review (in Chinese): CVPR 2018 China Paper Sharing Session, "GANs and Synthesis"
Attentive Generative Adversarial Network for Raindrop Removal from A Single Image[paper]
Deep Joint Rain Detection and Removal from a Single Image[paper]
A Common Framework for Interactive Texture Transfer[paper]
- Note that this work uses a traditional framework rather than neural networks.
DA-GAN: Instance-level Image Translation by Deep Attention Generative Adversarial Networks (with Supplementary Materials)[paper]
- This paper looks quite good! There are already several write-ups:
[1] DA-GAN technology: the computer helps you create wondrous "new species" (popular-science overview, in Chinese)
[2] https://blog.****.net/sinat_31790817/article/details/79658006 (translation; architecture details)
[3] https://github.com/wolegechu/learning-every-day/issues/2 (excellent high-level analysis)
[4] http://chenpeng.online/2018/05/23/DA-GAN/
Abstract
Unsupervised image translation, which aims at translating between two independent sets of images, is challenging in discovering the correct correspondences without paired data. Existing works build upon Generative Adversarial Networks (GANs) such that the distribution of the translated images is indistinguishable from the distribution of the target set. However, such set-level constraints cannot learn instance-level correspondences (e.g. aligned semantic parts in the object configuration task). This limitation often results in false positives (e.g. geometric or semantic artifacts), and further leads to the mode collapse problem. To address the above issues, we propose a novel framework for instance-level image translation by Deep Attention GAN (DA-GAN). Such a design enables DA-GAN to decompose the task of translating samples from two sets into translating instances in a highly-structured latent space. Specifically, we jointly learn a deep attention encoder, and the instance-level correspondences can consequently be discovered by attending on the learned instance pairs. Therefore, the constraints can be exploited on both the set level and the instance level. Comparisons against several state-of-the-art methods demonstrate the superiority of our approach, and its broad applicability, e.g., pose morphing, data augmentation, etc., pushes the boundary of the domain translation problem.
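My rough mental model of the deep attention encoder, written as a minimal PyTorch sketch: predict K spatial attention maps, pool the feature map under each map, and stack the results as instance-level latent codes. The backbone, K, and all sizes are illustrative assumptions, not the authors' architecture.

```python
# Sketch: attention maps over the feature grid give one latent code per attended instance.
import torch
import torch.nn as nn

class DeepAttentionEncoder(nn.Module):
    def __init__(self, in_channels=3, feat_channels=64, num_parts=4):
        super().__init__()
        self.backbone = nn.Sequential(               # shared convolutional features
            nn.Conv2d(in_channels, feat_channels, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_channels, feat_channels, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.attn = nn.Conv2d(feat_channels, num_parts, 1)  # one map per instance/part

    def forward(self, x):
        feats = self.backbone(x)                                    # (B, C, H, W)
        maps = torch.softmax(self.attn(feats).flatten(2), dim=-1)   # (B, K, H*W), each sums to 1
        flat = feats.flatten(2)                                     # (B, C, H*W)
        # attention-weighted pooling -> one latent code per attended instance
        codes = torch.einsum('bkn,bcn->bkc', maps, flat)            # (B, K, C)
        return codes, maps.view(x.size(0), -1, *feats.shape[-2:])
```

Instance-level correspondences could then be enforced between the attended regions of a source image and its translation, on top of the usual set-level adversarial loss.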