DL之Attention: A Detailed Illustrated Guide to the Attention Mechanism: Related Papers, Design Ideas, and Key Steps


 

Related Papers on the Attention Mechanism


1. For more details on the attention mechanism, see the original paper "Neural Machine Translation by Jointly Learning to Align and Translate" (Bahdanau et al.). An improved attention mechanism is presented in "Effective Approaches to Attention-based Neural Machine Translation" (Luong et al.).
Paper link: http://cn.arxiv.org/pdf/1409.0473v7

 

 

Design Ideas of the Attention Mechanism

1. ABSTRACT: Neural machine translation is a recently proposed approach to machine translation. Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance. The models proposed recently for neural machine translation often belong to a family of encoder–decoders and encode a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder–decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly. With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.
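The "(soft-)search" described in the abstract is the additive attention step of the paper: for each target word, an alignment network scores every source annotation, the scores are normalized by a softmax into weights, and the weighted sum of annotations forms a context vector. A minimal NumPy sketch of one such step is below; the dimensions and random parameters (W_a, U_a, v_a) are toy assumptions, named after the paper's scoring function e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j).

```python
import numpy as np

# Toy dimensions (assumptions, not from the paper):
# T_x source words, encoder/decoder state sizes, alignment-layer size.
rng = np.random.default_rng(0)
T_x, enc_dim, dec_dim, attn_dim = 5, 8, 8, 10

h = rng.normal(size=(T_x, enc_dim))    # encoder annotations h_1 .. h_Tx
s_prev = rng.normal(size=(dec_dim,))   # previous decoder state s_{i-1}
W_a = rng.normal(size=(attn_dim, dec_dim))
U_a = rng.normal(size=(attn_dim, enc_dim))
v_a = rng.normal(size=(attn_dim,))

# Alignment scores e_ij, one per source position j.
e = np.tanh(W_a @ s_prev + h @ U_a.T) @ v_a   # shape (T_x,)

# Attention weights alpha_ij = softmax(e_ij) over the source positions.
alpha = np.exp(e - e.max())
alpha /= alpha.sum()

# Context vector c_i = sum_j alpha_ij * h_j, fed to the decoder.
c = alpha @ h                                  # shape (enc_dim,)
```

Because the weights are a softmax rather than a hard selection, every source word contributes a little to each target word, which is exactly the "soft" part of the soft-search.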


 

Key Steps of the Attention Mechanism

1. Four sample alignments found by RNNsearch-50. The x-axis and y-axis of each plot correspond to the words in the source sentence (English) and the generated translation (French), respectively. Each pixel shows the weight αij of the annotation of the j-th source word for the i-th target word (see Eq. (6)), in grayscale (0: black, 1: white). (a) an arbitrary sentence. (b–d) three randomly selected samples among the sentences without any unknown words and of length between 10 and 20 words from the test set.
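The alignment plots described above are just the matrix of weights α_ij rendered as a grayscale image: row i holds the attention distribution used when generating target word i, so each row sums to 1. A small sketch of building such a matrix from raw alignment scores (random toy scores here, standing in for the alignment network's outputs):

```python
import numpy as np

# Hypothetical raw alignment scores for a 6-word target (rows i)
# and 5-word source (columns j); in the model these come from the
# alignment network, here they are random stand-ins.
rng = np.random.default_rng(1)
scores = rng.normal(size=(6, 5))

# Row-wise softmax: alpha[i, j] is the weight of source word j when
# generating target word i, so every row is a probability distribution.
alpha = np.exp(scores - scores.max(axis=1, keepdims=True))
alpha /= alpha.sum(axis=1, keepdims=True)
```

With matplotlib installed, `plt.imshow(alpha, cmap='gray', vmin=0, vmax=1)` reproduces the paper's convention of 0 = black and 1 = white; a near-diagonal band of bright pixels indicates mostly monotonic word alignment.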
