[Paper Reading] Incorporating copying mechanism in sequence-to-sequence learning

Abstract:

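Inspired by the way humans fall back on rote memorization when they meet complex, unfamiliar words, the paper proposes CopyNet, a mechanism that incorporates copying into the seq2seq model. It handles the OOV (out-of-vocabulary) problem reasonably well and can copy some entities from the input directly into the output.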

Introduction:

Some subsequences of a response are copied from the input question, as shown in the figure:

[Figure: dialogue examples in which parts of the response are copied from the input]

The copied segments are usually named entities, dates, and similar items.

Background:

The paper first reviews the seq2seq (RNN encoder-decoder) model.

encoder:

$$h_t = f(x_t, h_{t-1}), \qquad c = \phi(\{h_1, \dots, h_{T_S}\})$$

decoder:

$$s_t = f(y_{t-1}, s_{t-1}, c), \qquad p(y_t \mid y_{<t}, X) = g(y_{t-1}, s_t, c)$$

Attention mechanism:

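The context vector c_t is no longer fixed but changes at every decoding step (so the decoder can selectively remember or forget parts of the source):

$$c_t = \sum_{\tau=1}^{T_S} \alpha_{t\tau} h_\tau, \qquad \alpha_{t\tau} = \frac{e^{\eta(s_{t-1}, h_\tau)}}{\sum_{\tau'} e^{\eta(s_{t-1}, h_{\tau'})}}$$

where η is a scoring function between the previous decoder state and each encoder hidden state.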
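A minimal sketch of the attentive read in plain Python. The function names are hypothetical, and η is assumed to be a plain dot product; the paper only requires some scoring function η(s_{t-1}, h_τ).

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attentive_read(s_prev, M):
    """Return (c_t, alphas) with c_t = sum_tau alpha_{t,tau} * h_tau.

    Assumption: eta(s_{t-1}, h_tau) is a plain dot product.
    """
    alphas = softmax([dot(s_prev, h) for h in M])
    dim = len(M[0])
    c_t = [sum(a * h[k] for a, h in zip(alphas, M)) for k in range(dim)]
    return c_t, alphas

# Toy encoder outputs M = [h_1, h_2, h_3] and previous decoder state s_{t-1}
M = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
s_prev = [2.0, 0.0]
c_t, alphas = attentive_read(s_prev, M)
print([round(a, 3) for a in alphas])  # h_1 and h_3 get equal, larger weights
```

Because c_t is recomputed from s_{t-1} at every step, the weights shift as decoding proceeds.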

CopyNet model:

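The idea resembles rote memorization: little understanding is involved, but the literal form of the copied text is preserved better.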

3.1 Model architecture

The model architecture is shown in the figure:

[Figure: overall architecture of CopyNet]

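Overall it is still an encoder-decoder framework: the encoder transforms the source input into a suitable representation, and the decoder then turns that representation into target words.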

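The encoder is a bidirectional RNN that transforms the source sequence into a sequence of hidden states h_t of the same length, where each h_t is associated with the source word x_t. The outputs {h_1, h_2, ..., h_{T_S}} together form a matrix M, which is the input to the decoder.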
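To make the shape of M concrete, here is a toy bidirectional RNN in plain Python. It assumes vanilla tanh cells with random weights instead of trained parameters, and all helper names are hypothetical; the only point is that M has one row per source position, each combining a forward and a backward state.

```python
import math
import random

def rnn_step(x, h, W_x, W_h):
    """One vanilla RNN step: h' = tanh(W_x x + W_h h)."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(W_x[i], x))
                  + sum(w * hj for w, hj in zip(W_h[i], h)))
        for i in range(len(W_h))
    ]

def run_rnn(xs, W_x, W_h):
    """Run the RNN over a sequence and return every hidden state."""
    h = [0.0] * len(W_h)
    states = []
    for x in xs:
        h = rnn_step(x, h, W_x, W_h)
        states.append(h)
    return states

def bi_rnn_encode(xs, fwd, bwd):
    """Return M = [h_1, ..., h_Ts] with h_t = [forward_t ; backward_t]."""
    forward = run_rnn(xs, *fwd)
    backward = run_rnn(xs[::-1], *bwd)[::-1]  # re-align to source order
    return [f + b for f, b in zip(forward, backward)]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
emb_dim, hidden = 3, 2
# Hypothetical embeddings for a 4-token source sentence
xs = [[rng.uniform(-1.0, 1.0) for _ in range(emb_dim)] for _ in range(4)]
fwd = (random_matrix(hidden, emb_dim, rng), random_matrix(hidden, hidden, rng))
bwd = (random_matrix(hidden, emb_dim, rng), random_matrix(hidden, hidden, rng))
M = bi_rnn_encode(xs, fwd, bwd)
print(len(M), len(M[0]))  # 4 4: one row per source token, 2 * hidden columns
```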

The decoder takes M as input and differs from a conventional RNN decoder in the following ways:

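Prediction: words are predicted with two modes, the generate mode and the copy mode; the latter copies words directly from the source sequence.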

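State update: the state at time t-1 is used to update the state at time t. Unlike a conventional decoder, CopyNet feeds in not only the word embedding of y_{t-1} but also the hidden states in M at the positions where y_{t-1} occurs in the source, because location information matters for copying.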

Reading M: CopyNet reads M selectively to obtain both content and location information.

3.2 Prediction with copying and generation

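Let V be the vocabulary of frequent words, UNK stand for any OOV word, and X be the set of unique (distinct) words in the input sequence; X may contain words that are not in V. The decoder's output vocabulary is the union of the three: V ∪ {UNK} ∪ X.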
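A small sketch of how the three sets could be built for one source sentence. `build_vocab`, `output_vocab` and the `<unk>` token string are hypothetical names; the only properties taken from the paper are that V holds frequent words and that X keeps each distinct source word once, possibly including words outside V.

```python
from collections import Counter

UNK = "<unk>"  # hypothetical token string for out-of-vocabulary words

def build_vocab(corpus_tokens, size):
    """V: the `size` most frequent words of the training corpus."""
    return [w for w, _ in Counter(corpus_tokens).most_common(size)]

def output_vocab(V, source_tokens):
    """Return (X, V ∪ {UNK} ∪ X) for one source sentence.

    X keeps each distinct source word once, in order of first appearance,
    so it may contain words that are not in V.
    """
    X = list(dict.fromkeys(source_tokens))
    in_v = set(V)
    full = list(V) + [UNK] + [x for x in X if x not in in_v]
    return X, full

corpus = "hello my name is tony hello my name is".split()
V = build_vocab(corpus, 4)
X, full = output_vocab(V, ["hi", "my", "name", "is", "Jebara", "my"])
print(X)     # ['hi', 'my', 'name', 'is', 'Jebara']
print(full)  # V, then '<unk>', then the OOV source words 'hi' and 'Jebara'
```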

Given M and the current decoder state, the final prediction is the sum of the two modes:

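$$p(y_t \mid s_t, y_{t-1}, c_t, M) = p(y_t, \mathrm{g} \mid s_t, y_{t-1}, c_t, M) + p(y_t, \mathrm{c} \mid s_t, y_{t-1}, c_t, M)$$

where g denotes the generate mode, c the copy mode, and ψ_g, ψ_c are the scoring functions of the two modes. Both terms share one normalization term Z:

$$p(y_t, \mathrm{g} \mid \cdot) = \begin{cases} \frac{1}{Z} e^{\psi_g(y_t)}, & y_t \in V \\ 0, & y_t \in X \cap \bar{V} \\ \frac{1}{Z} e^{\psi_g(\mathrm{UNK})}, & y_t \notin V \cup X \end{cases} \qquad p(y_t, \mathrm{c} \mid \cdot) = \begin{cases} \frac{1}{Z} \sum_{j: x_j = y_t} e^{\psi_c(x_j)}, & y_t \in X \\ 0, & \text{otherwise} \end{cases}$$

$$Z = \sum_{v \in V \cup \{\mathrm{UNK}\}} e^{\psi_g(v)} + \sum_{x \in X} e^{\psi_c(x)}$$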

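It may be clearer to look at the scoring functions of the two modes:

$$\psi_g(y_t = v_i) = \mathbf{v}_i^\top W_o s_t, \quad v_i \in V \cup \{\mathrm{UNK}\}$$

$$\psi_c(y_t = x_j) = \sigma\big(h_j^\top W_c\big) s_t, \quad x_j \in X$$

Here v_i is the one-hot vector of word v_i, W_o and W_c are learned weight matrices, and σ is a nonlinear activation (tanh). These are scores rather than probabilities; after exponentiation and normalization over both modes they give the probability that y_t is the specific vocabulary word v_i or the specific source word x_j.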
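The following sketch combines the two modes into one distribution, assuming the scores ψ_g and ψ_c for the current step have already been computed. It illustrates two points: both modes share one normalizer Z, and a source word that occurs several times accumulates copy mass from every position. `copynet_distribution` is a hypothetical name, not an API from the paper.

```python
import math

UNK = "<unk>"  # hypothetical token string

def copynet_distribution(gen_scores, copy_scores, source_tokens):
    """Mix generate-mode and copy-mode scores into p(y_t | s_t, y_{t-1}, c_t, M).

    gen_scores:    dict word -> psi_g(word) for every word in V ∪ {UNK}
    copy_scores:   list of psi_c(x_j), one per source position j
    source_tokens: list of source words x_1 .. x_Ts
    """
    # One shared normalizer Z over both modes.
    Z = sum(math.exp(s) for s in gen_scores.values())
    Z += sum(math.exp(s) for s in copy_scores)

    # Generate mode: only V ∪ {UNK} gets mass; OOV source words get 0 here.
    probs = {w: math.exp(s) / Z for w, s in gen_scores.items()}
    # Copy mode: add the mass of every position j where x_j equals the word.
    for word, s in zip(source_tokens, copy_scores):
        probs[word] = probs.get(word, 0.0) + math.exp(s) / Z
    return probs

gen_scores = {"hello": 1.0, "my": 0.5, "name": 0.5, UNK: 0.0}
source = ["my", "name", "is", "Jebara", "Jebara"]
copy_scores = [0.0, 0.0, 0.0, 2.0, 1.0]
probs = copynet_distribution(gen_scores, copy_scores, source)
print(max(probs, key=probs.get))  # 'Jebara': OOV, reachable only by copying
```

A target word that is neither in V nor in the source falls back to `probs[UNK]`, which corresponds to the third case of the generate mode.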

3.3 State update

State update in the decoder:

$$s_t = f(y_{t-1}, s_{t-1}, c_t)$$

In CopyNet, y_{t-1} is represented as [e(y_{t-1}); ζ(y_{t-1})]^T, the concatenation of its word embedding and a selective read of M:

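$$\zeta(y_{t-1}) = \sum_{\tau=1}^{T_S} \rho_{t\tau} h_\tau, \qquad \rho_{t\tau} = \begin{cases} \frac{1}{K} p(x_\tau, \mathrm{c} \mid s_{t-1}, M), & x_\tau = y_{t-1} \\ 0, & \text{otherwise} \end{cases}$$

where K is the normalization term $K = \sum_{\tau': x_{\tau'} = y_{t-1}} p(x_{\tau'}, \mathrm{c} \mid s_{t-1}, M)$.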

ζ(y_{t-1}) is also a read of M, but a selective one (nonzero only at the positions where y_{t-1} occurs in the source), designed specifically for the copy mode.
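A sketch of the selective read and of the decoder input it feeds, assuming the copy-mode probabilities p(x_τ, c | s_{t-1}, M) from the previous step are available as a list. Function names and the toy numbers are hypothetical; returning a zero vector when y_{t-1} does not occur in the source is this sketch's way of handling the case where every ρ is 0.

```python
def selective_read(y_prev, source_tokens, copy_probs, M):
    """zeta(y_{t-1}) = sum_tau rho_{t,tau} * h_tau.

    copy_probs[tau] holds p(x_tau, c | s_{t-1}, M) from the previous step.
    rho is nonzero only where x_tau == y_{t-1}, renormalized by K.
    """
    dim = len(M[0])
    weights = [p if x == y_prev else 0.0 for x, p in zip(source_tokens, copy_probs)]
    K = sum(weights)
    if K == 0.0:
        # y_{t-1} does not occur in the source: nothing to read.
        return [0.0] * dim
    rho = [w / K for w in weights]
    return [sum(r * h[k] for r, h in zip(rho, M)) for k in range(dim)]

def decoder_input(embedding, zeta):
    """[e(y_{t-1}); zeta(y_{t-1})], fed into s_t = f(., s_{t-1}, c_t)."""
    return embedding + zeta

source = ["my", "name", "is", "Jebara", "Jebara"]
M = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [0.0, 2.0]]
copy_probs = [0.05, 0.05, 0.1, 0.3, 0.1]
zeta = selective_read("Jebara", source, copy_probs, M)
x_in = decoder_input([0.1, 0.2], zeta)  # hypothetical 2-d embedding of "Jebara"
print(zeta)  # approximately [1.5, 0.5]
```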

3.4 Hybrid addressing of M

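M contains both content information and location information. In CopyNet, the attentive read is driven by content (semantic) information and the language model, as in the generate mode; in the copy mode, reading is controlled by location information.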

The location-based update works as follows:

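$$\zeta(y_{t-1}) \xrightarrow{\text{update}} s_t \xrightarrow{\text{predict}} p(y_t \mid \cdot) \xrightarrow{\text{sel. read}} \zeta(y_t)$$

Because each h_τ also carries location information, selectively reading the position that was just copied helps the model move on to the next position, so CopyNet can copy a contiguous subsequence of the source.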

Learning:

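$$\mathcal{L} = -\frac{1}{N} \sum_{k=1}^{N} \sum_{t=1}^{T} \log p\big(y_t^{(k)} \mid y_{<t}^{(k)}, X^{(k)}\big)$$

The model is trained end to end. Because the probability of each target word is a mixture of the two modes, no extra labels indicating which mode produced a word are needed.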
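A minimal sketch of the training objective, assuming the probability the model assigned to each gold word has already been computed for every target position. `nll_loss` and the toy probabilities are hypothetical.

```python
import math

def nll_loss(gold_probs):
    """L = -(1/N) * sum_k sum_t log p(y_t^(k) | y_<t^(k), X^(k)).

    gold_probs[k][t] is the probability the model gave to the gold word y_t
    of example k, i.e. the generate-mode plus copy-mode mass for that word.
    """
    N = len(gold_probs)
    total = sum(math.log(p) for seq in gold_probs for p in seq)
    return -total / N

# Two hypothetical training pairs with target lengths 2 and 1
loss = nll_loss([[0.5, 0.25], [1.0]])
print(round(loss, 4))  # 1.0397, i.e. 1.5 * ln 2
```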

Experiments:

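There are three experiments: synthetic data constructed with simple rules, real-world text summarization data, and data for a simple dialogue system.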
