
"A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task": Stanford Attentive Reader

Category: Articles • 2025-02-18 14:48:46

Preface

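I won't go into the paper's other details; this post only looks at the network architecture.
Probably because the paper is fairly old, there is only one implementation of it on GitHub... and it's in Python 2.7.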

Model Architecture


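The model has three parts:
Part 1, encoding: the passage words are encoded in the usual way. Each word is first mapped to an embedding through an embedding table and then fed through a bidirectional GRU; the forward and backward hidden states at each position are concatenated to form that token's representation. The question is encoded the same way. The paper only states that the encoded question has dimension h; presumably, as in other papers, it is the concatenation of the final forward and backward hidden states.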
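
A minimal NumPy sketch of this encoding step, just to make the shapes concrete. Everything here is an assumption for illustration: the toy sizes, the token ids, the random (untrained) weights, and the hand-rolled GRU cell (written in Cho et al.'s form, with the reset gate applied to h before the recurrent matrix). Following the guess above, the question vector q is taken to be the concatenation of the final forward state and the final backward state (the backward GRU's state after it has read back to the first token).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell with random, untrained weights (illustration only)."""

    def __init__(self, d_in, d_hid, rng):
        # stacked parameters for the update gate z, reset gate r and candidate state n
        self.W = rng.normal(0.0, 0.1, (3, d_hid, d_in))
        self.U = rng.normal(0.0, 0.1, (3, d_hid, d_hid))
        self.b = np.zeros((3, d_hid))

    def step(self, x, h):
        z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])
        r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])
        n = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])
        return z * h + (1.0 - z) * n  # interpolate between old state and candidate

def bigru(xs, fwd, bwd, d_hid):
    """Run a forward and a backward GRU over xs (shape (T, d_in))."""
    T = xs.shape[0]
    h_f, h_b = np.zeros(d_hid), np.zeros(d_hid)
    fwd_states, bwd_states = [], [None] * T
    for t in range(T):                 # left to right
        h_f = fwd.step(xs[t], h_f)
        fwd_states.append(h_f)
    for t in reversed(range(T)):       # right to left
        h_b = bwd.step(xs[t], h_b)
        bwd_states[t] = h_b
    # per-token representation: [forward state ; backward state] at each position
    per_token = np.stack([np.concatenate([f, b]) for f, b in zip(fwd_states, bwd_states)])
    # final states: forward after the last token, backward after reaching the first token
    final = np.concatenate([h_f, h_b])
    return per_token, final

rng = np.random.default_rng(0)
vocab_size, d_emb, h_tilde = 50, 8, 6            # toy sizes; h = 2 * h_tilde = 12
E = rng.normal(0.0, 0.1, (vocab_size, d_emb))    # embedding table
passage_ids = [3, 17, 42, 5, 9]                  # hypothetical token ids
question_ids = [7, 17, 2]

p_fwd, p_bwd = GRUCell(d_emb, h_tilde, rng), GRUCell(d_emb, h_tilde, rng)
q_fwd, q_bwd = GRUCell(d_emb, h_tilde, rng), GRUCell(d_emb, h_tilde, rng)

p_tilde, _ = bigru(E[passage_ids], p_fwd, p_bwd, h_tilde)  # one row per passage token
_, q = bigru(E[question_ids], q_fwd, q_bwd, h_tilde)       # question vector (assumed form)
print(p_tilde.shape, q.shape)  # (5, 12) (12,)
```

The passage and question use separate GRUs here, mirroring "another bidirectional RNN" for the question; sharing weights would be an equally plausible choice for a sketch.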


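Part 2, attention: this works as in other papers; only the attention scoring function changes, to a bilinear term:
$\alpha_i = \mathrm{softmax}_i(\mathbf{q}^\top \mathbf{W}_s \tilde{\mathbf{p}}_i)$, $\quad \mathbf{o} = \sum_i \alpha_i \tilde{\mathbf{p}}_i$
Here $\mathbf{W}_s$ is an $h \times h$ parameter matrix that is learned together with the rest of the network.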
Part 3, prediction: the details are covered in the comparison below.
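
The bilinear attention from Part 2, written out as a small NumPy sketch. It assumes p_tilde holds the m contextual passage vectors (one row per token, width h), q is the question vector of width h, and W_s is an h × h matrix; random values stand in for trained parameters, and the function name bilinear_attention is hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # shift by max for numerical stability
    return e / e.sum()

def bilinear_attention(p_tilde, q, W_s):
    """alpha_i = softmax_i(q^T W_s p_i); o = sum_i alpha_i p_i."""
    scores = p_tilde @ (W_s.T @ q)  # entry i is p_i . (W_s^T q) = q^T W_s p_i
    alpha = softmax(scores)         # attention weights over the m passage tokens
    o = alpha @ p_tilde             # weighted sum of contextual vectors, shape (h,)
    return alpha, o

rng = np.random.default_rng(0)
m, h = 5, 12                        # toy sizes
p_tilde = rng.normal(size=(m, h))   # stand-in for the Bi-GRU passage outputs
q = rng.normal(size=h)              # stand-in for the question vector
W_s = rng.normal(0.0, 0.1, (h, h))  # trainable in the real model; random here
alpha, o = bilinear_attention(p_tilde, q, W_s)
print(alpha.round(3), o.shape)
```

Computing W_s.T @ q once and then a single matrix-vector product over all tokens gives exactly q^T W_s p_i for every position, without a Python loop.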


Comparison with the Attentive Reader

First

The attention matching function is different, and this change contributes a great deal to the improved results.
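
A side-by-side NumPy sketch of the two scoring functions. The Attentive Reader form, w_ms^T tanh(W_ym p_i + W_um q), follows Hermann et al. (2015); the Stanford form is the bilinear q^T W_s p_i. Structurally, the bilinear score multiplies q and p_i directly, while the Attentive Reader adds their projections inside a tanh layer before projecting with w_ms. The weight shapes (including the tanh layer's width k), the toy sizes, and the random values are assumptions, not trained parameters.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # shift by max for numerical stability
    return e / e.sum()

def attentive_reader_scores(p_tilde, q, W_ym, W_um, w_ms):
    """Attentive Reader (Hermann et al., 2015): w_ms^T tanh(W_ym p_i + W_um q)."""
    hidden = np.tanh(p_tilde @ W_ym.T + W_um @ q)  # (m, k); q term broadcast to every token
    return hidden @ w_ms                            # (m,)

def bilinear_scores(p_tilde, q, W_s):
    """Stanford Attentive Reader: q^T W_s p_i for every token i."""
    return p_tilde @ (W_s.T @ q)                    # (m,)

rng = np.random.default_rng(1)
m, h, k = 5, 12, 8                                  # toy sizes
p_tilde = rng.normal(size=(m, h))
q = rng.normal(size=h)
W_ym, W_um = rng.normal(0.0, 0.1, (k, h)), rng.normal(0.0, 0.1, (k, h))
w_ms = rng.normal(0.0, 0.1, k)
W_s = rng.normal(0.0, 0.1, (h, h))

alpha_ar = softmax(attentive_reader_scores(p_tilde, q, W_ym, W_um, w_ms))
alpha_sar = softmax(bilinear_scores(p_tilde, q, W_s))
print(alpha_ar.round(3))
print(alpha_sar.round(3))
```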

Second

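Unlike the Attentive Reader, this model predicts directly from the attention output o, without combining it with the question embedding q as the Attentive Reader does, and it still performs well.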
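
A NumPy sketch of this second difference, with hypothetical shapes and random weights. The Attentive Reader first builds a joint embedding g = tanh(W_rg r + W_ug u) from the attention output (r there, o here) and the question embedding (u there, q here) and scores answers from g; the Stanford model scores candidates directly from o. W_out and W_a are illustrative names for the output matrices, not identifiers from either paper's code.

```python
import numpy as np

def attentive_reader_logits(o, q, W_rg, W_ug, W_out):
    """Hermann et al. (2015) style: score answers from g = tanh(W_rg o + W_ug q)."""
    g = np.tanh(W_rg @ o + W_ug @ q)  # joint document/question embedding, shape (k,)
    return W_out @ g                  # one logit per output word, shape (V,)

def stanford_logits(o, W_a):
    """Stanford Attentive Reader: score candidate a as W_a[a] . o, with no q term."""
    return W_a @ o                    # shape (V,)

rng = np.random.default_rng(2)
h, k, V = 12, 8, 20                   # toy sizes
o = rng.normal(size=h)                # attention output from Part 2
q = rng.normal(size=h)                # question embedding
W_rg, W_ug = rng.normal(0.0, 0.1, (k, h)), rng.normal(0.0, 0.1, (k, h))
W_out = rng.normal(0.0, 0.1, (V, k))
W_a = rng.normal(0.0, 0.1, (V, h))

print(attentive_reader_logits(o, q, W_rg, W_ug, W_out).shape)  # (20,)
print(stanford_logits(o, W_a).shape)                           # (20,)
```

Dropping the q term removes two matrices and a nonlinearity from the output layer; the question still influences the prediction, but only through the attention weights that produced o.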

Third

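At prediction time this model does not score the whole vocabulary; it only considers entities that appear in the passage.
The funny part is the last sentence of the quote below: the authors say only the first change seems important, and the other two are just there to keep the model simple. So the core of the model is really just a different attention matching function, exactly as Zhang Junlin (张俊林) pointed out.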
The original model considers all the words from the vocabulary V in making predictions. We think this is unnecessary, and only predict among entities which appear in the passage. Of these changes, only the first seems important; the other two just aim at keeping the model simple.
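
A toy NumPy sketch of this third change, restricting the argmax to entities that occur in the passage, i.e. picking the a in p ∩ E that maximizes W_a^T o. The token ids, the entity set, and the hand-picked W_a are made up so the effect is visible, and indexing the rows of W_a by token id is an assumption; predict_entity is a hypothetical helper name.

```python
import numpy as np

def predict_entity(o, W_a, passage_ids, entity_ids):
    """argmax of W_a[a] . o over candidates a that are entities AND occur in the passage."""
    candidates = sorted(set(passage_ids) & set(entity_ids))  # p ∩ E
    scores = W_a[candidates] @ o                            # score only the candidates
    return candidates[int(np.argmax(scores))]

# toy example: hand-picked so that the full-vocabulary argmax is a non-entity
W_a = np.array([[5.0, 0.0], [4.0, 0.0], [3.0, 0.0],
                [2.0, 0.0], [1.0, 0.0], [0.0, 0.0]])  # rows indexed by token id
o = np.array([1.0, 0.0])
passage_ids = [0, 3, 2, 4, 5]
entity_ids = {1, 3, 4}  # entity 1 never occurs in this passage

print(int(np.argmax(W_a @ o)))                          # 0 (full vocabulary)
print(predict_entity(o, W_a, passage_ids, entity_ids))  # 3 (entities in the passage)
```

Scoring the full vocabulary picks id 0 here, which is not even an entity; the restricted version can only return an entity that actually occurs in the passage.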

END
