Fusion Convolutional Attention Network for Opinion Spam Detection(ICONIP 2019)

Contributions:

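  1. The paper designs a hierarchical network that automatically learns semantic representations and user information, so no features need to be hand-crafted and the method is not tied to a particular dataset.
  2. The paper proposes a fusion attention mechanism to fuse user-word and user-sentence features.
  3. Experiments show that the method achieves large improvements on four datasets.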

Model:

D: review dataset

U: users' metadata

Classification function: ![](https://img-blog.****img.cn/2020083015340468.png#pic_center)

3.1 user representation

3.2 text representation

The authors split each document into L sentences, each containing T words. Each sentence is first passed through a convolution layer whose kernels span a window of h consecutive words.

Sliding one kernel over the T words of a sentence yields T-h+1 convolution features, which form the sentence's representation for that kernel.

Using m filters then produces m such feature vectors, i.e., a feature matrix with T-h+1 rows and m columns.
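A minimal plain-Python sketch of this convolution step, assuming each word is already a d-dimensional embedding and using ReLU as the nonlinearity (the activation is an assumption, and `conv1d_sentence` and its input layout are hypothetical). It shows why a sentence of T words and m filters of width h gives a (T-h+1)×m feature matrix.

```python
import random

def conv1d_sentence(words, filters, biases):
    """Slide m filters of width h over one sentence (hypothetical helper).

    words:   T x d list of word embeddings
    filters: m filters, each an h x d weight matrix
    biases:  m scalar biases
    Returns a (T-h+1) x m matrix; row i holds the m filter responses
    for the window of words i .. i+h-1.
    """
    T = len(words)
    h = len(filters[0])
    features = []
    for i in range(T - h + 1):
        window = words[i:i + h]  # h consecutive word vectors
        row = []
        for W, b in zip(filters, biases):
            # Sum of element-wise products between filter and window, plus bias.
            z = b + sum(w_kj * x_kj
                        for w_k, x_k in zip(W, window)
                        for w_kj, x_kj in zip(w_k, x_k))
            row.append(max(0.0, z))  # ReLU: assumed nonlinearity
        features.append(row)
    return features

if __name__ == "__main__":
    random.seed(0)
    T, d, h, m = 10, 4, 3, 5
    words = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(T)]
    filters = [[[random.uniform(-1, 1) for _ in range(d)] for _ in range(h)]
               for _ in range(m)]
    C = conv1d_sentence(words, filters, [0.0] * m)
    print(len(C), len(C[0]))  # 8 5
```

With T = 10 words and kernel width h = 3, there are 10 - 3 + 1 = 8 window positions, hence 8 rows, and one column per filter.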

3.3 word-user fusion attention layer

To learn user-aware text representations at the word level, the authors propose word-user fusion attention, which incorporates user information into the word-level vectors.

The fusion combines the convolutional word features with a matrix made of T-h+1 copies of the user representation (one copy per convolution position). $W_e \in R^{m \times m}$ and $W_u \in R^{m \times m}$ are weight matrices, and the result is the user-aware word representation $Z \in R^{(T-h+1) \times m}$.
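One plausible reading of the word-user fusion, sketched under explicit assumptions: the T-h+1 user copies are stacked into a matrix U and combined additively as Z = tanh(C W_e + U W_u), where C is the convolution feature matrix. The additive form and the tanh are assumptions rather than the paper's stated formula, and `word_user_fusion` is a hypothetical name.

```python
import math

def matmul(A, B):
    """Multiply an r x k matrix A by a k x c matrix B (lists of lists)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def word_user_fusion(C, u, W_e, W_u):
    """Fuse the (T-h+1) x m convolution features C with an m-dim user vector u.

    Assumed form: Z = tanh(C W_e + U W_u), where U stacks T-h+1 copies
    of u (one per convolution position). W_e and W_u are m x m.
    """
    U = [list(u) for _ in C]          # T-h+1 copies of the user vector
    CW = matmul(C, W_e)               # (T-h+1) x m
    UW = matmul(U, W_u)               # (T-h+1) x m
    return [[math.tanh(x + y) for x, y in zip(r1, r2)]
            for r1, r2 in zip(CW, UW)]

if __name__ == "__main__":
    I = [[1.0, 0.0], [0.0, 1.0]]
    Z = word_user_fusion([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [0.5, -0.5], I, I)
    print(len(Z), len(Z[0]))  # 3 2
```

Because U simply repeats u, the same user signal is added at every window position; the learned weights decide how strongly it shifts each word feature.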

From the user's perspective, not all words reflect the user's preferences equally, so an attention layer is used to capture the importance of each word.

In this attention layer, $W_z \in R^{m \times n}$ and $V \in R^{(T-h+1) \times n}$ are learned parameters, and $v_w \in R^{n \times 1}$ is a context vector that is randomly initialized and learned jointly during training. The resulting sentence vector $s^{(h)} \in R^m$ is a weighted sum of the word representations.

In the CNN, the kernel sizes are set to $h \in \{3,4,5\}$, so the three sentence vectors are concatenated into $s = [s^{(3)}; s^{(4)}; s^{(5)}]$, where $s \in R^{3m}$.
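The word-level attention and the concatenation over kernel sizes can be sketched as follows. This assumes the standard hierarchical-attention form V = tanh(Z W_z), α = softmax(V v_w), s^{(h)} = Σ_i α_i Z_i, and that W_z and v_w are shared across kernel sizes; the post lists V among the learned quantities, so the paper's exact formula may differ. `word_attention` and `sentence_vector` are hypothetical names.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def word_attention(Z, W_z, v_w):
    """Pool the (T-h+1) x m matrix Z into a sentence vector s^(h) in R^m.

    Assumed form: V = tanh(Z W_z) with W_z of shape m x n,
    alpha = softmax(V v_w), s^(h) = sum_i alpha_i * Z[i].
    """
    n = len(v_w)
    V = [[math.tanh(sum(z * W_z[k][j] for k, z in enumerate(row)))
          for j in range(n)] for row in Z]                  # (T-h+1) x n
    alpha = softmax([sum(a * b for a, b in zip(r, v_w)) for r in V])
    m = len(Z[0])
    s_h = [sum(alpha[i] * Z[i][j] for i in range(len(Z))) for j in range(m)]
    return s_h, alpha

def sentence_vector(Z_by_h, W_z, v_w):
    """Concatenate s^(3), s^(4), s^(5) into s in R^{3m}.

    Z_by_h maps each kernel size h to its fused matrix Z (T-h+1 rows).
    Sharing W_z and v_w across kernel sizes is an assumption.
    """
    s = []
    for h in (3, 4, 5):
        s_h, _ = word_attention(Z_by_h[h], W_z, v_w)
        s.extend(s_h)
    return s
```

Each kernel size produces a Z with a different number of rows (T-h+1), but attention always pools it to m values, which is why the concatenation has a fixed length of 3m.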

3.4 sentence-user fusion attention layer

After sentence encoding, the document is represented as a matrix $S$ whose L rows are the sentence vectors $s \in R^{3m}$.

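To strengthen the influence of user information, sentence-user fusion attention picks out the sentences that matter most given the user and combines them by a weighted sum into the document representation.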

Learned weight matrices produce $M \in R^{L \times m}$, the hidden representation of S, and $v \in R^{m \times 1}$ is the context vector used to score each sentence.
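A hedged sketch of sentence-user fusion attention that mirrors the word level: M = tanh(S W_s + U W_us) with U holding L copies of the user vector u, β = softmax(M v), and a document vector d = Σ_i β_i S_i. The names W_s, W_us, d and the exact fusion form are assumptions for illustration; only the shapes of M and v come from the text.

```python
import math

def matmul(A, B):
    """Multiply an r x k matrix A by a k x c matrix B (lists of lists)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sentence_user_attention(S, u, W_s, W_us, v):
    """Attention-pool the L x 3m sentence matrix S into a document vector.

    Assumed form: M = tanh(S W_s + U W_us), U = L copies of the user vector u,
    so M is L x m; beta = softmax(M v); doc = sum_i beta_i * S[i] (length 3m).
    """
    U = [list(u) for _ in S]                       # L copies of u
    SW = matmul(S, W_s)                            # L x m
    UW = matmul(U, W_us)                           # L x m
    M = [[math.tanh(x + y) for x, y in zip(r1, r2)] for r1, r2 in zip(SW, UW)]
    beta = softmax([sum(a * b for a, b in zip(row, v)) for row in M])
    doc = [sum(b * row[j] for b, row in zip(beta, S)) for j in range(len(S[0]))]
    return doc, beta
```

The returned weights `beta` are one per sentence and sum to 1, so they can be inspected to see which sentences the model treats as most informative for a given user.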

3.5 spam classification

loss function
overall loss function
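As an illustration only, assuming the common setup of a softmax output layer over the document vector and a cross-entropy loss averaged over a batch (the paper's overall loss may include further terms), the classification step could look like this. `classify`, `cross_entropy`, W_c, b_c, and the convention that label 1 means spam are hypothetical.

```python
import math

def classify(doc, W_c, b_c):
    """Class probabilities p = softmax(W_c d + b_c) (assumed output layer).

    W_c: K x dim weight matrix and b_c: K biases, with K = 2 here
    (index 0 = non-spam, index 1 = spam, a hypothetical convention).
    """
    logits = [b + sum(w * x for w, x in zip(row, doc))
              for row, b in zip(W_c, b_c)]
    mx = max(logits)
    exps = [math.exp(z - mx) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the gold labels over a batch."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

if __name__ == "__main__":
    doc = [0.2, -0.1, 0.4]
    W_c = [[0.1, 0.0, -0.3], [-0.2, 0.5, 0.6]]
    p = classify(doc, W_c, [0.0, 0.0])
    print(p, cross_entropy([p], [1]))
```

Subtracting the maximum logit before exponentiating keeps the softmax numerically stable without changing its result.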

4. Experiments

datasets

comparison with baselines

ablation study
