【SO-PMI】Sentiment Classification of Movie and Product Reviews Using Contextual Valence Shifters

Sentiment Classification of Movie and Product Reviews Using Contextual Valence Shifters [SO-PMI–2005.1]

In this paper, the authors propose a method for determining sentiment orientation that takes into account contextual valence shifters, namely negations and intensifiers. They use the General Inquirer (GI) to identify positive and negative terms, as well as negations, overstatements, and understatements. Using SO-PMI, they find that including contextual valence shifters improves classification accuracy.
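As a reminder of what SO-PMI measures, here is a minimal sketch, assuming the usual definition: the semantic orientation of a word is its PMI with positive seed words minus its PMI with negative seed words, which reduces to a log ratio of co-occurrence counts. The counts, the example words, and the smoothing constant below are hypothetical and are not taken from the paper.

```python
import math

def so_pmi(near_pos, near_neg, total_pos, total_neg, smoothing=0.01):
    """Semantic orientation of a word from co-occurrence counts.

    near_pos / near_neg: how often the word occurs near the positive /
    negative seed words. total_pos / total_neg: how often the seed words
    occur overall. All counts are hypothetical inputs in this sketch.
    """
    # Smoothing (an assumed constant) avoids division by zero and log(0)
    numerator = (near_pos + smoothing) * (total_neg + smoothing)
    denominator = (near_neg + smoothing) * (total_pos + smoothing)
    # PMI(word, pos) - PMI(word, neg) collapses to this single log ratio
    return math.log2(numerator / denominator)

# Made-up counts for illustration only: word -> (near_pos, near_neg)
counts = {
    "superb": (80, 5),
    "dull": (4, 60),
}
TOTAL_POS, TOTAL_NEG = 1000, 1000

scores = {w: so_pmi(p, n, TOTAL_POS, TOTAL_NEG) for w, (p, n) in counts.items()}
for word, score in scores.items():
    print(word, round(score, 2), "positive" if score > 0 else "negative")
```

A score above zero means the word co-occurs more with the positive seeds than with the negative ones, and a score below zero means the opposite.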

To show that the accuracy improvement comes from the valence shifters rather than from other factors, they also compare a basic system with an improved system. The basic system draws only on GI, while the improved system also includes CTRW and a web corpus. The results show that the improvement is statistically significant for movie reviews but not for product reviews.

They also ask which is more effective: negations or intensifiers. Adding different sets of terms from GI and CTRW shows that negation plays an important part in improving accuracy, while adding extra overstatements and understatements makes little difference.
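To make the roles of negations and intensifiers concrete, the following is a rough sketch of term counting with valence shifters that can be switched on and off separately. The tiny word lists, the weights (2 for a plain term, 3 after an overstatement, 1 after an understatement), and the rule of looking only at the immediately preceding word are simplifying assumptions for illustration, not the exact setup of the paper.

```python
# Toy lexicons: hypothetical stand-ins for the GI / CTRW term lists
POSITIVE = {"good", "great", "enjoyable"}
NEGATIVE = {"bad", "boring", "awful"}
NEGATIONS = {"not", "never", "no"}
OVERSTATEMENTS = {"very", "extremely"}
UNDERSTATEMENTS = {"barely", "somewhat"}

def score_review(text, use_negation=True, use_intensifiers=True):
    """Sum term scores; the preceding word may shift a term's valence."""
    tokens = text.lower().split()  # naive whitespace tokenization
    total = 0
    for i, tok in enumerate(tokens):
        if tok in POSITIVE:
            value = 2
        elif tok in NEGATIVE:
            value = -2
        else:
            continue
        prev = tokens[i - 1] if i > 0 else ""
        if use_intensifiers and prev in OVERSTATEMENTS:
            value += 1 if value > 0 else -1   # strengthen: 2 -> 3, -2 -> -3
        elif use_intensifiers and prev in UNDERSTATEMENTS:
            value -= 1 if value > 0 else -1   # weaken: 2 -> 1, -2 -> -1
        elif use_negation and prev in NEGATIONS:
            value = -value                    # flip the polarity
        total += value
    return total

def classify(text, **kwargs):
    # A score of exactly 0 is labeled negative here, an arbitrary choice
    return "positive" if score_review(text, **kwargs) > 0 else "negative"

review = "the movie is not bad and never boring"
print(classify(review))                      # prints "positive"
print(classify(review, use_negation=False))  # prints "negative"
```

In this toy review, turning negation handling off flips the predicted label, whereas an intensifier only changes the size of a term's score and never its sign.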

They also note that, in most of their experiments, classification works better on product reviews than on movie reviews. The reasons match those given in the paper "Semantic Orientation Applied to Unsupervised Classification of Reviews": movie reviews perform worst, and the explanation is that positive reviews sometimes mention unpleasant things and negative reviews often mention pleasant things, because of the plots they describe.

Terms and methods mentioned in the paper:

  • Difference between a Bayesian belief network and naive Bayes
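    Markov blanket: the Markov blanket of a node consists of its children, its parents, and the other parents of its children. It is often used in feature selection: for example, when all the information about a target variable is needed, it is enough to look at the variables in its Markov blanket.
  • Tabu Search: a search method for escaping local optima. It first creates an initial solution; from there, the algorithm "moves" to a neighboring solution. Over many successive moves, the quality of the solution improves.
  • Bootstrapping: repeatedly resampling (with replacement) from a limited sample; commonly used when only a small number of samples is available.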
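The three terms above can each be sketched in a few lines. These are generic textbook-style illustrations and do not show how the paper uses them; the example graph, the cost landscape, the sample, and all function names are hypothetical.

```python
import random

def markov_blanket(parents, node):
    """parents maps each node of a DAG to the set of its parents."""
    children = {c for c, ps in parents.items() if node in ps}
    spouses = {p for c in children for p in parents[c]}  # co-parents
    return (parents.get(node, set()) | children | spouses) - {node}

def tabu_search(cost, start, neighbors, n_iters=50, tabu_size=5):
    """Move to the best non-tabu neighbor each step, even if it is worse."""
    current = best = start
    tabu = [start]
    for _ in range(n_iters):
        candidates = [s for s in neighbors(current) if s not in tabu]
        if not candidates:
            break
        current = min(candidates, key=cost)
        tabu.append(current)
        tabu = tabu[-tabu_size:]  # forget the oldest tabu entries
        if cost(current) < cost(best):
            best = current
    return best

def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Means of resamples drawn with replacement from the sample."""
    rng = random.Random(seed)
    n = len(sample)
    return [sum(rng.choices(sample, k=n)) / n for _ in range(n_resamples)]

# Hypothetical DAG: A -> C, B -> C, C -> D, E -> D
parents = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C", "E"}, "E": set()}
print(sorted(markov_blanket(parents, "C")))  # ['A', 'B', 'D', 'E']

# Hypothetical 1-D landscape: local minimum at index 1, global at index 5
landscape = [5, 3, 4, 6, 2, 1, 7]
cost = lambda i: landscape[i]
neighbors = lambda i: [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)]
print(tabu_search(cost, 0, neighbors))  # 5

# Bootstrap a small sample to see how much its mean varies
sample = [2, 4, 4, 5, 7, 9]
means = bootstrap_means(sample, n_resamples=200)
print(min(means), max(means))
```

In the landscape example, a plain greedy descent from index 0 would stop at the local minimum at index 1; the tabu list forbids stepping back, so the search is pushed over the ridge and reaches index 5.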

Figure taken from https://blog.****.net/weixin_38438451/article/details/82849802