CS224n-04 Word Window Classification and Neural Networks


Topics covered in this lecture:

1. Background on classification
2. Applying word vectors to classification
3. Window (context) classification and a trick for deriving the cross-entropy error
4. A single-layer neural network
5. Max-margin loss and backpropagation

Classification: setup and notation

1. The training dataset usually consists of samples $\{x_i, y_i\}_{i=1}^{N}$.

2. $x_i$ are the inputs, e.g. words (indices or vectors), context windows, sentences, documents.
3. $y_i$ are the labels we want to predict, for example:
classes: sentiment, named entities, buy/sell decisions
other words
multi-word sequences (later in the course)

Classification intuition

Training data: $\{x_i, y_i\}_{i=1}^{N}$.

A simple illustration: classify 2-dimensional word vectors with, say, logistic regression, which yields a linear decision boundary.
The traditional ML approach: assume the inputs $x$ are fixed and train only the weights $W$ of the logistic regression, i.e. only the decision boundary moves.
Prediction for an input $x$:
$$p(y \mid x) = \frac{\exp(W_{y\cdot}\, x)}{\sum_{c=1}^{C} \exp(W_{c\cdot}\, x)}$$

Details of the softmax

We can break the prediction into two steps (see the sketch below):
1. Take the $y$-th row of the weight matrix $W$ and multiply it with $x$: $f_y = W_{y\cdot}\, x = \sum_i W_{yi}\, x_i$; compute $f_c$ for every class $c = 1, \dots, C$.
2. Normalize to obtain the softmax probability: $p(y \mid x) = \mathrm{softmax}(f_y) = \dfrac{\exp(f_y)}{\sum_{c=1}^{C} \exp(f_c)}$.
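A minimal numpy sketch of these two steps; the shapes and the random toy data are made up for illustration:

```python
import numpy as np

def softmax_predict(W, x):
    """Two-step softmax prediction for a single input x (toy sketch).

    W: (C, d) weight matrix, one row per class.
    x: (d,) input vector (e.g. a word vector).
    Returns p: (C,) class probabilities.
    """
    f = W @ x                        # step 1: f_c = W_c . x for every class c
    f = f - f.max()                  # numerical stability (does not change the result)
    p = np.exp(f) / np.exp(f).sum()  # step 2: normalize with the softmax
    return p

# toy example: 3 classes, 4-dimensional word vector
W = np.random.randn(3, 4)
x = np.random.randn(4)
print(softmax_predict(W, x))         # probabilities summing to 1
```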

Softmax and cross-entropy error

1. For each training example $\{x, y\}$, our objective is to maximize the probability of the correct class $y$.
2. Equivalently, we minimize the negative log probability of that class:
$$-\log p(y \mid x) = -\log\!\left(\frac{\exp(f_y)}{\sum_{c=1}^{C} \exp(f_c)}\right)$$

Background: why is this called the "cross-entropy" error?

1. Assume the true probability distribution puts 1 on the correct class and 0 everywhere else. The cross-entropy between the true distribution $p$ and our predicted distribution $q$ is
$$H(p, q) = -\sum_{c=1}^{C} p(c) \log q(c)$$
2. Because $p$ is a one-hot vector, the only term that survives is the negative log probability of the true class, which is exactly the loss above (a quick numeric check follows).
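A tiny numeric check of this point; the predicted probabilities here are made-up toy values:

```python
import numpy as np

# Cross-entropy between a one-hot "true" distribution p and a predicted q.
# Because p is one-hot, only the -log q[y] term survives.
q = np.array([0.7, 0.2, 0.1])        # predicted class probabilities (toy values)
y = 0                                # index of the true class
p = np.zeros_like(q); p[y] = 1.0     # one-hot true distribution

cross_entropy = -(p * np.log(q)).sum()
print(cross_entropy, -np.log(q[y]))  # both print the same value
```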

KL divergence

1. The cross-entropy can be rewritten as the entropy of $p$ plus the KL divergence between the two distributions:
$$H(p, q) = H(p) + D_{KL}(p \,\|\, q)$$
2. Since $H(p)$ is zero for a one-hot $p$ (and in any case does not depend on $q$, so it contributes nothing to the gradient), minimizing the cross-entropy is equivalent to minimizing the KL divergence between $p$ and $q$.
3. The KL divergence is not a distance, but a non-symmetric measure of the difference between two probability distributions $p$ and $q$:
$$D_{KL}(p \,\|\, q) = \sum_{c=1}^{C} p(c) \log \frac{p(c)}{q(c)}$$
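A quick numeric check of the decomposition and of the asymmetry, on arbitrary toy distributions:

```python
import numpy as np

# Check H(p, q) = H(p) + KL(p || q) on a toy pair of distributions.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.6, 0.1, 0.3])

H_pq = -(p * np.log(q)).sum()        # cross-entropy
H_p  = -(p * np.log(p)).sum()        # entropy of p
KL   =  (p * np.log(p / q)).sum()    # KL divergence
print(np.isclose(H_pq, H_p + KL))    # True

# Asymmetry: KL(p || q) != KL(q || p) in general.
print((p * np.log(p / q)).sum(), (q * np.log(q / p)).sum())
```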

Classification over a full dataset

Over a full dataset of $N$ training examples, the cross-entropy loss is
$$J(\theta) = \frac{1}{N} \sum_{i=1}^{N} -\log\!\left(\frac{\exp(f_{y_i})}{\sum_{c=1}^{C} \exp(f_c)}\right), \qquad f_c = W_{c\cdot}\, x_i$$

Classification: regularization

In practice we add an L2 regularization term over all parameters:
$$J(\theta) = \frac{1}{N} \sum_{i=1}^{N} -\log\!\left(\frac{\exp(f_{y_i})}{\sum_{c=1}^{C} \exp(f_c)}\right) + \lambda \sum_{k} \theta_k^{2}$$
Regularization keeps the model from overfitting once it has many parameters: without it, training error keeps falling while test error starts to rise. (The original slide shows a plot of training vs. test error as model power grows.) A small code sketch of this objective follows.
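A sketch of the regularized objective in numpy; the function name and the value of lam are illustrative assumptions, not part of the lecture:

```python
import numpy as np

def dataset_loss(W, X, y, lam=1e-3):
    """Average cross-entropy over a dataset plus an L2 penalty (sketch).

    W: (C, d) weights, X: (N, d) inputs, y: (N,) true class indices,
    lam: regularization strength (a hyperparameter you would tune).
    """
    F = X @ W.T                               # (N, C) scores f_c for every sample
    F -= F.max(axis=1, keepdims=True)         # numerical stability
    logZ = np.log(np.exp(F).sum(axis=1))      # log of the softmax denominator
    nll = -(F[np.arange(len(y)), y] - logZ)   # -log p(y_i | x_i)
    return nll.mean() + lam * (W ** 2).sum()  # J(theta) with L2 regularization
```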

Detail: general ML optimization

For conventional ML problems, the parameters $\theta$ usually consist only of the entries of $W$:
$$\theta = \{W_{\cdot 1}, \dots, W_{\cdot d}\} = W \in \mathbb{R}^{Cd}$$
so training only moves the decision boundary, via the gradient $\nabla_\theta J(\theta) \in \mathbb{R}^{Cd}$.

Classification with word vectors: what is different

In deep-learning NLP we learn both $W$ and the word vectors $x$. The parameters then also include a $d$-dimensional vector for every word in the vocabulary, so $\theta$ has on the order of $Cd + Vd$ entries, where $V$ is the vocabulary size. This is a huge number of parameters, which leads directly to the overfitting danger discussed next.

Re-training word vectors can destroy generalization

Setting: we train logistic regression for sentiment analysis of movie reviews. The training data contains the words "TV" and "telly", the test data contains "television", and in the pre-trained word vectors all three are similar, neighbouring words.
(Figure in the original: before any re-training, "TV", "telly" and "television" lie close together in the vector space.)

What happens when we re-train (fine-tune) the word vectors?

The words that appear in the training data move around, while pre-trained words that are absent from the training data stay where they were: "TV" and "telly" shift, "television" does not, and at test time it ends up on the wrong side of the decision boundary.
(Figure in the original: after re-training, "TV" and "telly" have moved while "television" is left behind and gets misclassified.)

Take-away:
If the training set is small, do not re-train the word vectors: they will overfit and lose their ability to generalize.
If the training set is very large, fine-tuning the word vectors usually improves results.

Side note: word-vector terminology

1. The word-vector matrix $L$ is also called the lookup table.
2. Word vectors = word embeddings = word representations (mostly interchangeable terms).
3. The main methods for learning them are word2vec and GloVe.
Looking a word up amounts to multiplying $L \in \mathbb{R}^{d \times V}$ with that word's one-hot vector $e$: $x = L\,e \in \mathbb{R}^{d}$ (sketched below).
4. The resulting vectors serve as the word's features.
5. A newer direction (later in the course): character-based models.
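Looking a word up is just selecting a column of L with a one-hot vector; a toy sketch (all dimensions are made up):

```python
import numpy as np

V, d = 10, 4                    # toy vocabulary size and embedding dimension
L = np.random.randn(d, V)       # lookup table: one column per word

k = 3                           # index of some word in the vocabulary
e = np.zeros(V); e[k] = 1.0     # one-hot vector for that word

x = L @ e                       # "looking up" the word = selecting column k
print(np.allclose(x, L[:, k]))  # True
```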

Window classification

1. Classifying a single word in isolation is something we rarely do.
2. The interesting problems involve ambiguity that only context can resolve (disambiguation), e.g. the same string naming a location in one sentence and a person in another.
3. Idea: classify a word together with its context window, e.g. for named entity recognition (person, location, organization, none).

4. One way to classify a word in context would be to average all the word vectors in the window, but this loses position information.

5. Better: train a softmax classifier to assign a label to the center word, taking as input the concatenation of the word vectors in its window (see the sketch after this example).
For example, to classify "Paris" in "... museums in Paris are amazing ..." with a window of size 2, the input is
$$x_{\text{window}} = \big[\,x_{\text{museums}}\;\; x_{\text{in}}\;\; x_{\text{Paris}}\;\; x_{\text{are}}\;\; x_{\text{amazing}}\,\big] \in \mathbb{R}^{5d}$$
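A minimal sketch of building the concatenated window input, using random stand-in vectors for the lecture's example sentence:

```python
import numpy as np

d = 4                                             # toy embedding dimension
sentence = ["museums", "in", "Paris", "are", "amazing"]
vecs = {w: np.random.randn(d) for w in sentence}  # stand-in word vectors

center = 2                                        # classify "Paris"
window = 2                                        # 2 words of context on each side
x_window = np.concatenate([vecs[w] for w in sentence[center - window:center + window + 1]])
print(x_window.shape)                             # (5*d,) = (20,)
```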

Simplest window classifier: Softmax

With $x = x_{\text{window}}$ we can reuse exactly the same softmax classifier as before:
$$\hat y_y = p(y \mid x) = \frac{\exp(W_{y\cdot}\, x)}{\sum_{c=1}^{C} \exp(W_{c\cdot}\, x)}$$
with the cross-entropy loss $J(\theta) = -\log p(y \mid x)$ as the training objective.
But how do we update the word vectors?

Updating concatenated word vectors

All that remains is to take derivatives, in particular $\partial J / \partial x$; note that $x$ here is the concatenation of the word vectors of all the words in the window.
Writing $f = Wx$, $\hat y = \mathrm{softmax}(f)$ and letting $t$ be the one-hot target vector, the chain rule gives
$$\frac{\partial J}{\partial x} = \sum_{c=1}^{C} \frac{\partial J}{\partial f_c}\, \frac{\partial f_c}{\partial x} = \sum_{c=1}^{C} (\hat y_c - t_c)\, W_{c\cdot}^{\top} = W^{\top}(\hat y - t) = W^{\top}\delta \;\in\; \mathbb{R}^{5d}$$
where $\delta = \hat y - t$ is the error signal.
This $5d$-dimensional gradient is then split back into one gradient per word in the window,
$$\nabla_x J = \big[\,\nabla_{x_{\text{museums}}} J,\; \nabla_{x_{\text{in}}} J,\; \nabla_{x_{\text{Paris}}} J,\; \nabla_{x_{\text{are}}} J,\; \nabla_{x_{\text{amazing}}} J\,\big]$$
and each word vector is updated with its own slice. Updating the word vectors this way pushes them around so that they become more helpful for this particular classification task (a code sketch follows).
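A sketch of this gradient computation in numpy, assuming the softmax window classifier above; the function name and the toy shapes are illustrative:

```python
import numpy as np

def window_softmax_grads(W, x, y):
    """Gradients of J = -log softmax(Wx)_y w.r.t. W and the window vector x (sketch)."""
    f = W @ x
    f -= f.max()
    y_hat = np.exp(f) / np.exp(f).sum()    # predicted distribution
    t = np.zeros_like(y_hat); t[y] = 1.0   # one-hot target
    delta = y_hat - t                      # error signal, shape (C,)
    grad_W = np.outer(delta, x)            # dJ/dW
    grad_x = W.T @ delta                   # dJ/dx = W^T (y_hat - t), shape (5d,)
    return grad_W, grad_x

C, d, win = 4, 4, 5
W = np.random.randn(C, win * d)
x = np.random.randn(win * d)               # concatenated window vector
grad_W, grad_x = window_softmax_grads(W, x, y=1)
word_grads = grad_x.reshape(win, d)        # one gradient slice per word in the window
```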

Softmax (= logistic regression) alone is not very powerful

1. Softmax only gives a linear decision boundary in the original input space.
2. With little data this constraint can be a good thing: it acts like a regularizer.
3. With more data it is limiting, because a linear boundary cannot separate classes whose true boundary is nonlinear.
(Figure in the original: a two-class dataset that a linear decision boundary cannot separate well.)

Neural Nets for the Win!

(Figure in the original: the same kind of data separated cleanly by the nonlinear decision boundary of a neural network.)

Demystifying neural networks

A single neuron is essentially a binary logistic regression unit:
$$h_{w,b}(x) = f(w^{\top} x + b), \qquad f(z) = \frac{1}{1 + e^{-z}}$$
where $w$ are the weights, $b$ is the bias and $f$ is the activation function (here the sigmoid).

A neural network runs several such logistic regressions at the same time: feeding the input through a layer of these units produces a vector of activations $a = f(Wx + b)$, which can in turn be fed into another logistic regression (or a softmax) that produces the final output. The intermediate ("hidden") units have no pre-specified targets; the loss function at the top of the network decides what they should learn to represent, and stacking such layers gives a multi-layer neural network. A minimal sketch follows.
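A minimal sketch of a single sigmoid neuron and of a layer of them running "simultaneously"; all shapes and values are toy assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One neuron = a binary logistic regression unit: h = f(w.x + b)
w, b = np.random.randn(4), 0.0
x = np.random.randn(4)
h = sigmoid(w @ x + b)

# A layer = several such units at once: a = f(Wx + b)
W, b_vec = np.random.randn(8, 4), np.zeros(8)
a = sigmoid(W @ x + b_vec)   # 8 "simultaneous logistic regressions"
```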

A more powerful, neural net window classifier

For the window task we insert a single hidden layer between the window vector and a score:
$$z = Wx + b, \qquad a = f(z), \qquad s = u^{\top} a$$
where $x \in \mathbb{R}^{20}$ is the concatenated window vector (5 words with 4-dimensional vectors in the lecture's toy example), $W \in \mathbb{R}^{8 \times 20}$, $b \in \mathbb{R}^{8}$, $u \in \mathbb{R}^{8}$, and $s$ is a scalar score for the window.

Summary: Feed-forward Computation

Computing a window's score with this 3-layer network boils down to
$$s = \mathrm{score}(\text{museums in Paris are amazing}) = u^{\top} f(Wx + b), \qquad x \in \mathbb{R}^{20 \times 1},\; W \in \mathbb{R}^{8 \times 20},\; u \in \mathbb{R}^{8 \times 1}$$
(see the code sketch below).
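A sketch of this feed-forward computation with the toy shapes from the lecture's example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def window_score(x, W, b, u):
    """Score of a window with one hidden layer: s = u^T f(Wx + b)."""
    z = W @ x + b
    a = sigmoid(z)
    return u @ a

# shapes from the example: 5 words x 4 dims -> 8 hidden units -> 1 score
x = np.random.randn(20)
W, b, u = np.random.randn(8, 20), np.zeros(8), np.random.randn(8)
print(window_score(x, W, b, u))
```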

Main intuition for extra layer

The hidden layer learns nonlinear interactions between the input words: for example, the fact that "museums" is the first word should only matter when "in" is in the second position. A plain softmax on the raw concatenation cannot capture such interactions.

The max-margin objective. Let $s$ be the score of a "true" window whose center word is a location, e.g. $s = \mathrm{score}(\text{museums in Paris are amazing})$, and let $s_c$ be the score of a "corrupt" window, e.g. $s_c = \mathrm{score}(\text{Not all museums in Paris})$. We want the true window to score higher than the corrupt one by a margin of at least 1, so we minimize
$$J = \max(0,\; 1 - s + s_c)$$
The objective is not differentiable at the hinge point but it is continuous, so SGD still works; in practice several corrupt windows are sampled for every true window.

Backpropagation. With $z = Wx + b$, $a = f(z)$ and $s = u^{\top} a$, the derivatives of the score are
$$\frac{\partial s}{\partial u} = a, \qquad \frac{\partial s}{\partial W} = \delta\, x^{\top}, \qquad \frac{\partial s}{\partial b} = \delta, \qquad \frac{\partial s}{\partial x} = W^{\top} \delta, \qquad \text{where } \delta = u \circ f'(z)$$
The error signal $\delta$ is computed once and reused for $W$, $b$ and $x$; the gradient with respect to $x$ is again split into per-word slices so that the word vectors in the window are updated too. When the margin is already satisfied ($1 - s + s_c \le 0$) the gradient of $J$ is zero; otherwise the gradients coming from $s_c$ are added and those from $s$ subtracted. A code sketch follows.
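A sketch of the max-margin loss together with the backprop derivatives of the score; the function names and shapes are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def score_and_grads(x, W, b, u):
    """Score s = u^T f(Wx + b) and its gradients w.r.t. u, W, b, x (sketch)."""
    z = W @ x + b
    a = sigmoid(z)
    s = u @ a
    delta = u * a * (1 - a)   # delta = u * f'(z) for the sigmoid
    return s, {"u": a, "W": np.outer(delta, x), "b": delta, "x": W.T @ delta}

def max_margin_loss(x_true, x_corrupt, W, b, u):
    """J = max(0, 1 - s + s_c): push the true window's score above the corrupt one's."""
    s, _ = score_and_grads(x_true, W, b, u)
    s_c, _ = score_and_grads(x_corrupt, W, b, u)
    return max(0.0, 1.0 - s + s_c)

# toy usage with the shapes from the example
W, b, u = np.random.randn(8, 20), np.zeros(8), np.random.randn(8)
print(max_margin_loss(np.random.randn(20), np.random.randn(20), W, b, u))
```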


References:
http://web.stanford.edu/class/cs224n/
http://www.hankcs.com/nlp/word-vector-representations-word2vec.html
https://zhuanlan.zhihu.com/p/26530524