CS224n-04 Word Window Classification and Neural Networks

Category: Articles • 2025-01-12 15:19:52

04 Word Window Classification and Neural Networks

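Contents of this lecture:

1. Background on classification

2. Using word vectors for classification

3. Window (context) classification and tips for deriving the cross-entropy error

4. Single-layer neural networks

5. Max-margin loss and backpropagation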

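Classification setup and notation

1. A training dataset generally consists of samples $\{x_i, y_i\}_{i=1}^{N}$.

2. $x_i$ are the inputs, e.g. words (indices or vectors), context windows, sentences, documents.

3. $y_i$ are the labels we try to predict, for example:

Classes: sentiment, named entities, buy/sell decision

Other words

Multi-word sequences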

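Classification intuition

Training data: $\{x_i, y_i\}_{i=1}^{N}$

A simple illustrative case: classify fixed 2-dimensional word vectors with, say, logistic regression, which yields a linear decision boundary.

General ML approach: assume x is fixed and train only the logistic regression weights W, which changes only the decision boundary.

Goal: for each x, predict

$$p(y \mid x) = \frac{\exp(W_y \cdot x)}{\sum_{c=1}^{C} \exp(W_c \cdot x)}$$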

Details of the softmax

The prediction function can be separated into two steps:

1. Take the y-th row of the weight matrix W and multiply it with x:

$$W_y \cdot x = \sum_{i=1}^{d} W_{yi} x_i = f_y$$

Compute all $f_c$ for $c = 1, \dots, C$.

2. Normalize to obtain a probability with the softmax function:

$$p(y \mid x) = \frac{\exp(f_y)}{\sum_{c=1}^{C} \exp(f_c)} = \operatorname{softmax}(f)_y$$
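
The two steps above can be sketched in NumPy as follows. This is a minimal illustration, assuming W is stored with one row per class (shape C x d); the function name `softmax_probs` and the toy values are made up for the example.

```python
import numpy as np

def softmax_probs(W, x):
    """Return p(y | x) for every class y, given weights W (C x d) and input x (d,)."""
    f = W @ x                    # step 1: f_c = W_c . x for every class c
    f = f - np.max(f)            # shift for numerical stability (result is unchanged)
    exp_f = np.exp(f)
    return exp_f / exp_f.sum()   # step 2: normalize so the probabilities sum to 1

# Toy values chosen only for illustration: C = 3 classes, d = 2 dimensions.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
x = np.array([2.0, 1.0])
p = softmax_probs(W, x)          # scores f = [2, 1, 3], so class 2 gets the most mass
```

Subtracting the maximum score before exponentiating is a common safeguard against overflow and does not change the resulting probabilities.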

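Softmax and cross-entropy error

1. For each training example {x, y}, the objective is to maximize the probability of the correct class y.

2. Equivalently, we minimize the negative log probability of that class:

$$-\log p(y \mid x) = -\log\left(\frac{\exp(f_y)}{\sum_{c=1}^{C} \exp(f_c)}\right), \quad \text{where } f_c = W_c \cdot x$$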

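Background: why "cross-entropy" error

1. Assume a ground-truth probability distribution p that is 1 at the correct class and 0 everywhere else, $p = [0, \dots, 0, 1, 0, \dots, 0]$, and let q be the probability computed by the model. The cross-entropy is then:

$$H(p, q) = -\sum_{c=1}^{C} p(c) \log q(c)$$

2. Because p is a one-hot vector, the only term left is the negative log probability of the true class.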
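
A small numeric check of this point, as a sketch: with a one-hot target, the cross-entropy sum collapses to the negative log probability of the true class. The vector `q` below is a hypothetical model output, not a value from the lecture.

```python
import numpy as np

def cross_entropy(p, q):
    """H(p, q) = -sum_c p(c) log q(c), skipping classes where p(c) = 0."""
    mask = p > 0
    return -np.sum(p[mask] * np.log(q[mask]))

q = np.array([0.1, 0.7, 0.2])   # hypothetical model probabilities
p = np.array([0.0, 1.0, 0.0])   # one-hot target: class 1 is the correct class
loss = cross_entropy(p, q)      # equals -log q[1]
```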

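KL divergence

1. Cross-entropy can be rewritten in terms of the entropy and the Kullback-Leibler (KL) divergence between the two distributions:

$$H(p, q) = H(p) + D_{KL}(p \,\|\, q)$$

2. Because H(p) is zero here (and even if it were not, it would be fixed and contribute nothing to the gradient), minimizing this expression is the same as minimizing the KL divergence between p and q.

3. The KL divergence is not a distance, but a non-symmetric measure of the difference between two probability distributions p and q:

$$D_{KL}(p \,\|\, q) = \sum_{c=1}^{C} p(c) \log \frac{p(c)}{q(c)}$$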
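
The decomposition and the asymmetry can both be checked numerically. The sketch below uses two arbitrary distributions picked for illustration; the helper names are not from the lecture.

```python
import numpy as np

def entropy(p):
    mask = p > 0
    return -np.sum(p[mask] * np.log(p[mask]))

def cross_entropy(p, q):
    mask = p > 0
    return -np.sum(p[mask] * np.log(q[mask]))

def kl_divergence(p, q):
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

# Two arbitrary distributions over three classes.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])

lhs = cross_entropy(p, q)
rhs = entropy(p) + kl_divergence(p, q)     # H(p, q) = H(p) + KL(p || q)

forward = kl_divergence(p, q)
backward = kl_divergence(q, p)             # generally differs from `forward`

one_hot = np.array([0.0, 1.0, 0.0])
h_one_hot = entropy(one_hot)               # zero for a one-hot distribution
```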

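Classification over a full dataset

The cross-entropy loss over the full dataset $\{x_i, y_i\}_{i=1}^{N}$ is the average of the per-example losses:

$$J(\theta) = \frac{1}{N} \sum_{i=1}^{N} -\log\left(\frac{\exp(f_{y_i})}{\sum_{c=1}^{C} \exp(f_c)}\right)$$

where $f = W x_i$ is the vector of class scores for example i, written in matrix notation.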

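Classification: regularization

In practice the full loss also includes a regularization term over all parameters θ, which helps prevent overfitting when there are many features or parameters:

$$J(\theta) = \frac{1}{N} \sum_{i=1}^{N} -\log\left(\frac{\exp(f_{y_i})}{\sum_{c=1}^{C} \exp(f_c)}\right) + \lambda \sum_{k} \theta_k^2$$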
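
As a rough sketch, a regularized objective of this kind can be computed as the mean softmax cross-entropy plus an L2 penalty. The assumptions here are that θ contains only W and that the penalty has the common form λ Σ θ_k²; `regularized_loss` and the toy data are illustrative.

```python
import numpy as np

def regularized_loss(W, X, y, lam):
    """Mean softmax cross-entropy over N examples plus an L2 penalty on W.

    W: (C, d) weights, X: (N, d) inputs, y: (N,) integer labels, lam: penalty strength.
    """
    F = X @ W.T                                    # (N, C) class scores, one row per example
    F = F - F.max(axis=1, keepdims=True)           # numerical stability
    log_probs = F - np.log(np.exp(F).sum(axis=1, keepdims=True))
    data_loss = -log_probs[np.arange(len(y)), y].mean()
    return data_loss + lam * np.sum(W ** 2)

# Toy data for illustration only.
X = np.ones((4, 2))
y = np.array([0, 1, 2, 0])
W0 = np.zeros((3, 2))                              # all-zero weights give uniform predictions
W1 = np.array([[1.0, -1.0], [0.5, 0.0], [0.0, 2.0]])

uniform_loss = regularized_loss(W0, X, y, lam=0.1)  # log(3): every class has probability 1/3
penalty = regularized_loss(W1, X, y, lam=0.1) - regularized_loss(W1, X, y, lam=0.0)
```

The penalty term grows with the squared size of the weights, so larger λ pushes the optimizer toward smaller weights.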

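Details: general ML optimization

In standard ML settings, θ usually consists only of the entries of W. For C classes and d-dimensional inputs this gives $\theta \in \mathbb{R}^{Cd}$, so gradient updates move only the decision boundary.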

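Classification difference with word vectors

When classifying with word vectors we typically learn both W and the word vectors x, so with a vocabulary of V words θ grows to $\mathbb{R}^{Cd + Vd}$. That is a very large number of parameters, which brings a danger of overfitting.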

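Losing generalization by re-training word vectors

Setting: we train a logistic regression classifier for movie review sentiment on single words. The training data contains "TV" and "telly", the test data contains "television", and in the pre-trained word vectors all three are similar.

What happens when we re-train the word vectors?

Words that appear in the training data move around; pre-trained words that do not appear in the training data stay where they were. "television" therefore no longer sits close to "TV" and "telly".

Take-home message:

If the training dataset is small, do not train the word vectors: doing so overfits and loses generalization.

If the dataset is very large, training the word vectors for the task may work better.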

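Side note on word vector terminology

1. The word vector matrix L is also called the lookup table.

2. Word vectors = word embeddings = word representations.

3. They mostly come from methods such as word2vec or GloVe.

4. From here on these serve as the features of a word.

5. New direction (later in the course): character models.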
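
The name "lookup table" can be made concrete with a short sketch. It assumes L stores one column per vocabulary word (shape d x V); the tiny vocabulary and the random values are hypothetical.

```python
import numpy as np

# Hypothetical vocabulary mapping words to column indices of L.
vocab = {"the": 0, "museums": 1, "in": 2, "Paris": 3, "are": 4, "amazing": 5}
d = 4                                     # embedding dimension (toy size)
rng = np.random.default_rng(0)
L = rng.normal(size=(d, len(vocab)))      # stand-in for trained word vectors

def lookup(word):
    """Fetch a word vector by indexing a column of L."""
    return L[:, vocab[word]]

# Indexing a column is equivalent to multiplying L by a one-hot vector.
one_hot = np.zeros(len(vocab))
one_hot[vocab["Paris"]] = 1.0
via_matmul = L @ one_hot
via_lookup = lookup("Paris")
```

Indexing gives the same result as the matrix product but avoids building the one-hot vector, which is why the matrix is treated as a table.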

Window classification

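1. Classifying single words is rarely done.

2. The interesting problems, such as ambiguity, arise in context (disambiguation).

3. Idea: classify a word in its context window of neighboring words (for example, named entity recognition).

4. There are many possible ways to classify a word in context, for example averaging all the word vectors in the window, but that loses position information.

5. Train a softmax classifier by assigning a label to the center word and concatenating all the word vectors around it.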
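
The difference between averaging and concatenating can be sketched as follows. The assumptions are a window of two words on each side of the center word, a matrix L with one column per word, and toy indices; edge padding is not handled.

```python
import numpy as np

def window_vector(L, word_ids, center, half_width):
    """Concatenate the vectors of the words around position `center`.

    Assumes the window fits inside the sentence (no padding is handled here).
    """
    window = word_ids[center - half_width:center + half_width + 1]
    return np.concatenate([L[:, i] for i in window])

d, V = 3, 6                                        # toy sizes
L = np.arange(d * V, dtype=float).reshape(d, V)    # stand-in for trained vectors
sentence = [0, 1, 2, 3, 4, 5]                      # hypothetical word indices

x_window = window_vector(L, sentence, center=3, half_width=2)          # shape (5 * d,)
x_reversed = window_vector(L, sentence[::-1], center=2, half_width=2)  # same words, reversed

# Averaging cannot tell the two word orders apart; concatenation can.
avg = x_window.reshape(5, d).mean(axis=0)
avg_reversed = x_reversed.reshape(5, d).mean(axis=0)
```

Both windows contain the same five words, so their averages match, while the concatenated vectors differ because each position occupies its own slice.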

Simplest window classifier: Softmax

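With $x = x_{window}$ we can use the same softmax classifier as before, $\hat{y}_y = p(y \mid x) = \frac{\exp(W_y \cdot x)}{\sum_{c=1}^{C} \exp(W_c \cdot x)}$, trained with the same cross-entropy error.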

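How do we update the word vectors?

Updating concatenated word vectors

Now we only need to take derivatives: differentiate J with respect to x, where x here is the concatenation of the word vectors of all words in the window.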
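
For a softmax classifier with cross-entropy loss, this derivative has the standard closed form $\nabla_x J = W^{\top}(\hat{y} - t)$, where $\hat{y}$ is the vector of predicted probabilities and t is the one-hot target. The sketch below checks that formula against a finite-difference estimate and then splits the gradient into one piece per word. It assumes a window of 5 words; all sizes and values are illustrative.

```python
import numpy as np

def softmax(f):
    e = np.exp(f - f.max())
    return e / e.sum()

def loss(W, x, y):
    """Cross-entropy loss J for a single window x with true class y."""
    return -np.log(softmax(W @ x)[y])

def grad_x(W, x, y):
    """Analytic gradient of J with respect to the concatenated window vector x."""
    delta = softmax(W @ x)      # predicted probabilities y_hat
    delta[y] -= 1.0             # delta = y_hat - t, with t the one-hot target
    return W.T @ delta

rng = np.random.default_rng(0)
d, C, n_words = 4, 3, 5                     # toy sizes
W = rng.normal(size=(C, n_words * d))
x = rng.normal(size=n_words * d)
y = 1

g = grad_x(W, x, y)

# Central finite differences as an independent check of the formula.
eps = 1e-6
numeric = np.array([
    (loss(W, x + eps * e_i, y) - loss(W, x - eps * e_i, y)) / (2 * eps)
    for e_i in np.eye(x.size)
])

# Row k is the gradient for the k-th word vector in the window.
per_word = g.reshape(n_words, d)
```

Because x is a concatenation, each slice of the gradient updates exactly one word vector in the window.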

Softmax (= logistic regression) alone not very powerful

1. Softmax only gives a linear decision boundary in the original space.

2. On small datasets this works well.

3. On large datasets it is very limiting.

Neural Nets for the Win!

Demystifying neural networks
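
One common way to demystify a neural network is to view a single neuron as a logistic regression unit, and a layer as several such units run on the same input. The sketch below assumes a sigmoid nonlinearity; the function names and values are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(w, b, x):
    """A single neuron: a weighted sum of the inputs passed through a sigmoid."""
    return sigmoid(w @ x + b)

def layer(W, b, x):
    """A layer: several neurons applied to the same input, one per row of W."""
    return sigmoid(W @ x + b)

# Toy values for illustration.
x = np.array([1.0, -2.0, 0.5])
W = np.array([[0.2, 0.1, -0.3],
              [1.0, 0.0, 0.5]])
b = np.array([0.0, -1.0])

activations = layer(W, b, x)            # shape (2,), one activation per neuron
second = neuron(W[1], b[1], x)          # the same value as activations[1]
```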

A more powerful, neural net window classifier

Summary: Feed-forward Computation
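
A minimal sketch of the feed-forward computation for a window classifier with one hidden layer, assuming the score has the form $s = U^{\top} f(Wx + b)$ with a sigmoid f. The sizes (a 20-dimensional window vector and 8 hidden units) and the random parameters are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def score(U, W, b, x):
    """Feed-forward pass: hidden activations a = f(Wx + b), then score s = U . a."""
    z = W @ x + b        # hidden pre-activations, shape (8,)
    a = sigmoid(z)       # hidden activations, shape (8,)
    return U @ a         # a single unnormalized score

rng = np.random.default_rng(0)
x = rng.normal(size=20)           # e.g. 5 window words with 4-dimensional vectors
W = rng.normal(size=(8, 20))
b = rng.normal(size=8)
U = rng.normal(size=8)

s = score(U, W, b, x)
```

The hidden layer lets the score depend on interactions between the words in the window, which a single linear softmax layer on x cannot capture.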

Main intuition for extra layer

References:

http://web.stanford.edu/class/cs224n/

http://www.hankcs.com/nlp/word-vector-representations-word2vec.html

https://zhuanlan.zhihu.com/p/26530524