NTU Week 4 Notes

用心努力

Review
word2vec
GloVe
Word Vector Evaluation
softmax and cross entropy

I. Review

(Omitted.)

II. word2vec

1. Introduction

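Word embedding methods build, from an unlabeled training corpus, a vector for every word in that corpus to represent its meaning. These vectors are useful because they:

can be used to compute the similarity between two words
serve as an important feature that carries semantic information
can keep being updated as more text becomes available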
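A minimal sketch of the first use, word similarity, assuming cosine similarity as the measure (a common choice, but an assumption here since the notes do not name one). The 3-dimensional vectors are made up for illustration; trained embeddings typically have many more dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: close to 1 means similar direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up toy vectors (not trained), chosen so "cat" and "dog" point the same way.
vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

print("cat vs dog:", cosine_similarity(vectors["cat"], vectors["dog"]))
print("cat vs car:", cosine_similarity(vectors["cat"], vectors["car"]))
```

With these toy vectors, cat/dog scores close to 1 while cat/car scores much lower.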

2. Word2Vec Skip-Gram

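Goal: predict the surrounding words within a window around each word
Objective: maximize the probability of any context word given the current center word
Loss function: J(θ) = -(1/T) Σ_{t=1}^{T} Σ_{-m ≤ j ≤ m, j ≠ 0} log p(w_{t+j} | w_t), i.e. the negative log-likelihood of the context words, averaged over all T positions in the corpus, where m is the window size

Advantages: fast, and easy to update when new words are added to the vocabulary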
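The goal of predicting the surrounding words in a window turns a corpus into (center, context) training pairs. A minimal sketch; `skipgram_pairs` and its `window` parameter are illustrative names, not from the lecture.

```python
def skipgram_pairs(tokens, window):
    """Return (center, context) pairs: every word within `window` positions
    of the center word, excluding the center word itself."""
    pairs = []
    for t, center in enumerate(tokens):
        for j in range(-window, window + 1):
            # skip the center itself and offsets that fall outside the sentence
            if j == 0 or not 0 <= t + j < len(tokens):
                continue
            pairs.append((center, tokens[t + j]))
    return pairs

sentence = "the quick brown fox jumps".split()
for pair in skipgram_pairs(sentence, window=1):
    print(pair)
```

With `window=1` the five-word sentence yields 8 pairs; each pair contributes one log p(context | center) term to the loss.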

Architecture


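Target word vector: the input vector is multiplied by the hidden-layer weight matrix W (V×N) to produce the hidden layer. Because the input is a one-hot vector, this simply extracts the row of W that belongs to the word, so W acts as a word-vector lookup table.
Context word vector: the hidden layer is multiplied by the output-layer weight matrix W′ (N×V); this weighted sum gives a final score for every word in the vocabulary.
Softmax: the final scores go through a softmax, giving a probability distribution over all V vocabulary words, which is used to predict each of the C context words in the window.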
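A toy forward pass of this architecture, in plain Python so it runs without extra libraries. The sizes (V = 4, N = 2), the weight values and the word indices are made up. It shows that multiplying a one-hot vector by W just selects a row, and that the softmax over the W′ scores gives one probability per vocabulary word. The last step computes the loss for one hypothetical context word as -log p(context | center).

```python
import math

def one_hot(index, size):
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def vec_mat(x, M):
    """Row vector x (length = number of rows of M) times matrix M."""
    return [sum(x[i] * M[i][j] for i in range(len(x))) for j in range(len(M[0]))]

def softmax(scores):
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy sizes: V = 4 vocabulary words, N = 2 hidden units; weights are made up.
W = [[0.1, 0.2],     # V x N input matrix: row i is the vector of word i
     [0.3, -0.1],
     [0.0, 0.5],
     [-0.2, 0.4]]
W_prime = [[0.2, -0.3, 0.1, 0.0],   # N x V output matrix: column j scores word j
           [0.4, 0.1, -0.2, 0.3]]

center, context = 2, 3                # hypothetical word indices
x = one_hot(center, len(W))
h = vec_mat(x, W)                     # hidden layer
print("hidden layer equals row of W:", h == W[center])
scores = vec_mat(h, W_prime)          # one final score per vocabulary word
probs = softmax(scores)               # p(word | center) over all V words
loss = -math.log(probs[context])      # loss for this one (center, context) pair
print(probs, loss)
```

Note that `softmax` has to touch every one of the V scores; for a real vocabulary this full normalization is what makes each update expensive.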

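Re-deriving the loss for one (center c, context o) pair in terms of these matrices, with v_c the row of W for c and u_w the column of W′ for word w: p(o | c) = exp(u_o·v_c) / Σ_{w=1}^{V} exp(u_w·v_c), so the loss is -u_o·v_c + log Σ_{w=1}^{V} exp(u_w·v_c).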
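After a lengthy derivation, the lecture points out that solving this directly (as with the traditional SVD approach) is too costly, because every update is expensive: the softmax normalizes over all V words, so each training instance touches the output vector of every word in the vocabulary. The remedy is to limit the number of output vectors that must be updated per training instance, via hierarchical softmax or sampling.
Hierarchical softmax was only mentioned in passing; negative sampling was covered in detail. Its formula is involved, but the idea is simple: for each training pair, draw a fixed number of negative words (e.g. 50) and push their predicted probabilities down while pushing the true context word's probability up. Each update is cheap, and the effect accumulates over many updates.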
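Since the notes skip the formula, here is a sketch of the commonly used negative-sampling loss for one (center c, context o) pair with k sampled negative words: -log σ(u_o·v_c) - Σ_k log σ(-u_k·v_c), where σ is the sigmoid. Each word gets its own sigmoid instead of a share of a full softmax, so only k + 1 output vectors are involved per update. All vectors below are made up.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def negative_sampling_loss(v_center, u_context, u_negatives):
    """-log sigmoid(u_o . v_c) - sum over negatives of log sigmoid(-u_k . v_c).
    Only the true context vector and the k negative vectors are used,
    instead of all V output vectors as in the full softmax."""
    loss = -math.log(sigmoid(dot(u_context, v_center)))
    for u_k in u_negatives:
        loss -= math.log(sigmoid(-dot(u_k, v_center)))
    return loss

# Made-up vectors: N = 2 dimensions, k = 2 negative samples.
v_c = [0.5, 0.5]
negatives = [[-1.0, -0.5], [0.2, -1.0]]
good_context = [1.0, 1.0]    # points the same way as v_c -> lower loss
bad_context = [-1.0, -1.0]   # points the opposite way -> higher loss

print(negative_sampling_loss(v_c, good_context, negatives))
print(negative_sampling_loss(v_c, bad_context, negatives))
```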


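Negative Sampling:

If the word distribution is uniform, use plain random sampling
If it is not, sample each word in proportion to its frequency raised to the 3/4 power, so less frequent words are sampled more often than their raw frequency alone would give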
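A sketch of the 3/4-power rule: each word's sampling weight is count^0.75, normalized to sum to 1. The counts are made up and `noise_distribution` is an illustrative name. Raising to 0.75 flattens the distribution, so the rare word gets a larger share than its raw frequency and the frequent word a smaller one; with equal counts it reduces to uniform random sampling.

```python
import random

def noise_distribution(counts, power=0.75):
    """Sampling probability of each word, proportional to count ** power."""
    weights = {w: c ** power for w, c in counts.items()}
    total = sum(weights.values())
    return {w: x / total for w, x in weights.items()}

counts = {"the": 900, "cat": 90, "aardvark": 10}   # made-up corpus counts
raw_total = sum(counts.values())
noise = noise_distribution(counts)
for w, c in counts.items():
    print(f"{w:9s} raw={c / raw_total:.3f}  sampled={noise[w]:.3f}")

# Draw k = 5 negative samples from the flattened distribution (seeded for repeatability).
rng = random.Random(0)
words = list(noise)
print(rng.choices(words, weights=[noise[w] for w in words], k=5))
```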

3. Word2Vec variants

Skip-gram: predicting surrounding words given the target word
CBOW (continuous bag-of-words): predicting the target word given the surrounding words
LM (language modeling): predicting the next word given the preceding context
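To contrast the variants, a sketch of how CBOW and LM turn the same sentence into training examples (skip-gram reverses CBOW: the center word is the input and each surrounding word is a target). The function names and the `window`/`history` parameters are illustrative. The key difference is that CBOW sees words on both sides of the target, while LM sees only the words before it.

```python
def cbow_examples(tokens, window):
    """CBOW: input = the surrounding words, target = the center word."""
    examples = []
    for t, center in enumerate(tokens):
        context = tokens[max(0, t - window):t] + tokens[t + 1:t + 1 + window]
        examples.append((context, center))
    return examples

def lm_examples(tokens, history):
    """LM: input = only the preceding words (up to `history`), target = the next word."""
    return [(tokens[max(0, t - history):t], tokens[t]) for t in range(1, len(tokens))]

sentence = "the quick brown fox".split()
for context, target in cbow_examples(sentence, window=1):
    print("CBOW", context, "->", target)
for context, target in lm_examples(sentence, history=2):
    print("LM  ", context, "->", target)
```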

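None of these methods directly use global statistics of the corpus, since each only looks at one local window at a time; improving on this leads to GloVe!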