Deep Learning Specialization 5: Sequence Models - Week 2 - Word Embeddings

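This week's lectures cover how word embeddings are learned, plus a few simple applications. As the instructor put it, after years of continued research by many people, learning word embeddings has become so simple that it feels like magic.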

1. Introduction to word embeddings

1.1 Word Representation

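Word representation is about moving from one-hot encoding to a vectorized (featurized) representation, and the conversion brings other benefits as well. With one-hot vectors, the inner product of any two distinct words is 0, so a model that has learned "orange juice" gets no help completing "apple ___". With featurized vectors, related words such as orange and apple end up close together, so what is learned for one word carries over to the other.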

1-hot representation -> Featurized representation

I want a glass of orange ___

I want a glass of apple ___
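A minimal sketch of the contrast above in plain Python. The vocabulary and the 4-dimensional feature values ([gender, royal, age, food]) are made-up numbers chosen for illustration, not learned embeddings.

```python
import math

# Toy 6-word vocabulary; the index positions are arbitrary.
vocab = ["man", "woman", "king", "queen", "apple", "orange"]

def one_hot(word):
    """One-hot vector for `word` over the toy vocabulary."""
    return [1.0 if w == word else 0.0 for w in vocab]

# Hypothetical featurized vectors: [gender, royal, age, food] (made-up values).
features = {
    "man":    [-1.00, 0.01, 0.03, 0.09],
    "woman":  [1.00, 0.02, 0.02, 0.01],
    "king":   [-0.95, 0.93, 0.70, 0.02],
    "queen":  [0.97, 0.95, 0.69, 0.01],
    "apple":  [0.00, -0.01, 0.03, 0.95],
    "orange": [0.01, 0.00, -0.02, 0.97],
}

def cosine(u, v):
    """Cosine similarity u.v / (|u| |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Distinct one-hot vectors are orthogonal, so every pair looks equally unrelated.
print(cosine(one_hot("orange"), one_hot("apple")))  # 0.0
# Featurized vectors put "orange" close to "apple" and far from "king".
print(cosine(features["orange"], features["apple"]))
print(cosine(features["orange"], features["king"]))
```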

t-SNE: a method for visualizing high-dimensional embeddings by mapping them down to 2D.

1.2 Transfer learning and word embeddings

Learn word embeddings from a large text corpus (1-100B words), or download pre-trained embeddings online.
Transfer the embeddings to a new task with a smaller training set.
Optional: continue to fine-tune the word embeddings with the new data.

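There are plenty of pre-trained embeddings available online, which is great, and they are easy to use: just load them the way the programming assignment does.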
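As a sketch of what "loading" amounts to, assuming the plain-text layout of the Stanford GloVe downloads (each line is a word followed by its space-separated vector components). The helper read_glove_lines is hypothetical, not the assignment's own loader, and the sample vectors are made up.

```python
import io

def read_glove_lines(lines):
    """Parse GloVe-style text lines into {word: [float, ...]}.
    Hypothetical helper: assumes 'word v1 v2 ... vd' on each line."""
    embeddings = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, values = parts[0], parts[1:]
        embeddings[word] = [float(x) for x in values]
    return embeddings

# Tiny in-memory stand-in for a real file such as glove.6B.50d.txt
# (real files have 50-300 components per word; these numbers are made up).
sample = io.StringIO(
    "the 0.1 -0.2 0.3\n"
    "cat 0.5 0.1 -0.4\n"
)
emb = read_glove_lines(sample)
print(emb["cat"])  # [0.5, 0.1, -0.4]
```

For a real file, pass the open file object (opened with encoding="utf-8") in place of the StringIO; iterating a file yields one line at a time, so the whole file never has to be split in memory first.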

1.3 Properties of word embeddings

Analogy reasoning: Man $\rightarrow$ Woman, King $\rightarrow$ ?
$\arg \underset{w}{\max}\text{sim} \left( e_w, e_{\text{king}}-e_{\text{man}} + e_{\text{woman}} \right)$
A common choice for sim is the cosine similarity $\text{cosine sim}(u, v) = \frac{u^Tv}{\left \| u \right \|_2 \left \| v \right \|_2}$
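A small sketch of the analogy search above: form $e_{\text{king}} - e_{\text{man}} + e_{\text{woman}}$ and return the word whose embedding has the highest cosine similarity to it. The 4-dimensional embeddings are made-up illustrative values, and excluding the three input words from the candidates is a practical choice assumed here (they would often win otherwise).

```python
import math

# Hypothetical embeddings: [gender, royal, age, food]; values are made up.
emb = {
    "man":    [-1.00, 0.01, 0.03, 0.09],
    "woman":  [1.00, 0.02, 0.02, 0.01],
    "king":   [-0.95, 0.93, 0.70, 0.02],
    "queen":  [0.97, 0.95, 0.69, 0.01],
    "apple":  [0.00, -0.01, 0.03, 0.95],
    "orange": [0.01, 0.00, -0.02, 0.97],
}

def cosine_sim(u, v):
    """u^T v / (||u||_2 ||v||_2)"""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def complete_analogy(a, b, c, embeddings):
    """Solve a : b :: c : ? via argmax_w sim(e_w, e_c - e_a + e_b)."""
    target = [ec - ea + eb for ea, eb, ec in zip(embeddings[a], embeddings[b], embeddings[c])]
    best_word, best_sim = None, -math.inf
    for w, e_w in embeddings.items():
        if w in (a, b, c):
            continue  # skip the input words themselves
        s = cosine_sim(e_w, target)
        if s > best_sim:
            best_word, best_sim = w, s
    return best_word

print(complete_analogy("man", "woman", "king", emb))  # queen
```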

2. Learning Word Embeddings

Problem: predict the next word (a neural language model).

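First choose a target word, then take a single word just before or after it as the context. The model is a softmax over the whole vocabulary (10,000 words here), where $e_c$ is the embedding of context word $c$ and $\theta_t$ is a parameter vector for target word $t$: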
$p(t|c) = \frac{e^{\theta_t^Te_c}}{\sum_{j=1}^{10,000}e^{\theta_j^Te_c}}$

Goal: the point is not to do well on this supervised task itself, but to learn good embeddings along the way.
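A forward-pass sketch of $p(t|c)$ with toy sizes (8 words, 4 dimensions instead of 10,000 words) and randomly initialized $\theta$ and $e_c$; training by gradient descent is not shown. Subtracting the maximum logit before exponentiating is a standard numerical-stability step and leaves the probabilities unchanged.

```python
import math
import random

def softmax_prob(theta, e_c):
    """p(t | c) for every target t: softmax of theta_t^T e_c.
    theta: one parameter vector per vocabulary word; e_c: context embedding."""
    logits = [sum(a * b for a, b in zip(theta_t, e_c)) for theta_t in theta]
    m = max(logits)  # for numerical stability only
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)  # the sum over the whole vocabulary: the expensive part
    return [x / total for x in exps]

# Toy sizes and random parameters, for illustration only.
rng = random.Random(0)
vocab_size, dim = 8, 4
theta = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(vocab_size)]
e_c = [rng.gauss(0, 0.1) for _ in range(dim)]

probs = softmax_prob(theta, e_c)
loss = -math.log(probs[3])  # cross-entropy if the true target has index 3
print(sum(probs), loss)
```

In training, the gradient of this loss updates both theta and the embedding e_c; the embeddings are what you keep afterwards.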

2.1 Word2Vec

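Skip-grams: it turns out the target does not need to be the adjacent next word. Picking the target at random from a window of words around the context word also works well.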
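A sketch of skip-gram pair generation: each word in turn serves as the context, and one target is drawn uniformly from the words within a fixed window around it. The window size, uniform choice, and one-target-per-position scheme are assumptions for illustration; real implementations vary these details.

```python
import random

def skipgram_pairs(tokens, window=5, seed=0):
    """Return (context, target) pairs: for each position, the word there is the
    context and the target is chosen uniformly from the words within +/- window."""
    rng = random.Random(seed)
    pairs = []
    for i, context in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens) - 1, i + window)
        candidates = [j for j in range(lo, hi + 1) if j != i]
        if not candidates:
            continue  # a one-word sentence has no possible target
        j = rng.choice(candidates)
        pairs.append((context, tokens[j]))
    return pairs

sentence = "i want a glass of orange juice to go along with my cereal".split()
pairs = skipgram_pairs(sentence, window=3)
for c, t in pairs[:4]:
    print(c, "->", t)
```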

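Computing the softmax denominator is very expensive, because it sums over the entire vocabulary. The problem can be simplified further: given a context word and some other word, predict whether that word is the target (a binary classification). Each positive example is then paired with a few randomly drawn negative examples.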
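A sketch of the binary objective for one context word: a logistic loss with label 1 for the true target and label 0 for each of $k$ negative words, using $P(y=1 \mid c, t) = \sigma(\theta_t^T e_c)$. Only $k+1$ dot products are needed instead of a softmax over the whole vocabulary. The vectors here are random toy values.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def neg_sampling_loss(e_c, theta_pos, theta_negs):
    """-log sigma(theta_pos^T e_c) - sum_n log(1 - sigma(theta_n^T e_c)).
    Uses 1 - sigma(z) == sigma(-z) for the negative terms."""
    loss = -math.log(sigmoid(dot(theta_pos, e_c)))
    for theta_n in theta_negs:
        loss -= math.log(sigmoid(-dot(theta_n, e_c)))
    return loss

# Random toy vectors: one context, one true target, k = 5 negatives.
rng = random.Random(1)
dim, k = 4, 5
e_c = [rng.gauss(0, 0.1) for _ in range(dim)]
theta_pos = [rng.gauss(0, 0.1) for _ in range(dim)]
theta_negs = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(k)]
print(neg_sampling_loss(e_c, theta_pos, theta_negs))
```

With all-zero vectors every sigmoid is 0.5, so the loss is $(k+1)\log 2$; training pushes $\theta_t^T e_c$ up for the true target and down for the negatives.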

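Negative Sampling: draw the negative examples from the following distribution, where $f(w_i)$ is the observed frequency of word $w_i$ (more a heuristic than a derivation, but it works well in practice):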
$P\left( w_i \right) = \frac{f(w_i)^{3/4}}{\sum_{j=1}^{10,000}f(w_j)^{3/4}}$
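A sketch of this sampling distribution on made-up counts. Raising frequencies to the 3/4 power sits between sampling by raw frequency and sampling uniformly: very frequent words such as "the" are drawn less often than their raw share, rare words more often. Raw counts can stand in for $f$ because the normalization cancels any constant factor.

```python
import random

def negative_sampling_dist(freqs, power=0.75):
    """P(w_i) = f(w_i)^(3/4) / sum_j f(w_j)^(3/4)."""
    weights = {w: f ** power for w, f in freqs.items()}
    total = sum(weights.values())
    return {w: x / total for w, x in weights.items()}

# Made-up counts: one very frequent word and a few rarer ones.
counts = {"the": 1000, "orange": 10, "juice": 8, "durian": 1}
total_count = sum(counts.values())
raw = {w: c / total_count for w, c in counts.items()}
smoothed = negative_sampling_dist(counts)
for w in counts:
    print(f"{w:7s} raw={raw[w]:.4f} 3/4-power={smoothed[w]:.4f}")

# Draw k = 5 negative words from the smoothed distribution.
rng = random.Random(0)
negatives = rng.choices(list(smoothed), weights=list(smoothed.values()), k=5)
print(negatives)
```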

2.2 GloVe

GloVe: Global vectors for word representation

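Let $x_{ij}$ be the number of times word $i$ (the target) appears in the context of word $j$ (the context), matching the roles of $\theta$ and $e$ in $p(t|c)$ above. The objective is
$\min \sum_{i=1}^{10,000} \sum_{j=1}^{10,000} f(x_{ij}) \left( \theta_i^Te_j + b_i + b^\prime_j - \log x_{ij} \right)^2$
$f(x_{ij})$ is a weighting function. At minimum it must satisfy $f(x_{ij}) = 0$ when $x_{ij} = 0$, so that pairs that never co-occur drop out of the sum instead of producing the undefined $\log 0$.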
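A sketch of the objective term by term. The weighting function used here, $f(x) = (x/x_{\max})^{\alpha}$ for $x < x_{\max}$ and 1 otherwise with $x_{\max} = 100$, $\alpha = 3/4$, is the choice from the original GloVe paper; the note above only requires $f(0) = 0$. Storing only nonzero counts in a dict is an implementation assumption.

```python
import math

def glove_weight(x, x_max=100.0, alpha=0.75):
    """GloVe-paper weighting: (x / x_max)^alpha below x_max, else 1. f(0) = 0."""
    return (x / x_max) ** alpha if x < x_max else 1.0

def glove_loss(X, theta, e, b, b_prime):
    """sum_{i,j} f(x_ij) * (theta_i^T e_j + b_i + b'_j - log x_ij)^2
    X maps (i, j) -> co-occurrence count x_ij."""
    loss = 0.0
    for (i, j), x_ij in X.items():
        if x_ij <= 0:
            continue  # f(0) = 0, and log(0) is never evaluated
        pred = sum(a * c for a, c in zip(theta[i], e[j])) + b[i] + b_prime[j]
        loss += glove_weight(x_ij) * (pred - math.log(x_ij)) ** 2
    return loss

# Toy check: 2 words, 2-d vectors, everything zero except one bias.
theta = [[0.0, 0.0], [0.0, 0.0]]
e = [[0.0, 0.0], [0.0, 0.0]]
b, b_prime = [math.log(5.0), 0.0], [0.0, 0.0]
X = {(0, 1): 5.0, (1, 0): 5.0, (1, 1): 0.0}
# (0, 1) is fit exactly; (1, 0) contributes f(5) * (log 5)^2; (1, 1) is skipped.
print(glove_loss(X, theta, e, b, b_prime))
```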

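In effect, GloVe first condenses the corpus into co-occurrence statistics and then fits the embeddings to those statistics.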
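A sketch of that first step: counting $x_{ij}$ from a token list with a symmetric window. Plain counts are used here; the GloVe paper additionally down-weights distant pairs, which this sketch omits. With a symmetric window the counts satisfy $x_{ij} = x_{ji}$.

```python
from collections import defaultdict

def cooccurrence_counts(tokens, window=2):
    """x[(i, j)] = number of times word i appears within +/- window
    positions of word j (plain counts, no distance weighting)."""
    x = defaultdict(float)
    for pos, word_j in enumerate(tokens):
        lo, hi = max(0, pos - window), min(len(tokens), pos + window + 1)
        for other in range(lo, hi):
            if other != pos:
                x[(tokens[other], word_j)] += 1.0
    return dict(x)

corpus = "i want a glass of orange juice i want a glass of apple juice".split()
X = cooccurrence_counts(corpus, window=2)
print(X[("glass", "of")], X[("of", "glass")])  # 2.0 2.0
```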

3. Applications using Word Embeddings

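Applications change things at the input layer: the one-hot input for each word is replaced directly by its embedding vector. Because similar words have similar embeddings, this gives some generalization across synonyms and analogous words even when the labeled training set is small.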
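Why the swap is cheap: if $E$ is the embedding matrix with one column per vocabulary word, multiplying $E$ by the one-hot vector $o_j$ just selects column $j$, i.e. $E o_j = e_j$. The sketch below checks this on a random toy matrix; frameworks typically implement it as a direct lookup rather than an actual multiplication.

```python
import random

rng = random.Random(0)
vocab_size, dim = 6, 3
# Toy embedding matrix E of shape (dim, vocab_size): column j is e_j.
E = [[rng.gauss(0, 1) for _ in range(vocab_size)] for _ in range(dim)]

def one_hot(j, n):
    return [1.0 if k == j else 0.0 for k in range(n)]

def matvec(M, v):
    """Plain matrix-vector product M @ v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

j = 4
via_matmul = matvec(E, one_hot(j, vocab_size))  # E @ o_j: wastes work on zeros
via_lookup = [row[j] for row in E]              # read column j directly
print(via_matmul == via_lookup)  # True
```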

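For example, for sentiment classification the embeddings can be fed straight into a softmax classifier (e.g. after averaging the word vectors of a sentence) or into an RNN.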
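A sketch of the averaging variant: average the embeddings of a sentence's words, then apply a softmax layer over 5 star-rating classes. The 2-d embeddings and the weight matrix W are hand-picked hypothetical values, not trained ones, and skipping out-of-vocabulary words is an assumption of this sketch.

```python
import math

def average_embedding(words, embeddings):
    """Mean of the embedding vectors of in-vocabulary words."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    dim = len(next(iter(embeddings.values())))
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [v / s for v in exps]

def predict_sentiment(sentence, embeddings, W, b):
    """Class probabilities softmax(W @ avg + b)."""
    avg = average_embedding(sentence.lower().split(), embeddings)
    logits = [sum(w * a for w, a in zip(row, avg)) + bias for row, bias in zip(W, b)]
    return softmax(logits)

# Hypothetical embeddings and hand-picked (untrained) weights for 1-5 stars.
embeddings = {"great": [0.9, 0.1], "terrible": [-0.8, 0.2], "food": [0.0, 0.5]}
W = [[-2.0, 0.0], [-1.0, 0.0], [0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
b = [0.0] * 5
probs = predict_sentiment("Great food", embeddings, W, b)
print(max(range(5), key=lambda k: probs[k]) + 1)  # most likely star rating: 5
```

Averaging throws away word order, so a review like "completely lacking in good taste" can be misread as positive; that is why the RNN version, which reads the embeddings in sequence, is the stronger option.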

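Embeddings can also be put to socially responsible use, such as reducing gender bias at the algorithmic level. The method is quite direct, so I won't list it here.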