Stanford Deep Natural Language Processing Study Notes (3)
Computing the normalization factor (the denominator of the softmax) is expensive, since it sums over the entire vocabulary. Negative Sampling expresses probabilities with the sigmoid instead.
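For intuition, here is a tiny numpy sketch (the vocabulary size and scores are made up) contrasting the full-vocabulary sum in the softmax denominator with the per-pair sigmoid used by Negative Sampling:

```python
import numpy as np

V = 100_000                  # hypothetical vocabulary size
scores = np.random.randn(V)  # scores u_w^T v_c for every word w

def softmax_prob(scores, o):
    # The denominator sums over all |V| words: expensive per example.
    return np.exp(scores[o]) / np.exp(scores).sum()

def sigmoid(x):
    # Negative Sampling scores one (center, context) pair at a time.
    return 1.0 / (1.0 + np.exp(-x))

print(softmax_prob(scores, 42))
print(sigmoid(scores[42]))
```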
- Main idea: train a binary logistic regression model to separate a true word pair (a center word and another word within its window) from several random word pairs (the center word and a randomly sampled word) (see this article).
Negatives are drawn from the unigram distribution raised to the 3/4 power, which boosts the chance that less frequent words get sampled (see 寒小阳's blog).
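A minimal sketch of this sampling scheme, assuming the usual word2vec choice of the 3/4-power unigram distribution (the toy counts are invented):

```python
import numpy as np

# Hypothetical corpus counts for a toy vocabulary.
vocab = ["the", "cat", "sat", "on", "mat"]
counts = np.array([1000.0, 50.0, 20.0, 500.0, 10.0])

# Unigram distribution raised to the 3/4 power, then renormalized.
unigram = counts / counts.sum()
p_neg = unigram ** 0.75
p_neg /= p_neg.sum()

# Rare words now get a larger share than their raw frequency.
print(dict(zip(vocab, np.round(p_neg, 3))))

# Draw K = 5 negative samples for one center word.
negatives = np.random.choice(len(vocab), size=5, p=p_neg)
print([vocab[i] for i in negatives])
```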
Here $\tilde{D}$ denotes a "false" or "negative" corpus, which we can generate by random sampling from the vocabulary. With $v_w$ the center-word vector and $u_c$ the context-word vector, the new objective function (to be maximized) becomes:

$$J = \sum_{(w,c)\in D} \log\sigma(u_c^\top v_w) + \sum_{(w,c)\in \tilde{D}} \log\sigma(-u_c^\top v_w)$$
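A numpy sketch of this objective for a single (center, context) pair with K sampled negatives; the vectors here are random placeholders:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_objective(v_center, u_pos, u_negs):
    """Negative-sampling objective for one (center, context) pair.

    v_center: center-word vector v_w, shape (d,)
    u_pos:    true context-word vector u_c, shape (d,)
    u_negs:   K negative-word vectors drawn for D~, shape (K, d)
    Returns the quantity to be maximized.
    """
    pos_term = np.log(sigmoid(u_pos @ v_center))
    neg_term = np.sum(np.log(sigmoid(-(u_negs @ v_center))))
    return pos_term + neg_term

# Tiny example with random vectors (d = 8, K = 5).
rng = np.random.default_rng(0)
print(neg_sampling_objective(rng.normal(size=8),
                             rng.normal(size=8),
                             rng.normal(size=(5, 8))))
```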
GloVe (see the paper)
$X$ is the co-occurrence matrix ($X_{ik}$ counts how often word $k$ appears in the context of word $i$), and $P_{ik} = X_{ik}/X_i$ is the corresponding co-occurrence probability.
Study notes on the GloVe paper:
GloVe captures global corpus statistics directly. Instead of the probabilities themselves, it models ratios of co-occurrence probabilities $P_{ik}/P_{jk}$ across different probe words $k$.
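A small sketch of the ratio idea; the counts below are invented, but they mimic the ice/steam example in Table 1 of the GloVe paper:

```python
import numpy as np

# Hypothetical co-occurrence counts X[i, k]: rows are the target words
# i = ice, j = steam; columns are the probe words k.
probes = ["solid", "gas", "water", "fashion"]
X = np.array([[190.0,  6.0, 300.0, 2.0],    # ice
              [  3.0, 80.0, 310.0, 2.0]])   # steam

P = X / X.sum(axis=1, keepdims=True)  # P[i, k] = P(k | i)
ratio = P[0] / P[1]                   # P(k | ice) / P(k | steam)

# Large for k related only to ice (solid), small for k related only
# to steam (gas), near 1 for k related to both or neither.
print(dict(zip(probes, np.round(ratio, 2))))
```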
Letting $F = \exp$ gives
$$w_i^\top \tilde{w}_k = \log P_{ik} = \log X_{ik} - \log X_i,$$
which would exhibit the exchange symmetry $w \leftrightarrow \tilde{w}$, $X \leftrightarrow X^\top$ if not for the $\log X_i$ on the right-hand side. However, this term is independent of $k$, so it can be absorbed into a bias $b_i$ for $w_i$. Finally, adding an additional bias $\tilde{b}_k$ for $\tilde{w}_k$ restores the symmetry:
$$w_i^\top \tilde{w}_k + b_i + \tilde{b}_k = \log X_{ik}$$
Drawback: it weighs all co-occurrences equally, even those that happen rarely or never.
Solution: a weighted least squares regression model, introducing a weighting function $f(X_{ij})$ into the cost function:
$$J = \sum_{i,j=1}^{V} f(X_{ij})\left(w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\right)^2$$
(a code sketch of this cost appears after the list of properties below).
Properties of f(x):
1. f(0) = 0. If f is viewed as a continuous function, it should vanish as x → 0 fast enough that $\lim_{x \to 0} f(x)\log^2 x$ is finite.
2. f(x) should be non-decreasing so that rare co-occurrences are not overweighted.
3. f(x) should be relatively small for large values of x, so that frequent co-occurrences are not overweighted.
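A sketch of a weighting function satisfying these three properties, plugged into the weighted least squares cost above; $x_{\max} = 100$ and $\alpha = 3/4$ are the values reported in the GloVe paper, and the parameters below are random placeholders:

```python
import numpy as np

def f(x, x_max=100.0, alpha=0.75):
    # f(0) = 0, non-decreasing, and capped at 1 for large x.
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

def glove_cost(W, W_tilde, b, b_tilde, X):
    """J = sum_ij f(X_ij) (w_i . w~_j + b_i + b~_j - log X_ij)^2,
    summed over the nonzero entries of the co-occurrence matrix X."""
    i, j = np.nonzero(X)
    err = (W[i] * W_tilde[j]).sum(axis=1) + b[i] + b_tilde[j] - np.log(X[i, j])
    return np.sum(f(X[i, j]) * err ** 2)

# Tiny example: V = 5 words, d = 8 dimensions, random counts.
rng = np.random.default_rng(0)
V, d = 5, 8
X = rng.integers(0, 50, size=(V, V)).astype(float)
print(glove_cost(rng.normal(size=(V, d)), rng.normal(size=(V, d)),
                 rng.normal(size=V), rng.normal(size=V), X))
```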