Stanford Deep Natural Language Processing Study Notes (3)
Computing the normalization factor (the denominator of the softmax) is expensive, since it sums over the entire vocabulary. Negative Sampling expresses probabilities with the sigmoid instead.
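For intuition, here is a tiny numpy sketch (the vocabulary size and scores are made up) contrasting the full-vocabulary sum in the softmax denominator with the per-pair sigmoid used by Negative Sampling:

```python
import numpy as np

V = 100_000                  # hypothetical vocabulary size
scores = np.random.randn(V)  # scores u_w^T v_c for every word w

def softmax_prob(scores, o):
    # The denominator sums over all |V| words: expensive per example.
    return np.exp(scores[o]) / np.exp(scores).sum()

def sigmoid(x):
    # Negative Sampling scores one (center, context) pair at a time.
    return 1.0 / (1.0 + np.exp(-x))

print(softmax_prob(scores, 42))
print(sigmoid(scores[42]))
```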
- Main idea: train a binary logistic regression model to separate a true word pair (a center word and another word within its window) from several random word pairs (the center word and a randomly sampled word) (see this article).
Negatives are drawn from the unigram distribution raised to the 3/4 power, which boosts the chance that less frequent words get sampled (see 寒小阳's blog).
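A minimal sketch of this sampling scheme, assuming the usual word2vec choice of the 3/4-power unigram distribution (the toy counts are invented):

```python
import numpy as np

# Hypothetical corpus counts for a toy vocabulary.
vocab = ["the", "cat", "sat", "on", "mat"]
counts = np.array([1000.0, 50.0, 20.0, 500.0, 10.0])

# Unigram distribution raised to the 3/4 power, then renormalized.
unigram = counts / counts.sum()
p_neg = unigram ** 0.75
p_neg /= p_neg.sum()

# Rare words now get a larger share than their raw frequency.
print(dict(zip(vocab, np.round(p_neg, 3))))

# Draw K = 5 negative samples for one center word.
negatives = np.random.choice(len(vocab), size=5, p=p_neg)
print([vocab[i] for i in negatives])
```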
Here $\tilde{D}$ denotes a "false" or "negative" corpus, which we can generate by random sampling from the vocabulary. With $v_w$ the center-word vector and $u_c$ the context-word vector, the new objective function (to be maximized) becomes:

$$J = \sum_{(w,c)\in D} \log\sigma(u_c^\top v_w) + \sum_{(w,c)\in \tilde{D}} \log\sigma(-u_c^\top v_w)$$
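A numpy sketch of this objective for a single (center, context) pair with K sampled negatives; the vectors here are random placeholders:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_objective(v_center, u_pos, u_negs):
    """Negative-sampling objective for one (center, context) pair.

    v_center: center-word vector v_w, shape (d,)
    u_pos:    true context-word vector u_c, shape (d,)
    u_negs:   K negative-word vectors drawn for D~, shape (K, d)
    Returns the quantity to be maximized.
    """
    pos_term = np.log(sigmoid(u_pos @ v_center))
    neg_term = np.sum(np.log(sigmoid(-(u_negs @ v_center))))
    return pos_term + neg_term

# Tiny example with random vectors (d = 8, K = 5).
rng = np.random.default_rng(0)
print(neg_sampling_objective(rng.normal(size=8),
                             rng.normal(size=8),
                             rng.normal(size=(5, 8))))
```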
GloVe (see the paper)
$X$ is the co-occurrence matrix ($X_{ik}$ counts how often word $k$ appears in the context of word $i$), and $P_{ik} = X_{ik}/X_i$ is the corresponding co-occurrence probability.
Study notes on the GloVe paper:
GloVe captures global corpus statistics directly. Instead of the probabilities themselves, it models ratios of co-occurrence probabilities $P_{ik}/P_{jk}$ across different probe words $k$.
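A small sketch of the ratio idea; the counts below are invented, but they mimic the ice/steam example in Table 1 of the GloVe paper:

```python
import numpy as np

# Hypothetical co-occurrence counts X[i, k]: rows are the target words
# i = ice, j = steam; columns are the probe words k.
probes = ["solid", "gas", "water", "fashion"]
X = np.array([[190.0,  6.0, 300.0, 2.0],    # ice
              [  3.0, 80.0, 310.0, 2.0]])   # steam

P = X / X.sum(axis=1, keepdims=True)  # P[i, k] = P(k | i)
ratio = P[0] / P[1]                   # P(k | ice) / P(k | steam)

# Large for k related only to ice (solid), small for k related only
# to steam (gas), near 1 for k related to both or neither.
print(dict(zip(probes, np.round(ratio, 2))))
```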
Letting $F = \exp$ gives
$$w_i^\top \tilde{w}_k = \log P_{ik} = \log X_{ik} - \log X_i,$$
which would exhibit the exchange symmetry $w \leftrightarrow \tilde{w}$, $X \leftrightarrow X^\top$ if not for the $\log X_i$ on the right-hand side. However, this term is independent of $k$, so it can be absorbed into a bias $b_i$ for $w_i$. Finally, adding an additional bias $\tilde{b}_k$ for $\tilde{w}_k$ restores the symmetry:
$$w_i^\top \tilde{w}_k + b_i + \tilde{b}_k = \log X_{ik}$$
Drawback: it weighs all co-occurrences equally, even those that happen rarely or never.
Solution: a weighted least squares regression model, introducing a weighting function $f(X_{ij})$ into the cost function:
$$J = \sum_{i,j=1}^{V} f(X_{ij})\left(w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\right)^2$$
(a code sketch of this cost appears after the list of properties below).
Properties of f(x):
1. f(0) = 0. If f is viewed as a continuous function, it should vanish as x → 0 fast enough that $\lim_{x \to 0} f(x)\log^2 x$ is finite.
2. f(x) should be non-decreasing so that rare co-occurrences are not overweighted.
3. f(x) should be relatively small for large values of x, so that frequent co-occurrences are not overweighted.
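A sketch of a weighting function satisfying these three properties, plugged into the weighted least squares cost above; $x_{\max} = 100$ and $\alpha = 3/4$ are the values reported in the GloVe paper, and the parameters below are random placeholders:

```python
import numpy as np

def f(x, x_max=100.0, alpha=0.75):
    # f(0) = 0, non-decreasing, and capped at 1 for large x.
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

def glove_cost(W, W_tilde, b, b_tilde, X):
    """J = sum_ij f(X_ij) (w_i . w~_j + b_i + b~_j - log X_ij)^2,
    summed over the nonzero entries of the co-occurrence matrix X."""
    i, j = np.nonzero(X)
    err = (W[i] * W_tilde[j]).sum(axis=1) + b[i] + b_tilde[j] - np.log(X[i, j])
    return np.sum(f(X[i, j]) * err ** 2)

# Tiny example: V = 5 words, d = 8 dimensions, random counts.
rng = np.random.default_rng(0)
V, d = 5, 8
X = rng.integers(0, 50, size=(V, V)).astype(float)
print(glove_cost(rng.normal(size=(V, d)), rng.normal(size=(V, d)),
                 rng.normal(size=V), rng.normal(size=V), X))
```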