Datawhale Zero-Basics Introduction to NLP, Task 13: Contextual Word Representations and Pretraining

1.Representations for a word

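Static word vectors have a limitation, even when they are fed into a recurrent neural network such as an LSTM: each word gets one fixed vector. It is like having memorized the dictionary without being flexible, because the word is not interpreted closely in light of its context.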
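
To make the limitation concrete, here is a minimal sketch of a static lookup table. The vectors and the `embed` helper are invented for illustration and do not come from any trained model. The word "bank" receives the same vector in both sentences, although its meaning differs.

```python
# Toy static embedding table; the numbers are made up for illustration.
STATIC_VECTORS = {
    "river": [0.9, 0.1],
    "bank": [0.5, 0.5],
    "money": [0.1, 0.9],
}

def embed(sentence):
    """Look up each token's vector, ignoring the surrounding words."""
    return [STATIC_VECTORS[token] for token in sentence]

river_sentence = embed(["river", "bank"])
money_sentence = embed(["money", "bank"])

# "bank" is the last token in both sentences and gets an identical vector.
same_vector = river_sentence[-1] == money_sentence[-1]
print(same_vector)
```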

2.TagLM - “Pre-ELMo”

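TagLM combines a word embedding model with a recurrent neural network model. On one hand it keeps the broad knowledge of the "dictionary"; on the other hand it can interpret each word according to its context.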
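
The combination can be sketched roughly as follows. This is a simplified illustration, not the TagLM implementation: `toy_lm_states` is a hypothetical stand-in for a pretrained language model, and the vectors are made up. Each token's static vector is concatenated with a state that depends on the preceding words, so "bank" ends up with different features in different sentences.

```python
# Toy static embeddings; values are made up for illustration.
WORD_VECTORS = {
    "river": [0.9, 0.1],
    "bank": [0.5, 0.5],
    "money": [0.1, 0.9],
}

def toy_lm_states(tokens):
    """Hypothetical stand-in for a pretrained LM: a simple recurrent update
    h_t = 0.5 * h_{t-1} + 0.5 * x_t, so each state depends on earlier words."""
    state = [0.0, 0.0]
    states = []
    for token in tokens:
        x = WORD_VECTORS[token]
        state = [0.5 * h + 0.5 * xi for h, xi in zip(state, x)]
        states.append(state)
    return states

def contextual_features(tokens):
    """Concatenate each static vector with the LM state at the same position."""
    states = toy_lm_states(tokens)
    return [WORD_VECTORS[t] + s for t, s in zip(tokens, states)]

bank_near_river = contextual_features(["river", "bank"])[-1]
bank_near_money = contextual_features(["money", "bank"])[-1]
print(bank_near_river)
print(bank_near_money)
```

The first two numbers of each result are the shared "dictionary" part; the last two are the context-dependent part.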

3.ELMo: Embeddings from Language Models

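With ELMo, the representations can be learned directly from large amounts of raw text, because it is trained as a language model and needs no labeled data.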
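
Why raw text is enough can be shown with a small sketch. ELMo is trained as a bidirectional language model, so the prediction targets come from the text itself. The helper `lm_training_pairs` below is a hypothetical name used only for this illustration; it builds training pairs, not the model.

```python
def lm_training_pairs(tokens):
    """Build (context, target) pairs for a forward and a backward language model.
    The targets come from the text itself, so no human labels are needed."""
    # Forward LM: predict token i from the tokens before it.
    forward = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
    # Backward LM: predict token i from the tokens after it.
    backward = [(tokens[i + 1:], tokens[i]) for i in range(len(tokens) - 1)]
    return forward, backward

raw_text = "the cat sat on the mat"
forward_pairs, backward_pairs = lm_training_pairs(raw_text.split())
print(forward_pairs[0])
print(backward_pairs[0])
```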

4.The Motivation for Transformers

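Recurrent neural networks have one drawback: they cannot easily be parallelized. Each time step depends on the hidden state of the previous step, so extra hardware cannot simply be traded for shorter running time.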
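
The sequential dependency is visible in even the smallest recurrence. The following is a toy scalar RNN written for illustration; the weights `w_h` and `w_x` are arbitrary assumed values, not trained parameters.

```python
import math

def rnn_hidden_states(inputs, w_h=0.5, w_x=1.0):
    """Scalar toy RNN: h_t = tanh(w_h * h_{t-1} + w_x * x_t)."""
    h = 0.0
    states = []
    for x in inputs:  # step t cannot start until step t-1 has produced h
        h = math.tanh(w_h * h + w_x * x)
        states.append(h)
    return states

states = rnn_hidden_states([1.0, 0.0, -1.0])
print(states)
```

Because the loop carries `h` from one iteration to the next, the iterations over a sequence cannot be run at the same time.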

Google published "Attention Is All You Need", which describes a new architecture that can be trained in parallel: the Transformer. It laid the foundation for BERT.
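
The core operation of the Transformer, scaled dot-product attention, can be sketched in plain Python. This is a minimal illustration under simplifying assumptions: queries, keys, and values are all the input vectors themselves, and the learned projection matrices, multiple heads, and positional encodings of the real model are omitted.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = vectors.
    Assumption: the learned projection matrices are omitted for brevity."""
    scale = math.sqrt(len(vectors[0]))
    outputs = []
    for query in vectors:  # each output depends only on the inputs, not on other outputs
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in vectors]
        weights = softmax(scores)
        outputs.append([
            sum(w * value[d] for w, value in zip(weights, vectors))
            for d in range(len(query))
        ])
    return outputs

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(tokens)
print(attended)
```

No iteration of the outer loop uses the result of another, which is why all positions can be computed in parallel (in practice as a single matrix product).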

5.BERT

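As soon as BERT came out, it swept the leaderboards of the major NLP benchmarks. BERT is built entirely on the Transformer: it stacks Transformer encoder layers (it has no decoder) and uses no recurrent layers at all.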