[coursera/SequenceModels/week2]NLP&Word Embeddings (summary&question)

2.1 Introduction to Word Embeddings

2.1.1 Word Representation

Featurized representation: word embedding

Use an n-dimensional vector to represent each word.

2.1.2 Using word embeddings

Transfer learning and word embeddings

2.1.3 Properties of word embeddings

Cosine similarity
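
As a minimal sketch of the idea, cosine similarity between two vectors u and v is (u · v) / (||u|| ||v||). The 3-dimensional embeddings below are made-up values for illustration only, not learned vectors; they are chosen so that the "man - woman" and "king - queen" difference vectors point in the same direction.

```python
import math

def cosine_similarity(u, v):
    """Return (u . v) / (||u|| * ||v||) for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings (assumed values, for illustration only).
e_man = [1.0, 0.2, 0.1]
e_woman = [-1.0, 0.2, 0.1]
e_king = [1.0, 0.9, 0.1]
e_queen = [-1.0, 0.9, 0.1]

# Analogy check: compare the direction of (man - woman) with (king - queen).
diff_gender = [a - b for a, b in zip(e_man, e_woman)]
diff_royal = [a - b for a, b in zip(e_king, e_queen)]
analogy_score = cosine_similarity(diff_gender, diff_royal)
```

A score close to 1 means the two difference vectors are nearly parallel, which is what an analogy such as "man is to woman as king is to queen" relies on.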

2.1.4 Embedding matrix

2.2 Learning Word Embeddings: Word2Vec & GloVe

2.2.1 Learning word embeddings

2.2.2 Word2Vec

2.2.3 GloVe word vectors

2.3 Applications using Word Embeddings

2.3.1 Sentiment Classification

RNN for sentiment classification

2.3.2 Debiasing word embeddings

Bias problem

Question 1

Suppose you learn a word embedding for a vocabulary of 10000 words. Then the embedding vectors should be 10000-dimensional, so as to capture the full range of variation and meaning in those words.


Question 2

What is t-SNE?

A linear transformation that allows us to solve analogies on word vectors

A supervised learning algorithm for learning word embeddings

An open-source sequence modeling library

Question 3

Suppose you download a pre-trained word embedding which has been trained on a huge corpus of text. You then use this word embedding to train an RNN for a language task of recognizing if someone is happy from a short snippet of text, using a small training set.

x (input text)                  y (happy?)
I'm feeling wonderful today!    1
I'm bummed my cat is ill.       0
Really enjoying this!           1

Then even if the word “ecstatic” does not appear in your small training set, your RNN might reasonably be expected to recognize “I’m ecstatic” as deserving a label y=1.


Question 4

Which of these equations do you think should hold for a good word embedding? (Check all that apply)

Question 5

Let $E$ be an embedding matrix, and let $e_{1234}$ be a one-hot vector corresponding to word 1234. Then to get the embedding of word 1234, why don’t we call $E e_{1234}$ in Python?

It is computationally wasteful.

The correct formula is $E^T e_{1234}$.

This doesn’t handle unknown words (<UNK>).
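
To make the cost difference concrete, here is a small sketch comparing the full matrix-vector product with a direct column lookup. It assumes E is stored as a list of rows with shape (embedding dimension × vocabulary size); the helper names and the tiny 3 × 5 matrix are hypothetical and exist only for illustration.

```python
def one_hot(index, size):
    """Return a one-hot vector of length `size` with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def matvec(E, x):
    """Multiply E by x the long way: every entry of E is multiplied once."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in E]

def lookup_column(E, index):
    """Read column `index` of E directly, with no multiplications at all."""
    return [row[index] for row in E]

# Hypothetical embedding matrix: 3 dimensions, 5-word vocabulary.
E = [
    [0.1, 0.2, 0.3, 0.4, 0.5],
    [1.0, 1.1, 1.2, 1.3, 1.4],
    [2.0, 2.1, 2.2, 2.3, 2.4],
]

via_matmul = matvec(E, one_hot(2, 5))   # dim * vocab multiplications
via_lookup = lookup_column(E, 2)        # dim reads, zero multiplications
```

Both calls return the same vector. With a 10000-word vocabulary, the product spends almost all of its work multiplying by zeros, which is why an indexed lookup is normally used in practice.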

Question 6

When learning word embeddings, we create an artificial task of estimating $P(\text{target} \mid \text{context})$. It is okay if we do poorly on this artificial prediction task; the more important by-product of this task is that we learn a useful set of word embeddings.


Question 7

In the word2vec algorithm, you estimate $P(t \mid c)$, where $t$ is the target word and $c$ is a context word. How are $t$ and $c$ chosen from the training set? Pick the best answer.

c is the one word that comes immediately before t.

c is a sequence of several words immediately before t.

c is the sequence of all the words in the sentence before t.
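
For comparison with the options listed here, the skip-gram variant of word2vec typically picks a context word and then samples the target from words within a small window around it. The following is a rough sketch under that assumption; the function name, the window size of 2, and the fixed random seed are illustrative choices, not part of the quiz.

```python
import random

def sample_context_target_pairs(tokens, window=2, rng=None):
    """For each position, use that word as context and sample one nearby target.

    The target is drawn uniformly from positions within `window` words
    of the context on either side, excluding the context position itself.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    pairs = []
    for i, context in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens) - 1, i + window)
        candidates = [j for j in range(lo, hi + 1) if j != i]
        if not candidates:
            continue  # single-word input: no neighbour to sample
        j = rng.choice(candidates)
        pairs.append((context, tokens[j]))
    return pairs

tokens = "I want a glass of orange juice".split()
pairs = sample_context_target_pairs(tokens, window=2)
```

Each element of `pairs` is a (context, target) training example for the artificial prediction task.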

Question 8

Suppose you have a 10000-word vocabulary, and are learning 500-dimensional word embeddings. The word2vec model uses the following softmax function:

$$P(t \mid c) = \frac{e^{\theta_t^T e_c}}{\sum_{t'=1}^{10000} e^{\theta_{t'}^T e_c}}$$

Which of these statements are correct? Check all that apply.
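
As an illustration of the softmax in this question, the sketch below scores every word in the vocabulary against one context embedding and normalizes the result. Here `theta` holds one parameter vector per target word and `e_c` is the context word's embedding; both have the same dimension (500 in the question, 2 in this toy example). All numeric values are hypothetical.

```python
import math

def p_target_given_context(theta, e_c):
    """Return softmax probabilities P(t | c) for every target word t.

    theta: list of per-word parameter vectors (one row per vocabulary word).
    e_c:   embedding vector of the context word.
    """
    scores = [sum(a * b for a, b in zip(theta_t, e_c)) for theta_t in theta]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)        # this sum runs over the whole vocabulary
    return [x / total for x in exps]

# Hypothetical parameters: 4-word vocabulary, 2-dimensional embeddings.
theta = [[0.5, -0.1], [0.0, 0.3], [-0.4, 0.2], [0.9, 0.9]]
e_c = [1.0, 2.0]
probs = p_target_given_context(theta, e_c)
```

The denominator touches every word in the vocabulary on every evaluation, which is the main reason the plain softmax becomes slow as the vocabulary grows.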

Question 9

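Suppose you have a 10000-word vocabulary, and are learning 500-dimensional word embeddings. The GloVe model minimizes this objective:

$$\min \sum_{i=1}^{10000} \sum_{j=1}^{10000} f(X_{ij}) \left( \theta_i^T e_j + b_i + b'_j - \log X_{ij} \right)^2$$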

Which of these statements are correct? Check all that apply.
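
A small sketch of how a GloVe-style objective could be evaluated: a weighted sum of squared differences between theta_i · e_j + b_i + b'_j and log X_ij, where X_ij is a co-occurrence count. The weighting function below, with `x_max=100` and `alpha=0.75`, is an assumed choice for illustration, and every numeric value is hypothetical.

```python
import math

def glove_weight(x, x_max=100.0, alpha=0.75):
    """Weighting term f(X_ij); zero when the pair never co-occurs."""
    if x <= 0:
        return 0.0
    return min(1.0, (x / x_max) ** alpha)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def glove_loss(X, theta, e, b, b_prime):
    """Sum f(X_ij) * (theta_i . e_j + b_i + b'_j - log X_ij)^2 over all pairs."""
    total = 0.0
    n = len(X)
    for i in range(n):
        for j in range(n):
            if X[i][j] <= 0:
                continue  # f(0) = 0, so the term is dropped (avoids log 0)
            pred = dot(theta[i], e[j]) + b[i] + b_prime[j]
            total += glove_weight(X[i][j]) * (pred - math.log(X[i][j])) ** 2
    return total

# Hypothetical co-occurrence counts and parameters for a 2-word vocabulary.
X = [[4.0, 1.0], [1.0, 0.0]]
theta = [[0.1, 0.2], [0.3, -0.1]]
e = [[0.2, 0.1], [-0.3, 0.4]]
b = [0.0, 0.1]
b_prime = [0.05, 0.0]
loss = glove_loss(X, theta, e, b, b_prime)
```

In training, theta, e, b, and b' would start from random values and be updated by gradient descent to reduce this loss.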

Question 10

You have trained word embeddings using a text dataset of $m_1$ words. You are considering using these word embeddings for a language task, for which you have a separate labeled dataset of $m_2$ words. Keeping in mind that using word embeddings is a form of transfer learning, under which of these circumstances would you expect the word embeddings to be helpful?

$m_1 \ll m_2$