
Word embeddings

We want a compact representation of text so that we can use it for neural nets!

Sparse vector products

[Diagram: the text token «word» (id 1337) is turned into a 1-hot vector, a single 1 at position 1337 and zeros elsewhere, which is fed into a linear layer: Wx.]
Sparse vector products

[Diagram: the 1-hot vector for «word» (id 1337, over n tokens) multiplies a matrix with rows W_0, W_1, …, W_n; the single 1 lines up with row W_1337.]
Sparse vector products

[Diagram: the 1-hot vector (n tokens) is connected to a hidden layer of h units by a dot product with weights W_ij, i = 1…n, j = 1…h.]
Embedding

[Diagram: the same 1-hot (n tokens) → hidden layer (h units) dot product with W_ij, i = 1…n, j = 1…h; since only position 1337 is non-zero, the result is simply row 1337 of W.]
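A minimal numpy sketch (mine, not from the slides; the sizes and token id are just for illustration): multiplying a 1-hot vector by W gives exactly the same result as reading one row of W, which is all an embedding lookup does.

# Sketch: 1-hot vector times W == one row of W (an "embedding lookup").
import numpy as np

n, h = 10_000, 64              # vocabulary size, hidden units (illustrative)
W = np.random.randn(n, h)      # embedding matrix, one row per token

token_id = 1337                # id of «word»
one_hot = np.zeros(n)
one_hot[token_id] = 1.0

via_matmul = one_hot @ W       # dense product: sum_i one_hot[i] * W[i, j]
via_lookup = W[token_id]       # just read row 1337

assert np.allclose(via_matmul, via_lookup)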
Embedding: word2vec

“Peace is a lie, there is only passion”

[Diagram: a 1-hot input (n tokens) is mapped to a hidden layer of h units by W_ij (i = 1…n, j = 1…h), then to an output over the n tokens by W_jk (j = 1…h, k = 1…n), predicting the words that appear around the input word.]
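A hedged numpy sketch of the two-matrix architecture on this slide, with tiny illustrative sizes and random weights: a 1-hot input word goes through W (n×h) to the hidden layer, then through a second matrix (h×n) and a softmax over the vocabulary.

# Sketch of the word2vec-style forward pass: 1-hot -> hidden -> softmax over tokens.
import numpy as np

rng = np.random.default_rng(0)
n, h = 8, 4                            # tiny vocab for the example sentence
W = rng.normal(size=(n, h)) * 0.1      # input embeddings  (W_ij, i = 1..n, j = 1..h)
W_out = rng.normal(size=(h, n)) * 0.1  # output weights    (W_jk, j = 1..h, k = 1..n)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

center_id = 3                          # e.g. "lie" in "Peace is a lie, ..."
hidden = W[center_id]                  # 1-hot @ W == row lookup
probs = softmax(hidden @ W_out)        # distribution over all n context words
print(probs.shape, probs.sum())        # (8,) 1.0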
Embedding: word2vec

The distributional hypothesis: similar context = similar meaning.

[Image: Yang Huijeong, http://cscp2.sogang.ac.kr/CSE4187_02/index.php/%ED%8C%8C%EC%9D%BC:8.png]
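A small illustration (mine, not the slides'): word2vec-style training data follows the distributional hypothesis by pairing each word of the example sentence with the words in a sliding context window around it. The window size of 2 is an arbitrary choice.

# Sketch: (center, context) training pairs from a sliding window.
text = "peace is a lie there is only passion".split()
window = 2

pairs = []
for i, center in enumerate(text):
    for j in range(max(0, i - window), min(len(text), i + window + 1)):
        if j != i:
            pairs.append((center, text[j]))

print(pairs[:6])
# [('peace', 'is'), ('peace', 'a'), ('is', 'peace'), ('is', 'a'), ('is', 'lie'), ('a', 'peace')]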
Embedding: word2vec

Side effect: synonyms


“nice” ~ “beautiful”
“hard” ~ “difficult”

Side effect: word algebra


“king” - “man” + “woman” ~ “queen”
“moscow” - “russia” + “france” ~ “paris”
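An illustrative numpy sketch with hypothetical 3-dimensional vectors (real embeddings are learned and have hundreds of dimensions): word algebra means doing arithmetic on the vectors and then taking the nearest neighbour by cosine similarity.

# Sketch of "king" - "man" + "woman" ~ "queen" with toy, hand-written vectors.
import numpy as np

vecs = {                      # hypothetical toy embeddings, for illustration only
    "king":  np.array([0.8, 0.9, 0.1]),
    "man":   np.array([0.7, 0.1, 0.1]),
    "woman": np.array([0.7, 0.1, 0.9]),
    "queen": np.array([0.8, 0.9, 0.9]),
    "paris": np.array([0.1, 0.5, 0.4]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

query = vecs["king"] - vecs["man"] + vecs["woman"]
best = max((w for w in vecs if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(query, vecs[w]))
print(best)                   # "queen" with these toy vectors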
Softmax problem

[Diagram: input words are replaced with vectors (rows of the embedding matrix), pass through a hidden layer of h units, and then hit a LARGE dense output layer.]
Softmax problem

[Diagram: the input "embedding layer" just takes a row from the matrix (super fast); the output is a dense layer with ~10^5 units that multiplies the hidden layer (h units) by a large matrix (your CPUs are gonna burn).]
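A rough numpy sketch of where the cost goes (the sizes are illustrative, not from the slides): the input side is a single row read, while the output side is a full h×n matrix multiply plus a softmax over every token in the vocabulary.

# Sketch: cheap embedding lookup vs. expensive output softmax.
import numpy as np

n, h = 100_000, 300                 # ~10^5 tokens, illustrative hidden size
W_in = np.random.randn(n, h).astype(np.float32)
W_out = np.random.randn(h, n).astype(np.float32)

hidden = W_in[1337]                 # "embedding layer": h reads, no multiplies
logits = hidden @ W_out             # output layer: n * h multiply-adds (~3e7)
probs = np.exp(logits - logits.max())
probs /= probs.sum()                # softmax over all 100,000 tokens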
More word embeddings

Faster softmax:
• Hierarchical softmax, negative sampling, … (see the sketch below)
• learn more
Alternative models: GloVe
Sentence level:
• Doc2vec, skip-thought (using RNNs)
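A hedged sketch of negative sampling, one of the faster-softmax tricks named above (the sizes, token ids and uniform sampling are illustrative; word2vec itself samples negatives from a smoothed unigram distribution): only the true context word and k sampled negatives are scored, instead of all n outputs.

# Sketch: negative-sampling loss for one (center, context) pair.
import numpy as np

rng = np.random.default_rng(0)
n, h, k = 10_000, 100, 5            # vocab, hidden units, negatives per example
W_in = rng.normal(size=(n, h)) * 0.01
W_out = rng.normal(size=(n, h)) * 0.01

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

center, context = 1337, 42
negatives = rng.integers(0, n, size=k)     # uniform here, just for illustration

v = W_in[center]
loss = -np.log(sigmoid(W_out[context] @ v))            # pull the true context word up
loss += -np.log(sigmoid(-W_out[negatives] @ v)).sum()  # push k random words down
# Only k + 1 output rows are touched, instead of all n rows of a full softmax.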

To be continued...
in the NLP course
