• Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean.
Distributed Representations of Words and Phrases and their
Compositionality.
[Architecture diagram: a hidden layer of H nodes predicts the output from features of the input words; an H×V output weight matrix feeds a softmax over the sparse output representation]
[Chart: training time reduced from days to hours or minutes]
From words to phrases
• Find words that appear frequently together and
infrequently in other contexts.
score(wi, wj) = (count(wi wj) − δ) / (count(wi) · count(wj))
• The bigrams with score above the chosen
threshold are then used as phrases.
• The δ is a discounting coefficient that prevents too
many phrases consisting of very infrequent words
from being formed.
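The phrase-scoring heuristic above can be sketched in a few lines of Python; the toy corpus, the δ value of 5, and the threshold of 0.01 are illustrative assumptions, not values from the paper.

```python
# Sketch of the bigram phrase-scoring heuristic; corpus, delta and
# threshold below are illustrative assumptions.
from collections import Counter

def phrase_scores(tokens, delta=5):
    """score(wi, wj) = (count(wi wj) - delta) / (count(wi) * count(wj))"""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {
        (w1, w2): (c - delta) / (unigrams[w1] * unigrams[w2])
        for (w1, w2), c in bigrams.items()
    }

tokens = ["new", "york", "is", "in", "new", "york", "state"] * 10
scores = phrase_scores(tokens)
# Bigrams scoring above the chosen threshold are merged into phrases,
# e.g. "new york" -> "new_york".
phrases = {bg for bg, s in scores.items() if s > 0.01}
```

In practice the paper runs this pass several times with decreasing thresholds, so that longer phrases can be built from already-merged bigrams.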
Examples – analogy
Examples – distance (rare words)
Examples – addition
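The "addition" examples rest on the vector-offset idea: king − man + woman lands near queen. A toy illustration with invented 3-d vectors (these are assumptions, not trained embeddings):

```python
# Toy vector-offset analogy: all vectors are made-up assumptions,
# not trained word2vec embeddings.
import numpy as np

vecs = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.5, 0.8, 0.1]),
    "woman": np.array([0.5, 0.1, 0.9]),
    "queen": np.array([0.9, 0.1, 0.9]),
    "apple": np.array([0.1, 0.9, 0.2]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# king - man + woman, then pick the nearest word (excluding the query words).
target = vecs["king"] - vecs["man"] + vecs["woman"]
best = max((w for w in vecs if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(vecs[w], target))
```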
Parameters
• Architecture: skip-gram (slower, better for infrequent
words) vs CBOW (fast)
• The training algorithm: hierarchical softmax (better for
infrequent words) vs negative sampling (better for
frequent words, better with low dimensional vectors)
• Sub-sampling of frequent words: can improve both
accuracy and speed for large data sets (useful values
are in range 1e-3 to 1e-5)
• Dimensionality of the word vectors: usually more is
better, but not always
• Context (window) size: for skip-gram usually around
10, for CBOW around 5
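The sub-sampling bullet above follows the paper's heuristic: each occurrence of word w is discarded with probability 1 − sqrt(t / f(w)), where f(w) is the word's relative frequency and t is the threshold (1e-3 to 1e-5). A minimal sketch, with a toy corpus as an assumption:

```python
# Sketch of frequent-word sub-sampling; the corpus below is a toy assumption.
import math
import random
from collections import Counter

def subsample(tokens, t=1e-3, seed=0):
    rng = random.Random(seed)
    counts = Counter(tokens)
    total = len(tokens)
    kept = []
    for w in tokens:
        f = counts[w] / total                # relative frequency of w
        p_keep = min(1.0, math.sqrt(t / f))  # rare words are always kept
        if rng.random() < p_keep:
            kept.append(w)
    return kept

tokens = ["the"] * 9990 + [f"rare{i}" for i in range(10)]
kept = subsample(tokens)
```

Here "the" (f ≈ 0.999) is kept with probability ≈ 0.03, while each rare word (f = 1e-4) always survives, which both speeds up training and improves the vectors of rarer words.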
Machine translation
using distributed representations
1. Build monolingual models of languages using
large amounts of text.
2. Use a small bilingual dictionary to learn a
linear projection between the languages.
3. Translate a word by projecting its vector
representation from the source language
space to the target language space.
4. Output the most similar word vector from
target language space as the translation.
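Steps 2–4 can be sketched with a least-squares fit of the projection matrix; the 2-d "embeddings" and the tiny seed dictionary below are invented for illustration.

```python
# Sketch of translation via a learned linear projection (steps 2-4).
# The toy 2-D "embeddings" and seed dictionary are invented assumptions.
import numpy as np

en = {"one": [1.0, 0.1], "two": [2.0, 0.2], "three": [3.0, 0.1]}
es = {"uno": [0.1, 1.0], "dos": [0.2, 2.0], "tres": [0.1, 3.0]}
pairs = [("one", "uno"), ("two", "dos"), ("three", "tres")]  # step 2 dictionary

X = np.array([en[s] for s, _ in pairs])  # source-space vectors
Z = np.array([es[t] for _, t in pairs])  # target-space vectors

# Step 2: least-squares fit of W such that X @ W ~ Z.
W, *_ = np.linalg.lstsq(X, Z, rcond=None)

# Steps 3-4: project the source vector, return the nearest target word.
def translate(word):
    v = np.array(en[word]) @ W
    return min(es, key=lambda t: np.linalg.norm(np.array(es[t]) - v))
```

With real embeddings the seed dictionary holds a few thousand frequent word pairs, and the same W then translates words that never appeared in it.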
English vs Spanish
Translation accuracy
import sys
import gensim

# The script needs both a vector file and a word file.
if len(sys.argv) < 3:
    print("Usage: matrix.py <vectorfile> <wordfile>")
    sys.exit(1)

# Load pre-trained vectors stored in the binary word2vec format.
model = gensim.models.KeyedVectors.load_word2vec_format(sys.argv[1],
                                                        binary=True)
with open(sys.argv[2]) as f:
    words = f.read().splitlines()

# Print a CSV matrix of pairwise cosine similarities.
for w1 in words:
    print(",".join(str(model.similarity(w1, w2)) for w2 in words))
Discovery of structural form - animals
Discovery of structural form - cities