28 November 2012
Version 2.9 EN BU
Gregor Heinrich
1 / 35
Overview
Introduction
Generic topic models
Inference methods
Application to virtual communities
Conclusions and outlook
Gregor Heinrich
2 / 35
[Figure: documents in a virtual community, connected by relations such as authorship, annotation, citation, recommendation and similarity; the document thumbnails show placeholder text on retrieval measures (precision and recall).]
3 / 35
[Figure: graph of ambiguous terms (bank, bar, counter, court, stick, table, yard) and their candidate senses (e.g. location, furniture, long object, pressure unit, people, verb, judicial assembly, glue, teller, atm), illustrating word-sense ambiguity.]
Gregor Heinrich
4 / 35
[Figure: the topic-model view of a corpus: documents (e.g. "Leipzig's Bars and Restaurants") are mixtures of topics, and topics are distributions over words; example topics contain words such as {bar, wine, restaurant} and {rhythm, drum, bar}, so an ambiguous word like "bar" is resolved by its topic.]
5 / 35
[Figure: from a single word distribution to per-token topics: (a) one distribution p(w | z) generates every word w_{m,n}; (b) each document m has its own component z_m with distribution p(w | z_m); (c) in a topic model, every token n of document m carries its own topic assignment z_{m,n} with word distribution p(w | z_{m,n}).]
Gregor Heinrich
6 / 35
Distributions generated from prior distributions. For speech and other discrete data, the Dirichlet distribution is the important prior, e.g. p(w | z) ∼ Dir(α).
Defined on the simplex: the surface containing all discrete distributions.
The parameter vector α controls the behaviour of the prior.
[Figure: example discrete distributions over three outcomes and the Dirichlet density on the simplex spanned by p1, p2, p3, here for α = (4, 4, 2).]
Gregor Heinrich
7 / 35
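As an illustration of how α shapes the prior (not part of the original slides), a minimal Java sketch that draws from a Dirichlet by normalising Gamma variables; the class and method names are made up for this example.

    import java.util.Random;

    /** Minimal sketch: drawing a discrete distribution p ~ Dir(alpha). */
    public class DirichletDemo {

        /** Gamma(shape, 1) sampler after Marsaglia & Tsang (2000). */
        static double sampleGamma(double shape, Random rng) {
            if (shape < 1.0) {
                // boosting trick for shape < 1
                return sampleGamma(shape + 1.0, rng) * Math.pow(rng.nextDouble(), 1.0 / shape);
            }
            double d = shape - 1.0 / 3.0;
            double c = 1.0 / Math.sqrt(9.0 * d);
            while (true) {
                double x = rng.nextGaussian();
                double v = 1.0 + c * x;
                if (v <= 0.0) continue;
                v = v * v * v;
                if (Math.log(rng.nextDouble()) < 0.5 * x * x + d - d * v + d * Math.log(v)) {
                    return d * v;
                }
            }
        }

        /** p ~ Dir(alpha): independent Gamma(alpha_k) draws, normalised to sum to 1. */
        static double[] sampleDirichlet(double[] alpha, Random rng) {
            double[] p = new double[alpha.length];
            double sum = 0.0;
            for (int k = 0; k < alpha.length; k++) { p[k] = sampleGamma(alpha[k], rng); sum += p[k]; }
            for (int k = 0; k < alpha.length; k++) p[k] /= sum;
            return p;
        }

        public static void main(String[] args) {
            Random rng = new Random(42);
            // alpha = (4, 4, 2): samples concentrate around (0.4, 0.4, 0.2);
            // alpha < 1 would instead push the mass towards the corners of the simplex.
            double[] p = sampleDirichlet(new double[] { 4, 4, 2 }, rng);
            System.out.printf("p = (%.3f, %.3f, %.3f)%n", p[0], p[1], p[2]);
        }
    }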
Latent Dirichlet allocation (LDA):
For each document m draw topic proportions ϑ_m ∼ Dir(α); for each topic k draw a word distribution φ_k ∼ Dir(β).
For each token n of document m draw a topic z_{m,n} ∼ Mult(ϑ_m) and then a word w_{m,n} ∼ Mult(φ_{z_{m,n}}).
[Figure: Bayesian network of LDA with plates over topics k, tokens n and documents m, and example topics, e.g. topic 1: restaurant, bar, food, grill; topic 2: concert, music, rhythm, bar.]
Gregor Heinrich
8 / 35
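As a reading aid (my own sketch, not from the slides), the generative process as a few lines of Java with fixed example parameters; in the model, theta and phi would themselves be Dirichlet draws as on the previous slide.

    import java.util.Random;

    /** Minimal sketch of the LDA generative process for a single document. */
    public class LdaGenerate {

        /** Draw an index from a normalised discrete distribution p. */
        static int sampleDiscrete(double[] p, Random rng) {
            double u = rng.nextDouble(), cum = 0.0;
            for (int i = 0; i < p.length; i++) {
                cum += p[i];
                if (u <= cum) return i;
            }
            return p.length - 1; // guard against rounding
        }

        public static void main(String[] args) {
            Random rng = new Random(1);
            String[] vocab = { "bar", "wine", "restaurant", "rhythm", "drum" };
            double[] theta = { 0.7, 0.3 };              // topic proportions of the document
            double[][] phi = {
                { 0.35, 0.30, 0.30, 0.03, 0.02 },       // "restaurant" topic
                { 0.30, 0.02, 0.03, 0.35, 0.30 }        // "music" topic
            };
            for (int n = 0; n < 10; n++) {
                int z = sampleDiscrete(theta, rng);     // z_{m,n} ~ Mult(theta_m)
                int w = sampleDiscrete(phi[z], rng);    // w_{m,n} ~ Mult(phi_z)
                System.out.println("topic " + z + " -> " + vocab[w]);
            }
        }
    }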
Many variants of topic models exist, e.g. topic hierarchies (Li and McCallum 2006; Li et al. 2007) or joint models of image features and captions (Barnard et al. 2003); a large number of publications (>400, Google Scholar: >1300).
An expanding research area with practical relevance.
But: no existing analysis as a generic model class.
[Figure: Bayesian network of one such variant, with authors ~a_m, labels c_{m,j} and words w_{m,n} over plates j ∈ [1, J], n ∈ [1, N], k ∈ [1, K], m ∈ [1, M].]
9 / 35
Research questions
Gregor Heinrich
10 / 35
Overview
Introduction
Generic topic models
Inference methods
Application to virtual communities
Conclusions and outlook
Gregor Heinrich
11 / 35
Gregor Heinrich
12 / 35
[Figure: Bayesian networks of example topic models, e.g. LDA, the author-topic model and (hierarchical) pachinko allocation, with their plates over documents m ∈ [1, M], tokens n ∈ [1, N_m] and components.]
(Blei et al. 2003; Rosen-Zvi et al. 2004; Li and McCallum 2006; Li et al. 2007)
Gregor Heinrich
13 / 35
The common building block: the mixture level.
Component parameters φ_k ∼ Dir(β), k ∈ [1, K]; an incoming discrete value x_in selects the component via k = f(x_in); the outgoing value is drawn as x_out ∼ Mult(φ_k).
Levels can be chained: the output of one level becomes the input of the next (x_out1 = x_in2), and each level has its own Dirichlet prior on its components.
Example: LDA as two coupled levels. Document m selects ϑ_m ∼ Dir(α), which emits a topic z_{m,n} (e.g. z_{1,1} = 3); the topic selects φ_k ∼ Dir(β), which emits the word w_{m,n} (e.g. w_{1,1} = 2).
14 / 35
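To illustrate the abstraction, a hypothetical sketch of a mixture level as a small data structure (these names are mine, not the actual NoMM API):

    import java.util.Random;

    /** Hypothetical sketch of a single mixture level: k = f(xIn) selects a component,
     *  and the level emits xOut ~ Mult(phi_k). Names are illustrative only. */
    public class MixtureLevel {
        final double[][] phi;            // phi[k][t]: component parameters (e.g. drawn from Dir(beta))
        final Random rng = new Random();

        MixtureLevel(double[][] phi) { this.phi = phi; }

        /** Emit an outgoing value for the selected component k. */
        int emit(int k) {
            double u = rng.nextDouble(), cum = 0.0;
            for (int t = 0; t < phi[k].length; t++) {
                cum += phi[k][t];
                if (u <= cum) return t;
            }
            return phi[k].length - 1;
        }
    }

Chaining two such levels, document to topic and topic to word, reproduces the LDA example above; adding further levels and selection functions f yields the other model variants discussed next.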
[Figure: the example models of slide 13 rewritten as networks of mixture levels: LDA (ϑ_m | α over [M] → z_{m,n} = k → φ_k | β over [K] → w_{m,n} = t over [V]); the author-topic model (author x_{m,n} = x drawn from ~a_m, then ϑ_x | α → z_{m,n} = k → φ_k | β → w_{m,n} = t); and a hierarchical PAM variant with several coupled topic levels ϑ_m^r, ϑ_{m,x}, ϑ_{m,0}, ϑ_{m,T} feeding word components φ_k.]
(Blei et al. 2003; Rosen-Zvi et al. 2004; Li and McCallum 2006; Li et al. 2007)
Gregor Heinrich
15 / 35
Overview
Introduction
Generic topic models
Inference methods
Application to virtual communities
Conclusions and outlook
Gregor Heinrich
16 / 35
Gregor Heinrich
17 / 35
Inference for an example mixture network: document m draws x from ϑ_m^r | α_r (level 1), x draws y from ϑ_{m,x} | α (hidden levels H1, H2), and y draws the observed word w from φ_y | β (visible level V).
The inference task is the posterior over the unknown quantities given the visible data V and the hyperparameters A: p(Θ_1, Θ_2, Φ_3 | V, A).
Collapsed Gibbs sampling: integrate out the parameters and resample each hidden configuration H_i from its full conditional, which factorises over the levels ℓ of the network:

    p(H_i | H_{¬i}, V, A) ∝ ∏_ℓ [ B({n_{k,t}}_{t=1}^T + β) / B({n_{k,t}^{¬i}}_{t=1}^T + β) ]^{[ℓ]}
                          ∝ ∏_ℓ [ (n_{k,t}^{¬i} + β) / (∑_t n_{k,t}^{¬i} + β) ]^{[ℓ]}
                          = ∏_ℓ q(k, t)^{[ℓ]},

i.e. one q-factor per level, here q(m, x), q((m, x), y) and q(y, w).
Gregor Heinrich
19 / 35
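To make the generic factorisation concrete, here is a minimal sketch (my own, using standard LDA notation rather than the generic NoMM machinery) of the resulting collapsed Gibbs update for plain LDA: the product of the document-topic factor and the topic-word factor.

    import java.util.Random;

    /** Minimal sketch of one collapsed Gibbs sweep for plain LDA. */
    public class LdaGibbsSweep {
        int K, V;                 // number of topics, vocabulary size
        double alpha, beta;       // symmetric hyperparameters
        int[][] w;                // w[m][n]: word tokens
        int[][] z;                // z[m][n]: topic assignments
        int[][] nmk, nkt;         // document-topic and topic-word counts
        int[] nk;                 // tokens per topic
        Random rng = new Random();

        void sweep() {
            double[] p = new double[K];
            for (int m = 0; m < w.length; m++) {
                for (int n = 0; n < w[m].length; n++) {
                    int t = w[m][n], k = z[m][n];
                    // remove token i = (m,n) from the counts ("not i")
                    nmk[m][k]--; nkt[k][t]--; nk[k]--;
                    // full conditional: q(m,k) * q(k,t), cf. the product of level factors above
                    double psum = 0.0;
                    for (int kk = 0; kk < K; kk++) {
                        p[kk] = (nmk[m][kk] + alpha)
                              * (nkt[kk][t] + beta) / (nk[kk] + V * beta);
                        psum += p[kk];
                    }
                    // draw the new assignment from the unnormalised weights
                    double u = rng.nextDouble() * psum;
                    int kNew = 0;
                    for (double cum = p[0]; u > cum && kNew < K - 1; cum += p[++kNew]) { }
                    z[m][n] = kNew;
                    nmk[m][kNew]++; nkt[kNew][t]++; nk[kNew]++;
                }
            }
        }
    }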
[Figure: catalogue of NoMM sub-structures and the q-factors they contribute, e.g. N1, the plain Dirichlet-multinomial node with factors q(a, z)·q(z, b); further variants cover observed parameters, coupled edges, combined indices and node coupling (see the detailed table in the appendix).]
20 / 35
Implementation workflow: a topic model specification plus code templates are fed to the NoMM code generator, which generates a Java model instance for prototyping on the Java VM (validated against data) and an optimised C/Java code module that is compiled and deployed on the native platform.
[Figure: code-generation workflow with the prototype path (Java VM) and the deployment path (native platform).]
Gregor Heinrich
21 / 35
Example: Hierarchical PAM model 2 (HPAM2).
[Figure: mixture network of HPAM2: for each token (m, n), document m draws a super-topic x_{m,n} from ϑ_m | α, then a sub-topic y_{m,n} from ϑ_{m,x} | α_x, and the word w_{m,n} ∈ [1, V] from φ_k | β, where the component index k encodes root, super- and sub-topics:
    x = 0 : k = 0
    x ≠ 0, y = 0 : k = 1 + x
    x, y ≠ 0 : k = 1 + X + y ]

NoMM specification:

    model = HPAM2
    description:
        Hierarchical PAM model 2 (HPAM2)
    sequences:
        # variables sampled for each (m,n)
        w, x, y : m, n
    network:
        # each line one NoMM node
        m   >> theta  | alpha     >> x
        m,x >> thetax | alphax[x] >> y
        x,y >> phi[k]             >> w
        # java code to assign k
        k : {
            if (x == 0) { k = 0; }
            else if (y == 0) k = 1 + x;
            else k = 1 + X + y;
        }.

Generated sampling weights (C/Java):

    for (hx = 0; hx < X; hx++) {        // hidden edge x
        for (hy = 0; hy < Y; hy++) {    // hidden edge y
            mxsel = X * m + hx;
            mxjsel = hx;
            if (hx == 0)
                ksel = 0;
            else if (hy == 0)
                ksel = 1 + hx;
            else
                ksel = 1 + X + hy;
            pp[hx][hy] = (nmx[m][hx] + alpha[hx])
                    * (nmxy[mxsel][hy] + alphax[mxjsel][hy])
                    / (nmxysum[mxsel] + alphaxsum[mxjsel])
                    * (nkw[ksel][w[m][n]] + beta)
                    / (nkwsum[ksel] + betasum);
            psum += pp[hx][hy];
        } // for hy
    } // for hx

Gregor Heinrich
22 / 35
[Figure: document-topic matrix (200 documents, 50 topics) at Gibbs iterations 1, 5, 10, 15, 20, 30, 40, 50, 60, 80, 100, 120, 150, 200 and 300, showing how the sampler converges.]
Gregor Heinrich
23 / 35
[Figure and table: test-set perplexity over Gibbs iterations (1 to 5000) for LDA (500 topics) and three PAM4 configurations (40 x 40 dimensions).]
24 / 35
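For reference (my own sketch, not from the slides), held-out perplexity is the exponentiated negative average log-likelihood of the test tokens; here computed from point estimates theta and phi of a trained model:

    /** Minimal sketch: perplexity of held-out documents under point estimates theta, phi. */
    public class Perplexity {
        /**
         * w[m][n]    : held-out word tokens
         * theta[m][k]: document-topic proportions, phi[k][t]: topic-word probabilities
         */
        static double perplexity(int[][] w, double[][] theta, double[][] phi) {
            double logLik = 0.0;
            long tokens = 0;
            for (int m = 0; m < w.length; m++) {
                for (int n = 0; n < w[m].length; n++) {
                    double pw = 0.0;
                    for (int k = 0; k < phi.length; k++) {
                        pw += theta[m][k] * phi[k][w[m][n]];   // p(w) = sum_k theta_{m,k} phi_{k,w}
                    }
                    logLik += Math.log(pw);
                    tokens++;
                }
            }
            return Math.exp(-logLik / tokens);   // lower is better
        }
    }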
Overview
Introduction
Generic topic models
Inference methods
Application to virtual communities
Conclusions and outlook
Gregor Heinrich
25 / 35
Gregor Heinrich
26 / 35
[Figure: recap of the NoMM sub-structures and their q-factors (cf. slide 20).]
Process:
1. Define modelling task and metrics
2. Define evidence
3. Create model terminals
4. Formulate model assumptions
5. Compose model and predict properties
6. Write NoMM script
7. Generate and adapt Gibbs sampler
8. Implement target metric
9. Evaluate based on test corpus
10. Optimise and integrate for target platform
27 / 35
Evidence: each document m consists of its words ~w_m, its authors ~a_m (e.g. AB, TH) and its tags ~c_m (e.g. 5, 14, 33); documents are further connected by relations such as authorship and annotation.
[Figure: example document with author, word and tag evidence.]
28 / 35
Model assumptions
[Figure: the example document of the previous slide (authors AB and TH, words ~w_m, tags ~c_m), used to illustrate the modelling assumptions that connect authors, topics, words and tags.]
Gregor Heinrich
29 / 35
Model construction
(5) Model construction: (a) Start with the terminal nodes (from step 3): authors ~a_m, words w_{m,n} and tags ~c_m of each document m, and the target query p(. . . | ~a, ~w, ~c).
Then insert latent levels: x_{m,n} selects an author for each word, z_{m,n} a word topic and y_{m,j} a tag topic, coupled through the q-factors q(x, z), q(z, w) and q(y, c).
Result: a mixture network with author components ϑ_x (x ∈ [1, A]) and topic components φ_k and ψ_k (k ∈ [1, K]) over words n ∈ [1, N_m] and tags j ∈ [1, J_m] of documents m ∈ [1, M].
Gregor Heinrich
30 / 35
[Figure: evaluation results: average precision AP@10 for word queries and tag queries (roughly 0.2 to 0.9) and topic coherence scores (about -500 to -150), comparing ATM and ETT model variants.]
31 / 35
Overview
Introduction
Generic topic models
Inference methods
Application to virtual communities
Conclusions and outlook
Gregor Heinrich
32 / 35
[Figure: citation and authorship graph around the topic "independent component analysis": papers on blind source separation and ICA (e.g. "A Non-linear Information Maximisation Algorithm that Performs Blind Separation", "Blind Separation of Delayed and Convolved Sources", "Independent Component Analysis of Electroencephalographic Data", "New Approximations of Differential Entropy for Independent Component Analysis and Projection Pursuit") connected by "cites" and "authors" relations to authors such as Bell_A, Oja_E, Hyvarinen_A, Lee_T, Parra_L, Yang_H and Cichocki_A.]
Gregor Heinrich
33 / 35
Outlook
New applications and NoMM structures, e.g., time as a variable
Alternative inference methods:
Generic collapsed variational Bayes (Teh et al. 2007): structure
similar to the collapsed Gibbs sampler
Non-parametric methods: learning model dimensions using Dirichlet
or Pitman-Yor process priors (Teh et al. 2004; Buntine and Hutter
2010), NoMM polymorphism (Heinrich 2011a)
Gregor Heinrich
34 / 35
Thank you!
Q+A
Gregor Heinrich
35 / 35
References I
References
Barnard, K., P. Duygulu, D. Forsyth, N. de Freitas, D. Blei, and M. Jordan (2003, August).
Matching words and pictures.
JMLR Special Issue on Machine Learning Methods for Text and Images 3(6), 1107–1136.
Bellegarda, J. (2000, August).
Exploiting latent semantic information in statistical language modeling.
Proc. IEEE 88(8), 1279–1296.
Blei, D., A. Ng, and M. Jordan (2003, January).
Latent Dirichlet allocation.
Journal of Machine Learning Research 3, 993–1022.
Buntine, W. and M. Hutter (2010).
A Bayesian review of the Poisson-Dirichlet process.
arXiv:1007.0296v1 [math.ST].
Chang, J., J. Boyd-Graber, S. Gerrish, C. Wang, and D. Blei (2009).
Reading tea leaves: How humans interpret topic models.
In Proc. Neural Information Processing Systems (NIPS).
Gregor Heinrich
36 / 35
References II
Dietz, L., S. Bickel, and T. Scheffer (2007, June).
Unsupervised prediction of citation influences.
In Proceedings of the 24th International Conference on Machine Learning, Corvallis, Oregon,
USA.
Heinrich, G. (2009).
A generic approach to topic models.
In Proc. European Conf. on Mach. Learn. / Principles and Pract. of Know. Discov. in Databases
(ECML/PKDD), Part 1, pp. 517–532.
Heinrich, G. (2010).
Actors–media–qualities: a generic model for information retrieval in virtual communities.
In Proc. 7th International Workshop on Innovative Internet Community Systems (I2CS 2007), part
of I2CS Jubilee proceedings, Lecture Notes in Informatics, GI.
Heinrich, G. (2011a, March).
Infinite LDA – implementing the HDP with minimum code complexity.
Technical note TN2011/1, arbylon.net.
Heinrich, G. (2011b).
Typology of mixed-membership models: Towards a design method.
In Proc. European Conf. on Mach. Learn. / Principles and Pract. of Know. Discov. in Databases
(ECML/PKDD).
Gregor Heinrich
37 / 35
References III
Heinrich, G. and M. Goesele (2009).
Variational Bayes for generic topic models.
In Proc. 32nd Annual German Conference on Artificial Intelligence (KI2009).
Heinrich, G., J. Kindermann, C. Lauth, G. Paaß, and J. Sanchez-Monzon (2005).
Investigating word correlation at different scopes – a latent concept approach.
In Workshop Lexical Ontology Learning at Int. Conf. Mach. Learning.
Heinrich, G., F. Logemann, V. Hahn, C. Jung, G. Figueiredo, and W. Luk (2011).
HW/SW co-design for heterogeneous multi-core platforms: The hArtes toolchain, Chapter Audio
array processing for telepresence, pp. 173–207.
Springer.
Li, W., D. Blei, and A. McCallum (2007).
Mixtures of hierarchical topics with pachinko allocation.
In International Conference on Machine Learning.
Li, W. and A. McCallum (2006).
Pachinko allocation: DAG-structured mixture models of topic correlations.
In ICML '06: Proceedings of the 23rd International Conference on Machine Learning, New York,
NY, USA, pp. 577–584. ACM.
Gregor Heinrich
38 / 35
References IV
Mimno, D., H. M. Wallach, E. Talley, M. Leenders, and A. McCallum (2011, July).
Optimizing semantic coherence in topic models.
In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing,
Edinburgh, UK, pp. 262–272.
Newman, D., A. Asuncion, P. Smyth, and M. Welling (2009, August).
Distributed algorithms for topic models.
JMLR 10, 1801–1828.
Porteous, I., D. Newman, A. Ihler, A. Asuncion, P. Smyth, and M. Welling (2008).
Fast collapsed Gibbs sampling for latent Dirichlet allocation.
In KDD '08: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, New York, NY, USA, pp. 569–577. ACM.
Rosen-Zvi, M., T. Griffiths, M. Steyvers, and P. Smyth (2004).
The author-topic model for authors and documents.
In Proc. 20th Conference on Uncertainty in Artificial Intelligence (UAI).
Teh, Y., M. Jordan, M. Beal, and D. Blei (2004).
Hierarchical Dirichlet processes.
Technical Report 653, Department of Statistics, University of California at Berkeley.
Teh, Y. W., D. Newman, and M. Welling (2007).
A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation.
In Advances in Neural Information Processing Systems, Volume 19.
Gregor Heinrich
39 / 35
Appendix
Gregor Heinrich
40 / 35
Bigram-LDA: topics from 18,400 dpa news messages, Jan. 2000 (Heinrich et al. 2005). Example topics (German labels with English glosses; top words as extracted):
Polizei / Unfall (police / accidents): Polizei verletzt schwer Auto Unfall Fahrer Angaben schwer+verletzt Menschen Wagen Verletzungen Lawine Mann vier Meter Straße
Politik / Hessen (politics / Hesse): FDP Koch Hessen CDU Koalition Gerhardt Wagner Liberalen hessischen Westerwelle Wolfgang Roland+Koch Wolfgang+Gerhardt
Die Grünen (the Green party): Grünen Parteitag Atomausstieg Trittin Grüne Partei Trennung Mandat Ausstieg Amt
Tschetschenien / Russische Politik (Chechnya / Russian politics): Russland Putin Moskau russischen russische Jelzin Wladimir Tschetschenien Russlands Wladimir+Putin Kreml Boris Präsidenten
Further topic labels on the slide: Bundesliga, Wetter (weather), Polizei / Schulen (police / schools).
Gregor Heinrich
41 / 35
[Figures: (slide 42) a single discrete variable x_i with component index k_i and Dirichlet-distributed component parameters over K components and T outcomes; (slide 43) a mixture network of hidden variables h_i and visible variables v_i connected by levels ℓ = 1, …, 8, each level with its own component parameters.]
Gregor Heinrich
43 / 35
[Figure: Gibbs sampling of a two-dimensional distribution p(~x): starting from ~x^(0), the sampler alternates between drawing x_1 ∼ p(x_1 | x_2) (dimension i = 1) and x_2 ∼ p(x_2 | x_1) (dimension i = 2), producing a chain of samples ~x^(1), ~x^(2), … that explores the target distribution.]
44 / 35
Generic model definition: discrete variables X = {x_i}, i ∈ [1, I], with parameters Θ = {ϑ_k} and Dirichlet hyperparameters A = {~α_j}, j ∈ [1, J]:

    x_i ∼ Mult(x_i | ϑ_k),    k = f_k(parents(x_i), i)                      (2)
    ϑ_k ∼ Dir(ϑ_k | ~α_j),    j = f_j(known parents(x_i), i) .              (3)

The joint distribution factorises over the mixture levels ℓ ∈ L into multinomial observation terms and Dirichlet component terms:

    p(X, Θ | A) = ∏_{ℓ∈L} [ ∏_i p(x_{i,out} | ϑ_{x_{i,in}}) ]^{[ℓ]} · ∏_{ℓ∈L} [ ∏_k p(ϑ_k | ~α) ]^{[ℓ]}    (4)

Gregor Heinrich
46 / 35
Collapsing: integrating out the parameters level by level using Dirichlet-multinomial conjugacy,

    p(X, Θ | A) = ∏_ℓ ∏_i Mult(x_i^{[ℓ]} | Θ^{[ℓ]}, k_i) · ∏_ℓ ∏_k Dir(ϑ_k^{[ℓ]} | ~α_j)           (5)
                = ∏_ℓ ∏_k [ 1/B(~α_j) · ∏_i ϑ_{k_i, x_i} · ∏_t ϑ_{k,t}^{α_t - 1} ]^{[ℓ]}           (6)
                = ∏_ℓ ∏_k [ 1/B(~α_j) · ∏_t ϑ_{k,t}^{n_{k,t} + α_t - 1} ]^{[ℓ]}                    (7)
                = ∏_ℓ ∏_k [ B(~n_k + ~α_j)/B(~α_j) · Dir(ϑ_k | ~n_k + ~α_j) ]^{[ℓ]} .              (8)

47 / 35
Collapsed full conditional for a hidden configuration H_i^d (eqs. (9)–(12)): only the levels ℓ ∈ {H^d, S^d} affected by H_i^d contribute, each as a ratio of Dirichlet normalisation constants:

    p(H_i^d | X \ H_i^d, A) ∝ ∏_{ℓ ∈ {H^d, S^d}} [ B(~n_k + ~α_j) / B(~n_k \ X_i^d + ~α_j) ]^{[ℓ]} .

48 / 35
Inference: q-functions

    q(k, t) = B(~n_k + ~α_j) / B(~n_k \ x_i^d + ~α_j)

For a single element, |x_i^d| = 1:

    q(k, t) = (n_{k,t} \ x_i^d + α) / (∑_t n_{k,t} \ x_i^d + α)

For two elements, |x_i^d| = 2 (with δ(·) indicating equal values):

    q(k, t) = (n_{k,t} \ x_{i,1}^d + α) / (∑_t n_{k,t} \ x_{i,1}^d + α)
            · (n_{k,t} \ x_{i,2}^d + α + δ(x_{i,1}^d = x_{i,2}^d)) / (∑_t n_{k,t} \ x_{i,2}^d + α + 1)

...
Gregor Heinrich
49 / 35
q-functions: Pólya urn and sampling weights

[Figures: Pólya urn, sampling with over-replacement; the expectation E{ϑ_k} and the discrete parameters correspond to smoothed count ratios.]

    q(k, t) ≜ B(~n_k + α) / B(~n_k^{¬t_i} + α)

For a single outcome, |t| = 1, this is a smoothed ratio of occurrences:

    q(k, t) = (n_{k,t}^{¬t_i} + α) / (n_k^{¬t_i} + Tα)

For a pair of outcomes, t = {u, v}:

    q(k, u ∧ v) = (n_{k,u}^{¬u_i} + α) / (n_k^{¬u_i} + Tα) · (n_{k,v}^{¬v_i} + α + δ(u = v)) / (n_k^{¬v_i} + Tα + 1)

...
Gregor Heinrich
51 / 35
[Figure 9.2 (thesis, "A generic approach to topic models"): catalogue of NoMM sub-structure properties with example models from the literature.
N1, E1, C1 (Dirichlet-multinomial nodes, unbranched): mixture/admixture models, e.g. LDA [Blei et al. 2003b], PAM [Li & McCallum 2006]; LDCC [Shafiei & Milios 2006] (variant E1S).
N2 (non-Dirichlet prior): alternative distributions on the simplex, e.g. CTM [Blei & Lafferty 2007] (logistic normal) and TLM [Wallach 2008] (hierarchy of Dirichlet priors).
N3 (observed parameters).
N4 (non-discrete output): regression / supervised learning, e.g. supervised LDA [Blei & McAuliffe 2007], relational topic model [Chang & Blei 2009].
N5 + E4 (aggregation).
E2 (autonomous edges): E2A: i ≠ j, E2B: i = j.
E3 (coupled edges): common cause for observations, e.g. hidden relational model (HRM) [Xu et al. 2006], Link-LDA [Erosheva et al. 2004].
C2 (combined indices): C2A: k = (i, x_i), C2B: k = (x_i, y_j), C2C: k = g(i, j, x_i, y_j).
C3 (interleaved indices): different dependent causes / relations, e.g. hPAM [Li et al. 2007a], HRM [Xu et al. 2006], Multi-LDA [Porteous et al. 2008a].
C4 (switch).
C5 (node coupling): selection of complex submodels, e.g. multi-grain LDA [Titov & McDonald 2008], entity-topic models [Newman et al. 2006a].
Notation (also see (9.3)): a ⊕ b adds counts n(a) + n(b); a ⊖ b prevents ¬i for a in (9.1); c∪ combines sequences {c, c', c''}, as applicable.]
Gregor Heinrich
52 / 35
[Figure: UML diagram of the NoMM implementation, with classes connected by <<collects>> and <<implements>> relations.]
Gregor Heinrich
53 / 35
[Figure: sampling the full conditional by cumulative weights: draw u ∼ U[0, 1] and locate u·Z_i among the cumulative masses Z_{i,0} ≤ Z_{i,1} ≤ … ≤ Z_{i,4}; the normalisation Z_i is split into an already-known part and an unknown remainder so that the scan can stop early.]
55 / 35
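In its simplest form, without the bound-based early stopping sketched in the figure, this is the scan used over the unnormalised weights pp (a minimal sketch of my own, with illustrative names):

    import java.util.Random;

    /** Minimal sketch: draw an index from unnormalised weights pp with total mass psum. */
    public class CumulativeSampler {
        static int sample(double[] pp, double psum, Random rng) {
            double u = rng.nextDouble() * psum;   // u ~ U[0, Z] with Z = psum
            double cum = 0.0;
            for (int i = 0; i < pp.length; i++) {
                cum += pp[i];
                if (u <= cum) return i;           // first segment whose cumulative mass exceeds u
            }
            return pp.length - 1;                 // guard against floating-point rounding
        }
    }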
Multi-processor parallelisation using shared memory (OpenMP):
[Figure: the document collection 1…M is partitioned across processors P1…P_P; document-specific parameters (pmfs over (sub-)topics, e.g. ϑ_m, ϑ_{m,x}) stay local to each processor, while the global parameters (pmfs over the vocabulary, e.g. φ_y) are synchronised between processors.]
Gregor Heinrich
56 / 35
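A minimal sketch of the same idea in plain Java threads (the slides refer to OpenMP on the C side; names and the count-merging scheme here are illustrative, roughly in the spirit of Newman et al. 2009): documents are partitioned across workers, each worker samples against its own copy of the topic-word counts, and the count changes are folded back afterwards.

    /** Minimal sketch of one partitioned Gibbs sweep with per-worker count copies. */
    public class ParallelSweep {

        /** Resamples documents [first, last) against a local copy of the topic-word counts. */
        interface DocSampler {
            void sweepDocuments(int first, int last, int[][] nktLocal);
        }

        static void parallelSweep(DocSampler sampler, int[][] nktGlobal, int numDocs, int numThreads)
                throws InterruptedException {
            int K = nktGlobal.length, V = nktGlobal[0].length;
            // snapshot of the shared topic-word counts before the sweep
            int[][] before = new int[K][V];
            for (int k = 0; k < K; k++) System.arraycopy(nktGlobal[k], 0, before[k], 0, V);

            int[][][] local = new int[numThreads][][];
            Thread[] workers = new Thread[numThreads];
            int chunk = (numDocs + numThreads - 1) / numThreads;
            for (int p = 0; p < numThreads; p++) {
                final int first = p * chunk, last = Math.min(numDocs, first + chunk), id = p;
                local[id] = new int[K][V];
                for (int k = 0; k < K; k++) System.arraycopy(before[k], 0, local[id][k], 0, V);
                workers[p] = new Thread(() -> sampler.sweepDocuments(first, last, local[id]));
                workers[p].start();      // document-specific counts stay inside each worker
            }
            for (Thread t : workers) t.join();

            // sync: fold every worker's count changes back into the shared counts
            for (int p = 0; p < numThreads; p++)
                for (int k = 0; k < K; k++)
                    for (int t = 0; t < V; t++)
                        nktGlobal[k][t] += local[p][k][t] - before[k][t];
        }
    }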
[Figures: speedup of the sampler variants (sa, pa, spa and ia, pa, pb, pc, ipa, ipb, ipc, ipas, ipcs) for different model dimensions (K, L) ranging from (10,10) to (20,100), and perplexity over iterations for the pa and ipa variants, comparing the convergence of the dependent and independent sampling schemes.]
Gregor Heinrich
59 / 35
ETT model: full conditionals and retrieval.
[Figure: mixture network with authors ~a_m, per-word author assignment x_{m,n} ∈ [1, A_m], word topics z_{m,n} ∈ [1, K] emitting words w_{m,n} ∈ [1, V] via φ_z | β, and tag topics y_{m,j} ∈ [1, K] emitting tags c_{m,j} ∈ [1, C] via ψ_y | γ, with author components ϑ_x | α.]

Lining up q-functions (eqs. (13)–(15)) yields the full conditionals:

    p(x_{m,n}, z_{m,n} | ·) ∝ (n_{x,z}^{¬{x,z}_{m,n}} + α) / (n_x^{¬{x,z}_{m,n}} + Kα) · (n_{z,w_{m,n}}^{¬{x,z}_{m,n}} + β) / (n_z^{¬{x,z}_{m,n}} + Vβ)    (14)
    p(x_{m,j}, y_{m,j} | ·) ∝ (n_{x,y}^{¬{x,y}_{m,j}} + α) / (n_x^{¬{x,y}_{m,j}} + Kα) · (n_{y,c_{m,j}}^{¬{x,y}_{m,j}} + γ) / (n_y^{¬{x,y}_{m,j}} + Cγ)    (15)

Retrieval via a query-likelihood model:

    p(~w | a) = ∏_{w ∈ ~w} ∑_z ϑ_{a,z} φ_{z,w},     p(~c | a) = ∏_{c ∈ ~c} ∑_y ϑ_{a,y} ψ_{y,c} .    (16)

Gregor Heinrich
60 / 35
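As code (my own sketch, with hypothetical array names), the query-likelihood score of (16) for ranking candidate authors a:

    /** Minimal sketch: log query likelihood, log p(query | a) = sum_w log sum_z theta[a][z]*phi[z][w]. */
    public class QueryLikelihood {
        static double score(int[] queryWords, int a, double[][] theta, double[][] phi) {
            double logP = 0.0;
            for (int w : queryWords) {
                double pw = 0.0;
                for (int z = 0; z < phi.length; z++) {
                    pw += theta[a][z] * phi[z][w];
                }
                logP += Math.log(pw);   // accumulate in log space to avoid underflow
            }
            return logP;                // rank candidate authors a by this score
        }
    }

Tag queries p(~c | a) are scored the same way, with the tag-topic parameters psi in place of phi.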
Appendix E: Application models, traditional derivation

E.2 Example derivation: expert-tag-topic model 1 (ETT)

This appendix sketches the traditional derivation of the model inference and likelihood equations used in Chapter 10, for comparison with the NoMM-based derivations developed in the thesis. The Bayesian network of the ETT1 model is shown in Fig. E.1; the derivation strategy is explained for instance in [Heinrich 2009b] and is similar to strategies used in the literature.¹ We start with the complete-data likelihood of the corpus:

    p(~w, ~c, ~a, ~x, ~z, ~y, Θ, Φ, Ψ | α, β, γ)
        = ∏_{m=1}^M [ ∏_{n=1}^{N_m} p(w_{m,n} | φ_{z_{m,n}}) p(z_{m,n} | ϑ_{x_{m,n}}) a_{m,x_{m,n}}
          · ∏_{j=1}^{J_m} p(c_{m,j} | ψ_{y_{m,j}}) p(y_{m,j} | ϑ_{x_{m,j}}) a_{m,x_{m,j}} ]
          · p(Θ | α) p(Φ | β) p(Ψ | γ) .                                                        (E.1)

Next, the model parameters are integrated out using the conjugacy of the Dirichlet priors (eqs. (E.3)–(E.6)); note the change in indexing from tokens w_{m,n}, x_{m,n}, y_{m,j} to count statistics n_{x,k}, n_{k,t}, n_{k,c}, where superscripts (z) and (y) distinguish the word and tag branches of the model. The Gibbs full conditionals follow from the collapsed joint by the chain rule: z and y are sampled in alternating fashion, but the author association x at the root of the model must be sampled jointly with them. With i = (m, n) and the sum notation n_k = ∑_t n_{k,t}, the full conditional for word tokens becomes (eqs. (E.7)–(E.11)):

    p(z_i = k, x_i = x | w_i = t, ~z_{¬i}, ~y, ~x_{¬i}, ~w_{¬i}, ~a, ~c)
        ∝ (n_{k,t,¬i} + β) / (n_{k,¬i} + Vβ) · (n^{(z)}_{x,k,¬i} + α) / (n^{(z)}_{x,¬i} + Kα) · a_{m,x}
        = q(k, t) · q(x, k) · a_{m,x} .

For the tag branch the derivation is analogous, now re-defining i = (m, j) (eqs. (E.12)–(E.13)):

    p(y_i = k, x_i = x | c_i = c, ~z, ~y_{¬i}, ~x_{¬i}, ~w, ~a, ~c_{¬i})
        ∝ (n_{k,c,¬i} + γ) / (n_{k,¬i} + Cγ) · (n^{(y)}_{x,k,¬i} + α) / (n^{(y)}_{x,¬i} + Kα) · a_{m,x} .

The difference between (E.11), (E.13) and (10.3) is a result of the definition of n_{x,k} as a summed count and of the fact that both branches are sampled disjointly.

[Figure E.2: Bayesian networks of the iterated ETT models: (a) ETT2, (b) ETT3.] (Heinrich 2011b)

¹ Alternative derivation strategies for topic model Gibbs samplers have been published in [Griffiths 2002], working via p(z_i | ~z_{¬i}, ~w) ∝ p(w_i | ~w_{¬i}, ~z) p(z_i | ~z_{¬i}), and in [McCallum et al. 2007], who use the chain rule via the token likelihood.

Gregor Heinrich
61 / 35
Example: average precision at cutoff 5, with three relevant documents among the retrieved items.
[Figure: two example result rankings shown as document thumbnails (placeholder text on precision and recall), with the relevant hits marked.]

    Relevant at ranks 2, 4 and 5:   AP@5 = (1/2 + 2/4 + 3/5) / 3 = 0.533
    Relevant at ranks 1, 2 and 5:   AP@5 = (1/1 + 2/2 + 3/5) / 3 = 0.867

Gregor Heinrich
62 / 35
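The same computation as a small helper (my own sketch), assuming a boolean relevance flag per retrieved rank:

    /** Minimal sketch: average precision at cutoff k over a ranked result list. */
    public class AveragePrecision {
        static double apAtK(boolean[] relevantAtRank, int k, int numRelevant) {
            double sum = 0.0;
            int hits = 0;
            for (int i = 0; i < k && i < relevantAtRank.length; i++) {
                if (relevantAtRank[i]) {
                    hits++;
                    sum += (double) hits / (i + 1);   // precision at this rank
                }
            }
            return numRelevant > 0 ? sum / numRelevant : 0.0;
        }

        public static void main(String[] args) {
            // relevant at ranks 1, 2, 5: (1/1 + 2/2 + 3/5) / 3 = 0.867
            System.out.println(apAtK(new boolean[] { true, true, false, false, true }, 5, 3));
            // relevant at ranks 2, 4, 5: (1/2 + 2/4 + 3/5) / 3 = 0.533
            System.out.println(apAtK(new boolean[] { false, true, false, true, true }, 5, 3));
        }
    }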
Vector Kernels (10); Support Vector Method for Novelty Detection (12)
The Entropy Regularization Information Criterion (12, support vector
machines, regularization) . . .
Gregor Heinrich
63 / 35
(9, tags: face recognition, invariances, pattern recognition); Image Representation for Facial Expression Coding (12, tags: face recognition, image, ICA) . . .
Task and Spatial Frequency Effects on Face Specialization (10, tags: face
Representing Face Images for Emotion Classification (9, tags: classification, face recognition, image)
SEXNET: A Neural Network Identifies Sex From Human Faces (3, tags:
neural networks, object recognition, pattern recognition)
Gregor Heinrich
64 / 35
face images faces image facial visual human video database detection
image images texture pixel resolution pyramid regions pixels region search
speech speaker acoustic vowel phonetic phoneme utterances spoken formant
bayesian prior density posterior entropy evidence likelihood distributions
filter frequency signals phase channel amplitude frequencies temporal spectrum
activation boltzmann annealing temperature neuron stochastic schedule machine
cell firing cells neuron activity excitatory inhibitory synaptic potential membrane
convergence stochastic descent optimization batch density global update
Gregor Heinrich
65 / 35
Example topics (top words labelled A-F) and coherence scores:

    1. A. orientation   B. cortex     C. visual     D. ocular      E. acoustic    F. eye
    3. A. risk          B. return     C. stock      D. trading     E. processor   F. prediction
    4. A. language      B. word       C. stress     D. grammar     E. neural      F. syllable
    5. A. circuit       B. bayesian   C. analog     D. voltage     E. vlsi        F. chip
    6. A. validation    B. set        C. variance   D. regression  E. selection   F. bias

[Figure: topic coherence scores (roughly -500 to -150) for LDA, ATM, ETT1/J20 and ETT1/J100.]
Gregor Heinrich
66 / 35