
A generic approach to topic models

and its application to virtual communities


Gregor Heinrich
PhD presentation (English translation incl. backup slides, 45min)
Faculty of Mathematics and Computer Science
University of Leipzig

28 November 2012
Version 2.9 EN BU


Overview

Introduction
Generic topic models
Inference methods
Application to virtual communities
Conclusions and outlook


Motivation: Virtual communities


[Figure: persons and documents in a virtual community, connected by relations such as authorship, annotation, citation, recommendation, cooperation and similarity]

- Virtual communities = groups of persons who exchange information and knowledge electronically
- Examples: organisations, digital libraries, Web 2.0 applications incl. social networks
- Data are multimodal: text content; authorship, citation, annotations and recommendations; cooperation and other social relations
- Typical case: discrete data with high dynamics and large volumes

Motivation: Unsupervised mining of discrete data

- Identification of relationships in large data volumes
- Only data (and possibly a model) required (information retrieval, network analysis, clustering, NLP methods)
- Density problem: features too sparse for analysis in the high-dimensional feature space
- Vocabulary problem: semantic similarity ≠ lexical similarity (polysemy, synonymy, etc.)

[Figure: word-sense graph illustrating the vocabulary problem: ambiguous terms such as "bank", "bar", "counter", "stick", "table", "court" and "yard" linked to senses (location, furniture, people, verb, long object, ...) and related words (restaurant, teller, atm, computer, network, glue, employees, staff, personnel, ...)]



Topic models as approach

[Figure: documents (e.g. "Playing Drums for Beginners", "Rhythm & Spice Jamaican Grill", "Leipzig's Bars and Restaurants") linked to latent topics, which in turn are linked to words such as rhythm, drum, bar, wine, restaurant]

- Probabilistic representations of grouped discrete data
- Illustrative for text: words grouped in documents
- Latent topics = probability distributions over the vocabulary. The dominant terms of a topic are semantically similar.
- Language = mixture of topics (latent semantic structure)
- Reduces the vocabulary problem: finds semantic relations
- Reduces the density problem: dimensionality reduction


Language models: Unigram model

[Figure: a single term distribution p(w | z) generates every word w_{m,n} of every document]

One distribution for all data

Language models: Unigram mixture model

[Figure: each document m has one topic z_m; the corresponding term distribution p(w | z_m) generates all words of that document]

One distribution per document

Language models: Unigram admixture model

[Figure: each word w_{m,n} has its own topic z_{m,n}; the corresponding term distribution p(w | z_{m,n}) generates that word]

One distribution per word: the basic topic model
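To make the difference between the three variants concrete (an added sketch in the slide notation, not part of the original deck): the unigram model scores every word with one shared term distribution, the mixture model uses the single document topic z_m, and the admixture model mixes topics per word, p(w_{m,n} | ~theta_m, Phi) = sum_k theta_{m,k} * phi_{k,w}. A minimal Java sketch of the admixture case:

    // Minimal sketch (illustration only): word probability under the
    // admixture model, p(w | theta, Phi) = sum_k theta[k] * phi[k][w].
    class AdmixtureSketch {
        static double wordProbability(double[] theta, double[][] phi, int w) {
            double p = 0.0;
            for (int k = 0; k < theta.length; k++)
                p += theta[k] * phi[k][w];
            return p;
        }
    }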


Bayesian topic models: The Dirichlet distribution

Bayesian methodology:
- Distributions are themselves generated from prior distributions
- For language and other discrete data the Dirichlet distribution is an important prior: the term distribution p(w | z) is drawn as ~phi ~ Dir(~alpha)
- Defined on the simplex: the surface containing all discrete distributions
- The parameter ~alpha controls its behaviour

[Figure: the simplex of discrete distributions (p1, p2, p3) and Dirichlet densities on it, e.g. for ~alpha = (4, 4, 2)]

Bayesian topic model: Latent Dirichlet Allocation (LDA) (Blei et al. 2003)
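As a concrete illustration of the prior (an added sketch, not code from the dissertation): a Dirichlet sample can be drawn by normalising independent Gamma variates; the Gamma sampler below is the standard Marsaglia-Tsang method.

    import java.util.Random;

    // Minimal sketch: drawing a discrete distribution p ~ Dir(alpha)
    // by normalising independent Gamma(alpha_k, 1) variates.
    class DirichletSketch {

        // Marsaglia-Tsang sampler for Gamma(shape, 1).
        static double sampleGamma(double shape, Random rng) {
            if (shape < 1.0) {
                // boost: Gamma(a) = Gamma(a + 1) * U^(1/a)
                return sampleGamma(shape + 1.0, rng)
                        * Math.pow(rng.nextDouble(), 1.0 / shape);
            }
            double d = shape - 1.0 / 3.0;
            double c = 1.0 / Math.sqrt(9.0 * d);
            while (true) {
                double x = rng.nextGaussian();
                double v = 1.0 + c * x;
                if (v <= 0.0) continue;
                v = v * v * v;
                double u = rng.nextDouble();
                if (Math.log(u) < 0.5 * x * x + d - d * v + d * Math.log(v))
                    return d * v;
            }
        }

        // One sample from Dir(alpha): a point on the simplex.
        static double[] sampleDirichlet(double[] alpha, Random rng) {
            double[] p = new double[alpha.length];
            double sum = 0.0;
            for (int k = 0; k < alpha.length; k++) {
                p[k] = sampleGamma(alpha[k], rng);
                sum += p[k];
            }
            for (int k = 0; k < alpha.length; k++)
                p[k] /= sum;
            return p;
        }
    }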

Latent Dirichlet Allocation

[Figure: graphical model of LDA: per document m a topic distribution ~theta_m ~ Dir(~alpha); per topic k a term distribution ~phi_k ~ Dir(~beta); per word n in document m a topic z_{m,n} ~ ~theta_m and a word w_{m,n} ~ ~phi_{z_{m,n}}]

Latent Dirichlet Allocation (Blei et al. 2003)

Generative process, illustrated for the example "Concert tonight at Rhythm and Spice Restaurant ...":
- Generate word distributions ~phi_k for all topics (e.g. topic 1: restaurant, bar, food, grill, ...; topic 2: concert, music, rhythm, bar, ...)
- Generate the topic distribution ~theta_1 for document 1 (weights over topic 1, topic 2, ...)
- Sample the topic index for the first word, z = 2
- Sample a word from the term distribution of topic 2: "concert"
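A minimal Java sketch of this generative process (an illustration added here, not taken from the dissertation); it samples topic indices and words for one document, given already drawn parameters ~theta_m and ~phi_k, which would themselves be Dirichlet draws as sketched above:

    import java.util.Random;

    // Minimal sketch: LDA's generative process for one document,
    // given theta (topic weights of the document) and phi (term weights per topic).
    class LdaGenerativeSketch {

        // Draw an index from an (unnormalised) discrete distribution.
        static int sampleDiscrete(double[] weights, Random rng) {
            double sum = 0.0;
            for (double w : weights) sum += w;
            double u = rng.nextDouble() * sum;
            for (int i = 0; i < weights.length; i++) {
                u -= weights[i];
                if (u <= 0.0) return i;
            }
            return weights.length - 1;
        }

        // Generate N word indices for one document.
        static int[] generateDocument(double[] theta, double[][] phi, int N, Random rng) {
            int[] words = new int[N];
            for (int n = 0; n < N; n++) {
                int z = sampleDiscrete(theta, rng);      // topic for word n
                words[n] = sampleDiscrete(phi[z], rng);  // word drawn from topic z
            }
            return words;
        }
    }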

State of the art

Large number of published models that extend LDA:
- Authors (Rosen-Zvi et al. 2004),
- Citations (Dietz et al. 2007),
- Hierarchy (Li and McCallum 2006; Li et al. 2007),
- Image features and captions (Barnard et al. 2003), etc.
- Search results for "topic model" (title + abstract) since 2012 alone: ACM >400, Google Scholar >1300.
Expanding research area with practical relevance

But: no existing analysis as a generic model class
- Partly tedious derivations, especially for inference algorithms

Conjecture:
- Important properties are generic across models
- Simplifications for the derivation of concrete model properties, inference algorithms and design methods

[Overlay figure: plate diagram of the expert-tag-topic model (Heinrich 2011), shown as an example of such an LDA extension]

Research questions

How can topic models be described in a generic way in order to use their properties across particular applications?
Can generic topic models be implemented generically and, if so, can repeated structures be exploited for optimisations?
How can generic models be applied to data in virtual communities?

Overview

Introduction
Generic topic models
Inference methods
Application to virtual communities
Conclusions and outlook


How can topic models be described in a generic way in order to use their properties across particular applications?

Topic models: Example structures


[Figure: graphical models of (a) latent Dirichlet allocation (LDA), (b) the author-topic model (ATM), (c) the four-level pachinko allocation model (PAM4), and (d) hierarchical PAM (hPAM)]

(Blei et al. 2003; Rosen-Zvi et al. 2004; Li and McCallum 2006; Li et al. 2007)

Generic topic models: NoMMs

[Figure: generic NoMM nodes: a discrete component ~phi_k ~ Dir(~alpha) is selected by the value(s) on the incoming edge(s), k = f(x_in) or k = f(x_in1, x_in2), and emits value(s) x_out on the outgoing edge(s). Example: LDA as a chain of two such nodes, m -> ~theta_m | alpha -> z_{m,n} -> ~phi_k | beta -> w_{m,n}; e.g. for document m = 1 the topic z_{1,1} = 3 is drawn from ~theta_1 and the word w_{1,1} = 2 from ~phi_3]

Generic characteristics of topic models:
- Levels with discrete components ~phi_k, generated from Dirichlet distributions
- Coupling via the values of discrete variables x

Network of mixed membership (NoMM): compact representation for topic models
- Directed acyclic graph
- Node: sample from a mixture component, selected via incoming edges; terminal node: observation
- Edge: propagation of discrete values to child nodes

Topic models as NoMMs

[Figure: NoMM representations next to the corresponding graphical models:
(a) Latent Dirichlet allocation, LDA: m -> ~theta_m | alpha ([M]) -> z_{m,n} = k -> ~phi_k | beta ([K]) -> w_{m,n} = t ([V])
(b) Author-topic model, ATM: the observed author distribution ~a_m ([M]) selects an author x_{m,n} = x ([A]); ~theta_x | alpha generates z_{m,n} = k ([K]), and ~phi_k | beta generates w_{m,n} = t ([V])
(c) Pachinko allocation model, PAM: a chain of document-specific distributions, ~theta^r_m -> z2_{m,n} = x -> ~theta_{m,x} -> z3_{m,n} = y -> ~phi_y -> w_{m,n} = t ([V])
(d) Hierarchical pachinko allocation model, hPAM: super-topics z^T_{m,n}, sub-topics z^t_{m,n} and a level variable select one of |T| + |t| + 1 term distributions ~phi_k that generates w_{m,n}]

(Blei et al. 2003; Rosen-Zvi et al. 2004; Li and McCallum 2006; Li et al. 2007)

Overview

Introduction
Generic topic models
Inference methods
Application to virtual communities
Conclusions and outlook


Can generic topic models be implemented generically ...?

Bayesian inference problem and Gibbs sampler

Bayesian inference: inversion of the generative process:
- Find distributions over the parameters Θ and the latent variables/topics H, given observations V and Dirichlet parameters A
- i.e. determine the posterior distribution p(H, Θ | V, A)
- Intractability -> approximate approaches
- Gibbs sampling: variant of Markov chain Monte Carlo (MCMC)
- In topic models: marginalise out the parameters Θ (collapsed Gibbs sampler)
- Sample the topics H_i for each data point i in turn: H_i ~ p(H_i | H_¬i, V, A)

[Figure: example NoMM chain ~theta^r_m -> x (= H1) -> ~theta_{m,x} -> y (= H2) -> ~phi_y -> w (= V); the sought posterior is p(Θ1, H1, Θ2, H2, Θ3 | V, A), and the collapsed Gibbs sampler instead draws H_i1, H_i2 ~ p(H_i1, H_i2 | H_¬i, V, A)]

Sampling distribution for NoMMs

[Figure: example NoMM chain with its per-level factors q(m, x), q((m, x), y), q(y, w)]

The Gibbs sampler can be derived generically (Heinrich 2009).

Typical case: a product of count-ratio factors, one per level l:

    p(H_i | H_¬i, V, A)  ∝  prod_l q(k, t)^[l],
    with  q(k, t) = (n_{k,t}^¬i + alpha_t) / sum_t (n_{k,t}^¬i + alpha_t)

- n_{k,t} = count of co-occurrences between input and output values of a level (components and samples)
- More complex variants are covered by generalised q-functions, e.g.
  q(k, t) ≜ beta({n_{k,t}}_{t=1}^T + ~alpha) / beta({n_{k,t}^¬i}_{t=1}^T + ~alpha)
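For the simplest NoMM, plain LDA, this product of q-factors is the familiar collapsed Gibbs update. A minimal Java sketch (an illustration; variable names are mine, not output of the meta-sampler):

    import java.util.Random;

    // Minimal sketch: one collapsed Gibbs update for LDA,
    // p(z_i = k | .) ∝ (nmk[m][k] + alpha) * (nkt[k][t] + beta) / (nk[k] + V * beta).
    class CollapsedGibbsSketch {
        static int resampleTopic(int m, int t, int oldK,
                                 int[][] nmk, int[][] nkt, int[] nk,
                                 double alpha, double beta, int V, Random rng) {
            int K = nk.length;
            // remove the current assignment from the counts
            nmk[m][oldK]--; nkt[oldK][t]--; nk[oldK]--;
            // unnormalised sampling distribution over topics
            double[] p = new double[K];
            double sum = 0.0;
            for (int k = 0; k < K; k++) {
                p[k] = (nmk[m][k] + alpha) * (nkt[k][t] + beta) / (nk[k] + V * beta);
                sum += p[k];
            }
            // draw the new topic
            double u = rng.nextDouble() * sum;
            int newK = K - 1;
            for (int k = 0; k < K; k++) {
                u -= p[k];
                if (u <= 0.0) { newK = k; break; }
            }
            // add the new assignment back to the counts
            nmk[m][newK]++; nkt[newK][t]++; nk[newK]++;
            return newK;
        }
    }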

Typology of NoMM substructures

[Figure: library of NoMM substructures with their q-function representations:
N1. Dirichlet-multinomial node: q(a, z) q(z, b)
N2. Observed parameters: an observed component c_{a,z} replaces the q-factor, c_{a,z} q(z, b)
E2. Autonomous edges: q(a, z) q(z, b) q(z, c)
E3. Coupled edges: q(a, x ∧ y) q(x, b) q(y, c)
C2. Combined indices: q(a, x) q(b, y) q(k, c) with k = f(x, y)
C3. Interleaved indices: q(a, z1) q(b, z2) with interleaved factors over the shared output c]

NoMM substructures: nodes, edges/branches, component indices / merging of edges:
- Representation via q-functions and likelihood
- Multiple samples per data point: q(a, x ∧ y) for the respective level
- Library incl. additional structures: alternative distributions, regression, aggregation etc. -> q-functions + other factors
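One way to picture the role of the q-functions (my own sketch, not code from the meta-sampler): each substructure contributes an object that maps a component index k and an outcome t to its count-ratio factor. A minimal Java version of the plain Dirichlet-multinomial case:

    // Minimal sketch: a q-function as it appears in the NoMM sampling
    // distribution, here q(k, t) = (n[k][t] + alpha) / (nsum[k] + T * alpha).
    interface QFunction {
        double value(int k, int t);
    }

    class DirichletMultinomialQ implements QFunction {
        final int[][] n;     // co-occurrence counts of component k and outcome t
        final int[] nsum;    // row sums of n
        final double alpha;  // symmetric Dirichlet hyperparameter
        final int T;         // number of outcomes

        DirichletMultinomialQ(int[][] n, int[] nsum, double alpha, int T) {
            this.n = n; this.nsum = nsum; this.alpha = alpha; this.T = T;
        }

        public double value(int k, int t) {
            return (n[k][t] + alpha) / (nsum[k] + T * alpha);
        }
    }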

Implementation: Gibbs meta-sampler

[Figure: tool chain of the Gibbs meta-sampler: a topic model specification is validated by the NoMM code generator, which uses code templates to generate a Java model instance (prototype on the Java VM) and C/Java code modules that are compiled, optimised and deployed on the native platform]

- Code generator for topic models in Java and C
- Separation of knowledge domains: topic model applications vs. machine learning vs. computing architecture

Example NoMM script and generated kernel: hPAM2

[Figure: NoMM for hPAM2: m -> ~theta_m | alpha -> x_mn (super-topic, [X]) -> ~theta^x_{m,x} | alpha^x -> y_mn (sub-topic, [Y]) -> ~phi_k | beta -> w_mn ([V]), with the component index k chosen as
    x = 0            : k = 0
    x ≠ 0, y = 0     : k = 1 + x
    x ≠ 0, y ≠ 0     : k = 1 + X + y]

NoMM script:

    model = HPAM2
    description:
        Hierarchical PAM model 2 (HPAM2)
    sequences:
        # variables sampled for each (m,n)
        w, x, y : m, n
    network:
        # each line one NoMM node
        m   >> theta  | alpha     >>
        m,x >> thetax | alphax[x] >>
        x,y >> phi[k]             >>
        # java code to assign k
        k : {
            if (x==0) { k = 0; }
            else if (y==0) k = 1 + x;
            else k = 1 + X + y;
        }.

Generated sampling kernel (Java; the count arrays, as their names suggest, hold nmx = document x super-topic, nmxy = (document, super-topic) x sub-topic and nkw = component x word counts):

    // hidden edge: super-topic x
    for (hx = 0; hx < X; hx++) {
        // hidden edge: sub-topic y
        for (hy = 0; hy < Y; hy++) {
            mxsel = X * m + hx;
            mxjsel = hx;
            if (hx == 0)
                ksel = 0;
            else if (hy == 0)
                ksel = 1 + hx;
            else
                ksel = 1 + X + hy;
            pp[hx][hy] = (nmx[m][hx] + alpha[hx])
                * (nmxy[mxsel][hy] + alphax[mxjsel][hy])
                / (nmxysum[mxsel] + alphaxsum[mxjsel])
                * (nkw[ksel][w[m][n]] + beta)
                / (nkwsum[ksel] + betasum);
            psum += pp[hx][hy];
        } // for hy
    } // for hx

Document-topic distribution in the Gibbs sampler

[Animation: document-topic matrix (200 documents, 50 topics) shown at Gibbs iterations 1, 5, 10, 15, 20, 30, 40, 50, 60, 80, 100, 120, 150, 200, 300 and 500; at iteration 500 the chain has converged to a stationary state]

Fast sampling: Hybrid scaling methods

[Figure: test perplexity over Gibbs iterations (1 to 5000) for LDA and PAM4 with dependent vs. independent samplers; table of speedups (serial, parallel, independent) for LDA (dim. 500) and PAM4 (dim. 40 x 40) with values 30.2, 7.4, 24.1, 49.8]

Serial and parallel scaling methods:
- Generalised results for LDA to generic NoMMs, specifically (Porteous et al. 2008; Newman et al. 2009) + a novel approach

Problem: the sampling space for statistically dependent variables has size K * L * ...
- Independence assumption: separate samplers with dimensions K + L + ... << K * L * ...
- Empirical result: more iterations needed, but comparable topic quality
- Hybrid approaches with independent samplers are highly effective
- Implementation: complexity covered by the meta-sampler
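A minimal sketch of the independence assumption (my illustration, not the dissertation's implementation): instead of enumerating all K * L joint states of two statistically dependent hidden variables, separate samplers draw each variable from its own distribution, so only K + L states are evaluated per data point:

    import java.util.Random;

    class HybridSamplingSketch {
        static int sampleDiscrete(double[] w, Random rng) {
            double sum = 0.0;
            for (double v : w) sum += v;
            double u = rng.nextDouble() * sum;
            for (int i = 0; i < w.length; i++) {
                u -= w[i];
                if (u <= 0.0) return i;
            }
            return w.length - 1;
        }

        // Dependent (joint) sampling: all K*L states are enumerated,
        // as in the generated hPAM2 kernel shown earlier.
        static int[] sampleJoint(double[][] pxy, Random rng) {
            int K = pxy.length, L = pxy[0].length;
            double[] flat = new double[K * L];
            for (int k = 0; k < K; k++)
                for (int l = 0; l < L; l++)
                    flat[k * L + l] = pxy[k][l];
            int i = sampleDiscrete(flat, rng);
            return new int[] { i / L, i % L };
        }

        // Independence assumption: x and y are drawn separately,
        // only K + L states per data point.
        static int[] sampleIndependent(double[] px, double[] py, Random rng) {
            return new int[] { sampleDiscrete(px, rng), sampleDiscrete(py, rng) };
        }
    }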

Overview

Introduction
Generic topic models
Inference methods
Application to virtual communities
Conclusions and outlook


How can generic models be applied to data in virtual communities?


NoMM design process

Typology -> library of NoMM substructures (cf. the typology slide above)

[Figure: design process flow chart with the steps: define modelling task and metrics; define evidence; formulate model assumptions; compose model and predict properties; create model terminals; write NoMM script; generate and adapt Gibbs sampler; implement target metric; evaluate based on test corpus; optimise and integrate for target platform]

Idea: construct models from simple substructures that connect terminal nodes:
- Terminal nodes <-> multimodal data (virtual communities, ...)
- Substructures <-> relationships in the data; latent semantics

Process:
- Assumptions on dependencies in the data
- Iterative association to structures in the model (using the typology)
- The Gibbs distribution is known -> model behaviour: q(x, y) = "rich get richer"
- Implementation and test with the Gibbs meta-sampler; possibly iterate

Application: Expert finding with tag annotations

[Figure: a document m with author vector ~a_m (e.g. authors AB, TH), word vector ~w_m and tag vector ~c_m]

Scenario: expert finding via documents with tag annotations
- Authors of relevant documents <-> experts
- Documents frequently carry additional annotations, here: tags
- Goal: enable tag queries, improve the quality of text queries
- Problem: tags are often incomplete, partly wrong
- Connection of tags and experts via topics

(1) Data: for each document m: text ~w_m, authors ~a_m, tags ~c_m
(2) Goal: tag query ~c': p(~c' | a) = max; word query ~w': p(~w' | a) = max
(3) Terminal nodes: authors on the input side, words and tags on the output side
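As an illustration of goal (2) (my own sketch, not the retrieval method of the dissertation): once the model links authors to tags via latent topics, a tag query can be scored per author by marginalising over topics, p(c | a) = sum_y p(y | a) p(c | y), using point estimates of the distributions provided by the model constructed on the following slides:

    // Minimal sketch: score author a for a tag query by marginalising over
    // latent tag topics y. thetaA[y] = p(y | a) and psi[y][c] = p(c | y) are
    // assumed point estimates from the fitted model.
    class TagQuerySketch {
        static double score(int[] queryTags, double[] thetaA, double[][] psi) {
            double score = 1.0;                 // product over the query tags
            for (int c : queryTags) {
                double pc = 0.0;
                for (int y = 0; y < thetaA.length; y++)
                    pc += thetaA[y] * psi[y][c];
                score *= pc;
            }
            return score;
        }
    }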

Model assumptions

[Figure: document m with authors ~a_m, words ~w_m and tags ~c_m, as before]

(4) Model assumptions:
(a) The expertise of an author is weighted with his or her share of the authorship
(b) The semantics of expertise are expressed by topics z. Each author has a single field of expertise (topic distribution).
(c) The semantics of tags are expressed by topics y

Model construction

(5) Model construction:
(a) Start with the terminal nodes (from step 3): authors ~a_m, words w_{m,n}, tags ~c_m
        p(... | ~a, ~w, ~c) ∝ ...
(b) Authorship ~a_m is given as an observed distribution; a node samples the author x of each word
        p(x, ... | .) ∝ a_{m,x} q(x, ...) ...
(c) Each author has only a single field of expertise (topic distribution): q(x, z) associates word topics z with the sampled authors x (cf. ATM)
        p(x, z, ... | .) ∝ a_{m,x} q(x, z) ...
(d) Topic distributions over terms: connect z and w via q(z, w)
        p(x, z, ... | .) ∝ a_{m,x} q(x, z) q(z, w) ...
(e) Introduce tag topics y_{m,j} for the tags c_{m,j} as distributions over tags; q(x, z ∧ y) overlays the values for z and y
        p(x, z, y | .) ∝ a_{m,x} q(x, z ∧ y) q(z, w) q(y, c)

Model construction: ordinary approach

[Figure: plate diagram of the resulting expert-tag-topic model (Heinrich 2011) with author distributions ~a_m, author choices x_{m,n} and x_{m,j}, word topics z_{m,n}, tag topics y_{m,j}, words w_{m,n}, tags c_{m,j}, and per-author topic distributions, per-topic term distributions and per-topic tag distributions]
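To connect the construction with the generic sampling formula, a minimal sketch of the word-token part of a collapsed Gibbs update for such a model (my illustration; the variable names and the exact factorisation are assumptions, not the dissertation's code): p(x = a, z = k | .) ∝ a_{m,a} q(a, k) q(k, t), where the author-topic counts are shared between word topics and tag topics because of the coupled edge q(x, z ∧ y).

    // Minimal sketch: unnormalised weights over (author a, word topic k)
    // for one word token t in document m of an ETT-style model.
    class EttKernelSketch {
        static double[][] wordTokenWeights(int m, int t,
                double[][] a,                // a[m][x]: observed author shares
                int[][] nax, int[] naxsum,   // author-topic counts (words and tags)
                int[][] nkt, int[] nktsum,   // topic-term counts
                double alpha, double beta, int K, int V) {
            int A = a[m].length;
            double[][] p = new double[A][K];
            for (int x = 0; x < A; x++)
                for (int k = 0; k < K; k++)
                    p[x][k] = a[m][x]
                            * (nax[x][k] + alpha) / (naxsum[x] + K * alpha)
                            * (nkt[k][t] + beta) / (nktsum[k] + V * beta);
            return p;
        }
    }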

Expert-tag-topic model: Evaluation

[Figure: average precision @10 for word queries and tag queries, and topic coherence scores, comparing ATM and ETT]

- NIPS corpus: 2.3 million words, 2037 authors, 165 tags
- Retrieval, average precision @10:
    Term queries: ETT > ATM
    Tag queries: similarly good AP values
- Topic coherence (Mimno et al. 2011): ETT > ATM
- Semi-supervised learning: tag queries also retrieve items without tags
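For reference, a minimal sketch of the retrieval metric used above (my own illustration; definitions of AP@10 differ slightly across the literature):

    import java.util.Set;

    // Minimal sketch: average precision at 10 for one query,
    // given a ranked result list and a set of relevant item ids.
    class ApAt10Sketch {
        static double averagePrecisionAt10(int[] ranked, Set<Integer> relevant) {
            double sum = 0.0;
            int hits = 0;
            int n = Math.min(10, ranked.length);
            for (int i = 0; i < n; i++) {
                if (relevant.contains(ranked[i])) {
                    hits++;
                    sum += (double) hits / (i + 1);   // precision at rank i + 1
                }
            }
            int denom = Math.min(relevant.size(), 10);
            return denom == 0 ? 0.0 : sum / denom;
        }
    }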

Overview

Introduction
Generic topic models
Inference methods
Application to virtual communities
Conclusions and outlook



Conclusions: Research contributions

- Networks of Mixed Membership: generic model and domain-specific compact representation of topic models
- Inference algorithms: generic Gibbs sampler
    Fast sampling methods (serial, parallel, independent)
    Implementation in the Gibbs meta-sampler
    + Variational inference for NoMMs (Heinrich and Goesele 2009)
- Design process based on the typology of NoMM substructures
    + AMQ model: meta-model for virtual communities as a formal basis for scenario modelling (Heinrich 2010)
- Application to virtual communities: expert-tag-topic model for expert finding with annotated documents
    + Models ETT2 and ETT3 incl. a novel NoMM structure; retrieval approaches (Heinrich 2011b)
    + Graph-based expert search using ETT models: integration of explorative search/browsing and distributions from topic models
- Contribution to facilitated model-based construction of topic models, specifically for virtual communities and other multimodal scenarios

[Figure: ETT1: Expert search in the community browser, showing a graph of NIPS papers, authors and citation/authorship links around the query topics "blind source separation" and "independent component analysis"]


Outlook
- New applications and NoMM structures, e.g., time as a variable
- Alternative inference methods:
    Generic collapsed variational Bayes (Teh et al. 2007): structure similar to the collapsed Gibbs sampler
    Non-parametric methods: learning model dimensions using Dirichlet or Pitman-Yor process priors (Teh et al. 2004; Buntine and Hutter 2010), NoMM polymorphism (Heinrich 2011a)
- Improved support in the design process:
    Data-driven design: search over model structures to obtain the best model for a data set
    Architecture-specific Gibbs meta-samplers, e.g., massively parallel or FPGA, cf. (Heinrich et al. 2011)
- Integration with interactive user interfaces: models can be created on the fly, e.g., for visual analytics

Thank you!

Q+A


References I
References
Barnard, K., P. Duygulu, D. Forsyth, N. de Freitas, D. Blei, and M. Jordan (2003, August). Matching words and pictures. JMLR Special Issue on Machine Learning Methods for Text and Images 3(6), 1107–1136.
Bellegarda, J. (2000, August). Exploiting latent semantic information in statistical language modeling. Proc. IEEE 88(8), 1279–1296.
Blei, D., A. Ng, and M. Jordan (2003, January). Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993–1022.
Buntine, W. and M. Hutter (2010). A Bayesian review of the Poisson-Dirichlet process. arXiv:1007.0296v1 [math.ST].
Chang, J., J. Boyd-Graber, S. Gerrish, C. Wang, and D. Blei (2009). Reading tea leaves: How humans interpret topic models. In Proc. Neural Information Processing Systems (NIPS).

References II
Dietz, L., S. Bickel, and T. Scheffer (2007, June). Unsupervised prediction of citation influences. In Proceedings of the 24th International Conference on Machine Learning, Corvallis, Oregon, USA.
Heinrich, G. (2009). A generic approach to topic models. In Proc. European Conf. on Mach. Learn. / Principles and Pract. of Know. Discov. in Databases (ECML/PKDD), Part 1, pp. 517–532.
Heinrich, G. (2010). Actors-media-qualities: a generic model for information retrieval in virtual communities. In Proc. 7th International Workshop on Innovative Internet Community Systems (I2CS 2007), part of the I2CS Jubilee proceedings, Lecture Notes in Informatics, GI.
Heinrich, G. (2011a, March). Infinite LDA: Implementing the HDP with minimum code complexity. Technical note TN2011/1, arbylon.net.
Heinrich, G. (2011b). Typology of mixed-membership models: Towards a design method. In Proc. European Conf. on Mach. Learn. / Principles and Pract. of Know. Discov. in Databases (ECML/PKDD).

References III
Heinrich, G. and M. Goesele (2009). Variational Bayes for generic topic models. In Proc. 32nd Annual German Conference on Artificial Intelligence (KI 2009).
Heinrich, G., J. Kindermann, C. Lauth, G. Paaß, and J. Sanchez-Monzon (2005). Investigating word correlation at different scopes: a latent concept approach. In Workshop Lexical Ontology Learning at Int. Conf. Mach. Learning.
Heinrich, G., F. Logemann, V. Hahn, C. Jung, G. Figueiredo, and W. Luk (2011). HW/SW co-design for heterogeneous multi-core platforms: The hArtes toolchain, Chapter Audio array processing for telepresence, pp. 173–207. Springer.
Li, W., D. Blei, and A. McCallum (2007). Mixtures of hierarchical topics with pachinko allocation. In International Conference on Machine Learning.
Li, W. and A. McCallum (2006). Pachinko allocation: DAG-structured mixture models of topic correlations. In ICML '06: Proceedings of the 23rd International Conference on Machine Learning, New York, NY, USA, pp. 577–584. ACM.

References IV
Mimno, D., H. M. Wallach, E. Talley, M. Leenders, and A. McCallum (2011, July).
Optimizing semantic coherence in topic models.
In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing,
Edinburgh, UK, pp. 262–272.
Newman, D., A. Asuncion, P. Smyth, and M. Welling (2009, August).
Distributed algorithms for topic models.
JMLR 10, 1801–1828.
Porteous, I., D. Newman, A. Ihler, A. Asuncion, P. Smyth, and M. Welling (2008).
Fast collapsed Gibbs sampling for latent Dirichlet allocation.
In KDD '08: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, New York, NY, USA, pp. 569–577. ACM.
Rosen-Zvi, M., T. Griffiths, M. Steyvers, and P. Smyth (2004).
The author-topic model for authors and documents.
In Proc. 20th Conference on Uncertainty in Artificial Intelligence (UAI).
Teh, Y., M. Jordan, M. Beal, and D. Blei (2004).
Hierarchical Dirichlet processes.
Technical Report 653, Department of Statistics, University of California at Berkeley.
Teh, Y. W., D. Newman, and M. Welling (2007).
A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation.
In Advances in Neural Information Processing Systems, Volume 19.
Gregor Heinrich

A generic approach to topic models

39 / 35

Appendix

Gregor Heinrich

A generic approach to topic models

40 / 35

Example: Text mining for semantic clusters


Topic label: dominant terms according to φ_{k,t} = p(term | topic)

Bundesliga: Bochum Freiburg VfB FC SC München Borussia SV VfL Kickers SpVgg Uhr Köln Eintracht Bayern Hamburger Bayern+München
Polizei / Unfall: Polizei verletzt schwer Auto Unfall Fahrer Angaben schwer+verletzt Menschen Wagen Verletzungen Lawine Mann vier Meter Straße
Tschetschenien: Rebellen russischen Grosny russische Tschetschenien Truppen Kaukasus Moskau Angaben Interfax tschetschenischen Agentur
Politik / Hessen: FDP Koch Hessen CDU Koalition Gerhardt Wagner Liberalen hessischen Westerwelle Wolfgang Roland+Koch Wolfgang+Gerhardt
Wetter: Grad Temperaturen Regen Schnee Süden Norden Sonne Wetter Wolken Deutschland zwischen Nacht Wetterdienst Wind
Politik / Kroatien: Parlament Partei Stimmen Mehrheit Wahlen Wahl Opposition Kroatien Präsident Parlamentswahlen Mesic Abstimmung HDZ
Die Grünen: Grünen Parteitag Atomausstieg Trittin Grüne Partei Trennung Mandat Ausstieg Amt Röstel Jahren Müller Radcke Koalition
Russische Politik: Russland Putin Moskau russischen russische Jelzin Wladimir Tschetschenien Russlands Wladimir+Putin Kreml Boris Präsidenten
Polizei / Schulen: Polizei Schulen Schüler Täter Polizisten Schule Tat Lehrer erschossen Beamten Mann Polizist Beamte verletzt Waffe

Bigram-LDA: topics from 18,400 dpa news messages (German), Jan. 2000 (Heinrich et al. 2005)
Gregor Heinrich

A generic approach to topic models

41 / 35
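Such term lists are read off the estimated topic-term matrix. The following helper is a small illustration only (method and variable names are my own and assume the usual java.util imports; it is not code from this deck):

    // Return the n most probable terms of one topic, given its row of the
    // topic-term matrix phi[k][t] = p(term t | topic k) and a vocabulary array.
    static List<String> topTerms(double[] phiRow, String[] vocab, int n) {
        Integer[] ids = new Integer[phiRow.length];
        for (int t = 0; t < ids.length; t++) ids[t] = t;
        // sort term ids by descending probability
        Arrays.sort(ids, (a, b) -> Double.compare(phiRow[b], phiRow[a]));
        List<String> top = new ArrayList<>();
        for (int i = 0; i < n && i < ids.length; i++) top.add(vocab[ids[i]]);
        return top;
    }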

Notation: Bayesian network vs. NoMM levels

[Figure: the same mixture level drawn as a Bayesian network (left) and in NoMM notation (right), with variables k_i, x_i, component parameters ~θ_k | ~α and dimensions [K], [T].]

parameters + hyperparameters → nodes (~θ_k | ~α)
variables k_i, x_i → edges k_i, x_i
plates (i.i.d. repetitions over i, k) → indexes i + dimensions [K]


Gregor Heinrich

A generic approach to topic models

42 / 35

NoMM representation: Variable dependencies

[Figure: NoMM representation of variable dependencies in an example model: hidden edges h_i and visible edges v_i connected through mixture levels ℓ = 1, ..., 8, each level carrying component parameters ~θ_k | ~α.]

Gregor Heinrich

A generic approach to topic models

43 / 35

Collapsed Gibbs sampler


[Figure: Gibbs sampling trajectory ~x^(0), ~x^(1), ... in the (x1, x2) plane, alternating draws from p(x1 | x2) and p(x2 | x1) and converging to the posterior p(~x | V).]

Collapsed Gibbs sampler: stochastic EM / MCMC:

NoMMs: parameters Θ correlate with H → marginalise Θ
For each data point i: draw latent variables H_i = (y_i, z_i, ...), given all
other data, latent H_{¬i} and observed V:

    H_i \sim p(H_i \mid H_{\neg i}, V, A) .    (1)

Stationary state: the full conditional distribution (1) simulates the posterior
Faster absolute convergence for NoMMs than, e.g., variational
inference (Heinrich and Goesele 2009)
Gregor Heinrich

A generic approach to topic models

44 / 35
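As a concrete instance of (1), consider standard LDA (shown purely as an illustration; the symbols follow common LDA notation rather than this deck): the collapsed full conditional for the topic assignment z_i of word w_i = t in document m is

    p(z_i = k \mid \vec z_{\neg i}, \vec w, \alpha, \beta)
        \propto \frac{n_{k,t}^{\neg i} + \beta}{n_k^{\neg i} + V\beta}
        \cdot \frac{n_{m,k}^{\neg i} + \alpha}{n_m^{\neg i} + K\alpha} ,

where n_{k,t} counts assignments of term t to topic k and n_{m,k} assignments of topic k in document m, both excluding position i.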


Generic topic models: Generative process


[Figure: one generic NoMM level: input edges select a component ~θ_j among J components with Dirichlet hyperparameters ~α (A hyperparameter groups), generating output edges x_i, i ∈ [I].]

Generative process on level ℓ:

    x_i \sim \mathrm{Mult}(x_i \mid \vec\theta_k),      k = f_k(\mathrm{parents}(x_i), i)            (2)
    \vec\theta_k \sim \mathrm{Dir}(\vec\theta_k \mid \vec\alpha_j),   j = f_j(\mathrm{known\ parents}(x_i), i) .   (3)

Gregor Heinrich

A generic approach to topic models

45 / 35
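As a familiar special case (standard LDA, used purely as an example; the symbols ϑ, φ follow common LDA notation rather than this slide), two such levels suffice: on level ℓ = 1 the component is selected by the document index, on level ℓ = 2 by the topic drawn on level 1:

    z_{m,n} \sim \mathrm{Mult}(z_{m,n} \mid \vec\vartheta_m),          \vec\vartheta_m \sim \mathrm{Dir}(\vec\vartheta_m \mid \vec\alpha)
    w_{m,n} \sim \mathrm{Mult}(w_{m,n} \mid \vec\varphi_{z_{m,n}}),    \vec\varphi_k \sim \mathrm{Dir}(\vec\varphi_k \mid \vec\beta)

Here f_k simply returns the value of the parent edge (m, resp. z_{m,n}) and the hyperparameter selector f_j is constant.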

Generic topic models: Complete-data likelihood


Likelihood of all hidden and visible data X = {H, V} and parameters Θ:

    p(X, \Theta \mid A) = \prod_{\ell \in L} \Big[ \prod_i p(x_{i,\mathrm{out}} \mid \vec\theta_{x_{i,\mathrm{in}}}) \Big]^{[\ell]}
                          \cdot \prod_{\ell \in L} \Big[ \prod_k p(\vec\theta_k \mid \vec\alpha_j) \Big]^{[\ell]}
                        \;\rightarrow\; f(\vec\theta_k, \vec n_k, \vec\alpha), \qquad \vec n_k = (n_{k,1}, n_{k,2}, \ldots)        (4)

    (the first bracket collects the Discrete factors of the data items, the second the Dirichlet factors of the components; the outer products run over the levels ℓ)

Product dependent on co-occurrences n_{k,t} between input and output
values, x_{i,in} = k and x_{i,out} = t, on each level ℓ
There are variants to component selection x_{i,in} = k
There are mixture node variants, e.g., observed components

Gregor Heinrich

A generic approach to topic models

46 / 35

Generic topic models: Complete-data likelihood


The conjugacy between the multinomial and Dirichlet distributions of
model levels leads to a simple complete-data likelihood:

    p(X, \Theta \mid A)
      = \prod_\ell \Big[ \prod_i \mathrm{Mult}(x_i \mid \vec\theta_{k_i}) \prod_k \mathrm{Dir}(\vec\theta_k \mid \vec\alpha_j) \Big]^\ell                            (5)
      = \prod_\ell \Big[ \prod_i \theta_{k_i, x_i} \prod_k \frac{1}{B(\vec\alpha_j)} \prod_t \theta_{k,t}^{\alpha_{j,t} - 1} \Big]^\ell                               (6)
      = \prod_\ell \Big[ \prod_k \frac{1}{B(\vec\alpha_j)} \prod_t \theta_{k,t}^{\alpha_{j,t} + n_{k,t} - 1} \Big]^\ell                                               (7)
      = \prod_\ell \Big[ \prod_k \frac{B(\vec n_k + \vec\alpha_j)}{B(\vec\alpha_j)} \, \mathrm{Dir}(\vec\theta_k \mid \vec n_k + \vec\alpha_j) \Big]^\ell             (8)

where brackets [·]^ℓ enclose a particular level ℓ and
n_{k,t} is how often k and t co-occur.
Gregor Heinrich

A generic approach to topic models

47 / 35
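A step that (8) implies but the slide leaves implicit: integrating out Θ (each Dirichlet density integrates to one) yields the collapsed likelihood used for the Gibbs full conditionals on the next slide,

    p(X \mid A) = \int p(X, \Theta \mid A) \, d\Theta
                = \prod_\ell \Big[ \prod_k \frac{B(\vec n_k + \vec\alpha_j)}{B(\vec\alpha_j)} \Big]^\ell .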

Inference: Generic full conditionals


Gibbs full conditionals are derived for groups of dependent hidden edges,
H_i^d ⊆ H^d ⊆ X, with surrounding edges S_i^d ⊆ S^d considered observed. All
tokens co-located with a particular observation: X_i^d = {H_i^d, S_i^d}.

[Figure: hidden edge groups H^1 = {H_1^1, H_2^1, ...} and H^2 = {H_1^2, H_2^2, ...} across tokens.]

Full conditional via the chain rule applied to (8) with Θ integrated out:

    p(H_i^d \mid X \setminus H_i^d, A) = \frac{p(H_i^d, S_i^d \mid X \setminus \{H_i^d, S_i^d\}, A)}{p(S_i^d \mid X \setminus \{H_i^d, S_i^d\}, A)}        (9)

    p(X_i^d \mid X \setminus X_i^d, A) = \frac{p(X \mid A)}{p(X \setminus X_i^d \mid A)}                                                                   (10)
        = \prod_\ell \Big[ \prod_k \frac{B(\vec n_k + \vec\alpha_j)}{B(\vec n_k \setminus X_i^d + \vec\alpha_j)} \Big]^\ell                                (11)
        \rightarrow \prod_{\ell \in \{H^d, S^d\}} \Big[ \frac{B(\vec n_k + \vec\alpha_j)}{B(\vec n_k \setminus X_i^d + \vec\alpha_j)} \Big]^\ell           (12)

Gregor Heinrich

A generic approach to topic models

48 / 35

Inference: q-functions

    q(k, t) := \frac{B(\vec n_k + \vec\alpha_j)}{B(\vec n_k \setminus x_i^d + \vec\alpha_j)}

    |x_i^d| = 1:   q(k, t) = \frac{n_{k,t}^{\neg i} + \alpha_t}{\sum_t n_{k,t}^{\neg i} + \sum_t \alpha_t}

    |x_i^d| = 2:   q(k, t) = \frac{n_{k,t} \setminus x_{i,1}^d + \alpha_t}{\sum_t n_{k,t} \setminus x_{i,1}^d + \sum_t \alpha_t}
                             \cdot \frac{n_{k,t} \setminus x_{i,2}^d + \alpha_t + \delta(x_{i,1}^d \equiv x_{i,2}^d)}{\sum_t n_{k,t} \setminus x_{i,2}^d + \sum_t \alpha_t + 1}

    ...

Gregor Heinrich

A generic approach to topic models

49 / 35
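A small worked example (numbers invented purely for illustration): for a component k with counts \vec n_k^{\neg i} = (3, 1, 0) over T = 3 output values and symmetric hyperparameter \alpha = 0.1,

    q(k, 1) = (3 + 0.1) / (4 + 3 \cdot 0.1) \approx 0.72,
    q(k, 2) = 1.1 / 4.3 \approx 0.26,
    q(k, 3) = 0.1 / 4.3 \approx 0.02,

i.e., exactly the smoothed ratio of occurrences that the Pólya urn picture on the next slide visualises.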


q-functions: Pólya urn and sampling weights

[Figure: Pólya urn with expected component parameters E{~θ_k}: sampling with over-replacement.]

    q(k, t) := \frac{B(\vec n_k + \vec\alpha)}{B(\vec n_k^{\neg i} + \vec\alpha)}

    |t| = 1:      q(k, t) = \frac{n_{k,t}^{\neg i} + \alpha}{n_k^{\neg i} + T\alpha} = smoothed ratio of occurrences

    t = {u, v}:   q(k, u \wedge v) := \frac{n_{k,u}^{\neg i} + \alpha}{n_k^{u,\neg i} + T\alpha} \cdot \frac{n_{k,v}^{\neg i} + \alpha + \delta(u \equiv v)}{n_k^{v,\neg i} + T\alpha + 1}

    ...

Gregor Heinrich

A generic approach to topic models

50 / 35
q-functions: Pólya urn and sampling weights

[Figure: Pólya urn and discrete parameters; the two factors of q(k, u ∧ v) correspond to drawing first u^{¬i} and then v^{¬i} from the same urn, the δ(u ≡ v) term and the +1 in the denominator accounting for the ball returned after the first draw.]

    q(k, t) := \frac{B(\vec n_k + \vec\alpha)}{B(\vec n_k^{\neg i} + \vec\alpha)}

    |t| = 1:      q(k, t) = \frac{n_{k,t}^{\neg i} + \alpha}{n_k^{\neg i} + T\alpha} = smoothed ratio of occurrences

    t = {u, v}:   q(k, u \wedge v) := \frac{n_{k,u}^{\neg i} + \alpha}{n_k^{u,\neg i} + T\alpha} \cdot \frac{n_{k,v}^{\neg i} + \alpha + \delta(u \equiv v)}{n_k^{v,\neg i} + T\alpha + 1}

    ...

Gregor Heinrich

A generic approach to topic models

51 / 35

NoMM substructure library: Gibbs weights and likelihood


[Table (thesis Sec. 9.3, Figure 9.2, p. 171): NoMM sub-structure library. For every sub-structure the thesis gives a structure diagram, the Gibbs sampler weight w and the likelihood p for a single token i; reproduced here are only the sub-structure names, the modelled aspect and example models.]

N1, E1, C1. Dir-Mult nodes, unbranched (mixture/admixture): LDA [Blei et al. 2003b], PAM [Li & McCallum 2006]; LDCC [Shafiei & Milios 2006] (E1S aggregation)
N2. Observed parameters (label distribution): ATM [Rosen-Zvi et al. 2004]
N3. Non-Dirichlet prior (alternative distributions on the simplex): CTM [Blei & Lafferty 2007], TLM [Wallach 2008] (hierarchy of Dirichlet priors)
N4. Non-discrete output (non-multinomial observations): Corr-LDA [Barnard et al. 2003], GMM [McLachlan & Peel 2000]
N5 + E4. Regression (regression / supervised learning): supervised LDA [Blei & McAuliffe 2007], relational topic model [Chang & Blei 2009]
E2. Autonomous edges (common mixture of causes): multimodal LDA [Ramage et al. 2009]
E3. Coupled edges (common cause for observations): hidden relational model (HRM) [Xu et al. 2006], Link-LDA [Erosheva et al. 2004]
C2. Combined indices (different dependent causes, relations): hPAM [Li et al. 2007a], HRM [Xu et al. 2006], Multi-LDA [Porteous et al. 2008a]
C3. Interleaved indices (different causes, same effect): proposed here
C4. Switch (selection of complex submodels): multi-grain LDA [Titov & McDonald 2008], entity-topic models [Newman et al. 2006a]
C5. Node coupling (correlation of submodels, relations): simple relational component model [Sinkkonen et al. 2008], relational topic model [Chang & Blei 2009]

Figure 9.2: NoMM sub-structure properties (the figure also defines the notation for adding counts, excluding the current token i, and combining sequences; see Sec. 9.3 of the thesis).

Gregor Heinrich

A generic approach to topic models

52 / 35

Gibbs meta-sampler: Java data structure


MixNet  // represents a NoMM
    nodes : List<MixNode>               // nodes of the NoMM
    edges : List<MixEdge>               // edges of the NoMM
    sequences : List<MixSequence>       // sequences of the NoMM
    constants : Map<String, String>     // constants for the NoMM

MixItem  // interface: node or edge
    name : String                       // global id (= unique variable name)
    parents : List<MixItem>             // >= 2 edges: multiple inputs C2; >= 2 nodes: merged inputs C3
    children : List<MixItem>            // >= 2 edges: indep. branches E2; >= 2 nodes: coupled branches E3
    datatype : enum                     // type of item: seq., topic, qfixed, ...
    linktype : enum                     // link type: C and E classifications

MixNode implements MixItem  // NoMM node
    theta : Variable                    // parameters theta_{k,t}
    ntheta, nthetasum : Variable        // counts n_{k,t}, sum_t n_{k,t}
    alpha : Variable                    // hyperparameter alpha

MixEdge implements MixItem  // NoMM edge
    x : Variable                        // variable x_{m,n}
    T : Expression                      // range of x, T
    siblingsE2 : List<MixEdge>          // E2 edge siblings, for expansion
    sparse : boolean                    // flag: parent node emits subset of range

MixSequence  // NoMM sequence
    subseqs : List<MixSequence>         // subsequences, null for leaf
    superseq : MixSequence              // supersequence, null for root
    m, n, s : Variable                  // sequence index variables m, n, s
    M, Mq, Nm, Nmq, W, Wq : Expression  // sequence index ranges M, N_m, W (and query variants)
    qfixed : boolean                    // flag: fixed topics for query

(MixNet collects the nodes, edges and sequences of the NoMM; MixNode and MixEdge implement MixItem.)

Gregor Heinrich

A generic approach to topic models

53 / 35

/** run the main Gibbs sampling kernel */
public void run(int niter) {
    // iteration loop
    for (int iter = 0; iter < niter; iter++) {
        // major loop, sequence [m][n]
        for (int m = 0; m < M; m++) {
            // component selectors
            int mxsel = -1;
            int mxjsel = -1;
            int ksel = -1;
            // minor loop, sequence [m][n]
            for (int n = 0; n < w[m].length; n++) {
                double psum;
                double u;
                // decrement counts
                nmx[m][x[m][n]]--;
                mxsel = X * m + x[m][n];
                nmxy[mxsel][y[m][n]]--;
                nmxysum[mxsel]--;
                if (x[m][n] == 0)
                    ksel = 0;
                else if (y[m][n] == 0)
                    ksel = 1 + x[m][n];
                else
                    ksel = 1 + X + y[m][n];
                nkw[ksel][w[m][n]]--;
                nkwsum[ksel]--;
                // compute weights
                /* p(x_{m,n} \eq x, y_{m,n} \eq y ... (LaTeX omitted) */
                psum = 0;
                int hx = -1;
                int hy = -1;
                // hidden edge x
                for (hx = 0; hx < X; hx++) {
                    // hidden edge y
                    for (hy = 0; hy < Y; hy++) {
                        mxsel = X * m + hx;
                        mxjsel = hx;
                        if (hx == 0)
                            ksel = 0;
                        else if (hy == 0)
                            ksel = 1 + hx;
                        else
                            ksel = 1 + X + hy;
                        pp[hx][hy] = (nmx[m][hx] + alpha[hx])
                                * (nmxy[mxsel][hy] + alphax[mxjsel][hy])
                                / (nmxysum[mxsel] + alphaxsum[mxjsel])
                                * (nkw[ksel][w[m][n]] + beta)
                                / (nkwsum[ksel] + betasum);
                        psum += pp[hx][hy];
                    } // for hy
                } // for hx
                // sample topics
                u = rand.nextDouble() * psum;
                psum = 0;
                SAMPLED:
                // each edge value x
                for (hx = 0; hx < X; hx++) {
                    // each edge value y
                    for (hy = 0; hy < Y; hy++) {
                        psum += pp[hx][hy];
                        if (u <= psum)
                            break SAMPLED;
                    } // for hy
                } // for hx
                // assign topics
                x[m][n] = hx;
                y[m][n] = hy;
                // increment counts
                nmx[m][x[m][n]]++;
                mxsel = X * m + x[m][n];
                nmxy[mxsel][y[m][n]]++;
                nmxysum[mxsel]++;
                if (x[m][n] == 0)
                    ksel = 0;
                else if (y[m][n] == 0)
                    ksel = 1 + x[m][n];
                else
                    ksel = 1 + X + y[m][n];
                nkw[ksel][w[m][n]]++;
                nkwsum[ksel]++;
            } // for n
        } // for m
        // estimate hyperparameters
        estAlpha();
    } // for iter
} // run()

Gregor Heinrich

A generic approach to topic models

54 / 35

Fast serial sampling: Using a normalisation bound


[Figure: sampling weights s_{lk} sorted in descending order; the exact normalisation Z_i is bracketed between the known mass Z_i^known (main mass plus adjustment masses) and a bound on the unknown remainder Z_i^unknown, so a draw u ~ U[0, 1] can often be resolved after computing only the first few partial normalisations uZ_{i,0}, uZ_{i,1}, ...]

Idea: exploit saliency of few elements → compute only the largest
(= most likely) weights
Approximate normalisation via vector norms (Porteous et al. 2008)
Generalisation to multiple dependent variables: more expensive
higher-order vector norms ↔ higher sparsity of sampling space
Gregor Heinrich

A generic approach to topic models

55 / 35

Fast parallel sampling: Synchronisation methods

[Figure: documents partitioned across processors P1, ..., PP; document-specific parameters (pmfs over (sub-)topics) stay local, global parameters (pmfs over the vocabulary) are synchronised while sampling x_{m,n}, y_{m,n}, w_{m,n}.]

Multi-processor parallelisation using shared memory (OpenMP)

Main challenge: synchronisation and communication of global data
Synchronisation methods (LDA + generic NoMMs):
a. Naive synchronisation locks
b. Query read-only + MAP update step for Θ (split-state)
c. Local copies + reduction step (= AD-LDA (Newman et al. 2009))

Gregor Heinrich

A generic approach to topic models

56 / 35
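A minimal sketch of synchronisation method (c) (my own illustration, not the thesis code): every worker samples an epoch on its document partition using a private copy of the global topic-term counts, and a reduction step folds the per-worker changes back into the shared table, as in AD-LDA (Newman et al. 2009):

    // Copy the global topic-term counts before an epoch.
    static int[][] copyCounts(int[][] a) {
        int[][] c = new int[a.length][];
        for (int k = 0; k < a.length; k++) c[k] = a[k].clone();
        return c;
    }

    // Reduction after the epoch:
    // merged[k][t] = snapshot[k][t] + sum_p (local_p[k][t] - snapshot[k][t])
    static int[][] reduceCounts(int[][] snapshot, int[][][] locals) {
        int[][] merged = copyCounts(snapshot);
        for (int[][] local : locals)
            for (int k = 0; k < snapshot.length; k++)
                for (int t = 0; t < snapshot[k].length; t++)
                    merged[k][t] += local[k][t] - snapshot[k][t];
        return merged;
    }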

Fast sampling: Serial parallel


[Plot: speed-up of the fast sampling methods sa, pa and spa over the number of topics (10 to 500), LDA on the NIPS corpus; P marks the number of processors.]

Figure: Speed-up for fast sampling methods: LDA.


Gregor Heinrich

A generic approach to topic models

57 / 35

Fast sampling: Serial parallel independent


[Plots over model dimensions (K, L) from (10, 10) to (20, 100): (a) parallel and independent samplers (ia, pa, pb, pc; speed-up vs. a) and (b) parallel, serial and independent samplers (ipa, ipas, ipcs, ipb, ipc; speed-up vs. ia resp. vs. a).]

(a) Parallel, independent    (b) Parallel, serial, independent

Figure: Speed-up for combined fast samplers: PAM4 (2 dependent variables).

Gregor Heinrich

A generic approach to topic models

58 / 35

Fast sampling: The impact of assumed independence


[Plot: perplexity (approx. 1500 to 3000) over Gibbs iterations for the dependent (pa) and independence-assuming (ipa) samplers at model dimensions (K, L) from (10, 10) to (20, 100); the convergence points of the dependent and independent variants are marked.]

Figure: Perplexity over iterations. Example model: PAM4.


Gregor Heinrich

A generic approach to topic models

59 / 35

ETT1 model: Derivation using NoMM structure


[Figure: ETT1 as a NoMM: author edge x drawn from ~a_m (A_m authors per document, M documents); word branch with topics z_{m,n} ∈ [K] and words w_{m,n} ∈ [V] over {M, N_m}; tag branch with topics y_{m,j} ∈ [K] and tags c_{m,j} ∈ [C] over {M, J_m}; q-functions q(x, z ∧ y), q(z, w), q(y, c).]

Lining up q-functions:

    p(x, z, y \mid \cdot) \propto a_{m,x} \, q(x, z \wedge y) \, q(z, w) \, q(y, c)                 (13)

Transforming to standard Gibbs full conditionals:

    p(x_{m,n} = x, z_{m,n} = z \mid \cdot) \propto a_{m,x}
        \cdot \frac{n_{x,z}^{\neg \{x,z\}_{m,n}} + \alpha}{n_x^{\neg \{x,z\}_{m,n}} + K\alpha}
        \cdot \frac{n_{z,w_{m,n}}^{\neg z_{m,n}} + \beta}{n_z^{\neg z_{m,n}} + V\beta}              (14)

    p(x_{m,j} = x, y_{m,j} = y \mid \cdot) \propto a_{m,x}
        \cdot \frac{n_{x,y}^{\neg \{x,y\}_{m,j}} + \alpha}{n_x^{\neg \{x,y\}_{m,j}} + K\alpha}
        \cdot \frac{n_{y,c_{m,j}}^{\neg y_{m,j}} + \gamma}{n_y^{\neg y_{m,j}} + C\gamma}            (15)

Retrieval via the query likelihood model:

    p(\vec w \mid a) = \prod_{w \in \vec w} \sum_z \vartheta_{a,z} \, \varphi_{z,w} , \qquad
    p(\vec c \mid a) = \prod_{c \in \vec c} \sum_y \vartheta_{a,y} \, \psi_{y,c} .                  (16)

Gregor Heinrich

A generic approach to topic models

60 / 35
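A small sketch of how (16) ranks candidate experts (my own illustration; the array names theta and phi are assumptions standing for the estimated ϑ_{a,z} and φ_{z,w}, not code from the deck):

    // Log query likelihood of eq. (16) for expert a:
    // log p(query | a) = sum_{w in query} log sum_z theta[a][z] * phi[z][w]
    static double logQueryLikelihood(int[] queryTerms, double[][] theta, double[][] phi, int a) {
        double logLik = 0.0;
        for (int w : queryTerms) {
            double pw = 0.0;
            for (int z = 0; z < phi.length; z++)
                pw += theta[a][z] * phi[z][w];   // mixture over topics z
            logLik += Math.log(pw);
        }
        return logLik;                            // higher = better match
    }

Experts are then sorted by this value (or, as on the result slides, by the corresponding likelihood score) for a given term or tag query.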


ETT1 model: Derivation using ordinary method (excerpt)

[Excerpt from thesis Appendix E, "Details on application models": Figure E.1 shows the Bayesian network of the ETT1 model; Figure E.2 the networks of the iterated ETT models (a) ETT2 and (b) ETT3. The appendix carries out the traditional derivation of ETT1 inference: complete-data likelihood (E.1), marginalisation of the parameters via Dirichlet-multinomial conjugacy (E.3)-(E.6), and the chain rule to obtain the Gibbs full conditionals (E.7)-(E.13); the word and tag branches are sampled disjointly, each jointly with the author association x at the root.]

The resulting full conditionals (with i = (m, n) for words and i = (m, j) for tags):

    p(z_i = k, x_i = x \mid w_i = t, \vec z_{\neg i}, \vec y, \vec x_{\neg i}, \vec w_{\neg i}, \vec a, \vec c)
        \propto \frac{n_{k,t}^{\neg i} + \beta}{n_k^{\neg i} + V\beta}
        \cdot \frac{n_{x,k}^{(z),\neg i} + \alpha}{n_x^{(z),\neg i} + K\alpha} \cdot a_{m,x}
        = q(k, t) \, q(x, k) \, a_{m,x}                                                          (E.11)

    p(y_i = k, x_i = x \mid c_i = c, \vec z, \vec y_{\neg i}, \vec x_{\neg i}, \vec w, \vec a, \vec c_{\neg i})
        \propto \frac{n_{k,c}^{\neg i} + \gamma}{n_k^{\neg i} + C\gamma}
        \cdot \frac{n_{x,k}^{(y),\neg i} + \alpha}{n_x^{(y),\neg i} + K\alpha} \cdot a_{m,x}
        = q(k, c) \, q(x, k) \, a_{m,x}                                                          (E.13)

The difference of (E.11) and (E.13) to (10.3) is a result of the definition of n_{x,k} as a summed count and of the fact that both branches are sampled disjointly. (Heinrich 2011b)

Footnote: alternative derivation strategies for topic model Gibbs samplers have been published in [Griffiths 2002], working via p(z_i | z_{¬i}, w) ∝ p(w_i | w_{¬i}, z) p(z_i | z_{¬i}), and in [McCallum et al. 2007], who use the chain rule via the token likelihood.

Comparing this to the NoMM-based derivation developed in the thesis illustrates the usefulness of the generic method to avoid tedious calculations.

Gregor Heinrich

A generic approach to topic models

61 / 35

ETT1 evaluation: Truncated Average Precision


[Figure: two example result lists of five retrieved documents each; in the first list the relevant documents appear at ranks 1, 2 and 5, in the second at ranks 2, 4 and 5.]

    AP@5 = (1/1 + 2/2 + 3/5) / 3 = 0.867
    AP@5 = (1/2 + 2/4 + 3/5) / 3 = 0.533

Figure: Average Precision at 5 (assuming 3 relevant documents in corpus)

Gregor Heinrich

A generic approach to topic models

62 / 35
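For reference, a small helper (my own sketch, not taken from the thesis) that computes AP@k from a ranked relevance list and reproduces the two hand computations above:

    // Average precision at cutoff k: sum of precision@rank over the relevant
    // ranks within the top k, divided by the number of relevant documents
    // in the corpus (assumed known, as on the slide).
    static double averagePrecisionAtK(boolean[] relevantAtRank, int k, int numRelevantInCorpus) {
        double sum = 0.0;
        int hits = 0;
        for (int r = 0; r < k && r < relevantAtRank.length; r++) {
            if (relevantAtRank[r]) {
                hits++;
                sum += (double) hits / (r + 1);   // precision at this rank
            }
        }
        return sum / numRelevantInCorpus;
    }

    // Example: relevant at ranks 1, 2, 5 -> (1/1 + 2/2 + 3/5) / 3 = 0.867
    // averagePrecisionAtK(new boolean[]{true, true, false, false, true}, 5, 3);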

ETT1 results: Term Retrieval


query: svm support vector machine kernel classifier hyperplane regression

1. Schölkopf B, lik = 76.272, tokens = 2830, docs = 10: judged relevant


From Regularization Operators to Support Vector Kernels (9); Improving
the Accuracy and Speed of Support Vector Machines (9); Shrinking the
Tube: A New Support Vector Regression Algorithm (11) . . .
2. Smola A, lik = 77.509, tokens = 2760, docs = 11: judged relevant

Support Vector Regression Machines (9); Prior Knowledge in Support

Vector Kernels (10); Support Vector Method for Novelty Detection (12)
The Entropy Regularization Information Criterion (12, support vector
machines, regularization) . . .

3. Vapnik V, lik = 77.525, tokens = 2332, docs = 10: judged relevant

Support Vector Regression Machines (9); Prior Knowledge in Support


Vector Kernels (10); Prior Knowledge in Support Vector Kernels (10);
Support Vector Method for Multivariate Density Estimation (12); . . .

4. Crisp D, lik = 81.401, tokens = 699, docs = 2: judged relevant

A Geometric Interpretation of ν-SVM Classifiers (12); Uniqueness of the


SVM Solution (12)

5. Burges C, lik = 81.630, tokens = 1309, docs = 5: judged relevant

Improving the Accuracy and Speed of Support Vector Machines (9); A


Geometric Interpretation of ν-SVM Classifiers (12); Uniqueness of the
SVM Solution (12) . . .

6. Laskov P, lik = 84.275, tokens = 738, docs = 1: judged relevant


An Improved Decomposition Algorithm for Regression Support Vector
Machines (12)
7. Steinhage V, lik = 84.600, tokens = 438, docs = 1: judged irrelevant

Nonlinear Discriminant Analysis Using Kernel Functions (12)

8. Bennett K, lik = 86.754, tokens = 384, docs = 1: judged relevant

Semi-Supervised Support Vector Machines (11)

9. Herbrich R, lik = 86.754, tokens = 462, docs = 2: judged irrelevant

Classification on Pairwise Proximity Data (11); Bayesian Transduction


(12, classification)

10. Chapelle O, lik = 87.431, tokens = 494, docs = 2: judged relevant


Model Selection for Support Vector Machines (12); Transductive Inference for Estimating Values of Functions (12, regression, classification)

Gregor Heinrich

A generic approach to topic models

63 / 35

ETT1 results: Tag retrieval


query: face recognition
1. Movellan J, lik = 4.680, tokens = 3153, docs = 8: judged relevant
Dyn. Features for Visual Speechreading: A System Comparison (9, no
tags); Image Representation for Facial Expression Coding (12, tags: face
recognition, image, ICA); Visual Speech Recognition with Stochastic
Networks (7, tags: HMM, speech recognition) . . .
2. Bartlett M, lik = 4.951, tokens = 812, docs = 3: judged relevant

Viewpoint Invariant Face Recognition using ICA and Attractor Networks

(9, tags: face recognition, invariances, pattern recognition); Image Representation for Facial Expression Coding (12, tags: face recognition, image, ICA) . . .

3. Dailey M, lik = 4.952, tokens = 903, docs = 2: judged relevant

Task and Spatial Frequency Effects on Face Specialization (10, tags: face

recognition); Facial Memory Is Kernel Density Estimation (Almost) (11,


no tags)

4. Padgett C, lik = 4.974, tokens = 499, docs = 1: judged relevant

Representing Face Images for Emotion Classification (9, tags: classification, face recognition, image)

5. Hager J, lik = 5.023, tokens = 377, docs = 2: judged relevant

Classifying Facial Action (8, tags: classification); Image Representation


for Facial Expression Coding (12, tags: face recognition, image, ICA)

6. Ekman P, lik = 5.027, tokens = 374, docs = 2: judged relevant


Image Representation for Facial Expression Coding (12, tags: face
recognition, image, ICA); Classifying Facial Action (8, tags: classification)
7. Phillips P, lik = 5.127, tokens = 795, docs = 1: judged relevant

Support Vector Machines Applied to Face Recognition (11, tags: face


recognition, SVM)

8. Gray M, lik = 5.159, tokens = 470, docs = 2: judged irrelevant

Dynamic Features for Visual Speechreading: A Systematic Comparison


(9, text: dynamic visual features; no tags)

9. Lawrence D, lik = 5.217, tokens = 265, docs = 1: judged relevant

SEXNET: A Neural Network Identifies Sex From Human Faces (3, tags:
neural networks, object recognition, pattern recognition)

10. Ahuja N, lik = 5.221, tokens = 366, docs = 2: judged relevant
A SNoW-Based Face Detector (12, tags: face recognition, image, vision)

Gregor Heinrich

A generic approach to topic models

64 / 35

ETT1 results: Tag query and expert topics


tag: face recognition (ETT1/J20)
0.82702 face images faces image facial visual human video database detection
0.09392 image images texture pixel resolution pyramid regions pixels region search
0.02696 wavelet video view images tracking user camera image motion shape
0.00117 eeg brain ica artifacts subjects activity subject erp signals scalp
0.00100 image images visual vision optical pixel surface edge disparity receptive
0.00094 orientation cortical dominance ocular cortex development lateral eye cells visual
0.00089 chip neuron synapse digital pulse analog synaptic chips synapses murray
0.00084 hinton object image energy cost images code visible zemel codes

author: Movellan J (ETT1/J20)


0.53816:
0.16216:
0.08954:
0.06216:
0.03939:
0.03508:
0.02770:
0.02154:

face images faces image facial visual human video database detection
image images texture pixel resolution pyramid regions pixels region search
speech speaker acoustic vowel phonetic phoneme utterances spoken formant
bayesian prior density posterior entropy evidence likelihood distributions
filter frequency signals phase channel amplitude frequencies temporal spectrum
activation boltzmann annealing temperature neuron stochastic schedule machine
cell firing cells neuron activity excitatory inhibitory synaptic potential membrane
convergence stochastic descent optimization batch density global update

author: Cottrell G (ETT1/J20)


0.41865:
0.27523:
0.17531:
0.11287:
0.07130:
0.06143:
0.03695:
0.02049:

recurrent nets correlation cascade activation connection epochs representations


face images faces image facial visual human video database detection
subjects human stimulus cue subject trials experiment perceptual psychophysical
tangent transformation image simard images invariant invariance euclidean
modules attractors cortex phase olfactory frequency bulb activity oscillatory eeg
word connectionist representations words activation production cognitive musical
node activation graph cycle nets message recurrence links connection child
visual attention contour search selective orientation iiii region saliency segment

Gregor Heinrich

A generic approach to topic models

65 / 35

ETT1 results: Topic coherence


Topic coherence (Mimno et al. 2011):
How often do top-ranked topic terms co-occur in documents?
Re-enacts human judgement in topic intrusion experiments (Chang
et al. 2009; Heinrich 2011b)

Words in topic (choose the worst match (A-F) in every group):
1. A. orientation  B. cortex  C. visual  D. ocular  E. acoustic  F. eye
2. A. likelihood  B. mixture  C. theorem  D. density  E. em  F. prior
3. A. risk  B. return  C. stock  D. trading  E. processor  F. prediction
4. A. language  B. word  C. stress  D. grammar  E. neural  F. syllable
5. A. circuit  B. bayesian  C. analog  D. voltage  E. vlsi  F. chip
6. A. validation  B. set  C. variance  D. regression  E. selection  F. bias

(a) Topic intrusion experiment

[Plot (b): coherence scores (roughly -150 to -500) for LDA, ATM, ETT1/J20 and ETT1/J100.]

(b) Coherence scores

Gregor Heinrich

A generic approach to topic models

66 / 35
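For reference, the coherence score as defined by Mimno et al. (2011), reproduced here rather than taken from the slide:

    C(k; V^{(k)}) = \sum_{m=2}^{M} \sum_{l=1}^{m-1} \log \frac{D(v_m^{(k)}, v_l^{(k)}) + 1}{D(v_l^{(k)})}

where V^{(k)} = (v_1^{(k)}, ..., v_M^{(k)}) are the M most probable terms of topic k, D(v) is the number of documents containing term v, and D(v, v') the number containing both. Less negative scores indicate more coherent topics.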

