NMF

Ankur Moitra

Institute for Advanced Study and Princeton University

INFORMATION OVERLOAD!

Challenge: develop tools for automatic comprehension of data

Discover hidden topics

Annotate documents according to these topics

Politics: (President Obama, 0.10), (congress, 0.08), (government, 0.07), ...

Each topic is a distribution on words

OUTLINE

Are there efficient algorithms to find the topics?

Challenge: We cannot rigorously analyze the algorithms used in practice! (When do they work? Do they run quickly?)

Part I: An Optimization Perspective

Separability and Anchor Words

Algorithms for Separable Instances

Part II: A Bayesian Perspective

Topic Models (e.g. LDA, CTM, PAM, ...)

Algorithms for Inferring the Topics

Experimental Results

WORD-BY-DOCUMENT MATRIX

words (m) × documents (n)

M_ij = relative frequency of word i in document j

M = A W, where A (words × topics) and W (topics × documents) are both nonnegative

WLOG we can assume the columns of A and W each sum to one

Each column of A is a topic, e.g. personal finance: (0.15, money), (0.10, retire), (0.03, risk), ...

Each column of W is the representation of a document as a mixture of topics
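As a concrete illustration of the word-by-document setup, here is a minimal sketch (toy three-document corpus; all names hypothetical) of building M so that every column sums to one:

```python
import numpy as np

# Hypothetical toy corpus; in practice m and n are large.
docs = [
    "money retire risk money",
    "bunt baseball bunt",
    "oscar-winning movie review",
]
vocab = sorted({w for d in docs for w in d.split()})
word_index = {w: i for i, w in enumerate(vocab)}

# M[i, j] = relative frequency of word i in document j,
# so every column of M sums to one.
M = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        M[word_index[w], j] += 1
M /= M.sum(axis=0, keepdims=True)
```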

AN ABRIDGED HISTORY

Machine Learning and Statistics:

Introduced by [Lee, Seung, 99]

Goal: extract latent relationships in the data

Applications to text classification, information retrieval, collaborative filtering, etc. [Hofmann 99], [Kumar et al 98], [Xu et al 03], [Kleinberg, Sandler 04], ...

Theoretical Computer Science:

Introduced by [Yannakakis 90] in the context of extended formulations; also related to the log-rank conjecture

Physical Modeling:

Introduced by [Lawton, Sylvestre 71]

Local Search: given A, compute W; given W, compute A; ...

known to fail on worst-case inputs (gets stuck in local optima)

highly sensitive to the cost function, update procedure, and regularization

Can we give an efficient algorithm that works on all inputs?
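For reference, the alternating local-search heuristic can be sketched with the classic multiplicative update rules of [Lee, Seung, 99] — a sketch for illustration, not the exact variant any particular system uses:

```python
import numpy as np

def nmf_local_search(M, r, iters=200, eps=1e-9):
    """Lee-Seung multiplicative updates for min ||M - A W||_F^2.
    A sketch of the local-search heuristics discussed above: each
    step alternately improves W then A, but the method can get
    stuck in local optima on worst-case inputs."""
    m, n = M.shape
    rng = np.random.default_rng(0)
    A = rng.random((m, r))
    W = rng.random((r, n))
    for _ in range(iters):
        W *= (A.T @ M) / (A.T @ A @ W + eps)   # update W with A fixed
        A *= (M @ W.T) / (A @ W @ W.T + eps)   # update A with W fixed
    return A, W
```

The updates keep A and W nonnegative by construction, which is exactly why this family of heuristics is popular in practice.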

Theorem [Vavasis 09]: It is NP-hard to compute NMF

What is the complexity of NMF as a function of r?

Theorem [Arora, Ge, Kannan, Moitra, STOC12]: Can solve NMF in time (nm)^O(r^2), yet any algorithm that runs in time (nm)^o(r) would yield a 2^o(n) algorithm for 3-SAT.

Idea: rewrite M = AW as a system of polynomial inequalities in the entries of A and W.

Can we reduce the number of variables from nr + mr to O(r^2)?

Can we give an efficient algorithm that works on all inputs?

Yes, if and only if r is constant

Are the instances we actually want to solve somehow easier?

Focus of this talk: a natural condition so that a simple algorithm provably works, quickly

words (m) × topics (r): the topic matrix A

If an anchor word occurs, then the document is at least partially about the topic

e.g. "401k" for personal finance, "bunt" for baseball, "oscar-winning" for movie reviews

A is p-separable if each topic has an anchor word that occurs with probability at least p
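To make the definition concrete, here is a small sketch (function name hypothetical) that scans a topic matrix A for anchor words, i.e. words whose probability mass lies on a single topic:

```python
import numpy as np

def anchor_words(A, tol=1e-12):
    """Return {topic index: anchor word index} for an m x r topic
    matrix A. An anchor word is a row of A supported on exactly one
    topic. This is only the support condition of p-separability; a
    full check would also require the word to occur with
    probability at least p."""
    m, r = A.shape
    anchors = {}
    for i in range(m):
        support = np.flatnonzero(A[i] > tol)
        if len(support) == 1:
            anchors.setdefault(int(support[0]), i)
    return anchors
```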

O(nmr + mr^3.5) time algorithm for NMF when the topic matrix A is separable

Topic Models: documents are stochastically generated as a convex combination of topics

Theorem [Arora, Ge, Moitra, FOCS12]: There is a polynomial time algorithm that learns the parameters of any topic model provided that the topic matrix A is p-separable.

In fact our algorithm is highly practical, and runs orders of magnitude faster, with nearly identical performance, compared to the current best (Gibbs sampling)

algorithms based on the method of moments

ANCHOR WORDS ⇔ VERTICES

[Figure: M = AW; the rows of M are convex combinations of the rows of W, and anchor words correspond to vertices of the convex hull]

Observation: If A is separable, the rows of W appear as rows of M — we just need to find the anchor words!

How can we find the anchor words?

Deleting a word changes the convex hull ⇔ it is an anchor word

Anchor words are extreme points; they can be found by linear programming (or a combinatorial distance-based algorithm)

The NMF Algorithm:

find the anchor words

paste these vectors in as rows of W

find the nonnegative A so that AW ≈ M (convex programming)
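A runnable sketch of this algorithm, using successive projection as one instance of a combinatorial distance-based anchor finder, and nonnegative least squares (scipy's nnls) for the final convex-programming step — an illustration under the noiseless separable assumption, not the exact procedure from the paper:

```python
import numpy as np
from scipy.optimize import nnls

def separable_nmf(M, r):
    """Anchor-word NMF sketch. Repeatedly take the row of M farthest
    from the span of the rows chosen so far; under separability these
    are anchor words. Then solve a nonnegative least squares problem
    per row to recover A with A W ~= M."""
    R = M.astype(float).copy()
    anchors = []
    for _ in range(r):
        i = int(np.argmax(np.linalg.norm(R, axis=1)))
        anchors.append(i)
        u = R[i] / np.linalg.norm(R[i])
        R = R - np.outer(R @ u, u)      # project rows off the chosen direction
    W = M[anchors]                      # paste anchor rows in as rows of W
    # find the nonnegative A so that A W ~= M, one row at a time
    A = np.vstack([nnls(W.T, M[i])[0] for i in range(M.shape[0])])
    return A, W, anchors
```

On a noiseless separable instance this recovers the anchor rows exactly; with noise, robust variants of both steps are needed.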

OUTLINE

Are there efficient algorithms to find the topics?

Challenge: We cannot rigorously analyze the algorithms used in practice! (When do they work? Do they run quickly?)

Part I: An Optimization Perspective

Separability and Anchor Words

Algorithms for Separable Instances

Part II: A Bayesian Perspective

Topic Models (e.g. LDA, CTM, PAM, ...)

Algorithms for Inferring the Topics

Experimental Results

TOPIC MODELS

A is fixed, W is stochastic

document #1: (1.0, personal finance)

document #2: (0.5, baseball); (0.5, movie review)

Latent Dirichlet Allocation (Blei, Ng, Jordan): A fixed, W Dirichlet

Correlated Topic Model (Blei, Lafferty): A fixed, W Logistic Normal

Pachinko Allocation Model (Li, McCallum): A fixed, W generated by a multilevel DAG

These models differ only in how W is generated

What if documents are short; can we still find A?

The crucial observation is that we can work with the Gram matrix (defined next)

Gram Matrix

E[ M M^T ] = A E[ W W^T ] A^T = A R A^T, where R = E[ W W^T ]

A R A^T is nonnegative and separable!

Anchor words are extreme rows of the Gram matrix!
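A sketch of the empirical version (names and the averaging convention are assumptions here; the estimator in the paper differs in details, e.g. how diagonal sampling bias is handled):

```python
import numpy as np

def empirical_gram(M):
    """Empirical word-word co-occurrence matrix Q ~= E[M M^T],
    averaged over the n documents. Under the model Q converges
    to A R A^T with R = E[W W^T]."""
    return (M @ M.T) / M.shape[1]

def row_normalize(Q):
    """Normalized rows of Q; under separability the anchor words
    appear as the extreme points of the convex hull of these rows."""
    return Q / Q.sum(axis=1, keepdims=True)
```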

Given enough documents, we can still find the anchor words!

How can we use the anchor words to find the rest of A?

For an anchor word, the posterior distribution Pr[topic|word] is supported on just one topic

We can use the anchor words to find Pr[topic|word] for all the other words

The points are now the (normalized) rows of M M^T

what we have: Pr[topic|word]

what we want: A, whose entries are Pr[word|topic]

connected by Bayes' Rule

Pr[word|topic] = Pr[topic|word] Pr[word] / Σ_word' Pr[topic|word'] Pr[word']

form the Gram matrix and find the anchor words

write each word as a convex combination of the anchor words to find Pr[topic|word]

compute A from the formula above

This provably works for any topic model (LDA, CTM, PAM, etc.) provided A is separable and R is non-singular
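The Bayes-rule step can be written out directly; a minimal sketch, assuming the posteriors Pr[topic|word] and the word marginals have already been estimated from the Gram matrix:

```python
import numpy as np

def recover_A(topic_given_word, word_marginals):
    """Bayes-rule step from the slides:
    Pr[word|topic] = Pr[topic|word] Pr[word]
                     / sum over word' of Pr[topic|word'] Pr[word'].
    topic_given_word: m x r matrix of Pr[topic|word] (rows sum to 1)
    word_marginals:   length-m vector of Pr[word]"""
    unnorm = topic_given_word * word_marginals[:, None]   # m x r
    return unnorm / unnorm.sum(axis=0, keepdims=True)     # columns sum to 1
```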

Our first attempt used matrix inversion, which is noisy and unstable and can produce small negative values

METHODOLOGY:

We ran our algorithm on real and synthetic data:

synthetic data: train an LDA model on 1100 NIPS abstracts, use this model to run experiments

Our algorithm is fifty times faster and performs nearly the same on all the metrics we tried (l_1 error, log-likelihood, coherence, ...) when compared to MALLET

EXPERIMENTAL RESULTS

[Arora, Ge, Halpern, Mimno, Moitra, Sontag, Wu, Zhu, ICML13]:

[Figure: running time in seconds (0–3000) vs. number of documents (0–100,000) for the algorithms Gibbs, Recover, RecoverL2, RecoverKL]

[Figure: SynthNIPS, L1 error (0.0–2.0) vs. number of documents (2,000–100,000 and infinite) for Gibbs, Recover, RecoverL2, RecoverKL]

[Figure: log probability per token (−8.0 to −6.0) vs. number of documents for Gibbs, Recover, RecoverL2, RecoverKL]


MY WORK ON LEARNING

[Figure: a map around the central question "Is Learning Computationally Easy?" — Nonnegative Matrix Factorization (computational geometry), Topic Models (experiments), New Models, Method of Moments (algebraic geometry), Mixtures of Gaussians]

[Figure: density of a mixture of Gaussians]

Can we learn the parameters of a mixture of Gaussians from random samples?

no provable guarantees from earlier approaches, starting with [Dasgupta, 99], which assumed negligible overlap

polynomial time algorithm to learn the parameters of a mixture of a constant number of Gaussians (even in high dimensions); see also [Belkin, Sinha 10]

MY WORK ON LEARNING

[Figure: the map, extended — "Is Learning Computationally Easy?" connects to Nonnegative Matrix Factorization (computational geometry), Topic Models (experiments), New Models, Deep Learning (local search), Method of Moments (algebraic geometry), Mixtures of Gaussians, Population Recovery and DNF (complex analysis), and Robustness and Linear Regression (functional analysis)]

MY WORK ON ALGORITHMS

Approximation Algorithms, Metric Embeddings

Information Theory, Communication Complexity

Combinatorics, Smoothed Analysis

[Figure: f(x+y) = f(x) + f(y), with points f(x), f(y), f(z), f(w)]

Summary:

Often, optimization problems abstracted from learning are intractable!

Are there new models that better capture the instances we actually want to solve in practice?

These new models can lead to interesting theory questions and highly practical new algorithms

There are many exciting questions left to explore at the intersection of algorithms and learning

Any Questions?
