
The Canonical Tensor Decomposition and

Its Applications to Social Network Analysis

Evrim Acar, Tamara G. Kolda and Daniel M. Dunlavy


Sandia National Labs

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United
States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
What is the Canonical Tensor Decomposition?
CANDECOMP/PARAFAC (CP) model [Hitchcock'27, Harshman'70, Carroll & Chang'70]

Given a third-order tensor $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$, CP approximates it as a sum of R rank-one components:

$$\mathcal{X} \approx \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r$$
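To make the model concrete, here is a minimal NumPy sketch (not from the talk; the shapes are arbitrary) that assembles a tensor from given factor matrices:

```python
import numpy as np

I, J, K, R = 4, 5, 6, 3
rng = np.random.default_rng(0)
A = rng.normal(size=(I, R))   # columns a_1, ..., a_R
B = rng.normal(size=(J, R))
C = rng.normal(size=(K, R))

# X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]  (sum of R rank-one tensors)
X = np.einsum('ir,jr,kr->ijk', A, B, C)
print(X.shape)  # (4, 5, 6)
```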
CP Application: Neuroscience
Epileptic Seizure Localization:

[Figure: a Channels x Time samples x Scales EEG tensor approximated by a two-component CP model, with factors (a1, b1, c1) and (a2, b2, c2).]

Acar et al., 2007; De Vos et al., 2007


CP has Numerous Applications!
• Chemometrics
  – Fluorescence spectroscopy
  – Chromatographic data analysis
  (Andersen and Bro, Journal of Chemometrics, 2003)
• Neuroscience
  – Epileptic seizure localization
  – Analysis of EEG and ERP
  (Mørup, Hansen and Arnfred, Journal of Neuroscience Methods, 2007)
• Signal Processing
  (Sidiropoulos, Giannakis and Bro, IEEE Trans. Signal Processing, 2000)
• Computer Vision
  – Image compression, classification
  – Texture analysis
  (Hazan, Polak and Shashua, ICCV 2005)
• Social Network Analysis
  – Web link analysis
  – Conversation detection in emails
  – Text analysis
  (Bader, Berry and Browne, Survey of Text Mining: Clustering, Classification, and Retrieval, 2nd Ed., 2007)
• Approximation of PDEs
  (Doostan and Iaccarino, Journal of Computational Physics, 2009)
Algorithms: How Can We Compute CP?
Mathematical Details for CP

Unfolding (matricization) rearranges a tensor into a matrix:
• Mode-1 unfolding X_(1): the mode-1 fibers (columns) become the columns of the matrix.
• Mode-2 unfolding X_(2): the mode-2 fibers (rows) become the columns.
• Mode-3 unfolding X_(3): the mode-3 fibers (tubes) become the columns.

In matricized form, the CP model $\mathcal{X} \approx \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r$ becomes

$$\mathbf{X}_{(1)} \approx \mathbf{A}\,(\mathbf{C} \odot \mathbf{B})^{\mathsf{T}},$$

where $\odot$ denotes the matrix Khatri-Rao product (the column-wise Kronecker product). The sketch below illustrates both operations.
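A minimal NumPy sketch of the two operations (not the talk's MATLAB code); the unfolding convention below orders the remaining modes with earlier modes varying fastest, so the matricized CP identity holds exactly:

```python
import numpy as np

def unfold(X, mode):
    """Mode-n unfolding X_(n): the mode-n fibers become the columns."""
    return np.reshape(np.moveaxis(X, mode, 0), (X.shape[mode], -1), order='F')

def khatri_rao(C, B):
    """Column-wise Kronecker product: column r is kron(C[:, r], B[:, r])."""
    K, R = C.shape
    J = B.shape[0]
    return (C[:, None, :] * B[None, :, :]).reshape(K * J, R)

# Sanity check of X_(1) = A (C ⊙ B)^T on a random rank-3 tensor:
rng = np.random.default_rng(1)
A, B, C = rng.normal(size=(4, 3)), rng.normal(size=(5, 3)), rng.normal(size=(6, 3))
X = np.einsum('ir,jr,kr->ijk', A, B, C)
assert np.allclose(unfold(X, 0), A @ khatri_rao(C, B).T)
```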


CP is a Nonlinear Optimization Problem
Given a tensor $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$ and R (# of components), find matrices A, B, C that solve the following problem:

$$\min_{\mathbf{A},\mathbf{B},\mathbf{C}} \; \frac{1}{2}\left\| \mathcal{X} - \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r \right\|^2$$

The objective function is $f(\mathbf{x})$, where the vector $\mathbf{x}$ comprises the entries of A, B, and C stacked column-wise, giving $R(I+J+K)$ variables in total.
Traditional Approach: CPALS
CPALS, dating back to Harshman'70 and Carroll & Chang'70, solves for one factor matrix at a time while the others are held fixed.

Alternating algorithm: for k = 1, 2, ..., update A with B, C fixed; update B with A, C fixed; update C with A, B fixed. Repeat these steps until "convergence".

Each step can be converted to a matrix least squares problem. For example, with B and C fixed,

$$\mathbf{A} \leftarrow \mathbf{X}_{(1)}\,(\mathbf{C} \odot \mathbf{B})\,\big((\mathbf{B}^{\mathsf{T}}\mathbf{B}) * (\mathbf{C}^{\mathsf{T}}\mathbf{C})\big)^{\dagger},$$

where $\mathbf{X}_{(1)}$ is $I \times JK$, $\mathbf{C} \odot \mathbf{B}$ is $JK \times R$, the Hadamard product in parentheses is an $R \times R$ matrix, and the result is $I \times R$.

Very fast, but not always accurate:
• Not guaranteed to converge to a stationary point.
• Other issues, e.g., it cannot exploit symmetry.
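For reference, a bare-bones CP-ALS sweep might look as follows (a NumPy sketch under the unfolding convention above, not the parafac_als implementation used in the experiments; no convergence check or column normalization):

```python
import numpy as np

def unfold(X, n):
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order='F')

def khatri_rao(P, Q):
    # column r is kron(P[:, r], Q[:, r])
    return (P[:, None, :] * Q[None, :, :]).reshape(-1, P.shape[1])

def cp_als(X, R, n_iters=50, seed=0):
    rng = np.random.default_rng(seed)
    F = [rng.normal(size=(d, R)) for d in X.shape]      # A, B, C
    for _ in range(n_iters):
        for n in range(3):
            U, V = [F[m] for m in range(3) if m != n]   # the two fixed factors
            G = (U.T @ U) * (V.T @ V)                   # R x R Hadamard Gram matrix
            F[n] = unfold(X, n) @ khatri_rao(V, U) @ np.linalg.pinv(G)
    return F
```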
Our Approach: CPOPT
Unlike CPALS, CPOPT solves for all factor matrices simultaneously using gradient-based optimization.

Define the objective function:

$$f(\mathbf{A},\mathbf{B},\mathbf{C}) = \frac{1}{2}\left\| \mathcal{X} - \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r \right\|^2$$

Rewriting the Objective Function
Expanding the square splits f into a constant term, an inner product term, and a norm term:

$$f = \frac{1}{2}\|\mathcal{X}\|^2 - \left\langle \mathcal{X},\, \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r \right\rangle + \frac{1}{2}\left\| \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r \right\|^2$$
Derivative of the 2nd Summand
The partial derivative of the inner product term with respect to $\mathbf{a}_r$ is a tensor-vector multiplication:

$$\frac{\partial}{\partial \mathbf{a}_r} \left\langle \mathcal{X},\, \sum_{l=1}^{R} \mathbf{a}_l \circ \mathbf{b}_l \circ \mathbf{c}_l \right\rangle = \mathcal{X} \times_2 \mathbf{b}_r \times_3 \mathbf{c}_r$$

Derivative of the 3rd Summand

$$\frac{\partial}{\partial \mathbf{a}_r} \, \frac{1}{2}\left\| \sum_{l=1}^{R} \mathbf{a}_l \circ \mathbf{b}_l \circ \mathbf{c}_l \right\|^2 = \sum_{l=1}^{R} (\mathbf{b}_l^{\mathsf{T}}\mathbf{b}_r)(\mathbf{c}_l^{\mathsf{T}}\mathbf{c}_r)\, \mathbf{a}_l$$

Analogous formulas exist for the partials with respect to the columns of B and C.
Objective and Gradient

Objective function: $f = \frac{1}{2}\left\| \mathcal{X} - \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r \right\|^2$

Gradient (for r = 1, ..., R):

$$\frac{\partial f}{\partial \mathbf{a}_r} = -\,\mathcal{X} \times_2 \mathbf{b}_r \times_3 \mathbf{c}_r + \sum_{l=1}^{R} (\mathbf{b}_l^{\mathsf{T}}\mathbf{b}_r)(\mathbf{c}_l^{\mathsf{T}}\mathbf{c}_r)\, \mathbf{a}_l$$

Gradient in Matrix Form

$$\frac{\partial f}{\partial \mathbf{A}} = -\,\mathbf{X}_{(1)}(\mathbf{C} \odot \mathbf{B}) + \mathbf{A}\left((\mathbf{B}^{\mathsf{T}}\mathbf{B}) * (\mathbf{C}^{\mathsf{T}}\mathbf{C})\right)$$

and analogously for B and C, where * is the Hadamard (elementwise) product. Note that this formulation can be used to derive the ALS approach: setting a gradient block to zero yields the corresponding least squares update.
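A minimal NumPy sketch of the objective/gradient pair, i.e., the quantities a first-order method such as NCG needs (the talk's implementation uses the Tensor Toolbox and Poblano instead):

```python
import numpy as np

def cp_fg(X, A, B, C):
    """f(A,B,C) = 0.5 * ||X - sum_r a_r o b_r o c_r||^2 and its gradient blocks."""
    def unfold(T, n):
        return np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1), order='F')
    def kr(P, Q):
        return (P[:, None, :] * Q[None, :, :]).reshape(-1, P.shape[1])

    G = (A.T @ A) * (B.T @ B) * (C.T @ C)      # Gram matrix of the model
    U1 = unfold(X, 0) @ kr(C, B)               # MTTKRP in mode 1
    # f = 0.5*||X||^2 - <X, model> + 0.5*||model||^2
    f = 0.5 * np.sum(X * X) - np.sum(U1 * A) + 0.5 * np.sum(G)
    gA = -U1 + A @ ((B.T @ B) * (C.T @ C))
    gB = -unfold(X, 1) @ kr(C, A) + B @ ((A.T @ A) * (C.T @ C))
    gC = -unfold(X, 2) @ kr(B, A) + C @ ((A.T @ A) * (B.T @ B))
    return f, (gA, gB, gC)
```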
Indeterminacies of CP

• CP is often unique.
• However, CP has two fundamental indeterminacies:
  – Permutation: the components can be reordered, e.g., swap (a1, b1, c1) with (a3, b3, c3). Not a big deal: this leads to multiple, but separated, minima.
  – Scaling: the vectors comprising a single rank-one factor can be scaled, e.g., replace a1 and b1 with 2a1 and ½b1. This leads to a continuous space of equivalent solutions.
Adding Regularization

Penalizing the factor norms removes the scaling indeterminacy. The regularized objective function and its gradient are

$$f_\lambda = f + \frac{\lambda}{2}\left(\|\mathbf{A}\|^2 + \|\mathbf{B}\|^2 + \|\mathbf{C}\|^2\right), \qquad \frac{\partial f_\lambda}{\partial \mathbf{A}} = \frac{\partial f}{\partial \mathbf{A}} + \lambda\,\mathbf{A},$$

and analogously for B and C.
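Regularization is then a small change on top of the previous sketch (lam plays the role of λ; the experiments later use 0.02):

```python
import numpy as np

def cp_fg_reg(X, A, B, C, lam=0.02):
    """Regularized objective/gradient, wrapping the cp_fg sketch above."""
    f, (gA, gB, gC) = cp_fg(X, A, B, C)
    f += 0.5 * lam * (np.sum(A * A) + np.sum(B * B) + np.sum(C * C))
    return f, (gA + lam * A, gB + lam * B, gC + lam * C)
```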
Our methods: CPOPT & CPOPTR
CPOPT: apply a derivative-based optimization method to the objective function f.
CPOPTR: apply a derivative-based optimization method to the regularized objective function f_λ.
Another competing method: CPNLS
CPNLS: apply a nonlinear least squares solver to the equations $\mathcal{X} = \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r$. The Jacobian is of size $IJK \times R(I+J+K)$.

Proposed by Paatero'97 and also by Tomasi and Bro'05.
Experimental Set-Up [Tomasi & Bro'06]

Step 1: Generate 20 triplets of random factor matrices A, B, C, each with R_true = 3 or 5 columns and column collinearity set to 0.5 (i.e., the unit-norm columns of each factor matrix have pairwise inner products of 0.5).

Step 2: Construct a tensor from each triplet and add noise, using all combinations of:
• Homoscedastic: 1%, 5%, 10%
• Heteroscedastic: 0%, 1%, 5%
This yields 180 tensors for each rank (e.g., R_true = 3).

Step 3: Use each algorithm to extract factors, using both R_true and R_true + 1 components, and compare against the factors from Step 1: 360 tests for each rank. (One way to realize Step 1 is sketched below.)
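A hedged sketch of Step 1 (the exact generator in Tomasi & Bro'06 may differ): prescribe the Gram matrix of the columns and factor it.

```python
import numpy as np

def collinear_factors(dim, R, congruence=0.5, seed=0):
    """Random dim x R matrix whose unit-norm columns have pairwise
    inner products equal to `congruence`."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.normal(size=(dim, R)))        # orthonormal columns
    G = congruence * np.ones((R, R)) + (1 - congruence) * np.eye(R)
    return Q @ np.linalg.cholesky(G).T                    # A^T A = G by design

A = collinear_factors(50, 3)
print(np.round(A.T @ A, 3))   # ~1 on the diagonal, ~0.5 off the diagonal
```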
Implementation Details

• All experiments were performed in MATLAB on a Linux workstation (Quad-Core Intel Xeon 2.50GHz, 9 GB RAM).

• Methods
– CPALS – Alternating least squares. Used parafac_als in the Tensor Toolbox
(Bader & Kolda)
– CPNLS – Nonlinear least squares. Used PARAFAC3W, which implements Levenberg-Marquardt (necessary due to the scaling ambiguity), by Tomasi and Bro.
– CPOPT – Optimization. Used routines in the Tensor Toolbox in calculation
of function values and gradients. Optimization via Nonlinear Conjugate
Gradient (NCG) method with Hestenes-Stiefel update, using Poblano (in-
house code to be released soon).
– CPOPTR – Optimization with regularization. Same as above.
(Regularization parameter = 0.02.)
CPOPT is Fast and Accurate
Generated 360 dense test problems (with ranks 3 and 5) and factorized each with R set to the correct number of components and to one more than that: a total of 720 tests for each entry below.

Per-iteration cost for a K x K x K tensor (R = # components; method order as in the Implementation Details slide):
  CPALS: O(RK^3)    CPNLS: O(R^3 K^3)    CPOPT: O(RK^3)    CPOPTR: O(RK^3)

• Overfactoring has a significant impact on accuracy.
• CPOPT is robust to overfactoring.
Amino acids fluorescence data (http://www.models.life.ku.dk/)

[Figure: three panels of recovered emission-mode components, plotted as coefficients against emission wavelength (250-450 nm).]
Application: Link Prediction
Link Prediction on Bibliometric Data

Form an authors x conferences x years tensor spanning 1991-2007, where entry (i, j, k) is the number of papers by the ith author at the jth conference in year k.

Question 1: Can we use tensor decompositions to model the data and extract meaningful factors?
Question 2: Can we predict who is going to publish at which conferences in the future?
Components make sense!

The DBLP tensor (authors x conferences x years) is factored as $\sum_{r} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r$, and the individual components are interpretable.

[Figure: one component (a_r, b_r, c_r). Author mode: dominated by Hans Peter Meinzer, Thomas Martin Lehmann, and Heinrich Niemann. Conference mode: dominated by BILDMED, CARS, and DAGM. Time mode: coefficients over the years 1992-2004.]
Components make sense!

[Figure: another component (a_r, b_r, c_r). Author mode: dominated by Craig Boutilier and Daphne Koller. Conference mode: dominated by IJCAI. Time mode: coefficients over the years 1992-2004.]
Link Prediction Problem

TRAIN: fit a CP model to the 1991-2004 slices of the authors x conferences x years tensor.

TEST: predict the author-conference links for 2005-2007:
• ~60K links out of ~19 million possible <author, conf> pairs (~0.3% dense)
• ~32K of the test links are previously unseen in the training set
• <author_i, conf_j> = 1 if the ith author publishes at the jth conference, and 0 otherwise
Score for <author_i, conf_j>

• Sign ambiguity: within each component, pairs of factor vectors can be flipped in sign without changing the model.
• Fix the signs using the signs of the maximum-magnitude entries, and then compute a score for each author-conference pair using the information from the time mode: each component contributes $a_r(i)\,b_r(j)$, weighted by a summary of its temporal profile $\mathbf{c}_r$.

[Figure: the temporal profiles c_1 and c_2 of two components, used to weight their contributions to the scores.]
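A hedged sketch of this scoring step (the sign fix follows the slide; the temporal weighting here, the mean of each component's last few time coefficients, is an illustrative assumption rather than necessarily the exact rule used):

```python
import numpy as np

def link_scores(A, B, C, last=3):
    """Score matrix (authors x conferences) from CP factors A, B, C."""
    A, B, C = A.copy(), B.copy(), C.copy()
    for r in range(A.shape[1]):
        for M in (B, C):
            if M[np.argmax(np.abs(M[:, r])), r] < 0:
                M[:, r] *= -1
                A[:, r] *= -1            # absorb the flip: the model is unchanged
    gamma = C[-last:, :].mean(axis=0)    # one temporal weight per component
    return (A * gamma) @ B.T             # score_ij = sum_r gamma_r a_ir b_jr
```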
Performance Measure: AUC

s contains the scores for all possible <author, conf> pairs, e.g., ~19 million entries, where <author_i, conf_j> = 1 if the ith author publishes at the jth conference and 0 otherwise. Sort the scores in decreasing order and line them up against the true 0/1 labels. Let N be the number of 1's and M the number of 0's.

Sweeping a threshold down the sorted list traces out the Receiver Operating Characteristic (ROC) curve: each true link encountered raises the TP rate by 1/N, and each non-link moves the FP rate right by 1/M. The performance measure is the Area Under the Curve (AUC).
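The AUC can also be computed directly from the scores and labels without building the curve, via the rank (Mann-Whitney) statistic; a minimal sketch:

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve (exact when there are no score ties)."""
    scores, labels = np.asarray(scores, float), np.asarray(labels)
    n_pos = labels.sum()                         # N: number of 1's
    n_neg = labels.size - n_pos                  # M: number of 0's
    ranks = scores.argsort().argsort() + 1       # ascending 1-based ranks
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```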
Performance Evaluation

[Figure: ROC curves for CP vs. a RANDOM baseline.]

Predicting links for 2005-2007 (~60K): CP achieves AUC = 0.92.
Predicting previously unseen links for 2005-2007 (~32K): CP achieves AUC = 0.87.
CP-WOPT: Handling Missing Data
Missing Data Examples

Missing data arises in different disciplines due to loss of information, machine failures, different sampling frequencies, or differing experimental set-ups:
• Chemometrics, e.g., excitation x emission fluorescence data (Tomasi & Bro'05)
• Biomedical signal processing (e.g., EEG)
• Network traffic analysis (e.g., packet drops)
• Computer vision (e.g., occlusions)
• ...

EEG example: a channels x time-frequency x subjects tensor assembled from the subjects' channel x time-frequency matrices and modeled with a CP decomposition.
Modify the Objective for CP

With no missing data, the optimization problem is

$$\min_{\mathbf{A},\mathbf{B},\mathbf{C}} \; f(\mathbf{A},\mathbf{B},\mathbf{C}) = \frac{1}{2}\left\| \mathcal{X} - \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r \right\|^2$$

For handling missing data, our approach (CP-WOPT) weights the objective so that only known entries contribute:

$$f_{\mathcal{W}}(\mathbf{A},\mathbf{B},\mathbf{C}) = \frac{1}{2}\left\| \mathcal{W} * \left( \mathcal{X} - \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r \right) \right\|^2, \qquad w_{ijk} = \begin{cases} 1 & x_{ijk} \text{ known} \\ 0 & x_{ijk} \text{ missing} \end{cases}$$

Objective and Gradient
With $\mathcal{Z} = \mathcal{W} * \left( \sum_{l=1}^{R} \mathbf{a}_l \circ \mathbf{b}_l \circ \mathbf{c}_l - \mathcal{X} \right)$, the gradient (for r = 1, ..., R; i = 1, ..., I; j = 1, ..., J; k = 1, ..., K) is

$$\frac{\partial f_{\mathcal{W}}}{\partial a_{ir}} = \sum_{j,k} z_{ijk}\, b_{jr}\, c_{kr}, \qquad \text{or in matrix form} \qquad \frac{\partial f_{\mathcal{W}}}{\partial \mathbf{A}} = \mathbf{Z}_{(1)}\,(\mathbf{C} \odot \mathbf{B}),$$

and analogously for B and C.
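A minimal NumPy sketch of the weighted objective and gradient (missing entries of X must hold some finite placeholder, e.g., zero, since they are masked out by W):

```python
import numpy as np

def cp_wopt_fg(X, W, A, B, C):
    """f_W = 0.5 * ||W * (X - model)||^2 and its gradient blocks;
    W is a binary tensor: 1 = observed entry, 0 = missing."""
    def unfold(T, n):
        return np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1), order='F')
    def kr(P, Q):
        return (P[:, None, :] * Q[None, :, :]).reshape(-1, P.shape[1])

    M = np.einsum('ir,jr,kr->ijk', A, B, C)   # current model
    Z = W * (M - X)                           # residual on observed entries only
    f = 0.5 * np.sum(Z * Z)
    gA = unfold(Z, 0) @ kr(C, B)
    gB = unfold(Z, 1) @ kr(C, A)
    gC = unfold(Z, 2) @ kr(B, A)
    return f, (gA, gB, gC)
```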
Experimental Set-Up [Tomasi & Bro'05]

Step 1: Generate 20 triplets of random factor matrices A, B, C with R = 5 or 10 columns each and collinearity set to 0.5.
Step 2: Construct a tensor from each triplet and add 2% homoscedastic noise.
Step 3: Set some entries to missing. Percentage of missing data: 10%, 40%, 70%; missing pattern: individual entries or whole fibers.
Step 4: Use each algorithm to extract R factors and compare against the factors from Step 1.
CP-WOPT is Accurate!

Generated 40 test problems (with ranks 5 and 10) and factorized each with an R-component CP model. Each entry corresponds to the percentage of correctly recovered solutions, reported alongside the ratio of the number of known data entries to the number of variables.

CPNLS: nonlinear least squares. Used INDAFAC, which implements Levenberg-Marquardt [Tomasi and Bro'05].
Another alternative: ALS-based imputation (for comparisons, see Tomasi and Bro'05).
CP-WOPT is Fast!

Generated 60 test problems (with M = 10%, 40%, and 70% missing data) and factorized each with an R-component CP model. Each entry corresponds to the average/std over the CP models that successfully recover the underlying factors.
CP-WOPT is useful for real data!
Thanks to Morten Mørup!

GOAL: differentiate between left and right hand stimulation, using an EEG data set modeled as a channels x time-frequency x subjects tensor.

[Figure: CP factors extracted from the complete data and from the incomplete data (with entries set to missing), shown side by side.]
Summary & Future Work
• New CPOPT method
– Accurate & scalable
• Extend CPOPT to CP-WOPT to
handle missing data
– Accurate & scalable
• More open questions…
– Starting point?
– Tuning the optimization
– Regularization
– Exploiting sparsity
– Nonnegativity
• Application to link prediction
– On-going work comparing to other
methods
Thank you!
• More on tensors and tensor models:
  – Survey: E. Acar and B. Yener, Unsupervised Multiway Data Analysis: A Literature Survey, IEEE Transactions on Knowledge and Data Engineering, 21(1): 6-20, 2009.
  – CPOPT: E. Acar, T. G. Kolda and D. M. Dunlavy, An Optimization Approach for Fitting Canonical Tensor Decompositions, submitted for publication.
  – CP-WOPT: E. Acar, T. G. Kolda, D. M. Dunlavy and M. Mørup, Tensor Factorizations with Missing Data, submitted for publication.
  – Link Prediction: E. Acar, T. G. Kolda and D. M. Dunlavy, Link Prediction on Evolving Data, in preparation.
• Contact:
– Evrim Acar, eacarat@sandia.gov
– Tamara G. Kolda, tgkolda@sandia.gov
– Daniel M. Dunlavy, dmdunla@sandia.gov

Minisymposia on
Tensors and Tensor-based Computations
