
The Canonical Tensor Decomposition and

Its Applications to Social Network Analysis

Evrim Acar, Tamara G. Kolda and Daniel M. Dunlavy


Sandia National Labs

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United
States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
What is the Canonical Tensor Decomposition?
CANDECOMP/PARAFAC (CP) model [Hitchcock'27, Harshman'70, Carroll & Chang'70]

Given a third-order tensor $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$, CP approximates it as a sum of R rank-one components:

$$\mathcal{X} \approx \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r$$
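To make the model concrete, here is a minimal NumPy sketch (not from the talk; the shapes are arbitrary) that assembles a tensor from given factor matrices:

```python
import numpy as np

I, J, K, R = 4, 5, 6, 3
rng = np.random.default_rng(0)
A = rng.normal(size=(I, R))   # columns a_1, ..., a_R
B = rng.normal(size=(J, R))
C = rng.normal(size=(K, R))

# X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]  (sum of R rank-one tensors)
X = np.einsum('ir,jr,kr->ijk', A, B, C)
print(X.shape)  # (4, 5, 6)
```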
CP Application: Neuroscience
Epileptic Seizure Localization:

[Figure: a Channels x Time samples x Scales EEG tensor approximated by a two-component CP model, with factors (a1, b1, c1) and (a2, b2, c2).]

Acar et al., 2007; De Vos et al., 2007


CP has Numerous Applications!
• Chemometrics
  – Fluorescence spectroscopy
  – Chromatographic data analysis
  (Andersen and Bro, Journal of Chemometrics, 2003)
• Neuroscience
  – Epileptic seizure localization
  – Analysis of EEG and ERP
  (Mørup, Hansen and Arnfred, Journal of Neuroscience Methods, 2007)
• Signal Processing
  (Sidiropoulos, Giannakis and Bro, IEEE Trans. Signal Processing, 2000)
• Computer Vision
  – Image compression, classification
  – Texture analysis
  (Hazan, Polak and Shashua, ICCV 2005)
• Social Network Analysis
  – Web link analysis
  – Conversation detection in emails
  – Text analysis
  (Bader, Berry and Browne, Survey of Text Mining: Clustering, Classification, and Retrieval, 2nd Ed., 2007)
• Approximation of PDEs
  (Doostan and Iaccarino, Journal of Computational Physics, 2009)
Algorithms: How Can We Compute CP?
Mathematical Details for CP

Unfolding (matricization) rearranges a tensor into a matrix:
• Mode-1 unfolding X_(1): the mode-1 fibers (columns) become the columns of the matrix.
• Mode-2 unfolding X_(2): the mode-2 fibers (rows) become the columns.
• Mode-3 unfolding X_(3): the mode-3 fibers (tubes) become the columns.

In matricized form, the CP model $\mathcal{X} \approx \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r$ becomes

$$\mathbf{X}_{(1)} \approx \mathbf{A}\,(\mathbf{C} \odot \mathbf{B})^{\mathsf{T}},$$

where $\odot$ denotes the matrix Khatri-Rao product (the column-wise Kronecker product). The sketch below illustrates both operations.
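A minimal NumPy sketch of the two operations (not the talk's MATLAB code); the unfolding convention below orders the remaining modes with earlier modes varying fastest, so the matricized CP identity holds exactly:

```python
import numpy as np

def unfold(X, mode):
    """Mode-n unfolding X_(n): the mode-n fibers become the columns."""
    return np.reshape(np.moveaxis(X, mode, 0), (X.shape[mode], -1), order='F')

def khatri_rao(C, B):
    """Column-wise Kronecker product: column r is kron(C[:, r], B[:, r])."""
    K, R = C.shape
    J = B.shape[0]
    return (C[:, None, :] * B[None, :, :]).reshape(K * J, R)

# Sanity check of X_(1) = A (C ⊙ B)^T on a random rank-3 tensor:
rng = np.random.default_rng(1)
A, B, C = rng.normal(size=(4, 3)), rng.normal(size=(5, 3)), rng.normal(size=(6, 3))
X = np.einsum('ir,jr,kr->ijk', A, B, C)
assert np.allclose(unfold(X, 0), A @ khatri_rao(C, B).T)
```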


CP is a Nonlinear Optimization Problem
Given a tensor $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$ and R (# of components), find matrices A, B, C that solve the following problem:

$$\min_{\mathbf{A},\mathbf{B},\mathbf{C}} \; \frac{1}{2}\left\| \mathcal{X} - \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r \right\|^2$$

The objective function is $f(\mathbf{x})$, where the vector $\mathbf{x}$ comprises the entries of A, B, and C stacked column-wise, giving $R(I+J+K)$ variables in total.
Traditional Approach: CPALS
CPALS, dating back to Harshman'70 and Carroll & Chang'70, solves for one factor matrix at a time while the others are held fixed.

Alternating algorithm: for k = 1, 2, ..., update A with B, C fixed; update B with A, C fixed; update C with A, B fixed. Repeat these steps until "convergence".

Each step can be converted to a matrix least squares problem. For example, with B and C fixed,

$$\mathbf{A} \leftarrow \mathbf{X}_{(1)}\,(\mathbf{C} \odot \mathbf{B})\,\big((\mathbf{B}^{\mathsf{T}}\mathbf{B}) * (\mathbf{C}^{\mathsf{T}}\mathbf{C})\big)^{\dagger},$$

where $\mathbf{X}_{(1)}$ is $I \times JK$, $\mathbf{C} \odot \mathbf{B}$ is $JK \times R$, the Hadamard product in parentheses is an $R \times R$ matrix, and the result is $I \times R$.

Very fast, but not always accurate:
• Not guaranteed to converge to a stationary point.
• Other issues, e.g., it cannot exploit symmetry.
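For reference, a bare-bones CP-ALS sweep might look as follows (a NumPy sketch under the unfolding convention above, not the parafac_als implementation used in the experiments; no convergence check or column normalization):

```python
import numpy as np

def unfold(X, n):
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order='F')

def khatri_rao(P, Q):
    # column r is kron(P[:, r], Q[:, r])
    return (P[:, None, :] * Q[None, :, :]).reshape(-1, P.shape[1])

def cp_als(X, R, n_iters=50, seed=0):
    rng = np.random.default_rng(seed)
    F = [rng.normal(size=(d, R)) for d in X.shape]      # A, B, C
    for _ in range(n_iters):
        for n in range(3):
            U, V = [F[m] for m in range(3) if m != n]   # the two fixed factors
            G = (U.T @ U) * (V.T @ V)                   # R x R Hadamard Gram matrix
            F[n] = unfold(X, n) @ khatri_rao(V, U) @ np.linalg.pinv(G)
    return F
```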
Our Approach: CPOPT
Unlike CPALS, CPOPT solves for all factor matrices simultaneously using gradient-based optimization.

Define the objective function:

$$f(\mathbf{A},\mathbf{B},\mathbf{C}) = \frac{1}{2}\left\| \mathcal{X} - \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r \right\|^2$$

Rewriting the Objective Function
Expanding the square splits f into a constant term, an inner product term, and a norm term:

$$f = \frac{1}{2}\|\mathcal{X}\|^2 - \left\langle \mathcal{X},\, \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r \right\rangle + \frac{1}{2}\left\| \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r \right\|^2$$
Derivative of the 2nd Summand
The partial derivative of the inner product term with respect to $\mathbf{a}_r$ is a tensor-vector multiplication:

$$\frac{\partial}{\partial \mathbf{a}_r} \left\langle \mathcal{X},\, \sum_{l=1}^{R} \mathbf{a}_l \circ \mathbf{b}_l \circ \mathbf{c}_l \right\rangle = \mathcal{X} \times_2 \mathbf{b}_r \times_3 \mathbf{c}_r$$

Derivative of the 3rd Summand

$$\frac{\partial}{\partial \mathbf{a}_r} \, \frac{1}{2}\left\| \sum_{l=1}^{R} \mathbf{a}_l \circ \mathbf{b}_l \circ \mathbf{c}_l \right\|^2 = \sum_{l=1}^{R} (\mathbf{b}_l^{\mathsf{T}}\mathbf{b}_r)(\mathbf{c}_l^{\mathsf{T}}\mathbf{c}_r)\, \mathbf{a}_l$$

Analogous formulas exist for the partials with respect to the columns of B and C.
Objective and Gradient

Objective function: $f = \frac{1}{2}\left\| \mathcal{X} - \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r \right\|^2$

Gradient (for r = 1, ..., R):

$$\frac{\partial f}{\partial \mathbf{a}_r} = -\,\mathcal{X} \times_2 \mathbf{b}_r \times_3 \mathbf{c}_r + \sum_{l=1}^{R} (\mathbf{b}_l^{\mathsf{T}}\mathbf{b}_r)(\mathbf{c}_l^{\mathsf{T}}\mathbf{c}_r)\, \mathbf{a}_l$$

Gradient in Matrix Form

$$\frac{\partial f}{\partial \mathbf{A}} = -\,\mathbf{X}_{(1)}(\mathbf{C} \odot \mathbf{B}) + \mathbf{A}\left((\mathbf{B}^{\mathsf{T}}\mathbf{B}) * (\mathbf{C}^{\mathsf{T}}\mathbf{C})\right)$$

and analogously for B and C, where * is the Hadamard (elementwise) product. Note that this formulation can be used to derive the ALS approach: setting a gradient block to zero yields the corresponding least squares update.
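A minimal NumPy sketch of the objective/gradient pair, i.e., the quantities a first-order method such as NCG needs (the talk's implementation uses the Tensor Toolbox and Poblano instead):

```python
import numpy as np

def cp_fg(X, A, B, C):
    """f(A,B,C) = 0.5 * ||X - sum_r a_r o b_r o c_r||^2 and its gradient blocks."""
    def unfold(T, n):
        return np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1), order='F')
    def kr(P, Q):
        return (P[:, None, :] * Q[None, :, :]).reshape(-1, P.shape[1])

    G = (A.T @ A) * (B.T @ B) * (C.T @ C)      # Gram matrix of the model
    U1 = unfold(X, 0) @ kr(C, B)               # MTTKRP in mode 1
    # f = 0.5*||X||^2 - <X, model> + 0.5*||model||^2
    f = 0.5 * np.sum(X * X) - np.sum(U1 * A) + 0.5 * np.sum(G)
    gA = -U1 + A @ ((B.T @ B) * (C.T @ C))
    gB = -unfold(X, 1) @ kr(C, A) + B @ ((A.T @ A) * (C.T @ C))
    gC = -unfold(X, 2) @ kr(B, A) + C @ ((A.T @ A) * (B.T @ B))
    return f, (gA, gB, gC)
```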
Indeterminacies of CP

• CP is often unique.
• However, CP has two fundamental indeterminacies:
  – Permutation: the components can be reordered, e.g., swap (a1, b1, c1) with (a3, b3, c3). Not a big deal: this leads to multiple, but separated, minima.
  – Scaling: the vectors comprising a single rank-one factor can be scaled, e.g., replace a1 and b1 with 2a1 and ½b1. This leads to a continuous space of equivalent solutions.
Adding Regularization

Penalizing the factor norms removes the scaling indeterminacy. The regularized objective function and its gradient are

$$f_\lambda = f + \frac{\lambda}{2}\left(\|\mathbf{A}\|^2 + \|\mathbf{B}\|^2 + \|\mathbf{C}\|^2\right), \qquad \frac{\partial f_\lambda}{\partial \mathbf{A}} = \frac{\partial f}{\partial \mathbf{A}} + \lambda\,\mathbf{A},$$

and analogously for B and C.
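Regularization is then a small change on top of the previous sketch (lam plays the role of λ; the experiments later use 0.02):

```python
import numpy as np

def cp_fg_reg(X, A, B, C, lam=0.02):
    """Regularized objective/gradient, wrapping the cp_fg sketch above."""
    f, (gA, gB, gC) = cp_fg(X, A, B, C)
    f += 0.5 * lam * (np.sum(A * A) + np.sum(B * B) + np.sum(C * C))
    return f, (gA + lam * A, gB + lam * B, gC + lam * C)
```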
Our methods: CPOPT & CPOPTR
CPOPT: apply a derivative-based optimization method to the objective function f.
CPOPTR: apply a derivative-based optimization method to the regularized objective function f_λ.
Another competing method: CPNLS
CPNLS: apply a nonlinear least squares solver to the equations $\mathcal{X} = \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r$. The Jacobian is of size $IJK \times R(I+J+K)$.

Proposed by Paatero'97 and also by Tomasi and Bro'05.
Experimental Set-Up [Tomasi & Bro'06]

Step 1: Generate 20 triplets of random factor matrices A, B, C, each with R_true = 3 or 5 columns and column collinearity set to 0.5 (i.e., the unit-norm columns of each factor matrix have pairwise inner products of 0.5).

Step 2: Construct a tensor from each triplet and add noise, using all combinations of:
• Homoscedastic: 1%, 5%, 10%
• Heteroscedastic: 0%, 1%, 5%
This yields 180 tensors for each rank (e.g., R_true = 3).

Step 3: Use each algorithm to extract factors, using both R_true and R_true + 1 components, and compare against the factors from Step 1: 360 tests for each rank. (One way to realize Step 1 is sketched below.)
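A hedged sketch of Step 1 (the exact generator in Tomasi & Bro'06 may differ): prescribe the Gram matrix of the columns and factor it.

```python
import numpy as np

def collinear_factors(dim, R, congruence=0.5, seed=0):
    """Random dim x R matrix whose unit-norm columns have pairwise
    inner products equal to `congruence`."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.normal(size=(dim, R)))        # orthonormal columns
    G = congruence * np.ones((R, R)) + (1 - congruence) * np.eye(R)
    return Q @ np.linalg.cholesky(G).T                    # A^T A = G by design

A = collinear_factors(50, 3)
print(np.round(A.T @ A, 3))   # ~1 on the diagonal, ~0.5 off the diagonal
```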
Implementation Details

• All experiments were performed in MATLAB on a Linux workstation (Quad-Core Intel Xeon 2.50GHz, 9 GB RAM).

• Methods
– CPALS – Alternating least squares. Used parafac_als in the Tensor Toolbox
(Bader & Kolda)
– CPNLS – Nonlinear least squares. Used PARAFAC3W, which implements Levenberg-Marquardt (necessary due to the scaling ambiguity), by Tomasi and Bro.
– CPOPT – Optimization. Used routines in the Tensor Toolbox in calculation
of function values and gradients. Optimization via Nonlinear Conjugate
Gradient (NCG) method with Hestenes-Stiefel update, using Poblano (in-
house code to be released soon).
– CPOPTR – Optimization with regularization. Same as above.
(Regularization parameter = 0.02.)
CPOPT is Fast and Accurate
Generated 360 dense test problems (with ranks 3 and 5) and factorized each with R set to the correct number of components and to one more than that: a total of 720 tests for each entry below.

Per-iteration cost for a K x K x K tensor (R = # components; method order as in the Implementation Details slide):
  CPALS: O(RK^3)    CPNLS: O(R^3 K^3)    CPOPT: O(RK^3)    CPOPTR: O(RK^3)

• Overfactoring has a significant impact on accuracy.
• CPOPT is robust to overfactoring.
Amino acids fluorescence data (http://www.models.life.ku.dk/)

[Figure: three panels of recovered emission-mode components, plotted as coefficients against emission wavelength (250-450 nm).]
Application: Link Prediction
Link Prediction on Bibliometric Data

Form an authors x conferences x years tensor spanning 1991-2007, where entry (i, j, k) is the number of papers by the ith author at the jth conference in year k.

Question 1: Can we use tensor decompositions to model the data and extract meaningful factors?
Question 2: Can we predict who is going to publish at which conferences in the future?
Components make sense!

The DBLP tensor (authors x conferences x years) is factored as $\sum_{r} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r$, and the individual components are interpretable.

[Figure: one component (a_r, b_r, c_r). Author mode: dominated by Hans Peter Meinzer, Thomas Martin Lehmann, and Heinrich Niemann. Conference mode: dominated by BILDMED, CARS, and DAGM. Time mode: coefficients over the years 1992-2004.]
Components make sense!

[Figure: another component (a_r, b_r, c_r). Author mode: dominated by Craig Boutilier and Daphne Koller. Conference mode: dominated by IJCAI. Time mode: coefficients over the years 1992-2004.]
Link Prediction Problem

TRAIN: fit a CP model to the 1991-2004 slices of the authors x conferences x years tensor.

TEST: predict the author-conference links for 2005-2007:
• ~60K links out of ~19 million possible <author, conf> pairs (~0.3% dense)
• ~32K of the test links are previously unseen in the training set
• <author_i, conf_j> = 1 if the ith author publishes at the jth conference, and 0 otherwise
Score for <author_i, conf_j>

• Sign ambiguity: within each component, pairs of factor vectors can be flipped in sign without changing the model.
• Fix the signs using the signs of the maximum-magnitude entries, and then compute a score for each author-conference pair using the information from the time mode: each component contributes $a_r(i)\,b_r(j)$, weighted by a summary of its temporal profile $\mathbf{c}_r$.

[Figure: the temporal profiles c_1 and c_2 of two components, used to weight their contributions to the scores.]
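A hedged sketch of this scoring step (the sign fix follows the slide; the temporal weighting here, the mean of each component's last few time coefficients, is an illustrative assumption rather than necessarily the exact rule used):

```python
import numpy as np

def link_scores(A, B, C, last=3):
    """Score matrix (authors x conferences) from CP factors A, B, C."""
    A, B, C = A.copy(), B.copy(), C.copy()
    for r in range(A.shape[1]):
        for M in (B, C):
            if M[np.argmax(np.abs(M[:, r])), r] < 0:
                M[:, r] *= -1
                A[:, r] *= -1            # absorb the flip: the model is unchanged
    gamma = C[-last:, :].mean(axis=0)    # one temporal weight per component
    return (A * gamma) @ B.T             # score_ij = sum_r gamma_r a_ir b_jr
```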
Performance Measure: AUC

s contains the scores for all possible <author, conf> pairs, e.g., ~19 million entries, where <author_i, conf_j> = 1 if the ith author publishes at the jth conference and 0 otherwise. Sort the scores in decreasing order and line them up against the true 0/1 labels. Let N be the number of 1's and M the number of 0's.

Sweeping a threshold down the sorted list traces out the Receiver Operating Characteristic (ROC) curve: each true link encountered raises the TP rate by 1/N, and each non-link moves the FP rate right by 1/M. The performance measure is the Area Under the Curve (AUC).
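The AUC can also be computed directly from the scores and labels without building the curve, via the rank (Mann-Whitney) statistic; a minimal sketch:

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve (exact when there are no score ties)."""
    scores, labels = np.asarray(scores, float), np.asarray(labels)
    n_pos = labels.sum()                         # N: number of 1's
    n_neg = labels.size - n_pos                  # M: number of 0's
    ranks = scores.argsort().argsort() + 1       # ascending 1-based ranks
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```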
Performance Evaluation

[Figure: ROC curves for CP vs. a RANDOM baseline.]

Predicting links for 2005-2007 (~60K): CP achieves AUC = 0.92.
Predicting previously unseen links for 2005-2007 (~32K): CP achieves AUC = 0.87.
CP-WOPT: Handling Missing Data
Missing Data Examples

Missing data arises in different disciplines due to loss of information, machine failures, different sampling frequencies, or differing experimental set-ups:
• Chemometrics, e.g., excitation x emission fluorescence data (Tomasi & Bro'05)
• Biomedical signal processing (e.g., EEG)
• Network traffic analysis (e.g., packet drops)
• Computer vision (e.g., occlusions)
• ...

EEG example: a channels x time-frequency x subjects tensor assembled from the subjects' channel x time-frequency matrices and modeled with a CP decomposition.
Modify the Objective for CP

With no missing data, the optimization problem is

$$\min_{\mathbf{A},\mathbf{B},\mathbf{C}} \; f(\mathbf{A},\mathbf{B},\mathbf{C}) = \frac{1}{2}\left\| \mathcal{X} - \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r \right\|^2$$

For handling missing data, our approach (CP-WOPT) weights the objective so that only known entries contribute:

$$f_{\mathcal{W}}(\mathbf{A},\mathbf{B},\mathbf{C}) = \frac{1}{2}\left\| \mathcal{W} * \left( \mathcal{X} - \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r \right) \right\|^2, \qquad w_{ijk} = \begin{cases} 1 & x_{ijk} \text{ known} \\ 0 & x_{ijk} \text{ missing} \end{cases}$$

Objective and Gradient
With $\mathcal{Z} = \mathcal{W} * \left( \sum_{l=1}^{R} \mathbf{a}_l \circ \mathbf{b}_l \circ \mathbf{c}_l - \mathcal{X} \right)$, the gradient (for r = 1, ..., R; i = 1, ..., I; j = 1, ..., J; k = 1, ..., K) is

$$\frac{\partial f_{\mathcal{W}}}{\partial a_{ir}} = \sum_{j,k} z_{ijk}\, b_{jr}\, c_{kr}, \qquad \text{or in matrix form} \qquad \frac{\partial f_{\mathcal{W}}}{\partial \mathbf{A}} = \mathbf{Z}_{(1)}\,(\mathbf{C} \odot \mathbf{B}),$$

and analogously for B and C.
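A minimal NumPy sketch of the weighted objective and gradient (missing entries of X must hold some finite placeholder, e.g., zero, since they are masked out by W):

```python
import numpy as np

def cp_wopt_fg(X, W, A, B, C):
    """f_W = 0.5 * ||W * (X - model)||^2 and its gradient blocks;
    W is a binary tensor: 1 = observed entry, 0 = missing."""
    def unfold(T, n):
        return np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1), order='F')
    def kr(P, Q):
        return (P[:, None, :] * Q[None, :, :]).reshape(-1, P.shape[1])

    M = np.einsum('ir,jr,kr->ijk', A, B, C)   # current model
    Z = W * (M - X)                           # residual on observed entries only
    f = 0.5 * np.sum(Z * Z)
    gA = unfold(Z, 0) @ kr(C, B)
    gB = unfold(Z, 1) @ kr(C, A)
    gC = unfold(Z, 2) @ kr(B, A)
    return f, (gA, gB, gC)
```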
Experimental Set-Up [Tomasi & Bro'05]

Step 1: Generate 20 triplets of random factor matrices A, B, C with R = 5 or 10 columns each and collinearity set to 0.5.
Step 2: Construct a tensor from each triplet and add 2% homoscedastic noise.
Step 3: Set some entries to missing. Percentage of missing data: 10%, 40%, 70%; missing pattern: individual entries or whole fibers.
Step 4: Use each algorithm to extract R factors and compare against the factors from Step 1.
CP-WOPT is Accurate!

Generated 40 test problems (with ranks 5 and 10) and factorized each with an R-component CP model. Each entry corresponds to the percentage of correctly recovered solutions, reported alongside the ratio of the number of known data entries to the number of variables.

CPNLS: nonlinear least squares. Used INDAFAC, which implements Levenberg-Marquardt [Tomasi and Bro'05].
Another alternative: ALS-based imputation (for comparisons, see Tomasi and Bro'05).
CP-WOPT is Fast!

Generated 60 test problems (with M = 10%, 40%, and 70% missing data) and factorized each with an R-component CP model. Each entry corresponds to the average/std over the CP models that successfully recover the underlying factors.
CP-WOPT is useful for real data!
Thanks to Morten Mørup!

GOAL: differentiate between left and right hand stimulation, using an EEG data set modeled as a channels x time-frequency x subjects tensor.

[Figure: CP factors extracted from the complete data and from the incomplete data (with entries set to missing), shown side by side.]
Summary & Future Work
• New CPOPT method
– Accurate & scalable
• Extend CPOPT to CP-WOPT to
handle missing data
– Accurate & scalable
• More open questions…
– Starting point?
– Tuning the optimization
– Regularization
– Exploiting sparsity
– Nonnegativity
• Application to link prediction
– On-going work comparing to other
methods
Thank you!
• More on tensors and tensor models:
  – Survey: E. Acar and B. Yener, Unsupervised Multiway Data Analysis: A Literature Survey, IEEE Transactions on Knowledge and Data Engineering, 21(1): 6-20, 2009.
  – CPOPT: E. Acar, T. G. Kolda and D. M. Dunlavy, An Optimization Approach for Fitting Canonical Tensor Decompositions, submitted for publication.
  – CP-WOPT: E. Acar, T. G. Kolda, D. M. Dunlavy and M. Mørup, Tensor Factorizations with Missing Data, submitted for publication.
  – Link Prediction: E. Acar, T. G. Kolda and D. M. Dunlavy, Link Prediction on Evolving Data, in preparation.
• Contact:
– Evrim Acar, eacarat@sandia.gov
– Tamara G. Kolda, tgkolda@sandia.gov
– Daniel M. Dunlavy, dmdunla@sandia.gov

Minisymposia on
Tensors and Tensor-based Computations
