Diet
Air pollution
Personality
Customer satisfaction
Depression
Quality of Life
Some Applications of Factor Analysis
Multiple steps
“Stepwise optimal”
• many choices to be made!
• a choice at one step may impact the remaining
decisions
• considerable subjectivity
Data exploration is key
Strong theoretical model is critical
Steps in Exploratory Factor Analysis
Histograms
• normality
• discreteness
• outliers
Covariance and correlations between variables
• very high or low correlations?
Same scale
high = good, low = bad?
Data exploration
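The "same scale" check can be illustrated with a short Python sketch (Python is used here only for illustration; the slides themselves use Stata). It assumes 1 to 5 items where a negatively worded item is reverse-scored as 6 minus the response. The responses and the helper name `reverse_score` are made up for this example, and whether to reverse-score at all is the analyst's choice.

```python
import numpy as np

def reverse_score(x, low=1, high=5):
    """Flip a Likert item so the ends of the scale swap (1<->5, 2<->4)."""
    x = np.asarray(x)
    return (low + high) - x

# Hypothetical responses to a negatively worded item (illustration only).
notenjoy = np.array([1, 2, 5, 4, 3])
enjoy = reverse_score(notenjoy)
print(enjoy)
```

Reverse-scoring only flips the sign of the item's correlations and loadings; it does not change how strongly the item relates to the others.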
[Figure: eleven histograms, one per item (x-axis: response 1 to 5; y-axis: Frequency). Panel titles recovered: NotDiscussPOST, WorkHarderPOST.]
Correlation Matrix
. pwcorr notenjoy-workharder
[Figure: scatterplot matrix of jittered item responses, jitter(data[, 184 + i]) against jitter(data[, 184 + j]); both axes run from 1 to 5.]
. findit polychoric
. polychoric notenjoy-workharder
. matrix R = r(R)
Choosing Number of Factors
--------------------------------------------------------------------------
Factor | Eigenvalue Difference Proportion Cumulative
-------------+------------------------------------------------------------
Factor1 | 4.96356 3.14606 0.4512 0.4512
Factor2 | 1.81751 0.76378 0.1652 0.6165
Factor3 | 1.05373 0.27749 0.0958 0.7123
Factor4 | 0.77624 0.02065 0.0706 0.7828
Factor5 | 0.75559 0.22587 0.0687 0.8515
Factor6 | 0.52972 0.05654 0.0482 0.8997
Factor7 | 0.47318 0.24670 0.0430 0.9427
Factor8 | 0.22647 0.02484 0.0206 0.9633
Factor9 | 0.20163 0.07341 0.0183 0.9816
Factor10 | 0.12822 0.05407 0.0117 0.9933
Factor11 | 0.07415 . 0.0067 1.0000
--------------------------------------------------------------------------
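The Proportion and Cumulative columns can be reproduced from the eigenvalues alone. The sketch below is Python rather than Stata and copies the eigenvalues from the table above. It also counts eigenvalues above 1, a common rule of thumb that is an addition here and is not stated on the slide.

```python
import numpy as np

# Eigenvalues copied from the table above (11 items).
eigenvalues = np.array([4.96356, 1.81751, 1.05373, 0.77624, 0.75559, 0.52972,
                        0.47318, 0.22647, 0.20163, 0.12822, 0.07415])

# Share of total variance for each factor, and the running total.
proportion = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(proportion)

# Gap between each eigenvalue and the next (the "Difference" column).
difference = eigenvalues[:-1] - eigenvalues[1:]

# Rule of thumb (an assumption here, not from the slide): keep eigenvalues > 1.
n_above_one = int((eigenvalues > 1).sum())

for k, (ev, p, c) in enumerate(zip(eigenvalues, proportion, cumulative), start=1):
    print(f"Factor{k:<2d} {ev:8.5f} {p:7.4f} {c:7.4f}")
print("eigenvalues above 1:", n_above_one)
```

The eigenvalue rule keeps three factors here, while the scree plot that follows is used to choose two. This is one of the subjective choices the earlier slides warn about.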
Scree Plot for Cohesion Example
. screeplot
[Figure: scree plot of eigenvalues after factor; x-axis: Number (0 to 10), y-axis: eigenvalue (0 to 5).]
Choose two factors: Now fit the model
. factormat R, n(134) ipf factor(2)
(obs=134)
.........
Factor loadings (pattern matrix) and unique variances
-------------------------------------------------
Variable | Factor1 Factor2 | Uniqueness
-------------+--------------------+--------------
notenjoy | -0.6091 0.2661 | 0.5582
notmiss | -0.6566 0.2648 | 0.4988
desireexceed | -0.6712 0.5373 | 0.2608
personalpe~m | -0.6342 0.4344 | 0.4091
importants~l | 0.5538 0.2162 | 0.6466
groupunited | 0.7164 0.4137 | 0.3156
responsibi~y | 0.8456 0.4197 | 0.1088
interact | 0.6271 0.2132 | 0.5613
problemshelp | 0.8187 0.3866 | 0.1802
notdiscuss | -0.2830 0.3072 | 0.8256
workharder | -0.4977 0.3260 | 0.6461
-------------------------------------------------
Interpretability?
Not interpretable at this stage
In an unrotated solution, the first factor describes most of the
variability.
Ideally we want to
• spread variability more evenly among factors.
• make factors interpretable
To do this we “rotate” factors:
• redefine factors such that loadings on various factors tend to be
very high (-1 or 1) or very low (0)
• intuitively, it makes sharper distinctions in the meanings of the
factors
• We use “factor analysis” for rotation NOT principal
components!
Rotation does NOT improve fit!
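A small numerical check of "rotation does NOT improve fit", written in Python as an illustration. It takes the unrotated loadings from the two-factor fit and applies an orthogonal rotation by an arbitrary angle. The 30 degree angle is a made-up value for the example, not the varimax solution.

```python
import numpy as np

# Unrotated loadings (Factor1, Factor2) copied from the pattern matrix above.
L = np.array([
    [-0.6091, 0.2661], [-0.6566, 0.2648], [-0.6712, 0.5373],
    [-0.6342, 0.4344], [ 0.5538, 0.2162], [ 0.7164, 0.4137],
    [ 0.8456, 0.4197], [ 0.6271, 0.2132], [ 0.8187, 0.3866],
    [-0.2830, 0.3072], [-0.4977, 0.3260],
])

# Orthogonal rotation matrix; the angle is hypothetical, for illustration only.
theta = np.deg2rad(30.0)
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
L_rot = L @ T

# Communality of each variable = sum of squared loadings across factors.
communality = (L ** 2).sum(axis=1)
communality_rot = (L_rot ** 2).sum(axis=1)

# Variance attributed to each factor = sum of squared loadings down a column.
var_by_factor = (L ** 2).sum(axis=0)
var_by_factor_rot = (L_rot ** 2).sum(axis=0)

print("variance by factor, unrotated:", var_by_factor.round(4))
print("variance by factor, rotated:  ", var_by_factor_rot.round(4))
print("communalities unchanged:", np.allclose(communality, communality_rot))
```

The split of variance between the two factors changes with the angle, but each variable's communality, and so its uniqueness, stays the same. That is why rotation changes interpretation without changing fit.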
Rotating Factors (Intuitively)
[Figure: two panels showing variables 1 to 4 plotted against the factor axes F1 and F2.]
--------------------------------------------------------------------------
Factor | Variance Difference Proportion Cumulative
-------------+------------------------------------------------------------
Factor1 | 3.35544 0.72180 0.5603 0.5603
Factor2 | 2.63364 . 0.4397 1.0000
--------------------------------------------------------------------------
LR test: independent vs. saturated: chi2(55) = 959.26 Prob>chi2 = 0.0000
-------------------------------------------------
Variable | Factor1 Factor2 | Uniqueness
-------------+--------------------+--------------
notenjoy | -0.3118 0.5870 | 0.5582
notmiss | -0.3498 0.6155 | 0.4988
desireexceed | -0.1919 0.8381 | 0.2608
personalpe~m | -0.2269 0.7345 | 0.4091
importants~l | 0.5682 -0.1748 | 0.6466
groupunited | 0.8184 -0.1212 | 0.3156
responsibi~y | 0.9233 -0.1968 | 0.1088
interact | 0.6238 -0.2227 | 0.5613
problemshelp | 0.8817 -0.2060 | 0.1802
notdiscuss | -0.0308 0.4165 | 0.8256
workharder | -0.1872 0.5647 | 0.6461
-------------------------------------------------
Rotation options
“Orthogonal”
• maintains independence of factors
• more commonly seen
• usually at least one option
• Stata: varimax, quartimax, equamax, parsimax, etc.
“Oblique”
• allows dependence of factors
• makes distinctions sharper (loadings closer to 0’s and
1’s)
• can be harder to interpret once you lose
independence of factors
Uniqueness
-------------------------------------------------
Variable | Factor1 Factor2 | Uniqueness
-------------+--------------------+--------------
notenjoy | -0.3118 0.5870 | 0.5582
notmiss | -0.3498 0.6155 | 0.4988
desireexceed | -0.1919 0.8381 | 0.2608
personalpe~m | -0.2269 0.7345 | 0.4091
importants~l | 0.5682 -0.1748 | 0.6466
groupunited | 0.8184 -0.1212 | 0.3156
responsibi~y | 0.9233 -0.1968 | 0.1088
interact | 0.6238 -0.2227 | 0.5613
problemshelp | 0.8817 -0.2060 | 0.1802
notdiscuss | -0.0308 0.4165 | 0.8256
workharder | -0.1872 0.5647 | 0.6461
-------------------------------------------------
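The Uniqueness column can be recovered from the loadings: with orthogonal factors, uniqueness is 1 minus the sum of squared loadings. The Python sketch below is an illustration that copies the rotated loadings from the table above and compares the result with the reported column.

```python
import numpy as np

names = ["notenjoy", "notmiss", "desireexceed", "personalpe~m", "importants~l",
         "groupunited", "responsibi~y", "interact", "problemshelp",
         "notdiscuss", "workharder"]

# Rotated loadings (Factor1, Factor2) copied from the table above.
loadings = np.array([
    [-0.3118,  0.5870], [-0.3498,  0.6155], [-0.1919,  0.8381],
    [-0.2269,  0.7345], [ 0.5682, -0.1748], [ 0.8184, -0.1212],
    [ 0.9233, -0.1968], [ 0.6238, -0.2227], [ 0.8817, -0.2060],
    [-0.0308,  0.4165], [-0.1872,  0.5647],
])

# Uniqueness column as reported in the table above.
reported = np.array([0.5582, 0.4988, 0.2608, 0.4091, 0.6466, 0.3156,
                     0.1088, 0.5613, 0.1802, 0.8256, 0.6461])

# Uniqueness = 1 - communality, for orthogonal factors.
uniqueness = 1.0 - (loadings ** 2).sum(axis=1)

# Variable least explained by the two factors.
most_unique = names[int(np.argmax(uniqueness))]

for name, u in zip(names, uniqueness):
    print(f"{name:<13s} {u:.4f}")
print("highest uniqueness:", most_unique)
```

The variable with the highest uniqueness is the one the two factors explain least, which makes it a natural candidate to drop when revising the model.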
Revised without “notdiscuss”
-------------------------------------------------
Variable | Factor1 Factor2 | Uniqueness
-------------+--------------------+--------------
notenjoy | -0.3093 0.5811 | 0.5667
notmiss | -0.3345 0.6455 | 0.4715
desireexceed | -0.1783 0.8483 | 0.2486
personalpe~m | -0.2119 0.7551 | 0.3849
importants~l | 0.5618 -0.2057 | 0.6420
groupunited | 0.8265 -0.1271 | 0.3008
responsibi~y | 0.9247 -0.2089 | 0.1012
interact | 0.6160 -0.2469 | 0.5596
problemshelp | 0.8784 -0.2224 | 0.1789
workharder | -0.2023 0.5271 | 0.6813
-------------------------------------------------
Methods for Estimating Model
Naming of Factors
Each object (e.g. each cancer survivor) gets a factor score for
each factor.
Old data vs. New data
The factors themselves are variables
An individual’s score is a weighted combination of scores on the input
variables
These weights are NOT the factor loadings!
Loadings and weights determined simultaneously so that there is
no correlation between resulting factors.
Factor Scoring
. predict f1 f2
(regression scoring assumed)
Scoring coefficients
(method = regression; based on varimax rotated factors)
----------------------------------
Variable | Factor1 Factor2
-------------+--------------------
notenjoy     | -0.03322    0.19223
notmiss      |  0.04725    0.13279
desireexceed |  0.15817    0.54996
personalpe~m | -0.04037    0.21452
importants~l |  0.02971   -0.02168
groupunited  |  0.12273    0.12938
responsibi~y |  0.60379    0.07719
interact     |  0.04594   -0.00870
problemshelp |  0.31516    0.06376
workharder   |  0.11750    0.10810
----------------------------------
Why different than loadings?
• Factors are generally scaled to have variance 1.
• Mean is arbitrary.*
* If based on Pearson correlation, mean will be zero.
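To make "weighted combination" concrete, the Python sketch below applies the scoring coefficients from the table above to one respondent. It assumes the item responses have already been standardized. The respondent's values `z` are made up for illustration, and Stata's predict may differ in how it centers and scales the inputs.

```python
import numpy as np

names = ["notenjoy", "notmiss", "desireexceed", "personalpe~m", "importants~l",
         "groupunited", "responsibi~y", "interact", "problemshelp", "workharder"]

# Scoring coefficients (Factor1, Factor2) copied from the table above.
W = np.array([
    [-0.03322,  0.19223], [ 0.04725,  0.13279], [ 0.15817,  0.54996],
    [-0.04037,  0.21452], [ 0.02971, -0.02168], [ 0.12273,  0.12938],
    [ 0.60379,  0.07719], [ 0.04594, -0.00870], [ 0.31516,  0.06376],
    [ 0.11750,  0.10810],
])

# Hypothetical standardized responses for one respondent (illustration only).
z = np.array([-0.5, -1.0, 0.2, 0.0, 1.1, 0.8, 1.3, 0.4, 0.9, -0.3])

# Each factor score is the sum over items of weight times standardized response.
scores = z @ W
print("f1 = %.4f, f2 = %.4f" % (scores[0], scores[1]))
```

A respondent at the mean on every item (all zeros after standardizing) scores zero on both factors under this assumption.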
Orthogonal (i.e., independent)?
[Figure: scatterplot of the two sets of factor scores; one axis is labeled “Scores for factor 2”.]
Teamwork (Factor 1) by Program
[Figure: Factor 1 (Teamwork) scores by program, groups 1 and 2.]
pwcorr notenjoy-workharder
polychoric notenjoy-workharder
matrix R = r(R)
factormat R, pcf n(134)
screeplot
factormat R, n(134) ipf factor(2)
rotate