
Factor Analysis

Elizabeth Garrett-Mayer, PhD


Georgiana Onicescu, ScM
Cancer Prevention and Control Statistics Tutorial
July 9, 2009
Motivating Example: Cohesion in Dragon Boat Paddler Cancer Survivors
 Dragon boat paddling is an ancient Chinese sport that offers a unique
blend of factors that could potentially enhance the quality of the lives of
cancer survivor participants.
 Evaluating the efficacy of dragon boating to improve the overall quality of
life among cancer survivors has the potential to advance our understanding
of factors that influence quality-of-life among cancer survivors.
 We hypothesize that physical activity conducted within the context of the
social support of a dragon boat team contributes significantly to improved
overall quality of life above and beyond a standard physical activity program
because the collective experience of dragon boating is likely enhanced by
team sport factors such as cohesion, teamwork, and the goal of
competition.
 Methods: 134 cancer survivors self-selected to an 8-week dragon boat
paddling intervention group or to an organized walking program. Each study
arm consisted of a series of 3 groups of approximately 20-25
participants, with pre- and post-testing to compare quality of life and
physical performance outcomes between study arms.
Motivating Example: Cohesion

 We have a concept of what “cohesion” is, but we
can’t measure it directly.
 Merriam-Webster:
• the act or state of sticking together tightly
• the quality or state of being made one
 How do we measure it?
 We cannot simply say “how cohesive is your
team?” or “on a scale from 1-10, how do you
rate your team cohesion?”
 We think it combines several elements of “unity”
and “team spirit” and perhaps other “factors”
Factor Analysis
 Data reduction tool
 Removes redundancy or duplication from a set of
correlated variables
 Represents correlated variables with a smaller set of
“derived” variables.
 Factors are formed that are relatively independent of one
another.
 Two types of “variables”:
• latent variables: factors
• observed variables
Cohesion Variables:
G1 (I do not enjoy being a part of the social environment of this exercise group)
G2 (I am not going to miss the members of this exercise group when the program
ends)
G3 (I am unhappy with my exercise group’s level of desire to exceed)
G4 (This exercise program does not give me enough opportunities to improve my
personal performance)
G5 (For me, this exercise group has become one of the most important social
groups to which I belong)
G6 (Our exercise group is united in trying to reach its goals for performance)
G7 (We all take responsibility for the performance by our exercise group)
G8 (I would like to continue interacting with some of the members of this exercise
group after the program ends)
G9 (If members of our exercise group have problems in practice, everyone wants to
help them)
G10 (Members of our exercise group do not freely discuss each athlete’s
responsibilities during practice)
G11 (I feel like I work harder during practice than other members of this exercise
group)
Other examples

 Diet
 Air pollution
 Personality
 Customer satisfaction
 Depression
 Quality of Life
Some Applications of Factor Analysis

1. Identification of Underlying Factors:


• clusters variables into homogeneous sets
• creates new variables (i.e. factors)
• allows us to gain insight into categories
2. Screening of Variables:
• identifies groupings to allow us to select one variable to
represent many
• useful in regression (recall collinearity)
3. Summary:
• Allows us to describe many variables using a few factors
4. Clustering of objects:
• Helps us to put objects (people) into categories depending on
their factor scores
“Perhaps the most widely used (and misused) multivariate
[technique] is factor analysis. Few statisticians are neutral about
this technique. Proponents feel that factor analysis is the
greatest invention since the double bed, while its detractors feel
it is a useless procedure that can be used to support nearly any
desired interpretation of the data. The truth, as is usually the case,
lies somewhere in between. Used properly, factor analysis can
yield much useful information; when applied blindly, without
regard for its limitations, it is about as useful and informative as
Tarot cards. In particular, factor analysis can be used to explore
the data for patterns, confirm our hypotheses, or reduce the
many variables to a more manageable number.”

-- Norman & Streiner, PDQ Statistics


Let’s work backwards

 One of the primary goals of factor analysis is
often to identify a measurement model for a
latent variable
 This includes
• identifying the items to include in the model
• identifying how many ‘factors’ there are in the latent
variable
• identifying which items are “associated” with which
factors
Standard Result
------------------------------------
Variable | Factor1 Factor2 |
-------------+--------------------+
notenjoy | -0.3118 0.5870 |
notmiss | -0.3498 0.6155 |
desireexceed | -0.1919 0.8381 |
personalpe~m | -0.2269 0.7345 |
importants~l | 0.5682 -0.1748 |
groupunited | 0.8184 -0.1212 |
responsibi~y | 0.9233 -0.1968 |
interact | 0.6238 -0.2227 |
problemshelp | 0.8817 -0.2060 |
notdiscuss | -0.0308 0.4165 |
workharder | -0.1872 0.5647 |
-----------------------------------
How to interpret?
 Loadings: represent correlations between item and factor
 High loadings: define a factor
 Low loadings: item does not “load” on factor
 Easy to skim the loadings
 This example:
• factor 1 is defined by G5, G6, G7, G8, G9
• factor 2 is defined by G1, G2, G3, G4, G10, G11
 Other things to note:
• factors are ‘independent’ (usually)
• we need to ‘name’ factors
• important to check their face validity
• These factors can now be ‘calculated’ using this model
• Each person is assigned a factor score for each factor
• Loadings (being correlations) range from -1 to 1
(Loadings table as under Standard Result above; high loadings are highlighted in yellow on the slide.)
How to interpret?
 Authors may conclude something like:
“We were able to derive two factors from the 11 items. The
first factor is defined as “teamwork.” The second factor
is defined as “personal competitive nature.” These
two factors describe 72% of the variance among the items.”
(Loadings table as under Standard Result above; high loadings are highlighted in yellow on the slide.)
Where did the results come from?

Based on the basic “Classical Test Theory Idea”:

For a case with just one factor:

Ideal:    X1 = F + e1        var(ej) = var(ek), j ≠ k
          X2 = F + e2
          …….
          Xm = F + em

Reality:  X1 = λ1F + e1      var(ej) ≠ var(ek), j ≠ k
          X2 = λ2F + e2
          …….
          Xm = λmF + em
          (unequal “sensitivity” to change in factor)

(Related to Item Response Theory (IRT))
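Why unequal loadings matter can be seen from the correlations that the one-factor model implies. A short derivation, assuming (this is not stated on the slide) that F has variance 1 and that the errors are uncorrelated with F and with each other:

```latex
\begin{align*}
\operatorname{Var}(X_j) &= \lambda_j^2 + \operatorname{Var}(e_j), \\
\operatorname{Cov}(X_j, X_k) &= \lambda_j \lambda_k, \qquad j \neq k.
\end{align*}
```

If each X_j is also standardized to variance 1, then λ_j is the correlation between item j and the factor, and the correlation between two items is the product of their loadings.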
Multi-Factor Models

 Two factor orthogonal model


 ORTHOGONAL = INDEPENDENT
 Example: cohesion has two domains
X1 = λ11F1 + λ12F2 + e1
X2 = λ21F1 + λ22F2 + e2
…….
X11 = λ11,1F1 + λ11,2F2 + e11

 More generally, m factors, n observed variables


X1 = λ11F1 + λ12F2 +…+ λ1mFm + e1
X2 = λ21F1 + λ22F2 +…+ λ2mFm + e2
…….
Xn = λn1F1 + λn2F2 +…+ λnmFm + en
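With the loadings reported under “Standard Result”, the orthogonal two-factor model implies a correlation for every pair of items: the sum over factors of the product of their loadings. A small sketch, assuming standardized items and uncorrelated factors and errors (assumptions not spelled out on the slide); `implied_corr` is a hypothetical helper name.

```python
# Rotated loadings (Factor1, Factor2) for three items, copied from the
# "Standard Result" table.
loadings = {
    "groupunited": (0.8184, -0.1212),
    "responsibility": (0.9233, -0.1968),
    "desireexceed": (-0.1919, 0.8381),
}


def implied_corr(item_a, item_b):
    """Model-implied correlation between two different items.

    Assumes standardized items and orthogonal factors, so the correlation
    is the sum over factors of the products of the two items' loadings.
    """
    return sum(a * b for a, b in zip(loadings[item_a], loadings[item_b]))


r_same = implied_corr("groupunited", "responsibility")   # both on factor 1
r_cross = implied_corr("groupunited", "desireexceed")    # different factors
print(round(r_same, 3), round(r_cross, 3))
```

For comparison, the polychoric correlations reported later in the slides for these two pairs are about 0.86 and -0.32, so the two-factor model reproduces the sign and rough size of both.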
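Loadings (estimated) in our example
\[
\begin{pmatrix}
\lambda_{1,1} & \lambda_{1,2} \\
\lambda_{2,1} & \lambda_{2,2} \\
\lambda_{3,1} & \lambda_{3,2} \\
\lambda_{4,1} & \lambda_{4,2} \\
\lambda_{5,1} & \lambda_{5,2} \\
\lambda_{6,1} & \lambda_{6,2} \\
\lambda_{7,1} & \lambda_{7,2} \\
\lambda_{8,1} & \lambda_{8,2} \\
\lambda_{9,1} & \lambda_{9,2} \\
\lambda_{10,1} & \lambda_{10,2} \\
\lambda_{11,1} & \lambda_{11,2}
\end{pmatrix}
=
\begin{pmatrix}
-0.31 & 0.59 \\
-0.35 & 0.62 \\
-0.19 & 0.84 \\
-0.23 & 0.73 \\
0.57 & -0.17 \\
0.82 & -0.12 \\
0.92 & -0.20 \\
0.62 & -0.22 \\
0.88 & -0.21 \\
-0.03 & 0.42 \\
-0.19 & 0.56
\end{pmatrix}
\]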
The factor analysis process

 Multiple steps
 “Stepwise optimal”
• many choices to be made!
• a choice at one step may impact the remaining
decisions
• considerable subjectivity
 Data exploration is key
 Strong theoretical model is critical
Steps in Exploratory Factor Analysis

(1) Collect and explore data: choose relevant variables.


(2) Determine the number of factors
(3) Estimate the model using predefined number of factors
(4) Rotate and interpret
(5) (a) Decide if changes need to be made (e.g. drop
item(s), include item(s))
(b) repeat (3)-(4)
(6) Construct scales and use in further analysis
Data Exploration

 Histograms
• normality
• discreteness
• outliers
 Covariance and correlations between variables
• very high or low correlations?
 Same scale
 high = good, low = bad?
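The “high = good, low = bad?” check matters here because items G1-G4, G10 and G11 are negatively worded while G5-G9 are positively worded. If one wanted all items to point the same way before computing correlations, reverse-coding is one option. A minimal sketch, assuming a 1-5 response scale; `reverse_code` is a hypothetical helper name and the responses are made up.

```python
def reverse_code(response, low=1, high=5):
    """Flip a Likert response so that low becomes high and vice versa."""
    if not low <= response <= high:
        raise ValueError(f"response {response} outside {low}-{high}")
    return low + high - response


# Hypothetical responses to a negatively worded item such as G1.
not_enjoy = [1, 2, 5, 3, 1]
enjoy = [reverse_code(r) for r in not_enjoy]
print(enjoy)  # [5, 4, 1, 3, 5]
```

Reverse-coding only flips the sign of the affected correlations and loadings; it does not change their size.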
Data exploration
[Figure: histograms (frequency by response, 1 to 5) of the 11 post-program items: NotEnjoyPOST, NotMissPOST, DesireExceedPOST, PersonalPerformPOST, ImportantSocialPOST, GroupUnitedPOST, ResponsibilityPOST, InteractPOST, ProblemsHelpPOST, NotDiscussPOST, WorkHarderPOST]
Correlation Matrix
. pwcorr notenjoy-workharder

| notenjoy notmiss desire~d person~m import~l groupu~d respon~y


-------------+---------------------------------------------------------------
notenjoy | 1.0000
notmiss | 0.3705 1.0000
desireexceed | 0.2609 0.3987 1.0000
personalpe~m | 0.2552 0.3472 0.5946 1.0000
importants~l | -0.2514 -0.3357 -0.1384 -0.3123 1.0000
groupunited | -0.1732 -0.2460 -0.2384 -0.1359 0.4364 1.0000
responsibi~y | -0.2554 -0.3663 -0.2908 -0.2507 0.4399 0.8016 1.0000
interact | -0.1847 -0.2966 -0.2162 -0.2294 0.4415 0.4251 0.5174
problemshelp | -0.2561 -0.2865 -0.2567 -0.1940 0.4159 0.6498 0.7748
notdiscuss | 0.1610 0.0763 0.2253 0.2193 -0.0242 0.0027 -0.0598
workharder | 0.3482 0.1606 0.3794 0.3848 -0.0010 -0.2765 -0.3083

| interact proble~p notdis~s workha~r


-------------+------------------------------------
interact | 1.0000
problemshelp | 0.5446 1.0000
notdiscuss | -0.0346 -0.0699 1.0000
workharder | -0.1063 -0.2358 0.2660 1.0000
Valid correlations?
[Figure: jittered pairwise scatterplots of item responses (both axes 1 to 5); x-axis: jitter(data[, 184 + i]), y-axis: jitter(data[, 184 + j])]
Data Matrix
 Factor analysis is totally dependent on correlations
between variables.
 Factor analysis summarizes correlation structure
[Diagram: Data Matrix (rows O1…On, columns v1…vk) → Correlation Matrix (v1…vk by v1…vk) → Factor Matrix (rows v1…vk, columns F1…Fj)]
Implications for assumptions about X’s?
Important implications

 Correlation matrix must be valid measure of
association
 Likert scale? i.e. “on a scale of 1 to K?”
 Consider previous set of plots
 Is Pearson (linear) correlation a reasonable
measure of association?
Correlation for categorical items

 Odds ratios? Nope. on the wrong scale.


 Need measures on scale of -1 to 1, with zero meaning
no association
 Solutions:
• tetrachoric correlation: for binary items
• polychoric correlation: for ordinal items
 -’choric correlations
• assume that variables are discretized (categorized) versions of
continuous variables
• only appropriate if ‘continuous underlying’ assumption makes
sense
 Not available in many software packages for factor
analysis!
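To see why the choice of correlation matters, compare the Pearson (phi) correlation of two binary items with a tetrachoric estimate. The sketch below uses the classical cosine approximation to the tetrachoric correlation, which is not the estimator a statistical package would typically use, and the 2x2 counts are made up for illustration.

```python
import math


def phi_corr(a, b, c, d):
    """Pearson correlation of two binary items from 2x2 counts [[a, b], [c, d]]."""
    num = a * d - b * c
    den = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return num / den


def tetrachoric_approx(a, b, c, d):
    """Cosine approximation cos(pi / (1 + sqrt(ad / bc))); needs b > 0 and c > 0."""
    odds_ratio = (a * d) / (b * c)
    return math.cos(math.pi / (1 + math.sqrt(odds_ratio)))


# Hypothetical counts: a = both "yes", d = both "no", b and c = discordant.
a, b, c, d = 40, 10, 10, 40
print(round(phi_corr(a, b, c, d), 3))            # 0.6
print(round(tetrachoric_approx(a, b, c, d), 3))  # 0.809
```

Both measures are 0 when the odds ratio is 1, but the Pearson correlation on the dichotomized items is smaller in magnitude than the estimate for the assumed underlying continuous variables.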
Polychoric Correlation Matrix
Polychoric correlation matrix

notenjoy notmiss desireexceed


notenjoy 1
notmiss .64411349 1
desireexceed .44814752 .60971951 1
personalperform .37687346 .49572253 .74640077
importantsocial -.33466689 -.35262233 -.18773414
groupunited -.26640575 -.25987331 -.32414348
responsibility -.38218019 -.43174724 -.34289848
interact -.31300025 -.41147172 -.28711931
problemshelp -.40864072 -.44688816 -.34338549
notdiscuss .28367782 .2071563 .33714715
workharder .49864257 .26866894 .50117974

personalperform importantsocial groupunited


personalperform 1
importantsocial -.42902852 1
groupunited -.22011768 .47698468 1
responsibility -.32272048 .49187407 .85603168
interact -.37003374 .51150655 .46469124
problemshelp -.31435615 .51458893 .75552992
notdiscuss .28191066 -.07289447 -.0934676
workharder .4766736 .02547056 -.35603256

responsibility interact problemshelp


responsibility 1
interact .59252523 1
problemshelp .84727982 .60910395 1
notdiscuss -.11548039 -.09653691 -.11580359
workharder -.37311526 -.13316066 -.30122735
Polychoric Correlation in Stata

. findit polychoric
. polychoric notenjoy-workharder
. matrix R = r(R)
Choosing Number of Factors

 Intuitively: The number of uncorrelated constructs that
are jointly measured by the X’s.
 Only useful if number of factors is less than number of
X’s (recall “data reduction”).

Use “principal components” to help decide


• type of factor analysis
• number of factors is equivalent to number of variables
• each factor is a weighted combination of the input variables:
F1 = a11X1 + a12X2 + ….
• Recall: For a factor analysis, generally,
X1 = a11F1 + a12F2 +...
Eigenvalues
 To select how many factors to use, consider
eigenvalues from a principal components analysis
 Two interpretations:
• eigenvalue ≈ equivalent number of variables which the factor
represents
• eigenvalue ≈ amount of variance in the data described by the
factor.
 Rules to go by:
• number of eigenvalues > 1
• scree plot
• % variance explained
• comprehensibility
 Note: sum of eigenvalues is equal to the number of
items
Cohesion Example
. factormat R, pcf n(134)
(obs=134)

Factor analysis/correlation Number of obs = 134


Method: principal-component factors Retained factors = 3
Rotation: (unrotated) Number of params = 30

--------------------------------------------------------------------------
Factor | Eigenvalue Difference Proportion Cumulative
-------------+------------------------------------------------------------
Factor1 | 4.96356 3.14606 0.4512 0.4512
Factor2 | 1.81751 0.76378 0.1652 0.6165
Factor3 | 1.05373 0.27749 0.0958 0.7123
Factor4 | 0.77624 0.02065 0.0706 0.7828
Factor5 | 0.75559 0.22587 0.0687 0.8515
Factor6 | 0.52972 0.05654 0.0482 0.8997
Factor7 | 0.47318 0.24670 0.0430 0.9427
Factor8 | 0.22647 0.02484 0.0206 0.9633
Factor9 | 0.20163 0.07341 0.0183 0.9816
Factor10 | 0.12822 0.05407 0.0117 0.9933
Factor11 | 0.07415 . 0.0067 1.0000
--------------------------------------------------------------------------
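The rules of thumb can be checked directly against the eigenvalues in this output. A minimal sketch in Python (not part of the original Stata workflow) that recomputes the Proportion and Cumulative columns and applies the “eigenvalue > 1” rule:

```python
# Eigenvalues from the principal-component factoring output above.
eigenvalues = [4.96356, 1.81751, 1.05373, 0.77624, 0.75559, 0.52972,
               0.47318, 0.22647, 0.20163, 0.12822, 0.07415]

# For a correlation matrix the eigenvalues sum to the number of items.
total = sum(eigenvalues)

# Proportion and cumulative proportion of variance per factor.
proportions = [ev / total for ev in eigenvalues]
cumulative = []
running = 0.0
for p in proportions:
    running += p
    cumulative.append(running)

# "Eigenvalue > 1" rule: keep factors worth more than one item's variance.
n_kaiser = sum(ev > 1 for ev in eigenvalues)

print(round(total, 2), n_kaiser)              # 11.0 3
print([round(c, 4) for c in cumulative[:3]])  # [0.4512, 0.6165, 0.7123]
```

The eigenvalue rule alone would retain three factors here, with the third only just above 1; the slides go on to fit two, which is where the scree plot and comprehensibility criteria come in.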
Scree Plot for Cohesion Example
. screeplot
[Figure: scree plot of eigenvalues after factor; x-axis: Number (0 to 10), y-axis: eigenvalue (0 to 5)]
Choose two factors: Now fit the model
. factormat R, n(134) ipf factor(2)
(obs=134)

Factor analysis/correlation Number of obs = 134


Method: iterated principal factors Retained factors = 2
Rotation: (unrotated) Number of params = 21

.........
Factor loadings (pattern matrix) and unique variances

-------------------------------------------------
Variable | Factor1 Factor2 | Uniqueness
-------------+--------------------+--------------
notenjoy | -0.6091 0.2661 | 0.5582
notmiss | -0.6566 0.2648 | 0.4988
desireexceed | -0.6712 0.5373 | 0.2608
personalpe~m | -0.6342 0.4344 | 0.4091
importants~l | 0.5538 0.2162 | 0.6466
groupunited | 0.7164 0.4137 | 0.3156
responsibi~y | 0.8456 0.4197 | 0.1088
interact | 0.6271 0.2132 | 0.5613
problemshelp | 0.8187 0.3866 | 0.1802
notdiscuss | -0.2830 0.3072 | 0.8256
workharder | -0.4977 0.3260 | 0.6461
-------------------------------------------------
Interpretability?
 Not interpretable at this stage
 In an unrotated solution, the first factor describes most of
variability.
 Ideally we want to
• spread variability more evenly among factors.
• make factors interpretable
 To do this we “rotate” factors:
• redefine factors such that loadings on various factors tend to be
very high (-1 or 1) or very low (0)
• intuitively, it makes sharper distinctions in the meanings of the
factors
• We use “factor analysis” for rotation NOT principal
components!
 Rotation does NOT improve fit!
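Rotating Factors (Intuitively)
[Figure: two panels plotting items 1-4 against axes F1 and F2, before (left) and after (right) rotating the axes]

Before rotation:
      Factor 1   Factor 2
x1      0.5        0.5
x2      0.8        0.8
x3     -0.7        0.7
x4     -0.5       -0.5

After rotation:
      Factor 1   Factor 2
x1      0          0.6
x2      0          0.9
x3     -0.9        0
x4      0         -0.9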
Rotated Solution
. rotate
Factor analysis/correlation Number of obs = 134
Method: iterated principal factors Retained factors = 2
Rotation: orthogonal varimax (Kaiser off) Number of params = 21

--------------------------------------------------------------------------
Factor | Variance Difference Proportion Cumulative
-------------+------------------------------------------------------------
Factor1 | 3.35544 0.72180 0.5603 0.5603
Factor2 | 2.63364 . 0.4397 1.0000
--------------------------------------------------------------------------
LR test: independent vs. saturated: chi2(55) = 959.26 Prob>chi2 = 0.0000

Rotated factor loadings (pattern matrix) and unique variances

-------------------------------------------------
Variable | Factor1 Factor2 | Uniqueness
-------------+--------------------+--------------
notenjoy | -0.3118 0.5870 | 0.5582
notmiss | -0.3498 0.6155 | 0.4988
desireexceed | -0.1919 0.8381 | 0.2608
personalpe~m | -0.2269 0.7345 | 0.4091
importants~l | 0.5682 -0.1748 | 0.6466
groupunited | 0.8184 -0.1212 | 0.3156
responsibi~y | 0.9233 -0.1968 | 0.1088
interact | 0.6238 -0.2227 | 0.5613
problemshelp | 0.8817 -0.2060 | 0.1802
notdiscuss | -0.0308 0.4165 | 0.8256
workharder | -0.1872 0.5647 | 0.6461
-------------------------------------------------
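With two factors, an orthogonal rotation is a single angle, which makes the claim that rotation does not improve fit easy to check: each item's sum of squared loadings (1 - uniqueness) is the same before and after. The sketch below rotates the unrotated loadings and picks the angle by a crude grid search on the raw varimax criterion; this is an illustrative stand-in, not Stata's algorithm, and the function names are hypothetical.

```python
import math

# Unrotated loadings (Factor1, Factor2) from the iterated principal factor fit.
unrotated = [
    (-0.6091, 0.2661), (-0.6566, 0.2648), (-0.6712, 0.5373),
    (-0.6342, 0.4344), (0.5538, 0.2162), (0.7164, 0.4137),
    (0.8456, 0.4197), (0.6271, 0.2132), (0.8187, 0.3866),
    (-0.2830, 0.3072), (-0.4977, 0.3260),
]


def rotate(loadings, theta):
    """Rotate every (f1, f2) loading pair by the angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * a - s * b, s * a + c * b) for a, b in loadings]


def communalities(loadings):
    """Sum of squared loadings per item (1 - uniqueness)."""
    return [a * a + b * b for a, b in loadings]


def varimax_criterion(loadings):
    """Raw varimax: sum over factors of the variance of squared loadings."""
    total = 0.0
    for k in range(2):
        sq = [row[k] ** 2 for row in loadings]
        mean = sum(sq) / len(sq)
        total += sum((x - mean) ** 2 for x in sq) / len(sq)
    return total


# Grid search over angles in [-45, 45) degrees, in steps of 0.1 degree.
angles = [math.radians(d / 10) for d in range(-450, 450)]
best = max(angles, key=lambda t: varimax_criterion(rotate(unrotated, t)))
rotated = rotate(unrotated, best)

print(round(math.degrees(best), 1))
print([round(x, 2) for x in rotated[0]])  # first item (notenjoy)
```

The search is limited to a 90-degree window because the criterion repeats every 90 degrees (the factors merely swap or change sign). For reference, the rotated loadings Stata reports above correspond to turning the unrotated ones by roughly 38 degrees.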
Rotation options

 “Orthogonal”
• maintains independence of factors
• more commonly seen
• usually at least one option
• Stata: varimax, quartimax, equamax, parsimax, etc.
 “Oblique”
• allows dependence of factors
• make distinctions sharper (loadings closer to 0’s and
1’s)
• can be harder to interpret once you lose
independence of factors
Uniqueness

 Should all items be retained?


 Uniqueness for each item describes the proportion of the
item’s variance NOT described by the factor model
 Recall an R-squared:
• proportion of variance in Y explained by X
 1-Uniqueness:
• proportion of the variance in Xk explained by F1, F2, etc.
 Uniqueness:
• represents what is left over that is not explained by factors
• “error” that remains
 A GOOD item has a LOW uniqueness
Our current model?

Rotated factor loadings (pattern matrix) and unique variances

-------------------------------------------------
Variable | Factor1 Factor2 | Uniqueness
-------------+--------------------+--------------
notenjoy | -0.3118 0.5870 | 0.5582
notmiss | -0.3498 0.6155 | 0.4988
desireexceed | -0.1919 0.8381 | 0.2608
personalpe~m | -0.2269 0.7345 | 0.4091
importants~l | 0.5682 -0.1748 | 0.6466
groupunited | 0.8184 -0.1212 | 0.3156
responsibi~y | 0.9233 -0.1968 | 0.1088
interact | 0.6238 -0.2227 | 0.5613
problemshelp | 0.8817 -0.2060 | 0.1802
notdiscuss | -0.0308 0.4165 | 0.8256
workharder | -0.1872 0.5647 | 0.6461
-------------------------------------------------
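The Uniqueness column can be reproduced from the two loadings in each row. A minimal sketch, assuming orthogonal factors fitted to a correlation matrix (so each item has variance 1); `uniqueness` is a hypothetical helper name.

```python
# Rotated loadings (Factor1, Factor2) for three items from the table above.
loadings = {
    "responsibility": (0.9233, -0.1968),
    "notenjoy": (-0.3118, 0.5870),
    "notdiscuss": (-0.0308, 0.4165),
}


def uniqueness(row):
    """1 minus the communality (sum of squared loadings) of one item."""
    return 1.0 - sum(value * value for value in row)


for item, row in loadings.items():
    print(item, round(uniqueness(row), 4))
```

The values agree with the table to four decimals (0.1088, 0.5582, 0.8256). The item with by far the largest uniqueness, notdiscuss, is the one dropped in the revised model.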
Revised without “notdiscuss”

Rotated factor loadings (pattern matrix) and unique variances

-------------------------------------------------
Variable | Factor1 Factor2 | Uniqueness
-------------+--------------------+--------------
notenjoy | -0.3093 0.5811 | 0.5667
notmiss | -0.3345 0.6455 | 0.4715
desireexceed | -0.1783 0.8483 | 0.2486
personalpe~m | -0.2119 0.7551 | 0.3849
importants~l | 0.5618 -0.2057 | 0.6420
groupunited | 0.8265 -0.1271 | 0.3008
responsibi~y | 0.9247 -0.2089 | 0.1012
interact | 0.6160 -0.2469 | 0.5596
problemshelp | 0.8784 -0.2224 | 0.1789
workharder | -0.2023 0.5271 | 0.6813
-------------------------------------------------
Methods for Estimating Model

 Principal Components (already discussed)


 Principal Factor Method
 Iterated Principal Factor / Least Squares
 Maximum Likelihood (ML)

 Most common(?): ML and Least Squares


Unfortunately, default is often not the best approach!
 Caution! ipf and ml may not converge to the right
answer! Look for uniqueness of 0 or 1. Problem of
“identifiability” or getting “stuck.”
Interpretation

 Naming of Factors

 Wrong Interpretation: Factors represent separate
groups of people.

 Right Interpretation: Each factor represents a continuum
along which people vary (and the dimensions are independent
if the rotation is orthogonal)
Factor Scores and Scales

 Each object (e.g. each cancer survivor) gets a factor score for
each factor.
 Old data vs. New data
 The factors themselves are variables
 An individual’s score is weighted combination of scores on input
variables
 These weights are NOT the factor loadings!
 Loadings and weights determined simultaneously so that there is
no correlation between resulting factors.
Factor Scoring
. predict f1 f2
(regression scoring assumed)

Scoring coefficients
(method = regression; based on varimax rotated factors)

----------------------------------
Variable | Factor1 Factor2
-------------+--------------------
notenjoy | -0.03322 0.19223
notmiss | 0.04725 0.13279
desireexceed | 0.15817 0.54996
personalpe~m | -0.04037 0.21452
importants~l | 0.02971 -0.02168
groupunited | 0.12273 0.12938
responsibi~y | 0.60379 0.07719
interact | 0.04594 -0.00870
problemshelp | 0.31516 0.06376
workharder | 0.11750 0.10810
----------------------------------

Why different than loadings?
Factors are generally scaled to have variance 1.
Mean is arbitrary.
* If based on Pearson correlation mean will be zero.
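Regression scoring amounts to a weighted sum of a person's item values, with the scoring coefficients as weights. A minimal sketch in Python, assuming the item values have already been standardized; how the items are centered and scaled before weighting is an assumption here, and the respondent is hypothetical.

```python
# Scoring coefficients (Factor1, Factor2) copied from the output above.
coef = {
    "notenjoy": (-0.03322, 0.19223),
    "notmiss": (0.04725, 0.13279),
    "desireexceed": (0.15817, 0.54996),
    "personalperform": (-0.04037, 0.21452),
    "importantsocial": (0.02971, -0.02168),
    "groupunited": (0.12273, 0.12938),
    "responsibility": (0.60379, 0.07719),
    "interact": (0.04594, -0.00870),
    "problemshelp": (0.31516, 0.06376),
    "workharder": (0.11750, 0.10810),
}

# Hypothetical respondent: one standard deviation above average on the five
# positively worded items, exactly average on the rest.
positive_items = {"importantsocial", "groupunited", "responsibility",
                  "interact", "problemshelp"}
z = {item: (1.0 if item in positive_items else 0.0) for item in coef}

# Factor score = sum over items of (scoring coefficient x standardized value).
f1 = sum(coef[item][0] * z[item] for item in coef)
f2 = sum(coef[item][1] * z[item] for item in coef)
print(round(f1, 3), round(f2, 3))
```

This respondent scores well above average on factor 1 and only slightly above average on factor 2, and the weights doing most of the work (responsibility, problemshelp) are visibly not the loadings.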
Orthogonal (i.e., independent)?
[Figure: scatterplot of factor scores; x-axis: Scores for factor 2 (1 to 6), y-axis: scores for factor 1 (2 to 7)]
Teamwork (Factor 1) by Program
[Figure: boxplots of factor 1 scores (2 to 7) by program: Dragon Boat, Walking]
Personal Competitive Nature (Factor 2) by Program
[Figure: boxplots of factor 2 scores (1 to 6) by program: Dragon Boat, Walking]


Criticisms of Factor Analysis
 Labels of factors can be arbitrary or lack scientific basis
 Derived factors often very obvious
• defense: but we get a quantification
 “Garbage in, garbage out”
• really a criticism of input variables
• factor analysis reorganizes input matrix
 Too many steps that could affect results
 Too complicated
 Correlation matrix is often poor measure of association of input
variables.
Our example?

 Preliminary analysis of pilot data!


 Concern: negative items “hang together”, positive items
“hang together”:
 Is separation into two factors:
• based on two different factors (teamwork, pers. comp. nature)
• based on negative versus positive items?
 Recall: the computer will always give you something!
 Validity?
• boxplots of factor 1 suggest something
• additional reliability and validity needs to be considered
Stata Code

pwcorr notenjoy-workharder
polychoric notenjoy-workharder
matrix R = r(R)
factormat R, pcf n(134)
screeplot
factormat R, n(134) ipf factor(2)
rotate

polychoric notenjoy notmiss desire personal important group r


matrix R = r(R)
factormat R, n(134) ipf factor(2)
rotate
predict f1 f2
scatter f1 f2
graph box f1, by(progrm)
graph box f2, by(progrm)
Stata Code for Pearson Correlation
factor notenjoy-workharder, pcf
screeplot
factor notenjoy-workharder, ipf factor(2)
rotate

factor notenjoy notmiss desire personal ///
    important group respon interact problem ///
    workharder, ipf factor(2)
rotate
predict f1 f2
scatter f1 f2
graph box f1, by(progrm)
graph box f2, by(progrm)
Stata Options
 Pearson correlation
• Use factor for principal components and factor analysis
 choose estimation approach: ipf, pcf, ml, pf
 choose to retain n factors: factor(n)
 Polychoric correlation
• Use factormat for principal components and factor analysis
 choose estimation approach: ipf, pcf, ml, pf
 choose to retain n factors: factor(n)
 include n(xxx) to describe the sample size
 Scree Plot: screeplot
 Rotate: choose rotation type: varimax (default), promax, etc.
 Create factor variables
• predict: list as many new variable names as there are retained
factors.
• Example: for 3 retained factors,
predict teamwork competition hardworks
