
Factor Analysis (FA)

Factor analysis is an interdependence technique whose primary purpose is to define the underlying structure among the variables in the analysis.
The purpose of FA is to condense the information contained in a number of original variables into a smaller set of new composite dimensions or variates (factors) with a minimum loss of information.
Factor analysis decision process
Stage 1: Objectives of factor analysis
Key issues:
Specifying the unit of analysis
R factor analysis- Correlation matrix of the variables to
summarize the characteristics.
Q factor analysis- Correlation matrix of the individual
respondents based on their characteristics. Condenses a large
number of people into distinctly different groups.
Achieving data summarization vs. data reduction
Data summarization- The definition of structure: viewing
the set of variables at various levels of generalization,
ranging from the most detailed level to the more
generalized level. A linear composite of the variables is called
a variate or factor.
Data reduction- Creating an entirely new set of variables to
completely replace the original variables with empirical values
(factor scores).
Variable selection
The researcher should always consider the conceptual
underpinnings of the variables and use judgment as to the
appropriateness of the variables for factor analysis.

Using factor analysis with other multivariate techniques
Factor scores that represent the original variables can be used in
further analyses.

Stage 2: Designing a factor analysis


It involves three basic decisions:
Correlations among variables or respondents (Q type vs. R type)
Variable selection and measurement issues- Mostly performed
on metric variables. For nonmetric variables, define dummy
variables (0-1) and include in the set of metric variables.
Sample size- The sample must have more observations than
variables. The minimum sample size should be fifty
observations. A minimum of 5, and preferably at least 10,
observations per variable is desirable.
Stage 3: Assumptions in factor analysis
The assumptions are more conceptual than statistical.
Conceptual issues- 1) Appropriate selection of variables 2)
Homogeneous sample.
Statistical issues- Ensuring the variables are sufficiently
intercorrelated to produce representative factors.
Measure of intercorrelation:
Visual inspection- If a substantial number of the correlations
in the correlation matrix are greater than .30, factor
analysis is appropriate.
If partial correlations are high, indicating no
underlying factors, then factor analysis is
inappropriate.
Bartlett test of sphericity- A test for the presence of
correlation among the variables. A statistically
significant Bartlett's test of sphericity (sig. < .05)
indicates that sufficient correlation exists among the
variables to proceed.
Measure of sampling adequacy (MSA)- This
index ranges from 0 to 1, reaching 1 when each
variable is perfectly predicted without error by the
other variables. The measure can be interpreted with the
following guidelines:
Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy
in the .90s marvelous
in the .80s meritorious
in the .70s middling
in the .60s mediocre
in the .50s miserable
below .50 unacceptable
MSA values must exceed .50 for both the overall test
and each individual variable.
Variables with values less than .50 should be omitted
from the factor analysis.
(A computational sketch of the Bartlett test and the MSA/KMO index follows below.)
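A minimal sketch, assuming a numeric data matrix with observations in rows and variables in columns (function and variable names are illustrative), that computes both diagnostics directly from their textbook formulas:

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(data):
    """Bartlett's test of sphericity; data is (n_observations, n_variables)."""
    n, p = data.shape
    R = np.corrcoef(data, rowvar=False)
    # Test statistic based on the determinant of the correlation matrix
    stat = -(n - 1 - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2.0
    return stat, chi2.sf(stat, df)            # significant (p < .05): proceed

def msa(data):
    """Overall and per-variable measures of sampling adequacy (KMO/MSA)."""
    R = np.corrcoef(data, rowvar=False)
    Rinv = np.linalg.inv(R)
    # Anti-image (partial) correlations from the inverse correlation matrix
    d = np.sqrt(np.outer(np.diag(Rinv), np.diag(Rinv)))
    Q = -Rinv / d
    np.fill_diagonal(R, 0.0)
    np.fill_diagonal(Q, 0.0)
    r2_cols, q2_cols = (R ** 2).sum(axis=0), (Q ** 2).sum(axis=0)
    per_variable = r2_cols / (r2_cols + q2_cols)     # each should exceed .50
    overall = r2_cols.sum() / (r2_cols.sum() + q2_cols.sum())
    return overall, per_variable
```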
Stage 4: Deriving factors and assessing overall fit
Apply factor analysis to identify the underlying
structure of relationships.
Two decisions are important:
Selecting the factor extraction method
Common factor analysis
Principal component analysis
Concept of Partitioning the variance of a variable
Common variance- Variance in a variable that is shared with all
other variables in the analysis; it is estimated from the
variable's correlations with the other variables. A variable's
communality estimates its common variance.
Specific variance- Also known as unique variance. Variance that
cannot be explained by correlations with the other variables but
is associated uniquely with a single variable.
Error variance- Variance due to unreliability in the data-
gathering process, measurement error, or a random
component in the measured phenomenon.
Component factor analysis- AKA principal
components analysis. Considers the total variance
and derives factors that contain small proportions
of unique variance and in some instances error
variance.

Common factor analysis- Considers only the
common or shared variance, assuming that both
the unique and error variance are not of interest in
defining the structure of the variables.
[Diagram: partitioning of a variable's total variance. With unity (1.0) as the diagonal value of the correlation matrix, the total variance is analyzed; with the communality as the diagonal value, only the common variance is extracted and the remaining variance is excluded.]
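In symbols, the partition of a variable's variance described above is (standard notation, not taken from the slides):

```latex
\sigma^{2}_{\text{total}} \;=\; \sigma^{2}_{\text{common}} \;+\; \sigma^{2}_{\text{specific}} \;+\; \sigma^{2}_{\text{error}}
```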
Suitability of factor extraction method
Component factor analysis is appropriate when data
reduction is the primary concern.
Common factor analysis is appropriate when the primary
objective is to identify the latent dimensions or constructs
represented in the original variables.
Criteria for the number of factors to extract
Latent root criterion
It applies to both extraction methods.
This criterion assumes that any individual factor should account
for the variance of at least a single variable if it is to be retained
for interpretation.
In component analysis each variable contributes a value of 1 to
the latent roots (eigenvalues).
So, factors having eigenvalues greater than 1 are considered
significant and retained.
Eigenvalue- Represents the amount of variance
accounted for by a factor; it is the column sum of
squared loadings for that factor. (A computational sketch follows the scree plot below.)
Scree test criterion
This is plotting the latent roots against the
number of factors in their order of extraction.
The shape of the resulting curve is used to
evaluate the cutoff point.
The point at which the curve begins to
straighten out is considered to indicate the
maximum number of factors to extract.
As a general rule, the scree test results in at
least one and sometimes two or three more
factors being considered for inclusion than
does the latent root criterion.
[Figure: Scree plot of eigenvalues (vertical axis, 0-5) against factor number (horizontal axis, 0-10), with the scree criterion marking the cutoff point.]
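A minimal sketch of both retention aids, assuming a numeric data matrix (names are illustrative): eigenvalues of the correlation matrix for the latent root criterion, plus the corresponding scree plot.

```python
import matplotlib.pyplot as plt
import numpy as np

def latent_root_and_scree(data):
    """data: (n_observations, n_variables) numeric array."""
    R = np.corrcoef(data, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(R)[::-1]          # sorted largest first
    n_retained = int((eigenvalues > 1.0).sum())        # latent root criterion

    # Scree plot: eigenvalues against factor number
    plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, marker="o")
    plt.axhline(1.0, linestyle="--")                   # eigenvalue = 1 reference
    plt.xlabel("Factor number")
    plt.ylabel("Eigenvalue")
    plt.title("Scree plot")
    plt.show()
    return eigenvalues, n_retained
```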
Stage 5: Interpreting the factors
Three processes of factor interpretation
Estimate the factor matrix
Initial unrotated factor matrix is computed.
It contains factor loadings for each variable on each
factor.
Factor loadings are the correlations of each variable with
each factor.
Higher loadings make the variable more representative of
the factor.
Factor rotation
Rotational method is employed to achieve simpler and
theoretically more meaningful factor solutions.
The reference axes of the factors are turned about the
origin until a simpler, more interpretable position is reached.
There are two types of rotation:
Orthogonal factor rotation
Oblique factor rotation.
Rotating Factors
[Figure: variables x1-x4 plotted on factor axes F1 and F2, before and after the axes are rotated.]

       Orthogonal Rotation        Oblique Rotation
       Factor 1   Factor 2        Factor 1   Factor 2
x1       0.5        0.5             0          0.6
x2       0.8        0.8             0          0.9
x3      -0.7        0.7            -0.9        0
x4      -0.5       -0.5             0         -0.9
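As an illustrative sketch (not taken from the slides), the following uses the left-hand loadings above and a 45-degree orthogonal rotation to show that rotation changes individual loadings while leaving each variable's communality, the row sum of squared loadings, unchanged:

```python
import numpy as np

# Hypothetical loading matrix taken from the left-hand table above
A = np.array([[ 0.5,  0.5],    # x1
              [ 0.8,  0.8],    # x2
              [-0.7,  0.7],    # x3
              [-0.5, -0.5]])   # x4

theta = np.deg2rad(45.0)                         # rotate the axes by 45 degrees
T = np.array([[np.cos(theta), -np.sin(theta)],   # orthogonal rotation matrix
              [np.sin(theta),  np.cos(theta)]])

A_rot = A @ T                                    # loadings after rotation
print(np.round(A_rot, 2))                        # e.g. x1 moves to about (0.71, 0)
print((A ** 2).sum(axis=1))                      # communalities before rotation
print((A_rot ** 2).sum(axis=1))                  # ... unchanged after rotation
```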
When to use Factor Analysis?
Data Reduction
Identification of underlying latent structures
- Clusters of correlated variables are termed factors
Example:
Factor analysis could potentially be used to identify
the characteristics (out of a large number of
characteristics) that make a person popular.

Candidate characteristics: level of social skills, selfishness, how
interesting a person is to others, the amount of time they spend
talking about themselves (Talk 2) versus the other person (Talk
1), and their propensity to lie about themselves.
The R-Matrix
Meaningful clusters of large correlation coefficients between
subsets of variables suggest that these variables are measuring
aspects of the same underlying dimension.
Factor 1: The better your social skills, the more interesting and
talkative you tend to be.
Factor 2: Selfish people are likely to lie and talk about themselves.
What is a Factor?
Factors can be viewed as classification
axes along which the individual
variables can be plotted.
The greater the loading of variables on
a factor, the more the factor explains
relationships among those variables.
Ideally, variables should be strongly
related to (or load on) only one factor.

Graphical Representation of a
factor plot
Note that each variable loads primarily on only one factor.
Factor loadings tell us about the relative contribution that a
variable makes to a factor.
Mathematical Representation
of a factor plot
The equation describing a linear model can be
applied to the description of a factor.
The bs in the equation represent the factor
loadings observed in the factor plot.
Yi = b1X1i + b2X2i + ... + bnXni + εi

Factori = b1Variable1i + b2Variable2i + ... + bnVariableni + εi

Note: there is no intercept in the equation, since the lines intersect at the origin and hence
the intercept is zero.
Mathematical Representation
of a factor plot
There are two factors underlying the popularity construct: general
sociability and consideration.

We can construct equations that describe each factor in terms of the
variables that have been measured.

Sociabilityi = b1Talk 1i + b2Social Skillsi + b3Interesti + b4Talk 2i + b5Selfishi + b6Liari + εi

Considerationi = b1Talk 1i + b2Social Skillsi + b3Interesti + b4Talk 2i + b5Selfishi + b6Liari + εi
Mathematical Representation
of a factor plot
The values of the bs in the two equations differ, depending on
the relative importance of each variable to a particular factor.
Sociabilityi = 0.87Talk 1i + 0.96Social Skillsi + 0.92Interesti + 0.00Talk 2i - 0.10Selfishi + 0.09Liari + εi

Considerationi = 0.01Talk 1i - 0.03Social Skillsi + 0.04Interesti + 0.82Talk 2i + 0.75Selfishi + 0.70Liari + εi

Replace values of b with the co-ordinate of each variable on the graph.

Ideally, variables should have very high b-values for one factor and very low
b-values for all other factors.

Factor Loadings
Factors
Variables Sociability Consideration

Talk 1 0.87 0.01


Social Skills 0.96 -0.03
Interest 0.92 0.04
Talk 2 0.00 0.82
Selfish -0.10 0.75
Liar 0.09 0.70

The b values represent the weights of a variable on a factor and are
termed Factor Loadings.
These values are stored in a Factor pattern matrix (A): columns
display the factors (underlying constructs) and rows display how
each variable loads onto each factor.
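To illustrate how the pattern matrix A is used, a sketch assuming orthogonal factors and using the loadings from the table above: the off-diagonal entries of A times its transpose are the correlations implied by the two-factor model.

```python
import numpy as np

# Loading matrix A from the table above (rows = variables, columns = factors)
A = np.array([
    [0.87,  0.01],   # Talk 1
    [0.96, -0.03],   # Social Skills
    [0.92,  0.04],   # Interest
    [0.00,  0.82],   # Talk 2
    [-0.10, 0.75],   # Selfish
    [0.09,  0.70],   # Liar
])

# With orthogonal factors, A @ A.T reproduces the model-implied correlations
# between the variables (off-diagonal entries).
reproduced_R = A @ A.T
print(np.round(reproduced_R, 2))
```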
Factor Scores
Once factors are derived, we can estimate each
person's Factor Scores (based on their scores for each
factor's constituent variables).

Potential uses for Factor Scores:
- Estimate a person's score on one or more factors.
- Answer questions of scientific or practical interest (e.g., Are females
more sociable than males? using the factor scores for sociability).

Methods of Determining Factor Scores (a sketch of the regression method follows below):
- Weighted Average (simplest, but scale dependent)
- Regression Method (easiest to understand; most typically used)
- Bartlett Method (produces scores that are unbiased and correlate only with their
own factor)
- Anderson-Rubin Method (produces scores that are uncorrelated and
standardized)
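A minimal numpy sketch of the regression method mentioned above: the weight matrix is the inverse correlation matrix times the loading matrix, and scores are the standardized data times those weights (names are illustrative).

```python
import numpy as np

def regression_factor_scores(data, loadings):
    """Regression-method factor scores: Z @ (R^-1 @ A)."""
    Z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)  # standardize items
    R = np.corrcoef(data, rowvar=False)
    B = np.linalg.solve(R, loadings)    # factor score coefficient (weight) matrix
    return Z @ B                        # one row of factor scores per respondent
```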
Approaches to Factor
Analysis
Exploratory
Reduce a number of measurements to a smaller number of
indices or factors (e.g., Principal Components Analysis or PCA).
Goal: Identify factors based on the data and to maximize the
amount of variance explained.

Confirmatory
Test hypothetical relationships between measures and more
abstract constructs.
Goal: The researcher must hypothesize, in advance, the
number of factors, whether or not these factors are correlated,
and which items load onto and reflect particular factors. In
contrast to EFA, where all loadings are free to vary, CFA allows
for the explicit constraint of certain loadings to be zero.
Communality
Understanding variance in an R-matrix
Total variance for a particular variable has two
components:
Common Variance- variance shared with other variables.
Unique Variance- variance specific to that variable
(including error or random variance).

Communality
The proportion of common (or shared) variance present
in a variable is known as the communality.
A variable that has no unique variance has a
communality of 1; one that shares none of its variance
with any other variable has a communality of 0.
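As a small worked example, using the Talk 1 loadings from the factor loading table earlier and assuming orthogonal factors, the communality is the sum of squared loadings across factors:

```latex
h^{2}_{\text{Talk 1}} \;=\; \sum_{j} a^{2}_{\text{Talk 1},\,j}
  \;=\; 0.87^{2} + 0.01^{2} \;\approx\; 0.757,
\qquad
\text{unique variance} \;=\; 1 - h^{2} \;\approx\; 0.243.
```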
Factor Extraction: PCA vs. Factor
Analysis
Principal Component Analysis. A data reduction
technique that represents a set of variables by a smaller number of
variables called principal components. They are uncorrelated, and
therefore, measure different, unrelated aspects or dimensions of
the data.
Principal Components are chosen such that the first one
accounts for as much of the variation in the data as possible,
the second one for as much of the remaining variance as
possible, and so on.
Useful for combining many variables into a smaller number of
subsets.

Factor Analysis. Derives a mathematical model from which
factors are estimated.
Factors are linear combinations that maximize the shared
portion of the variance and represent underlying latent constructs.
May be used to identify the structure underlying a set of variables.
Factor Extraction: Eigenvalues & Scree
Plot
Eigenvalues
Measure the amount of variation accounted for by each
factor.
Number of principal components is less than or equal to
the number of original variables. The first principal
component accounts for as much of the variability in the
data as possible. Each succeeding component has the
highest variance possible under the constraint that it be
orthogonal to (i.e., uncorrelated with) the preceding
components.
Scree Plots
Plots a graph of each eigenvalue (Y-axis) against the
factor with which it is associated (X-axis).
By graphing the eigenvalues, the relative importance
of each factor becomes apparent.
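A sketch of component extraction with scikit-learn (illustrative, not the slides' SPSS procedure): standardizing first makes the analysis operate on the correlation matrix, as in the SPSS output shown later.

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def extract_components(data):
    """data: (n_observations, n_variables) numeric array."""
    Z = StandardScaler().fit_transform(data)   # work on standardized variables
    pca = PCA().fit(Z)
    # explained_variance_ratio_ gives the % of variance per component;
    # explained_variance_ is proportional to the correlation-matrix eigenvalues
    return pca.explained_variance_, pca.explained_variance_ratio_ * 100
```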
Factor Retention Based on Scree Plots

Factor Retention: Kaiser's Criterion
Kaiser (1960) recommends retaining all factors with
eigenvalues greater than 1.
- Based on the idea that eigenvalues represent the amount
of variance explained by a factor and that an eigenvalue of
1 represents a substantial amount of variation.
- Kaiser's criterion tends to overestimate the number of
factors to be retained.
Doing Factor Analysis: An Example
Students often become stressed about statistics
and the use of computers and/or SPSS to
analyze data.
Suppose we develop a questionnaire (the SPSS Anxiety
Questionnaire, SAQ) to measure this propensity (see sample
items on the following slides; the data can be found in SAQ.sav).
Does the questionnaire measure a single construct?
Or is it possible that there are multiple aspects
comprising students' anxiety toward SPSS?
Doing Factor Analysis: Some
Considerations
Sample size is important! A sample of 300 or more
will likely provide a stable factor solution, but
depends on the number of variables and factors
identified.
Factors that have four or more loadings greater than
0.6 are likely to be reliable regardless of sample
size.
Correlations among the items should not be too low
(less than .3) or too high (greater than .8), but the
pattern is what is important.
Factor Extraction
Total Variance Explained

Component | Initial Eigenvalues (Total, % of Variance, Cumulative %) | Extraction Sums of Squared Loadings (Total, % of Variance, Cumulative %) | Rotation Sums of Squared Loadings (Total, % of Variance, Cumulative %)
1 7.290 31.696 31.696 7.290 31.696 31.696 3.730 16.219 16.219
2 1.739 7.560 39.256 1.739 7.560 39.256 3.340 14.523 30.742
3 1.317 5.725 44.981 1.317 5.725 44.981 2.553 11.099 41.842
4 1.227 5.336 50.317 1.227 5.336 50.317 1.949 8.475 50.317
5 .988 4.295 54.612
6 .895 3.893 58.504
7 .806 3.502 62.007
8 .783 3.404 65.410
9 .751 3.265 68.676
10 .717 3.117 71.793
11 .684 2.972 74.765
12 .670 2.911 77.676
13 .612 2.661 80.337
14 .578 2.512 82.849
15 .549 2.388 85.236
16 .523 2.275 87.511
17 .508 2.210 89.721
18 .456 1.982 91.704
19 .424 1.843 93.546
20 .408 1.773 95.319
21 .379 1.650 96.969
22 .364 1.583 98.552
23 .333 1.448 100.000
Extraction Method: Principal Component Analysis.

Scree Plot for the
SAQ Data

Table of Communalities Before and After Extraction
(Extraction Method: Principal Component Analysis; Initial communality = 1.000 for every item)
Item  Extraction
Q01   .435
Q02   .414
Q03   .530
Q04   .469
Q05   .343
Q06   .654
Q07   .545
Q08   .739
Q09   .484
Q10   .335
Q11   .690
Q12   .513
Q13   .536
Q14   .488
Q15   .378
Q16   .487
Q17   .683
Q18   .597
Q19   .343
Q20   .484
Q21   .550
Q22   .464
Q23   .412

Component Matrix Before Rotation
(loadings of each variable onto each of the four extracted components; loadings less than 0.4 omitted; Extraction Method: Principal Component Analysis; 4 components extracted)
Q18 .701; Q07 .685; Q16 .679; Q13 .673; Q12 .669; Q21 .658; Q14 .656; Q11 .652, -.400;
Q17 .643; Q04 .634; Q03 -.629; Q15 .593; Q01 .586; Q05 .556; Q08 .549, .401, -.417;
Q10 .437; Q20 .436, -.404; Q19 -.427; Q09 .627; Q02 .548; Q22 .465; Q06 .562, .571; Q23 .507
Factor Rotation
To aid interpretation it is possible to
maximize the loading of a variable on
one factor while minimizing its loading
on all other factors.

This is known as Factor Rotation.

Two types:
Orthogonal (factors are uncorrelated)
Oblique (factors intercorrelate)

[Figure: geometric comparison of orthogonal rotation (axes kept perpendicular) and oblique rotation (axes allowed to become correlated).]
Orthogonal Rotation (varimax): Rotated Component Matrix

Note: Varimax rotation is the most commonly used rotation. Its goal is to minimize the
complexity of the components by making the large loadings larger and the small loadings
smaller within each component. Quartimax rotation makes large loadings larger and small
loadings smaller within each variable. Equamax rotation is a compromise that attempts to
simplify both components and variables. These are all orthogonal rotations, that is, the axes
remain perpendicular, so the rotated components remain uncorrelated.

Rotated Component Matrix
(Extraction Method: Principal Component Analysis; Rotation Method: Varimax with Kaiser
Normalization; rotation converged in 9 iterations)

Component 1: Fear of Computers
I have little experience of computers  .800
SPSS always crashes when I try to use it  .684
I worry that I will cause irreparable damage because of my incompetence with computers  .647
All computers hate me  .638
Computers have minds of their own and deliberately go wrong whenever I use them  .579
Computers are useful only for playing games  .550
Computers are out to get me  .459

Component 2: Fear of Statistics
I can't sleep for thoughts of eigen vectors  .677
I wake up under my duvet thinking that I am trapped under a normal distribution  .661
Standard deviations excite me  -.567
People try to tell you that SPSS makes statistics easier to understand but it doesn't  .523 (also .473 on Component 1)
I dream that Pearson is attacking me with correlation coefficients  .516
I weep openly at the mention of central tendency  .514
Statistics makes me cry  .496
I don't understand statistics  .429

Component 3: Fear of Math
I have never been good at mathematics  .833
I slip into a coma whenever I see an equation  .747
I did badly at mathematics at school  .747

Component 4: Peer Evaluation
My friends are better at statistics than me  .648
My friends are better at SPSS than I am  .645
If I'm good at statistics my friends will think I'm a nerd  .586
My friends will think I'm stupid for not being able to cope with SPSS  .543
Everybody looks at me when I use SPSS  .427
Oblique Rotation: Pattern Matrix
(Extraction Method: Principal Component Analysis; Rotation Method: Oblimin with Kaiser
Normalization; rotation converged in 29 iterations; items listed without a value had no
loading large enough to display)

Fear of Statistics
I can't sleep for thoughts of eigen vectors  .706
I wake up under my duvet thinking that I am trapped under a normal distribution  .591
Standard deviations excite me  -.511
I dream that Pearson is attacking me with correlation coefficients  .405
I weep openly at the mention of central tendency  .400
Statistics makes me cry
I don't understand statistics

Peer Evaluation
My friends are better at SPSS than I am  .643
My friends are better at statistics than me  .621
If I'm good at statistics my friends will think I'm a nerd  .615
My friends will think I'm stupid for not being able to cope with SPSS  .507
Everybody looks at me when I use SPSS

Fear of Computers
I have little experience of computers  .885
SPSS always crashes when I try to use it  .713
All computers hate me  .653
I worry that I will cause irreparable damage because of my incompetence with computers  .650
Computers have minds of their own and deliberately go wrong whenever I use them  .588
Computers are useful only for playing games  .585
People try to tell you that SPSS makes statistics easier to understand but it doesn't  .412, .462
Computers are out to get me  .411

Fear of Math
I have never been good at mathematics  -.902
I slip into a coma whenever I see an equation  -.774
I did badly at mathematics at school  -.774
Reliability:
A measure should consistently reflect the construct it is
measuring
Test-Retest Method
What about practice effects/mood states?
Alternate Form Method
Expensive and Impractical
Split-Half Method
Splits the questionnaire into two random halves,
calculates scores and correlates them.
Cronbach's Alpha
Splits the questionnaire (or sub-scales of a questionnaire)
into all possible halves, calculates the scores, correlates
them, and averages the correlation across all splits.
Ranges from 0 (no reliability) to 1 (complete reliability). (A computational sketch follows below.)
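A minimal sketch of Cronbach's alpha from its variance formula; the array name is illustrative.

```python
import numpy as np

def cronbach_alpha(items):
    """items: (n_respondents, n_items) array for one subscale."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the scale score
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
```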

Reliability: Fear of Computers Subscale

Reliability: Fear of Statistics Subscale

Reliability: Fear of Math Subscale

Reliability: Peer Evaluation Subscale

Reporting the Results
A principal component analysis (PCA) was conducted on the 23 items with
orthogonal rotation (varimax). Bartlett's test of sphericity, χ2(253) = 19334.49,
p < .001, indicated that correlations between items were sufficiently large for
PCA. An initial analysis was run to obtain eigenvalues for each component in
the data. Four components had eigenvalues over Kaiser's criterion of 1 and
in combination explained 50.32% of the variance. The scree plot was slightly
ambiguous and showed inflexions that would justify retaining either 2 or 4
factors.
Given the large sample size, and the convergence of the scree plot and
Kaiser's criterion on four components, four components were retained in the
final analysis. Component 1 represents a fear of computers, component 2 a
fear of statistics, component 3 a fear of math, and component 4 peer
evaluation concerns.
The fear of computers, fear of statistics, and fear of math subscales of the
SAQ all had high reliabilities, all Cronbach's α = .82. However, the fear of
negative peer evaluation subscale had a relatively low reliability, Cronbach's
α = .57.
Step 1: Select Factor Analysis
Step 2: Add all variables to be
included
Step 3: Get descriptive statistics &
correlations
Step 4: Ask for Scree Plot and set
extraction options
Step 5: Handle missing values and sort
coefficients by size
Step 6: Select rotation type and set
rotation iterations
Step 7: Save Factor Scores
Communalities
Variance Explained
Scree Plot
Rotated Component Matrix:
Component 1
Rotated Component Matrix:
Component 2
Component 1: Factor Score
Component (Factor): Score
Values
Rename Components According to
Interpretation
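As a rough Python analogue of the SPSS walk-through above, a sketch using scikit-learn: note this fits a common factor model with varimax rotation rather than SPSS's principal components extraction, so the numbers will differ; the file name, column handling, and the availability of the rotation argument in a recent scikit-learn version are assumptions.

```python
import pandas as pd
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

data = pd.read_csv("saq_items.csv")         # hypothetical CSV export of SAQ.sav
Z = StandardScaler().fit_transform(data.values)

fa = FactorAnalysis(n_components=4, rotation="varimax")
scores = fa.fit_transform(Z)                # factor scores, one row per student
loadings = pd.DataFrame(fa.components_.T,   # rows = items, columns = factors
                        index=data.columns,
                        columns=["F1", "F2", "F3", "F4"])
print(loadings.round(2))                    # inspect, then rename the factors
```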
