
MSc Business Administration

Research Methodology: Tools


Applied Data Analysis (with SPSS)

Lecture 03: Factor Analysis

March 2011 Prof. Dr. Jürg Schwarz juerg.schwarz@hslu.ch

Page 2

Contents

Aims ... 5
Introduction ... 6
Outline ... 8
Concepts of Factor Analysis ... 12
Factor Analysis with SPSS: A detailed example ... 16

Page 3

Table of contents
Aims ... 5
    Aims of the lecture ... 5
Introduction ... 6
    Example of a construct ... 6
Outline ... 8
Concepts of Factor Analysis ... 12
    Key steps involved in using factor analysis ... 12
    Sample size: Rules of thumb ... 13
    Principal component analysis vs. principal axis factoring ... 14
        Types of factoring ... 14
    Problematic issues ... 15
Factor Analysis with SPSS: A detailed example ... 16
    Item battery "motivation" from visitors questionnaire of Documenta ... 16
    Before first step: Conduct basic statistics ... 17
    First step: Correlation matrix of the variables ... 20
        Factor analysis with SPSS: <Analyze><Dimension Reduction><Factor> ... 22
        Inverse of correlation matrix ... 23
        Bartlett's Test ... 24
        Kaiser Meyer Olkin (KMO) ... 25

Page 4
    Second step: Extraction of factors ... 27
        Principal components analysis (PCA): Some facts ... 28
        Principal axis factoring (PAF): Some facts ... 29
        Example Documenta: Extraction of factors ... 30
        Graphical interpretation ... 31
        Loading plot in SPSS ... 32
        A loading must satisfy certain criteria ... 33
        Communality ("Gemeinsamkeit") ... 33
    Third step: Criteria for determining the number of factors ... 35
        Scree plot ... 36
        Kaiser criterion (Eigenvalues > 1) ... 38
    Fourth step: Interpretation of factors ... 39
        Rotation ... 39
        Content-related interpretation of factors ... 41
    Fifth step: Calculation of Factor Values ... 42
        SPSS ... 42

Page 5

Aims
Aims of the lecture
You know the term "construct".
You know the key steps involved in using factor analysis.
You can conduct a factor analysis with SPSS (extraction method: principal components analysis).

In particular, you know how to
- interpret the correlation matrix
- derive the "right" number of factors (scree plot, Kaiser criterion)
- use varimax rotation to better interpret the factor solution
- interpret the factors with regard to their meaning

Page 6

Introduction
Example of a construct
Item battery from visitor's questionnaire of "Documenta" (German exhibition of contemporary art)

Page 7

Question
Is there a structure in the item battery of the Documenta questionnaire?
Are there any sub-dimensions (also called factors)?

=> Conduct factor analysis

Items and the factors they form:

Factor "Interested in entertainment"
- I want to experience something aesthetically beautiful
- I would like to experience a cultural event
- I want to have some fun
- I pay close attention to the documenta-flair

Factor "Interested in information"
- I'm looking specifically for current trends in visual art
- I want to continue my education in arts
- I would like to see an overview of contemporary art
- I visit the DOCUMENTA for professional reasons

Page 8

Outline
Constructs in social science

factor <> technical term
dimension <> term of theory
item <> in questionnaire
indicator <> term of theory

A construct is a theoretical approach to express what cannot be directly observed.
Examples: Motivation, intelligence, anxiety.
Technically, a construct is an item battery.
A dimension (also called factor) is a certain sub-structure of a construct.

In the Documenta sample, the purpose item battery was used to operationalise and analyse "motivation" as the psychological construct. It was measured using 8 items (indicators).
[Figure: the construct "motivation", its dimensions (factors) and the items (indicators) that measure them]

Construct "motivation"
    Dimension (factor) "Interested in entertainment"
        Items (indicators):
        - I want to experience something aesthetically beautiful
        - I would like to experience a cultural event
        - I want to have some fun
        - I pay close attention to the documenta-flair
    Dimension (factor) "Interested in information"
        Items (indicators):
        - I'm looking specifically for current trends in visual art
        - I want to continue my education in arts
        - I would like to see an overview of contemporary art
        - I visit the DOCUMENTA for professional reasons

Page 9

How to discover dimensions (factors)? Conduct factor analysis!

Factor analysis is based on how closely various items are related and how they form factors.
Each factor (dimension) represents several different items.
Factors turn out to be more efficient than individual items at representing outcomes.
The goal is to represent items that are related to one another by a more general term.

Example
"Interested in entertainment" is more general and comprises
- "I want to experience something aesthetically beautiful"
- "I would like to experience a cultural event"
- "I want to have some fun"
- "I pay close attention to the documenta-flair"
The general term reflects the underlying content.

Note: Never forget theoretical and empirical facts while conducting factor analysis.

Page 10

Basic idea of factor analysis
Assumption: Some variables are related.

Dataset of the Documenta questionnaire
I want to experience something aesthetically beautiful => variable v01
I would like to experience a cultural event => variable v02

Page 11

Three possible causes for the correlation of v01 and v02

- Variable v01 influences variable v02
- Variable v02 influences variable v01
- Both variables are influenced by a factor

Table of correlations

        v01    v02    v03    v04    v05    v06    v07    v08
v01    1
v02    0.20   1
v03    0.35   0.23   1
v04    0.30   0.29   0.25   1
v05   -0.10   0.00  -0.22   0.00   1
v06    0.15   0.22   0.04   0.07   0.35   1
v07   -0.05   0.15  -0.13  -0.06   0.35   0.32   1
v08   -0.18  -0.10  -0.22  -0.08   0.39   0.20   0.11   1

Factor "Interested in entertainment": the block of correlations among v01 to v04
Factor "Interested in information": the block of correlations among v05 to v08

Page 12

Concepts of Factor Analysis


Key steps involved in using factor analysis
1. Choice of variables
   - Include only variables that are based on theory
   - Not too small a sample (see also rules of thumb)
2. Extraction of factors
   - Calculation of correlation matrix, inverse correlation matrix
   - Two different concepts: Principal component analysis vs. principal axis factoring
3. Criteria for determining the number of factors
   - Tools: Eigenvalue, scree plot, several rules of thumb
4. Interpretation of factors
   - Interpretation of factor loading, use of rotated factor matrix
5. Calculation of factor values
   - Done by SPSS (three different concepts: Regression, Bartlett, Anderson-Rubin)

Page 13

Sample size: Rules of thumb


How large should the sample be? There is no scientific answer.

Below are some rules of thumb, in descending order of popularity. These are not mutually exclusive: Some researchers use both the STV ratio and the Rule of 200.

1. Rule of 10: There should be at least 10 subjects for each item in the construct being used.
2. STV ratio: The subjects-to-variables ratio should be no lower than 5.
3. Rule of 150: There should be at least 150 to 300 cases, closer to 150 when there are a few highly correlated variables.
4. Rule of 200: There should be at least 200 cases, regardless of STV.

There is nearly universal agreement that sample size should be at least 50.
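
The rules of thumb above can be checked mechanically. The following is a minimal sketch in Python; the function name and the dictionary keys are illustrative assumptions, and the "Rule of 150" is checked only against the lower end of the 150 to 300 range.

```python
def sample_size_checks(n_cases: int, n_items: int) -> dict[str, bool]:
    """Evaluate the sample-size rules of thumb for a given data set."""
    return {
        "rule_of_10": n_cases >= 10 * n_items,   # at least 10 subjects per item
        "stv_ratio": n_cases / n_items >= 5,     # subjects-to-variables ratio >= 5
        "rule_of_150": n_cases >= 150,           # lower end of the 150-300 range
        "rule_of_200": n_cases >= 200,           # at least 200 cases regardless of STV
        "minimum_50": n_cases >= 50,             # near-universal minimum
    }


# Documenta example from this lecture: n = 775 visitors, 8 items.
for rule, ok in sample_size_checks(n_cases=775, n_items=8).items():
    print(f"{rule}: {'ok' if ok else 'not met'}")
```

Returning one flag per rule, rather than a single verdict, reflects the point that the rules are not mutually exclusive.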

Page 14

Principal component analysis vs. principal axis factoring


Types of factoring

Principal component analysis (PCA) ("Hauptkomponentenanalyse")
Goal: Reproduce the data structure using the smallest number of factors
Features:
- Interpretation is difficult
- No causal relationship between factors and variables
- Factors are "general terms", called components
Note: PCA is default in SPSS

Principal axis factoring (PAF) ("Hauptachsenanalyse")
Goal: Determine the cause of the correlation structure
Features:
- Factors cause the correlations between variables
- Causal interpretation of the factors

In this lecture the words "components" and "factors" are used interchangeably because most of the rules on interpreting PCA and PAF are the same.

Page 15

Problematic issues
Many decisions on extracting and interpreting the factors are subjective.
The same data set may produce different results, depending on the "decision path".
Even though variables must be interval-scaled, variables with a lower scale level are often included in practice, and this can lead to wrong conclusions.

Missing value problem
In item batteries, there are often many missing values.
Depending on how the missing values are treated, the results may differ in terms of the
- number of factors
- interpretation of factors
There is no single solution to the problem of missing values. Depending on the underlying data and context, a different approach may be needed.

Page 16

Factor Analysis with SPSS: A detailed example


Item battery "motivation" from visitors questionnaire of Documenta
The facts
- Sample of Documenta visitors (n = 775)
- Item battery with eight items (Note: Some items were "rotated")

Data set: documenta.sav Syntax: documenta.sps

Page 17

Before first step: Conduct basic statistics

Missing: No specific problem.
Mean / median: Differences between the mean and the median show that some distributions might not be symmetric.
Std. Deviation: No specific problem.
Minimum / maximum: Values between 1 and 5 show that the full range of the scales was used.
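
The same checks can be reproduced outside SPSS. Below is a small sketch using only Python's standard library; the response values are invented for illustration and are not taken from documenta.sav, and `None` is assumed to mark a missing answer.

```python
import statistics


def describe_item(responses):
    """Summarise one Likert item; None marks a missing answer."""
    valid = [r for r in responses if r is not None]
    return {
        "n_valid": len(valid),
        "n_missing": len(responses) - len(valid),
        "mean": statistics.mean(valid),
        "median": statistics.median(valid),
        "std_dev": statistics.stdev(valid),   # sample standard deviation
        "minimum": min(valid),
        "maximum": max(valid),
    }


# Hypothetical answers on a 1-5 scale (illustration only).
example = [5, 4, 5, 3, None, 5, 2, 4, 5, 1]
summary = describe_item(example)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A mean clearly below or above the median, as in this invented item, is the kind of difference that hints at an asymmetric distribution.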

Page 18

Histograms v01 to v04

[Figure: histograms of v01, v02, v03 and v04]

v01: Evenly distributed. Should not be a problem.
v02: "I would like to experience a cultural event" is skewed left. Most answers concentrate around "Strongly agree". This might be because the visitors go to the exhibition in order to have a cultural experience. Should not be a problem.
v03: Evenly distributed. Should not be a problem.
v04: "I pay close attention to the documenta-flair" has a peak at "Strongly disagree". A considerable share of the visitors seems to dislike the term "flair". Should not be a problem.

Page 19

Histograms v05 to v08

[Figure: histograms of v05, v06, v07 and v08]

v05: Evenly distributed. Should not be a problem.
v06: Almost normally distributed. Should not be a problem.
v07: "I want to continue my education in art" is skewed left. Most of the answers are found near "Strongly agree". This might be because the visitors show "social desirability" in their answers. Should not be a problem.
v08: "I visit the DOCUMENTA for professional reasons" has a peak at "Strongly disagree", where most of the answers are concentrated. This might be because most of the visitors are attending as private persons. This item should be eliminated.

Page 20

First step: Correlation matrix of the variables


Correlations (Spearman's rho)

Correlation coefficient

        v01     v02     v03     v04     v05     v06     v07
v01    1.000    .199    .345    .301   -.097    .146   -.051
v02     .199   1.000    .229    .293    .000    .222    .154
v03     .345    .229   1.000    .250   -.218    .043   -.125
v04     .301    .293    .250   1.000    .002    .069   -.058
v05    -.097    .000   -.218    .002   1.000    .352    .351
v06     .146    .222    .043    .069    .352   1.000    .322
v07    -.051    .154   -.125   -.058    .351    .322   1.000

Sig. (2-tailed)

        v01     v02     v03     v04     v05     v06     v07
v01      .      .000    .000    .000    .007    .000    .156
v02     .000     .      .000    .000    .997    .000    .000
v03     .000    .000     .      .000    .000    .237    .001
v04     .000    .000    .000     .      .966    .056    .109
v05     .007    .997    .000    .966     .      .000    .000
v06     .000    .000    .237    .056    .000     .      .000
v07     .156    .000    .001    .109    .000    .000     .

There is evidence of two factors. Attention: The variables in this example are nicely grouped in an obvious factor structure. Normally this is not the case!
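
For readers who want to see what Spearman's rho computes, here is a minimal sketch in plain Python: rank both variables (ties get the average rank), then take the Pearson correlation of the ranks. The function names are illustrative, and the sketch assumes complete pairs with no missing values.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical answers of six respondents to two items (illustration only).
print(spearman_rho([1, 2, 2, 4, 5, 5], [2, 1, 3, 4, 4, 5]))
```

Because Likert items produce many ties, the average-rank step matters in practice.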

Page 21

What is important when assessing the correlation matrix?

Significance level of correlations
The significance level of a correlation indicates how likely a coefficient at least this large would be if the actual correlation were zero. A small value is therefore evidence that the actual correlation differs from zero.
Depending on the sample size and intention of the analysis, the a priori level of significance will be at 1% or 5%. In rare cases, particularly for large samples, the a priori level of significance may be 0.1%.

Value of the correlation coefficients
Factor analysis becomes problematic if there are many small correlation coefficients. In this case, the data structure is too heterogeneous.
Ideally, there are clusters of highly correlated variables that are clearly separated from one another. These clusters are an indication of an underlying factor structure.
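
To make the link between a coefficient, the sample size and its significance concrete, the sketch below converts r into a test statistic and a two-tailed p-value. It is an approximation under stated assumptions: it uses the normal distribution from Python's standard library instead of the t distribution, and it takes n = 775 for every pair, which ignores missing values. Under these assumptions r = -.051 gives a value of roughly .156, close to the Sig. value printed in the table.

```python
import math
from statistics import NormalDist


def approx_two_tailed_p(r: float, n: int) -> float:
    """Approximate two-tailed p-value for H0: the actual correlation is zero."""
    # Test statistic for a correlation coefficient with n - 2 degrees of freedom.
    t = r * math.sqrt((n - 2) / (1 - r * r))
    # Normal approximation; reasonable only for large samples such as n = 775.
    return 2 * (1 - NormalDist().cdf(abs(t)))


for r in (-0.051, -0.097, 0.345):
    print(f"r = {r:+.3f}: p is about {approx_two_tailed_p(r, 775):.3f}")
```

The example also shows why small coefficients can be "significant" in large samples: significance alone says little about the size of a correlation.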

Page 22

Further properties of the correlation structure
Using information only from the correlation matrix is insufficient because of
- a lack of indication of causality
- a lack of indication of how many factors are causing the correlation
Use more tests!

Factor analysis with SPSS: <Analyze><Dimension Reduction><Factor>

Page 23

Inverse of correlation Matrix

[SPSS output: inverse of the correlation matrix, with the diagonal marked]

A correlation structure is suitable for a factor analysis only if the inverse is close to a diagonal matrix, that is, if the non-diagonal elements are as close to zero as possible. There is no generally accepted rule. The inverse of the correlation matrix is essentially a visual aid for testing suitability.

In the Documenta example, the non-diagonal elements are considerably smaller than the diagonal elements => Correlation structure is well suited for a factor analysis.

Page 24

Bartlett's Test
Null hypothesis H0: The random sample comes from a population in which all variables are completely uncorrelated.
Prerequisite: The variables are normally distributed.

In the case of the Documenta data, the test statistic is very high (603.735), and accordingly the null hypothesis may be rejected (Sig. = .000). The variables are not completely uncorrelated.
Note: The statement "The variables are correlated" is incorrect. The alternative hypothesis cannot be stated.
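
The slide does not show how the test statistic is obtained. A common textbook form, assumed here, is chi-square = -(n - 1 - (2p + 5)/6) * ln|R| with p(p - 1)/2 degrees of freedom, where R is the correlation matrix, p the number of variables and n the number of cases. The sketch below implements this form in plain Python. It will not reproduce the value 603.735 from the Spearman table above, because SPSS works with its own correlation matrix and number of valid cases.

```python
import math


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[col][col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
    return det


def bartlett_sphericity(corr, n_cases):
    """Return (chi-square statistic, degrees of freedom)."""
    p = len(corr)
    chi_square = -(n_cases - 1 - (2 * p + 5) / 6) * math.log(determinant(corr))
    df = p * (p - 1) // 2
    return chi_square, df


# Hypothetical 3-variable correlation matrix (illustration only).
corr = [[1.0, 0.6, 0.2],
        [0.6, 1.0, 0.3],
        [0.2, 0.3, 1.0]]
print(bartlett_sphericity(corr, n_cases=200))
```

The p-value would come from the chi-square distribution with the returned degrees of freedom; it is left out because the standard library has no chi-square function.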

Page 25

Kaiser Meyer Olkin (KMO)
Kaiser, Meyer and Olkin have developed the "measure of sampling adequacy" (MSA) test, which has become the standard test procedure for factor analysis.
The Kaiser-Meyer-Olkin measure of sampling adequacy tests whether or not the partial correlations among variables are small.
The MSA criterion indicates the degree to which the variables are related, and it thus helps in evaluating whether using a factor analysis makes sense.

Page 26

KMO value       Rating
0.00 to 0.49    unacceptable
0.50 to 0.59    miserable
0.60 to 0.69    mediocre
0.70 to 0.79    middling
0.80 to 0.89    meritorious
0.90 to 1.00    marvellous

As a rule of thumb, KMO should be 0.60 or higher in order to proceed with a factor analysis.

Kaiser* suggests 0.50 as a cut-off value, and a desirable value of 0.8 or higher.

*Kaiser H. (1970) "A second generation little jiffy," Psychometrika, Springer, vol. 35(4), pages 401-415, December.
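
The slides do not give the formula behind the KMO value. The sketch below assumes the usual definition: the sum of squared correlations divided by the sum of squared correlations plus the sum of squared partial correlations, with the partial correlations taken from the inverse of the correlation matrix. Function names and the example matrix are illustrative; the verbal labels follow the table above.

```python
def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment the matrix with the identity matrix.
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        pivot_value = a[col][col]
        a[col] = [v / pivot_value for v in a[col]]
        for row in range(n):
            if row != col:
                factor = a[row][col]
                a[row] = [v - factor * w for v, w in zip(a[row], a[col])]
    return [row[n:] for row in a]


def kmo(corr):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    inv = invert(corr)
    p = len(corr)
    sum_r2 = 0.0
    sum_partial2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                # Partial correlation of i and j given all other variables.
                partial = -inv[i][j] / (inv[i][i] * inv[j][j]) ** 0.5
                sum_r2 += corr[i][j] ** 2
                sum_partial2 += partial ** 2
    return sum_r2 / (sum_r2 + sum_partial2)


def kmo_label(value):
    """Verbal rating from the table above."""
    for lower, label in [(0.90, "marvellous"), (0.80, "meritorious"),
                         (0.70, "middling"), (0.60, "mediocre"), (0.50, "miserable")]:
        if value >= lower:
            return label
    return "unacceptable"


# Hypothetical correlation matrix (illustration only).
corr = [[1.0, 0.5, 0.5],
        [0.5, 1.0, 0.5],
        [0.5, 0.5, 1.0]]
value = kmo(corr)
print(round(value, 3), kmo_label(value))
```

Small partial correlations push the value towards 1, which is why a high KMO supports the use of factor analysis.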

Page 27

Second step: Extraction of factors

Page 28

Principal components analysis (PCA): Some facts
- As PCA is more common than PAF, the simple term "factor analysis" may also refer to PCA.
- PCA is used when the research purpose is data reduction.
- PCA analyzes a correlation matrix in which the diagonal contains 1's. This is not equivalent to analyzing the covariance matrix.
- Factors, properly called components, reflect the common variance of variables plus the unique variance. That is, manifest variables may be conceptualized as reflecting a combination of total variance explained by the components, plus error variance not explained by the components.
- Components seek to reproduce the total variable variance as well as the correlations. That is, PCA accounts for the total variance of the variables. PCA is thus a variance-focused approach.
- For the first component, PCA creates a linear equation which extracts the maximum total variance from the variables; for the second component, PCA removes the variance explained by the first component and creates a second linear equation which extracts the maximum remaining variance; and so on, continuing until the components can explain all the common and unique variance in a set of variables.
- Adding variables to the model will change the factor loadings.

http://faculty.chass.ncsu.edu/garson/PA765/factor.htm (Date of access: March, 2011)
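
The step-by-step extraction described above (take the direction of maximum variance, remove what it explains, repeat) can be sketched with power iteration and deflation on the correlation matrix. This is an illustration in plain Python, not the routine SPSS uses; the function names and the example matrix are assumptions, and the sketch presumes a positive definite correlation matrix and fewer extracted components than variables, or exactly as many.

```python
import math


def power_iteration(matrix, iterations=500):
    """Return (largest eigenvalue, unit eigenvector) of a symmetric matrix."""
    n = len(matrix)
    vector = [float(i + 1) for i in range(n)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in product))
        vector = [v / norm for v in product]
    # Rayleigh quotient of the converged unit vector.
    eigenvalue = sum(vector[i] * matrix[i][j] * vector[j]
                     for i in range(n) for j in range(n))
    return eigenvalue, vector


def principal_components(corr, n_components):
    """Extract components one at a time; returns a list of (eigenvalue, loadings)."""
    residual = [list(row) for row in corr]
    components = []
    for _ in range(n_components):
        eigenvalue, vector = power_iteration(residual)
        # Loadings are the eigenvector scaled by the square root of the eigenvalue.
        loadings = [math.sqrt(eigenvalue) * v for v in vector]
        components.append((eigenvalue, loadings))
        # Deflation: remove the variance explained by this component.
        for i in range(len(corr)):
            for j in range(len(corr)):
                residual[i][j] -= loadings[i] * loadings[j]
    return components


# Hypothetical correlation matrix (illustration only).
corr = [[1.0, 0.6, 0.2],
        [0.6, 1.0, 0.3],
        [0.2, 0.3, 1.0]]
for eigenvalue, loadings in principal_components(corr, 2):
    print(round(eigenvalue, 3), [round(l, 3) for l in loadings])
```

Because the diagonal of the analyzed matrix contains 1's, extracting as many components as there are variables reproduces the full variance of every variable.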

Page 29

Principal axis factoring (PAF): Some facts
- PAF analyzes a correlation matrix in which the diagonal contains the communalities. This is equivalent to analyzing the covariance matrix.
- PAF is used when the research purpose is causal modelling.
- Factors reflect the common variance of variables, excluding unique variance. That is, manifest variables may be conceptualized as reflecting a combination of common variance explained by the factors, plus unique variance not explained by the factors.
- Factors seek to reproduce the correlations of the variables. That is, PAF accounts for the covariation among the variables. PAF is thus a correlation-focused approach.
- PAF seeks the least number of factors which can account for the common variance shared by a set of variables.
- For the first factor, PAF creates a linear equation which extracts the maximum common variance from the variables; for the second factor, PAF removes the common variance explained by the first factor and creates a second linear equation which extracts the maximum remaining variance; and so on, continuing until the factors can explain all the common variance in a set of variables.
- Normally, factors are orthogonal to (uncorrelated with) one another.
- In principle it is possible to add variables without affecting the factor loadings.

http://faculty.chass.ncsu.edu/garson/PA765/factor.htm (Date of access: March, 2011)

Page 30

Example Documenta: Extraction of factors

Factor loading

Page 31

Graphical interpretation
Each variable can be described as a vector in a system of coordinates whose axes are formed by the factors.

[Figure: loading plot with Factor 1 on the horizontal axis and Factor 2 on the vertical axis, showing the variables v01 to v07 as vectors]

Variables v01 to v07 can be described by factors 1 and 2.

The factor loading is the cosine of the angle between the factor and one variable.
Example: v06
ARCCOS(0.223) = 77.11°
v06 forms an angle of 77.11° with the first factor.
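
The relationship between loading and angle can be checked with a few lines of Python; the function names are illustrative.

```python
import math


def loading_to_angle(loading: float) -> float:
    """Angle in degrees between a variable and a factor, given the loading."""
    return math.degrees(math.acos(loading))


def angle_to_loading(angle_degrees: float) -> float:
    """Loading implied by the angle between a variable and a factor."""
    return math.cos(math.radians(angle_degrees))


# v06 and the first factor, as in the example above.
print(round(loading_to_angle(0.223), 2))  # about 77.11 degrees
```

A variable at a right angle to a factor has a loading of 0 on it, and a variable pointing along the factor has a loading of 1.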

Page 32

Loading plot in SPSS

Page 33

A loading must satisfy certain criteria
- A factor can be interpreted if at least 4 variables have a loading of more than 0.60. The variables with the highest loading are the "marker variables".
- A factor can be interpreted if at least 10 variables have a loading of more than 0.40. The variables with the highest loading are the "marker variables".
- If fewer than 10 variables have a loading of more than 0.40 and the sample size is less than 300, the loading structure is likely to be random.
- Normative: A factor loading of less than 0.2 cannot be considered => such items are omitted and the analysis must be recalculated.

Communality ("Gemeinsamkeit")
Variables can normally not be explained completely by means of factors.
The number of explanatory factors is usually significantly smaller than the number of variables.
Communality is the part of a variable's total variance that can be explained by the factors.
Communality indicates the extent to which this variable can be explained by the factors.

Page 34

Documenta Example

Example variable v01
Communality after extraction: 0.479 => 47.9% of the variance of v01 is explained by Factors 1 and 2.

0.479 = 0.691² + 0.039²
47.9% = 47.8% + 0.1%
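
The arithmetic of the v01 example, as a short Python sketch: the communality is the sum of the squared loadings of a variable on the extracted (orthogonal) factors. The function name is illustrative.

```python
def communality(loadings):
    """Sum of squared loadings of one variable on the extracted factors."""
    return sum(loading ** 2 for loading in loadings)


# Loadings of v01 on Factor 1 and Factor 2 from the example above.
print(round(communality([0.691, 0.039]), 3))  # 0.479
```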

Page 35

Third step: Criteria for determining the number of factors


There is no strictly mathematical criterion.

- Common sense: Limit the number of factors to those whose meaning is understandable.
- Scree test: The scree test plots the components on the X axis and the corresponding eigenvalues on the Y axis. The scree test requires all components after the elbow to be dropped.
- Kaiser criterion: The Kaiser rule is to drop all components with eigenvalues under 1.0. The Kaiser criterion is the default in SPSS and most computer programs, but it is not recommended when used as the sole cut-off criterion for estimating the number of factors.
- Variance explained criterion: Some researchers simply use the rule of keeping enough factors to account for 90% (sometimes 80%) of the variation.
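
The variance explained criterion can be sketched as follows. The eigenvalues are invented for illustration (seven variables, so they sum to 7) and are not the Documenta results; the function name is an assumption.

```python
def factors_for_variance(eigenvalues, target):
    """Smallest number of factors whose cumulative share of variance reaches target."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for count, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value
        if cumulative / total >= target:
            return count
    return len(eigenvalues)


# Hypothetical eigenvalues of a 7-variable correlation matrix (illustration only).
eigenvalues = [2.1, 1.6, 0.9, 0.8, 0.7, 0.5, 0.4]
print(factors_for_variance(eigenvalues, 0.80))  # 5
print(factors_for_variance(eigenvalues, 0.90))  # 6
```

With eigenvalues like these, the 80% and 90% rules keep far more factors than the other criteria would, which illustrates why no single criterion should be applied mechanically.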

Page 36

Scree plot

[Figure: scree plot with the elbow and the flat slope marked, next to a photograph of scree ("Geröll"); image source: http://en.wikipedia.org/wiki/Scree]

Elbow criterion: When the factors correspond to error or random numbers, the slope will be flat. Keep the number of factors above the elbow.

The elbow of the Documenta scree plot occurs at 3, so keep 2 factors.

Page 37

Typical argument
"I have chosen 2 factors because it is consistent with theory and is at least not inconsistent with the scree plot."

Later on you will have to complement the scree test with the Kaiser criterion (eigenvalue > 1) and other criteria.

Page 38

Kaiser criterion (Eigenvalues > 1)
An eigenvalue indicates how much of the total variance of all variables is covered by the factor.
The Kaiser rule is to drop all components with eigenvalues under 1.0.
The Kaiser criterion is the default in SPSS and most computer programs but is not recommended when used as the only cut-off criterion for estimating the number of factors.

Sum of eigenvalues = Maximum number of factors

Documenta example: At most 2 factors <=> Kaiser criterion (eigenvalues > 1)
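
A minimal sketch of the Kaiser rule and of the share of variance behind each eigenvalue. The eigenvalues are invented for illustration and are not the SPSS output for the Documenta data; for a correlation matrix they sum to the number of variables.

```python
def kaiser_count(eigenvalues):
    """Number of factors kept by the Kaiser criterion (eigenvalue > 1)."""
    return sum(1 for value in eigenvalues if value > 1.0)


def explained_share(eigenvalues):
    """Share of the total variance covered by each factor."""
    total = sum(eigenvalues)  # equals the number of variables for a correlation matrix
    return [value / total for value in eigenvalues]


# Hypothetical eigenvalues of a 7-variable correlation matrix (illustration only).
eigenvalues = [2.1, 1.6, 0.9, 0.8, 0.7, 0.5, 0.4]
print(kaiser_count(eigenvalues))  # 2
print([round(share, 3) for share in explained_share(eigenvalues)])
```

An eigenvalue above 1 means the factor covers more variance than a single standardized variable does, which is the intuition behind the cut-off.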

Page 39

Fourth step: Interpretation of factors


Rotation
Rotation makes the output more readable.
Varimax rotation maintains the independence of the factors.

[Figure: loading plots with axes Factor 1 and Factor 2, before and after rotation]
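
For two factors, the idea of varimax rotation can be sketched as a search for the rotation angle that maximises the variance of the squared loadings within each factor. This is a simplified illustration: it uses the raw criterion and a grid of whole degrees, whereas SPSS uses an iterative algorithm and, by default, Kaiser normalization. The loadings are invented.

```python
import math


def rotate(loadings, angle):
    """Rotate two-factor loadings by `angle` radians (orthogonal rotation)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]


def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    p = len(loadings)
    total = 0.0
    for column in zip(*loadings):
        squares = [v * v for v in column]
        mean_sq = sum(squares) / p
        total += sum((sq - mean_sq) ** 2 for sq in squares) / p
    return total


def varimax_two_factors(loadings):
    """Grid search over whole degrees; returns (best angle in degrees, rotated loadings)."""
    best = max(range(90),
               key=lambda d: varimax_criterion(rotate(loadings, math.radians(d))))
    return best, rotate(loadings, math.radians(best))


# Hypothetical unrotated loadings of four variables (illustration only).
unrotated = [(0.69, -0.40), (0.61, -0.35), (0.30, 0.52), (0.45, 0.78)]
angle, rotated = varimax_two_factors(unrotated)
print(angle, [(round(a, 2), round(b, 2)) for a, b in rotated])
```

Because the rotation is orthogonal, the factors stay independent and each variable's communality is unchanged; only the distribution of the loadings across the two factors changes.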

Page 40

Page 41

Content-related interpretation of factors

Item                                                              Factor 1   Factor 2
v01  I want to experience something aesthetically beautiful          +
v02  I would like to experience a cultural event                     +
v03  I want to have some fun                                         +
v04  I pay close attention to the documenta-flair                    +
v05  I'm looking specifically for current trends in visual art                  +
v06  I want to continue my education in arts                                    +
v07  I would like to see an overview of contemporary art                        +

What is the subject of Factor 1? Hints from the item text: "experience", "have fun", "Documenta -flair" => Entertainment

What is the subject of Factor 2? Hints from the item text: "current trend", "education", "overview" => Information

There are also hints from the context of the study.

Page 42

Fifth step: Calculation of Factor Values


SPSS

Regression Method: The resulting scores have a mean of 0 and a variance equal to the squared multiple correlation between the estimated factor scores and the true factor values. The scores may be correlated even when factors are orthogonal.

Bartlett Scores: The resulting scores have a mean of 0. The sum of squares of the unique factors over the range of variables is minimized.

Anderson-Rubin Method: A modification of the Bartlett method which ensures orthogonality of the estimated factors. The scores that are produced have a mean of 0, have a standard deviation of 1, and are uncorrelated.
