
RESEARCH NOTE

DOES PLS HAVE ADVANTAGES FOR SMALL SAMPLE SIZE OR NON-NORMAL DATA? [1]
Dale L. Goodhue
Terry College of Business, MIS Department, University of Georgia,
Athens, GA 30606 U.S.A. {dgoodhue@terry.uga.edu}
William Lewis
{william.w.lewis@gmail.com}
Ron Thompson
Schools of Business, Wake Forest University,
Winston-Salem, NC 27109 U.S.A. {thompsrl@wfu.edu}
There is a pervasive belief in the MIS research community that PLS has advantages over other techniques when
analyzing small sample sizes or data with non-normal distributions. Based on these beliefs, major MIS journals
have published studies using PLS with sample sizes that would be deemed unacceptably small if used with other
statistical techniques. We used Monte Carlo simulation more extensively than previous research to evaluate
PLS, multiple regression, and LISREL in terms of accuracy and statistical power under varying conditions of
sample size, normality of the data, number of indicators per construct, reliability of the indicators, and
complexity of the research model. We found that PLS performed as effectively as the other techniques in
detecting actual paths, and not falsely detecting non-existent paths. However, because PLS (like regression)
apparently does not compensate for measurement error, PLS and regression were consistently less accurate
than LISREL. When used with small sample sizes, PLS, like the other techniques, suffers from increased
standard deviations, decreased statistical power, and reduced
robust against moderate departures from normality, and equally so. In total, we found that the similarities in
results across the three techniques were much stronger than the differences.
Keywords: Partial least squares, PLS, regression, structural equation modeling, statistical power, small sample
size, non-normal distributions, Monte Carlo simulation
Introduction
There is a pervasive belief in the Management Information
Systems (MIS) research community that for small sample
sizes or data with non-normal distributions, partial least
squares (PLS) has advantages that make it more appropriate
than other statistical estimation techniques such as regression
or covariance-based structural equation modeling (CB-SEM)
with LISREL, Mplus, etc. Partly because of these beliefs,
PLS has been widely adopted in the MIS research community.
To get a clearer picture of the extent of these beliefs in the
MIS field, we examined three top MIS journals (Information
Systems Research [ISR], Journal of Management Information
Systems [JMIS], and MIS Quarterly [MISQ]). We identified
all articles that used some form of path analysis published
from 2006 to 2010, inclusive: 188 articles across the three
journals. Overall, PLS was used for 49% of the path analysis
papers in these three MIS journals. Of the 90 articles using
PLS, at least 35% stated that PLS had special abilities relative
to small sample size and/or non-normal distributions. Thir-
teen of these studies (14%) had sample sizes smaller than 80
(which we will show is insufficient). Four of the 13 papers
stated that PLS had these special abilities without any
supporting citations. This suggests that these beliefs are so
widely accepted they are seen as no longer needing support
from the literature.

[1] Mike Morris was the accepting senior editor for this paper.
Andrew Burton-Jones served as the associate editor.

The appendix for this paper is located in the Online Supplements
section of the MIS Quarterly's website (http://www.misq.org).

A much earlier version of this paper was published in a conference
proceedings (Goodhue et al. 2006).
There are a relatively small number of articles that are most
often cited to support the claim that PLS has advantages at
small sample sizes (e.g., Barclay et al. 1995; Chin 1998; Chin
et al. 2003; Gefen et al. 2000), with other sources also cited
at times (Chin and Newsted 1999; Falk and Miller 1992;
(Fornell and Bookstein 1982; Lohmöller 1988). The most
commonly cited minimum sample rule for PLS might be
termed the "10 times rule," which states that the sample size
should be at least 10 times the number of incoming paths to
the construct with the most incoming paths (Barclay et al.
1995; Chin and Newsted 1999). Some MIS researchers have
also cited Falk and Miller (1992) to justify using a "5 times
rule."
Chin and Newsted (1999, p. 327) added the following caution
to their description of the 10 times rule:
Ideally, for a more accurate assessment, one needs to
specify the effect size for each regression analysis
and look up the power tables provided by Cohen
(1988) or Green's (1991) approximation to these
tables.
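Today that table lookup can be reproduced directly. The sketch below is our own illustration in Python (the paper reports no code here): it computes the power to detect a single tested path from Cohen's f² via the noncentral F distribution. The attenuated effect size in the second call is our own back-of-the-envelope assumption (reliability of roughly .84 on both sides of each path), offered because it lines up with the Cohen-based prediction of about .44 that the authors cite for a medium effect at n = 40 in Study 1.

```python
# A minimal sketch (ours, not the authors' code): power for one tested
# predictor in multiple regression, computed from Cohen's (1988) f^2 with
# the noncentral F distribution instead of the printed power tables.
from scipy.stats import f as f_dist, ncf

def regression_power(f2, n, n_predictors, alpha=0.05):
    u = 1                     # numerator df: the single tested path
    v = n - n_predictors - 1  # denominator df
    lam = f2 * (u + v + 1)    # Cohen's noncentrality parameter L = f^2 (u + v + 1)
    f_crit = f_dist.ppf(1 - alpha, u, v)
    return ncf.sf(f_crit, u, v, lam)

# Medium effect (f^2 = .15), four predictors, n = 40, error-free measures:
print(regression_power(0.15, 40, 4))  # ~0.63
# With the effect size attenuated by measurement error (f^2 ~ .09 by our own
# calculation for reliability ~.84), power drops to ~0.43, close to the ~.44
# Cohen-based figure cited for the medium effect at n = 40 in Study 1:
print(regression_power(0.09, 40, 4))  # ~0.43
```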
However, MIS researchers appear to have interpreted these
and similar statements to imply that, although one could use
Cohen's tables to determine the minimum allowable sample
size required to conduct a given study, one can also use the
rule of 10 or even the rule of 5. For example, Kahai and
Cooper (2003, p. 277) used a sample size of 31 in a study
published in JMIS; Malhotra et al. (2007, p. 268) used a
sample size of 41 in ISR. Chin and his coauthors used a
sample size of 17 in MISQ (Majchrzak et al. 2005, p. 660).
A recent editorial by Gefen, Rigdon, and Straub (2011) in
MIS Quarterly provided guidelines for choosing between PLS
and CB-SEM analysis techniques. They expressed some
concern about PLS and sample size, referencing Marcoulides
and Saunders (2006) with respect to the apparent misuse of
"perceived leniencies such as assumptions about minimum
sample size in partial least squares (PLS)," but in their article
proper they did not directly address the issue. However, in
their Appendix B they did distinguish between regression on
the one hand (where they suggest that minimum sample size
is primarily an issue of statistical power and is well addressed
by Cohen's 1988 guidance), and PLS and CB-SEM on the
other hand (where sample size plays a more complex role).
They do note that the core of the PLS estimation method
(ordinary least squares) is remarkably stable even at low
sample sizes. They suggest that this gave rise to the "10
times rule," although they point out that it is only a rule of
thumb and has not been backed up with substantive research.
Similarly, Ringle, Sarstedt, and Straub (2012)
noted that very few researchers employing PLS reported any
attempts to determine the adequacy of their sample size (other
than the 10 times rule), but they did go on to suggest that
power tables from regression could be used.
Hair et al. (2011) also provided guidelines for when the use of
PLS is appropriate, and with respect to sample size they
repeated the 10 times rule. While they did caution that this
rule of thumb does not take into account effect size,
reliability, the number of indicators, and other factors known
to affect power, and can thus be misleading, they still
conclude that it nevertheless provides a rough estimate of
minimum sample size requirements. In their
summary table of guidelines for applying PLS, the 10 times
rule is the only guidance for minimum sample size, without
any caveat. Likewise in another work, Hair, Ringle, and
Sarstedt (2011) recommend the 10 times rule without any
caveat.
Statements about PLS and non-normal data are also common,
such as the following quote cited widely in MIS research:
"Because PLS estimation involves no assumption about the
population or scale of measurement, there are no distribu-
tional requirements" (Fornell and Bookstein 1982, p. 443).
Many seem to have interpreted this to mean that while the
distribution of data used in a regression or LISREL analysis
is important, it is less important or perhaps even irrelevant for
PLS analysis. More recently it has been suggested that PLS
may not have an advantage on this score, because new esti-
mation techniques with CB-SEM provide options that are
quite robust to departures from normality (Gefen et al. 2011;
Hair et al. 2011). Nonetheless, recommendations to use PLS
when data are to some extent non-normal still appear (Hair,
Ringle, and Sarstedt 2011, p. 144).
Thus while some articles and editorials have issued cautions
against assuming too much for PLS's special capabilities
(e.g., Gefen et al. 2011; Goodhue et al. 2006; Marcoulides
and Saunders 2006), many of the recommendations lack
specificity. In particular, cautions about the 10 times rule for
sample size typically do not suggest any concrete alternative,
even though that rule's validity has not been tested empiri-
cally. As a result, MIS journal articles continue to cite and
use the 10 times rule. For example, Zhang et al. explicitly use
it in determining the minimum sample size for their PLS
analysis in their December 2011 MIS Quarterly article.
Clearly a subset of the social science research community still
believes that the 10 times rule provides an acceptable
guideline.
In our study we use Monte Carlo simulation to empirically
address the issues of small sample sizes and non-normal
distributions. Specifically, we look at the relative efficacy of
PLS, regression, and CB-SEM (or as we will refer to it here,
LISREL [2]) under a variety of conditions. By efficacy we mean
their ability to support a researcher's need to statistically test
hypothesized relationships among constructs. [3] We specifi-
cally test to see whether these techniques have different
abilities in terms of: (1) arriving at a solution, (2) producing
accurate path estimates, (3) avoiding false positives (Type 1
errors), and (4) avoiding false negatives (Type 2 errors,
related to statistical power). We also test each of the tech-
niques against commonly accepted standards (e.g., at least
80% power, no more than 5% false positives, and path
estimates that are as accurate as possible). A primary goal of
our work is to provide researchers with some concrete
findings to help inform decisions relating to: (1) designing
research studies, (2) selecting analysis techniques, and
(3) interpreting the results obtained.
We accomplish this by using Monte Carlo simulation to
compare results across different statistical analysis techniques
(Goodhue et al. 2012). We employ PLS, regression, and
LISREL to analyze identical collections of 500 data sets
each, using the identical research model. We start with
a relatively simple model and five different sample sizes, and
then do sensitivity testing by varying distributional properties
of the data, number and reliability of indicators, and com-
plexity of the model.
Our study provides four important contributions. First, we
show that as sample sizes are reduced, all three techniques
suffer from increasing standard deviations for the path
estimates and the attendant decrease in accuracy, and all have
about the same resulting loss in statistical power. [4] This
strongly suggests that PLS has no advantage for either accu-
racy or statistical power at small sample size, and that the 10
times rule is a misleading guide to minimal sample size for all
three techniques equally. Second, all three techniques were
relatively robust (and equally so) to moderate departures from
normality and all suffered somewhat (again about equally)
under extreme departures from normality. This strongly
suggests that PLS has no advantage with non-normal distribu-
tions. Third, these results hold with simple and more complex
models. Fourth, LISREL consistently produces more accurate
estimates of path strengths. PLS and regression are not only
less accurate than LISREL and about equally so, but the
amount of underestimation is quite consistent with Nunnally
and Bernstein's (1994, pp. 241, 257) equation for the attenua-
tion of relationship strength due to measurement error. This
leads us to the conclusion that PLS, like regression, does not
take measurement error into account in its path estimates, as
opposed to LISREL and other CB-SEM techniques which do.
In this note we restrict our focus to situations where
researchers use PLS, LISREL, or regression and have a
hypothesized model with reflective, multi-indicator construct
data. Given our focus on the efficacy of the three techniques
rather than examining how they operate, we do not describe
the techniques in detail. Interested readers are encouraged to
review published work (e.g., Barclay et al. 1995; Chin 1998;
Chin and Newsted 1999; Fornell 1984; Fornell and Bookstein
1982; Gefen et al. 2000; Hayduk 1987; Kline 1998) to obtain
more detailed descriptions of the different statistical analysis
techniques. However, we will provide enough of a descrip-
tion of each technique to justify the presumption that we
might get different path or statistical significance estimates
depending on which technique was used, even using the exact
same input data.
Following Rönkkö and Ylitalo (2010), we can think of regres-
sion and PLS as both having three steps: (1) determining the
weightings for the construct indicators, (2) using those
weights to calculate composite construct scores and using
ordinary least squares to calculate path estimates, and
(3) determining the statistical significance of the path estimates.
[2] LISREL is a specific statistical analysis program that is one of
a set of programs (others include AMOS, EQS, etc.) that use a
covariance-based structural equation modeling (CB-SEM) technique.
For ease of exposition, we use the term LISREL to refer to this
technique in general, since that is the program we employed; the
reader should note that other CB-SEM programs could have been
chosen, presumably with similar results.
[3] McDonald (1996) has suggested reserving the phrase "latent
construct" for SEM techniques such as LISREL that do not presume
to have developed an actual score for each construct, as opposed to
the phrase "composite construct" for techniques such as regression
or PLS that do develop explicit scores for each construct. In this
paper we will adhere to that distinction, but since we are not
focusing on construct scores themselves to any great extent, we
generally use the less specific term "construct" to refer to either
latent or composite constructs.
[4] The only caveat to this is that (as is well known) for smaller
sample sizes (smaller than 90 in our studies), LISREL may not
arrive at an acceptable solution. In our studies this was only
slightly apparent at n = 40, but very apparent at n = 20. We note
that when this problem occurs, the researcher will not be tempted
to think it is a valid solution, since the result is very obvious.
For regression, the first step is usually accomplished
by giving equal weights to all indicators. Then composite
scores are determined, and each dependent composite con-
struct and all its predictors are then analyzed separately.
Regression uses ordinary least squares (linear algebra) to
calculate the solution for the path values that minimizes the
squared differences between the predicted and the actual
scores for each dependent construct. There is no iteration
involved. The regression solution includes estimates of the
standard deviation of each path estimate, from which statis-
tical significance can be determined, using normal distribution
theory.
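As a concrete illustration of these three steps, here is a short sketch (ours, in Python rather than the authors' SAS) of the regression pipeline just described. The indicator arrays are random placeholders; in practice they would hold the questionnaire responses for each construct.

```python
import numpy as np
import statsmodels.api as sm

def composite(indicators):
    # Step 1 (regression): equal weights, i.e., a simple mean of the
    # indicators, standardized to unit variance.
    score = indicators.mean(axis=1)
    return (score - score.mean()) / score.std(ddof=1)

rng = np.random.default_rng(0)
ksi_X = [rng.normal(size=(40, 3)) for _ in range(4)]  # placeholder indicator data
eta_X = rng.normal(size=(40, 3))

X = np.column_stack([composite(k) for k in ksi_X])
y = composite(eta_X)

# Step 2: ordinary least squares on the composite scores.
fit = sm.OLS(y, sm.add_constant(X)).fit()
# Step 3: significance from normal-theory standard errors (df = 40 - 5 = 35).
print(fit.params[1:])   # Gamma1..Gamma4 path estimates
print(fit.tvalues[1:])  # compare against the t cutoff (about 2.03 here)
```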
In PLS, step one is the core of its uniqueness. As opposed to
regression where equal weights are used, PLS iterates through
a process to find the optimal indicator weights for each
construct, such that the overall R² for all dependent constructs
is maximized. The second step in PLS, as in regression, uses
the indicator weights to calculate construct scores which are
then used in ordinary least squares to determine final path
estimates. The third step in PLS is to determine the standard
deviations of those path estimates with bootstrapping. [5] This
third PLS step contrasts with regression where normal
distribution theory is used, but bootstrapping could also be
used with regression. Thus note that both PLS and regression
use weighted averages of indicator values to develop con-
struct scores, and both use ordinary least squares to determine
path values. The critical difference between the two in terms
of path values is that regression uses equally weighted
indicator scores, while PLS has a process intended to optimize
weights.
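For readers who want to see the step 1 iteration spelled out, below is a compact sketch of a PLS mode A estimation loop for a Figure 1-style layout (four exogenous blocks all pointing at one endogenous block). It is our own illustration: we use the simple centroid inner-weighting scheme, which is one of several schemes and not necessarily the default of PLS-Graph.

```python
import numpy as np

def pls_scores(blocks, dep, tol=1e-6, max_iter=300):
    """blocks: list of (n x k) indicator matrices; blocks[dep] is the dependent block."""
    n = blocks[0].shape[0]
    stdize = lambda v: (v - v.mean()) / v.std()
    w = [np.ones(b.shape[1]) for b in blocks]       # start with equal weights
    y = [stdize(b @ wi) for b, wi in zip(blocks, w)]
    for _ in range(max_iter):
        # Inner approximation (centroid scheme): each construct's proxy is the
        # sign-weighted sum of the scores of the constructs it is linked to.
        z = []
        for j in range(len(blocks)):
            if j == dep:
                zj = sum(np.sign(np.corrcoef(y[j], y[k])[0, 1]) * y[k]
                         for k in range(len(blocks)) if k != dep)
            else:
                zj = np.sign(np.corrcoef(y[j], y[dep])[0, 1]) * y[dep]
            z.append(stdize(zj))
        # Outer update, mode A: new weights are the covariances of each block's
        # indicators with its inner proxy; rescale scores to unit variance.
        w_new = [b.T @ zj / n for b, zj in zip(blocks, z)]
        y_new = [stdize(b @ wi) for b, wi in zip(blocks, w_new)]
        if max(np.abs(a - b).max() for a, b in zip(w_new, w)) < tol:
            return w_new, y_new
        w, y = w_new, y_new
    return w, y

rng = np.random.default_rng(1)
blocks = [rng.normal(size=(90, 3)) for _ in range(5)]  # placeholder data
weights, scores = pls_scores(blocks, dep=4)
```

Step 2 then runs the same ordinary least squares as in the regression sketch above, on these scores rather than on equal-weight composites.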
LISREL is quite different from regression and PLS. It also
uses linear algebra, but its equations include all constructs, all
indicators, all error terms, and all relationships between them
in a single analysis. LISREL iterates between two processes:
(1) taking a candidate solution for the various parameters
(with candidate values for paths between constructs, error
variances, etc.) and generating the implied covariance matrix
for the indicators, and (2) comparing that implied covariance
matrix with the sample covariance matrix. From this com-
parison, the degree of fit between the candidate solution and
the actual data is determined, and changes to the various
parameters in the candidate solution are suggested. This then
feeds back into step 1, until limited changes are suggested.
Since step 1 also includes estimates of the standard deviation
of each path estimate, statistical significance can be deter-
mined using normal distribution theory.
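The candidate-solution loop is easiest to see in a toy case. The sketch below (ours; a one-factor, three-indicator model rather than the full Figure 1 model) builds the covariance matrix implied by a candidate parameter vector and lets a numerical optimizer drive the maximum-likelihood discrepancy between implied and sample covariance matrices to a minimum, which is essentially what LISREL's iteration does.

```python
import numpy as np
from scipy.optimize import minimize

def implied_cov(theta):
    lam, theta_eps = theta[:3], theta[3:]           # loadings, error variances
    return np.outer(lam, lam) + np.diag(theta_eps)  # Sigma(theta) = lam lam' + Theta

def ml_discrepancy(theta, S):
    # F_ML = ln|Sigma| + tr(S Sigma^-1) - ln|S| - p, minimized when Sigma fits S.
    Sigma = implied_cov(theta)
    sign, logdet = np.linalg.slogdet(Sigma)
    if sign <= 0:
        return np.inf                               # inadmissible candidate
    return (logdet + np.trace(S @ np.linalg.inv(Sigma))
            - np.linalg.slogdet(S)[1] - S.shape[0])

# Simulate indicators with true loadings .7, .8, .9 and fit:
rng = np.random.default_rng(2)
ksi = rng.normal(size=1000)
X = np.column_stack([l * ksi + np.sqrt(1 - l**2) * rng.normal(size=1000)
                     for l in (0.7, 0.8, 0.9)])
S = np.cov(X, rowvar=False)
fit = minimize(ml_discrepancy, x0=np.full(6, 0.5), args=(S,), method="Nelder-Mead")
print(fit.x[:3])  # loading estimates, roughly (.7, .8, .9)
```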
Monte Carlo simulation has been used to study issues such as
bias sizes in PLS estimates (Cassel et al. 1999), impact of
different correlation structures on goodness of fit tests
(Fornell and Larcker 1981), and the efficacy of PLS with
product indicators versus regression in detecting interaction
effects (Chin et al. 2003; Goodhue et al. 2007). The
remainder of this paper begins with a quick explanation of
Monte Carlo simulation and how it can be used to assess the
efficacy of different statistical techniques. We do this so we
can explain how our examination of the simulation data goes
significantly beyond that employed by previous researchers
who investigated PLS using Monte Carlo simulation (Cassel
et al. 1999; Chin et al. 2003; Chin and Newsted 1999). Fol-
lowing that, we describe four simulation studies used to test
the three techniques under varying conditions. We end with
a discussion of our overall findings, and their implications.
How to Test Efficacy with Monte Carlo
Simulation: Our Approach and
Previous Approaches
Use of the Monte Carlo simulation approach requires the
researcher to start with a prespecified true model (such as
shown in Figure 1) that includes both the strength of paths
and the amount of random variance in linkages (between con-
structs, and between constructs and their indicators). From
the prespecified model, sets of simulated questionnaire
responses are generated using random number generators. [6]
Then the same sets of questionnaire responses are analyzed by
each of the three techniques in turn. The analysis results can
be compared across techniques and against accepted stan-
dards. Repeating this process with many different data sets
(in our case, 500) for each condition tested removes the worry
that any given result is atypical.
Figure 1 shows the model used as the basis for much of our
analysis. (We will modify it to test different specific issues.)
We have four constructs (Ksi1 through Ksi4) that cause
changes in the value of a fifth construct (Eta1). Three of the
four independent constructs have the following effect sizes:
large (.35), medium (.15), and small (.02), to correspond to
Cohen's (1988) suggested values. We also include an effect
size of zero so we can test for false positives. See the notes
in Figure 1 for more detail.
[5] Jackknifing can also be used to determine statistical
significance, but bootstrapping is generally recommended.
[6] See Appendix D for an example of the SAS program used to
generate the data.
The five constructs are each measured by three reflective indicators with weights of .7, .8, and .9, giving them a Cronbach's alpha of .84.
In addition to the path coefficients and the indicator loadings, random error is added to the indicator scores and to the value for Eta1, so
as to give each a variance of 1.0.
**Although only three intercorrelations between the Ksi constructs are shown (to keep the diagram from becoming too cluttered), in our
analysis it is assumed that all four constructs correlate freely with each other. Since regression and PLS impose no constraints on
correlations between exogenous constructs, in our LISREL analysis we also allow the constructs to freely correlate at whatever level best
fits the data. We do this to keep the assumptions for regression, PLS, and LISREL equivalent on this score, even though the data is
generated from a model that has the Ksi constructs independent.
***Following Cohen (1988), the relationship between the path coefficients and the effect sizes is given below. By construction (we generate
the data that way) Ksi1 through Ksi4 are independent, and all five constructs are N(0,1) distributed and therefore have a variance of 1.0.
Therefore, the partial R² for each Ksi is the square of the partial correlation (or the square of the path coefficient). With path coefficients
of .48, .314, and .114, the partial R²s are .230, .099, and .013. This gives an overall R² of .342. Effect size is then the partial R² divided
by the unexplained variance. Unexplained variance is (1 − .342) = .658. This gives:
Gamma1 = .230 / (1 − .342) = .350
Gamma2 = .099 / (1 − .342) = .150
Gamma3 = .013 / (1 − .342) = .020
Figure 1. The Simple Model: The Basis for Studies 1 Through 3
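To make the recipe in these notes concrete, here is a data-generation sketch that follows Figure 1 (our Python reconstruction; the authors' actual generator is the SAS program in Appendix D). The Ksi constructs are generated independent, Eta1 gets exactly enough residual noise for unit variance, and every indicator is loading × construct plus noise sized to unit variance.

```python
import numpy as np

def generate_dataset(n, rng):
    gammas = np.array([0.48, 0.314, 0.114, 0.0])  # partial R^2 = .230, .099, .013, 0
    ksi = rng.normal(size=(n, 4))                 # independent N(0,1) constructs
    r2 = np.sum(gammas**2)                        # = .342, so residual variance .658
    eta = ksi @ gammas + rng.normal(scale=np.sqrt(1 - r2), size=n)
    loadings = np.array([0.7, 0.8, 0.9])
    def indicators(construct):
        noise = rng.normal(size=(n, 3)) * np.sqrt(1 - loadings**2)
        return construct[:, None] * loadings + noise  # unit-variance indicators
    return [indicators(ksi[:, i]) for i in range(4)], indicators(eta)

rng = np.random.default_rng(42)
datasets = [generate_dataset(40, rng) for _ in range(500)]  # 500 samples of n = 40
```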
We use an example to illustrate how our studies differ from
previous Monte Carlo simulation studies (and to explain why
we therefore draw different conclusions). In this illustration
we will focus on the Gamma2 path (medium effect size) of
Figure 1, and compare the results from regression analysis
with the results from PLS analysis. The rule of 10 would
suggest n = 40 is a minimum sample size for PLS given the
model in Figure 1.
Using a random number generator and the relationships from
the Figure 1 model, we generated 500 data sets of 40
questionnaires each (20,000 questionnaire responses total).
This could be thought of as 500 researchers, each with a
sample size of n = 40. In this example, we are interested in
predicting what a 501st researcher with a similar sample size
from the same population might find if he or she analyzed that
data set with PLS or regression. Figure 2 shows the results of
Figure 2. Monte Carlo Results for Medium Effect Size, N = 40
analyzing the 500 data sets first with regression and then with
PLS. Figure 2a is a histogram of the 500 regression path
estimates for Gamma2; Figure 2c shows the same thing for
PLS estimates, based on exactly the same 500 data sets.
In Figure 2a, the mean Gamma2 estimate across 500 data sets
is .255 for regression, as shown by the heavy dotted vertical
line. Lighter dotted lines show the 95% confidence interval
for the path value the 501st researcher should expect. Notice in
Figure 2c that PLS has a slightly higher mean value for
Gamma2 (.273). Although the difference is not large, our
results for Gamma2 accuracy are consistent with those of
Chin and Newsted (1999); PLS produces a slightly larger
estimate than regression, on average.
However, we went further than the comparisons done by
earlier studies and also examined the number of data sets that
resulted in statistically significant paths and the standard
deviations of the 500 path estimates. Figures 2b and 2d show
the 500 associated t statistics for Gamma2, using regression
and PLS respectively. Since the t statistic cutoff for p < .05
in this regression is 2.03 [7] (the heavy dotted vertical line), we
see in Figure 2b that about 40% of the 500 regression data
sets found a statistically significant Gamma2 path. From
Figure 2d we see that 41% of the PLS data sets found a
significant Gamma2 path.
[7] For an n = 40 regression with four constructs and a constant,
the degrees of freedom are 40 − 5 = 35.
PLS seems to have the advantage: a slightly higher average
path estimate and a slightly higher statistical power. But this
seeming advantage is misleading. First of all, the 95%
confidence intervals for the path value the 501st researcher
will likely see for Gamma2 using regression (in Figure 2a)
and PLS (in Figure 2c) almost completely overlap. Although
PLS has a higher mean value than regression (.018 higher),
that difference is not statistically significant at the p < .05
level. Second, for statistical power, the 95% confidence
interval around regression's power (40%) and PLS's power
(41%) goes from about 36% to about 45%. [8] Thus the small
difference of 40% versus 41% is not even close to being
statistically significant. From a statistical point of view,
neither the accuracy nor the power obtained from PLS can be
distinguished from that obtained from regression in this
analysis.
More importantly, since 80% power is the generally sought
minimum acceptable level of power, both 40% and 41%
power are unacceptably low. About 60% of the time, true
medium effect size paths will not be detected by either
technique at this sample size.
Let us focus for a minute just on the issue of accuracy of PLS
estimates. In three of the empirical articles often used to
justify PLS's special capabilities (Cassel et al. 1999; Chin et
al. 2003; Chin and Newsted 1999), [9] the authors displayed both
average path estimates (or average biases) and standard
deviations of the path estimates. However, each set of authors
focused attention almost entirely on the average accuracy of
the path estimate, concluding that it did not change much with
decreases in sample size. None placed much importance on
the fact that as their sample size went down, standard
deviations of the path estimates went up:
• from .069 at n = 500 to .250 at n = 20 for Chin et al. (a factor of 3.5);
• from .107 at n = 200 to .380 at n = 20 for Chin and Newsted (a factor of 3.5); and
• from .019 at n = 1000 to .092 at n = 50 for Cassel et al. (a factor of 4). [10]
Focusing on the average path estimate bias can mask serious
problems, since a combination of very high and very low path
estimates might still average out to about the true value. Our
interpretation of the results in all three of these papers is that
as sample size goes down, the average bias across hundreds
of data sets does not change much, but the "wildness"
(standard deviation) of individual path estimates increases
considerably. In the context of Figures 2a and 2c, the wider
the spread of the data in the histogram (and the wider its
95% confidence interval), the greater the "wildness" of the
estimates. Increasingly wild estimates do not suggest to us
that PLS is robust to changes in sample size. We do not mean
to suggest that PLS is more "wild" than regression or LISREL
at these sample sizes. As will be seen, we only suggest that
all three techniques suffer at small sample sizes, and about
equally so.
Therefore, in our analysis, we will look at the average bias of
the 500 path estimates as the above articles did, but also at the
average of the standard deviations of those path estimates and
at the proportion of data sets that found a significant path.
Very different conclusions will be drawn by using this
additional information.
Study 1: The Effect of Sample Size
with a Simple Model
In this study, our objective was to assess the relative efficacy
of the three techniques under conditions of varying sample
size, using normally distributed data, well measured con-
structs, and a simple but realistic model of construct relation-
ships. We used the model shown in Figure 1, and sample
sizes of 20, 40, 90, 150, and 200, generating 500 data sets for
each sample size. [11] For our model, 20 is the required sample
size based on the rule of 5; 40 is the required sample size
based on the rule of 10; 90 is perhaps just below the
minimum sample size acceptable for LISREL; 200 is a
conservative estimate of minimum sample size based on 5
times the number of LISREL estimated parameters; [12] and 150
is roughly the mid-point between 90 and 200.

[8] A standard equation for the 95% confidence interval around a
proportion p with sample size n is: p ± 1.96 · sqrt(p(1 − p) / n).

[9] To better understand the statistical significance test that Chin
and Newsted and Chin et al. use, it is necessary to understand the
difference between the question answered in our Figures 2a and 2c
(for the 501st researcher, does the 95% confidence interval of
likely Gamma2 path values include zero?) and the question answered
in our Figures 2b and 2d (what proportion of the 500 data sets had
a statistically significant Gamma2 estimate?). Chin and Newsted's
test for statistical significance in their Tables 7 through 11 is
precisely the statistical significance test we show in Figure 2a
and Figure 2c. While it is a test of statistical significance, it
is the significance for the question of whether a 501st researcher
with n = 40 will find a Gamma2 that is positive. It is not an
indication of power, and does not tell us the likelihood of a 501st
researcher finding a statistically significant Gamma2 at this
sample size.

[10] Based on Chin et al.'s four indicators per construct, x → y
path in their Table 7 (p. 204); Chin and Newsted's four indicator,
two latent variables, .2 path in Tables 2 and 6 of their Online
Appendix; and Cassel et al.'s gamma1 in their Table 4 (p. 442).

[11] Note that sample sizes of 20 or 40 will turn out to be too
small, as expected, and not recommended for any technique except
where effect sizes are known to be very strong.

For each of the
five sample sizes, we analyzed the 500 data sets using
multiple regression, [13] PLS-Graph, [14] and LISREL. [15] Results
for Study 1 are displayed in graphical form in Figure 3, with
actual values in Appendix A, Tables A1, A2, and A3.
Simple Model: Arriving at a Solution. We found that
virtually all of our runs with regression and PLS arrived at
viable solutions (i.e., converged with admissible results).
With LISREL, for n = 40, 11 of the 500 data sets did not. For
n = 20, 164 of 500 runs did not. At n = 90 and above, these
problems of LISREL disappeared in our analysis. These are
not surprising results. As is well known, LISREL might not
converge or might produce inadmissible values at smaller
sample sizes; for the most part, regression and PLS do not
share this weakness.
Simple Model: False Positives. All three techniques found
false positives for the false path from Ksi4 to Eta1 roughly
5% of the time. This is exactly as it should be when we fix
the statistical significance hurdle at p < .05. In all of our
studies based on the model in Figure 1, we found no problems
with excessive false positives (Type 1 errors) for any of the
techniques.
Simple Model: Accuracy. Figures 3a (large effect size) and
3b (medium effect size) show differences in accuracy across
the 500 data sets, with details in Appendix A, Table A1. The
small effect size paths are hardly ever detected, at any of our
sample sizes, so they are not displayed.
Accuracy is represented as the "bias" or percent departure
from the true value in the Figure 1 model. Chin and Newsted
(1999) observed that PLS provided estimates for path
coefficients that were more accurate than regression. Our
results in Figures 3a and 3b were similar. It could be argued
that at all of these sample sizes, PLS is arithmetically more
accurate than regression, since the PLS line is above the
regression line. However, the differences are quite small.
Discounting the small effect size path (which never achieved
even a 35% statistical power), there are ten possible accuracy
comparisons (large and medium effect sizes times five
different sample sizes). For only one of these ten was the
mean PLS path estimate statistically significant [16] and higher
than the mean regression path estimate. That was at n = 90
and medium effect size.
More importantly, the LISREL path estimates were consis-
tently more accurate than those of PLS. Again, discounting
the small effect size path, for nine of the ten possible
comparisons the mean LISREL path estimate was statistically
significantly higher than the mean PLS path estimate. (The
exception was at n = 20 and medium effect size.) We also
note that at n = 40 and n = 20, the LISREL estimates should
be discounted because that is far below any recommended
sample size for LISREL. However, PLS and regression don't
fare much better at those sample sizes!
Similar to Chin and Newsted (1999) and Cassel et al. (1999),
we found that for any given sample size, by averaging across
all 500 data sets, inaccuracies of the individual data sets tend
to cancel each other out (some high and some low), and for all
three statistical techniques, the average across 500 data sets
seems to be robust to reductions in sample size, at least down
to n = 40. The picture changes when we look at standard
deviations of the 500 path estimates. These are shown for the
Gamma1 path (large effect size) in Figure 3c and for the
Gamma2 path (medium effect) in Figure 3d. These graphs
give a sense of how "wild" the 500 individual path estimates
can be as sample size goes down. From n = 200 to n = 20, the
standard deviations for regression and PLS increase by a
factor of about 3, and for LISREL by a factor of almost 5.
Even though the bias when averaged across 500 data sets is
robust, the findings for individual data sets are not robust with
respect to sample size. Below about n = 90, the chance of
having an accurate path estimate for an individual sample
decreases markedly for all three techniques.
Simple Model: Detecting Paths That Do Exist. Power is
the proportion (of the 500 data sets) for which the t statistic
for true paths exceeds the p < .05 level. [17]
[12] Different sample size guidelines for LISREL have been
suggested. These include at least 100 (Hair et al. 1998), at least
150 (Bollen 1989), at least 200 (for degrees of freedom of 55 or
lower) (MacCallum et al. 1996), or five times the number of
parameters to be estimated (Bagozzi and Edwards 1998).
[13] We used SAS 9.2.
[14] We used PLS-Graph 3.0, build 1130 (Chin 2001) for all
analyses reported.
[15] We used LISREL with the default, maximum likelihood
estimation, the most common choice. With LISREL and other CB-SEM
approaches, identification is an issue. Figure 1 can be thought of
as two separate models: one model with one construct and 3
indicators and a second model with three constructs and two or more
(in this case, three) indicators each. Each of these is identified,
as per Kline (1998, p. 203).
[16] We used a two sample t test for equal means.
[17] In some conditions (for Study 1 it was the n = 40 and n = 20
conditions), some LISREL data sets did not converge or resulted in
an inadmissible solution. We included these data sets in the
calculation of power, counting them as data sets that did not
detect a significant path. Had we calculated power looking only at
the LISREL data sets that produced admissible solutions, LISREL's
power would have been higher.
Figure 3. Simple Model, Normal Distributions (Each point on the graphs represents 500 data sets at that
condition)
We determined power for regression and LISREL by using the 500 t-statistics
provided by the technique. For PLS we used the boot-
strapping option with 100 resamples for each of the 500
analyses. This means that each reported statistical signifi-
cance value for PLS is based on 50,000 bootstrapping
resamples (500 data sets times 100 resamples). In Appendix
B we address in more depth the reasons for using 100 boot-
strapping resamples instead of a larger number, and provide
a sensitivity analysis to check the impact on our results.
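For concreteness, here is a sketch of the bootstrap t computation (ours; PLS-Graph does this internally). The estimator passed in would be the full PLS fit returning the path of interest; the toy estimator at the bottom is only a placeholder to make the sketch runnable.

```python
import numpy as np

def bootstrap_t(data, estimate_path, n_boot=100, rng=None):
    rng = rng or np.random.default_rng()
    n = data.shape[0]
    theta_hat = estimate_path(data)
    boots = np.array([estimate_path(data[rng.integers(0, n, size=n)])
                      for _ in range(n_boot)])  # resample rows with replacement
    return theta_hat / boots.std(ddof=1)        # t = estimate / bootstrap SE

# Placeholder usage: slope of column 1 on column 0 for bivariate noise.
demo = np.random.default_rng(3).normal(size=(40, 2))
slope = lambda d: np.polyfit(d[:, 0], d[:, 1], 1)[0]
print(bootstrap_t(demo, slope))  # significant at p < .05 when |t| exceeds ~2
```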
Figures 3e (large effect size) and 3f (medium effect size)
show the power results for the three techniques as the sample
size decreases (right to left) from 200 to 20. Details for the
power analysis are shown in Appendix A, Table A2. To give
a point of comparison, given our base model (Figure 1), a
medium effect size, n = 40, and looking across 500 data sets,
Cohen (1988) would predict a power of about .44, with the
95% confidence interval going from about .40 to about .48.
Similarly, for a large effect size and n = 40, Cohen would
predict power between .78 and .84.
Three things are immediately clear from Figures 3e and 3f.
First, there is almost no difference between the techniques in
their ability to detect true paths for n = 40 and above: the
three lines are almost on top of each other. Only at n = 20 for
the medium effect size (Figure 3f) are the differences great
enough to be statistically significant. At that sample size,
regression's power of 22% is statistically significantly higher
than PLS's power of 15%, which is statistically significantly
higher than LISREL's power of 10%. Second, all of these
values are abysmal: none are even close to the recommended
power value of 80%. Third, Cohen's predictions of power are
remarkably accurate, taking into account effect size, the
overall model, and n.
From Figures 3e and 3f, it is clear that sample sizes smaller
than 90 can make it difficult to detect a medium effect size,
regardless of what technique is used. Following the rule of 10
here (n = 40) with a medium effect size produces a power of
only about 40%. The rule of 5 produces a power of about
20%. The striking result of this analysis is that all three
techniques achieve about the same power at any reasonable
sample size (e.g., 90 or above), and they also achieve nearly
the same level of power at lower sample sizes (e.g., 20 or 40).
If any technique has a power advantage at low sample sizes
it is regression, though all are unacceptable for a medium
effect size at n = 40 or n = 20.
Simple Model: Summary. In our simulation with normally
distributed data, we do not observe any advantage of PLS
over the other two techniques at sample sizes of 90 or above,
nor do we observe any advantage over regression at smaller
sample sizes for either accuracy or power. Clearly, for all
three techniques, individual estimates become increasingly
"wild" as sample size decreases. In moving from n = 200 to n
= 20, the standard deviations are increased by a factor of at
least 2.5. Therefore, none of the techniques reached an
acceptable level of power for n = 20, and only a large effect
size had acceptable power for n = 40. Instead of PLS demon-
strating greater efficacy, the dominant finding from Study 1
is that all three techniques seem to have remarkably similar
performance, and respond in remarkably similar ways to
decreasing sample size. The only difference suggested is the
greater path estimate accuracy of LISREL.
Study 2: Simple Model and Non-Normal Data
Although PLS may not have an advantage with normally
distributed data, much data in behavioral research is not
normally distributed (Micceri 1989). It may be that the
advantage of PLS is only apparent with non-normally
distributed data. In Study 2 we tested the impact of non-
normal data on the efficacy of our three techniques, gener-
ating our non-normal data using Fleishman's (1978) tables
and approach. Results for Study 2 are displayed in graphical
form in Figure 4. Appendix C has a detailed report of our
findings for the simple model and non-normal data with actual
values in Appendix A, Tables A4 for path estimates and A5
for power. Below we summarize what we found.
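For readers unfamiliar with the method: Fleishman generates a non-normal variable as a cubic polynomial in a standard normal variate, Y = a + bZ + cZ² + dZ³ with a = −c, choosing b, c, d to hit a target skew and (excess) kurtosis. The sketch below is our reconstruction, solving the moment equations numerically instead of reading the coefficients from Fleishman's tables; treating the paper's kurtosis values as excess kurtosis is our assumption.

```python
import numpy as np
from scipy.optimize import fsolve

def fleishman_coeffs(skew, kurt):
    # Fleishman's (1978) moment equations for unit variance, target skew,
    # and target excess kurtosis.
    def eqs(p):
        b, c, d = p
        return [b**2 + 6*b*d + 2*c**2 + 15*d**2 - 1,
                2*c*(b**2 + 24*b*d + 105*d**2 + 2) - skew,
                24*(b*d + c**2*(1 + b**2 + 28*b*d)
                    + d**2*(12 + 48*b*d + 141*c**2 + 225*d**2)) - kurt]
    b, c, d = fsolve(eqs, x0=[1.0, 0.0, 0.0])
    return -c, b, c, d

# The paper's "extreme" condition: skew = 1.8, kurtosis = 3.8.
a, b, c, d = fleishman_coeffs(1.8, 3.8)
z = np.random.default_rng(4).normal(size=10_000)
y = a + b*z + c*z**2 + d*z**3   # non-normal draws used in place of normal noise
```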
Non-Normal Data: Summary. It appears that all three tech-
niques are fairly robust to small to moderate skew or kurtosis
(up to skew = 1.1 and kurtosis = 1.6). However, with more
extremely skewed data (skew = 1.8 and kurtosis = 3.8), all
three techniques suffer a substantial and statistically signifi-
cant loss of power for both n = 40 and n = 90 (the two sample
sizes we tested). For example with n = 90 and medium effect
size, regression's power is 76% with normal data, but drops
to 53% for extremely skewed data. Under the same condi-
tions PLS's power drops from 75% to 48%, while LISREL's
drops from 79% to 50%. Although some have argued that
new estimator options are what makes CB-SEM robust to
non-normality (Gefen et al. 2011), we note that we used
maximum likelihood estimate in all of our LISREL analyses.
This suggests that, at least for our simple model, CB-SEM
with maximum likelihood is as resistant to departures from
normality as regression or PLS.
The overall results for the non-normal data do not contradict
what was observed in Study 1. All three techniques seem to
respond in approximately the same way to non-normality.
Regardless of the distribution of the data, none of the tech-
niques has sufficient power to detect a medium effect size at
n = 40. Only at n = 90 do any of the techniques have approxi-
mately an 80% power to detect a medium effect size, and that
is not changed by moderate departures from normality for any
of the techniques.
Study 3: Number of Indicators, Reliability
The data in Studies 1 and 2 utilized three indicators per
construct with loadings of 0.70, 0.80, and 0.90. However,
researchers doing field studies often have a greater number of
indicators, and empirical work has demonstrated that the
number of indicators used in the analysis can impact the path
estimates of PLS (McDonald 1996). It seems important,
Figure 4. Simple Model, Non-Normal Distributions (Each point on the graphs represents 500 data sets
at that condition)
therefore, to extend our previous study by examining the
impact, if any, that changes in the number of indicators have
on the results.
In addition, the reliability of the constructs in the Study 1
model can be considered comfortably high (Cronbach's alpha
= 0.84). To test the impact of less reliable and more diverse
indicators, we generated two additional groups of data sets,
with lower and higher reliabilities (Cronbach's alpha = .74
and .91, respectively). A write-up for all of the Study 3
findings is included in Appendix C, and is summarized below.
Details are in Appendix A, Tables A6 (path estimates) and A7
(power) for the six indicator model, and A8 (path estimates)
and A9 (power) for the lower loadings model.
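The quoted reliabilities can be checked directly from the loadings. The sketch below (ours) computes Cronbach's alpha from the covariance matrix implied by standardized indicators; the assumption that the six-indicator condition simply doubles the (.7, .8, .9) loading pattern is our own, but it reproduces the .91 reported.

```python
import numpy as np

def alpha_from_loadings(loadings):
    lam = np.asarray(loadings, dtype=float)
    k = lam.size
    cov = np.outer(lam, lam)   # implied covariances of unit-variance indicators
    np.fill_diagonal(cov, 1.0)
    return (k / (k - 1)) * (1 - k / cov.sum())

print(alpha_from_loadings([.7, .8, .9]))      # ~0.84 (Studies 1 and 2)
print(alpha_from_loadings([.6, .7, .8]))      # ~0.74 (lower-reliability data sets)
print(alpha_from_loadings([.7, .8, .9] * 2))  # ~0.91 (six-indicator data sets)
```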
Number of Indicators and Reliability: Summary. The
doubling of the number of indicators from three to six per
construct gave higher scale reliability (from .84 to .91) and
therefore each technique had slightly higher power. However,
it had no effect on the relative performance of the three
techniques for accuracy or power and none achieved 80%
power at n = 40. Similarly, changing the indicator loadings
from (.7, .8, .9) to (.6, .7, .8) reduced the reliabilities from .84
to .74, and caused all three techniques to lose power.
However (as with the earlier studies) this also did not change
the relative performance. Our results suggest that changing
by moderate amounts the number of indicators or indicator
loadings (both of which affect reliability) does not produce
any evidence of a PLS advantage over regression or LISREL.
Study 4: A More Complex Model
The models that we used in Studies 1, 2, and 3 are quite
simple, with four independent variables and one dependent
variable. PLS may have more relative advantage when
employed with more complicated models. For example, Chin
and Newsted (1999) used a model comprised of multiple
constructs predicting a focal construct which then predicted
multiple dependent constructs. [18]
To investigate model complexity, we created a more complex
model as shown in Figure 5. It contains seven constructs that
can be partitioned into three interconnected submodels.
Starting from the bottom and working up, we see that Ksi3
and Eta3 together cause variation in Eta4, but in this case
there is no direct link between Ksi3 and Eta4. Therefore Eta3
fully mediates the relationship between Ksi3 and Eta4.
Moving to the middle of the diagram and looking at Ksi2,
Eta2, and Eta4, we see that Eta2 only partially mediates the
relationship between Ksi2 and Eta4, as there is also a direct
relationship between Ksi2 and Eta4. Finally, at the top of the
diagram looking at Ksi1, Eta1, and Eta4, we see that Eta1 does
not mediate the relationship between Ksi1 and Eta4, as there
is no direct relationship between Eta1 and Eta4. All paths
have approximately a medium effect size. See the note in
Figure 5 for more detail.
As before, we generated 500 data sets from the model for
each sample size condition (n = 20, 40, 90, and 150), and
analyzed each sample using all three techniques. Note that
we decided not to test the results at n = 200, since all of our
earlier tests had high power at n = 150 and there was not
much difference in moving from n = 150 to n = 200. Also
note that applying the rule of 10 would give a minimum
sample size of 60 for PLS for this model.
We should note that Reinartz et al. (2009) used a fairly com-
plex model and compared the accuracy and statistical power
of PLS and CB-SEM at sample sizes of 100 and greater.
They concluded that CB-SEM was more accurate, but that
PLS had more statistical power. Since these statistical power
results are quite different from our own results in this paper,
it raises the question of whether the statistical power of either
or both PLS and CB-SEM are highly variable depending upon
the particular model being analyzed. We also note that
Reinartz et al. did not test for false positives.
Arriving at a Solution. The findings for the complex model
are quite similar to those for the simpler model. For n = 90
and 150, all techniques arrived at a solution. For n = 20 and
40, LISREL had the expected difficulties; 99 of the n=20 and
two of the n = 40 LISREL runs did not produce a solution.
Accuracy. There are seven true paths in Figure 5 and two
zero (non-existent) paths. The full results for the individual
paths in the complex model are shown in Appendix A, Tables
A10 (accuracy) and A11 (power). To conserve space (and be
respectful of the reader's stamina), for accuracy and statistical
power we will look only at two of the seven paths (Ksi3 →
Eta3 and Eta3 → Eta4), with these results shown in Figure 6.
These two paths are quite representative of all seven true
paths. In Figure 7 we move the level of abstraction up and
look at statistical power as averages across all seven true
paths and both false paths.
As shown in Figures 6a and 6b, for both the Ksi3 → Eta3 and
Eta3 → Eta4 paths, PLS had a slight advantage over
regression in terms of bias (path accuracy), but as before
LISREL had the least bias. We note that at n = 20, LISREL's
bias advantage largely disappeared.
Although the average bias across 500 data sets was reason-
ably robust to decreases in sample size, the standard deviation
across the 500 path estimates was not, repeating what was
found in the simpler model. For all techniques, as the sample
size goes down, the average path estimate standard deviations
go up, as can be seen in Figure 6c and 6d. For all three tech-
niques and for both the paths displayed, the standard deviation
increased by a factor of about 2.5 as sample size dropped
from 150 to 20.
Statistical Power. In terms of statistical power, for n = 90
and above, the three techniques were remarkably similar for
both the Ksi3 → Eta3 and Eta3 → Eta4 paths: the three lines
essentially blur together, as can be seen in Figures 6e and 6f.
At n = 40 and below, none of the techniques has acceptable
power, but statistically significant differences do begin to
appear. However, they are inconsistent. At n = 40, for the
Ksi3 → Eta3 path, PLS's power (59%) is significantly larger
than LISREL's (53%). But for the Eta3 → Eta4 path,
LISREL's power (43%) is significantly larger than PLS's
(35%). At n = 20, again there are inconsistent significant
differences in the power. For the Ksi3 → Eta3 path, PLS's
power (37%) is significantly larger than both LISREL's and
regression's, but for the Eta3 → Eta4 path, regression's power
(22%) is significantly larger than PLS's (19%) or LISREL's
(15%). We note that at n = 40 and below, none of the tech-
niques has even close to acceptable power.
[18] As in their simpler model, Chin and Newsted found that, using
their more complex model (their Figure 4), the mean bias for the
PLS path estimates was again quite robust to reductions in sample
size. Although they did not focus on it, the standard error of
those path estimates generally doubled for all paths as sample size
was reduced from 200 to 50. See their Online Appendix, Tables
12-15.
Note: Starting from the bottom of the figure and working up, we see that Ksi3 and Eta3 together can cause variation in Eta4, but in this
case there is no direct link between Ksi3 and Eta4. Therefore, Eta3 fully mediates the relationship between Ksi3 and Eta4. Looking at
Ksi2, Eta2, and Eta4, we see that Eta2 only partially mediates the relationship between Ksi2 and Eta4, since there is also a direct
relationship between Ksi2 and Eta4. Finally, looking at Ksi1, Eta1, and Eta4, we see that Eta1 does not mediate the relationship between
Ksi1 and Eta4, as there is no relationship between Eta1 and Eta4.
The actual relationships between the Ksi2, Eta2, and Eta4 constructs are all created to have a medium effect size of .15 (à la Cohen 1988).
The actual relationships between Ksi1, Eta1, and Eta4 are all slightly less than a medium effect size at .12, while the relationships between
Ksi3, Eta3, and Eta4 are all slightly more than a medium effect size at .18. Although not shown, all constructs are measured with three
indicators having .7, .8, and .9 loadings, respectively.
Figure 5. A More Complex Model, the Basis for Study 4 (With three types of mediation)
False Positives. The paths from Eta1 to Eta4 and from Ksi3
to Eta4 are non-existent, allowing us to test for false positives.
The proportion of false positives for these two paths is shown
in Figures 6g and 6h. Here we see something unexpected.
Although no technique consistently shows up as better or
worse than the others, the 95% confidence interval around 5%
was exceeded (i.e., higher than 6.9% false positives) at least
once for each technique. This suggests that in more complex
models, false positives may become a concern.
Average Power Across All Seven True Paths. Given the
inconsistencies in the dominant technique for the power of
individual paths at sample sizes of less than n = 90, a look at
the behavior averaged across all seven true paths in the com-
plex model becomes of interest. These averaged power
values [19] are shown in Figure 7a, and at the bottom of Appen-
dix A, Table A11. As expected, for sample sizes n = 90 and
above, the values for the average of the seven true paths are
almost on top of one another, with no statistically significant
differences.
[19] Each of these power numbers represents a proportion out of
3,500 trials (7 × 500), with a 95% confidence interval of plus or
minus 1.3%, 1.6%, 1.3%, and .7% for proportion values of 20%, 40%,
80%, and 95%, respectively.
Figure 6. Complex Model: Selected Paths (Each point on the graphs represents 500 data sets at that
condition)
Figure 7. Complex Model: Averages Across Seven True and Two False Paths (Each point on the graph
represents 3,500 (7a) or 1,000 (7b) data sets)
However, at n = 40, regression is significantly
lower than the other techniques (41.2% versus 43.0%), while
at n = 20, regression is significantly higher than the other
techniques (23.2% versus 19% and 15%). Given that below
n = 90 none of these power values are even close to the target
of 80%, these inconsistent differences may not be very
important.
Average False Positives. Similarly, to increase the level of
abstraction on the erratic picture of false positives, we aver-
aged across both false paths as shown in Figure 7b and Table
A11 of Appendix A. Each combined power score represents
a proportion of false positives (out of 1,000). With 1,000
trials, [20] the 95% confidence interval around .05 goes from
3.6% to 6.4%. From Figure 7b, it can be seen that across all
sample sizes regression just barely stays within the safe range of
false positives. For sample sizes of 90 and above, both
LISREL and PLS have too many false positives, especially at
n = 90 where PLS has 7.5% and LISREL 7.1% false positives
(see Table A11). These values are worthy of concern, since
false positives threaten our ability to draw correct conclusions
from our statistical methods.
For n = 20 and n = 40, LISREL's proportion of false positives
increases to about 10%; regression's jumps up but then drops
for n = 20; and PLS's drops to an average of 4%. Thus both
LISREL and regression have excessive false positives at n =
40 and below, while PLS has an acceptable number. We do
note that PLS's average power at these sample sizes is quite
low (43% at n = 40 and 26% at n = 20).
Complex Model with Non-Normal Data. There is also the
possibility that a different picture might emerge if the com-
plex model were analyzed with non-normal data, so we tested
this as well. See Table A12 in Appendix A for the specific
results. In fact, all three techniques have about the same per-
centage drop in power with extremely non-normal data,
regardless of whether the simple or the complex model is
used. If any technique has the advantage, it is regression, for
both the simple and the complex model. PLS has no apparent
advantage when using more complex models and non-normal
data.
Complex Model Summary. The most prominent finding for
the complex model is that, for the most part, the results
closely mirror what we found with the simpler model.
LISREL has a slight advantage in accuracy across the board.
For power, there are differences between PLS versus regres-
sion or LISREL on particular paths of the model: some paths
consistently seem to favor PLS, others LISREL. However, an
average across all seven true paths gives statistically
indistinguishable power for all three techniques for both n =
90 and n = 150.
For n = 40, PLS and LISREL power values are identical, and
statistically larger than regression's, although all values are
below 43%. For n = 20, regression's power is significantly
higher than PLS's and LISREL's, but all values are below 25%.
Interestingly, at each sample size except n = 20, the average
power for the complex model (with its seven approximately
medium effect sized paths) is within a percent or two of the
medium effect size power for the simpler model.
[20] The confidence interval around a 5% proportion based on 1,000
trials is +/- 1.4%.

However, there is one striking difference from the simpler
model. With the complex model both PLS and LISREL have
an unsettling number of false positives, especially at sample
sizes of 90 where PLS has 7.5% and LISREL 7.1%. The
difference between these values and the allowable 5% is
statistically significant. We attribute the somewhat higher
number of false positives to the fact that in each case, the
false predictor constructs were correlated with the true
predictor constructs. This creates some amount of multi-
collinearity and may make the results less stable (see
Goodhue et al. 2011a). This condition may often be present
in more complex models and is a potential concern.
Overall, the results suggest that small sample size and non-
normality have the same effect in the complex model that they
do in the simple model. We again see no advantage for PLS,
except fewer false positives at sample sizes of 40 or less,
where power is below 50% for all techniques.
Post Hoc Analysis: Accuracy Differences [21]
Across all of our studies, LISREL seemed to have the advantage in terms of path estimate accuracy. For the simple model, Figure 8a shows the average bias at different values of effect size and reliability. LISREL was closer to the true value (bias closer to zero) than PLS in 24 of 27 comparisons (the comparisons cover n = 90, 150, and 200 for each of the three levels of reliability [.74, .84, .91] and each of the three effect sizes [large, medium, and small]; the overall results do not change if we also include the other possible sample sizes, n = 20 and n = 40 where applicable). Looking just at LISREL and PLS (the two leading contenders), and taking as the null hypothesis that either technique had a 50% chance of being the most accurate in every comparison, we can use the binomial distribution to ask: what is the likelihood of having only zero, 1, 2, or 3 comparisons favoring PLS out of 27 trials under those assumptions? With this nonparametric test, we reject the hypothesis that LISREL is no more accurate than PLS, with a p-value of .00002. The picture is similar if we look at the results from the non-normal data, or from the more complex model.
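To make the arithmetic behind that p-value concrete, the binomial tail probability can be computed directly; the SAS step below is a minimal sketch (the .00002 reported above is the probability of 3 or fewer successes for a Binomial(27, .5) variable):

data _null_;
  /* Probability that 3 or fewer of 27 accuracy comparisons favor PLS,
     if each technique has a 50% chance of winning any comparison */
  p = cdf('binomial', 3, 0.5, 27);
  put p= best12.;  /* approximately 0.0000246 */
run;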
In fact, Figure 8a shows PLS to be quite a bit more similar to regression than it is to LISREL in terms of accuracy. Given the apparent fact of LISREL's accuracy advantage over PLS (and regression), a reasonable question to ask is, why? A
possible answer comes from other work we have been con-
ducting that suggests that the level of measurement reliability
could hold the answer. It is generally accepted that regression
does not account for measurement reliability in its path
estimates, but that LISREL does. Regression path estimates
are attenuated by measurement error, according to the
following equation from Nunnally and Bernstein (1994, pp.
241, 257):
Apparent Correlation_XY = Actual Correlation_XY × sqrt(Reliability_X × Reliability_Y)
Given this equation, and knowing the reliabilities of all of the
constructs in our various studies, we should be able to
adjust the attenuated regression path estimates to the correct
value using the reliability of the constructs in our model.
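As a concrete sketch of that adjustment (re-using the n = 200 regression estimate for Gamma1 from Table A1, where each construct has a reliability of .84):

data _null_;
  /* Disattenuate an observed path estimate using the reliabilities
     of the two constructs (Nunnally and Bernstein 1994) */
  observed = 0.392;             /* regression estimate, Table A1, n = 200 */
  rel_x = 0.84;  rel_y = 0.84;  /* Cronbach's alpha for each construct */
  corrected = observed / sqrt(rel_x * rel_y);
  put corrected= 5.3;  /* approximately .467, close to the true .480 */
run;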
Figure 8. Bias for Effect Size and Alpha, Uncorrected and Corrected (Each point on the graphs represents 1,500 data sets at that condition)

Figure 8b shows both PLS and regression path estimates corrected for measurement error and LISREL estimates unchanged. Looking at Figures 8a and 8b suggests the
following: The negative bias of PLS and regression is propor-
tional to the reliability of the constructs, and is essentially
corrected when Nunnally and Bernstein's equation is used. The only time this is not true is when a small effect size is involved, in which case the reliability correction sometimes seems to overcorrect for PLS. Figures 8a and 8b suggest that
PLS is more similar to regression than it is to LISREL in the
way in which it compensates for measurement error. In
retrospect, this is not so surprising. It is consistent with both
McDonald (1996, pp. 266-267) and Dijkstra (1983, p. 81)
who suggest that when dealing with multiple indicators for
each construct, as measurement error goes up, both PLS and
regression will suffer increasingly in terms of accuracy in
comparison with LISREL.
Consider the PLS process in mode A (when all constructs
have reflective indicators, which is what our study involves).
As we said earlier in this note, we can think of the PLS
process as having three major steps. The first step is to iterate
through a process to determine the optimal indicator weights
for each construct. The second step uses those weights to
calculate construct scores which are then used in ordinary
least squares regression to determine path estimates. The
third step is to determine the standard deviations of those path
estimates with bootstrapping.
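A minimal sketch of step 2 may make this concrete (the data set names and indicator weights below are purely illustrative, not the weights PLS would actually produce): once step 1 has fixed the weights, everything that follows is ordinary least squares on weighted composites.

data scores;
  set indicators;    /* hypothetical data set with indicators x1-x3, y1-y3 */
  /* Step 2: construct scores as weighted sums of the indicators,
     using the weights produced by step 1 (illustrative values here) */
  ksi = 0.30*x1 + 0.35*x2 + 0.40*x3;
  eta = 0.30*y1 + 0.35*y2 + 0.40*y3;
run;

proc reg data=scores;
  model eta = ksi;   /* OLS on the composite scores yields the path estimate */
run;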
We would submit that if PLS has somehow taken into account
measurement error in determining its path estimates, then it
must have done so in step 1. This has to be the case, since
beyond this point (in step 2) all path estimates are determined
by ordinary least squares regression, which is known to not
compensate for measurement error. But all that comes out of
step 1 is the appropriate weights for the indicators. If PLS
compensates for measurement error, it must do so by
assigning just the right combination of weights. We would submit that unless at least one indicator measures the construct without error, there is no conceivable weighting scheme
that will overcome measurement error. In fact, our empirical
evidence seems to suggest that PLS path estimates (like
regression path estimates) do not compensate for measure-
ment error at all, while LISREL path estimates do.
We recognize this may be a controversial statement, since it
is counter to the widespread belief among MIS researchers.
However we are not the first to suggest it. Marcoulides et al.
(2009, p. 172) commented:
We note that in cases where the model errors are not
explicitly taken into account for the estimation of
endogenous latent variables, a new approach pro-
posed by Vittadini et al. (2007) would need to be
used to appropriately determine the PLS model esti-
mates. This is because in PLS [for reflective constructs] the model errors are not taken into account.
Our work is a bit more explicit than the Marcoulides et al. comment: we show that PLS is like regression in ignoring
measurement error in determining its path estimates, and that
the same correction (Nunnally and Bernstein 1994) can be
used in either PLS or regression to address this weakness.
Our findings on measurement error also suggest a different perspective on Wold's (1982) assertion that PLS has "consistency at large." He suggests that given an infinite number of indicators and an infinite sample size, PLS path estimates will
have no bias. We would point out that if there were an infinite number of indicators, then Cronbach's alpha would be 1, there would be no measurement error, and Wold's claim and ours would be the same: no measurement bias in the PLS path estimates, nor in the regression estimates, nor in the LISREL estimates. This suggests that saying that PLS has "consistency at large" is not actually a unique or useful selling point.
Limitations and Opportunities for
Future Research
As with any study, there are a number of limitations that may
lead to opportunities for future research. For example, most
of the data generated for use in this study were designed to
have relatively high indicator loadings with few cross-
loadings. While such well-behaved data created a level
playing field, actual field data often exhibits more challenging
characteristics. Future studies could be designed to test the
three techniques across a variety of other data conditions,
including indicators that cross-load (i.e., load on constructs
other than the one they are intended to measure), or constructs
exhibiting multicollinearity. Furthermore, future studies
could examine the testing of more complex models involving
formative measurement (Barclay et al. 1995; Chin 1998;
Petter et al. 2007).
In addition, it is important to recognize that all of our analyses
with the more complex model involved approximately
medium effect sizes. We do not expect that the relative
performance between the three techniques would change
much with different effect sizes, however.
Conclusion
The belief that PLS has special powers at small sample size or with non-normal distributions is strongly and widely held in the MIS research community.
Our study, however, found no advantage of PLS over the
other techniques for non-normal data or for small sample size
(other than the universally stated concern that with smaller
samples, LISREL may not converge). This should not be
surprising; all of these techniques can be thought of as
attempts to make meaningful statements about latent variables
in a population on the basis of a small sample. Even if a true
random sample is drawn (which already suggests problems),
basic statistics tells us that the smaller the N, the less sure we
can be about our estimates.
Earlier in this paper, we indicated that one of our primary
goals was to provide guidance to researchers in terms of
designing studies, selecting statistical analysis techniques, and
interpreting results. We do so here.
Study Design
Although many have argued that there are important differ-
ences between the efficacies of the three techniques under
certain conditions, in our studies what is much more prom-
inent than the differences is the surprising similarity of the
results. In our studies, all suffered from increasingly large
standard deviations for their path estimates as sample sizes
decreased, and from the resulting drop in statistical power.
None could successfully detect medium effect sizes on a
consistent basis using sample sizes recommended by the rule
of 10 or the rule of 5. All techniques showed substantial (and
about equal) robustness in response to small or moderate
departures from normality; all showed significant losses in
response to extreme non-normality.
Recommendation: When determining the minimum sample size to obtain adequate power, use Cohen's approach (regardless of the technique to be used). Do not rely on the rule of 10 (or the rule of 5) for PLS. It also might be fruitful for
social science researchers to more precisely identify the
amounts and types of non-normality, so we know better at
what point such problems become threatening.
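As one illustration of Cohen's approach (the numbers here are assumptions for the sake of the example, not values from our studies), SAS's PROC POWER can solve for the total sample size needed to detect a single predictor of roughly medium effect size among four predictors at 80% power:

proc power;
  /* Sample size to detect one predictor (partial correlation .30)
     among four predictors at alpha = .05 and 80% power */
  multreg model = random
    nfullpredictors = 4
    ntestpredictors = 1
    partialcorr = 0.30
    alpha = 0.05
    power = 0.80
    ntotal = .;
run;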
Interpretation of Results
We do need to consider what our findings for PLS and small
sample size might mean in terms of existing published
research that found statistically significant results using PLS
with small sample size. First, with the simple model, none of
the techniques showed excessive false positives. Even though
with the more complex model we did find evidence of greater
than a 5% occurrence of false positives with all three tech-
niques, with that model and smaller sample size PLS had the
smallest occurrence of false positives. Therefore, overall
there is nothing in our findings to suggest that any previously
reported statistically significant results found with PLS and
small sample sizes are suspect. (We note that we did not test PLS using an alternate approach for determining statistical significance employed by Majchrzak et al. [2005]; we are skeptical of the viability of that approach.) We also note that our findings suggest that regression would be equally likely to find such statistically significant relationships at those small sample sizes. On the other hand, studies using PLS and small sample
sizes that failed to detect a hypothesized path (e.g., Malhotra
et al. 2007), may well have a false negative.
Recommendation: In studies where small sample sizes were
used (with any of the techniques, including PLS) and a
hypothesized path was not observed to be statistically signi-
ficant, this should not be interpreted as a lack of support for
the hypothesis. In these cases, further testing with larger
sample sizes is probably warranted.
Selecting a Technique
One important difference between the techniques did stand
out. This is the suggestion that PLS, unlike LISREL, seems
not to compensate for measurement error in its path estimates.
In fact, PLS seems to be quite comparable to regression in all
its performance measures, including accuracy of path esti-
mates. If accuracy is a major concern, both PLS and regres-
sion have poorer performance than LISREL. The magnitude
of the difference in path estimates, however, was not great.
Recommendation: Here, interestingly, the three authors of
this note are not in complete agreement. One point of view
suggests that if one is in the early stages of a research investi-
gation and is concerned more with identifying potential
relationships than the magnitude of those relationships, then
regression or PLS would be appropriate. As the research
stream progresses and accuracy of the estimates becomes
more important, LISREL (or other CB-SEM techniques)
would likely be preferable.
The second point of view suggests the following: Both
regression and CB-SEM techniques have received a great deal
of attention from statisticians over the years, and their advan-
tages and limitations are generally well established. PLS has
received much less attention (at least until more recently), and
hence we are still learning about its properties and how it
behaves under various conditions. In addition to the evidence
presented here (with respect to a loss of power at small
sample sizes, etc.), some recent studies have provided preliminary evidence of other potential shortcomings, such as
greater susceptibility to multicollinearity problems than CB-
SEM and regression (Goodhue et al. 2011b), lower construct
validity when measurement errors are correlated across con-
structs (Rönkkö and Ylitalo 2010), and an inability to detect
mis-specified models (Evermann and Tate 2010). As a result,
one could argue that researchers would be well advised to use
PLS with caution (until more is known about its properties),
and to rely on more well-established techniques under most
circumstances.
In conclusion, we want to stress that in our empirical work,
even though it seems to have no special abilities with respect
to sample size or non-normality, PLS did not perform worse
than the other techniques in terms of statistical power and
avoidance of false positives. These are perhaps the most
important performance attributes of hypothesis testing. PLS
is still a convenient and powerful technique that is appropriate
for many research situations. For example, with complex
research models, PLS may have an advantage over regression
in that it can analyze the whole model as a unit, rather than
dividing it into pieces. However, we found that PLS certainly
is not a silver bullet for overcoming the challenges of small
sample size or non-normality. At reasonable sample sizes,
LISREL has equal power and greater accuracy.
Acknowledgments
Professor Thompson would like to acknowledge the generous
financial support of the Wake Forest Schools of Business in helping
to complete this research project.
References
Barclay, D., Higgins, C., and Thompson, R. 1995. The Partial
Least Squares (PLS) Approach to Causal Modeling: Personal
Computer Adoption and Use as an Illustration, Technology
Studies (2:2), pp. 285-309.
Bagozzi, R. P., and Edwards, J. R. 1998. A General Approach for
Representing Constructs in Organizational Research, Organi-
zational Research Methods (1:1), pp. 45-87.
Bollen, K. A. 1989. Structural Equations with Latent Variables,
New York: Wiley.
Cassel, C., Hackl, P., and Westlund, A. 1999. Robustness of
Partial Least-Squares Method for Estimating Latent Variable
Quality Structures, Journal of Applied Statistics (26:4), pp.
435-446.
Chin, W. W. 1998. The Partial Least Squares Approach to Struc-
tural Equation Modeling, in Modern Methods for Business
Research, G. A. Marcoulides (ed.), London: Psychology Press,
pp. 295-336.
Chin, W. W. 2001. PLS Graph Users Guide, Version 3.0,
Houston, TX: Soft Modeling, Inc.
Chin, W. W., Marcolin, B., and Newsted, P. 2003. A Partial Least
Squares Latent Variable Modeling Approach for Measuring
Interaction Effects: Results from a Monte Carlo Simulation
Study and an Electronic-Mail Emotion/Adoption Study, Infor-
mation Systems Research (14:2), pp. 189-217.
Chin, W. W., and Newsted, P. R. 1999. Structural Equation
Modeling Analysis with Small Samples Using Partial Least
Squares, in Statistical Strategies for Small Sample Research, R.
Hoyle (ed.), Newbury Park, CA: Sage Publications, pp. 307-341.
Cohen, J. 1988. Statistical Power Analysis for the Behavioral
Sciences, Hillsdale, NJ: Lawrence Erlbaum Associates.
Dijkstra, T. 1983. Some Comments on Maximum Likelihood and
Partial Least Squares Methods, Journal of Econometrics (22),
pp. 67-90.
Evermann, J., and Tate, M. 2010. Testing Models or Fitting
Models? Identifying Model Misspecification in PLS, in
Proceedings of the 31st International Conference on Information
Systems, St. Louis, MO, December 12-15.
Falk, R. F., and Miller, N. B. 1992. A Primer for Soft Modeling,
Akron, OH: University of Akron Press.
Fleishman, A. I. 1978. A Method for Simulating Non-Normal
Distributions, Psychometrika (43:4), pp. 521-532.
Fornell, C. 1984. A Second Generation of Multivariate Analysis:
Classification of Methods and Implications for Marketing
Research, Working Paper, University of Michigan.
Fornell, C., and Bookstein, F. 1982. Two Structural Equation
Models: LISREL and PLS Applied to Consumer Exit-Voice
Theory, Journal of Marketing Research (19), pp. 440-452.
Fornell, C., and Larcker, D. 1981. Evaluating Structural Equation
Models with Unobservable Variables and Measurement Error,
Journal of Marketing Research (18), pp. 39-50.
Gefen, D., Rigdon, E., and Straub, D. 2011. Editors Comments:
An Update and Extension to SEM Guidelines for Administrative
and Social Science Research, MIS Quarterly (35:2), pp. iii-xiv.
Gefen, D., Straub, D., and Boudreau, M. C. 2000. Structural
Equation Modeling and Regression: Guidelines for Research
Practice, Communications of the Association for Information
Systems (4:Article 7).
Goodhue, D., Lewis, W., and Thompson, R. 2006. Small Sample
Size and Statistical Power in MIS Research, in Proceedings of
the 39th Hawaii International Conference on System Sciences, R.
Sprague (ed.), Los Alamitos, CA: IEEE Computer Society
Press, January 4-7.
Goodhue, D., Lewis, W., and Thompson, R. 2007. Research Note
Statistical Power in Analyzing Interaction Effects: Questioning
the Advantage of PLS With Product Indicators, Information
Systems Research (18:2), pp. 211-227.
Goodhue, D., Lewis, W., and Thompson, R. 2011a. A Dangerous
Blind Spot in IS Research: False Positives Due to Multi-
collinearity Combined with Measurement Error, in Proceedings
of the 17th Americas Conference on Information Systems, Detroit,
MI, August 4-7.
Goodhue, D., Lewis, W., and Thompson, R. 2011b. Measurement
Error in PLS, Regression and CB-SEM, in Proceedings of the
6th Mediterranean Conference on Information Systems,
Limassol, Cyprus, September 3-5.
Goodhue, D., Lewis., W., and Thompson, R. 2012. Comparing
PLS to Regression and LISREL: A Response to Marcoulides,
Chin, and Saunders, MIS Quarterly (36:3), pp. 703-716.
Green, S. B. 1991. How Many Subjects Does It Take to Do a Regression Analysis?, Multivariate Behavioral Research (26), pp. 499-510.
Hayduk, L. A. 1987. Structural Equation Modeling with LISREL,
Baltimore, MD: Johns Hopkins University Press.
Hair, J. F., Jr., Anderson, R. E., Tatham, R. L., and Black, W. C.
1998. Multivariate Data Analysis with Readings (5th ed.),
Englewood Cliffs, NJ: Prentice Hall.
Hair, J. F., Ringle, C. M., and Sarstedt, M. 2011. PLS-SEM:
Indeed a Silver Bullet, Journal of Marketing Theory and
Practice (19:2), pp. 139-151.
Hair, J. F., Sarstedt, M., Ringle, C. M., and Mena, J. A. 2011. An
Assessment of the Use of Partial Least Squares Structural
Equation Modeling in Marketing Research, Journal of the
Academy of Marketing Science, Online Publication (DOI
10.1007/s11747-011-0261-6).
Kahai, S. S., and Cooper, R. B. 2003. Exploring the Core
Concepts of Media Richness Theory: The Impact of Cue Multi-
plicity and Feedback Immediacy on Decision Quality, Journal
of Management Information Systems (20:1), pp. 263-299.
Kline, R. B. 1998. Principles and Practice of Structural Equation
Modeling, New York: Guilford Press.
Lohmöller, J. B. 1988. The PLS Program System: Latent Vari-
ables Path Analysis with Partial Least Squares Estimation,
Multivariate Behavioral Research (23), pp. 125-127.
MacCallum, R., Browne, M., and Sugawara, H. 1996. Power
Analysis and Determination of Sample Size for Covariance
Structure Modeling, Psychological Methods (1:2), pp. 130-149.
Majchrzak, A., Beath, C. M., Lim, R. A., and Chin, W. W. 2005.
Managing Client Dialogues During Information Systems Design
to Facilitate Client Learning, MIS Quarterly (29:4), pp. 653-672.
Malhotra, A., Gosain, S., and El Sawy, O. 2007. Leveraging
Standard Electronic Business Interfaces to Enable Adaptive
Supply Chain Partnerships, Information Systems Research
(18:3), pp. 260-279.
Marcoulides, G. A., Chin, W. W., and Saunders, C. 2009. Fore-
word: A Critical Look at Partial Least Squares Modeling, MIS
Quarterly (33:1), pp. 171-175.
Marcoulides, G. A., and Saunders, C. 2006. Editors Comments:
PLS: A Silver Bullet?, MIS Quarterly (30:2), pp. iii-ix.
McDonald, R. P. 1996. Path Analysis with Composite Variables,
Multivariate Behavioral Research (31:2), pp. 239-270.
Micceri, T. 1989. The Unicorn, the Normal Curve, and Other
Improbable Creatures, Psychological Bulletin (105:1), pp.
156-166.
Nunnally, J. C., and Bernstein, I. H. 1994. Psychometric Theory
(3rd ed.), New York: McGraw-Hill.
Petter, S., Straub, D., and Rai, A. 2007. Specifying Formative
Constructs in Information Systems Research, MIS Quarterly
(31:4), pp. 623-656.
Reinartz, W., Haenlein, M., and Henseler, J. 2009. An Empirical
Comparison of the Efficacy of Covariance-Based and Variance-
Based SEM, International Journal of Research in Marketing (26), pp. 332-344.
Ringle, C., Sarstedt, M., and Straub, D. 2012. Editors Comments:
A Critical Look at the Use of PLS-SEM in MIS Quarterly, MIS
Quarterly (36:1), pp. iii-xiv.
Rivard, S., and Huff, S. 1988. Factors of Success for End-User
Computing, Communications of the ACM (31:5), pp. 552-561.
Rönkkö, M., and Ylitalo, J. 2010. Construct Validity in Partial Least Squares Path Modeling, in Proceedings of the 31st
International Conference on Information Systems, St. Louis,
MO, December 12-15.
Wold, H. O. 1982. Soft Modeling: The Basic Design and Some
Extensions, in Systems Under Indirect Observation: Causality,
Structure, Prediction, Part II, K. G. Jöreskog and H. Wold (eds.),
Amsterdam: North-Holland.
Vittadini, G., Minotti, S. C., Fattore, M., and Lovaglio, P. G. 2007. On the Relationship Among Latent Variables and Residuals in PLS Path Modeling: The Formative-Reflective Scheme, Computational Statistics and Data Analysis (51:12), pp. 5828-5846.
Wood, R. E., Goodman, F. S., Beckmann, N., and Cook, A. 2008. Mediation Testing in Management Research: A Review and Proposals, Organizational Research Methods (11:2), pp. 270-295.
Zhang, T., Agarwal, R., and Lucas, H., Jr. 2011. The Value of
IT-Enabled Retailer Learning: Personalized Product Recommen-
dations and Customer Store Loyalty in Electronic Markets, MIS
Quarterly (35:4), pp. 859-882.
About the Authors
Dale Goodhue is the MIS Department head, and the C. Herman and
Mary Virginia Terry Chair of Business Administration at the
University of Georgia's Terry College of Business. He has pub-
lished in journals including Management Science, MIS Quarterly,
Information Systems Research, Decision Sciences, and Sloan
Management Review. Dale's research interests include measuring impacts of information systems, the impact of task-technology fit on
individual performance, the management of data and other IS
infrastructures/resources, the impacts of enterprise systems on
organizations, and the strengths and weaknesses of various statistical
techniques.
William Lewis is an independent consultant and a former assistant
professor of MIS in the Department of Management and Information
Systems at Louisiana Tech University. His research has appeared
in several journals including MIS Quarterly and Communications of
the ACM. William's research interests include business continuity
planning, individual technology adoption in organizations, IS
leadership, and research methodology.
Ronald L. Thompson is a professor in the Schools of Business at
Wake Forest University. His research has been published in a
variety of journals, including MIS Quarterly, Information Systems
Research, Journal of Management Information Systems, and
Information & Management. Ron is currently a senior editor for
MIS Quarterly, and formerly served on the editorial board for the
Journal of the Association for Information Systems. He holds a
Ph.D. from the Ivey School at the University of Western Ontario.
His current research interests include managing information systems
projects and the use of advanced data analysis techniques in
Information Systems research.
Table of Contents

Appendix A. Results for Studies
Appendix B. Use of 100 Bootstrapping Resamples
Appendix C. Detailed Presentation, Studies 2 and 3
Appendix D. SAS Program Used to Generate the Data for this Study

Appendix A. Results for Studies
Table A1. Study 1: Path Estimates for Simple Model (3 Indicators)
Table A2. Study 1: Power Results for Simple Model (3 Indicators)
Table A3. Study 1: R² in the Simple Model (3 Indicators)
Table A4. Study 2: Path Estimates for Simple Model, Non-Normal Data
Table A5. Study 2: Power Results for Simple Model, Non-Normal Data
Table A6. Study 3(a): Path Estimates for Simple Model (6 Indicators)
Table A7. Study 3(a): Power Results for Simple Model (6 Indicators)
Table A8. Study 3(b): Path Estimates for Simple Model (3 Indicators, Lower Loadings)
Table A9. Study 3(b): Power Results for Simple Model (3 Indicators, Lower Loadings)
Table A10. Study 4: Path Estimates for More Complex Model
Table A11. Study 4: Power Results for More Complex Model
Table A12. Study 4: Power Results for Complex Model and Non-Normal Data
Appendix A
Results for Studies
Table A1. Study 1: Path Estimates for Simple Model (3 Indicators)*

Gamma1: Large Effect Size
n =            20      40      90      150     200
Actual        .480    .480    .480    .480    .480
Regression    .383    .398    .391    .393    .392
PLS           .388    .403    .399    .400    .399
LISREL        .456†   .486†   .485    .484    .484

Gamma2: Medium Effect Size
n =            20      40      90      150     200
Actual        .314    .314    .314    .314    .314
Regression    .258    .255    .258    .254    .256
PLS           .262    .273    .270    .263    .262
LISREL        .290†   .317†   .320    .315    .314

Gamma3: Small Effect Size
n =            20      40      90      150     200
Actual        .114    .114    .114    .114    .114
Regression    .081    .099    .099    .096    .089
PLS           .090    .106    .115    .107    .099
LISREL        .107†   .116†   .120    .119    .110

Gamma4: No Effect
n =            20      40      90      150     200
Actual        .000    .000    .000    .000    .000
Regression   -.010   -.009   -.001   -.002   -.002
PLS          -.011   -.017    .003   -.002   -.003
LISREL       -.012   -.010    .000   -.004   -.003

*Each construct is measured with three indicators, with loadings of .70, .80, and .90.
†N = 20 or N = 40 is far below any recommended minimum sample size for LISREL.
Table A2. Study 1: Power Results for Simple Model (3 Indicators)

Gamma1: Large Effect Size
n =            20          40          90          150         200
Predicted*    <.46‡        .78         .99         .99         .99
95% C.I.      N/A          (.74, .82)  (.98, 1.0)  (.98, 1.0)  (.98, 1.0)
Regression    .39          .75         .99         1.00        1.00
PLS           .31          .76         .98         1.00        1.00
LISREL        .18†         .69†        .98         1.00        1.00

Gamma2: Medium Effect Size
n =            20          40          90          150         200
Predicted*    <.25‡        .44         .81         .96         .99
95% C.I.      N/A          (.40, .48)  (.78, .84)  (.94, .98)  (.98, 1.0)
Regression    .22          .40         .76         .92         .97
PLS           .15          .41         .75         .92         .97
LISREL        .10†         .37†        .79         .95         .98

Gamma3: Small Effect Size
n =            20          40          90          150         200
Predicted*    <.25‡        <.25‡       <.25‡       <.25‡       .33
95% C.I.      N/A          N/A         N/A         N/A         (.29, .37)
Regression    .07          .09         .16         .27         .29
PLS           .04          .10         .18         .28         .29
LISREL        .06†         .11†        .19         .28         .31

Gamma4: No Effect
n =            20          40          90          150         200
Allowable     .05          .05         .05         .05         .05
95% C.I.      (.03, .07)   (.03, .07)  (.03, .07)  (.03, .07)  (.03, .07)
Regression    .05          .06         .05         .04         .03
PLS           .04          .06         .05         .05         .04
LISREL        .05†         .08†        .05         .04         .04

*The predicted values for the power results are for normally distributed data, and are derived from Cohen (1988).
†N = 20 or N = 40 is far below any recommended minimum sample size for LISREL.
‡Cohen's tables do not go to this low of a sample size.
Table A3. Study 1: R² in the Simple Model (3 Indicators)

              Regression             PLS                    LISREL
Sample Size   R²*    Adjusted R²**   R²*    Adjusted R²**   R²*    Adjusted R²**
20            .37    .20             .50    .36             .52    .39
40            .31    .23             .38    .31             .45    .39
90            .26    .22             .30    .26             .39    .36
150           .25    .23             .27    .25             .37    .35
200           .24    .23             .26    .25             .36    .35

*While regression users typically report adjusted R² values, users of PLS or LISREL typically report the only R² value produced by the software package, the unadjusted R².
**Adjusted R² for PLS and LISREL was calculated using the formula: Adjusted R² = 1 - (1 - R²)(n - i)/(n - p), where i is one if there is an intercept or zero if there is not an intercept; p is the total number of regressors in the model (excluding the constant term); and n is the sample size (SAS User's Guide: Statistics, Version 5, 1985, p. 715).
Table A4. Study 2: Path Estimates for Non-Normal Data (Normal, Average Skew, Moderate Skew, Extreme Skew, High Negative Kurtosis)

          n = 40                                             n = 90
          Normal  Ave.Skew  Mod.Skew  Ext.Skew  Neg.Kurt.    Normal  Ave.Skew  Mod.Skew  Ext.Skew  Neg.Kurt.

Gamma1: Large Effect Size (Actual = .480)
Reg.      .398    .380      .379      .330      .360         .391    .390      .393      .334      .368
PLS       .403    .391      .390      .351      .375         .399    .401      .399      .350      .376
LISREL    .486*   .478*     .500*     .407*     .455*        .485    .487      .484      .428      .461

Gamma2: Medium Effect Size (Actual = .314)
Reg.      .255    .259      .249      .208      .246         .258    .253      .255      .198      .246
PLS       .273    .278      .275      .233      .270         .270    .269      .267      .220      .262
LISREL    .317*   .311*     .413*     .269*     .306*        .320    .321      .312      .254      .312

Gamma3: Small Effect Size (Actual = .114)
Reg.      .099    .089      .096      .075      .084         .099    .098      .089      .099      .068
PLS       .106    .107      .107      .088      .090         .115    .102      .112      .080      .099
LISREL    .116*   .111*     .072*     .107*     .106*        .120    .110      .125      .120      .085

Gamma4: No Effect (Actual = .000)
Reg.      -.009   .002      -.008     .004      -.002        -.001   .002      .001      .000      -.011
PLS       -.017   .006      -.005     -.004     .002         .003    .002      .005      -.002     -.011
LISREL    -.010*  -.005*    .058*     .005*     -.004*       .000    .001      .005      -.004     -.009

*N = 40 is far below any recommended minimum sample size for LISREL.
Table A5. Study 2: Power Results for Non-Normal Data (Normal, Average Skew, Moderate Skew, Extreme Skew, High Negative Kurtosis)

          n = 40                                             n = 90
          Normal  Ave.Skew  Mod.Skew  Ext.Skew  Neg.Kurt.    Normal  Ave.Skew  Mod.Skew  Ext.Skew  Neg.Kurt.

Gamma1: Large Effect Size
Predicted†: .78 for normal distributions at n = 40 (95% C.I. [.74, .82]); .99 at n = 90 (95% C.I. [.98, 1.00])
Reg.      .75     .70       .67       .53       .65          .99     .97       .97       .87       .97
PLS       .76     .71       .68       .51       .65          .98     .97       .98       .88       .97
LISREL    .69*    .64*      .62*      .39*      .54*         .98     .97       .98       .87       .97

Gamma2: Medium Effect Size
Predicted†: .44 for normal distributions at n = 40 (95% C.I. [.40, .48]); .80 at n = 90 (95% C.I. [.77, .85])
Reg.      .40     .39       .37       .28       .36          .76     .74       .76       .53       .71
PLS       .41     .42       .39       .24       .39          .75     .76       .74       .48       .71
LISREL    .37*    .40*      .36*      .22*      .33*         .79     .78       .75       .50       .73

Gamma3: Small Effect Size
Predicted†: < .25 at both n = 40 and n = 90 (95% C.I. N/A)
Reg.      .09     .09       .11       .07       .07          .16     .17       .15       .12       .14
PLS       .10     .13       .10       .07       .08          .18     .17       .19       .12       .15
LISREL    .11*    .10*      .09*      .08*      .08*         .19     .16       .20       .11       .15

Gamma4: No Effect
Allowable: .05 at both sample sizes (95% C.I. [.03, .07])
Reg.      .06     .05       .04       .05       .06          .05     .04       .06       .05       .06
PLS       .06     .07       .06       .06       .07          .05     .07       .06       .06       .06
LISREL    .08*    .07*      .05*      .06*      .06*         .05     .07       .05       .06       .06

*N = 40 is far below any recommended minimum sample size for LISREL.
†All of the predicted values for the power are for normally distributed data, and are derived from Cohen (1988). Obviously, the predicted value may not apply for non-normal data.
Table A6. Study 3(a): Path Estimates for Simple Model (6 Indicators)*

Gamma1: Large Effect Size
n =            40      90      150     200
Actual        .480    .480    .480    .480
Regression    .432    .431    .435    .427
PLS           .431    .432    .438    .430
LISREL        .482†   .482    .486    .477

Gamma2: Medium Effect Size
n =            40      90      150     200
Actual        .314    .314    .314    .314
Regression    .283    .280    .280    .281
PLS           .297    .291    .287    .287
LISREL        .317†   .314    .312    .314

Gamma3: Small Effect Size
n =            40      90      150     200
Actual        .114    .114    .114    .114
Regression    .099    .104    .100    .106
PLS           .113    .121    .115    .121
LISREL        .112†   .118    .111    .118

Gamma4: No Effect
n =            40      90      150     200
Actual        .000    .000    .000    .000
Regression    .006   -.002    .001   -.001
PLS           .004   -.001   -.004   -.005
LISREL        .009†   .002    .001    .009

*Each construct is measured with six indicators: two with loadings set at .70, two at .80, and two at .90.
†N = 40 is far below any recommended minimum sample size for LISREL.
Table A7. Study 3(a): Power Results for Simple Model (6 Indicators)*

Gamma1: Large Effect Size
n =            40          90          150         200
Regression    .84          .99         1.00        1.00
PLS           .85          .99         1.00        1.00
LISREL        .81†         .99         1.00        1.00

Gamma2: Medium Effect Size
n =            40          90          150         200
Regression    .47          .85         .98         1.00
PLS           .48          .86         .98         .99
LISREL        .45†         .87         .98         1.00

Gamma3: Small Effect Size
n =            40          90          150         200
Regression    .10          .20         .31         .40
PLS           .12          .20         .30         .40
LISREL        .09†         .20         .30         .42

Gamma4: No Effect
n =            40          90          150         200
Allowable     .05          .05         .05         .05
95% C.I.      (.03, .07)   (.03, .07)  (.03, .07)  (.03, .07)
Regression    .05          .06         .06         .04
PLS           .06          .06         .05         .04
LISREL        .05†         .05         .06         .04

*Each construct is measured with six indicators: two with loadings set at .70, two at .80, and two at .90.
†N = 40 is far below any recommended minimum sample size for LISREL.
Table A8. Study 3(b): Path Estimates for Simple Model (3 Indicators, Lower Loadings)*

Gamma1: Large Effect Size
n =            40      90      150     200
Actual        .480    .480    .480    .480
Regression    .352    .341    .347    .346
PLS           .386    .355    .357    .354
LISREL        .601†   .487    .490    .486

Gamma2: Medium Effect Size
n =            40      90      150     200
Actual        .314    .314    .314    .314
Regression    .222    .226    .224    .225
PLS           .245    .248    .238    .236
LISREL        .202†   .322    .318    .315

Gamma3: Small Effect Size
n =            40      90      150     200
Actual        .114    .114    .114    .114
Regression    .085    .088    .085    .077
PLS           .100    .108    .101    .089
LISREL        .176†   .121    .120    .109

Gamma4: No Effect
n =            40      90      150     200
Actual        .000    .000    .000    .000
Regression   -.007   -.001    .000   -.002
PLS           .000    .001    .002    .000
LISREL        .066†  -.001    .003   -.003

*Each construct is measured with three indicators, with loadings of .60, .70, and .80.
†N = 40 is far below any recommended minimum sample size for LISREL.
Table A9. Study 3(b): Power Results for Simple Model (3 Indicators, Lower Loadings)*

Gamma1: Large Effect Size
n =            40          90          150         200
Regression    .63          .93         1.00        1.00
PLS           .63          .93         1.00        1.00
LISREL        .41†         .92         .99         1.00

Gamma2: Medium Effect Size
n =            40          90          150         200
Regression    .29          .64         .83         .93
PLS           .30          .64         .83         .92
LISREL        .18†         .62         .83         .93

Gamma3: Small Effect Size
n =            40          90          150         200
Regression    .07          .13         .20         .20
PLS           .09          .14         .21         .21
LISREL        .04†         .12         .20         .21

Gamma4: No Effect
n =            40          90          150         200
Allowable     .05          .05         .05         .05
95% C.I.      (.03, .07)   (.03, .07)  (.03, .07)  (.03, .07)
Regression    .06          .04         .04         .03
PLS           .05          .07         .05         .04
LISREL        .05†         .04         .04         .04

*Each construct is measured with three indicators, with loadings of .60, .70, and .80.
†N = 40 is far below any recommended minimum sample size for LISREL.
Table A10. Study 4: Path Estimates for More Complex Model

        n = 20                 n = 40                 n = 90                 n = 150
Path    Reg.   PLS   LISREL†   Reg.   PLS   LISREL†   Reg.   PLS   LISREL    Reg.   PLS   LISREL
E1→E4   .005  -.006  -.004     .021   .020   .009     .010   .010  -.003     .003   .002  -.012
E2→E4   .254   .248   .247     .262   .267   .287     .254   .258   .289     .254   .258   .286
E3→E4   .280   .282   .288     .254   .263   .311     .254   .273   .322     .263   .269   .319
K1→E1   .278   .325   .307     .270   .301   .310     .277   .299   .326     .275   .288   .323
K2→E2   .318   .371   .360     .303   .338   .356     .303   .319   .359     .301   .312   .356
K3→E3   .327   .377   .357     .328   .367   .383     .334   .354   .395     .332   .345   .395
K1→E4   .215   .221   .212     .219   .225   .259     .218   .222   .261     .224   .230   .268
K2→E4   .283   .289   .283     .265   .270   .299     .269   .274   .305     .275   .278   .312
K3→E4   .006   .002   .001     .018   .016  -.004     .020   .018   .002     .021   .020   .003

†N = 20 or n = 40 is far below any recommended minimum sample size for LISREL.
Table A11. Study 4: Power Results for More Complex Model

              n = 20                 n = 40                 n = 90                 n = 150
Path          Reg.   PLS   LISREL†   Reg.   PLS   LISREL†   Reg.   PLS   LISREL    Reg.   PLS   LISREL
E1→E4         0.050  0.016 0.110     0.062  0.052 0.084     0.066  0.080 0.078     0.054  0.056 0.056
E2→E4         0.196  0.056 0.166     0.376  0.372 0.400     0.734  0.756 0.730     0.928  0.928 0.904
E3→E4         0.216  0.074 0.160     0.374  0.350 0.426     0.788  0.778 0.808     0.928  0.932 0.936
K1→E1         0.238  0.290 0.108     0.392  0.458 0.392     0.764  0.780 0.764     0.940  0.946 0.946
K2→E2         0.294  0.396 0.128     0.468  0.554 0.468     0.846  0.880 0.866     0.966  0.970 0.974
K3→E3         0.318  0.370 0.136     0.546  0.588 0.526     0.930  0.938 0.928     0.992  0.990 0.994
K1→E4         0.146  0.056 0.154     0.312  0.306 0.374     0.610  0.606 0.656     0.886  0.890 0.900
K2→E4         0.214  0.092 0.180     0.414  0.380 0.422     0.788  0.790 0.750     0.950  0.950 0.938
K3→E4         0.066  0.014 0.106     0.072  0.062 0.096     0.052  0.070 0.064     0.074  0.066 0.074
Avg. 7 True   0.232  0.191 0.147     0.412  0.430 0.430     0.780  0.790 0.786     0.941  0.944 0.942
Avg. 2 False  0.058  0.015 0.108     0.067  0.057 0.090     0.059  0.075 0.071     0.064  0.061 0.065

†N = 20 or n = 40 is far below any recommended minimum sample size for LISREL.
Table A12. Study 4: Complex Model with Non-Normal Data: Power
(Comparing the drop in power due to extremely skewed data, averaged across the 7 true paths, with the drop in power due to extremely skewed data in the simple model)

Complex Model: Average Power Across Seven True Paths
                                  n = 40                      n = 90
                                  LISREL†  PLS   Regression   LISREL  PLS   Regression
Normal Distribution               0.43     0.43  0.41         0.79    0.79  0.78
Extreme Skew (S = 1.8; K = 3.8)   0.24     0.26  0.28         0.52    0.51  0.59
Reduction in Power Due to
Non-Normal Data                   0.19     0.17  0.13         0.27    0.28  0.19

Simple Model: Power for Medium Effect Size
                                  n = 40                      n = 90
                                  LISREL†  PLS   Regression   LISREL  PLS   Regression
Normal Distribution               0.37     0.41  0.40         0.79    0.75  0.76
Extreme Skew (S = 1.8; K = 3.8)   0.22     0.24  0.28         0.50    0.48  0.53
Reduction in Power Due to
Non-Normal Data                   0.15     0.17  0.12         0.29    0.27  0.24

†N = 40 is far below any recommended minimum sample size for LISREL.
Appendix B
Use of 100 Bootstrapping Resamples
A question could be asked concerning our decision to use 100 bootstrapping resamples for each PLS analysis, rather than a larger number such
as 500 or 1,000. It is true that Chin (1998) recommends using 500 bootstrapping resamples to determine statistical significance, and it is also
true that the bootstrapping literature recommends that amount or more. The reason is that when using a small original sample, or a sample from
a difficult or irregular distribution, 500 resamples gives confidence that the resulting bootstrapping distribution is close to the underlying
distribution. We decided to use only 100 resamples in our analyses of each PLS run for both substantive and practical reasons.
We can explain the substantive reason by looking at the difference between two different research situations. Consider a typical researcher
who was interested in reporting the significance of the Gamma2 link in Figure 1, from his or her sample of 40 questionnaires. Analogously,
we are interested in reporting an estimate of the likely significance that would be found by any researcher studying Gamma2 with 40
questionnaires. Both the typical researcher and we are interested in reporting a single value: the researcher wants to report the t-statistic of
a single data set of n = 40; we want to report the power achieved by the PLS analyses of 500 data sets of n = 40 each.
If that researcher reported a value based on only 100 bootstrapping resamples for their PLS analysis, there is a danger that random chance in
the data analysis could distort the findings. We agree it would be better for that researcher to use 500 or 1,000 resamples to bolster confidence
that the single reported value is very close to the true value.
We also want to report a single value. But we develop that single value from PLS analyses of 500 data sets, and 100 resamples for each of
those. In other words, if the typical researcher used only 100 resamples for his or her data set, the reported value would be based on only 100
resamples. When we use only 100 resamples for each data set, we are also using 500 data sets. Therefore, every reported PLS power value
in our study is based on 50,000 resamples.
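For readers who want to reproduce this kind of setup, the following is a minimal sketch (the data set names are hypothetical) of how 100 bootstrap resamples of a single n = 40 data set can be drawn in SAS:

proc surveyselect data=sample40 out=boot100
    seed=12345 method=urs samprate=1 outhits reps=100;
  /* method=urs samples with replacement; samprate=1 makes each resample
     the same size as the original; reps=100 produces the 100 resamples
     used for one PLS bootstrap run */
run;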
To be very clear, in all of our tables every reported value generated by bootstrapping is based on at least 500 samples and 50,000 resamples.
With this much replication, we doubt there is much likelihood the picture would change by doing 5 or 10 times as much replication (i.e., with
500 or 1,000 resamples per data set giving 250,000 or 500,000 bootstrapping resamples for each value reported in our tables). To test this
assumption, we reran the 500 data sets with N = 90 from the more complex model in Figure 4, now using PLS with 1,000 bootstrapping
resamples. The comparison between 100 resamples and 1,000 resamples is shown in Table B1. Each reported value in the 1,000 resamples
columns of Table B1 is based on 500,000 bootstrapping resamples.
The differences between 100 and 1,000 resamples are quite small: the average difference in power is .002; the largest difference is .018. None
of these changes are statistically significant either individually or as a group. Changes of this magnitude do not impact the overall results of
the study. Given the above logical analysis and the spot checking displayed in Table B1, we believe an increase to 500 or 1,000 bootstrapping
resamples is not necessary.
Our practical reason for using 100 resamples is the surprising amount of work that would be involved in using more resamples. Given the large
numbers of runs involved in the total research project, it should be clear to the reader that this type of research would not be practical if it were
not somehow automated. The automation process we used for every PLS analysis in our tables involved creating a large deck file of 50,500
PLS runs; for example, the 3 indicator N = 200 run (including Gamma1 through Gamma4) required us to create a PLS command file or "deck"
file which is 0.7 gigabytes in size.
The problem with processing files this large is magnified significantly by the fact that PLS-Graph has some peculiarities that cause it to crash
occasionally (especially with larger files). In general, for an individual analysis this is not a problem: the same PLS run can be rerun immediately, and virtually always completes the second time. But when we execute a deck file of 50,500 PLS runs, it virtually always crashes
somewhere in the process. When a crash occurs, our solution is to chop the large deck file into smaller and smaller pieces until no crashes
occur, and then reassemble the results at the end. This, however, moves us away from an automated process, and to a more manual one.
Increasing the number of resamples to 500 would have required the processing of deck files with 252,500 PLS runs each. Although we would
have liked to use 500 or 1,000 resamples instead of 100, it simply wasn't practical to do so; we would have been unable to complete our analyses.
Table B1. Testing PLS Results with 1,000 Bootstrapping Resamples (Complex Model, n = 90)

            Path Estimate                     Power Results
Path        100 resamples  1,000 resamples    100 resamples  1,000 resamples   Power Difference
Eta1-Eta4   .010           .010               .080           .066              .014
Eta2-Eta4   .258           .258               .756           .746              .010
Eta3-Eta4   .273           .273               .778           .790              -.012
Ksi1-Eta1   .299           .299               .780           .776              .004
Ksi2-Eta2   .319           .319               .880           .898              -.018
Ksi3-Eta3   .354           .354               .938           .930              .008
Ksi1-Eta4   .222           .222               .606           .606              .000
Ksi2-Eta4   .274           .274               .790           .788              .002
Ksi3-Eta4   .018           .018               .070           .060              .010
Average                                                                        .002
Appendix C
Detailed Presentation, Studies 2 and 3
Study 2: Simple Model and Non-Normal Data
Since much of the data used in behavioral research is not normally distributed and PLS could have an advantage with non-normal data, we
tested the impact of non-normal data on the performance of the three techniques with Study 2. To determine initial realistic values of skew
and kurtosis, we first used a convenience sample of four IS-related data sets that we considered as representative of IS research, and then chose
more extreme values of skew and kurtosis from the literature (Fleishman 1978; Vale and Maurelli 1983). For the four representative IS data
sets, we examined the distributions of the data for all of the measurement indicators used in each. From this we obtained values for what we
call the average degree of skewness (0.63) and kurtosis (0.45) across the responses for the indicators. We then added one standard deviation
to both the average skew (0.63 + 0.50 = 1.13) and the average kurtosis (0.45 + 1.16 = 1.61) to determine values for more non-normal
distributions. These two conditions we labeled "average skew" and "moderate skew."
To test the possible influence of more extreme values, we chose the most extreme value of skew and the most extreme value of negative kurtosis
from Fleishman's (1978) table of possible values of skew and kurtosis, resulting in two new conditions to test. The first, which we termed "high skew," had values of 1.8 for skew and 3.8 for kurtosis. The second, which we termed "negative kurtosis," had values of .25 for skew and -1.00 for kurtosis.
For these four new conditions of non-normality (each with the values for skew and kurtosis stated above), we used Figure 1 as our model and
Fleishman's tables and his approach for generating non-normal data. Because at n = 20 and n = 150 the results are quite predictable (at n =
20 all techniques have much less than 40% power for a medium effect size; at n = 150 all techniques have power much greater than 80% for
a medium effect size) we focused on n = 40 and n = 90 in this study. Once again, for each distribution and sample size combination, we
generated 500 data sets, and analyzed them as we had for the normally distributed data.
Table A4 (parameter estimates) and Table A5 (power), paralleling the earlier Tables A1 and A2, are included in Appendix A. The key results
for n = 40 and n = 90 can be seen in Figure 4 of the paper proper. The picture is not very different for the two sample sizes. Skew up to about
1.1 and kurtosis up to 1.6 have only a slight impact on the statistical power of the analysis, regardless of which technique is used. When skew
is increased to 1.8 and kurtosis to 3.8, there is a noticeable drop in power and accuracy from what would be seen with normally distributed data.
However, going to the other extreme on kurtosis with a large negative value (-1.00 in our study), power and accuracy are not much reduced from
the moderate skew condition.
Interestingly, there doesn't seem to be any important difference between the abilities of these three statistical techniques to handle non-normal data. The one exception to this is that for n = 40, high negative kurtosis, and a medium effect size, LISREL's power (.33) has a statistically significant drop (p < .05) below PLS's (.39). Since neither of these power values is even close to .80, and since no one would suggest using
LISREL for such small sample sizes, this is not a surprising or interesting finding.
Study 3: Number of Indicators, Reliability
The data in Studies 1 and 2 utilized three indicators per construct with loadings of 0.70, 0.80, and 0.90. However, researchers doing field
studies often have a greater number of indicators available to measure each of the constructs in the research model, and empirical work has
demonstrated that the number of indicators used in the analysis can impact the path estimates of PLS (McDonald 1996). As the number of
indicators used to represent a construct increases, so too does the reliability. Similarly, researchers often work with construct items that have
lower loadings. It seems important, therefore, to extend our previous studies by examining the impact, if any, that changes in the number of
indicators and changes in reliability have on the results.
To explore the impact of increasing the number of indicators, we generated a group of data sets using six indicators per construct (with loadings
of two at .7, two at .8, and two at .9). The results for the path estimates and power are shown in Appendix A, Tables A6 and A7. The results
from these tests were consistent with our previous analyses. Because the number of indicators (and therefore the overall reliability, now .91) of the construct measures increased, the path estimates and the power were slightly higher than when three indicators were used. Nevertheless,
there were no changes in the relative efficacy of the three analysis techniques. Differences between the three techniques were modest and not
statistically significant. As with Studies 1 and 2, LISREL produced higher and more accurate path estimates for the large and medium effect
sizes at sample sizes of 90 and above; PLS generally produced estimates that were slightly higher than regression.
With respect to power, the same observations held. Increasing the number of indicators did increase the power for all techniques; for example, at n = 40 and medium effect size, regression's power went from .40 to .47; the power for PLS went from .41 to .48. However, the larger number
of indicators did not alter the relative efficacy of the three techniques. Furthermore, moving to six indicators did not change the fact that the
rule of 10 is not a useful guide to appropriate sample size when power is a consideration.
The reliability of each of the constructs in the Study 1 model can be considered comfortably high (0.84). To test the impact of less reliable
and more diverse indicators, we generated an additional group of data sets. The indicator loadings were set at 0.6, 0.7, and 0.8 for each of the
constructs, which resulted in a Cronbach's alpha of .74. We note that with lower indicator loadings and a sample size of n = 40, we begin to
run the risk of empirical underidentification in the LISREL runs. Although the Gamma4 submodel is theoretically just identified, if one of
its indicators is not significantly different from zero in a particular run, LISREL may not converge.
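As a check on these reliability figures, note that under our data-generation scheme the correlation between two indicators is (approximately, given rounding) the product of their loadings, so the standardized Cronbach's alpha can be computed directly; the following is a minimal sketch:

data _null_;
  /* Standardized alpha for three indicators with loadings .6, .7, .8 */
  r12 = .6*.7;  r13 = .6*.8;  r23 = .7*.8;  /* inter-item correlations */
  rbar = mean(r12, r13, r23);
  k = 3;
  alpha = (k*rbar) / (1 + (k-1)*rbar);
  put alpha= 5.2;  /* approximately .74, as stated above */
run;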
The results of the analyses for these data sets are shown in Table A8 for the path estimates and A9 for power, and were also consistent with
previous analyses. Although the path estimates were lower and the percentage of estimates that were statistically significant declined slightly
for the lower loading data set, the overall pattern of results (when compared across the three techniques or against accepted standards) remained
the same.
Appendix D
SAS Program Used to Generate the Data for this Study
This appendix presents the SAS program used to generate the data from the simple model for 500 samples of n = 40. Note that we use the SAS
subroutine Rannor (which returns N(0,1) values) separately for every random error variable in the model, each with its own seed. In this way,
values for each indicator are generated as part of its own independent string of random numbers, each with a different starting seed.
We added 5.2 to our indicator scores to translate all of the data points to values between 0 and 10. Note also that the amount of random variance added to each indicator was chosen to bring the total variance of each indicator to 1.0. Since all construct scores and random error scores have a variance of 1, we get a variance of 1 for the indicators by adding the appropriate portion of the error variance. For example, X1 has the following variance: (.70)²(1)² + (.714)²(1)² = 1.0.
DATA Temp.One;
File 'c:\MultiThreadRandomNumbers\MT-RNums500x40.txt';
Retain Seed1 5532806 Seed2 528793 Seed3 523857 seed4 5223263 Seed5 5238837 Seed6 523852187
Seed7 52385427 Seed8 52398857 Seed9 52385797 Seed10 52273857 Seed11 52385347 Seed12 52385794
Seed13 5238537 Seed14 5285957 Seed15 5238657 Seed16 53385794 Seed17 539754437
Seed18 586085957 Seed19 533826357 Seed20 532657;
Do I = 1 to 20000;
Call Rannor(Seed1,KSI1);
Call Rannor(Seed2,X1Err);
Call Rannor(Seed3,X2Err);
Call Rannor(Seed4,X3Err);
X1 = round(.70*KSI1 + .714*X1Err + 5.2);
X2 = round(.80*KSI1 + .60 *X2Err + 5.2);
X3 = round(.90*KSI1 + .436*X3Err + 5.2);
Call Rannor(Seed5,KSI2);
Call Rannor(Seed6, X4Err);
Call Rannor(Seed7, X5Err);
Call Rannor(Seed8, X6Err);
X4 = round(.70*KSI2 + .714*X4Err + 5.2);
X5 = round(.80*KSI2 + .60 *X5Err + 5.2);
X6 = round(.90*KSI2 + .436*X6Err + 5.2);
Call Rannor(Seed9,KSI3);
Call Rannor(Seed10, X7Err);
Call Rannor(Seed11, X8Err);
Call Rannor(Seed12, X9Err);
X7 = round(.70*KSI3 + .714*X7Err + 5.2);
X8 = round(.80*KSI3 + .60 *X8Err + 5.2);
X9 = round(.90*KSI3 + .436*X9Err + 5.2);
Call Rannor(Seed13,KSI4);
Call Rannor(Seed14, X10Err);
Call Rannor(Seed15, X11Err);
Call Rannor(Seed16, X12Err);
X10 = round(.70*KSI4 + .714*X10Err + 5.2);
X11 = round(.80*KSI4 + .60 *X11Err + 5.2);
X12 = round(.90*KSI4 + .436*X12Err + 5.2);
Call Rannor(Seed17,Eta1Err);
Call Rannor(Seed18,Y1Err);
Call Rannor(Seed19,Y2Err);
Call Rannor(Seed20,Y3Err);
ETA1 = .48*KSI1 + .314*KSI2 + .114*KSI3 + .000*KSI4 + .811*Eta1Err;
Y1 = round(.70*ETA1 + .714*Y1Err + 5.2);
Y2 = round(.80*ETA1 + .60 *Y2Err + 5.2);
Y3 = round(.90*ETA1 + .436*Y3Err + 5.2);
Put Y1 3.0 Y2 3.0 Y3 3.0 X1 3.0 X2 3.0 X3 3.0 X4 3.0 X5 3.0 X6 3.0 X7 3.0 X8 3.0 X9 3.0 X10 3.0 X11 3.0 X12 3.0;
end;
run;
Generating Non-Normal Data
Replacing the three lines generating values for X1, X2, and X3 above with the following will generate highly negative kurtosis data for X1NN, X2NN, and X3NN. Similar code can be used to generate the other non-normal data. Changing the values of a through d in the code according to the table in Fleishman (1978) will give data of different specified characteristics.
X1N = .70*KSI1N + .714*X1ErrN ;
X2N = .80*KSI1N + .60 *X2ErrN ;
X3N = .90*KSI1N + .436*X3ErrN ;
*** Negative Kurtosis Random Set 2007-1: Skew = .25, Kurtosis = -1.00
*** a la Fleishman, 1978, Psychometrika, Vol 43, No 4.;
b = 1.263412800;
c = .0774624390;
d = -.1000360450;
a = -c;
X1NN = round(a + b*X1N + c*X1N**2 + d*X1N**3) + 5.2;
X2NN = round(a + b*X2N + c*X2N**2 + d*X2N**3) + 5.2;
X3NN = round(a + b*X3N + c*X3N**2 + d*X3N**3) + 5.2;
References
Chin, W. W. 1998. The Partial Least Squares Approach to Structural Equation Modeling, in Modern Methods for Business Research, G.
A. Marcoulides (ed.), London: Psychology Press, pp. 295-336.
Cohen, J. 1988. Statistical Power Analysis for the Behavioral Sciences, Hillsdale, NJ: Lawrence Erlbaum Associates.
Fleishman, A. I. 1978. A Method for Simulating Non-Normal Distributions, Psychometrika (43:4), pp. 521-532.
McDonald, R. P. 1996. Path Analysis with Composite Variables, Multivariate Behavioral Research (31:2), pp. 239-270.
SAS User's Guide: Statistics, Version 5, 1985. Cary, NC: SAS Institute, Inc.