
Using Correlation and Regression:

Mediation, Moderation, and More


Part 3: Moderation with regression
Claremont Graduate University
Professional Development Workshop
August 22, 2015
Dale Berger, Claremont Graduate University (dale.berger@cgu.edu)
Statistics website: http://wise.cgu.edu
This document is designed to aid note taking during the presentation and to serve as a resource
for later use. It provides selected formulas, figures, SPSS syntax and output, and references, with
much more detail than PowerPoint slides that accompany the presentation.
This document, data files, supplemental reading, and other materials are available on a Google
Drive site for which members of the class will receive a link. If you have difficulty, please
contact me at dale.berger@cgu.edu. We won't cover all of this material in the on-line
presentation.

Moderation analysis with regression
Examples of moderation (identify X, Y, and Z)
Model of salary for men and women
SPSS example of moderation with a dichotomous moderator
    SPSS point-and-click commands
    SPSS regression output and interpretations
    Figure presenting the findings
    Table presenting regression analysis
    Dummy mediator variable and centered continuous X variable
Multicollinearity and tolerance
SPSS example of moderation with a continuous moderator
    SPSS output and table for presentation
    Interpretations
    Figure presenting the findings
Summary
References
SPSS syntax for moderation analysis
Excel workbook Plotting Regression Interactions

Session 2a: Moderation Analysis with Regression

Moderation Analysis with Regression


Group differences in treatment effects often are especially important to measure and understand.
If a treatment has greater effects for women than for men, we say that sex moderates the effects
of the treatment. When there is moderation, it may be misleading to describe overall treatment
effects without taking group membership into account. Moderation analysis can guide decisions
about interpreting effects and redesigning treatments for different groups.
Moderation is interaction. For example,
among eighth grade children, drug use by a
child can be predicted by drug use of their
friends. However, the relationship is weaker
for children who have greater parental
monitoring. This finding can be displayed by
showing the difference in regression lines for
children with high parental monitoring and
children with low parental monitoring.
Parental monitoring moderates the
relationship between drug use by children and
drug use by their friends, such that the
relationship is weaker for children with
stronger parental monitoring.
In general, Variable Z is a moderator of the relationship between X and Y if the strength of the
relationship between X and Y depends on the level of Z. A moderator relationship can be
illustrated with an arrow from Variable Z (the moderator variable) pointing to the arrow that
connects X and Y (see Figure 2). A model can include both mediation and moderation. We can
include a path from X to Z, indicating that Z may also mediate the relationship between X and Y.
The X-Z path would not affect our analysis of the moderation effect. Z can be a moderator even
if it has no direct effect on Y and thus no mediation effect. The effect of X on Y for a specific
value of Z is called a simple effect of X on Y for that value of Z.
Figure 2: Model Showing Z Moderating the X-Y Relationship


Examples of moderation (identify X, Y, and Z):


The impact of a program is greater for younger adolescents than for older adolescents.
X=
Y=
Z=
Learning outcomes are positively related to amount of study time for children who use either
Book A or Book B, but the relationship is stronger for those who use Book B.
X=
Y=
Z=
The relationship between education and occupational prestige is greater for women than for men.
X=
Y=
Z=

Models of Salary for Men and Women


We wish to test the null hypothesis that the relationship between salary and time on the job is the
same for men and women in a large organization. We have data on salary, years on the job, and
gender for a random sample of n=200 employees.
Y = salary in $1000s; X1 = years on the job; X2 = gender (men = 0; women = 1)
To test for an interaction, we create a special interaction term, X3 = X1 * X2.
Regression analysis yielded the following model:
Y' = 55.0 + 1.5X1 - 3.4X2 + .7X3
With this model, we can predict salary for any individual if we know X1 and X2 for that person.
For men, X2 = 0, and also X3 = 0 because X3 = X1 * X2 = X1 * 0 = 0.
Thus, for men, the regression model simplifies to Y' = 55.0 + 1.5X1.
For women, X2 = 1, so X3 = X1 * X2 = X1 * 1 = X1. Thus, for women, the regression model
simplifies to Y' = 55.0 + 1.5X1 + (-3.4) + .7X1, or Y' = 51.6 + 2.2X1
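
The substitution above can be checked with a few lines of code. The following is a minimal sketch in Python (not part of the original SPSS workflow); the function names simple_equation and predict are illustrative choices, and the coefficients are the ones from the fitted salary model.

```python
# Coefficients from the fitted model Y' = 55.0 + 1.5*X1 - 3.4*X2 + 0.7*X3,
# where X3 = X1 * X2 and X2 is coded men = 0, women = 1.
B0, B1, B2, B3 = 55.0, 1.5, -3.4, 0.7


def simple_equation(x2):
    """Return (intercept, slope) of the X1 line for a fixed value of X2."""
    intercept = B0 + B2 * x2
    slope = B1 + B3 * x2
    return intercept, slope


def predict(x1, x2):
    """Predicted salary in $1000s from the full model with the product term."""
    return B0 + B1 * x1 + B2 * x2 + B3 * (x1 * x2)


men = simple_equation(0)      # intercept 55.0, slope 1.5
women = simple_equation(1)    # intercept 51.6, slope 2.2 (up to float rounding)

for label, (a, b) in (("men", men), ("women", women)):
    print(f"{label}: Y' = {a:.1f} + {b:.1f} * X1")
```

The same two lines result whether we plug X2 into the full model or use the simplified equations, which is all that the interaction term encodes.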
We can use the models for men and women to create a diagram showing the simple effects for
men and women. Elements of the original regression equation can be interpreted as follows.

The constant (55.0) is the predicted salary for someone who has values of zero on all predictors.
In this example, men with zero years on the job have X1 = 0, X2 = 0, and X3 = 0, so the constant
is the predicted salary for men with zero years on the job, i.e., 55.0 or $55,000.
If we did not have an interaction term in the model, then both men and women would be given
the same regression coefficient on X1. Because there is an interaction term, the coefficient of 1.5
on X1 applies only to men (who have values of zero on the interaction term), so the model
predicts average salary to be $1500 greater for each year on the job for men. This does not mean
that every individual man will earn $1500 more each year, but rather 1.5 describes the slope of
the best fitting regression line for men in the cross-sectional data. In general, the coefficient on
X1 is the simple effect of X1 when X2 = 0.
The coefficient of -3.4 on X2 is the modeled difference in salary between men and women who
have zero years on the job. This is the difference in the constant for the models for men and
women. It is not a measure of the average sex effect when the interaction term is in the model.
The coefficient of .7 on the interaction term X3
indicates the difference in the regression weight for
men and women. Because X3 = 0 for men and X3 =
X1 for women, the weight on X3 is the additional
weight given to X1 for women. The null hypothesis
for the test of the regression weight on X3 is that this
weight is zero in the population, which would mean
that the slopes of the regression model for men and
women are the same. If this null hypothesis is
rejected, the conclusion is that the slopes are
different. In the example, the slope is .7 greater for
women, indicating that the average increment in
predicted salary per year is $700 greater for women.
In the regression models for men and women, the
weight on X1 is 1.5 for men and 2.2 for women, a
difference of .7.
An assumption for tests of statistical significance is that residuals from the model are reasonably
homogeneous and normally distributed across levels of X1 and X2. If relationships are nonlinear,
the nonlinear components should be included in the model.

SPSS Example: Moderation Effects with a Dichotomous Moderator


Occupational prestige as measured by a standard scale is positively related to years of education,
but is the relationship the same for men and women? For this example, we can use data from a
national sample of U.S. adults given in 1991 U.S. General Social Survey.SAV, as provided by
SPSS. This sample includes n=1415 cases with complete data on the three variables of
occupational prestige, years of education, and gender.


In this example, the dependent variable (Y) is occupational prestige and the independent variable
(X) is years of education, while gender is a potential moderator (Z). Moderation in this example
is indicated by an interaction between X and Z in predicting Y, which would indicate that the
relationship between X and Y depends on the level of Z. A special term must be constructed to
represent the interaction of X and Z. With regression we can test whether this term contributes
beyond the main effects of X and Z in predicting Y. Mathematically, the interaction term can be
computed as the product of X and Z. To demonstrate how this works in our example, we
compute the product of Education (X) and Sex (Z) to create a new variable that we name EdxSex
(XZ), and we include EdxSex in a final model to predict Y.

SPSS Commands (Point and Click):


Call up SPSS and the GSS1991 data file (available online under computer files for this course,
select file 1991 U.S. General Social Survey.sav).
First, create the interaction term. Click Transform, Compute, in the Target Variable: window
enter EdxSex, select educ and click the black triangle to enter educ into the Numeric
Expression: window, click *, select sex and click the triangle. This should give educ * sex in the
Numeric Expression: window. This expression can be typed into the window instead of selecting
and clicking. You can click OK to run this computation, or you can click Paste to save the syntax
in a syntax file to be run later. If you use Paste, go to the syntax window and run this
computation because we need the new variable EdxSex for the next analysis. (Highlight the
compute statement and the Execute command, press the triangle to run.)
The regression analysis to test interactions is hierarchical, whereby we must enter the main
effects of education and sex before we enter the interaction.
Click Analyze, Regression, Linear, select prestg80 and click the black triangle to enter it as
the Dependent variable. Select educ and click the triangle to enter educ as the first independent
variable. Click Next to go to the second block; select sex and click the triangle to enter sex as the
second independent variable. Click Next to go to the third block; select EdxSex and click the
triangle to enter EdxSex as the third independent variable in a hierarchical analysis.
For illustration, we will ask for a lot of statistics. Click Statistics, select Estimates, Model fit,
R squared change, Descriptives, Part and partial correlations, and Collinearity diagnostics, and
click Continue.
Click Plots, select *ZRESID as the Y variable and *ZPRED as the X variable, check Histogram,
and click Continue. Click Paste to save the syntax.
Go to the syntax window and run the regression analysis. Table 1 shows selected SPSS output.


Table 1: Test of Moderation Effects with a Dichotomous Moderator (N=1415)


Coefficients(a)

Model  Predictor                               B  Std. Error    Beta        t   Sig.  Zero-order  Partial    Part  Tolerance     VIF
1      (Constant)                         13.079       1.340            9.761   .000
       Highest Year of School Completed    2.295        .100    .520   22.864   .000        .520     .520    .520      1.000   1.000
2      (Constant)                         14.294       1.688            8.466   .000
       Highest Year of School Completed    2.286        .101    .518   22.732   .000        .520     .518    .517       .995   1.005
       Respondent's Sex                    -.709        .600   -.027   -1.182   .237       -.063    -.031   -.027       .995   1.005
3      (Constant)                         22.403       4.300            5.210   .000
       Highest Year of School Completed    1.668        .318    .378    5.247   .000        .520     .138    .119       .099    10.1
       Respondent's Sex                   -6.083       2.689   -.231   -2.262   .024       -.063    -.060   -.051       .049    20.2
       EDXSEX                               .412        .201    .244    2.050   .041        .255     .054    .047       .036    27.5

B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient.
Zero-order, Partial, and Part are correlations; Tolerance and VIF are collinearity statistics.
a. Dependent Variable: R's Occupational Prestige Score (1980)

The first model uses only Education (X) to predict Y (Occupational Prestige). We see that
education is a strong predictor, with r = beta = .520, t(1413) = 22.864, p < .001.
The second model predicts Occupational Prestige (Y) from the additive effects of Education (X)
and Sex (Z), assuming no moderation. From Unstandardized Coefficients we find the following:
Predicted Y = Y' = B0 + B1X + B2Z;

Y' = 14.294 + 2.286X - .709Z

The coefficient B1 = 2.286 can be interpreted as indicating that for either males or females, one
additional unit of X (one more year of education) is associated with 2.286 more units of
predicted Y (+2.286 on the Occupational Prestige scale). However, if there is an interaction, the
model may be misleading. If the effects of education are different for males and females, this
simple model is not accurate for either group.
The third model in Table 1 includes the interaction term, resulting in the following equation:
Y' = B0 + B1X + B2Z + B3XZ;

Y' = 22.403 + 1.668X - 6.083Z + .412XZ

The test of statistical significance of the interaction term yields t(1411)=2.050, p=.041. We
conclude that the relationship between Education and Occupational Prestige differs for males and
females. This also means that the sex difference on occupational prestige varies with level of
education. (Statistical significance doesn't necessarily indicate a large or important effect.)

How does this work? It is instructive to compute the regression equations separately for males
and females. In this data set, Sex (Z) is coded Z=1 for males and Z=2 for females. Thus, for
males the equation reduces to Y' = 22.403 + 1.668X - (6.083)(1) + .412X(1), which can be
written as Y' = 22.403 - 6.083*1 + 1.668*X + .412*X*1, or Y'm = 16.320 + 2.080X.
For females the equation is Y' = 22.403 + 1.668X - (6.083)(2) + .412X(2), which can be
written as Y' = 22.403 - 12.166 + 1.668X + .824X, or Y'f = 10.237 + 2.492X.
The weight on the XZ interaction term (B3 = .412) is the difference in the regression weight on X
for females and males (2.492 vs. 2.080). Thus, a test of B3 is a test of the sex difference in the
regression weight on education when predicting occupational prestige.
We can conclude that, on average, education has a statistically significantly stronger relationship
with occupational prestige for females than for males. In the model without the interaction term,
the regression weight of 2.286 on Education overestimates the relationship for males and
underestimates the relationship for females. Of course, statistical significance does not imply that
this difference is large enough to be theoretically or practically interesting.
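
The arithmetic for the two groups can be reproduced with a short script. This is a minimal sketch in Python using the Model 3 coefficients from Table 1 and the 1/2 coding of Sex; the helper names line_for and sex_gap are illustrative. The crossover point computed at the end is simply where the two modeled lines meet. It follows algebraically from the fitted weights and is not a separately reported finding.

```python
# Model 3 weights from Table 1: Y' = B0 + B1*X + B2*Z + B3*X*Z,
# with X = years of education and Z = sex (1 = male, 2 = female).
B0, B1, B2, B3 = 22.403, 1.668, -6.083, 0.412


def line_for(z):
    """Return (intercept, slope) of the education line for sex code z."""
    return B0 + B2 * z, B1 + B3 * z


def sex_gap(x):
    """Modeled female-minus-male difference in prestige at x years of education."""
    return B2 + B3 * x


males = line_for(1)       # about (16.320, 2.080)
females = line_for(2)     # about (10.237, 2.492)

# Education level at which the modeled sex difference is zero (lines cross).
crossover = -B2 / B3

print(f"males:   Y' = {males[0]:.3f} + {males[1]:.3f} * X")
print(f"females: Y' = {females[0]:.3f} + {females[1]:.3f} * X")
print(f"modeled lines cross near {crossover:.1f} years of education")
```

Because the slope difference equals B3, the modeled sex difference changes by .412 points for every additional year of education.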

Figure 7: Modeled Occupational Prestige as a Function of Education for Males and Females (N = 1415)
[Line graph. Vertical axis: Occupational Prestige, 0 to 70. Horizontal axis: Years of Education. Separate lines for Males and Females.]

An interaction is often illustrated effectively with a figure. Figure 7 shows the size and direction
of the main effects and interaction, and where the modeled sex effect is largest, etc. This graph
was made with Excel using regression weights from SPSS. You can access this Excel worksheet
through http://WISE.cgu.edu under WISE Stuff in a file called Plotting Regression
Interactions.XLS. It is important to note that the figure is a model of the relationship, not the
actual data (which probably would not show such a nice regular pattern).
Simulation studies have shown that statistical power to detect the effects of a dichotomous
moderator variable can be very low if samples are small, the proportions of cases in the two
groups are unequal, or if there is restriction of range on the predictor, especially in field studies
with large measurement error, low co-occurrence of extreme values of predictors, and small
effect sizes (McClelland & Judd, 1993).

Presenting Results from Regression Analysis in a Table


Results can also be presented in a table. Reasonable people may choose to report different
statistics, depending on the goals of the study. Table 2 shows one way to present a selection of
important information.
Table 2: Moderation Effects of Sex on Education in Predicting Occupational Prestige
(N=1415)
Step  Variable              r         R2 Change  B          SEB    Beta
1     Education (years)     .520***   .270***    1.668***   .318   .518***
2     Sex (M=1; F=2)        -.063**   .001       -6.083*    2.689  -.027
3     Education x Sex       ---       .002*      .412*      .201   ---
      (Constant)                                 22.403***  4.300

*p<.05; **p<.01; ***p<.001; Cumulative R squared = .273; Adjusted R squared = .271.


B and SEB are from the final model at Step 3, and Beta is from the model at Step 2 (all
main effects, but no interaction term).
Table 2 summarizes key information with four conceptually distinct types of data, each of which
can be useful. First, we have the simple correlations (r) which tell us how each individual
predictor variable is related to the criterion variable, ignoring all other variables. We can see that
Education is a much better predictor of Occupational Prestige than Sex, although both
correlations are statistically significant. The correlation of the interaction term with the criterion
is not easily interpreted, because this correlation is greatly influenced by scaling of the main
effects; it is best omitted from the table.
The second type of information comes from R2 Change at each step. Here the order of entry is
critical if the predictors overlap with each other. For example, if Sex had been entered alone on
Step 1, R2 Change would have been .004**, statistically significant with p<.01. (R2 Change for
the first term entered into a model is simply its r squared.) Because of partial overlap with
education, Sex adds only .001 R2 Change (not significant) when it is entered after Education is in
the model. However, the interaction term adds significantly beyond the main effects (R2 Change
= .002, p<.05), indicating that we do have a statistically significant interaction between Sex and
Education in predicting Occupational Prestige. R2 Change measures and tests the effect sizes of
components, while controlling for variables that were entered into the model earlier.
The third type of information comes from the unstandardized B weights in the final model. These
weights allow us to construct the raw regression equation, and we can use them to compute
separate equations for males and females, if we wish. The B weights and their tests of
significance on the main effects are not easily interpreted in the final model, because they refer
to the unique contribution of each main effect beyond all other terms, including the interaction
(which was computed as a product of the main effects). The test of B for the last term entered
into the model is meaningful, as it is equivalent to the test of R2 change for the final term. In this
case, both tests tell us that the interaction is statistically significant.
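
The equivalence between the test of B for the last term entered and the test of R2 Change can be demonstrated numerically. The sketch below is in Python with NumPy rather than SPSS, and it uses synthetic data generated inside the script (these are not the GSS data, and the generating weights are arbitrary assumptions). It fits the three hierarchical steps, reports R2 Change at each step, and compares t squared for the interaction weight with the F for its R2 Change.

```python
import numpy as np

# Synthetic data for illustration only; these are NOT the GSS data.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(13, 3, n)                    # stand-in for years of education
z = rng.integers(0, 2, n).astype(float)     # stand-in for a 0/1 moderator
y = 20 + 2.0 * x - 1.0 * z + 0.5 * x * z + rng.normal(0, 8, n)


def fit(predictors):
    """Ordinary least squares with an intercept.

    Returns (coefficients, standard errors, R squared, residual df).
    """
    X = np.column_stack([np.ones(n)] + predictors)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    sse = resid @ resid
    sst = ((y - y.mean()) ** 2).sum()
    df_resid = n - X.shape[1]
    cov_b = (sse / df_resid) * np.linalg.inv(X.T @ X)
    return b, np.sqrt(np.diag(cov_b)), 1 - sse / sst, df_resid


# Three hierarchical steps: X, then X and Z, then X, Z, and the product XZ.
steps = [[x], [x, z], [x, z, x * z]]
results = [fit(p) for p in steps]
r2 = [res[2] for res in results]
r2_change = [r2[0], r2[1] - r2[0], r2[2] - r2[1]]

# Test of the last term entered: t for B3 versus F for R squared change.
b, se, r2_full, df_resid = results[2]
t_interaction = b[-1] / se[-1]
f_change = r2_change[2] / ((1 - r2_full) / df_resid)

print("R squared change by step:", [round(c, 3) for c in r2_change])
print("t squared:", round(t_interaction ** 2, 3), " F change:", round(f_change, 3))
```

With one term added at the final step, F for R2 Change has 1 numerator degree of freedom and equals the square of t for that term, so the two tests give the same p value.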
The fourth type of information comes from the tests of regression weights for the model that
contains only main effects (no interactions). These are tests of the unique contribution of each
main effect beyond all other main effects. If the main effects do not overlap at all, the beta
weight for each variable is identical to its r value. Here we see that Sex does not contribute
significantly beyond Education in predicting Occupational Prestige (beta = -.027), although its
simple r was -.063, p<.01. When there is an interaction, these main effects may be misleading.

Dummy Mediator Variables and Centered Continuous Predictor Variables

Interpretability of regression coefficients can be improved by centering continuous predictor
variables. Centering is accomplished by subtracting the mean from the variable. Thus, a centered
score is a deviation score. Cohen, Cohen, West, and Aiken (2003, p. 267) recommend that
continuous predictor variables be centered before interaction terms are computed, unless the
variable has a meaningful zero (also see Marquardt, 1980). Centering reduces multicollinearity
or overlap of the interaction term with other predictors and may improve interpretability,
especially when zero is not meaningful on a scale (e.g., an SAT score of 0 is meaningless).
In our example, the mean on Education is 13.02. We create a centered education variable by
subtracting 13.02 from Years of Education for each case. We do not center the dependent
variable, Occupational Prestige, because we wish to predict values on the original scale. The
syntax for this analysis is shown in Appendix A.
Compare the B coefficients for Sex in Tables 2 and 3. With uncentered Education in Table 2, the
test of Sex is for a sex difference when Education = 0 years. With centered variables in Table 3,
the test of Sex is for a sex difference when Education is at the mean level of education.
Table 3: Moderation Effects of Sex on Education in Predicting Occupational Prestige,
Education Centered and Sex Dummy Coded (N=1415)
Step  Variable              r         R2 Change  B          SEB    Beta
1     Education (years)     .520***   .270***    2.080***   .142   .471***
2     Sex (M=0; F=1)        -.063**   .001       -.720      .599   -.027
3     Education x Sex                 .002*      .412*      .201   .066*
      (Constant)                                 43.401     .450

*p<.05; **p<.01; ***p<.001; Cumulative R squared = .273; Adjusted R squared = .271.


Education is centered to a mean of zero.

When variables are centered, generally there is much less overlap between the interaction and the
two main effects, so the B coefficients are much more stable (compare the SEB in the two tables).
In Table 3 we show the simpler and more common convention of reporting both B and beta for
the final model. The beta values for the final model with uncentered data (not shown in Table 2)
would have been .378, -.231, and .244, respectively, which are not easily interpreted because of
the great overlap between the interaction term and the main effects, and because the tests for the
main effects are not at the means of the original scales.
Comparing Tables 2 and 3, we see that the R and R2 change values are the same. No conclusions
are changed. We can interpret the constant in Table 3 (B0 = 43.401) as the mean value on
Occupational Prestige when all predictors are zero, i.e., for males (Sex = 0) at the mean of
education (centered Education = 0). We can also interpret the regression weight on Sex (B2
= -.720) as the difference in Occupational Prestige for males and females at the mean on
education. In contrast, in Table 2, the weight on Sex (B2 = -6.083) is the modeled difference
between Occupational Prestige for males and females at zero years of education. That
information is probably less interesting than the sex difference at the average level of education.
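
The coefficients in Tables 2 and 3 describe the same fitted model on two different scales, so one set can be converted to the other algebraically. The sketch below, in Python, substitutes X = Xc + 13.02 and Z = D + 1 (where Xc is centered Education and D is Sex recoded 0/1) into the Table 2 equation and collects terms. The variable names are illustrative, and small discrepancies in the third decimal are expected because the published weights and the mean are rounded.

```python
# Table 2 weights: uncentered Education (X), Sex coded 1 = male, 2 = female (Z).
b0, b1, b2, b3 = 22.403, 1.668, -6.083, 0.412
mean_educ = 13.02      # mean of Education reported in the text
z_ref = 1              # original code for males, which becomes 0 after recoding

# Substitute X = Xc + mean_educ and Z = D + z_ref, then collect terms:
# Y' = c0 + c1*Xc + c2*D + c3*Xc*D
c0 = b0 + b1 * mean_educ + b2 * z_ref + b3 * mean_educ * z_ref
c1 = b1 + b3 * z_ref
c2 = b2 + b3 * mean_educ
c3 = b3

print(f"constant:        {c0:.3f}   (Table 3 reports 43.401)")
print(f"Education:       {c1:.3f}   (Table 3 reports 2.080)")
print(f"Sex:             {c2:.3f}   (Table 3 reports -.720)")
print(f"Education x Sex: {c3:.3f}   (Table 3 reports .412)")
```

Only the interaction weight is unchanged by the rescaling, which is why its test is the same in both tables while the weights and tests for the main effects differ.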

Multicollinearity and Tolerance


Multicollinearity is the proportion of variance in a predictor that can be predicted from other
predictor variables; tolerance is one minus multicollinearity, or the proportion of variance in a
predictor that cannot be predicted from other predictor variables. The tolerance for EdxSex =
.036. This means that if we were to use multiple regression to predict EdxSex using all of the
other predictor variables in the model with uncentered variables (Education and Sex), we would
find R2 = .964, and 1 - R2 = 1-.964 = .036. If all predictors were independent, the R2 for
predicting any one from the others would be zero, and tolerance would be 1.00. When predictor
variables overlap substantially (high multicollinearity, low tolerance), regression weights are
unstable. The sampling variance of a regression weight is inflated in proportion to the inverse of
tolerance. This inverse is the VIF (Variance Inflation Factor). For EdxSex, VIF = 27.5, and
tolerance = 1/27.5 = .036. Centering reduces multicollinearity with the interaction term.
However, the test of the interaction term is not affected by centering, as we can see by
comparing the R2 Change, B, and SEB for the interaction term in Tables 2 and 3.
When two predictors are highly correlated, it is likely that neither one will make a unique
contribution to the model, even if each is a good predictor. In this case, it may be desirable to
eliminate one of the predictors or to make a composite of the two.
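
Tolerance and VIF can be computed directly from their definitions. The following is a minimal sketch in Python with NumPy, using synthetic data generated in the script (not the GSS data; the mean of 13 and SD of 3 for the education stand-in are assumptions). It regresses the product term on the two main effects, once with raw scores and once with centered education and 0/1 sex, and reports tolerance = 1 - R2 and VIF = 1/tolerance.

```python
import numpy as np

# Synthetic stand-ins for illustration only; these are NOT the GSS data.
rng = np.random.default_rng(1)
n = 1000
educ = rng.normal(13, 3, n)                   # raw years of education
sex = rng.integers(1, 3, n).astype(float)     # coded 1 or 2, as in Table 2


def tolerance(target, others):
    """1 - R squared from regressing one predictor on the other predictors."""
    X = np.column_stack([np.ones(len(target))] + others)
    b, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ b
    r2 = 1 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
    return 1 - r2


# Product term built from raw scores.
raw_tol = tolerance(educ * sex, [educ, sex])

# Product term built from centered education and 0/1 dummy-coded sex.
ceduc = educ - educ.mean()
dsex = sex - 1
cen_tol = tolerance(ceduc * dsex, [ceduc, dsex])

print(f"raw:      tolerance = {raw_tol:.3f}, VIF = {1 / raw_tol:.1f}")
print(f"centered: tolerance = {cen_tol:.3f}, VIF = {1 / cen_tol:.1f}")
```

In runs like this one the raw product term is almost entirely predictable from its components, while the centered version is much less so, which mirrors the drop in SEB for the main effects between Tables 2 and 3.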

SPSS Example: Moderation Effects with a Continuous Moderator


With two continuous predictors, the interaction term is again computed as the product of two
predictors, and it is tested as the contribution of the interaction term beyond the main effects in
predicting the dependent variable. However, presenting findings is more challenging.
In this example, we will test whether mother's education moderates the relationship between
years of education and occupational prestige. Is the correlation between education and
occupational prestige greater for those people whose mothers have more years of formal
education? The data for this example are also from the 1991 U.S. General Social Survey.

Table 4: SPSS Output for Moderation with a Continuous Moderator (N=1162)


Coefficients(a)

Model  Predictor         B  Std. Error    Beta        t   Sig.  Zero-order  Partial    Part  Tolerance     VIF
1      (Constant)   43.669        .335          130.277   .000
       CEDUC2        2.406        .120    .507   20.057   .000        .507     .507    .507      1.000   1.000
2      (Constant)   43.669        .334          130.646   .000
       CEDUC2        2.557        .132    .539   19.430   .000        .507     .496    .490       .826   1.211
       CMAEDUC       -.294        .107   -.076   -2.755   .006        .148     -.08    -.07       .826   1.211
3      (Constant)   43.343        .351          123.588   .000
       CEDUC2        2.624        .133    .554   19.716   .000        .507     .501    .496       .802   1.247
       CMAEDUC       -.273        .107   -.071   -2.558   .011        .148     -.07    -.06       .822   1.216
       cedxmaed       .081        .027    .077    2.967   .003       -.032     .087    .075       .949   1.054

B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient.
Zero-order, Partial, and Part are correlations; Tolerance and VIF are collinearity statistics.
a. Dependent Variable: prestg80 R's Occupational Prestige Score (1980)

This moderation analysis is designed to detect a linear by linear interaction. That is, it tests the
extent to which the strength of the linear relationship between a predictor and the dependent
variable is a linear function of the level of a second predictor variable. The SPSS analysis for
centered continuous predictor variables is shown in Table 4, and Table 5 illustrates a summary
table presentation of these results. SPSS syntax for this analysis is shown in Appendix B.
Table 5: Moderation Effects of Mother's Education on Respondent's Education in
Predicting Occupational Prestige (N=1162)
Step  Variable              r         R2 Change  B        SEB    Beta
1     Education (years)     .507***   .258***    2.624    .133   .554***
2     Mother's Educ         .148***   .005**     -.273    .107   -.071*
3     Educ x Mom Educ                 .006**     .081     .027   .077**
      (Constant)                                 43.343   .351   ---

*p<.05; **p<.01; ***p<.001; Cumulative R squared = .268, F(3, 1158) = 141.2, p < .001;
Adjusted R squared = .266. Both predictors are centered to a mean of zero.
Education is a strong predictor of Occupational Prestige (r = .507, p < .001). Mother's Education
adds significantly on the second step (R2 Change = .005, p < .01). Notice that the beta for
Mother's Education in Model 2 in Table 4 is negative (-.076, p < .01) although the zero-order
correlation is positive (r = .148, p < .001). This may be a surprising finding. Occupational
Prestige is greater for those whose mothers have more education; however, on average, for
people of a given level of education, Occupational Prestige is greater for those whose mothers
have less education. For someone whose mother has average education (centered Mother's
Education = 0), each additional year of education is associated with 2.557 more points on the
predicted Occupational Prestige scale. For someone with average education (centered Education
= 0), each additional year of Mother's Education is associated with .294 fewer points on the
predicted Occupational Prestige scale. Model 2 does not consider the interaction between predictors.
We have a modest suppression relationship (see Cohen, et al., 2003, pp. 77-78). An indicator of
suppression is when the beta weight for a variable is not between zero and the correlation of that
variable with the dependent variable (Y). The beta weight for Education in Model 2 (.539) is
greater than the zero-order correlation (.507), indicating that mother's education suppresses the
relationship between Education and Occupational Prestige when it is not controlled. Suppression
may or may not be large enough to be practically or theoretically interesting. The statistically
significant interaction must be considered when main effects are interpreted. A figure can be
very helpful to describe complex findings such as these.
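
The suppression pattern can be reproduced from the correlations alone. The sketch below, in Python, uses the standard two-predictor formula for beta weights. The correlation between the two predictors is not reported in the handout, so it is recovered here from the Model 2 tolerance in Table 4 (with two predictors, tolerance = 1 - r12 squared), assuming the correlation is positive. The function name outside_zero_to_r is an illustrative choice.

```python
import math

# Zero-order correlations with Occupational Prestige (Table 4).
r_y_educ = 0.507
r_y_maeduc = 0.148

# With two predictors, tolerance = 1 - r12**2. The sign of r12 is an
# assumption (taken as positive); it is not reported in the handout.
tolerance_model2 = 0.826
r12 = math.sqrt(1 - tolerance_model2)

# Two-predictor beta weights computed from the three correlations.
beta_educ = (r_y_educ - r_y_maeduc * r12) / (1 - r12 ** 2)
beta_maeduc = (r_y_maeduc - r_y_educ * r12) / (1 - r12 ** 2)


def outside_zero_to_r(beta, r):
    """True when beta does not lie between zero and r (a suppression indicator)."""
    low, high = sorted((0.0, r))
    return not (low <= beta <= high)


print(f"beta for Education:          {beta_educ:.3f}  (Table 4: .539)")
print(f"beta for Mother's Education: {beta_maeduc:.3f}  (Table 4: -.076)")
print("suppression indicated:",
      outside_zero_to_r(beta_educ, r_y_educ),
      outside_zero_to_r(beta_maeduc, r_y_maeduc))
```

Both beta weights fall outside the range from zero to the corresponding r, which is the indicator described above; the recovered values agree with Table 4 to within rounding.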
To construct a figure, we can use information from the regression model of the relationship
between Education and Occupational Prestige for each of several levels of Mother's Education.
Following Cohen et al. (2003), we might select levels of Mother's Education at the mean and at
one standard deviation above and below the mean. In our example, the mean and SD for
Education are 13.41 and 2.796, and for Mother's Education these values are 10.79 and 3.443. For
Mother's Education, we could use a high value of 10.79 + 3.44 = 14.23 years, the mean of 10.79
years, and a low value of 10.79 - 3.44 = 7.35 years. In this example, it might be better to pick
more meaningful high and low values for Mother's Education, such as 16, 12, and 6 years,
respectively. The model for uncentered variables is easier to use when generating a figure.
If centered variables are used to create a figure, great care must be taken with conversions
between the raw and centered scales. Here is an example.
The final regression model for centered data in Table 4 is
Y' = 43.343 + 2.624*(ceduc) - .273*(cmaeduc) + .081*(ceduc * cmaeduc).
For cases where Mother's Education is 16, the value on cmaeduc is 16 - 10.79 = 5.21, which is
5.21 above the mean of maeduc. Entering 5.21 for cmaeduc, the regression equation becomes
Y' = 43.343 + 2.624*(ceduc) - .273*(5.21) + .081*(5.21)*(ceduc), or
Y' = 41.921 + 3.046*(ceduc). This is the model for cases where maeduc = 16.
For cases where Mother's Education is 6, the value on the centered scale (cmaeduc) is
6 - 10.79 = -4.79. When we replace cmaeduc with -4.79, the regression equation becomes
Y' = 43.343 + 2.624*(ceduc) - .273*(-4.79) + .081*(-4.79)*(ceduc), or
Y' = 44.651 + 2.236*(ceduc). This is the model for cases where maeduc = 6.
The mean education for respondents was 13.41 years. For respondents with 6 years of education,
ceduc = 6 - 13.41 = -7.41. For respondents with 20 years of education, ceduc = 20 - 13.41 =
6.59. We can use Excel with these values to generate a plot of the modeled relationships as
shown in Figure 8 (generated with Plotting Regression Interactions.XLS, available on
http://WISE.cgu.edu under WISE Stuff). The figure shows the size and direction of the effects
more clearly than tabled numbers, and is more suitable for a nontechnical audience.
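
The conversions between raw and centered scales are easy to get wrong by hand, so it can help to script them. The following is a minimal sketch in Python (an alternative to the Excel workbook, not a description of it) that takes the final centered model from Table 4, converts raw years to the centered scale, and returns the simple regression line and plot endpoints for each chosen level of Mother's Education. The function names simple_line and predict are illustrative.

```python
# Final centered model from Table 4:
# Y' = B0 + B1*ceduc + B2*cmaeduc + B3*ceduc*cmaeduc
B0, B1, B2, B3 = 43.343, 2.624, -0.273, 0.081
MEAN_EDUC, MEAN_MAEDUC = 13.41, 10.79   # sample means reported in the text


def simple_line(maeduc_years):
    """(intercept, slope) on ceduc for a given raw level of Mother's Education."""
    cmaeduc = maeduc_years - MEAN_MAEDUC
    return B0 + B2 * cmaeduc, B1 + B3 * cmaeduc


def predict(educ_years, maeduc_years):
    """Modeled Occupational Prestige for raw years of education of both people."""
    intercept, slope = simple_line(maeduc_years)
    return intercept + slope * (educ_years - MEAN_EDUC)


for maeduc in (6, 12, 16):
    intercept, slope = simple_line(maeduc)
    low, high = predict(6, maeduc), predict(20, maeduc)
    print(f"maeduc = {maeduc:2d}: Y' = {intercept:.3f} + {slope:.3f}*(ceduc); "
          f"at 6 years {low:.1f}, at 20 years {high:.1f}")
```

Each printed line gives the two endpoints needed to draw one line of the figure over the range from 6 to 20 years of respondent's education.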

Figure 8: Modeled Occupational Prestige as a Function of Education and Mother's Education (N = 1162)
[Line graph. Vertical axis: Occupational Prestige, 0 to 70. Horizontal axis: Respondent's Education (years), 6 to 20. Separate lines for Mother's Education of 6, 12, and 16 years.]

Keep in mind that a modeled description of the data is not a complete description of actual data,
and it may give a misleading impression of regularity in the data. Be especially careful not to
overinterpret patterns in the model at the extremes of observed data. In our model, the large
effect of Mother's Education when the respondent's Education = 6 should not be taken seriously
without additional evidence. Very few respondents had as little as six years of education.
Estimates near the ends of the distribution are less reliable than estimates from the middle. Be
sure to plot the raw data to ensure that models are appropriate.

Other Issues

Many additional issues are discussed in detail in comprehensive textbooks such as Cohen, et al.
(2003) or specialized books such as Aiken and West (1991). Berger (2004) provided short
introductions to categorical variables, correlation and causation, multicollinearity, interactions,
centering, nonlinear relationships, outliers, missing data, power analysis and sample size,
adjusted R squared, and stepwise vs. hierarchical selection of variables.


Summary and Final Advice


The most important advice is to get close to your data and make sure that your models and
descriptions are appropriate to the data. It is essential to examine the plot of residuals as a
function of predicted Y. An assumption of regression analysis is that residuals are random,
independent, normally distributed, and homoscedastic (equal variance at all values of predicted
Y). A residual plot can help you spot extreme outliers or departures from linearity. Bivariate
scatter plots can also provide helpful diagnostics, but a plot of residuals is the best way to find
multivariate outliers. A transformation of your data (e.g., log or square root) may reduce the
effects of extreme scores, make relationships more linear, and make the distributions closer to
normal (see Tabachnick & Fidell, 2007, Chapter Four on Cleaning Up Your Act).
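
As a rough illustration of what a residual plot contains, the Python sketch below fits a one-predictor regression and returns the (predicted, residual) pairs that would be plotted. It is a minimal sketch under stated assumptions: the data are invented for illustration (they are not from the GSS file), the function names are hypothetical, and in SPSS the same quantities come from the /RESIDUALS subcommand rather than from code like this.

```python
def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residual_pairs(x, y):
    """Return (predicted, residual) pairs, the two axes of a residual plot."""
    intercept, slope = fit_line(x, y)
    predicted = [intercept + slope * xi for xi in x]
    return [(p, yi - p) for p, yi in zip(predicted, y)]


# Invented illustration data; the last case is a deliberate outlier.
educ = [8, 10, 12, 12, 14, 16, 16, 18]
prestige = [30, 34, 40, 38, 45, 50, 52, 20]

pairs = residual_pairs(educ, prestige)
# Index of the case whose residual is largest in absolute value.
worst = max(range(len(pairs)), key=lambda i: abs(pairs[i][1]))
print("case with largest residual:", worst, pairs[worst])
```

Plotting the second element of each pair against the first gives the residual plot described above; the outlying case stands out because its residual is far from zero.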
Estimates and tests of mediation and moderation are based on assumptions. In particular, we
must assume that residuals from the regression models are reasonably normally distributed.
Additionally, sampling must be random and independent if we wish to generalize to the
population from which the sample was selected.
In general, it is important to include effect sizes and directions of effects along with statistical
significance. G*Power is a wonderful free program for power analysis that you can download
from http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/.
Keep in mind that alternate models may also account for the data. A model that hypothesizes
causal flow in a different direction may fit the data equally well and also produce statistically
significant effects. Be on the lookout for omitted lurking variables that may affect multiple
variables in your model. Perhaps when these prior variables are included, the direct (unique)
contributions of observed variables will change.
You can find discussion of mediation analysis in program evaluation, a glossary of terms, and
additional summary advice in Berger (2004).

References
Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions.
Newbury Park, CA: Sage Publications.
Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator distinction in social psychological
research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social
Psychology, 51, 1173-1182.
Berger, D. E. Web Interface for Statistics Education (WISE). http://wise.cgu.edu
Berger, D. E. (2004). Using regression analysis. In J. Wholey, H. Hatry, & K. Newcomer (Eds.),
Handbook of practical program evaluation (2nd ed., pp. 479-505). Jossey-Bass.
Campbell, D. T., & Kenny, D. A. (1999). A primer on regression artifacts. New York: Guilford.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation
analysis for the behavioral sciences (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Donaldson, S. I. (2001). Mediator and moderator analysis in program development. In S. Sussman
(Ed.), Handbook of program development for health behavior research. Newbury Park, CA:
Sage, 470-496.
Hayes, A. http://www.afhayes.com/spss-sas-and-mplus-macros-and-code.html [Downloadable
SPSS, SAS, and MPlus macros for mediation, moderation, and much, much more.]
Judd, C. M., Kenny, D. A., & McClelland, G. H. (2001). Estimating and testing mediation and
moderation in within-participant designs. Psychological Methods, 6, 115-134.
Kenny, D. http://davidakenny.net/cm/moderation.htm [Excellent discussion of moderation.]
Kraemer, H. C., Wilson, G. T., Fairburn, C. G., & Agras, W. S. (2002). Mediators and moderators of
treatment effects in randomized clinical trials. Archives of General Psychiatry, 59, 877-883.
Marquardt, D. W. (1980). You should standardize the predictor variables in your regression
models. Journal of the American Statistical Association, 75, 87-91.
McClelland, G. H., & Judd, C. M. (1993). Statistical difficulties of detecting interactions and
moderator effects. Psychological Bulletin, 114, 376-390.
Muller, D., Judd, C. M., & Yzerbyt, V. Y. (2005). When moderation is mediated and mediation is
moderated. Journal of Personality and Social Psychology, 89, 852-863.
Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Boston: Allyn
and Bacon.

Appendix A: SPSS syntax for moderation analysis with a dichotomous moderator


*Syntax for Multiple Regression Workshop - moderation.
*Data file is GSS1991 from SPSS.
*First, look at the data carefully, checking for errors or violations of assumptions.
FREQUENCIES
VARIABLES=sex educ prestg80
/FORMAT=LIMIT(10)
/STATISTICS=STDDEV MINIMUM MAXIMUM MEAN SKEWNESS SESKEW
KURTOSIS SEKURT
/HISTOGRAM NORMAL
/ORDER= ANALYSIS .
*Note data are missing on prestg80.
*Select only cases with complete data.
USE ALL.
COMPUTE filter_$=(sex >= 0 & educ >= 0 & prestg80 >= 0).
VARIABLE LABEL filter_$ 'sex >= 0 & educ >= 0 & prestg80 >= 0 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE .
*Recheck to make sure the filter worked as intended.
FREQUENCIES
VARIABLES=sex educ prestg80
/FORMAT=LIMIT(10)
/STATISTICS=STDDEV MINIMUM MAXIMUM MEAN SKEWNESS SESKEW
KURTOSIS SEKURT
/HISTOGRAM NORMAL
/ORDER= ANALYSIS .
*Create interaction term.
COMPUTE EdxSex = educ * sex .
EXECUTE .
*Hierarchical analysis with interaction entered last, using uncentered education.
REGRESSION
/DESCRIPTIVES MEAN STDDEV CORR SIG N
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA COLLIN TOL CHANGE ZPP
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT prestg80
/METHOD=ENTER educ /METHOD=ENTER sex /METHOD=ENTER EdxSex
/RESIDUALS HIST(ZRESID) .

Create a dummy variable for sex and center education on its mean, 13.02 years.
Then calculate the interaction term. The EXECUTE command must be given to create these
variables before they can be used in regression.
*Recode sex to dummy variable, center education, create interaction term.
RECODE SEX (1=0) (2=1) INTO sexd.
COMPUTE CEDUC = EDUC - 13.02.
COMPUTE CEDXSEXD = SEXD * CEDUC .
EXECUTE .
*Hierarchical regression with interaction entered last, using centered education.
REGRESSION
/variables=sexd,ceduc,cedxsexd,prestg80
/DESCRIPTIVES MEAN STDDEV CORR SIG N
/MISSING LISTWISE
/STATISTICS COEFF OUTS CI R ANOVA TOL CHANGE ZPP
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT prestg80
/METHOD=ENTER ceduc /METHOD=ENTER sexd /METHOD=ENTER cedxsexd
/RESIDUALS HIST(ZRESID) .
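
To see why centering matters before forming the product term, the Python sketch below mirrors the RECODE and COMPUTE steps above on a small invented data set and compares how strongly the dummy variable correlates with the interaction term built from uncentered versus centered education. This is an illustration only: the eight cases are made up, the helper pearson_r is hypothetical, and the mean is computed from the toy data rather than set to 13.02.

```python
def pearson_r(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


# Invented cases: sex coded 1 = male, 2 = female, as in the GSS file.
sex = [1, 2, 1, 2, 1, 2, 1, 2]
educ = [10, 12, 12, 14, 14, 16, 16, 18]

sexd = [s - 1 for s in sex]                  # like RECODE SEX (1=0) (2=1) INTO sexd
mean_educ = sum(educ) / len(educ)            # toy-data mean, not 13.02
ceduc = [e - mean_educ for e in educ]        # like COMPUTE CEDUC = EDUC - mean
cedxsexd = [c * d for c, d in zip(ceduc, sexd)]   # centered product term
edxsexd = [e * d for e, d in zip(educ, sexd)]     # uncentered product, for comparison

r_uncentered = pearson_r(sexd, edxsexd)
r_centered = pearson_r(sexd, cedxsexd)
print(round(r_uncentered, 3), round(r_centered, 3))
```

In this toy example the product built from uncentered education is almost collinear with the dummy variable, while the centered product is only modestly correlated with it. The size of the reduction depends on the data, so this should be read as an illustration rather than a general result.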

Appendix B: SPSS syntax for moderation analysis with a continuous moderator


*Example with continuous moderator variable, mother's education.
*Limit analyses to cases with complete data.
USE ALL.
COMPUTE filter_$=(educ >= 0 & maeduc >=0 & prestg80 >= 0).
VARIABLE LABEL filter_$ 'educ >= 0 & maeduc >=0 & prestg80 >= 0 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE .
*Find the means for the new subset of cases.
FREQUENCIES
VARIABLES=educ, maeduc, prestg80
/STATISTICS=STDDEV MINIMUM MAXIMUM MEAN MEDIAN SKEWNESS SESKEW
/HISTOGRAM NORMAL
/ORDER= ANALYSIS .
*Recenter education for this reduced sample with N=1162 and create interaction term.
COMPUTE CEDUC2 = EDUC - 13.4088.
COMPUTE CMAEDUC = MAEDUC - 10.7926.
compute cedxmaed = ceduc2*cmaeduc.

*Tables 4 and 5.
REGRESSION
/variables=ceduc2, cmaeduc, cedxmaed, prestg80
/DESCRIPTIVES MEAN STDDEV CORR SIG N
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA TOL CHANGE ZPP
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT prestg80
/METHOD=ENTER ceduc2 /METHOD=ENTER cmaeduc /METHOD=ENTER cedxmaed
/RESIDUALS HIST(ZRESID) .
An alternative way to limit a regression analysis that uses only a subset of the variables to those
cases that have complete data on all of the variables in the full model is to list every variable
from the full model on a /VARIABLES subcommand (with /MISSING LISTWISE), like this:
/variables=ceduc2, cmaeduc, cedxmaed, prestg80
Even if the analysis used only ceduc2 and prestg80, cases missing data on cmaeduc would be
omitted.
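
The effect of listwise deletion over the full variable list can be sketched outside SPSS. The Python example below is an assumption-laden illustration: the four cases are invented, missing values are represented by None, and complete_cases is a hypothetical helper, not an SPSS feature.

```python
def complete_cases(rows, variables):
    """Keep only rows with a non-missing value (not None) on every listed variable."""
    return [row for row in rows if all(row.get(v) is not None for v in variables)]


# Invented cases; None marks a missing value.
rows = [
    {"ceduc2": -1.4, "cmaeduc": 1.2, "prestg80": 42},
    {"ceduc2": 2.6, "cmaeduc": None, "prestg80": 51},
    {"ceduc2": 0.6, "cmaeduc": -2.8, "prestg80": None},
    {"ceduc2": -3.4, "cmaeduc": -4.8, "prestg80": 33},
]

# Variables of the full model, as they would be named on /VARIABLES.
full_model_vars = ["ceduc2", "cmaeduc", "prestg80"]

kept = complete_cases(rows, full_model_vars)
print(len(kept), "of", len(rows), "cases retained")
```

The second case is dropped even though it has values on ceduc2 and prestg80, which is the behavior described above: every model in the hierarchy is then estimated on the same set of cases.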

Excel workbook for plotting regression interactions


Unstandardized regression coefficients (B weights) can be copied from the Coefficients table in
SPSS output files and pasted into an Excel worksheet to generate a graph showing interactions.
You can download an Excel template for making figures from http://wise.cgu.edu: go to WISE
Stuff, Excel Downloads, Demonstrations using Excel, Plotting Regression Interactions.
With two categorical variables, bar graphs may be the best way to present the relationships. The
example below is taken from a data set provided by SPSS, showing starting salaries for men and
women in different colleges. There is an interaction between gender and college in predicting
starting salary.

The Excel workbook Plotting Regression Interactions offers templates into which regression
coefficients from SPSS can be copied to produce graphs. Here are examples
showing interactions with education, gender, etc. in predicting occupational prestige.
