Regression
Osa Rafshodia
With a simple linear regression model with one continuous variable predicting a continuous outcome
Fakultas Kedokteran Universitas Mulawarman 2013
Table of Contents
Continuous predictors: Linear
  Chapter overview
  Simple Linear Regression
  Multiple regression
  Graphing
  Checking for nonlinearity graphically
    1. Examining scatterplot of predictor and outcome
    2. Checking for nonlinearity using residuals
    3. Checking for nonlinearity using locally weighted smoother
    4. Graphing outcome mean at each level of predictor
By: dr. Osa Rafshodia Rafidin, MSc.IH, MPH
This chapter focuses on how to interpret the coefficient of a continuous predictor in a linear regression model. It begins with a simple linear regression model with one continuous variable predicting a continuous outcome.
Terminology: Continuous and categorical variables
When I use the term continuous variable, I am referring to a variable that is measured on an interval or ratio scale. By contrast, when I speak of a categorical (or factor) variable, I am referring to either a nominal variable or an ordinal/interval/ratio variable that we wish to treat as though it were a nominal variable.
Multiple regression
Let's now turn to a multiple regression model that predicts the respondent's education from the father's education, the mother's education, and the age of the respondent.
    regress educ paeduc maeduc age
The fitted model is:
    educ = 6.96 + 0.26 paeduc + 0.21 maeduc + 0.03 age
The coefficients from this multiple regression model reflect the association between each predictor and the outcome after adjusting for all the other predictors. For example, the coefficient for paeduc is 0.26, meaning that for every one-year increase in the education of the father, we would expect the education of the respondent to be 0.26 years higher, holding the mother's education and the age of the respondent constant.
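To make this interpretation concrete, the fitted equation can be evaluated by hand. The sketch below assumes the data used for the regress command above are already in memory; the input values (12, 12, and 30) are arbitrary illustrative choices, not taken from the text, and the scalar names yhat12 and yhat13 are hypothetical.

```stata
* Assumes the dataset containing educ, paeduc, maeduc, and age is already loaded.
regress educ paeduc maeduc age

* Predicted education for a hypothetical respondent (illustrative values only):
* father's education = 12, mother's education = 12, age = 30.
scalar yhat12 = _b[_cons] + _b[paeduc]*12 + _b[maeduc]*12 + _b[age]*30
display yhat12

* The same respondent with father's education one year higher.
scalar yhat13 = _b[_cons] + _b[paeduc]*13 + _b[maeduc]*12 + _b[age]*30

* The difference equals the paeduc coefficient, whatever values
* are chosen for the other predictors.
display yhat13 - yhat12
display _b[paeduc]
```

With the rounded coefficients reported above, the first calculation works out to 6.96 + 3.12 + 2.52 + 0.90 = 13.50; Stata uses the unrounded coefficients, so its result may differ slightly.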
As an aid to interpreting the coefficients from the multiple regression model, we can compute adjusted means of the outcome as a function of one or more predictors from the model. For example, to help interpret the coefficient for paeduc, we can use the margins command to compute the adjusted mean of education given different values of the father's education, adjusting for the other predictors. The command below computes the adjusted mean of the respondent's education when the father's education equals 8, 12, and 16, adjusting for the other predictors (mother's education and age of the respondent).
    margins, at(paeduc=(8 12 16)) vsquish
Terminology: Adjusted means
Looking at the output from the previous margins command, we see that when a respondent's father has 8 years of education, the respondent is predicted to have 13.03 years of education, after adjusting for the education of the mother and the age of the respondent. What do we call the quantity 13.03? We can call this a predicted mean after adjusting for all other predictors. For example, we can say that the predicted mean, given that the father has 8 years of education, is 13.03 after adjusting for all other predictors. We could also call this an adjusted mean: when the father has 8 years of education, the adjusted mean is 13.03. (The term adjusted mean implies adjustment for all other predictors in the model.)
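One way to see what margins does here is to reproduce an adjusted mean by hand. The sketch below assumes the same data are in memory as for the regress command in the text and that margins is run with its default settings (averaging predictions over the estimation sample); the variable name yhat8 is hypothetical.

```stata
* Assumes the dataset containing educ, paeduc, maeduc, and age is already loaded.
regress educ paeduc maeduc age
margins, at(paeduc=8)

* By hand: set father's education to 8 for everyone in the estimation
* sample, predict the outcome, and average the predictions.
preserve
replace paeduc = 8
predict double yhat8 if e(sample)
summarize yhat8
restore
```

Under these assumptions, the mean reported by summarize should agree with the adjusted mean that margins reports for paeduc = 8. The mother's education and age keep their observed values, which is the sense in which the mean is adjusted for them.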
The margins command allows us to hold more than one variable constant at a time. In the example below, we compute the adjusted means when the father's education equals 8, 12, and 16, while holding the mother's education constant at 14.
    margins, at(paeduc=(8 12 16) maeduc=14) vsquish
Compared with the result of the previous margins command, we can see that the adjusted means are higher when the mother's education is held constant at 14. However, the effect of the father's education remains the same. For example, in the previous margins command, the change in the adjusted means due to increasing the father's education from 8 to 16 years was 2.06 (15.09 - 13.03). When the mother's education is held constant at 14, the change due to increasing the father's education from 8 to 16 years is the same (aside from rounding), 2.07 (15.61 - 13.54). Although the adjusted means are higher when the mother's education is held constant at 14 years, the difference in the adjusted means due to increasing the father's education remains the same, because the model contains no interaction between the predictors.
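The constant effect of the father's education can also be checked directly from the coefficient. The sketch below assumes the same data are in memory; the maeduc values 10, 14, and 18 are illustrative choices, not taken from the text.

```stata
* Assumes the dataset containing educ, paeduc, maeduc, and age is already loaded.
regress educ paeduc maeduc age

* Estimated change in education for an 8-year increase in father's
* education (from 8 to 16 years), with a standard error and confidence interval.
lincom 8*paeduc

* Slope of paeduc evaluated at several values of mother's education.
* In a linear model without interactions, the slope is the same at each value.
margins, dydx(paeduc) at(maeduc=(10 14 18))
```

With the rounded coefficient, 8 x 0.26 = 2.08, which is in line with the differences of 2.06 and 2.07 computed from the rounded adjusted means.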
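Graphing
The marginsplot command graphs the adjusted means computed by the preceding margins command. Here, the adjusted means are computed for values of the father's education from 0 to 20 in steps of 4.
    margins, at(paeduc=(0(4)20))
    marginsplot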
Checking for nonlinearity graphically
Several graphical approaches can be used to check for nonlinearity. These approaches include:
1. examining scatterplots of the predictor and outcome
2. examining residual-versus-fitted plots
3. creating plots based on locally weighted smoothers
4. plotting the mean of the outcome for each level of the predictor
1. Examining scatterplot of predictor and outcome
Let's look at a scatterplot of the size of the engine (displacement) by the length of the car (length), with a line showing the linear fit.
    use autosubset
    graph twoway (scatter displacement length) (lfit displacement length), ytitle("Engine displacement (cu in.)") legend(off)
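To judge whether a straight line is adequate, it can help to overlay a quadratic fit on the same scatterplot. The sketch below is a stand-in, not the author's example: it uses Stata's shipped auto dataset (loaded with sysuse) in place of the autosubset file, and the legend labels are my own. The /// line continuations require running it from a do-file.

```stata
* Stand-in data: Stata's shipped auto dataset, not the author's autosubset file.
sysuse auto, clear

* Scatterplot with a linear fit (lfit) and a quadratic fit (qfit) overlaid.
* If the two fitted lines diverge noticeably, the relationship may be nonlinear.
twoway (scatter displacement length)             ///
       (lfit displacement length)                ///
       (qfit displacement length),               ///
       ytitle("Engine displacement (cu in.)")    ///
       legend(order(2 "Linear fit" 3 "Quadratic fit"))
```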
2. Checking for nonlinearity using residuals
We can check for nonlinearity by looking at the relationship between the residuals and predicted values, after accounting for the other variables in the model.
    regress displacement length trunk weight
    rvfplot
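A reference line at zero makes curvature in the residuals easier to spot, and a residual-versus-predictor plot can help identify which variable is responsible. The sketch below uses Stata's shipped auto dataset as a stand-in for the author's data file; the choice of length for the second plot is illustrative.

```stata
* Stand-in data: Stata's shipped auto dataset, not the author's data file.
sysuse auto, clear
regress displacement length trunk weight

* Residual-versus-fitted plot with a horizontal reference line at zero.
* A curved band of points around the line suggests nonlinearity.
rvfplot, yline(0)

* Residuals plotted against a single predictor from the model.
rvpplot length, yline(0)
```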
3. Checking for nonlinearity using locally weighted smoother
Suppose we want to determine the nature of the relationship between the year that the respondent was born and education level.
    graph use "/Users/osa/Desktop/survival analisis/gss_ivrm lowess.gph"
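The saved graph above cannot be regenerated without the author's data, so the sketch below shows how a locally weighted (lowess) smoother can be drawn in general, using Stata's shipped auto dataset as a stand-in. For the survey data, the analogous command would presumably smooth education against year of birth, but the variable names in that dataset are not given in the text. The /// line continuations require running it from a do-file.

```stata
* Stand-in data: Stata's shipped auto dataset, not the author's survey data.
sysuse auto, clear

* Scatterplot with a lowess smoother and a linear fit overlaid.
* Where the lowess curve departs from the straight line, the
* relationship may be nonlinear.
twoway (scatter displacement length)   ///
       (lowess displacement length)    ///
       (lfit displacement length),     ///
       legend(order(2 "Lowess" 3 "Linear fit"))
```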
4. Graphing outcome mean at each level of predictor
One simple way to create such a graph is to fit a regression model predicting the outcome, treating the predictor variable as a categorical (factor) variable.
    graph use "/Users/osa/Desktop/survival analisis/yrborn.gph"
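A minimal sketch of this approach follows, using Stata's shipped auto dataset as a stand-in because the author's data are not available here. The variable rep78 (repair record, coded 1 to 5) is only a stand-in for a predictor with a limited number of distinct levels, and mpg is an arbitrary outcome.

```stata
* Stand-in data: Stata's shipped auto dataset, not the author's survey data.
sysuse auto, clear

* The i. prefix tells Stata to treat rep78 as a categorical (factor) variable.
regress mpg i.rep78

* Mean of the outcome at each level of the predictor, then graph those means.
margins rep78
marginsplot
```

If the plotted means fall close to a straight line, a linear term for the predictor is likely adequate. A clear bend suggests that the relationship is nonlinear.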