Simple Linear Regression: Inference + Diagnostics: Statistics 203: Introduction To Regression and Analysis of Variance

Statistics 203: Introduction to Regression and Analysis of Variance
Simple Linear Regression: Inference + Diagnostics
Jonathan Taylor

Contents:
- Outline
- Distribution of $\hat\beta$, $e$
- t-random variables
- F-random variables
- Inference for $\hat\beta$: t-statistics
- Why reject for large $|T|$?
- t vs. Normal
- Confidence interval for $\beta_1$
- Linear combinations of $\beta_0, \beta_1$
- A new observation: forecasting
- Goodness of fit
- Goodness of fit
- F test for significance of regression
- What can go wrong?
- Problems in the regression function
- Problems with the errors

Outline
- Inference for the vector of coefficients $\hat\beta$.
- Diagnostics: what can go wrong in our model?
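
Distribution of $\hat\beta$, $e$
- The vector $\hat\beta = (\hat\beta_0, \hat\beta_1)$ is a function of $\hat Y$, so it is independent of $e$.
- Both $\hat\beta$ and $\hat Y$ are linear transformations of $Y$, so they are normally distributed.
- We will prove that $E((\hat\beta_0, \hat\beta_1)) = (\beta_0, \beta_1)$ and that $\hat\beta$ has covariance matrix
  $$\mathrm{Var}(\hat\beta) = \begin{pmatrix} \sigma^2\left(\frac{1}{n} + \frac{\bar X^2}{S_{xx}}\right) & -\frac{\sigma^2 \bar X}{S_{xx}} \\ -\frac{\sigma^2 \bar X}{S_{xx}} & \frac{\sigma^2}{S_{xx}} \end{pmatrix},$$
  where $S_{xx} = \sum_{i=1}^n (X_i - \bar X)^2$.
- Natural estimates of the covariance matrix replace $\sigma^2$ by $\hat\sigma^2 = \sum_{i=1}^n e_i^2/(n-2)$:
  $$\widehat{\mathrm{Var}}(\hat\beta) = \begin{pmatrix} \hat\sigma^2\left(\frac{1}{n} + \frac{\bar X^2}{S_{xx}}\right) & -\frac{\hat\sigma^2 \bar X}{S_{xx}} \\ -\frac{\hat\sigma^2 \bar X}{S_{xx}} & \frac{\hat\sigma^2}{S_{xx}} \end{pmatrix}.$$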
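
As a numerical check of the estimated covariance matrix, the sketch below fits a line to a small made-up data set (x = 1, ..., 5 and y = 1, 3, 2, 5, 4, hypothetical values chosen only for illustration) and evaluates $\widehat{\mathrm{Var}}(\hat\beta)$ with $\hat\sigma^2 = SSE/(n-2)$. The function names are illustrative, not from any library.

```python
def fit_simple_regression(x, y):
    """Least-squares fit of y = b0 + b1 * x.

    Returns (b0, b1, sigma2_hat, s_xx, x_bar), where sigma2_hat = SSE / (n - 2).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b0, b1, sse / (n - 2), s_xx, x_bar


def estimated_cov_matrix(x, y):
    """Plug-in estimate of Var(beta_hat): sigma^2 replaced by sigma2_hat."""
    n = len(x)
    _, _, s2, s_xx, x_bar = fit_simple_regression(x, y)
    return [
        [s2 * (1 / n + x_bar ** 2 / s_xx), -s2 * x_bar / s_xx],
        [-s2 * x_bar / s_xx, s2 / s_xx],
    ]


x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]
cov = estimated_cov_matrix(x, y)
print(cov)  # approximately [[1.32, -0.36], [-0.36, 0.12]]
```

Here $\bar X = 3$, $S_{xx} = 10$ and $\hat\sigma^2 = 3.6/3 = 1.2$, so the off-diagonal entry $-1.2 \cdot 3/10$ is negative: with $\bar X > 0$, overestimating the slope tends to go with underestimating the intercept.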

t-random variables
- Start with $Z \sim N(0,1)$, a standard normal, and $G \sim \chi^2_\nu$, independent of $Z$.
- Compute
  $$T = \frac{Z}{\sqrt{G/\nu}}.$$
- Then $T \sim t_\nu$ has a t-distribution with $\nu$ degrees of freedom.
- Where do they come up in regression?
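
The construction above can be checked by simulation. This sketch uses only Python's standard `random` module, builds $G \sim \chi^2_\nu$ as a sum of $\nu$ squared independent standard normals, and forms $T = Z/\sqrt{G/\nu}$. For $\nu > 2$, $\mathrm{Var}(T) = \nu/(\nu-2)$, so the sample second moment should land near that value; the seed and sample size are arbitrary choices.

```python
import random


def simulate_t(nu, n_draws, seed=0):
    """Draw values of T = Z / sqrt(G / nu) with Z ~ N(0, 1) and G ~ chi^2_nu independent."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        z = rng.gauss(0.0, 1.0)
        # G ~ chi^2_nu as a sum of nu squared independent standard normals.
        g = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))
        draws.append(z / (g / nu) ** 0.5)
    return draws


nu = 10
draws = simulate_t(nu, 20000)
# E(T) = 0, so the mean of T^2 estimates Var(T) = nu / (nu - 2) = 1.25.
second_moment = sum(t * t for t in draws) / len(draws)
print(second_moment, nu / (nu - 2))
```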

F-random variables
- Start with $G_1 \sim \chi^2_{\nu_1}$ and another, independent $G_2 \sim \chi^2_{\nu_2}$.
- Compute
  $$F = \frac{G_1/\nu_1}{G_2/\nu_2}.$$
- Then $F \sim F_{\nu_1,\nu_2}$ has an F-distribution with $\nu_1$ degrees of freedom in the numerator and $\nu_2$ in the denominator.
- Note: if $T \sim t_\nu$, then $T^2 \sim F_{1,\nu}$.
- Where do they come up in regression?
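
The note that $T^2 \sim F_{1,\nu}$ can be checked by simulation: draw $F = (G_1/1)/(G_2/\nu)$ directly and draw $T^2$ from the t construction, then compare how often each exceeds the same cutoff. This is a sketch with the standard `random` module; the cutoff 4.965 is approximately the 0.95 quantile of $F_{1,10}$ (equivalently $2.228^2$, the squared 0.975 quantile of $t_{10}$, taken from standard tables), and the seeds are arbitrary.

```python
import random


def chi2_draw(rng, nu):
    """One chi^2_nu draw as a sum of nu squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))


def simulate_f(nu1, nu2, n_draws, seed=1):
    """Draws of F = (G1 / nu1) / (G2 / nu2) with independent chi-square G1, G2."""
    rng = random.Random(seed)
    return [(chi2_draw(rng, nu1) / nu1) / (chi2_draw(rng, nu2) / nu2)
            for _ in range(n_draws)]


def simulate_t_squared(nu, n_draws, seed=2):
    """Draws of T^2 where T = Z / sqrt(G / nu)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) ** 2 / (chi2_draw(rng, nu) / nu)
            for _ in range(n_draws)]


nu = 10
f_draws = simulate_f(1, nu, 20000)
t2_draws = simulate_t_squared(nu, 20000)
cutoff = 4.965  # roughly the 0.95 quantile of F_{1,10}
tail_f = sum(v > cutoff for v in f_draws) / len(f_draws)
tail_t2 = sum(v > cutoff for v in t2_draws) / len(t2_draws)
print(tail_f, tail_t2)  # both should be near 0.05
```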
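
Inference for $\hat\beta$: t-statistics
- Because $e$ is independent of $\hat\beta$, it follows that $\widehat{\mathrm{Var}}(\hat\beta_1)$ and $\widehat{\mathrm{Var}}(\hat\beta_0)$ are independent of $\hat\beta$.
- Under the hypothesis $H_0: \beta_1 = \beta_1^0$,
  $$T = \frac{\hat\beta_1 - \beta_1^0}{\sqrt{\widehat{\mathrm{Var}}(\hat\beta_1)}} \sim t_{n-2}.$$
  (Why?)
- To test this hypothesis, compare $|T|$ to $t_{n-2,1-\alpha/2}$, the $1-\alpha/2$ quantile of the t distribution with $n-2$ degrees of freedom.
- Reject $H_0$ if $|T| > t_{n-2,1-\alpha/2}$.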
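
A minimal sketch of the test of $H_0: \beta_1 = 0$ on hypothetical toy data (x = 1, ..., 5 and y = 1, 3, 2, 5, 4). Python's standard library has no t quantile function, so the critical value $t_{3,0.975} \approx 3.182$ is passed in as a number read from a t table; with SciPy installed, `scipy.stats.t.ppf(0.975, 3)` would give it. The function name `slope_t_statistic` is illustrative.

```python
import math


def slope_t_statistic(x, y, beta1_null=0.0):
    """Return T = (b1 - beta1_null) / SE(b1) and its degrees of freedom n - 2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sigma2_hat = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se_b1 = math.sqrt(sigma2_hat / s_xx)  # sqrt of the estimated Var(b1)
    return (b1 - beta1_null) / se_b1, n - 2


x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]
T, df = slope_t_statistic(x, y, beta1_null=0.0)
t_crit = 3.182  # 0.975 quantile of t_3 from a table (alpha = 0.05)
print(round(T, 4), df, abs(T) > t_crit)  # 2.3094 3 False
```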

Why reject for large $|T|$?
- Observing a large $|T|$ is unlikely if $\beta_1 = \beta_1^0$, so it is reasonable to conclude that $H_0$ is false.
- It is common to report the p-value
  $$\text{p-value} = 2\int_{|T|}^{\infty} f_{t_{n-2}}(s)\,ds.$$
- Above, $f_{t_{n-2}}$ is the density of a t random variable with $n-2$ degrees of freedom.
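
The p-value integral can be evaluated numerically with the standard library alone. This sketch uses the closed-form t density, the symmetry $\int_{|T|}^\infty f = \tfrac12 - \int_0^{|T|} f$, and composite Simpson's rule on $[0, |T|]$; the step count is an arbitrary accuracy choice. In practice a library routine would normally be used, for example `2 * scipy.stats.t.sf(abs(T), df)` if SciPy is available.

```python
import math


def t_density(s, nu):
    """Density of the t distribution with nu degrees of freedom."""
    log_const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                 - 0.5 * math.log(nu * math.pi))
    return math.exp(log_const) * (1 + s * s / nu) ** (-(nu + 1) / 2)


def two_sided_p_value(t_obs, nu, n_steps=2000):
    """2 * integral_{|t_obs|}^inf f(s) ds, computed as 1 - 2 * integral_0^{|t_obs|} f(s) ds."""
    a = abs(t_obs)
    h = a / n_steps  # n_steps must be even for Simpson's rule
    total = t_density(0.0, nu) + t_density(a, nu)
    for k in range(1, n_steps):
        total += (4 if k % 2 == 1 else 2) * t_density(k * h, nu)
    integral_0_to_a = total * h / 3
    return 1 - 2 * integral_0_to_a


# 2.228 is approximately t_{10, 0.975}, so the p-value is about 0.05.
print(round(two_sided_p_value(2.228, 10), 3))
```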

t vs. Normal
[Figure: density $f(s)$ of the t distribution with 10 df compared with the standard normal density, for $s$ from -3 to 3 (density axis 0.0 to 0.4).]
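
The comparison in the figure can be reproduced numerically. This sketch evaluates the $t_{10}$ density (closed form, written out here) next to the standard normal density from Python's `statistics.NormalDist`, showing that the t density is lower at the center and heavier in the tails.

```python
import math
from statistics import NormalDist


def t_density(s, nu):
    """Density of the t distribution with nu degrees of freedom."""
    log_const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                 - 0.5 * math.log(nu * math.pi))
    return math.exp(log_const) * (1 + s * s / nu) ** (-(nu + 1) / 2)


normal = NormalDist(mu=0.0, sigma=1.0)
for s in [0.0, 1.0, 2.0, 3.0]:
    print(f"s={s:.0f}  t_10: {t_density(s, 10):.4f}  normal: {normal.pdf(s):.4f}")
```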
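
Confidence interval for $\beta_1$
- For simplicity, write
  $$SE(\hat\beta_1) = \frac{\hat\sigma}{\sqrt{\sum_{i=1}^n (X_i - \bar X)^2}}.$$
- Under the model assumptions
  $$1 - \alpha = P\left(\left|\frac{\hat\beta_1 - \beta_1}{SE(\hat\beta_1)}\right| < t_{n-2,1-\alpha/2}\right) = P\left(\beta_1 \in \hat\beta_1 \pm t_{n-2,1-\alpha/2}\, SE(\hat\beta_1)\right).$$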
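
A short sketch of the interval $\hat\beta_1 \pm t_{n-2,1-\alpha/2}\,SE(\hat\beta_1)$ on hypothetical toy data (x = 1, ..., 5 and y = 1, 3, 2, 5, 4). The quantile $t_{3,0.975} \approx 3.182$ is supplied from a table rather than computed, and the function name is illustrative.

```python
import math


def slope_confidence_interval(x, y, t_quantile):
    """Return beta1_hat -/+ t_quantile * SE(beta1_hat), with SE = sigma_hat / sqrt(S_xx)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(sse / (n - 2))
    half_width = t_quantile * sigma_hat / math.sqrt(s_xx)
    return b1 - half_width, b1 + half_width


x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]
lo, hi = slope_confidence_interval(x, y, t_quantile=3.182)  # t_{3, 0.975}
print(round(lo, 3), round(hi, 3))  # -0.302 1.902
```

Because this interval contains 0, the level-0.05 t-test of $H_0: \beta_1 = 0$ would not reject for this toy data.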
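
Linear combinations of $\beta_0, \beta_1$
- It is not too hard to prove that $a_0\hat\beta_0 + a_1\hat\beta_1$ is normally distributed and that its standard deviation can be estimated by
  $$SE(a_0\hat\beta_0 + a_1\hat\beta_1) = \hat\sigma\sqrt{\frac{a_0^2}{n} + \frac{(a_0\bar X - a_1)^2}{\sum_{i=1}^n (X_i - \bar X)^2}}.$$
- As in the last slide, a confidence interval is
  $$a_0\hat\beta_0 + a_1\hat\beta_1 \pm t_{n-2,1-\alpha/2}\,SE(a_0\hat\beta_0 + a_1\hat\beta_1).$$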
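
To see where this formula comes from, the sketch below computes $SE(a_0\hat\beta_0 + a_1\hat\beta_1)$ two ways on hypothetical toy data (x = 1, ..., 5 and y = 1, 3, 2, 5, 4): from the closed form above, and from the quadratic form $a^\top \widehat{\mathrm{Var}}(\hat\beta)\, a$ built from the estimated covariance matrix of $\hat\beta$. They agree because $a_0^2\bar X^2 - 2a_0a_1\bar X + a_1^2 = (a_0\bar X - a_1)^2$. Function names are illustrative.

```python
import math


def fit(x, y):
    """Return (n, x_bar, s_xx, sigma2_hat) for the least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sigma2_hat = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return n, x_bar, s_xx, sigma2_hat


def se_closed_form(x, y, a0, a1):
    n, x_bar, s_xx, s2 = fit(x, y)
    return math.sqrt(s2 * (a0 ** 2 / n + (a0 * x_bar - a1) ** 2 / s_xx))


def se_quadratic_form(x, y, a0, a1):
    n, x_bar, s_xx, s2 = fit(x, y)
    v00 = s2 * (1 / n + x_bar ** 2 / s_xx)  # estimated Var(b0)
    v01 = -s2 * x_bar / s_xx                 # estimated Cov(b0, b1)
    v11 = s2 / s_xx                          # estimated Var(b1)
    return math.sqrt(a0 ** 2 * v00 + 2 * a0 * a1 * v01 + a1 ** 2 * v11)


x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]
for a0, a1 in [(1, 0), (0, 1), (1, 2.5), (2, -1)]:
    print(a0, a1, se_closed_form(x, y, a0, a1), se_quadratic_form(x, y, a0, a1))
```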
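
A new observation: forecasting
- New observation:
  $$Y_{new} = \beta_0 + \beta_1 X_{new} + \varepsilon_{new}.$$
- The standard error is
  $$SE(Y_{new}) = \hat\sigma\sqrt{1 + \frac{1}{n} + \frac{(\bar X - X_{new})^2}{\sum_{i=1}^n (X_i - \bar X)^2}}.$$
- Again, the prediction interval is
  $$\hat\beta_0 + \hat\beta_1 X_{new} \pm t_{n-2,1-\alpha/2}\,SE(Y_{new}).$$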
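
A sketch of the prediction interval on hypothetical toy data (x = 1, ..., 5 and y = 1, 3, 2, 5, 4) at $X_{new} = 6$, with $t_{3,0.975} \approx 3.182$ taken from a table. The extra 1 under the square root, compared with the standard error of the fitted mean $\hat\beta_0 + \hat\beta_1 X_{new}$, accounts for the noise $\varepsilon_{new}$ in the new observation itself. The function name is illustrative.

```python
import math


def prediction_interval(x, y, x_new, t_quantile):
    """Return b0 + b1 * x_new -/+ t_quantile * SE(Y_new)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sigma2_hat = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    # The leading 1 is the variance contribution of eps_new (in units of sigma^2).
    se_new = math.sqrt(sigma2_hat * (1 + 1 / n + (x_bar - x_new) ** 2 / s_xx))
    center = b0 + b1 * x_new
    return center - t_quantile * se_new, center + t_quantile * se_new


x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]
lo, hi = prediction_interval(x, y, x_new=6, t_quantile=3.182)  # t_{3, 0.975}
print(lo, hi)
```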
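
Goodness of fit
The variation in $Y$, SST, can be decomposed into two parts: one for the regression, SSR, and one for the error, SSE.
  $$SST = \sum_{i=1}^n (Y_i - \bar Y)^2 = SSE + SSR$$
  $$SSE = \sum_{i=1}^n (Y_i - \hat Y_i)^2 = \sum_{i=1}^n (Y_i - \hat\beta_0 - \hat\beta_1 X_i)^2$$
  $$SSR = \sum_{i=1}^n (\bar Y - \hat Y_i)^2 = \sum_{i=1}^n (\bar Y - \hat\beta_0 - \hat\beta_1 X_i)^2$$
  $$SST = SSR + SSE$$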
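
The decomposition can be checked directly. This sketch computes the three sums of squares for a least-squares line with an intercept (the setting in which the cross term $\sum_i (Y_i - \hat Y_i)(\hat Y_i - \bar Y)$ vanishes) on hypothetical toy data x = 1, ..., 5 and y = 1, 3, 2, 5, 4. The function name is illustrative.

```python
def sums_of_squares(x, y):
    """Return (SST, SSR, SSE) for the least-squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ssr = sum((y_bar - fi) ** 2 for fi in fitted)
    return sst, ssr, sse


sst, ssr, sse = sums_of_squares([1, 2, 3, 4, 5], [1, 3, 2, 5, 4])
print(sst, ssr, sse)  # approximately 10, 6.4, 3.6
```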

Goodness of fit
- $$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} = \widehat{\mathrm{Cor}}(X, Y)^2.$$
- $R^2$ tells us what fraction of the variability in the $Y$'s is explained by the regression.
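
The three expressions for $R^2$ can be compared numerically. This sketch evaluates all of them on hypothetical toy data x = 1, ..., 5 and y = 1, 3, 2, 5, 4, using the sample correlation $S_{xy}/\sqrt{S_{xx}\,SST}$; the function name is illustrative.

```python
import math


def r_squared_three_ways(x, y):
    """Return (SSR / SST, 1 - SSE / SST, squared sample correlation)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ssr = sum((y_bar - fi) ** 2 for fi in fitted)
    cor = s_xy / math.sqrt(s_xx * sst)
    return ssr / sst, 1 - sse / sst, cor ** 2


print(r_squared_three_ways([1, 2, 3, 4, 5], [1, 3, 2, 5, 4]))  # each approximately 0.64
```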

F test for significance of regression
- Under $H_0: \beta_1 = 0$:
  $$\frac{SSR}{\sigma^2} \sim \chi^2_1, \qquad \frac{SSE}{\sigma^2} \sim \chi^2_{n-2}.$$
- Therefore
  $$F = \frac{MSR}{MSE} = \frac{SSR/1}{SSE/(n-2)} \sim F_{1,n-2}.$$
  (Why?)
- Reject $H_0$ for large values of $F$.
- General form of the F: a ratio of dispersions. The numerator is the dispersion of $\hat Y$ around $\bar Y$, while the denominator is the dispersion of $e$.
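
In simple linear regression the F statistic for $H_0: \beta_1 = 0$ equals the square of the t-statistic for the same hypothesis, because $SSR = \hat\beta_1^2 S_{xx}$; this matches the earlier note that $T^2 \sim F_{1,\nu}$. The sketch below checks this on hypothetical toy data x = 1, ..., 5 and y = 1, 3, 2, 5, 4; the function name is illustrative.

```python
import math


def f_and_t_for_slope(x, y):
    """Return (F, T) for H0: beta1 = 0, with F = MSR / MSE and T = b1 / SE(b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    ssr = sum((y_bar - fi) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    msr = ssr / 1          # 1 degree of freedom for the regression
    mse = sse / (n - 2)    # n - 2 degrees of freedom for the error
    t_stat = b1 / math.sqrt(mse / s_xx)
    return msr / mse, t_stat


F, T = f_and_t_for_slope([1, 2, 3, 4, 5], [1, 3, 2, 5, 4])
print(F, T ** 2)  # both approximately 16/3 = 5.333...
```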

What can go wrong?
- The regression function can be wrong: missing predictors, nonlinear.
- Assumptions about the errors can be wrong.
- Outliers: both in predictors and observations.
- Influential points: these points have undue influence on the regression function.
- Examples:
  - Example #1: diagnostics for usual linear model
  - Example #2: t density
  - Example #3: misspecified model

Problems in the regression function
- The true regression function may have higher-order non-linear terms, e.g. $X_1^2$, or may truly be non-linear.
- How to fix? Sometimes things can be transformed to linearity: suppose
  $$Y_i = \beta_0 e^{\beta_1 X_i} \varepsilon_i.$$
  Then
  $$\log Y_i = \log\beta_0 + \beta_1 X_i + \log\varepsilon_i$$
  is a linear model, and if the $\varepsilon$'s are independent lognormal random variables, then this transformed model has the same form as the original linear model with normal errors!
- Later, we will see Box-Cox transformations to choose a transformation that optimally linearizes the model.
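
A small simulation of the log transformation, assuming made-up parameters $\beta_0 = 2$, $\beta_1 = 0.3$ and errors $\varepsilon_i = e^{\eta_i}$ with $\eta_i \sim N(0, 0.1^2)$, so the $\varepsilon_i$ are independent lognormal. Ordinary least squares on $(X_i, \log Y_i)$ should then recover $\log\beta_0$ and $\beta_1$ approximately. The design points, seed, and function name are illustrative choices.

```python
import math
import random


def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    return y_bar - b1 * x_bar, b1


rng = random.Random(0)
beta0, beta1 = 2.0, 0.3  # hypothetical true parameters
x = [i / 10 for i in range(1, 101)]
# Multiplicative lognormal errors: eps_i = exp(eta_i), eta_i ~ N(0, 0.1^2).
y = [beta0 * math.exp(beta1 * xi) * math.exp(rng.gauss(0.0, 0.1)) for xi in x]

log_y = [math.log(yi) for yi in y]  # log Y_i = log(beta0) + beta1 * X_i + eta_i
b0_log, b1_hat = least_squares(x, log_y)
print(math.exp(b0_log), b1_hat)  # should be close to 2.0 and 0.3
```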

Problems with the errors
- Errors may not be normally distributed. We will look at the QQ plot for a graphical check. Non-normality may not affect inference in large samples.
- Variance may not be constant. We will see some graphical checks of this and (later) some transformations that might help correct this.
- Errors may not be independent. This seriously affects our estimates of SE, which can change the t and F statistics substantially!
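
A QQ plot pairs the sorted residuals with standard normal quantiles; points lying roughly on a line suggest approximately normal errors. This sketch computes those pairs with `statistics.NormalDist.inv_cdf`, using plotting positions $(i - 0.5)/n$ (one common convention among several). The residuals are those of the least-squares line $0.6 + 0.8x$ for hypothetical toy data x = 1, ..., 5 and y = 1, 3, 2, 5, 4; drawing the plot itself is left to any plotting library.

```python
from statistics import NormalDist


def qq_pairs(residuals):
    """Pair each sorted residual with a standard normal quantile."""
    n = len(residuals)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(residuals)))


# Residuals of the fitted line 0.6 + 0.8 * x for x = 1..5, y = 1, 3, 2, 5, 4.
residuals = [-0.4, 0.8, -1.0, 1.2, -0.6]
for q, r in qq_pairs(residuals):
    print(f"{q:+.3f}  {r:+.1f}")
```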
