
Chapter 10:

Linear Regression
A perfect correlation implies the ability to perfectly predict one score from another
Perfect predictions
Very simple, especially with z-scores: the z-score you predict for the Y variable is the same as the z-score for the X variable
That is, $z_{Y'} = z_X$ (when r = +1) or $z_{Y'} = -z_X$ (when r = -1)

When r is less than perfect, this rule does not give the best predictions


Predicting with z-scores


Modified rule: $z_{Y'} = r\,z_X$ (sketched in code below)
This incorporates r
If r = 1, we have the original rule
If r = 0, our best prediction is the mean (0 is the mean of z-scores)
As r becomes smaller, there is less of a tendency to expect an extreme score on one variable to be associated with an equally extreme score on the other
As long as r ≠ 0, the first variable is taken into account in the prediction of the second
The first variable has an influence, but it lessens as r decreases
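A minimal sketch of this rule in Python; the z-scores and correlation values below are made up purely for illustration:

```python
# Predicting a standardized Y score from a standardized X score: z_Y' = r * z_X.
# The numbers below are illustrative only.

def predict_zy(z_x: float, r: float) -> float:
    """Predicted z-score on Y, given the z-score on X and the correlation r."""
    return r * z_x

print(predict_zy(2.0, 1.0))   # r = 1: predict an equally extreme score (2.0)
print(predict_zy(2.0, 0.5))   # r = .5: prediction regresses toward the mean (1.0)
print(predict_zy(2.0, 0.0))   # r = 0: best prediction is the mean of the z-scores (0.0)
```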

Regression Toward the Mean


Sir Francis Galton: heights of parents and children ("regression towards mediocrity")
Noted that very tall parents tended to have children shorter than themselves, and vice versa
Just a consequence of the laws of probability when r is less than perfect
As r decreases, the predicted z-score gets closer to 0 (regresses toward the mean)
Remember: r measures only the degree of linear relationship

Raw score graph; z-score graph


[Figure: two scatter plots of Exam1 vs Exam2 (Collection 1).
Raw scores: Exam2 = 1.119·Exam1 − 41.8; r² = 0.83, with a horizontal reference line at the mean of Exam2.
z-scores: zscore2 = 0.909·zscore1 + 0 (intercept ≈ 0); r² = 0.83.]


Raw Score Regression Formula


From z-scores:
$$\frac{Y' - \bar{Y}}{\sigma_Y} = r\left(\frac{X - \bar{X}}{\sigma_X}\right)$$
$$b_{YX} = r\,\frac{\sigma_Y}{\sigma_X} \qquad\qquad a_{YX} = \bar{Y} - b_{YX}\bar{X}$$
$$Y' = b_{YX}X + a_{YX}$$
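A short Python sketch of these raw-score formulas. The data set is invented for illustration, and population-style standard deviations (dividing by N) are used to match the σ notation above; statistics.correlation needs Python 3.10+.

```python
import statistics as st

# Illustrative data (not from the slides)
x = [87, 88, 89, 90, 91, 92]
y = [58, 59, 61, 60, 62, 63]

r = st.correlation(x, y)                 # Pearson r
sx, sy = st.pstdev(x), st.pstdev(y)      # population standard deviations (divide by N)
mx, my = st.mean(x), st.mean(y)

b_yx = r * sy / sx                       # slope:     b_YX = r * sigma_Y / sigma_X
a_yx = my - b_yx * mx                    # intercept: a_YX = Ybar - b_YX * Xbar

def predict(x_new: float) -> float:
    """Predicted Y for a raw-score X: Y' = b_YX * X + a_YX."""
    return b_yx * x_new + a_yx

print(b_yx, a_yx, predict(90.0))
```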

Predictions based on raw scores


When r = 1,
$$b_{YX} = \frac{\sigma_Y}{\sigma_X}$$
Ex) height and weight; assume r = 1.0, slope = 5
Perfect r, but you wouldn't expect a 1 lb change in weight for every 1 inch change in height

Predictions: z-scores (r = 1)
Slope in z-scores = 1.0; you would expect a change of 1 standard deviation in height to be associated with a change of 1 standard deviation in weight
Can't predict weight with just the slope because the line does not go through the origin; we need the intercept
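To tie the height/weight example to the slope formula (the standard deviations here are invented just to make the arithmetic concrete; the slides state only that the slope is 5):
$$b_{YX} = r\,\frac{\sigma_{\text{weight}}}{\sigma_{\text{height}}} = 1.0 \times \frac{15\ \text{lb}}{3\ \text{in}} = 5\ \text{lb per inch}$$
Even with a perfect correlation, the raw-score slope is set by the ratio of the standard deviations, not fixed at 1.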

Quantifying the errors around the regression line
The regression equation gives us the straight line that minimizes the error involved in making predictions
Residual: the difference between the actual Y value and the predicted Y value, $Y - Y'$
It is the amount of the original value that is left over after the prediction is subtracted out
If you add these errors, they sum to zero; the regression line functions like an average in that the amount of error above the line always balances out the amount of error below

The Variance of the Estimate


Quantifies the total amount of error in the predictions:
$$\sigma^2_{\text{est}\,Y} = \frac{\sum (Y - Y')^2}{N}$$
Variance of the estimate: the variance of the data points around the regression line; also called residual variance
Smaller as points are closer to the line
Correlations closer to 0 mean more error in prediction and a higher $\sigma^2_{\text{est}\,Y}$
When r = 0, $\sigma^2_{\text{est}\,Y}$ is largest and the regression line becomes horizontal (slope = 0); $Y'$ is always the mean when r = 0

This means the variance of the data points around the regression line is just the ordinary variance of the Y values ($\sigma^2_Y$): the regression line does not help in prediction
For any r ≠ 0, $\sigma^2_{\text{est}\,Y}$ will be less than $\sigma^2_Y$, and that represents the advantage of performing regression

Explained and Unexplained Variance
Explained: the difference between the variance of the estimate and the total variance
Unexplained: the variance of the estimate
These are the deviations broken down:
$$\underbrace{(Y - \bar{Y})}_{\text{total}} = \underbrace{(Y - Y')}_{\text{unexplained}} + \underbrace{(Y' - \bar{Y})}_{\text{explained}}$$
We turn these into variances by squaring, adding up, and dividing by N (a numerical check of this decomposition follows below)
Whenever r ≠ 0, the unexplained variance is less than the total variance, so error is reduced; we have a better estimate
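A small numerical check of this decomposition in Python. The data are invented for illustration, and variances divide by N as in the slide (statistics.correlation needs Python 3.10+):

```python
import statistics as st

# Illustrative data (not from the slides)
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]

r = st.correlation(x, y)
b = r * st.pstdev(y) / st.pstdev(x)      # regression slope
a = st.mean(y) - b * st.mean(x)          # regression intercept
pred = [b * xi + a for xi in x]          # predicted Y values (Y')
n, ybar = len(y), st.mean(y)

total = sum((yi - ybar) ** 2 for yi in y) / n                      # total variance of Y
unexplained = sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / n   # variance of the estimate
explained = sum((pi - ybar) ** 2 for pi in pred) / n               # explained variance

print(total, explained + unexplained)    # equal: total = explained + unexplained
print(unexplained, total * (1 - r**2))   # equal: unexplained = sigma^2_Y * (1 - r^2)
```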


Coefficient of Determination
Represents the proportion of the total variance that is explained by the predictor variable
Tells you how well your regression line is doing in terms of predicting one variable from the other
$$r^2 = \frac{\text{explained variance}}{\text{total variance}}$$

Coefficient of Nondetermination
Proportion of variance not accounted for
$$1 - r^2 = \frac{\text{unexplained variance}}{\text{total variance}} = \frac{\sigma^2_{\text{est}\,Y}}{\sigma^2_Y}$$
Sometimes symbolized as k²; we want this to be as small as possible
Produces an easier formula for the variance of the estimate:
$$\sigma^2_{\text{est}\,Y} = \sigma^2_Y\,(1 - r^2)$$
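For example (the value of r here is chosen only to illustrate the arithmetic): if r = .80, then
$$r^2 = .64, \qquad k^2 = 1 - r^2 = .36, \qquad \sigma^2_{\text{est}\,Y} = .36\,\sigma^2_Y$$
so 64% of the variance in Y is explained and the remaining error variance is 36% of the original.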


Formulas using sample stats:
$$b_{YX} = r\,\frac{s_Y}{s_X} \qquad\qquad a_{YX} = \bar{Y} - b_{YX}\bar{X} \qquad\qquad Y' = b_{YX}X + a_{YX}$$
Example (worked out below):
X (Age): M = 98.14 months, s_X = 21.0
Y (Score): M = 30.35 items correct, s_Y = 7.25
r = .72, N = 100
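Plugging these summary statistics into the formulas gives (a worked computation; the line in the figure that follows was fit to the raw data, so its coefficients differ slightly from these rounded values):
$$b_{YX} = .72 \times \frac{7.25}{21.0} \approx 0.249 \qquad\qquad a_{YX} = 30.35 - 0.249 \times 98.14 \approx 5.9$$
$$Y' \approx 0.25\,X + 5.9$$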


[Figure: Scatterplot of Age (months) vs Test Score, with fitted line y = 5.81 + 0.25x; R² = 0.525]

Example from Lockhart, Robert S. (1998). Introduction to statistics and data analysis. New York: W. H. Freeman & Company.


Variance of the estimate with sample statistics
Notice df = N − 2 for a sample
$$s^2_{\text{est}\,Y} = \frac{\sum (Y - Y')^2}{N - 2} = \left(\frac{N-1}{N-2}\right) s_Y^2\,(1 - r^2)$$

Standard error of the estimate
population: $\sigma_{\text{est}\,Y} = \sigma_Y\sqrt{1 - r^2}$
sample: $s_{\text{est}\,Y} = s_Y\sqrt{\dfrac{N-1}{N-2}\,(1 - r^2)}$
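Using the sample statistics from the earlier example (r = .72, s_Y = 7.25, N = 100), a worked computation of the sample standard error of the estimate:
$$s_{\text{est}\,Y} = 7.25\sqrt{\frac{99}{98}\,(1 - .72^2)} = 7.25\sqrt{1.0102 \times .4816} \approx 7.25 \times .698 \approx 5.06$$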


Confidence Intervals for Predictions


Two sources of error:
1. Our r is less than perfect; unknown factors (1 − r²) account for some of the variability of the scores
2. Different samples lead to different regression lines and different predictions

$$Y' \pm t_{\text{crit}}\; s_{\text{est}\,Y}\,\sqrt{1 + \frac{1}{N} + \frac{(X - \bar{X})^2}{(N-1)\,s_X^2}}$$

If r increases, $s_{\text{est}\,Y}$ decreases and the CI gets narrower
If N increases, the CI also gets narrower: $t_{\text{crit}}$ decreases and the additional factor under the square root also decreases
Unless r is high, there is plenty of room for error in the prediction (a worked example follows below)
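A worked example under assumed conditions: predicting the score of a child of average age (X = X̄ = 98.14) from the earlier example, with N = 100, a 95% interval, df = N − 2 = 98, and $t_{\text{crit}} \approx 1.98$ (from a t table):
$$Y' \pm 1.98 \times 5.06 \times \sqrt{1 + \tfrac{1}{100} + 0} \approx Y' \pm 1.98 \times 5.06 \times 1.005 \approx Y' \pm 10.1$$
Even with r = .72, the 95% interval spans roughly ±10 items correct, which illustrates the last point above.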

Assumptions Underlying Linear Regression
Independent random sampling
Linearity
Normal distribution
Homoscedasticity

When to use Linear Regression
Prediction
Statistical control
Regression with manipulated variables

[Figure: Scatterplot of Number of Errors vs Sleep Deprivation (hours), with fitted line y = 13.53 + 0.671x; R² = 0.594]

Example from Lockhart, Robert S. (1998). Introduction to statistics and data analysis. New York: W. H. Freeman & Company.
