
Chapter 10:

Linear Regression
A perfect correlation implies the ability to perfectly predict one score from another
Perfect predictions
Very simple, especially with z-scores: the z-score you predict for the Y variable is the same as the z-score for the X variable
That is, $z_{Y'} = z_X$ (when r = +1) or $z_{Y'} = -z_X$ (when r = -1)

When r is less than perfect, this rule does not give the best predictions


Predicting with z-scores


Modified rule: $z_{Y'} = r\,z_X$ (sketched in code below)
This incorporates r
If r = 1, we have the original rule
If r = 0, our best prediction is the mean (0 is the mean of z-scores)
As r becomes smaller, there is less of a tendency to expect an extreme score on one variable to be associated with an equally extreme score on the other
As long as r ≠ 0, the first variable is taken into account in the prediction of the second
The first variable has an influence, but it lessens as r decreases
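A minimal sketch of this rule in Python; the z-scores and correlation values below are made up purely for illustration:

```python
# Predicting a standardized Y score from a standardized X score: z_Y' = r * z_X.
# The numbers below are illustrative only.

def predict_zy(z_x: float, r: float) -> float:
    """Predicted z-score on Y, given the z-score on X and the correlation r."""
    return r * z_x

print(predict_zy(2.0, 1.0))   # r = 1: predict an equally extreme score (2.0)
print(predict_zy(2.0, 0.5))   # r = .5: prediction regresses toward the mean (1.0)
print(predict_zy(2.0, 0.0))   # r = 0: best prediction is the mean of the z-scores (0.0)
```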

Regression Toward the Mean


Sir Francis Galton: heights of parents and children ("regression towards mediocrity")
Noted that very tall parents tended to have children shorter than themselves, and vice versa
Just a consequence of the laws of probability when r is less than perfect
As r decreases, the predicted z-score gets closer to 0 (regresses toward the mean)
Remember: r measures only the degree of linear relationship

Raw score graph; z-score graph


[Figure: two scatter plots of Exam1 vs Exam2 (Collection 1).
Raw scores: Exam2 = 1.119·Exam1 − 41.8; r² = 0.83, with a horizontal reference line at the mean of Exam2.
z-scores: zscore2 = 0.909·zscore1 + 0 (intercept ≈ 0); r² = 0.83.]


Raw Score Regression Formula


From z-scores:
$$\frac{Y' - \bar{Y}}{\sigma_Y} = r\left(\frac{X - \bar{X}}{\sigma_X}\right)$$
$$b_{YX} = r\,\frac{\sigma_Y}{\sigma_X} \qquad\qquad a_{YX} = \bar{Y} - b_{YX}\bar{X}$$
$$Y' = b_{YX}X + a_{YX}$$
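A short Python sketch of these raw-score formulas. The data set is invented for illustration, and population-style standard deviations (dividing by N) are used to match the σ notation above; statistics.correlation needs Python 3.10+.

```python
import statistics as st

# Illustrative data (not from the slides)
x = [87, 88, 89, 90, 91, 92]
y = [58, 59, 61, 60, 62, 63]

r = st.correlation(x, y)                 # Pearson r
sx, sy = st.pstdev(x), st.pstdev(y)      # population standard deviations (divide by N)
mx, my = st.mean(x), st.mean(y)

b_yx = r * sy / sx                       # slope:     b_YX = r * sigma_Y / sigma_X
a_yx = my - b_yx * mx                    # intercept: a_YX = Ybar - b_YX * Xbar

def predict(x_new: float) -> float:
    """Predicted Y for a raw-score X: Y' = b_YX * X + a_YX."""
    return b_yx * x_new + a_yx

print(b_yx, a_yx, predict(90.0))
```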

Predictions based on raw scores


When r = 1,
$$b_{YX} = \frac{\sigma_Y}{\sigma_X}$$
Ex) height and weight; assume r = 1.0, slope = 5
Perfect r, but you wouldn't expect a 1 lb change in weight for every 1 inch change in height

Predictions: z-scores (r = 1)
Slope in z-scores = 1.0; you would expect a change of 1 standard deviation in height to be associated with a change of 1 standard deviation in weight
Can't predict weight with just the slope because the line does not go through the origin; we need the intercept
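To tie the height/weight example to the slope formula (the standard deviations here are invented just to make the arithmetic concrete; the slides state only that the slope is 5):
$$b_{YX} = r\,\frac{\sigma_{\text{weight}}}{\sigma_{\text{height}}} = 1.0 \times \frac{15\ \text{lb}}{3\ \text{in}} = 5\ \text{lb per inch}$$
Even with a perfect correlation, the raw-score slope is set by the ratio of the standard deviations, not fixed at 1.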

Quantifying the errors around the regression line
The regression equation gives us the straight line that minimizes the error involved in making predictions
Residual: the difference between the actual Y value and the predicted Y value, $Y - Y'$
It is the amount of the original value that is left over after the prediction is subtracted out
If you add these errors, they sum to zero; the regression line functions like an average in that the amount of error above the line always balances out the amount of error below

The Variance of the Estimate


Quantifies the total amount of error in the predictions:
$$\sigma^2_{\text{est}\,Y} = \frac{\sum (Y - Y')^2}{N}$$
Variance of the estimate: the variance of the data points around the regression line; also called residual variance
Smaller as points are closer to the line
Correlations closer to 0 mean more error in prediction and a higher $\sigma^2_{\text{est}\,Y}$
When r = 0, $\sigma^2_{\text{est}\,Y}$ is largest and the regression line becomes horizontal (slope = 0); $Y'$ is always the mean when r = 0

This means the variance of the data points around the regression line is just the ordinary variance of the Y values ($\sigma^2_Y$): the regression line does not help in prediction
For any r ≠ 0, $\sigma^2_{\text{est}\,Y}$ will be less than $\sigma^2_Y$, and that represents the advantage of performing regression

Explained and Unexplained Variance
Explained: the difference between the variance of the estimate and the total variance
Unexplained: the variance of the estimate
These are the deviations broken down:
$$\underbrace{(Y - \bar{Y})}_{\text{total}} = \underbrace{(Y - Y')}_{\text{unexplained}} + \underbrace{(Y' - \bar{Y})}_{\text{explained}}$$
We turn these into variances by squaring, adding up, and dividing by N (a numerical check of this decomposition follows below)
Whenever r ≠ 0, the unexplained variance is less than the total variance, so error is reduced; we have a better estimate
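A small numerical check of this decomposition in Python. The data are invented for illustration, and variances divide by N as in the slide (statistics.correlation needs Python 3.10+):

```python
import statistics as st

# Illustrative data (not from the slides)
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]

r = st.correlation(x, y)
b = r * st.pstdev(y) / st.pstdev(x)      # regression slope
a = st.mean(y) - b * st.mean(x)          # regression intercept
pred = [b * xi + a for xi in x]          # predicted Y values (Y')
n, ybar = len(y), st.mean(y)

total = sum((yi - ybar) ** 2 for yi in y) / n                      # total variance of Y
unexplained = sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / n   # variance of the estimate
explained = sum((pi - ybar) ** 2 for pi in pred) / n               # explained variance

print(total, explained + unexplained)    # equal: total = explained + unexplained
print(unexplained, total * (1 - r**2))   # equal: unexplained = sigma^2_Y * (1 - r^2)
```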


Coefficient of Determination
Represents the proportion of the total variance that is explained by the predictor variable
Tells you how well your regression line is doing in terms of predicting one variable from the other
$$r^2 = \frac{\text{explained variance}}{\text{total variance}}$$

Coefficient of Nondetermination
Proportion of variance not accounted for
$$1 - r^2 = \frac{\text{unexplained variance}}{\text{total variance}} = \frac{\sigma^2_{\text{est}\,Y}}{\sigma^2_Y}$$
Sometimes symbolized as k²; we want this to be as small as possible
Produces an easier formula for the variance of the estimate:
$$\sigma^2_{\text{est}\,Y} = \sigma^2_Y\,(1 - r^2)$$
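For example (the value of r here is chosen only to illustrate the arithmetic): if r = .80, then
$$r^2 = .64, \qquad k^2 = 1 - r^2 = .36, \qquad \sigma^2_{\text{est}\,Y} = .36\,\sigma^2_Y$$
so 64% of the variance in Y is explained and the remaining error variance is 36% of the original.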


Formulas using sample stats:
$$b_{YX} = r\,\frac{s_Y}{s_X} \qquad\qquad a_{YX} = \bar{Y} - b_{YX}\bar{X} \qquad\qquad Y' = b_{YX}X + a_{YX}$$
Example (worked out below):
X (Age): M = 98.14 months, s_X = 21.0
Y (Score): M = 30.35 items correct, s_Y = 7.25
r = .72, N = 100
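Plugging these summary statistics into the formulas gives (a worked computation; the line in the figure that follows was fit to the raw data, so its coefficients differ slightly from these rounded values):
$$b_{YX} = .72 \times \frac{7.25}{21.0} \approx 0.249 \qquad\qquad a_{YX} = 30.35 - 0.249 \times 98.14 \approx 5.9$$
$$Y' \approx 0.25\,X + 5.9$$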


[Figure: Scatterplot of Age (months) vs Test Score, with fitted line y = 5.81 + 0.25x; R² = 0.525]

Example from Lockhart, Robert S. (1998). Introduction to statistics and data analysis. New York: W. H. Freeman & Company.


Variance of the estimate with sample statistics
Notice df = N − 2 for a sample
$$s^2_{\text{est}\,Y} = \frac{\sum (Y - Y')^2}{N - 2} = \left(\frac{N-1}{N-2}\right) s_Y^2\,(1 - r^2)$$

Standard error of the estimate
population: $\sigma_{\text{est}\,Y} = \sigma_Y\sqrt{1 - r^2}$
sample: $s_{\text{est}\,Y} = s_Y\sqrt{\dfrac{N-1}{N-2}\,(1 - r^2)}$
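Using the sample statistics from the earlier example (r = .72, s_Y = 7.25, N = 100), a worked computation of the sample standard error of the estimate:
$$s_{\text{est}\,Y} = 7.25\sqrt{\frac{99}{98}\,(1 - .72^2)} = 7.25\sqrt{1.0102 \times .4816} \approx 7.25 \times .698 \approx 5.06$$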


Confidence Intervals for Predictions


Two sources of error:
1. Our r is less than perfect; unknown factors (1 − r²) account for some of the variability of the scores
2. Different samples lead to different regression lines and different predictions

$$Y' \pm t_{\text{crit}}\; s_{\text{est}\,Y}\,\sqrt{1 + \frac{1}{N} + \frac{(X - \bar{X})^2}{(N-1)\,s_X^2}}$$

If r increases, $s_{\text{est}\,Y}$ decreases and the CI gets narrower
If N increases, the CI also gets narrower: $t_{\text{crit}}$ decreases and the additional factor under the square root also decreases
Unless r is high, there is plenty of room for error in the prediction (a worked example follows below)
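A worked example under assumed conditions: predicting the score of a child of average age (X = X̄ = 98.14) from the earlier example, with N = 100, a 95% interval, df = N − 2 = 98, and $t_{\text{crit}} \approx 1.98$ (from a t table):
$$Y' \pm 1.98 \times 5.06 \times \sqrt{1 + \tfrac{1}{100} + 0} \approx Y' \pm 1.98 \times 5.06 \times 1.005 \approx Y' \pm 10.1$$
Even with r = .72, the 95% interval spans roughly ±10 items correct, which illustrates the last point above.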

Assumptions Underlying Linear Regression
Independent random sampling
Linearity
Normal distribution
Homoscedasticity

When to use Linear Regression
Prediction
Statistical control
Regression with manipulated variables

[Figure: Scatterplot of Number of Errors vs Sleep Deprivation (hours), with fitted line y = 13.53 + 0.671x; R² = 0.594]

Example from Lockhart, Robert S. (1998). Introduction to statistics and data analysis. New York: W. H. Freeman & Company.
