Introduction

• We have been looking at differences between means and at the chi-square test of

the independence of two variables.

• Now we are going to look at the relationship between two variables.

• We will use two examples. The first is the relationship between beta-endorphin levels 12 hours before surgery and 10 minutes before surgery. Are high levels at one reading associated with high levels at the other? (We ran a t test on these data about two weeks ago.) The second is the relationship between SAT scores and performance on an SAT-like test when the subjects have not read the passage on which the questions are based.


• Discuss differences between correlation and regression

o Correlation is the word we usually use when we want a single measure of

the degree of relationship between two variables.

o Regression is the word we usually use when we want an equation relating

the variables.

o When we have only one predictor, the two approaches tend to blur into

one--we almost never use regression without also speaking of the

correlation coefficient. When we have multiple predictors, we are much

more interested in the regression side of things.

• Y is almost always thought of as a dependent variable beyond the experimenter's control.

o In regression, X is usually (traditionally) thought of as a fixed variable,

even when it really isn't.

This is called the linear regression model.

o In correlation, X is usually thought of as a random variable.

This is called the bivariate normal model.

• I'm deliberately using a small sample example just to keep things simple. But

don't get the idea that small samples are a good idea.

• The following data refer to beta-endorphin levels 12 hours and 10 minutes before

surgery. Notice that they are paired by patient. (These are real data.)

Subject    12 Hours Before    10 Min. Before    Gain

• We could run a t test here, but we did that before.

o It would address an entirely different question.

• We would presumably like to look at the relationship between people's beta-

endorphin scores at the two times.

o Did people who started out high stay high?

o What would it mean if they didn't?

• The first thing we could do is to plot the data.

o The 10 min. data go on the ordinate, because it is logical to predict

forward, not backward, in time.

• Here we see that there is a positive relationship between the two variables--we'll

talk about significance later.

• If we want a measure of the degree of this relationship, the correlation is 0.699.

o As we'll see later, the relationship is significant.

o What does that mean?

• In this particular example both of the variables are random--we don't know what

the values of X, or Y, will be before the experiment begins.

Example with Fixed X

• Data from Langlois and Roggman (1990) on page 411 of the text.

o Describe study

o Here X is entered as 1, 2, ..., 5, the power of 2 that gives the number of pictures averaged (2, 4, 8, 16, or 32). Y is the mean rated attractiveness of the photographs.

X    Attractiveness      X    Attractiveness
1    2.201               3    2.811
1    2.411               3    2.857
1    2.407               3    3.422
1    2.403               4    3.233
1    2.826               4    3.505
1    3.380               4    3.192
2    1.893               4    3.209
2    3.102               4    2.860
2    2.355               4    3.111
2    3.644               5    3.200
2    2.767               5    3.253
2    2.109               5    3.357
3    2.906               5    3.169
3    2.118               5    3.291
                         5    3.290

Notice that there is no sampling error in X, whereas there was in the previous example.

• Notice how the columns line up. Get them to explain why. (This is common with

fixed X.)

• Notice how judged attractiveness increases with the number of faces included in

the composite.

• Notice how the variability of data points decreases as we increase X. This is a

no-no from the point of view of assumptions behind correlation and regression. It

will also be a problem with the analysis of variance.

o Keep in mind that we are talking about assumptions about populations,

though I'm pretty sure that the assumption is violated.

o Ask why this might be expected to happen.

• The correlation is about the same as in the previous example--r = .56, and it is

significant.
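The point about variability shrinking as X increases can be checked directly. The following is a minimal sketch, not part of the original analysis: it assumes the 29 pairs exactly as they appear in the table above (only five scores are listed at X = 3), and the names `ratings` and `summary` are mine.

```python
from statistics import mean, stdev

# Mean rated attractiveness, keyed by X (the power of 2 that gives the
# number of faces averaged). Values are copied from the table in these notes.
ratings = {
    1: [2.201, 2.411, 2.407, 2.403, 2.826, 3.380],
    2: [1.893, 3.102, 2.355, 3.644, 2.767, 2.109],
    3: [2.906, 2.118, 2.811, 2.857, 3.422],
    4: [3.233, 3.505, 3.192, 3.209, 2.860, 3.111],
    5: [3.200, 3.253, 3.357, 3.169, 3.291, 3.290],
}

# For each level of X: number of scores, their mean, and their
# sample standard deviation (N - 1 in the denominator).
summary = {x: (len(ys), mean(ys), stdev(ys)) for x, ys in ratings.items()}

for x, (n, m, sd) in summary.items():
    print(f"X = {x}: n = {n}, mean = {m:.3f}, SD = {sd:.3f}")
```

The means rise steadily across the five levels, and the standard deviation at X = 5 is a small fraction of the one at X = 1. The decrease is not perfectly steady from level to level, which is worth pointing out when asking why the spread might shrink.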

• I chose this example because it is one that psychologists deal with, and relates to

an important health problem.

• The question is the relationship between the mother's age and low birthweight (we know they are related), and what happens when mothers do, and do not, smoke.

• Data on smoking mothers (pooled across 48 states; DV = % low birthweight).


o Neither relationship is exactly linear, though we get away with a straight

line in the first one.

o Both relationships are essentially the same, but exaggerated

o Notice the difference in the mean %.

o I don't quite know what to make of these data, but they are interesting.

If you get pregnant, don't smoke--especially if you are old and

creaky.

I'm not above a little drum beating.

Nations compilation of the latest data documenting progress for women

worldwide), an African woman's lifetime risk of dying from pregnancy-related

causes is 1 in 16; in Asia, 1 in 65; and in Europe, 1 in 1,400."

• One of my favorite examples.

The Correlation Coefficient

"unscaled" measure.

• Discuss the correlation coefficient and its calculation.

• Why is this one negative?

• I want them to know what this is, but I don't want them to go away thinking that we use it very often. (We rarely do.)

• What we want is an unbiased estimate of the correlation in the population.

• Comment that we very rarely use the adjusted coefficient, even though most

programs print it out.
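As a worked illustration of the calculation, here is a minimal sketch that computes the covariance, r, and the adjusted r. It makes two assumptions. First, it uses the attractiveness data from the fixed-X example (the 29 pairs as listed in these notes), because the data for the other examples are not reproduced here. Second, it takes the adjusted coefficient to be the usual sqrt(1 - (1 - r^2)(N - 1)/(N - 2)). The variable names are mine.

```python
from math import sqrt

# Attractiveness ratings keyed by X, copied from the fixed-X example.
ratings = {
    1: [2.201, 2.411, 2.407, 2.403, 2.826, 3.380],
    2: [1.893, 3.102, 2.355, 3.644, 2.767, 2.109],
    3: [2.906, 2.118, 2.811, 2.857, 3.422],
    4: [3.233, 3.505, 3.192, 3.209, 2.860, 3.111],
    5: [3.200, 3.253, 3.357, 3.169, 3.291, 3.290],
}
pairs = [(x, y) for x, ys in ratings.items() for y in ys]
n = len(pairs)

mean_x = sum(x for x, _ in pairs) / n
mean_y = sum(y for _, y in pairs) / n

# Covariance: average cross-product of deviations, using N - 1.
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / (n - 1)

# Sample standard deviations of X and Y.
sd_x = sqrt(sum((x - mean_x) ** 2 for x, _ in pairs) / (n - 1))
sd_y = sqrt(sum((y - mean_y) ** 2 for _, y in pairs) / (n - 1))

# r is the covariance rescaled by the two standard deviations.
r = cov_xy / (sd_x * sd_y)

# Adjusted r (assumed formula; the quantity under the root can go
# negative when r is near zero, in which case this line would fail).
r_adj = sqrt(1 - (1 - r ** 2) * (n - 1) / (n - 2))

print(f"cov = {cov_xy:.4f}, r = {r:.3f}, adjusted r = {r_adj:.3f}")
```

With these 29 pairs r comes out at about .57, close to the r = .56 quoted for this example, and the adjusted value is slightly smaller, as it always is.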

• Here we are looking for the best straight line that can be fit to these data.

• I have included those lines in the plots above.

• We want an equation of the form Ŷ = bX + a, where b is the slope and a is the intercept.

• There are an infinite number of lines with that slope, and another infinite number

of lines with that intercept, but only one line with both that slope and intercept.

• Discuss all parts of this printout:

o Include the Anova table and explain what's going on

o Ask what an intercept of 0 would mean. (In this case I can't imagine that it

would mean much, because I can't imagine a case where solar radiation

really = 0.)

o Discuss the slope

What if the slope were greater or less than it is?

What if the slope were 0?

What if we were plotting the same general variable on both axes (as we did with endorphins) and we had a slope of 1.0? What would that mean?

o Point out the tests on these coefficients.

o Go back to the regression line and discuss "least squares."
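To make "least squares" concrete, here is a minimal sketch of the fit and of the tests that appear on a typical printout. The printout discussed above is not reproduced in these notes, so the sketch uses the attractiveness data from the fixed-X example instead (the 29 pairs as listed). The helper `sse` and the other names are mine, and the standard error of the slope is taken to be sqrt(MS_residual / SS_X).

```python
from math import sqrt

# Attractiveness ratings keyed by X, copied from the fixed-X example.
ratings = {
    1: [2.201, 2.411, 2.407, 2.403, 2.826, 3.380],
    2: [1.893, 3.102, 2.355, 3.644, 2.767, 2.109],
    3: [2.906, 2.118, 2.811, 2.857, 3.422],
    4: [3.233, 3.505, 3.192, 3.209, 2.860, 3.111],
    5: [3.200, 3.253, 3.357, 3.169, 3.291, 3.290],
}
pairs = [(x, y) for x, ys in ratings.items() for y in ys]
n = len(pairs)

mean_x = sum(x for x, _ in pairs) / n
mean_y = sum(y for _, y in pairs) / n
ss_x = sum((x - mean_x) ** 2 for x, _ in pairs)
ss_y = sum((y - mean_y) ** 2 for _, y in pairs)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)

# Least-squares slope and intercept for Y-hat = bX + a.
b = sp_xy / ss_x
a = mean_y - b * mean_x


def sse(slope, intercept):
    """Sum of squared errors of prediction for a candidate line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in pairs)


# Pieces of the Anova table: SS_total = SS_regression + SS_residual.
ss_residual = sse(b, a)
ss_regression = ss_y - ss_residual
ms_residual = ss_residual / (n - 2)
f_ratio = ss_regression / ms_residual      # on 1 and N - 2 df

# t test on the slope; with one predictor, t squared equals F.
se_b = sqrt(ms_residual / ss_x)
t_slope = b / se_b

print(f"Y-hat = {b:.3f}X + {a:.3f}")
print(f"F(1, {n - 2}) = {f_ratio:.2f}, t({n - 2}) = {t_slope:.2f}")

# "Least squares": any other slope or intercept gives a larger SSE.
print(sse(b, a) < sse(b + 0.05, a), sse(b, a) < sse(b, a - 0.05))
```

For these data the slope is about 0.185 and the intercept about 2.35. Changing either coefficient, in either direction, increases the sum of squared errors, which is all that "least squares" means.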

