Learning objectives
1. Pearson correlation
2. Estimating the population Pearson correlation
3. Misleading correlations
4. Impact of range
5. Two limitations in using correlation to infer causality
5801 Correlation
Relationships
An important goal in statistics is to describe "relationships" between two variables. By describing relationships in a sample, we estimate the relationship in the population.
What does it mean to say that "there is a relationship" between two variables (e.g., X and Y), or to say that "X and Y are related"? There are different ways of answering this question. On a strictly quantitative level, we may say X and Y are related
Some common ways of saying "X and Y are related":
• "X and Y are associated", "an association between X and Y";
• "X and Y are correlated", "a correlation between X and Y".
\[
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
\]
Important properties of r
• It can only range from -1 to 1.
• It measures the degree and direction of the linear relationship between X and Y.
• r = 0 implies there is no linear relationship.
• The more r differs from 0, the stronger the linear relationship.
• r > 0 is called a positive relationship; r = 1 is a "perfect" positive linear relationship.
• r < 0 is called a negative relationship; r = -1 is a "perfect" negative linear relationship.
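The definition of r (the sum of cross-products of deviations from the means, divided by the square root of the product of the two sums of squared deviations) can be sketched in a few lines of Python. This is only an illustration: the function name `pearson_r` and the small data sets are made up for this sketch and do not come from the slides.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample Pearson correlation computed directly from the definition."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of cross-products of deviations from the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations for X and for Y.
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return s_xy / sqrt(s_xx * s_yy)


# Made-up illustrative data (not from the slides).
x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2 * v + 1 for v in x])   # points exactly on an upward line
r_neg = pearson_r(x, [10 - 3 * v for v in x])  # points exactly on a downward line
r_mid = pearson_r(x, [2, 1, 4, 3, 5])          # upward trend with some scatter

print(r_pos, r_neg, r_mid)
```

The first two data sets lie exactly on a line, so they illustrate the "perfect" positive and negative cases; the third has the same direction as the first but a weaker linear relationship.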
Examples
[Figure: scatterplots of Y against X.]
As the data look more and more like a downward-sloping straight line, r will get closer and closer to -1.
As the data look more and more like an upward-sloping straight line, r will get closer and closer to 1.
[Figure: three scatterplots against X, labeled r = -1.00, r = 0.00, and r = 1.00.]
[Figure: scatterplots against X; one panel is labeled r = -0.80.]
• r measures the degree (strength / magnitude) and direction of the linear relationship.
• The degree of the relationship involves the absolute value of r.
• The more |r| differs from 0, the stronger the linear relationship.
Examples
Do you agree with the correlation coefficients?
[Figure: two scatterplots, each labeled r = 0.00.]
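One well-known way in which r = 0.00 can mislead is a relationship that is strong but not linear. The sketch below uses made-up data (not the data from the slides) in which Y is completely determined by X through a curve, yet r is zero; `pearson_r` is an illustrative helper name.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample Pearson correlation from sums of deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical data: X is symmetric around 0 and Y = X squared,
# so Y depends perfectly on X, but along a U-shaped curve.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r_curve = pearson_r(x, y)
print(r_curve)
```

Here r says only that there is no linear relationship; it says nothing about whether some other kind of relationship exists.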
Examples
Is there a positive linear relationship here?
This small cluster of data has clearly created the positive relationship.
Notice that the "outliers" here are not outlying at all in terms of Y. These points are outliers in the sense of having undue influence on r. The figure reports r = 0.40. So what is r for this sample? Is it 0.40 or 0.00?
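The influence of a small cluster on r can be sketched with made-up numbers. The data below are hypothetical and are not the data behind the slide: a main cloud with no linear relationship, plus three points whose Y values are unremarkable but whose X values sit far from the rest.

```python
from math import sqrt


def pearson_r(points):
    """Sample Pearson correlation for a list of (x, y) pairs."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    n = len(points)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical main cloud: a 3-by-3 grid, which has r = 0 by symmetry.
cloud = [(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1)]

# Hypothetical small cluster: Y values lie inside the cloud's Y range,
# but the X values are far to the right of everything else.
cluster = [(5.0, 1.0), (5.0, 0.5), (5.5, 1.0)]

r_without = pearson_r(cloud)
r_with = pearson_r(cloud + cluster)
print(r_without, r_with)
```

In this sketch three points out of twelve are enough to move r from zero to a clearly positive value, which is why it is worth reporting r both with and without such points.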
[Figure: two scatterplots of weight (kg) against height (m).]
Based on the data in the left panel, it is hard to speculate about what happens when you study far beyond 15 hours.
[Figure: two scatterplots of GRADE against hours studied. The left panel ("Sample data") covers 0 to 15 hours; the right panel covers 0 to 25 hours.]
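Why the sampled range matters can be sketched with two hypothetical scenarios that agree exactly for 0 to 15 hours of study and differ only beyond 15 hours. All numbers below (intercept, slope, the small alternating wiggle, and both continuations) are invented for illustration and are not the data from the slides.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample Pearson correlation from sums of deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


def wiggle(h):
    # Small deterministic scatter so the points do not sit exactly on a line.
    return 2.0 if h % 2 == 0 else -2.0


def grade_keeps_rising(h):
    # Hypothetical scenario A: grades keep improving with more study.
    return 25.0 + 2.5 * h + wiggle(h)


def grade_declines(h):
    # Hypothetical scenario B: same as A up to 15 hours, then grades drop.
    if h <= 15:
        return grade_keeps_rising(h)
    return 25.0 + 2.5 * 15 - 2.5 * (h - 15) + wiggle(h)


sample_hours = list(range(0, 16))  # the range actually observed
full_hours = list(range(0, 26))    # the range we would like to talk about

r_sample_a = pearson_r(sample_hours, [grade_keeps_rising(h) for h in sample_hours])
r_sample_b = pearson_r(sample_hours, [grade_declines(h) for h in sample_hours])
r_full_a = pearson_r(full_hours, [grade_keeps_rising(h) for h in full_hours])
r_full_b = pearson_r(full_hours, [grade_declines(h) for h in full_hours])

print(r_sample_a, r_sample_b, r_full_a, r_full_b)
```

The two scenarios give the same r on the observed range, so the sample cannot tell them apart, while their correlations over the wider range are very different.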
[Figure: scatterplot of GRADE against hours studied, 0 to 25 hours.]
But unless we're doing statistics purely for the sake of statistics, a "relationship" has much more meaning to researchers. Let us now move beyond the statistical / quantitative level.
Very often, a relationship is further used for explanations: if X and Y are related, we say "X explains Y" or "Y explains X". Both causality and explanation are actually complicated philosophical concepts:
• What exactly does it mean for A to cause B?
• How does an explanation work?
We will not venture too far into philosophy, but we will consider some limitations in using the relationships we observe (e.g., r) to infer the causation we hope to establish.
[Figure: path diagram with two constructs, their indicators X1 and X2, and the correlation r (ρ) between the indicators.]
Inference concerning how constructs are related depends on the quality of measurement: just how good are the indicators? Simply treating r (or ρ) as indicative of the relationship between the constructs, without concern for the quality of measurement, is a mistake.
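The dependence on measurement quality can be sketched with a small simulation. Everything here is an assumption made for illustration: two standard-normal constructs correlated at 0.8, and indicators X1 and X2 formed by adding independent measurement error with the same variance as the construct. Under these assumptions, classical test theory predicts that the correlation between the indicators is roughly half the correlation between the constructs.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Sample Pearson correlation from sums of deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 20000
construct_corr = 0.8    # assumed correlation between the two constructs
error_sd = 1.0          # assumed measurement-error SD (equal to the construct SD)

# Simulate two correlated constructs (unobservable in practice).
c1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
c2 = [construct_corr * a + sqrt(1 - construct_corr ** 2) * rng.gauss(0.0, 1.0)
      for a in c1]

# Observed indicators: construct score plus independent measurement error.
x1 = [a + rng.gauss(0.0, error_sd) for a in c1]
x2 = [b + rng.gauss(0.0, error_sd) for b in c2]

r_constructs = pearson_r(c1, c2)
r_indicators = pearson_r(x1, x2)
print(r_constructs, r_indicators)
```

The weaker the indicators, the further r between X1 and X2 falls below the relationship between the constructs, so a modest observed r does not by itself mean the constructs are only modestly related.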
[Figure: scatterplot labeled r = 0.66.]
• Many social science variables can be related to each other through such a "chain".
• This is attributable to the complexity (or interrelatedness) of social phenomena.
• It is not difficult to find variables that have a strong relationship; sometimes, unexpected relationships will be stumbled upon.
• The prevalence of relationships between arbitrarily paired social variables has been dubbed the "crud factor" (e.g., Meehl, 1997).