          X   Y
Obs. 1    2   1
Obs. 2    4   4
Obs. 3    3   1
Obs. 4    7   5
Obs. 5    5   6
Obs. 6    2   1
Obs. 7    4   4
Obs. 8    3   1
Obs. 9    7   5
Obs. 10   5   6
Some examples:
Two-sample data:
Sample 1: 3,2,5,1,3,4,2,3
Sample 2: 4,4,3,6,5
A bivariate simple random sample (SRS) can be written

    X_1, X_2, ..., X_n

and

    Y_1, Y_2, ..., Y_n
Such properties are called joint properties. For example, the mean
of X·Y, the IQR of X/Y, and the average of all X_i such that
the corresponding Y_i is negative are all joint properties.
        H     T
H      1/4   1/4
T      1/4   1/4
        0     1     2
0      1/8   1/8    0
1      1/8   1/4   1/8
2       0    1/8   1/8
        0     1              0      1
0      2/5   1/5       0    3/10   3/10
1     1/10   3/10      1    1/5    1/5
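As a quick numerical sketch (using Python's `fractions` module; the two 2×2 joint tables are the ones shown above, with rows and columns indexed 0 and 1), a joint distribution has independent components exactly when every cell equals the product of its row and column marginals:

```python
from fractions import Fraction as F

def marginals(joint):
    """Row and column sums of a joint probability table."""
    rows = [sum(row) for row in joint]
    cols = [sum(col) for col in zip(*joint)]
    return rows, cols

def is_independent(joint):
    """True if every cell equals the product of its marginals."""
    rows, cols = marginals(joint)
    return all(joint[i][j] == rows[i] * cols[j]
               for i in range(len(rows)) for j in range(len(cols)))

# The two 2x2 tables above.
left  = [[F(2, 5),  F(1, 5)],  [F(1, 10), F(3, 10)]]
right = [[F(3, 10), F(3, 10)], [F(1, 5),  F(1, 5)]]

print(is_independent(left))   # left table: dependent -> False
print(is_independent(right))  # right table: independent -> True
```

Using exact fractions avoids floating-point comparisons when checking cell-by-cell equality.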
The most important graphical summary of bivariate data is the
scatterplot. This is simply a plot of the points (Xi, Yi) in the
plane. The following figures show scatterplots of June maximum
temperatures against January maximum temperatures, and of
January maximum temperatures against latitude.
[Figure: two scatterplots. Left panel: June maximum temperature vs. January maximum temperature. Right panel: January maximum temperature vs. latitude.]
A key feature in a scatterplot is the association, or trend between
X and Y .
The sample correlation coefficient is

    r = [Σ_i (X_i − X̄)(Y_i − Ȳ)/(n − 1)] / (σ̂_X σ̂_Y),

where σ̂_X and σ̂_Y are the sample standard deviations of the X_i
and the Y_i. The numerator

    Σ_i (X_i − X̄)(Y_i − Ȳ)/(n − 1)

is the sample covariance.
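The definitions above can be sketched in a few lines of plain Python (the data are the ten observations from the table at the start of these notes; the function names are illustrative, not from any library):

```python
import math

def sample_cov(x, y):
    """Sample covariance: sum of (x_i - xbar)(y_i - ybar), over n - 1."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)

def sample_sd(x):
    """Sample standard deviation (n - 1 in the denominator)."""
    return math.sqrt(sample_cov(x, x))

def sample_cor(x, y):
    """Sample correlation coefficient r."""
    return sample_cov(x, y) / (sample_sd(x) * sample_sd(y))

# The small bivariate data set from the start of these notes.
X = [2, 4, 3, 7, 5, 2, 4, 3, 7, 5]
Y = [1, 4, 1, 5, 6, 1, 4, 1, 5, 6]
print(round(sample_cor(X, Y), 3))  # -> 0.824
```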
If either the Xi or the Yi values are constant (i.e. all have the
same value), then one of the sample standard deviations is zero,
and therefore the correlation coefficient is not defined.
Both the sample and population correlation coefficients always
fall between −1 and 1.
The sign of r is determined by the products (X_i − X̄)(Y_i − Ȳ).
Such a product is positive when

    X_i > X̄ and Y_i > Ȳ,   or   X_i < X̄ and Y_i < Ȳ,

and negative when

    X_i > X̄ and Y_i < Ȳ,   or   X_i < X̄ and Y_i > Ȳ.
[Figure: scatterplot divided into four quadrants at the point (X̄, Ȳ), labeled by the signs of X_i − X̄ and Y_i − Ȳ.]
Suppose Y_i = X_i + b and Z_i = cX_i. Then

    Ȳ = X̄ + b   and   Z̄ = cX̄.

The sample covariance is

    C = Σ_i (X_i − X̄)(Y_i − Ȳ)/(n − 1).

Again suppose Y_i = X_i + b and Z_i = cX_i. Then

    σ̂_Y = σ̂_X   and   σ̂_Z = |c| σ̂_X.
From the previous two slides, it follows that the sample correla-
tion coefficient is not affected by translation.
cor(X, X) = 1
cov(X, X) = var(X)
[Figures: a sequence of example scatterplots illustrating different strengths and directions of association between X and Y.]
Let

    r̃ = log((1 + r)/(1 − r))/2,

and let

    ρ̃ = log((1 + ρ)/(1 − ρ))/2.

Then √(n − 3)·(r̃ − ρ̃) is approximately standard normal. For example,
with n = 30 and ρ = 0,

    P(r̃ > 0.42) ≈ P(Z > 0.42·√27),

where Z is standard normal. More generally, under ρ = 0,

    P(r̃ > t) = P(√(n − 3)·r̃ > √(n − 3)·t) ≈ P(Z > √(n − 3)·t),

which equals α when t = Q(1 − α)/√(n − 3), where Q is the standard
normal quantile function. To convert a threshold t for r̃ into a
threshold for r, solve

    log((1 + r)/(1 − r)) = 2t
    (1 + r)/(1 − r) = exp(2t)
    r = (exp(2t) − 1)/(exp(2t) + 1).
To get a p-value smaller than 0.05 under a one-sided test with
n = 30 we need the transformed value r̃ = log((1 + r)/(1 − r))/2 to
satisfy r̃ > 1.64/√27 ≈ 0.32, which corresponds to r > 0.31.

It is surprising how visually weak a significant trend can be. The
following 30 points have correlation 0.31, and hence have p-value
smaller than 0.05 for the right-tailed test of ρ = 0:
[Figure: scatterplot of 30 points with sample correlation 0.31.]
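The threshold calculation above can be sketched as follows (1.64 approximates the 0.95 quantile of the standard normal; the function names are illustrative):

```python
import math

def fisher_z(r):
    """Fisher Z transform: log((1 + r) / (1 - r)) / 2, i.e. atanh(r)."""
    return math.log((1 + r) / (1 - r)) / 2

def inverse_fisher_z(t):
    """Inverse transform: (exp(2t) - 1) / (exp(2t) + 1), i.e. tanh(t)."""
    return (math.exp(2 * t) - 1) / (math.exp(2 * t) + 1)

n = 30
q95 = 1.64                       # approximate 0.95 standard normal quantile
t = q95 / math.sqrt(n - 3)       # threshold on the transformed scale
r_min = inverse_fisher_z(t)      # threshold on the correlation scale

print(round(t, 2))       # -> 0.32
print(round(r_min, 2))   # -> 0.31
```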
The Fisher Z transform can be used to construct confidence
intervals for the correlation coefficient. To get a 95% CI, start
with

    P(−1.96 ≤ √(n − 3)·(r̃ − ρ̃) ≤ 1.96) ≈ 0.95,

where r̃ = log((1 + r)/(1 − r))/2 and ρ̃ is the same transform
applied to ρ. Rearranging,

    P(r̃ − 1.96/√(n − 3) ≤ ρ̃ ≤ r̃ + 1.96/√(n − 3)) ≈ 0.95.

Then solve

    ρ̃ = log((1 + ρ)/(1 − ρ))/2

for ρ, yielding

    ρ = (exp(2ρ̃) − 1)/(exp(2ρ̃) + 1).

Letting

    f(x) = (exp(2x) − 1)/(exp(2x) + 1),

it follows that

    ( f(r̃ − 1.96/√(n − 3)),  f(r̃ + 1.96/√(n − 3)) )

is a 95% CI for ρ.
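A sketch of this interval in code (`math.atanh` and `math.tanh` coincide with the Fisher transform and its inverse; `fisher_ci` is an illustrative name):

```python
import math

def fisher_ci(r, n, z=1.96):
    """Approximate 95% CI for the population correlation rho,
    via the Fisher Z transform (z = 0.975 standard normal quantile)."""
    rt = math.atanh(r)             # r-tilde = log((1 + r)/(1 - r))/2
    half = z / math.sqrt(n - 3)    # half-width on the transformed scale
    return math.tanh(rt - half), math.tanh(rt + half)

lo, hi = fisher_ci(0.31, 30)
print(round(lo, 2), round(hi, 2))
```

Note that the interval is not symmetric about r on the correlation scale, only on the transformed scale.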
To compare correlation coefficients from two independent samples
of sizes m and n, with Fisher-transformed sample correlations r̃_1
and r̃_2, use

    D = (√(m − 3)·r̃_1 − √(n − 3)·r̃_2)/√2,

which is approximately standard normal when ρ_1 = ρ_2 = 0.
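Assuming the D statistic above, a minimal sketch:

```python
import math

def two_sample_D(r1, m, r2, n):
    """D statistic for comparing two sample correlations; approximately
    standard normal when rho_1 = rho_2 = 0."""
    z1 = math.sqrt(m - 3) * math.atanh(r1)  # sqrt(m - 3) * r-tilde_1
    z2 = math.sqrt(n - 3) * math.atanh(r2)  # sqrt(n - 3) * r-tilde_2
    return (z1 - z2) / math.sqrt(2)

print(two_sample_D(0.31, 30, 0.31, 30))  # equal correlations -> 0.0
```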
Suppose we collect all observations in the sample with X = 3. If
we then take the average of the Y values that are paired with
these X values, we obtain the conditional mean of Y given X = 3,
or the conditional expectation of Y given X = 3, written
E(Y |X = 3).
More generally, for any value x in the X sample space we can
average all values of Y that are paired with X = x, yielding
E(Y |X = x).
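A minimal empirical sketch using the small data set from the start of these notes (averaging the Y values paired with a given X = x; `cond_mean` is an illustrative name):

```python
def cond_mean(x_vals, y_vals, x):
    """Average of the Y values paired with X = x."""
    paired = [yi for xi, yi in zip(x_vals, y_vals) if xi == x]
    return sum(paired) / len(paired)

# The small bivariate data set from the start of these notes.
X = [2, 4, 3, 7, 5, 2, 4, 3, 7, 5]
Y = [1, 4, 1, 5, 6, 1, 4, 1, 5, 6]

# Y values paired with X = 3 are 1 and 1, so E(Y|X = 3) = 1.
print(cond_mean(X, Y, 3))  # -> 1.0
```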
The following plots show two distinct joint distributions with the
same regression function.
[Figure: two scatterplots with the same regression function E(Y|X) overlaid on each.]
var(Y |X = x) = σ(Y |X = x)².

The value of σ(Y |X = x) is the standard deviation of all Y values
that are paired with X = x.
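Continuing the same empirical sketch, σ(Y |X = x) can be estimated by the sample standard deviation of the Y values paired with X = x:

```python
import statistics as st

def cond_sd(x_vals, y_vals, x):
    """Sample standard deviation of the Y values paired with X = x."""
    paired = [yi for xi, yi in zip(x_vals, y_vals) if xi == x]
    return st.stdev(paired)

# The small bivariate data set from the start of these notes.
X = [2, 4, 3, 7, 5, 2, 4, 3, 7, 5]
Y = [1, 4, 1, 5, 6, 1, 4, 1, 5, 6]

# Y values paired with X = 3 are 1 and 1, so the estimate is 0.
print(cond_sd(X, Y, 3))  # -> 0.0
```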