
Statistics and Data Analysis
Professor William Greene
Stern School of Business
IOMS Department
Department of Economics

Part 6: Correlation


Correlated Variables


Correlation Agenda

Two Related Random Variables

We're interested in correlation

Dependence and Independence


Conditional Distributions
We have to look at covariance first
Regression is correlation

Correlated Asset Returns


Probabilities for Two Events, A,B

Marginal Probability = The probability of an event, not considering any other events. P(A)
Joint Probability = The probability that two events happen at the same time. P(A,B)
Conditional Probability = The probability that one event happens given that another event has happened. P(A|B)


Probabilities: Inherited Color Blindness*

Inherited color blindness has different incidence rates in men and women. Women usually carry the defective gene and men usually inherit it.
Experiment: pick an individual at random from the population.
CB = has inherited color blindness
MALE = gender, Not-Male = FEMALE
Marginal:    P(CB) = 2.75%, P(MALE) = 50.0%
Joint:       P(CB and MALE) = 2.5%, P(CB and FEMALE) = 0.25%
Conditional: P(CB|MALE) = 5.0% (1 in 20 men), P(CB|FEMALE) = 0.5% (1 in 200 women)

* There are several types of color blindness and large variation in the incidence across different demographic
groups. These are broad averages that are roughly in the neighborhood of the true incidence for particular groups.


Dependent Events
Random variables X and Y are dependent if \( P_{XY}(X,Y) \neq P_X(X)P_Y(Y) \).

              Color Blind
Gender      No       Yes      Total
Male        .4750    .0250    0.50
Female      .4975    .0025    0.50
Total       .9725    .0275    1.00

P(Color blind, Male) = .0250
P(Male) = .5000
P(Color blind) = .0275
P(Color blind) x P(Male) = .0275 x .500 = .01375
.01375 is not equal to .025
Gender and color blindness are not independent.
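The independence check on this slide can be reproduced mechanically from the joint table. The following is a minimal Python sketch, not part of the original slides; the dictionary layout and the helper name `marginal` are illustrative assumptions.

```python
# Joint probabilities from the table, keyed by (gender, is_color_blind).
joint = {
    ("male", False): 0.4750,
    ("male", True): 0.0250,
    ("female", False): 0.4975,
    ("female", True): 0.0025,
}

def marginal(table, position):
    """Sum the joint probabilities over the other variable."""
    totals = {}
    for key, p in table.items():
        totals[key[position]] = totals.get(key[position], 0.0) + p
    return totals

p_gender = marginal(joint, 0)          # marginal distribution of gender
p_cb = marginal(joint, 1)              # marginal distribution of color blindness

product = p_cb[True] * p_gender["male"]    # value independence would require
actual = joint[("male", True)]             # value the table actually gives
independent = abs(product - actual) < 1e-12

print(f"P(CB) x P(Male) = {product:.5f}, P(CB, Male) = {actual:.4f}")
print("independent" if independent else "not independent")
```

Comparing a single cell is enough to rule out independence here; confirming independence would require the product rule to hold in every cell of the table.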


Equivalent Definition of Independence

Random variables X and Y are independent if \( P_{XY}(X,Y) = P_X(X)P_Y(Y) \).

The joint probability equals the product of the marginal probabilities.


Getting hit by lightning and hitting a hole-in-one are independent events

If these probabilities are correct, P(hit by lightning) = 1/3,000 and P(hole-in-one) = 1/12,500, then the probability of (struck by lightning in your lifetime and hole-in-one) = 1/3,000 x 1/12,500 = 1/37,500,000, or about .00000003.
Has it ever happened?


Dependent Random Variables

Random variables are dependent if the occurrence of one affects the probability distribution of the other.

If P(Y|X) changes when X changes, then the variables are dependent.

If P(Y|X) does not change when X changes, then the variables are independent.


Two Important Math Results

For two random variables,
P(X,Y) = P(X|Y) P(Y)
P(Color blind, Male) = P(Color blind|Male) P(Male) = .05 x .5 = .025

For two independent random variables,
P(X,Y) = P(X) P(Y)
P(Ace, Heart) = P(Ace) x P(Heart).
(This does not work if they are not independent.)

Conditional Probability
Prob(A | B) = P(A,B) / P(B)
Prob(Color Blind | Male) = Prob(Color Blind, Male) / P(Male) = .025 / .50 = .05

              Color Blind
Gender      No       Yes      Total
Male        .4750    .0250    0.50
Female      .4975    .0025    0.50
Total       .9725    .0275    1.00

What is P(Male | Color Blind)?

A Theorem: For two random variables, P(X,Y) = P(X|Y) P(Y)
P(Color blind, Male) = P(Color blind|Male) P(Male) = .05 x .5 = .025
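The question "What is P(Male | Color Blind)?" can be answered with the same definition, Prob(A | B) = P(A,B) / P(B), applied in the other direction. The sketch below is an illustration in Python using the table values; the variable names are assumptions made for the example.

```python
# Values read from the joint table.
p_cb_and_male = 0.025   # P(Color Blind, Male)
p_male = 0.50           # P(Male)
p_cb = 0.0275           # P(Color Blind)

# Conditioning on gender: P(Color Blind | Male) = P(CB, Male) / P(Male)
p_cb_given_male = p_cb_and_male / p_male

# Conditioning on color blindness: P(Male | Color Blind) = P(CB, Male) / P(CB)
p_male_given_cb = p_cb_and_male / p_cb

# The theorem P(X,Y) = P(X|Y) P(Y) gives the same joint probability either way.
joint_via_male = p_cb_given_male * p_male
joint_via_cb = p_male_given_cb * p_cb

print(f"P(CB | Male) = {p_cb_given_male:.3f}")
print(f"P(Male | CB) = {p_male_given_cb:.3f}")
```

Under these numbers about 10 in 11 color blind individuals are male, even though only 1 in 20 men is color blind; the two conditional probabilities answer different questions.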


Conditional Distributions
Marginal Distribution of Color Blindness
  Color Blind           Not Color Blind
  .0275                 .9725
Distribution Among Men (Conditioned on Male)
  Color Blind|Male      Not Color Blind|Male
  .05                   .95
Distribution Among Women (Conditioned on Female)
  Color Blind|Female    Not Color Blind|Female
  .005                  .995
The distributions for the two genders are different. The variables are dependent.


Independent Random Variables


One card is drawn randomly from a deck of 52 cards.

             Ace
Heart     Yes=1    No=0     Total
Yes=1     1/52     12/52    13/52
No=0      3/52     36/52    39/52
Total     4/52     48/52    52/52

P(Ace|Heart) = 1/13
P(Ace|Not-Heart) = 3/39 = 1/13
P(Ace) = 4/52 = 1/13
P(Ace) does not depend on whether the card is a heart or not.

P(Heart|Ace) = 1/4
P(Heart|Not-Ace) = 12/48 = 1/4
P(Heart) = 13/52 = 1/4
P(Heart) does not depend on whether the card is an ace or not.

A Theorem: For two independent random variables, P(X,Y) = P(X) P(Y)
P(Ace, Heart) = P(Ace) P(Heart) = 1/13 x 1/4 = 1/52
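The card probabilities can be checked by counting over an explicit 52-card deck. This is a small Python sketch for illustration; the deck encoding and the helper `prob` are assumptions, not something defined in the slides.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely (rank, suit) cards

def is_ace(card):
    return card[0] == "A"

def is_heart(card):
    return card[1] == "hearts"

def prob(event, given=None):
    """P(event | given) by counting cards; given=None means no conditioning."""
    pool = [c for c in DECK if given is None or given(c)]
    return Fraction(sum(1 for c in pool if event(c)), len(pool))

p_ace = prob(is_ace)
p_heart = prob(is_heart)
p_ace_given_heart = prob(is_ace, given=is_heart)
p_ace_given_not_heart = prob(is_ace, given=lambda c: not is_heart(c))
p_joint = prob(lambda c: is_ace(c) and is_heart(c))

print(p_ace, p_ace_given_heart, p_ace_given_not_heart)
print(p_joint == p_ace * p_heart)
```

Using `Fraction` keeps every probability exact, so the product rule can be tested with `==` rather than a floating-point tolerance.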


Covariation and Expected Value

Pick 10,325 people at random from the population. Predict how many will be color blind: 10,325 x .0275 = 284

Pick 10,325 MEN at random from the population. Predict how many will be color blind: 10,325 x .05 = 516

Pick 10,325 WOMEN at random from the population. Predict how many will be color blind: 10,325 x .005 = 52

The expected number of color blind people, given gender, depends on gender.
Color Blindness covaries with Gender


Positive Covariation: The distribution of one variable depends on another variable.
Distribution of fuel bills changes (moves upward) as the number of rooms changes (increases).

The per capita number of cars varies (positively) with per capita income. The relationship varies by country as well.


Application: Legal Case Mix

Two kinds of cases show up each month, real estate (R=0,1,2) and financial (F=0,1) (sometimes together, usually separately).

Joint Distribution
R = Real estate cases
F = Financial cases
Joint probabilities are Prob(F=f and R=r)

              Real Estate
Finance     0      1      2      Total
0           .15    .10    .05    .30
1           .30    .20    .20    .70
Total       .45    .30    .25    1.00

The Total column is the marginal distribution for financial cases. The Total row is the marginal distribution for real estate cases.
Note that marginal probabilities are obtained by summing across or down.


Legal Services Case Mix

Probabilities for R given the value of F

Distribution of R|F=0        Distribution of R|F=1
P(R=0|F=0)=.15/.30=.50       P(R=0|F=1)=.30/.70=.43
P(R=1|F=0)=.10/.30=.33       P(R=1|F=1)=.20/.70=.285
P(R=2|F=0)=.05/.30=.17       P(R=2|F=1)=.20/.70=.285

The probability distribution of Real estate cases (R) given Financial cases (F) varies with the number of Financial cases (0 or 1).
The probability that (R=2)|F goes up as F increases from 0 to 1.
This means that the variables are not independent.
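Each conditional distribution above is one row of the joint table divided by that row's total. A minimal Python sketch of that step follows; the `(f, r)` key layout and the function name `conditional_r_given_f` are illustrative assumptions.

```python
# Joint probabilities Prob(F=f and R=r), keyed by (f, r).
joint = {
    (0, 0): 0.15, (0, 1): 0.10, (0, 2): 0.05,
    (1, 0): 0.30, (1, 1): 0.20, (1, 2): 0.20,
}

def conditional_r_given_f(table, f):
    """Return {r: P(R=r | F=f)} by dividing row f by its marginal P(F=f)."""
    p_f = sum(p for (fi, _), p in table.items() if fi == f)
    return {r: p / p_f for (fi, r), p in table.items() if fi == f}

r_given_f0 = conditional_r_given_f(joint, 0)
r_given_f1 = conditional_r_given_f(joint, 1)

for r in (0, 1, 2):
    print(f"P(R={r}|F=0) = {r_given_f0[r]:.3f}   P(R={r}|F=1) = {r_given_f1[r]:.3f}")
```

Each conditional distribution sums to one, and the two differ from each other, which is the dependence the slide describes.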


(Linear) Regression of Bills on Rooms


Measuring How Variables Move Together: Covariance
\( \mathrm{Cov}(X,Y) = \sum_{\text{values of } X} \sum_{\text{values of } Y} P(X=x, Y=y)\,(x-\mu_X)(y-\mu_Y) \)

Covariance can be positive or negative.
The measure will be positive if it is likely that Y is above its mean when X is above its mean.
It is usually denoted \( \sigma_{XY} \).


Conditional Distributions
Overall Distribution
  Color Blind           Not Color Blind
  .0275                 .9725
Distribution Among Men (Conditioned on Male)
  Color Blind|Male      Not Color Blind|Male
  .05                   .95
Distribution Among Women (Conditioned on Female)
  Color Blind|Female    Not Color Blind|Female
  .005                  .995
The distribution changes given gender.


Covariation

Pick 10,325 people at random from the population. Predict how many will be color blind: 10,325 x .0275 = 284

Pick 10,325 MEN at random from the population. Predict how many will be color blind: 10,325 x .05 = 516

Pick 10,325 WOMEN at random from the population. Predict how many will be color blind: 10,325 x .005 = 52

The expected number of color blind people, given gender, depends on gender.
Color Blindness covaries with Gender


Covariation in legal services


How many real estate cases should the office expect if it knows (or predicts) the number of financial cases?

Distribution of R|F=0        Distribution of R|F=1
P(R=0|F=0)=.15/.30=.50       P(R=0|F=1)=.30/.70=.43
P(R=1|F=0)=.10/.30=.33       P(R=1|F=1)=.20/.70=.285
P(R=2|F=0)=.05/.30=.17       P(R=2|F=1)=.20/.70=.285

E[R|F=0] = 0(.50) + 1(.33) + 2(.17) = 0.670
E[R|F=1] = 0(.43) + 1(.285) + 2(.285) = 0.855

This is how R and F covary.
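The two conditional means can also be computed directly from the joint probabilities, without rounding the conditional probabilities first. The sketch below is illustrative Python using exact fractions; the function name `expected_r_given_f` is an assumption. Because the slide works with rounded probabilities, its 0.670 and 0.855 differ slightly from the exact values 2/3 and 6/7.

```python
from fractions import Fraction

# Joint probabilities Prob(F=f and R=r), keyed by (f, r), as exact fractions.
joint = {
    (0, 0): Fraction(15, 100), (0, 1): Fraction(10, 100), (0, 2): Fraction(5, 100),
    (1, 0): Fraction(30, 100), (1, 1): Fraction(20, 100), (1, 2): Fraction(20, 100),
}

def expected_r_given_f(table, f):
    """E[R | F=f] = sum over r of r * P(F=f, R=r), divided by P(F=f)."""
    p_f = sum(p for (fi, _), p in table.items() if fi == f)
    weighted = sum(r * p for (fi, r), p in table.items() if fi == f)
    return weighted / p_f

e_r_f0 = expected_r_given_f(joint, 0)
e_r_f1 = expected_r_given_f(joint, 1)

print(f"E[R|F=0] = {e_r_f0} = {float(e_r_f0):.3f}")
print(f"E[R|F=1] = {e_r_f1} = {float(e_r_f1):.3f}")
```

The expected number of real estate cases rises when F goes from 0 to 1, which is the positive covariation the slide points to.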


Covariation and Regression


[Figure: Expected Number of Real Estate Cases Given Number of Financial Cases, the regression of R on F. Horizontal axis: Financial Cases. Vertical axis: 0.0 to 1.0.]


Legal Services Case Mix Covariance

The two means are
\( \mu_R \) = 0(.45)+1(.30)+2(.25) = 0.8
\( \mu_F \) = 0(.30)+1(.70) = 0.7

Compute the Covariance
\( \sigma_{FR} = \sum_F \sum_R (F-.7)(R-.8)\,P(F,R) \) =
(0-.7)(0-.8).15 = +.084
(0-.7)(1-.8).10 = -.014
(0-.7)(2-.8).05 = -.042
(1-.7)(0-.8).30 = -.072
(1-.7)(1-.8).20 = +.012
(1-.7)(2-.8).20 = +.072
Sum = +0.04 = Cov(R,F)

I knew the covariance would be positive because the regression slopes upward. (We will see this again later in the course.)
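The term-by-term covariance calculation can be reproduced with a short loop over the joint table. This Python sketch is an illustration only; the `(f, r)` key layout and variable names are assumptions.

```python
# Joint probabilities Prob(F=f and R=r), keyed by (f, r).
joint = {
    (0, 0): 0.15, (0, 1): 0.10, (0, 2): 0.05,
    (1, 0): 0.30, (1, 1): 0.20, (1, 2): 0.20,
}

# Means computed from the joint table (equivalent to using the marginals).
mean_f = sum(p * f for (f, _), p in joint.items())
mean_r = sum(p * r for (_, r), p in joint.items())

# Covariance: probability-weighted sum of products of deviations from the means.
cov_fr = 0.0
for (f, r), p in joint.items():
    term = (f - mean_f) * (r - mean_r) * p
    cov_fr += term
    print(f"({f}-{mean_f:.1f})({r}-{mean_r:.1f}){p:.2f} = {term:+.3f}")

print(f"Cov(R,F) = {cov_fr:+.3f}")
```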


Covariance and Scaling


Compute the Covariance
Cov(R,F) = +0.04
What does the covariance mean?
Suppose each real estate case requires 2 lawyers and each financial case requires 3 lawyers. Then the number of lawyers is \( N_R = 2R \) and \( N_F = 3F \). The covariance of \( N_R \) and \( N_F \) will be 3(2)(.04) = 0.24.
But, the relationship is the same.
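The scaling effect can be seen by relabeling the outcomes in lawyers rather than cases and recomputing the covariance. The following Python sketch assumes the same joint table as the legal services example; the helper `covariance` and the dictionary `lawyers` are illustrative names.

```python
# Joint probabilities Prob(F=f and R=r), keyed by (f, r).
joint = {
    (0, 0): 0.15, (0, 1): 0.10, (0, 2): 0.05,
    (1, 0): 0.30, (1, 1): 0.20, (1, 2): 0.20,
}

def covariance(table):
    """Covariance of the two coordinates of a joint table keyed by (x, y)."""
    mean_x = sum(p * x for (x, _), p in table.items())
    mean_y = sum(p * y for (_, y), p in table.items())
    return sum(p * (x - mean_x) * (y - mean_y) for (x, y), p in table.items())

# Same probabilities, outcomes measured in lawyers: 3 per financial case, 2 per real estate case.
lawyers = {(3 * f, 2 * r): p for (f, r), p in joint.items()}

cov_cases = covariance(joint)
cov_lawyers = covariance(lawyers)

print(f"Cov(F, R)   = {cov_cases:.2f}")
print(f"Cov(3F, 2R) = {cov_lawyers:.2f}")
```

The probabilities are untouched; only the units of the outcomes change, yet the covariance is multiplied by 3 x 2 = 6.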


Independent Random Variables Have Zero Covariance
One card drawn randomly from a deck of 52 cards
A = Ace, H = Heart, Yes=1, No=0

             Ace
Heart     Yes=1    No=0     Total
Yes=1     1/52     12/52    13/52
No=0      3/52     36/52    39/52
Total     4/52     48/52    52/52

E[H] = 1(13/52)+0(39/52) = 1/4
E[A] = 1(4/52)+0(48/52) = 1/13

Covariance = \( \sum_H \sum_A P(H,A)\,(H-\mu_H)(A-\mu_A) \)
H=1, A=1:  1/52 (1 - 1/4)(1 - 1/13) = +36/52^2
H=1, A=0: 12/52 (1 - 1/4)(0 - 1/13) = -36/52^2
H=0, A=1:  3/52 (0 - 1/4)(1 - 1/13) = -36/52^2
H=0, A=0: 36/52 (0 - 1/4)(0 - 1/13) = +36/52^2
SUM = 0 !!


Covariance and Units of Measurement


Covariance takes the units of (units of X) times (units of Y).
Consider Cov($Price of X, $Price of Y).

Now, measure both prices in GBP, roughly $1.60 per £.
The prices are divided by 1.60, and the covariance is divided by 1.60^2.

This is an unattractive result.



Correlation is Units Free


Correlation Coefficient
\( \rho_{XY} = \dfrac{\mathrm{Covariance}(X,Y)}{\text{Standard deviation}(X)\ \text{Standard deviation}(Y)} \)

\( -1.00 \le \rho_{XY} \le +1.00 \).


Correlation
\( \mu_R = .8 \), \( \mu_F = .7 \)
Var(F) = 0^2(.3) + 1^2(.7) - .7^2 = .21
Standard deviation = .46
Var(R) = 0^2(.45) + 1^2(.30) + 2^2(.25) - .8^2 = .66
Standard deviation = 0.81
Covariance = +0.04

Correlation = .04 / (.46 x .81) = 0.107
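The correlation for the legal services example can be computed from the joint table in one pass, and the same function can be used to check that rescaling the variables leaves it unchanged. This is a Python sketch under the assumption that the joint table is the one from the case mix slides; `moments` and `correlation` are illustrative helper names.

```python
import math

# Joint probabilities Prob(F=f and R=r), keyed by (f, r).
joint = {
    (0, 0): 0.15, (0, 1): 0.10, (0, 2): 0.05,
    (1, 0): 0.30, (1, 1): 0.20, (1, 2): 0.20,
}

def moments(table):
    """Return (var_x, var_y, cov_xy) for a joint table keyed by (x, y)."""
    mean_x = sum(p * x for (x, _), p in table.items())
    mean_y = sum(p * y for (_, y), p in table.items())
    var_x = sum(p * (x - mean_x) ** 2 for (x, _), p in table.items())
    var_y = sum(p * (y - mean_y) ** 2 for (_, y), p in table.items())
    cov = sum(p * (x - mean_x) * (y - mean_y) for (x, y), p in table.items())
    return var_x, var_y, cov

def correlation(table):
    """Covariance divided by the product of the two standard deviations."""
    var_x, var_y, cov = moments(table)
    return cov / (math.sqrt(var_x) * math.sqrt(var_y))

var_f, var_r, cov_fr = moments(joint)
rho_cases = correlation(joint)

# Rescale to lawyers (3 per financial case, 2 per real estate case).
lawyers = {(3 * f, 2 * r): p for (f, r), p in joint.items()}
rho_lawyers = correlation(lawyers)

print(f"Var(F) = {var_f:.2f}, Var(R) = {var_r:.2f}, Cov = {cov_fr:.2f}")
print(f"correlation (cases)   = {rho_cases:.3f}")
print(f"correlation (lawyers) = {rho_lawyers:.3f}")
```

The covariance changes with the units, but the scale factors cancel between numerator and denominator, so the correlation does not.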


Uncorrelated Variables
Independence implies zero correlation. If
the variables are independent, then the
numerator of the correlation coefficient is
zero.


Sums of Two Random Variables


Example 1: Total number of cases = F+R
Example 2: Personnel needed = 3F+2R
Find for Sums
  Expected Value
  Variance and Standard Deviation

Application from Finance: Portfolio


Math Facts 1: Mean of a Sum

Mean of a sum
The mean of X+Y = E[X+Y] = E[X]+E[Y]

Mean of a weighted sum
Mean of aX + bY = E[aX] + E[bY] = aE[X] + bE[Y]


Mean of a Sum

\( \mu_R = .8 \), \( \mu_F = .7 \)

What is the mean (expected) number of cases each month, R+F? E[R + F] = E[R] + E[F] = .8 + .7 = 1.5


Mean of a Weighted Sum


Suppose each Real Estate case requires 2 lawyers and each Financial case requires 3 lawyers. Then \( N_R = 2R \) and \( N_F = 3F \).
\( \mu_R = .8 \), \( \mu_F = .7 \)

If \( N_R = 2R \) and \( N_F = 3F \), then the mean number of lawyers is the mean of 2R+3F. E[2R + 3F] = 2E[R] + 3E[F] = 2(.8) + 3(.7) = 3.7 lawyers required.


Math Facts 2: Variance of a Sum

Variance of a Sum
Var[x+y] = Var[x] + Var[y] + 2Cov(x,y)
Variance of a sum equals the sum of the variances only if the variables are uncorrelated.
Standard deviation of a sum
The standard deviation of x+y is not equal to the sum of the standard deviations.
\( \sigma_{x+y} = \sqrt{\sigma_x^2 + \sigma_y^2 + 2\sigma_{xy}} \)


Variance of a Sum
\( \mu_R = .8,\ \sigma_R^2 = .66,\ \sigma_R = .81 \)
\( \mu_F = .7,\ \sigma_F^2 = .21,\ \sigma_F = .46 \)
\( \sigma_{RF} = 0.04 \)

What is the variance of the total number of cases that occur each month?
This is the variance of F+R = .21 + .66 + 2(.04) = .95.
The standard deviation is .975.


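Math Facts 3: Variance of a Weighted Sum

Var[ax+by] = Var[ax] + Var[by] + 2Cov(ax,by)
= a^2 Var[x] + b^2 Var[y] + 2ab Cov(x,y).
Also, Cov(x,y) is the numerator in \( \rho_{xy} \), so Cov(x,y) = \( \rho_{xy}\sigma_x\sigma_y \).
\( \sigma_{ax+by} = \sqrt{a^2\sigma_x^2 + b^2\sigma_y^2 + 2ab\,\rho_{xy}\sigma_x\sigma_y} \)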


Variance of a Weighted Sum


\( \mu_R = .8,\ \sigma_R^2 = .66,\ \sigma_R = .81 \)
\( \mu_F = .7,\ \sigma_F^2 = .21,\ \sigma_F = .46 \)
\( \sigma_{RF} = 0.04,\ \rho_{RF} = .107 \)

Suppose each real estate case requires 2 lawyers and each financial case requires 3 lawyers. Then \( N_R = 2R \) and \( N_F = 3F \).

What is the variance of the total number of lawyers needed each month? What is the standard deviation? This is the variance of 2R+3F
= 2^2(.66) + 3^2(.21) + 2(2)(3)(.107)(.81)(.46) = 5.008
The standard deviation is the square root, 2.238
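As a cross-check, the variance of 2R+3F can be computed two ways: from the weighted-sum formula, and directly from the distribution of the total number of lawyers. The Python sketch below assumes the joint table from the case mix slides. It uses the unrounded covariance .04, so it gives 5.01 rather than the 5.008 obtained above from rounded standard deviations and correlation.

```python
import math

# Joint probabilities Prob(F=f and R=r), keyed by (f, r).
joint = {
    (0, 0): 0.15, (0, 1): 0.10, (0, 2): 0.05,
    (1, 0): 0.30, (1, 1): 0.20, (1, 2): 0.20,
}
a, b = 2, 3  # lawyers per real estate case, lawyers per financial case

# Way 1: formula a^2 Var[R] + b^2 Var[F] + 2ab Cov(R,F).
mean_f = sum(p * f for (f, _), p in joint.items())
mean_r = sum(p * r for (_, r), p in joint.items())
var_f = sum(p * (f - mean_f) ** 2 for (f, _), p in joint.items())
var_r = sum(p * (r - mean_r) ** 2 for (_, r), p in joint.items())
cov_rf = sum(p * (f - mean_f) * (r - mean_r) for (f, r), p in joint.items())
var_formula = a**2 * var_r + b**2 * var_f + 2 * a * b * cov_rf

# Way 2: treat N = 2R + 3F as a single random variable and use E[N^2] - E[N]^2.
mean_n = sum(p * (a * r + b * f) for (f, r), p in joint.items())
mean_n_sq = sum(p * (a * r + b * f) ** 2 for (f, r), p in joint.items())
var_direct = mean_n_sq - mean_n**2

sd_lawyers = math.sqrt(var_formula)
print(f"mean = {mean_n:.1f}, variance (formula) = {var_formula:.3f}, "
      f"variance (direct) = {var_direct:.3f}, sd = {sd_lawyers:.3f}")
```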


Correlated Variables: Returns on Two Stocks*

* Averaged yearly return


The two returns are positively correlated.


Application - Portfolio

You have $1000 to allocate between assets A and B. The yearly returns on the two assets are random variables \( r_A \) and \( r_B \).

The means of the two returns are \( E[r_A] = \mu_A \) and \( E[r_B] = \mu_B \).

The standard deviations (risks) of the returns are \( \sigma_A \) and \( \sigma_B \).

The correlation of the two returns is \( \rho_{AB} \).



Portfolio

You have $1000 to allocate to A and B.

You will allocate proportions w of your $1000 to A and (1-w) to B.


Return and Risk

Your expected return on each dollar is
\( E[w r_A + (1-w) r_B] = w\mu_A + (1-w)\mu_B \)

The variance of your return on each dollar is
\( \mathrm{Var}[w r_A + (1-w) r_B] = w^2\sigma_A^2 + (1-w)^2\sigma_B^2 + 2w(1-w)\rho_{AB}\sigma_A\sigma_B \)

The standard deviation is the square root.


Risk and Return: Example


Suppose you know \( \mu_A \), \( \mu_B \), \( \rho_{AB} \), \( \sigma_A \), and \( \sigma_B \). (You have watched these stocks for over 6 years.)
The mean and standard deviation are then just functions of w.
I will then compute the mean and standard deviation for different values of w.
For our Microsoft and Walmart example,
\( \mu_A = .050071 \), \( \mu_B = .021906 \)
\( \sigma_A = .114264 \), \( \sigma_B = .086035 \), \( \rho_{AB} = .248634 \)
E[return] = w(.050071) + (1-w)(.021906)
= .021906 + .028165w
SD[return] = sqrt[w^2(.114^2) + (1-w)^2(.086^2) + 2w(1-w)(.249)(.114)(.086)]
= sqrt[.013w^2 + .0074(1-w)^2 + .0049w(1-w)]
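To see how risk and return move as w changes, the two formulas can be evaluated on a grid of weights. The Python sketch below uses the unrounded means, standard deviations, and correlation quoted for the Microsoft and Walmart example; the function name `portfolio` and the particular grid of weights are assumptions made for illustration.

```python
import math

# Parameters quoted for the two-stock example.
MU_A, MU_B = 0.050071, 0.021906
SIGMA_A, SIGMA_B = 0.114264, 0.086035
RHO_AB = 0.248634

def portfolio(w):
    """Return (expected return, standard deviation) per dollar for weight w on asset A."""
    mean = w * MU_A + (1 - w) * MU_B
    variance = (w**2 * SIGMA_A**2
                + (1 - w)**2 * SIGMA_B**2
                + 2 * w * (1 - w) * RHO_AB * SIGMA_A * SIGMA_B)
    return mean, math.sqrt(variance)

for w in (0.0, 0.25, 0.5, 0.75, 1.0):
    mean, risk = portfolio(w)
    print(f"w={w:.2f}  expected return={mean:.6f}  risk={risk:.6f}")
```

At w=0 and w=1 the function returns the single-asset values. For weights in between, the risk is below the weighted average of the two standard deviations because the correlation is less than 1.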


[Figure: risk-return curve traced out by w, with the endpoints W=0 and W=1 marked.]

For different values of w,
risk = sqrt[.013w^2 + .0074(1-w)^2 + .0049w(1-w)] is on the horizontal axis,
return = .021906 + .028165w is on the vertical axis.


Summary

Random Variables: Dependent and Independent
Conditional probabilities change with the values of dependent variables.
Covariation and the covariance as a measure. (The regression)
Correlation as a units free measure of covariation
Math results
  Mean of a weighted sum
  Variance of a weighted sum
Application to a portfolio problem.
