Slides 3
Preliminaries
So far, our interest has been in events involving a single random variable only. In
other words, we have only considered univariate models.
Multivariate models, on the other hand, involve more than one variable.
Consider an experiment about health characteristics of the population. Would we be
interested in one characteristic only, say weight? Not really. There are many
important characteristics.
Definition (4.1.1): An $n$-dimensional random vector is a function from a sample
space into $\mathbb{R}^n$, $n$-dimensional Euclidean space.
Suppose, for example, that with each point in a sample space we associate an
ordered pair of numbers, that is, a point $(x, y) \in \mathbb{R}^2$, where $\mathbb{R}^2$ denotes the plane.
Then, we have defined a two-dimensional (or bivariate) random vector $(X, Y)$.
Example (4.1.2): Consider the experiment of tossing two fair dice. The sample
space has 36 equally likely points, for example, $(3, 3)$ and $(4, 1)$.
Now, let
$$X = \text{sum of the two dice} \quad \text{and} \quad Y = |\text{difference of the two dice}|.$$
Then,
$$(3, 3) \;\Rightarrow\; X = 6 \text{ and } Y = 0, \qquad (4, 1) \;\Rightarrow\; X = 5 \text{ and } Y = 3.$$
For example,
$$P(X = 7, Y \le 4) = \frac{4}{36} = \frac{1}{9}.$$
This is because the only sample points that yield this event are (4, 3), (3, 4), (5, 2)
and (2, 5).
Note that from now on we will use P (event a, event b) rather than P (event a and
event b).
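As a quick sanity check, here is a short illustrative Python sketch (not part of the original slides) that enumerates the 36 outcomes and recovers this probability:

    from itertools import product
    from fractions import Fraction

    # All 36 equally likely outcomes of two fair dice.
    outcomes = list(product(range(1, 7), repeat=2))

    # Event: X = sum of the dice equals 7, Y = |difference| is at most 4.
    favourable = [(d1, d2) for d1, d2 in outcomes
                  if d1 + d2 == 7 and abs(d1 - d2) <= 4]

    print(favourable)                                  # (3,4), (4,3), (2,5), (5,2)
    print(Fraction(len(favourable), len(outcomes)))    # 1/9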
The joint pmf of $(X, Y)$ for the dice example ($x$ = sum, $y$ = |difference|):

             y = 0   y = 1   y = 2   y = 3   y = 4   y = 5
    x = 2     1/36
    x = 3             1/18
    x = 4     1/36            1/18
    x = 5             1/18            1/18
    x = 6     1/36            1/18            1/18
    x = 7             1/18            1/18            1/18
    x = 8     1/36            1/18            1/18
    x = 9             1/18            1/18
    x = 10    1/36            1/18            1/18
    x = 11            1/18            1/18
    x = 12    1/36

(All blank entries are 0.)
The joint pmf is defined for all $(x, y) \in \mathbb{R}^2$, not just the 21 pairs in the above Table.
For any other $(x, y)$, $f(x, y) = P(X = x, Y = y) = 0$.
As before, we can use the joint pmf to calculate the probability of any event defined
in terms of $(X, Y)$. For $A \subseteq \mathbb{R}^2$,
$$P((X, Y) \in A) = \sum_{(x,y) \in A} f(x, y).$$
For example, let $A = \{(x, y) : x = 7, \; y \le 4\}$. Then, using the Table,
$$P(X = 7, Y \le 4) = f(7, 1) + f(7, 3) = \frac{1}{18} + \frac{1}{18} = \frac{1}{9},$$
which agrees with the direct count of sample points.
Expectations are also dealt with in the same way as before. Let $g(x, y)$ be a
real-valued function defined for all possible values $(x, y)$ of the discrete random
vector $(X, Y)$. Then, $g(X, Y)$ is itself a random variable and its expected value is
$$E[g(X, Y)] = \sum_{(x,y) \in \mathbb{R}^2} g(x, y) f(x, y).$$
Example (4.1.4): For the $(X, Y)$ whose joint pmf is given in the above Table, what
is the expected value of $XY$? Letting $g(x, y) = xy$, we have
$$E[XY] = \sum_{(x,y)} xy\, f(x, y) = (3 \cdot 1)\frac{1}{18} + (5 \cdot 1)\frac{1}{18} + \cdots + (7 \cdot 5)\frac{1}{18} = \frac{245}{18} = 13\tfrac{11}{18}.$$
As before,
$$E[a g_1(X, Y) + b g_2(X, Y) + c] = a E[g_1(X, Y)] + b E[g_2(X, Y)] + c.$$
One very useful result is that any nonnegative function from $\mathbb{R}^2$ into $\mathbb{R}$ that is
nonzero for at most a countable number of $(x, y)$ pairs and sums to 1 is the joint
pmf for some bivariate discrete random vector $(X, Y)$.
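Here is a short illustrative Python sketch (not from the slides) that builds this joint pmf as a dictionary and reproduces $E[XY]$:

    from fractions import Fraction
    from itertools import product

    # Joint pmf of X = sum, Y = |difference| for two fair dice.
    pmf = {}
    for d1, d2 in product(range(1, 7), repeat=2):
        key = (d1 + d2, abs(d1 - d2))
        pmf[key] = pmf.get(key, Fraction(0)) + Fraction(1, 36)

    # E[XY] = sum over the 21 support points of xy * f(x, y).
    print(sum(x * y * p for (x, y), p in pmf.items()))  # 245/18 = 13 11/18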
Example (4.1.5): As an illustration, define
$$f(0, 0) = 1/6, \quad f(0, 1) = 1/6, \quad f(1, 0) = 1/3, \quad f(1, 1) = 1/3,$$
and $f(x, y) = 0$ otherwise. These values are nonnegative and sum to one, so $f$ is
the joint pmf of some bivariate random vector $(X, Y)$.
The marginal pmfs of X and Y are obtained by summing out the other variable:
$$f_X(x) = \sum_{y \in \mathbb{R}} f_{X,Y}(x, y) \quad \text{and} \quad f_Y(y) = \sum_{x \in \mathbb{R}} f_{X,Y}(x, y).$$
Proof: For any $x \in \mathbb{R}$, let $A_x = \{(x, y) : -\infty < y < \infty\}$. That is, $A_x$ is the line in
the plane with first coordinate equal to x. Then, for any $x \in \mathbb{R}$,
$$f_X(x) = P(X = x) = P(X = x, \; -\infty < Y < \infty) = P((X, Y) \in A_x) = \sum_{(x,y) \in A_x} f_{X,Y}(x, y) = \sum_{y \in \mathbb{R}} f_{X,Y}(x, y).$$
Example (4.1.7): Now we can compute the marginal distributions of X and Y from
the joint distribution given in the above Table. Adding down the columns,
$$f_Y(0) = \frac{1}{6}, \quad f_Y(1) = \frac{5}{18}, \quad f_Y(2) = \frac{2}{9}, \quad f_Y(3) = \frac{1}{6}, \quad f_Y(4) = \frac{1}{9}, \quad f_Y(5) = \frac{1}{18},$$
and, as required, $\sum_{y=0}^{5} f_Y(y) = 1$.
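An illustrative Python sketch (not from the slides) that recovers these marginals from the joint pmf:

    from collections import defaultdict
    from fractions import Fraction
    from itertools import product

    pmf = defaultdict(Fraction)
    for d1, d2 in product(range(1, 7), repeat=2):
        pmf[(d1 + d2, abs(d1 - d2))] += Fraction(1, 36)

    # Marginal of Y: sum the joint pmf over x for each fixed y.
    f_Y = defaultdict(Fraction)
    for (x, y), p in pmf.items():
        f_Y[y] += p

    print(dict(f_Y))           # {0: 1/6, 1: 5/18, 2: 2/9, 3: 1/6, 4: 1/9, 5: 1/18}
    print(sum(f_Y.values()))   # 1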
Consider the joint pmf
$$f(0, 0) = 1/12, \quad f(0, 1) = 1/4, \quad f(1, 0) = 5/12, \quad f(1, 1) = 1/4,$$
with $f(x, y) = 0$ otherwise. Then,
$$f_Y(0) = \frac{1}{2}, \quad f_Y(1) = \frac{1}{2}, \quad f_X(0) = \frac{1}{3}, \quad f_X(1) = \frac{2}{3}.$$
Now consider the marginal pmfs for the distribution considered in Example (4.1.5):
$$f_Y(0) = \frac{1}{2}, \quad f_Y(1) = \frac{1}{2}, \quad f_X(0) = \frac{1}{3}, \quad f_X(1) = \frac{2}{3}.$$
We have the same marginal pmfs but the joint distributions are different! The
marginals do not, in general, determine the joint distribution.
For a continuous bivariate random vector $(X, Y)$ with joint pdf $f(x, y)$, probabilities
are double integrals: for any $A \subseteq \mathbb{R}^2$,
$$P((X, Y) \in A) = \iint_A f(x, y)\,dx\,dy,$$
and expectations are defined analogously,
$$E[g(X, Y)] = \iint_{\mathbb{R}^2} g(x, y) f(x, y)\,dx\,dy.$$
It is important to realise that the joint pdf is defined for all $(x, y) \in \mathbb{R}^2$. The pdf
may equal 0 on a large set A if $P((X, Y) \in A) = 0$, but the pdf is still defined for
the points in A.
Again, naturally, the marginal pdfs are obtained by integrating out the other variable:
$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy, \quad -\infty < x < \infty, \qquad f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx, \quad -\infty < y < \infty,$$
and $f(x, y) \ge 0$ for all $(x, y)$.
Consider, for example,
$$f(x, y) = \begin{cases} 6xy^2 & 0 < x < 1 \text{ and } 0 < y < 1 \\ 0 & \text{otherwise} \end{cases}.$$
This is a valid joint pdf, since $f(x, y) \ge 0$ and
$$\int_0^1 \int_0^1 6xy^2\,dx\,dy = \int_0^1 3x^2 y^2 \Big|_{x=0}^{1}\,dy = \int_0^1 3y^2\,dy = y^3 \Big|_{y=0}^{1} = 1.$$
Now consider $P(X + Y \ge 1)$. The relevant region of the unit square is bounded by
the lines $y = 1$ and $x + y = 1$. Therefore,
$$A = \{(x, y) : x + y \ge 1, \; 0 < x < 1, \; 0 < y < 1\} = \{(x, y) : x \ge 1 - y, \; 0 < x < 1, \; 0 < y < 1\} = \{(x, y) : 1 - y \le x < 1, \; 0 < y < 1\}.$$
Then,
$$P(X + Y \ge 1) = \iint_A f(x, y)\,dx\,dy = \int_0^1 \int_{1-y}^{1} 6xy^2\,dx\,dy = \int_0^1 3x^2 y^2 \Big|_{x=1-y}^{1}\,dy$$
$$= \int_0^1 \left[3y^2 - 3(1-y)^2 y^2\right] dy = \int_0^1 \left(6y^3 - 3y^4\right) dy = \left[\frac{6}{4}y^4 - \frac{3}{5}y^5\right]_{y=0}^{1} = \frac{3}{2} - \frac{3}{5} = \frac{9}{10}.$$
Moreover, the marginal pdf of X is
$$f_X(x) = \int_0^1 f(x, y)\,dy = \int_0^1 6xy^2\,dy = 2x, \quad 0 < x < 1,$$
so, for instance,
$$P\left(\frac{1}{2} < X < \frac{3}{4}\right) = \int_{1/2}^{3/4} 2x\,dx = \frac{5}{16}.$$
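A Monte Carlo sketch (illustrative, not from the slides) of these two probabilities, drawing from $f(x, y) = 6xy^2$ by accept-reject with a uniform proposal on the unit square (the density is bounded by 6 there):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1_000_000
    x, y, u = rng.uniform(size=(3, n))

    keep = u < x * y**2          # accept with probability f(x, y) / 6 = x * y^2
    xs, ys = x[keep], y[keep]

    print((xs + ys >= 1).mean())              # ~ 9/10
    print(((0.5 < xs) & (xs < 0.75)).mean())  # ~ 5/16 = 0.3125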
The joint cdf is defined as
$$F(x, y) = P(X \le x, Y \le y) \quad \text{for all } (x, y) \in \mathbb{R}^2.$$
Although for discrete random vectors it might not be convenient to use the joint cdf,
for continuous random variables, the following relationship makes the joint cdf very
useful:
$$F(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(s, t)\,ds\,dt.$$
We have talked a little bit about conditional probabilities before. Now we will
consider conditional distributions.
The idea is the same. If we have some extra information about the sample, we can
use that information to make better inference.
Suppose we are sampling from a population where X is the weight (in kg) and Y is
the height (in cm). What is $P(X > 95)$? Would we have a better/more relevant
answer if we knew that the person in question has $Y = 202$ cm? Usually,
$P(X > 95 \mid Y = 202)$ will be much larger than $P(X > 95 \mid Y = 165)$.
Once we have the joint distribution for (X , Y ) , we can calculate the conditional
distributions, as well.
Definition (4.2.1): Let $(X, Y)$ be a discrete bivariate random vector with joint pmf
$f(x, y)$ and marginal pmfs $f_X(x)$ and $f_Y(y)$. For any $x$ such that
$P(X = x) = f_X(x) > 0$, the conditional pmf of Y given that $X = x$ is the function
of y denoted by $f(y|x)$ and defined by
$$f(y|x) = P(Y = y \mid X = x) = \frac{f(x, y)}{f_X(x)}.$$
For any y such that $P(Y = y) = f_Y(y) > 0$, the conditional pmf of X given that
$Y = y$ is the function of x denoted by $f(x|y)$ and defined by
$$f(x|y) = P(X = x \mid Y = y) = \frac{f(x, y)}{f_Y(y)}.$$
This is a genuine pmf: $f(y|x) \ge 0$ since $f(x, y) \ge 0$ and $f_X(x) > 0$, and
$$\sum_y f(y|x) = \frac{\sum_y f(x, y)}{f_X(x)} = \frac{f_X(x)}{f_X(x)} = 1.$$
In addition,
$$f(10|1) = \frac{3/18}{10/18} = \frac{3}{10}, \qquad f(20|1) = \frac{4/18}{10/18} = \frac{4}{10}, \qquad f(30|2) = \frac{4/18}{4/18} = 1,$$
and, since the conditional probabilities given $Y = 1$ must sum to one,
$f(30|1) = 3/10$, etc.
For continuous random vectors, conditional pdfs are defined analogously: for any y
such that $f_Y(y) > 0$, the conditional pdf of X given $Y = y$ is
$$f(x|y) = \frac{f(x, y)}{f_Y(y)},$$
and symmetrically for $f(y|x)$.
The conditional expected value of $g(Y)$ given $X = x$ is
$$E[g(Y)|x] = \sum_y g(y) f(y|x) \;\text{ (discrete)} \qquad \text{and} \qquad E[g(Y)|x] = \int_{-\infty}^{\infty} g(y) f(y|x)\,dy \;\text{ (continuous)}.$$
What value of b minimises the prediction error $E[(X - b)^2]$? Write
$$E[(X - b)^2] = E\left[\left(\{X - E[X]\} + \{E[X] - b\}\right)^2\right] = E\left[\{X - E[X]\}^2\right] + \{E[X] - b\}^2 + 2E\left(\{X - E[X]\}\{E[X] - b\}\right),$$
where
$$E\left(\{X - E[X]\}\{E[X] - b\}\right) = \{E[X] - b\}\,E\{X - E[X]\} = 0.$$
Then,
$$E[(X - b)^2] = E\left[\{X - E[X]\}^2\right] + \{E[X] - b\}^2.$$
We have no control over the first term, but the second term is nonnegative and
equals zero when $b = E[X]$. Therefore, $b = E[X]$ is the value that minimises the
prediction error and is the best (mean squared error) predictor of X.
In other words, the expectation of a random variable is its best predictor.
Can you show this for the conditional expectation, as well? See Exercise 4.13, which
you will be asked to solve as homework.
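A quick numerical illustration (not from the slides; the exponential distribution and grid are arbitrary choices): for a simulated sample, the mean squared prediction error is smallest near $b = E[X]$.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.exponential(scale=2.0, size=100_000)   # E[X] = 2

    bs = np.linspace(0.0, 4.0, 401)
    mse = [((x - b) ** 2).mean() for b in bs]
    print(bs[int(np.argmin(mse))])                 # ~ 2.0, i.e. E[X]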
Example (4.2.4): Let $f(x, y) = e^{-y}$ for $0 < x < y < \infty$, and 0 otherwise. Then
$$f_X(x) = \int_x^{\infty} f(x, y)\,dy = \int_x^{\infty} e^{-y}\,dy = e^{-x}, \quad x > 0,$$
and
$$f(y|x) = \frac{f(x, y)}{f_X(x)} = \frac{e^{-y}}{e^{-x}} = e^{-(y-x)} \;\text{ if } y > x, \qquad f(y|x) = \frac{f(x, y)}{f_X(x)} = \frac{0}{e^{-x}} = 0 \;\text{ if } y \le x.$$
Thus, given $X = x$, Y has an exponential distribution, where x is the location
parameter in the distribution of Y and $\beta = 1$ is the scale parameter. Notice that the
conditional distribution of Y is different for every value of x. Hence,
$$E[Y \mid X = x] = \int_x^{\infty} y e^{-(y-x)}\,dy = 1 + x.$$
The conditional expectation is obtained by integration by parts:
$$\int_x^{\infty} y e^{-(y-x)}\,dy = -y e^{-(y-x)} \Big|_{y=x}^{\infty} + \int_x^{\infty} e^{-(y-x)}\,dy = x + 0 + \left[-e^{-(y-x)}\right]_{y=x}^{\infty} = x + 1.$$
Similarly, the conditional variance is
$$\mathrm{Var}(Y|x) = \int_x^{\infty} y^2 e^{-(y-x)}\,dy - \{E[Y|x]\}^2 = 1.$$
Again, you can obtain the first part of this result by integration by parts.
What is the implication of this? One can show that the marginal distribution of Y is
gamma(2, 1) and so $\mathrm{Var}(Y) = 2$. Hence, the knowledge that $X = x$ has reduced the
variability of Y by 50%!
Yet, in some other cases, the conditional distribution might not depend on the
conditioning variable.
Say, the conditional distribution of Y given $X = x$ is not different for different
values of x.
In other words, knowledge of the value of X does not provide any more information.
This situation is defined as independence.
Definition (4.2.5): Let $(X, Y)$ be a bivariate random vector with joint pdf or pmf
$f(x, y)$ and marginal pdfs or pmfs $f_X(x)$ and $f_Y(y)$. Then X and Y are called
independent random variables if, for every $x \in \mathbb{R}$ and $y \in \mathbb{R}$,
$$f(x, y) = f_X(x) f_Y(y). \tag{1}$$
If X and Y are independent, the conditional pdf of Y given $X = x$ is
$$f(y|x) = \frac{f(x, y)}{f_X(x)} = \frac{f_X(x) f_Y(y)}{f_X(x)} = f_Y(y),$$
regardless of the value of x.
We can either start with the joint distribution and check independence for each
possible value of x and y , or start with the assumption that X and Y are
independent and model the joint distribution accordingly. In this latter direction, our
economic intuition might have to play an important role.
Would information on the value of X really increase our information about the
likely value of Y ?
Example (4.2.6): Consider the discrete bivariate random vector $(X, Y)$, with joint
pmf given by
$$f(10, 1) = f(20, 1) = f(20, 2) = 1/10, \qquad f(10, 2) = f(10, 3) = 1/5 \qquad \text{and} \qquad f(20, 3) = 3/10.$$
Here $f_X(10) = f_X(20) = 1/2$, $f_Y(1) = 1/5$, $f_Y(2) = 3/10$ and $f_Y(3) = 1/2$. Then,
although
$$f(10, 1) = \frac{1}{10} = \frac{1}{2}\cdot\frac{1}{5} = f_X(10) f_Y(1),$$
we have
$$f(10, 3) = \frac{1}{5} \ne \frac{1}{2}\cdot\frac{1}{2} = f_X(10) f_Y(3),$$
so X and Y are not independent: (1) must hold for every $(x, y)$.
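A short illustrative Python sketch (not from the slides) that checks the factorisation $f(x, y) = f_X(x) f_Y(y)$ at every support point:

    from fractions import Fraction as F

    f = {(10, 1): F(1, 10), (20, 1): F(1, 10), (20, 2): F(1, 10),
         (10, 2): F(1, 5), (10, 3): F(1, 5), (20, 3): F(3, 10)}

    fX = {x: sum(p for (a, b), p in f.items() if a == x) for x in (10, 20)}
    fY = {y: sum(p for (a, b), p in f.items() if b == y) for y in (1, 2, 3)}

    for (x, y), p in sorted(f.items()):
        print((x, y), p == fX[x] * fY[y])   # True for (10,1) and (20,1), False elsewhere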
Lemma (4.2.7): Let $(X, Y)$ be a bivariate random vector with joint pdf or pmf
$f(x, y)$. Then X and Y are independent random variables if and only if there exist
functions $g(x)$ and $h(y)$ such that, for every $x \in \mathbb{R}$ and $y \in \mathbb{R}$,
$$f(x, y) = g(x)h(y).$$
Proof: For the only if part, define $g(x) = f_X(x)$ and $h(y) = f_Y(y)$. Then, by (1),
$$f(x, y) = f_X(x) f_Y(y) = g(x)h(y).$$
For the if part, suppose $f(x, y) = g(x)h(y)$. Define
$$\int_{-\infty}^{\infty} g(x)\,dx = c \quad \text{and} \quad \int_{-\infty}^{\infty} h(y)\,dy = d,$$
where
$$cd = \left[\int g(x)\,dx\right]\left[\int h(y)\,dy\right] = \iint g(x)h(y)\,dy\,dx = \iint f(x, y)\,dy\,dx = 1.$$
Moreover,
$$f_X(x) = \int g(x)h(y)\,dy = g(x)\,d \quad \text{and} \quad f_Y(y) = \int g(x)h(y)\,dx = h(y)\,c.$$
Then, since $cd = 1$,
$$f(x, y) = g(x)h(y) = g(x)h(y)\,\underbrace{cd}_{1} = f_X(x) f_Y(y),$$
which shows that X and Y are independent.
To prove the Lemma for discrete random vectors, replace integrals with summations.
Example (4.2.8): Consider the joint pdf
$$f(x, y) = \frac{1}{384}\, x^2 y^4 e^{-y - x/2}, \quad x > 0 \text{ and } y > 0.$$
If we define
$$g(x) = x^2 e^{-x/2} \quad \text{and} \quad h(y) = \frac{y^4 e^{-y}}{384},$$
for $x > 0$ and $y > 0$, and $g(x) = h(y) = 0$ otherwise, then, clearly,
$$f(x, y) = g(x)h(y)$$
for all $(x, y) \in \mathbb{R}^2$. By Lemma (4.2.7), X and Y are independently distributed!
As another example, suppose X and Y are independent with marginal pmfs
$$f_X(0) = .01, \quad f_X(1) = .09, \quad f_X(2) = .90, \qquad f_Y(0) = .70, \quad f_Y(1) = .25, \quad f_Y(2) = .05.$$
By independence, the joint pmf is the product of the marginals:
$$f_{X,Y}(0, 1) = f_X(0) f_Y(1) = .0025, \quad \text{etc.}$$
We can thus calculate quantities such as
$$P(X = Y) = f_X(0)f_Y(0) + f_X(1)f_Y(1) + f_X(2)f_Y(2) = .01 \times .70 + .09 \times .25 + .90 \times .05 = .0745.$$
Theorem (4.2.10): Let X and Y be independent random variables. Then, for any
functions $g(x)$ and $h(y)$,
$$E[g(X)h(Y)] = E[g(X)]\,E[h(Y)]. \tag{2}$$
Proof: Start with (2) and consider the continuous case. Now,
$$E[g(X)h(Y)] = \iint g(x)h(y) f(x, y)\,dx\,dy = \iint g(x)h(y) f_X(x) f_Y(y)\,dx\,dy = \left[\int g(x) f_X(x)\,dx\right]\left[\int h(y) f_Y(y)\,dy\right] = E[g(X)]\,E[h(Y)].$$
An important consequence: for independent X and Y and any sets A and B, take
$g(x) = \mathbf{1}\{x \in A\}$ and $h(y) = \mathbf{1}\{y \in B\}$, so that $E[g(X)] = \int_{x \in A} f_X(x)\,dx = P(X \in A)$.
With $C = \{(x, y) : x \in A, \; y \in B\}$,
$$P((X, Y) \in C) = E[g(X)h(Y)] = E[g(X)]\,E[h(Y)] = P(X \in A)\,P(Y \in B).$$
These results make life a lot easier when calculating expectations of certain random
variables.
For example, let X and Y be independent exponential(1) random variables. Then
$$P(X \ge 4, Y < 3) = P(X \ge 4)\,P(Y < 3) = e^{-4}\left(1 - e^{-3}\right),$$
and
$$E[X^2 Y] = E[X^2]\,E[Y] = \left(\mathrm{Var}(X) + \{E[X]\}^2\right)E[Y] = (1 + 1^2) \cdot 1 = 2.$$
The moment generating function is back!
Theorem (4.2.12): Let X and Y be independent random variables with moment
generating functions $M_X(t)$ and $M_Y(t)$. Then the moment generating function of
the random variable $Z = X + Y$ is given by
$$M_Z(t) = M_X(t)\,M_Y(t).$$
Proof: We know from before that $M_Z(t) = E[e^{tZ}]$. Then,
$$E[e^{tZ}] = E[e^{t(X+Y)}] = E[e^{tX} e^{tY}] = E[e^{tX}]\,E[e^{tY}] = M_X(t)\,M_Y(t),$$
where we have used the result that for two independent random variables X and Y,
$E[g(X)h(Y)] = E[g(X)]\,E[h(Y)]$.
Of course, if independence does not hold, life gets pretty tough! But we will not
deal with that here.
For example, let $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$ be independent, with mgfs
$$M_X(t) = \exp\left(\mu_X t + \frac{\sigma_X^2 t^2}{2}\right) \quad \text{and} \quad M_Y(t) = \exp\left(\mu_Y t + \frac{\sigma_Y^2 t^2}{2}\right).$$
Then, for $Z = X + Y$,
$$M_Z(t) = M_X(t)\,M_Y(t) = \exp\left[(\mu_X + \mu_Y)t + \frac{(\sigma_X^2 + \sigma_Y^2)t^2}{2}\right],$$
which is the mgf of a normal random variable with mean $\mu_X + \mu_Y$ and variance
$\sigma_X^2 + \sigma_Y^2$.
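A quick simulation sketch (illustrative, not from the slides; the means and variances below are arbitrary) confirming that the sum of independent normals is normal with added means and variances:

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.normal(loc=1.0, scale=2.0, size=1_000_000)   # mu_X = 1, var = 4
    y = rng.normal(loc=-0.5, scale=1.0, size=1_000_000)  # mu_Y = -0.5, var = 1
    z = x + y

    print(z.mean())  # ~ 0.5 = mu_X + mu_Y
    print(z.var())   # ~ 5.0 = sigma_X^2 + sigma_Y^2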
Bivariate Transformations
We now consider transformations involving two random variables rather than only
one.
Let $(X, Y)$ be a random vector and consider $(U, V)$, where
$$U = g_1(X, Y) \quad \text{and} \quad V = g_2(X, Y).$$
For any $B \subseteq \mathbb{R}^2$, notice that $(U, V) \in B$ if and only if $(X, Y) \in A$, where
$$A = \{(x, y) : (g_1(x, y), g_2(x, y)) \in B\}.$$
Hence,
$$P((U, V) \in B) = P((X, Y) \in A).$$
This implies that the probability distribution of $(U, V)$ is completely determined by
the probability distribution of $(X, Y)$.
If $(X, Y)$ is discrete, the pmf of $(U, V)$ follows directly: letting
$A_{uv} = \{(x, y) : g_1(x, y) = u, \; g_2(x, y) = v\}$,
$$f_{U,V}(u, v) = \sum_{(x,y) \in A_{uv}} f_{X,Y}(x, y).$$
Example (4.3.1): Let $X \sim \text{Poisson}(\theta)$ and $Y \sim \text{Poisson}(\lambda)$ be independent, so that
$$f_{X,Y}(x, y) = \frac{\theta^x e^{-\theta}}{x!}\,\frac{\lambda^y e^{-\lambda}}{y!}, \quad x = 0, 1, 2, \ldots, \; y = 0, 1, 2, \ldots$$
Define
$$U = X + Y \quad \text{and} \quad V = Y,$$
implying that $g_1(x, y) = x + y$ and $g_2(x, y) = y$.
What is B? Since $y = v$, for any given v,
$$u = x + y = x + v.$$
Hence, $u = v, v+1, v+2, v+3, \ldots$. Therefore, $A_{uv} = \{(u - v, v)\}$, a single point. As such,
$$f_{U,V}(u, v) = \sum_{(x,y) \in A_{uv}} f_{X,Y}(x, y) = f_{X,Y}(u - v, v) = \frac{\theta^{u-v} e^{-\theta}}{(u - v)!}\,\frac{\lambda^v e^{-\lambda}}{v!}.$$
What is the marginal pmf of U?
For any fixed non-negative integer u, $f_{U,V}(u, v) > 0$ only for $v = 0, 1, \ldots, u$. Then,
$$f_U(u) = \sum_{v=0}^{u} \frac{\theta^{u-v} e^{-\theta}}{(u-v)!}\,\frac{\lambda^v e^{-\lambda}}{v!} = e^{-(\theta+\lambda)} \sum_{v=0}^{u} \frac{\theta^{u-v} \lambda^v}{(u-v)!\,v!} = \frac{e^{-(\theta+\lambda)}}{u!} \sum_{v=0}^{u} \binom{u}{v} \theta^{u-v} \lambda^v = \frac{e^{-(\theta+\lambda)}}{u!}\,(\theta + \lambda)^u, \quad u = 0, 1, 2, \ldots,$$
which follows from the binomial formula $(a + b)^n = \sum_{x=0}^{n} \binom{n}{x} a^x b^{n-x}$.
This is the pmf of a Poisson random variable with parameter $\theta + \lambda$. A theorem
follows.
Theorem (4.3.2): If $X \sim \text{Poisson}(\theta)$, $Y \sim \text{Poisson}(\lambda)$ and X and Y are
independent, then $X + Y \sim \text{Poisson}(\theta + \lambda)$.
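A simulation sketch (illustrative, not from the slides; θ and λ chosen arbitrarily) of Theorem (4.3.2):

    import numpy as np
    from collections import Counter
    from math import exp, factorial

    rng = np.random.default_rng(4)
    theta, lam = 2.0, 3.0
    z = rng.poisson(theta, 500_000) + rng.poisson(lam, 500_000)

    emp = Counter(z)
    for u in range(5):
        pmf = exp(-(theta + lam)) * (theta + lam) ** u / factorial(u)
        print(u, emp[u] / len(z), pmf)   # empirical vs Poisson(theta + lambda) pmf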
Consider the continuous case now.
Let $(X, Y) \sim f_{X,Y}(x, y)$ be a continuous random vector and define
$$A = \{(x, y) : f_{X,Y}(x, y) > 0\} \quad \text{and} \quad B = \{(u, v) : u = g_1(x, y), \; v = g_2(x, y) \text{ for some } (x, y) \in A\}.$$
Suppose the transformation is one-to-one from A onto B. Then we can solve
$$u = g_1(x, y) \quad \text{and} \quad v = g_2(x, y)$$
for x and y and obtain the inverse functions
$$x = h_1(u, v) \quad \text{and} \quad y = h_2(u, v).$$
The last remaining ingredient is the Jacobian of the transformation, the determinant
of the matrix of partial derivatives:
$$J = \det\begin{pmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\[4pt] \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{pmatrix} = \frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial x}{\partial v}\frac{\partial y}{\partial u}.$$
The joint pdf of $(U, V)$ is then
$$f_{U,V}(u, v) = f_{X,Y}(h_1(u, v), h_2(u, v))\,|J|$$
for $(u, v) \in B$ (and 0 otherwise), provided $J \ne 0$ on B.
The next example is based on the beta distribution, which is related to the gamma
distribution.
The beta$(\alpha, \beta)$ pdf is given by
$$f_X(x \mid \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha - 1} (1 - x)^{\beta - 1}, \quad 0 < x < 1, \; \alpha > 0, \; \beta > 0.$$
Example (4.3.3): Let $X \sim \text{beta}(\alpha, \beta)$ and $Y \sim \text{beta}(\alpha + \beta, \gamma)$ with $X \perp\!\!\!\perp Y$. By
independence, the joint pdf is
$$f_{X,Y}(x, y) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1} \cdot \frac{\Gamma(\alpha+\beta+\gamma)}{\Gamma(\alpha+\beta)\Gamma(\gamma)}\, y^{\alpha+\beta-1}(1-y)^{\gamma-1}, \quad 0 < x < 1, \; 0 < y < 1.$$
We will show that $U = XY \sim \text{beta}(\alpha, \beta + \gamma)$. Define
$$U = XY \quad \text{and} \quad V = X.$$
As usual, we first determine the sets A and B.
Now we know that $V = X$ and the set of possible values for X is $0 < x < 1$. Hence,
the set of possible values for V is given by $0 < v < 1$.
Since $U = XY = VY$, for any given value of $V = v$, U will vary between 0 and v, as
the set of possible values for Y is $0 < y < 1$. Hence, $0 < u < v$.
Therefore, the given transformation maps A onto
$$B = \{(u, v) : 0 < u < v < 1\},$$
and the inverse functions are
$$x = h_1(u, v) = v \quad \text{and} \quad y = h_2(u, v) = \frac{u}{v}.$$
For
$$x = h_1(u, v) = v \quad \text{and} \quad y = h_2(u, v) = \frac{u}{v},$$
we have
$$J = \det\begin{pmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\[4pt] \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{pmatrix} = \det\begin{pmatrix} 0 & 1 \\[2pt] \dfrac{1}{v} & -\dfrac{u}{v^2} \end{pmatrix} = -\frac{1}{v}.$$
Therefore, for $0 < u < v < 1$,
$$f_{U,V}(u, v) = f_{X,Y}\left(v, \frac{u}{v}\right)|J| = \frac{\Gamma(\alpha+\beta+\gamma)}{\Gamma(\alpha)\Gamma(\beta)\Gamma(\gamma)}\, v^{\alpha-1}(1-v)^{\beta-1} \left(\frac{u}{v}\right)^{\alpha+\beta-1} \left(1 - \frac{u}{v}\right)^{\gamma-1} \frac{1}{v}.$$
To obtain the marginal pdf of U, integrate out v. Write $K = \frac{\Gamma(\alpha+\beta+\gamma)}{\Gamma(\alpha)\Gamma(\beta)\Gamma(\gamma)}$ and
collect the powers of u and v:
$$f_U(u) = \int_u^1 f_{U,V}(u, v)\,dv = K \int_u^1 v^{\alpha-1}(1-v)^{\beta-1}\left(\frac{u}{v}\right)^{\alpha+\beta-1}\left(1-\frac{u}{v}\right)^{\gamma-1}\frac{1}{v}\,dv = K u^{\alpha+\beta-1}\int_u^1 (1-v)^{\beta-1}(v-u)^{\gamma-1} v^{-\beta-\gamma}\,dv,$$
using $(1 - u/v)^{\gamma-1} = (v - u)^{\gamma-1} v^{-(\gamma-1)}$.
Let
$$y = \frac{u/v - u}{1 - u}, \quad \text{so that} \quad 1 - y = \frac{1 - u/v}{1 - u} \quad \text{and} \quad dy = -\frac{u}{v^2}\frac{1}{1-u}\,dv, \;\text{ i.e., } \; dv = -\frac{v^2(1-u)}{u}\,dy.$$
Moreover, for $v = u$, $y = 1$ and for $v = 1$, $y = 0$. Therefore, changing variables
from v to y in the integral (the powers of v cancel exactly),
$$f_U(u) = K u^{\alpha-1}(1-u)^{\beta+\gamma-1}\int_0^1 y^{\beta-1}(1-y)^{\gamma-1}\,dy.$$
Now, the integrand is the kernel of a beta$(\beta, \gamma)$ pdf: since
$$\frac{\Gamma(\beta+\gamma)}{\Gamma(\beta)\Gamma(\gamma)}\, y^{\beta-1}(1-y)^{\gamma-1}, \quad 0 < y < 1, \; \beta > 0, \; \gamma > 0,$$
integrates to one, we have
$$\int_0^1 y^{\beta-1}(1-y)^{\gamma-1}\,dy = \frac{\Gamma(\beta)\Gamma(\gamma)}{\Gamma(\beta+\gamma)}.$$
Hence,
$$f_U(u) = K u^{\alpha-1}(1-u)^{\beta+\gamma-1}\int_0^1 y^{\beta-1}(1-y)^{\gamma-1}\,dy = \frac{\Gamma(\alpha+\beta+\gamma)}{\Gamma(\alpha)\Gamma(\beta)\Gamma(\gamma)}\,\frac{\Gamma(\beta)\Gamma(\gamma)}{\Gamma(\beta+\gamma)}\, u^{\alpha-1}(1-u)^{\beta+\gamma-1} = \frac{\Gamma(\alpha+\beta+\gamma)}{\Gamma(\alpha)\Gamma(\beta+\gamma)}\, u^{\alpha-1}(1-u)^{\beta+\gamma-1},$$
which is the beta$(\alpha, \beta + \gamma)$ pdf, as claimed.
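A simulation sketch (illustrative, not from the slides; it assumes scipy is available and the parameter values are arbitrary) of this result:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    a, b, g = 2.0, 3.0, 1.5
    u = rng.beta(a, b, 200_000) * rng.beta(a + b, g, 200_000)  # U = XY

    # Compare the sample with the claimed beta(a, b + g) distribution.
    print(stats.kstest(u, stats.beta(a, b + g).cdf))  # large p-value expected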
Example (4.3.4): Let $X \sim N(0, 1)$ and $Y \sim N(0, 1)$ be independent, and consider
$$U = g_1(X, Y) = X + Y \quad \text{and} \quad V = g_2(X, Y) = X - Y.$$
The joint pdf is
$$f_{X,Y}(x, y) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right)\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{y^2}{2}\right) = \frac{1}{2\pi}\exp\left(-\frac{x^2 + y^2}{2}\right), \quad -\infty < x < \infty, \; -\infty < y < \infty,$$
where
$$A = \{(x, y) : f_{X,Y}(x, y) > 0\} = \mathbb{R}^2.$$
We have
$$u = x + y \quad \text{and} \quad v = x - y.$$
Thankfully, when these equations are solved for $(x, y)$ they yield unique solutions:
$$x = h_1(u, v) = \frac{u + v}{2} \quad \text{and} \quad y = h_2(u, v) = \frac{u - v}{2}.$$
The Jacobian is given by
$$J = \det\begin{pmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\[4pt] \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{pmatrix} = \det\begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\[2pt] \tfrac{1}{2} & -\tfrac{1}{2} \end{pmatrix} = -\frac{1}{4} - \frac{1}{4} = -\frac{1}{2}.$$
Therefore,
$$f_{U,V}(u, v) = f_{X,Y}\left(\frac{u+v}{2}, \frac{u-v}{2}\right)|J| = \frac{1}{2\pi}\exp\left[-\frac{1}{2}\left(\left(\frac{u+v}{2}\right)^2 + \left(\frac{u-v}{2}\right)^2\right)\right]\frac{1}{2}, \quad -\infty < u < \infty, \; -\infty < v < \infty.$$
Rearranging gives
$$f_{U,V}(u, v) = \left[\frac{1}{\sqrt{2\pi}\sqrt{2}}\exp\left(-\frac{u^2}{4}\right)\right]\left[\frac{1}{\sqrt{2\pi}\sqrt{2}}\exp\left(-\frac{v^2}{4}\right)\right].$$
Since the joint density factors into a function of u and a function of v, by Lemma
(4.2.7), U and V are independent.
Remember Theorem (4.2.14).
Theorem (4.2.14): Let $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$ be independent
normal random variables. Then the random variable $Z = X + Y$ has a
$N(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2)$ distribution.
Then, $U = X + Y \sim N(0, 2)$. What about V? Define $Z = -Y \sim N(0, 1)$. Then
$$V = X - Y = X + Z \sim N(0, 2),$$
as well.
That the sums and differences of independent normal random variables are
independent normal random variables is true independent of the means of X and Y,
as long as $\mathrm{Var}(X) = \mathrm{Var}(Y)$. See Exercise 4.27.
Note that the more difficult bit here is to prove that U and V are indeed
independent.
Theorem (4.3.5): Let $X \perp\!\!\!\perp Y$ be two random variables. Define $U = g(X)$ and
$V = h(Y)$, where $g(x)$ is a function only of x and $h(y)$ is a function only of y.
Then $U \perp\!\!\!\perp V$.
Proof: Consider the proof for continuous random variables U and V. Define for any
$u \in \mathbb{R}$ and $v \in \mathbb{R}$,
$$A_u = \{x : g(x) \le u\} \quad \text{and} \quad B_v = \{y : h(y) \le v\}.$$
Then, using the independence of X and Y,
$$F_{U,V}(u, v) = P(U \le u, V \le v) = P(X \in A_u, Y \in B_v) = P(X \in A_u)\,P(Y \in B_v).$$
The joint pdf of $(U, V)$ is therefore
$$f_{U,V}(u, v) = \frac{\partial^2}{\partial u\,\partial v} F_{U,V}(u, v) = \left[\frac{d}{du} P(X \in A_u)\right]\left[\frac{d}{dv} P(Y \in B_v)\right],$$
and clearly, the joint pdf factorises into a function only of u and a function only of
v. Therefore, by Lemma (4.2.7), U and V are independent.
What if we start with a bivariate random vector $(X, Y)$ but are only interested in
$U = g(X, Y)$ and not $(U, V)$?
We can then choose a convenient V = h (X , Y ) , such that the resulting
transformation from (X , Y ) to (U , V ) is one-to-one on A.
Then, the joint pdf of (U , V ) can be calculated as usual and we can obtain the
marginal pdf of U from the joint pdf of (U , V ) .
Which V is convenient would generally depend on what U is.
What if the transformation of interest is not one-to-one?
In a similar fashion to Theorem 2.1.8, suppose $A_0, A_1, \ldots, A_k$ is a partition of A
such that
$$P((X, Y) \in A_0) = 0,$$
and the transformation $U = g_1(X, Y)$ and $V = g_2(X, Y)$ is a one-to-one transformation
from $A_i$ onto B for each $i = 1, \ldots, k$.
Then we can calculate the inverse functions from B to $A_i$ for each i. Let
$$x = h_{1i}(u, v) \quad \text{and} \quad y = h_{2i}(u, v)$$
denote the i-th inverse, i.e., the unique $(x, y) \in A_i$ such that
$(u, v) = (g_1(x, y), g_2(x, y))$. Let $J_i$ be the Jacobian for the i-th inverse. Then,
$$f_{U,V}(u, v) = \sum_{i=1}^{k} f_{X,Y}\left(h_{1i}(u, v), h_{2i}(u, v)\right)|J_i|.$$
Hierarchical Models and Mixture Distributions
Example (4.4.1): An insect lays a large number of eggs, each surviving with
probability p. On average, how many eggs will survive? Model the number of eggs
laid, Y, and the number that survive, X, hierarchically:
$$X \mid Y \sim \text{binomial}(Y, p), \qquad Y \sim \text{Poisson}(\lambda).$$
The marginal pmf of X is
$$P(X = x) = \sum_{y=0}^{\infty} P(X = x, Y = y) = \sum_{y=0}^{\infty} P(X = x \mid Y = y)\,P(Y = y) = \sum_{y=x}^{\infty} \binom{y}{x} p^x (1-p)^{y-x}\, \frac{\lambda^y e^{-\lambda}}{y!},$$
where the summation in the third line starts from $y = x$ rather than $y = 0$ since, if
$y < x$, the conditional probability is equal to zero: clearly, the number of
surviving eggs cannot be larger than the number of those laid.
Cancel the y! terms and substitute $t = y - x$. This gives
$$P(X = x) = \sum_{y=x}^{\infty} \frac{y!}{(y-x)!\,x!}\, p^x (1-p)^{y-x}\, \frac{\lambda^y e^{-\lambda}}{y!} = \frac{(p\lambda)^x e^{-\lambda}}{x!} \sum_{y=x}^{\infty} \frac{[(1-p)\lambda]^{y-x}}{(y-x)!} = \frac{(p\lambda)^x e^{-\lambda}}{x!} \sum_{t=0}^{\infty} \frac{[(1-p)\lambda]^t}{t!}.$$
But the summation in the final term is the kernel of a Poisson$((1-p)\lambda)$ pmf. Remember
that
$$\sum_{t=0}^{\infty} \frac{[(1-p)\lambda]^t}{t!} = e^{(1-p)\lambda}.$$
Therefore,
$$P(X = x) = \frac{(p\lambda)^x e^{-\lambda}}{x!}\, e^{(1-p)\lambda} = \frac{(p\lambda)^x e^{-p\lambda}}{x!},$$
which implies that $X \sim \text{Poisson}(p\lambda)$.
The answer to the original question then is $E[X] = p\lambda$: on average, $p\lambda$ eggs survive.
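A simulation sketch (illustrative, not from the slides; λ and p are arbitrary) of this hierarchy:

    import numpy as np

    rng = np.random.default_rng(7)
    lam, p = 10.0, 0.3

    y = rng.poisson(lam, 1_000_000)   # eggs laid, Y ~ Poisson(lam)
    x = rng.binomial(y, p)            # eggs surviving, X | Y ~ binomial(Y, p)

    print(x.mean())                   # ~ p * lam = 3.0
    print(x.var())                    # ~ p * lam too, consistent with Poisson(p*lam)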
Now comes a very useful theorem which you will, most likely, use frequently in the
future.
Remember that $E[X|y]$ is a function of y and $E[X|Y]$ is a random variable whose
value depends on the value of Y.
Theorem (4.4.3): If X and Y are two random variables, then
$$E_X[X] = E_Y\left\{E_{X|Y}[X \mid Y]\right\},$$
provided that the expectations exist.
It is important to notice that the two expectations are with respect to two different
probability densities, $f_X(\cdot)$ and $f_{X|Y}(\cdot \mid Y = y)$.
This result is widely known as the Law of Iterated Expectations.
Proof (continuous case):
$$E_X[X] = \iint x f(x, y)\,dx\,dy = \int \left[\int x f(x|y)\,dx\right] f_Y(y)\,dy = \int E_{X|Y}[X \mid Y = y]\, f_Y(y)\,dy = E_Y\left\{E_{X|Y}[X \mid Y]\right\}.$$
The corresponding proof for the discrete case can be obtained by replacing integrals
with sums.
How does this help us? Consider calculating the expected number of survivors again:
$$E_X[X] = E_Y\left\{E_{X|Y}[X \mid Y]\right\} = E_Y[pY] = p\,E_Y[Y] = p\lambda.$$
Example (4.4.5): Suppose now that the insect example is extended to a three-stage
hierarchy:
$$X \mid Y \sim \text{binomial}(Y, p), \qquad Y \mid \Lambda \sim \text{Poisson}(\Lambda), \qquad \Lambda \sim \text{exponential}(\beta).$$
Then,
$$E_X[X] = E_Y\left\{E_{X|Y}[X \mid Y]\right\} = E_Y[pY] = E_\Lambda\left\{E_{Y|\Lambda}[pY \mid \Lambda]\right\} = p\,E_\Lambda\left\{E_{Y|\Lambda}[Y \mid \Lambda]\right\} = p\,E_\Lambda[\Lambda] = p\beta,$$
so the whole calculation reduces to simple one-dimensional expectations.
The marginal distribution of Y is obtained by integrating out λ:
$$P(Y = y) = \int_0^{\infty} f(y \mid \lambda) f(\lambda)\,d\lambda = \int_0^{\infty} \frac{\lambda^y e^{-\lambda}}{y!}\, \frac{1}{\beta}\, e^{-\lambda/\beta}\,d\lambda = \frac{1}{y!\,\beta} \int_0^{\infty} \lambda^y e^{-\lambda(1 + 1/\beta)}\,d\lambda.$$
The integrand is the kernel of a Gamma$\left(y + 1, (1 + 1/\beta)^{-1}\right)$ pdf, so
$$\int_0^{\infty} \lambda^y e^{-\lambda(1+1/\beta)}\,d\lambda = \Gamma(y + 1)\left(\frac{1}{1 + 1/\beta}\right)^{y+1} = y!\left(\frac{\beta}{1+\beta}\right)^{y+1},$$
and therefore
$$P(Y = y) = \frac{1}{\beta}\left(\frac{\beta}{1+\beta}\right)^{y+1} = \frac{1}{1+\beta}\left(\frac{\beta}{1+\beta}\right)^{y}, \quad y = 0, 1, 2, \ldots$$
The final expression is that of the negative binomial pmf. Therefore, the two-stage
hierarchy is given by
$$X \mid Y \sim \text{binomial}(Y, p), \qquad Y \sim \text{negative binomial}\left(r = 1, \; p = \frac{1}{1+\beta}\right).$$
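A simulation sketch (illustrative, not from the slides; β is arbitrary) that the Poisson-exponential mixture of Y is geometric, i.e. negative binomial with r = 1 and success probability 1/(1+β):

    import numpy as np

    rng = np.random.default_rng(8)
    beta = 2.0
    lam = rng.exponential(beta, 1_000_000)   # Lambda ~ exponential(beta)
    y = rng.poisson(lam)                     # Y | Lambda ~ Poisson(Lambda)

    p = 1.0 / (1.0 + beta)
    for k in range(4):
        print(k, (y == k).mean(), p * (1 - p) ** k)   # empirical vs geometric pmf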
Theorem (4.4.7) (Conditional Variance Identity): For any two random variables X
and Y,
$$\mathrm{Var}_X(X) = E_Y\left[\mathrm{Var}_{X|Y}(X \mid Y)\right] + \mathrm{Var}_Y\left(E_{X|Y}[X \mid Y]\right),$$
provided that the expectations exist.
To see this, start from $\mathrm{Var}_X(X) = E_X[X^2] - \{E_X[X]\}^2$ and apply the Law of
Iterated Expectations:
$$\mathrm{Var}_X(X) = E_Y\left\{E_{X|Y}[X^2 \mid Y]\right\} - \left(E_Y\left\{E_{X|Y}[X \mid Y]\right\}\right)^2.$$
Adding and subtracting $E_Y\left[\{E_{X|Y}[X \mid Y]\}^2\right]$,
$$\mathrm{Var}_X(X) = E_Y\left\{E_{X|Y}[X^2 \mid Y] - \{E_{X|Y}[X \mid Y]\}^2\right\} + E_Y\left[\{E_{X|Y}[X \mid Y]\}^2\right] - \left(E_Y\left\{E_{X|Y}[X \mid Y]\right\}\right)^2$$
$$= E_Y\left[\mathrm{Var}_{X|Y}(X \mid Y)\right] + \mathrm{Var}_Y\left(E_{X|Y}[X \mid Y]\right).$$
Example (4.4.6): Consider the beta-binomial hierarchy
$$X \mid P \sim \text{binomial}(n, P), \qquad P \sim \text{beta}(\alpha, \beta).$$
By the Law of Iterated Expectations,
$$E_X[X] = E_P\left\{E_{X|P}[X \mid P]\right\} = E_P[nP] = n\frac{\alpha}{\alpha + \beta},$$
where the last result follows from the fact that for $P \sim \text{beta}(\alpha, \beta)$,
$E[P] = \alpha/(\alpha + \beta)$.
Example (4.4.8): Now, let's calculate the variance of X. By Theorem (4.4.7),
$$\mathrm{Var}_X(X) = \mathrm{Var}_P\left(E_{X|P}[X \mid P]\right) + E_P\left[\mathrm{Var}_{X|P}(X \mid P)\right].$$
Now, $E_{X|P}[X \mid P] = nP$ and, since $P \sim \text{beta}(\alpha, \beta)$,
$$\mathrm{Var}_P(nP) = n^2\,\mathrm{Var}(P) = n^2 \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}.$$
75 / 125
Next, since $\mathrm{Var}_{X|P}(X \mid P) = nP(1 - P)$,
$$E_P\left[\mathrm{Var}_{X|P}(X \mid P)\right] = E_P[nP(1 - P)] = n \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \int_0^1 p^{\alpha}(1-p)^{\beta}\,dp.$$
The integrand is the kernel of another beta pdf with parameters $\alpha + 1$ and $\beta + 1$,
since the pdf for $P \sim \text{beta}(\alpha+1, \beta+1)$ is given by
$$\frac{\Gamma(\alpha+\beta+2)}{\Gamma(\alpha+1)\Gamma(\beta+1)}\, p^{\alpha}(1-p)^{\beta}, \quad 0 < p < 1.$$
Therefore, using $\Gamma(\alpha + 1) = \alpha\Gamma(\alpha)$,
$$E_P\left[\mathrm{Var}_{X|P}(X \mid P)\right] = n \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,\frac{\Gamma(\alpha+1)\Gamma(\beta+1)}{\Gamma(\alpha+\beta+2)} = n \frac{\alpha\beta}{(\alpha+\beta)(\alpha+\beta+1)}.$$
Thus,
$$\mathrm{Var}_X(X) = n \frac{\alpha\beta}{(\alpha+\beta)(\alpha+\beta+1)} + n^2 \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} = n \frac{\alpha\beta\,(\alpha+\beta+n)}{(\alpha+\beta)^2(\alpha+\beta+1)}.$$
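A simulation sketch (illustrative, not from the slides; n, α, β are arbitrary) of the beta-binomial mean and variance formulas:

    import numpy as np

    rng = np.random.default_rng(9)
    n, a, b = 10, 2.0, 3.0
    p = rng.beta(a, b, 1_000_000)
    x = rng.binomial(n, p)            # X | P ~ binomial(n, P)

    print(x.mean(), n * a / (a + b))                                    # ~ 4.0
    print(x.var(), n * a * b * (a + b + n) / ((a + b)**2 * (a + b + 1)))  # ~ 6.0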
Covariance and Correlation
We now introduce measures of the strength of the (linear) relationship between two
random variables. Throughout, let
$$E[X] = \mu_X, \quad E[Y] = \mu_Y, \quad \mathrm{Var}(X) = \sigma_X^2 \quad \text{and} \quad \mathrm{Var}(Y) = \sigma_Y^2.$$
Definition (4.5.1): The covariance of X and Y is
$$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)].$$
Definition (4.5.2): The correlation of X and Y is
$$\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}.$$
Correlation measures the strength of the linear relationship: a pair with, say,
$\rho_{WZ} = 0.9$ is more strongly (linearly) related than a pair X, Y with a smaller
$|\rho_{XY}|$.
Theorem (4.5.3): For any random variables X and Y,
$$\mathrm{Cov}(X, Y) = E[XY] - \mu_X \mu_Y.$$
Proof:
$$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E[XY - \mu_X Y - \mu_Y X + \mu_X \mu_Y] = E[XY] - \mu_X E[Y] - \mu_Y E[X] + \mu_X \mu_Y = E[XY] - \mu_X \mu_Y.$$
Example (4.5.4): Let the joint pdf be
$$f_{X,Y}(x, y) = 1, \quad 0 < x < 1, \; x < y < x + 1.$$
The marginal of X is
$$f_X(x) = \int_x^{x+1} f_{X,Y}(x, y)\,dy = \int_x^{x+1} 1\,dy = y\Big|_{y=x}^{x+1} = 1, \quad 0 < x < 1,$$
so $X \sim \text{uniform}(0, 1)$ and, given $X = x$, Y is uniform on $(x, x + 1)$.
Here, $\mu_X = 1/2$, $\sigma_X^2 = 1/12$, $\mu_Y = 1$ and $\sigma_Y^2 = 1/6$. Moreover,
$$E_{X,Y}[XY] = \int_0^1 \int_x^{x+1} xy\,dy\,dx = \int_0^1 \frac{x y^2}{2}\Big|_{y=x}^{x+1} dx = \int_0^1 \left(x^2 + \frac{x}{2}\right) dx = \frac{7}{12}.$$
Therefore,
$$\mathrm{Cov}(X, Y) = \frac{7}{12} - \frac{1}{2}\cdot 1 = \frac{1}{12} \quad \text{and} \quad \rho_{XY} = \frac{1/12}{\sqrt{1/12}\,\sqrt{1/6}} = \frac{1}{\sqrt{2}}.$$
Theorem (4.5.5): If X and Y are independent random variables, then
$$E[XY] = E[X]\,E[Y].$$
Then
$$\mathrm{Cov}(X, Y) = E[XY] - \mu_X \mu_Y = \mu_X \mu_Y - \mu_X \mu_Y = 0,$$
and consequently,
$$\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = 0.$$
We will also use the following result.
Theorem (4.5.6): For any random variables X and Y and constants a and b,
$$\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y).$$
Now let $X \sim \text{uniform}(0, 1)$ and $Z \sim \text{uniform}(0, 1/10)$ be independent, define
$$Y = X + Z,$$
and consider $(X, Y)$.
Consider the following intuitive derivation of the distribution of $(X, Y)$. We are
given $X = x$ and $Y = x + Z$ for a particular value of X. The conditional
distribution of Y given $X = x$ is then uniform on $(x, x + 1/10)$, with density
$$f(y|x) = \frac{1}{1/10} = 10, \quad x < y < x + \frac{1}{10}.$$
Now, $E[X] = \frac{1}{2}$ and
$$E[Y] = E[X + Z] = \frac{1}{2} + \frac{1}{20} = \frac{11}{20}.$$
Then,
$$\mathrm{Cov}(X, Y) = E[X(X+Z)] - E[X]\,E[X+Z] = E[X^2] + E[X]E[Z] - \{E[X]\}^2 - E[X]E[Z] = E[X^2] - \{E[X]\}^2 = \mathrm{Var}(X) = \frac{1}{12}.$$
By Theorem (4.5.6), with X and Z independent,
$$\mathrm{Var}(Y) = \mathrm{Var}(X) + \mathrm{Var}(Z) = \frac{1}{12} + \frac{1}{1200}.$$
Then,
$$\rho_{XY} = \frac{1/12}{\sqrt{1/12}\,\sqrt{1/12 + 1/1200}} = \sqrt{\frac{100}{101}}.$$
Therefore, in the latter case, knowledge about the value of X gives us a more precise
idea about the possible values that Y can take on. Seen from a different
perspective, in the second example Y is conditionally more tightly distributed around
the particular value of X.
In relation to this observation, $\rho_{XY}$ for the first example is equal to $\sqrt{1/2}$, which is
much smaller compared to $\rho_{XY} = \sqrt{100/101}$ from the second example.
Another important result, provided without proof at this point, is that
$$-1 \le \rho_{XY} \le 1;$$
see part (1) of Theorem (4.5.7) in Casella & Berger (p. 172). We will provide an
alternative proof of this result when we deal with inequalities.
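A simulation sketch (illustrative, not from the slides) of the two uniform examples:

    import numpy as np

    rng = np.random.default_rng(10)
    x = rng.uniform(size=1_000_000)

    y1 = x + rng.uniform(size=x.size)          # Y = X + U,  U ~ uniform(0, 1)
    y2 = x + rng.uniform(0, 0.1, size=x.size)  # Y = X + Z,  Z ~ uniform(0, 1/10)

    print(np.corrcoef(x, y1)[0, 1])  # ~ sqrt(1/2)    ≈ 0.707
    print(np.corrcoef(x, y2)[0, 1])  # ~ sqrt(100/101) ≈ 0.995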
Definition (4.5.10): The bivariate normal pdf with means $\mu_X$ and $\mu_Y$, variances
$\sigma_X^2$ and $\sigma_Y^2$, and correlation ρ is
$$f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp\left(-\frac{u^2 - 2\rho uv + v^2}{2(1-\rho^2)}\right),$$
where
$$u = \frac{y - \mu_Y}{\sigma_Y} \quad \text{and} \quad v = \frac{x - \mu_X}{\sigma_X},$$
for $-\infty < x < \infty$ and $-\infty < y < \infty$.
In vector notation,
$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim N\left(\begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix}, \begin{pmatrix} \sigma_X^2 & \rho\sigma_X\sigma_Y \\ \rho\sigma_X\sigma_Y & \sigma_Y^2 \end{pmatrix}\right).$$
In addition, starting from the bivariate distribution, one can show that
$$Y \mid X = x \sim N\left(\mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X), \; \sigma_Y^2(1 - \rho^2)\right),$$
and, likewise,
$$X \mid Y = y \sim N\left(\mu_X + \rho\frac{\sigma_X}{\sigma_Y}(y - \mu_Y), \; \sigma_X^2(1 - \rho^2)\right).$$
Finally, again starting from the bivariate distribution, it can be shown that
$$X \sim N(\mu_X, \sigma_X^2) \quad \text{and} \quad Y \sim N(\mu_Y, \sigma_Y^2).$$
Therefore, joint normality implies conditional and marginal normality. However, this
does not go in the opposite direction; marginal or conditional normality does not
necessarily imply joint normality.
Let us see how this factorisation works. Complete the square in the exponent:
$$u^2 - 2\rho uv + v^2 = u^2 - 2\rho uv + \rho^2 v^2 + (1 - \rho^2)v^2 = (u - \rho v)^2 + (1 - \rho^2)v^2.$$
Then,
$$f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp\left(-\frac{(u - \rho v)^2 + (1-\rho^2)v^2}{2(1-\rho^2)}\right) = \frac{1}{\sigma_Y\sqrt{2\pi(1-\rho^2)}}\exp\left(-\frac{(u - \rho v)^2}{2(1-\rho^2)}\right)\cdot\frac{1}{\sigma_X\sqrt{2\pi}}\exp\left(-\frac{v^2}{2}\right).$$
Substituting back $u = (y - \mu_Y)/\sigma_Y$ and $v = (x - \mu_X)/\sigma_X$, the first factor is
$$\frac{1}{\sigma_Y\sqrt{2\pi(1-\rho^2)}}\exp\left(-\frac{1}{2(1-\rho^2)\sigma_Y^2}\left[y - \mu_Y - \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X)\right]^2\right),$$
which is a normal density with
$$\mu_{Y|X} = \mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X) \quad \text{and} \quad \sigma_{Y|X}^2 = (1 - \rho^2)\sigma_Y^2.$$
The second, on the other hand, is the unconditional density $f_X(x)$, which is normal
with mean $\mu_X$ and variance $\sigma_X^2$.
Finally, the parameter ρ is indeed the correlation:
$$\mathrm{Cov}(X, Y) = \iint (y - \mu_Y)(x - \mu_X) f_{X,Y}(x, y)\,dx\,dy = \int (x - \mu_X)\left[\int (y - \mu_Y) f(y|x)\,dy\right] f_X(x)\,dx$$
$$= \int (x - \mu_X)\,\rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X)\,f_X(x)\,dx = \rho\frac{\sigma_Y}{\sigma_X}\,\mathrm{Var}(X) = \rho\,\sigma_Y\sigma_X,$$
using $E[Y - \mu_Y \mid X = x] = \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X)$. Hence,
$$\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X\sigma_Y} = \frac{\rho\,\sigma_X\sigma_Y}{\sigma_X\sigma_Y} = \rho.$$
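A simulation sketch (illustrative, not from the slides; all parameter values and the conditioning window are arbitrary) of the conditional-mean and conditional-variance formulas for the bivariate normal:

    import numpy as np

    rng = np.random.default_rng(11)
    mx, my, sx, sy, rho = 1.0, -1.0, 2.0, 0.5, 0.8
    cov = [[sx**2, rho*sx*sy], [rho*sx*sy, sy**2]]
    x, y = rng.multivariate_normal([mx, my], cov, size=2_000_000).T

    sel = np.abs(x - 2.0) < 0.05   # condition on X near 2
    print(y[sel].mean(), my + rho * sy / sx * (2.0 - mx))  # conditional mean
    print(y[sel].var(), (1 - rho**2) * sy**2)              # conditional variance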
Multivariate Distributions
So far, we have discussed multivariate random variables which consist of two random
variables only. Now, we extend these ideas to general multivariate random variables.
For example, we might have the random vector (X1 , X2 , X3 , X4 ) where X1 is
temperature, X2 is weight, X3 is height and X4 is blood pressure.
The ideas and concepts we have discussed so far extend to such random vectors
directly.
Let $\mathbf{X} = (X_1, \ldots, X_n)$. Then the sample space for $\mathbf{X}$ is a subset of $\mathbb{R}^n$, the
n-dimensional Euclidean space.
If this sample space is countable, then $\mathbf{X}$ is a discrete random vector and its joint
pmf is given by
$$f(\mathbf{x}) = f(x_1, \ldots, x_n) = P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n).$$
For any $A \subseteq \mathbb{R}^n$,
$$P(\mathbf{X} \in A) = \sum_{\mathbf{x} \in A} f(\mathbf{x}).$$
Similarly, for the continuous random vector, we have the joint pdf
$f(\mathbf{x}) = f(x_1, \ldots, x_n)$, which satisfies
$$P(\mathbf{X} \in A) = \int \cdots \int_A f(\mathbf{x})\,d\mathbf{x}.$$
Note that $\int \cdots \int_A$ is an n-fold integration, where the limits of integration are such
that the integral is calculated over all points $\mathbf{x} \in A$.
Let $g(\mathbf{x}) = g(x_1, \ldots, x_n)$ be a real-valued function defined on the sample space of $\mathbf{X}$.
Then, for the random variable $g(\mathbf{X})$,
$$E[g(\mathbf{X})] = \sum_{\mathbf{x} \in \mathbb{R}^n} g(\mathbf{x}) f(\mathbf{x}) \;\text{ (discrete)} \qquad \text{and} \qquad E[g(\mathbf{X})] = \int \cdots \int g(\mathbf{x}) f(\mathbf{x})\,d\mathbf{x} \;\text{ (continuous)}.$$
The marginal pdf or pmf of $(X_1, \ldots, X_k)$, the first k coordinates of $(X_1, \ldots, X_n)$, is
given by
$$f(x_1, \ldots, x_k) = \sum_{(x_{k+1}, \ldots, x_n)} f(x_1, \ldots, x_n) \;\text{ (discrete)} \qquad \text{and} \qquad f(x_1, \ldots, x_k) = \int \cdots \int f(x_1, \ldots, x_n)\,dx_{k+1} \cdots dx_n \;\text{ (continuous)}.$$
Similarly, if $f(x_1, \ldots, x_k) > 0$, the conditional pdf or pmf of $(X_{k+1}, \ldots, X_n)$ given
$(X_1, \ldots, X_k) = (x_1, \ldots, x_k)$ is
$$f(x_{k+1}, \ldots, x_n \mid x_1, \ldots, x_k) = \frac{f(x_1, \ldots, x_n)}{f(x_1, \ldots, x_k)}.$$
Example (4.6.1): Let n = 4 and
$$f(x_1, x_2, x_3, x_4) = \begin{cases} \frac{3}{4}\left(x_1^2 + x_2^2 + x_3^2 + x_4^2\right) & 0 < x_i < 1, \; i = 1, 2, 3, 4 \\ 0 & \text{otherwise} \end{cases}.$$
This is a joint pdf, and probabilities are computed by four-fold integrals; the
probability of the rectangular event computed in this example equals $\frac{151}{1024}$.
Now, consider the marginal
$$f(x_1, x_2) = \int_0^1 \int_0^1 \frac{3}{4}\left(x_1^2 + x_2^2 + x_3^2 + x_4^2\right) dx_3\,dx_4 = \frac{3}{4}\left(x_1^2 + x_2^2\right) + \frac{1}{2}.$$
This marginal pdf can be used to compute any probability or expectation involving $X_1$
and $X_2$. For instance,
$$E[X_1 X_2] = \int_0^1 \int_0^1 x_1 x_2 \left[\frac{3}{4}\left(x_1^2 + x_2^2\right) + \frac{1}{2}\right] dx_1\,dx_2 = \frac{5}{16}.$$
Conditioning works in the same way. Given $X_1 = \frac{1}{3}$ and $X_2 = \frac{2}{3}$, the conditional
pdf of $(X_3, X_4)$ is
$$f\left(x_3, x_4 \,\Big|\, \tfrac{1}{3}, \tfrac{2}{3}\right) = \frac{f(\tfrac{1}{3}, \tfrac{2}{3}, x_3, x_4)}{f(\tfrac{1}{3}, \tfrac{2}{3})} = \frac{\tfrac{3}{4}\left[(1/3)^2 + (2/3)^2 + x_3^2 + x_4^2\right]}{\tfrac{3}{4}\left[(1/3)^2 + (2/3)^2\right] + \tfrac{1}{2}} = \frac{5}{11} + \frac{9}{11}x_3^2 + \frac{9}{11}x_4^2.$$
Hence,
$$P\left(X_3 > \tfrac{3}{4}, \; X_4 < \tfrac{1}{2} \,\Big|\, X_1 = \tfrac{1}{3}, X_2 = \tfrac{2}{3}\right) = \int_0^{1/2}\int_{3/4}^{1} \left(\frac{5}{11} + \frac{9}{11}x_3^2 + \frac{9}{11}x_4^2\right) dx_3\,dx_4$$
$$= \int_0^{1/2}\left(\frac{5}{44} + \frac{111}{704} + \frac{9}{44}x_4^2\right) dx_4 = \frac{5}{88} + \frac{111}{1408} + \frac{3}{352} = \frac{203}{1408}.$$
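A Monte Carlo sketch (illustrative, not from the slides) of this conditional probability, sampling $(X_3, X_4)$ from the conditional density by accept-reject (the density is bounded by 23/11 on the unit square):

    import numpy as np

    rng = np.random.default_rng(12)
    n = 2_000_000
    x3, x4, u = rng.uniform(size=(3, n))

    dens = 5/11 + 9/11 * x3**2 + 9/11 * x4**2   # conditional pdf given X1=1/3, X2=2/3
    keep = u * (23/11) < dens                   # accept-reject with bound 23/11
    x3, x4 = x3[keep], x4[keep]

    print(((x3 > 0.75) & (x4 < 0.5)).mean())    # ~ 203/1408 ≈ 0.1442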
Let's introduce some other useful extensions to the concepts we covered for the
univariate and bivariate random variables.
Definition (4.6.5): Let $\mathbf{X}_1, \ldots, \mathbf{X}_n$ be random vectors with joint pdf or pmf
$f(\mathbf{x}_1, \ldots, \mathbf{x}_n)$. Let $f_{\mathbf{X}_i}(\mathbf{x}_i)$ denote the marginal pdf or pmf of $\mathbf{X}_i$. Then, $\mathbf{X}_1, \ldots, \mathbf{X}_n$ are
called mutually independent random vectors if, for every $(\mathbf{x}_1, \ldots, \mathbf{x}_n)$,
$$f(\mathbf{x}_1, \ldots, \mathbf{x}_n) = \prod_{i=1}^{n} f_{\mathbf{X}_i}(\mathbf{x}_i).$$
If the $X_i$'s are all one-dimensional, then $X_1, \ldots, X_n$ are called mutually independent
random variables.
Clearly, if $X_1, \ldots, X_n$ are mutually independent, then knowledge about the values of
some coordinates gives us no information about the values of the other coordinates.
Remember that mutual independence implies pairwise independence, but not the
converse: it is possible to specify a probability distribution for $(X_1, \ldots, X_n)$ with
the property that each pair $(X_i, X_j)$ is pairwise independent but $X_1, \ldots, X_n$ are not
mutually independent.
Theorem (4.6.6) (Generalisation of Theorem 4.2.10): Let $X_1, \ldots, X_n$ be mutually
independent random variables. Let $g_1, \ldots, g_n$ be real-valued functions such that
$g_i(x_i)$ is a function only of $x_i$, $i = 1, \ldots, n$. Then
$$E[g_1(X_1) \cdots g_n(X_n)] = E[g_1(X_1)]\,E[g_2(X_2)] \cdots E[g_n(X_n)].$$
Theorem (4.6.7) (Generalisation of Theorem 4.2.12): Let $X_1, \ldots, X_n$ be mutually
independent random variables with mgfs $M_{X_1}(t), \ldots, M_{X_n}(t)$. Let
$Z = X_1 + \ldots + X_n$. Then, the mgf of Z is
$$M_Z(t) = M_{X_1}(t) \cdots M_{X_n}(t).$$
In particular, if $X_1, \ldots, X_n$ all have the same distribution with mgf $M_X(t)$, then
$$M_Z(t) = [M_X(t)]^n.$$
Example (4.6.8): Suppose $X_1, \ldots, X_n$ are mutually independent random variables
and the distribution of $X_i$ is gamma$(\alpha_i, \beta)$. The mgf of a gamma$(\alpha, \beta)$ distribution
is $M(t) = (1 - \beta t)^{-\alpha}$. Thus, if $Z = X_1 + \ldots + X_n$, the mgf of Z is
$$M_Z(t) = M_{X_1}(t) \cdots M_{X_n}(t) = (1 - \beta t)^{-\alpha_1} \cdots (1 - \beta t)^{-\alpha_n} = (1 - \beta t)^{-(\alpha_1 + \ldots + \alpha_n)},$$
the mgf of a gamma$(\alpha_1 + \ldots + \alpha_n, \beta)$ distribution.
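A simulation sketch (illustrative, not from the slides; it assumes scipy is available and the shape parameters are arbitrary) of Example (4.6.8):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(13)
    alphas, beta = [1.0, 2.0, 3.5], 2.0
    z = sum(rng.gamma(a, beta, 300_000) for a in alphas)

    # Compare with the claimed gamma(sum of alphas, beta) distribution.
    print(stats.kstest(z, stats.gamma(sum(alphas), scale=beta).cdf))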
Corollary (4.6.9): Let $X_1, \ldots, X_n$ be mutually independent random variables with
mgfs $M_{X_1}(t), \ldots, M_{X_n}(t)$. Let $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ be fixed constants. Let
$$Z = (a_1 X_1 + b_1) + \ldots + (a_n X_n + b_n).$$
Then, the mgf of Z is
$$M_Z(t) = E\left[e^{t\sum_{i=1}^n (a_i X_i + b_i)}\right] = \exp\left(t\sum_{i=1}^{n} b_i\right)\prod_{i=1}^{n} M_{X_i}(a_i t).$$
Corollary (4.6.10): Let $X_1, \ldots, X_n$ be mutually independent random variables with
$X_i \sim N(\mu_i, \sigma_i^2)$. Let $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ be fixed constants. Then,
$$Z = \sum_{i=1}^{n} (a_i X_i + b_i) \sim N\left(\sum_{i=1}^{n} (a_i \mu_i + b_i), \; \sum_{i=1}^{n} a_i^2 \sigma_i^2\right).$$
We can also calculate the distribution of a transformation of a random vector, in the
same fashion as before.
Let $(X_1, \ldots, X_n)$ be a random vector with pdf $f_{\mathbf{X}}(x_1, \ldots, x_n)$. Let
$A = \{\mathbf{x} : f_{\mathbf{X}}(\mathbf{x}) > 0\}$. Consider a new random vector $(U_1, \ldots, U_n)$, defined by
$$U_1 = g_1(X_1, \ldots, X_n), \quad U_2 = g_2(X_1, \ldots, X_n), \quad \ldots, \quad U_n = g_n(X_1, \ldots, X_n).$$
Suppose that $A_0, A_1, \ldots, A_k$ form a partition of A with the following properties.
The set $A_0$, which may be empty, satisfies
$$P((X_1, \ldots, X_n) \in A_0) = 0.$$
The transformation is one-to-one from $A_i$ onto B for each $i = 1, \ldots, k$, with inverse
functions $x_j = h_{ji}(u_1, \ldots, u_n)$. This i-th inverse gives, for $(u_1, \ldots, u_n) \in B$, the
unique $(x_1, \ldots, x_n) \in A_i$ such that
$$(u_1, \ldots, u_n) = (g_1(x_1, \ldots, x_n), g_2(x_1, \ldots, x_n), \ldots, g_n(x_1, \ldots, x_n)).$$
Let $J_i$ denote the Jacobian of the i-th inverse,
$$J_i = \det\begin{pmatrix} \dfrac{\partial h_{1i}(\mathbf{u})}{\partial u_1} & \dfrac{\partial h_{1i}(\mathbf{u})}{\partial u_2} & \cdots & \dfrac{\partial h_{1i}(\mathbf{u})}{\partial u_n} \\ \dfrac{\partial h_{2i}(\mathbf{u})}{\partial u_1} & \dfrac{\partial h_{2i}(\mathbf{u})}{\partial u_2} & \cdots & \dfrac{\partial h_{2i}(\mathbf{u})}{\partial u_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial h_{ni}(\mathbf{u})}{\partial u_1} & \dfrac{\partial h_{ni}(\mathbf{u})}{\partial u_2} & \cdots & \dfrac{\partial h_{ni}(\mathbf{u})}{\partial u_n} \end{pmatrix},$$
the determinant of an $n \times n$ matrix. Then, on B,
$$f_{\mathbf{U}}(u_1, \ldots, u_n) = \sum_{i=1}^{k} f_{\mathbf{X}}\left(h_{1i}(\mathbf{u}), \ldots, h_{ni}(\mathbf{u})\right)|J_i|.$$
Inequalities
We will now cover some basic inequalities used in statistics and econometrics.
Most of the time, more complicated expressions can be written in terms of simpler
expressions. Inequalities on these simpler expressions can then be used to obtain an
inequality, or often a bound, on the original complicated term.
This part is based on Sections 3.6 and 4.7 in Casella & Berger.
We start with one of the most famous probability inequalities.
Theorem (3.6.1) (Chebychev's Inequality): Let X be a random variable and let
$g(x)$ be a nonnegative function. Then, for any $r > 0$,
$$P(g(X) \ge r) \le \frac{E[g(X)]}{r}.$$
Proof: Using the definition of the expected value,
$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx \ge \int_{\{x : g(x) \ge r\}} g(x) f_X(x)\,dx \ge r \int_{\{x : g(x) \ge r\}} f_X(x)\,dx = r\,P(g(X) \ge r).$$
Hence,
$$E[g(X)] \ge r\,P(g(X) \ge r), \quad \text{implying} \quad P(g(X) \ge r) \le \frac{E[g(X)]}{r}.$$
This result comes in very handy when we want to turn a probability statement into
an expectation statement. For example, this would be useful if we already have
some moment existence assumptions and we want to prove a result involving a
probability statement.
Example (3.6.2): Let $g(x) = (x - \mu)^2/\sigma^2$, where $\mu = E[X]$ and $\sigma^2 = \mathrm{Var}(X)$.
Let, for convenience, $r = t^2$. Then,
$$P\left(\frac{(X - \mu)^2}{\sigma^2} \ge t^2\right) \le \frac{1}{t^2}\, E\left[\frac{(X - \mu)^2}{\sigma^2}\right] = \frac{1}{t^2}.$$
This implies that
$$P\left[|X - \mu| \ge t\sigma\right] \le \frac{1}{t^2}, \quad \text{and, consequently,} \quad P\left[|X - \mu| < t\sigma\right] \ge 1 - \frac{1}{t^2}.$$
For example, taking t = 2,
$$P\left[|X - \mu| \ge 2\sigma\right] \le 0.25 \quad \text{or} \quad P\left[|X - \mu| < 2\sigma\right] \ge 0.75.$$
This says that there is at least a 75% chance that a random variable will be within
$2\sigma$ of its mean (independent of the distribution of X).
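A quick numerical sketch (illustrative, not from the slides; it assumes scipy is available) comparing the two-sigma Chebychev bound with exact tail probabilities:

    from scipy import stats

    # P(|X - mu| >= 2*sigma) for two distributions vs the bound 1/t^2 = 0.25.
    print(2 * stats.norm.sf(2))   # standard normal: ~ 0.0455
    print(stats.expon.sf(3))      # exponential(1), mu = sigma = 1: P(X >= 3) ~ 0.0498
    print(0.25)                   # Chebychev bound, valid for any distribution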
This information is useful. However, many times, it might be possible to obtain an
even tighter bound, in the following sense.
Example (3.6.3): Let $Z \sim N(0, 1)$. Then,
$$P(Z \ge t) = \frac{1}{\sqrt{2\pi}}\int_t^{\infty} e^{-x^2/2}\,dx \le \frac{1}{\sqrt{2\pi}}\int_t^{\infty} \frac{x}{t}\, e^{-x^2/2}\,dx = \frac{1}{\sqrt{2\pi}}\,\frac{e^{-t^2/2}}{t}.$$
The inequality follows from the fact that for $x > t$, $x/t > 1$. Now, since Z
has a symmetric distribution, $P(|Z| \ge t) = 2P(Z \ge t)$. Hence,
$$P(|Z| \ge t) \le \sqrt{\frac{2}{\pi}}\,\frac{e^{-t^2/2}}{t}.$$
Set t = 2 and observe that
$$P(|Z| \ge 2) \le \sqrt{\frac{2}{\pi}}\,\frac{e^{-2}}{2} = .054.$$
This is a much tighter bound compared to the one given by Chebychev's Inequality.
Generally, Chebychev's Inequality provides a more conservative bound.
A related inequality is Markov's Inequality.
Lemma (3.8.3): If Y is a nonnegative random variable, then for any $r > 0$,
$$P(Y \ge r) \le \frac{E[Y]}{r}.$$
Applying this with $Y = |X|^r$ gives, for any $t > 0$,
$$P(|X| \ge t) \le \frac{E[|X|^r]}{t^r}.$$
Lemma (4.7.1): Let a and b be any positive numbers, and let p and q be positive
numbers (necessarily greater than 1) satisfying $\frac{1}{p} + \frac{1}{q} = 1$. Then
$$\frac{1}{p}a^p + \frac{1}{q}b^q \ge ab, \tag{2}$$
with equality if and only if $a^p = b^q$.
Proof: Fix b, and consider the function
$$g(a) = \frac{1}{p}a^p + \frac{1}{q}b^q - ab.$$
The first-order condition is
$$\frac{d}{da}g(a) = a^{p-1} - b = 0 \implies a = b^{1/(p-1)}.$$
A check of the second derivative will establish that this is indeed a minimum. Noting
that $q(p - 1) = p$ and hence $b^{p/(p-1)} = b^q$, the value of the function at the minimum is
$$\frac{1}{p}b^{p/(p-1)} + \frac{1}{q}b^q - b^{1/(p-1)}b = \frac{1}{p}b^q + \frac{1}{q}b^q - b^q = b^q\left(\frac{1}{p} + \frac{1}{q} - 1\right) = 0.$$
Hence, the unique minimum is zero and (2) is established. Since the minimum is
unique, equality holds only if $a = b^{1/(p-1)}$, which is equivalent to $a^p = b^q$ (since
$p/(p-1) = q$, again from $\frac{1}{p} + \frac{1}{q} = 1$).
Theorem (4.7.2) (Hölder's Inequality): Let X and Y be any two random
variables, and let p and q be such that
$$\frac{1}{p} + \frac{1}{q} = 1.$$
Then,
$$E[|XY|] \le \{E[|X|^p]\}^{1/p}\,\{E[|Y|^q]\}^{1/q}.$$
Proof: Define
$$a = \frac{|X|}{\{E[|X|^p]\}^{1/p}} \quad \text{and} \quad b = \frac{|Y|}{\{E[|Y|^q]\}^{1/q}}.$$
Applying Lemma (4.7.1), we get
$$\frac{1}{p}\frac{|X|^p}{E[|X|^p]} + \frac{1}{q}\frac{|Y|^q}{E[|Y|^q]} \ge \frac{|XY|}{\{E[|X|^p]\}^{1/p}\{E[|Y|^q]\}^{1/q}}. \tag{3}$$
Observe that the expectation of the left-hand side is
$$E\left[\frac{1}{p}\frac{|X|^p}{E[|X|^p]} + \frac{1}{q}\frac{|Y|^q}{E[|Y|^q]}\right] = \frac{1}{p} + \frac{1}{q} = 1.$$
Taking expectations of both sides of (3) then yields
$$E[|XY|] \le \{E[|X|^p]\}^{1/p}\,\{E[|Y|^q]\}^{1/q}.$$
Now consider a common special case of Hölder's Inequality.
Theorem (4.7.3) (Cauchy-Schwarz Inequality): For any two random variables X
and Y,
$$E[|XY|] \le \left\{E[|X|^2]\right\}^{1/2}\left\{E[|Y|^2]\right\}^{1/2}.$$
Proof: Set $p = q = 2$ in Hölder's Inequality.
Example (4.7.4): If X and Y have means $\mu_X$ and $\mu_Y$ and variances $\sigma_X^2$ and $\sigma_Y^2$,
respectively, we can apply the Cauchy-Schwarz Inequality to get
$$E[|(X - \mu_X)(Y - \mu_Y)|] \le \left\{E\left[(X - \mu_X)^2\right]\right\}^{1/2}\left\{E\left[(Y - \mu_Y)^2\right]\right\}^{1/2}. \tag{4}$$
Now, observe that for any two random variables W and Z,
$$-|WZ| \le WZ \le |WZ|, \quad \text{so} \quad -E[|WZ|] \le E[WZ] \le E[|WZ|] \quad \text{and hence} \quad \{E[WZ]\}^2 \le \{E[|WZ|]\}^2.$$
Therefore,
$$\underbrace{\{E[(X - \mu_X)(Y - \mu_Y)]\}^2}_{[\mathrm{Cov}(X, Y)]^2} \le \{E[|(X - \mu_X)(Y - \mu_Y)|]\}^2 \le \sigma_X^2 \sigma_Y^2,$$
using (4). Hence,
$$[\mathrm{Cov}(X, Y)]^2 \le \sigma_X^2\sigma_Y^2 \quad \Longrightarrow \quad \rho_{XY}^2 \le 1, \; \text{i.e., } -1 \le \rho_{XY} \le 1,$$
where $\rho_{XY}$ is the correlation between X and Y. One can show this without using
the Cauchy-Schwarz Inequality, as well. However, the corresponding calculations
would be a lot more tedious. See the proof of Theorem 4.5.7 in Casella & Berger (2001,
pp. 172-173).
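A numerical sketch (illustrative, not from the slides; the distributions and exponents are arbitrary) of Hölder's Inequality on simulated data:

    import numpy as np

    rng = np.random.default_rng(14)
    x = rng.standard_normal(100_000)
    y = rng.exponential(size=100_000)

    for p in (1.5, 2.0, 3.0):
        q = p / (p - 1.0)                 # conjugate exponent, 1/p + 1/q = 1
        lhs = np.mean(np.abs(x * y))
        rhs = np.mean(np.abs(x)**p)**(1/p) * np.mean(np.abs(y)**q)**(1/q)
        print(p, lhs <= rhs)              # True for each p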
Theorem (4.7.5) (Minkowski's Inequality): For any two random variables X and Y
and any $p \ge 1$,
$$\{E[|X + Y|^p]\}^{1/p} \le \{E[|X|^p]\}^{1/p} + \{E[|Y|^p]\}^{1/p}.$$
Proof: By the triangle inequality,
$$|X + Y| \le |X| + |Y|.$$
Now,
$$E[|X + Y|^p] = E\left[|X + Y|\,|X + Y|^{p-1}\right] \le E\left[|X|\,|X + Y|^{p-1}\right] + E\left[|Y|\,|X + Y|^{p-1}\right]. \tag{5}$$
Next, we apply Hölder's Inequality to each expectation on the right-hand side of (5)
and obtain
$$E[|X + Y|^p] \le \{E[|X|^p]\}^{1/p}\left\{E\left[|X + Y|^{q(p-1)}\right]\right\}^{1/q} + \{E[|Y|^p]\}^{1/p}\left\{E\left[|X + Y|^{q(p-1)}\right]\right\}^{1/q},$$
where $\frac{1}{p} + \frac{1}{q} = 1$. Notice that $q(p - 1) = p$ and $1 - \frac{1}{q} = \frac{1}{p}$.
Then, dividing both sides by $\{E[|X + Y|^p]\}^{1/q}$,
$$\frac{E[|X + Y|^p]}{\{E[|X + Y|^p]\}^{1/q}} = \{E[|X + Y|^p]\}^{1 - 1/q} = \{E[|X + Y|^p]\}^{1/p} \le \{E[|X|^p]\}^{1/p} + \{E[|Y|^p]\}^{1/p}.$$
The previous results can be used for the case of numerical sums, as well.
For example, let $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ be positive nonrandom values. Let X be a
random variable with range $a_1, \ldots, a_n$ and $P(X = a_i) = 1/n$, $i = 1, \ldots, n$. Similarly,
let Y be a random variable taking on values $b_1, \ldots, b_n$ with probability
$P(Y = b_i) = 1/n$. Moreover, let p and q be such that $p^{-1} + q^{-1} = 1$. Suppose also
that $P(X = a_i, Y = b_j)$ is equal to 0 whenever $i \ne j$ and is equal to $1/n$ whenever
$i = j$.
Then,
$$E[|XY|] = \frac{1}{n}\sum_{i=1}^{n} |a_i b_i|, \qquad \{E[|X|^p]\}^{1/p} = \left[\frac{1}{n}\sum_{i=1}^{n} |a_i|^p\right]^{1/p}, \qquad \{E[|Y|^q]\}^{1/q} = \left[\frac{1}{n}\sum_{i=1}^{n} |b_i|^q\right]^{1/q}.$$
Hence, by Hölder's Inequality,
$$\frac{1}{n}\sum_{i=1}^{n} |a_i b_i| \le \left[\frac{1}{n}\sum_{i=1}^{n} |a_i|^p\right]^{1/p}\left[\frac{1}{n}\sum_{i=1}^{n} |b_i|^q\right]^{1/q} = \left(\frac{1}{n}\right)^{1/p + 1/q}\left[\sum_{i=1}^{n} |a_i|^p\right]^{1/p}\left[\sum_{i=1}^{n} |b_i|^q\right]^{1/q},$$
and so, since $1/p + 1/q = 1$,
$$\sum_{i=1}^{n} |a_i b_i| \le \left[\sum_{i=1}^{n} |a_i|^p\right]^{1/p}\left[\sum_{i=1}^{n} |b_i|^q\right]^{1/q}.$$
A special case of this result is obtained by letting $b_i = 1$ for all i and setting
$p = q = 2$. Then,
$$\sum_{i=1}^{n} |a_i| \le \left[\sum_{i=1}^{n} |a_i|^2\right]^{1/2} n^{1/2},$$
and so,
$$\left(\frac{1}{n}\sum_{i=1}^{n} |a_i|\right)^2 \le \frac{1}{n}\sum_{i=1}^{n} a_i^2.$$
Finally, we consider Jensen's Inequality. A function $g(x)$ is convex if
$$g(\lambda x + (1 - \lambda)y) \le \lambda g(x) + (1 - \lambda)g(y) \quad \text{for all } \lambda \in (0, 1) \text{ and all } x, y;$$
if the inequality is reversed, $g(x)$ is concave (equivalently, $-g(x)$ is convex).
Theorem (4.7.7) (Jensen's Inequality): For any random variable X, if $g(x)$ is a
convex function, then
$$E[g(X)] \ge g\{E[X]\}.$$
Proof: Let $\ell(x)$ be a tangent line to $g(x)$ at the point $g(E[X])$. Write
$\ell(x) = a + bx$ for some a and b.
By the convexity of g, we have $g(x) \ge \ell(x)$ for all x. Taking expectations and using
their linearity,
$$E[g(X)] \ge E[\ell(X)] = E[a + bX] = a + bE[X] = \ell(E[X]) = g(E[X]).$$
Some applications: $g(x) = x^2$ is convex, so $E[X^2] \ge \{E[X]\}^2$; for positive X,
$g(x) = 1/x$ is convex, so
$$E\left[\frac{1}{X}\right] \ge \frac{1}{E[X]}.$$
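A numerical sketch (illustrative, not from the slides; the exponential distribution is an arbitrary choice of positive X) of Jensen's Inequality:

    import numpy as np

    rng = np.random.default_rng(15)
    x = rng.exponential(scale=2.0, size=1_000_000)   # positive X with E[X] = 2

    print(np.mean(x**2) >= np.mean(x)**2)   # True: E[X^2] >= (E[X])^2
    print(np.mean(1/x) >= 1/np.mean(x))     # True: E[1/X] >= 1/E[X]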
The final result is the Covariance Inequality (Theorem 4.7.9). Let X be any random
variable and let $g(x)$ and $h(x)$ be functions such that the relevant expectations exist.
(a) If $g(x)$ is nondecreasing and $h(x)$ is nonincreasing, then
$$E[g(X)h(X)] \le E[g(X)]\,E[h(X)].$$
(b) If $g(x)$ and $h(x)$ are both nondecreasing or both nonincreasing, then
$$E[g(X)h(X)] \ge E[g(X)]\,E[h(X)].$$
Let us prove the first part of this Theorem for the easier case where
$E[g(X)] = E[h(X)] = 0$.
Now,
$$E[g(X)h(X)] = \int_{\{x : h(x) \le 0\}} g(x)h(x) f_X(x)\,dx + \int_{\{x : h(x) \ge 0\}} g(x)h(x) f_X(x)\,dx.$$
Since h is nonincreasing and g is nondecreasing, there is a point $x_0$ where h changes
sign, and
$$g(x_0) = \min_{\{x : h(x) \le 0\}} g(x) = \max_{\{x : h(x) \ge 0\}} g(x).$$
Hence, on $\{x : h(x) \le 0\}$ we have $g(x) \ge g(x_0)$ and $h(x) \le 0$, while on $\{x : h(x) \ge 0\}$
we have $g(x) \le g(x_0)$ and $h(x) \ge 0$, so
$$\int_{\{x : h(x) \le 0\}} g(x)h(x) f_X(x)\,dx \le g(x_0)\int_{\{x : h(x) \le 0\}} h(x) f_X(x)\,dx,$$
and
$$\int_{\{x : h(x) \ge 0\}} g(x)h(x) f_X(x)\,dx \le g(x_0)\int_{\{x : h(x) \ge 0\}} h(x) f_X(x)\,dx.$$
Thus,
$$E[g(X)h(X)] \le g(x_0)\int_{-\infty}^{\infty} h(x) f_X(x)\,dx = g(x_0)\,E[h(X)] = 0 = E[g(X)]\,E[h(X)].$$
You can try to do the proof for the second part, following the same method.