
Probability and Statistics

(ENM 503)
Michael A. Carchidi
December 7, 2014
Chapter 8 - Properties of Expectation
The following notes are based on the textbook entitled: A First Course in
Probability by Sheldon Ross (9th edition) and these notes can be viewed at
https://canvas.upenn.edu/
after you log in using your PennKey user name and Password.
1. Introduction
In this chapter, we develop and exploit additional properties of expected values. To begin, recall that the expected value of a discrete random variable X is
defined by
E(X) = \sum_{x=-\infty}^{+\infty} x\,p(x)   or   E(X) = \sum_{x\,:\,p(x)>0} x\,p(x),   (1a)

where the second sum need only be over those values of X = x for which its pmf
p(x) is non-zero, and the expected value of a continuous random variable X is
defined by
E(X) = \int_{-\infty}^{+\infty} x f(x)\,dx   or   E(X) = \int_{x\,:\,f(x)>0} x f(x)\,dx   (1b)

where the second integral need only be over those values of X = x for which
its pdf f (x) is non-zero.

Since E(X) is a weighted average of the possible values of X, it follows that if X must lie between a and b, so that

P(a \le X \le b) = 1,

then

a \le E(X) \le b.

To prove this (in the discrete case), we take

E(X) = \sum_{x\,:\,p(x)>0} x\,p(x)

and since a \le x \le b and p(x) > 0, we have

a\,p(x) \le x\,p(x) \le b\,p(x).

Adding over all values of x for which p(x) > 0, we then have

\sum_{x\,:\,p(x)>0} a\,p(x) \le \sum_{x\,:\,p(x)>0} x\,p(x) \le \sum_{x\,:\,p(x)>0} b\,p(x)

or

a \sum_{x\,:\,p(x)>0} p(x) \le E(X) \le b \sum_{x\,:\,p(x)>0} p(x).

But since

\sum_{x\,:\,p(x)>0} p(x) = 1,

we now have

a \le E(X) \le b

and the proof is complete.


2. Expectation Values of Functions of Random Variables
For a non-negative continuous random variable Y , let us prove that
E(Y) = \int_0^{\infty} P(Y > y)\,dy.

Toward this end, we note that


\int_0^{\infty} P(Y > y)\,dy = \int_0^{\infty}\int_y^{\infty} f_Y(x)\,dx\,dy   (2)

where the region of integration in the xy plane is below the 45° line in the first quadrant, as shown in the following figure. The region

R = \{(x, y) \mid 0 \le y < \infty,\ y \le x < \infty\}

is below the 45° line in the first quadrant. This region, equivalently described by

R = \{(x, y) \mid 0 \le x < \infty,\ 0 \le y \le x\},

is also below the 45° line in the first quadrant of the xy plane, and so interchanging the order of integration in the above integral, we have
\int_0^{\infty} P(Y > y)\,dy = \int_0^{\infty}\int_0^{x} f_Y(x)\,dy\,dx = \int_0^{\infty}\left(\int_0^{x} dy\right) f_Y(x)\,dx = \int_0^{\infty} x f_Y(x)\,dx = E(Y)

and so we see that

E(Y) = \int_0^{\infty} P(Y > y)\,dy

when Y \ge 0. More generally, when Y \ge 0 is not true, we still have


\int_0^{\infty} P(Y > y)\,dy = \int_0^{\infty}\int_y^{\infty} f_Y(x)\,dx\,dy = \int_0^{\infty}\int_0^{x} f_Y(x)\,dy\,dx = \int_0^{\infty} x f_Y(x)\,dx

and we also have

\int_0^{\infty} P(Y < -y)\,dy = \int_0^{\infty}\int_{-\infty}^{-y} f_Y(x)\,dx\,dy,

where the region of integration is below the 45° line in the second quadrant of the xy plane, as shown in the following figure.

The region

R = \{(x, y) \mid 0 \le y < \infty,\ -\infty < x \le -y\}

is below the 45° line in the second quadrant. This region, equivalently described by

R = \{(x, y) \mid -\infty < x \le 0,\ 0 \le y \le -x\},

is also below the 45° line in the second quadrant of the xy plane, and so interchanging the order of integration in the above integral, we see that

\int_0^{\infty} P(Y < -y)\,dy = \int_{-\infty}^{0}\int_0^{-x} f_Y(x)\,dy\,dx = \int_{-\infty}^{0}\left(\int_0^{-x} dy\right) f_Y(x)\,dx = \int_{-\infty}^{0} (-x)\,f_Y(x)\,dx

and so

\int_0^{\infty} P(Y > y)\,dy - \int_0^{\infty} P(Y < -y)\,dy = \int_0^{\infty} x f_Y(x)\,dx - \int_{-\infty}^{0} (-x)\,f_Y(x)\,dx

which leads to

\int_0^{\infty} P(Y > y)\,dy - \int_0^{\infty} P(Y < -y)\,dy = \int_0^{\infty} x f_Y(x)\,dx + \int_{-\infty}^{0} x f_Y(x)\,dx = \int_{-\infty}^{+\infty} x f_Y(x)\,dx = E(Y).

Thus we find that in general,

E(Y) = \int_0^{\infty} P(Y > y)\,dy - \int_0^{\infty} P(Y < -y)\,dy   (3)

for any continuous random variable Y.
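
As a quick numerical sanity check of Equation (3), the short Python sketch below estimates both tail integrals for a normal random variable by a simple Riemann sum and compares the result with E(Y); the choice of mean 1.5, standard deviation 2, step size and cutoff are arbitrary illustration values.

# Illustrative sketch: numerically check E(Y) = int P(Y>y)dy - int P(Y<-y)dy
# for Y ~ Normal(mu, sigma); all numeric choices below are arbitrary.
import math

mu, sigma = 1.5, 2.0

def cdf(y):
    # normal CDF P(Y <= y) via the error function
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))

dy, upper = 0.001, 40.0
grid = [k * dy for k in range(int(upper / dy))]
upper_tail = sum((1.0 - cdf(y)) * dy for y in grid)   # integral of P(Y > y)
lower_tail = sum(cdf(-y) * dy for y in grid)          # integral of P(Y < -y)

print(upper_tail - lower_tail)  # close to 1.5, matching E(Y) = mu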


Computing E(g(X)) for any Continuous Function g(x)
Suppose next that g(x) is a non-negative function of x, and that X is a random
variable with pdf f (x), then
E(g(X)) = \int_0^{\infty} P(g(X) > y)\,dy = \int_0^{\infty}\int_{x\,:\,g(x)>y} f(x)\,dx\,dy

or

E(g(X)) = \int_{x\,:\,g(x)>0}\int_0^{g(x)} f(x)\,dy\,dx = \int_{x\,:\,g(x)>0}\left(\int_0^{g(x)} dy\right) f(x)\,dx = \int_{x\,:\,g(x)>0} g(x)f(x)\,dx

and so

E(g(X)) = \int_{x\,:\,g(x)>0} g(x)f(x)\,dx   (4a)

when g(x) \ge 0. In a similar way we find in general that


E(g(X)) = \int_0^{\infty} P(g(X) > y)\,dy - \int_0^{\infty} P(g(X) < -y)\,dy,   (4b)

and

E(g(X)) = \int_{-\infty}^{+\infty} g(x)f(x)\,dx   (4c)

for any continuous function g(x).


Computing E(g(X, Y )) for any Continuous Function g(x, y)
Suppose next that X and Y have a joint probability mass function p(x, y),
then
E(g(X, Y)) = \sum_{x}\sum_{y} g(x, y)\,p(x, y)   (5a)

or if X and Y have a joint probability density function f (x, y), then


E(g(X, Y)) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} g(x, y)\,f(x, y)\,dy\,dx.   (5b)

To prove this, suppose that g(X, Y) \ge 0; then from Equation (2), we have

E(g(X, Y)) = \int_0^{\infty} P(g(X, Y) > t)\,dt = \int_0^{\infty}\iint_{(x,y)\,:\,g(x,y)>t} f(x, y)\,dy\,dx\,dt.

By interchanging the order of integration, we have

E(g(X, Y)) = \int_x\int_y\int_0^{g(x,y)} f(x, y)\,dt\,dy\,dx = \int_x\int_y\left(\int_0^{g(x,y)} dt\right) f(x, y)\,dy\,dx

or

E(g(X, Y)) = \int_x\int_y g(x, y)\,f(x, y)\,dy\,dx

and Equation (5b) has been proven.

Example #1: Expected Distance Between Two Points On a Line


An accident occurs at a point X that is uniformly distributed on a road of
length L. At the time of the accident, an ambulance is at a location Y that is
also uniformly distributed on the road. Assuming that X and Y are independent,
find the expected distance between the ambulance and the point of the accident.
Since the joint density function is

f(x, y) = \frac{1}{L}\cdot\frac{1}{L} = \frac{1}{L^2}

for 0 < x < L and 0 < y < L, and zero otherwise, we have

E(|X - Y|) = \int_0^L\int_0^L |x - y|\,\frac{1}{L^2}\,dy\,dx = \frac{1}{L^2}\int_0^L\int_0^L |x - y|\,dy\,dx.

Now since

|x - y| = \begin{cases} x - y, & \text{for } x - y > 0 \\ y - x, & \text{for } x - y < 0 \end{cases} = \begin{cases} x - y, & \text{for } 0 < y < x \\ y - x, & \text{for } x < y < L \end{cases}

we have

\int_0^L |x - y|\,dy = \int_0^x (x - y)\,dy + \int_x^L (y - x)\,dy = x^2 + \frac{1}{2}L^2 - xL

and so

E(|X - Y|) = \frac{1}{L^2}\int_0^L \left( x^2 + \frac{1}{2}L^2 - xL \right) dx,

or E(|X - Y|) = L/3.
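
A minimal Monte Carlo sketch of Example #1 is given below; the road length L = 6 and the trial count are arbitrary illustration values.

# Illustrative sketch: E|X - Y| for X, Y independent Uniform(0, L) should be L/3.
import random

L, trials = 6.0, 200_000
total = sum(abs(random.uniform(0, L) - random.uniform(0, L)) for _ in range(trials))
print(total / trials, "vs", L / 3)  # both close to 2.0 when L = 6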

Computing E(X + Y ) For Random Variables X and Y


Suppose that E(X) and E(Y) are both finite. Then, setting g(X, Y) = aX + bY, we have

E(g(X, Y)) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} g(x, y)\,f(x, y)\,dy\,dx = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} (ax + by)\,f(x, y)\,dy\,dx

or

E(aX + bY) = a\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} x\,f(x, y)\,dy\,dx + b\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} y\,f(x, y)\,dy\,dx

or

E(aX + bY) = a\,E(X) + b\,E(Y)   (6)

for any two random variables X and Y and constants a and b. More generally, we may use mathematical induction to show that

E(a_1 X_1 + a_2 X_2 + \cdots + a_n X_n) = a_1 E(X_1) + a_2 E(X_2) + \cdots + a_n E(X_n)   (7)

for any n random variables X_1, X_2, X_3, \ldots, X_n and constants a_1, a_2, a_3, \ldots, a_n.


Example #2: If X \ge Y Then E(X) \ge E(Y)
Suppose that X and Y are random variables with X \ge Y for all possible outcomes of X and Y. This means for any outcome of any experiment involving X and Y, we will always find that X \ge Y. This then implies that X - Y \ge 0, and so using Equation (6) with a = 1 and b = -1, we have

E(X - Y) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} (x - y)\,f(x, y)\,dy\,dx \ge 0

and so E(X) - E(Y) \ge 0, or E(X) \ge E(Y).



The Sample Mean


Let X_1, X_2, X_3, \ldots, X_n be independent and identically distributed random variables having distribution function f(x) and expected value \mu. Such a sequence of random variables is said to constitute a sample from the distribution f(x). The random variable

\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i   (8)

is called the sample mean statistic, and note that

E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\sum_{i=1}^{n} \mu = \mu.

This says that the expected value of the sample mean statistic is \mu, the mean of the distribution. When the distribution mean \mu is unknown, the sample mean is often used in statistics to estimate it.
Boole's Inequality
Let A_1, A_2, \ldots, A_n denote n events and define the indicator variables X_1, X_2, \ldots, X_n by

X_i = \begin{cases} 1, & \text{if } A_i \text{ occurs} \\ 0, & \text{if } A_i \text{ does not occur} \end{cases}

for i = 1, 2, 3, \ldots, n, and let

X = \sum_{i=1}^{n} X_i

so that X denotes the number of events A_i that occur. Finally, let

Y = \begin{cases} 1, & \text{if } X \ge 1 \\ 0, & \text{if } X = 0 \end{cases}

so that Y is equal to 1 if at least one of the events A_i occurs and Y is zero if none of the events A_i occur. Now it should be immediately obvious that X \ge Y, so that E(X) \ge E(Y). But since

E(X) = \sum_{i=1}^{n} E(X_i) = \sum_{i=1}^{n} \bigl((1)P(A_i) + (0)P(A_i^c)\bigr) = \sum_{i=1}^{n} P(A_i)

and

E(Y) = 1\cdot P(X \ge 1) + 0\cdot P(X = 0) = P(X \ge 1)

and

P(X \ge 1) = P(\text{at least one of the } A_i \text{ occur}) = P(A_1 \cup A_2 \cup \cdots \cup A_n),

we see that

P(A_1 \cup A_2 \cup \cdots \cup A_n) \le \sum_{i=1}^{n} P(A_i)   (9)

which is a result that is not obvious from the inclusion-exclusion principle.


Example #3: The Mean of The Binomial Random Variable
Recall that the binomial random variable X with parameters p and n has pmf

P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}

for x = 0, 1, 2, 3, \ldots, n, and recall that such a random variable represents the number of successes in n independent and identical Bernoulli trials X_i, each with a success probability of p, so that

p(x_i) = \begin{cases} p, & \text{if } x_i = 1 \\ 1 - p, & \text{if } x_i = 0 \end{cases}

for i = 1, 2, 3, \ldots, n, so that E(X_i) = (1)(p) + (0)(1 - p) = p. Since

X = X_1 + X_2 + X_3 + \cdots + X_n,

we see that

E(X) = E(X_1) + E(X_2) + E(X_3) + \cdots + E(X_n) = p + p + p + \cdots + p

or E(X) = np.
Example #4: The Mean of The Pascal Random Variable
Recall that the Pascal random variable X with parameters p and r has pmf

P(X = x) = \binom{x - 1}{r - 1} p^r (1 - p)^{x - r}

for x = r, r + 1, r + 2, \ldots, and recall that such a random variable represents the number of independent and identical Bernoulli trials, each with a success probability of p, needed to achieve the rth success. If X_i is geometric with parameter p, then this random variable represents the number of independent and identical Bernoulli trials, each with a success probability of p, needed to achieve the first success, so that

p(x_i) = p(1 - p)^{x_i - 1}

for x_i = 1, 2, 3, \ldots, and i = 1, 2, 3, \ldots, r, and E(X_i) = 1/p. If the X_i (for i = 1, 2, 3, \ldots, r) are all independent and identical geometric random variables, each with parameter p, then

X = X_1 + X_2 + X_3 + \cdots + X_r

is Pascal with parameters p and r, and then we see that

E(X) = E(X_1) + E(X_2) + E(X_3) + \cdots + E(X_r) = \frac{1}{p} + \frac{1}{p} + \cdots + \frac{1}{p}

or E(X) = r/p.
Example #5: The Mean of The Hypergeometric Random Variable
Recall that the hypergeometric random variable X with parameters g, b and n has pmf

P(X = x) = \frac{\binom{g}{x}\binom{b}{n - x}}{\binom{g + b}{n}}

for x = x_{\min}, x_{\min} + 1, x_{\min} + 2, \ldots, x_{\max}, and zero otherwise, where

x_{\min} = \max(n - b, 0)   and   x_{\max} = \min(g, n).

If n balls are randomly selected from an urn containing N = g + b balls of which g are white, find the expected number of white balls selected. To solve this, let X denote the number of white balls selected, and let us represent X as

X = X_1 + X_2 + X_3 + \cdots + X_n

where

X_i = \begin{cases} 1, & \text{if the ith ball selected is white} \\ 0, & \text{otherwise} \end{cases}

for i = 1, 2, 3, \ldots, n. Then, since the ith ball selected is equally likely to be any of the N balls, it follows that

E(X_i) = \frac{g}{N}.

Hence

E(X) = E(X_1) + E(X_2) + E(X_3) + \cdots + E(X_n) = \frac{g}{N} + \frac{g}{N} + \cdots + \frac{g}{N}

or E(X) = ng/N.
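
A short simulation sketch of Example #5 follows, drawing n balls without replacement from an urn with g white and b other balls; the particular values g = 7, b = 5, n = 4 and the trial count are arbitrary illustration choices.

# Illustrative sketch: hypergeometric mean E(X) = n*g/N with N = g + b.
import random

g, b, n, trials = 7, 5, 4, 100_000
urn = ["white"] * g + ["other"] * b
total_white = sum(random.sample(urn, n).count("white") for _ in range(trials))
print(total_white / trials, "vs", n * g / (g + b))  # both near 2.333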
Example #6: A Coupon-Collecting Problem
Suppose that there are N types of coupons, and each time one obtains a
coupon, it is equally likely to be any one of the N types. Determine the expected
number of coupons one needs to amass before obtaining a complete set of at
least one of each type of coupon. To solve this, we let X denote the number
of coupons collected before a complete set is attained and let us define Xi (for
i = 0, 1, 2, 3, \ldots, N - 1) to be the number of additional coupons that need to be obtained after i distinct types have been collected in order to obtain another distinct type, and we note that

X = X_0 + X_1 + X_2 + \cdots + X_{N-1}.

When i distinct types have already been collected, a new coupon obtained will be a distinct type with probability 1 - i/N. Therefore

P(X_i = k) = \left(\frac{i}{N}\right)^{k-1}\left(1 - \frac{i}{N}\right)

for k \ge 1, showing that X_i is geometric with p = 1 - i/N. Then

E(X_i) = \frac{1}{p} = \frac{N}{N - i}

and so

E(X) = E(X_0) + E(X_1) + E(X_2) + \cdots + E(X_{N-1})

leads to

E(X) = \frac{N}{N} + \frac{N}{N - 1} + \frac{N}{N - 2} + \cdots + \frac{N}{N - (N - 1)}

which reduces to

E(X) = N \sum_{i=1}^{N} \frac{1}{i} = N H_N,

where H_N is the Nth harmonic number.
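
A simulation sketch of Example #6 for a modest number of coupon types is shown below; the value N = 10 and the trial count are arbitrary illustration choices.

# Illustrative sketch: expected coupons needed to complete a set of N types is N * H_N.
import random

N, trials = 10, 20_000
harmonic = sum(1.0 / i for i in range(1, N + 1))

def coupons_needed(N):
    seen, draws = set(), 0
    while len(seen) < N:            # keep drawing until every type has appeared
        seen.add(random.randrange(N))
        draws += 1
    return draws

avg = sum(coupons_needed(N) for _ in range(trials)) / trials
print(avg, "vs", N * harmonic)      # both near 29.29 for N = 10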


Example #7: An Interesting Identity

Consider any non-negative, integer-valued random variable X. If, for each


i \ge 1, we define

X_i = \begin{cases} 1, & \text{if } X \ge i \\ 0, & \text{if } X < i, \end{cases}

then

\sum_{i=1}^{\infty} X_i = \sum_{i=1}^{X} X_i + \sum_{i=X+1}^{\infty} X_i = \sum_{i=1}^{X} (1) + \sum_{i=X+1}^{\infty} (0) = X

and then

E(X) = E\left(\sum_{i=1}^{\infty} X_i\right) = \sum_{i=1}^{\infty} E(X_i) = \sum_{i=1}^{\infty} \bigl((1)P(X \ge i) + (0)P(X < i)\bigr)

which reduces to

E(X) = \sum_{i=1}^{\infty} P(X \ge i)   (10)

which is a useful identity.
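
A quick numerical check of Equation (10) is sketched below using a geometric random variable, for which P(X \ge i) = (1 - p)^{i-1} and E(X) = 1/p; the value p = 0.3 and the truncation point of the series are arbitrary.

# Illustrative sketch: sum of tail probabilities of a geometric equals its mean 1/p.
p = 0.3
tail_sum = sum((1 - p) ** (i - 1) for i in range(1, 200))  # truncated series
print(tail_sum, "vs", 1 / p)  # both essentially 3.3333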

3. Covariance, Variance of Sums, and Correlations


Suppose that X and Y are independent random variables so that f (x, y) =
fX (x)fY (y), then for any functions h and g,
E(g(X)h(Y)) = E(g(X))\,E(h(Y)).   (11)

To prove this, we simply write

E(g(X)h(Y)) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} g(x)h(y)\,f(x, y)\,dy\,dx
            = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} g(x)h(y)\,f_X(x)f_Y(y)\,dy\,dx
            = \left(\int_{-\infty}^{+\infty} g(x)f_X(x)\,dx\right)\left(\int_{-\infty}^{+\infty} h(y)f_Y(y)\,dy\right)
            = E(g(X))\,E(h(Y)).

The proof for the discrete case is similar.


The Covariance Between Two Random Variables X and Y
The covariance between two random variables X and Y is defined by
\mathrm{Cov}(X, Y) = E\bigl((X - E(X))(Y - E(Y))\bigr).   (12a)

Using the properties of expectation, we may also write this as

\mathrm{Cov}(X, Y) = E\bigl(XY - E(X)Y - E(Y)X + E(X)E(Y)\bigr)
                   = E(XY) - E(X)E(Y) - E(Y)E(X) + E(X)E(Y)

which reduces to

\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y).   (12b)

Note that

\mathrm{Cov}(X, X) = E(X^2) - (E(X))^2 = V(X)   (13a)

and

\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y) = E(YX) - E(Y)E(X) = \mathrm{Cov}(Y, X)   (13b)

and

\mathrm{Cov}(aX, Y) = E(aXY) - E(aX)E(Y) = a\,E(XY) - a\,E(X)E(Y)

or

\mathrm{Cov}(aX, Y) = a\,\mathrm{Cov}(X, Y).   (13c)

We also note that

\mathrm{Cov}\left(\sum_{i=1}^{n} X_i,\ \sum_{j=1}^{m} Y_j\right)
  = E\left(\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{j=1}^{m} Y_j\right)\right) - E\left(\sum_{i=1}^{n} X_i\right)E\left(\sum_{j=1}^{m} Y_j\right)
  = E\left(\sum_{i=1}^{n}\sum_{j=1}^{m} X_i Y_j\right) - \left(\sum_{i=1}^{n} E(X_i)\right)\left(\sum_{j=1}^{m} E(Y_j)\right)
  = \sum_{i=1}^{n}\sum_{j=1}^{m} E(X_i Y_j) - \sum_{i=1}^{n}\sum_{j=1}^{m} E(X_i)E(Y_j)
  = \sum_{i=1}^{n}\sum_{j=1}^{m} \bigl(E(X_i Y_j) - E(X_i)E(Y_j)\bigr)

and hence

\mathrm{Cov}\left(\sum_{i=1}^{n} X_i,\ \sum_{j=1}^{m} Y_j\right) = \sum_{i=1}^{n}\sum_{j=1}^{m} \mathrm{Cov}(X_i, Y_j).   (13d)

Covariance and Independence


If X and Y are independent, then we note that
\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y) = E(X)E(Y) - E(X)E(Y) = 0,

but it should be noted that \mathrm{Cov}(X, Y) = 0 does not imply that X and Y are independent. For a counter-example, we note that if X is defined so that

P(X = -1) = P(X = 0) = P(X = 1) = \frac{1}{3}

and

Y = \begin{cases} 0, & \text{if } X \ne 0 \\ 1, & \text{if } X = 0, \end{cases}

then XY = 0 all the time so that E(XY) = 0. Also E(X) = 0 and so

\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y) = 0
but clearly X and Y are not independent.
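
A short simulation of this counter-example is sketched below (the trial count is arbitrary): the estimated covariance is essentially zero even though Y is a deterministic function of X.

# Illustrative sketch: Cov(X, Y) ~ 0 while X and Y are clearly dependent.
import random

trials = 200_000
xs = [random.choice([-1, 0, 1]) for _ in range(trials)]
ys = [0 if x != 0 else 1 for x in xs]

mean_x = sum(xs) / trials
mean_y = sum(ys) / trials
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / trials
print(cov)  # close to 0, yet P(Y = 1 | X = 0) = 1 while P(Y = 1) = 1/3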
The Variance of a Sum
Using Equations (13a) and (13d), we note that
V\left(\sum_{i=1}^{n} X_i\right) = \mathrm{Cov}\left(\sum_{i=1}^{n} X_i,\ \sum_{j=1}^{n} X_j\right)

or simply

V\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n}\sum_{j=1}^{n} \mathrm{Cov}(X_i, X_j).   (14a)

We may also write this as

V\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \mathrm{Cov}(X_i, X_i) + \sum_{i=1}^{n}\sum_{j \ne i} \mathrm{Cov}(X_i, X_j)

or

V\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} V(X_i) + \sum_{i=1}^{n}\sum_{j \ne i} \mathrm{Cov}(X_i, X_j).   (14b)

Since \mathrm{Cov}(X_i, X_j) = \mathrm{Cov}(X_j, X_i), we may also write Equation (14b) as

V\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} V(X_i) + 2\sum_{i=1}^{n}\sum_{j > i} \mathrm{Cov}(X_i, X_j).   (14c)

Note also that if X_1, X_2, X_3, \ldots, X_n are pairwise independent, then \mathrm{Cov}(X_i, X_j) = 0 for i \ne j and

V(X_1 + X_2 + \cdots + X_n) = V(X_1) + V(X_2) + \cdots + V(X_n).   (14d)

Example #8: Samples


Let X_1, X_2, X_3, \ldots, X_n be independent and identically distributed random variables having expected value \mu and variance \sigma^2, and let

\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i   (15a)

be the sample mean. The quantities X_i - \bar{X}, for i = 1, 2, 3, \ldots, n, are called deviations, as they equal the differences between the individual data and the sample mean. The random variable

S^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (X_i - \bar{X})^2   (15b)

is called the sample variance. Let us first compute V(\bar{X}) and then compute E(S^2).
The first of these gives

V(\bar{X}) = V\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2} V\left(\sum_{i=1}^{n} X_i\right)

since V(aX) = a^2 V(X). This leads to

V(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n} V(X_i) = \frac{1}{n^2}\sum_{i=1}^{n} \sigma^2 = \frac{1}{n^2}\,n\sigma^2

or

V(\bar{X}) = \frac{1}{n}\sigma^2.   (15c)

To do the second part of the problem, we first note that

(n - 1)S^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} \bigl((X_i - \mu) + (\mu - \bar{X})\bigr)^2
           = \sum_{i=1}^{n} \bigl((X_i - \mu)^2 + 2(X_i - \mu)(\mu - \bar{X}) + (\mu - \bar{X})^2\bigr)
           = \sum_{i=1}^{n} (X_i - \mu)^2 - 2(\bar{X} - \mu)\sum_{i=1}^{n} (X_i - \mu) + n(\bar{X} - \mu)^2
           = \sum_{i=1}^{n} (X_i - \mu)^2 - 2(\bar{X} - \mu)(n\bar{X} - n\mu) + n(\bar{X} - \mu)^2
           = \sum_{i=1}^{n} (X_i - \mu)^2 - 2n(\bar{X} - \mu)^2 + n(\bar{X} - \mu)^2

or

(n - 1)S^2 = \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2.   (15d)

Then

E\bigl((n - 1)S^2\bigr) = E\left(\sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2\right)

or

(n - 1)E(S^2) = \sum_{i=1}^{n} E\bigl((X_i - \mu)^2\bigr) - n\,E\bigl((\bar{X} - \mu)^2\bigr) = \sum_{i=1}^{n} V(X_i) - n\,V(\bar{X})

since E(\bar{X}) = \mu. This says that

(n - 1)E(S^2) = n\sigma^2 - n\cdot\frac{1}{n}\sigma^2 = (n - 1)\sigma^2

which says that

E(S^2) = \sigma^2.   (15e)

Note that for the population variance,

T^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2   (15f)

we find that

E(T^2) = \left(1 - \frac{1}{n}\right)\sigma^2   (15g)

and this is less than \sigma^2.
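
A simulation sketch of Equations (15e) and (15g) follows, using normal samples purely for convenience; the values n = 5, \sigma = 3 and the trial count are arbitrary illustration choices.

# Illustrative sketch: the 1/(n-1) divisor gives E(S^2) = sigma^2, the 1/n divisor gives (1 - 1/n) sigma^2.
import random

n, sigma, trials = 5, 3.0, 50_000
sum_s2 = sum_t2 = 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    sum_s2 += ss / (n - 1)
    sum_t2 += ss / n
print(sum_s2 / trials, "vs", sigma ** 2)                # ~ 9.0
print(sum_t2 / trials, "vs", (1 - 1 / n) * sigma ** 2)  # ~ 7.2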


The Correlation Coefficient Between Two Random Variables X and Y
Given two random variables X and Y, we define

\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{V(X)V(Y)}}   (16a)

as the correlation coefficient between X and Y. It should be noted that if X and Y are independent, then \rho(X, Y) = 0 and we say that the random variables X and Y are uncorrelated, but it should be noted that \rho(X, Y) = 0 does not imply that X and Y are independent.

We also note that

\rho(X, X) = \frac{\mathrm{Cov}(X, X)}{\sqrt{V(X)V(X)}} = \frac{V(X)}{\sqrt{V(X)V(X)}} = 1   (16b)

and when \rho(X, Y) \simeq 1, we say that the random variables are positively correlated. Also

\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{V(X)V(Y)}} = \frac{\mathrm{Cov}(Y, X)}{\sqrt{V(Y)V(X)}} = \rho(Y, X)   (16c)

and

\rho(aX, Y) = \frac{\mathrm{Cov}(aX, Y)}{\sqrt{V(aX)V(Y)}} = \frac{a\,\mathrm{Cov}(X, Y)}{\sqrt{a^2\,V(X)V(Y)}} = \frac{a\,\mathrm{Cov}(X, Y)}{|a|\sqrt{V(X)V(Y)}}

showing that

\rho(aX, Y) = \frac{a}{|a|}\,\rho(X, Y)   (16d)

which says that

\rho(-X, X) = -\rho(X, X) = -1   (16e)

and when \rho(X, Y) \simeq -1, we say that the random variables are negatively correlated. We also note that, in general,

-1 \le \rho(X, Y) \le 1   (16f)

for all random variables X and Y. To prove Equation (16f), we note that

0 \le V\left(\frac{X}{\sigma_X} \pm \frac{Y}{\sigma_Y}\right) = V\left(\frac{X}{\sigma_X}\right) + V\left(\frac{Y}{\sigma_Y}\right) \pm 2\,\mathrm{Cov}\left(\frac{X}{\sigma_X}, \frac{Y}{\sigma_Y}\right)
  = \frac{1}{\sigma_X^2} V(X) + \frac{1}{\sigma_Y^2} V(Y) \pm \frac{2}{\sigma_X \sigma_Y}\,\mathrm{Cov}(X, Y)
  = \frac{1}{\sigma_X^2}\sigma_X^2 + \frac{1}{\sigma_Y^2}\sigma_Y^2 \pm 2\rho(X, Y) = 1 + 1 \pm 2\rho(X, Y)

and so we see that

0 \le 1 \pm \rho(X, Y)

which says that 0 \le 1 + \rho(X, Y), or -1 \le \rho(X, Y), and 0 \le 1 - \rho(X, Y), or \rho(X, Y) \le 1, and so collectively, we have Equation (16f).
Example #9: Indicator Random Variables
Let I_A and I_B be indicator variables for the events A and B, which says that

I_A = \begin{cases} 1, & \text{if } A \text{ occurs} \\ 0, & \text{if } A \text{ does not occur} \end{cases}   and   I_B = \begin{cases} 1, & \text{if } B \text{ occurs} \\ 0, & \text{if } B \text{ does not occur.} \end{cases}

Then E(I_A) = P(A), E(I_B) = P(B) and E(I_A I_B) = P(A \cap B). This says that

\mathrm{Cov}(I_A, I_B) = E(I_A I_B) - E(I_A)E(I_B) = P(A \cap B) - P(A)P(B)

or

\mathrm{Cov}(I_A, I_B) = \bigl(P(A|B) - P(A)\bigr)P(B) = P(A)\bigl(P(B|A) - P(B)\bigr).

Thus, we obtain the quite intuitive result that the indicator variables for A and B are either positively correlated, uncorrelated, or negatively correlated, depending on whether P(A|B) is, respectively, greater than, equal to, or less than P(A).

Example #10: Sample Mean and Deviation From The Sample Mean
Let us now show that the sample mean \bar{X} and all deviations from the sample mean, X_i - \bar{X}, are uncorrelated. Toward this end, we have

\mathrm{Cov}(X_i - \bar{X},\ \bar{X}) = E\bigl((X_i - \bar{X})\bar{X}\bigr) - E(X_i - \bar{X})\,E(\bar{X})
  = E(X_i\bar{X} - \bar{X}^2) - \bigl(E(X_i) - E(\bar{X})\bigr)E(\bar{X})
  = E(X_i\bar{X}) - E(\bar{X}^2) - E(X_i)E(\bar{X}) + (E(\bar{X}))^2
  = E(X_i\bar{X}) - E(X_i)E(\bar{X}) - \bigl(E(\bar{X}^2) - (E(\bar{X}))^2\bigr)

or

\mathrm{Cov}(X_i - \bar{X},\ \bar{X}) = \mathrm{Cov}(X_i, \bar{X}) - V(\bar{X}).

But this is

\mathrm{Cov}(X_i - \bar{X},\ \bar{X}) = \mathrm{Cov}\left(X_i,\ \frac{1}{n}\sum_{j=1}^{n} X_j\right) - \frac{1}{n}\sigma^2

or

\mathrm{Cov}(X_i - \bar{X},\ \bar{X}) = \frac{1}{n}\sum_{j=1}^{n} \mathrm{Cov}(X_i, X_j) - \frac{1}{n}\sigma^2

where \sigma^2 = V(X_i) for each i = 1, 2, 3, \ldots, n. But since the X_k's are independent and identical, we have

\mathrm{Cov}(X_i, X_j) = \begin{cases} 0, & \text{when } i \ne j \\ \sigma^2, & \text{when } i = j \end{cases}

and so

\mathrm{Cov}(X_i - \bar{X},\ \bar{X}) = \frac{1}{n}\sigma^2 - \frac{1}{n}\sigma^2 = 0.

Note that although \bar{X} and X_i - \bar{X} are uncorrelated, they are not (in general) independent. However, in the special case where the X_i are normal random variables, it turns out that not only is \bar{X} independent of a single deviation X_i - \bar{X}, but it is independent of the entire sequence of deviations X_i - \bar{X} for i = 1, 2, 3, \ldots, n.


4. Conditional Expectation
If X and Y are jointly discrete random variables, then the conditional probability mass function of X, given that Y = y (defined for all y such that P(Y = y) > 0), is given by

p_{X|Y}(x|y) = P(X = x|Y = y) = \frac{P\bigl((X = x) \cap (Y = y)\bigr)}{P(Y = y)}

or

p_{X|Y}(x|y) = \frac{p(x, y)}{p_Y(y)}.   (17a)

It is therefore natural to define, in this case, the conditional expectation of X given that Y = y, for all values of y such that p_Y(y) > 0, by

E(X|Y = y) = \sum_{x} x\,P(X = x|Y = y) = \sum_{x} x\,p_{X|Y}(x|y).   (17b)

Similarly, if X and Y are jointly continuous random variables with a joint probability density function f(x, y), then the conditional probability density of X, given Y = y, is defined for all y such that f_Y(y) > 0, by

f_{X|Y}(x|y) = \frac{f(x, y)}{f_Y(y)}   (17c)

and the conditional expectation of X given that Y = y is defined, for all values of y such that f_Y(y) > 0, by

E(X|Y = y) = \int_{-\infty}^{+\infty} x\,f_{X|Y}(x|y)\,dx.   (17d)

Example #11
Suppose that the joint density of X and Y is given by

f(x, y) = \frac{1}{y} e^{-y - x/y}

for 0 < x and 0 < y. Let us compute E(X|Y = y). Toward this end, we first have

f_{X|Y}(x|y) = \frac{f(x, y)}{f_Y(y)}   where   f_Y(y) = \int_0^{\infty} f(x, y)\,dx = \int_0^{\infty} \frac{1}{y} e^{-y - x/y}\,dx = e^{-y}

so that

f_{X|Y}(x|y) = \frac{\frac{1}{y} e^{-y - x/y}}{e^{-y}} = \frac{1}{y} e^{-x/y}

and then

E(X|Y = y) = \int_0^{\infty} x\,\frac{1}{y} e^{-x/y}\,dx,

or E(X|Y = y) = y.

It should be noted that


E(g(X)|Y = y) = \sum_{x} g(x)\,p_{X|Y}(x|y)   (18a)

for discrete X and Y and

E(g(X)|Y = y) = \int_{-\infty}^{+\infty} g(x)\,f_{X|Y}(x|y)\,dx   (18b)

for continuous X and Y, and it should be noted that E(X|Y) is a random variable whose value at Y = y is E(X|Y = y). Using this idea and Equations (18a,b), we may say that

E(X) = \sum_{y} E(X|Y = y)\,p_Y(y)   (18c)

for the discrete case and

E(X) = \int_{-\infty}^{+\infty} E(X|Y = y)\,f_Y(y)\,dy   (18d)

for the continuous case, which says that

E(X) = E\bigl(E(X|Y)\bigr).   (18e)

Example #12: Getting Out of the Mine


A miner is trapped in a mine containing 3 doors. The first door leads to a tunnel that will take him to safety after 3 hours of travel. The second door leads to a tunnel that will return him back to the mine after 5 hours of travel, and the third door leads to a tunnel that will return him back to the mine after 7 hours of travel. If we assume that the miner is at all times equally likely to choose any of the three doors, determine the expected length of time until the miner reaches safety.

To solve this, we let X denote the amount of time (in hours) until the miner reaches safety and let Y denote the door he initially chooses. Then

E(X) = E(X|Y = 1)P(Y = 1) + E(X|Y = 2)P(Y = 2) + E(X|Y = 3)P(Y = 3)
     = \frac{1}{3} E(X|Y = 1) + \frac{1}{3} E(X|Y = 2) + \frac{1}{3} E(X|Y = 3).

But

E(X|Y = 1) = 3,   E(X|Y = 2) = 5 + E(X)   and   E(X|Y = 3) = 7 + E(X)

and so

E(X) = \frac{1}{3}(3) + \frac{1}{3}(5 + E(X)) + \frac{1}{3}(7 + E(X))

and solving for E(X), we find that E(X) = 15 hours.
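
A Monte Carlo sketch of Example #12 is given below; the trial count is an arbitrary illustration choice.

# Illustrative sketch: repeatedly pick doors at random, accumulating travel time
# until the safety door is chosen; the average should be close to 15 hours.
import random

def time_to_safety():
    hours = 0
    while True:
        door = random.randint(1, 3)
        if door == 1:
            return hours + 3                 # door 1 leads to safety after 3 hours
        hours += 5 if door == 2 else 7       # doors 2 and 3 return to the mine

trials = 100_000
print(sum(time_to_safety() for _ in range(trials)) / trials)  # close to 15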
Example #13: Spending Money in a Store
Suppose that the number of people entering a small store on a given day is a
random variable with a mean of 50. Suppose further that the amounts of money spent by these customers are independent random variables having a common
mean of E(X) = $8. Finally, suppose also that the amount of money spent by
a customer is also independent of the total number of customers who enter the
store. Determine the expected amount of money spent in the store on a given
day.
To solve this, we let N denote the number of customers who enter the store
and we let Xi be the amount of money spent by customer i, then the total amount
of money spent on a given day can be expressed as
S = \sum_{i=1}^{N} X_i

and using

E(S) = E\bigl(E(S|N)\bigr),

we have

E(S|N) = E\left(\left.\sum_{i=1}^{N} X_i\ \right| N\right) = \sum_{i=1}^{N} E(X_i) = N\,E(X),

and then

E(S) = E\bigl(E(S|N)\bigr) = E\bigl(N E(X)\bigr) = E(N)\,E(X).

For E(N) = 50 and E(X) = \$8, we find that E(S) = \$400.
Example #14: The Game of Craps
The game of craps is begun by rolling an ordinary pair of dice. If the sum of
the dice is 2, 3, or 12, the player loses. If it is 7 or 11, the player wins. If the sum
of the dice is any other number i, the player continues to roll the dice until the
sum is either 7 or i. If it is 7, the player loses and if it is i, the player wins. Let
the random variable R denote the number of rolls of the dice in a typical game of
craps. Let us determine the expected number of rolls in a typical game, which
we denote by E(R).
Towards this end, we note that if P_i is the probability that the sum of the dice is i, then

P_i = P_{14 - i} = \frac{i - 1}{36}

for i = 2, 3, 4, 5, 6, 7, and to compute E(R), let us condition on S, the sum on the initial roll of the dice, giving

E(R) = \sum_{i=2}^{12} E(R|S = i)\,P_i.

But

E(R|S = i) = \begin{cases} 1, & \text{when } i = 2, 3, 7, 11, 12 \\ 1 + \dfrac{1}{P_i + P_7}, & \text{when } i = 4, 5, 6, 8, 9, 10. \end{cases}

The second of these follows from a geometric distribution since if the sum is a
value of i that does not immediately end the game, the dice will continue to be
rolled until the sum is either i or 7, and the number of rolls until this occurs is
geometric with parameter p = Pi + P7 . Therefore, we see that
E(R) = E(R|S = 2)P_2 + E(R|S = 3)P_3 + E(R|S = 4)P_4
     + E(R|S = 5)P_5 + E(R|S = 6)P_6 + E(R|S = 7)P_7
     + E(R|S = 8)P_8 + E(R|S = 9)P_9 + E(R|S = 10)P_{10}
     + E(R|S = 11)P_{11} + E(R|S = 12)P_{12}

or

E(R) = P_2 + P_3 + \left(1 + \frac{1}{P_4 + P_7}\right)P_4
     + \left(1 + \frac{1}{P_5 + P_7}\right)P_5 + \left(1 + \frac{1}{P_6 + P_7}\right)P_6 + P_7
     + \left(1 + \frac{1}{P_8 + P_7}\right)P_8 + \left(1 + \frac{1}{P_9 + P_7}\right)P_9
     + \left(1 + \frac{1}{P_{10} + P_7}\right)P_{10} + P_{11} + P_{12}

or

E(R) = \frac{1}{36} + \frac{2}{36} + \left(1 + \frac{1}{3/36 + 6/36}\right)\frac{3}{36}
     + \left(1 + \frac{1}{4/36 + 6/36}\right)\frac{4}{36} + \left(1 + \frac{1}{5/36 + 6/36}\right)\frac{5}{36} + \frac{6}{36}
     + \left(1 + \frac{1}{5/36 + 6/36}\right)\frac{5}{36} + \left(1 + \frac{1}{4/36 + 6/36}\right)\frac{4}{36}
     + \left(1 + \frac{1}{3/36 + 6/36}\right)\frac{3}{36} + \frac{2}{36} + \frac{1}{36}

or

E(R) = \frac{1}{2} + \frac{839}{660} + \frac{243}{220} + \frac{1}{2} = \frac{557}{165} \simeq 3.376 \text{ rolls}.
Next, let us determine the probability that a player wins in the game of craps.
Toward this end, we write

P_{\text{wins}} = \sum_{i=2}^{12} P(\text{wins}|S = i)\,P_i

or

P_{\text{wins}} = P(\text{wins}|S = 2)P_2 + P(\text{wins}|S = 3)P_3 + P(\text{wins}|S = 4)P_4
               + P(\text{wins}|S = 5)P_5 + P(\text{wins}|S = 6)P_6 + P(\text{wins}|S = 7)P_7
               + P(\text{wins}|S = 8)P_8 + P(\text{wins}|S = 9)P_9 + P(\text{wins}|S = 10)P_{10}
               + P(\text{wins}|S = 11)P_{11} + P(\text{wins}|S = 12)P_{12}

which becomes

P_{\text{wins}} = P(\text{wins}|S = 4)P_4
               + P(\text{wins}|S = 5)P_5 + P(\text{wins}|S = 6)P_6 + P_7
               + P(\text{wins}|S = 8)P_8 + P(\text{wins}|S = 9)P_9 + P(\text{wins}|S = 10)P_{10}
               + P_{11},

since P(\text{wins}|S = i) = 0 for i = 2, 3, 12 and P(\text{wins}|S = i) = 1 for i = 7, 11.

Recall from Chapter #3, the result that if there are two events E and F with
the probability that E occurs being P (E) and with the probability that F occurs
being P (F ), then the probability that E occurs before F is
\frac{P(E)}{P(E) + P(F)}.

Therefore,

P(\text{wins}|S = 4) = P(\text{a 4 occurs before a 7}) = \frac{P_4}{P_4 + P_7},

with similar results for P (wins|S = 5), P (wins|S = 6), P (wins|S = 8), P (wins|S =
9), and P(wins|S = 10). Thus we find that

P_{\text{wins}} = \frac{P_4}{P_4 + P_7}P_4 + \frac{P_5}{P_5 + P_7}P_5 + \frac{P_6}{P_6 + P_7}P_6 + P_7
               + \frac{P_8}{P_8 + P_7}P_8 + \frac{P_9}{P_9 + P_7}P_9 + \frac{P_{10}}{P_{10} + P_7}P_{10} + P_{11}

and this leads to

P_{\text{wins}} = \frac{3}{3 + 6}\cdot\frac{3}{36} + \frac{4}{4 + 6}\cdot\frac{4}{36} + \frac{5}{5 + 6}\cdot\frac{5}{36} + \frac{6}{36}
               + \frac{5}{5 + 6}\cdot\frac{5}{36} + \frac{4}{4 + 6}\cdot\frac{4}{36} + \frac{3}{3 + 6}\cdot\frac{3}{36} + \frac{2}{36}

which reduces to

P_{\text{wins}} = \frac{244}{495}

or P_{\text{wins}} = 49.3\%, which is almost even.
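
A simulation sketch of Example #14 is shown below, estimating both the expected number of rolls per game and the win probability; the number of simulated games is arbitrary.

# Illustrative sketch: craps should give about 3.376 rolls per game and P(win) about 0.4929.
import random

def play_craps():
    roll = lambda: random.randint(1, 6) + random.randint(1, 6)
    rolls, s = 1, roll()
    if s in (7, 11):
        return rolls, True
    if s in (2, 3, 12):
        return rolls, False
    while True:                     # keep rolling until the point s or a 7 appears
        rolls += 1
        t = roll()
        if t == s:
            return rolls, True
        if t == 7:
            return rolls, False

games = 200_000
results = [play_craps() for _ in range(games)]
print(sum(r for r, _ in results) / games)       # ~ 557/165 = 3.376 rolls
print(sum(1 for _, w in results if w) / games)  # ~ 244/495 = 0.4929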

5. Conditional Variance
Just as we had defined the conditional expectation of X, given that Y = y,
we can also define the conditional variance of X, given that Y = y as
V(X|Y) = E\bigl((X - E(X|Y))^2 \mid Y\bigr).   (19a)

Using the usual steps, this can also be expressed as

V(X|Y) = E(X^2|Y) - \bigl(E(X|Y)\bigr)^2.   (19b)

Note that

E\bigl(V(X|Y)\bigr) = E\bigl(E(X^2|Y)\bigr) - E\bigl((E(X|Y))^2\bigr) = E(X^2) - E\bigl((E(X|Y))^2\bigr)

and from E(E(X|Y)) = E(X), we have

V\bigl(E(X|Y)\bigr) = E\bigl((E(X|Y))^2\bigr) - \bigl(E(E(X|Y))\bigr)^2 = E\bigl((E(X|Y))^2\bigr) - (E(X))^2.

Adding these, we arrive at

V(X) = E\bigl(V(X|Y)\bigr) + V\bigl(E(X|Y)\bigr).   (20)

Example #15: Spending Money in a Store - Revisited


Suppose that the number of people entering a small store on a given day is a random variable N with a mean of 50 and a standard deviation of 2. Suppose further that the amounts of money spent by these customers are independent random variables having a common mean of E(X) = \$8 and a common standard deviation of \sigma(X) = \$2. Finally, suppose also that the amount of money spent by a customer is also independent of the total number of customers who enter the store. We had determined the expected amount of money spent in the store on a given day in Example #13 and found that

E(S) = E(N)E(X) = \$400.

We now want to determine the variance of the total amount of money spent in the store on a given day. To solve this, we let N denote the number of customers who enter the store and we let X_i be the amount of money spent by customer i, so that the total amount of money spent on a given day can be expressed as

S = \sum_{i=1}^{N} X_i   and   E(S) = E\bigl(E(S|N)\bigr).

But

E(S|N) = E\left(\left.\sum_{i=1}^{N} X_i\ \right| N\right) = \sum_{i=1}^{N} E(X_i) = N\,E(X)

and

V\bigl(E(S|N)\bigr) = V\bigl(N E(X)\bigr) = V(N)\,(E(X))^2.

Next,

V(S|N) = V\left(\left.\sum_{i=1}^{N} X_i\ \right| N\right) = \sum_{i=1}^{N} V(X_i) = N\,V(X),

and then

E\bigl(V(S|N)\bigr) = E\bigl(N V(X)\bigr) = E(N)\,V(X).

Using

V(S) = E\bigl(V(S|N)\bigr) + V\bigl(E(S|N)\bigr)

we then find that

V(S) = E(N)\,V(X) + (E(X))^2\,V(N).

For E(N) = 50, E(X) = \$8, V(X) = (\$2)^2 and V(N) = (2)^2 = 4, we find that

V(S) = (50)(2)^2 + (8)^2(4) = 456

so that \sigma(S) = \sqrt{456} = \$21.35.
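
A simulation sketch of Example #15 follows. The example fixes only the means and standard deviations of N and the X_i, so the specific two-point distributions chosen below are assumptions made purely for illustration; any choice with E(N) = 50, \sigma(N) = 2, E(X) = 8, \sigma(X) = 2 gives V(S) = 456.

# Illustrative sketch under assumed two-point distributions with the stated means and sds.
import random

def daily_total():
    n = random.choice([48, 52])                                # mean 50, sd 2 (assumed)
    return sum(random.choice([6.0, 10.0]) for _ in range(n))   # mean 8, sd 2 (assumed)

trials = 100_000
totals = [daily_total() for _ in range(trials)]
mean_s = sum(totals) / trials
var_s = sum((s - mean_s) ** 2 for s in totals) / trials
print(mean_s, "vs 400")
print(var_s, "vs 456")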

6. Moment Generating Functions


The moment generating function for a random variable X is defined as

M(t) = E(e^{tX})   (21)

and is defined for all real values of t for which the expectation is finite; for a given distribution for X, M(t) is unique and conversely, for a given M(t), the distribution for X is unique. Using the fact that

e^{tX} = \sum_{k=0}^{\infty} \frac{(tX)^k}{k!}

we see that

M(t) = E\left(\sum_{k=0}^{\infty} \frac{(tX)^k}{k!}\right) = \sum_{k=0}^{\infty} \frac{E\bigl((tX)^k\bigr)}{k!} = \sum_{k=0}^{\infty} \frac{E(X^k)}{k!}\,t^k

which then says that

M^{(k)}(0) = E(X^k)   (22)

for k = 0, 1, 2, \ldots. Thus we have M(0) = 1, M'(0) = E(X) and M''(0) = E(X^2), so that

V(X) = M''(0) - (M'(0))^2.
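
A small numerical illustration of Equation (22) is sketched below, approximating M'(0) and M''(0) for the binomial moment generating function by finite differences; the values n = 10, p = 0.3 and the step size h are arbitrary.

# Illustrative sketch: finite-difference moments from M(t) = (p*e^t + 1 - p)^n.
import math

n, p, h = 10, 0.3, 1e-4
M = lambda t: (p * math.exp(t) + 1 - p) ** n

m1 = (M(h) - M(-h)) / (2 * h)              # ~ M'(0) = np = 3.0
m2 = (M(h) - 2 * M(0.0) + M(-h)) / h**2    # ~ M''(0) = E(X^2) = np(1-p) + (np)^2
print(m1, m2, m2 - m1 ** 2)                # last value ~ V(X) = np(1-p) = 2.1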
Example #16: The Bernoulli Distribution
Here we have

p(x) = \begin{cases} 1 - p, & \text{for } x = 0 \\ p, & \text{for } x = 1 \end{cases}

and so

M(t) = E(e^{tX}) = e^{0}(1 - p) + e^{t}p

or

M(t) = 1 + p(e^t - 1).   (23a)

Note that M(0) = 1 and M'(0) = p, as expected.


Example #17: The Binomial Distribution
Here we have

p(x) = \binom{n}{x} p^x (1 - p)^{n - x}

for x = 0, 1, 2, \ldots, n, and so

M(t) = E(e^{tX}) = \sum_{x=0}^{n} e^{tx}\binom{n}{x} p^x (1 - p)^{n - x} = \sum_{x=0}^{n} \binom{n}{x} (pe^t)^x (1 - p)^{n - x}

which, via the binomial theorem, becomes

M(t) = (pe^t + 1 - p)^n.   (23b)

Note that M(0) = 1 and M'(0) = np, as expected.


Example #18: The Geometric Distribution
Here we have

p(x) = p(1 - p)^{x - 1}

for x = 1, 2, 3, \ldots, and so

M(t) = E(e^{tX}) = \sum_{x=1}^{\infty} e^{tx}\,p(1 - p)^{x - 1} = \frac{p}{1 - p}\sum_{x=1}^{\infty} \bigl((1 - p)e^t\bigr)^x

which reduces to

M(t) = \frac{p}{1 - p}\cdot\frac{(1 - p)e^t}{1 - (1 - p)e^t} = \frac{pe^t}{1 - (1 - p)e^t}

or

M(t) = \frac{p}{e^{-t} - 1 + p}.   (23c)

Note that M(0) = 1 and M'(0) = 1/p, as expected.


Example #19: The Pascal Distribution
Here we have

p(x) = \binom{x - 1}{r - 1} p^r (1 - p)^{x - r}

for x = r, r + 1, r + 2, \ldots, and so

M(t) = E(e^{tX}) = \sum_{x=r}^{\infty} e^{tx}\binom{x - 1}{r - 1} p^r (1 - p)^{x - r}

which reduces to

M(t) = \left(\frac{pe^t}{1 - (1 - p)e^t}\right)^{r}

or

M(t) = \left(\frac{p}{e^{-t} - 1 + p}\right)^{r}.   (23d)

Note that M(0) = 1 and M'(0) = r/p, as expected.

Example #20: The Poisson Distribution


Here we have

p(x) = \frac{\lambda^x e^{-\lambda}}{x!}

for x = 0, 1, 2, 3, \ldots, and so

M(t) = E(e^{tX}) = \sum_{x=0}^{\infty} e^{tx}\,\frac{\lambda^x e^{-\lambda}}{x!} = \sum_{x=0}^{\infty} \frac{(\lambda e^t)^x e^{-\lambda}}{x!} = e^{\lambda e^t} e^{-\lambda}

which reduces to

M(t) = e^{\lambda(e^t - 1)}.   (23e)

Note that M(0) = 1 and M'(0) = \lambda, as expected.


Example #21: The Uniform Distribution
Here we have

f(x) = \frac{1}{b - a}

for a < x < b, and so

M(t) = E(e^{tX}) = \int_a^b e^{tx}\,\frac{1}{b - a}\,dx = \frac{1}{b - a}\int_a^b e^{tx}\,dx

which reduces to

M(t) = \frac{e^{bt} - e^{at}}{t(b - a)}   (23f)

for t \ne 0 (with M(0) = 1). Note that M(0) = 1 and M'(0) = (a + b)/2, as expected.


Example #22: The Exponential Distribution
Here we have

f(x) = \lambda e^{-\lambda x}

for x > 0, and so

M(t) = E(e^{tX}) = \int_0^{\infty} e^{tx}\,\lambda e^{-\lambda x}\,dx = \lambda\int_0^{\infty} e^{-(\lambda - t)x}\,dx

which reduces to

M(t) = \frac{\lambda}{\lambda - t}   (23g)

for t < \lambda. Note that M(0) = 1 and M'(0) = 1/\lambda, as expected.

Example #23: The Standard Normal Distribution


Here we have

f(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}x^2}

for -\infty < x < +\infty, and so

M(t) = E(e^{tX}) = \int_{-\infty}^{+\infty} e^{tx}\,\frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}x^2}\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{tx - \frac{1}{2}x^2}\,dx

which reduces to

M(t) = e^{\frac{1}{2}t^2}.   (23h)

Note that M(0) = 1 and M'(0) = 0, as expected.


Example #24: The General Normal Distribution
Here we have

f(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^2}

for -\infty < z < +\infty, and X = \mu + \sigma Z, and so

M(t) = E(e^{tX}) = E\bigl(e^{t(\mu + \sigma Z)}\bigr) = E\bigl(e^{\mu t + \sigma t Z}\bigr) = E\bigl(e^{\mu t} e^{\sigma t Z}\bigr) = e^{\mu t} E\bigl(e^{\sigma t Z}\bigr)

which reduces to

M(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}.   (23i)

Note that M(0) = 1 and M'(0) = \mu, as expected.


The Moment Generating Function of X + Y
Suppose that X and Y are independent random variables, then
M_{X+Y}(t) = E\bigl(e^{t(X+Y)}\bigr) = E\bigl(e^{tX} e^{tY}\bigr) = E(e^{tX})\,E(e^{tY})

which says that

M_{X+Y}(t) = M_X(t)\,M_Y(t).   (24a)

More generally, if X1 , X2 , X3 , ..., Xn are independent random variables and


X = X_1 + X_2 + X_3 + \cdots + X_n,

then

M_X(t) = M_{X_1}(t)\,M_{X_2}(t)\,M_{X_3}(t)\cdots M_{X_n}(t).   (24b)

Example #25: The Sum of Two Independent Binomial Distributions


Suppose that X is binomial with parameters p and n and that Y is binomial with parameters p and m. Then

M_X(t) = (pe^t + 1 - p)^n   and   M_Y(t) = (pe^t + 1 - p)^m

so that

M_{X+Y}(t) = M_X(t)\,M_Y(t) = (pe^t + 1 - p)^n (pe^t + 1 - p)^m

or

M_{X+Y}(t) = (pe^t + 1 - p)^{n + m},

showing that X + Y is binomial with parameters p and n + m.


Example #26: The Sum of Two Independent Poisson Distributions
Suppose that X is Poisson with parameter \lambda_1 and that Y is Poisson with parameter \lambda_2. Then

M_X(t) = e^{\lambda_1(e^t - 1)}   and   M_Y(t) = e^{\lambda_2(e^t - 1)}

so that

M_{X+Y}(t) = M_X(t)\,M_Y(t) = e^{\lambda_1(e^t - 1)}\,e^{\lambda_2(e^t - 1)}

or

M_{X+Y}(t) = e^{(\lambda_1 + \lambda_2)(e^t - 1)},

showing that X + Y is Poisson with parameter \lambda_1 + \lambda_2.

Example #27: The Sum of Two Independent Normal Distributions


Suppose that X is normal with parameters \mu_1 and \sigma_1^2 and that Y is normal with parameters \mu_2 and \sigma_2^2. Then

M_X(t) = e^{\mu_1 t + \frac{1}{2}\sigma_1^2 t^2}   and   M_Y(t) = e^{\mu_2 t + \frac{1}{2}\sigma_2^2 t^2}

so that

M_{X+Y}(t) = M_X(t)\,M_Y(t) = e^{\mu_1 t + \frac{1}{2}\sigma_1^2 t^2}\,e^{\mu_2 t + \frac{1}{2}\sigma_2^2 t^2}

or

M_{X+Y}(t) = e^{(\mu_1 + \mu_2)t + \frac{1}{2}(\sigma_1^2 + \sigma_2^2)t^2},

showing that X + Y is normal with parameters \mu_1 + \mu_2 and \sigma_1^2 + \sigma_2^2.


The Sum of Squares of n Independent Standard Normal Distributions
Suppose that Z_i for i = 1, 2, 3, \ldots, n are independent standard normal random variables and that

X = Z_1^2 + Z_2^2 + Z_3^2 + \cdots + Z_n^2,

then

M_X(t) = \bigl(M_{Z^2}(t)\bigr)^n

where

M_{Z^2}(t) = E\bigl(e^{tZ^2}\bigr) = \int_{-\infty}^{+\infty} e^{tz^2}\,\frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^2}\,dz = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{-(\frac{1}{2} - t)z^2}\,dz.

Setting s = (1/2 - t)^{1/2} z, we have ds = (1/2 - t)^{1/2}\,dz, so that

M_{Z^2}(t) = \frac{1}{(1/2 - t)^{1/2}\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{-s^2}\,ds = \frac{1}{\sqrt{\pi(1 - 2t)}}\int_{-\infty}^{+\infty} e^{-s^2}\,ds

or

M_{Z^2}(t) = \frac{1}{\sqrt{1 - 2t}}.

Then

M_X(t) = (1 - 2t)^{-n/2}.   (25a)

The chi-squared distribution with \nu degrees of freedom (denoted by \chi^2_\nu) is a special case of a gamma distribution with \alpha = \nu/2 and \lambda = 1/2. This then has a pdf given by

f(x) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\,x^{\nu/2 - 1} e^{-x/2}

for 0 \le x and f(x) = 0 for x < 0, where

\Gamma(\nu/2) = \int_0^{\infty} z^{\nu/2 - 1} e^{-z}\,dz.

The moment generating function of this distribution is given by

M(t) = \int_0^{\infty} \frac{e^{tx}}{2^{\nu/2}\,\Gamma(\nu/2)}\,x^{\nu/2 - 1} e^{-x/2}\,dx = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\int_0^{\infty} x^{\nu/2 - 1} e^{-(\frac{1}{2} - t)x}\,dx.

Setting s = (1/2 - t)x, we have

M(t) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\int_0^{\infty} \left(\frac{s}{1/2 - t}\right)^{\nu/2 - 1} e^{-s}\,\frac{1}{1/2 - t}\,ds

or

M(t) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)\,(1/2 - t)^{\nu/2}}\int_0^{\infty} s^{\nu/2 - 1} e^{-s}\,ds = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)\,(1/2 - t)^{\nu/2}}\,\Gamma(\nu/2)

or

M(t) = (1 - 2t)^{-\nu/2}.   (25b)

Comparing Equations (25a,b), we see that if Z1 , Z2 , Z3 , . . ., Zn are all independent


and from N(0, 1), then the random variable
X = Z_1^2 + Z_2^2 + Z_3^2 + \cdots + Z_n^2
has a chi-squared distribution with n degrees of freedom.
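
A simulation sketch of this closing result is given below (the values n = 5 and the trial count are arbitrary), using the facts that a chi-squared random variable with n degrees of freedom has mean n and variance 2n.

# Illustrative sketch: sums of squares of n standard normals have mean n and variance 2n.
import random

n, trials = 5, 100_000
samples = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((x - mean) ** 2 for x in samples) / trials
print(mean, "vs", n)       # ~ 5
print(var, "vs", 2 * n)    # ~ 10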
