STATS 116

Theory of Probability
Jointly Distributed Random Variables
Prathapasinghe Dharmawansa

Department of Statistics
Stanford University

Summer 2018
Agenda

• Joint distribution functions


• Independent random variables
• Sums of independent random variables
• Conditional distributions: discrete case
• Conditional distributions: continuous case
• Order statistics
• Joint probability distribution of functions of random variables
• Exchangeable random variables

1
Joint distribution functions

2
Joint cdf of two random variables

Joint cdf of two random variables


For any two random variables X and Y , the joint cumulative
probability distribution function (cdf) of X and Y is given by

F (a, b) = P {X ≤ a, Y ≤ b}, −∞ < a, b < ∞

All joint probability statements about X and Y can, in theory, be


answered in terms of their joint distribution function.

3
• Obtaining the distribution of X from the joint cdf of X and Y :

FX(a) = P{X ≤ a} = P{X ≤ a, Y < ∞}
      = P{ lim_{b→∞} {X ≤ a, Y ≤ b} }
      = lim_{b→∞} P{X ≤ a, Y ≤ b}        (1)
      = lim_{b→∞} F(a, b) = F(a, ∞)      (2)

In (1) we have used the fact that probability is a continuous set (that
is, event) function. In (2), we have used the definition on the previous
page.
• Similarly, FY (b) = P {Y ≤ b} = lima→∞ F (a, b) = F (∞, b).
• FX (·) and FY (·) are referred to as the marginal distributions of X and
Y , respectively.

4
It can be shown that, whenever a1 < a2, b1 < b2,

P {a1 < X ≤ a2, b1 < Y ≤ b2}


= F (a2, b2) + F (a1, b1) − F (a1, b2) − F (a2, b1)

5
Joint probability mass function

Given two discrete random variables X and Y , the joint probability mass
function of X and Y is defined by

p(x, y) = P {X = x, Y = y}.

The probability mass function of X can be obtained from p(x, y) by



pX(x) = P{X = x} = ∑_{y: p(x,y)>0} p(x, y).

Similarly,

pY(y) = P{Y = y} = ∑_{x: p(x,y)>0} p(x, y).

6
Joint probability mass function: Example

Suppose that 3 balls are randomly selected from an urn containing 3


red, 4 white, and 5 blue balls. If we let X and Y denote, respectively,
the number of red and white balls chosen, then the joint probability mass
function of X and Y , p(i, j) = P {X = i, Y = j}, is given by
(writing C(n, k) for the binomial coefficient "n choose k")

p(0, 0) = C(5,3)/C(12,3) = 10/220,          p(0, 1) = C(4,1)C(5,2)/C(12,3) = 40/220,
p(0, 2) = C(4,2)C(5,1)/C(12,3) = 30/220,    p(0, 3) = C(4,3)/C(12,3) = 4/220,
p(1, 0) = C(3,1)C(5,2)/C(12,3) = 30/220,    p(1, 1) = C(3,1)C(4,1)C(5,1)/C(12,3) = 60/220,
p(1, 2) = C(3,1)C(4,2)/C(12,3) = 18/220,    p(2, 0) = C(3,2)C(5,1)/C(12,3) = 15/220,

7
p(2, 1) = C(3,2)C(4,1)/C(12,3) = 12/220,    p(3, 0) = C(3,3)/C(12,3) = 1/220.

i \ j                   0         1         2         3      P{X = i} (row sum)
0                    10/220    40/220    30/220     4/220       84/220
1                    30/220    60/220    18/220        0       108/220
2                    15/220    12/220        0         0        27/220
3                     1/220        0         0         0         1/220
P{Y = j} (col. sum)  56/220   112/220    48/220     4/220

8
In the previous table, the probability mass function (pmf) of X is
obtained by computing the row sums, whereas the pmf of Y is obtained
by computing the column sums. Because the individual pmfs of X and Y
thus appear in the margin of such a table, they are often referred to as the
marginal pmfs of X and Y , respectively.

9
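A small computational check of the table above — my own sketch, not part of the slides — using only Python's standard library; it rebuilds every p(i, j) from the counting argument on the previous pages.

# Joint pmf of (X, Y) = (# red, # white) when 3 balls are drawn without
# replacement from an urn with 3 red, 4 white, 5 blue (12 balls total).
from math import comb

def p(i, j):
    b = 3 - i - j                      # the remaining drawn balls must be blue
    if b < 0:
        return 0.0
    return comb(3, i) * comb(4, j) * comb(5, b) / comb(12, 3)

for i in range(4):
    row = [p(i, j) for j in range(4)]
    print(i, [round(v * 220) for v in row], "-> row sum", round(sum(row) * 220), "/220")
print("column sums:", [round(sum(p(i, j) for i in range(4)) * 220) for j in range(4)], "/220")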
Joint pdf of two random variables

Two random variables X and Y are said to be jointly continuous if


there exists a function f (x, y), defined for all real x and y, having
the property that, for every set C of pairs of real numbers (that
is, C is a set in the 2-D plane),
P{(X, Y) ∈ C} = ∫∫_{(x,y)∈C} f(x, y) dx dy.

The function f (x, y) is called the joint probability density function


of X and Y .

10
Joint pdf of two random variables

• If A and B are any sets of real numbers, then, by defining C = {(x, y) : x ∈ A, y ∈ B}, we have

P{X ∈ A, Y ∈ B} = ∫_B ∫_A f(x, y) dx dy.

• Since

F(a, b) = P{X ∈ (−∞, a], Y ∈ (−∞, b]} = ∫_{−∞}^{b} ∫_{−∞}^{a} f(x, y) dx dy,

it follows that

f(a, b) = ∂²F(a, b)/∂a∂b

wherever the partial derivatives are defined.

11
Joint pdf of two random variables

• Since

P{a < X < a + da, b < Y < b + db} = ∫_b^{b+db} ∫_a^{a+da} f(x, y) dx dy ≈ f(a, b) da db

when da and db are small and f (x, y) is continuous at a, b, one can


see that f (a, b) is a measure of how likely it is that the random vector
(X, Y ) will be near (a, b).

12
Joint pdf of two random variables

If X and Y are jointly continuous, they are individually continuous, and


their probability density functions can be obtained as follows:

P{X ∈ A} = P{X ∈ A, Y ∈ (−∞, ∞)} = ∫_A ∫_{−∞}^{∞} f(x, y) dy dx = ∫_A fX(x) dx

where

fX(x) = ∫_{−∞}^{∞} f(x, y) dy

is thus the probability density function of X. Similarly, the probability
density function of Y is given by

fY(y) = ∫_{−∞}^{∞} f(x, y) dx.

13
Joint pdf of two random variables

Example: The joint density function of X and Y is given by


f(x, y) = 2e^{−x} e^{−2y} for 0 < x < ∞, 0 < y < ∞, and f(x, y) = 0 otherwise.

Compute:

(a) P {X > 1, Y < 1}.


(b) P {X < Y }.
(c) P {X < a}.

14
Joint pdf of two random variables

Example: (solution) The joint density function of X and Y is given by


f(x, y) = 2e^{−x} e^{−2y} for 0 < x < ∞, 0 < y < ∞, and f(x, y) = 0 otherwise.

Compute:

(a) P{X > 1, Y < 1} = e^{−1}(1 − e^{−2}).
(b) P{X < Y} = 1/3.
(c) P{X < a} = 1 − e^{−a}.

15
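A quick Monte Carlo check of (a)-(c) — my own sketch, not from the slides; the value a = 1.5 below is an arbitrary illustrative choice.

# f(x, y) = 2 e^{-x} e^{-2y} factors, so X ~ Exp(rate 1), Y ~ Exp(rate 2), independent.
import numpy as np

rng = np.random.default_rng(0)
n = 10**6
x = rng.exponential(scale=1.0, size=n)   # scale = 1/rate
y = rng.exponential(scale=0.5, size=n)

a = 1.5
print("P{X>1, Y<1}:", np.mean((x > 1) & (y < 1)), "exact:", np.exp(-1) * (1 - np.exp(-2)))
print("P{X<Y}     :", np.mean(x < y), "exact:", 1 / 3)
print("P{X<a}     :", np.mean(x < a), "exact:", 1 - np.exp(-a))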
Joint pdf of two random variables
Example: Consider choosing a point which is uniformly distributed within
a circle of radius R with its center at the origin. Define X and Y to be the
coordinates of the chosen point. The joint density function of X and Y is
given by (for some value of c)

f(x, y) = c if x² + y² ≤ R², and f(x, y) = 0 if x² + y² > R².

(a) Determine c.
(b) Find the marginal density functions
of X and Y .
(c) Let D denote the distance from the
origin of the selected point. Compute
the probability that D
is less than or equal to a.
(d) Find E[D].

16
Solution:

(a) Determine c. (c = 1/(πR²).)
(b) Find the marginal density functions of X and Y .

fX(x) = 2√(R² − x²)/(πR²) for −R ≤ x ≤ R, and fX(x) = 0 otherwise
(by symmetry, fY has the same form).

(c) Let D denote the distance from the origin of the selected point.
Compute the probability that D is less than or equal to a.

P{D ≤ a} = a²/R²,   0 ≤ a ≤ R.

(d) Find E[D].

E[D] = 2R/3.

17
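A simulation sketch of this example (my own illustration; R = 2 and a = 1.2 are arbitrary choices). Points are drawn uniformly in the disk by rejection from the enclosing square, and the answers in (c) and (d) are compared with a²/R² and 2R/3.

# Uniform point in a disk of radius R centered at the origin.
import numpy as np

rng = np.random.default_rng(1)
R, m = 2.0, 3 * 10**6
x = rng.uniform(-R, R, size=m)
y = rng.uniform(-R, R, size=m)
keep = x**2 + y**2 <= R**2               # rejection step: keep points inside the disk
d = np.sqrt(x[keep]**2 + y[keep]**2)

a = 1.2
print("P{D<=a}:", np.mean(d <= a), "exact:", a**2 / R**2)
print("E[D]   :", d.mean(), "exact:", 2 * R / 3)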
Joint distributions of n random variables

• Joint probability distributions for n random variables can be defined in


exactly the same manner as for n = 2.
• The joint cdf F (a1, a2, . . . , an) of the n random variables X1, X2, . . . , Xn
is defined by

F (a1, a2, . . . , an) = P {X1 ≤ a1, X2 ≤ a2, . . . , Xn ≤ an}.

• The n random variables are said to be jointly continuous if there exists


a function f (x1, x2, . . . , xn), called the joint pdf, such that, for any set
C in n-dimensional space,

P{(X1, X2, . . . , Xn) ∈ C} = ∫ · · · ∫_{(x1,x2,...,xn)∈C} f(x1, . . . , xn) dx1 dx2 . . . dxn

18
In particular, for any n sets of real numbers A1, A2, . . . , An,

P{X1 ∈ A1, X2 ∈ A2, . . . , Xn ∈ An} = ∫_{An} ∫_{An−1} · · · ∫_{A1} f(x1, . . . , xn) dx1 dx2 . . . dxn

19
Example: The multinomial distribution

One of the most important joint distributions is the multinomial


distribution, which arises when a sequence of n independent and identical
experiments is performed.

Suppose that each experiment can result in any one of r possible
outcomes, with respective probabilities p1, p2, . . . , pr, where ∑_{i=1}^{r} pi = 1.

Let Xi denote the number out of the n experiments that result in the
i-th outcome (i = 1, . . . , r), then

P{X1 = n1, X2 = n2, . . . , Xr = nr} = [n!/(n1! n2! . . . nr!)] p1^{n1} p2^{n2} . . . pr^{nr}

where ∑_{i=1}^{r} ni = n. The joint distribution whose joint pmf is shown above
is called the multinomial distribution. When r = 2, the multinomial reduces
to the binomial distribution.

20
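A minimal sketch of the multinomial pmf — my own, with illustrative numbers (n = 8, r = 3, p = (0.5, 0.3, 0.2)); it evaluates the formula directly and cross-checks it against relative frequencies from numpy's multinomial sampler.

# P{X1 = n1, ..., Xr = nr} = n!/(n1! ... nr!) * p1^n1 ... pr^nr
from math import factorial
import numpy as np

def multinomial_pmf(counts, probs):
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)          # the running quotient stays an integer
    val = float(coef)
    for c, pr in zip(counts, probs):
        val *= pr**c
    return val

counts, probs = (3, 4, 1), (0.5, 0.3, 0.2)
print("formula   :", multinomial_pmf(counts, probs))

rng = np.random.default_rng(2)
draws = rng.multinomial(sum(counts), probs, size=200_000)
print("simulation:", np.mean(np.all(draws == counts, axis=1)))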
Independent random variables

21
Independent random variables

The random variables X and Y are said to be independent if, for


any two sets of real numbers A and B,

P {X ∈ A, Y ∈ B} = P {X ∈ A}P {Y ∈ B}. (⋆)

In other words, X and Y are independent if, for all A and B, the
events EA = {X ∈ A} and FB = {Y ∈ B} are independent.

Using the three axioms of probability, Equation (⋆) follows if and only
if, for all a, b,

P {X ≤ a, Y ≤ b} = P {X ≤ a}P {Y ≤ b}.

22
Independent random variables

• In terms of the joint distribution function F of X and Y : X and Y are


independent if for all a, b,

F (a, b) = FX (a)FY (b).

23
Independent random variables
When X and Y are discrete random variables, the condition of
independence in the definition (Equation (⋆)) is equivalent to, for all
x, y,
p(x, y) = pX (x)pY (y). (⋆1)
Discussion of the equivalence:

• Consider the one-point sets A = {x} and B = {y}. If Equation (⋆) is satisfied, then (⋆1) follows immediately.
• Furthermore, if Equation (⋆1) is valid, then, for any sets A, B,
P{X ∈ A, Y ∈ B} = ∑_{y∈B} ∑_{x∈A} p(x, y) = ∑_{y∈B} ∑_{x∈A} pX(x) pY(y)
               = ∑_{y∈B} pY(y) ∑_{x∈A} pX(x) = P{Y ∈ B} P{X ∈ A},

and Equation (⋆) is established.


24
Independent random variables

In the jointly continuous case, the condition of independence is equivalent


to, for all x, y,
f (x, y) = fX (x)fY (y).

Comments:

• Roughly speaking, X and Y are independent if knowing the value of one


does not change the distribution of the other.
• Random variables that are not independent are said to be dependent.

25
Example

Suppose that n + m independent trials having a common probability of


success p are performed. If X is the number of successes in the first n
trials, and Y is the number of successes in the final m trials, then X and Y
are independent, since knowing the number of successes in the first n trials
does not affect the distribution of the number of successes in the final m
trials (by the assumption of independent trials). In fact, for integer x and
y, 0 ≤ x ≤ n, 0 ≤ y ≤ m,
P{X = x, Y = y} = C(n, x) p^x (1 − p)^{n−x} · C(m, y) p^y (1 − p)^{m−y}
               = P{X = x} P{Y = y}

In contrast, X and Z will be dependent, where Z is the total number of


successes in the n + m trials. (Why?)

26
Example

Suppose that the number of people who enter a post office on a given
day is a Poisson random variable with parameter λ. Show that if each
person who enters the post office is a male with probability p and a female
with probability 1p, then the number of males and females entering the post
office are independent Poisson random variables with respective parameters
λp and λ(1 − p).

27
Steps in the solution: Let X and Y denote, respectively, the number
of males and females that enter the post office. We shall show the
independence of X and Y by establishing Equation (⋆1).

• Consider P {X = i, Y = j}.

P{X = i, Y = j} = P{X = i, Y = j | X + Y = i + j} P{X + Y = i + j}
               + P{X = i, Y = j | X + Y ≠ i + j} P{X + Y ≠ i + j},

where the second conditional probability equals 0.

• P{X + Y = i + j} follows from the Poisson assumption on the total number of arrivals.


• P {X = i, Y = j|X + Y = i + j} is binomial.
• Thus, we obtain P {X = i, Y = j}. From this, we can sum over i to get
P {Y = j}. Similarly, we can get P {X = i}.
• By comparing P {X = i, Y = j} with P {X = i}P {Y = j}, we obtain
the desired result.

28
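The conclusion can be checked empirically. Below is my own sketch (λ = 10 and p = 0.3 are illustrative choices): simulate the post office model, then compare the joint frequencies of (X, Y) with the product of Poisson(λp) and Poisson(λ(1 − p)) pmfs.

# Poisson thinning: N ~ Poisson(lam); each of the N arrivals is male w.p. p.
import numpy as np
from math import exp, factorial

rng = np.random.default_rng(3)
lam, p, days = 10.0, 0.3, 500_000
n = rng.poisson(lam, size=days)          # total arrivals per day
x = rng.binomial(n, p)                   # males
y = n - x                                # females

def pois_pmf(k, mu):
    return exp(-mu) * mu**k / factorial(k)

for i, j in [(2, 6), (3, 7), (4, 8)]:    # a few (i, j) pairs to compare
    emp = np.mean((x == i) & (y == j))
    theo = pois_pmf(i, lam * p) * pois_pmf(j, lam * (1 - p))
    print(f"P(X={i}, Y={j}): simulated {emp:.5f}, product of Poisson pmfs {theo:.5f}")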
Example

Two people decide to meet at a certain location. If each of them


independently arrives at a time uniformly distributed between 12 noon and
1 P.M., find the probability that the first to arrive has to wait longer than
10 minutes.

29
Solution: Let X and Y denote, respectively, the time past 12 that
Person 1 and Person 2 arrive. Clearly, X and Y are independent random
variables, each uniformly distributed over (0, 60).

The desired probability, P{|X − Y| > 10}, can be written as
P{Y > X + 10} + P{X > Y + 10} = 2P{Y > X + 10} by symmetry of X and Y.

2P{Y > X + 10} = 2 ∫_0^{50} (1/60) [ ∫_{x+10}^{60} (1/60) dy ] dx
              = 2 × 1250/3600
              = 25/36.

30
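A direct Monte Carlo check of this answer (my own sketch); the estimate should be close to 25/36 ≈ 0.694.

# Two independent arrival times, uniform on (0, 60); estimate P{|X - Y| > 10}.
import numpy as np

rng = np.random.default_rng(4)
n = 10**6
x = rng.uniform(0, 60, size=n)
y = rng.uniform(0, 60, size=n)
print("simulated:", np.mean(np.abs(x - y) > 10), " exact:", 25 / 36)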
Independent random variables

A necessary and sufficient condition for the random variables X and Y


to be independent is for their joint pdf (or joint pmf in the discrete case)
f (x, y) to factor into two terms, one depending only on x and the other
depending only on y.

Proposition:
The continuous (discrete) random variables X and Y are
independent if and only if their joint probability density (mass)
function can be expressed as

fX,Y (x, y) = h(x)g(y), −∞ < x < ∞, −∞ < y < ∞.

31
Proof: Consider the continuous case.

• =⇒: Independence implies that the joint density is the product of the
marginal densities of X and Y, so the factorization holds with h = fX and g = fY.
• ⇐=: Now, suppose that fX,Y(x, y) = h(x)g(y). Then

1 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y(x, y) dx dy = ∫_{−∞}^{∞} h(x) dx · ∫_{−∞}^{∞} g(y) dy = C1 C2,

where C1 = ∫_{−∞}^{∞} h(x) dx and C2 = ∫_{−∞}^{∞} g(y) dy. In addition, since by definition
fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy = C2 h(x) and fY(y) = ∫_{−∞}^{∞} fX,Y(x, y) dx = C1 g(y), we have

fX(x) fY(y) = C1 C2 h(x) g(y) = h(x) g(y) = fX,Y(x, y),

using C1 C2 = 1.

This shows that X, Y are independent.

32
Independent random variables

• In general, the n random variables X1, X2, . . . , Xn are said to be


independent if, for all sets of real numbers A1, A2, . . . , An,


P{X1 ∈ A1, X2 ∈ A2, . . . , Xn ∈ An} = ∏_{i=1}^{n} P{Xi ∈ Ai}.

• As before, it can be shown that this condition is equivalent to


P{X1 ≤ a1, X2 ≤ a2, . . . , Xn ≤ an} = ∏_{i=1}^{n} P{Xi ≤ ai},

for all a1, a2, . . . , an.


• Finally, an infinite collection of random variables is said to be independent
if every finite sub-collection of them is independent.

33
Independent random variables

Example:

Let X, Y, Z be independent and uniformly distributed over (0, 1).


Compute P {X ≥ Y Z}.

34
Independent random variables

Solution: Based on the given information,

fX,Y,Z(x, y, z) = fX(x) · fY(y) · fZ(z) = 1 for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, 0 ≤ z ≤ 1, and 0 otherwise.

Thus,

P{X ≥ YZ} = ∫_0^1 [ ∫_0^1 ( ∫_{yz}^1 dx ) dz ] dy = ∫_0^1 ∫_0^1 (1 − yz) dz dy = ∫_0^1 (1 − y/2) dy = 3/4.

35
Remark on the independence

Independence is a symmetric relation.

The random variables X and Y are independent if their joint density


function (or mass function in the discrete case) is the product of their
individual density (or mass) functions.

Therefore, to say that X is independent of Y is equivalent to saying


that Y is independent of X, or just that X and Y are independent.

36
Remark on the independence

Recall the chain rule for conditional probability:

P {X1 ≤ a1, . . . , Xn ≤ an} =P {X1 ≤ a1}P {X2 ≤ a2|X1 ≤ a1} . . .


. . . P {Xn ≤ an|X1 ≤ a1, . . . , Xn−1 ≤ an−1}.

From the above, one can see that the independence of X1, . . . , Xn can be
established sequentially, i.e., by showing that

• X2 is independent of X1
• X3 is independent of X1, X2
• X4 is independent of X1, X2, X3
• ...
• Xn is independent of X1, . . . , Xn−1.

37
Sums of independent random variables

38
Sums of independent random variables

Consider the distribution of X + Y when X and Y are independent,
continuous random variables with pdfs fX and fY. The cumulative
distribution function (cdf) of X + Y is obtained as follows:

FX+Y(a) = P{X + Y ≤ a} = ∫∫_{x+y≤a} fX(x) fY(y) dx dy
        = ∫_{−∞}^{∞} [ ∫_{−∞}^{a−y} fX(x) dx ] fY(y) dy
        = ∫_{−∞}^{∞} FX(a − y) fY(y) dy.

39
Sums of independent random variables

From the previous page,


FX+Y(a) = P{X + Y ≤ a} = ∫_{−∞}^{∞} FX(a − y) fY(y) dy.

Differentiating the above, we obtain the pdf fX+Y of X + Y, i.e.,

fX+Y(a) = dFX+Y(a)/da = ∫_{−∞}^{∞} [dFX(a − y)/da] fY(y) dy
        = ∫_{−∞}^{∞} fX(a − y) fY(y) dy.

The probability density function fX+Y is called the convolution of the pdfs
fX and fY (the pdfs of X and Y , respectively).1
1
Note that this part is different from the textbook.

40
Sum of i.i.d. uniform random variables

Note: Independent and identically distributed (i.i.d.)

Example: If X and Y are independent random variables, both uniformly


distributed on (0, 1), find the pdf of X + Y .

41
Sum of i.i.d. uniform random variables

Example: If X and Y are independent random variables, both uniformly


distributed on (0, 1), find the pdf of X + Y .
Solution: Clearly, fX(a) = fY(a) = 1 for 0 < a < 1, and 0 otherwise. Then

fX+Y(a) = ∫_{−∞}^{∞} fX(a − y) fY(y) dy = ∫_0^1 fX(a − y) dy

        = ∫_0^a 1 dy = a,             0 < a ≤ 1
        = ∫_{a−1}^1 1 dy = 2 − a,     1 ≤ a < 2
        = 0,                           otherwise.

Here X + Y is said to have a triangular distribution due to the shape of its


pdf (Draw it!).

42
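A sketch (my own) that compares a histogram of X + Y with the triangular density just derived; the empirical and exact values should agree up to binning and sampling error.

# Empirical density of X + Y for X, Y i.i.d. uniform(0, 1) vs. the triangular pdf.
import numpy as np

rng = np.random.default_rng(5)
s = rng.uniform(size=10**6) + rng.uniform(size=10**6)
hist, edges = np.histogram(s, bins=20, range=(0, 2), density=True)
centers = (edges[:-1] + edges[1:]) / 2
triangular = np.where(centers <= 1, centers, 2 - centers)
for c, h, t in zip(centers, hist, triangular):
    print(f"a = {c:.2f}   empirical {h:.3f}   triangular {t:.3f}")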
Sum of i.i.d. uniform random variables

Let X1, X2, . . . , Xn be i.i.d. uniform (0, 1) random variables, and let

Fn(x) = P {X1 + . . . + Xn ≤ x}.

Show that Fn(x) = x^n/n!, for 0 ≤ x ≤ 1.

Note that a general formula for Fn(x) is messy.

43
Sum of i.i.d. uniform random variables

Let X1, X2, . . . , Xn be i.i.d. uniform (0, 1) random variables, and let
Fn(x) = P{X1 + . . . + Xn ≤ x}. Show that Fn(x) = x^n/n!, for 0 ≤ x ≤ 1.

Proof: Use mathematical induction.

• Clearly, the result holds when n = 1.


• Suppose that the identity Fn−1(x) = x^{n−1}/(n − 1)!, 0 ≤ x ≤ 1, is true.
Then write ∑_{i=1}^{n} Xi = ∑_{i=1}^{n−1} Xi + Xn. From previous discussions,

Fn(x) = ∫_0^1 Fn−1(x − z) fXn(z) dz = ∫_0^x [(x − z)^{n−1}/(n − 1)!] dz = x^n/n!.

44
Sum of i.i.d. uniform random variables

Example: Determine the expected number of i.i.d. uniform (0, 1) random


variables that need to be summed to exceed 1. That is, with X1, X2, . . .
being i.i.d. uniform (0, 1) random variables, determine E[N ], where

N = min{n : X1 + . . . + Xn > 1}.

45
Sum of i.i.d. uniform random variables

Solution to the example shown on the previous page: N > n (for n > 0) if
and only if X1 + . . . + Xn ≤ 1. Thus, P{N > n} = Fn(1) = 1/n!, n > 0.
For n = 0, P{N > 0} = 1 = 1/0!. Therefore,

P{N = n} = P{N > n − 1} − P{N > n} = 1/(n − 1)! − 1/n! = (n − 1)/n!,   n ≥ 1.

With the above, E[N] can be readily calculated:

E[N] = ∑_{n=1}^{∞} n P{N = n} = ∑_{n=1}^{∞} n(n − 1)/n! = ∑_{n=0}^{∞} 1/n! = e.

That is, the average number of i.i.d. uniform (0, 1) random variables that
must be summed for the sum to exceed 1 is equal to e.

46
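A direct simulation of N (my own sketch, standard library only); the sample mean should be close to e ≈ 2.718.

# N = min{n : X1 + ... + Xn > 1} for i.i.d. uniform(0, 1) variables; E[N] = e.
import random

random.seed(6)
trials, total = 200_000, 0
for _ in range(trials):
    s, n = 0.0, 0
    while s <= 1.0:
        s += random.random()
        n += 1
    total += n
print("estimated E[N]:", total / trials, "  e =", 2.718281828)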
Gamma random variables

Recall: A gamma random variable with parameters (t, λ) has a density


of the form

f(y) = λ e^{−λy} (λy)^{t−1} / Γ(t),   0 < y < ∞

An important property of this family of distributions is that, for a fixed


value of λ, it is closed under convolutions.

47
Sum of independent gamma random variables

Proposition
If X and Y are independent gamma random variables with
respective parameters (s, λ) and (t, λ), then X + Y is a gamma
random variable with parameters (s + t, λ).

48
Proof: From previous discussions,

fX+Y(a) = ∫_{−∞}^{∞} fX(a − y) fY(y) dy

        = ∫_0^a [λ e^{−λ(a−y)} (λ(a − y))^{s−1} / Γ(s)] · [λ e^{−λy} (λy)^{t−1} / Γ(t)] dy

        = K e^{−λa} ∫_0^a (a − y)^{s−1} y^{t−1} dy

        = K a^{s+t−1} e^{−λa} ∫_0^1 (1 − x)^{s−1} x^{t−1} dx      (substituting x = y/a)

        = C a^{s+t−1} e^{−λa}

where C is a constant that does not depend on a. Since the pdf must
integrate to 1, the value of C is determined, and we have

fX+Y(a) = λ e^{−λa} (λa)^{s+t−1} / Γ(s + t).

49
Sum of independent gamma random variables

Using this Proposition and induction, the following can be readily


established.

If Xi, i = 1, . . . , n, are independent gamma random variables with
respective parameters (ti, λ), i = 1, . . . , n, then ∑_{i=1}^{n} Xi is gamma with
parameters (∑_{i=1}^{n} ti, λ).

50
Sum of independent gamma random variables

Example:

Let X1, X2, . . . , Xn be n i.i.d. exponential random variables, each


having parameter λ.

Then, since an exponential random variable with parameter λ is the


same as a gamma random variable with parameters (1, λ), it follows from
the Proposition that X1 + X2 + . . . + Xn is a gamma random variable with
parameters (n, λ).

51
Gamma and chi-squared random variables

Let Z1, Z2, . . . , Zn be i.i.d. standard normal random variables. Then
Y = ∑_{i=1}^{n} Zi² is said to have the chi-squared (χ²) distribution with n
degrees of freedom (χ²_n).

• Consider the case when n = 1, i.e., Y = Z1². From an example in
previous chapters, the pdf of Y is given by

fZ1²(y) = [1/(2√y)] [fZ1(√y) + fZ1(−√y)]
        = [1/(2√y)] · [2/√(2π)] e^{−y/2}
        = (1/2) e^{−y/2} (y/2)^{1/2 − 1} / √π

52
which is the gamma distribution with parameters (1/2, 1/2).
• A by-product of the above analysis is that

Γ(1/2) = √π.

• But since each Zi² is gamma (1/2, 1/2), using the previous proposition, it
follows that χ²_n is just the gamma distribution with parameters (n/2, 1/2)
and its pdf is given by

fY(y) = (1/2) e^{−y/2} (y/2)^{n/2 − 1} / Γ(n/2),   0 < y < ∞

53
Normal random variables

Proposition
If Xi, i = 1, . . . , n, are independent random variables that are
normally distributed with respective parameters (µi, σi²), i = 1, . . . , n,
then ∑_{i=1}^{n} Xi is normally distributed with parameters
(∑_{i=1}^{n} µi, ∑_{i=1}^{n} σi²).

54
Proof: Steps:

(1) To begin, let X and Y be independent normal rvs with X having mean
0 and variance σ² and Y having mean 0 and variance 1. It can be
shown that X + Y is normal with mean 0 and variance 1 + σ².
(2) Further, let X1 and X2 be independent normal rvs with Xi having
mean µi and variance σi², i = 1, 2. Then write

X1 + X2 = σ2 [ (X1 − µ1)/σ2 + (X2 − µ2)/σ2 ] + µ1 + µ2,

where (X1 − µ1)/σ2 is N(0, σ1²/σ2²) and (X2 − µ2)/σ2 is N(0, 1). By
Step (1), the sum inside the brackets is N(0, 1 + σ1²/σ2²), and therefore
X1 + X2 is N(µ1 + µ2, σ1² + σ2²).

(3) Finally, using induction, we can obtain the general case as shown in the
proposition.

55
Normal random variables: Example

A basketball team will play a 44-game season. Among the 44 games,


26 of them are against class A teams and 18 are against class B teams.
Suppose that the team will win each game against a class A team with
probability 0.4 and will win each game against a class B team with
probability 0.7. Suppose also that the results of the different games are
independent. Approximate the probability that

(a) the team wins 25 games or more;


(b) the team wins more games against class A teams than it does against
class B teams.

56
Solution: Let XA and XB respectively denote the number of games the
team wins against class A and against class B teams. XA and XB are
independent binomial random variables and

E[XA] =26(.4) = 10.4; Var(XA) = 26(.4)(.6) = 6.24


E[XB ] =18(.7) = 12.6; Var(XB ) = 18(.7)(.3) = 3.78

By the normal approximation to the binomial, XA and XB will have


approximately the same distribution as would independent normal random
variables with the preceding expected values and variances. Therefore,
XA + XB , XA − XB are also normal with respective means and variances.
The solution is complete by noting that the required probabilities are
given by

(a) P {XA + XB ≥ 25}


(b) P {XA − XB ≥ 1}

57
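A sketch of the final numerical step — my own, since the slides stop at setting up the probabilities. It uses a continuity correction (my choice, not stated in the slides) and an exact-model Monte Carlo for comparison.

# Normal approximation for (a) P{XA + XB >= 25} and (b) P{XA - XB >= 1}.
from math import erf, sqrt
import numpy as np

def norm_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mA, vA = 26 * 0.4, 26 * 0.4 * 0.6
mB, vB = 18 * 0.7, 18 * 0.7 * 0.3

# (a) XA + XB is approximately N(23, 10.02); with continuity correction use 24.5.
print("(a) normal approx:", 1 - norm_cdf((24.5 - (mA + mB)) / sqrt(vA + vB)))
# (b) XA - XB is approximately N(-2.2, 10.02); with continuity correction use 0.5.
print("(b) normal approx:", 1 - norm_cdf((0.5 - (mA - mB)) / sqrt(vA + vB)))

rng = np.random.default_rng(7)
xa = rng.binomial(26, 0.4, size=10**6)
xb = rng.binomial(18, 0.7, size=10**6)
print("(a) binomial simulation:", np.mean(xa + xb >= 25))
print("(b) binomial simulation:", np.mean(xa - xb >= 1))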
Lognormal random variables

The random variable Y is said to be a lognormal random variable


with parameters µ and σ if log(Y ) is a normal random variable
with mean µ and variance σ 2. That is, Y is lognormal if it can be
expressed as
Y = eX
where X is a normal random variable.

58
Lognormal random variables: Example

Starting at some fixed time, let S(n) denote the price of a certain security
at the end of n additional weeks, n ≥ 1. A popular model for the evolution
of these prices assumes that the price ratios S(n)/S(n − 1), n ≥ 1,
are independent and identically distributed lognormal random variables.
Assuming this model, with parameters µ = 0.0165, σ = 0.0730, what is the
probability that

(a) the price of the security increases over each of the next two weeks?
(b) the price at the end of two weeks is higher than it is today?

59
Solution: Let Z be a standard normal random variable.

(a) Note that x > 1 if and only if log(x) > log(1) = 0. As a result, we
have

P{S(1)/S(0) > 1} = P{log(S(1)/S(0)) > 0}
                = P{ [log(S(1)/S(0)) − µ]/σ > (0 − µ)/σ }
                = P{Z > −µ/σ},

where Z = [log(S(1)/S(0)) − µ]/σ is a standard normal random variable.

60
(b) Similarly, note that

P{S(2)/S(0) > 1} = P{ log( [S(1)/S(0)] · [S(2)/S(1)] ) > 0 }
                = P{ log(S(1)/S(0)) + log(S(2)/S(1)) > 0 } = P{Z1 + Z2 > 0},

where Z1 = log(S(1)/S(0)) and Z2 = log(S(2)/S(1)) are independent normal
random variables, each with mean µ and variance σ². Thus, Z1 + Z2 is
normal with mean 2µ and variance 2σ², and the same method as that used
in (a) can be applied.

61
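A numerical sketch of both answers (my own). For (a), the two weekly ratios are independent, so the one-week probability computed above is squared; for (b), log(S(2)/S(0)) is normal with mean 2µ and variance 2σ².

# Lognormal weekly price ratios with mu = 0.0165, sigma = 0.0730.
from math import erf, sqrt

mu, sigma = 0.0165, 0.0730

def norm_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

p_up_one_week = 1 - norm_cdf(-mu / sigma)          # P{S(1)/S(0) > 1}
print("(a) up in each of the next two weeks:", p_up_one_week**2)
print("(b) higher after two weeks          :", 1 - norm_cdf(-2 * mu / (sqrt(2) * sigma)))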
Sums of independent Poisson random variables

If X and Y are independent Poisson random variables with respective


parameters λ1 and λ2, compute the distribution of X + Y .

62
Sums of independent Poisson random variables

Solution: The event {X + Y = n} may be written as the union of the
disjoint events {X = k, Y = n − k}, 0 ≤ k ≤ n, so we have

P{X + Y = n} = ∑_{k=0}^{n} P{X = k, Y = n − k}

            = ∑_{k=0}^{n} P{X = k} P{Y = n − k}

            = ∑_{k=0}^{n} e^{−λ1} (λ1^k/k!) · e^{−λ2} (λ2^{n−k}/(n − k)!)

            = [e^{−(λ1+λ2)}/n!] ∑_{k=0}^{n} [n!/(k!(n − k)!)] λ1^k λ2^{n−k}

            = e^{−(λ1+λ2)} (λ1 + λ2)^n / n!.
Therefore, X + Y is Poisson with parameter λ1 + λ2.

63
Sums of independent binomial random variables

Let X and Y be independent binomial random variables with respective
parameters (n, p) and (m, p). Compute the distribution of X + Y.

64
Sums of independent binomial random variables

Solution: Note that C(n + m, i) = ∑_{k=0}^{i} C(n, k) C(m, i − k). Then

P{X + Y = i} = ∑_{k=0}^{i} P{X = k, Y = i − k}

            = ∑_{k=0}^{i} P{X = k} P{Y = i − k}

            = ∑_{k=0}^{i} C(n, k) p^k (1 − p)^{n−k} · C(m, i − k) p^{i−k} (1 − p)^{m−(i−k)}

            = p^i (1 − p)^{n+m−i} ∑_{k=0}^{i} C(n, k) C(m, i − k)

            = C(n + m, i) p^i (1 − p)^{n+m−i}   =⇒   X + Y is binomial (n + m, p)

65
Geometric random variables

Let X1, . . . , Xn be independent geometric random variables, with Xi
having parameter pi for i = 1, . . . , n. We have the following proposition on
the pmf of their sum Sn = ∑_{i=1}^{n} Xi.

Proposition
Let X1, . . . , Xn be independent geometric random variables, with
Xi having parameter pi for i = 1, . . . , n. Define qi = 1 − pi,
i = 1, . . . , n. If all the pi are distinct, then, for k ≥ n,

P{Sn = k} = ∑_{i=1}^{n} pi qi^{k−1} ∏_{j≠i} [pj/(pj − pi)].

66
Geometric random variables

Proof: The proof is based on induction. When n = 1, the result is
trivially true. Now consider the case when n = 2.

P{S2 = k} = P{X1 + X2 = k} = ∑_{j=0}^{k} P{X1 = j} P{X2 = k − j}

          = ∑_{j=1}^{k−1} P{X1 = j} P{X2 = k − j} = ∑_{j=1}^{k−1} q1^{j−1} p1 q2^{k−j−1} p2

          = p1 p2 q2^{k−2} ∑_{j=1}^{k−1} (q1/q2)^{j−1} = p1 p2 q2^{k−2} · [1 − (q1/q2)^{k−1}] / [1 − q1/q2]

          = p2 q2^{k−1} [p1/(p1 − p2)] + p1 q1^{k−1} [p2/(p2 − p1)]

67
Geometric random variables

Proof of the Proposition (cont’d):

Consider n = 3 to gain more insight.


P{S3 = k} = P{X1 + X2 + X3 = k} = ∑_{j=0}^{k} P{S2 = j} P{X3 = k − j}

          = ∑_{j=1}^{k−1} P{S2 = j} P{X3 = k − j}

          = p1 q1^{k−1} [p2/(p2 − p1)] [p3/(p3 − p1)] + p2 q2^{k−1} [p1/(p1 − p2)] [p3/(p3 − p2)]
          + p3 q3^{k−1} [p2/(p2 − p3)] [p1/(p1 − p3)]

68
Geometric random variables

Proof of the Proposition (cont’d):


Consider the general case. Suppose that the proposition holds true for
Sn−1 and for P {Sn = r}, r ≤ k − 1.

• To compute P {Sn = k}, we condition on whether Xn = 1.

P{Sn = k} = P{Sn = k | Xn = 1} pn + P{Sn = k | Xn > 1} qn

P{Sn = k | Xn = 1} = P{Sn−1 = k − 1 | Xn = 1}
                  = P{Sn−1 = k − 1}         (by independence)
                  = ∑_{i=1}^{n−1} pi qi^{k−2} ∏_{j≠i, j≤n−1} [pj/(pj − pi)]
69
• If X is geometric with parameter p, then the conditional distribution of
X given X > 1 is the same as the distribution of 1 (the first failed trial)
plus a geometric with parameter p (the number of additional trials after
the first until a success occurs). Therefore,

P{Sn = k | Xn > 1} = P{X1 + X2 + . . . + Xn + 1 = k}
                  = P{Sn = k − 1}
                  = ∑_{i=1}^{n} pi qi^{k−2} ∏_{j≠i, j≤n} [pj/(pj − pi)]

70
• We obtain P{Sn = k} as follows:

P{Sn = k} = pn [ ∑_{i=1}^{n−1} pi qi^{k−2} ∏_{j≠i, j≤n−1} pj/(pj − pi) ]
          + qn [ ∑_{i=1}^{n} pi qi^{k−2} ∏_{j≠i, j≤n} pj/(pj − pi) ]

After careful calculations to obtain the following, the proof is completed:

P{Sn = k} = ∑_{i=1}^{n} pi qi^{k−1} ∏_{j≠i} [pj/(pj − pi)].

71
Conditional distributions: discrete case

72
Conditional distributions: Discrete case

• Recall that, for any two events E and F, the conditional probability of
E given F is defined, provided that P(F) > 0, by

P(E|F) = P(EF)/P(F).

• Let X and Y be discrete random variables. The conditional probability


mass function of X given that Y = y is given by

pX|Y(x|y) = P{X = x | Y = y} = P{X = x, Y = y}/P{Y = y} = p(x, y)/pY(y)

for all values of y such that pY (y) > 0.

73
• Similarly, the conditional probability distribution function (cdf) of X
given that Y = y is defined, for all y such that pY (y) > 0, by

FX|Y(x|y) = P{X ≤ x | Y = y} = ∑_{a≤x} pX|Y(a|y).

• If X is independent of Y , then the conditional pmf and the conditional


cdf are the same as the respective unconditional ones, since

pX|Y(x|y) = P{X = x | Y = y} = P{X = x, Y = y}/P{Y = y}
          = P{X = x} P{Y = y}/P{Y = y}
          = P{X = x}.

74
Conditional distributions – Discrete case: Example 1

Suppose that p(x, y), the joint probability mass function of X and Y , is
given by

p(0, 0) = 0.4, p(0, 1) = 0.2, p(1, 0) = 0.1, p(1, 1) = 0.3.

Calculate the conditional pmf of X given that Y = 1.

75
Conditional distributions – Discrete case: Example 1

Solution:

Consider first pY(1):

pY(1) = ∑_x p(x, 1) = p(0, 1) + p(1, 1) = 0.5.

Thus,

pX|Y(0|1) = p(0, 1)/pY(1) = 0.2/0.5 = 0.4,

and

pX|Y(1|1) = p(1, 1)/pY(1) = 0.3/0.5 = 0.6.

76
Conditional distributions – Discrete case: Example 2

If X and Y are independent Poisson random variables with respective


parameters λ1 and λ2, calculate the conditional pmf of X given that
X + Y = n.

77
Conditional distributions – Discrete case: Example 2

Solution: For 0 ≤ k ≤ n, the required conditional pmf is given by

P{X = k | X + Y = n} = P{X = k, Y = n − k, X + Y = n}/P{X + Y = n}
                    = P{X = k, Y = n − k}/P{X + Y = n}
                    = P{X = k} · P{Y = n − k}/P{X + Y = n}

Recall that X + Y is a Poisson random variable with parameter λ1 + λ2.


Therefore,

78
P{X = k | X + Y = n} = [e^{−λ1} λ1^k/k! · e^{−λ2} λ2^{n−k}/(n − k)!] / [e^{−(λ1+λ2)} (λ1 + λ2)^n/n!]

                    = C(n, k) [λ1/(λ1 + λ2)]^k [λ2/(λ1 + λ2)]^{n−k}.

Based on the above, the conditional distribution of X given that X +Y = n


is the binomial distribution with parameters n and λ1/(λ1 + λ2).

79
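An empirical sketch of this result — my own, with illustrative values λ1 = 2, λ2 = 3, n = 6: condition simulated Poisson pairs on X + Y = n and compare the conditional frequencies of X with the Binomial(n, λ1/(λ1 + λ2)) pmf.

# Conditional law of X given X + Y = n for independent Poissons.
import numpy as np
from math import comb

rng = np.random.default_rng(8)
lam1, lam2, n = 2.0, 3.0, 6
x = rng.poisson(lam1, size=10**6)
y = rng.poisson(lam2, size=10**6)
cond = x[x + y == n]                     # samples of X given X + Y = n

q = lam1 / (lam1 + lam2)
for k in range(n + 1):
    emp = np.mean(cond == k)
    theo = comb(n, k) * q**k * (1 - q)**(n - k)
    print(f"k={k}: conditional frequency {emp:.4f}, binomial pmf {theo:.4f}")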
Conditional distributions – Discrete case: Example 3
Consider the multinomial distribution with joint pmf

P{Xi = ni, i = 1, . . . , k} = [n!/(n1! . . . nk!)] p1^{n1} . . . pk^{nk},   ni ≥ 0, ∑_{i=1}^{k} ni = n.

Such a mass function results when n independent trials are performed, with
each trial resulting in outcome i with probability pi, ∑_{i=1}^{k} pi = 1.

The random variables Xi, i = 1, . . . , k, represent, respectively, the


number of trials that result in outcome i, i = 1, . . . , k.

80
(Cont’d)

Suppose we are given that nj of the trials resulted in outcome j, for
j = r + 1, . . . , k, where ∑_{j=r+1}^{k} nj = m ≤ n. Then, each of the other
n − m trials must have resulted in one of the outcomes 1, . . . , r.

Show that the conditional distribution of X1, . . . , Xr is the multinomial
distribution on n − m trials with respective trial outcome probabilities

P{outcome i | outcome is not any of r + 1, . . . , k} = pi/Fr,   i = 1, . . . , r,

where Fr = ∑_{i=1}^{r} pi is the probability that a trial results in one of the
outcomes 1, . . . , r.

81
Conditional distributions – Discrete case: Example 3

Proof: Let n1, . . . , nr be such that ∑_{i=1}^{r} ni = n − m. The conditional
probability here is given by

P{X1 = n1, . . . , Xr = nr | Xr+1 = nr+1, . . . , Xk = nk}

= P{X1 = n1, . . . , Xr = nr, Xr+1 = nr+1, . . . , Xk = nk} / P{Xr+1 = nr+1, . . . , Xk = nk}

= [ (n!/(n1! . . . nk!)) p1^{n1} . . . pr^{nr} p_{r+1}^{n_{r+1}} . . . pk^{nk} ]
  / [ (n!/((n − m)! n_{r+1}! . . . nk!)) Fr^{n−m} p_{r+1}^{n_{r+1}} . . . pk^{nk} ],

where the probability in the denominator was obtained by regarding
outcomes 1, . . . , r as a single outcome having probability Fr, so that the
denominator is a multinomial probability on n trials with outcome
probabilities Fr, p_{r+1}, . . . , pk.

82
Note that ∑_{i=1}^{r} ni = n − m. The above conditional probability can be
further simplified as

P{X1 = n1, . . . , Xr = nr | Xr+1 = nr+1, . . . , Xk = nk}
= [(n − m)!/(n1! . . . nr!)] (p1/Fr)^{n1} (p2/Fr)^{n2} . . . (pr/Fr)^{nr}.

This concludes the proof.

83
Conditional distributions – Discrete case: Example 4

Consider n independent trials, with each trial being a success with


probability p. Given a total of k successes, show that all possible orderings
of the k successes and n − k failures are equally likely.

84
Conditional distributions – Discrete case: Example 4

Solution: Let a particular realization with k successes (and n − k failures)
be denoted Ak. For example, Ak = [s f s s . . .]. Denote by Tk the event
that out of n independent trials there are k successes. Then the conditional
probability P{Ak | Tk} is given by

P{Ak | Tk} = P{Ak, Tk}/P{Tk}
           = P{Ak}/P{Tk} = p^k (1 − p)^{n−k} / [C(n, k) p^k (1 − p)^{n−k}]
           = 1/C(n, k)

85
Conditional distributions: continuous case

86
Conditional distributions: continuous case
• If X and Y have a joint probability density function f (x, y), then for
all values of y such that fY (y) > 0, the conditional probability density
function of X given that Y = y is defined by

fX|Y(x|y) = f(x, y)/fY(y).

• Insight: Note that

fX|Y(x|y) dx = f(x, y) dx dy / [fY(y) dy] ≈ P{x ≤ X ≤ x + dx, y ≤ Y ≤ y + dy} / P{y ≤ Y ≤ y + dy}
            = P{x ≤ X ≤ x + dx | y ≤ Y ≤ y + dy}.

Therefore, for small values of dx and dy, fX|Y (x|y)dx represents the
conditional probability that X is between x and x + dx given that Y is
between y and y + dy.

87
Conditional distributions: continuous case

• Let X and Y be jointly continuous. Then conditional probabilities of


events associated with X given the value of Y can be determined, i.e.,
for any set A,

P{X ∈ A | Y = y} = ∫_A fX|Y(x|y) dx.

• In particular, let A = (−∞, a]. Define the conditional cumulative
distribution function of X given that Y = y by

FX|Y(a|y) = P{X ≤ a | Y = y} = ∫_{−∞}^{a} fX|Y(x|y) dx.

• One can interpret the definitions of the conditional probabilities by


considering a small interval around y, even though the event Y = y has
probability 0.

88
Conditional distributions – continuous case: Example 1

Suppose that the joint pdf of X and Y is given by


f(x, y) = [e^{−x/y} e^{−y}]/y for 0 < x < ∞, 0 < y < ∞, and f(x, y) = 0 otherwise.

Find P {X > 1|Y = y}.

89
Conditional distributions – continuous case: Example 1

Solution: Steps:

1. First obtain the conditional density of X given that Y = y:

fX|Y(x|y) = f(x, y)/fY(y) = f(x, y) / ∫_0^∞ f(x, y) dx = (1/y) e^{−x/y}

2. Hence,

P{X > 1 | Y = y} = ∫_1^∞ (1/y) e^{−x/y} dx = e^{−1/y}.

90
Conditional distributions – continuous case: Example 2
The bivariate normal distribution

One of the most important joint distributions is the bivariate normal


distribution.

• The random variables X, Y are said to have a bivariate normal distribution


if, for constants µx, µy , σx(σx > 0), σy (σy > 0), −1 < ρ < 1, their joint
density function is given, for all −∞ < x, y < ∞, by
f(x, y) = [1/(2π σx σy √(1 − ρ²))] exp{ −[1/(2(1 − ρ²))] [ ((x − µx)/σx)²
        + ((y − µy)/σy)² − 2ρ (x − µx)(y − µy)/(σx σy) ] }.

91
Conditional distributions – continuous case: Example 2
The bivariate normal distribution
We now determine the conditional density of X given that Y = y.

• Method: Continually collect all factors that do not depend on x and
represent them by the constants Ci. The final constant will then be
found by using the fact that ∫_{−∞}^{∞} fX|Y(x|y) dx = 1.
Details are given on the next page.
• Similarly, we can obtain the conditional density of Y given that X = x.

92
fX|Y(x|y) = f(x, y)/fY(y) = C1 f(x, y)

          = C2 exp{ −[1/(2(1 − ρ²))] [ ((x − µx)/σx)² − 2ρ x(y − µy)/(σx σy) ] }

          = C3 exp{ −[1/(2σx²(1 − ρ²))] [ x² − 2x( µx + ρ(σx/σy)(y − µy) ) ] }

          = C4 exp{ −[1/(2σx²(1 − ρ²))] [ x − ( µx + ρ(σx/σy)(y − µy) ) ]² }

The preceding expression is a normal density! Thus, given Y = y, the
random variable X is normally distributed with mean µx + ρ(σx/σy)(y − µy)
and variance σx²(1 − ρ²).

93
Conditional distributions – continuous case: Example 2
The bivariate normal distribution

• The random variables X, Y are said to have a bivariate normal distribution


if, for constants µx, µy , σx(σx > 0), σy (σy > 0), −1 < ρ < 1, their joint
density function is given, for all −∞ < x, y < ∞, by
f(x, y) = [1/(2π σx σy √(1 − ρ²))] exp{ −[1/(2(1 − ρ²))] [ ((x − µx)/σx)²
        + ((y − µy)/σy)² − 2ρ (x − µx)(y − µy)/(σx σy) ] }.

• From the above, we can find the marginal pdfs of X and Y , respectively.
In fact, one can show that X is normal with mean µx and variance σx².
Similarly, Y is normal with mean µy and variance σy².

94
Conditional distributions – continuous case: Discussions

• If X and Y are independent continuous random variables, the conditional


density of X given that Y = y is the unconditional density of X, since
in the independent case,

fX|Y(x|y) = f(x, y)/fY(y) = fX(x) fY(y)/fY(y) = fX(x).

• The conditional distributions can be defined when the random variables


are neither jointly continuous nor jointly discrete.

95
– For example, let X be a continuous rv with pdf f and let N be a
discrete rv. Consider the conditional probability density function of X
given that N = n.

fX|N(x|n) = lim_{dx→0} P{x < X < x + dx | N = n} / dx
          = lim_{dx→0} [P{N = n | x < X < x + dx} / P{N = n}] · [P{x < X < x + dx} / dx]
          = [P{N = n | X = x} / P{N = n}] · f(x)

96
Conditional distributions – continuous case: Example 3

Consider n + m trials having a common probability of success. Suppose,


however, that this success probability is not fixed in advance but is chosen
from a uniform (0, 1) population. What is the conditional distribution of
the success probability given that the n + m trials result in n successes?

97
Conditional distributions – continuous case: Example 3

Solution: Let X denote the probability that a given trial is a success,


then X is a uniform (0, 1) random variable. Given that X = x, the n + m
trials are independent with common probability of success x. Thus, the
number of successes, N , is a binomial random variable with parameters
(n + m, x). Hence, the conditional density of X given that N = n is
fX|N(x|n) = [P{N = n | X = x} / P{N = n}] fX(x)
          = [C(n + m, n) x^n (1 − x)^m / P{N = n}] fX(x)
          = c · x^n (1 − x)^m,   0 < x < 1      (using fX(x) = 1)

where c does not depend on x. Thus, the conditional density is that of a


beta random variable with parameters (n + 1, m + 1).

98
Conditional distributions – continuous case: Example 3

Based on the example, if the original or prior (to the collection of data)
distribution of a trial success probability is uniformly distributed over (0, 1)
[or, equivalently, is beta with parameters (1, 1)], then the posterior (or
conditional) distribution given a total of n successes in n + m trials is beta
with parameters (1 + n, 1 + m).

This result enhances our intuition as to what it means to assume that a


random variable has a beta distribution.

99
Order statistics

100
Order statistics

Let X1, X2, . . . , Xn be n i.i.d. continuous random variables having a


common density f and distribution function F . Define

X(1) = smallest of X1, X2, . . . , Xn


X(2) = second smallest of X1, X2, . . . , Xn
..

X(j) = j-th smallest of X1, X2, . . . , Xn


..

X(n) = largest of X1, X2, . . . , Xn

The ordered values X(1) ≤ X(2) ≤ . . . ≤ X(n) are known as the order
statistics corresponding to the random variables X1, X2, . . . , Xn, i.e.,
X(1), . . . , X(n) are the ordered values of X1, X2, . . . , Xn.

101
Joint density function of the order statistics
The order statistics X(1), . . . , X(n) will take on the values x1 ≤ x2 ≤
. . . ≤ xn if and only if, for some permutation (i1, i2, . . . , in) of (1, 2, . . . , n),
X1 = xi1, X2 = xi2, . . . , Xn = xin. Since, for any permutation (i1, . . . , in)
of (1, 2, . . . , n),

P{ xi1 − ϵ/2 < X1 ≤ xi1 + ϵ/2, . . . , xin − ϵ/2 < Xn ≤ xin + ϵ/2 }
  ≈ ϵ^n fX1,...,Xn(xi1, . . . , xin) = ϵ^n f(xi1) . . . f(xin) = ϵ^n f(x1) . . . f(xn)

for x1 < x2 < . . . < xn, we have

P{ x1 − ϵ/2 < X(1) ≤ x1 + ϵ/2, . . . , xn − ϵ/2 < X(n) ≤ xn + ϵ/2 } ≈ n! ϵ^n f(x1) . . . f(xn)

=⇒ fX(1),...,X(n)(x1, x2, . . . , xn) = n! f(x1) . . . f(xn),   x1 < x2 < . . . < xn.

102
Order statistics: Example 1

Along a road 1 mile long are 3 people “distributed at random”. Find


the probability that no 2 people are less than a distance of d miles apart
when d ≤ 1/2.

103
Order statistics: Example 1 – Solution

The meaning of “distributed at random” is taken as having the positions


of the 3 people as i.i.d. uniformly distributed over (0, 1) (miles). Let
Xi denote the position of the i-th person, then the desired probability is
P{X(3) > X(2) + d, X(2) > X(1) + d}.
Clearly, the desired probability can be calculated as

∫∫∫_{x3 > x2 + d, x2 > x1 + d} fX(1),X(2),X(3)(x1, x2, x3) dx1 dx2 dx3

= 3! ∫_0^{1−2d} [ ∫_{x1+d}^{1−d} ( ∫_{x2+d}^{1} dx3 ) dx2 ] dx1

= (1 − 2d)³

104
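A Monte Carlo check of this answer (my own sketch; d = 0.2 is an arbitrary choice, for which (1 − 2d)³ = 0.216).

# Three i.i.d. uniform(0, 1) positions; probability that all pairwise gaps exceed d.
import numpy as np

rng = np.random.default_rng(9)
d, n = 0.2, 10**6
pos = np.sort(rng.uniform(size=(n, 3)), axis=1)        # order statistics of each triple
ok = (pos[:, 1] - pos[:, 0] > d) & (pos[:, 2] - pos[:, 1] > d)
print("simulated:", ok.mean(), " exact:", (1 - 2 * d)**3)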
Joint probability distribution of functions of random
variables

105
Joint pdf of functions of random variables
Let X1 and X2 be jointly continuous rvs with joint pdf fX1,X2 . Let

Y1 = g1(X1, X2), Y2 = g2(X1, X2)

for some functions g1 and g2, where g1 and g2 satisfy the following:

1. The equations y1 = g1(x1, x2), y2 = g2(x1, x2) can be uniquely solved


for x1 and x2 in terms of y1 and y2, with solutions denoted as x1 =
h1(y1, y2), x2 = h2(y1, y2).
2. The functions g1 and g2 have continuous partial derivatives at all points
(x1, x2) and are such that the 2 × 2 determinant

J(x1, x2) = | ∂g1/∂x1   ∂g1/∂x2 |
            | ∂g2/∂x1   ∂g2/∂x2 |  =  (∂g1/∂x1)(∂g2/∂x2) − (∂g1/∂x2)(∂g2/∂x1) ≠ 0

at all points (x1, x2).

106
Joint pdf of functions of random variables

Under the two conditions stated in the previous page, it can be shown
that the random variables Y1 and Y2 are jointly continuous with joint density
function given by

fY1,Y2(y1, y2) = fX1,X2(x1, x2) |J(x1, x2)|^{−1},

where x1 = h1(y1, y2), x2 = h2(y1, y2).

• Note that J(x1, x2) is the determinant which can be positive or negative,
but |J(x1, x2)| is its absolute value. We have used the notation |A| as
the determinant of a square matrix A, and |a| for the absolute value of
a real number a.
• The proof of the result will be left as an exercise.

107
Joint pdf of functions of rvs: Example 1

Let X1 and X2 be jointly continuous random variables with probability


density function fX1,X2 . Let Y1 = X1 + X2, Y2 = X1 − X2. Find the joint
density function of Y1 and Y2 in terms of fX1,X2 .

108
Joint pdf of functions of rvs: Example 1

Solution: Let g1(x1, x2) = x1 + x2 and g2(x1, x2) = x1 − x2. Then

J(x1, x2) = | 1   1 |
            | 1  −1 |  =  −2.

Since the solution to the equations y1 = x1 + x2 and y2 = x1 − x2 is given
by x1 = (y1 + y2)/2, x2 = (y1 − y2)/2, it follows from the previous result
that the desired density is

fY1,Y2(y1, y2) = (1/2) fX1,X2( (y1 + y2)/2, (y1 − y2)/2 ).

109
Joint pdf of functions of rvs: Example 1 – Applications

1. If X1 and X2 are i.i.d. uniform (0, 1) random variables, then

fY1,Y2(y1, y2) = 1/2 for 0 ≤ y1 + y2 ≤ 2, 0 ≤ y1 − y2 ≤ 2, and 0 otherwise.

2. If X1 and X2 are independent exponential rvs with parameters λ1 and
λ2, respectively, then

fY1,Y2(y1, y2) = (λ1 λ2/2) e^{−λ1(y1+y2)/2} e^{−λ2(y1−y2)/2} for y1 + y2 ≥ 0, y1 − y2 ≥ 0,
and 0 otherwise.
110
Joint pdf of functions of rvs: Example 1 – Applications

3. If X1 and X2 are i.i.d. standard normal random variables, then

fY1,Y2(y1, y2) = [1/(4π)] e^{−(y1+y2)²/8} e^{−(y1−y2)²/8}
             = [1/(4π)] e^{−(y1²+y2²)/4}
             = [1/√(4π)] e^{−y1²/4} · [1/√(4π)] e^{−y2²/4}

Thus, not only do we obtain that both X1 + X2 and X1 − X2 are normal


with mean 0 and variance 2, but we also conclude that these two random
variables are independent.2
2
In fact, it can be shown that if X1 and X2 are independent random variables having a common
distribution function F , then X1 + X2 will be independent of X1 − X2 if and only if F is a normal
distribution function.

111
Joint pdf of functions of rvs: Example 2

Let (X, Y ) denote a random point in the plane, and assume that the
rectangular coordinates X and Y are independent standard normal random
variables.

Determine the joint pdf of R, Θ, the polar coordinate representation of
(X, Y), where R is the distance of the point from the origin and Θ is the
angle that the segment from the origin to (X, Y) makes with the x-axis.

112
Joint pdf of functions of rvs: Example 2

Solution: Suppose first that X, Y are both positive. For x > 0, y > 0, let

r = g1(x, y) = √(x² + y²),   θ = g2(x, y) = tan⁻¹(y/x).

We see that

∂g1/∂x = x/√(x² + y²),   ∂g1/∂y = y/√(x² + y²),
∂g2/∂x = −y/(x² + y²),   ∂g2/∂y = x/(x² + y²).

Then

J(x, y) = 1/√(x² + y²) = 1/r.

113
Since the conditional joint pdf given that X, Y are both positive is

f(x, y | X > 0, Y > 0) = f(x, y)/P(X > 0, Y > 0) = (2/π) e^{−(x²+y²)/2},   x > 0, y > 0,

we have the conditional joint pdf of R, Θ given that X, Y are both positive
given by

f(r, θ | X > 0, Y > 0) = (2r/π) e^{−r²/2},   r > 0, 0 < θ < π/2.

Similarly, one can find the following:

f(r, θ | X < 0, Y > 0) = (2r/π) e^{−r²/2},   r > 0, π/2 < θ < π
f(r, θ | X < 0, Y < 0) = (2r/π) e^{−r²/2},   r > 0, π < θ < 3π/2
f(r, θ | X > 0, Y < 0) = (2r/π) e^{−r²/2},   r > 0, 3π/2 < θ < 2π

114
Thus, the joint pdf of R, Θ is given by

f(r, θ) = [r/(2π)] e^{−r²/2},   r > 0, 0 < θ < 2π.

From the above, we can find the marginal pdf for R, which is a Rayleigh
distribution:

fR(r) = r e^{−r²/2},   r > 0.

This implies that

• Θ is uniformly distributed over (0, 2π);

• R and Θ are independent!!

115
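A simulation sketch of this polar-coordinate result (my own): for an i.i.d. standard normal pair, R should follow the Rayleigh distribution, Θ should be uniform on (0, 2π), and the two should be (at least) uncorrelated, a necessary consequence of independence.

# Polar coordinates of an i.i.d. standard normal pair (X, Y).
import numpy as np

rng = np.random.default_rng(10)
n = 10**6
x, y = rng.standard_normal(n), rng.standard_normal(n)
r = np.sqrt(x**2 + y**2)
theta = np.mod(np.arctan2(y, x), 2 * np.pi)            # angle mapped into (0, 2*pi)

print("P{R <= 1}     :", np.mean(r <= 1), " Rayleigh exact:", 1 - np.exp(-0.5))
print("mean of Theta :", theta.mean(), " uniform exact :", np.pi)
print("corr(R, Theta):", np.corrcoef(r, theta)[0, 1], " (should be near 0)")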
Joint pdf of functions of n rvs

Let the joint pdf of the n random variables X1, X2, . . . , Xn be given.
Consider the joint pdf of Y1, Y2, . . . , Yn, where

Y1 = g1(X1, . . . , Xn), Y2 = g2(X1, . . . , Xn), . . . , Yn = gn(X1, . . . , Xn).

Assume that

• the functions gi have continuous partial derivatives;


• and that the Jacobian determinant

J(x1, . . . , xn) = | ∂g1/∂x1   ∂g1/∂x2   . . .   ∂g1/∂xn |
                    | ∂g2/∂x1   ∂g2/∂x2   . . .   ∂g2/∂xn |
                    |   ...        ...    . . .     ...   |
                    | ∂gn/∂x1   ∂gn/∂x2   . . .   ∂gn/∂xn |   ≠ 0

116
at all points (x1, . . . , xn).
• Furthermore, we suppose that the equations y1 = g1(x1, . . . , xn), y2 =
g2(x1, . . . , xn), . . . , yn = gn(x1, . . . , xn) have a unique solution
denoted as

x1 = h1(y1, . . . , yn), . . . , xn = hn(y1, . . . , yn).

Under these assumptions, the joint pdf of the rvs Yi’s is given by

fY1,...,Yn(y1, . . . , yn) = fX1,...,Xn(x1, . . . , xn) |J(x1, . . . , xn)|^{−1}

where xi = hi(y1, . . . , yn), i = 1, 2, . . . , n.

117
Joint pdf of functions of n rvs: Example

Let X1, X2, . . . , Xn be i.i.d. exponential random variables with rate λ.


Let
Yi = X1 + . . . + Xi, i = 1, . . . , n.

(a) Find the joint density function of Y1, . . . , Yn.


(b) Use the result of part (a) to find the density of Yn.

118
Solution:

(a) The Jacobian here is given by

J = | 1  0  0  0  . . .  0 |
    | 1  1  0  0  . . .  0 |
    | 1  1  1  0  . . .  0 |
    | .  .  .  .  . . .  . |
    | 1  1  1  1  . . .  1 |   =  1

Since X1 = Y1, X2 = Y2 − Y1, X3 = Y3 − Y2, . . . , Xn = Yn − Yn−1, and

fX1,...,Xn(x1, . . . , xn) = λ^n e^{−λ ∑_{i=1}^{n} xi},   xi > 0 for all i,

119
we obtain

fY1,...,Yn(y1, . . . , yn) = λ^n e^{−λ(y1 + ∑_{i=2}^{n} (yi − yi−1))} = λ^n e^{−λ yn},

where 0 < y1 < y2 < . . . < yn−1 < yn.

(b) To obtain the marginal pdf of Yn, we can do the following:

fYn(yn) = λ^n e^{−λ yn} ∫_0^{yn} . . . [ ∫_0^{y4} ( ∫_0^{y3} ∫_0^{y2} dy1 dy2 ) dy3 ] . . . dyn−1

        = [λ^n yn^{n−1}/(n − 1)!] e^{−λ yn},   yn > 0.

Thus, Yn is a gamma random variable with parameters n and λ.

120
Exchangeable random variables

121
Exchangeable random variables

The random variables X1, X2, . . . , Xn are said to be exchangeable if, for
every permutation i1, . . . , in of the integers 1, . . . , n,

P {Xi1 ≤ x1, Xi2 ≤ x2, . . . , Xin ≤ xn} = P {X1 ≤ x1, X2 ≤ x2, . . . , Xn ≤ xn}

for all x1, . . . , xn.

That is, the n random variables are exchangeable if their joint distribution
is the same no matter in which order the variables are observed.

122
Exchangeable random variables

Discrete random variables will be exchangeable if

P {Xi1 = x1, Xi2 = x2, . . . , Xin = xn} = P {X1 = x1, X2 = x2, . . . , Xn = xn}

for all permutations i1, . . . , in and all values x1, . . . , xn.

This is equivalent to stating that p(x1, . . . , xn) = P {X1 = x1, . . . , Xn =


xn} is a symmetric function of the vector x1, . . . , xn, which means that its
value does not change when the values of the vector are permuted.

123
Exchangeable random variables: Example

Let X1, X2, . . . , Xn be independent uniform (0, 1) random variables,


and denote their order statistics by X(1), . . . , X(n). That is, X(j) is the jth
smallest of X1, X2, . . . , Xn. Also, let

Y1 =X(1),
Yi =X(i) − X(i−1), i = 2, . . . , n.

Show that Y1, . . . , Yn are exchangeable.

124
Solution: The transformations

y1 = x1, . . . , yi = xi − xi−1, i = 2, . . . , n

yield
xi = y1 + . . . + yi, i = 1, . . . , n.
The Jacobian of the preceding transformations is equal to 1. Thus,

fY1,...,Yn (y1, y2, . . . , yn) = f (y1, y1 + y2, . . . , y1 + . . . + yn)

where f is the joint density function of the order statistics. Hence, we


obtain that

fY1,...,Yn (y1, y2, . . . , yn) = n!, 0 < y1 < y1 + y2 < . . . < y1 + . . . + yn < 1,

or, equivalently,

fY1,...,Yn(y1, y2, . . . , yn) = n!,   0 < yi < 1, y1 + . . . + yn < 1.

125
Because the preceding joint density is a symmetric function of y1, . . . , yn,
we see that the random variables Y1, . . . , Yn are exchangeable.

126
Summary

• Joint distribution functions


• Independent random variables
• Sums of independent random variables
• Conditional distributions: discrete case
• Conditional distributions: continuous case
• Order statistics
• Joint probability distribution of functions of random variables
• Exchangeable random variables

127
