STATS 116

Theory of Probability
Jointly Distributed Random Variables
Prathapasinghe Dharmawansa

Department of Statistics
Stanford University

Summer 2018
Agenda

• Joint distribution functions


• Independent random variables
• Sums of independent random variables
• Conditional distributions: discrete case
• Conditional distributions: continuous case
• Order statistics
• Joint probability distribution of functions of random variables
• Exchangeable random variables

1
Joint distribution functions

2
Joint cdf of two random variables

Joint cdf of two random variables


For any two random variables X and Y , the joint cumulative
probability distribution function (cdf) of X and Y is given by

F (a, b) = P {X ≤ a, Y ≤ b}, −∞ < a, b < ∞

All joint probability statements about X and Y can, in theory, be


answered in terms of their joint distribution function.

3
• Obtaining the distribution of X from the joint cdf of X and Y :

FX(a) = P{X ≤ a} = P{X ≤ a, Y < ∞}
      = P{ lim_{b→∞} {X ≤ a, Y ≤ b} }
      = lim_{b→∞} P{X ≤ a, Y ≤ b}        (1)
      = lim_{b→∞} F(a, b) = F(a, ∞)      (2)

In (1) we have used the fact that probability is a continuous set (that
is, event) function. In (2), we have used the definition on the previous
page.
• Similarly, FY (b) = P {Y ≤ b} = lima→∞ F (a, b) = F (∞, b).
• FX (·) and FY (·) are referred to as the marginal distributions of X and
Y , respectively.

4
It can be shown that, whenever a1 < a2, b1 < b2,

P {a1 < X ≤ a2, b1 < Y ≤ b2}


= F (a2, b2) + F (a1, b1) − F (a1, b2) − F (a2, b1)

5
Joint probability mass function

Given two discrete random variables X and Y , the joint probability mass
function of X and Y is defined by

p(x, y) = P {X = x, Y = y}.

The probability mass function of X can be obtained from p(x, y) by



pX(x) = P{X = x} = ∑_{y: p(x,y)>0} p(x, y).

Similarly,

pY(y) = P{Y = y} = ∑_{x: p(x,y)>0} p(x, y).

6
Joint probability mass function: Example

Suppose that 3 balls are randomly selected from an urn containing 3


red, 4 white, and 5 blue balls. If we let X and Y denote, respectively,
the number of red and white balls chosen, then the joint probability mass
function of X and Y , p(i, j) = P {X = i, Y = j}, is given by
(writing C(n, k) for the binomial coefficient "n choose k")

p(0, 0) = C(5,3)/C(12,3) = 10/220,          p(0, 1) = C(4,1)C(5,2)/C(12,3) = 40/220,
p(0, 2) = C(4,2)C(5,1)/C(12,3) = 30/220,    p(0, 3) = C(4,3)/C(12,3) = 4/220,
p(1, 0) = C(3,1)C(5,2)/C(12,3) = 30/220,    p(1, 1) = C(3,1)C(4,1)C(5,1)/C(12,3) = 60/220,
p(1, 2) = C(3,1)C(4,2)/C(12,3) = 18/220,    p(2, 0) = C(3,2)C(5,1)/C(12,3) = 15/220,

7
p(2, 1) = C(3,2)C(4,1)/C(12,3) = 12/220,    p(3, 0) = C(3,3)/C(12,3) = 1/220.

i \ j                   0         1         2         3      P{X = i} (row sum)
0                    10/220    40/220    30/220     4/220       84/220
1                    30/220    60/220    18/220        0       108/220
2                    15/220    12/220        0         0        27/220
3                     1/220        0         0         0         1/220
P{Y = j} (col. sum)  56/220   112/220    48/220     4/220

8
In the previous table, the probability mass function (pmf) of X is
obtained by computing the row sums, whereas the pmf of Y is obtained
by computing the column sums. Because the individual pmfs of X and Y
thus appear in the margin of such a table, they are often referred to as the
marginal pmfs of X and Y , respectively.

9
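A small computational check of the table above — my own sketch, not part of the slides — using only Python's standard library; it rebuilds every p(i, j) from the counting argument on the previous pages.

# Joint pmf of (X, Y) = (# red, # white) when 3 balls are drawn without
# replacement from an urn with 3 red, 4 white, 5 blue (12 balls total).
from math import comb

def p(i, j):
    b = 3 - i - j                      # the remaining drawn balls must be blue
    if b < 0:
        return 0.0
    return comb(3, i) * comb(4, j) * comb(5, b) / comb(12, 3)

for i in range(4):
    row = [p(i, j) for j in range(4)]
    print(i, [round(v * 220) for v in row], "-> row sum", round(sum(row) * 220), "/220")
print("column sums:", [round(sum(p(i, j) for i in range(4)) * 220) for j in range(4)], "/220")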
Joint pdf of two random variables

Two random variables X and Y are said to be jointly continuous if


there exists a function f (x, y), defined for all real x and y, having
the property that, for every set C of pairs of real numbers (that
is, C is a set in the 2-D plane),
P{(X, Y) ∈ C} = ∫∫_{(x,y)∈C} f(x, y) dx dy.

The function f (x, y) is called the joint probability density function


of X and Y .

10
Joint pdf of two random variables

• If A and B are any sets of real numbers, then, by defining C = {(x, y) : x ∈ A, y ∈ B}, we have

P{X ∈ A, Y ∈ B} = ∫_B ∫_A f(x, y) dx dy.

• Since

F(a, b) = P{X ∈ (−∞, a], Y ∈ (−∞, b]} = ∫_{−∞}^{b} ∫_{−∞}^{a} f(x, y) dx dy,

it follows that

f(a, b) = ∂²F(a, b)/∂a∂b

wherever the partial derivatives are defined.

11
Joint pdf of two random variables

• Since

P{a < X < a + da, b < Y < b + db} = ∫_b^{b+db} ∫_a^{a+da} f(x, y) dx dy ≈ f(a, b) da db

when da and db are small and f (x, y) is continuous at a, b, one can


see that f (a, b) is a measure of how likely it is that the random vector
(X, Y ) will be near (a, b).

12
Joint pdf of two random variables

If X and Y are jointly continuous, they are individually continuous, and


their probability density functions can be obtained as follows:

P{X ∈ A} = P{X ∈ A, Y ∈ (−∞, ∞)} = ∫_A ∫_{−∞}^{∞} f(x, y) dy dx = ∫_A fX(x) dx

where

fX(x) = ∫_{−∞}^{∞} f(x, y) dy

is thus the probability density function of X. Similarly, the probability
density function of Y is given by

fY(y) = ∫_{−∞}^{∞} f(x, y) dx.

13
Joint pdf of two random variables

Example: The joint density function of X and Y is given by


f(x, y) = 2e^{−x} e^{−2y} for 0 < x < ∞, 0 < y < ∞, and f(x, y) = 0 otherwise.

Compute:

(a) P {X > 1, Y < 1}.


(b) P {X < Y }.
(c) P {X < a}.

14
Joint pdf of two random variables

Example: (solution) The joint density function of X and Y is given by


f(x, y) = 2e^{−x} e^{−2y} for 0 < x < ∞, 0 < y < ∞, and f(x, y) = 0 otherwise.

Compute:

(a) P{X > 1, Y < 1} = e^{−1}(1 − e^{−2}).
(b) P{X < Y} = 1/3.
(c) P{X < a} = 1 − e^{−a}.

15
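A quick Monte Carlo check of (a)-(c) — my own sketch, not from the slides; the value a = 1.5 below is an arbitrary illustrative choice.

# f(x, y) = 2 e^{-x} e^{-2y} factors, so X ~ Exp(rate 1), Y ~ Exp(rate 2), independent.
import numpy as np

rng = np.random.default_rng(0)
n = 10**6
x = rng.exponential(scale=1.0, size=n)   # scale = 1/rate
y = rng.exponential(scale=0.5, size=n)

a = 1.5
print("P{X>1, Y<1}:", np.mean((x > 1) & (y < 1)), "exact:", np.exp(-1) * (1 - np.exp(-2)))
print("P{X<Y}     :", np.mean(x < y), "exact:", 1 / 3)
print("P{X<a}     :", np.mean(x < a), "exact:", 1 - np.exp(-a))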
Joint pdf of two random variables
Example: Consider choosing a point which is uniformly distributed within
a circle of radius R with its center at the origin. Define X and Y to be the
coordinates of the chosen point. The joint density function of X and Y is
given by (for some value of c)

f(x, y) = c if x² + y² ≤ R², and f(x, y) = 0 if x² + y² > R².

(a) Determine c.
(b) Find the marginal density functions
of X and Y .
(c) Let D denote the distance from the
origin of the selected point. Compute
the probability that D
is less than or equal to a.
(d) Find E[D].

16
Solution:

(a) Determine c. (c = 1/(πR²).)
(b) Find the marginal density functions of X and Y .

fX(x) = 2√(R² − x²)/(πR²) for −R ≤ x ≤ R, and fX(x) = 0 otherwise
(by symmetry, fY has the same form).

(c) Let D denote the distance from the origin of the selected point.
Compute the probability that D is less than or equal to a.

P{D ≤ a} = a²/R²,   0 ≤ a ≤ R.

(d) Find E[D].

E[D] = 2R/3.

17
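A simulation sketch of this example (my own illustration; R = 2 and a = 1.2 are arbitrary choices). Points are drawn uniformly in the disk by rejection from the enclosing square, and the answers in (c) and (d) are compared with a²/R² and 2R/3.

# Uniform point in a disk of radius R centered at the origin.
import numpy as np

rng = np.random.default_rng(1)
R, m = 2.0, 3 * 10**6
x = rng.uniform(-R, R, size=m)
y = rng.uniform(-R, R, size=m)
keep = x**2 + y**2 <= R**2               # rejection step: keep points inside the disk
d = np.sqrt(x[keep]**2 + y[keep]**2)

a = 1.2
print("P{D<=a}:", np.mean(d <= a), "exact:", a**2 / R**2)
print("E[D]   :", d.mean(), "exact:", 2 * R / 3)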
Joint distributions of n random variables

• Joint probability distributions for n random variables can be defined in


exactly the same manner as for n = 2.
• The joint cdf F (a1, a2, . . . , an) of the n random variables X1, X2, . . . , Xn
is defined by

F (a1, a2, . . . , an) = P {X1 ≤ a1, X2 ≤ a2, . . . , Xn ≤ an}.

• The n random variables are said to be jointly continuous if there exists


a function f (x1, x2, . . . , xn), called the joint pdf, such that, for any set
C in n-dimensional space,

P{(X1, X2, . . . , Xn) ∈ C} = ∫ · · · ∫_{(x1,x2,...,xn)∈C} f(x1, . . . , xn) dx1 dx2 . . . dxn

18
In particular, for any n sets of real numbers A1, A2, . . . , An,

P{X1 ∈ A1, X2 ∈ A2, . . . , Xn ∈ An} = ∫_{An} ∫_{An−1} · · · ∫_{A1} f(x1, . . . , xn) dx1 dx2 . . . dxn

19
Example: The multinomial distribution

One of the most important joint distributions is the multinomial


distribution, which arises when a sequence of n independent and identical
experiments is performed.

Suppose that each experiment can result in any one of r possible
outcomes, with respective probabilities p1, p2, . . . , pr, where ∑_{i=1}^{r} pi = 1.

Let Xi denote the number out of the n experiments that result in the
i-th outcome (i = 1, . . . , r), then

P{X1 = n1, X2 = n2, . . . , Xr = nr} = [n!/(n1! n2! . . . nr!)] p1^{n1} p2^{n2} . . . pr^{nr}

where ∑_{i=1}^{r} ni = n. The joint distribution whose joint pmf is shown above
is called the multinomial distribution. When r = 2, the multinomial reduces
to the binomial distribution.

20
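A minimal sketch of the multinomial pmf — my own, with illustrative numbers (n = 8, r = 3, p = (0.5, 0.3, 0.2)); it evaluates the formula directly and cross-checks it against relative frequencies from numpy's multinomial sampler.

# P{X1 = n1, ..., Xr = nr} = n!/(n1! ... nr!) * p1^n1 ... pr^nr
from math import factorial
import numpy as np

def multinomial_pmf(counts, probs):
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)          # the running quotient stays an integer
    val = float(coef)
    for c, pr in zip(counts, probs):
        val *= pr**c
    return val

counts, probs = (3, 4, 1), (0.5, 0.3, 0.2)
print("formula   :", multinomial_pmf(counts, probs))

rng = np.random.default_rng(2)
draws = rng.multinomial(sum(counts), probs, size=200_000)
print("simulation:", np.mean(np.all(draws == counts, axis=1)))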
Independent random variables

21
Independent random variables

The random variables X and Y are said to be independent if, for


any two sets of real numbers A and B,

P {X ∈ A, Y ∈ B} = P {X ∈ A}P {Y ∈ B}. (⋆)

In other words, X and Y are independent if, for all A and B, the
events EA = {X ∈ A} and FB = {Y ∈ B} are independent.

Using the three axioms of probability, Equation (⋆) follows if and only
if, for all a, b,

P {X ≤ a, Y ≤ b} = P {X ≤ a}P {Y ≤ b}.

22
Independent random variables

• In terms of the joint distribution function F of X and Y : X and Y are


independent if for all a, b,

F (a, b) = FX (a)FY (b).

23
Independent random variables
When X and Y are discrete random variables, the condition of
independence in the definition (Equation (⋆)) is equivalent to, for all
x, y,
p(x, y) = pX (x)pY (y). (⋆1)
Discussion of the equivalence:

• Consider the one-point sets A = {x} and B = {y}. If Equation (⋆) is satisfied, then (⋆1) follows immediately.
• Furthermore, if Equation (⋆1) is valid, then, for any sets A, B,
P{X ∈ A, Y ∈ B} = ∑_{y∈B} ∑_{x∈A} p(x, y) = ∑_{y∈B} ∑_{x∈A} pX(x) pY(y)
               = ∑_{y∈B} pY(y) ∑_{x∈A} pX(x) = P{Y ∈ B} P{X ∈ A},

and Equation (⋆) is established.


24
Independent random variables

In the jointly continuous case, the condition of independence is equivalent


to, for all x, y,
f (x, y) = fX (x)fY (y).

Comments:

• Roughly speaking, X and Y are independent if knowing the value of one


does not change the distribution of the other.
• Random variables that are not independent are said to be dependent.

25
Example

Suppose that n + m independent trials having a common probability of


success p are performed. If X is the number of successes in the first n
trials, and Y is the number of successes in the final m trials, then X and Y
are independent, since knowing the number of successes in the first n trials
does not affect the distribution of the number of successes in the final m
trials (by the assumption of independent trials). In fact, for integer x and
y, 0 ≤ x ≤ n, 0 ≤ y ≤ m,
P{X = x, Y = y} = C(n, x) p^x (1 − p)^{n−x} · C(m, y) p^y (1 − p)^{m−y}
               = P{X = x} P{Y = y}

In contrast, X and Z will be dependent, where Z is the total number of


successes in the n + m trials. (Why?)

26
Example

Suppose that the number of people who enter a post office on a given
day is a Poisson random variable with parameter λ. Show that if each
person who enters the post office is a male with probability p and a female
with probability 1p, then the number of males and females entering the post
office are independent Poisson random variables with respective parameters
λp and λ(1 − p).

27
Steps in the solution: Let X and Y denote, respectively, the number
of males and females that enter the post office. We shall show the
independence of X and Y by establishing Equation (⋆1).

• Consider P {X = i, Y = j}.

P{X = i, Y = j} = P{X = i, Y = j | X + Y = i + j} P{X + Y = i + j}
               + P{X = i, Y = j | X + Y ≠ i + j} P{X + Y ≠ i + j},

where the second conditional probability equals 0.

• P{X + Y = i + j} follows from the Poisson assumption on the total number of arrivals.


• P {X = i, Y = j|X + Y = i + j} is binomial.
• Thus, we obtain P {X = i, Y = j}. From this, we can sum over i to get
P {Y = j}. Similarly, we can get P {X = i}.
• By comparing P {X = i, Y = j} with P {X = i}P {Y = j}, we obtain
the desired result.

28
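The conclusion can be checked empirically. Below is my own sketch (λ = 10 and p = 0.3 are illustrative choices): simulate the post office model, then compare the joint frequencies of (X, Y) with the product of Poisson(λp) and Poisson(λ(1 − p)) pmfs.

# Poisson thinning: N ~ Poisson(lam); each of the N arrivals is male w.p. p.
import numpy as np
from math import exp, factorial

rng = np.random.default_rng(3)
lam, p, days = 10.0, 0.3, 500_000
n = rng.poisson(lam, size=days)          # total arrivals per day
x = rng.binomial(n, p)                   # males
y = n - x                                # females

def pois_pmf(k, mu):
    return exp(-mu) * mu**k / factorial(k)

for i, j in [(2, 6), (3, 7), (4, 8)]:    # a few (i, j) pairs to compare
    emp = np.mean((x == i) & (y == j))
    theo = pois_pmf(i, lam * p) * pois_pmf(j, lam * (1 - p))
    print(f"P(X={i}, Y={j}): simulated {emp:.5f}, product of Poisson pmfs {theo:.5f}")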
Example

Two people decide to meet at a certain location. If each of them


independently arrives at a time uniformly distributed between 12 noon and
1 P.M., find the probability that the first to arrive has to wait longer than
10 minutes.

29
Solution: Let X and Y denote, respectively, the time past 12 that
Person 1 and Person 2 arrive. Clearly, X and Y are independent random
variables, each uniformly distributed over (0, 60).

The desired probability, P{|X − Y| > 10}, can be written as
P{Y > X + 10} + P{X > Y + 10} = 2P{Y > X + 10} by symmetry of X and Y.

2P{Y > X + 10} = 2 ∫_0^{50} (1/60) [ ∫_{x+10}^{60} (1/60) dy ] dx
              = 2 × 1250/3600
              = 25/36.

30
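A direct Monte Carlo check of this answer (my own sketch); the estimate should be close to 25/36 ≈ 0.694.

# Two independent arrival times, uniform on (0, 60); estimate P{|X - Y| > 10}.
import numpy as np

rng = np.random.default_rng(4)
n = 10**6
x = rng.uniform(0, 60, size=n)
y = rng.uniform(0, 60, size=n)
print("simulated:", np.mean(np.abs(x - y) > 10), " exact:", 25 / 36)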
Independent random variables

A necessary and sufficient condition for the random variables X and Y


to be independent is for their joint pdf (or joint pmf in the discrete case)
f (x, y) to factor into two terms, one depending only on x and the other
depending only on y.

Proposition:
The continuous (discrete) random variables X and Y are
independent if and only if their joint probability density (mass)
function can be expressed as

fX,Y (x, y) = h(x)g(y), −∞ < x < ∞, −∞ < y < ∞.

31
Proof: Consider the continuous case.

• =⇒: Independence implies that the joint density is the product of the
marginal densities of X and Y, so the factorization holds with h = fX and g = fY.
• ⇐=: Now, suppose that fX,Y(x, y) = h(x)g(y). Then

1 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y(x, y) dx dy = ∫_{−∞}^{∞} h(x) dx · ∫_{−∞}^{∞} g(y) dy = C1 C2,

where C1 = ∫_{−∞}^{∞} h(x) dx and C2 = ∫_{−∞}^{∞} g(y) dy. In addition, since by definition
fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy = C2 h(x) and fY(y) = ∫_{−∞}^{∞} fX,Y(x, y) dx = C1 g(y), we have

fX(x) fY(y) = C1 C2 h(x) g(y) = h(x) g(y) = fX,Y(x, y),

using C1 C2 = 1.

This shows that X, Y are independent.

32
Independent random variables

• In general, the n random variables X1, X2, . . . , Xn are said to be


independent if, for all sets of real numbers A1, A2, . . . , An,


P{X1 ∈ A1, X2 ∈ A2, . . . , Xn ∈ An} = ∏_{i=1}^{n} P{Xi ∈ Ai}.

• As before, it can be shown that this condition is equivalent to


P{X1 ≤ a1, X2 ≤ a2, . . . , Xn ≤ an} = ∏_{i=1}^{n} P{Xi ≤ ai},

for all a1, a2, . . . , an.


• Finally, an infinite collection of random variables is said to be independent
if every finite sub-collection of them is independent.

33
Independent random variables

Example:

Let X, Y, Z be independent and uniformly distributed over (0, 1).


Compute P {X ≥ Y Z}.

34
Independent random variables

Solution: Based on the given information,

fX,Y,Z(x, y, z) = fX(x) · fY(y) · fZ(z) = 1 for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, 0 ≤ z ≤ 1, and 0 otherwise.

Thus,

P{X ≥ YZ} = ∫_0^1 [ ∫_0^1 ( ∫_{yz}^1 dx ) dz ] dy = ∫_0^1 ∫_0^1 (1 − yz) dz dy = ∫_0^1 (1 − y/2) dy = 3/4.

35
Remark on the independence

Independence is a symmetric relation.

The random variables X and Y are independent if their joint density


function (or mass function in the discrete case) is the product of their
individual density (or mass) functions.

Therefore, to say that X is independent of Y is equivalent to saying


that Y is independent of X, or just that X and Y are independent.

36
Remark on the independence

Recall the chain rule for conditional probability:

P {X1 ≤ a1, . . . , Xn ≤ an} =P {X1 ≤ a1}P {X2 ≤ a2|X1 ≤ a1} . . .


. . . P {Xn ≤ an|X1 ≤ a1, . . . , Xn−1 ≤ an−1}.

From the above, one can see that the independence of X1, . . . , Xn can be
established sequentially, i.e., by showing that

• X2 is independent of X1
• X3 is independent of X1, X2
• X4 is independent of X1, X2, X3
• ...
• Xn is independent of X1, . . . , Xn−1.

37
Sums of independent random variables

38
Sums of independent random variables

Consider the distribution of X + Y when X and Y are independent,
continuous random variables with pdfs fX and fY. The cumulative
distribution function (cdf) of X + Y is obtained as follows:

FX+Y(a) = P{X + Y ≤ a} = ∫∫_{x+y≤a} fX(x) fY(y) dx dy
        = ∫_{−∞}^{∞} [ ∫_{−∞}^{a−y} fX(x) dx ] fY(y) dy
        = ∫_{−∞}^{∞} FX(a − y) fY(y) dy.

39
Sums of independent random variables

From the previous page,


FX+Y(a) = P{X + Y ≤ a} = ∫_{−∞}^{∞} FX(a − y) fY(y) dy.

Differentiating the above, we obtain the pdf fX+Y of X + Y, i.e.,

fX+Y(a) = dFX+Y(a)/da = ∫_{−∞}^{∞} [dFX(a − y)/da] fY(y) dy
        = ∫_{−∞}^{∞} fX(a − y) fY(y) dy.

The probability density function fX+Y is called the convolution of the pdfs
fX and fY (the pdfs of X and Y , respectively).1
1
Note that this part is different from the textbook.

40
Sum of i.i.d. uniform random variables

Note: Independent and identically distributed (i.i.d.)

Example: If X and Y are independent random variables, both uniformly


distributed on (0, 1), find the pdf of X + Y .

41
Sum of i.i.d. uniform random variables

Example: If X and Y are independent random variables, both uniformly


distributed on (0, 1), find the pdf of X + Y .
Solution: Clearly, fX(a) = fY(a) = 1 for 0 < a < 1, and 0 otherwise. Then

fX+Y(a) = ∫_{−∞}^{∞} fX(a − y) fY(y) dy = ∫_0^1 fX(a − y) dy

        = ∫_0^a 1 dy = a,             0 < a ≤ 1
        = ∫_{a−1}^1 1 dy = 2 − a,     1 ≤ a < 2
        = 0,                           otherwise.

Here X + Y is said to have a triangular distribution due to the shape of its


pdf (Draw it!).

42
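A sketch (my own) that compares a histogram of X + Y with the triangular density just derived; the empirical and exact values should agree up to binning and sampling error.

# Empirical density of X + Y for X, Y i.i.d. uniform(0, 1) vs. the triangular pdf.
import numpy as np

rng = np.random.default_rng(5)
s = rng.uniform(size=10**6) + rng.uniform(size=10**6)
hist, edges = np.histogram(s, bins=20, range=(0, 2), density=True)
centers = (edges[:-1] + edges[1:]) / 2
triangular = np.where(centers <= 1, centers, 2 - centers)
for c, h, t in zip(centers, hist, triangular):
    print(f"a = {c:.2f}   empirical {h:.3f}   triangular {t:.3f}")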
Sum of i.i.d. uniform random variables

Let X1, X2, . . . , Xn be i.i.d. uniform (0, 1) random variables, and let

Fn(x) = P {X1 + . . . + Xn ≤ x}.

Show that Fn(x) = x^n/n!, for 0 ≤ x ≤ 1.

Note that a general formula for Fn(x) is messy.

43
Sum of i.i.d. uniform random variables

Let X1, X2, . . . , Xn be i.i.d. uniform (0, 1) random variables, and let
Fn(x) = P{X1 + . . . + Xn ≤ x}. Show that Fn(x) = x^n/n!, for 0 ≤ x ≤ 1.

Proof: Use mathematical induction.

• Clearly, the result holds when n = 1.


• Suppose that the identity Fn−1(x) = x^{n−1}/(n − 1)!, 0 ≤ x ≤ 1, is true.
Then write ∑_{i=1}^{n} Xi = ∑_{i=1}^{n−1} Xi + Xn. From previous discussions,

Fn(x) = ∫_0^1 Fn−1(x − z) fXn(z) dz = ∫_0^x [(x − z)^{n−1}/(n − 1)!] dz = x^n/n!.

44
Sum of i.i.d. uniform random variables

Example: Determine the expected number of i.i.d. uniform (0, 1) random


variables that need to be summed to exceed 1. That is, with X1, X2, . . .
being i.i.d. uniform (0, 1) random variables, determine E[N ], where

N = min{n : X1 + . . . + Xn > 1}.

45
Sum of i.i.d. uniform random variables

Solution to the example shown on the previous page: N > n (for n > 0) if
and only if X1 + . . . + Xn ≤ 1. Thus, P{N > n} = Fn(1) = 1/n!, n > 0.
For n = 0, P{N > 0} = 1 = 1/0!. Therefore,

P{N = n} = P{N > n − 1} − P{N > n} = 1/(n − 1)! − 1/n! = (n − 1)/n!,   n ≥ 1.

With the above, E[N] can be readily calculated:

E[N] = ∑_{n=1}^{∞} n P{N = n} = ∑_{n=1}^{∞} n(n − 1)/n! = ∑_{n=0}^{∞} 1/n! = e.

That is, the average number of i.i.d. uniform (0, 1) random variables that
must be summed for the sum to exceed 1 is equal to e.

46
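A direct simulation of N (my own sketch, standard library only); the sample mean should be close to e ≈ 2.718.

# N = min{n : X1 + ... + Xn > 1} for i.i.d. uniform(0, 1) variables; E[N] = e.
import random

random.seed(6)
trials, total = 200_000, 0
for _ in range(trials):
    s, n = 0.0, 0
    while s <= 1.0:
        s += random.random()
        n += 1
    total += n
print("estimated E[N]:", total / trials, "  e =", 2.718281828)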
Gamma random variables

Recall: A gamma random variable with parameters (t, λ) has a density


of the form

f(y) = λ e^{−λy} (λy)^{t−1} / Γ(t),   0 < y < ∞

An important property of this family of distributions is that, for a fixed


value of λ, it is closed under convolutions.

47
Sum of independent gamma random variables

Proposition
If X and Y are independent gamma random variables with
respective parameters (s, λ) and (t, λ), then X + Y is a gamma
random variable with parameters (s + t, λ).

48
Proof: From previous discussions,

fX+Y(a) = ∫_{−∞}^{∞} fX(a − y) fY(y) dy

        = ∫_0^a [λ e^{−λ(a−y)} (λ(a − y))^{s−1} / Γ(s)] · [λ e^{−λy} (λy)^{t−1} / Γ(t)] dy

        = K e^{−λa} ∫_0^a (a − y)^{s−1} y^{t−1} dy

        = K a^{s+t−1} e^{−λa} ∫_0^1 (1 − x)^{s−1} x^{t−1} dx      (substituting x = y/a)

        = C a^{s+t−1} e^{−λa}

where C is a constant that does not depend on a. Since the pdf must
integrate to 1, the value of C is determined, and we have

fX+Y(a) = λ e^{−λa} (λa)^{s+t−1} / Γ(s + t).

49
Sum of independent gamma random variables

Using this Proposition and induction, the following can be readily


established.

If Xi, i = 1, . . . , n, are independent gamma random variables with
respective parameters (ti, λ), i = 1, . . . , n, then ∑_{i=1}^{n} Xi is gamma with
parameters (∑_{i=1}^{n} ti, λ).

50
Sum of independent gamma random variables

Example:

Let X1, X2, . . . , Xn be n i.i.d. exponential random variables, each


having parameter λ.

Then, since an exponential random variable with parameter λ is the


same as a gamma random variable with parameters (1, λ), it follows from
the Proposition that X1 + X2 + . . . + Xn is a gamma random variable with
parameters (n, λ).

51
Gamma and chi-squared random variables

Let Z1, Z2, . . . , Zn be i.i.d. standard normal random variables. Then
Y = ∑_{i=1}^{n} Zi² is said to have the chi-squared (χ²) distribution with n
degrees of freedom (χ²_n).

• Consider the case when n = 1, i.e., Y = Z1². From an example in
previous chapters, the pdf of Y is given by

fZ1²(y) = [1/(2√y)] [fZ1(√y) + fZ1(−√y)]
        = [1/(2√y)] · [2/√(2π)] e^{−y/2}
        = (1/2) e^{−y/2} (y/2)^{1/2 − 1} / √π

52
which is the gamma distribution with parameters (1/2, 1/2).
• A by-product of the above analysis is that

Γ(1/2) = √π.

• But since each Zi² is gamma (1/2, 1/2), using the previous proposition, it
follows that χ²_n is just the gamma distribution with parameters (n/2, 1/2)
and its pdf is given by

fY(y) = (1/2) e^{−y/2} (y/2)^{n/2 − 1} / Γ(n/2),   0 < y < ∞

53
Normal random variables

Proposition
If Xi, i = 1, . . . , n, are independent random variables that are
normally distributed with respective parameters (µi, σi²), i = 1, . . . , n,
then ∑_{i=1}^{n} Xi is normally distributed with parameters
(∑_{i=1}^{n} µi, ∑_{i=1}^{n} σi²).

54
Proof: Steps:

(1) To begin, let X and Y be independent normal rvs with X having mean
0 and variance σ² and Y having mean 0 and variance 1. It can be
shown that X + Y is normal with mean 0 and variance 1 + σ².
(2) Further, let X1 and X2 be independent normal rvs with Xi having
mean µi and variance σi², i = 1, 2. Then write

X1 + X2 = σ2 [ (X1 − µ1)/σ2 + (X2 − µ2)/σ2 ] + µ1 + µ2,

where (X1 − µ1)/σ2 is N(0, σ1²/σ2²) and (X2 − µ2)/σ2 is N(0, 1). By
Step (1), the sum inside the brackets is N(0, 1 + σ1²/σ2²), and therefore
X1 + X2 is N(µ1 + µ2, σ1² + σ2²).

(3) Finally, using induction, we can obtain the general case as shown in the
proposition.

55
Normal random variables: Example

A basketball team will play a 44-game season. Among the 44 games,


26 of them are against class A teams and 18 are against class B teams.
Suppose that the team will win each game against a class A team with
probability 0.4 and will win each game against a class B team with
probability 0.7. Suppose also that the results of the different games are
independent. Approximate the probability that

(a) the team wins 25 games or more;


(b) the team wins more games against class A teams than it does against
class B teams.

56
Solution: Let XA and XB respectively denote the number of games the
team wins against class A and against class B teams. XA and XB are
independent binomial random variables and

E[XA] =26(.4) = 10.4; Var(XA) = 26(.4)(.6) = 6.24


E[XB ] =18(.7) = 12.6; Var(XB ) = 18(.7)(.3) = 3.78

By the normal approximation to the binomial, XA and XB will have


approximately the same distribution as would independent normal random
variables with the preceding expected values and variances. Therefore,
XA + XB , XA − XB are also normal with respective means and variances.
The solution is complete by noting that the required probabilities are
given by

(a) P {XA + XB ≥ 25}


(b) P {XA − XB ≥ 1}

57
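A sketch of the final numerical step — my own, since the slides stop at setting up the probabilities. It uses a continuity correction (my choice, not stated in the slides) and an exact-model Monte Carlo for comparison.

# Normal approximation for (a) P{XA + XB >= 25} and (b) P{XA - XB >= 1}.
from math import erf, sqrt
import numpy as np

def norm_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mA, vA = 26 * 0.4, 26 * 0.4 * 0.6
mB, vB = 18 * 0.7, 18 * 0.7 * 0.3

# (a) XA + XB is approximately N(23, 10.02); with continuity correction use 24.5.
print("(a) normal approx:", 1 - norm_cdf((24.5 - (mA + mB)) / sqrt(vA + vB)))
# (b) XA - XB is approximately N(-2.2, 10.02); with continuity correction use 0.5.
print("(b) normal approx:", 1 - norm_cdf((0.5 - (mA - mB)) / sqrt(vA + vB)))

rng = np.random.default_rng(7)
xa = rng.binomial(26, 0.4, size=10**6)
xb = rng.binomial(18, 0.7, size=10**6)
print("(a) binomial simulation:", np.mean(xa + xb >= 25))
print("(b) binomial simulation:", np.mean(xa - xb >= 1))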
Lognormal random variables

The random variable Y is said to be a lognormal random variable


with parameters µ and σ if log(Y ) is a normal random variable
with mean µ and variance σ 2. That is, Y is lognormal if it can be
expressed as
Y = eX
where X is a normal random variable.

58
Lognormal random variables: Example

Starting at some fixed time, let S(n) denote the price of a certain security
at the end of n additional weeks, n ≥ 1. A popular model for the evolution
of these prices assumes that the price ratios S(n)/S(n − 1), n ≥ 1,
are independent and identically distributed lognormal random variables.
Assuming this model, with parameters µ = 0.0165, σ = 0.0730, what is the
probability that

(a) the price of the security increases over each of the next two weeks?
(b) the price at the end of two weeks is higher than it is today?

59
Solution: Let Z be a standard normal random variable.

(a) Note that x > 1 if and only if log(x) > log(1) = 0. As a result, we
have

P{S(1)/S(0) > 1} = P{log(S(1)/S(0)) > 0}
                = P{ [log(S(1)/S(0)) − µ]/σ > (0 − µ)/σ }
                = P{Z > −µ/σ},

where Z = [log(S(1)/S(0)) − µ]/σ is a standard normal random variable.

60
(b) Similarly, note that

P{S(2)/S(0) > 1} = P{ log( [S(1)/S(0)] · [S(2)/S(1)] ) > 0 }
                = P{ log(S(1)/S(0)) + log(S(2)/S(1)) > 0 } = P{Z1 + Z2 > 0},

where Z1 = log(S(1)/S(0)) and Z2 = log(S(2)/S(1)) are independent normal
random variables, each with mean µ and variance σ². Thus, Z1 + Z2 is
normal with mean 2µ and variance 2σ², and the same method as that used
in (a) can be applied.

61
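A numerical sketch of both answers (my own). For (a), the two weekly ratios are independent, so the one-week probability computed above is squared; for (b), log(S(2)/S(0)) is normal with mean 2µ and variance 2σ².

# Lognormal weekly price ratios with mu = 0.0165, sigma = 0.0730.
from math import erf, sqrt

mu, sigma = 0.0165, 0.0730

def norm_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

p_up_one_week = 1 - norm_cdf(-mu / sigma)          # P{S(1)/S(0) > 1}
print("(a) up in each of the next two weeks:", p_up_one_week**2)
print("(b) higher after two weeks          :", 1 - norm_cdf(-2 * mu / (sqrt(2) * sigma)))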
Sums of independent Poisson random variables

If X and Y are independent Poisson random variables with respective


parameters λ1 and λ2, compute the distribution of X + Y .

62
Sums of independent Poisson random variables

Solution: The event {X + Y = n} may be written as the union of the
disjoint events {X = k, Y = n − k}, 0 ≤ k ≤ n, so we have

P{X + Y = n} = ∑_{k=0}^{n} P{X = k, Y = n − k}

            = ∑_{k=0}^{n} P{X = k} P{Y = n − k}

            = ∑_{k=0}^{n} e^{−λ1} (λ1^k/k!) · e^{−λ2} (λ2^{n−k}/(n − k)!)

            = [e^{−(λ1+λ2)}/n!] ∑_{k=0}^{n} [n!/(k!(n − k)!)] λ1^k λ2^{n−k}

            = e^{−(λ1+λ2)} (λ1 + λ2)^n / n!.
Therefore, X + Y is Poisson with parameter λ1 + λ2.

63
Sums of independent binomial random variables

Let X and Y be independent binomial random variables with respective
parameters (n, p) and (m, p). Compute the distribution of X + Y.

64
Sums of independent binomial random variables

Solution: Note that C(n + m, i) = ∑_{k=0}^{i} C(n, k) C(m, i − k). Then

P{X + Y = i} = ∑_{k=0}^{i} P{X = k, Y = i − k}

            = ∑_{k=0}^{i} P{X = k} P{Y = i − k}

            = ∑_{k=0}^{i} C(n, k) p^k (1 − p)^{n−k} · C(m, i − k) p^{i−k} (1 − p)^{m−(i−k)}

            = p^i (1 − p)^{n+m−i} ∑_{k=0}^{i} C(n, k) C(m, i − k)

            = C(n + m, i) p^i (1 − p)^{n+m−i}   =⇒   X + Y is binomial (n + m, p)

65
Geometric random variables

Let X1, . . . , Xn be independent geometric random variables, with Xi
having parameter pi for i = 1, . . . , n. We have the following proposition on
the pmf of their sum Sn = ∑_{i=1}^{n} Xi.

Proposition
Let X1, . . . , Xn be independent geometric random variables, with
Xi having parameter pi for i = 1, . . . , n. Define qi = 1 − pi,
i = 1, . . . , n. If all the pi are distinct, then, for k ≥ n,

P{Sn = k} = ∑_{i=1}^{n} pi qi^{k−1} ∏_{j≠i} [pj/(pj − pi)].

66
Geometric random variables

Proof: The proof is based on induction. When n = 1, the result is
trivially true. Now consider the case when n = 2.

P{S2 = k} = P{X1 + X2 = k} = ∑_{j=0}^{k} P{X1 = j} P{X2 = k − j}

          = ∑_{j=1}^{k−1} P{X1 = j} P{X2 = k − j} = ∑_{j=1}^{k−1} q1^{j−1} p1 q2^{k−j−1} p2

          = p1 p2 q2^{k−2} ∑_{j=1}^{k−1} (q1/q2)^{j−1} = p1 p2 q2^{k−2} · [1 − (q1/q2)^{k−1}] / [1 − q1/q2]

          = p2 q2^{k−1} [p1/(p1 − p2)] + p1 q1^{k−1} [p2/(p2 − p1)]

67
Geometric random variables

Proof of the Proposition (cont’d):

Consider n = 3 to gain more insight.


P{S3 = k} = P{X1 + X2 + X3 = k} = ∑_{j=0}^{k} P{S2 = j} P{X3 = k − j}

          = ∑_{j=1}^{k−1} P{S2 = j} P{X3 = k − j}

          = p1 q1^{k−1} [p2/(p2 − p1)] [p3/(p3 − p1)] + p2 q2^{k−1} [p1/(p1 − p2)] [p3/(p3 − p2)]
          + p3 q3^{k−1} [p2/(p2 − p3)] [p1/(p1 − p3)]

68
Geometric random variables

Proof of the Proposition (cont’d):


Consider the general case. Suppose that the proposition holds true for
Sn−1 and for P {Sn = r}, r ≤ k − 1.

• To compute P {Sn = k}, we condition on whether Xn = 1.

P{Sn = k} = P{Sn = k | Xn = 1} pn + P{Sn = k | Xn > 1} qn

P{Sn = k | Xn = 1} = P{Sn−1 = k − 1 | Xn = 1}
                  = P{Sn−1 = k − 1}         (by independence)
                  = ∑_{i=1}^{n−1} pi qi^{k−2} ∏_{j≠i, j≤n−1} [pj/(pj − pi)]
69
• If X is geometric with parameter p, then the conditional distribution of
X given X > 1 is the same as the distribution of 1 (the first failed trial)
plus a geometric with parameter p (the number of additional trials after
the first until a success occurs). Therefore,

P{Sn = k | Xn > 1} = P{X1 + X2 + . . . + Xn + 1 = k}
                  = P{Sn = k − 1}
                  = ∑_{i=1}^{n} pi qi^{k−2} ∏_{j≠i, j≤n} [pj/(pj − pi)]

70
• We obtain P{Sn = k} as follows:

P{Sn = k} = pn [ ∑_{i=1}^{n−1} pi qi^{k−2} ∏_{j≠i, j≤n−1} pj/(pj − pi) ]
          + qn [ ∑_{i=1}^{n} pi qi^{k−2} ∏_{j≠i, j≤n} pj/(pj − pi) ]

After careful calculations to obtain the following, the proof is completed:

P{Sn = k} = ∑_{i=1}^{n} pi qi^{k−1} ∏_{j≠i} [pj/(pj − pi)].

71
Conditional distributions: discrete case

72
Conditional distributions: Discrete case

• Recall that, for any two events E and F, the conditional probability of
E given F is defined, provided that P(F) > 0, by

P(E|F) = P(EF)/P(F).

• Let X and Y be discrete random variables. The conditional probability


mass function of X given that Y = y is given by

pX|Y(x|y) = P{X = x | Y = y} = P{X = x, Y = y}/P{Y = y} = p(x, y)/pY(y)

for all values of y such that pY (y) > 0.

73
• Similarly, the conditional probability distribution function (cdf) of X
given that Y = y is defined, for all y such that pY (y) > 0, by

FX|Y(x|y) = P{X ≤ x | Y = y} = ∑_{a≤x} pX|Y(a|y).

• If X is independent of Y , then the conditional pmf and the conditional


cdf are the same as the respective unconditional ones, since

pX|Y(x|y) = P{X = x | Y = y} = P{X = x, Y = y}/P{Y = y}
          = P{X = x} P{Y = y}/P{Y = y}
          = P{X = x}.

74
Conditional distributions – Discrete case: Example 1

Suppose that p(x, y), the joint probability mass function of X and Y , is
given by

p(0, 0) = 0.4, p(0, 1) = 0.2, p(1, 0) = 0.1, p(1, 1) = 0.3.

Calculate the conditional pmf of X given that Y = 1.

75
Conditional distributions – Discrete case: Example 1

Solution:

Consider first pY(1):

pY(1) = ∑_x p(x, 1) = p(0, 1) + p(1, 1) = 0.5.

Thus,

pX|Y(0|1) = p(0, 1)/pY(1) = 0.2/0.5 = 0.4,

and

pX|Y(1|1) = p(1, 1)/pY(1) = 0.3/0.5 = 0.6.

76
Conditional distributions – Discrete case: Example 2

If X and Y are independent Poisson random variables with respective


parameters λ1 and λ2, calculate the conditional pmf of X given that
X + Y = n.

77
Conditional distributions – Discrete case: Example 2

Solution: For 0 ≤ k ≤ n, the required conditional pmf is given by

P{X = k | X + Y = n} = P{X = k, Y = n − k, X + Y = n}/P{X + Y = n}
                    = P{X = k, Y = n − k}/P{X + Y = n}
                    = P{X = k} · P{Y = n − k}/P{X + Y = n}

Recall that X + Y is a Poisson random variable with parameter λ1 + λ2.


Therefore,

78
P{X = k | X + Y = n} = [e^{−λ1} λ1^k/k! · e^{−λ2} λ2^{n−k}/(n − k)!] / [e^{−(λ1+λ2)} (λ1 + λ2)^n/n!]

                    = C(n, k) [λ1/(λ1 + λ2)]^k [λ2/(λ1 + λ2)]^{n−k}.

Based on the above, the conditional distribution of X given that X +Y = n


is the binomial distribution with parameters n and λ1/(λ1 + λ2).

79
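An empirical sketch of this result — my own, with illustrative values λ1 = 2, λ2 = 3, n = 6: condition simulated Poisson pairs on X + Y = n and compare the conditional frequencies of X with the Binomial(n, λ1/(λ1 + λ2)) pmf.

# Conditional law of X given X + Y = n for independent Poissons.
import numpy as np
from math import comb

rng = np.random.default_rng(8)
lam1, lam2, n = 2.0, 3.0, 6
x = rng.poisson(lam1, size=10**6)
y = rng.poisson(lam2, size=10**6)
cond = x[x + y == n]                     # samples of X given X + Y = n

q = lam1 / (lam1 + lam2)
for k in range(n + 1):
    emp = np.mean(cond == k)
    theo = comb(n, k) * q**k * (1 - q)**(n - k)
    print(f"k={k}: conditional frequency {emp:.4f}, binomial pmf {theo:.4f}")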
Conditional distributions – Discrete case: Example 3
Consider the multinomial distribution with joint pmf

P{Xi = ni, i = 1, . . . , k} = [n!/(n1! . . . nk!)] p1^{n1} . . . pk^{nk},   ni ≥ 0, ∑_{i=1}^{k} ni = n.

Such a mass function results when n independent trials are performed, with
each trial resulting in outcome i with probability pi, ∑_{i=1}^{k} pi = 1.

The random variables Xi, i = 1, . . . , k, represent, respectively, the


number of trials that result in outcome i, i = 1, . . . , k.

80
(Cont’d)

Suppose we are given that nj of the trials resulted in outcome j, for
j = r + 1, . . . , k, where ∑_{j=r+1}^{k} nj = m ≤ n. Then, each of the other
n − m trials must have resulted in one of the outcomes 1, . . . , r.

Show that the conditional distribution of X1, . . . , Xr is the multinomial
distribution on n − m trials with respective trial outcome probabilities

P{outcome i | outcome is not any of r + 1, . . . , k} = pi/Fr,   i = 1, . . . , r,

where Fr = ∑_{i=1}^{r} pi is the probability that a trial results in one of the
outcomes 1, . . . , r.

81
Conditional distributions – Discrete case: Example 3

Proof: Let n1, . . . , nr be such that ∑_{i=1}^{r} ni = n − m. The conditional
probability here is given by

P{X1 = n1, . . . , Xr = nr | Xr+1 = nr+1, . . . , Xk = nk}

= P{X1 = n1, . . . , Xr = nr, Xr+1 = nr+1, . . . , Xk = nk} / P{Xr+1 = nr+1, . . . , Xk = nk}

= [ (n!/(n1! . . . nk!)) p1^{n1} . . . pr^{nr} p_{r+1}^{n_{r+1}} . . . pk^{nk} ]
  / [ (n!/((n − m)! n_{r+1}! . . . nk!)) Fr^{n−m} p_{r+1}^{n_{r+1}} . . . pk^{nk} ],

where the probability in the denominator was obtained by regarding
outcomes 1, . . . , r as a single outcome having probability Fr, so that the
denominator is a multinomial probability on n trials with outcome
probabilities Fr, p_{r+1}, . . . , pk.

82
Note that ∑_{i=1}^{r} ni = n − m. The above conditional probability can be
further simplified as

P{X1 = n1, . . . , Xr = nr | Xr+1 = nr+1, . . . , Xk = nk}
= [(n − m)!/(n1! . . . nr!)] (p1/Fr)^{n1} (p2/Fr)^{n2} . . . (pr/Fr)^{nr}.

This concludes the proof.

83
Conditional distributions – Discrete case: Example 4

Consider n independent trials, with each trial being a success with


probability p. Given a total of k successes, show that all possible orderings
of the k successes and n − k failures are equally likely.

84
Conditional distributions – Discrete case: Example 4

Solution: Let a particular realization with k successes (and n − k failures)
be denoted Ak. For example, Ak = [s f s s . . .]. Denote by Tk the event
that out of n independent trials there are k successes. Then the conditional
probability P{Ak | Tk} is given by

P{Ak | Tk} = P{Ak, Tk}/P{Tk}
           = P{Ak}/P{Tk} = p^k (1 − p)^{n−k} / [C(n, k) p^k (1 − p)^{n−k}]
           = 1/C(n, k)

85
Conditional distributions: continuous case

86
Conditional distributions: continuous case
• If X and Y have a joint probability density function f (x, y), then for
all values of y such that fY (y) > 0, the conditional probability density
function of X given that Y = y is defined by

fX|Y(x|y) = f(x, y)/fY(y).

• Insight: Note that

fX|Y(x|y) dx = f(x, y) dx dy / [fY(y) dy] ≈ P{x ≤ X ≤ x + dx, y ≤ Y ≤ y + dy} / P{y ≤ Y ≤ y + dy}
            = P{x ≤ X ≤ x + dx | y ≤ Y ≤ y + dy}.

Therefore, for small values of dx and dy, fX|Y (x|y)dx represents the
conditional probability that X is between x and x + dx given that Y is
between y and y + dy.

87
Conditional distributions: continuous case

• Let X and Y be jointly continuous. Then conditional probabilities of


events associated with X given the value of Y can be determined, i.e.,
for any set A,

P{X ∈ A | Y = y} = ∫_A fX|Y(x|y) dx.

• In particular, let A = (−∞, a]. Define the conditional cumulative
distribution function of X given that Y = y by

FX|Y(a|y) = P{X ≤ a | Y = y} = ∫_{−∞}^{a} fX|Y(x|y) dx.

• One can interpret the definitions of the conditional probabilities by


considering a small interval around y, even though the event Y = y has
probability 0.

88
Conditional distributions – continuous case: Example 1

Suppose that the joint pdf of X and Y is given by


f(x, y) = [e^{−x/y} e^{−y}]/y for 0 < x < ∞, 0 < y < ∞, and f(x, y) = 0 otherwise.

Find P {X > 1|Y = y}.

89
Conditional distributions – continuous case: Example 1

Solution: Steps:

1. First obtain the conditional density of X given that Y = y:

fX|Y(x|y) = f(x, y)/fY(y) = f(x, y) / ∫_0^∞ f(x, y) dx = (1/y) e^{−x/y}

2. Hence,

P{X > 1 | Y = y} = ∫_1^∞ (1/y) e^{−x/y} dx = e^{−1/y}.

90
Conditional distributions – continuous case: Example 2
The bivariate normal distribution

One of the most important joint distributions is the bivariate normal


distribution.

• The random variables X, Y are said to have a bivariate normal distribution


if, for constants µx, µy , σx(σx > 0), σy (σy > 0), −1 < ρ < 1, their joint
density function is given, for all −∞ < x, y < ∞, by
f(x, y) = [1/(2π σx σy √(1 − ρ²))] exp{ −[1/(2(1 − ρ²))] [ ((x − µx)/σx)²
        + ((y − µy)/σy)² − 2ρ (x − µx)(y − µy)/(σx σy) ] }.

91
Conditional distributions – continuous case: Example 2
The bivariate normal distribution
We now determine the conditional density of X given that Y = y.

• Method: Continually collect all factors that do not depend on x and
represent them by the constants Ci. The final constant will then be
found by using the fact that ∫_{−∞}^{∞} fX|Y(x|y) dx = 1.
Details are given on the next page.
• Similarly, we can obtain the conditional density of Y given that X = x.

92
fX|Y(x|y) = f(x, y)/fY(y) = C1 f(x, y)

          = C2 exp{ −[1/(2(1 − ρ²))] [ ((x − µx)/σx)² − 2ρ x(y − µy)/(σx σy) ] }

          = C3 exp{ −[1/(2σx²(1 − ρ²))] [ x² − 2x( µx + ρ(σx/σy)(y − µy) ) ] }

          = C4 exp{ −[1/(2σx²(1 − ρ²))] [ x − ( µx + ρ(σx/σy)(y − µy) ) ]² }

The preceding expression is a normal density! Thus, given Y = y, the
random variable X is normally distributed with mean µx + ρ(σx/σy)(y − µy)
and variance σx²(1 − ρ²).

93
Conditional distributions – continuous case: Example 2
The bivariate normal distribution

• The random variables X, Y are said to have a bivariate normal distribution


if, for constants µx, µy , σx(σx > 0), σy (σy > 0), −1 < ρ < 1, their joint
density function is given, for all −∞ < x, y < ∞, by
f(x, y) = [1/(2π σx σy √(1 − ρ²))] exp{ −[1/(2(1 − ρ²))] [ ((x − µx)/σx)²
        + ((y − µy)/σy)² − 2ρ (x − µx)(y − µy)/(σx σy) ] }.

• From the above, we can find the marginal pdfs of X and Y , respectively.
In fact, one can show that X is normal with mean µx and variance σx².
Similarly, Y is normal with mean µy and variance σy².

94
Conditional distributions – continuous case: Discussions

• If X and Y are independent continuous random variables, the conditional


density of X given that Y = y is the unconditional density of X, since
in the independent case,

fX|Y(x|y) = f(x, y)/fY(y) = fX(x) fY(y)/fY(y) = fX(x).

• The conditional distributions can be defined when the random variables


are neither jointly continuous nor jointly discrete.

95
– For example, let X be a continuous rv with pdf f and let N be a
discrete rv. Consider the conditional probability density function of X
given that N = n.

fX|N(x|n) = lim_{dx→0} P{x < X < x + dx | N = n} / dx
          = lim_{dx→0} [P{N = n | x < X < x + dx} / P{N = n}] · [P{x < X < x + dx} / dx]
          = [P{N = n | X = x} / P{N = n}] · f(x)

96
Conditional distributions – continuous case: Example 3

Consider n + m trials having a common probability of success. Suppose,


however, that this success probability is not fixed in advance but is chosen
from a uniform (0, 1) population. What is the conditional distribution of
the success probability given that the n + m trials result in n successes?

97
Conditional distributions – continuous case: Example 3

Solution: Let X denote the probability that a given trial is a success,


then X is a uniform (0, 1) random variable. Given that X = x, the n + m
trials are independent with common probability of success x. Thus, the
number of successes, N , is a binomial random variable with parameters
(n + m, x). Hence, the conditional density of X given that N = n is
fX|N(x|n) = [P{N = n | X = x} / P{N = n}] fX(x)
          = [C(n + m, n) x^n (1 − x)^m / P{N = n}] fX(x)
          = c · x^n (1 − x)^m,   0 < x < 1      (using fX(x) = 1)

where c does not depend on x. Thus, the conditional density is that of a


beta random variable with parameters (n + 1, m + 1).

98
Conditional distributions – continuous case: Example 3

Based on the example, if the original or prior (to the collection of data)
distribution of a trial success probability is uniformly distributed over (0, 1)
[or, equivalently, is beta with parameters (1, 1)], then the posterior (or
conditional) distribution given a total of n successes in n + m trials is beta
with parameters (1 + n, 1 + m).

This result enhances our intuition as to what it means to assume that a


random variable has a beta distribution.

99
Order statistics

100
Order statistics

Let X1, X2, . . . , Xn be n i.i.d. continuous random variables having a


common density f and distribution function F . Define

X(1) = smallest of X1, X2, . . . , Xn


X(2) = second smallest of X1, X2, . . . , Xn
..

X(j) = j-th smallest of X1, X2, . . . , Xn


..

X(n) = largest of X1, X2, . . . , Xn

The ordered values X(1) ≤ X(2) ≤ . . . ≤ X(n) are known as the order
statistics corresponding to the random variables X1, X2, . . . , Xn, i.e.,
X(1), . . . , X(n) are the ordered values of X1, X2, . . . , Xn.

101
Joint density function of the order statistics
The order statistics X(1), . . . , X(n) will take on the values x1 ≤ x2 ≤
. . . ≤ xn if and only if, for some permutation (i1, i2, . . . , in) of (1, 2, . . . , n),
X1 = xi1, X2 = xi2, . . . , Xn = xin. Since, for any permutation (i1, . . . , in)
of (1, 2, . . . , n),

P{ xi1 − ϵ/2 < X1 ≤ xi1 + ϵ/2, . . . , xin − ϵ/2 < Xn ≤ xin + ϵ/2 }
  ≈ ϵ^n fX1,...,Xn(xi1, . . . , xin) = ϵ^n f(xi1) . . . f(xin) = ϵ^n f(x1) . . . f(xn)

for x1 < x2 < . . . < xn, we have

P{ x1 − ϵ/2 < X(1) ≤ x1 + ϵ/2, . . . , xn − ϵ/2 < X(n) ≤ xn + ϵ/2 } ≈ n! ϵ^n f(x1) . . . f(xn)

=⇒ fX(1),...,X(n)(x1, x2, . . . , xn) = n! f(x1) . . . f(xn),   x1 < x2 < . . . < xn.

102
Order statistics: Example 1

Along a road 1 mile long are 3 people “distributed at random”. Find


the probability that no 2 people are less than a distance of d miles apart
when d ≤ 1/2.

103
Order statistics: Example 1 – Solution

The meaning of “distributed at random” is taken as having the positions


of the 3 people as i.i.d. uniformly distributed over (0, 1) (miles). Let
Xi denote the position of the i-th person, then the desired probability is
P{X(3) > X(2) + d, X(2) > X(1) + d}.
Clearly, the desired probability can be calculated as

∫∫∫_{x3 > x2 + d, x2 > x1 + d} fX(1),X(2),X(3)(x1, x2, x3) dx1 dx2 dx3

= 3! ∫_0^{1−2d} [ ∫_{x1+d}^{1−d} ( ∫_{x2+d}^{1} dx3 ) dx2 ] dx1

= (1 − 2d)³

104
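A Monte Carlo check of this answer (my own sketch; d = 0.2 is an arbitrary choice, for which (1 − 2d)³ = 0.216).

# Three i.i.d. uniform(0, 1) positions; probability that all pairwise gaps exceed d.
import numpy as np

rng = np.random.default_rng(9)
d, n = 0.2, 10**6
pos = np.sort(rng.uniform(size=(n, 3)), axis=1)        # order statistics of each triple
ok = (pos[:, 1] - pos[:, 0] > d) & (pos[:, 2] - pos[:, 1] > d)
print("simulated:", ok.mean(), " exact:", (1 - 2 * d)**3)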
Joint probability distribution of functions of random
variables

105
Joint pdf of functions of random variables
Let X1 and X2 be jointly continuous rvs with joint pdf fX1,X2 . Let

Y1 = g1(X1, X2), Y2 = g2(X1, X2)

for some functions g1 and g2, where g1 and g2 satisfy the following:

1. The equations y1 = g1(x1, x2), y2 = g2(x1, x2) can be uniquely solved


for x1 and x2 in terms of y1 and y2, with solutions denoted as x1 =
h1(y1, y2), x2 = h2(y1, y2).
2. The functions g1 and g2 have continuous partial derivatives at all points
(x1, x2) and are such that the 2 × 2 determinant

J(x1, x2) = | ∂g1/∂x1   ∂g1/∂x2 |
            | ∂g2/∂x1   ∂g2/∂x2 |  =  (∂g1/∂x1)(∂g2/∂x2) − (∂g1/∂x2)(∂g2/∂x1) ≠ 0

at all points (x1, x2).

106
Joint pdf of functions of random variables

Under the two conditions stated in the previous page, it can be shown
that the random variables Y1 and Y2 are jointly continuous with joint density
function given by

fY1,Y2(y1, y2) = fX1,X2(x1, x2) |J(x1, x2)|^{−1},

where x1 = h1(y1, y2), x2 = h2(y1, y2).

• Note that J(x1, x2) is the determinant which can be positive or negative,
but |J(x1, x2)| is its absolute value. We have used the notation |A| as
the determinant of a square matrix A, and |a| for the absolute value of
a real number a.
• The proof of the result will be left as an exercise.

107
Joint pdf of functions of rvs: Example 1

Let X1 and X2 be jointly continuous random variables with probability


density function fX1,X2 . Let Y1 = X1 + X2, Y2 = X1 − X2. Find the joint
density function of Y1 and Y2 in terms of fX1,X2 .

108
Joint pdf of functions of rvs: Example 1

Solution: Let g1(x1, x2) = x1 + x2 and g2(x1, x2) = x1 − x2. Then

J(x1, x2) = | 1   1 |
            | 1  −1 |  =  −2.

Since the solution to the equations y1 = x1 + x2 and y2 = x1 − x2 is given
by x1 = (y1 + y2)/2, x2 = (y1 − y2)/2, it follows from the previous result
that the desired density is

fY1,Y2(y1, y2) = (1/2) fX1,X2( (y1 + y2)/2, (y1 − y2)/2 ).

109
Joint pdf of functions of rvs: Example 1 – Applications

1. If X1 and X2 are i.i.d. uniform (0, 1) random variables, then

fY1,Y2(y1, y2) = 1/2 for 0 ≤ y1 + y2 ≤ 2, 0 ≤ y1 − y2 ≤ 2, and 0 otherwise.

2. If X1 and X2 are independent exponential rvs with parameters λ1 and
λ2, respectively, then

fY1,Y2(y1, y2) = (λ1 λ2/2) e^{−λ1(y1+y2)/2} e^{−λ2(y1−y2)/2} for y1 + y2 ≥ 0, y1 − y2 ≥ 0,
and 0 otherwise.
110
Joint pdf of functions of rvs: Example 1 – Applications

3. If X1 and X2 are i.i.d. standard normal random variables, then

fY1,Y2(y1, y2) = [1/(4π)] e^{−(y1+y2)²/8} e^{−(y1−y2)²/8}
             = [1/(4π)] e^{−(y1²+y2²)/4}
             = [1/√(4π)] e^{−y1²/4} · [1/√(4π)] e^{−y2²/4}

Thus, not only do we obtain that both X1 + X2 and X1 − X2 are normal


with mean 0 and variance 2, but we also conclude that these two random
variables are independent.2
2
In fact, it can be shown that if X1 and X2 are independent random variables having a common
distribution function F , then X1 + X2 will be independent of X1 − X2 if and only if F is a normal
distribution function.

111
Joint pdf of functions of rvs: Example 2

Let (X, Y ) denote a random point in the plane, and assume that the
rectangular coordinates X and Y are independent standard normal random
variables.

Determine the joint pdf of R, Θ, the polar coordinate representation of
(X, Y), where R is the distance of the point from the origin and Θ is the
angle that the segment from the origin to (X, Y) makes with the x-axis.

112
Joint pdf of functions of rvs: Example 2

Solution: Suppose first that X, Y are both positive. For x > 0, y > 0, let

r = g1(x, y) = √(x² + y²),   θ = g2(x, y) = tan⁻¹(y/x).

We see that

∂g1/∂x = x/√(x² + y²),   ∂g1/∂y = y/√(x² + y²),
∂g2/∂x = −y/(x² + y²),   ∂g2/∂y = x/(x² + y²).

Then

J(x, y) = 1/√(x² + y²) = 1/r.

113
Since the conditional joint pdf given that X, Y are both positive is

f(x, y | X > 0, Y > 0) = f(x, y)/P(X > 0, Y > 0) = (2/π) e^{−(x²+y²)/2},   x > 0, y > 0,

we have the conditional joint pdf of R, Θ given that X, Y are both positive
given by

f(r, θ | X > 0, Y > 0) = (2r/π) e^{−r²/2},   r > 0, 0 < θ < π/2.

Similarly, one can find the following:

f(r, θ | X < 0, Y > 0) = (2r/π) e^{−r²/2},   r > 0, π/2 < θ < π
f(r, θ | X < 0, Y < 0) = (2r/π) e^{−r²/2},   r > 0, π < θ < 3π/2
f(r, θ | X > 0, Y < 0) = (2r/π) e^{−r²/2},   r > 0, 3π/2 < θ < 2π

114
Thus, the joint pdf of R, Θ is given by

f(r, θ) = [r/(2π)] e^{−r²/2},   r > 0, 0 < θ < 2π.

From the above, we can find the marginal pdf for R, which is a Rayleigh
distribution:

fR(r) = r e^{−r²/2},   r > 0.

This implies that

• Θ is uniformly distributed over (0, 2π);

• R and Θ are independent!!

115
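A simulation sketch of this polar-coordinate result (my own): for an i.i.d. standard normal pair, R should follow the Rayleigh distribution, Θ should be uniform on (0, 2π), and the two should be (at least) uncorrelated, a necessary consequence of independence.

# Polar coordinates of an i.i.d. standard normal pair (X, Y).
import numpy as np

rng = np.random.default_rng(10)
n = 10**6
x, y = rng.standard_normal(n), rng.standard_normal(n)
r = np.sqrt(x**2 + y**2)
theta = np.mod(np.arctan2(y, x), 2 * np.pi)            # angle mapped into (0, 2*pi)

print("P{R <= 1}     :", np.mean(r <= 1), " Rayleigh exact:", 1 - np.exp(-0.5))
print("mean of Theta :", theta.mean(), " uniform exact :", np.pi)
print("corr(R, Theta):", np.corrcoef(r, theta)[0, 1], " (should be near 0)")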
Joint pdf of functions of n rvs

Let the joint pdf of the n random variables X1, X2, . . . , Xn be given.
Consider the joint pdf of Y1, Y2, . . . , Yn, where

Y1 = g1(X1, . . . , Xn), Y2 = g2(X1, . . . , Xn), . . . , Yn = gn(X1, . . . , Xn).

Assume that

• the functions gi have continuous partial derivatives;


• and that the Jacobian determinant

J(x1, . . . , xn) = | ∂g1/∂x1   ∂g1/∂x2   . . .   ∂g1/∂xn |
                    | ∂g2/∂x1   ∂g2/∂x2   . . .   ∂g2/∂xn |
                    |   ...        ...    . . .     ...   |
                    | ∂gn/∂x1   ∂gn/∂x2   . . .   ∂gn/∂xn |   ≠ 0

116
at all points (x1, . . . , xn).
• Furthermore, we suppose that the equations y1 = g1(x1, . . . , xn), y2 =
g2(x1, . . . , xn), . . . , yn = gn(x1, . . . , xn) have a unique solution
denoted as

x1 = h1(y1, . . . , yn), . . . , xn = hn(y1, . . . , yn).

Under these assumptions, the joint pdf of the rvs Yi’s is given by

fY1,...,Yn(y1, . . . , yn) = fX1,...,Xn(x1, . . . , xn) |J(x1, . . . , xn)|^{−1}

where xi = hi(y1, . . . , yn), i = 1, 2, . . . , n.

117
Joint pdf of functions of n rvs: Example

Let X1, X2, . . . , Xn be i.i.d. exponential random variables with rate λ.


Let
Yi = X1 + . . . + Xi, i = 1, . . . , n.

(a) Find the joint density function of Y1, . . . , Yn.


(b) Use the result of part (a) to find the density of Yn.

118
Solution:

(a) The Jacobian here is given by

J = | 1  0  0  0  . . .  0 |
    | 1  1  0  0  . . .  0 |
    | 1  1  1  0  . . .  0 |
    | .  .  .  .  . . .  . |
    | 1  1  1  1  . . .  1 |   =  1

Since X1 = Y1, X2 = Y2 − Y1, X3 = Y3 − Y2, . . . , Xn = Yn − Yn−1, and

fX1,...,Xn(x1, . . . , xn) = λ^n e^{−λ ∑_{i=1}^{n} xi},   xi > 0 for all i,

119
we obtain

fY1,...,Yn(y1, . . . , yn) = λ^n e^{−λ(y1 + ∑_{i=2}^{n} (yi − yi−1))} = λ^n e^{−λ yn},

where 0 < y1 < y2 < . . . < yn−1 < yn.

(b) To obtain the marginal pdf of Yn, we can do the following:

fYn(yn) = λ^n e^{−λ yn} ∫_0^{yn} . . . [ ∫_0^{y4} ( ∫_0^{y3} ∫_0^{y2} dy1 dy2 ) dy3 ] . . . dyn−1

        = [λ^n yn^{n−1}/(n − 1)!] e^{−λ yn},   yn > 0.

Thus, Yn is a gamma random variable with parameters n and λ.

120
Exchangeable random variables

121
Exchangeable random variables

The random variables X1, X2, . . . , Xn are said to be exchangeable if, for
every permutation i1, . . . , in of the integers 1, . . . , n,

P {Xi1 ≤ x1, Xi2 ≤ x2, . . . , Xin ≤ xn} = P {X1 ≤ x1, X2 ≤ x2, . . . , Xn ≤ xn}

for all x1, . . . , xn.

That is, the n random variables are exchangeable if their joint distribution
is the same no matter in which order the variables are observed.

122
Exchangeable random variables

Discrete random variables will be exchangeable if

P {Xi1 = x1, Xi2 = x2, . . . , Xin = xn} = P {X1 = x1, X2 = x2, . . . , Xn = xn}

for all permutations i1, . . . , in and all values x1, . . . , xn.

This is equivalent to stating that p(x1, . . . , xn) = P {X1 = x1, . . . , Xn =


xn} is a symmetric function of the vector x1, . . . , xn, which means that its
value does not change when the values of the vector are permuted.

123
Exchangeable random variables: Example

Let X1, X2, . . . , Xn be independent uniform (0, 1) random variables,


and denote their order statistics by X(1), . . . , X(n). That is, X(j) is the jth
smallest of X1, X2, . . . , Xn. Also, let

Y1 =X(1),
Yi =X(i) − X(i−1), i = 2, . . . , n.

Show that Y1, . . . , Yn are exchangeable.

124
Solution: The transformations

y1 = x1, . . . , yi = xi − xi−1, i = 2, . . . , n

yield
xi = y1 + . . . + yi, i = 1, . . . , n.
The Jacobian of the preceding transformations is equal to 1. Thus,

fY1,...,Yn (y1, y2, . . . , yn) = f (y1, y1 + y2, . . . , y1 + . . . + yn)

where f is the joint density function of the order statistics. Hence, we


obtain that

fY1,...,Yn (y1, y2, . . . , yn) = n!, 0 < y1 < y1 + y2 < . . . < y1 + . . . + yn < 1,

or, equivalently,

fY1,...,Yn(y1, y2, . . . , yn) = n!,   0 < yi < 1, y1 + . . . + yn < 1.

125
Because the preceding joint density is a symmetric function of y1, . . . , yn,
we see that the random variables Y1, . . . , Yn are exchangeable.

126
Summary

• Joint distribution functions


• Independent random variables
• Sums of independent random variables
• Conditional distributions: discrete case
• Conditional distributions: continuous case
• Order statistics
• Joint probability distribution of functions of random variables
• Exchangeable random variables

127
