Core course for Quantitative Finance, Risk Management and Statistics Majors:
Definition 4.1.
A random variable X : Ω → R is a numerical valued function defined on a sample space. In other
words, a number X(ω), providing a measure of the characteristic of interest, is assigned to each
outcome ω in the sample space.
Remark
Always keep in mind that X is a function rather than a number. The value of X depends on the
outcome. We write X = x to represent the event {ω ∈ Ω|X(ω) = x} and X ≤ x to represent the
event {ω ∈ Ω|X(ω) ≤ x}.
Example 4.1.
Let X be the number of aces in a hand of three cards drawn randomly from a deck of 52 cards.
Denote A as an ace card and N as a non-ace card. Then,
Ω = {AAA, AAN, ANA, ANN, NAA, NAN, NNA, NNN}.
The space of X is X(Ω) = {0, 1, 2, 3}. Hence, X is discrete.
X : Ω → {0, 1, 2, 3} such that
X(AAA) = 3,
X(AAN) = X(ANA) = X(NAA) = 2,
X(ANN) = X(NAN) = X(NNA) = 1,
X(NNN) = 0.
Example 4.2.
The annual income ω of a randomly selected citizen has a sample space Ω = [0, ∞). Suppose the
annual income is taxable if it exceeds c. Let X be the taxable income. Then the space of X is also
[0, ∞) and X : Ω → [0, ∞) such that
X(ω) = 0 if ω ≤ c, and X(ω) = ω − c if ω > c.
Note
Conventionally, we use capital letters X, Y, . . . to denote the random variables and small letters
x, y, . . . to denote the possible numerical values (or realizations) of these variables.
Definition 4.2.
A random variable X defined on the sample space Ω is called a discrete random variable if X(Ω) =
{X(ω) : ω ∈ Ω} is countable (e.g., X : Ω → {0, 1, 2, . . .}).
Definition 4.3.
The probability mass function (pmf) of a discrete random variable X is defined as
p(x) = Pr(X = x), x ∈ X(Ω).
Example 4.3.
For the previous example of card drawing, the pmf of X is
p(x) = Pr(X = x) = C(4, x) C(48, 3 − x) / C(52, 3), x = 0, 1, 2, 3.
Example 4.4.
Is p(x) = x/6, x = 1, 2, 3, a valid pmf?
[Figure: plot of the pmf, with p(1) = 1/6, p(2) = 1/3, p(3) = 1/2.]
X(Ω) = {1, 2, 3}
1. p(x) = x/6 > 0 for all x = 1, 2, 3;
2. Σ_{x=1}^{3} p(x) = 1/6 + 1/3 + 1/2 = 1;
3. Pr(X ≤ 2) = p(1) + p(2) = 1/2.
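As a quick numerical illustration, the two pmf conditions can be checked directly; a minimal Python sketch (using exact fractions to avoid rounding):

```python
# A minimal sketch checking the pmf conditions of Example 4.4, using exact fractions.
from fractions import Fraction

pmf = {x: Fraction(x, 6) for x in (1, 2, 3)}

# Condition 1: p(x) > 0 on the whole space X(Ω) = {1, 2, 3}.
assert all(p > 0 for p in pmf.values())

# Condition 2: the probabilities sum to one.
assert sum(pmf.values()) == 1

# An event probability: Pr(X <= 2) = p(1) + p(2).
print(sum(p for x, p in pmf.items() if x <= 2))  # 1/2
```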
Definition 4.4.
The (cumulative) distribution function (cdf ) of a discrete random variable X is defined as
F(x) = Pr(X ≤ x) = Σ_{t≤x} p(t), −∞ < x < ∞.
Example 4.5.
Continue with Example 4.4., p(x) = x/6, x = 1, 2, 3. We have
F(1) = Pr(X ≤ 1) = Pr(X = 1) = 1/6,
F(1.5) = Pr(X ≤ 1.5) = Pr(X = 1) = 1/6 = F(1.2601) = F(1.999999) = · · · ,
F(2) = Pr(X ≤ 2) = p(1) + p(2) = 1/2,
F(3) = Pr(X ≤ 3) = p(1) + p(2) + p(3) = 1.
[Figure: plot of the step-function cdf F(x), with jumps to 1/6 at x = 1, to 1/2 at x = 2, and to 1 at x = 3.]
As can be seen, the cdf of a discrete random variable is a step function, with p(x) as the size of the jump at each possible value x.
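The summation defining F can be carried out mechanically; a minimal Python sketch of the step-function cdf of Example 4.5.:

```python
# A minimal sketch of the step-function cdf in Example 4.5, with p(x) = x/6 on {1, 2, 3}.
from fractions import Fraction

support = [1, 2, 3]
pmf = {x: Fraction(x, 6) for x in support}

def F(x):
    """cdf: sum p(t) over all support points t <= x."""
    return sum(pmf[t] for t in support if t <= x)

print(F(1), F(1.5), F(2), F(3))  # 1/6 1/6 1/2 1
```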
Properties of a cdf
4. F(x) is right continuous. That is, for any b and any decreasing sequence {b_n, n ≥ 1} that converges to b, lim_{n→∞} F(b_n) = F(b).
5. F (x) is a step-function if X is a discrete random variable. The step size at the point x ∈ X(Ω)
is Pr(X = x).
6. The cdfs are useful in describing probability models. The probabilities attributed to events
concerning a random variable X can be reconstructed from the cdf of X, i.e., the cdf of X
completely specifies the random behaviour of X.
For game 2,
Pr(Y = 1) = 3 × (1/6) × (5/6) × (5/6) = 25/72,
Pr(Y = 2) = 3 × (5/6) × (1/6) × (1/6) = 5/72,
Pr(Y = 3) = (1/6) × (1/6) × (1/6) = 1/216,
Pr(Y = −1) = 1 − 25/72 − 5/72 − 1/216 = 125/216.
Suppose we play game 1 for 36000 times. Since the relative frequency is a good estimate of the probability when the number of trials is large, we will win $24 in approximately 1000 trials and lose $1 in approximately 35000 trials. So in these 36000 trials of game 1, we win
24 × 1000 + (−1) × 35000 = −11000,
i.e., we will lose approximately $11000 in 36000 trials of game 1. The average amount we win in each trial is
−11000/36000 = −11/36.
This is the long term average of gain if we play game 1 repeatedly. Indeed it can be calculated as
−11/36 = [24 × 1000 + (−1) × 35000] / 36000
= 24 × (1/36) + (−1) × (35/36)
= 24 × Pr(X = 24) + (−1) × Pr(X = −1).
Similarly, the long term average gain of game 2 is
E(Y) = 1 × (25/72) + 2 × (5/72) + 3 × (1/216) + (−1) × (125/216) = −17/216 ≈ −0.079,
which exceeds −11/36 ≈ −0.306. Therefore game 2 is better than game 1 in terms of long term average gain.
However, since in the long run you will lose money in both games, the best strategy is not to gamble at all.
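Both long term averages can also be computed directly from the definitions; a minimal Python sketch (reconstructing the payoffs quoted above: game 1 pays $24 with probability 1/36 and −$1 with probability 35/36, game 2 has the pmf of Y above):

```python
# A minimal sketch of the long term average gains of the two games.
from fractions import Fraction as F

game1 = {24: F(1, 36), -1: F(35, 36)}
game2 = {1: F(25, 72), 2: F(5, 72), 3: F(1, 216), -1: F(125, 216)}

E1 = sum(x * p for x, p in game1.items())  # expected gain per trial, game 1
E2 = sum(y * p for y, p in game2.items())  # expected gain per trial, game 2
print(E1, E2)  # -11/36 and -17/216: both negative, but game 2 loses less per trial
```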
Definition 4.5.
Let X be a discrete random variable with pmf p(x). The mathematical expectation (expected value)
of X is defined by
E(X) = Σ_{x∈X(Ω)} x p(x),
provided that the summation exists. In general, for any function g, the expected value of g(X) is
E[g(X)] = Σ_{x∈X(Ω)} g(x) p(x).
For example, E(X²) = Σ_{x∈X(Ω)} x² p(x), E(ln X) = Σ_{x∈X(Ω)} (ln x) p(x), etc.
Properties
" n
# n
X X
3. If c1 , c2 , . . . , cn are constants, then E ci gi (X) = ci E [gi (X)].
i=1 i=1
2 2
For example, E(5X + 3X ) = 5E(X) + 3E(X ).
However, E(X 2 ) 6= [E(X)]2 , E(ln X) 6= ln E(X).
5. E(|X|) ≥ |E(X)|.
Example 4.7.
Continue with the previous gambling example (Example 4.6.), for game 1,
E(X²) = (24)² × (1/36) + (−1)² × (35/36) = 611/36 = 16.9722,
E{[X − (−11/36)]²} = (24 + 11/36)² × (1/36) + (−1 + 11/36)² × (35/36) = 16.8788.
Alternatively,
E{[X − (−11/36)]²} = E(X²) + (11/36)² + 2 × (11/36) × E(X)
= 16.9722 + 0.09336 + 2(0.3056)(−0.3056) = 16.8788.
The value E{[X − E(X)]²} tells us the variation of our gains among long term trials of game 1.
Definition 4.6.
If X is a discrete random variable with pmf p(x) and space X(Ω), then E(X) is called the (pop-
ulation) mean of X (of the distribution) and is usually denoted by µ. It is a measure of central
location of the random variable X.
Furthermore, E[(X − µ)²] = Σ_{x∈X(Ω)} (x − µ)² p(x) is called the (population) variance of X (of the distribution) and is usually denoted by σ² or Var(X).
The positive square root of the variance, σ = √σ² = √Var(X), is called the (population) standard deviation of X (of the distribution). Both σ and σ² are measures of spread.
Remark
1. The variance, and hence the standard deviation, provide a measure of the variation of the
probability distribution of the random variable X. Together with the mean µ, which is the
measure of the central location, they give an informative summary on the probabilistic behavior
of the random variable. Typically, a random variable X takes a value within two standard deviations from the mean, i.e., from µ − 2σ to µ + 2σ. However, there are always exceptions.
2. A useful computational formula for the calculation of the variance is
σ² = Var(X) = E[(X − µ)²]
= E(X² − 2µX + µ²)
= E(X²) − 2µE(X) + µ²
= E(X²) − µ².
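Both sides of this computational formula can be verified on a concrete pmf; a minimal Python sketch using the game 1 distribution of Example 4.6.:

```python
# A minimal sketch verifying Var(X) = E[(X - mu)^2] = E(X^2) - mu^2
# on the game 1 distribution of Example 4.6.
from fractions import Fraction as F

pmf = {24: F(1, 36), -1: F(35, 36)}
mu = sum(x * p for x, p in pmf.items())

var_def = sum((x - mu) ** 2 * p for x, p in pmf.items())      # definition
var_short = sum(x * x * p for x, p in pmf.items()) - mu ** 2  # computational formula
assert var_def == var_short
print(var_def)  # 21875/1296, approximately 16.8788
```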
Example 4.8.
Consider the probability distribution for the returns on stocks A and B provided below:

Return x on stock A:   0.01   0.02   0.03   0.04
Probability p(x):      0.2    0.3    0.3    0.2

Return y on stock B:   0.10   0.06   0.02   −0.02
Probability p(y):      0.2    0.3    0.3    0.2

µA = E(X) = 0.01 × 0.2 + 0.02 × 0.3 + 0.03 × 0.3 + 0.04 × 0.2 = 2.5%
E(X²) = 0.01² × 0.2 + 0.02² × 0.3 + 0.03² × 0.3 + 0.04² × 0.2 = 0.00073
σ²A = Var(X) = E(X²) − µ²A = 0.00073 − (0.025)² = 0.000105
σA = √0.000105 = 1.02%
µB = E(Y) = 0.10 × 0.2 + 0.06 × 0.3 + 0.02 × 0.3 + (−0.02) × 0.2 = 4%
E(Y²) = 0.10² × 0.2 + 0.06² × 0.3 + 0.02² × 0.3 + (−0.02)² × 0.2 = 0.00328
σ²B = Var(Y) = E(Y²) − µ²B = 0.00328 − (0.04)² = 0.00168
σB = √0.00168 = 4.10%
Although stock B offers a higher expected return than stock A, it is also riskier since its variance
and standard deviation are greater than stock A’s.
This, however, is only part of the picture because most investors choose to hold securities as part of
a diversified portfolio.
Properties
Let X be a random variable and a, c be two constants.
1. E(aX + c) = aE(X) + c.
Proof.
E(aX + c) = Σ_x (ax + c) p(x)
= a Σ_x x p(x) + c Σ_x p(x)
= aE(X) + c.
2. Var(aX + c) = a²Var(X).
Proof.
Var(aX + c) = E{[aX + c − E(aX + c)]²}
= E{[aX − aE(X)]²}
= E[a²(X − µ)²]
= a²E[(X − µ)²] = a²Var(X).
Example 4.9.
Consider the stock returns in Example 4.8.. Suppose a simple portfolio allocates 70% of the fund
to invest in stock B and 30% to a term deposit with 2% fixed interest rate. Then the overall return
of the portfolio can be expressed as
R = 0.7Y + 0.3 × 0.02 = 0.7Y + 0.006,
so that E(R) = 0.7E(Y) + 0.006 = 0.7 × 0.04 + 0.006 = 3.4% and Var(R) = 0.7² Var(Y) = 0.49 × 0.00168 = 0.0008232, i.e., a standard deviation of √0.0008232 = 2.87%.
This simple example demonstrates the risk-return trade-off. Compared to holding stock B alone, this portfolio reduces the risk substantially with only a little sacrifice in the expected return.
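A minimal Python sketch of the portfolio calculation (the variable names are illustrative), using E(aY + c) = aE(Y) + c and Var(aY + c) = a²Var(Y):

```python
# A minimal sketch of the portfolio in Example 4.9: R = 0.7*Y + 0.3*0.02.
mu_B, var_B = 0.04, 0.00168          # mean and variance of stock B from Example 4.8

mu_R = 0.7 * mu_B + 0.3 * 0.02       # expected portfolio return
sd_R = (0.7 ** 2 * var_B) ** 0.5     # portfolio standard deviation
print(mu_R, sd_R)                    # 0.034 and about 0.0287: return 3.4%, risk 2.87%
```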
Definition 4.7.
Let r be a positive integer. E(X^r) is called the r-th moment of X. E[(X − b)^r] is called the r-th moment of X about b if it exists. It is also called the r-th central moment if b = µ. For example, E(X) is the 1st moment of X and σ² is the 2nd central moment of X.
Definition 4.8.
Let X be a discrete random variable with pmf p(x) and space X(Ω), then
MX(t) = E(e^{tX}) = Σ_{x∈X(Ω)} e^{tx} p(x)
is called the moment generating function (mgf) of X if it exists. The domain of MX(t) is the set of all real numbers t such that e^{tX} has a finite expected value.
Example 4.10.
Suppose X is a random variable with pmf p(x) = 2(1/3)^x, x = 1, 2, 3, . . .
Then the moment generating function of X is
MX(t) = Σ_{x=1}^{∞} e^{tx} · 2(1/3)^x = 2 Σ_{x=1}^{∞} (e^t/3)^x.
For MX(t) to exist, the series must converge, i.e., e^t/3 < 1. Therefore, for t < ln 3,
MX(t) = 2 × (e^t/3) / (1 − e^t/3) = 2e^t / (3 − e^t).
MX(t) is undefined if t ≥ ln 3.
MX^{(r)}(0) = (d^r/dt^r) MX(t)|_{t=0} = E(X^r).
Proof.
MX(t) = E(e^{tX}) = Σ_{x∈X(Ω)} e^{tx} p(x) =⇒ MX(0) = 1
MX′(t) = Σ_{x∈X(Ω)} x e^{tx} p(x) =⇒ MX′(0) = Σ_{x∈X(Ω)} x p(x) = E(X)
MX″(t) = Σ_{x∈X(Ω)} x² e^{tx} p(x) =⇒ MX″(0) = Σ_{x∈X(Ω)} x² p(x) = E(X²)
...
Example 4.11.
For Example 4.10.,
MX′(t) = [(3 − e^t)(2e^t) − (2e^t)(−e^t)] / (3 − e^t)² =⇒ µ = MX′(0) = (4 + 2)/4 = 3/2.
Alternatively, consider RX(t) = ln MX(t), for which RX′(0) = µ and RX″(0) = σ². Here,
MX(t) = 2e^t/(3 − e^t) =⇒ RX(t) = ln 2 + t − ln(3 − e^t),
RX′(t) = 1 + e^t/(3 − e^t) and RX″(t) = [(3 − e^t)e^t − e^t(−e^t)] / (3 − e^t)².
Hence,
µ = RX′(0) = 1 + 1/(3 − 1) = 3/2 and σ² = RX″(0) = (2 + 1)/4 = 3/4.
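The mgf results can be cross-checked by truncating the defining series; a minimal Python sketch:

```python
# A minimal sketch cross-checking Example 4.11: truncate the series defining E(X) and
# E(X^2) for p(x) = 2*(1/3)^x, x = 1, 2, ..., and compare with mu = 3/2, sigma^2 = 3/4.
EX = sum(x * 2 * (1 / 3) ** x for x in range(1, 200))
EX2 = sum(x * x * 2 * (1 / 3) ** x for x in range(1, 200))
print(EX, EX2 - EX ** 2)  # approximately 1.5 and 0.75
```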
Remark
The moment generating function uniquely characterizes the distribution. That is, if X and Y have the same moment generating function, then they must have the same distribution.
Some examples
X ∼ Ber(p).
(ii) independent, i.e., the success probability of any trial is not affected by the outcome of
other trials;
then X is said to have a binomial distribution with n trials and success probability p. It is denoted
as
X ∼ B(n, p) or X ∼ Bin(n, p).
Example 4.12.
Let X be the number of boys in a family with four children.
Success: boy
Failure: girl
Value of X     Outcomes
0              FFFF
1              SFFF, FSFF, FFSF, FFFS
2              SSFF, SFSF, SFFS, FSSF, FSFS, FFSS
3              SSSF, SSFS, SFSS, FSSS
4              SSSS
If each child is equally likely to be a boy or a girl, independently of the others, then X ∼ B(4, 1/2).
Example 4.13.
An examination paper consists of 50 multiple choice questions, with 5 choices for each question. A
student goes into the examination without knowing a thing, and tries to answer all the questions by
pure guessing. Let X be the number of questions this student can answer correctly. Then obviously
X ∼ B(50, 0.2).
On average, he will get E(X) = 50×0.2 = 10 correct answers by pure guessing, and the corresponding
variance is Var(X) = 50 × 0.2 × 0.8 = 8.
The probability of getting 15 correct answers by pure guessing is
Pr(X = 15) = p(15) = C(50, 15) (0.2)^{15} (0.8)^{35} = 0.02992.
Suppose that two marks will be given for each correct answer, while half a mark will be deducted for each incorrect answer. Let Y be the total score this student can get. Then,
Y = 2X − 0.5(50 − X) = 2.5X − 25.
If the passing mark is set to 40, the probability that he will pass the examination is
Pr(Y ≥ 40) = Pr(2.5X − 25 ≥ 40) = Pr(X ≥ 26) = Σ_{x=26}^{50} C(50, x) (0.2)^x (0.8)^{50−x},
which is practically zero.
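The exact tail probability can be evaluated by machine; a minimal Python sketch:

```python
# A minimal sketch of the exact pass probability: Pr(X >= 26) for X ~ B(50, 0.2).
from math import comb

p_pass = sum(comb(50, x) * 0.2 ** x * 0.8 ** (50 - x) for x in range(26, 51))
print(p_pass)  # on the order of 1e-7: passing by pure guessing is practically impossible
```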
Example 4.14.
Eggs are sold in boxes of six. Each egg, independently of the others, has probability 0.2 of being cracked. A shopper requires three boxes of eggs and regards a box as satisfactory if it contains no more than two cracked eggs.
Let X be the number of cracked eggs in a particular box. Then X ∼ B(6, 0.2), and the probability that a box is satisfactory is
Pr(X ≤ 2) = (0.8)⁶ + 6(0.2)(0.8)⁵ + 15(0.2)²(0.8)⁴ = 0.90112.
Let Y be the number of satisfactory boxes in five boxes. Then Y ∼ B(5, 0.90112).
Properties of the Binomial Distribution
1. When n is equal to 1, the binomial distribution B(1, p) is just the Bernoulli distribution Ber(p).
3. If X ∼ B(n, p), then Pr(X = k) first increases monotonically and then decreases monotonically; it attains its largest value when k is the largest integer less than or equal to (n + 1)p.
X ∼ Geo(p).
Distribution function:
F(x) = Pr(X ≤ x) = Σ_{i=1}^{x} p(i)
= Σ_{i=1}^{x} p(1 − p)^{i−1}
= p × [1 − (1 − p)^x] / [1 − (1 − p)]
= 1 − (1 − p)^x, x = 1, 2, . . . .
Moment generating function:
MX(t) = pe^t / [1 − (1 − p)e^t], t < −ln(1 − p).
Mean and variance:
µ = 1/p and σ² = (1 − p)/p².
Example 4.15.
In a casino game, roulette, suppose you bet on the number 00 for every trial. Let X be the number
of games played until you win once, then X ∼ Geo(1/38).
On average, you will need to play for E(X) = 38 games in order to win once. The probability of no
more than 4 games played until your first win is
Pr(X ≤ 4) = F(4) = 1 − (1 − 1/38)⁴ = 0.1012.
Example 4.16.
Often packages that you buy in a store include a card with a picture, or other items of a set, and you
try to collect all of the N possible cards. We would be interested in the expected number of trials
one should make in order to collect a set of the cards. Define Xi as the number of additional packages needed to obtain a new card when i − 1 distinct cards have already been collected, and assume that the packages are independent with equal chances to contain the N possible cards.
Then for i > 1, the distribution of Xi is geometric with success probability (N − i + 1)/N and therefore E(Xi) = N/(N − i + 1). Let W be the number of trials needed for collecting the whole set of N different cards. Then W = Σ_{i=1}^{N} Xi and therefore
E(W) = Σ_{i=1}^{N} E(Xi) = Σ_{i=1}^{N} N/(N − i + 1) = N Σ_{i=1}^{N} 1/i.
In particular, if N = 9, then
E(W) = 9 × (1 + 1/2 + · · · + 1/9) = 25.4607.
Remarks
1. There is another definition of the geometric distribution. Let Y be the number of failures
before the first success. Then obviously Y = X − 1.
p(y) = (1 − p)^y p, y = 0, 1, 2, . . .
F(y) = 1 − (1 − p)^{y+1}, y = 0, 1, 2, . . .
MY(t) = p / [1 − (1 − p)e^t], t < −ln(1 − p)
µ = (1 − p)/p
σ² = (1 − p)/p²
Therefore,
Pr(X > a + b | X > a) = Pr(X > a + b)/Pr(X > a) = (1 − p)^{a+b}/(1 − p)^a = (1 − p)^b = Pr(X > b).
Hence, conditional on no success in the first a trials, the probability of no success in the next b trials is equal to the unconditional probability of no success in the first b trials. This property is called the memoryless property. Among all discrete distributions, the geometric distribution is the only one that has the memoryless property.
Example 4.17.
Consider the roulette example. Since the geometric distribution is memoryless, even if you have already lost 100 games in a row, the probability of waiting more than 5 further games for your first win is exactly the same as if you had not played those 100 games at all. This dispels the so-called Gambler's fallacy that "cold hands" are more (or less) likely to come up after observing a series of "hot hands".
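The memoryless property is easy to verify numerically from Pr(X > x) = (1 − p)^x; a minimal Python sketch with illustrative values a = 100 and b = 5:

```python
# A minimal sketch of the memoryless property for X ~ Geo(1/38).
p = 1 / 38

def tail(x):
    """Pr(X > x) = (1 - p)^x for the geometric distribution."""
    return (1 - p) ** x

a, b = 100, 5
print(tail(a + b) / tail(a))  # Pr(X > a + b | X > a)
print(tail(b))                # Pr(X > b): the same number
```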
µ = r/p and σ² = r(1 − p)/p².
Example 4.18.
Fermat and Pascal are sitting in a cafe in Paris and decide to play the simplest of all games, flipping
a coin. If the coin comes up head, Fermat gets a point. If the coin comes up tail, Pascal gets a point.
The first to get 10 points wins the total pot worth 100 Francs. But then a strange thing happens.
Fermat is winning 7 points to 6, when he receives an urgent message that a friend is sick, and he
must rush to his home town of Toulouse immediately. Of course Pascal understands, but later, in
correspondence, the problem arises: how should the 100 Francs be divided?
Answer:
Let X be the number of additional games they need to play so that Fermat can get 3 more points.
Then X is the number of trials until 3 heads (successes) are obtained. Therefore X is a negative
binomial random variable with r = 3 and p = 0.5. The probability mass function is given by
p(x) = C(x − 1, 3 − 1) (0.5)³ (0.5)^{x−3} = C(x − 1, 2) (0.5)^x, x = 3, 4, 5, . . . .
For Fermat to win the game, Pascal should get fewer than 4 points before Fermat gets 3 points, i.e., X must be less than 7. Therefore,
Pr(X < 7) = p(3) + p(4) + p(5) + p(6) = 0.125 + 0.1875 + 0.1875 + 0.15625 = 0.65625.
Hence Fermat should receive 100 × 0.65625 = 65.625 Francs, while Pascal should receive 34.375 Francs.
Consider a general problem of points in which a player needs N points to win. Suppose the inter-
mediate scores are m to n when the game is interrupted. Let r = N − m, s = N − n respectively
be the additional points the two sides need in order to win the game. Then the probability that the
side with m points would win the game is
Σ_{i=r}^{r+s−1} C(i − 1, r − 1) p^r (1 − p)^{i−r}.
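This formula is easy to evaluate programmatically; a minimal Python sketch (the function name is illustrative), applied to the Fermat–Pascal game:

```python
# A minimal sketch of the problem-of-points formula: probability that the side
# needing r more points wins, when the other side needs s and each point is won w.p. p.
from math import comb

def win_prob(r, s, p):
    return sum(comb(i - 1, r - 1) * p ** r * (1 - p) ** (i - r)
               for i in range(r, r + s))  # i runs from r to r + s - 1

print(100 * win_prob(3, 4, 0.5))  # 65.625: Fermat's fair share of the 100 Francs
```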
Remark
1. If r is equal to 1, then the negative binomial distribution NB(1, p) becomes the geometric
distribution Geo(p), i.e., NB(1, p) ≡ Geo(p).
2. There is another definition of the negative binomial distribution. Let Y be the number of
failures before the r-th success. Then obviously, Y = X − r.
p(y) = C(y + r − 1, r − 1) p^r (1 − p)^y, y = 0, 1, 2, . . .
MY(t) = {p / [1 − (1 − p)e^t]}^r, t < −ln(1 − p)
µ = r(1 − p)/p
σ² = r(1 − p)/p²
Remark
The hypergeometric distribution can be regarded as a finite population counterpart of the binomial
distribution.
The variance of a hypergeometric distribution and that of a binomial distribution differ by the multiplier (N − n)/(N − 1), which is therefore called the finite population correction factor.
Example 4.19.
Let X be the number of 2’s in a hand of 13 cards drawn randomly from a deck of 52 cards. Then X
has a hypergeometric distribution with N = 52, m = 4, n = 13.
E(X) = 13 × (4/52) = 1
Var(X) = [(52 − 13)/(52 − 1)] × 13 × (4/52) × (1 − 4/52) = 0.7059
Pr(X = 3) = C(4, 3) C(48, 10) / C(52, 13) = 0.04120
1. Capture m fishes from the lake.
2. Tag these m fishes with a certain marker and release them back to the lake.
3. Capture a new sample of n fishes from the lake.
4. Count the number of tagged fishes found in this new sample, denoted by X.
Consider the probability of observing i tagged fishes, as a function of the unknown population size N:
Pi(N) = Pr(X = i) = C(m, i) C(N − m, n − i) / C(N, n),
so that
Pi(N)/Pi(N − 1) = (N − m)(N − n) / [N(N − m − n + i)].
Hence for fixed m, n, i, the value Pi(N) is first increasing, and then decreasing, and it attains its maximum value at the largest integer not exceeding mn/i. Therefore a reasonable estimate of the population size N is
⌊mn/i⌋,
where i is the number of tagged fishes we found in the new sample.
This kind of estimation is known as maximum likelihood estimation (MLE).
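The argument can be illustrated numerically; a minimal Python sketch with illustrative values of m, n and i (not from the notes):

```python
# A minimal sketch of the capture-recapture MLE.
from math import comb

def likelihood(N, m, n, i):
    """P_i(N) = C(m, i) * C(N - m, n - i) / C(N, n)."""
    return comb(m, i) * comb(N - m, n - i) / comb(N, n)

m, n, i = 100, 60, 7
N_hat = max(range(m + n - i, 3000), key=lambda N: likelihood(N, m, n, i))
print(N_hat, (m * n) // i)  # both 857: the likelihood peaks at floor(m*n/i)
```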
p(x) = Pr(X = x) = e^{−λ} λ^x / x!, x = 0, 1, 2, . . . .
It is denoted as
X ∼ Po(λ) or X ∼ Poi(λ).
Distribution function:
F(x) = Σ_{i=0}^{x} e^{−λ} λ^i / i!, x = 0, 1, 2, . . . .
Pr(X = k + 1) = [λ/(k + 1)] Pr(X = k), k = 0, 1, 2, . . . .
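This recursion gives a convenient way to generate Poisson probabilities without computing factorials; a minimal Python sketch with λ = 2.8:

```python
# A minimal sketch of the recursion Pr(X = k + 1) = lambda/(k + 1) * Pr(X = k):
# starting from p(0) = e^{-lambda}, it generates the Poisson pmf without factorials.
from math import exp

lam = 2.8
p = exp(-lam)            # Pr(X = 0)
probs = [p]
for k in range(20):
    p *= lam / (k + 1)   # Pr(X = k + 1) from Pr(X = k)
    probs.append(p)
print(sum(probs))        # close to 1
```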
Definition 4.15.
Consider an event E whose occurrences in time obey the following postulates:
1. Independence – The number of times E occurs in non-overlapping time intervals are inde-
pendent.
2. Lack of clustering – The probability of two or more occurrences in a sufficiently short interval
is essentially zero.
3. Rate – The probability of exactly one occurrence in a sufficiently short time interval of length h is approximately λh, i.e., directly proportional to h.
Denote N (t) as the number of occurrences of E within the time interval [0, t]. Then {N (t), t ≥ 0}
is said to be a Poisson process and the probabilistic behaviour of N (t) can be modelled by the
Poisson distribution with parameter λt.
The formal derivation of the distribution of N (t) requires the knowledge of differential equations and
is omitted here. The following provides an informal justification of the result.
First we may partition the time interval into n subintervals each with length h = t/n.
For n sufficiently large, i.e., h sufficiently short,
Pr(one occurrence in a subinterval) = λh = λt/n (by postulate 3),
Pr(more than one occurrence in a subinterval) = 0 (by postulate 2).
Hence each subinterval can be regarded as a Bernoulli trial with success probability λt/n. From
postulate 1, the occurrences in the subintervals are independent. Therefore we have n independent
Bernoulli trials each with success probability λt/n. Hence,
N(t) ∼ B(n, λt/n),
p(x) = Pr(N(t) = x) = C(n, x) (λt/n)^x (1 − λt/n)^{n−x}, x = 0, 1, 2, . . . , n.
Since we need the subintervals to be sufficiently small, we should consider the limit of the above expression when n → ∞:
p(x) = lim_{n→∞} C(n, x) (λt/n)^x (1 − λt/n)^{n−x}
= lim_{n→∞} [n(n − 1) · · · (n − x + 1) / x!] [(λt)^x / n^x] (1 − λt/n)^{n−x}
= [(λt)^x / x!] lim_{n→∞} (1)(1 − 1/n) · · · (1 − (x − 1)/n) (1 − λt/n)^{−x} (1 − λt/n)^n
= e^{−λt} (λt)^x / x!.
Hence
p(x) = Pr(N(t) = x) = e^{−λt} (λt)^x / x!, x = 0, 1, 2, . . . .
Therefore, N (t) has a Poisson distribution with parameter λt, i.e., N (t) ∼ Po(λt).
Remarks
1. Note that
E[N(t)] = λt =⇒ E[N(t)/t] = λ.
Therefore λ can be interpreted as the average number of occurrences per unit time interval. The value of λ depends on the time unit used.
2. According to the above derivation, the Poisson distribution can be used to approximate a binomial distribution when n is large and p is small:
p(x) = C(n, x) p^x (1 − p)^{n−x} ≈ e^{−np} (np)^x / x!.
Example 4.21.
Customers arrive at a department store in accordance with a Poisson process, at a rate of 30 per hour. What is the probability that there will be fewer than 10 customers arriving in the next half hour? What is the probability that there will be more than 6 customers arriving in ten minutes?
Time unit = hour =⇒ λ = 30
For the number of customers arriving in half an hour, the mean is λt = 30 × 0.5 = 15, i.e., X =
N (0.5) ∼ Po(15). The probability that there will be less than 10 customers arriving is
Pr(X < 10) = Pr(X = 0) + Pr(X = 1) + · · · + Pr(X = 9)
= e^{−15} [1 + 15 + 15²/2! + 15³/3! + · · · + 15⁹/9!]
= 0.0699.
For the number of customers arriving in ten minutes, the mean is λt = 30 × (1/6) = 5, i.e., Y = N(1/6) ∼ Po(5). The probability that there will be more than 6 customers arriving is
Pr(Y > 6) = 1 − Pr(Y ≤ 6)
= 1 − Pr(Y = 0) − Pr(Y = 1) − · · · − Pr(Y = 6)
= 1 − e^{−5} [1 + 5 + 5²/2! + 5³/3! + · · · + 5⁶/6!]
= 1 − 0.7622
= 0.2378.
Example 4.22.
In a certain manufacturing process in which glass items are being produced, defects or bubbles occur,
occasionally rendering the piece undesirable for marketing. It is known that on the average 1 in every
1000 of these items produced has one or more bubbles. What is the probability that a random sample
of 8000 will yield fewer than 8 items possessing bubbles?
Let X be the number of items possessing bubbles. Then
X ∼ B(8000, 1/1000),
which can be approximated by Po(8000 × 1/1000) = Po(8).
The required probability is
Pr(X < 8) = Pr(X ≤ 7)
= Σ_{x=0}^{7} C(8000, x) (0.001)^x (0.999)^{8000−x}
≈ Σ_{x=0}^{7} e^{−8} 8^x / x!
= 0.4530.
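The quality of the approximation can be checked directly; a minimal Python sketch comparing the exact binomial sum with the Poisson sum:

```python
# A minimal sketch comparing the exact binomial probability of Example 4.22
# with its Poisson approximation (n = 8000, p = 0.001, np = 8).
from math import comb, exp, factorial

n, p = 8000, 0.001
exact = sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(8))
approx = sum(exp(-8) * 8 ** x / factorial(x) for x in range(8))
print(exact, approx)  # both approximately 0.453
```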
Example 4.23.
Suppose that the total number of goals in a soccer match of the English Premier League follows the
Poisson distribution with λ = 2.8, i.e., on average there are 2.8 goals per match. Determine, in a soccer match,
(a) the probability that more than 2 goals are scored;
(b) the probability that the total number of goals is even.
Answer:
Let X be the number of goals in a particular match. Then X follows Po(2.8), i.e.,
Pr(X = x) = e^{−2.8} (2.8)^x / x!, x = 0, 1, 2, . . . .
(a)
Pr(X > 2) = 1 − Pr(X ≤ 2)
= 1 − [Pr(X = 0) + Pr(X = 1) + Pr(X = 2)]
= 1 − e^{−2.8} [1 + 2.8 + (2.8)²/2!]
= 0.5305.
(b)
Pr(X is even) = Pr(X = 0) + Pr(X = 2) + Pr(X = 4) + · · ·
= e^{−2.8} [1 + (2.8)²/2! + (2.8)⁴/4! + · · ·].
Recall the Taylor series expansion of the exponential function,
e^x = 1 + x + x²/2! + x³/3! + x⁴/4! + · · · ,
so that
e^{2.8} = 1 + 2.8 + (2.8)²/2! + (2.8)³/3! + (2.8)⁴/4! + · · · ,
e^{−2.8} = 1 − 2.8 + (2.8)²/2! − (2.8)³/3! + (2.8)⁴/4! − · · · .
Therefore,
e^{2.8} + e^{−2.8} = 2 [1 + (2.8)²/2! + (2.8)⁴/4! + · · ·].
Hence,
Pr(X is even) = e^{−2.8} × (e^{2.8} + e^{−2.8})/2
= (1 + e^{−5.6})/2
= 0.5018.
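Both the direct summation and the closed form can be checked numerically; a minimal Python sketch:

```python
# A minimal sketch checking Pr(X is even) for X ~ Po(2.8) in two ways:
# direct summation over even x, and the closed form (1 + e^{-5.6})/2.
from math import exp, factorial

lam = 2.8
direct = sum(exp(-lam) * lam ** x / factorial(x) for x in range(0, 60, 2))
closed = (1 + exp(-2 * lam)) / 2
print(direct, closed)  # both approximately 0.5018
```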
Example 4.24.
Random variables that are often modelled by the Poisson distribution include the following:
7. Number of α-particles discharged in a fixed period of time from some radioactive material.
10. Number of electrons emitted from a heated cathode during a fixed time period.
11. Number of deaths in a given period of time of the policyholders of a life insurance company.
∼ End of Chapter 4 ∼