Week 1 Annotated

ACTL2002/ACTL5101 Probability and Statistics: Week 1 Video Lecture Notes

© Katja Ignatieva

School of Risk and Actuarial Studies
Australian School of Business
University of New South Wales
k.ignatieva@unsw.edu.au

Video lectures (recordings on UNSW TV):
Week 1: Probability space, Calculating with probabilities, Counting.
Week 2: Bernoulli distribution, Binomial distribution, Geometric distribution, Negative Binomial distribution.
Week 3: Numerical methods to summarize data, Graphical procedures to summarize data.
Week 4: Sampling with and without replacement, Properties of sample mean and variance.
Week 5: Chi-squared distribution, Student-t distribution, Snedecor's F distribution, Distribution of sample mean and variance.

Probability space
  Sample Space & σ-algebra
  Introduction to Probability
  Probability Measure
Calculating with probabilities
  Properties of the Probability Measure
  Conditional Probability
  Independence
  Bayes' Theorem
Counting
  Counting Principles
  Computing Probabilities

Probability space

In a random experiment, the outcomes cannot be predicted with certainty in advance.
The set of all possible outcomes is called the sample space, denoted by $\Omega$. An element $\omega$ of $\Omega$ is a sample point.
A family $\mathcal{F}$ of subsets of the sample space $\Omega$ is said to be a σ-algebra (σ-field) if the following conditions hold:
1. $\Omega \in \mathcal{F}$;
2. $E \in \mathcal{F}$ implies $E^c \in \mathcal{F}$;
3. if $E_1, E_2, \ldots \in \mathcal{F}$, then $\bigcup_{k=1}^{\infty} E_k \in \mathcal{F}$.
An element $E$ of $\mathcal{F}$ is called an event; it is simply a subset of $\Omega$.
Note: a sample point $\omega$ corresponds to the simple event $\{\omega\}$.

If the sample space has $N$ elements, a σ-algebra $\mathcal{F}$ of events can consist of up to $2^N$ elements.
Example:
Consider the experiment of selecting a card at random from a deck of four cards with four different suits, and noting its suit: hearts (H), diamonds (D), spades (S), or clubs (C).
The sample space is $\Omega = \{H, D, S, C\}$.
The collection of all subsets of $\Omega$ (all possible events) is a σ-algebra $\mathcal{F}$.

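The following collection of $2^4 = 16$ sets is a σ-algebra $\mathcal{F}$:
$$\begin{aligned}
&\emptyset,\ \{H\},\ \{D\},\ \{S\},\ \{C\},\\
&\{H, D\},\ \{H, S\},\ \{H, C\},\ \{D, S\},\ \{D, C\},\ \{S, C\},\\
&\{H, D, S\},\ \{H, D, C\},\ \{H, S, C\},\ \{D, S, C\},\ \{H, D, S, C\}\ (= \Omega).
\end{aligned}$$
Each element of the σ-algebra is called an event.
Note that if we record the card itself rather than only its suit, a standard deck has 52 elements in the sample space, so the σ-algebra of all its subsets has $2^{52}$ elements.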
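To make the σ-algebra conditions concrete, here is a small Python sketch (standard library only) that builds the collection of all subsets of Ω = {H, D, S, C} and checks that it contains Ω, is closed under complements, and is closed under unions (on a finite Ω, checking pairwise unions is enough). The helper name `power_set` is our own.

```python
from itertools import chain, combinations

# Sample space for the suit of a single card.
omega = frozenset({"H", "D", "S", "C"})

def power_set(s):
    """Return every subset of s (including the empty set and s itself) as frozensets."""
    items = list(s)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return {frozenset(sub) for sub in subsets}

F = power_set(omega)

# Sigma-algebra conditions on a finite sample space.
contains_omega = omega in F
closed_under_complement = all(omega - E in F for E in F)
closed_under_union = all(E1 | E2 in F for E1 in F for E2 in F)

print(len(F), contains_omega, closed_under_complement, closed_under_union)
# 16 True True True
```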

Probability Measure
A probability measure on $\Omega$ is a function $\Pr$ from the events in $\mathcal{F}$ (subsets of $\Omega$) to $\mathbb{R}$ that satisfies the following axioms:
1. For all events $E_i$, $0 \le \Pr(E_i) \le 1$;
2. $\Pr(\Omega) = 1$;
3. For any events $E_1, E_2, \ldots$ with $E_i \cap E_j = \emptyset$ for every $i \neq j$:
$$\Pr\left(\bigcup_{k=1}^{\infty} E_k\right) = \Pr(E_1) + \Pr(E_2) + \cdots = \sum_{k=1}^{\infty} \Pr(E_k).$$
A random experiment is therefore described by a probability triple (or probability space) $\{\Omega, \mathcal{F}, \Pr\}$.

Definitions
Union of two events: $C = A \cup B$ is the event that A or B (or both) occurs, i.e., $\omega \in C$ iff $\omega \in A$ or $\omega \in B$.
Intersection of two events: $C = A \cap B$ is the event that both A and B occur, i.e., $\omega \in C$ iff $\omega \in A$ and $\omega \in B$.
Complement of an event: $B = A^c$ is the event that A does not occur, i.e., $\omega \in A^c$ iff $\omega \notin A$.
Disjoint events: events are disjoint if they have no outcomes in common, i.e., A and C are disjoint iff $A \cap C = \emptyset$.

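Useful laws
Commutative laws: $A \cup B = B \cup A$ and $A \cap B = B \cap A$.
Associative laws: $(A \cup B) \cup C = A \cup (B \cup C)$ and $(A \cap B) \cap C = A \cap (B \cap C)$.
Distributive laws: $(A \cup B) \cap C = (A \cap C) \cup (B \cap C)$ and $(A \cap B) \cup C = (A \cup C) \cap (B \cup C)$.
De Morgan's laws: $(A \cup B)^c = A^c \cap B^c$ and $(A \cap B)^c = A^c \cup B^c$.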
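The laws above can be checked by brute force on a small sample space. This sketch, which assumes the four-suit sample space Ω = {H, D, S, C} from the card example, loops over every pair (or triple) of events and tests the commutative, De Morgan's and distributive laws using Python's built-in set operators (`|` for union, `&` for intersection).

```python
from itertools import chain, combinations

omega = frozenset({"H", "D", "S", "C"})
# All 16 events (subsets of omega).
events = [frozenset(c) for c in chain.from_iterable(
    combinations(sorted(omega), r) for r in range(len(omega) + 1))]

def comp(A):
    """Complement of event A relative to the sample space omega."""
    return omega - A

commutative = all(A | B == B | A and A & B == B & A for A in events for B in events)
de_morgan = all(
    comp(A | B) == comp(A) & comp(B) and comp(A & B) == comp(A) | comp(B)
    for A in events for B in events
)
distributive = all(
    (A | B) & C == (A & C) | (B & C) and (A & B) | C == (A | C) & (B | C)
    for A in events for B in events for C in events
)

print(len(events), commutative, de_morgan, distributive)  # 16 True True True
```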


Properties of the probability measure
Complement: $\Pr(E^c) = 1 - \Pr(E)$.
Null / empty set: $\Pr(\emptyset) = 0$.
Subsets: if $E_1$ and $E_2$ are two events such that $E_1 \subseteq E_2$, then $\Pr(E_1) \le \Pr(E_2)$.
Additive: if $E_1$ and $E_2$ are any two events, then
$$\Pr(E_1 \cup E_2) = \Pr(E_1) + \Pr(E_2) - \Pr(E_1 \cap E_2).$$

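In the case of three events, we have:
$$\begin{aligned}
\Pr(E_1 \cup E_2 \cup E_3) &= \Pr((E_1 \cup E_2) \cup E_3) \\
&\overset{*}{=} \Pr(E_1 \cup E_2) + \Pr(E_3) - \Pr((E_1 \cup E_2) \cap E_3) \\
&\overset{**}{=} \Pr(E_1 \cup E_2) + \Pr(E_3) - \Pr((E_1 \cap E_3) \cup (E_2 \cap E_3)) \\
&\overset{***}{=} \Pr(E_1) + \Pr(E_2) + \Pr(E_3) - \Pr(E_1 \cap E_2) - \Pr(E_1 \cap E_3) \\
&\qquad - \Pr(E_2 \cap E_3) + \Pr(E_1 \cap E_2 \cap E_3).
\end{aligned}$$
* using the additive rule $\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$ with $A = E_1 \cup E_2$ and $B = E_3$.
** using the distributive law $(A \cup B) \cap C = (A \cap C) \cup (B \cap C)$ with $A = E_1$, $B = E_2$, $C = E_3$.
*** using the additive rule with $A = E_1$, $B = E_2$, and with $A = E_1 \cap E_3$, $B = E_2 \cap E_3$ (note that $(E_1 \cap E_3) \cap (E_2 \cap E_3) = E_1 \cap E_2 \cap E_3$).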

Inequality rules
Using the definition of a probability measure one can prove the following inequalities:
Boole's inequality: if $E_1, E_2, \ldots, E_n$ are any $n$ events, then
$$\Pr\left(\bigcup_{k=1}^{n} E_k\right) \le \sum_{k=1}^{n} \Pr(E_k),$$
which follows from the additive law of probability.
Bonferroni's inequality: if $E_1, E_2, \ldots, E_n$ are any $n$ events, then
$$\Pr(E_1 \cap \cdots \cap E_n) \ge 1 - \sum_{k=1}^{n} \Pr(E_k^c),$$
which follows by applying Boole's inequality to the complements $E_k^c$ and using De Morgan's law.
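As a sanity check on the three-event formula and on Boole's and Bonferroni's inequalities, the following sketch uses a fair six-sided die with three arbitrary events (an illustrative choice, not from the notes) and exact fractions, so no rounding is involved. With these events the Bonferroni lower bound happens to be negative: valid, but uninformative.

```python
from fractions import Fraction

# Fair six-sided die: all six outcomes are equally likely.
omega = frozenset(range(1, 7))

def pr(event):
    """Probability of an event (a subset of omega) under equally likely outcomes."""
    return Fraction(len(event), len(omega))

E1 = frozenset({2, 4, 6})     # even number
E2 = frozenset({4, 5, 6})     # at least 4
E3 = frozenset({1, 2, 3, 4})  # at most 4
events = [E1, E2, E3]

# Inclusion-exclusion for three events.
lhs = pr(E1 | E2 | E3)
rhs = (pr(E1) + pr(E2) + pr(E3)
       - pr(E1 & E2) - pr(E1 & E3) - pr(E2 & E3)
       + pr(E1 & E2 & E3))

# Boole: Pr(union) <= sum of the probabilities.
boole_bound = sum(pr(E) for E in events)
# Bonferroni: Pr(intersection) >= 1 - sum of the complement probabilities.
bonferroni_bound = 1 - sum(pr(omega - E) for E in events)

print(lhs, rhs)                              # 1 1
print(lhs <= boole_bound)                    # True
print(pr(E1 & E2 & E3) >= bonferroni_bound)  # True
```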

Conditional probability
The conditional probability of A given B is defined as:
$$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)},$$
provided $\Pr(B) > 0$; otherwise, by convention, $\Pr(A \mid B) = 0$.
The multiplication rule follows immediately:
$$\Pr(A \cap B) = \Pr(A \mid B)\Pr(B).$$
The following properties are also immediate (for $\Pr(B) > 0$, and $\Pr(A) > 0$ in property 2):
1. $\Pr(A \mid B) \ge 0$;
2. $\Pr(A \mid A) = 1$;
3. if $A_1, A_2, \ldots$ are mutually disjoint events, then
$$\Pr\left(\bigcup_{k=1}^{\infty} A_k \,\middle|\, B\right) = \sum_{k=1}^{\infty} \Pr(A_k \mid B).$$

Law of total probability
If $E_1, E_2, \ldots$ are mutually disjoint ($E_i \cap E_j = \emptyset$ for $i \neq j$) and together comprise the entire sample space ($\bigcup_{k=1}^{\infty} E_k = \Omega$), then for any event $A \in \mathcal{F}$ we have:
$$\Pr(A) = \sum_{k=1}^{\infty} \Pr(A \mid E_k)\Pr(E_k).$$

Independence
Events A and B are said to be independent if:
$$\Pr(A \cap B) = \Pr(A)\Pr(B).$$
Equivalently, when $\Pr(A) > 0$ and $\Pr(B) > 0$, events A and B are independent if:
$$\Pr(A \mid B) = \Pr(A) \quad\text{and}\quad \Pr(B \mid A) = \Pr(B).$$
For a collection of events $E_1, E_2, \ldots, E_n$, the product rule for the whole collection,
$$\Pr(E_1 \cap E_2 \cap \cdots \cap E_n) = \Pr(E_1)\Pr(E_2)\cdots\Pr(E_n),$$
is not enough on its own: we say that the events are (mutually) independent if for every sub-collection $E_{i_1}, E_{i_2}, \ldots, E_{i_m}$ we have:
$$\Pr(E_{i_1} \cap E_{i_2} \cap \cdots \cap E_{i_m}) = \Pr(E_{i_1})\Pr(E_{i_2})\cdots\Pr(E_{i_m}).$$

Mutually exclusive events: A and B are said to be mutually exclusive if they are disjoint, i.e., $A \cap B = \emptyset$, so that $\Pr(A \cap B) = 0$ and
$$\Pr(A \cup B) = \Pr(A) + \Pr(B).$$
We can generalize this to several events: $E_1, E_2, \ldots, E_n$ are said to be mutually exclusive if no two have an element in common (pairwise disjoint), in which case
$$\Pr(E_1 \cup E_2 \cup \cdots \cup E_n) = \Pr(E_1) + \Pr(E_2) + \cdots + \Pr(E_n).$$

Example
Consider the experiment of selecting n cards at random, without replacement, from a deck of 52 cards with 13 hearts (H), 13 diamonds (D), 13 spades (S), and 13 clubs (C).
False: for n = 2, the second card is independent of the first card.
False: for n = 14, the 14th card is mutually independent of the first 13 cards.
True: the previous two statements become true when each card is put back in the deck before the next one is selected.
True: for n = 3, A = "at least 2 H" and B = "at least 2 C" are mutually exclusive events.
False: for n = 4, A = "at least 2 H" and B = "at least 2 C" are mutually exclusive events.
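The first statements of the example can be checked exactly. The sketch below, assuming a standard 52-card deck and looking only at whether each card is a heart, compares Pr(second card is H | first card is H) with the unconditional Pr(second card is H), with and without replacement.

```python
from fractions import Fraction

# Only the suit matters: 13 hearts among 52 cards.
p_first_h = Fraction(13, 52)

# Without replacement: the first draw changes the deck.
p_second_h_given_first_h = Fraction(12, 51)
p_second_h_given_first_not_h = Fraction(13, 51)
# Law of total probability for the second card.
p_second_h = (p_second_h_given_first_h * p_first_h
              + p_second_h_given_first_not_h * (1 - p_first_h))

# With replacement: the deck is complete again for the second draw.
p_second_h_given_first_h_repl = Fraction(13, 52)

print(p_second_h, p_second_h_given_first_h)  # 1/4 4/17  (conditioning changes the probability)
print(p_second_h_given_first_h_repl)         # 1/4       (equal to the unconditional probability)
```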

Bayes' Theorem
Suppose $E_1, E_2, \ldots$ form a complete partition of the sample space $\Omega$. Then for any event $A \in \mathcal{F}$ with $\Pr(A) > 0$ we have:
$$\Pr(E_k \mid A) = \frac{\Pr(A \mid E_k)\Pr(E_k)}{\sum_{j=1}^{\infty} \Pr(A \mid E_j)\Pr(E_j)},$$
for any $k = 1, 2, \ldots$.

Example
An insurance company classifies its policyholders into three risk classes: L (low risk), M (medium risk) and H (high risk). The proportion of H policyholders is 20% and the proportion of L policyholders is 50%, so the proportion of M policyholders is 30%. For each of the risk classes, the probability of a claim is 0.01 for L, 0.02 for M, and 0.04 for H.
a. Question: If a claim occurs, what is the probability that it is from an L (low risk) policyholder?
a. Solution: Let C be the event that there is a claim. Then
$$\Pr(L \mid C) = \frac{\Pr(L \cap C)}{\Pr(C)} = \frac{0.005}{0.019} \approx 26\%,$$
where
$$\Pr(L \cap C) = \Pr(C \mid L)\Pr(L) = 0.01 \times 0.5 = 0.005,$$
$$\Pr(C) = \Pr(C \mid L)\Pr(L) + \Pr(C \mid M)\Pr(M) + \Pr(C \mid H)\Pr(H) = 0.01 \times 0.5 + 0.02 \times 0.3 + 0.04 \times 0.2 = 0.019$$
(using the law of total probability).

Example (cont.)
b. Question: If a claim occurs, what is the probability that it is from an M (medium risk) policyholder?
c. Question: If a claim occurs, what is the probability that it is from an H (high risk) policyholder?
b. Solution: Similar to a.,
$$\Pr(M \mid C) = \frac{\Pr(M \cap C)}{\Pr(C)} = \frac{\Pr(C \mid M)\Pr(M)}{\Pr(C)} = \frac{0.3 \times 0.02}{0.019} \approx 32\%.$$
c. Solution: Similar to a. and b.,
$$\Pr(H \mid C) = \frac{\Pr(H \cap C)}{\Pr(C)} = \frac{\Pr(C \mid H)\Pr(H)}{\Pr(C)} = \frac{0.2 \times 0.04}{0.019} \approx 42\%.$$
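The computations in a. to c. combine the law of total probability (for Pr(C)) with Bayes' theorem. A minimal Python sketch using the class proportions and claim probabilities from the example:

```python
# Prior proportions of the risk classes (M = 1 - 0.5 - 0.2 = 0.3).
prior = {"L": 0.5, "M": 0.3, "H": 0.2}
# Probability of a claim within each class.
p_claim = {"L": 0.01, "M": 0.02, "H": 0.04}

# Law of total probability: Pr(C) = sum_k Pr(C | E_k) Pr(E_k).
p_c = sum(p_claim[k] * prior[k] for k in prior)

# Bayes' theorem: Pr(E_k | C) = Pr(C | E_k) Pr(E_k) / Pr(C).
posterior = {k: p_claim[k] * prior[k] / p_c for k in prior}

print(round(p_c, 4))                                   # 0.019
print({k: round(v, 3) for k, v in posterior.items()})  # {'L': 0.263, 'M': 0.316, 'H': 0.421}
```

The posterior probabilities sum to one, which is a quick check that Pr(C) was computed over a complete partition.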

Counting
Counting Principles
Multiplication rule: suppose $S_1, S_2, \ldots, S_m$ are $m$ sets with respective numbers of elements $n_1, n_2, \ldots, n_m$. The number of ways of choosing one element from each set is:
$$n_1 \times n_2 \times \cdots \times n_m.$$
Permutation: the number of ways of arranging $n$ distinct objects is:
$$n! \equiv n \times (n-1) \times (n-2) \times \cdots \times 2 \times 1, \qquad n! = n \times (n-1)!, \qquad 0! \equiv 1.$$

Combination: the number of ways of choosing $r$ objects from $n$ distinct objects, where $r \le n$, is:
$$\binom{n}{r} \equiv \frac{n!}{r!\,(n-r)!}.$$
Multinomial: the number of ways that $n$ objects can be grouped into $r$ classes with $n_k$ in the $k$-th class, where $k = 1, 2, \ldots, r$ and $\sum_{k=1}^{r} n_k = n$, is:
$$\binom{n}{n_1, n_2, \ldots, n_r} \equiv \frac{n!}{n_1!\, n_2! \cdots n_r!}.$$
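Python's standard library provides `math.factorial` and `math.comb` (Python 3.8+); the multinomial coefficient is not built in, so the helper `multinomial` below is our own. The sketch reproduces a few of the counts used in the examples that follow.

```python
from math import comb, factorial

def multinomial(counts):
    """Number of ways to split sum(counts) distinct objects into classes of the given sizes."""
    result = factorial(sum(counts))
    for k in counts:
        result //= factorial(k)  # each partial quotient is still an integer
    return result

print(factorial(0), factorial(5))  # 1 120
print(comb(8, 4), comb(5, 2))      # 70 10
print(multinomial([1, 4, 4, 2]))   # 34650  (letters of "mississippi")
print(multinomial([15, 2, 3]))     # 155040
```

A multinomial with two classes reduces to a combination, e.g. `multinomial([4, 2]) == comb(6, 2)`.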

Example
Question: An airline has 6 flights daily from Sydney to Honolulu, and 8 flights daily from Honolulu to Los Angeles. The airline offers no direct flight from Sydney to Los Angeles. If the flights are to be made on separate days, how many different flight arrangements can the airline offer from Sydney to Los Angeles?
Solution: Use the multiplication rule. Let:
$S_1$ = flight from Sydney to Honolulu, with $n_1 = 6$;
$S_2$ = flight from Honolulu to Los Angeles, with $n_2 = 8$.
Then there are $n_1 \times n_2 = 6 \times 8 = 48$ different flight arrangements.

Question: A committee is to consist of 4 academics and 2 practitioners, to be selected from a larger group of 8 academics and 5 practitioners. How many ways can you form a committee if:
a. there are no additional restrictions;
b. two of the chosen academics must be the two female members of the group of 8?
a. Solution: We use combinations and the multiplication rule.
- For the academics ($S_1$) we have $n = 8$ distinct ones and we need to choose $r = 4$ academics; the number of ways is $\binom{8}{4} = 70$.
- For the practitioners ($S_2$) we have $n = 5$ distinct ones and we need to choose $r = 2$ practitioners; the number of ways is $\binom{5}{2} = 10$.
Total ways of forming a committee: $n_1 \times n_2 = 70 \times 10 = 700$.
b. Solution: We use combinations and the multiplication rule again.
- For the female academics ($S_1$) we have $n = 2$ distinct ones and we need to choose $r = 2$; the number of ways is $\binom{2}{2} = 1$.
- For the male academics ($S_2$) we have $n = 6$ distinct ones and we need to choose $r = 2$; the number of ways is $\binom{6}{2} = 15$.
- For the practitioners ($S_3$) we have $n = 5$ distinct ones and we need to choose $r = 2$; the number of ways is $\binom{5}{2} = 10$.
Total ways of forming a committee: $n_1 \times n_2 \times n_3 = 1 \times 15 \times 10 = 150$.
Note: if the committee is chosen at random, the probability that it includes both female academics is $150/700 = 3/14 \approx 0.214$.
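For small problems the counts can be confirmed by brute-force enumeration. The sketch below labels the people with hypothetical names (A1 to A8 for the academics, with A1 and A2 as the two female members, and P1 to P5 for the practitioners) and lists every possible committee with `itertools.combinations`.

```python
from itertools import combinations

# Hypothetical labels for the candidates.
academics = [f"A{i}" for i in range(1, 9)]
practitioners = [f"P{i}" for i in range(1, 6)]
female_academics = {"A1", "A2"}

# Every committee: 4 academics and 2 practitioners.
committees = [
    (acad, prac)
    for acad in combinations(academics, 4)
    for prac in combinations(practitioners, 2)
]
# Committees that include both female academics.
with_both = [c for c in committees if female_academics <= set(c[0])]

print(len(committees), len(with_both), round(len(with_both) / len(committees), 3))  # 700 150 0.214
```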

Question: How many different letter arrangements can be obtained from the letters of the word "mississippi", using all the letters?
Solution: Let
- class one be the letter m, with $n_1 = 1$;
- class two be the letter i, with $n_2 = 4$;
- class three be the letter s, with $n_3 = 4$;
- class four be the letter p, with $n_4 = 2$.
Note: we have $n = n_1 + n_2 + n_3 + n_4 = 11$.
We use the multinomial coefficient, hence there are
$$\binom{n}{n_1, n_2, n_3, n_4} = \frac{n!}{n_1!\, n_2!\, n_3!\, n_4!} = \frac{11!}{1!\,4!\,4!\,2!} = 34650$$
different letter arrangements.

Question: An actuarial consulting company has four projects to do. Two of the projects require 3 actuaries, one requires 2 actuaries, and the other requires 4 actuaries. The company currently has 15 actuaries, of which 5 are female. How many different ways are there to assign the actuaries to the projects?
Solution: We have 15 distinct actuaries; use combinations.
- For the first project we need to choose $r = 3$ actuaries from $n = 15$ distinct actuaries, hence $\binom{15}{3} = 455$ ways.
- For the second project we need to choose $r = 3$ actuaries from the $n = 12$ remaining distinct actuaries, hence $\binom{12}{3} = 220$ ways.
- For the third project we need to choose $r = 2$ actuaries from the $n = 9$ remaining distinct actuaries, hence $\binom{9}{2} = 36$ ways.
- For the last project we need to choose $r = 4$ actuaries from the $n = 7$ remaining distinct actuaries, hence $\binom{7}{4} = 35$ ways.
The total number of ways to assign the actuaries to the projects is $455 \times 220 \times 36 \times 35 = 126{,}126{,}000$.
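The sequential product above can be cross-checked against a single multinomial coefficient if we treat the 15 − 12 = 3 actuaries who are not assigned to any project as an extra, fifth class. This reformulation is ours, not part of the original solution; the sketch shows that both routes give the same count.

```python
from math import comb, factorial

# Project sizes, in the order the solution fills them.
sizes = [3, 3, 2, 4]

# Sequential choice without replacement from the remaining actuaries.
remaining = 15
sequential = 1
for r in sizes:
    sequential *= comb(remaining, r)
    remaining -= r

# Multinomial count with the unassigned actuaries as a fifth class (our reformulation).
multinomial = factorial(15)
for k in sizes + [remaining]:
    multinomial //= factorial(k)

print(sequential, multinomial)  # 126126000 126126000
```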

Computing probabilities
If the elements of the sample space all have equal probability and (a) there are $N$ elements in $\Omega$ and (b) the event A can occur in any of $n$ mutually exclusive ways, then:
$$\Pr(A) = \frac{n}{N}.$$
Question: A drawer contains 10 pairs of socks. If 6 socks are taken at random and without replacement, compute the probability that there is at least one matching pair among these 6 socks.
Solution: Let M = "at least one matching pair among the 6 socks". We have $\Pr(M) = 1 - \Pr(M^c)$, where $M^c$ = "no matching pair among the 6 socks". The first sock can be any sock; the second must be one of the 18 (of the 19 remaining) socks that do not match the first, the third one of the 16 (of the 18 remaining) socks that match neither of the first two, and so on:
$$\Pr(M^c) = 1 \times \frac{18}{19} \times \frac{16}{18} \times \frac{14}{17} \times \frac{12}{16} \times \frac{10}{15} = 0.3467.$$
Hence $\Pr(M) = 1 - \Pr(M^c) = 1 - 0.3467 = 0.6533$.
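The sequential argument can be verified by enumerating all C(20, 6) = 38,760 equally likely draws. In this sketch each sock is encoded (hypothetically) as a `(pair, side)` tuple; a draw has no matching pair exactly when its 6 socks come from 6 different pairs.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical encoding: sock (pair, side), pair in 0..9, side in {0, 1}.
socks = [(pair, side) for pair in range(10) for side in range(2)]

draws = list(combinations(socks, 6))  # all C(20, 6) unordered draws
no_pair = sum(1 for d in draws if len({pair for pair, _ in d}) == 6)

p_no_pair = Fraction(no_pair, len(draws))
# Sequential argument: each new sock must avoid the pairs already started.
p_seq = (Fraction(18, 19) * Fraction(16, 18) * Fraction(14, 17)
         * Fraction(12, 16) * Fraction(10, 15))

print(len(draws), no_pair)             # 38760 13440
print(p_no_pair == p_seq)              # True
print(round(float(1 - p_no_pair), 4))  # 0.6533
```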

Question: A committee of four individuals is to be formed from a group of 5 males and 6 females. What is the probability that the committee formed has both sexes equally represented?
Solution: Let E = "sexes are equally represented in the committee".
The possible orders in which two males and two females can be selected are
E = {MMFF, MFMF, MFFM, FMMF, FMFM, FFMM}, i.e., with $n = 4$ and $r = 2$: $\binom{n}{r} = \binom{4}{2} = 6$.
Each order has the same probability, i.e.,
$$\Pr(MMFF) = \cdots = \Pr(FFMM) = \frac{5}{11} \times \frac{4}{10} \times \frac{6}{9} \times \frac{5}{8} = 0.0758.$$
Combining, we have:
$$\Pr(E) = \binom{4}{2} \times \frac{5}{11} \times \frac{4}{10} \times \frac{6}{9} \times \frac{5}{8} = 6 \times 0.0758 = \frac{5}{11}.$$
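The same probability follows from a purely combinatorial count: choose 2 of the 5 males and 2 of the 6 females out of all C(11, 4) possible committees. This alternative route is ours; the sketch checks that it agrees with the order-based computation.

```python
from fractions import Fraction
from math import comb

# One specific order, e.g. M, M, F, F, drawn without replacement from 5 M and 6 F.
p_one_order = Fraction(5, 11) * Fraction(4, 10) * Fraction(6, 9) * Fraction(5, 8)
p_sequential = comb(4, 2) * p_one_order

# Unordered count: 2 of the 5 males and 2 of the 6 females.
p_combinations = Fraction(comb(5, 2) * comb(6, 2), comb(11, 4))

print(p_one_order, p_sequential, p_combinations)  # 5/66 5/11 5/11
```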

Odds and probabilities
Odds and probabilities are sometimes confused. The odds that an event will occur is the ratio of the probability that the event will occur to the probability that it will not occur, provided neither probability is zero. That is, if E is the event, the odds that E will occur is:
$$\frac{\Pr(E)}{\Pr(E^c)} = \frac{\Pr(E)}{1 - \Pr(E)}.$$
Often, odds are quoted in terms of positive integers: "the odds are a to b that an event will occur" means that the probability it will occur is:
$$\Pr(E) = \frac{a}{a+b}.$$
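A small sketch of the two conversions using exact fractions; the function names and the 3-to-2 example are illustrative only.

```python
from fractions import Fraction

def odds_from_probability(p):
    """Odds in favour of E: Pr(E) / (1 - Pr(E)); requires 0 < Pr(E) < 1."""
    if not 0 < p < 1:
        raise ValueError("odds are defined only for 0 < Pr(E) < 1")
    return p / (1 - p)

def probability_from_odds(a, b):
    """Probability implied by odds of a to b in favour: a / (a + b)."""
    return Fraction(a, a + b)

p = probability_from_odds(3, 2)
print(p, odds_from_probability(p))  # 3/5 3/2
```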

Moments and measures of dispersion
Introduction
  Course introduction
  Introduction to probability
  Exercise
Mathematical methods
  Random variables and distributions
  Measures of location: probability distributions
  Measures of dispersion
  r-th central/non-central moments
  Generating functions
Summary

Course overview
- Week 1: General introduction to probability.
- Weeks 2-4: Distribution functions: univariate & multivariate special distributions; joint distributions; functions of distributions.
- Weeks 5-6: Parameter estimation.
- Weeks 7-9: Hypothesis testing.
- Weeks 10-12: Linear regression.

This week
Describing a distribution using measures of location and dispersion:
- mean;
- variance;
- skewness;
- kurtosis.
Calculating these measures:
- central moments;
- non-central moments;
- generating functions.

Exercise
Sample spaces
An insurer offers health insurance. An individual can file either one, two, three, or no claims in a given quarter. For simplicity, it is not possible to file more than three claims.
Questions:
a. State all possible events.
b. Give the sample space.
c. Give the σ-algebra.
Solutions:
a. The possible events are all subsets of $\{0, 1, 2, 3\}$:
$\emptyset$, $\{0\}$, $\{1\}$, $\{2\}$, $\{3\}$, $\{0, 1\}$, $\{0, 2\}$, $\{0, 3\}$, $\{1, 2\}$, $\{1, 3\}$, $\{2, 3\}$, $\{0, 1, 2\}$, $\{0, 1, 3\}$, $\{0, 2, 3\}$, $\{1, 2, 3\}$, $\{0, 1, 2, 3\}$.
b. $\Omega = \{0, 1, 2, 3\}$.
c. The σ-algebra of all $2^4 = 16$ events:
$\mathcal{F} = \{\emptyset, \{0\}, \{1\}, \{2\}, \{3\}, \{0, 1\}, \{0, 2\}, \{0, 3\}, \{1, 2\}, \{1, 3\}, \{2, 3\}, \{0, 1, 2\}, \{0, 1, 3\}, \{0, 2, 3\}, \{1, 2, 3\}, \{0, 1, 2, 3\}\}$.

We have the following probabilities for the number of claims C: $\Pr(C = 0) = 0.8$, $\Pr(C = 1) = 0.1$, $\Pr(C = 2) = 8/90$, and $\Pr(C = 3) = 1/90$.
Questions:
a. What is the probability of 2 or 3 claims given that there is an odd number of claims? Same for an even number of claims.
b. Using a., $\Pr(\text{even}) = 8/9$, $\Pr(\text{odd}) = 1/9$, and the law of total probability, find $\Pr(C \ge 2)$.
c. Are "odd number of claims" and "two or three claims" independent?
d. Using a. and Bayes' theorem, find the probability of an odd number of claims given that the number of claims is two or three.

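Solutions:
a. $$\Pr(C \in \{2,3\} \mid C \in \{1,3\}) = \frac{\Pr(C \in \{2,3\} \cap \{1,3\})}{\Pr(C \in \{1,3\})} = \frac{\Pr(C = 3)}{\Pr(C = 1) + \Pr(C = 3)} = \frac{1}{10},$$
and
$$\Pr(C \in \{2,3\} \mid C \in \{0,2\}) = \frac{\Pr(C = 2)}{\Pr(C = 0) + \Pr(C = 2)} = \frac{1}{10}.$$
b. $$\Pr(\{2,3\}) = \Pr(\{2,3\} \mid \text{odd})\Pr(\text{odd}) + \Pr(\{2,3\} \mid \text{even})\Pr(\text{even}) = \frac{1}{10}\cdot\frac{1}{9} + \frac{1}{10}\cdot\frac{8}{9} = \frac{1}{10}.$$
c. $\Pr(C \in \{2,3\} \mid C \in \{1,3\}) = 1/10 = \Pr(\{2,3\})$, thus the events are independent.
d. $$\Pr(C \in \{1,3\} \mid C \in \{2,3\}) = \frac{\Pr(C \in \{2,3\} \mid C \in \{1,3\})\Pr(C \in \{1,3\})}{\Pr(C \in \{2,3\} \mid C \in \{1,3\})\Pr(C \in \{1,3\}) + \Pr(C \in \{2,3\} \mid C \in \{0,2\})\Pr(C \in \{0,2\})} = \frac{\frac{1}{10}\cdot\frac{1}{9}}{\frac{1}{10}\cdot\frac{1}{9} + \frac{1}{10}\cdot\frac{8}{9}} = \frac{1}{9}.$$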
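All four answers can be reproduced with exact fractions. The sketch below encodes the claim-count distribution from the exercise as a dictionary and represents events as Python sets of claim counts; `pr` and `cond` are our own helper names.

```python
from fractions import Fraction

# Claim-count distribution from the exercise.
pmf = {0: Fraction(8, 10), 1: Fraction(1, 10), 2: Fraction(8, 90), 3: Fraction(1, 90)}

def pr(event):
    """Probability that the number of claims C lies in the given set of values."""
    return sum(pmf[c] for c in event)

def cond(a, b):
    """Pr(C in a | C in b)."""
    return pr(a & b) / pr(b)

two_or_three, odd, even = {2, 3}, {1, 3}, {0, 2}

a_odd, a_even = cond(two_or_three, odd), cond(two_or_three, even)
b = a_odd * pr(odd) + a_even * pr(even)                        # law of total probability
c = a_odd == pr(two_or_three)                                  # independence check
d = a_odd * pr(odd) / (a_odd * pr(odd) + a_even * pr(even))    # Bayes' theorem

print(a_odd, a_even, b, c, d)  # 1/10 1/10 1/10 True 1/9
```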

Exercise
Counting principles
An individual's claiming record shows 3 quarters with 2 claims, 2 quarters with 1 claim and 15 quarters without a claim.
Questions:
a. What is the probability that the insured first had 15 quarters without a claim and then 5 quarters with at least one claim?
b. What is the probability that the insured first had 15 quarters without a claim, then 2 quarters with one claim and then 3 quarters with two claims?
c. Comment on your results.
Solutions (treating every ordering of the 20 quarters as equally likely):
a. Use combinations, $n = 20$, $r = 5$: the number of ways of choosing the objects is $\binom{20}{5} = 15{,}504$. Thus, the probability is $1/15{,}504$.
b. Use the multinomial coefficient, $n = 20$, $n_1 = 15$, $n_2 = 2$, $n_3 = 3$: the number of ways of choosing the objects is $\frac{20!}{15!\,2!\,3!} = 155{,}040$. Thus, the probability is $1/155{,}040$.


Random variables and distributions


A random variable is a quantity X whose value depends on the outcome of a random experiment. Formally, it is a mapping from the sample space $\Omega$ to the set of real numbers $\mathbb{R}$, that is, $X: \Omega \to \mathbb{R}$.
The cumulative distribution function (c.d.f.) of X is defined by:
$$F_X(x) = \Pr(X \le x), \quad \text{for all } x.$$
The survival function of X is defined by:
$$S_X(x) = 1 - F_X(x) = \Pr(X > x).$$

Distribution function
Properties of a distribution function:
1. $F_X(\cdot)$ is non-decreasing, i.e., $F_X(x_1) \le F_X(x_2)$ whenever $x_1 \le x_2$;
2. $F_X(\cdot)$ is right-continuous, that is, for all $x$, $\lim_{\epsilon \to 0^+} F_X(x + \epsilon) = F_X(x)$;
3. $F_X(-\infty) = 0$;
4. $F_X(+\infty) = 1$.
Types of random variables:
- continuous;
- discrete;
- mixed.

Which one is a distribution function?
[Figure: six candidate functions F(x), labelled A to F, plotted against x.]
Question: Which one is/are distribution functions?
Solution: B and F.
- Not non-decreasing: A;
- not right-continuous: C;
- $F(-\infty) \neq 0$: D;
- $F(\infty) \neq 1$: E.

The probability mass function (p.m.f.) is defined by:
$$p_X(x_k) = \Pr(X = x_k) = F_X(x_k) - F_X(x_{k-1}).$$
The probability density function (p.d.f.) is defined by:
$$f_X(x) = \frac{\partial F_X(x)}{\partial x}.$$
Note that:
- the p.m.f. satisfies $\sum_{k=0}^{\infty} p_X(x_k) = 1$;
- the p.m.f. relies on the right-continuity property;
- the p.d.f. satisfies $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.


Measures of location

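Let X be a r.v. with p.m.f. $p_X$ (if discrete) or p.d.f. $f_X$ (if continuous). The expected value of X is:
$$\mu_X = E[X] = \begin{cases} \sum_{\text{all } x} x\, p_X(x), & \text{if } X \text{ is discrete;} \\ \int_{-\infty}^{\infty} x\, f_X(x)\,dx, & \text{if } X \text{ is continuous.} \end{cases}$$
Exercise: Calculate $E[X]$ when X is a r.v. with p.d.f.:
$$f_X(x) = \begin{cases} 1, & \text{if } 0 \le x \le 1; \\ 0, & \text{else.} \end{cases}$$
Solution:
$$E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx = \int_0^1 x \cdot 1\,dx = \left[\tfrac{1}{2}x^2\right]_0^1 = \tfrac{1}{2}.$$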

Mathematical expectation of functions of a r.v.
Let X be a r.v. with p.m.f. $p_X$ (if discrete) or p.d.f. $f_X$ (if continuous) and let $h(X)$ be a real-valued function. Then
$$E[h(X)] = \begin{cases} \sum_{\text{all } x} h(x)\, p_X(x), & \text{if } X \text{ is discrete;} \\ \int_{-\infty}^{\infty} h(x)\, f_X(x)\,dx, & \text{if } X \text{ is continuous.} \end{cases}$$
Exercise: Let X be a r.v. with p.m.f. $p_X(x) = \frac{1}{10}$ for $x = 0, 1, 2, \ldots, 9$ and zero otherwise, and let $h(X) = X^2$. Calculate $E[h(X)]$.
Solution:
$$E[h(X)] = \sum_{\text{all } x} h(x)\, p_X(x) = \sum_{x=0}^{9} x^2 \cdot \frac{1}{10} = \frac{285}{10} = 28.5.$$
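Both exercises are easy to check numerically. The sketch sums h(x) p_X(x) exactly for the discrete uniform distribution on {0, ..., 9}, and approximates the integral for the continuous uniform on [0, 1] with a midpoint rule (the number of subintervals, 1000, is an arbitrary choice; the rule is exact for a linear integrand up to rounding).

```python
from fractions import Fraction

# Discrete uniform on {0, 1, ..., 9} with h(x) = x^2.
p = Fraction(1, 10)
e_h = sum(x**2 * p for x in range(10))

# Continuous uniform on [0, 1]: midpoint-rule approximation of the integral of x * f_X(x), f_X = 1.
n = 1000
width = 1 / n
e_x = sum((i + 0.5) * width * 1.0 * width for i in range(n))

print(float(e_h), round(e_x, 6))  # 28.5 0.5
```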

Properties of the expected value operator:
Let X and Y be random variables, and $m, b \in \mathbb{R}$. We have:
- $E[mX + b] = mE[X] + b$;
- $E[X + Y] = E[X] + E[Y]$;
- $E[XY] = E[X]E[Y]$ if X and Y are independent (independence is sufficient here, but not necessary).
Proof for discrete r.v.s (* using that X and Y are independent):
$$\begin{aligned}
E[mX + b] &= \sum_x (mx + b)\, p_X(x) = m \sum_x x\, p_X(x) + b = mE[X] + b, \\
E[X + Y] &= \sum_{x,y} (x + y)\, p_{X,Y}(x,y) = \sum_{x,y} x\, p_{X,Y}(x,y) + \sum_{x,y} y\, p_{X,Y}(x,y) = E[X] + E[Y], \\
E[XY] &= \sum_{x,y} (x\,y)\, p_{X,Y}(x,y) \overset{*}{=} \sum_x \sum_y (x\, p_X(x))\,(y\, p_Y(y)) = \sum_x x\, p_X(x) \sum_y y\, p_Y(y) = E[X]\,E[Y].
\end{aligned}$$

Example
An insurance company offers motor vehicle insurance (MVI). The probability that an insured files a claim is 20%. Assume that an insured files at most one claim.
a. Question: What is the probability mass function?
b. Question: What is the expected number of claims?
c. Question: Assume that there are 150 insureds. What is the expected number of claims?
The claims will be paid at the end of the year. The required capital will depend on the investment return. A 10% increase in asset value occurs with probability 20% and a 5% decrease occurs with probability 15% (otherwise the asset value is unchanged). The claim value is $1,000 for each claim.
d. Question: What is the expected value the insurer needs to cover the claims?

Solution
a. $$p_X(x) = \begin{cases} 0.2, & \text{if } x = 1; \\ 0.8, & \text{if } x = 0; \end{cases}$$ and zero otherwise.
b. $E[X] = \sum_{\text{all } x} p_X(x)\, x = 0.2 \times 1 + 0.8 \times 0 = 0.2$.
c. $E[150X] = 150\,E[X] = 150 \times 0.2 = 30$.
d. Let Y be the random variable for the capital needed today per dollar of claims paid at year end (the reciprocal of the asset growth factor). We have:
$$p_Y(y) = \begin{cases} 0.2, & \text{if } y = 1/1.1; \\ 0.65, & \text{if } y = 1; \\ 0.15, & \text{if } y = 1/0.95; \end{cases}$$ and zero otherwise.
$$E[1000Y] = 1000\,E[Y] = 1000\left(0.2 \times \frac{1}{1.1} + 0.65 \times 1 + 0.15 \times \frac{1}{0.95}\right) = 1000 \times 0.98971 = 989.71.$$
Assuming the number of claims X and the investment outcome Y are independent,
$$E[(150X)(1000Y)] = E[150X]\,E[1000Y] = 30 \times 989.71 = 29{,}691.$$
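The expectations in b. to d. can be reproduced directly. This sketch takes Y to be the discount factor with the three values 1/1.1, 1 and 1/0.95 and, like the solution, assumes that the claim indicator X and the investment outcome Y are independent, so that the expectation of the product factorizes.

```python
# Claim indicator X: one claim with probability 0.2, none with probability 0.8.
p_x = {1: 0.2, 0: 0.8}
e_x = sum(x * p for x, p in p_x.items())
e_claims = 150 * e_x  # E[150 X] = 150 E[X]

# Discount factor Y: capital needed today per dollar paid at year end.
p_y = {1 / 1.1: 0.2, 1.0: 0.65, 1 / 0.95: 0.15}
e_y = sum(y * p for y, p in p_y.items())

# E[(150 X)(1000 Y)] = E[150 X] E[1000 Y], assuming X and Y are independent.
expected_capital = e_claims * 1000 * e_y

print(round(e_x, 2), round(e_claims, 2), round(1000 * e_y, 2), round(expected_capital, 2))
```

It prints approximately 0.2, 30.0, 989.71 and 29691.39.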


Measures of dispersion
$X - \mu_X$ gives the deviation from the expected value.
Question: Can we use the expected deviation $E[X - \mu_X]$ as a measure of dispersion?
Solution: No, because
$$E[X - \mu_X] = E[X] - E[\mu_X] = \mu_X - \mu_X = 0.$$
One measure of dispersion is the mean absolute deviation:
$$MAD(X) = E[|X - \mu_X|].$$
Note: the expected absolute deviation $E[|X - c|]$ is minimized when $c$ is the median of the distribution (not necessarily the mean).

Variance of X
The MAD is not easy to calculate and lacks convenient properties (it is not directly related to the moments of the distribution).
Another measure of dispersion: let X be a random variable; the variance is given by:
$$\sigma^2 = Var(X) = E[(X - \mu_X)^2] = E[X^2] - \mu_X^2.$$
The function $E[(X - c)^2]$ is minimized when $c = \mu_X$.
The standard deviation of X is given by:
$$\sigma = \sqrt{Var(X)}.$$
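The two minimization claims above (median for the absolute deviation, mean for the squared deviation) can be illustrated with a grid search. The four-point distribution below is a hypothetical example chosen to be skewed, so that its median (1) and mean (1.7) differ; the grid step of 0.01 is an arbitrary choice.

```python
# Hypothetical skewed distribution: values and probabilities.
values = [0, 1, 2, 10]
probs = [0.4, 0.3, 0.2, 0.1]

mean = sum(v * p for v, p in zip(values, probs))

def expected_abs_dev(c):
    """E|X - c|."""
    return sum(abs(v - c) * p for v, p in zip(values, probs))

def expected_sq_dev(c):
    """E[(X - c)^2]."""
    return sum((v - c) ** 2 * p for v, p in zip(values, probs))

grid = [i / 100 for i in range(0, 1001)]  # candidate values of c in [0, 10]
best_abs = min(grid, key=expected_abs_dev)
best_sq = min(grid, key=expected_sq_dev)

print(round(mean, 6), best_abs, best_sq)  # 1.7 1.0 1.7
```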

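Properties of the variance:
Let X and Y be random variables, and $m, b, c \in \mathbb{R}$. We have:
- $Var(c) = 0$;
- $Var(mX + b) = m^2\,Var(X)$;
- $Var(X + Y) = Var(X) + Var(Y)$ if X and Y are independent (independence is sufficient here, but not necessary).
Proof:
$$\begin{aligned}
Var(c) &= E[(c - \mu_c)^2] = E[(c - c)^2] = 0, \\
Var(mX + b) &= E[(mX + b - \mu_{mX+b})^2] = E[(mX + b - m\mu_X - b)^2] = m^2 E[(X - \mu_X)^2] = m^2\,Var(X), \\
Var(X + Y) &= E[((X - \mu_X) + (Y - \mu_Y))^2] \\
&= E[(X - \mu_X)^2 + (Y - \mu_Y)^2 + 2(X - \mu_X)(Y - \mu_Y)] \overset{*}{=} Var(X) + Var(Y),
\end{aligned}$$
* using independence between X and Y, so that $E[(X - \mu_X)(Y - \mu_Y)] = E[X - \mu_X]\,E[Y - \mu_Y] = 0$.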

Exercises
The claims of the motor vehicle insurance (MVI) are themselves stochastic. The distribution of the claim value of an insured is:
$$f_X(x) = \begin{cases} 0, & \text{if } x < 0; \\ \frac{\lambda}{5}\exp(-\lambda x), & \text{if } x > 0; \end{cases} \qquad p_X(0) = 0.8.$$
a. Question: Find the expected value of the claim size.
b. Question: Find the variance of the claim size.
c. Question: The price of an insurance contract is the expected value plus half the standard deviation. Find the price of the MVI contract.
d. Question: Same as c., but now for the 150 contracts together.

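Solution
a. $$E[X] = 0 \times 0.8 + \int_0^\infty \frac{\lambda}{5}\exp(-\lambda x)\, x\,dx = 0 + \frac{\lambda}{5}\left[\exp(-\lambda x)\left(\frac{x}{-\lambda} - \frac{1}{(-\lambda)^2}\right)\right]_0^\infty = \frac{\lambda}{5}\cdot\frac{1}{\lambda^2} = \frac{1}{5\lambda}.$$
b. $$E[X^2] = 0^2 \times 0.8 + \int_0^\infty \frac{\lambda}{5}\exp(-\lambda x)\, x^2\,dx = 0 + \frac{\lambda}{5}\left[\exp(-\lambda x)\left(\frac{x^2}{-\lambda} - \frac{2x}{(-\lambda)^2} + \frac{2}{(-\lambda)^3}\right)\right]_0^\infty = \frac{\lambda}{5}\cdot\frac{2}{\lambda^3} = \frac{2}{5\lambda^2},$$
$$Var(X) = E[X^2] - (E[X])^2 = \frac{2}{5\lambda^2} - \frac{1}{25\lambda^2} = \frac{9}{25\lambda^2}.$$
c. $$\text{Price} = E[X] + 0.5\sqrt{Var(X)} = \frac{1}{5\lambda} + 0.5\cdot\frac{3}{5\lambda} = \frac{1}{2\lambda}.$$
d. Assuming the claims $X_1, \ldots, X_{150}$ of the 150 contracts are independent and identically distributed:
$$\text{Price} = E\left[\sum_{i=1}^{150} X_i\right] + 0.5\sqrt{Var\left(\sum_{i=1}^{150} X_i\right)} = 150\,E[X] + 0.5\sqrt{150\,Var(X)} = \frac{150}{5\lambda} + 0.5\cdot\frac{3\sqrt{150}}{5\lambda} = \frac{30 + 0.3\sqrt{150}}{\lambda} \approx \frac{33.67}{\lambda}.$$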
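Because λ is left symbolic, a numerical check needs a concrete value. The sketch below assumes λ = 2 (a hypothetical choice) and integrates the continuous part of the claim distribution with a midpoint rule on (0, 40), which is wide enough that the neglected tail is negligible. It then compares the results with 1/(5λ), 9/(25λ²), the single-contract price 1/(2λ) and the 150-contract price under independent, identically distributed claims.

```python
import math

LAM = 2.0     # hypothetical value of lambda; the notes keep it symbolic
P_ZERO = 0.8  # point mass Pr(X = 0)

def density(x):
    """Continuous part of the claim distribution: (lambda / 5) exp(-lambda x), x > 0."""
    return LAM / 5 * math.exp(-LAM * x)

def moment(k, upper=40.0, n=200_000):
    """E[X^k]: atom at 0 plus a midpoint-rule integral of x^k f_X(x) over (0, upper)."""
    h = upper / n
    integral = h * sum(((i + 0.5) * h) ** k * density((i + 0.5) * h) for i in range(n))
    atom = P_ZERO if k == 0 else 0.0  # the atom contributes 0^k = 0 for k >= 1
    return atom + integral

m1, m2 = moment(1), moment(2)
var = m2 - m1 ** 2
price_one = m1 + 0.5 * math.sqrt(var)
price_150 = 150 * m1 + 0.5 * math.sqrt(150 * var)  # i.i.d. claims assumed

print(round(moment(0), 6))  # total probability: 1.0
print(round(m1, 6), 1 / (5 * LAM))
print(round(var, 6), 9 / (25 * LAM ** 2))
print(round(price_one, 6), 1 / (2 * LAM))
print(round(price_150, 6), (30 + 0.3 * math.sqrt(150)) / LAM)
```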

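Skewness
Let X be a random variable. The skewness of X is given by:
$$\gamma_X = E\left[\left(\frac{X - \mu_X}{\sigma_X}\right)^3\right].$$
Coefficient of skewness (in terms of central moments):
$$\gamma_3 = \frac{E[(X - \mu_X)^3]}{\left(E[(X - \mu_X)^2]\right)^{3/2}}.$$
Properties of skewness ($\sigma_{X+Y}$ is the s.d. of X + Y):
- $\gamma_X = 0$ if the distribution of X is symmetric;
- $\gamma_{mX+b} = \text{sign}(m)\,\gamma_X$;
- $\gamma_{X+Y} = \dfrac{\gamma_X\sigma_X^3 + \gamma_Y\sigma_Y^3}{\sigma_{X+Y}^3}$ if X and Y are independent.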

Which statements are true?
[Figure: probability mass functions Pr(X = x) of distributions i), ii) and iii).]
Note: $E[X] = 0$ and $\sigma = 1$ for i) and ii), and $E[X] = 4$ and $\sigma = 3$ for iii).
a. Question: Distribution i) has a positive skewness.
a. Solution: True: the skewness is 1.
b. Question: Distribution ii) has a positive skewness.
b. Solution: False: the skewness is -1.
c. Question: Distribution iii) has a smaller skewness than distribution ii).
c. Solution: False: both have the SAME skewness.

[Figure: Snedecor's F p.d.f. and c.d.f. with $v_1 = 20$, $v_2 = 40$.]
Question: Positive or negative skew (skewed to the right or to the left)?
Solution: Positive skew (skewed to the right).

[Figure: Beta p.d.f. and c.d.f. with a = 4, b = 2.]
Question: Positive or negative skew (skewed to the right or to the left)?
Solution: Negative skew (skewed to the left).

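Kurtosis
Let X be a random variable. The (excess) kurtosis of X is given by:
$$\kappa_X = E\left[\left(\frac{X - \mu_X}{\sigma_X}\right)^4\right] - 3.$$
It measures the peakedness (positive) or flatness (negative) of the distribution of a random variable, relative to the normal distribution.
Kurtosis coefficient (in terms of central moments):
$$\gamma_4 = \frac{E[(X - \mu_X)^4]}{\left(E[(X - \mu_X)^2]\right)^2}.$$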

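Which statements are true?
[Figure: probability mass functions Pr(X = x) of distributions i), ii) and iii).]
Note: $E[X] = 0$, $\sigma = 1$ and $\gamma = 0$ for i) and ii), and $E[X] = 4$, $\sigma = 3$ and $\gamma = 0$ for iii). Reading the figure together with these moments:
- distribution i): $\Pr(X = -3) = 1/18$, $\Pr(X = 0) = 8/9$, $\Pr(X = 3) = 1/18$;
- distribution ii): $\Pr(X = -2) = 1/8$, $\Pr(X = 0) = 3/4$, $\Pr(X = 2) = 1/8$;
- distribution iii): $\Pr(X = -2) = 1/8$, $\Pr(X = 4) = 3/4$, $\Pr(X = 10) = 1/8$.
a. Question: Distribution ii) has a smaller excess kurtosis than distribution i).
a. Solution: True: the excess kurtosis is 6 for i) and 1 for ii).
c. Question: Distribution iii) has a smaller excess kurtosis than distribution ii).
c. Solution: False: both have the SAME excess kurtosis.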

[Figure: p.d.f.s and c.d.f.s of the Uniform($-\sqrt{3}$, $\sqrt{3}$), Normal(0, 1) and Laplace(0, $1/\sqrt{2}$) distributions.]
Question: Positive, zero or negative excess kurtosis?
Solution: Positive: Laplace (green); none: Normal (red); negative: Uniform (blue).
Note: $E[X] = 0$, $Var(X) = 1$ and $\gamma_X = 0$ for all three distributions.
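The excess kurtosis values quoted in the earlier "Which statements are true?" solutions can be reproduced from three p.m.f.s. The support points below are our reading of that figure, chosen to match the stated means and standard deviations: i) puts 1/18, 8/9, 1/18 on -3, 0, 3; ii) puts 1/8, 3/4, 1/8 on -2, 0, 2; iii) puts the same probabilities on -2, 4, 10. Exact fractions avoid rounding.

```python
from fractions import Fraction as Fr

def excess_kurtosis(pmf):
    """E[((X - mu) / sigma)^4] - 3 for a finite p.m.f. given as {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    m4 = sum((x - mu) ** 4 * p for x, p in pmf.items())
    return m4 / var ** 2 - 3

# Support points reconstructed from the stated moments (our reading of the figure).
dist_i = {-3: Fr(1, 18), 0: Fr(8, 9), 3: Fr(1, 18)}
dist_ii = {-2: Fr(1, 8), 0: Fr(3, 4), 2: Fr(1, 8)}
dist_iii = {-2: Fr(1, 8), 4: Fr(3, 4), 10: Fr(1, 8)}

for name, dist in [("i", dist_i), ("ii", dist_ii), ("iii", dist_iii)]:
    print(name, excess_kurtosis(dist))  # i 6, then ii 1, then iii 1
```

Distribution iii) is 3 times distribution ii) plus 4, and the excess kurtosis is unchanged by such a location-scale change.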

Exercises
An insurance company prices its policies using the standard deviation pricing principle. The regulator requires insurers to hold enough reserves to reduce the probability of ruin to 0.5%. Holding capital is a cost.
a. Would the insurance company prefer claims with distribution
I. with mean $100 and standard deviation $5, or
II. with mean $50 and standard deviation $10?
b. Would the insurance company prefer claims with distribution
I. with mean $100, standard deviation $5 and skewness 5, or
II. with mean $100, standard deviation $5 and skewness -5?
c. Would the insurance company prefer claims with distribution
I. with mean $100, standard deviation $5, skewness 0 and kurtosis 5, or
II. with mean $100, standard deviation $5, skewness 0 and kurtosis 2?
d. Would the insurance company prefer claims with distribution
I. with mean $100, standard deviation $5, a positive skewness and a negative kurtosis, or
II. with mean $100, standard deviation $5, a negative skewness and a positive kurtosis?
Solution:
a. II.
b. II.
c. II.
d. Cannot say from the question.


r-th central moments
Let X be a r.v.; the r-th central moment is given by:
$$E[(X - \mu_X)^r] = \begin{cases} \sum_{\text{all } x} (x - \mu_X)^r\, p_X(x), & \text{if } X \text{ is discrete;} \\ \int_{-\infty}^{\infty} (x - \mu_X)^r f_X(x)\,dx, & \text{if } X \text{ is continuous.} \end{cases}$$
Relation between central and non-central moments:
$$E[(X - \mu_X)^r] \overset{*}{=} \sum_{k=0}^{r} \binom{r}{k} E\left[X^k\right] (-\mu_X)^{r-k}.$$
* using the binomial expansion.
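The binomial relation is straightforward to code. The sketch below (function name ours) converts raw moments into central moments and applies it to the motor vehicle claim indicator X, for which Pr(X = 1) = 0.2 and hence E[X^k] = 0.2 for every k ≥ 1. It also forms the skewness and excess kurtosis from the results.

```python
from fractions import Fraction
from math import comb

def central_moment(raw, r):
    """E[(X - mu)^r] from raw moments raw[k] = E[X^k] (raw[0] = 1), via the binomial expansion."""
    mu = raw[1]
    return sum(comb(r, k) * raw[k] * (-mu) ** (r - k) for k in range(r + 1))

# Claim indicator X: E[X^k] = 0.2 for all k >= 1.
p = Fraction(1, 5)
raw = [Fraction(1)] + [p] * 4

print(*(central_moment(raw, r) for r in range(5)))  # 1 0 4/25 12/125 52/625

c2, c3, c4 = (central_moment(raw, r) for r in (2, 3, 4))
print(float(c3) / float(c2) ** 1.5)  # skewness, 1.5 up to rounding
print(c4 / c2 ** 2 - 3)              # excess kurtosis: 1/4
```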

(Non-central) Moments
Let X be a r.v.; the r-th (non-central) moment is given by:
$$E[X^r] = \begin{cases} \sum_{\text{all } x} x^r\, p_X(x), & \text{if } X \text{ is discrete;} \\ \int_{-\infty}^{\infty} x^r f_X(x)\,dx, & \text{if } X \text{ is continuous.} \end{cases}$$
Consider the motor vehicle insurer example above, where the claim indicator X takes the value 1 with probability 0.2 and 0 with probability 0.8, so that $\mu_X = E[X] = 0.2$.
a. Question: Find the second central moment for an insured.
a. Solution:
$$Var(X) = E[X^2] - (E[X])^2 = (0.2 \times 1^2 + 0.8 \times 0^2) - 0.2^2 = 0.16,$$
or
$$E[(X - \mu_X)^2] = (1 - 0.2)^2 \times 0.2 + (0 - 0.2)^2 \times 0.8 = 0.16.$$

Exercises
b. Question: Find the skewness using the third central moment.
b. Solution: Start with the third central moment:
$$E[(X - \mu_X)^3] = 0.2 \times (1 - 0.2)^3 + 0.8 \times (0 - 0.2)^3 = 0.1024 - 0.0064 = 0.096.$$
$$\gamma_X = E\left[\left(\frac{X - \mu_X}{\sigma_X}\right)^3\right] = \frac{E[(X - \mu_X)^3]}{\sigma_X^3} = \frac{0.096}{0.16^{3/2}} = 1.5.$$
c. Question: Find the second and third non-central moments.
c. Solution: $E[X^2] = 1^2 \times 0.2 + 0^2 \times 0.8 = 0.2$ and $E[X^3] = 1^3 \times 0.2 + 0^3 \times 0.8 = 0.2$.
d. Question: Find the skewness using only non-central moments.
d. Solution:
$$\begin{aligned}E[(X - \mu_X)^3] &= E\left[X^3 - 3\mu_X X^2 + 3\mu_X^2 X - \mu_X^3\right] = E[X^3] - 3\mu_X E[X^2] + 3\mu_X^2 E[X] - \mu_X^3 \\ &= 0.2 - 0.12 + 0.024 - 0.008 = 0.096,\end{aligned}$$
so again $\gamma_X = 0.096/0.16^{3/2} = 1.5$.

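Knowledge of the mean, variance, skewness and kurtosis can provide useful information about the distribution.
Theorem: In general, the complete set of all moments is required to characterize an arbitrary distribution. If the moment generating function exists in an open interval around zero, the moments uniquely determine the distribution (without such a condition, two distinct distributions can have the same moments).
Proof (sketch): Using a Taylor series:
$$\begin{aligned}
1 + E[X]\,t + E[X^2]\frac{t^2}{2!} + E[X^3]\frac{t^3}{3!} + \cdots &= E\left[1 + Xt + X^2\frac{t^2}{2!} + X^3\frac{t^3}{3!} + \cdots\right] \\
&\overset{*}{=} E\left[e^{Xt}\right],
\end{aligned}$$
* using $\exp(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$, with $x = Xt$.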


Introduction
Course introduction
Exercise
Summary
Summary

Moment generating function (mgf) of a r.v.

The moment generating function of a r.v. X is defined as:
h
i
MX (t) =E e X t
t3
t2
+ E X3
+ ....
=1 + E [X ] t + E X 2
2!
3!
Properties of m.g.f.:
(r )
MX (0) = E [X r ] ,
for r = 0, 1, 2, 3, . . . ;
MmX +b (t) = MX (m t) e bt , for constants m, b;
MX +Y (t) = MX (t) MY (t), only if X , Y are independent.
135/144

Use of moment generating function
Relation between the m.g.f. and non-central moments: we can write the m.g.f. as an infinite series of the moments $\mu'_k = E[X^k]$ as follows:
$$M_X(t) = E\left[e^{Xt}\right] = \sum_{k=0}^{\infty} \mu'_k \frac{t^k}{k!}.$$
Generating non-central moments using the m.g.f.: we can generate the moments from the m.g.f. using the relationship:
$$\mu'_r = E[X^r] = M_X^{(r)}(t)\Big|_{t=0}.$$
Functions of random variables: see week 4.

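Proof: To prove the above result, consider the continuous case (the proof in the discrete case is similar):
$$M_X^{(r)}(t) = \frac{\partial^r}{\partial t^r} E\left[e^{Xt}\right] = \frac{\partial^r}{\partial t^r} \int_{-\infty}^{\infty} e^{xt} f_X(x)\,dx = \int_{-\infty}^{\infty} \frac{\partial^r e^{xt}}{\partial t^r} f_X(x)\,dx = \int_{-\infty}^{\infty} x^r e^{xt} f_X(x)\,dx = E\left[X^r e^{Xt}\right],$$
where differentiation and integration may be interchanged because the m.g.f. exists in an open interval around $t$.
Set $t = 0$ and you get the desired result.
Remark: If the m.g.f. exists for $t$ in an open interval containing zero, then it uniquely determines the probability distribution.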
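The identity $M_X^{(r)}(t) = E[X^r e^{Xt}]$ can be checked for a concrete continuous distribution. The sketch below assumes, purely for illustration, an exponential r.v. with rate λ = 2 (not a distribution used on this slide), for which $M_X(t) = \lambda/(\lambda - t)$ and $M_X^{(r)}(t) = \lambda\,r!/(\lambda - t)^{r+1}$ for $t < \lambda$. The expectation is approximated with the composite Simpson rule on a truncated range; the names, truncation point and number of subintervals are assumptions.

```python
import math

LAM = 2.0  # hypothetical exponential rate: f_X(x) = LAM * exp(-LAM * x), x >= 0

def mgf_derivative(r, t):
    """Closed-form r-th derivative of M_X(t) = LAM / (LAM - t), valid for t < LAM."""
    return LAM * math.factorial(r) / (LAM - t) ** (r + 1)

def expect_xr_etx(r, t, upper=60.0, n=60_000):
    """E[X^r e^{Xt}] by the composite Simpson rule on [0, upper] (n even).

    The infinite upper limit is truncated; this is harmless here because
    the integrand decays like exp(-(LAM - t) x).
    """
    def g(x):
        return x ** r * math.exp(x * t) * LAM * math.exp(-LAM * x)

    step = upper / n
    total = g(0.0) + g(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(i * step)
    return total * step / 3

for r in range(4):
    # The two columns should agree to many decimal places
    print(r, expect_xr_etx(r, 0.5), mgf_derivative(r, 0.5))
```

Setting t = 0 in the closed form gives $E[X^r] = r!/\lambda^r$ for this exponential example.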

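Example: Consider the MVI from slide 115. The m.g.f. is:
$$M_X(t) = E\left[e^{Xt}\right] = \sum_{\text{all } x} p_X(x)\,e^{xt} = 0.2\,e^{1t} + 0.8\,e^{0t} = 0.8 + 0.2\,e^t.$$
The non-central moments are:
$$E[X] = \left.\frac{\partial M_X(t)}{\partial t}\right|_{t=0} = \left.0.2\,e^t\right|_{t=0} = 0.2,$$
$$E[X^k] = \left.\frac{\partial^k M_X(t)}{\partial t^k}\right|_{t=0} = \left.0.2\,e^t\right|_{t=0} = 0.2, \quad k = 1, 2, \ldots$$
The kurtosis is given by:
$$Var(X) = E[X^2] - (E[X])^2 = 0.2 - 0.2^2 = 0.16,$$
$$E\left[(X-\mu_X)^4\right] = E[X^4] - 4\mu_X E[X^3] + 6\mu_X^2 E[X^2] - 4\mu_X^3 E[X] + \mu_X^4 = 0.2 - 0.16 + 0.048 - 0.0064 + 0.0016 = 0.0832,$$
$$\kappa_X = E\left[(X-\mu_X)^4\right]/Var(X)^2 - 3 = 0.0832/0.16^2 - 3 = 3.25 - 3 = 0.25.$$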
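The kurtosis calculation can be reproduced exactly with rational arithmetic. The sketch below uses Python's `fractions.Fraction` and the binomial expansion of a central moment in terms of the non-central moments $E[X^k] = 0.2$; as a cross-check, the direct calculation $0.2 \cdot 0.8^4 + 0.8 \cdot 0.2^4$ gives the same fourth central moment, 0.0832. The helper name `central` is illustrative.

```python
from fractions import Fraction
from math import comb

p = Fraction(1, 5)  # Pr(X = 1) = 0.2 for the claim indicator

# Every derivative of M_X(t) = 0.8 + 0.2 e^t equals 0.2 e^t, so E[X^k] = 0.2 at t = 0
raw = [Fraction(1)] + [p] * 4   # E[X^0], ..., E[X^4]
mu = raw[1]

def central(r):
    """E[(X - mu)^r] via the binomial expansion in non-central moments."""
    return sum(comb(r, k) * (-mu) ** (r - k) * raw[k] for k in range(r + 1))

var = central(2)
fourth = central(4)
kurtosis = fourth / var ** 2 - 3
print(var, fourth, kurtosis)   # 4/25 52/625 1/4
```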

Exercises
Consider the Motor Vehicle insurer from slide 120.
a. Question: Determine the m.g.f. of the claim size.
b. Question: Use the m.g.f. to determine the skewness.
a. Solution: We have:
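$$M_X(t) = E\left[e^{tX}\right] = 0.8\,e^{0t} + \int_0^{\infty} \frac{\lambda}{5}\,e^{-\lambda x}\,e^{tx}\,dx = 0.8 + \frac{\lambda}{5}\int_0^{\infty} e^{(t-\lambda)x}\,dx = 0.8 + \left[\frac{\lambda}{5(t-\lambda)}\,e^{(t-\lambda)x}\right]_0^{\infty} \overset{*}{=} 0.8 - \frac{\lambda}{5(t-\lambda)}.$$
* Note: the m.g.f. exists only if $t - \lambda < 0$, i.e., $t < \lambda$.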
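The closed form can be checked numerically. The rate λ is not fixed in this part of the notes, so the sketch below assumes the hypothetical values λ = 2 and t = 0.5 (which satisfy t < λ) and approximates the integral with the composite Simpson rule on the truncated range [0, 60].

```python
import math

LAM, T = 2.0, 0.5   # hypothetical rate and argument; the m.g.f. needs T < LAM

def integrand(x):
    """(LAM / 5) e^{-LAM x} e^{T x}: continuous part of E[e^{TX}]."""
    return LAM / 5 * math.exp(-LAM * x) * math.exp(T * x)

# Composite Simpson rule on [0, UPPER]; the infinite upper limit is truncated
UPPER, N = 60.0, 60_000          # N must be even
step = UPPER / N
s = integrand(0.0) + integrand(UPPER)
for i in range(1, N):
    s += (4 if i % 2 else 2) * integrand(i * step)
numeric = 0.8 + s * step / 3     # 0.8 = Pr(X = 0), contributing 0.8 * e^{0 T}

closed_form = 0.8 - LAM / (5 * (T - LAM))
print(numeric, closed_form)
```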

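b. Solution: Using $M_X(t) = 0.8 - \frac{\lambda}{5(t-\lambda)}$, find the non-central moments using the derivatives:
$$E[X] = \left.\frac{\partial M_X(t)}{\partial t}\right|_{t=0} = \left.\frac{\lambda}{5}\,(t-\lambda)^{-2}\right|_{t=0} = \frac{1}{5\lambda},$$
$$E[X^2] = \left.\frac{\partial^2 M_X(t)}{\partial t^2}\right|_{t=0} = \left.-\frac{2\lambda}{5}\,(t-\lambda)^{-3}\right|_{t=0} = \frac{2}{5\lambda^2},$$
$$E[X^3] = \left.\frac{\partial^3 M_X(t)}{\partial t^3}\right|_{t=0} = \left.\frac{6\lambda}{5}\,(t-\lambda)^{-4}\right|_{t=0} = \frac{6}{5\lambda^3}.$$
$$Var(X) = E[X^2] - (E[X])^2 = \frac{2}{5\lambda^2} - \frac{1}{(5\lambda)^2} = \frac{9}{25\lambda^2}.$$
$$\gamma_X = \frac{E\left[(X-\mu_X)^3\right]}{Var(X)^{3/2}} = \frac{E[X^3] - 3\mu_X E[X^2] + 3\mu_X^2 E[X] - \mu_X^3}{Var(X)^{3/2}} = \frac{\frac{6}{5\lambda^3} - 3\cdot\frac{1}{5\lambda}\cdot\frac{2}{5\lambda^2} + 2\cdot\frac{1}{125\lambda^3}}{\left(\frac{3}{5\lambda}\right)^3} = \frac{\frac{6}{5} - \frac{6}{25} + \frac{2}{125}}{\frac{27}{125}} = \frac{122}{27},$$
where the last two numerator terms combine because $E[X] = \mu_X$, so $3\mu_X^2 E[X] - \mu_X^3 = 2\mu_X^3 = \frac{2}{125\lambda^3}$.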
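Because the skewness is scale-free, the result should not depend on λ. The sketch below recomputes it with exact rational arithmetic from the three non-central moments found above, for a few hypothetical values of λ; the function name `claim_size_skewness` is illustrative.

```python
from fractions import Fraction

def claim_size_skewness(lam):
    """Skewness of the claim size from its non-central moments
    E[X] = 1/(5 lam), E[X^2] = 2/(5 lam^2), E[X^3] = 6/(5 lam^3)."""
    lam = Fraction(lam)
    m1 = 1 / (5 * lam)
    m2 = 2 / (5 * lam ** 2)
    m3 = 6 / (5 * lam ** 3)
    var = m2 - m1 ** 2                 # equals 9 / (25 lam^2)
    sd = 3 / (5 * lam)                 # exact square root of var
    assert sd ** 2 == var              # sanity check on the variance
    # 3 mu^2 E[X] uses E[X] = mu = m1
    third_central = m3 - 3 * m1 * m2 + 3 * m1 ** 2 * m1 - m1 ** 3
    return third_central / sd ** 3

for lam in (1, 2, Fraction(1, 1000)):
    print(lam, claim_size_skewness(lam))   # prints 122/27 for every lam
```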

Probability generating function (p.g.f.) of a r.v.

Let Y be a non-negative integer-valued random variable with Pr(Y = i) = p_i for i = 0, 1, 2, . . .; the p.g.f. is defined as:
$$P_Y(t) = E\left[t^Y\right] = \sum_{i=0}^{\infty} p_Y(i)\,t^i.$$
Properties of p.g.f.:
- The relationship between p.g.f. and m.g.f. is as follows: $P_Y(t) = M_Y(\log(t))$.
- Probabilities: $\Pr(Y = r) = \left.\frac{\partial^r P_Y(t)}{\partial t^r}\right|_{t=0} \Big/ r!$.
- Take the $k$th derivative: $P_Y^{(k)}(1) = E\left[Y(Y-1)(Y-2)\cdots(Y-k+1)\right] = \mu_{[k]}$, the $k$th factorial moment.
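The p.g.f. properties can be illustrated with a small discrete distribution. The p.m.f. below, on {0, 1, 2, 3}, is hypothetical and chosen only for illustration; because the p.g.f. is then a polynomial, its derivatives can be computed exactly with `fractions.Fraction`. The helper names are illustrative.

```python
from fractions import Fraction
from math import exp, factorial, log

# Hypothetical p.m.f. of Y on {0, 1, 2, 3}; probs[i] = Pr(Y = i) is also
# the coefficient of t^i in the p.g.f. P_Y(t)
probs = [Fraction(1, 10), Fraction(2, 5), Fraction(3, 10), Fraction(1, 5)]

def pgf_derivative(k, t):
    """k-th derivative of P_Y(t) = sum_i p_i t^i, evaluated exactly at t."""
    t = Fraction(t)
    # d^k/dt^k t^i = i!/(i-k)! * t^(i-k) for i >= k, and 0 otherwise
    return sum(
        (p * Fraction(factorial(i), factorial(i - k)) * t ** (i - k)
         for i, p in enumerate(probs) if i >= k),
        Fraction(0),
    )

def factorial_moment(k):
    """E[Y (Y-1) ... (Y-k+1)] computed directly from the p.m.f."""
    return sum(
        (p * Fraction(factorial(i), factorial(i - k))
         for i, p in enumerate(probs) if i >= k),
        Fraction(0),
    )

# Pr(Y = r) = P_Y^{(r)}(0) / r!
recovered = [pgf_derivative(r, 0) / factorial(r) for r in range(len(probs))]
print(recovered == probs)  # True

# P_Y^{(k)}(1) equals the k-th factorial moment
print([(pgf_derivative(k, 1), factorial_moment(k)) for k in (1, 2, 3)])

# P_Y(t) = M_Y(log t), checked numerically at t = 0.6
pgf_value = sum(float(p) * 0.6 ** i for i, p in enumerate(probs))
mgf_value = sum(float(p) * exp(i * log(0.6)) for i, p in enumerate(probs))
print(abs(pgf_value - mgf_value) < 1e-12)  # True
```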

Cumulant generating function (c.g.f.) of a r.v.

The cumulant generating function $C_X(t)$ for a random variable is given by:
$$C_X(t) = \log\left(M_X(t)\right) = \sum_{i=1}^{\infty} \kappa_i\,\frac{t^i}{i!}, \quad\text{or}\quad M_X(t) = e^{C_X(t)},$$
where $\kappa_i = C_X^{(i)}(0)$ is the $i$th cumulant.
Properties of c.g.f.:
- $\kappa_r = E\left[(X-\mu_X)^r\right]$, the $r$th central moment, for $r = 2, 3$;
- $C_{mX+b}(t) = C_X(mt) + bt$, for constants $m, b$;
- $C_{X+Y}(t) = C_X(t) + C_Y(t)$, for $X, Y$ independent.
Note that, in general, $C_X^{(r)}(0) \neq E\left[(X-\mu_X)^r\right]$ for $r = 4, 5, 6, \ldots$.
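These properties can be checked on the claim indicator, whose m.g.f. is $M_X(t) = 0.8 + 0.2e^t$. The sketch below approximates $C_X^{(i)}(0)$ for i = 2, 3, 4 by central finite differences (a numerical stand-in for the analytic derivatives; the step h = 0.01 is an assumption) and compares them with the central moments computed from the p.m.f.

```python
import math

def cgf(t):
    """C_X(t) = log M_X(t) for the claim indicator, M_X(t) = 0.8 + 0.2 e^t."""
    return math.log(0.8 + 0.2 * math.exp(t))

h = 1e-2
f = {k: cgf(k * h) for k in (-2, -1, 0, 1, 2)}

# Central finite-difference approximations of C_X^{(i)}(0)
k2 = (f[1] - 2 * f[0] + f[-1]) / h ** 2
k3 = (f[2] - 2 * f[1] + 2 * f[-1] - f[-2]) / (2 * h ** 3)
k4 = (f[2] - 4 * f[1] + 6 * f[0] - 4 * f[-1] + f[-2]) / h ** 4

# Central moments of the same r.v. (mean 0.2)
m2 = 0.2 * 0.8 ** 2 + 0.8 * 0.2 ** 2   # 0.16
m3 = 0.2 * 0.8 ** 3 - 0.8 * 0.2 ** 3   # 0.096
m4 = 0.2 * 0.8 ** 4 + 0.8 * 0.2 ** 4   # 0.0832

print(k2, m2)                     # kappa_2 matches the variance
print(k3, m3)                     # kappa_3 matches the third central moment
print(k4, m4, m4 - 3 * m2 ** 2)   # kappa_4 differs from the fourth central moment
```

The fourth cumulant comes out at about 0.0064 rather than the fourth central moment 0.0832; it equals $E[(X-\mu_X)^4] - 3\,Var(X)^2$.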

Summary
Mathematical Expectation
The mathematical expectation of $h(X)$ is:
$$E[h(X)] = \begin{cases} \sum_{\text{all } x} h(x)\,p_X(x), & \text{if } X \text{ is discrete;} \\ \int_{-\infty}^{\infty} h(x)\,f_X(x)\,dx, & \text{if } X \text{ is continuous.} \end{cases}$$
Mean: $\mu = E[X]$.
Moments: $\mu_r = E[X^r]$ refers to the $r$th (non-central) moment.
Central moments: $E\left[(X-\mu_X)^r\right]$ refers to the $r$th central moment.

Dispersion
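Variance: $\sigma^2 = Var(X) = E\left[(X-\mu)^2\right] = E[X^2] - \mu^2$.
Skewness: $\gamma_X = E\left[\left(\frac{X-\mu_X}{\sigma_X}\right)^3\right]$ refers to the skewness. It measures the lack of symmetry in the p.d.f.
Kurtosis: $\kappa_X = E\left[\left(\frac{X-\mu_X}{\sigma_X}\right)^4\right] - 3$ refers to the kurtosis. It measures the peakedness or flatness of the p.d.f.
Moment generating function: $M_X(t) = E\left[e^{Xt}\right]$; when it exists for $t$ in an open interval containing zero, it uniquely determines the distribution.
It is useful to calculate moments: $\mu_r = E[X^r] = \left.\frac{\partial^r M_X(t)}{\partial t^r}\right|_{t=0}$.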
