
Review of last lecture

• Midterm on April 5

• Check your CIVL2160 ID

• Any questions?
Ch1 Uncertainty
• “Uncertainty is the refuge of hope.”
Henri Frederic Amiel

• “For my part I know nothing with any certainty, but the sight of the stars makes me dream.”
Vincent van Gogh

• “Uncertainty is fun.”
JP Wang
Aleatory uncertainty in daily life

• 290, 310, 430, 285, 270, 340, 280, 300
=> these are the fares I paid for taxis from HK airport to HKUST

• The variation is mainly caused by natural randomness
Epistemic uncertainty

• In geotechnical foundation engineering, you will be introduced to the Terzaghi, Meyerhof, and Vesic methods, all targeting the same problem

• Epistemic uncertainty is a result of our imperfect understanding of the real world
Summary of Ch 1

• Aleatory uncertainty => natural randomness
for example: taxi fare, …

• Epistemic uncertainty => imperfect knowledge
for example: shallow foundation design
Ch2 Probability
• Probability is a measure of uncertainty
Calculating probability

• By definition: “Probability is the chance of the occurrence of an event relative to other events.” “Probability is the ratio of sample points to the sample space.”

• Let’s find out what this terminology means by looking at the following example:
Flipping a coin twice
• When you flip a coin twice, you get:
HH, HT, TH, TT

• What is the probability of getting at least one H?


=> 3/4

• What is the probability of getting exactly one H?


=> 2/4

• Therefore,
=> probability = sample point / sample space
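The counting rule above can be sketched in a few lines of Python. This is only an illustration; the helper name `probability` is made up for this sketch and is not part of the course material.

```python
from fractions import Fraction
from itertools import product

# Sample space of flipping a coin twice: HH, HT, TH, TT
sample_space = ["".join(p) for p in product("HT", repeat=2)]


def probability(event):
    """Ratio of the sample points satisfying `event` to the sample space."""
    points = [s for s in sample_space if event(s)]
    return Fraction(len(points), len(sample_space))


p_at_least_one_h = probability(lambda s: s.count("H") >= 1)
p_exactly_one_h = probability(lambda s: s.count("H") == 1)
print(p_at_least_one_h, p_exactly_one_h)
```

Using `Fraction` keeps the answers as exact ratios, which mirrors the "sample point / sample space" definition.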
Finite and infinite sample space

• The sample space of flipping a coin twice


=> finite

• Taxi fare from HKUST to airport


=> infinite
Ex2.1
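A beam is designed to carry a load of 100 kN that may act at any point along the beam (Ra is the reaction at support A)

1) sample space of Ra?
=> [0, 100]

2) Pr (Ra > 80 kN)?
=> 1 / 5 (taking every load position along the beam as equally likely)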
Venn’s diagram

• A diagram giving a convenient visualization of the relationship between the sample space and sample points

• Pr. = area of sample points / area of sample space


Union, intersection, and complement

• Union (denoted as U): A or B

• Intersection (denoted as ∩): A and B

• Complement of A (denoted as Ā or A’):
Pr (A) + Pr(A’) = 1
Tree diagrams

Ex 2.2
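Jerry manages three projects, and the outcome of each will be good (G), bad (B), or terrible (T) with equal chance. He was told that as soon as he gets at least one bad and at least one terrible, he is history at the company. What is the probability of Jerry getting fired?

=> Sample space: 3 x 3 x 3 = 27 equally likely outcomes
=> Jerry is safe only when there is no B or no T:
Pr(no B) = Pr(no T) = (2/3)³ = 8/27 ; Pr(no B and no T) = Pr(GGG) = 1/27
Pr(safe) = 8/27 + 8/27 – 1/27 = 15/27
=> Pr(fired) = 1 – 15/27 = 12/27 = 4/9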
Review of last lecture

• Aleatory and epistemic uncertainty

• Alea (Latin) : a game with dice

• Episteme (Greek): knowledge

• Venn’s diagram
Tree diagrams

Ex 2.2
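Jerry manages three projects, and the outcome of each will be good (G), bad (B), or terrible (T) with equal chance. He was told that as soon as he gets at least one bad and at least one terrible, he is history at the company. What is the probability of Jerry getting fired?

=> Enumerate the outcomes on the tree (27 in total) that contain at least one B and at least one T:
one each of G, B, T: GBT, GTB, BGT, BTG, TGB, TBG (6)
two B and one T: BBT, BTB, TBB (3)
one B and two T: BTT, TBT, TTB (3)
=> Pr(fired) = 12/27 = 4/9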
Proposition 2.1:

Pr(A U B) = Pr(A) + Pr(B) – Pr(AB)

• Proof with Venn’s diagram:


Ex 2.3
F ~ (100, 300) kN,

1) sample space Ra
Ra : [0, 300]

2) sample space Rb
Rb: [0, 300]

Fundamental probability rules:
• Commutative rule (switching places)
=> A U B = B U A ; AB = BA

• Associative rule (changing brackets)
=> (A U B) U C = A U (B U C)
(AB)C = A(BC)

• Distributive rule => (A U B)C = AC U BC

!! NOTES: “U” is like “+”
“∩” is like “x”
• de Morgan’s rule
(A U B)’ = A’B’ or (AB)’ = A’ U B’

• General de Morgan’s rule:


=> (A U B U …..)’ = A’B’………
Ex 2.4
de Morgan’s rule in system failure

Given A => link one OK ; A’ => link one NG
B => link two OK ; B’ => link two NG

Because the system fails when either link is NG
=> System failure => A’ U B’
Also, because system failure = 1 – system OK
= 1 – both OK => 1 – AB = (AB)’
Mutually exclusive (ME) and statistically independent (SI)

• ME => Pr(AB) = 0

• SI => Pr(AB) = Pr(A) x Pr(B)

Ex 2.5
Draw the respective Venn’s diagrams for ME and SI
Conditional probability *****

• Pr (A | B)
=> probability of A happening GIVEN B has occurred

• Pr (A | B) = Pr(AB) / Pr(B) *****

• When A and B are “SI,”
Pr(A) = Pr(A | B), so that
Pr(AB) = Pr(A) x Pr(B)
Ex2.6
Traveling problem.

A => route 1 open


B => route 2 open
Pr(A) = 0.75
Pr(B) = 0.5
Pr(AB) = 0.4

Pr(route 1 open given route 2 open)?


=> Pr (A | B) = Pr(AB) / Pr(B) = 0.4 / 0.5 = 0.8

A => route 1 open
B => route 2 open
Pr(A) = 0.75
Pr(B) = 0.5
Pr(AB) = 0.4

Pr(route 1 not open given route 2 not open)?


=> Pr (A’ | B’) = Pr(A’B’) / Pr(B’)
because Pr(A’B’) = 1 – Pr (A U B)
= 1 – Pr(A) – Pr(B) + Pr(AB) = 0.15
So Pr (A’ | B’) = 0.15 / 0.5 = 0.3
(Venn’s diagram: A’B’ is the region outside both A and B)
Review of last lecture

• Venn’s diagram examples

• Conditional probability *****

Pr (A | B) = Pr(AB) / Pr(B)

• SI => Pr(AB) = Pr(A) x Pr(B)


• What are your findings?

=> A = AE1 U AE2 U …

• What is the relationship between Pr(A | E1), Pr(AE1) and Pr(E1)?
=> Pr(A | E1) = Pr(AE1) / Pr(E1)
=> Pr(AE1) = Pr(A | E1) x Pr(E1)

• What is Pr(A)?
Theorem of total probability
=> Pr(A) = ∑ Pr(AEi) = ∑ Pr(A | Ei) x Pr(Ei)
Ex2.7: flood question
Given:
H: heavy snow accumulation; Pr(H) = 0.2
N: normal snow accumulation ; Pr(N) = 0.5
L: light snow accumulation ; Pr(L) = 0.3
Pr(F|H) = 0.9; Pr(F|N) = 0.4; Pr(F|L) = 0.1
F: flood

What is Pr(F)?
• Can we solve the problem with alternative or “elementary-school” methods?

Consider 100 years. Based on the probabilities, there will be 20, 50, and 30 years with heavy, normal, and light snow accumulation. Of these, 18, 20, and 3 years have a flood, so that the probability of flood is 41/100 = 0.41.
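As a quick check of Ex2.7, the theorem of total probability can be written as a one-line sum. The sketch below uses the numbers given in the example; the function name `total_probability` is an illustrative choice.

```python
# Ex2.7 inputs: Pr(snow level) and Pr(flood | snow level)
prior = {"H": 0.2, "N": 0.5, "L": 0.3}
flood_given = {"H": 0.9, "N": 0.4, "L": 0.1}


def total_probability(prior, likelihood):
    """Pr(A) = sum of Pr(A | Ei) x Pr(Ei) over all events Ei."""
    return sum(prior[k] * likelihood[k] for k in prior)


p_flood = total_probability(prior, flood_given)
print(round(p_flood, 2))
```

The same function works for Ex2.8 by swapping in the hurricane categories and damage probabilities.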
Ex2.8: Hurricane and damage
Given:
A hurricane labeled as C1 (weak) to C5 (strong).
Pr(C1) = 0.35 ; Pr(D|C1) = 0.05
Pr(C2) = 0.25 ; Pr(D|C2) = 0.1
Pr(C3) = 0.14 ; Pr(D|C3) = 0.25
Pr(C4) = 0.05 ; Pr(D|C4) = 0.6
Pr(C5) = 0.01 ; Pr(D|C5) = 1
D : damage caused by hurricanes

What is Pr(D)?
=> Pr(D) = 0.35 x 0.05 + 0.25 x 0.1 + 0.14 x 0.25
+ 0.05 x 0.6 + 0.01 x 1 = 0.12
• What if now you are interested in: given A
happening, what is the probability of E1
happening
=> Pr(E1 | A) = Pr(E1A) / Pr(A)
Pr(Ei | A) = Pr(EiA) / Pr(A) (1)

• How to find Pr(A)


=> total probability
=> Pr(A) = ∑ Pr(A | Ei) x Pr(Ei) (2)

• What is the relationship between


Pr(A | Ei), Pr(AEi), and Pr(Ei)
=> Pr(A | Ei) = Pr(AEi) / Pr(Ei)
=> Pr(AEi) = Pr(A | Ei) x Pr(Ei) (3)

• Put (2) and (3) into (1) =>


=> Pr(Ei | A) = Pr(A | Ei) x Pr(Ei) / ∑ Pr(A | Ei) x Pr(Ei)
Bayes’ theorem

$$\Pr(E_i \mid A) = \frac{\Pr(A \mid E_i)\,\Pr(E_i)}{\sum_j \Pr(A \mid E_j)\,\Pr(E_j)}$$
Ex2.9: construction question
You order 60% and 40% of your aggregates from companies A and B. Their sub-standard rates are 3% and 1%, respectively.
Given a poor aggregate, what is the probability that it came from A?

• Let’s do some translation at first:


Pr(A) = 0.6 ; Pr(B) = 0.4
Let P => poor aggregate
Pr(P | A) = 0.03 ; Pr(P | B) = 0.01
and we want to find Pr( A | P)

• Using the Bayes’ theorem:


Pr( A | P) =
Pr( P | A) x Pr(A) / {Pr( P | A) x Pr(A) + Pr( P | B) x Pr(B)}
=> Pr( A | P) = 0.82
• How about using elementary-school methods?

If I order 100 kg, what will happen?

                   A     B
Aggregates (kg)    60    40
Poor quality (kg)  1.8   0.4

Pr(Poor aggregates from A) = 1.8 / 2.2 = 0.82
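Ex2.9 can also be reproduced in code. The sketch below is a minimal version of Bayes' theorem for a finite set of sources; the function name `bayes` and the dictionary layout are assumptions made for illustration.

```python
def bayes(prior, likelihood):
    """Return Pr(Ei | A) for every Ei, given Pr(Ei) and Pr(A | Ei)."""
    joint = {k: prior[k] * likelihood[k] for k in prior}  # Pr(A Ei)
    total = sum(joint.values())                           # Pr(A), total probability
    return {k: v / total for k, v in joint.items()}


# Ex2.9: share ordered from each company and its sub-standard rate
posterior = bayes({"A": 0.6, "B": 0.4}, {"A": 0.03, "B": 0.01})
print(round(posterior["A"], 2))
```

The denominator is exactly the total-probability sum, which is why the "100 kg" table gives the same answer.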


Review of last lecture

• Theorem of total probability

• Bayes’ theorem
Summary of Ch 2
• Probability =

• Venn’s diagram

• Conditional probability
=> Pr(A|B) = Pr(AB) / Pr(B)

• Total probability theorem and Bayes’ theorem


Ch3 Probability distributions
Random variable (RV)
• “possible outcomes of a random process be
presented by a number”

• Ex: How much time do you sleep

• The outcome of tossing a coin, H or T, is not a RV. But when one specifies A = 0 when H occurs and A = 1 when T occurs, A is a RV
Probability model
• The probability density of a RV can be modeled by a function, known as a probability model, such as the normal distribution

[Figure: histogram of CIVL 2160 midterm scores in 2012; x-axis: score (30-40 to 90-100); y-axis: No. Students]


• CDF and PDF (you can get one from the other)
CDF = cumulative distribution function;
PDF = probability density function;

PDF = fX(x) = Pr(X = x) CDF = FX(x) = Pr (X ≤ x) *****
(strictly, for a continuous RV fX(x) is a density: probabilities are areas under it)

Discrete and continuous RV

[Figure: example PDF and CDF plots for a discrete RV and a continuous RV]
Basic axioms:
F(∞) = Pr(X ≤ ∞) = 1
F(-∞) = Pr(X ≤ -∞) = 0
Pr (a < X ≤ b) = F(b) – F(a)
Ex3.1: construction management example
You plan to use three bulldozers in construction. Each of them has a 50-50 chance of being functional after six months. Let Y be the number of functional bulldozers (a RV).
1) What type of RV is Y?
=> discrete RV

2) Range of Y
=> 0 ~ 3

3) PDF of Y
f(0) = 1/8 ; f(1) = 3/8 ; f(2) = 3/8 ; f(3) = 1/8

4) CDF of Y
F(0) = f(0) = 1/8
F(1) = f(0) + f(1) = 1/2
F(2) = f(0) + f(1) + f(2) = 7/8
F(3) = 1

5) Plots of CDF and PDF

Ex3.2: service life of bulldozers (T)
1) What type of RV is it?
=> continuous

2) Given the PDF of T: f(t) = λe^(-λt) for t ≥ 0, what is the CDF?
=> F(t) = Pr(T ≤ t) = 1 – e^(-λt)

3) Given λ = 1 per year, what is the probability that a bulldozer will fail within 2 years?
=> Pr(T ≤ 2) = F(2)
= 1 – e^(-1x2) = 0.86

Note: This specific probability function is the so-called exponential model
Ex3.3: service life of bulldozers (T)
1) Given that the PDF of T is the normal PDF with μ = 25 and σ = 10, what is Pr(T ≤ 25)?

=> 0.5 (the normal PDF is symmetrical about μ)

Note: This specific probability function is the so-called normal distribution or Gaussian distribution
Review of last lecture

• RV

• PDF

• CDF
How to describe a RV, or the ID of a RV
1. Distribution
2. Central values
– Mean => E(X) or μX
– Mode => Pr (X = xmode) = highest
– Median => Pr (X ≤ x50) = 0.5
3. Variability
– Standard deviation or variance (= SD²)
=> V(X) = E[(X - μX)²]
4. Higher-order central moments => skewness, kurtosis, etc.
Ex3.4: Statistics about how we use the gym in HKUST

[Figure: histogram of a total of 420 data; x-axis: Duration (in 10 minutes), 1 to 7; y-axis: No. people using the HKUST Gym; the tallest bar (120 people) is at duration 1]

1) What is mode?
=> 1

2) What is median?
=> 4

3) What is mean?
=> 3.7
What is “E” or how to calculate the mean and SD
• Let f(x) be the PDF of X,
E(X) = ∫ x f(x) dx ***** (continuous RV)
E(X) = ∑ x f(x) ***** (discrete RV)

V(X) = E[(X - μX)²] = ∫ (x - μX)² f(x) dx ***** (continuous RV)
V(X) = E[(X - μX)²] = ∑ (x - μX)² f(x) ***** (discrete RV)

• How about E(HKUST) = ?
E(HKUST) = ∫ HKUST * PDF of HKUST dx

• E[(X - μX)⁴]
= ∫ (x - μX)⁴ f(x) dx (kurtosis of the RV)
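To see the "E" operator at work on a discrete RV, here is a small sketch applied to the three-bulldozer RV of Ex3.1. The PDF values 1/8, 3/8, 3/8, 1/8 are obtained by differencing the CDF of that example; the helper name `expectation` is an assumption for illustration.

```python
from fractions import Fraction

# PDF of Y in Ex3.1 (number of functional bulldozers out of three)
pdf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}


def expectation(g, pdf):
    """E[g(X)] = sum of g(x) f(x) over all x (discrete RV)."""
    return sum(g(x) * p for x, p in pdf.items())


mean = expectation(lambda x: x, pdf)
variance = expectation(lambda x: (x - mean) ** 2, pdf)
third_moment = expectation(lambda x: (x - mean) ** 3, pdf)
print(mean, variance, third_moment)
```

Any central moment is the same sum with a different `g`, which is the point of writing "E" as an operator.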
Ex3.5: Statistics about how we use the HKUST Gym

[Figure: same histogram as Ex3.4; a total of 420 data; x-axis: Duration (in 10 minutes); y-axis: No. people using the HKUST Gym]

1) E(X)
= 1 x 120/420 + 2 x 50/420 + … = 3.7

2) E[(X - μX)²]
= (1-3.7)² x 120/420 + (2-3.7)² x 50/420 + … = 5.2

3) E[(X - μX)³]
= (1-3.7)³ x 120/420 + (2-3.7)³ x 50/420 + … = 1.6 (skewness)
Normal distribution *****
• The story goes back to the 19th century, when Gauss measured a table in his living room

• Errors are independent

• Large errors are less likely to occur than small errors

• Positive and negative errors are equally likely

• PDF:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
Ex3.6: HKUST students’ height
Given our height (X) following the normal
distribution with mean and SD equal to 170 cm
and 7.5 cm, what is the probability that Jerry’s
roommate is taller than 185 cm

1) Translation => Pr(X > 185) = 1 – Pr(X ≤ 185)

2) Find Pr(X ≤ 185) = ∫ f(x) dx, integrating the normal PDF from -∞ to 185

However, this integral is not easy to solve by hand, so we use the standard normal CDF Ф:
Pr(X ≤ 185; μ = 170, σ = 7.5)
= Ф{(185-170) / 7.5} = Ф(2) = 0.98
=> Pr(X > 185; μ = 170, σ = 7.5) = 0.02
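If no probability table is at hand, Ф can be evaluated with the error function from Python's standard library. This is a sketch, not the course's Excel approach; the function name `normal_cdf` is an illustrative choice.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Pr(X <= x) for X ~ N(mu, sigma), written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


# Ex3.6: height ~ N(170, 7.5), probability of being taller than 185 cm
p_taller = 1.0 - normal_cdf(185, mu=170, sigma=7.5)
print(round(p_taller, 2))
```

With the defaults `mu=0, sigma=1` the function is Ф itself, so `normal_cdf(0.15)` reproduces the table lookup.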
How to find Ф()
• Probability table
1) Ф(0.15)
=> 0.5596

2) Ф(-0.15)
=> 1-Ф(0.15)
= 0.4404
Review of last lecture

• How to describe a RV: central value, variability


(SD), skewness, etc…(higher-order central
moment), and probability distribution

• Expected value: “E” *****


for example: E(X) = ∫ xf(x) dx

• Normal distribution *****


Ex3.7
Given X follows a symmetrical probability distribution, what is E(X – μX)?

=> E(X – μX) = E(X) – E(μX)
= ∫ x f(x) dx – μX
= μX – μX = 0

How about E{(X – μX)³}?
=> similarly, it is 0 by symmetry, so the skewness is 0

[Figure: a symmetrical PDF (skewness = 0) and a PDF with a longer right tail (skewness > 0)]
How to find Ф()
• Using Excel
Ф(0.15) = NORMSDIST(0.15)

• How to use NORMSDIST to make the normal


probability table of your own ?
=> details given in the Excel supplemental
material by JP available on LMES
Ex3.8: 2012 CIVL 2160 midterm
Given the midterm score X ~ N(70, 12). If you got 90 this year, how many of your classmates probably did better than you, given 133 students in total this year?

1) Translation => Pr(X > 90) = 1 – Pr(X ≤ 90)

2) Pr(X ≤ 90)
= Ф{(90-70) / 12} = Ф(1.67) = 0.95
=> Pr(X > 90) = 0.05
so around 6 ~ 7 students (0.05 x 133) got a score > 90.
Lognormal distribution
• When X does not follow the normal distribution but ln(X) does, we say X follows the lognormal distribution

• Pr(X ≤ x) = Pr(lnX ≤ lnx)
= Ф{(lnx - μlnX) / σlnX}

• μlnX = ln(μX) – 0.5(σlnX)²
(σlnX)² = ln{1 + (σX / μX)²}

• Approximation when σX / μX < 0.3:
σlnX = σX / μX
μlnX = ln(X50)
Ex3.9
The number X of students attending CIVL2160 follows a lognormal distribution, and the mean, median and SD are 60, 60 and 15, respectively. If JP prepares 80 copies of the handout, what is the probability that they run short in a class?

=> First find Pr(X ≤ 80) = Pr(lnX ≤ ln80)
σlnX = σX / μX = 0.25
μlnX = ln(X50) = 4.1
Pr(lnX ≤ ln80) = Ф{(ln80 – 4.1) / 0.25}
= Ф(1.15) = 0.88
=> There is about a 12% chance that 80 handouts are not enough
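The sketch below repeats Ex3.9 in Python, once with the approximate parameters used above and once with the exact conversion formulas from the lognormal slide. The exact run treats 60 as the mean only (an assumption of this sketch), so it gives a slightly different answer; function names are illustrative.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def lognormal_cdf(x, mu_ln, sigma_ln):
    """Pr(X <= x) = Phi((ln x - mu_lnX) / sigma_lnX)."""
    return normal_cdf(math.log(x), mu_ln, sigma_ln)


mean, median, sd = 60.0, 60.0, 15.0

# Approximation (valid when SD / mean < 0.3)
p_approx = lognormal_cdf(80, mu_ln=math.log(median), sigma_ln=sd / mean)

# Exact conversion from the mean and SD of X
sigma_ln = math.sqrt(math.log(1.0 + (sd / mean) ** 2))
mu_ln = math.log(mean) - 0.5 * sigma_ln ** 2
p_exact = lognormal_cdf(80, mu_ln, sigma_ln)

print(round(1 - p_approx, 2), round(1 - p_exact, 2))
```

The two shortage probabilities differ by a few percentage points, which gives a feel for how rough the approximation is at SD / mean = 0.25.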
Ex3.10
Following Ex3.9, what is the PDF?

=> Compute Pr(X=1), Pr(X=2), …

For such a repetitive calculation, Excel is your good friend most of the time.

[Figure: plot of the computed PDF against X, for X from 0 to 140]
Uniform distribution
• The probability density is uniform, so for X in [a, b] the PDF is a flat line between a and b

[Figure: uniform PDF; x-axis: X from a to b]

• The probability density = 1 / (b-a), because the area must be equal to 1
Ex3.11
What are the variance and skewness of X following the uniform distribution between a and b?
=> Mean of X = (a+b) / 2
=> PDF of X = 1 / (b-a)

=> V(X) = E[(X - μX)²] = ∫ (x - μX)² f(x) dx
= ∫ (x – (a+b)/2)² (1 / (b-a)) dx, integrating from a to b
= (b-a)² / 12

=> Symmetrical => skewness = 0


Poisson distribution
• To model the number of occurrences of rare events such as earthquakes

• PDF => Pr(X = x; v) = v^x e^(-v) / x! ; v = mean rate

• It is a model for a discrete RV


Ex3.12
Bus M91 should come to HKUST twice in an hour. What is the probability that you wait 30 minutes for nothing?

=> Mean rate = 2 per hour = 1 per 30 minutes
Pr(X = 0; v = 1) = 1^0 x e^(-1) / 0!
= e^(-1) = 37%

What is the PDF?
=> Once again, Excel can help

[Figure: Poisson PDF for mean rate = 1; x-axis: X from 0 to 10; y-axis: PDF]
Review of last lecture

• Lognormal distribution: lnX follows the normal distribution

• Uniform distribution: probability density is uniform

• Poisson distribution: for rare events such as earthquakes

• Excel can help you with the plotting work


Exponential distribution
• It is part of the statistical Poisson process. The difference from the Poisson model is that the exponential model describes a different RV: the recurrence time

• For example: what are the probabilities that the next earthquake occurs within 1 year and within 100 years?

• CDF: Pr(T ≤ t) = F(t) = 1 – e^(-vt)
here v is the mean rate per unit time, i.e., the reciprocal of the mean return period
Ex3.13
Bus M91 should come to HKUST twice in an hour,
what is the probability that you wait 30 minutes for
nothing?

=> Pr(T > 30) = 1 - Pr(T ≤ 30)

Let’s make 30 minutes one unit of time, so that the problem becomes Pr(T > 1), with one bus expected per unit time (v = 1)

=> 1 - Pr(T ≤ 1) = 1 – (1 – e^(-1x1)) = e^(-1x1) = 37%

=> the same as Ex3.12 using the Poisson distribution
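The agreement between the two models can be checked numerically. The following sketch codes the Poisson PDF and the exponential CDF exactly as written on the slides; the function names are illustrative.

```python
import math


def poisson_pmf(x, v):
    """Pr(X = x; v) = v^x e^(-v) / x!"""
    return v ** x * math.exp(-v) / math.factorial(x)


def exponential_cdf(t, v):
    """Pr(T <= t) = 1 - e^(-v t)"""
    return 1.0 - math.exp(-v * t)


# One bus expected per 30-minute unit of time (v = 1)
p_no_bus_poisson = poisson_pmf(0, 1.0)                   # no arrival in one unit
p_no_bus_exponential = 1.0 - exponential_cdf(1.0, 1.0)   # waiting time > one unit
print(round(p_no_bus_poisson, 2), round(p_no_bus_exponential, 2))
```

"No event in a window" and "waiting time longer than the window" are the same event, so the two numbers must coincide for any rate and window length.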


Binomial distribution
• When the outcome of each trial is “yes or no,” what is the number of “yes” in n trials?

• Example: flipping a coin 100 times

• The PDF is fully derivable; say, m “yes” in n trials, given the probability of getting “yes” in each trial = p

• m trials of “yes” means (n-m) trials of “no”

$$\Pr(M = m) = \binom{n}{m}\, p^{m} (1-p)^{n-m}$$
Ex3.14
You manage three projects, and the probabilities that your supervisor is happy and not happy with your performance on each are 0.9 and 0.1. What is the probability that your boss is happy with exactly two out of three projects?

=> 0.9 x 0.9 x 0.1 x 3 = 0.243

What about 18 out of 30?

=> In Excel: BINOMDIST(18,30,0.9,FALSE) = 0.000013

Summary:
• normal, lognormal, uniform, Poisson, exponential,
binomial

• Those not introduced:


Gamma, Beta, Weibull, Pareto, …

• No matter what model you are using, ideas are the


same and the difference is the PDF and CDF

• Say, what is the probability that your midterm score is greater than 80?
=> Pr(X > 80) = 1 - Pr(X ≤ 80)
= 1 - ∫f(x)dx = 1 – F(80)
Summary of Ch3:
• PDF = Pr (Y = y) = f(y)
• CDF = Pr (Y ≤ y) = F(y)
• Normal dist., Ф()
• Lognormal is closely related to normal
• Binomial => good and not good
• Poisson and Exponential => rare events
• Uniform => is uniform in PDF
Review of last lecture

• Binomial distribution and exponential distribution

• For the lottery question in HW1 (due this Friday):

i) 1/40 x 1/39 x 1/38 x 1/37 x 1/36


ii) 5/40 x 4/39 x 3/38 x 2/37 x 1/36
Ch4 Multiple RV and function of RV

• Sometimes, the problem involves multiple RVs;

• For example: your midterm score and final


score; your stress level and the level of
happiness

• PDF and CDF is the terminology used in Ch3; in


this chapter, joint PDF and CDF are those we
are referring to
CDF and PDF:
FX(x) = Pr(X ≤ x)
fX(x) = Pr(X = x)

Joint CDF and PDF:
FX,Y(x, y) = Pr(X ≤ x, Y ≤ y)
fX,Y(x, y) = Pr(X = x, Y = y)

F(∞, ∞) = 1
F(-∞, -∞) = 0

[Figure: 3-D column chart of a joint PDF; horizontal axes: X and Y; vertical axis: Pr]
• A joint PDF can be reduced to a single-variable PDF, the so-called marginal PDF

[Figure: 3-D column chart of a joint PDF with X = 1, 2, 3 and Y = 10, 20, 30; columns are colored by Y value]

Pr (Y = 10) = fY(10) => the summation of the blue columns
Pr (Y = 20) = fY(20) => the summation of the purple columns
Pr (Y = 30) = fY(30) => the summation of the yellow columns

=> Marginal fY(y) = f(x1, y) + f(x2, y) + … = ∑ f(xi, y)

=> Marginal fX(x) = f(x, y1) + f(x, y2) + … = ∑ f(x, yi)


Ex4.1

What is the joint PDF?


What is fX(x = 8)?
=> marginal PDF of X => sum (∑) over Y
fX(x = 8) = 0.036 + 0.216 + 0.18 = 0.43

What is fY(y = 90)?
=> marginal PDF of Y => sum (∑) over X
=> fY(y = 90) = 0.072 + 0.18 + 0.079 + 0.014
= 0.35
Ex4.2: continuous case

$$f(x,y)=\begin{cases}\dfrac{6}{5}\,(x+y^{2}), & 0\le x\le 1,\ 0\le y\le 1\\[4pt] 0, & \text{otherwise}\end{cases}$$

What is fX(x)?

$$f_X(x)=\int_{0}^{1} f(x,y)\,dy=\int_{0}^{1}\frac{6}{5}\,(x+y^{2})\,dy=\frac{6}{5}x+\frac{2}{5}$$
What is the No. 1 requirement of a “legal” joint PDF?
=> Its summation (integral) over the whole range needs to be 1.0

Ex4.3: Show that the joint PDF in Ex4.2 is legal

$$f(x,y)=\begin{cases}\dfrac{6}{5}\,(x+y^{2}), & 0\le x\le 1,\ 0\le y\le 1\\[4pt] 0, & \text{otherwise}\end{cases}$$

Since

$$\int_{x=0}^{1}\int_{y=0}^{1}\frac{6}{5}\,(x+y^{2})\,dy\,dx = 1,$$

this joint PDF is legal


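Ex4.4: Is the following joint PDF legal?

f(x, y) = 24xy in the region of positive density (0 ≤ x ≤ 1, 0 ≤ y ≤ 1 - x); 0 otherwise
(this is the density used in Ex4.7)

[Figure: the region of positive density is the triangle below the line Y = 1 - X; for a given x*, y runs from (x*, 0) to (x*, 1 - x*)]

=> Check whether ∫∫ PDF dxdy = 1.0

Can we integrate over x from 0 to 1 and y from 0 to 1?
=> No, because part of that square lies outside the region of positive density

How about x from 0 to 1 and y from 0 to 1 - x?

Therefore,

$$\int_{x=0}^{1}\int_{y=0}^{1-x} 24xy\,dy\,dx=\int_{0}^{1}12x(1-x)^{2}\,dx=1$$

=> This joint PDF is legal

What is Pr (0 ≤ X, Y ≤ 1, X+Y ≤ 0.5)?

What is the marginal PDF of X?
i) when 0 ≤ x ≤ 1, f(x) = ∫ 24xy dy (y from 0 to 1 - x) = 12x(1 - x)²
ii) otherwise, f(x) = 0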
Review of last lecture

• Joint PDF and CDF

• Marginal PDF and CDF


Statistical independence between X and Y in
multiple RV questions
• Review of SI in Ch 2:
SI <=> Pr(A) x Pr(B) = Pr(AB)

• In this case:
fX, Y(x, y) = fX (x) * fY(y)

For example:
fX (x) = 0.1, fY (y) = 0.1 and fX, Y(x, y) = 0.01
Ex4.5
Are X and Y in Ex4.1 statistically independent?

fX( x = 8) = 0.036 + 0.216 + 0.18 = 0.43


fY( y = 90) = 0.072 + 0.18 + 0.079 + 0.014 = 0.35

=> Since f(8, 90) = 0.18 is not equal to 0.43 x 0.35 = 0.15, X and Y are dependent
Ex4.6
Given the life of two bulldozers being SI and each following
the exponential distribution, what is the joint PDF and
what does this PDF mean?

=> fX,Y(x, y) = fX(x) * fY(y) = λX e^(-λX x) * λY e^(-λY y)

=> The probability (density) that bulldozer X fails at time x and bulldozer Y fails at time y

What if λX = 1/1000 and λY = 1/1200 (expected lifetimes = 1000 and 1200 hours)? What is the probability that both fail within 1500 hours?

=> ∫ λX e^(-λX x) dx * ∫ λY e^(-λY y) dy, both integrated from 0 to 1500
= (1 – e^(-1.5)) x (1 – e^(-1.25)) = 0.78 x 0.71 = 0.55
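A short numerical check of Ex4.6, assuming (as the example states) that the two lifetimes are SI so that the joint probability is the product of two exponential CDFs. The function name is an illustrative choice.

```python
import math


def exponential_cdf(t, rate):
    """Pr(T <= t) = 1 - e^(-rate * t)"""
    return 1.0 - math.exp(-rate * t)


# Expected lifetimes of 1000 and 1200 hours => rates 1/1000 and 1/1200 per hour
p_x_fails = exponential_cdf(1500, 1 / 1000)
p_y_fails = exponential_cdf(1500, 1 / 1200)

# SI => the joint probability is the product of the two marginals
p_both_fail = p_x_fails * p_y_fails
print(round(p_x_fails, 2), round(p_y_fails, 2), round(p_both_fail, 2))
```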
“E”
• Review: E(FRIDAY) = ∫ friday * PDF

Ex4.7
What is E(XY) in Ex4.4?

[Figure: region of positive density from Ex4.4; axes X and Y from 0.0 to 1.0]

=> E(XY) = ∫∫ xy * f(x,y) dxdy

$$E(XY)=\int_{x=0}^{1}\int_{y=0}^{1-x} xy\cdot 24xy\,dy\,dx=\frac{2}{15}$$
Covariance
• Cov(X,Y) = E[(X-μX)*(Y-μY)] ***
= ∫∫ (x-μX)*(y-μY) * f(x,y) dxdy

!! Cov(X,Y) is unit-dependent

• Correlation ρ = Cov(X,Y) / (σX * σY) ***

!! ρ is unit-independent, from -1 to 1

• When X and Y are independent, ρXY = 0. The converse does not hold in general: ρXY = 0 only means X and Y are uncorrelated (no linear relationship)


Function of RV
• Given Y = g(X) (for example, y = 3x + 5) and X ~ N(0, 1), finding the PDF of Y is a function-of-RV problem

[Figure: PDFs of X and Y; x-axis from -10 to 20; y-axis: PDF]
Ex4.8
Given X ~ U(0, 20) and Y = 3X

i) Pr(Y < 0)?
= 0

ii) Pr(Y < 100)?
= 1

iii) Pr(Y ≤ 30)?
= 0.5

=> Pr(Y ≤ 30) = Pr(X ≤ 10) *****

[Figure: PDFs of X (on 0 to 20) and Y (on 0 to 60)]


Derivations (1): with those basics about CDF, when Y = g(X) is an increasing function *****

• FY(y)
= Pr(Y ≤ y)
= Pr(X ≤ x)
= Pr(X ≤ g⁻¹(y))
= FX(g⁻¹(y))

(You can imagine Y = 3X so that X = Y/3, and use x = 10 and y = 30)
Ex4.9
Given X ~ U(10, 100) and Y = 1/X, so that X = 1/Y

[Figure: PDF of X, uniform on 10 to 100]

i) What is the range of Y?
=> 0.01 to 0.1

ii) Pr(Y < 1)?
= 1

iii) Pr(Y ≤ 0.02)?
= FY(0.02) = FX(g⁻¹(0.02)) = FX(50) = 4/9

=> But Pr(Y ≤ 0.02) = Pr(X ≥ 50), so the final answer should be 5/9
Derivations (2): when Y = g(X) is a decreasing function

• FY(y)
= Pr(Y ≤ y)
= Pr(X > x)
= 1 - Pr(X ≤ x)
= 1 - Pr(X ≤ g⁻¹(y))
= 1 - FX(g⁻¹(y))

(You can imagine Y = 1/X so that X = 1/Y, and use x = 50 and y = 0.02)

• Applying this equation to Ex4.9:
Pr(Y ≤ 0.02) = 1 - FX(1 / 0.02) = 1 - FX(50) = 5/9
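The two derivations translate directly into two small functions. The sketch below applies them to Ex4.8 (increasing, Y = 3X) and Ex4.9 (decreasing, Y = 1/X); all function names are illustrative.

```python
def uniform_cdf(x, a, b):
    """CDF of U(a, b)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def cdf_y_increasing(y, g_inverse, cdf_x):
    """F_Y(y) = F_X(g^-1(y)) when g is increasing."""
    return cdf_x(g_inverse(y))


def cdf_y_decreasing(y, g_inverse, cdf_x):
    """F_Y(y) = 1 - F_X(g^-1(y)) when g is decreasing."""
    return 1.0 - cdf_x(g_inverse(y))


# Ex4.8: X ~ U(0, 20), Y = 3X
p_ex48 = cdf_y_increasing(30, lambda y: y / 3, lambda x: uniform_cdf(x, 0, 20))
# Ex4.9: X ~ U(10, 100), Y = 1/X
p_ex49 = cdf_y_decreasing(0.02, lambda y: 1 / y, lambda x: uniform_cdf(x, 10, 100))
print(p_ex48, round(p_ex49, 4))
```

Using the increasing-case formula on Ex4.9 would return 4/9, the pitfall pointed out in that example.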
Review of last lecture

• SI => fX, Y(x, y) = fX (x) * fY(y)

• Function of RV
=> given PDF of X, and y = g(x), find PDF of Y
=> mapping
Ex4.10: X ~ Uniform(0, π); Y = sinX
What is the range of Y?
=> 0 to 1
What is Pr(Y ≤ 1)?
=> 1
What is Pr(Y ≤ 0.5)?
=> mapping
Pr(Y ≤ 0.5) = Pr(X ≤ π/6) + Pr(X > 5π/6)
Pr(X ≤ π/6) = π/6 * 1/π = 1/6
Pr(X > 5π/6) = 1 - Pr(X ≤ 5π/6) = 1/6
Pr(Y ≤ 0.5) = 1/6 + 1/6 = 1/3
Linear combination of RVs
• Given the PDFs of X and Y, and Z = aX + bY for example, what is the PDF of Z?

• E(Z) = E(aX+bY) = aE(X) + bE(Y) = aμX + bμY

• V(Z) = V(aX+bY) = a²V(X) + b²V(Y) + 2ab*Cov(X,Y)
when X and Y are SI:
V(Z) = V(aX+bY) = a²V(X) + b²V(Y) = a²σX² + b²σY²

• Special case: when X and Y are Gaussian RVs, so is Z.
Ex4.11
Given that the midterm X and final Y of CIVL2160 last year follow X ~ N(70, 12) and Y ~ N(65, 8) (X and Y are taken as SI). If you got 70 in both exams, what is your percentile, given that the final is weighted twice as much as the midterm?

=> Let Z = X + 2Y; your z = 70 + 2*70 = 210, so find Pr(Z < 210)
=> μZ = 70 + 2*65 = 200
σZ² = 12² + 2²*8² = 400 => σZ = 20

Since Z ~ N as well,
Pr(Z < 210) = Ф((210 - 200) / 20) = Ф(0.5) = 0.69
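The calculation in Ex4.11 can be sketched as follows. The covariance argument defaults to zero, which corresponds to treating X and Y as SI; function names are illustrative.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def linear_combination(a, mu_x, sd_x, b, mu_y, sd_y, cov_xy=0.0):
    """Mean and SD of Z = aX + bY."""
    mean = a * mu_x + b * mu_y
    variance = a ** 2 * sd_x ** 2 + b ** 2 * sd_y ** 2 + 2 * a * b * cov_xy
    return mean, math.sqrt(variance)


# Ex4.11: Z = X + 2Y with X ~ N(70, 12) and Y ~ N(65, 8)
mu_z, sd_z = linear_combination(1, 70, 12, 2, 65, 8)
percentile = normal_cdf(70 + 2 * 70, mu_z, sd_z)
print(mu_z, sd_z, round(percentile, 2))
```

Passing a non-zero `cov_xy` shows how a positive correlation between midterm and final would widen the spread of Z.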
Extended special case:
=> Given Z = X/Y where both X and Y follow the lognormal distribution, Z follows the lognormal distribution as well

=> lnZ = ln(X/Y) = ln(X) – ln(Y)
since ln(X) and ln(Y) follow the normal distribution, so does ln(Z); that is, Z follows the lognormal distribution

=> Similar for Z = XY


Ex4.12: engineering management. The annual expense (C) of your company is governed by:
C = WF / E^0.5
Given W, F and E following the lognormal distribution, with their median and COV (= SD / mean) as follows:

[Table: median and COV of W, F and E]

find Pr (C > 35000)?

=> I need to know the distribution of C and some of its parameters (e.g., mean and SD)
C = WF / E^0.5 => lnC = lnW + lnF – 0.5lnE
Because W, F and E ~ lognormal, C ~ lognormal, or lnC ~ normal:

Pr (C > 35000)
= Pr (lnC > ln35000)
= 1 – Pr (lnC ≤ ln35000)
= 1 - Ф((ln35000 – 10.36) / 0.26)
= 1 - Ф(0.39) = 0.35
Special case II: Poisson distribution
=> Given Z = X + Y where X and Y are SI and both follow the Poisson distribution, so does Z

=> Mean rate of Z = mean rate of X + mean rate of Y

=> More proof will be given in the next chapter: computer-aided probabilistic analysis
Ex4.13
Minibus 91 to “Rainbow” comes to HKUST once in 10 minutes, and “big bus” M91 comes once in 30 minutes. Given that the number of trips of each is a Poisson RV, what is the probability that you wait 20 minutes without any bus coming to take you to “Rainbow”?

=> Z = X + Y, and find Pr (Z = 0)
In 20 minutes, mean rate of X = 2, and mean rate of Y = 2/3, so mean rate of Z = 8/3
=> In Excel,
Pr (Z = 0) = POISSON(0, 8/3, FALSE) = 0.07
Review of last lecture

• Three special cases for Z = g(X,Y)


– When X and Y follow the normal distribution, Z follows the normal distribution if it is a linear combination of X and Y
– When X and Y follow the lognormal distribution, Z follows the lognormal distribution if it is a product or quotient of powers of X and Y (e.g., XY, XY², Y/X)
– When X and Y follow the Poisson distribution, Z follows the Poisson distribution if it is the sum of X and Y
What if Z = f(X,Y) is not one of the special cases and
we want to find PDF of Z?

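Ex4.14: Given that X and Y are SI and Z = Y – X, what is the CDF of Z?

$$f(x)=\begin{cases}k e^{-kx}, & x\ge 0\\ 0, & \text{otherwise}\end{cases}\qquad f(y)=\begin{cases}h e^{-hy}, & y\ge 0\\ 0, & \text{otherwise}\end{cases}$$

=> FZ(z) = Pr(Z ≤ z) = Pr(Y – X ≤ z)

i) When z < 0

[Figure: region A below the line Y - X = z in the first quadrant; the line meets the X axis at x = -z]

Therefore,
FZ(z) = Pr(Z ≤ z) = Pr(Y – X ≤ z)
= ∫∫ f(x,y) dxdy over region A (x from -z to ∞, y from 0 to z + x)

Also f(x,y) = kh e^(-kx) e^(-hy); after doing some math

$$F_Z(z)=\int_{x=-z}^{\infty}\int_{y=0}^{z+x} kh\,e^{-kx}e^{-hy}\,dy\,dx=\frac{h}{h+k}\,e^{kz}$$

ii) When z > 0

[Figure: region A below the line Y - X = z in the first quadrant; the line meets the Y axis at y = z]

Therefore,
FZ(z) = Pr(Z ≤ z) = Pr(Y – X ≤ z)
= ∫∫ f(x,y) dxdy over region A (x from 0 to ∞, y from 0 to z + x)

As a result,

$$F_Z(z)=\int_{x=0}^{\infty}\int_{y=0}^{z+x} kh\,e^{-kx}e^{-hy}\,dy\,dx=1-\frac{k}{h+k}\,e^{-hz}$$

(Check: both branches give h / (h+k) at z = 0, so the CDF is continuous.)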
Ex4.15: Following Ex4.14, what is the CDF of Z, given Z = X + Y and Z = Y/X?

$$f(x)=\begin{cases}k e^{-kx}, & x\ge 0\\ 0, & \text{otherwise}\end{cases}\qquad f(y)=\begin{cases}h e^{-hy}, & y\ge 0\\ 0, & \text{otherwise}\end{cases}$$

For Z = X + Y:
i) z < 0
=> F(z) = 0
ii) z ≥ 0

[Figure: region A below the line X + Y = z in the first quadrant]

FZ(z) = Pr (Z ≤ z)
= Pr (X + Y ≤ z)
= ∫∫ f(x,y) dxdy, with x from 0 to z and y from 0 to z - x

With some tedious calculation (for k ≠ h):

$$F_Z(z)=\int_{x=0}^{z}\int_{y=0}^{z-x} kh\,e^{-kx}e^{-hy}\,dy\,dx=1-\frac{1}{k-h}\left(k e^{-hz}-h e^{-kz}\right)$$

As for Z = Y/X (z ≥ 0):

[Figure: region A below the line Y - zX = 0 in the first quadrant]

FZ(z) = Pr(Y/X ≤ z)
= Pr(Y – zX ≤ 0)
= ∫∫ f(x,y) dxdy, with x from 0 to ∞ and y from 0 to zx

$$F_Z(z)=\int_{x=0}^{\infty}\int_{y=0}^{zx} kh\,e^{-kx}e^{-hy}\,dy\,dx=\frac{hz}{k+hz}$$
Tips of finding FZ(z) given Z = g(X,Y)

• FZ(z) = Pr(Z ≤ z) = Pr(g(X,Y) ≤ z)


• Plot the “area”
• Find the “integrating range of X and Y”
• Do the math
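A closed-form CDF obtained this way is worth cross-checking numerically. The sketch below compares the Ex4.14 result, F(z) = h/(h+k) e^(kz) for z < 0 and 1 - k/(h+k) e^(-hz) for z ≥ 0, against a simple simulation of Z = Y - X. The rates k = 1 and h = 2, the seed, and the function names are arbitrary choices made for this illustration.

```python
import math
import random


def cdf_z_closed_form(z, k, h):
    """CDF of Z = Y - X with X ~ exponential(rate k), Y ~ exponential(rate h), SI."""
    if z < 0:
        return h / (h + k) * math.exp(k * z)
    return 1.0 - k / (h + k) * math.exp(-h * z)


def cdf_z_simulated(z, k, h, n_trials=200_000, seed=1):
    """Fraction of simulated trials in which Y - X <= z."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        x = rng.expovariate(k)
        y = rng.expovariate(h)
        if y - x <= z:
            hits += 1
    return hits / n_trials


k, h = 1.0, 2.0
for z in (-1.0, 0.5):
    print(z, round(cdf_z_closed_form(z, k, h), 3), round(cdf_z_simulated(z, k, h), 3))
```

If the integration range had been set up wrongly, the two columns would disagree by far more than the simulation noise.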
Summary of Ch4
• Joint PDF and CDF
• Marginal PDF from joint PDF
• Function of RV in Y = g(X):
– wanting to know PDF of Y, given PDF of X, and Y =
g(X)
– mapping
• Function of RV in Z = g(X,Y):
– wanting to know PDF of Z, given PDF of X and Y,
and Z = g(X,Y)
– Check if it is a special case
– FZ(z) = Pr(Z ≤ z) = Pr(g(X,Y) ≤ z)
(see more in the previous slide)
Ch5 Computer-aided probability analysis
• What is the probability that you get “2” when tossing a fair die?

• What if you do not know the theoretical solution and want to find it in an alternative manner?

• How about tossing the die, say, 10,000 times and seeing how many of the trials give “2”?

• How about letting a computer do the 10,000 tosses for you?

• What is the probability that you toss a die 5 times in a row and the total point is equal to 15?

• What is the theoretical answer?

• What is the simulated answer from computers?
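Both questions can be answered in Python as an alternative to the course's Excel file. The sketch below gets the theoretical answer by listing all 6⁵ = 7776 outcomes and the simulated answer by repeated random tossing; the trial count and seed are arbitrary choices.

```python
import random
from fractions import Fraction
from itertools import product

# Theoretical answer: enumerate every outcome of tossing a die five times
outcomes = list(product(range(1, 7), repeat=5))
favourable = sum(1 for o in outcomes if sum(o) == 15)
p_exact = Fraction(favourable, len(outcomes))


def simulate(n_trials=100_000, seed=0):
    """Estimate Pr(total of five tosses = 15) by repeated random tossing."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        if sum(rng.randint(1, 6) for _ in range(5)) == 15:
            hits += 1
    return hits / n_trials


p_simulated = simulate()
print(p_exact, float(p_exact), p_simulated)
```

The simulated value changes slightly with the seed, while the enumerated value does not.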


Review of last lecture

• Find F(Z) given Z = g(X,Y) but the three special


cases (normal, lognormal and Poisson) are not
applicable
– FZ(z) = Pr(Z ≤ z) = Pr(g(X,Y) ≤ z)
– Plot the “area”
– Find the “integrating range of X and Y”
– Do the math
Recap Ex4.14:
find FZ(z) when z > 0, given Z = Y - X

[Figure: the line Y - X = z in the first quadrant; which side of the line is region A?]

From definition:
FZ(z)
= Pr(Z ≤ z)
= Pr(Y – X ≤ z)
= ∫∫ f(x,y) dxdy, with x from 0 to ∞ and y from 0 to z + x
Computer-aided analysis:
• Repeating a number of “tossing-a-dice”
experiments in computers

1) Probability of getting Point 2 when tossing a


dice

2) Probability of getting total point = 15 when


tossing a dice five times in a row

!! See the uploaded Excel for details


• What do we need in a computer-aided
probability analysis?
=> random number generator; in Excel, it is
RAND or RANDBETWEEN

• Another more important role of computer-aided


analyses is to solve the problem without
analytical solution, for instance:

Find Pr( A > 60), given A = B + C + D, and B~U(5,


25), C~N(15,6), and D~logN with the mean and
SD of lnD = 3 and 0.3 (see Excel file for details)
• The procedure that we used for solving the three problems is referred to as Monte Carlo Simulation (MCS).

• The essence of MCS is random number generation; the probability is then calculated as follows:

Pr(event) = (No. of trials in which the event occurs) / (No. of total trials) *****
Important notes about MCS
• It is not surprising that MCS answers are
different from time to time, although they are
very close to each other

• Sample size matters; the higher the size, the


more reliable the answer is (just use size = 1
and you can sense the difference)

• As a rule of thumb, the size of 50,000 is enough


for most problems
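The A = B + C + D problem stated earlier can be simulated in a few lines. This sketch is a Python stand-in for the course's Excel file, assuming B, C and D are SI and reading N(15, 6) as mean 15 and SD 6; the seed and the function name are arbitrary.

```python
import random


def estimate_pr_a_exceeds(threshold=60.0, n_trials=50_000, seed=2160):
    """MCS estimate of Pr(A > threshold) for A = B + C + D."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        b = rng.uniform(5, 25)            # B ~ U(5, 25)
        c = rng.gauss(15, 6)              # C ~ N(15, 6)
        d = rng.lognormvariate(3, 0.3)    # lnD ~ N(3, 0.3)
        if b + c + d > threshold:
            hits += 1
    return hits / n_trials


p_large = estimate_pr_a_exceeds()
p_small = estimate_pr_a_exceeds(n_trials=100, seed=1)
print(p_large, p_small)
```

Rerunning the small-sample case with different seeds makes the point about sample size: its answers scatter far more than those from 50,000 trials.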
Review of last lecture

• What is the key to MCS or computer-aided


probabilistic analysis

=> Random number generator


Ex5.1
Z = 2XY, given X ~ U(1, 10) and Y ~ U (5,
12), plot the CDF of Z
Interesting applications of MCS
• How to find out the size of the area?

[Figure: an irregular area plotted on X-Y axes]

[Figure: the same area inside a bounding box covered with random points; legend: total points, points inside]

• Can you calculate the size of Hong Kong by MCS?
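The "total points versus points inside" idea can be sketched as below. A unit circle is used as the test shape only because its true area (π) is known; the function name and the `is_inside` callback are illustrative assumptions.

```python
import math
import random


def estimate_area(is_inside, x_range, y_range, n_points=100_000, seed=0):
    """Area = bounding-box area x (points inside / total points)."""
    rng = random.Random(seed)
    (x0, x1), (y0, y1) = x_range, y_range
    inside = 0
    for _ in range(n_points):
        x = rng.uniform(x0, x1)
        y = rng.uniform(y0, y1)
        if is_inside(x, y):
            inside += 1
    box_area = (x1 - x0) * (y1 - y0)
    return box_area * inside / n_points


# Test shape: unit circle inside the box [-1, 1] x [-1, 1]
area = estimate_area(lambda x, y: x * x + y * y <= 1.0, (-1, 1), (-1, 1))
print(round(area, 3), round(math.pi, 3))
```

For a map of Hong Kong, `is_inside` would be replaced by a test of whether a random point falls on land within the map's bounding box.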
Summary of computer-aided probabilistic
analysis or MCS

• Random number generators

• Useful for problems whose analytical solutions are hard to find or nonexistent

• Sample size matters


Ch6: Estimation
• From this point on, we are going to learn
“statistics.” Before that, what we have learnt is
about “probability.”

• For example, what is the mean value and


standard deviation of the size of the
cockroaches in HKUST?

• We randomly catch a few of them and measure


their size.
Sample and population
• What is the average size of the
population?
=> Constant
=> You can not have it

• What is the average size of the samples?


=>Random variable
=>You can have it
Review of last lecture

• Sample and population

• Population mean (same as other statistics) is a


constant but unknown

• In contrast, sample mean is a RV but accessible.

• Estimation is to use accessible values to best estimate the unknown.
Estimator
• An equation to calculate sample statistics

$$\bar{X}=\frac{x_1+\cdots+x_n}{n}\qquad \bar{X}=\frac{x_1+\cdots+x_n}{n^{2}}\qquad \bar{X}=x_1$$

• By definition, all three equations are estimators of the sample mean or population mean

• Which one is better?

Biased and unbiased Estimator
• Unbiased estimators are better

• Take mean value as example:


when
E(Estimator) = μ
this estimator is an unbiased one
Ex6.1
Given the population mean of X equal to μ, are the following two estimators unbiased?

$$\bar{X}=\frac{x_1+\cdots+x_n}{n}\qquad \bar{X}=\frac{x_1+\cdots+x_n}{n^{2}}$$

=> E(x1) = μ, E(x2) = μ, … *****

As a result,

$$E\!\left[\frac{x_1+\cdots+x_n}{n}\right]=\frac{1}{n}\left[E(x_1)+\cdots+E(x_n)\right]=\frac{n\mu}{n}=\mu\ \Rightarrow\ \text{unbiased}$$

$$E\!\left[\frac{x_1+\cdots+x_n}{n^{2}}\right]=\frac{1}{n^{2}}\left[E(x_1)+\cdots+E(x_n)\right]=\frac{n\mu}{n^{2}}=\frac{\mu}{n}\neq\mu\ \Rightarrow\ \text{biased}$$
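• Given two estimators that are both unbiased, we compare their consistency and efficiency, …

• Consistency:

$$\lim_{n\to\infty}\Pr\left(|\hat{\theta}-\theta|>\varepsilon\right)=0$$

(ε: very small value)

• Efficiency:
V(Estimator); the smaller the variance, the more efficient the estimator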
Central Limit Theorem
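No matter what distribution X is following, when n > 30, the sample mean of X approximately follows:

$$\bar{X}\sim N\!\left(\mu,\ \frac{\sigma}{\sqrt{n}}\right)$$

(μ and σ: population mean and SD; the second parameter is the SD, as in N(70, 12))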
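The statement can be illustrated with a quick simulation. The sketch below draws samples of size n = 36 from an exponential population with rate 1 (so μ = 1 and σ = 1, clearly not normal) and looks at the spread of the sample means; the population, sample size, seed and function name are all arbitrary choices for this illustration.

```python
import random
import statistics


def sample_means(n, n_repeats=5000, seed=0):
    """Sample mean of n exponential(rate 1) draws, repeated n_repeats times."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(n_repeats)
    ]


means = sample_means(n=36)
mean_of_means = statistics.fmean(means)   # expected to be close to mu = 1
sd_of_means = statistics.stdev(means)     # expected to be close to sigma / sqrt(n) = 1/6
print(round(mean_of_means, 3), round(sd_of_means, 3))
```

Plotting a histogram of `means` would show the bell shape, even though the individual draws are strongly skewed.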
• Make-up midterm will be held on April 13, for
those who want to improve their midterm grades
(40%)

• 15 similar questions will be given in the make-up exam.

• But the rule is that if you get “150” in the make-up exam, your midterm score is “70,”
if it is higher than the original score.

• Contact me or the TAs by the end of this week if you want to take the make-up exam
Review of last lecture

• Unbiased estimator

• Ex: $\bar{X} = \dfrac{x_1 + \cdots + x_n}{n}$
is an unbiased estimator for calculating the sample mean, used as an estimate of the
population mean
Estimator for standard deviation

$S^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$

• How to prove it is an unbiased estimator?
=> find E(S²)
=> after some calculation (will be proved in tutorial), E(S²) = σ² (σ is the population SD),
so S² is an unbiased estimator of the population variance

• The estimators introduced so far belong to the so-called “Method of Moments.”
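The claim E(S²) = σ² can be checked numerically before seeing the proof. The sketch below assumes a hypothetical normal population with SD = 3 and small samples of n = 5 (neither is from the lecture). It averages the estimate that divides by n - 1 and, for contrast, the one that divides by n.

```python
import random
import statistics

def average_variance_estimates(sigma=3.0, n=5, trials=20000, seed=1):
    """Average two variance estimators over many simulated samples."""
    rng = random.Random(seed)
    with_n_minus_1, with_n = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        with_n_minus_1.append(statistics.variance(sample))  # divides by n - 1
        with_n.append(statistics.pvariance(sample))         # divides by n
    return statistics.fmean(with_n_minus_1), statistics.fmean(with_n)

avg_s2, avg_divide_by_n = average_variance_estimates()
```

With σ² = 9, the first average should sit close to 9, while the second should sit close to 9 × (n - 1)/n = 7.2, which is the bias the (n - 1) divisor removes.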
Maximum Likelihood Estimation (MLE)
• In short, “to find an estimate that can maximize
the probability of observation.”

Ex6.2:
You bought 10 helmets and found 3 of them are
flawed, what is your estimate for the flawed
rate?
(By Method of Moments: 3/10)
=> Maximize the probability of the observed samples
Pr(3 bad out of 10) = C × p³ × (1 − p)⁷
(p denotes the flawed rate; C is the binomial coefficient, a constant that does not depend on p)
Ex6.2 (continued)

=> Find the value of p maximizing the following equation:

Pr(3 bad out of 10) = C × p³ × (1 − p)⁷

=> ln{p³ × (1 − p)⁷} = 3 ln(p) + 7 ln(1 − p)

Let y = 3 ln(p) + 7 ln(1 − p)

=> $\dfrac{dy}{dp} = \dfrac{3}{p} - \dfrac{7}{1-p} = 0$ ; $p = \dfrac{3}{10}$
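The calculus result of Ex6.2 can be cross-checked by brute force. This is a minimal sketch: it evaluates the log-likelihood y = 3 ln(p) + 7 ln(1 − p) on a grid of candidate p values (the grid and the function name `log_likelihood` are illustrative choices, not from the slides) and picks the largest.

```python
import math

def log_likelihood(p, bad=3, total=10):
    """Log of p^bad * (1 - p)^(total - bad); the constant C is dropped."""
    return bad * math.log(p) + (total - bad) * math.log(1.0 - p)

# Candidate flawed rates 0.001, 0.002, ..., 0.999
candidates = [i / 1000 for i in range(1, 1000)]
p_hat = max(candidates, key=log_likelihood)
```

Dropping C is harmless here because a constant factor does not move the location of the maximum.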
Ex6.3:
x1,…,xn are samples from a normal distribution.
What are the estimates for the population’s mean
and SD from MLE?

=> Maximize the probability of the observed samples

$\Pr = y = f(x_1; \mu, \sigma) \times f(x_2; \mu, \sigma) \times \cdots \times f(x_n; \mu, \sigma)$

$= \dfrac{1}{\sqrt{2\pi\sigma^2}} e^{-(x_1-\mu)^2/2\sigma^2} \times \dfrac{1}{\sqrt{2\pi\sigma^2}} e^{-(x_2-\mu)^2/2\sigma^2} \times \cdots \times \dfrac{1}{\sqrt{2\pi\sigma^2}} e^{-(x_n-\mu)^2/2\sigma^2}$

$= \left(\dfrac{1}{2\pi\sigma^2}\right)^{n/2} e^{-\sum (x_i-\mu)^2/2\sigma^2}$
Ex6.3 (continued)

=> $\ln(y) = -\dfrac{n}{2}\ln(2\pi\sigma^2) - \dfrac{1}{2\sigma^2}\sum (x_i-\mu)^2$

=> find μ: $\dfrac{d \ln(y)}{d\mu} = 0 \Rightarrow \sum 2(x_i - \mu) = 0$

$\sum x_i - n\mu = 0$

$\mu = \dfrac{\sum x_i}{n}$
Ex6.3 (continued)

=> The same procedure applies to SD

=> find σ: $\dfrac{d \ln(y)}{d\sigma} = 0$

$\sigma^2 = \dfrac{\sum (x_i - \bar{X})^2}{n}$

=> The MLE estimator is biased for this case (it divides by n rather than n − 1)


Summary of Maximum Likelihood Estimation
• Maximizes the probability of observed samples

• Find the probability, or the so-called likelihood function y

• Usually take the logarithm of y

• Do some calculus
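These steps can be sketched for the normal case of Ex6.3. The sample values below are hypothetical (the lecture gives no data for this example). The code writes the log-likelihood, plugs in the closed-form MLE results (mean of the samples, and SD with divisor n), and leaves it to the reader to confirm that nearby parameter values give a lower log-likelihood.

```python
import math
import statistics

def normal_log_likelihood(sample, mu, sigma):
    """ln(y) = -(n/2) ln(2*pi*sigma^2) - sum((x - mu)^2) / (2*sigma^2)."""
    n = len(sample)
    return (-0.5 * n * math.log(2.0 * math.pi * sigma ** 2)
            - sum((x - mu) ** 2 for x in sample) / (2.0 * sigma ** 2))

sample = [38.0, 41.0, 40.0, 43.0, 39.0, 42.0]   # hypothetical sizes in mm
mu_hat = statistics.fmean(sample)                # sum(x_i) / n
sigma_hat = statistics.pstdev(sample)            # sqrt(sum((x_i - mean)^2) / n)
best = normal_log_likelihood(sample, mu_hat, sigma_hat)
```

Note that `statistics.pstdev` divides by n, matching the (biased) MLE, whereas `statistics.stdev` divides by n - 1.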
Review of last lecture

• MLE: maximum likelihood estimation
=> find the estimate that maximizes the probability of what has happened (what you see
in the samples)
=> write the probability (likelihood function) and do some calculus
Interval estimation

Sample mean = 1.8

Sample mean = 1.5

• What is Pr(population mean = sample mean)?
=> should be very, very small, say 0.01%

• How about this probability?
Pr(1 < population mean < 10)
=> it should be larger, say 95%

• And how about Pr(0 < population mean < ∞)?
=> 100%

• As a result, interval estimation is to find the three numbers (a, b, c) in the following equation:
Pr (a < Y < b) = c *****
=> all you need is Y’s mean, SD, and distribution
[Figure: PDF of Y. The central area c between a and b is the confidence interval, 1 - α; each tail outside a and b has area α/2.]
Ex6.4
JP caught 100 roaches and found sample mean = 40
mm. What is the interval estimate for the population
mean given confidence interval = 95%? Given
population SD = 4 mm.

=> Based on CLT => $\bar{X} \sim N(\mu, \sigma/\sqrt{n})$

Let $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$, then Z ~ N(0, 1)

[Figure: PDF of Z ~ N(0, 1) with 95% in the centre and 2.5% in each tail]
Ex6.4 (continued)

[Figure: PDF of Z ~ N(0, 1); the central 95% lies between Z = -1.96 and Z = 1.96, with 2.5% in each tail]

Because $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$:
 1.96 = (40 - μ) / (4/10) => μ = 39.2 (lower bound a)
-1.96 = (40 - μ) / (4/10) => μ = 40.8 (upper bound b)

=> Pr(39.2 < μ < 40.8) = 95%
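The same interval can be computed without a Z-table. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `z_interval` is an illustrative choice, not something from the lecture.

```python
import math
from statistics import NormalDist

def z_interval(sample_mean, pop_sd, n, confidence=0.95):
    """Interval estimate for the population mean when the population SD is known."""
    # Z-value leaving (1 - confidence)/2 in the right tail, about 1.96 for 95%
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * pop_sd / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

a, b = z_interval(40.0, 4.0, 100)
```

For the numbers of Ex6.4 this gives bounds that round to 39.2 and 40.8.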


• But usually the population SD is unknown

• In that case, we use the sample SD to replace the population SD in Ex6.4

$Z = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$

• But the equation above is more applicable to large sample sizes

• When the sample size is small, say n < 30, we use

$t = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ ; t follows the t-distribution with n − 1 degrees of freedom
• The t-table differs from one textbook to another

• Always check what kind of probability is used in the table before looking up the number

• How to find the number corresponding to right-tail probability = 0.95 using this table?

• The t-distribution is also symmetrical
Ex6.5
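JP caught 4 roaches and found sample mean = 40 mm and
sample SD = 4 mm. What is the interval estimate for the
population mean given confidence interval = 95%?

=> $t = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$, t-distribution with DOF = 3

[Figure: PDF of the t-distribution with DOF = 3; the central 95% lies between t = -3.18 and t = 3.18, with 2.5% in each tail]

 3.18 = (40 - μ) / (4/2) => μ = 33.6 (lower bound a)
-3.18 = (40 - μ) / (4/2) => μ = 46.4 (upper bound b)

=> Pr(33.6 < μ < 46.4) = 95%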


• Be careful when using Excel’s TINV function. It requires the two-tail probability,
not the right-tail probability

=> TINV(two-tail pb., degree of freedom)

• But you can easily wrap it in a new function in Excel:
=> mytinv(right-tail, degree of freedom)

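' Returns the t-value whose right-tail probability is right_tail (valid for right_tail <= 0.5)
Public Function mytinv(right_tail As Double, dof As Long) As Double
    ' TINV expects a two-tail probability, so double the right-tail one
    mytinv = Application.WorksheetFunction.TInv(right_tail * 2, dof)
End Function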
Review of last lecture
• Interval estimation
Interval estimation for population SD

$\chi^2 = \dfrac{(n-1)S^2}{\sigma^2}$ ******

• χ², chi-square, is a random variable following the
chi-square distribution with n − 1 degrees of freedom

In Excel:
CHIDIST
CHIINV
Chi-square table
Ex6.6
JP caught 21 roaches and found sample SD = 1 mm.
What is the interval estimate for the population SD
given confidence interval = 95%?

=> Think of chi-square (χ²) with DOF = 20:

$\chi^2 = \dfrac{(n-1)S^2}{\sigma^2}$

[Figure: PDF of the chi-square distribution with DOF = 20; the central 95% lies between χ² = 9.6 and χ² = 34.2, with 2.5% in each tail]

χ² = 9.6 => SD² = 20 × 1 / 9.6 = 2.08
χ² = 34.2 => SD² = 20 × 1 / 34.2 = 0.58

=> 0.58 < SD² < 2.08

=> 0.76 < SD < 1.44
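The arithmetic of Ex6.6 can be written as a short helper. This sketch takes the two chi-square values (9.6 and 34.2) as given from the table rather than computing them, since Python's standard library has no chi-square inverse; the function name `variance_interval` is illustrative.

```python
import math

def variance_interval(n, sample_var, chi2_left, chi2_right):
    """Interval for the population variance from chi2 = (n - 1) S^2 / sigma^2."""
    dof = n - 1
    lower = dof * sample_var / chi2_right   # large chi-square -> small variance
    upper = dof * sample_var / chi2_left    # small chi-square -> large variance
    return lower, upper

var_low, var_high = variance_interval(21, 1.0, 9.6, 34.2)
sd_low, sd_high = math.sqrt(var_low), math.sqrt(var_high)
```

Note the swap: the left chi-square value produces the upper bound of the variance, because σ² sits in the denominator of the statistic.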
Interval estimation for variance ratio
between two populations

$F = \dfrac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$ ******

• F is a random variable following the F-distribution

In Excel:
FDIST
FINV
Ex6.7
Find F-value with DOF1 = 10 and DOF2 = 5 with
right-tail probability = 5%
=> 4.74

Ex6.8
Find F-value with DOF1 = 10 and DOF2 = 5 with
left-tail probability = 5%

=> $F_{\text{left-tail}}(v_1, v_2) = \dfrac{1}{F_{\text{right-tail}}(v_2, v_1)}$ = 1 / 3.33 = 0.3
Ex6.9
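Jerry caught 21 cockroaches in Hall I and Mike caught 16
in Hall II at HKUST. The sample variances are
1.85 and 1.65, respectively. Find the ratio of the two
populations’ variances given confidence interval = 90%.

=> Think of $F = \dfrac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$, F-distribution with DOF1 = 20 and DOF2 = 15

[Figure: PDF of the F-distribution with DOF1 = 20 and DOF2 = 15; the central 90% lies between F = 0.45 and F = 2.33, with 5% in each tail]

F = 0.45 => σ1²/σ2² = (1.85/1.65) / 0.45 = 2.49
F = 2.33 => σ1²/σ2² = (1.85/1.65) / 2.33 = 0.48

=> 0.48 < σ1²/σ2² < 2.49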
Summary of performing an interval estimation
• Z, t, Chi2, or F

• Find the “RIGHT” and “LEFT” value of Z, t, Chi2, or F

[Figure: PDF with the confidence interval CI in the centre between the “LEFT” and “RIGHT” values, and (1-CI)/2 in each tail]

• Do some math to find the corresponding values from the “LEFT” and “RIGHT” values
Summary of Ch6 Estimation
• Why?
=> because population statistics (i.e., a constant) is
unknown, so we use sample’s statistics (i.e., a random
variable) to estimate it

• Biased and unbiased estimator

• Maximum likelihood estimation (MLE)

• Interval estimation

• Three new distributions: t, Chi2, and F


Review of last lecture
• Interval estimation for SD

$\chi^2 = \dfrac{(n-1)S^2}{\sigma^2}$

• Interval estimation for the variance ratio

$F = \dfrac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$
Ch7 Hypothesis testing
• It is like placing a bet on unknowns

• Unknowns => population’s statistics

• Who is the judge?

• Samples are the judge


Null hypothesis H0 and alternative hypothesis H1

• For example, a null hypothesis: the population mean size of cockroaches at HKUST is 12 mm
H0: μ = 12

• Three alternative hypotheses are:
H1: μ ≠ 12 (two-tail test)
H1: μ > 12 (right-tail test)
H1: μ < 12 (left-tail test)
Rejection region
• Take two-tail test for example:
H0: μ = 12 and H1: μ ≠ 12

[Figure: PDF of Z ~ N(0, 1) with a rejection region in each tail and the acceptance region in between; for α = 10%, the boundaries are Z = -1.64 and Z = 1.64]

Pr (Rejection region) = level of significance = α


Decision rule
• Test statistics: Z, t, Chi2 or F

• For the population mean, think of

$Z = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$, Z ~ N(0, 1)

• When Z is in the rejection region, reject H0 and accept H1

[Figure: PDF of Z ~ N(0, 1) with rejection regions in both tails and the acceptance region in between]
Type I and Type II errors in hypothesis
testing
• It is a mistake when a judge sends an innocent person to jail
• It is also a mistake when a judge does not send a guilty person to jail
• As a result, hypothesis testing is associated with a given error, just as interval
estimation is associated with a confidence interval
• Type I error: reject H0 given H0 is true
• Type II error: accept H0 given H0 is false
Ex7.1
The population mean is 168 cm in HKUST.
Should this hypothesis be accepted given α =
5% and with 10 random samples?

i) H0: μ = 168

ii) H1: μ ≠ 168 (two-tail)

iii) Statistics: $t = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$, t-distribution with DOF = 9

iv) Rejection region: t < -2.26 or t > 2.26

[Figure: PDF of the t-distribution with DOF = 9, with rejection regions beyond -2.26 and 2.26]


Ex7.2
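The population SD is 3 cm in HKUST. Should this
hypothesis be accepted given α = 5% and
with 10 random samples?

i) H0: σ = 3

ii) H1: σ ≠ 3 (two-tail)

iii) Statistics: $\chi^2 = \dfrac{(n-1)S^2}{\sigma^2}$, chi-square distribution with DOF = 9

iv) Rejection region: χ² < 2.7 or χ² > 19

[Figure: PDF of the chi-square distribution with DOF = 9, with rejection regions below 2.7 and above 19]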


Notes on level of significance α
Pr (Rejection region) = level of significance = α

[Figure: PDF with the rejection region marked]

• The larger the α, the larger the rejection zone, and the more easily H0 gets rejected
• Given H0 not rejected at α = 1% and at α = 10%, the latter is
“stronger.” That should be the reason α is referred to as the
“level of significance.”
Review of last lecture
• Set H0 and H1
• Use a proper statistic (Z, t, F, Chi2…) and calculate it
• Find the rejection region
• If the statistic is inside the rejection zone, reject H0

Pr (Rejection region) = level of significance = α

[Figure: PDF with a rejection region in each tail and the acceptance region in between]
Ex7.3
The steel company claims that its steel can sustain
more than 2650 pounds. Randomly select 6
samples and their strengths are:
2680, 2780, 2450, 2620, 2480, 2500
Should we trust the statement of the company given
level of sig. = 5%?

=> H0: μ = 2650

H1: μ ≠ 2650
H1: μ > 2650 (the one relevant to the company’s claim; right-tail test)
H1: μ < 2650

=> Small size => t

[Figure: PDF of the t-distribution with DOF = 5; the right-tail rejection region (5%) starts at t = 2.01]


Ex7.3 (continued)

=> Sample SD = 130.2
sample mean = 2585

Since $t = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$:

$t = \dfrac{2585 - 2650}{130.2/\sqrt{6}} = -1.22$

[Figure: PDF of the t-distribution with DOF = 5; the rejection region starts at t = 2.01]

=> t = -1.22 is not in the rejection region (t > 2.01) => Accept H0
=> Reject the company’s statement
• In Ex7.3, why not set
H0: μ > 2650
H1: μ < 2650
so that when H0 is accepted, the company is right?

=> the statistic calculated becomes a range; in this example:

$t = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ and μ > 2650, so that => t < -1.22
[Figure: PDF of the t-distribution with DOF = 5; the left-tail rejection region starts at t = -2.01]

• The range t < -1.22 lies partly inside and partly outside the rejection region (t < -2.01)

• This is a shortcoming of this kind of H0 in some cases
How to set a solid H0 and H1

• Always set H0 as “=”

• From the three options for H1, select the one that is the most
relevant to the statement you want to test

• For example, if the statement claims the value is less than a
given value, use the left-tail test => H1: A < a
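Applying these rules to Ex7.3 (H0: μ = 2650 against the right-tail H1: μ > 2650), the calculation can be sketched as follows. The critical value 2.01 is taken from the slide's t-table (5% right tail, DOF = 5) rather than computed, since Python's standard library has no t-distribution inverse.

```python
import math
import statistics

strengths = [2680, 2780, 2450, 2620, 2480, 2500]   # samples from Ex7.3
claimed_mean = 2650
t_critical = 2.01   # right-tail 5%, DOF = 5, read from the t-table

sample_mean = statistics.fmean(strengths)
sample_sd = statistics.stdev(strengths)             # divides by n - 1
t_value = (sample_mean - claimed_mean) / (sample_sd / math.sqrt(len(strengths)))

# Right-tail test: reject H0 only if t falls beyond the critical value
reject_h0 = t_value > t_critical
```

Here `reject_h0` being false means H0 is accepted, so the samples do not support the claim that the strength exceeds 2650 pounds.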
Ex7.4
The car company claims “MPG” > 12 km/liter. The
government selected 49 cars of the company and
found sample mean and SD = 13.5 and 4. Is the
company right given level of sig. = 5%?

=> H0: μ = 12

H1: μ ≠ 12
H1: μ > 12 (the one relevant to the company’s claim; right-tail test)
H1: μ < 12

=> Large size => Z

$Z = \dfrac{\bar{X} - \mu}{S/\sqrt{n}} = \dfrac{13.5 - 12}{4/\sqrt{49}} = 2.6$

[Figure: PDF of Z ~ N(0, 1); the right-tail rejection region (5%) starts at Z = 1.64]

=> Reject H0; the company is right, given 5% level of sig.
P-value
• When level of significance is not given, we calculate
p-value for decision making

• Take Ex7.4 for example: the Z-value is equal to 2.6, so
that p-value = Pr(Z > 2.6) = 0.004

=> when the p-value is small, reject H0

[Figure: PDF of Z ~ N(0, 1); the area to the right of 2.6 is 0.4% = p-value = Pr(A > calculation)]
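The Z-value and p-value of Ex7.4 can be reproduced with the standard library. This is a minimal sketch; the helper name `right_tail_z_test` is an illustrative choice.

```python
import math
from statistics import NormalDist

def right_tail_z_test(sample_mean, mu0, sample_sd, n):
    """Return the Z statistic and the right-tail p-value Pr(Z > z)."""
    z = (sample_mean - mu0) / (sample_sd / math.sqrt(n))
    p_value = 1.0 - NormalDist().cdf(z)
    return z, p_value

z, p_value = right_tail_z_test(13.5, 12.0, 4.0, 49)
```

The unrounded Z is 2.625, so the p-value comes out slightly different from a table lookup at 2.6, but it still rounds to about 0.4%.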
Ex7.5
Below are the final scores of six male and female
students in CIVL2160. Is there any difference in
performance between boys and girls? Given level of
sig. = 5%

=> Let d = boy’s score – girl’s score


Ex7.5 (continued)

=> H0: μd = 0

H1: μd ≠ 0 (the one relevant to “any difference”; two-tail test)
H1: μd > 0
H1: μd < 0

=> Small size => t

$t = \dfrac{\bar{X} - \mu_d}{S_d/\sqrt{n}} = 0.4$, where $\bar{X}$ and $S_d$ are the sample mean and SD of d

[Figure: PDF of the t-distribution with DOF = 5, with rejection regions beyond -2.57 and 2.57]

=> accept H0 => no difference in the performance
Ex7.6
Selecting 21 samples for each production line:

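Is Line 2 more stable than Line 1? Given level of sig. = 1%

=> H0: Var1 = Var2
H1: Var1 ≠ Var2
H1: Var1 > Var2 (the one relevant to “Line 2 is more stable”; right-tail test)
H1: Var1 < Var2

=> $F = \dfrac{S_1^2/S_2^2}{\sigma_1^2/\sigma_2^2}$ ; under H0, $F = \dfrac{S_1^2/S_2^2}{1} = \dfrac{1.02}{0.69} = 1.5$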
Ex7.6 (continued)

$F = \dfrac{S_1^2/S_2^2}{\sigma_1^2/\sigma_2^2} = \dfrac{S_1^2/S_2^2}{1} = \dfrac{1.02}{0.69} = 1.5$

[Figure: PDF of the F-distribution with DOF1 = 20 and DOF2 = 20; the right-tail rejection region (1%) starts at F = 2.94]

=> F = 1.5 is not in the rejection region (F > 2.94) => Accept H0
=> Line 2 is not more stable than Line 1
Review of last lecture

• P-value
right-tail test: P-value = Pr (A > a*)
left-tail test: P-value = Pr (A < a*)

[Figure: two PDFs, with the P-value shown as the right-tail area (right-tail test) and as the left-tail area (left-tail test)]

• When the P-value is small (say < 1%), reject H0


Summary of Hypothesis testing
• Type I and Type II error
• Level of significance
• P-value

Steps of hypothesis testing


• Set H0 and H1 (list all H1 and choose the one that is
more fair to both)
• Calculate Z, t, Chi2 or F from samples
• Determine rejection zone from H1
• If Z, t, Chi2 or F in rejection zone, reject H0
Ch8 Regression Analysis
• Has anyone done something like this in Excel?
[Figure: Excel scatter plot of Series1 with a linear trendline, y = 0.9179x + 0.7658, R² = 0.9711]

• It is the so-called regression analysis, finding the
relationship between X and Y
• In this chapter, we are going to learn the
fundamentals of regression analysis, including

– Model basics: Y = aX + b + ε
– How to solve model parameters a, b, and the SD of ε
– What is R² and how to determine it
– ……
Model basics:
=> Y = aX + b + ε *****
a: slope of the regression line
b: intercept of regression line
ε: model error; ε ~ N(0, σε)

[Figure: scatter of sample points around the regression line Y = aX + b + ε]
Ex8.1
Given a regression model Y = 3X + 4 with σε = 5, what
are the mean and SD of Y given X = 1?

=> Y = 3X + 4 + ε
E(Y) = E(3X + 4 + ε)
because X = 1 => E(Y) = E(3 + 4 + ε) = 7 + E(ε)
also because E(ε) = 0 => E(Y) = 7

=> For SD of Y
Var(Y) = Var(3X + 4 + ε) = Var(7 + ε) = Var(ε)
therefore, σY = σε = 5
Ex8.2
Following Ex8.1, what is Pr(Y > 12) given X = 1 with
the regression model Y = 3X + 4; σε = 5?

=> Three elements: mean, SD, distribution
because Y = 3X + 4 + ε and ε follows Normal, Y follows Normal as well
(mean = 7 and SD = 5 from Ex8.1)

=> Pr(Y > 12) = 1 – Pr(Y ≤ 12)
= 1 – Ф{(12 - 7) / 5}
= 1 - Ф{1} = 0.16
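The three elements (mean, SD, distribution) translate directly into code. This is a minimal sketch of Ex8.1 and Ex8.2 using the standard library; the variable names are illustrative.

```python
from statistics import NormalDist

a, b, sigma_eps = 3.0, 4.0, 5.0     # regression model from Ex8.1
x = 1.0

# Given X = 1, Y = aX + b + eps is normal with mean a*x + b and SD sigma_eps
y_dist = NormalDist(mu=a * x + b, sigma=sigma_eps)
prob_y_above_12 = 1.0 - y_dist.cdf(12.0)
```

The result agrees with 1 - Ф{1}, which rounds to 0.16.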
But why is Y a random variable?

[Figure: scatter of sample points around the regression line Y = aX + b + ε; for the same X, Y varies because of the random error ε]
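Finding out model parameters: a and b

=> Least squares ******

[Figure: sample point (Xi, Yi), the corresponding point on the regression line (Xi, aXi + b), and the vertical distance di between them]

Total distance or difference:
$f(a, b) = \sum_{i=1}^{n} \left[y_i - (a x_i + b)\right]^2$ *****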
$f(a, b) = \sum_{i=1}^{n} \left[y_i - (a x_i + b)\right]^2$

$\dfrac{\partial f(a, b)}{\partial a} = 0$ ; $\dfrac{\partial f(a, b)}{\partial b} = 0$

Two equations, two unknowns: a and b can be solved

$a = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$

$b = \bar{y} - a\bar{x}$ (detailed derivations given in tutorials)
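The two closed-form results can be turned into a few lines of code. This is a sketch with hypothetical data (the five points below are not from the lecture); it mirrors what Excel's “Slope” and “Intercept” functions return.

```python
import statistics

def least_squares_line(xs, ys):
    """Slope a and intercept b of the least-squares line y = a*x + b."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    a = s_xy / s_xx
    b = y_bar - a * x_bar
    return a, b

xs = [1.0, 2.0, 3.0, 4.0, 5.0]          # hypothetical data
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
a, b = least_squares_line(xs, ys)
```

For points that lie exactly on a line, for example (0, 1), (1, 3), (2, 5), the function recovers that line's slope and intercept.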


• What if my regression model is now Y = aX? How to solve “a”?

=> Least squares

[Figure: sample point (Xi, Yi), the corresponding point on the regression line (Xi, aXi), and the vertical distance di between them]

Total distance or difference:
$f(a) = \sum_{i=1}^{n} \left[y_i - a x_i\right]^2$

Let $\dfrac{d f(a)}{d a} = 0$, then you can find “a”
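Carrying out that last step explicitly gives a closed form for the through-origin slope. The slide stops at “then you can find a”, so the short derivation below is an added worked step rather than part of the original notes:

```latex
\frac{d f(a)}{d a} = \sum_{i=1}^{n} -2 x_i \left( y_i - a x_i \right) = 0
\;\Longrightarrow\;
\sum_{i=1}^{n} x_i y_i = a \sum_{i=1}^{n} x_i^2
\;\Longrightarrow\;
a = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}
```

Unlike the slope of Y = aX + b, this expression uses the raw values xi and yi instead of deviations from the means, because the line is forced through the origin.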
Population regression: Y = apX + bp + εp
Sample regression: Y = asX + bs + εs

[Figure: two scatter plots side by side, the population regression line (left) and the sample regression line (right)]

• ap, bp, and εp are constants, but you never know them
• You can have as, bs, and εs, but they are R.V.
• as, bs, and εs are point estimates for ap, bp, and εp
Review of last lecture

• Simple linear regression analysis: Y = aX + b + ε

• ε ~ N (0, σε)

• Least square

$a = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ ; $b = \bar{y} - a\bar{x}$
Ex8.3
The data below show the study time and score of an
exam. Find the relationship between the two
variables with simple linear regression analysis.

$a = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ ; $b = \bar{y} - a\bar{x}$

after some computation:
=> a = 4.82 and b = 26.75
=> computer-aided analysis, see the accompanying Excel file
=> Functions used: “Average” and “Sum”, or “Slope” and “Intercept”
Model error and standard deviation of ε

[Figure: two scatter plots. If the model has no error, all sample points lie on the regression model (left); otherwise, each sample point (Xi, Yi) deviates from the line by a distance di (right)]

=> Model error or sum of squared errors (SSE) and SD² of ε are defined as:

$SSE = \sum_{i=1}^{n} \left[y_i - (a x_i + b)\right]^2$ ; $\sigma_\varepsilon^2 = \dfrac{SSE}{n-2}$
Ex8.4
Following Ex8.3, what is the SD of ε?

$SSE = \sum_{i=1}^{n} \left[y_i - (a x_i + b)\right]^2$ ; $\sigma_\varepsilon^2 = \dfrac{SSE}{n-2}$

=> SD of ε = 13.66

=> Functions used: “Count” and “Sum”, or “STEYX” or “FORECAST”
Model explain-ability or R2
$SSE = \sum_{i=1}^{n} \left[y_i - (a x_i + b)\right]^2$ ; $SST = \sum (y_i - \bar{y})^2$

$R^2 = 1 - \dfrac{SSE}{SST}$

[Figure: sample point (Xi, Yi) and its distance di from the regression line]

• SSE is the variation or randomness that the model cannot explain!!

• Model can explain = 1 – model cannot explain

• SST = total sum of squares, used for normalization


Ex8.5
Following Ex8.3, what is R² of the model?

$R^2 = 1 - \dfrac{SSE}{SST}$ ; $SST = \sum (y_i - \bar{y})^2$

=> R² = 0.53

=> Functions used: “Sum”, or “Rsq”
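SSE, the SD of ε, and R² all come from the same residuals, so they can be computed together. The sketch below uses hypothetical data and a slope and intercept already fitted to it (the Ex8.3 data set is in the accompanying Excel file and is not reproduced here); the function name `regression_quality` is illustrative.

```python
import math
import statistics

def regression_quality(xs, ys, a, b):
    """Return SSE, the SD of the model error, and R^2 for the line y = a*x + b."""
    n = len(xs)
    y_bar = statistics.fmean(ys)
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))   # unexplained
    sst = sum((y - y_bar) ** 2 for y in ys)                      # total
    sigma_eps = math.sqrt(sse / (n - 2))
    r_squared = 1.0 - sse / sst
    return sse, sigma_eps, r_squared

xs = [1.0, 2.0, 3.0, 4.0, 5.0]           # hypothetical data
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
a, b = 1.97, 1.09                         # least-squares fit of this data
sse, sigma_eps, r_squared = regression_quality(xs, ys, a, b)
```

The divisor n - 2 reflects the two parameters (a and b) estimated from the samples.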
Point-estimate and interval estimate in regression
analysis

• In the previous example, what is the point estimate
of “a” for the population regression line?
=> 4.82, from the sample regression line

• What if I want to perform the interval estimate?
=> let me know the mean, SD, and probability
distribution of “a”

[Figure: PDF with the left and right critical values left_t and right_t marked]

• “a” follows the t-distribution with (n-2) DOF; its
mean is equal to the point estimate, and its SD is
equal to:

$\sigma_a = \dfrac{\sigma_\varepsilon}{\sqrt{\sum (x_i - \bar{x})^2}}$ ; $\text{right\_t} = \dfrac{\text{upper bound} - \text{mean}}{\text{sd}}$
Ex8.6
Following Ex8.3, what is the interval estimate of the slope
given 95% CI?

$\sigma_a = \dfrac{\sigma_\varepsilon}{\sqrt{\sum (x_i - \bar{x})^2}}$

=> Pr(4.77 < a < 4.88) = 95%

=> Functions used: “Sum” “Tinv”


Review of last lecture

• R² =>
variability the model can explain
= 1 – variability the model cannot explain

• Spreadsheet skills to manage regression calculations
Correlation coefficient between X and Y

Let $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$, $S_{xx} = \sum (x_i - \bar{x})(x_i - \bar{x})$, $S_{yy} = \sum (y_i - \bar{y})(y_i - \bar{y})$

• Sxy is normalized by the square root of Sxx and Syy to make it
unit-independent

$r = \dfrac{S_{xy}}{\sqrt{S_{xx} \times S_{yy}}}$

where r is the correlation coefficient in regression analysis
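The definition of r maps onto code term by term. This is a minimal sketch with two hypothetical data sets, one lying exactly on an increasing line and one on a decreasing line, to show the two extreme values of r.

```python
import math
import statistics

def correlation_coefficient(xs, ys):
    """r = Sxy / sqrt(Sxx * Syy)."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

r_up = correlation_coefficient([1, 2, 3, 4], [2, 4, 6, 8])      # increasing line
r_down = correlation_coefficient([1, 2, 3, 4], [8, 6, 4, 2])    # decreasing line
```

Because Sxy carries the sign of the slope while Sxx and Syy are always non-negative, r is positive for an increasing relationship and negative for a decreasing one.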
Some notes on r and R2

• Correlation coefficient r
= ±(coefficient of determination R²)^0.5, with the same sign as the slope a

• 0 ≤ R² ≤ 1
(R² = 1 => model fits the samples perfectly)

• -1 ≤ r ≤ 1
(r = 1 or -1 => model fits the samples perfectly)
Hypothesis testing on population correlation ρ

1. Set hypotheses H0 and H1
2. Calculate statistics (Z, t,…) from samples
3. Find the rejection region based on H1
4. Make a decision => if in the rejection region, then reject H0

• A formula you need to know for performing such a
hypothesis test on ρ is:
Model checking
• The presumption of regression models is that the model
error ε, after standardization, follows the normal distribution with zero
mean and SD = 1
• Given n samples available, n errors can be plotted

$e_i = y_i - y_{i\_model}$ ; $e_i^* = \dfrac{y_i - y_{i\_model}}{s\sqrt{1 - \dfrac{1}{n} - \dfrac{(x_i - \bar{x})^2}{S_{xx}}}}$

• These two cases also have mean = 0 and SD = 1

• But the errors in the samples are not random, indicating that a better
relationship between X and Y than the existing one may be available
Ex8.7
Plot the error’s distribution with a simple linear
regression model
=> using “Forecast”

[Figure: scatter plot of Y versus X (left) and the corresponding error plot, Y - Forecast versus X (right)]
• For Ex8.7, what if one uses the square root of Y to
perform linear regression on X and Y^0.5?

[Figure: linear fit of Y versus X with R² = 0.94 (left) and linear fit of Y^0.5 versus X with R² = 1.0 (right)]

• A non-linear relationship can be fit linearly by variable
transformation, the so-called “intrinsic linear”
• Everything is the same whether it is simple linear
or intrinsic linear
=> least squares to find “slope” and “intercept”
=> R² = 1 – SSE/SST
=> σε² = SSE / (n-2)
……………………………

• Some suggestions in variable transformation:


• The data we have seen so far either increase or decrease
monotonically, linearly or nonlinearly

• For the following data, “Y = aX + b” seems not applicable

For such data, a better-fit regression model should be:

Y = aX² + bX + c

or even a higher-order polynomial

• This is what we call polynomial regression,
applicable to “increase-decrease” data

• How to find parameters a, b, c….
=> least squares

• How to find R²
=> 1 – model cannot explain => 1 – SSE/SST
Review of last lecture

• Regression analysis:

– Simple linear
– Intrinsic linear (nonlinear) after variable
transformation
– Polynomial regression for “increase-decrease”
data

• Ideas are the same


Ch9 Bayesian approach
• Statistics is based on numerical samples
for example: we sample 5 students and estimate the
population’s parameters, or perform hypothesis
testing about them

• The Bayesian approach is to perform a statistical
estimate with both samples and judgments
(experiences…) taken into account
Ex9.1
Below are five students’ heights:
170, 165, 165, 175, 175 cm. What is your point
estimate for the population’s mean?

=> 170 cm

What if the lady in the medical center of HKUST said
the population’s mean should be around 168 cm
based on her 20-year experience? What is the
Bayesian estimate of it, with both pieces of information taken
into account?
• In short, the Bayesian approach is to update the
prior experience with samples

• For example, below is the prior experience or prior
PDF about the quality of a manufacturing company:

[Figure: prior PDF of the bad-quality ratio]

• What is the mean ratio based on the prior?
=> 0.3x0.2 + ….. = 0.44

• This is a discrete case !!
• Bayesian updating algorithm for discrete cases:

$\Pr''(\theta_i) = \dfrac{\Pr(\varepsilon \mid \theta_i)\,\Pr'(\theta_i)}{\sum_i \Pr(\varepsilon \mid \theta_i)\,\Pr'(\theta_i)}$

where Pr' is the prior and Pr'' is the posterior

For this case, θi are:
0.2, 0.4, 0.6, 0.8, 1.0

ε: observations (samples)

• The key to using the Bayesian approach: Pr(ε | θi), the likelihood function
=> The probability of ε happening given θi
Ex9.2
You randomly picked one sample from Jerry’s
company and found it is bad. What is the point
estimate for the bad ratio?
=> 100%

Given the prior PDF shown in the figure, what is the posterior PDF
updated with the sample, and what is the Bayesian estimate?

[Figure: prior PDF of the bad-quality ratio]

$P''(\theta_i) = \dfrac{P(\varepsilon \mid \theta_i)\,P'(\theta_i)}{\sum_i P(\varepsilon \mid \theta_i)\,P'(\theta_i)}$

i) When θ = 0.2,
obviously, P’(θ = 0.2) = 0.3

what is Pr(ε|θ=0.2)?
ε: one product tested and found bad

=> Pr(ε|θ=0.2) = 0.2
Pr(ε|θ=0.4) = 0.4
Pr(ε|θ=0.6) = 0.6
Pr(ε|θ=0.8) = 0.8
Pr(ε|θ=1.0) = 1.0
What is the bad ratio from the posterior PMF, i.e., based on the Bayesian approach?
=> 0.136x0.2 + … + 0.114 x 1 = 0.55
• What if you pick another one and also find it is a
bad product?

• Use the posterior as the new prior in the updating,
following the same procedure

• Or use the original prior updated with the two samples
together. In this case, what is the likelihood function Pr(ε | θi)
for θi = 0.2?
=> 0.2 x 0.2 = 0.04

• This is the sequence of updates with one bad product after another
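The two routes described above (updating one sample at a time, or updating the original prior with both samples at once) can be compared in code. Important assumption: the slides state only P'(θ = 0.2) = 0.3 and the prior mean 0.44; the remaining prior values below are a guess chosen to reproduce the quoted numbers (0.44, 0.136, 0.114, 0.55) and may differ from the figure.

```python
def bayes_update(thetas, prior, likelihood):
    """Discrete Bayesian update: posterior is proportional to likelihood * prior."""
    weights = [likelihood(t) * p for t, p in zip(thetas, prior)]
    total = sum(weights)                       # normalizing constant
    return [w / total for w in weights]

def pmf_mean(thetas, pmf):
    return sum(t * p for t, p in zip(thetas, pmf))

def one_bad(theta):
    return theta            # Pr(one bad sample | bad ratio theta)

def two_bad(theta):
    return theta * theta    # Pr(two bad samples | bad ratio theta)

thetas = [0.2, 0.4, 0.6, 0.8, 1.0]
prior = [0.30, 0.40, 0.15, 0.10, 0.05]   # assumed; only the 0.30 is from the slides

after_one = bayes_update(thetas, prior, one_bad)
after_two_sequential = bayes_update(thetas, after_one, one_bad)
after_two_batch = bayes_update(thetas, prior, two_bad)
```

The sequential and batch posteriors coincide, which is why either procedure may be used.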
Review of last lecture

• Classical statistics <=> samples

• Bayesian method = judgment + sample
or => posterior = prior + likelihood function Pr(ε | θi)

Discrete prior:
$\Pr''(\theta_i) = \dfrac{\Pr(\varepsilon \mid \theta_i)\,\Pr'(\theta_i)}{\sum_i \Pr(\varepsilon \mid \theta_i)\,\Pr'(\theta_i)}$

Continuous prior:
$\Pr''(\theta) = \dfrac{\Pr(\varepsilon \mid \theta)\,\Pr'(\theta)}{\int \Pr(\varepsilon \mid \theta)\,\Pr'(\theta)\,d\theta}$

$f''(\theta) = k\,L(\theta)\,f'(\theta)$ (not recommended)

[Figure: a continuous prior PDF over 0 to 1]
Ex9.3
What is the prior estimate
of the poor-quality rate (r)?

[Figure: prior PDF of r, increasing linearly from 0 at r = 0 to 2 at r = 1]

=> PDF => f’(r) = 2r

=> Mean = ∫ (2r x r) dr = 2/3 (obviously r = 0 to 1)

What is the posterior rate after updating, given one bad
sample collected?

$\Pr''(\theta) = \dfrac{\Pr(\varepsilon \mid \theta)\,\Pr'(\theta)}{\int \Pr(\varepsilon \mid \theta)\,\Pr'(\theta)\,d\theta}$
$\Pr''(\theta) = \dfrac{\Pr(\varepsilon \mid \theta)\,\Pr'(\theta)}{\int \Pr(\varepsilon \mid \theta)\,\Pr'(\theta)\,d\theta}$

[Figure: prior PDF f’(r) = 2r]

Given f’(r) = 2r

Pr(ε | r) = ?
=> Pr(ε | r) = r

Therefore,

$\int_0^1 (r \times 2r)\,dr = 2/3$ and $\Pr''(r) = \dfrac{r \times 2r}{2/3} = 3r^2$
Posterior mean
=> ∫ (3r² x r) dr = 0.75

[Figure: probability density versus poor-quality ratio (r), showing the prior f’(r) = 2r and the posterior f’’(r) = 3r²]
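The integrals of Ex9.3 can be verified numerically. This sketch uses a simple midpoint rule (a generic numerical technique chosen here for illustration, not something from the lecture) with the prior f’(r) = 2r and the likelihood Pr(ε | r) = r.

```python
def integrate(func, lower=0.0, upper=1.0, steps=10000):
    """Midpoint-rule approximation of the integral of func over [lower, upper]."""
    width = (upper - lower) / steps
    return sum(func(lower + (i + 0.5) * width) for i in range(steps)) * width

def prior(r):
    return 2.0 * r                   # f'(r) = 2r

def likelihood(r):
    return r                         # Pr(one bad sample | r)

def unnormalized(r):
    return likelihood(r) * prior(r)

norm = integrate(unnormalized)       # denominator of the update

def posterior(r):
    return unnormalized(r) / norm    # should match 3 r^2

prior_mean = integrate(lambda r: r * prior(r))
posterior_mean = integrate(lambda r: r * posterior(r))
```

The bad sample shifts the mean poor-quality rate upward, from 2/3 under the prior to 0.75 under the posterior.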


Ex9.4
A traffic engineer estimates the number of accidents per year as shown in the figure:

[Figure: prior PMF of the annual accident rate]

What is the mean rate based on the prior?
=> 2 per year

Given one accident happening in one month, what is the mean rate based on
the observation?
=> 12 per year

What is the mean rate with the two combined by the Bayesian
approach? Assume accidents follow the Poisson distribution

$\Pr''(\theta_i) = \dfrac{\Pr(\varepsilon \mid \theta_i)\,\Pr'(\theta_i)}{\sum_i \Pr(\varepsilon \mid \theta_i)\,\Pr'(\theta_i)}$
One accident per year = 1/12 accident per month

Pr(ε | v = 1/12) = ?
ε = 1 accident in one month

Poisson’s PMF = $\dfrac{v^x e^{-v}}{x!}$

Pr(ε | v1) = (1/12) × e^(-1/12)
Pr(ε | v2) = (2/12) × e^(-2/12)
Pr(ε | v3) = (3/12) × e^(-3/12)
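The three likelihood values follow from the Poisson formula with x = 1 and the monthly rate v = (annual rate)/12. A minimal sketch, with `poisson_pmf` as an illustrative helper name:

```python
import math

def poisson_pmf(x, v):
    """Probability of x events when the expected number of events is v."""
    return v ** x * math.exp(-v) / math.factorial(x)

annual_rates = [1, 2, 3]
# Likelihood of observing 1 accident in one month for each candidate annual rate
likelihoods = [poisson_pmf(1, rate / 12.0) for rate in annual_rates]
```

These likelihoods are then multiplied by the prior probabilities from the figure and normalized, exactly as in the discrete update formula.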
[Figure: prior PMF (left) and posterior PMF (right) of the annual rate, for rates 1, 2, and 3]

The posterior accident rate is equal to
=> 1x0.166 + 2x0.421 + 3x0.423 = 2.28 per year
Summary of Bayesian approach:

• Judgment + samples

$\Pr''(\theta_i) = \dfrac{\Pr(\varepsilon \mid \theta_i)\,\Pr'(\theta_i)}{\sum_i \Pr(\varepsilon \mid \theta_i)\,\Pr'(\theta_i)}$

Pr(ε | θi): the likelihood function
About the final: May 22, Art Hall, 4:30 pm (please
double-check yourself!)

• 99% of questions are from lecture notes (i.e., powerpoints,
demonstrations,…)

• Closed-book and no cheat sheet

• Some simple, important equations are needed (no
more than five), such as the Bayesian update, Chi-square,
finding the F-value with a large right-tail probability…
Summary of CIVL 2160
• About probability
– Random variables
– How to describe a variable (e.g., mean, SD,…)
– Probability distributions (e.g., normal,….)
– With variable’s mean, SD,… and distribution, probability
can be calculated

• About statistics
– Use samples (or sample + judgment) to picture the
populations
