
# Lecture Notes on Probability and Statistics

Joe Ó hÓgáin
E-mail: johog@maths.tcd.ie
Main Text: Kreyszig, Advanced Engineering Mathematics
Other Texts: Schaum Series, Robert B. Ash, Hayter
Online Notes: Hamilton.ie EE304, Prof. Friedman Lectures

## Probability Function

Definition: An experiment is a procedure with a set of possible outcomes.
Definition: The sample space S of an experiment is the set of all possible outcomes.

Examples:
Toss a coin: S = {H, T}.
Throw a die: S = {1, 2, 3, 4, 5, 6}.
Toss a coin twice: S = {HH, HT, TH, TT}.
Toss a coin until H appears and count the number of times it is tossed: S = {1, 2, 3, ..., ∞}, where ∞ means that H never appears.

Definition: Any subset of S is called an event. Let 𝒫(S) be the set of all events in S, i.e. the collection of all subsets of S.

Definition: A probability function on S is a function P : 𝒫(S) → R such that
(i) P(S) = 1,
(ii) 0 ≤ P(A) ≤ 1 for every event A,
(iii) P(A1 ∪ A2 ∪ ...) = P(A1) + P(A2) + ... whenever Ai ∩ Aj = ∅ for all i ≠ j.
Theorem 1: P(∅) = 0.
Proof: A = A ∪ ∅ with A ∩ ∅ = ∅, so P(A) = P(A) + P(∅); hence P(∅) = 0.

Theorem 2: P(Aᶜ) = 1 − P(A).
Proof: S = A ∪ Aᶜ with A ∩ Aᶜ = ∅, so 1 = P(S) = P(A) + P(Aᶜ).
Theorem 3:
Proof:

Theorem 4: If A ⊆ B then P(A) ≤ P(B).
Proof: B = A ∪ (B − A), a disjoint union, so P(B) = P(A) + P(B − A), and P(B − A) ≥ 0.

## Finite Sample Space

An event containing one element is called a singleton. If S contains n elements x1, x2, ..., xn, say, and each one has the same probability p of occurring, then
1 = P(S) = P({x1, x2, ..., xn}) = P({x1} ∪ {x2} ∪ ... ∪ {xn}) = P({x1}) + P({x2}) + ... + P({xn}) = p + p + ... + p = np,
so p = 1/n. Then, for any A ⊆ S, we have
P(A) = |A|/|S|,
the number of elements in A divided by the number of elements in S. Conversely, if we define P on S by this formula, then it is easy to check that P gives a probability function on S.
Example: Draw a card at random from a deck of 52 cards. Let A be the event that the card is a heart and B the event that it is a picture card. Then P(A) = 13/52, P(B) = 12/52 and P(A ∩ B) = 3/52, so
P(A ∪ B) = 13/52 + 12/52 − 3/52 = 22/52,
using Theorem 6 (or just count them).

In general singletons need not be equiprobable. Then let P(xi) = pi for 1 ≤ i ≤ n. (We write P(xi) for P({xi}) for convenience.) We have
P(A) = Σ_{xi ∈ A} pi
for any A ⊆ S, and we can display the pi in a distribution table, where pi = P(xi) for all 1 ≤ i ≤ n. Again, going backwards, the table defines a probability function on S.
Example: Three horses A, B, C race against each other. A is twice as likely to win as B, and B is twice as likely to win as C. Assuming no dead heats, find P(A), P(B), P(C) and P(B ∪ C).
Let P(C) = p. Then P(B) = 2p and P(A) = 4p. Hence p + 2p + 4p = 1, so 7p = 1, i.e. p = 1/7. Then P(C) = 1/7, P(B) = 2/7 and P(A) = 4/7. Also P(B ∪ C) = P(B) + P(C) = 3/7.

## Countably Infinite Sample Space

In this case S = {x1, x2, ..., xn, ...} with P(xi) = pi for all i ≥ 1. Then 1 = Σ_{i=1}^∞ pi.

Example: Toss a coin until H appears and count the number of times it is tossed: S = {1, 2, 3, ..., ∞}, where ∞ means that H never appears. Set P(1) = 1/2, P(2) = 1/4, ..., P(n) = 1/2ⁿ for all n. (P(∞) = 0.)
Let A = {1, 2, 3}. Then P(A) = 1/2 + 1/4 + 1/8 = 7/8 = the probability of H in the first 3 throws.
If B = {2, 4, 6, ...}, then the probability of H on an even throw is
P(B) = 1/2² + 1/2⁴ + 1/2⁶ + ... = (1/2²)/(1 − 1/2²) = 1/3.
Note that P(S) = 1/2 + 1/2² + 1/2³ + ... = (1/2)/(1 − 1/2) = 1, and the probability of H on an odd throw is 1 − 1/3 = 2/3.
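The geometric series above are easy to check numerically; the following Python sketch truncates each sum at a large cutoff and compares with the closed-form values:

```python
from fractions import Fraction

# P(n) = 1/2^n is the probability that the first head appears on toss n.
def p(n):
    return Fraction(1, 2 ** n)

total = sum(p(n) for n in range(1, 200))    # partial sum of P(S); tail ~ 2^-199
even = sum(p(n) for n in range(2, 200, 2))  # H first appears on an even toss
odd = sum(p(n) for n in range(1, 200, 2))   # H first appears on an odd toss

print(float(total), float(even), float(odd))  # ~1.0, ~1/3, ~2/3
```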

## Conditional Probability

Let A, E be events in S with P(E) ≠ 0.
Definition: The conditional probability of A given E is defined by
P(A|E) = P(A ∩ E)/P(E).
If S is a finite equiprobable space, then
P(A|E) = (|A ∩ E|/|S|) / (|E|/|S|) = |A ∩ E|/|E|.
Example: Throw a pair of dice: S = {(1, 1), (1, 2), ..., (6, 6)}. Find the probability that one die shows 2 given that the sum is 6. Let A be the event that one die shows 2 and E the event that the sum is 6. Then E = {(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)} and A ∩ E = {(2, 4), (4, 2)}, so P(A|E) = 2/5.
Note that, in general, P(A ∩ E) = P(E)P(A|E).
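The dice computation can be verified by brute-force enumeration of the 36 equally likely outcomes (a sketch; the variable names are ours):

```python
from itertools import product

S = list(product(range(1, 7), repeat=2))   # 36 equally likely pairs
E = [s for s in S if sum(s) == 6]          # condition: the sum is 6
A_and_E = [s for s in E if 2 in s]         # one die shows 2, within E

p = len(A_and_E) / len(E)
print(len(E), len(A_and_E), p)   # 5 outcomes in E, 2 contain a 2 -> 0.4 = 2/5
```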
Example: Let A, B be events with P(A) = .6, P(B) = .3 and P(A ∩ B) = .2. Then
P(A|B) = 2/3, P(B|A) = 2/6 = 1/3, P(A ∪ B) = .6 + .3 − .2 = .7, P(Aᶜ) = 1 − .6 = .4, P(Bᶜ) = 1 − .3 = .7, P(Aᶜ ∩ Bᶜ) = P((A ∪ B)ᶜ) = 1 − .7 = .3, P(Aᶜ|Bᶜ) = 3/7 and P(Bᶜ|Aᶜ) = 3/4.

Example: A plane departs on time with probability .8 and arrives on time with probability .9; the probability that it both departs and arrives on time is .78. Find the probability that it
(i) arrives on time given that it departed on time,
(ii) does not arrive on time given that it did not depart on time.
The sample space may be taken as {(D, A), (D, Aᶜ), (Dᶜ, A), (Dᶜ, Aᶜ)}. Let B = {(D, A), (D, Aᶜ)} and let C = {(D, A), (Dᶜ, A)}. Then P(B) = .8, P(C) = .9 and
(i) P(C|B) = P(C ∩ B)/P(B) = .78/.8 = .975,
(ii) P(Cᶜ|Bᶜ) = P(Cᶜ ∩ Bᶜ)/P(Bᶜ) = P((C ∪ B)ᶜ)/P(Bᶜ) = (1 − P(C ∪ B))/(1 − P(B)) = (1 − (.8 + .9 − .78))/(1 − .8) = .08/.2 = .4.

Suppose that S = A1 ∪ A2 ∪ ... ∪ An, where Ai ∩ Aj = ∅ for all i ≠ j. We say that the Ai are mutually exclusive and form a partition of S. Let E ⊆ S. Then E = E ∩ S = E ∩ (A1 ∪ A2 ∪ ... ∪ An) = (E ∩ A1) ∪ (E ∩ A2) ∪ ... ∪ (E ∩ An), a disjoint union, so
P(E) = P(E ∩ A1) + P(E ∩ A2) + ... + P(E ∩ An) = Σ_{i=1}^n P(E ∩ Ai).
Since P(E ∩ Ai) = P(Ai)P(E|Ai), this gives
P(E) = Σ_{i=1}^n P(Ai)P(E|Ai).
Also, for each j,
P(Aj|E) = P(Aj ∩ E)/P(E) = P(E ∩ Aj)/P(E) = P(Aj)P(E|Aj)/P(E) = P(Aj)P(E|Aj) / Σ_{i=1}^n P(Ai)P(E|Ai).
This is called Bayes' Formula or Theorem. We use it if we know all the P(E|Ai).

Example: Three machines X, Y, Z produce items. X produces 50% of them, 3% of which are defective; Y produces 30%, 4% of which are defective; and Z produces 20%, 5% of which are defective. Let D be the event that an item is defective, and let an item be chosen at random.
(i) Find the probability that it is defective.
(ii) Given that it is defective, find the probability that it came from machine X.
Let A1 be the event consisting of items made by X, A2 the event consisting of items made by Y and A3 the event consisting of items made by Z. Then P(A1) = .5, P(A2) = .3 and P(A3) = .2. Also P(D|A1) = .03, P(D|A2) = .04 and P(D|A3) = .05.
(i) P(D) = P(A1)P(D|A1) + P(A2)P(D|A2) + P(A3)P(D|A3) = (.5)(.03) + (.3)(.04) + (.2)(.05) = .037.
(ii) P(A1|D) = P(A1)P(D|A1)/P(D) = (.5)(.03)/.037 = .405 = 40.5%.
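The two steps (total probability, then Bayes) can be sketched in Python; the dictionaries below simply encode the percentages from the example:

```python
priors = {"X": 0.5, "Y": 0.3, "Z": 0.2}     # share of production per machine
defect = {"X": 0.03, "Y": 0.04, "Z": 0.05}  # defect rate per machine

# Total probability: P(D) = sum_i P(A_i) P(D | A_i)
p_d = sum(priors[m] * defect[m] for m in priors)

# Bayes: P(X | D) = P(X) P(D | X) / P(D)
p_x_given_d = priors["X"] * defect["X"] / p_d

print(round(p_d, 3), round(p_x_given_d, 3))   # 0.037 0.405
```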

Example: A hospital has 300 nurses. During the past year 48 of the nurses got a pay rise. At the beginning of the year the hospital offered a training seminar which was attended by 138 of the nurses. 27 of the nurses who got a pay rise attended the seminar. What is the probability that a nurse who got a pay rise attended the seminar?
Let A be the event consisting of nurses who attended the seminar and let B be the event consisting of nurses who got a pay rise. Then P(A) = 138/300 and P(Aᶜ) = 162/300. Also P(B|A) = 27/138, P(Bᶜ|A) = 111/138, P(B|Aᶜ) = 21/162 and P(Bᶜ|Aᶜ) = 141/162.
Therefore P(B) = (138/300)(27/138) + (162/300)(21/162) = 27/300 + 21/300 = 48/300, and
P(A|B) = P(A)P(B|A)/P(B) = (138/300)(27/138) / (48/300) = 27/48.

Exercise: In a certain town, some of the electorate vote Conservative, some vote Liberal and 25% vote Independent. During an election 45% of Conservative, 40% of Liberal and 60% of Independent supporters voted. A person is selected at random. Find the probability that the person voted. If the person voted, find the probability that the voter is (i) Conservative, (ii) Liberal, (iii) Independent.

## Independent Events

Definition: Two events A, B ⊆ S are independent if
P(A ∩ B) = P(A)P(B).
If A and B are independent then P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A), i.e. the conditional probability of A given B is the same as the probability of A. The converse is obviously true.
Note: A, B are mutually exclusive if and only if A ∩ B = ∅, in which case P(A ∩ B) = 0; hence mutually exclusive events A, B are not independent unless either P(A) = 0 or P(B) = 0.
Example: For the hearts and picture cards of the earlier example, P(A) = 13/52, P(B) = 12/52 and P(A ∩ B) = 3/52. Hence P(A ∩ B) = 3/52 = (13/52)(12/52) = P(A)P(B), so A and B are independent.

Example: Toss a coin three times:
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.
Let A be the event where the first toss is heads, B the event where the second toss is heads and C the event where there are exactly two heads in a row. Then P(A) = 4/8, P(B) = 4/8, P(C) = 2/8, P(A ∩ B) = 2/8, P(A ∩ C) = 1/8 and P(B ∩ C) = 2/8.
Hence A, B are independent and A, C are independent, but B, C are not independent.
Example: The probability that A hits a target is 1/4 and the probability that B hits the target is 2/5. Assume that A and B are independent. What is the probability that either A or B hits the target?
P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = P(A) + P(B) − P(A)P(B) = 1/4 + 2/5 − (1/4)(2/5) = 11/20.
Exercise:
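A one-line check of the inclusion-exclusion computation, using exact rational arithmetic:

```python
from fractions import Fraction

pA, pB = Fraction(1, 4), Fraction(2, 5)

# Independence gives P(A ∩ B) = P(A)P(B), so
# P(A ∪ B) = P(A) + P(B) - P(A)P(B).
p_union = pA + pB - pA * pB
print(p_union)   # 11/20
```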

## Product Probability and Independent Trials

Let S = {a1, a2, ..., as} and T = {b1, b2, ..., bt} be the sample spaces for two experiments. Let PS(ai) = pi and PT(bj) = qj for all 1 ≤ i ≤ s and 1 ≤ j ≤ t, where PS and PT are the probability functions on S and T respectively.
Let S × T = {(ai, bj) | ai ∈ S, bj ∈ T}. Define a function P on 𝒫(S × T) by P({(ai, bj)}) = pi qj and additivity. Then P is a probability function on S × T:
(i) P({(ai, bj)}) = pi qj ≥ 0,
(ii) P(S × T) = Σ_{i,j} pi qj = (Σi pi)(Σj qj) = 1 · 1 = 1,
(iii) additivity holds by the definition of P.
P above is called the product probability on S × T. It is not the only probability function on S × T. We can extend this definition to the product of any finite number of sample spaces.
Suppose that S × T has the product probability P. Let A = {ai} × T and B = S × {bj}. Then P(A) = P({ai} × T) = PS(ai) · PT(T) = pi · 1 = pi and P(B) = P(S × {bj}) = PS(S) · PT(bj) = qj. Now A ∩ B = {(ai, bj)}, so P(A ∩ B) = P({(ai, bj)}) = pi qj = P(A)P(B), and hence A and B are independent. Similarly, any two events of the form C × T and S × D are independent, where C ⊆ S and D ⊆ T.
Conversely, suppose that P is a probability function on S × T such that P({ai} × T) = pi and P(S × {bj}) = qj for all ai ∈ S and bj ∈ T, and all sets of this form are independent. Then P({(ai, bj)}) = P(({ai} × T) ∩ (S × {bj})) = P({ai} × T) · P(S × {bj}) = pi qj, so P must be the product probability.
We deduce that the product probability is the unique probability on S × T with these two independence properties.
Example: Whenever A, B and C race against each other their respective probabilities of winning are always 1/2, 1/3 and 1/6. Suppose they race twice. Then, assuming independence, the probability of C winning the first race and A winning the second race is (1/6) · (1/2) = 1/12, etc.

Now suppose that we perform the same experiment a number of times. The sample space S × S × ... × S consists of tuples. If we assume that the experiments are independent, then the probability function on this sample space is the product probability. If we do it n times we say that we have n independent trials.
Example: Toss a fair coin three times. Then P(HTH) = (1/2)(1/2)(1/2) = 1/8, etc. This is the same probability function as assuming that all triples are equiprobable, as before. Hence we can consider the problem in either of the two ways.

## Counting Techniques

Suppose we have n objects. How many permutations of size 1 ≤ r < n can be made? Using the Fundamental Principle of Counting, the answer is n!/(n − r)!, which is written nPr.
How many combinations of size 1 ≤ r < n can be made? Let the answer be nCr. Each of these combinations gives r! permutations, so nCr · r! = nPr. Hence nCr = nPr/r! = n!/((n − r)! r!).
If r = 0 or r = n the answer is 1, so if we agree that 0! = 1 we can use the formula for all 0 ≤ r ≤ n.
A lot of problems in finite probability can be done from first principles, i.e. using boxes.
Example: How many people do we need to ensure that the probability of at least two having the same birthday is greater than 1/2?
Let the answer be n. Then we need
1 − 365Pn/(365)ⁿ > 1/2,
and the smallest such n is 23.
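The birthday inequality can be solved by direct search; this Python sketch finds the smallest n:

```python
def p_all_distinct(n):
    """P(no shared birthday among n people) = 365Pn / 365^n."""
    p = 1.0
    for k in range(n):
        p *= (365 - k) / 365
    return p

n = 1
while 1 - p_all_distinct(n) <= 0.5:   # stop once P(some match) > 1/2
    n += 1
print(n)   # 23
```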

## Random Variables

Definition: Let S be a sample space with probability function P. A random variable (R.V.) is a function X : S → R. The image or range of X is denoted by RX. If RX is finite we say that X is a finite or finitely discrete R.V.; if RX is countably infinite we say that X is a countably infinite or infinitely discrete R.V.; and if RX is uncountable we say that X is an uncountable or continuous R.V.
Examples:
(i) Throw a pair of dice. Let X : S → R be the maximum number of each pair and let Y : S → R be the sum of the two numbers. Then RX = {1, 2, 3, 4, 5, 6} and RY = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}. X and Y are finite R.V.s.
(ii) Toss a coin until H appears. Let X : S → R be the number of times the coin is tossed, or (the number of T's) + 1. Then RX = {1, 2, 3, ...}. X is a countably infinite R.V.
(iii) Pick a point at random from the unit disc S = {(x, y) | x² + y² ≤ 1}. Let X : S → R be the distance of the point from the centre (0, 0). Then RX = [0, 1] and X is a continuous R.V.

## Finite Random Variables

Let X : S → RX be finite with RX = {x1, x2, ..., xn}, say. X induces a function f on RX by
f(xk) = P(X = xk) = P({s ∈ S | X(s) = xk}).
f is called a probability distribution function (p.d.f.). Note that f(xk) ≥ 0 and Σ_{k=1}^n f(xk) = 1.
We can extend f to all of R by defining f(x) = 0 for all x ≠ x1, x2, ..., xn. We often write f using a table.
Recall the idea of a discrete frequency distribution: if the value xk occurs with frequency fk and we let f(xk) = fk / Σ_{i=1}^n fi, then f is a p.d.f.
Note:
Example:
Exercise: ... table for X.

Definition: Let x1, x2, ..., xn with probabilities f(x1), f(x2), ..., f(xn) be the probability distribution of a R.V. X. The expectation or mean of X is defined by
E(X) = x1 f(x1) + x2 f(x2) + ... + xn f(xn) = Σ_{k=1}^n xk f(xk).
This is the same definition as the mean in the case of a frequency distribution.
Example: For X, Y from example (i) above (the maximum and the sum of a pair of dice):
E(X) = 1 · (1/36) + 2 · (3/36) + ... + 6 · (11/36) = 4.47,
E(Y) = 2 · (1/36) + 3 · (2/36) + ... + 12 · (1/36) = 7.
Note that E(X) need not belong to RX.
We write μX, or simply μ if there is no confusion, for E(X). E(X) is the weighted average where the weights are the probabilities. We can apply this definition to games of chance: a game of chance is an experiment with n outcomes a1, a2, ..., an and corresponding probabilities p1, p2, ..., pn. Suppose the payout for each ai is wi. We define a R.V. X by X(ai) = wi; then the expected payout is E(X) = Σ_{i=1}^n wi pi.

Example: Throw a die. If 2, 3 or 5 occurs we win that number of euros. If 1, 4 or 6 occurs we lose that number of euros. Should we play?
We have a distribution table with payouts 2, 3, 5, −1, −4, −6, each with probability 1/6. Then
E(X) = 2 · (1/6) + 3 · (1/6) + 5 · (1/6) − 1 · (1/6) − 4 · (1/6) − 6 · (1/6) = −1/6.
Don't play!
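The expected payout can be checked exactly:

```python
from fractions import Fraction

# Payout: win 2, 3 or 5 euro on rolling 2, 3 or 5; lose 1, 4 or 6 euro otherwise.
payout = {1: -1, 2: 2, 3: 3, 4: -4, 5: 5, 6: -6}
E = sum(Fraction(1, 6) * w for w in payout.values())
print(E)   # -1/6: on average the game loses money
```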
Definition: Let X be a finite R.V. The variance of X is defined by
var(X) = E((X − μ)²) = (x1 − μ)² f(x1) + (x2 − μ)² f(x2) + ... + (xn − μ)² f(xn) = Σ_{k=1}^n (xk − μ)² f(xk).
The standard deviation of X is defined to be σX = √var(X). If X is understood, we just write σ.
Note:
var(X) = E((X − μ)²) = Σ_{k=1}^n xk² f(xk) − 2μ Σ_{k=1}^n xk f(xk) + μ² Σ_{k=1}^n f(xk) = E(X²) − 2μ² + μ² = E(X²) − μ² = E(X²) − (E(X))².

Example: For X, Y from (i) above again:
E(X²) = Σ_{k=1}^6 xk² f(xk) = 1² · (1/36) + 2² · (3/36) + ... + 6² · (11/36) = 21.97.
Hence var(X) = E(X²) − (E(X))² = 21.97 − (4.47)² = 1.97.
E(Y²) = Σ_{k=1}^{11} yk² f(yk) = 2² · (1/36) + 3² · (2/36) + ... + 12² · (1/36) = 54.83, so var(Y) = 54.83 − 7² = 5.83.
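Both sets of moments can be computed exactly by enumerating the 36 outcomes (a Python sketch; `moments` is our helper name):

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # the 36 equally likely dice pairs
p = Fraction(1, 36)

def moments(values):
    """Exact mean and variance of a list of equally likely values."""
    mean = sum(p * v for v in values)
    var = sum(p * v * v for v in values) - mean ** 2
    return mean, var

EX, varX = moments([max(a, b) for a, b in S])   # X = maximum of the pair
EY, varY = moments([a + b for a, b in S])       # Y = sum of the pair

print(EX, float(varX))   # 161/36 (about 4.47), variance about 1.97
print(EY, float(varY))   # 7, variance 35/6 (about 5.83)
```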

Theorem: If X is a finite R.V. and a, b ∈ R, then
(i) E(aX) = aE(X),
(ii) E(X + b) = E(X) + b.
Proof:
(i) E(aX) = Σ_{k=1}^n a xk f(xk) = a Σ_{k=1}^n xk f(xk) = aE(X).
(ii) E(X + b) = Σ_{k=1}^n (xk + b) f(xk) = Σ_{k=1}^n xk f(xk) + b Σ_{k=1}^n f(xk) = E(X) + b.
Hence E(aX + b) = E(aX) + b = aE(X) + b.
Theorem: If X is a finite R.V. and a, b ∈ R, then
(i) var(aX) = a² var(X),
(ii) var(X + b) = var(X).

f (xk ) =

Proof: (i)

## (aE(X))2 = a2E(X 2)a2(E(X))2 = a2(E(X 2)(E(X))2 =

a2var(X). We could also just use the definition of var(X).
var(X + b) = E((X + b)2) (E(X + b))2

(ii)
=

n
P

k=1
n
P

2b

(x2k
k=1
n
P

xk f (xk ) + b

k=1

n
P
2

n
P

x2k f (xk ) +

k=1

k=1

## = E(X 2) + 2bE(X) + b2 (E(X))2 2bE(X) b2

= E(X 2) (E(X))2 = var(X).
Hence var(aX + b) = var(aX) = a2var(X) and aX+b =
|a|X .
Definition:

defined as Z =

X
.

E(Z) = E( X
) =
1
var(X)
2

2
2

E(X)

= 0 and var(Z) =

= 1.

25

Definition: F(x) = Σ_{xk ≤ x} f(xk) is called the cumulative distribution function (c.d.f.) of X.
Example: Suppose X is defined by the table
x: −2, 1, 2, 4 with f(x): 1/4, 1/8, 1/2, 1/8.
Then F(−2) = 1/4, F(1) = 3/8, F(2) = 7/8 and F(4) = 1; F(x) is obvious for all other x.
F is a step function. In general F is always an increasing function, since f(xk) ≥ 0 for all xk.
Note:

## The Binomial Distribution

One of the most important finite distributions is the binomial distribution. Suppose we have an experiment with only two possible outcomes, called success and failure, i.e. S = {s, f}. Let P(s) = p and P(f) = q. Then p + q = 1. Each such experiment is called a Bernoulli trial. Suppose we repeat the experiment n times and assume that the trials are independent. The sample space for the n Bernoulli trials is S × S × ... × S, and a typical element in the sample space looks like (a1, a2, ..., an), where each ai = s or f. Define a R.V. X on this sample space by X(a1, a2, ..., an) = the number of successes in (a1, a2, ..., an). Then RX = {0, 1, 2, ..., n}. The p.d.f. of X, f(x), is given by
f(0) = qⁿ, f(1) = nC1 p qⁿ⁻¹, f(2) = nC2 p² qⁿ⁻², ..., f(k) = nCk pᵏ qⁿ⁻ᵏ, ..., f(n) = pⁿ.
Then f(0) + f(1) + ... + f(n) = nC0 qⁿ + nC1 p qⁿ⁻¹ + nC2 p² qⁿ⁻² + ... + nCk pᵏ qⁿ⁻ᵏ + ... + nCn pⁿ = (p + q)ⁿ = 1ⁿ = 1, by the Binomial Theorem. X is said to have the binomial distribution, written as B(n, p).

Example: A fair coin is tossed 6 times. Find the probability of getting
(i) exactly two heads,
(ii) at least four heads,
(iii) at least one head.
(i) f(2) = 6C2 (1/2)²(1/2)⁴ = 15/64.
(ii) f(4) + f(5) + f(6) = 6C4 (1/2)⁴(1/2)² + 6C5 (1/2)⁵(1/2)¹ + 6C6 (1/2)⁶ = 22/64.
(iii) 1 − f(0) = 1 − (1/2)⁶ = 63/64.
Example: A fair die is thrown 7 times; call a 5 or a 6 a success, so p = 1/3 and q = 2/3. Find the probability of
(i) exactly three successes,
(ii) at least one success.
(i) f(3) = 7C3 (1/3)³(2/3)⁴ = 0.26.
(ii) 1 − (2/3)⁷ = 0.94.
Example: Find the number of times a die must be thrown such that there is a better than even chance of getting at least one six.
Here p = 1/6 and q = 5/6. Let n be the required number. Then 1 − (5/6)ⁿ > 1/2, so (5/6)ⁿ < 1/2, or n ln(5/6) < ln(1/2), i.e. n > ln(1/2)/ln(5/6) = 3.80 (the inequality reverses because ln(5/6) < 0). Hence n = 4.
Example: 20% of the bolts produced by a machine are defective. If 10 bolts are chosen at random, find the probability that
(i) none,
(ii) one,
(iii) two
of the bolts will be defective. Here n = 10, p = .2, q = .8.
(i) f(0) = (.8)¹⁰,
(ii) f(1) = 10C1 (.2)(.8)⁹,
(iii) f(2) = 10C2 (.2)²(.8)⁸.
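The coin answers follow from the B(6, 1/2) p.m.f.; a Python sketch using exact fractions (note that 22/64 appears in reduced form as 11/32):

```python
from math import comb
from fractions import Fraction

def binom_pmf(n, k, p):
    """P(X = k) for X ~ B(n, p): nCk p^k q^(n-k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

half = Fraction(1, 2)
two_heads = binom_pmf(6, 2, half)                           # exactly two heads
at_least_4 = sum(binom_pmf(6, k, half) for k in (4, 5, 6))  # at least four
at_least_1 = 1 - binom_pmf(6, 0, half)                      # at least one

print(two_heads, at_least_4, at_least_1)   # 15/64 11/32 63/64
```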

## Continuous Random Variables

Suppose that X is a R.V. on a sample space S whose range RX is an interval in R, or all of R. Then X is called a continuous R.V. If there exists a piece-wise continuous function f : R → R such that P(a ≤ X ≤ b) = ∫_a^b f(x) dx for any a < b, then f is called the probability distribution function (p.d.f.) or density function for X.
f must satisfy f(x) ≥ 0 for all x and ∫_{−∞}^{∞} f(x) dx = 1. Note that P(X = a) = P(a ≤ X ≤ a) = ∫_a^a f(x) dx = 0.
We define E(X) = ∫_{−∞}^{∞} x f(x) dx, written μ, and we can easily show that
var(X) = E(X²) − (E(X))² = ∫_{−∞}^{∞} x² f(x) dx − μ².
Again σ² = var(X).
The cumulative distribution function (c.d.f.) of X is defined by F(x) = ∫_{−∞}^x f(t) dt. Then P(a ≤ X ≤ b) = F(b) − F(a).

Example: Let
f(x) = x/2 for 0 ≤ x ≤ 2, and f(x) = 0 for x < 0 or x > 2.
P(1 ≤ X ≤ 1.5) = ∫_1^{1.5} (x/2) dx = [x²/4]_1^{1.5} = 5/16.
E(X) = ∫_0^2 x f(x) dx = ∫_0^2 (x²/2) dx = [x³/6]_0^2 = 4/3,
and var(X) = E(X²) − (E(X))² = ∫_0^2 (x³/2) dx − (4/3)² = 2 − 16/9 = 2/9, so σ = √2/3.
If x < 0, then F(x) = 0.
If 0 ≤ x ≤ 2, then F(x) = ∫_0^x (t/2) dt = x²/4.
If 2 < x, then F(x) = ∫_0^2 (t/2) dt = 1, as we would expect!
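The c.d.f. just derived can be coded directly and used to recover P(1 ≤ X ≤ 1.5) (a sketch):

```python
def F(x):
    """c.d.f. of the density f(x) = x/2 on [0, 2]: F(x) = x^2/4 there."""
    if x < 0:
        return 0.0
    if x <= 2:
        return x * x / 4
    return 1.0

print(F(1.5) - F(1.0))   # P(1 <= X <= 1.5) = 0.3125 = 5/16
print(F(2.0))            # 1.0, as a c.d.f. must satisfy
```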

Recall the idea of a grouped frequency distribution.
Example: The heights of 1,000 people, grouped in intervals, where 62-63 means 62 ≤ h < 63, etc. The relative frequencies are 50/1,000 = .05, etc., and Σ(rel. freqs.) = 1. The graph is now a histogram.
The area of each rectangle represents the relative frequency of that group. The proportion of people of height less than any number is the sum of the areas up to that number. Joining the midpoints of the tops of the rectangles gives a bell-shaped curve. For large populations the (relative) frequency function approximates to such a curve.
The most important continuous random variable is the normal distribution, whose p.d.f. has a bell shape.
Definition: The normal distribution with mean μ and variance σ² has p.d.f.
f(x) = (1/(σ√(2π))) e^{−(1/2)((x−μ)/σ)²}.
We say that X is N(μ, σ²) if this f(x) is its p.d.f. Note that f(x) is symmetric about x = μ, and the bigger the σ the wider the graph of f(x) is.
Theorem:
(i) ∫_{−∞}^{∞} e^{−x²/2} dx = √(2π),
(ii) ∫_{−∞}^{∞} e^{−(1/2)((x−μ)/σ)²} dx = σ√(2π),
(iii) hence f above really is a p.d.f., i.e. f ≥ 0 and ∫_{−∞}^{∞} f(x) dx = 1.

Suppose X is N(μ, σ²). Its c.d.f. is
F(x) = (1/(σ√(2π))) ∫_{−∞}^x e^{−(1/2)((v−μ)/σ)²} dv,
and P(a ≤ X ≤ b) = F(b) − F(a) = (1/(σ√(2π))) ∫_a^b e^{−(1/2)((v−μ)/σ)²} dv.
These integrals can't be found analytically, so they are tabulated numerically. This would have to be done for all values of μ and σ. We do so for the case μ = 0 and σ = 1 and then use the standardized normal R.V. Z = (X − μ)/σ.
For μ = 0, σ = 1 we write Z for the R.V. and z for the real variable. We denote the p.d.f. of Z by φ(z) = (1/√(2π)) e^{−z²/2} and its c.d.f. by Φ(z) = (1/√(2π)) ∫_{−∞}^z e^{−u²/2} du.
Consider F(x) = (1/(σ√(2π))) ∫_{−∞}^x e^{−(1/2)((v−μ)/σ)²} dv. Substitute u = (v − μ)/σ, so dv = σ du; as v → −∞, u → −∞, and when v = x, u = (x − μ)/σ. Therefore
F(x) = (1/√(2π)) ∫_{−∞}^{(x−μ)/σ} e^{−u²/2} du = Φ((x − μ)/σ).
Hence P(a ≤ X ≤ b) = F(b) − F(a) = Φ((b − μ)/σ) − Φ((a − μ)/σ).
From the tables we have:
P(μ − σ ≤ X ≤ μ + σ) = Φ(1) − Φ(−1) = P(−1 ≤ Z ≤ 1) = .682,
P(μ − 2σ ≤ X ≤ μ + 2σ) = Φ(2) − Φ(−2) = P(−2 ≤ Z ≤ 2) = .954,
P(μ − 3σ ≤ X ≤ μ + 3σ) = Φ(3) − Φ(−3) = P(−3 ≤ Z ≤ 3) = .997, etc.

The tables give values of Φ(z) for 0 ≤ z ≤ 3, in steps of 0.01. Φ(z) = P(−∞ < Z ≤ z) and Φ(0) = P(−∞ < Z ≤ 0) = .5. For z < 0 we use Φ(z) = 1 − Φ(−z).
Suppose that a < b.
(i) 0 ≤ a < b:
P(a ≤ Z ≤ b) = Φ(b) − Φ(a).
(ii) a < 0 < b:
P(a ≤ Z ≤ b) = Φ(b) − Φ(a) = Φ(b) − (1 − Φ(−a)) = Φ(b) + Φ(−a) − 1.
(iii) a < b < 0:
P(a ≤ Z ≤ b) = Φ(b) − Φ(a) = (1 − Φ(−b)) − (1 − Φ(−a)) = Φ(−a) − Φ(−b).
Example: Find
(i) P(Z ≤ 2.44),
(ii) P(Z ≤ −1.16),
(iii) P(Z ≥ 1),
(iv) P(2 ≤ Z ≤ 10).
(i) P(Z ≤ 2.44) = Φ(2.44) = .9927.
(ii) P(Z ≤ −1.16) = 1 − P(Z ≤ 1.16) = 1 − Φ(1.16) = 1 − .877 = .123.
(iii) P(Z ≥ 1) = 1 − P(Z < 1) = 1 − Φ(1) = 1 − .8413 = .1587.
(iv) P(2 ≤ Z ≤ 10) = Φ(10) − Φ(2) ≈ 1 − .9772 = .0228.
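Instead of tables, Φ can be evaluated with the error function; a Python sketch reproducing the four answers:

```python
from math import erf, sqrt

def Phi(z):
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

print(round(Phi(2.44), 4))          # about 0.9927   (i)
print(round(1 - Phi(1.16), 4))      # about 0.123    (ii)
print(round(1 - Phi(1.0), 4))       # about 0.1587   (iii)
print(round(Phi(10) - Phi(2), 4))   # about 0.0228   (iv)
```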

Example: For N(0, 1) find c if
(i) P(Z ≥ c) = 10%,
(ii) P(Z ≤ c) = 5%,
(iii) P(0 ≤ Z ≤ c) = 45%,
(iv) P(−c ≤ Z ≤ c) = 99%.
(i) P(Z ≥ c) = 1 − P(Z ≤ c) = 1 − Φ(c), so 1 − Φ(c) = .10, i.e. Φ(c) = .90, giving c = 1.28.
(ii) Φ(c) = .05 < .5, so c < 0 and Φ(−c) = .95; hence c = −1.645.
(iii) Φ(c) − Φ(0) = .45, so Φ(c) = .95 and c = 1.645.
(iv) 2Φ(c) − 1 = .99, so Φ(c) = .995 and c = 2.575.

Example: Let X be N(.8, 4), so μ = .8 and σ = 2. Find
(i) P(X ≤ 2.44),
(ii) P(X ≤ −1.16),
(iii) P(X ≥ 1),
(iv) P(2 ≤ X ≤ 10).
(i) P(X ≤ 2.44) = F(2.44) = Φ((2.44 − .8)/2) = Φ(.82) = .7939.
(ii) P(X ≤ −1.16) = F(−1.16) = Φ((−1.16 − .8)/2) = Φ(−.98) = 1 − Φ(.98) = .1635.
(iii) P(X ≥ 1) = 1 − P(X ≤ 1) = 1 − F(1) = 1 − Φ((1 − .8)/2) = 1 − Φ(.1) = .4602.
(iv) P(2 ≤ X ≤ 10) = F(10) − F(2) = Φ((10 − .8)/2) − Φ((2 − .8)/2) = Φ(4.6) − Φ(.6) = 1 − .7257 = .2743.
Example: An athlete's javelin throws are normally distributed with mean 17 m and standard deviation 2 m. Find
(i) the probability that the athlete throws a distance greater than 18.5 m,
(ii) the distance d that the throw will exceed with 90% probability.
(i) P(X > 18.5) = 1 − P(X ≤ 18.5) = 1 − F(18.5) = 1 − Φ((18.5 − 17)/2) = 1 − Φ(.75) = .2266.
(ii) We need P(X > d) = .90, i.e. 1 − Φ((d − 17)/2) = .90. Hence Φ((d − 17)/2) = .10, so (d − 17)/2 = −1.28 and d = 17 − 2(1.28) = 14.44 m.

Example: A make of stove has a mean lifetime of 15 years with standard deviation 2.5 years. Assuming that the lifetime X of the stoves is normally distributed, find the probability that a stove lasts
(i) 10 years or less,
(ii) between 16 and 20 years.
(i) P(X ≤ 10) = F(10) = Φ((10 − 15)/2.5) = Φ(−2) = 1 − Φ(2) = .0228 = 2.28%.
(ii) P(16 ≤ X ≤ 20) = F(20) − F(16) = Φ((20 − 15)/2.5) − Φ((16 − 15)/2.5) = Φ(2) − Φ(.4) = .9772 − .6554 = .3218.

## Jointly Distributed Random Variables

Let X, Y be finite R.V.s on the same sample space S with probability function P. Let the range of X be RX = {x1, x2, ..., xn} and the range of Y be RY = {y1, y2, ..., ym}.
Consider the pair (X, Y) defined on S by (X, Y)(s) = (X(s), Y(s)). Then (X, Y) is a R.V. on S with range
RX × RY = {(x1, y1), ..., (x1, ym), (x2, y1), ..., (x2, ym), ..., (xn, y1), ..., (xn, ym)}.
We sometimes call (X, Y) a vector R.V.
Let Ai = {s ∈ S | X(s) = xi} = {X = xi} and Bj = {s ∈ S | Y(s) = yj} = {Y = yj}. We write Ai ∩ Bj = {X = xi, Y = yj}. Define a function h : RX × RY → R by
h(xi, yj) = P(Ai ∩ Bj) = P(X = xi, Y = yj).
Then h(xi, yj) ≥ 0 and Σ_{i,j} h(xi, yj) = 1, since the Ai ∩ Bj form a partition of S. h is called the joint probability distribution function of (X, Y) associated with the probability function P, and X, Y are said to be jointly distributed. Suppose that f and g are the p.d.f.s of X and Y respectively. What is the connection between f, g and h?
S = ∪_{j=1}^m Bj, so Ai = Ai ∩ S = Ai ∩ (∪_{j=1}^m Bj) = ∪_{j=1}^m (Ai ∩ Bj), a disjoint union. Hence
f(xi) = P(Ai) = P(∪_{j=1}^m (Ai ∩ Bj)) = Σ_{j=1}^m P(Ai ∩ Bj) = Σ_{j=1}^m h(xi, yj),
and similarly g(yj) = Σ_{i=1}^n h(xi, yj).
f and g are sometimes called the marginal distributions of h. We often write the joint distribution in a table.
Example: Throw a pair of dice. Let X(a, b) = max{a, b} and Y(a, b) = a + b.

Definition: X and Y are independent if h(xi, yj) = f(xi)g(yj) for all i, j.
This means that P(X = xi, Y = yj) = P(X = xi)P(Y = yj), or P(Ai ∩ Bj) = P(Ai)P(Bj), for all i, j. Note that in the above example X and Y are not independent.
Definition: If G : R² → R, then we define a R.V. G(X, Y) on S by G(X, Y)(s) = G(X(s), Y(s)) = G(xi, yj), with p.d.f. h. We now define the expectation of G(X, Y) as
E(G(X, Y)) = Σ_{i,j} G(xi, yj) h(xi, yj).
Example: G(x, y) = x + y. Then (X + Y)(s) = X(s) + Y(s) = xi + yj and E(X + Y) = Σ_{i,j} (xi + yj) h(xi, yj), etc.
Theorem:
(i) E(X + Y) = E(X) + E(Y).
(ii) If X and Y are independent, then var(X + Y) = var(X) + var(Y). (This is not true in general.)

Proof:
(i) E(X + Y) = Σ_{i,j} (xi + yj) h(xi, yj) = Σi Σj xi h(xi, yj) + Σi Σj yj h(xi, yj) = Σi xi Σj h(xi, yj) + Σj yj Σi h(xi, yj) = Σi xi f(xi) + Σj yj g(yj) = E(X) + E(Y).
(ii) First we show that E(XY) = E(X)E(Y):
E(XY) = Σ_{i,j} xi yj h(xi, yj) = Σ_{i,j} xi yj f(xi) g(yj) = (Σi xi f(xi))(Σj yj g(yj)) = E(X)E(Y).
Now var(X + Y) = E((X + Y)²) − (E(X + Y))² = E(X² + 2XY + Y²) − (E(X) + E(Y))² = E(X²) + 2E(X)E(Y) + E(Y²) − (E(X))² − (E(Y))² − 2E(X)E(Y) = E(X²) − (E(X))² + E(Y²) − (E(Y))² = var(X) + var(Y).
Important Example: Let Y have the binomial distribution B(n, p). The sample space is S × S × ... × S, n times, where S = {s, f}. For one trial, n = 1, define the R.V. X by X(s) = 1 and X(f) = 0. Then E(X) = 1 · p + 0 · q = p and var(X) = E(X²) − (E(X))² = p − p² = p(1 − p) = pq.
For n trials define X1(a1, a2, ..., an) = X(a1), ..., Xn(a1, a2, ..., an) = X(an), so that Xi is 1 if s is in the ith place and 0 if f is in the ith place, for all 1 ≤ i ≤ n.
Then E(Xi) = E(X) = p and var(Xi) = var(X) = pq. Now let Y = X1 + X2 + ... + Xn, so that Y gives the total number of successes in the n trials. Then E(Y) = E(X1) + E(X2) + ... + E(Xn) = p + p + ... + p = np and, since the trials are independent, var(Y) = var(X1) + var(X2) + ... + var(Xn) = npq.
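E(Y) = np and var(Y) = npq can be confirmed for a particular case by summing over the p.m.f. directly; here n = 10 and p = 1/5, matching the defective-bolt example (a sketch):

```python
from fractions import Fraction
from math import comb

n, p = 10, Fraction(1, 5)
q = 1 - p

# Full B(n, p) p.m.f., then first and second moments.
pmf = {k: comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)}
mean = sum(k * pk for k, pk in pmf.items())
var = sum(k * k * pk for k, pk in pmf.items()) - mean ** 2

print(mean, var)   # 2 8/5, i.e. np = 2 and npq = 1.6
```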

## Sampling Theory

Suppose that we have an infinite or very large finite sample space S. This sample space is often called a population. Getting information about the total population may be difficult, so we consider much smaller subsets of the population, called samples. We want to get information about the population by studying the samples. We consider the samples to be random samples, i.e. each element of the population has the same probability of being in a sample.
Example: Pick a person at random and consider the age of this person. Do this n times. This gives a random sample of size n of the ages of people in Ireland.
Mathematically the situation is described in the following way. Let X be a random variable on a sample space S with probability function P and let f(x) be the probability distribution function of X. Consider the sample space Ω = S × S × ... × S (n times) with the product probability function P, i.e.
P(A1 × A2 × ... × An) = P(A1)P(A2)...P(An).
For each 1 ≤ i ≤ n define a random variable Xi on Ω by Xi(s1, s2, ..., sn) = X(si), where (s1, s2, ..., sn) ∈ Ω. Then the probability distribution function of Xi is also f(x) for each i. The vector random variable (X1, X2, ..., Xn) defined by (X1, X2, ..., Xn)(s1, s2, ..., sn) = (X(s1), X(s2), ..., X(sn)) = (x1, x2, ..., xn) is a random variable on Ω with joint distribution
P(X1 = x1, X2 = x2, ..., Xn = xn) = f(x1)f(x2)...f(xn).
Choosing a sample is simply applying the vector random variable (X1, X2, ..., Xn) to Ω to get a random sample (x1, x2, ..., xn). Each Xi has the same mean μ and variance σ² as X, and they are independent, by definition. They are called independent identically distributed random variables (i.i.d.). Functions of X1, X2, ..., Xn and numbers associated with them are called statistics, while functions of the original X and associated numbers are called parameters. Our task is to get information about the parameters by studying the statistics.
The mean μ and variance σ² of X are called the population mean and variance. We define two important statistics, the sample mean and sample variance:
Definition:
Sample mean: X̄ = (X1 + X2 + ... + Xn)/n, so that X̄(s1, s2, ..., sn) = (x1 + x2 + ... + xn)/n = x̄.
Sample variance: S² = (Σ_{i=1}^n (Xi − X̄)²)/(n − 1), so that S²(s1, s2, ..., sn) = (Σ_{i=1}^n (xi − x̄)²)/(n − 1) = s².

We have
Theorem:
(i) The expectation of X̄ is E(X̄) = μ, the population mean;
(ii) the variance of X̄ is σ²_{X̄} = σ²/n, the population variance over n.
Proof:
(i) E(X̄) = E((X1 + X2 + ... + Xn)/n) = (E(X1) + E(X2) + ... + E(Xn))/n = (μ + μ + ... + μ)/n = nμ/n = μ.
(ii) σ²_{X̄} = var(X̄) = var((X1 + X2 + ... + Xn)/n) = var(X1/n) + var(X2/n) + ... + var(Xn/n) = var(X1)/n² + var(X2)/n² + ... + var(Xn)/n² = nσ²/n² = σ²/n.
The reason for the n − 1 instead of n in the definition of S² is given by the following result.

Theorem: E(S²) = σ².
Proof:
E(S²) = E((Σ_{i=1}^n (Xi − X̄)²)/(n − 1)) = (1/(n − 1)) E(Σ_{i=1}^n (Xi² − 2Xi X̄ + X̄²))
= (1/(n − 1)) [Σ_{i=1}^n E(Xi²) − 2E((Σ_{i=1}^n Xi) X̄) + nE(X̄²)]
= (1/(n − 1)) [Σ_{i=1}^n E(Xi²) − 2E((nX̄)(X̄)) + nE(X̄²)]
= (1/(n − 1)) [n(σ² + μ²) − 2nE(X̄²) + nE(X̄²)]
= (1/(n − 1)) [n(σ² + μ²) − nE(X̄²)]
= (1/(n − 1)) [n(σ² + μ²) − n(σ²/n + μ²)]
= (1/(n − 1)) [(n − 1)σ²] = σ²,
using E(Xi²) = var(Xi) + (E(Xi))² = σ² + μ² and E(X̄²) = var(X̄) + (E(X̄))² = σ²/n + μ².
Note:
If the mean or expectation of a statistic is equal to the corresponding parameter, the statistic is called an unbiased estimator of the parameter. Hence X̄ and S² are unbiased estimators of μ and σ² respectively. An estimate of a population parameter given by a single number is called a point estimate, e.g. if we take a sample of size n and calculate
x̄ = (x1 + x2 + ... + xn)/n and s² = (Σ_{i=1}^n (xi − x̄)²)/(n − 1),
these are point estimates of μ and σ² respectively. We shall, however, concentrate on interval estimates, where the parameter lies within some interval, called a confidence interval.
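The unbiasedness of S² can be illustrated exhaustively on a tiny population: take one throw of a fair die as the population and average S² over all 36 equally likely samples of size 2 (the choice of toy population and n = 2 is ours):

```python
from fractions import Fraction
from itertools import product

pop = range(1, 7)                      # population: one throw of a fair die
mu = Fraction(sum(pop), 6)
sigma2 = sum((x - mu) ** 2 for x in pop) * Fraction(1, 6)   # population variance

n = 2                                  # sample size
def s2(sample):
    """Sample variance with divisor n - 1."""
    xbar = Fraction(sum(sample), n)
    return sum((x - xbar) ** 2 for x in sample) * Fraction(1, n - 1)

samples = list(product(pop, repeat=n))  # all 36 equally likely samples
expected_s2 = sum(s2(s) for s in samples) * Fraction(1, len(samples))
print(sigma2, expected_s2)              # both 35/12: E(S^2) = sigma^2 exactly
```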

## Confidence Intervals for μ

Suppose we have n i.i.d. random variables X1, X2, ..., Xn with E(Xi) = μ and var(Xi) = σ² for each 1 ≤ i ≤ n. Then if X̄ = (X1 + X2 + ... + Xn)/n, we get E(X̄) = μ and var(X̄) = σ²/n.
As before, X1, X2, ..., Xn are jointly distributed random variables defined on the product sample space. We have the very important result:
Central Limit Theorem: For large n, X̄ is approximately normal with mean μ and variance σ²/n, i.e. N(μ, σ²/n); or, in other words, (X̄ − μ)/(σ/√n) is approximately N(0, 1). The larger the n the better the approximation.
Note: if X itself is normal, this holds for all values of n.
Recall N(0, 1). If we want P(−z1 ≤ Z ≤ z1) = 95%, then Φ(z1) = 97.5%, so z1 = 1.96. We say that −1.96 ≤ Z ≤ 1.96 is a 95% confidence interval for N(0, 1). Hence
P(−1.96 ≤ (X̄ − μ)/(σ/√n) ≤ 1.96) = 95%
P(−1.96 σ/√n ≤ X̄ − μ ≤ 1.96 σ/√n) = 95%
P(−1.96 σ/√n − X̄ ≤ −μ ≤ 1.96 σ/√n − X̄) = 95%
P(X̄ − 1.96 σ/√n ≤ μ ≤ X̄ + 1.96 σ/√n) = 95%.
If we know σ this gives us a 95% confidence interval for μ, i.e. given any random sample there is a 95% probability that μ lies within the above interval, or we can say with 95% confidence that μ is between the two limits of the interval. Put another way, 95% of samples will have μ in the above interval.
Example: A sample of size 100 is taken from a population with unknown mean μ and variance 9. Determine a 95% confidence interval for μ if the sample mean is 5.
Here X̄ = 5, σ = 3 and n = 100.
P(X̄ − 1.96 σ/√n ≤ μ ≤ X̄ + 1.96 σ/√n) = 95%
P(5 − 1.96 · 3/10 ≤ μ ≤ 5 + 1.96 · 3/10) = 95%, i.e. P(4.412 ≤ μ ≤ 5.588) = 95%.
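The interval arithmetic is easy to script; a sketch that reproduces this example (the function name `ci95` is ours):

```python
from math import sqrt

def ci95(xbar, sigma, n):
    """95% confidence interval for the mean when sigma is known."""
    half = 1.96 * sigma / sqrt(n)
    return xbar - half, xbar + half

lo, hi = ci95(5, 3, 100)   # the example: xbar = 5, sigma = 3, n = 100
print(round(lo, 3), round(hi, 3))   # 4.412 5.588
```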

Example: A sample of 80 workers is taken in a very large company. The average wage of the sample of workers is 25,000 euro. If the standard deviation of the whole company is 1,000 euro, construct a confidence interval for the mean wage in the company at the 95% level.
Here X̄ = 25,000, σ = 1,000 and n = 80.
P(X̄ − 1.96 σ/√n ≤ μ ≤ X̄ + 1.96 σ/√n) = 95%
P(25,000 − 1.96 · 1,000/√80 ≤ μ ≤ 25,000 + 1.96 · 1,000/√80) = 95%, i.e. P(24,781 ≤ μ ≤ 25,219) = 95%.
We can have different confidence intervals. Let α be a small percentage (α = 5% above). Then
P(X̄ − z_{α/2} σ/√n ≤ μ ≤ X̄ + z_{α/2} σ/√n) = 1 − α
gives a (1 − α) confidence interval for μ.

Note:

Example: Find a 99% confidence interval for the mean of a normal population with variance σ² = 4.84, using the sample 28, 24, 31, 27, 22.
(Note that we need normality here since the sample size is < 30.)
X̄ = (28 + 24 + 31 + 27 + 22)/5 = 26.4 and σ = √4.84 = 2.2. For 99%, z_{α/2} = 2.575. Then
P(26.4 − 2.575 · 2.2/√5 ≤ μ ≤ 26.4 + 2.575 · 2.2/√5) = .99, i.e. P(23.87 ≤ μ ≤ 28.93) = .99.

Example: For a normal population with σ² = 9, how large must a sample be if the 95% confidence interval for μ is to have length at most 0.4?
In general the length of the confidence interval is
(X̄ + z_{α/2} σ/√n) − (X̄ − z_{α/2} σ/√n) = 2 z_{α/2} σ/√n.
For a 95% confidence interval z_{α/2} = 1.96, so we need 2 · 1.96 · 3/√n ≤ .4, which gives √n ≥ 2 · 1.96 · 3/.4 = 29.4, or n ≥ 864.36, so n = 865.
In all the previous examples we knew σ², the population variance. If that is not so and n ≥ 30, we can use S² as a point estimate for σ² and assume that (X̄ − μ)/(S/√n) is approximately N(0, 1).

Example: A company wishes to estimate μ, the average life of its watches. In a random sample of 121 watches it is found that X̄ = 14.5 years and S = 2 years. Construct a (i) 95%, (ii) 99% confidence interval for μ.
(i) 14.5 − 1.96 · 2/11 ≤ μ ≤ 14.5 + 1.96 · 2/11, i.e. 14.14 ≤ μ ≤ 14.86.
(ii) 14.5 − 2.575 · 2/11 ≤ μ ≤ 14.5 + 2.575 · 2/11, i.e. 14.03 ≤ μ ≤ 14.97.
Note that the greater the confidence, the greater the interval.
If n is small (< 30) this is not very accurate, even if the original X is normal. In this case we must use the following:
Theorem: If X1, X2, ..., Xn are independent normally distributed random variables, each with mean μ and variance σ², then the random variable (X̄ − μ)/(S/√n) has a t-distribution with n − 1 degrees of freedom.
We denote the number of degrees of freedom n − 1 by ν. For each ν the t-distribution is a symmetric bell-shaped distribution. For ν = ∞ we get the standard normal distribution N(0, 1). The statistical tables usually read P(|T| > k) for each ν.

Example: A normal population has unknown mean and variance. A sample of size 20 is taken. The sample mean is 15.5 and the sample variance is 0.09. Obtain a 99% confidence interval for μ, the population mean.
Since n = 20 < 30 we must use the t-distribution with ν = n − 1 = 19. We have X̄ = 15.5 and S² = 0.09, so S = 0.3. For ν = 19 we have P(|T| > k) = 0.01, giving k = 2.861. Now (X̄ − μ)/(S/√20) = (15.5 − μ)/(0.3/√20) and
P(−2.861 ≤ (15.5 − μ)/(0.3/√20) ≤ 2.861) = 99%
P(15.5 − 2.861 · 0.3/√20 ≤ μ ≤ 15.5 + 2.861 · 0.3/√20) = 99%, i.e. P(15.31 ≤ μ ≤ 15.69) = 99%.

Example: Five independent measurements, in degrees F, of the flashpoint of diesel oil gave the results 144, 147, 146, 142, 144. Assuming normality, determine a (i) 95%, (ii) 99% confidence interval for the mean flashpoint.
Since n < 30 we must apply the t-distribution. n = 5, so ν = 4. We have
X̄ = (144 + 147 + 146 + 142 + 144)/5 = 144.6.
Also S² = ((−0.6)² + (2.4)² + (1.4)² + (−2.6)² + (−0.6)²)/4 = 3.8, so S = 1.949.
(i) P(144.6 − 2.776·(1.949/√5) ≤ μ ≤ 144.6 + 2.776·(1.949/√5)) = 95%.
(ii) P(144.6 − 4.604·(1.949/√5) ≤ μ ≤ 144.6 + 4.604·(1.949/√5)) = 99%, i.e.
P(140.59 ≤ μ ≤ 148.61) = 99%.
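The flashpoint arithmetic can be verified directly from the raw data. A sketch; 4.604 is the tabulated two-sided 99% t value for ν = 4 used above:

```python
import math

data = [144, 147, 146, 142, 144]
n = len(data)
xbar = sum(data) / n                               # sample mean, 144.6
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)  # sample variance, 3.8
s = math.sqrt(s2)                                  # about 1.949

k = 4.604                                          # t value, nu = 4, 99% two-sided
half = k * s / math.sqrt(n)
print(f"{xbar - half:.2f} <= mu <= {xbar + half:.2f}")  # 140.59 <= mu <= 148.61
```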

Hypothesis Testing
Suppose that a claim is made about some parameter of a population, in our case always the population mean μ. This claim is called the null hypothesis and is denoted by H0. Any claim that differs from this is called an alternative hypothesis, denoted by H1. We must test H0 against H1.
Example: H0 : μ = 90. Possible alternatives are
H1 : μ ≠ 90
H1 : μ > 90
H1 : μ < 90
H1 : μ = 95.
We must decide whether to accept or reject H0. If we reject H0 when it is in fact true we commit what is called a type I error, and if we accept H0 when it is in fact false we commit a type II error. The maximum probability with which we would be willing to risk a type I error is called the level of significance of the test, usually 10%, 5% or 1%. We perform a hypothesis test by taking a random sample from the population.

Suppose that we are given H0 : μ = μ0, some fixed value.
(i) H1 : μ ≠ μ0.
We take a random sample and compute X̄. We might have X̄ > μ0, X̄ < μ0 or X̄ = μ0. Now if the mean is μ0, then X̄ is approximately N(μ0, σ²/n), or (X̄ − μ0)/(σ/√n) is approximately N(0, 1). There is a 5% probability that X̄ lies in one of the two tail regions of N(μ0, σ²/n) or, equivalently, that (X̄ − μ0)/(σ/√n) lies in one of the two tail regions of N(0, 1). If our (X̄ − μ0)/(σ/√n) is in this rejection region we reject H0 at the 5% significance level. Otherwise we do not reject H0. This is called a two-tailed test.
(ii) H1 : μ > μ0.
Our X̄ is now > μ0 (this is why we suspect that μ > μ0). We only check for probability on the right-hand side. Again if the mean is μ0, then if our (X̄ − μ0)/(σ/√n) is in this rejection region we reject H0 at the 5% significance level. Otherwise we do not reject H0. This is called a one-tailed test.
(iii) H1 : μ < μ0.
Similarly, we check only the left-hand tail; this is again a one-tailed test.

Example: A battery company claims that its batteries have an average life of 1,000 hours. In a sample of 100 batteries it was found that X̄ = 985 hours and S = 30 hours. Test the hypothesis H0 : μ = 1,000 hours against the alternative hypothesis H1 : μ ≠ 1,000 hours at the 5% significance level, assuming that the lifetime of the batteries is normally distributed.
n = 100 > 30 so we can take S for σ. If μ = 1,000, then (X̄ − μ)/(S/√n) is approximately N(0, 1). We are interested in extreme values of X̄ on both sides of μ = 1,000 so we use a two-tailed test. Values of (X̄ − μ)/(S/√n) between −1.96 and 1.96 are in the non-rejection region. Our value is
(985 − 1,000)/(30/√100) = −5,
which is in the rejection region. So we reject H0 at the 5% significance level. There is a 5% probability of a type I error.
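The test statistic and decision above can be sketched in Python, using the 5% two-tailed critical value 1.96 from the notes:

```python
import math

xbar, mu0, s, n = 985, 1000, 30, 100
z = (xbar - mu0) / (s / math.sqrt(n))   # test statistic
print(z)                                # -5.0

# Two-tailed test at the 5% level: reject when |z| > 1.96
reject = abs(z) > 1.96
print("reject H0" if reject else "do not reject H0")  # reject H0
```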


Example: A researcher claims that 10 year old children watch 6.6 hours of television daily. In a sample of 100 it was found that X̄ = 6.1 hours and S = 2.5 hours. Test the hypothesis H0 : μ = 6.6 hours against the alternative H1 : μ ≠ 6.6 hours at the (i) 5%, (ii) 1% significance levels.
n = 100 > 30, so we can take S for σ. Then if μ = 6.6, (X̄ − μ)/(S/√n) is approximately N(0, 1).
(i) At the 5% level, (X̄ − μ)/(S/√n) is between −1.96 and 1.96 with 95% probability. Our value
(6.1 − 6.6)/(2.5/√100) = −2
is in the rejection region. We reject H0 at the 5% level.
(ii) At the 1% level the non-rejection region is between −2.575 and 2.575, and our value
(6.1 − 6.6)/(2.5/√100) = −2
now is in the non-rejection region. We do not reject H0 at the 1% level.
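In Python the dependence of the decision on the significance level is immediate. A sketch; 1.96 and 2.575 are the two-tailed 5% and 1% critical values used in the notes:

```python
import math

z = (6.1 - 6.6) / (2.5 / math.sqrt(100))   # test statistic, about -2.0

for crit, level in [(1.96, "5%"), (2.575, "1%")]:
    decision = "reject" if abs(z) > crit else "do not reject"
    print(f"at the {level} level: {decision} H0")
# at the 5% level: reject H0
# at the 1% level: do not reject H0
```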


Example: A manufacturer produces bulbs that are supposed to burn with a mean life of at least 3,000 hours. The standard deviation is 500 hours. A sample of 100 bulbs is taken and the sample mean is found to be 2,800 hours. Test the hypothesis H0 : μ ≥ 3,000 hours against the alternative H1 : μ < 3,000 hours at the 5% significance level.
In this case if our X̄ value is greater than 3,000 we do not reject H0, since it agrees with H0, so we are only interested in extreme values on the left. We use a one-tailed test. Again (X̄ − μ)/(σ/√n) is approximately N(0, 1), and our value
(2,800 − 3,000)/(500/√100) = −4
is in the rejection region (at the 5% level the one-tailed rejection region lies below −1.645), so we reject H0 at the 5% significance level.
Sometimes we also need to use the t-distribution.
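The one-tailed version differs from the two-tailed test only in the rejection rule. A sketch; −1.645 is the standard one-sided 5% normal critical value, an assumption not tabulated explicitly in the notes:

```python
import math

xbar, mu0, sigma, n = 2800, 3000, 500, 100
z = (xbar - mu0) / (sigma / math.sqrt(n))   # test statistic, -4.0

# One-tailed test (H1: mu < mu0): reject when z < -1.645
print("reject H0" if z < -1.645 else "do not reject H0")  # reject H0
```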

Example: We need to buy a length of a certain type of wire. The manufacturer claims that the wire has a mean breaking limit of 200 kg or more. We suspect that the mean is less. We have H0 : μ ≥ 200 and H1 : μ < 200. We take a random sample of 25 rolls of wire and find that X̄ = 197 kg and S = 6 kg. Test H0 against H1 at the 5% level, assuming the breaking limit of the wire is normally distributed.
Here n = 25 < 30, so we must use a t-distribution with ν = 24. If the mean is μ, then (X̄ − μ)/(S/√n) has a t-distribution with 24 degrees of freedom. Our value is
(197 − 200)/(6/√25) = −2.5,
which is in the rejection region (for ν = 24 the one-tailed 5% critical value is −1.711), so we reject H0 at the 5% significance level.
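Finally, the wire example as a one-tailed t-test. A sketch; 1.711 is the tabulated one-sided 5% t value for ν = 24, hard-coded here rather than looked up:

```python
import math

xbar, mu0, s, n = 197, 200, 6, 25
t = (xbar - mu0) / (s / math.sqrt(n))   # test statistic, about -2.5

# One-tailed t-test, nu = 24: reject when t < -1.711
print("reject H0" if t < -1.711 else "do not reject H0")  # reject H0
```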