Anda di halaman 1dari 62

Lecture Notes

on
Probability and Statistics
hOg
ain
Joe O
E-mail: johog@maths.tcd.ie
Main Text: Kreyszig; Advanced Engineering Mathematics
Other Texts: Schaum Series, Robert B. Ash, Hayter
Online Notes:

Hamilton.ie EE304, Prof. Friedman Lec-

tures

Probability Function
Definition:

An experiment is an operation with well-defined

outcomes.
Definition:

The sample space S of an experiment is set of

all possible outcomes.


Toss a coin: S = {H, T }

Examples:

Throw a die: S = {1, 2, 3, 4, 5, 6}


Toss a coin twice: S = {HH, HT, T H, T T }.
Toss a coin until H appears and count the number of times it
is tossed: S = {1, 2, 3, ..., }, where means that H never
appears.
Definition:

Any subset of S is called an event.

Let P(S) be the set of all events in S i.e. the collection of all
subsets of S.
Definition:

A probability function on S is a function

P : P(S) [0, 1] such that


(i)

P (S) = 1

(ii) P (A1A2...An...) = P (A1) + P (A2) + ... + P (An) + ...,


2

where A1.A2, ..., An, ... are mutually exclusive i.e.AiAj =


for all i 6= j.
Theorem 1:
Proof:

P () = 0.

For any AS, A = A and A = , so P (A) =

P (A) = P (A) + P (); hence P () = 0.


Theorem 2:
Proof:

P (Ac) = 1 P (A).

AAc = S and AAc = , so 1 = P (S) = P AAc =

P (S) + P (Ac).
Theorem 3:
Proof:

From Theorem we get 2 P (A) = 1 P (Ac) 1.

Theorem 4:
Proof:

P (A) 1, for all AS.

AB P (A) P (B).

B = A(B A) P (B) = P (A) + P (B A)

and P (B A) 0.

Finite Sample Space


An event containing one element is a singleton. If S contains
n elements x1, x2, ..., xn say, and each one has the same probability p of occuring, then 1 = P (S) = P ({x1, x2, ..., xn}) =
P ({x1}{x2}...{xn}}) = P ({x1})+P ({x2})+...+P ({xn}) =
p + p + ... + p = np, so p = n1 . Then, for any AS we have
P (A) =

|A|
|S| ,

where |A| means the number of elements of A.

Conversely, if we define P on S by this formula, then it is easy


to check that P gives a probability function on S.
Example:

A card is selected from a pack of 52 cards. Let

A = { hearts }, B = { face cards }. Then P (A) =


12
52 , P (AB)

3
52

and P (AB) =

13
52

12
52

3
52

13
52 , P (B)

22
52 ,

using

Theorem 6 ( or just count them).


In general singletons need not be equiprobable. Then let P (xi) =
pi for 1 i n. (We write P (xi) for P ({xi}) for convenience). We have P (A) =
n
P

P (xi) and 1 = P (S) =

xi A

P (xi.) We can form a table, called a probability distribu-

i=1

tion table
4

where pi = P (xi) for all 1 i n. Again, going backwards,


the table defines a probability function on S.
Example: Three horses A,B,C race against each other. A is
twice as likely to win as B and B is twice as likely to win as C.
Assuming no deadheats find P (A), P (B), P (C) and P (AC).
Let P (C) = p. Then P (B) = 2p, P (A) = 4p. Hence p + 2p +
4p = 1, so 7p = 1 or p = 71 . Then P (C) = 17 , P (B) =
P (A) = 47 . Also P (BC) = P (B) + P (C) = 73 .

2
7

and

Countably Infinite Sample Space


In this case S = {x1, x2, ..., xn, ...} with P (xi) = pi for all
i 1. Then 1 =

pi .

i=1

Example:

Toss a coin until H appears and count the num-

ber of times it is tossed: S = {1, 2, 3, ..., }, where means


that H never appears. Set P (1) = 21 , P (2) = 14 , ..., P (n) =

1
2n

for all n. (P () = 0.)


Let A = {1, 2, 3}. Then P (A) =

1
2

+ 14 +

1
8

7
8

= the proba-

bility of H in the first 3 throws.


If B = {2, 4, 6, ...}, then the probability of H on an even throw
is P (B) =

1
22

1
24

Note that P (S) =

1
26

+ ... =

1
22

1 12
2

1 1
1
2 + 22 + 24 +...

of H on an odd throw is 23 .

= 13 .
1
2

1 21

= 1 and the probability

Conditional Probability
Let A, E be events in S with P (E) 6= 0.
Definition: The conditional probability of A given E is defined by P (A|E) =
Example:
P (AE)
P (E)

If S is a finite equiprobable space, then P (A|E) =

|AE|
|S
|E|
|S|

Example:

P (AE)
P (E) .

|AE|
|E| .

Pair of dice. S = {(1, 1), (1, 2), ..., (6, 6)}. Find

the probability that one die shows 2 given that the sum is
6. Let A be the probability that one die is 2 and E be the
probability that the sum is 6. Then P (A|E) = 52 .
Note that, in general, P (A E) = P (E)P (A|E).
Example:

Let A, B be events with P (A) = 6, P (B) = 3

and P (A B) = 2. Then
P (A|B) =

2
3 , P (B|A)

2
6 , P (A

B) = 6 + 3 2 =

7, P (Ac) = 1 6 = 4, P (B c) = 1 3 = 7, P (Ac B c) =
3
c c
P ((A B)c) = 1 7 = 3, P (Ac|B c) = 3
7 , P (B |A ) = 4 .

Example:

The probability that a certain flight departs on

time is 8 and the probability that it arrives on time is 9. The


7

probability that it both departs and arrives on time is 78.


Find the probability that
(i)

it arrives on time given that it departed on time,

(ii)

does not arrive on time given that it did not depart on

time.
The sample space may be taken as {(D, A), (D, Ac), (Dc, A), (Dc, Ac)}.
Let B = {(D, A), (D, Ac)} and let C = {(D, A), (Dc, A))}.
Then P (B) = 8, P (C) = 9 and
78
8 ,

(i)

P (C|B) =

(ii)

P (C c|B c) =

P (C c B c )
P (B c )

P (CB)c
P (B c )

1P (CB)
1P (B)

1(8+9+78)
.
18

Suppose that S = A1 A2 ... An, where Ai Aj = for all


i 6= j. We say that the Ai are mutually exclusive and form a
partition of S. Let E S. Then E = E S = E (A1 A2
... An) = (E A1) (E A2) ... (E An), disjoint, So
P (E) = P (EA1)+P (EA2)+...P (EAn) =

n
P

P (EAi).

i=1

Now P (E Ai) = P (Ai)P (E|Ai), for each i, so


P (E) =

n
P

P (Ai)P (E|Ai). This is called the Law of Total

i=1

Probability. We also have P (Aj |E) =


8

P (Aj E)
p(E)

P (EAj )
p(E)

P (Aj )P (E|Aj )
P (E)

P (Aj )P (E|Aj )
n
P

for all j. This is known as Bayes

P (Ai )P (E|Ai )

i=1

Formula or Theorem. We use it if we know all the P (E|Ai).


Example: Three machines X, Y, Z produce items.X produces 50%, 3% of which are defective. Y produces 30%, 4%
of which are defective and Z produces 20%, 5% of which are
defective. Let D be the event that an item is defective. Let
an item be chosen at random.
(i)
(ii)

Find the probability that it is defective.


Given that it is defective, find the probability that it

came from machine X.


Let A1 be the event consisting of elements of X, let A2 be
the event consisting of elements of Y and let A3 be the event
consisting of elements of Z. Then P (A1) = 5, P (A2) = 3
and P (A3) = 2. Also P (D|A1) = 03, P (D|A2) = 04 and
P (D|A3) = 05.
(i)

P (D) = (5)(03) + (3)(04) + (2)(05) = 037 = 3 7%.

(ii)

P (A1|D) =

P (A1 )P (D|A1 )
P (D)

We often use a tree diagram:


9

(5)(03)
037

= 405 = 40 5%.

Example: A hospital has 300 nurses. During the past year


48 of the nurses got a pay rise. At the beginning of the year
the hospital offered a training seminar which was attended by
138 of the nurses. 27 of the nurses who got a pay rise attended
the seminar. What is the probability that a nurse who got a
pay rise attended the seminar?
Let A be the event consisting of nurses who attended the seminar and Let B be the event consisting of nurses who got a
pay rise. Then P (A) =
27
c
138 , P (B |A)

138
300

and P (Ac) =

111
c
138 , P (B|A )

21
162

162
300 .

Also P (B|A) =

and P (B c|Ac) =

141
300 .

138 27
21
Therefore P (B) = ( 300
)( 138 ) + ( 162
300 )( 162 ) =

48
300 ,

obvious from the beginning. Also P (A|B) =

P (A)P (B|A)
P (B)

138 )( 27 )
( 300
138
48
300

27
48 .
10

which is
=

Exercise:

In a certain city 40% vote Conservative, 35%

vote Liberal and 25% vote Independent. During an election


45% of conservative, 40% of Liberal and 60% of Independent
voted. A person is selected at random. Find the probability
that the person voted. If the person voted, find the probability
that the voter is (i) Con. (ii) lib. and (iii) Ind.

11

Independent Events
Definition

Two events A, B S are independent if

P (A B) = P (A)P (B).
If A and B are independent then P (A|B) =

P (AB)
P (B)

P (A)P (B)
P (B)

P (A) i.e. the conditional probability of A given B is the same


as the probability of A. The converse is obviously true.
Note: A, B are mutually exclusive if and only if A B = ,
if and only if P (A B) = 0, hence A, B are not independent
unless either P (A) = 0 or P (B) = 0.
Example:

Pick a card. let A be the event consisting of

hearts and let B be the event consisting of face-cards. Then


P (A) =

13
52 , P (B)

12
52

and P (AB) =

3
52 .

Hence P (AB) =

P (A)P (B) and so A and B are independent events.


Example:

Toss a fair coin three times.

S = {HHH, HHT, HT H, HT T, T HH, T HT, T T H, T T T }


Let A be the event where the first toss is heads, B be the event
where the second toss is heads and let C be the event where
there are exactly two heads in a row. Then P (A) = 84 , P (B) =
12

4
8 , P (C)

= 82 , P (A B) = 28 , P (A C) = 18 , P (B C) = 28 .

Hence A, B are independent, A, C are independent but B, C


are not independent.
Example:

The probability that A hits a target is 41 . The

probability that B hits the target is 25 . Assume that A and B


are independent. What is the probability that either A or B
hits the target?
P (A B) = P (A) + P (B) P (A B) = P (A) + P (B)
P (A)P (B) = 14 + 52 14 25 =
Exercise:

11
20 .

Show that A, B independent A, B c inde-

pendent and hence Ac, B c independent.

13

Product Probability and Independent Trials


Let S = {a1, a2, ..., as} and T = {b1, b2, ..., bt} be the sample
spaces for two experiments. LetPS (ai) = pi and PT (bj ) = qj
for all 1 i s and 1 j t, where PS and PT are the
probability functions on S and T respectively.
Let S T = {(ai, bj )|ai S, bj T }. Define a function P on
P(S T ) by P ({(ai, bj )}) = piqj and addition. Then P is a
probability function on S T :
(i)

piqj 0 for all i, j.

(ii)

P (S T ) = p1q1 +p1q2 +...+p1qt +...+psq1 +psq2 +...+

psqt = p1(q1 + ... + qt) + ... + ps(q1 + ... + qt) = p1 + ... + ps = 1.


(iii)

Obvious by addition.

P above is called the product probability on S T . It is not


the only probability function on S T. We can extend this
definition to the product of any finite number of sample spaces.
Suppose that S T has the product probability P .
Let A = {ai} T and B = S {bj }.
Then P (A) = P ({ai} T ) = PS (ai) PT (T ) = pi 1 = pi
14

and P (B) = P (S {bj }) = PS (S) PT (bj ) = qj . Now


A B = {(ai, bj )} so P (A B) = P {(ai, bj )} = piqj =
P (A)P (B) and hence A and B are independent. Similarly,
any two events of the form C T and S D are independent,
where C S and D T.
Conversely, suppose that P is a probability function on S T
such that P ({ai} T ) = pi and P (S {bj }) = qj for all
ai S and bj T and all sets of this form are independent.
Then P {(ai, bj )} = P (({ai} T ) (S {bj })) =
P ({ai} T ) P (S {bj }) = piqj , so that P must be the
product probability.
We deduce that the product probability is the unique probability on S T with these two independence properties.
Example:

When three horses A, B, C race against each

other their respective probabilities of winning are always 21 , 13


and 16 . Suppose they race twice. Then, assuming independence,
the probability of C winning the first race and A winning the
second race is

1
6

21 =

1
12

etc.
15

Now suppose that we perform the same experiment a number


of times. The sample space S S ... S consists of tuples.
If we assume that the experiments are independent, then the
probability function on this sample space is the product probability. If we do it n times we say that we have n independent
trials.
Example:
P (HT H) =

Toss a coin three times as before. Now e.g.


1
2

12

1
2

1
8

etc. This is the same probability

function as assuming that all triples are equiprobable, as before. Hence we can consider the problem in either of the two
ways.

16

Counting Techniques
Suppose we have n objects. How many permutations of size
1 r < n can be made?

n!
Using the Fundamental Principle of counting gives (nr)!
, which

is written nPr .
How many combinations of size 1 r < n can be made?
Let the answer be nCr . Each of these combinations gives r!
permutations, so nCr r! = nPr . Hence nCr =

nP

r!

n!
(nr)!r! .

If r = 0 or r = n the answer is 1, so if we agree tthat 0! = 1,


we can use the formula for all 0 r n.
A lot of problems in finite probability can be done from first
principles i.e. using boxes.
Example:

The birthday problem: How many people do

we need to ensure that the probability of at least two having


the same birthday is greater than 21 ?
Let the answer be n. Then 1

365 P

n
(365)n

> 12 , so n = 23. For

more on perms. and combs. see Schaum Series.


17

Random Variables
Definition:

Let S be a sample space with probability func-

tion P . A random variable (R.V.) is a function X : S R.


The image or range of X is denoted by RX . If RX is finite
we say that X is a finite or finitely discrete R.V., if RX is
countably infinite we say that X is a countably infinite or
infinitely discrete R.V. and if RX is uncountable we say that
X is an uncountable or continuous R.V.
Example:

(i)

Throw a pair of dice. S = {(1, 1), ..., (6, 6)}.

Let X : S R be the maximum number of each pair and


let Y : S R be the sum of the two numbers. Then RX =
{1, 2, 3, 4, 5, 6} and RY = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}. X
and Y are finite R.V.s
(ii)

Toss a coin until H appears. S = {H, T H, T T H, T T T H, ...}.

Let X : S R be the number of times the coin is tossed


or (the number of T s) +1. Then RX = {1, 2, 3, ...}. X is a
countably infinite R.V.
(iii)

A point is chosen on a disc of radius 1.


18

S = {(x, y)|x2 + y 2 1}. Let X R be the distance of


the point from the centre (0, 0). Then RX = [0, 1] and X is a
continuous R.V.

19

Finite Random Variables


Let X : S RX be finite with RX = {x1, x2, ..., xn} say.
X induces a function f on RX by f (xk ) = P (X = xk ) =
p({s S|X(s) = xk }). f is called a probability distribution
function (p.d.f.). Note that f (xk ) 0 and

n
P

f (xk ) = 1.

k=1

We can extend f to all of R by defining f (x) = 0 for all


x 6= x1, x2, ..., xn. We often write f using a table:

Recall the idea of a discrete frequency distribution:

If we let f (xk ) =

fk
,
n
P
fi

the relative frequency, we get a proba-

i=1

bility distribution. Its graph is a bar-chart:

Note:

We are usually interested in the distribution of a

particular R.V. rather than the underlying sample space e.g.


20

the heights or ages of a given set of people.


Example:

X, Y from example (i) above:

Exercise:

A fair coin is tossed three times. Let X be the

number of heads. RX = {0, 1, 2, 3}. Draw the distribution


table for X.

21

Definition:

Let

be the probability distribution of a R.V. X. The expectation or mean of X is defined by


E(X) = x1f (x1) + x2f (x2) + ... + xnf (xn) =

n
P

xk f (xk ).

k=1

This is the same definition as the mean in the case of a frequency distribution.
Example:

X, Y from (i) above again:

1
3
E(X) = 1 36
+ 2 36
+ ... + 6 11
36 = 4 47,
2
1
1
+ 3 36
+ ... + 12 36
= 7.
E(Y ) = 2 36

Note that E(X) need not belong to RX .


We write X or simply , if there is no confusion, for E(X).
E(X) is the weighted average where the weights are the probabilities. We can apply this definition to games of chance:
a game of chance is an experiment with n outcomes, a1, a2, ..., an
and corresponding probabilities p1, p2, ..., pn. Suppose the payout for each ai is wi. We define a R.V. X by X(ai) = wi.
22

Then the average payout is E(X) =

n
P

wipi. We would play

i=1

the game if E(X) > 0.


Example:

A fair die is thrown. If 2, 3 or 5 occur we win

that number of euros. If 1, 4 or 6 occur we lose that number


of euros. Should we play?
We have a distribution table

Then E(X) = 2 16 +3 61 +5 16 1 61 4 16 6 16 = 16 .
Dont play!
Definition X a finite R.V. The variance of X is defined by
var(X) = E((X)2) = (x1 )2f (x1)+(x2 )2f (x2)+...+
2

n
P

(xk )2f (xk ). The standard deviation


k=1
p
of X is defined to be X = var(X). If X is understood, we
(xn ) f (xn) =

just write .
Note:
=

n
P

var(X) = E((X ) ) = =

n
P

(xk )2f (xk )

k=1

(x2k 2xk + 2)f (xk )

k=1

23

n
P

x2k f (xk )

n
P

xk f (xk ) +

k=1
2

k=1

n
P

f (xk )

k=1

= E(X 2) 22 + = E(X)2 2 = E(X 2) (E(X))2.


Example:
2

E(X ) =

6
P
k=1

X, Y from (i) above again:


3
1
x2k f (xk ) = 12 36
+22 36
+...+62 11
36 = 2197.

Hence var(X) = E(X 2) (E(X))2 = 21 97 20 25 = 1 99.


2

E(Y ) =

11
P
k=1

1
2
1
yk2 f (yk ) = 22 36
+32 36
+...+122 36
= 548.

Hence var(Y ) = E(Y 2) (E(Y ))2 = 54 8 49 = 5 8.


Theorem:

If X is a finite R.V. and a, b R, then

(i)

E(aX) = aE(X),

(ii)

E(X + b) = E(X) + b.

Proof: (i)

E(aX) =

n
P

axk f (xk ) = a

k=1

n
P

xk f (xk ) =

k=1

aE(X).
(ii)

E(X+b) =

n
P

(xk +b)f (xk ) =

k=1

n
P

xk f (xk )+b

k=1

n
P
k=1

E(X) + b.
Hence E(aX + b) = E(aX) + b = aE(X) + b.
Theorem:

If X is a finite R.V. and a, b R, then

(i)

var(aX) = a2var(X),

(ii)

var(X + b) = var(X).
24

f (xk ) =

Proof: (i)

var(aX) = E((aX)2)(E(aX))2 = E(a2X 2)

(aE(X))2 = a2E(X 2)a2(E(X))2 = a2(E(X 2)(E(X))2 =


a2var(X). We could also just use the definition of var(X).
var(X + b) = E((X + b)2) (E(X + b))2

(ii)
=

n
P

k=1
n
P

2b

(xk + b)2f (xk ) (E(X) + b)2

(x2k
k=1
n
P

+ 2bxk + b )f (xk ) (E(X) + b) =

xk f (xk ) + b

k=1

n
P
2

n
P

x2k f (xk ) +

k=1

f (xk ) (E(X))2 2bE(X) b2

k=1

= E(X 2) + 2bE(X) + b2 (E(X))2 2bE(X) b2


= E(X 2) (E(X))2 = var(X).
Hence var(aX + b) = var(aX) = a2var(X) and aX+b =
|a|X .
Definition:

If X is a R.V. with mean and standard

deviation , then the standardized R.V. associated with X is


defined as Z =

X
.

E(Z) = E( X
) =
1
var(X)
2

2
2

E(X)

= 0 and var(Z) =

= 1.

So Z has mean 0 and standard deviation 1.

25

Definition:

Let X be a finite R.V. with p.d.f. f (x). The

function F : R R defined by F (x) = P (X x) =


P

f (xk ) is called the cumulative distribution function (c.d.f.)

xk x

of X.
Example:

Suppose X is defined by

Then F (2) = 14 , F (1) = 83 , F (2) = 78 , F (4) = 1.F (x) is obvious for all other x.

F is a step function. In general F is always an increasing


function since f (xk ) 0 for all xk .
Note:

The c.d.f. is more important for continuous distri-

butions (see later).

26

The Binomial Distribution


One of the most important finite distributions is the binomial
distribution. Suppose we have an experiment with only two
possible outcomes, called success and failure i.e. S = {s, f }.
Let P (s) = p and P (f ) = q. Then p + q = 1. Each such
experiment is called a Bernouilli trial. Suppose we repeat
the experiment n times and assume that the trials are independent. The sample space for the n Bernouilli trials is
S S ...S and a typical elelment in the sample space looks
like (a1, a2, ..., an), where each ai = s or f . Define a R.V. X
on this sample space by X(a1, a2, ..., an) = the number of successes in (a1, a2, ..., an). Then RX = {0, 1, 2, ..., n}. The p.d.f.
of X, f (x), is given by f (0) = q n, f (1) = nC1pq n1, f (2) =
n

C2p2q n2 etc. In general f (k) = nCk pk q nk .

Then f (0)+f (1)+...+f (n) =n C0q n +n C1pq n1 +n C2p2q n2 +


... +n Ck pk q nk + ... +n Cnpn = (p + q)n = 1n = 1, by the Binomial Theorem. X is said to have the binomial distribution,
written as B(n, p).
27

Example:

Afair coin is tossed six times. Success is defined

to be heads. Here n = 6, p = 21 , q = 12 . Find the probability of


getting
(i)

exactly two heads,

(ii)

at least four heads,

(iii)

at least one head.

(i)

f (2) = 6C2( 21 )2( 12 )4 =

(ii)

f (4)+f (5)+f (6) = 6C4( 12 )4( 12 )2+ 6C5( 12 )5( 12 )1+ 6C6( 12 )6 =

15
64 .

22
64 .

(iii)

1 f (0) = 1 ( 12 )6 =

Example:

63
64 .

The probability of hitting a target at any time

is 13 . If we take seven shots, what is the probability of


(i)

exactly three hits?

(ii)

at least one hit?

(i)

f (3) = 7C3( 31 )3( 23 )4 = 0 26.

(ii)

1 ( 32 )7 = 0 94.

Example:

Find the number of dice that must be thrown

such that there is a better than even chance of getting at least


28

one six.
Here p = 16 , q = 65 . Let n be the required number. Then
1

( 56 )n

>

1
2,

so

( 65 )n

<

1
2

or

n ln( 56 )

<

ln( 12 )

i.e. n <

ln( 12 )
.
ln( 56 )

Hence n = 4.
Example:

If 20% of the bolts produced by a machine are

defective, find the probability that out of 10 bolts chosen at


random,
(i)

none

(ii)

one

(iii)

greater than two

bolts will be defective. Here n = 10, p = 2, q = 8.


(i) f (0) = (8)10,
10

C1(2)1(8)9,

(ii)

f (1) =

(iii)

1 (f (0) + f (1) + f (2)) = 1 ((8)10+ 10C1(2)1(8)9+

10

C2(2)2(8)8).

29

Continuous Random Variables


Suppose that X is a R.V. on a sample space S whose range RX
is an interval in R or all of R. Then X is called a continuous
R.V. If there exists a piece-wise continuous function
Rb
f : R R such that P (a X b) = f (x)dx for any
a

a < b, where a, b S, then f is called the probability distribution function (p.d.f.) or density function for X.
R
f must satisfy f (x) 0 for all x and
f (x)dx = 1. Note

that P (X = a) = P (a X a) =
define E(X) =
R

Ra

f (x)dx = 0. We

xf (x)dx and var(X) = E((X )2) =

(x )2f (x)dx, where = E(X). As for the finite case,

we can easily show that var(X) = E(X 2) (E(X))2 =


R 2
x f (x)dx 2. Again 2 = var(X).

The cumulative distribution function (c.d.f.) of X is defined


Rx
by F (x) =
f (t)dt. Then P (a x b) = F (b) F (a).

Note that The Fundamental Theorem of Calculus implies that


0

F (x) = f (x) for all x.


30

Example:

X is a R.V. with p.d.f.

x, 0 x < 2
2
f (x) =

0, x < 0, 2 < x

P (1 X 1 5) =

15
R
1

R2
0

x2
2 dx

4
3

x
2 dx

5
16 .

E(X) =

xf (x)dx =

and var(X) = E(X ) (E(X)) =

( 43 )2 = 29 , so =

2
3 .

If x < 0, then F (x) =

R2

Rx

t
2 dt

If 0 x 2, then F (x) =
If 2 < x, then F (x) =

Rx

x3
2 dx

= 0.
t
2 dt

x
R t
2 dt

Rx

0
2
R
0

t
2 dt

t
2 dt

x2
4.

= 1, as we would

expect!

Recall the idea of a grouped frequency distribution:


Example The heights of 1,000 people.

62 63 means 62 h 6331 etc. The relative frequencies

are

50
1,000

P
= 05 etc.
(rel. freqs.) = 1. The graph is now a

histogram:

The area of each rectangle represents the rel. freq. of that


group. The proportion of people of height less than any number is the sum of the areas up to that number. Joining the
midpoints of the tops of the rectangles gives a bell-shaped
curve. For large populations the (relative) frequency function
approximates to such a curve.
The most important continuous random variable is the normal
distribution whose p.d.f. has a bell shape.
Definition

A R.V. is said to be normally distributed if its

p.d.f has the form f (x) =

1 x 2
1 e 2 ( ) .
2

We say that X is

N (, 2) if this f (x) is its p.d.f. Note that f (x) is symmetric


32

about x = and the bigger the the wider the graph of f (x)
is.

The following theorem shows that this f (x) gives a well-defined


p.d.f.
R

Theorem: (i)
(ii)

x2
2

2
12 ( x
)

dx =

dx =

2.

2.

(iii)

If X is N (, 2), then E(X) = and var(X) = 2.

Proof: See tutorial 3.


2

Suppose X is N (, ). Its c.d.f. is F (x) =


and P (a X b) = F (b) F (a) =

1
2

1
2

Rb

Rx

1 v )2

e 2 (

dv

1 v )2

e 2 (

dv.

These integrals cant be found analytically, so they are tabulated numerically. This would have to be done for all values of
33

and . We do so for the case = 0 and = 1 and then use


the standardized normal R.V. Z =

X
.

For = 0, = 1 we write Z for the R.V. and z for the real


variable. We denote the p.d.f of Z by (z) =
Rz 1 u2
1
c.d.f. by (z) = 2
e 2 du.

Consider F (x) =
u =

v
; dv

v = x, u =

1
2

Rx

1 v )2

e 2 (

1 2
1 e 2 z
2

and its

dv, the c.d.f. of X. Let

= du; as v , u and when

x
.

Therefore F (x) =

1
2

1 2

e 2 u du = ( x
).

a
Hence P (a X b) = F (b) F (a) = ( b
) ( ).

From the tables we have: P ( X + ) =


(1) (1) = P (1 Z 1) = 682,
P ( 2 X + 2) =
(2) (2) = P (2 Z 2) = 954, and
P ( 3 X + 3) =
(3) (3) = P (3 Z 3) = 997, etc.

34

The tables give values of (z) for 0 z 3, in steps of 0 01.


(z) = P ( < Z z) and (0) = P ( < Z 0) = 5.
Then (z) = 1 (z) for z < 0.

Suppose that a < b.


(i)

0 a < b.

P (a Z b) = (b) (a).
(ii)

a < 0 < b.

P (a Z b) = (b) (a)
= (b) (1 (a))
= (b) + (a) 1.
(iii)

a < b < 0.

P (a Z b) = (b) (a)
= 1 (b) (1 (a))
= (a) (b).
Example:

For N (0, 1) find


35

(i) P (Z 2 44)
(ii)

P (Z 1 16)

(iii)

P (Z 1)

(iv)

P (2 Z 10)

(i) P (Z 2 44) = (2 44) = 9927.


P (Z 1 16) = 1 P (Z 1 16) =

(ii)

1 (1 16) = 1 877 = 123.


P (Z 1) = 1 P (Z < 1) = 1 (1) = 1 8413 =

(iii)
1587.
(iv)

P (2 Z 10) = (10) (2) = 1 9772 = 0228.

Example:

For N (0, 1) find c if

(i)

P (Z c) = 10%

(ii)

P (Z c) = 5%

(iii)

P (0 Z c) = 45%

(iv)

P (c Z c) = 99%.

(i)

P (Z c) = 1 P (Z c) = 1 (c), so 1 (c) = 1

or (c) = 9, giving c = 1 282.


(ii)

P (Z c) = (c), so (c) = 05, giving c = 1 645.


36

(iii) P (0 Z c) = (c) 5, so (c) = 95, giving


c = 1 645.
(iv)

P (c Z c) = (c) (c) = 2(c) 1, so

2(c) = 1 99 or (c) = 995, giving c = 2 576.


Example:

For X = N (8, 4) find

(i) P (X 2 44)
(ii)

P (X 1 16)

(iii)

P (X 1)

(iv)

P (2 X 10)

(i)
(ii)

P (X 244) = F (244) = ( 2448


) = (82) = 7939.
2
P (X 1 16) = F (1 16) = ( 1168
)=
2

( 98) = 1635.
(iii)

P (X 1) = 1P (X 1) = 1F (1) = 1( 18
2 ) =

1 (1) = 4602.
(iv)

28
P (2 X 10) = F (10)F (2) = ( 108
2 )( 2 ) =

(4 6) (6) = 2743.
Example:

Assume that the distance an athlete throws a

shotputt is a normal R.V. X with mean 17m and standard


37

deviation 2m. Find


(i)

the probability that the athlete throws a distance greater

than 18 5m,
(ii)

the distance d that the throw will exceed with 90% prob-

ability.
(i)

P (X > 18 5) = 1 P (X 18 5) = 1 F (18 5) =

) = 1 (75) = 2266.
1 ( 18517
2
(ii)

P (X > d) = 95, so 1 F (d) = 95 or F (d) = 05.

Hence ( d17
2 ) = 05, so
Example:

d17
2

= 1 645 giving d = 13 88m.

The average life of a stove is 15 years with stan-

dard deviation 2 5 years. Assuming that the lifetime X of the


stoves is nomally distributed find
(i)

The percentage of stoves that will last only 10 years or

less,
(ii)

The percentage of stoves that will last between 16 and

20 years.
(i)

P (X 10) = F (10) = ( 1015


25 ) = (2) = 1 (2) =

0228 = 2 28%.
38

(ii)

P (16 X 20) = F (20) F (16) = ( 2015


25 )

( 1615
25 ) = (2) (4) = 3218.

39

Jointly Distributed Random Variables


Let X, Y be finite R.V.s on the same sample space S with probability function P. Let the range of X be RX = {x1, x2, ..., xn}
and the range of Y be RY = {y1, y2, ..., ym} respectively.
Consider the pair (X, Y ) defined on S by (X, Y )(s) = (X(s), Y (s)).
Then (X, Y ) is a R.V. on S with range RX RY =
{(x1, y1), ..., (x1, ym), (x2, y1), ..., (x2, ym), ..., (xn, y1), ..., (xn, ym)}.
We sometimes call (X, Y ) a vector R.V.
Let Ai = {s S | X(s) = xi} = {X = xi} and
Bj = {s S | Y (s) = yj } = {Y = yj }. We write
Ai Bj = {X = xi, Y = yj }. Define a function
h : RX RY R by h(xi, yj ) = P (Ai Bj )
= P (X = xi, Y = yj ). Then h(xi, yj ) 0 and

h(xi, yj ) =

i,j

1, since the Ai Bj form a partition of S. h is called the joint


probability distribution function of (X, Y ) associated with the
probability function P and X, Y are said to be jointly distributed. Suppose that f and g are the p.d.f.s of X and Y
respectively. What is the connection between f, g and h?
40

m
m
S = m
j=1 Bj , so Ai = Ai S = Ai (j=1 Bj ) = j=1 (Ai Bj ),

disjoint. Therefore f (xi) = P (Ai) = P (m


j=1 (Ai Bj )) =
m
P

P (Ai Bj ) =

j=1

m
P

h(xi, yj ). Similarly g(yj ) =

j=1

n
P

h(xi, yj ).

i=1

f and g are sometimes called the marginal distributions of h.


We often write the joint distribution in a table:

Example:

Throw a pair of dice. LetX(a, b) = max{a, b}

and Y (a, b) = a + b.

41

Definition:

X and Y are independent if h(xi, yj ) = f (xi)g(yj )

for all i, j.
This means that P (X = xi, Y = yj ) = P (X = xi)P (Y = yj )
or P (Ai Bj ) = P (Ai)P (Bj ) for all i, j.
Note that in the above example X and Y are not independent.
If G : R2 R, then we define a R.V.

Definition:

G(X, Y ) on S by G(X, Y )(s) = G(X(s), Y (s)) = G(xi, yj )


with p.d.f. h.
We now define the expectation and variance of G(X, Y ) as
E(G(X, Y )) =

G(xi, yj )h(xi, yj ) and

i,j

var(G(X, Y )) = E(G(X, Y )2) (E(G(X, Y ))2.


Example:

G(x, y) = x + y. Then (X + Y )(s) = X(s) +

P
Y (s) = xi + yj and E(X + Y ) = (xi + yj )h(xi, yj ), etc.
i,j

Theorem: (i)

For any R.V.s X and Y we have E(X+Y ) =

E(X) + E(Y ).
(ii)

If X and Y are independent, then var(X + Y ) =

var(X) + var(Y ). (This is not true in general).


Proof: (i)

E(X + Y ) =

P
(xi + yj )h(xi, yj )
i,j
42

PP

xih(xi, yj ) +

xi

h(xi, yj )+

PP
j

yj

yj h(xi, yj )

h(xi, yj ) =

xif (xi)+

yj g(yj )

= E(X) + E(Y ).
(ii)

First we show that E(XY ) = E(X)E(Y ).

E(XY ) =

xiyj h(xi, yj ) =

i,j

xiyj f (xi)g(yj )

i,j

P
P
= ( xif (xi))( yj g(yj )) = E(X)E(Y ).
i

Now var(X + Y ) = E((X + Y )2) (E(X + Y ))2


= E(X 2 + 2XY + Y 2) (E(X) + E(Y ))2 = E(X 2) +
2E(X)E(Y ) + E(Y 2) (E(X))2 (E(Y ))2 2E(X)E(Y )
= E(X 2) (E(X))2 + E(Y 2) (E(Y ))2 = var(X) + var(Y ).
Important Example:

Consider the Binomial Distribu-

tion B(n, p). The sample space is S S ... S, n times,


where S = {s, f }. For 1 trial, n = 1 define the R.V. X by
X(s) = 1 and X(f ) = 0. Then E(X) = 1 p + 0 q = p
and var(X) = E(X 2) (E(X))2 = p p2 = p(1 p) = pq.
For n trials define X1(a1, a2, ..., an) = X(a1), ...,
Xn(a1, a2, ..., an) = X(an), so that Xi is 1 if s is in the ith
place and 0 if f is in the ith place for all 1 i n.
43

Then E(Xi) = E(X) = p and var(Xi) = var(X) = pq.


Now Let Y = X1 + X2 + ... + Xn, so that Y gives the total
number of successes in the n trials. Then E(Y ) = E(X1) +
E(X2) + ... + E(Xn) = p + p + ... + p = np and var(Y ) =
var(X1) + var(X2) + ... + var(Xn) = npq.

44

Sampling Theory
Suppose that we have an infinite or very large finite sample
space S. This sample space is often called a population. Getting information about the total population may be difficult,
so we consider much smaller subsets of the population, called
samples. We want to get information about the population
by studying the samples. We consider the samples to be random samples i.e. each element of the population has the same
probability of being in a sample.
Example:

Consider the population of Ireland. Pick a per-

son at random and consider the age of this person. Do this


n times. This gives a random sample of size n of the ages of
people in Ireland.
Mathematically the situation is described in the following way:
Let X be a random variable on a sample space S with probability function P and let f (x) be the probability distribution
function of X. Consider the sample space = S S ...S,
(n times) with the product probability function P i.e.
45

P(A1 A2 ... An) = P (A1)P (A2)...P (An).


For each 1 i n define a random variable Xi on by
Xi(s1, s2, ..., sn) = X(si) where (s1, s2, ..., sn) . Then the
probability distribution function of Xi is also f (x) for each
i. The vector random variable (X1, X2, ..., Xn) defined by
(X1, X2, ..., Xn)(s1, s2, ..., sn) = (X1(s1), X2(s2), ..., Xn(sn)) =
(x1, x2, ..., xn) is a random variable on with joint distribution
P(X1 = x1, X2 = x2, ..., Xn = xn) = f (x1)f (x2)...f (xn).
Choosing a sample is simply applying the vector random variable (X1, X2, ..., Xn) to to get a random sample (x1, x2, ..., xn).
Each Xi has the same mean and variance 2 as X and
they are independent, by definition. They are called independent identically distributed random variables (i.i.d.). Functions of the X1, X2, ..., Xn and numbers associated with them
are called statistics, while functions of the original X and associated numbers are called parameters. Our task is to get
information about the parameters by studying the statistics.
The mean and variance 2 of X are called the population
46

mean and variance. We define two important statistics, the


sample mean and sample variance:
Definition:

Sample mean X =
2

sample variance S =

n
P
(Xi X)2
n1

Then X(s1, s2, ..., sn) =


2

S (s1, s2, ..., sn) =

x1 +x2 +...+xn
n

n
P
(xi x)2
n1

X1 +X2 +...+Xn
,
n

= x = S and

We have

Theorem: (i) The expectation of X̄ is E(X̄) = μ, the population mean.

(ii) The variance of X̄ is σ²_X̄ = σ²/n, the population variance over n.

Proof: (i) E(X̄) = E((X1 + X2 + ... + Xn)/n) = (E(X1) + E(X2) + ... + E(Xn))/n = (μ + μ + ... + μ)/n = μ.

(ii) σ²_X̄ = var(X̄) = var((X1 + X2 + ... + Xn)/n) = var(X1/n) + var(X2/n) + ... + var(Xn/n) = var(X1)/n² + var(X2)/n² + ... + var(Xn)/n² = nσ²/n² = σ²/n.

The reason for the n − 1 instead of n in the definition of S² is given by the following result.

Theorem: E(S²) = σ², the population variance.
Proof: E(S²) = E(Σ (Xi − X̄)²/(n − 1)) = (1/(n − 1)) E(Σ (Xi − X̄)²) = (1/(n − 1)) E(Σ (Xi² − 2XiX̄ + X̄²)) = (1/(n − 1))[Σ E(Xi²) − 2E((Σ Xi)X̄) + nE(X̄²)] = (1/(n − 1))[Σ E(Xi²) − 2E((nX̄)X̄) + nE(X̄²)] = (1/(n − 1))[n(μ² + σ²) − 2nE(X̄²) + nE(X̄²)] = (1/(n − 1))[n(μ² + σ²) − nE(X̄²)] = (1/(n − 1))[n(μ² + σ²) − n(σ²/n + μ²)] = (1/(n − 1))[(n − 1)σ²] = σ²,

using E(Xi²) = σ² + μ² and E(X̄²) = var(X̄) + E(X̄)² = σ²/n + μ².

Note:
If the mean or expectation of a statistic is equal to the corresponding parameter, the statistic is called an unbiased estimator of the parameter. Hence X̄ and S² are unbiased estimators of μ and σ² respectively. An estimate of a population parameter given by a single number is called a point estimate, e.g. if we take a sample of size n and calculate x̄ = (x1 + x2 + ... + xn)/n and s² = Σ (xi − x̄)²/(n − 1), then these are unbiased point estimates of μ and σ² respectively. We shall, however, concentrate on interval estimates, where the parameter lies within some interval, called a confidence interval.
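These point estimates are easy to compute directly; a quick Python sketch (the function names are my own, and the data is the sample 28, 24, 31, 27, 22 used in a later example):

```python
def sample_mean(xs):
    # x-bar = (x1 + x2 + ... + xn) / n
    return sum(xs) / len(xs)

def sample_variance(xs):
    # s^2 = sum of (xi - x-bar)^2 over (n - 1); the n - 1 makes it unbiased
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

data = [28, 24, 31, 27, 22]
print(sample_mean(data))      # 26.4
print(sample_variance(data))  # ≈ 12.3
```

Note that dividing by n instead of n − 1 here would give 9.84, a biased (too small) estimate of σ².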

Confidence Intervals for μ

Suppose we have n i.i.d. random variables X1, X2, ..., Xn with E(Xi) = μ and var(Xi) = σ² for each 1 ≤ i ≤ n. Then if X̄ = (X1 + X2 + ... + Xn)/n, we get E(X̄) = μ and var(X̄) = σ²/n. As before, X1, X2, ..., Xn are jointly distributed random variables defined on the product sample space. We have the very important result:
Central Limit Theorem: For large n (≥ 30) the probability distribution of X̄ is approximately normal with mean μ and variance σ²/n, i.e. N(μ, σ²/n); in other words, (X̄ − μ)/(σ/√n) is approximately N(0, 1). The larger the n, the better the approximation.

Note: X1, X2, ..., Xn (or X) need not be normal. If, however, they are normal, then (X̄ − μ)/(σ/√n) is exactly N(0, 1) for all values of n.
Recall N (0, 1).


If we want P(−z1 ≤ Z ≤ z1) = 95%, then Φ(z1) = 97.5%, so z1 = 1.96. We say that −1.96 ≤ Z ≤ 1.96 is a 95% confidence interval for N(0, 1).

Hence P(−1.96 ≤ (X̄ − μ)/(σ/√n) ≤ 1.96) = 95%,
so P(−1.96 σ/√n ≤ X̄ − μ ≤ 1.96 σ/√n) = 95%,
so P(−1.96 σ/√n − X̄ ≤ −μ ≤ 1.96 σ/√n − X̄) = 95%,
so P(−1.96 σ/√n + X̄ ≤ μ ≤ 1.96 σ/√n + X̄) = 95%,
i.e. P(X̄ − 1.96 σ/√n ≤ μ ≤ X̄ + 1.96 σ/√n) = 95%.

If we know σ, this gives us a 95% confidence interval for μ, i.e. given any random sample there is a 95% probability that μ lies within the above interval, or we can say with 95% confidence that μ is between the two limits of the interval. Put another way, 95% of samples will have μ in the above interval.
Example: A sample of size 100 is taken from a population with unknown mean μ and variance 9. Determine a 95% confidence interval for μ if the sample mean is 5.

Here X̄ = 5, σ = 3 and n = 100.
P(X̄ − 1.96 σ/√n ≤ μ ≤ X̄ + 1.96 σ/√n) = 95%,
so P(5 − 1.96 · 3/10 ≤ μ ≤ 5 + 1.96 · 3/10) = 95%,
so P(4.412 ≤ μ ≤ 5.588) = 95%.
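This calculation is mechanical enough to automate; a minimal Python sketch (the function name z_interval is my own):

```python
import math

def z_interval(xbar, sigma, n, z=1.96):
    # X-bar ± z * sigma / sqrt(n); z = 1.96 gives a 95% interval
    half = z * sigma / math.sqrt(n)
    return (xbar - half, xbar + half)

lo, hi = z_interval(xbar=5, sigma=3, n=100)
print(round(lo, 3), round(hi, 3))  # 4.412 5.588
```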


Example: A sample of size 80 is taken from the workers in a very large company. The average wage of the sample of workers is 25,000 euro. If the standard deviation of the whole company is 1,000 euro, construct a confidence interval for the mean wage in the company at the 95% level.

Here X̄ = 25,000, σ = 1,000 and n = 80.
P(X̄ − 1.96 σ/√n ≤ μ ≤ X̄ + 1.96 σ/√n) = 95%,
so P(25,000 − 1.96 · 1,000/√80 ≤ μ ≤ 25,000 + 1.96 · 1,000/√80) = 95%,
so P(24,781 ≤ μ ≤ 25,219) = 95%.

We can have different confidence intervals. Let α be a small percentage (α = 5% above). Then
P(X̄ − z_{α/2} σ/√n ≤ μ ≤ X̄ + z_{α/2} σ/√n) = 1 − α
gives a (1 − α) confidence interval, where Φ(z_{α/2}) = 1 − α/2.

Note: For a 95% interval we have Φ(z_{α/2}) = .975, so z_{α/2} = 1.96, and for a 99% interval Φ(z_{α/2}) = .995, so z_{α/2} = 2.575.
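Instead of reading z_{α/2} from tables, it can be computed from the inverse of Φ, which Python's standard library provides as statistics.NormalDist().inv_cdf:

```python
from statistics import NormalDist

def z_crit(alpha):
    # Solve Phi(z) = 1 - alpha/2 for z
    return NormalDist().inv_cdf(1 - alpha / 2)

print(round(z_crit(0.05), 3))  # 1.96
print(round(z_crit(0.01), 3))  # 2.576
```

The tables' 2.575 is this 2.5758... rounded; either figure is fine to three significant figures.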


Example: Determine a 99% confidence interval for the mean μ of a normal population if the population variance is σ² = 4.84, using the sample 28, 24, 31, 27, 22.

(Note that we need normality here since the sample size is < 30.)

X̄ = (28 + 24 + 31 + 27 + 22)/5 = 26.4 and σ = √4.84 = 2.2. Then
P(26.4 − 2.575 · 2.2/√5 ≤ μ ≤ 26.4 + 2.575 · 2.2/√5) = .99,
so P(23.867 ≤ μ ≤ 28.933) = .99.


Example: If we have a normally distributed population with σ² = 9, how large must a sample be if the 95% confidence interval is to have length at most 0.4?

In general the length of the confidence interval is
(X̄ + z_{α/2} σ/√n) − (X̄ − z_{α/2} σ/√n) = 2 z_{α/2} σ/√n.
For the 95% confidence interval z_{α/2} = 1.96, so 2 · 1.96 · 3/√n ≤ .4, which gives √n ≥ (2 · 1.96 · 3)/.4 = 29.4, or n ≥ 865.
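The rearrangement for n can be sketched as follows (the helper name is my own):

```python
import math

def min_sample_size(sigma, length, z=1.96):
    # Need 2 * z * sigma / sqrt(n) <= length, i.e. n >= (2*z*sigma/length)^2,
    # then round up to the next whole number of observations
    return math.ceil((2 * z * sigma / length) ** 2)

print(min_sample_size(sigma=3, length=0.4))  # 865
```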

In all the previous examples we knew σ², the population variance. If that is not so and n ≥ 30, we can use S² as a point estimate for σ² and assume that (X̄ − μ)/(S/√n) is approximately N(0, 1).
Example: A watch-making company wants to investigate the average life of its watches. In a random sample of 121 watches it is found that X̄ = 14.5 years and S = 2 years. Construct a (i) 95%, (ii) 99% confidence interval for μ.

(i) 14.5 − 1.96 · 2/11 ≤ μ ≤ 14.5 + 1.96 · 2/11, i.e. 14.14 ≤ μ ≤ 14.86.

(ii) 14.5 − 2.575 · 2/11 ≤ μ ≤ 14.5 + 2.575 · 2/11, i.e. 14.03 ≤ μ ≤ 14.97.
Note that the greater the confidence, the greater the interval.

If n is small (< 30) this is not very accurate, even if the original X is normal. In this case we must use the following:

Theorem: If X1, X2, ..., Xn are independent normally distributed random variables, each with mean μ and variance σ², then the random variable (X̄ − μ)/(S/√n) has a t-distribution with n − 1 degrees of freedom.
We denote the number of degrees of freedom n − 1 by ν. For each ν the t-distribution is a symmetric bell-shaped distribution. For ν = ∞ we get the standard normal distribution N(0, 1). The statistical tables usually read P(|T| > k) for each ν.

Example: ν = 5, P(|T| > k) = 0.01. Then k = 4.032, so P(−4.032 ≤ T ≤ 4.032) = 99%.

Example: A certain population is normal with unknown mean and variance. A sample of size 20 is taken. The sample mean is 15.5 and the sample variance is 0.09. Obtain a 99% confidence interval for μ, the population mean.

Since n = 20 < 30 we must use the t-distribution with ν = n − 1 = 19. We have X̄ = 15.5 and S² = 0.09, so S = 0.3. For ν = 19 we have P(|T| > k) = 0.01, giving k = 2.861.

Now (X̄ − μ)/(S/√20) has a t-distribution with 19 degrees of freedom, so
P(−2.861 ≤ (15.5 − μ)/(0.3/√20) ≤ 2.861) = 99%,
so P(15.5 − 2.861 · 0.3/√20 ≤ μ ≤ 15.5 + 2.861 · 0.3/√20) = 99%,
so P(15.308 ≤ μ ≤ 15.692) = 99%.
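A sketch of the same computation; note that the critical value k = 2.861 is read from the t-table (ν = 19, two tails, 1%) rather than computed, since the Python standard library has no inverse t-distribution:

```python
import math

def t_interval(xbar, s, n, k):
    # X-bar ± k * S / sqrt(n), with k from the t-table for nu = n - 1
    half = k * s / math.sqrt(n)
    return (xbar - half, xbar + half)

lo, hi = t_interval(xbar=15.5, s=0.3, n=20, k=2.861)
print(round(lo, 3), round(hi, 3))  # 15.308 15.692
```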


Example: Five independent measurements, in degrees F, of the flashpoint of diesel oil gave the results 144, 147, 146, 142, 144. Assuming normality, determine a (i) 95%, (ii) 99% confidence interval for the mean flashpoint.

Since n < 30 we must apply the t-distribution. n = 5, so ν = 4. We have X̄ = (144 + 147 + 146 + 142 + 144)/5 = 144.6. Also
S² = ((−.6)² + (2.4)² + (1.4)² + (−2.6)² + (−.6)²)/4 = 3.8, so S = 1.949.

(i) P(144.6 − 2.776 · 1.949/√5 ≤ μ ≤ 144.6 + 2.776 · 1.949/√5) = 95%,
so P(142.18 ≤ μ ≤ 147.02) = 95%.

(ii) P(144.6 − 4.604 · 1.949/√5 ≤ μ ≤ 144.6 + 4.604 · 1.949/√5) = 99%,
so P(140.59 ≤ μ ≤ 148.61) = 99%.
Hypothesis Testing

Suppose that a claim is made about some parameter of a population, in our case always the population mean μ. This claim is called the null hypothesis and is denoted by H0. Any claim that differs from this is called an alternative hypothesis, denoted by H1. We must test H0 against H1.

Example: H0 : μ = 90. Possible alternatives are
H1 : μ ≠ 90
H1 : μ > 90
H1 : μ < 90
H1 : μ = 95.

We must decide whether to accept or reject H0. If we reject H0 when it is in fact true we commit what is called a type I error, and if we accept H0 when it is in fact false we commit a type II error. The maximum probability with which we would be willing to risk a type I error is called the level of significance of the test, usually 10%, 5% or 1%. We perform a hypothesis test by taking a random sample from the population.


Suppose that we are given H0 : μ = μ0, some fixed value.

(i) We suspect that μ ≠ μ0. This is our H1.

We take a random sample and compute X̄. We might have X̄ > μ0, X̄ < μ0 or X̄ = μ0. Now if the mean is μ0, then X̄ is approximately N(μ0, σ²/n), or (X̄ − μ0)/(σ/√n) is approximately N(0, 1). There is a 5% probability that X̄ is in either of the end regions of N(μ0, σ²/n) or, equivalently, that (X̄ − μ0)/(σ/√n) is in either of the end regions of N(0, 1). If our (X̄ − μ0)/(σ/√n) is in this rejection region we reject H0 at the 5% significance level. Otherwise we do not reject H0. This is called a two-tailed test.
(ii) We suspect that μ > μ0. This is our H1.

Our X̄ is now > μ0 (this is why we suspect that μ > μ0). We only check for probability on the right-hand side. Again, if the mean is μ0, then if our (X̄ − μ0)/(σ/√n) is in this rejection region we reject H0 at the 5% significance level. Otherwise we do not reject H0. This is called a one-tailed test.

Note that a bigger μ0 may push X̄ into the non-rejection region.

(iii) We suspect that μ < μ0. This is our H1. This is the same as (ii) but on the left.

Example: A battery company claims that its batteries have an average life of 1,000 hours. In a sample of 100 batteries it was found that X̄ = 985 hours and S = 30 hours. Test the hypothesis H0 : μ = 1,000 hours against the alternative hypothesis H1 : μ ≠ 1,000 hours at the 5% significance level, assuming that the lifetime of the batteries is normally distributed.

n = 100 > 30, so we can take S for σ. If μ = 1,000, then (X̄ − μ)/(S/√n) is approximately N(0, 1). We are interested in extreme values of X̄ on both sides of μ = 1,000, so we use a two-tailed test. Values of (X̄ − μ)/(S/√n) will be between −1.96 and 1.96 95% of the time. For our sample
(X̄ − μ)/(S/√n) = (985 − 1,000)/(30/√100) = −5,
which is (deep) in the rejection region. So we reject H0 at the 5% significance level. There is a 5% probability of a type I error.
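The test statistic is the same quantity used for confidence intervals; a quick sketch (the function name is my own):

```python
import math

def z_statistic(xbar, mu0, s, n):
    # (X-bar - mu0) / (S / sqrt(n))
    return (xbar - mu0) / (s / math.sqrt(n))

z = z_statistic(xbar=985, mu0=1000, s=30, n=100)
print(z)              # -5.0
print(abs(z) > 1.96)  # True: reject H0 at the 5% level (two-tailed)
```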


Example: A researcher claims that 10 year old children watch 6.6 hours of television daily. In a sample of 100 it was found that X̄ = 6.1 hours and S = 2.5 hours. Test the hypothesis H0 : μ = 6.6 hours against the alternative H1 : μ ≠ 6.6 hours at the (i) 5%, (ii) 1% significance levels.

n = 100 > 30, so we can take S for σ. Then if μ = 6.6, (X̄ − μ)/(S/√n) is approximately N(0, 1).

(i) (X̄ − μ)/(S/√n) is between −1.96 and 1.96 with probability 95%. But (X̄ − μ)/(S/√n) = (6.1 − 6.6)/(2.5/10) = −2 is in the rejection region. We reject H0 at the 5% level.

(ii) (X̄ − μ)/(S/√n) is between −2.575 and 2.575 with probability 99%. But, as above, (X̄ − μ)/(S/√n) = (6.1 − 6.6)/(2.5/10) = −2, which now is in the non-rejection region. We do not reject H0 at the 1% level.

Example: A manufacturer produces bulbs that are supposed to burn with a mean life of at least 3,000 hours. The standard deviation is 500 hours. A sample of 100 bulbs is taken and the sample mean is found to be 2,800 hours. Test the hypothesis H0 : μ ≥ 3,000 hours against the alternative H1 : μ < 3,000 hours at the 5% significance level.

In this case if our X̄ value is greater than 3,000 we do not reject H0, since it agrees with H0, so we are only interested in extreme values on the left. We use a one-tailed test. Again (X̄ − μ0)/(σ/√n) is approximately N(0, 1) and is ≥ −1.645 with a probability of 95%. But X̄ = 2,800, n = 100 and σ = 500, so
(X̄ − μ0)/(σ/√n) = (2,800 − 3,000)/(500/10) = −4.
Hence we reject H0 at the 5% significance level.

We also need to use the t-distribution.

Example: We need to buy a length of a certain type of wire. The manufacturer claims that the wire has a mean breaking limit of 200 kg or more. We suspect that the mean is less. We have H0 : μ ≥ 200 and H1 : μ < 200. We take a random sample of 25 rolls of wire and find that X̄ = 197 kg and S = 6 kg. Test H0 against H1 at the 5% level, assuming the breaking limit of the wire is normally distributed.

Here n = 25 < 30, so we must use a t-distribution with ν = 24. If the mean is μ, then (X̄ − μ)/(S/√n) has a t-distribution with 24 degrees of freedom. P(|T| > 1.711) = 10%, so P(T < −1.711) = 5%, and
(X̄ − μ)/(S/√n) = (197 − 200)/(6/√25) = −2.5,
which is in the rejection region. We reject H0 at the 5% level.
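The wire example as a sketch; the critical value 1.711 (ν = 24, 5% in one tail) is again read from the t-table:

```python
import math

def t_statistic(xbar, mu0, s, n):
    # (X-bar - mu0) / (S / sqrt(n)), compared against the t-table for nu = n - 1
    return (xbar - mu0) / (s / math.sqrt(n))

t = t_statistic(xbar=197, mu0=200, s=6, n=25)
print(round(t, 3))   # -2.5
print(t < -1.711)    # True: reject H0 at the 5% level (one-tailed)
```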