
Lectures on Probability and Statistics

Joe Ó hÓgáin

E-mail: johog@maths.tcd.ie

Main Text: Kreyszig, Advanced Engineering Mathematics

Other Texts: Schaum Series; Robert B. Ash; Hayter

Online Notes:

Probability Function

Definition: The set S of all possible outcomes of an experiment is called the sample space; its elements are called outcomes.

Definition: A subset of S is called an event.

Examples:

Toss a coin: S = {H, T}.

Toss a coin twice: S = {HH, HT, TH, TT}.

Toss a coin until H appears and count the number of times it is tossed: S = {1, 2, 3, ..., ∞}, where ∞ means that H never appears.

Definition: Let P(S) be the set of all events in S, i.e. the collection of all subsets of S.

Definition: A probability function on S is a function P : P(S) → R such that

(i) P(A) ≥ 0 for every event A,

(ii) P(S) = 1,

(iii) P(A1 ∪ A2 ∪ ...) = P(A1) + P(A2) + ... whenever Ai ∩ Aj = ∅ for all i ≠ j.

Theorem 1: P(∅) = 0.

Proof: S = S ∪ ∅, disjoint, so 1 = P(S) = P(S) + P(∅), giving P(∅) = 0.

Theorem 2: P(A^c) = 1 − P(A).

Proof: S = A ∪ A^c, disjoint, so 1 = P(S) = P(A) + P(A^c).

Theorem 3: P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

Proof: A ∪ B = A ∪ (B − A) and B = (A ∩ B) ∪ (B − A), both disjoint unions, so P(A ∪ B) = P(A) + P(B − A) and P(B − A) = P(B) − P(A ∩ B).

Theorem 4: A ⊆ B implies P(A) ≤ P(B).

Proof: B = A ∪ (B − A), disjoint, so P(B) = P(A) + P(B − A), and P(B − A) ≥ 0.

An event containing one element is a singleton. If S contains n elements x1, x2, ..., xn, say, and each one has the same probability p of occurring, then 1 = P(S) = P({x1, x2, ..., xn}) = P({x1} ∪ {x2} ∪ ... ∪ {xn}) = P({x1}) + P({x2}) + ... + P({xn}) = p + p + ... + p = np, so p = 1/n. Then, for any A ⊆ S, we have P(A) = |A|/|S|, and it is easy to check that P gives a probability function on S.

Example: Draw a card from a pack of 52. Let A be the event that the card is a heart and B the event that it is a picture card. Then P(A) = 13/52, P(B) = 12/52, P(A ∩ B) = 3/52 and P(A ∪ B) = 13/52 + 12/52 − 3/52 = 22/52, using Theorem 3.

In general singletons need not be equiprobable. Then let P(xi) = pi for 1 ≤ i ≤ n. (We write P(xi) for P({xi}) for convenience.) We have P(A) = Σ_{xi ∈ A} pi for any A ⊆ S. Conversely, given any numbers pi ≥ 0 with p1 + p2 + ... + pn = 1, the distribution table

x: x1 x2 ... xn
P(x): p1 p2 ... pn

defines a probability function on S.

Example: Three horses A, B, C race against each other. A is twice as likely to win as B and B is twice as likely to win as C. Assuming no dead heats, find P(A), P(B), P(C) and P(B ∪ C).

Let P(C) = p. Then P(B) = 2p and P(A) = 4p. Hence p + 2p + 4p = 1, so 7p = 1 or p = 1/7. Then P(C) = 1/7, P(B) = 2/7 and P(A) = 4/7. Also P(B ∪ C) = P(B) + P(C) = 3/7.
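The ratio argument above is easy to check mechanically; a minimal Python sketch using exact fractions (the dictionary layout is just for illustration):

```python
from fractions import Fraction

# A is twice as likely to win as B, B twice as likely as C:
# with P(C) = p we get P(B) = 2p, P(A) = 4p and p + 2p + 4p = 1.
p = Fraction(1, 7)
P = {"A": 4 * p, "B": 2 * p, "C": p}

assert sum(P.values()) == 1      # a valid probability function
P_B_or_C = P["B"] + P["C"]       # B and C are disjoint, so probabilities add
```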

In this case S = {x1, x2, ..., xn, ...} with P(xi) = pi for all i ≥ 1. Then 1 = Σ_{i=1}^∞ pi.

Example: Toss a coin until H appears, as before, with S = {1, 2, 3, ..., ∞}, where ∞ means that H never appears. Set P(1) = 1/2, P(2) = 1/4, ..., P(n) = 1/2^n, ... Then Σ_{n=1}^∞ 1/2^n = 1, so P(∞) = 0.

Let A = {1, 2, 3}. Then P(A) = 1/2 + 1/4 + 1/8 = 7/8 = the probability that H appears within the first three tosses.

If B = {2, 4, 6, ...}, then the probability of H first appearing on an even toss is P(B) = 1/2^2 + 1/2^4 + 1/2^6 + ... = (1/2^2)/(1 − 1/2^2) = 1/3, summing the geometric series. Similarly the probability of H first appearing on an odd toss is 1/2 + 1/2^3 + 1/2^5 + ... = (1/2)/(1 − 1/2^2) = 2/3.
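The geometric sums above can be verified numerically; a small sketch, with a partial sum standing in for the infinite series:

```python
from fractions import Fraction

# P(first head on toss n) = 1/2**n.
# Head first appears on an even toss: 1/4 + 1/16 + 1/64 + ... = (1/4)/(1 - 1/4).
a = r = Fraction(1, 4)
P_even = a / (1 - r)             # 1/3
P_odd = 1 - P_even               # 2/3, since P(H never appears) = 0

# Partial sum of the even-toss series as a sanity check.
approx = sum(Fraction(1, 2**n) for n in range(2, 60, 2))
assert abs(float(approx) - float(P_even)) < 1e-15
```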

Conditional Probability

Let A, E be events in S with P(E) ≠ 0.

Definition: The conditional probability of A given E is defined by P(A|E) = P(A ∩ E)/P(E).

Example: If all outcomes in S are equiprobable, then P(A|E) = (|A ∩ E|/|S|)/(|E|/|S|) = |A ∩ E|/|E|.

Example: Throw a pair of dice; S = {(1, 1), (1, 2), ..., (6, 6)}. Find the probability that one die shows 2, given that the sum is 6. Let A be the event that one die shows 2 and E the event that the sum is 6. Then E = {(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)} and A ∩ E = {(2, 4), (4, 2)}, so P(A|E) = 2/5.

Note that, in general, P(A ∩ E) = P(E)P(A|E).

Example: Suppose P(A) = 0.6, P(B) = 0.3 and P(A ∩ B) = 0.2. Then

P(A|B) = 2/3, P(B|A) = 2/6 = 1/3, P(A ∪ B) = 0.6 + 0.3 − 0.2 = 0.7, P(A^c) = 1 − 0.6 = 0.4, P(B^c) = 1 − 0.3 = 0.7, P(A^c ∩ B^c) = P((A ∪ B)^c) = 1 − 0.7 = 0.3, P(A^c|B^c) = 3/7 and P(B^c|A^c) = 3/4.

Example: Find the probability that

(i) ...

(ii) ...

The sample space may be taken as {(D, A), (D, A^c), (D^c, A), (D^c, A^c)}. Let B = {(D, A), (D, A^c)} and let C = {(D, A), (D^c, A)}. Then P(B) = 0.8, P(C) = 0.9 and P(C ∩ B) = 0.78, so

(i) P(C|B) = P(C ∩ B)/P(B) = 0.78/0.8,

(ii) P(C^c|B^c) = P(C^c ∩ B^c)/P(B^c) = P((C ∪ B)^c)/P(B^c) = (1 − P(C ∪ B))/(1 − P(B)) = (1 − (0.8 + 0.9 − 0.78))/(1 − 0.8).

Let A1, A2, ..., An be events with A1 ∪ A2 ∪ ... ∪ An = S and Ai ∩ Aj = ∅ for i ≠ j. We say that the Ai are mutually exclusive and form a partition of S. Let E ⊆ S. Then E = E ∩ S = E ∩ (A1 ∪ A2 ∪ ... ∪ An) = (E ∩ A1) ∪ (E ∩ A2) ∪ ... ∪ (E ∩ An), disjoint, so

P(E) = P(E ∩ A1) + P(E ∩ A2) + ... + P(E ∩ An) = Σ_{i=1}^n P(E ∩ Ai).

Since P(E ∩ Ai) = P(Ai)P(E|Ai), this gives the law of total probability:

P(E) = Σ_{i=1}^n P(Ai)P(E|Ai).

Bayes' Theorem: for each j,

P(Aj|E) = P(Aj ∩ E)/P(E) = P(Aj)P(E|Aj)/P(E) = P(Aj)P(E|Aj) / Σ_{i=1}^n P(Ai)P(E|Ai).

Example: Three machines X, Y, Z produce items. X produces 50% of them, 3% of which are defective; Y produces 30%, 4% of which are defective; and Z produces 20%, 5% of which are defective. Let D be the event that an item is defective, and let an item be chosen at random.

(i) Find the probability that it is defective.

(ii) Given that it is defective, find the probability that it was produced by X.

Let A1 be the event consisting of items produced by X, A2 the event consisting of items produced by Y and A3 the event consisting of items produced by Z. Then P(A1) = 0.5, P(A2) = 0.3 and P(A3) = 0.2. Also P(D|A1) = 0.03, P(D|A2) = 0.04 and P(D|A3) = 0.05.

(i) P(D) = P(A1)P(D|A1) + P(A2)P(D|A2) + P(A3)P(D|A3) = (0.5)(0.03) + (0.3)(0.04) + (0.2)(0.05) = 0.037.

(ii) P(A1|D) = P(A1)P(D|A1)/P(D) = (0.5)(0.03)/0.037 = 0.405 = 40.5%.
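The two steps above (total probability, then Bayes' theorem) can be written out directly; a minimal sketch of this example:

```python
# Production shares and defect rates from the example.
share = {"X": 0.50, "Y": 0.30, "Z": 0.20}
defect = {"X": 0.03, "Y": 0.04, "Z": 0.05}

# (i) Law of total probability: P(D) = sum of P(Ai) * P(D|Ai).
P_D = sum(share[m] * defect[m] for m in share)

# (ii) Bayes' theorem: P(A1|D) = P(A1) * P(D|A1) / P(D).
P_X_given_D = share["X"] * defect["X"] / P_D
```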

Example: In a hospital with 300 nurses, 48 of the nurses got a pay rise. At the beginning of the year the hospital offered a training seminar which was attended by 138 of the nurses. 27 of the nurses who got a pay rise attended the seminar. What is the probability that a nurse who got a pay rise attended the seminar?

Let A be the event consisting of nurses who attended the seminar and let B be the event consisting of nurses who got a pay rise. Then P(A) = 138/300 and P(A^c) = 162/300. Also P(B|A) = 27/138, P(B|A^c) = 21/162 and P(B^c|A^c) = 141/162.

Therefore P(B) = (138/300)(27/138) + (162/300)(21/162) = 48/300, and the required probability is

P(A|B) = P(A)P(B|A)/P(B) = (138/300)(27/138)/(48/300) = 27/48.

Exercise: In an election, 45% of Conservative, 40% of Liberal and 60% of Independent supporters voted. A person is selected at random. Find the probability that the person voted. If the person voted, find the probability that the voter is (i) Conservative, (ii) Liberal, (iii) Independent.

Independent Events

Definition: Events A and B are independent if P(A ∩ B) = P(A)P(B).

If A and B are independent then P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A), i.e. knowing B does not change the probability of A. The converse is obviously true.

Note: A, B are mutually exclusive if and only if A ∩ B = ∅, in which case P(A ∩ B) = 0; hence mutually exclusive events A, B are not independent unless either P(A) = 0 or P(B) = 0.

Example: Draw a card from a pack of 52, with A and B as before: P(A) = 13/52, P(B) = 12/52 and P(A ∩ B) = 3/52. Hence P(A ∩ B) = (13/52)(12/52) = P(A)P(B), so A and B are independent.

Example: Toss a coin three times, so that |S| = 8. Let A be the event where the first toss is heads, B the event where the second toss is heads and C the event where there are exactly two heads in a row. Then P(A) = 4/8, P(B) = 4/8, P(C) = 2/8, P(A ∩ B) = 2/8, P(A ∩ C) = 1/8 and P(B ∩ C) = 2/8. Hence A and B are independent and A and C are independent, but B and C are not independent.

Example: A hits a target with probability 1/4 and B hits it with probability 2/5, and the two shots are independent. What is the probability that either A or B hits the target?

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = P(A) + P(B) − P(A)P(B) = 1/4 + 2/5 − (1/4)(2/5) = 11/20.

Exercise:

Let S = {a1, a2, ..., as} and T = {b1, b2, ..., bt} be the sample spaces for two experiments. Let PS(ai) = pi and PT(bj) = qj for all 1 ≤ i ≤ s and 1 ≤ j ≤ t, where PS and PT are the probability functions on S and T respectively.

Let S × T = {(ai, bj) | ai ∈ S, bj ∈ T}. Define a function P on P(S × T) by P({(ai, bj)}) = pi qj and additivity. Then P is a probability function on S × T:

(i) P(A) ≥ 0 for every event A, since each pi qj ≥ 0.

(ii) P(S × T) = Σ_{i,j} pi qj = (Σ_i pi)(Σ_j qj) = 1 · 1 = 1.

(iii) Obvious by additivity.

P is called the product probability; it is not the only probability function on S × T. We can extend this definition to the product of any finite number of sample spaces.

Suppose that S × T has the product probability P. Let A = {ai} × T and B = S × {bj}. Then P(A) = P({ai} × T) = PS(ai) · PT(T) = pi · 1 = pi, and similarly P(B) = qj. Also A ∩ B = {(ai, bj)}, so P(A ∩ B) = P({(ai, bj)}) = pi qj = P(A)P(B), and hence A and B are independent. Similarly, any two events of the form C × T and S × D are independent, where C ⊆ S and D ⊆ T.

Conversely, suppose that P is a probability function on S × T such that P({ai} × T) = pi and P(S × {bj}) = qj for all ai ∈ S and bj ∈ T, and all sets of this form are independent. Then P({(ai, bj)}) = P(({ai} × T) ∩ (S × {bj})) = P({ai} × T) · P(S × {bj}) = pi qj, so that P must be the product probability.

We deduce that the product probability is the unique probability on S × T with these two independence properties.

Example: Three horses A, B, C race, with respective probabilities 1/2, 1/3 and 1/6 of winning. Suppose they race twice. Then, assuming independence, the probability of C winning the first race and A winning the second race is (1/6)(1/2) = 1/12, etc.

Suppose an experiment is repeated a finite number of times. The sample space S × S × ... × S consists of tuples. If we assume that the experiments are independent, then the probability function on this sample space is the product probability. If we do it n times we say that we have n independent trials.

Example: Toss a coin three times. P(HTH) = (1/2)(1/2)(1/2) = 1/8. This gives the same probability function as assuming that all triples are equiprobable, as before. Hence we can consider the problem in either of the two ways.

Counting Techniques

Suppose we have n objects. How many permutations of size 1 ≤ r ≤ n can be made? Using the Fundamental Principle of Counting gives n!/(n − r)!, which is written nPr.

How many combinations of size 1 ≤ r ≤ n can be made? Let the answer be nCr. Each of these combinations gives r! permutations, so nCr · r! = nPr. Hence nCr = nPr/r! = n!/((n − r)! r!). Setting 0! = 1, we can use the formula for all 0 ≤ r ≤ n.

A lot of problems in finite probability can be done from first principles, i.e. using boxes.

Example: How many people must be gathered so that the probability that at least two of them have the same birthday is greater than 1/2?

Let the answer be n. Then 1 − 365Pn/365^n > 1/2, and trying successive values gives n = 23.
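The birthday calculation is quickly reproduced by computing 365Pn/365^n for successive n; a small sketch:

```python
def p_shared(n):
    """P(at least two of n people share a birthday) = 1 - 365Pn / 365**n."""
    p_distinct = 1.0
    for k in range(n):
        p_distinct *= (365 - k) / 365
    return 1 - p_distinct

n = 1
while p_shared(n) <= 0.5:
    n += 1
# n is now 23, the smallest group with a better-than-even chance.
```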

Random Variables

Definition: A random variable (R.V.) on a sample space S is a function X : S → R. The image or range of X is denoted by RX. If RX is finite we say that X is a finite or finitely discrete R.V.; if RX is countably infinite we say that X is a countably infinite or infinitely discrete R.V.; and if RX is uncountable we say that X is an uncountable or continuous R.V.

Example:

(i) Throw a pair of dice. Let X : S → R be the larger of the two numbers shown and let Y : S → R be the sum of the two numbers. Then RX = {1, 2, 3, 4, 5, 6} and RY = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}. X and Y are finite R.V.s.

(ii) Toss a coin until H appears and let X be the number of tosses, i.e. (the number of Ts) + 1. Then RX = {1, 2, 3, ...}. X is a countably infinite R.V.

(iii) Pick a point at random in the unit disc and let X be the distance of the point from the centre (0, 0). Then RX = [0, 1] and X is a continuous R.V.

Let X : S → RX be finite with RX = {x1, x2, ..., xn}, say. X induces a function f on RX by f(xk) = P(X = xk) = P({s ∈ S | X(s) = xk}). f is called a probability distribution function (p.d.f.). Note that f(xk) ≥ 0 and Σ_{k=1}^n f(xk) = 1. We extend f to R by setting f(x) = 0 for x ≠ x1, x2, ..., xn. We often write f using a table:

x: x1 x2 ... xn
f(x): f(x1) f(x2) ... f(xn)

If we let f(xk) = fk / Σ_{i=1}^n fi, where the fk are frequencies, then f defines a p.d.f.

Note: this is how a frequency table is converted to a probability distribution.

Example: Throw a pair of dice and let Y be the sum of the two numbers, as before. Then f(2) = 1/36, f(3) = 2/36, f(4) = 3/36, f(5) = 4/36, f(6) = 5/36, f(7) = 6/36, f(8) = 5/36, f(9) = 4/36, f(10) = 3/36, f(11) = 2/36, f(12) = 1/36.

Exercise: Write down the distribution table for X, the larger of the two numbers.

Definition: Let X be a finite R.V. with p.d.f. f. The expectation or mean of X is

E(X) = x1 f(x1) + x2 f(x2) + ... + xn f(xn) = Σ_{k=1}^n xk f(xk).

This is the same definition as the mean in the case of a frequency distribution.

Example: For the pair of dice, with X the larger number and Y the sum,

E(X) = 1 · (1/36) + 2 · (3/36) + ... + 6 · (11/36) = 4.47,

E(Y) = 2 · (1/36) + 3 · (2/36) + ... + 12 · (1/36) = 7.

We write μX, or simply μ if there is no confusion, for E(X). E(X) is the weighted average of the values of X, where the weights are the probabilities. We can apply this definition to games of chance: a game of chance is an experiment with n outcomes a1, a2, ..., an and corresponding probabilities p1, p2, ..., pn. Suppose the payout for each ai is wi. We define a R.V. X by X(ai) = wi. Then the expected payout is E(X) = Σ_{i=1}^n wi pi.
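These expectations can be checked by brute force over the 36 equally likely outcomes; a short sketch (X taken as the larger number, as in the example):

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # the 36 outcomes of two dice
p = Fraction(1, 36)

E_X = sum(max(a, b) * p for a, b in outcomes)     # larger of the two numbers
E_Y = sum((a + b) * p for a, b in outcomes)       # sum of the two numbers
```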

Example: A die is thrown; the six faces pay 2, 3 and 5 euros and lose 1, 4 and 6 euros, each with probability 1/6. Should we play?

We have a distribution table with each payout having probability 1/6. Then E(X) = 2 · (1/6) + 3 · (1/6) + 5 · (1/6) − 1 · (1/6) − 4 · (1/6) − 6 · (1/6) = −1/6.

Don't play!

Definition: X a finite R.V. with mean μ. The variance of X is defined by

var(X) = E((X − μ)^2) = (x1 − μ)^2 f(x1) + (x2 − μ)^2 f(x2) + ... + (xn − μ)^2 f(xn) = Σ_{k=1}^n (xk − μ)^2 f(xk).

The standard deviation of X is defined to be σX = √var(X). If X is understood, we just write σ.

Note:

var(X) = E((X − μ)^2) = Σ_{k=1}^n (xk − μ)^2 f(xk) = Σ_{k=1}^n xk^2 f(xk) − 2μ Σ_{k=1}^n xk f(xk) + μ^2 Σ_{k=1}^n f(xk) = E(X^2) − 2μ^2 + μ^2 = E(X^2) − μ^2 = E(X^2) − (E(X))^2.

Example: For the pair of dice,

E(X^2) = Σ_{k=1}^6 xk^2 f(xk) = 1^2 · (1/36) + 2^2 · (3/36) + ... + 6^2 · (11/36) = 21.97,

E(Y^2) = Σ_{k=1}^{11} yk^2 f(yk) = 2^2 · (1/36) + 3^2 · (2/36) + ... + 12^2 · (1/36) = 54.8.

Theorem: (i) E(aX) = aE(X), (ii) E(X + b) = E(X) + b.

Proof: (i) E(aX) = Σ_{k=1}^n a xk f(xk) = a Σ_{k=1}^n xk f(xk) = aE(X).

(ii) E(X + b) = Σ_{k=1}^n (xk + b) f(xk) = Σ_{k=1}^n xk f(xk) + b Σ_{k=1}^n f(xk) = E(X) + b.

Hence E(aX + b) = E(aX) + b = aE(X) + b.

Theorem: (i) var(aX) = a^2 var(X), (ii) var(X + b) = var(X).

Proof: (i) var(aX) = E((aX)^2) − (E(aX))^2 = a^2 E(X^2) − a^2 (E(X))^2 = a^2 var(X). We could also just use the definition of var(X).

(ii) var(X + b) = E((X + b)^2) − (E(X + b))^2
= Σ_{k=1}^n (xk^2 + 2b xk + b^2) f(xk) − (E(X) + b)^2
= Σ_{k=1}^n xk^2 f(xk) + 2b Σ_{k=1}^n xk f(xk) + b^2 Σ_{k=1}^n f(xk) − (E(X))^2 − 2bE(X) − b^2
= E(X^2) − (E(X))^2 = var(X).

Hence var(aX + b) = var(aX) = a^2 var(X), and σ_{aX+b} = |a| σX.

Definition: The standardized R.V. corresponding to X is defined as Z = (X − μ)/σ. Then

E(Z) = E((X − μ)/σ) = (1/σ)(E(X) − μ) = 0 and var(Z) = var(X)/σ^2 = 1.

Definition: The cumulative distribution function (c.d.f.) of X is F(x) = P(X ≤ x) = Σ_{xk ≤ x} f(xk), where f is the p.d.f. of X.

Example: Suppose X is defined by the table

x: −2 1 2 4
f(x): 1/4 1/8 1/2 1/8

Then F(−2) = 1/4, F(1) = 3/8, F(2) = 7/8, F(4) = 1. F(x) is obvious for all other x.

Note: F is an increasing step function, since f(xk) ≥ 0 for all xk.

One of the most important finite distributions is the binomial distribution. Suppose we have an experiment with only two possible outcomes, called success and failure, i.e. S = {s, f}. Let P(s) = p and P(f) = q. Then p + q = 1. Each such experiment is called a Bernoulli trial. Suppose we repeat the experiment n times and assume that the trials are independent. The sample space for the n Bernoulli trials is S × S × ... × S and a typical element in the sample space looks like (a1, a2, ..., an), where each ai = s or f. Define a R.V. X on this sample space by X(a1, a2, ..., an) = the number of successes in (a1, a2, ..., an). Then RX = {0, 1, 2, ..., n}. The p.d.f. of X, f(x), is given by f(0) = q^n, f(1) = nC1 p q^{n−1}, f(2) = nC2 p^2 q^{n−2}, ..., f(k) = nCk p^k q^{n−k}, ..., f(n) = p^n. Note that f(0) + f(1) + ... + f(n) = q^n + nC1 p q^{n−1} + ... + nCk p^k q^{n−k} + ... + nCn p^n = (p + q)^n = 1^n = 1, by the Binomial Theorem. X is said to have the binomial distribution, written as B(n, p).
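The binomial p.d.f. is direct to code; a minimal sketch using math.comb:

```python
from math import comb

def binom_pdf(n, p, k):
    """f(k) = nCk * p**k * q**(n - k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# The probabilities sum to (p + q)**n = 1, as the Binomial Theorem promises.
total = sum(binom_pdf(6, 0.5, k) for k in range(7))
```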

Example: A fair coin is tossed 6 times. Find the probability of getting

(i) exactly four heads,

(ii) at least four heads,

(iii) at least one head.

(i) f(4) = 6C4 (1/2)^4 (1/2)^2 = 15/64.

(ii) f(4) + f(5) + f(6) = 6C4 (1/2)^4 (1/2)^2 + 6C5 (1/2)^5 (1/2) + 6C6 (1/2)^6 = 22/64.

(iii) 1 − f(0) = 1 − (1/2)^6 = 63/64.

Example:

(i) ...

(ii) 1 − (2/3)^7 = 0.94.

Example: How many times must a die be thrown so that the probability of getting at least one six is greater than 1/2?

Here p = 1/6, q = 5/6. Let n be the required number. Then

1 − (5/6)^n > 1/2, so (5/6)^n < 1/2, or n ln(5/6) < ln(1/2), i.e. n > ln(1/2)/ln(5/6) ≈ 3.8,

since ln(5/6) < 0 reverses the inequality. Hence n = 4.
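The inequality can be solved numerically as a check:

```python
from math import ceil, log

# n > ln(1/2) / ln(5/6), so the smallest integer n is:
n = ceil(log(1 / 2) / log(5 / 6))

# n throws give a better-than-even chance; n - 1 throws do not.
assert 1 - (5 / 6) ** n > 1 / 2 > 1 - (5 / 6) ** (n - 1)
```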

Example: 20% of items produced are defective. If 10 items are chosen at random, find the probability that

(i) none,

(ii) one,

(iii) two

are defective. Here X is B(10, 0.2), so

(i) f(0) = (0.8)^10,

(ii) f(1) = 10C1 (0.2)(0.8)^9,

(iii) f(2) = 10C2 (0.2)^2 (0.8)^8.

Suppose that X is a R.V. on a sample space S whose range RX is an interval in R or all of R. Then X is called a continuous R.V. If there exists a piecewise continuous function f : R → R such that P(a ≤ X ≤ b) = ∫_a^b f(x) dx for any a < b, then f is called the probability distribution function (p.d.f.) or density function for X.

f must satisfy f(x) ≥ 0 for all x and ∫_{−∞}^{∞} f(x) dx = 1. Note that P(X = a) = P(a ≤ X ≤ a) = ∫_a^a f(x) dx = 0. We define E(X) = μ = ∫_{−∞}^{∞} x f(x) dx and var(X) = ∫_{−∞}^{∞} x^2 f(x) dx − μ^2. Again σ^2 = var(X). The c.d.f. of X is defined by F(x) = ∫_{−∞}^x f(t) dt. Then P(a ≤ X ≤ b) = F(b) − F(a).

Example: Let

f(x) = x/2, 0 ≤ x ≤ 2,
f(x) = 0, x < 0 or 2 < x.

P(1 ≤ X ≤ 1.5) = ∫_1^{1.5} (x/2) dx = 5/16.

E(X) = ∫_0^2 x f(x) dx = ∫_0^2 (x^2/2) dx = 4/3.

E(X^2) = ∫_0^2 (x^3/2) dx = 2, so var(X) = 2 − (4/3)^2 = 2/9 and σ = √2/3.

If x < 0, then F(x) = ∫_{−∞}^x f(t) dt = 0.

If 0 ≤ x ≤ 2, then F(x) = ∫_0^x (t/2) dt = x^2/4.

If 2 < x, then F(x) = ∫_0^2 (t/2) dt = 1, as we would expect!
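The integrals in this example can be checked numerically; a small midpoint-rule sketch:

```python
def integrate(g, a, b, steps=200_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h

f = lambda x: x / 2                                    # density on [0, 2]

total = integrate(f, 0, 2)                             # should be 1
prob = integrate(f, 1, 1.5)                            # P(1 <= X <= 1.5) = 5/16
mu = integrate(lambda x: x * f(x), 0, 2)               # E(X) = 4/3
var = integrate(lambda x: x * x * f(x), 0, 2) - mu**2  # 2/9
```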

Example: The heights of 1,000 people are measured and grouped into intervals; the relative frequencies are, say, 50/1,000 = 0.05, etc., and Σ (rel. freqs.) = 1. The graph is now a histogram: the area of each rectangle is the relative frequency of its group. The proportion of people of height less than any number is the sum of the areas up to that number. Joining the midpoints of the tops of the rectangles gives a bell-shaped curve. For large populations the (relative) frequency function approximates to such a curve.

The most important continuous random variable is the normal distribution, whose p.d.f. has a bell shape.

Definition: X is normally distributed with mean μ and standard deviation σ if its p.d.f. is

f(x) = (1/(σ√(2π))) e^{−(1/2)((x−μ)/σ)^2}.

We say that X is N(μ, σ^2). The graph of f is symmetric about x = μ, and the bigger the σ the wider the graph of f(x) is.

Theorem: (i) ∫_{−∞}^{∞} e^{−x^2/2} dx = √(2π).

(ii) ∫_{−∞}^{∞} (1/(σ√(2π))) e^{−(1/2)((x−μ)/σ)^2} dx = 1, so f really is a p.d.f.

(iii) E(X) = μ and var(X) = σ^2.

The c.d.f. of X is F(x) = (1/(σ√(2π))) ∫_{−∞}^x e^{−(1/2)((v−μ)/σ)^2} dv, and

P(a ≤ X ≤ b) = F(b) − F(a) = (1/(σ√(2π))) ∫_a^b e^{−(1/2)((v−μ)/σ)^2} dv.

These integrals can't be found analytically, so they are tabulated numerically. This would have to be done for all values of μ and σ, so instead we use the standardized normal R.V. Z = (X − μ)/σ.

Z is an N(0, 1) variable. We denote the p.d.f. of Z by φ(z) = (1/√(2π)) e^{−z^2/2} and its c.d.f. by Φ(z) = (1/√(2π)) ∫_{−∞}^z e^{−u^2/2} du.

Consider F(x) = (1/(σ√(2π))) ∫_{−∞}^x e^{−(1/2)((v−μ)/σ)^2} dv. Substitute u = (v − μ)/σ, so dv = σ du; when v = x, u = (x − μ)/σ.

Therefore F(x) = (1/√(2π)) ∫_{−∞}^{(x−μ)/σ} e^{−u^2/2} du = Φ((x − μ)/σ).

Hence P(a ≤ X ≤ b) = F(b) − F(a) = Φ((b − μ)/σ) − Φ((a − μ)/σ). In particular

P(μ − σ ≤ X ≤ μ + σ) = Φ(1) − Φ(−1) = P(−1 ≤ Z ≤ 1) = 0.682,

P(μ − 2σ ≤ X ≤ μ + 2σ) = Φ(2) − Φ(−2) = P(−2 ≤ Z ≤ 2) = 0.954,

P(μ − 3σ ≤ X ≤ μ + 3σ) = Φ(3) − Φ(−3) = P(−3 ≤ Z ≤ 3) = 0.997, etc.
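Φ is available in closed form via the error function, which makes these tabulated values easy to reproduce; a minimal sketch:

```python
from math import erf, sqrt

def Phi(z):
    """Standard normal c.d.f., Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1 + erf(z / sqrt(2)))

within_1 = Phi(1) - Phi(-1)   # about 0.682
within_2 = Phi(2) - Phi(-2)   # about 0.954
within_3 = Phi(3) - Phi(-3)   # about 0.997
```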

Note that Φ(z) = P(−∞ < Z ≤ z) and Φ(0) = P(−∞ < Z ≤ 0) = 0.5. The tables give Φ(z) for z ≥ 0; for z < 0 use Φ(z) = 1 − Φ(−z).

(i) 0 ≤ a < b: P(a ≤ Z ≤ b) = Φ(b) − Φ(a).

(ii) a < 0 < b: P(a ≤ Z ≤ b) = Φ(b) − Φ(a) = Φ(b) − (1 − Φ(−a)) = Φ(b) + Φ(−a) − 1.

(iii) a < b < 0: P(a ≤ Z ≤ b) = Φ(b) − Φ(a) = 1 − Φ(−b) − (1 − Φ(−a)) = Φ(−a) − Φ(−b).

Example: Find

(i) P(Z ≤ 2.44),

(ii) P(Z ≥ −1.16),

(iii) P(Z ≥ 1),

(iv) P(2 ≤ Z ≤ 10).

(i) P(Z ≤ 2.44) = Φ(2.44) = 0.9927.

(ii) P(Z ≥ −1.16) = 1 − P(Z ≤ −1.16) = 1 − Φ(−1.16) = Φ(1.16) = 0.8770.

(iii) P(Z ≥ 1) = 1 − P(Z < 1) = 1 − Φ(1) = 1 − 0.8413 = 0.1587.

(iv) P(2 ≤ Z ≤ 10) = Φ(10) − Φ(2) ≈ 1 − 0.9772 = 0.0228.

Example: Find c such that

(i) P(Z ≥ c) = 10%,

(ii) P(Z ≥ c) = 5%,

(iii) P(0 ≤ Z ≤ c) = 45%,

(iv) P(−c ≤ Z ≤ c) = 99%.

(i) P(Z ≥ c) = 1 − P(Z ≤ c) = 1 − Φ(c), so Φ(c) = 0.90, giving c = 1.28.

(ii) Φ(c) = 0.95, giving c = 1.645.

(iii) Φ(c) − Φ(0) = 0.45, so Φ(c) = 0.95 and c = 1.645.

(iv) Φ(c) − Φ(−c) = 2Φ(c) − 1 = 0.99, so Φ(c) = 0.995 and c = 2.575.

Example: X is N(0.8, 4), i.e. μ = 0.8 and σ = 2. Find

(i) P(X ≤ 2.44),

(ii) P(X ≤ −1.16),

(iii) P(X ≥ 1),

(iv) P(2 ≤ X ≤ 10).

(i) P(X ≤ 2.44) = F(2.44) = Φ((2.44 − 0.8)/2) = Φ(0.82) = 0.7939.

(ii) P(X ≤ −1.16) = F(−1.16) = Φ((−1.16 − 0.8)/2) = Φ(−0.98) = 0.1635.

(iii) P(X ≥ 1) = 1 − P(X ≤ 1) = 1 − F(1) = 1 − Φ((1 − 0.8)/2) = 1 − Φ(0.1) = 0.4602.

(iv) P(2 ≤ X ≤ 10) = F(10) − F(2) = Φ((10 − 0.8)/2) − Φ((2 − 0.8)/2) = Φ(4.6) − Φ(0.6) = 0.2743.
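Standardizing first, exactly as in the worked example, the same numbers come out of code; a short sketch for X ~ N(0.8, 4):

```python
from math import erf, sqrt

Phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))   # standard normal c.d.f.

mu, sigma = 0.8, 2.0
F = lambda x: Phi((x - mu) / sigma)            # c.d.f. of N(0.8, 4)

p1 = F(2.44)          # about 0.7939
p3 = 1 - F(1)         # about 0.4602
p4 = F(10) - F(2)     # about 0.2743
```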

Example: The length of a javelin throw is N(17, 4), i.e. μ = 17 m and σ = 2 m. Find

(i) the probability that a throw is longer than 18.5 m,

(ii) the distance d that the throw will exceed with 90% probability.

(i) P(X > 18.5) = 1 − P(X ≤ 18.5) = 1 − F(18.5) = 1 − Φ((18.5 − 17)/2) = 1 − Φ(0.75) = 0.2266.

(ii) We need P(X > d) = 0.90, i.e. 1 − Φ((d − 17)/2) = 0.90. Hence Φ((d − 17)/2) = 0.10, so (d − 17)/2 = −1.28 and d = 14.44 m.

Example: If the lifetime of stoves is normally distributed with mean 15 years and standard deviation 2.5 years, find

(i) the percentage of stoves that last 10 years or less,

(ii) the percentage of stoves that last between 16 and 20 years.

(i) P(X ≤ 10) = Φ((10 − 15)/2.5) = Φ(−2) = 1 − Φ(2) = 0.0228 = 2.28%.

(ii) P(16 ≤ X ≤ 20) = Φ((20 − 15)/2.5) − Φ((16 − 15)/2.5) = Φ(2) − Φ(0.4) = 0.3218 = 32.18%.

Let X, Y be finite R.V.s on the same sample space S with probability function P. Let the range of X be RX = {x1, x2, ..., xn} and the range of Y be RY = {y1, y2, ..., ym} respectively.

Consider the pair (X, Y) defined on S by (X, Y)(s) = (X(s), Y(s)). Then (X, Y) is a R.V. on S with range RX × RY = {(x1, y1), ..., (x1, ym), (x2, y1), ..., (x2, ym), ..., (xn, y1), ..., (xn, ym)}. We sometimes call (X, Y) a vector R.V.

Let Ai = {s ∈ S | X(s) = xi} = {X = xi} and Bj = {s ∈ S | Y(s) = yj} = {Y = yj}. We write Ai ∩ Bj = {X = xi, Y = yj}. Define a function h : RX × RY → R by h(xi, yj) = P(Ai ∩ Bj) = P(X = xi, Y = yj). Then h(xi, yj) ≥ 0 and Σ_{i,j} h(xi, yj) = 1. h is called the joint probability distribution function of (X, Y) associated with the probability function P, and X, Y are said to be jointly distributed. Suppose that f and g are the p.d.f.s of X and Y respectively. What is the connection between f, g and h?

S = ∪_{j=1}^m Bj, so Ai = Ai ∩ S = Ai ∩ (∪_{j=1}^m Bj) = ∪_{j=1}^m (Ai ∩ Bj), disjoint, so f(xi) = P(Ai) = P(∪_{j=1}^m (Ai ∩ Bj)) = Σ_{j=1}^m P(Ai ∩ Bj) = Σ_{j=1}^m h(xi, yj). Similarly g(yj) = Σ_{i=1}^n h(xi, yj). f and g are called the marginal distributions of X and Y.

We often write the joint distribution in a table.

Example: Throw a pair of dice; let X(a, b) = max(a, b) and Y(a, b) = a + b.

Definition: X and Y are independent if h(xi, yj) = f(xi)g(yj) for all i, j.

This means that P(X = xi, Y = yj) = P(X = xi)P(Y = yj), or P(Ai ∩ Bj) = P(Ai)P(Bj), for all i, j.

Note that in the above example X and Y are not independent.

If G : R^2 → R, then we define a R.V. G(X, Y) on S by G(X, Y)(s) = G(X(s), Y(s)).

Definition: The expectation of G(X, Y) is E(G(X, Y)) = Σ_{i,j} G(xi, yj) h(xi, yj), where (X, Y) has joint p.d.f. h.

Example: (X + Y)(s) = X(s) + Y(s) = xi + yj, and E(X + Y) = Σ_{i,j} (xi + yj) h(xi, yj), etc.

Theorem: (i) E(X + Y) = E(X) + E(Y).

(ii) If X and Y are independent, then E(XY) = E(X)E(Y) and var(X + Y) = var(X) + var(Y).

Proof: (i) E(X + Y) = Σ_{i,j} (xi + yj) h(xi, yj)
= Σ_i Σ_j xi h(xi, yj) + Σ_j Σ_i yj h(xi, yj)
= Σ_i xi Σ_j h(xi, yj) + Σ_j yj Σ_i h(xi, yj)
= Σ_i xi f(xi) + Σ_j yj g(yj)
= E(X) + E(Y).

(ii) E(XY) = Σ_{i,j} xi yj h(xi, yj) = Σ_{i,j} xi yj f(xi) g(yj) = (Σ_i xi f(xi))(Σ_j yj g(yj)) = E(X)E(Y).

Also var(X + Y) = E((X + Y)^2) − (E(X + Y))^2 = E(X^2 + 2XY + Y^2) − (E(X) + E(Y))^2 = E(X^2) + 2E(X)E(Y) + E(Y^2) − (E(X))^2 − (E(Y))^2 − 2E(X)E(Y) = E(X^2) − (E(X))^2 + E(Y^2) − (E(Y))^2 = var(X) + var(Y).

Important Example: Consider a Bernoulli trial with sample space S = {s, f}, P(s) = p and P(f) = q. For one trial (n = 1) define the R.V. X by X(s) = 1 and X(f) = 0. Then E(X) = 1 · p + 0 · q = p and var(X) = E(X^2) − (E(X))^2 = p − p^2 = p(1 − p) = pq.

For n trials define X1(a1, a2, ..., an) = X(a1), ..., Xn(a1, a2, ..., an) = X(an), so that Xi is 1 if s is in the ith place and 0 if f is in the ith place, for all 1 ≤ i ≤ n.

Now let Y = X1 + X2 + ... + Xn, so that Y gives the total number of successes in the n trials. Then E(Y) = E(X1) + E(X2) + ... + E(Xn) = p + p + ... + p = np and, since the trials are independent, var(Y) = var(X1) + var(X2) + ... + var(Xn) = npq.
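The formulas E(Y) = np and var(Y) = npq can be confirmed directly from the binomial p.d.f.; a small sketch with n = 6, p = 1/2:

```python
from math import comb

n, p = 6, 0.5
q = 1 - p
pdf = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]

E_Y = sum(k * pdf[k] for k in range(n + 1))                  # np = 3
var_Y = sum(k * k * pdf[k] for k in range(n + 1)) - E_Y**2   # npq = 1.5
```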

Sampling Theory

Suppose that we have an infinite or very large finite sample space S. This sample space is often called a population. Getting information about the total population may be difficult, so we consider much smaller subsets of the population, called samples. We want to get information about the population by studying the samples. We consider the samples to be random samples, i.e. each element of the population has the same probability of being in a sample.

Example: Pick a person in Ireland at random and record their age; do this n times. This gives a random sample of size n of the ages of people in Ireland.

Mathematically the situation is described in the following way: let X be a random variable on a sample space S with probability function P and let f(x) be the probability distribution function of X. Consider the sample space Ω = S × S × ... × S (n times) with the product probability function P̃.

For each 1 ≤ i ≤ n define a random variable Xi on Ω by Xi(s1, s2, ..., sn) = X(si), where (s1, s2, ..., sn) ∈ Ω. Then the probability distribution function of Xi is also f(x) for each i. The vector random variable (X1, X2, ..., Xn) defined by (X1, X2, ..., Xn)(s1, s2, ..., sn) = (X(s1), X(s2), ..., X(sn)) = (x1, x2, ..., xn) is a random variable on Ω with joint distribution

P̃(X1 = x1, X2 = x2, ..., Xn = xn) = f(x1) f(x2) ... f(xn).

Choosing a sample is simply applying the vector random variable (X1, X2, ..., Xn) to Ω to get a random sample (x1, x2, ..., xn). Each Xi has the same mean μ and variance σ^2 as X, and they are independent, by definition. They are called independent identically distributed random variables (i.i.d.). Functions of the X1, X2, ..., Xn and numbers associated with them are called statistics, while functions of the original X and associated numbers are called parameters. Our task is to get information about the parameters by studying the statistics.

The mean μ and variance σ^2 of X are called the population mean and population variance. The corresponding statistics are the sample mean and sample variance:

Definition:

Sample mean: X̄ = (X1 + X2 + ... + Xn)/n.

Sample variance: S^2 = Σ_{i=1}^n (Xi − X̄)^2 / (n − 1).

For a particular sample (x1, x2, ..., xn) these take the values x̄ = (x1 + x2 + ... + xn)/n and s^2 = Σ_{i=1}^n (xi − x̄)^2 / (n − 1).

We have:

Theorem: (i) E(X̄) = μ, the population mean.

(ii) var(X̄) = σ_X̄^2 = σ^2/n, so σ_X̄ = σ/√n.

Proof: (i) E(X̄) = E((X1 + X2 + ... + Xn)/n) = (μ + μ + ... + μ)/n = nμ/n = μ.

(ii) σ_X̄^2 = var(X̄) = var((X1 + X2 + ... + Xn)/n) = var(X1/n) + var(X2/n) + ... + var(Xn/n) = var(X1)/n^2 + var(X2)/n^2 + ... + var(Xn)/n^2 = nσ^2/n^2 = σ^2/n.

The expectation of S^2 is given by the following result.

Theorem: E(S^2) = σ^2.

Proof: E(S^2) = E(Σ_{i=1}^n (Xi − X̄)^2 / (n − 1)) = (1/(n−1)) E(Σ_{i=1}^n (Xi^2 − 2Xi X̄ + X̄^2))
= (1/(n−1)) [Σ_{i=1}^n E(Xi^2) − 2E((Σ_{i=1}^n Xi) X̄) + nE(X̄^2)]
= (1/(n−1)) [n(μ^2 + σ^2) − 2nE(X̄^2) + nE(X̄^2)]
= (1/(n−1)) [n(μ^2 + σ^2) − nE(X̄^2)]
= (1/(n−1)) [n(μ^2 + σ^2) − n(σ^2/n + μ^2)]
= (1/(n−1)) [(n − 1)σ^2] = σ^2,

using E(Xi^2) = var(Xi) + (E(Xi))^2 = σ^2 + μ^2, Σ_{i=1}^n Xi = nX̄ and E(X̄^2) = var(X̄) + (E(X̄))^2 = σ^2/n + μ^2.

Note: If the mean or expectation of a statistic is equal to the corresponding parameter, the statistic is called an unbiased estimator of the parameter. Hence X̄ and S^2 are unbiased estimators of μ and σ^2 respectively. An estimate of a population parameter given by a single number is called a point estimate, e.g. if we take a sample of size n and calculate x̄ = (x1 + x2 + ... + xn)/n and s^2 = Σ_{i=1}^n (xi − x̄)^2/(n − 1), then these are point estimates of μ and σ^2. We will concentrate on interval estimates, where the parameter lies within some interval, called a confidence interval.
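Unbiasedness shows up clearly in simulation: averaging X̄ and S^2 over many samples recovers μ and σ^2. A sketch, where the population N(10, 4) is an arbitrary choice for the demonstration:

```python
import random

random.seed(1)
mu, sigma = 10.0, 2.0         # population parameters (arbitrary for the demo)
n, reps = 5, 20_000

sum_xbar = sum_s2 = 0.0
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)   # note the n - 1
    sum_xbar += xbar
    sum_s2 += s2

avg_xbar = sum_xbar / reps    # close to mu
avg_s2 = sum_s2 / reps        # close to sigma**2; dividing by n would bias it low
```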

Suppose we have n i.i.d. random variables X1, X2, ..., Xn with E(Xi) = μ and var(Xi) = σ^2 for each 1 ≤ i ≤ n. Then if X̄ = (X1 + X2 + ... + Xn)/n, we have E(X̄) = μ and var(X̄) = σ^2/n. As before, X1, X2, ..., Xn are jointly distributed random variables defined on the product sample space. We have the very important result:

Central Limit Theorem: For large n, X̄ is approximately normal with mean μ and variance σ^2/n, i.e. (X̄ − μ)/(σ/√n) is approximately N(0, 1). The larger the n the better the approximation.

Note: in practice the approximation is usually good for n ≥ 30, and if X itself is normal it is exact for all values of n.

Recall N(0, 1): Φ(1.96) = 0.975, so P(−1.96 ≤ Z ≤ 1.96) = 95%, with z1 = 1.96. We say that −1.96 ≤ Z ≤ 1.96 is a 95% confidence interval for N(0, 1).

Hence P(−1.96 ≤ (X̄ − μ)/(σ/√n) ≤ 1.96) = 95%

P(−1.96 σ/√n ≤ X̄ − μ ≤ 1.96 σ/√n) = 95%

P(−1.96 σ/√n − X̄ ≤ −μ ≤ 1.96 σ/√n − X̄) = 95%

P(X̄ − 1.96 σ/√n ≤ μ ≤ X̄ + 1.96 σ/√n) = 95%.

If we know σ this gives us a 95% confidence interval for μ, i.e. given any random sample there is a 95% probability that μ lies within the above interval, or we can say with 95% confidence that μ is between the two limits of the interval. Put another way, 95% of samples will have μ in the above interval.

Example: A sample of size 100 is taken from a population with unknown mean μ and variance 9. Determine a 95% confidence interval for μ if the sample mean is 5.

Here x̄ = 5, σ = 3 and n = 100.

P(X̄ − 1.96 σ/√n ≤ μ ≤ X̄ + 1.96 σ/√n) = 95%

P(5 − 1.96 · 3/10 ≤ μ ≤ 5 + 1.96 · 3/10) = 95%,

i.e. 4.412 ≤ μ ≤ 5.588 at the 95% level.

Example: The mean wage of a sample of 80 workers is 25,000 euro. If the standard deviation of the whole company is 1,000 euro, construct a confidence interval for the mean wage in the company at the 95% level.

Here x̄ = 25,000, σ = 1,000 and n = 80.

P(X̄ − 1.96 σ/√n ≤ μ ≤ X̄ + 1.96 σ/√n) = 95%

P(25,000 − 1.96 · 1,000/√80 ≤ μ ≤ 25,000 + 1.96 · 1,000/√80) = 95%.
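Both intervals come from the same formula x̄ ± z σ/√n; a minimal helper:

```python
from math import sqrt

def confidence_interval(xbar, sigma, n, z=1.96):
    """Two-sided interval xbar +/- z * sigma / sqrt(n)."""
    half = z * sigma / sqrt(n)
    return xbar - half, xbar + half

lo1, hi1 = confidence_interval(5, 3, 100)            # first example
lo2, hi2 = confidence_interval(25_000, 1_000, 80)    # wage example
```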

We can have different confidence intervals. Let α be a small percentage (α = 5% above). Then

P(X̄ − z_{α/2} σ/√n ≤ μ ≤ X̄ + z_{α/2} σ/√n) = 1 − α

gives a (1 − α) confidence interval for μ, where z_{α/2} is the value with Φ(z_{α/2}) = 1 − α/2.

Note: z_{α/2} = 1.645 for α = 10%, 1.96 for α = 5% and 2.575 for α = 1%.

Example: Construct a 99% confidence interval for μ for a normal population with variance 4.84, using the sample 28, 24, 31, 27, 22.

(Note that we need normality here since the sample size is < 30.)

x̄ = (28 + 24 + 31 + 27 + 22)/5 = 26.4 and σ = √4.84 = 2.2. Then

P(26.4 − 2.575 · 2.2/√5 ≤ μ ≤ 26.4 + 2.575 · 2.2/√5) = 0.99.

Example: With σ = 3, how large a sample must be taken so that a 95% confidence interval has length at most 0.4?

In general the length of the confidence interval is

(X̄ + z_{α/2} σ/√n) − (X̄ − z_{α/2} σ/√n) = 2 z_{α/2} σ/√n.

Here we need 2 · 1.96 · 3/√n ≤ 0.4, which gives √n ≥ 2 · 1.96 · 3/0.4 = 29.4, or n = 865.

In all the previous examples we knew σ^2, the population variance. If that is not so and n ≥ 30, we can use S^2 as a point estimate of σ^2, and (X̄ − μ)/(S/√n) is approximately N(0, 1).

Example: From a sample of 121 watches it is found that x̄ = 14.5 years and s = 2 years. Construct a (i) 95%, (ii) 99% confidence interval for μ.

(i) 14.5 − 1.96 · 2/11 ≤ μ ≤ 14.5 + 1.96 · 2/11, i.e. 14.14 ≤ μ ≤ 14.86.

(ii) 14.5 − 2.575 · 2/11 ≤ μ ≤ 14.5 + 2.575 · 2/11, i.e. 14.03 ≤ μ ≤ 14.97.

Note that the greater the confidence the greater the interval.

If n is small (< 30) this is not very accurate, even if the original X is normal. In this case we must use the following:

Theorem: If X is normally distributed, then the random variable (X̄ − μ)/(S/√n) has a t-distribution with n − 1 degrees of freedom.

We denote the number of degrees of freedom n − 1 by ν. For each ν the t-distribution is a symmetric bell-shaped distribution. The statistical tables usually read P(|T| > k) for each ν.

Example: A sample of size 20 from a normal population has sample mean 15.5 and sample variance 0.09. Obtain a 99% confidence interval for μ, the population mean.

Here ν = n − 1 = 19. We have x̄ = 15.5 and s^2 = 0.09, so s = 0.3. For ν = 19 we have P(|T| > k) = 0.01, giving k = 2.861. Now (X̄ − μ)/(S/√20) has a t-distribution with 19 degrees of freedom, so

P(−2.861 ≤ (15.5 − μ)/(0.3/√20) ≤ 2.861) = 99%

P(15.5 − 2.861 · 0.3/√20 ≤ μ ≤ 15.5 + 2.861 · 0.3/√20) = 99%.

Example: Five measurements of the flashpoint of diesel oil gave the results 144, 147, 146, 142, 144. Assuming normality, determine a (i) 95%, (ii) 99% confidence interval for the mean flashpoint.

Since n < 30 we must apply the t-distribution. n = 5, so ν = 4. We have x̄ = (144 + 147 + 146 + 142 + 144)/5 = 144.6. Also s^2 = 3.8, so s = 1.949.

(i) For ν = 4, P(|T| > k) = 0.05 gives k = 2.776, so

P(144.6 − 2.776 · 1.949/√5 ≤ μ ≤ 144.6 + 2.776 · 1.949/√5) = 95%.

(ii) For ν = 4, P(|T| > k) = 0.01 gives k = 4.604, so

P(144.6 − 4.604 · 1.949/√5 ≤ μ ≤ 144.6 + 4.604 · 1.949/√5) = 99%.
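The flashpoint interval can be reproduced step by step; a sketch with the table value k = 2.776 for ν = 4 hard-coded, since the standard library has no t-distribution quantiles:

```python
from math import sqrt

data = [144, 147, 146, 142, 144]
n = len(data)
xbar = sum(data) / n                                  # 144.6
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)     # 3.8
s = sqrt(s2)                                          # about 1.949

k = 2.776                 # t-table: nu = 4, P(|T| > k) = 0.05
half = k * s / sqrt(n)
lo, hi = xbar - half, xbar + half
```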

Hypothesis Testing

Suppose that a claim is made about some parameter of a population, in our case always the population mean μ. This claim is called the null hypothesis and is denoted by H0. Any claim that differs from this is called an alternative hypothesis, denoted by H1. We must test H0 against H1.

Example: H0: μ = 90 could be tested against any of

H1: μ ≠ 90,
H1: μ > 90,
H1: μ < 90,
H1: μ = 95.

We must decide whether to accept or reject H0. If we reject H0 when it is in fact true we commit what is called a type I error, and if we accept H0 when it is in fact false we commit a type II error. The maximum probability with which we would be willing to risk a type I error is called the level of significance of the test, usually 10%, 5% or 1%. We perform a hypothesis test as follows.

Suppose that we are given H0: μ = μ0, some fixed value.

(i) H1: μ ≠ μ0. Take a sample and calculate x̄. If the mean really is μ0, then X̄ is approximately N(μ0, σ^2/n), i.e. (X̄ − μ0)/(σ/√n) is approximately N(0, 1). At the 5% level of significance there is a 5% probability that (X̄ − μ0)/(σ/√n) lies in either of the two end regions z < −1.96 or z > 1.96. If our value of (x̄ − μ0)/(σ/√n) lies in this rejection region we reject H0; otherwise we do not reject H0. This is called a two-tailed test.

(ii) H1: μ > μ0. We only check for probability on the right-hand side: the rejection region at the 5% level is z > 1.645. If (x̄ − μ0)/(σ/√n) is in this rejection region we reject H0; otherwise we do not reject H0. This is called a one-tailed test.

(iii) H1: μ < μ0. Similarly, the rejection region at the 5% level is z < −1.645.

Example: A manufacturer claims that his batteries have an average life of 1,000 hours. In a sample of 100 batteries it was found that x̄ = 985 hours and s = 30 hours. Test the hypothesis H0: μ = 1,000 hours against the alternative hypothesis H1: μ ≠ 1,000 hours at the 5% significance level, assuming that the lifetime of the batteries is normally distributed.

n = 100 > 30, so we can take s for σ. If μ = 1,000, then (X̄ − 1,000)/(S/√n) is approximately N(0, 1). We are interested in values of X̄ on both sides of μ = 1,000, so we use a two-tailed test. Values of (x̄ − 1,000)/(s/√n) with |z| > 1.96 are in the rejection region. Here

(985 − 1,000)/(30/√100) = −5,

which is in the rejection region, so we reject H0 at the 5% level. There is a 5% probability of a type I error.

Example: It is claimed that adults watch 6.6 hours of television daily. In a sample of 100 it was found that x̄ = 6.1 hours and s = 2.5 hours. Test the hypothesis H0: μ = 6.6 hours against the alternative H1: μ ≠ 6.6 hours at the (i) 5%, (ii) 1% significance levels.

n = 100 > 30, so we can take s for σ.

(i) If μ = 6.6, then (x̄ − 6.6)/(s/√n) = (6.1 − 6.6)/(2.5/10) = −2, which is in the rejection region |z| > 1.96, so we reject H0 at the 5% level.

(ii) At the 1% level the rejection region is |z| > 2.575, and (6.1 − 6.6)/(2.5/10) = −2 is not in it, so we do not reject H0 at the 1% level.

Example: A manufacturer produces bulbs that are supposed to burn with a mean life of at least 3,000 hours, with a standard deviation of 500 hours. A sample of 100 bulbs is taken and the sample mean is found to be 2,800 hours. Test the hypothesis H0: μ ≥ 3,000 hours against the alternative H1: μ < 3,000 hours at the 5% significance level.

In this case if our x̄ value is greater than 3,000 we do not reject H0, since it agrees with H0, so we are only interested in extreme values on the left. We use a one-tailed test. Again (X̄ − 3,000)/(σ/√n) is approximately N(0, 1) and the rejection region is z < −1.645. Here

(2,800 − 3,000)/(500/√100) = −4,

which is in the rejection region, so we reject H0 at the 5% level.

Example: We need to buy a length of a certain type of wire. The manufacturer claims that the wire has a mean breaking strength of at least 200 kg, so we have H0: μ ≥ 200 and H1: μ < 200. We take a random sample of 25 rolls of wire and find that x̄ = 197 kg and s = 6 kg. Test H0 against H1 at the 5% level, assuming the breaking strength of the wire is normally distributed.

Here n = 25 < 30, so we must use a t-distribution with ν = 24. If the mean is μ, then (X̄ − μ)/(S/√n) has a t-distribution with 24 degrees of freedom. For ν = 24 the one-tailed 5% rejection region is t < −1.711. Here

(x̄ − 200)/(s/√25) = (197 − 200)/(6/5) = −2.5,

which is in the rejection region, so we reject H0 at the 5% level.
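The whole test is one standardization and one comparison with the table value; a sketch of this example, with the critical value −1.711 for ν = 24 hard-coded from the t-table:

```python
from math import sqrt

xbar, s, n, mu0 = 197, 6, 25, 200

t = (xbar - mu0) / (s / sqrt(n))   # observed t statistic, here -2.5
t_crit = -1.711                    # one-tailed 5% point, nu = 24
reject_H0 = t < t_crit             # reject the claimed mean if t is too far left
```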
