
7. Simple Significance Tests

Learning Outcomes

- Tests based on the normal distribution: single mean; comparison of two means; single proportion or percentage; comparison of two proportions; correlation coefficient.
- Tests based on the t-distribution: single mean; paired comparison; comparison of two means; linear regression coefficient.
- Tests based on the F-distribution: comparison of two variances; comparison of t means (analysis of variance).
- Tests based on the χ² distribution: single variance; goodness-of-fit for classified data.

7.1 Tests Based on the Normal Distribution

7.1.1 Test of a Single Mean

7.1.1.1 Suppose we have a random sample of size n from the normal distribution N(μ, σ²), where μ is unknown but the variance σ² is known. We wish to test

H0: μ = μ0 (specified)

against H1: μ ≠ μ0 (or H1: μ > μ0, or H1: μ < μ0)

at a given significance level α. The test statistic is

Z = (X̄ − μ0)/√(σ²/n) ~ N(0,1).

The critical region lies in both tails of the N(0,1) distribution (or in the right-hand tail or left-hand tail respectively).

A 100(1 − α)% confidence interval for μ is x̄ ± z* √(σ²/n), where P(Z > z*) = α/2. (See Examples 2 and 3 of Chapter 6.)

7.1.1.2 If the variance σ² is unknown, then provided the sample size n is reasonably large (say n ≥ 30) the above procedure can be used with σ² replaced by the sample variance s² = (1/(n − 1)) Σ(xi − x̄)². The results will be approximate. The quantity √(s²/n) is called the standard error (of the mean) and is used in confidence interval calculations. (If n is small, use the t-distribution; see 7.2.1 later.)

Suppose the distribution of the population from which the sample is drawn is not known to be normal. When n is large, X̄ ~ approximately N(μ, σ²/n) by the Central Limit Theorem (SOR101 Chapter 5), and hence Z = (X̄ − μ)/√(σ²/n) ~ approximately N(0,1). We can then use the methods of 7.1.1.1 and 7.1.1.2 but treat the results as approximate.
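The single-mean z-test above can be sketched in a few lines of Python, using only the standard library's NormalDist; the sample numbers below are invented purely for illustration:

```python
from statistics import NormalDist

def z_test_single_mean(xbar, mu0, var, n):
    """Two-sided z-test of H0: mu = mu0 when the variance is known."""
    z = (xbar - mu0) / (var / n) ** 0.5
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return z, p

# Invented example: n = 25 observations with mean 103, from a population
# with known variance 225, testing H0: mu = 100.
z, p = z_test_single_mean(103, 100, 225, 25)
print(round(z, 2), round(p, 4))  # 1.0 0.3173
```

Since |z| = 1.0 < 1.96, this invented sample would not be significant at the 5% level.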


7.1.2 Comparison of Two Means

7.1.2.1 Suppose we have random samples of sizes n1 and n2 from independent normal populations with unknown means μ1 and μ2 and known variances σ1² and σ2².

Sample mean random variable from population 1: X̄1 ~ N(μ1, σ1²/n1).
Sample mean random variable from population 2: X̄2 ~ N(μ2, σ2²/n2).

Consider H0: μ1 − μ2 = 0 against H1: μ1 − μ2 ≠ 0.

Let D = X̄1 − X̄2. Therefore

D ~ N(μ1 − μ2, σ1²/n1 + σ2²/n2),

and under H0 the test statistic is

Z = (X̄1 − X̄2)/√(σ1²/n1 + σ2²/n2) ~ N(0,1).

7.1.2.2 If σ1² and σ2² are unknown and n1 ≥ 30, n2 ≥ 30, replace σ1² and σ2² by the sample variances and proceed as above.

7.1.2.3 When the population distributions are not known to be normal, the above procedure can be used for large samples (n1 ≥ 30, n2 ≥ 30), with Z ~ approximately N(0,1).


Example 1. Suppose we wish to determine if there is a difference in mean weight between the two sexes in a particular bird species. The following data were obtained:

Male: sample size n1 = 125, mean weight x̄1 = 92.31 g, s1² = 56.22 g²
Female: mean weight x̄2 = 88.84 g, s2² = 65.41 g²

Test H0: μ1 = μ2, i.e. μ1 − μ2 = 0, against H1: μ1 ≠ μ2, i.e. μ1 − μ2 ≠ 0, at a 5% significance level. If significant, give a 95% c.i. for μ1 − μ2.

Under H0, the test statistic is

Z = (X̄1 − X̄2 − 0)/√(s1²/n1 + s2²/n2) ~ approx. N(0,1).

The test statistic value is z = 3.14; the critical region is |z| > 1.96 (2.5% in each tail).

The test is highly significant at the 5% level since P(|Z| > 3.14) < 0.01. Hence we are confident that the mean weights are different, in particular with male mean weight greater than female mean weight. How different? From

P(−1.96 ≤ (X̄1 − X̄2 − (μ1 − μ2))/√(s1²/n1 + s2²/n2) ≤ 1.96) ≈ 0.95,

approximate 95% confidence limits for μ1 − μ2 are

(x̄1 − x̄2) ± 1.96 √(s1²/n1 + s2²/n2).

That is, the approximate 95% confidence interval is [1.31, 5.63].

Note: since μ1 − μ2 = 0 does not lie in this interval, the test is significant at the 5% level.
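Example 1 can be reproduced numerically. The female sample size is not legible in these notes, so the value n2 = 85 below is an assumption, chosen because it is consistent with the quoted statistic z = 3.14 and interval [1.31, 5.63]:

```python
from math import sqrt

n1, xbar1, v1 = 125, 92.31, 56.22   # male sample (from the notes)
n2, xbar2, v2 = 85, 88.84, 65.41    # n2 = 85 is assumed, not legible in the notes

se = sqrt(v1 / n1 + v2 / n2)        # standard error of the difference in means
z = (xbar1 - xbar2) / se            # test statistic under H0: mu1 = mu2
lo = (xbar1 - xbar2) - 1.96 * se    # approximate 95% confidence limits
hi = (xbar1 - xbar2) + 1.96 * se

print(round(z, 2))                  # 3.14, as quoted in the notes
print(round(lo, 2), round(hi, 2))   # 1.31 5.63
```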


7.1.3 Tests of Proportion(s)

7.1.3.1 Single proportion p

Suppose we have a random sample of n units from a large population, a proportion p (unknown) of which possess a certain attribute. Let x units in the sample possess the attribute. Then the sample proportion p̂ = x/n estimates p.

A possible probability model for this situation is the binomial distribution. Let the random variable X be the number of units with the attribute in the sample of size n. Then

P(X = x) = (n choose x) p^x (1 − p)^(n−x), x = 0, 1, ..., n.

When the sample size n is large and both np > 5 and n(1 − p) > 5, X ~ approx. N(np, np(1 − p)) in the sense that

P(a ≤ X ≤ b) ≈ P(a ≤ Y ≤ b), where Y ~ N(np, np(1 − p)).

The sample proportion random variable is X/n, with

E(X/n) = (1/n) E(X) = np/n = p,
Var(X/n) = (1/n²) Var(X) = np(1 − p)/n² = p(1 − p)/n.

Therefore X/n ~ approx. N(p, p(1 − p)/n) and

(X/n − p)/√(p(1 − p)/n) ~ approx. N(0,1).

To test H0: p = p0 (specified) against H1: p ≠ p0 (or a one-sided alternative), use this statistic with p = p0.

Since

P(−1.96 ≤ (X/n − p)/√(p(1 − p)/n) ≤ 1.96) ≈ 0.95,

an approximate 95% c.i. for the unknown proportion p is obtained by replacing p(1 − p)/n by p̂(1 − p̂)/n, where p̂ = x/n. That is,

p̂ ± 1.96 √(p̂(1 − p̂)/n).

The above procedure is also applicable when p is a probability, for example p = P(head when a coin is tossed).


Example 2. In a random sample of 120 graduates, 78 spent 3 years at university and 42 more than 3 years. Test the hypothesis that 70% of graduates obtain degrees in 3 years.

Let p = P(graduate in 3 years) (unknown). H0: p = 0.7 against H1: p ≠ 0.7.

Sample proportion p̂ = 78/120 = 0.65. The test statistic value is

z = (0.65 − 0.7)/√(0.7 × 0.3/120) = −1.20,

which does not fall in the critical region |z| > 1.96 (2.5% in each tail).

The test is not significant at the 5% level. We have insufficient evidence for rejecting H0.
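Example 2's single-proportion z-test is a short calculation; here is a sketch in Python (the two-sided p-value is an extra the notes do not quote):

```python
from math import sqrt
from statistics import NormalDist

x, n, p0 = 78, 120, 0.7
phat = x / n                                   # sample proportion, 0.65
z = (phat - p0) / sqrt(p0 * (1 - p0) / n)      # variance uses p0 under H0
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 2))   # -1.2: inside (-1.96, 1.96), so not significant at 5%
```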


7.1.3.2 Comparison of Two Proportions (from large samples)

Suppose we have large samples of sizes n1 and n2 from two populations in which proportions p1 and p2 respectively have an attribute. We wish to test H0: p1 = p2 against H1: p1 ≠ p2 (or a one-sided alternative).

Under H0, denote the common proportion by p, that is, H0: p1 = p2 = p, where p is unknown.

Sample proportion random variable X1/n1 ~ approx. N(p1, p1(1 − p1)/n1).
Sample proportion random variable X2/n2 ~ approx. N(p2, p2(1 − p2)/n2).

Therefore

X1/n1 − X2/n2 ~ approx. N(p1 − p2, p1(1 − p1)/n1 + p2(1 − p2)/n2),

and under H0,

X1/n1 − X2/n2 ~ approx. N(0, p(1 − p)(1/n1 + 1/n2)).

The above variance needs to be estimated by estimating p. Under H0, p is estimated from the combined samples, that is, p̂ = (x1 + x2)/(n1 + n2). The test statistic is then

Z = (X1/n1 − X2/n2)/√(p̂(1 − p̂)(1/n1 + 1/n2)) ~ approx. N(0,1).

A 100(1 − α)% c.i. for p1 − p2 is obtained from

(X1/n1 − X2/n2 − (p1 − p2))/√(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2) ~ approx. N(0,1),

giving

(p̂1 − p̂2) ± z* √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2),

where p̂1 = x1/n1, p̂2 = x2/n2 and P(Z > z*) = α/2.


Example 3. We wish to compare the germination rates of spinach seeds for two different methods of preparation.

Method A: 80 seeds sown, 65 germinate.
Method B: 90 seeds sown, 80 germinate.

Let the proportions germinating be p1 and p2. Test H0: p1 = p2 = p (unknown) against H1: p1 ≠ p2.

Estimate p by p̂ = (65 + 80)/(80 + 90) = 0.853, with p̂1 = 65/80 = 0.8125 and p̂2 = 80/90 = 0.889.

Under H0, the test statistic value is

z = (0.8125 − 0.889)/√(0.853 × 0.147 × (1/80 + 1/90)) = −1.4,

which does not fall in the critical region |z| > 1.96 (2.5% in each tail).

The test is not significant at the 5% level. We have no evidence for supposing the germination rates to be different. (Hypotheses involving proportions can also be tested using the χ² distribution. See Chapters 7 and 8.)
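The pooled two-proportion test of Example 3 can be sketched as:

```python
from math import sqrt

x1, n1 = 65, 80   # Method A: germinated / sown
x2, n2 = 80, 90   # Method B
p1, p2 = x1 / n1, x2 / n2
p = (x1 + x2) / (n1 + n2)        # pooled estimate of p under H0: p1 = p2

z = (p1 - p2) / sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
print(round(z, 1))   # -1.4: |z| < 1.96, so not significant at the 5% level
```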


7.1.4 Test of a Correlation Coefficient

Suppose we have a random sample (x1, y1), ..., (xn, yn) with sample correlation coefficient r from a bivariate probability distribution with unknown correlation coefficient ρ. The distribution of the sample correlation coefficient random variable R is very complicated. However, the transformed random variable

Z = ½ ln((1 + R)/(1 − R))

is approximately normally distributed with mean ζ = ½ ln((1 + ρ)/(1 − ρ)) and variance 1/(n − 3). (This is called the Fisher Z transformation.) Confidence limits for ρ are obtained by first calculating confidence limits for ζ and then transforming back. NCEST Tables 16 and 17 are useful.


Example 4. A random sample of 39 pairs of observations has sample correlation coefficient r = 0.73. Test the hypothesis that the population correlation coefficient is 0.9, and give a 95% c.i. for ρ.

H0: ρ = 0.9 against H1: ρ ≠ 0.9.

Z = ½ ln((1 + R)/(1 − R)) ~ approx. N(ζ, 1/36), where ζ = ½ ln((1 + ρ)/(1 − ρ)).

Therefore

Y = (Z − ζ)/√(1/36) ~ approx. N(0,1).

Under H0, ζ = ½ ln(1.9/0.1) = 1.472, while the observed value of Z is z = ½ ln(1.73/0.27) = 0.9287. The test statistic value is therefore y = (0.9287 − 1.472) × 6 = −3.26, which falls well inside the critical region |y| > 1.96 (2.5% in each tail); the p-value is approximately 0.1%. The test is significant at the 5% level.

From

P(−1.96 ≤ (Z − ζ) × 6 ≤ 1.96) ≈ 0.95, that is, P(Z − 1.96/6 ≤ ζ ≤ Z + 1.96/6) ≈ 0.95,

a 95% c.i. for ζ = ½ ln((1 + ρ)/(1 − ρ)) is

(0.9287 − 1.96/6, 0.9287 + 1.96/6) = (0.60, 1.26).

Transforming back using Table 17 (z = 0.60 gives r = 0.54; z = 1.26 gives r = 0.85), a 95% c.i. for ρ is (0.54, 0.85).

Note: ρ = 0.9 does not lie in this interval, therefore the test is significant at the 5% level. (Given r, the value ½ ln((1 + r)/(1 − r)) can be read from Table 16.)
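Instead of Tables 16 and 17, the Fisher transformation and its inverse are math.atanh and math.tanh, since atanh(r) = ½ ln((1 + r)/(1 − r)). A sketch of Example 4:

```python
from math import atanh, tanh, sqrt

r, n, rho0 = 0.73, 39, 0.9
z = atanh(r)                       # Fisher transform of the sample r, 0.9287
zeta0 = atanh(rho0)                # mean of Z under H0, about 1.472
stat = (z - zeta0) * sqrt(n - 3)   # approx. N(0,1) under H0

lo = tanh(z - 1.96 / sqrt(n - 3))  # transform the c.i. for zeta back to rho
hi = tanh(z + 1.96 / sqrt(n - 3))
print(round(stat, 2))              # -3.26: significant at the 5% level
print(round(lo, 2), round(hi, 2))  # 0.54 0.85, matching the table lookup
```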


7.2 Tests Based on the (Student's) t-distribution

Many of the tests described in previous sections require large samples or precise information about variances (which is often lacking in practice). Small samples are important since in some practical situations the number of observations which can be made may be limited by: the experimental technique, the amount of experimental material, the cost of making an observation, the particular environmental conditions, etc. For small samples there are useful tests based on the t-distribution.

The probability density function of the t-distribution with ν degrees of freedom is

f(t) = Γ((ν + 1)/2) / (√(νπ) Γ(ν/2)) × (1 + t²/ν)^(−(ν+1)/2), −∞ < t < ∞,

where ν is a positive integer.

Notes:

(i) The p.d.f. is symmetrical about t = 0. As ν → ∞, the t-distribution → N(0,1). For practical purposes, when ν ≥ 30 the t-distribution is approximately the same as N(0,1).

(ii) If X1, ..., Xn are independent and each ~ N(μ, σ²), then

T = (X̄ − μ)/√(S²/n) ~ t(n−1),

i.e. the t-distribution with (n − 1) degrees of freedom, where X̄ = (1/n) Σ Xi and S² = (1/(n − 1)) Σ (Xi − X̄)². Therefore, given a random sample of size n from N(μ, σ²), t = (x̄ − μ)/√(s²/n) is an observation from t(n−1).

(iii) NCEST Table 9 tabulates the cumulative distribution function

P(T ≤ t) = F(t) = ∫ from −∞ to t of f(u) du.

NCEST Table 10 tabulates percentage points. That is, given P% and ν, read t(P), where P(T > t(P)) = P%. For example, for ν = 30:

P        5%      2.5%    0.5%    0.05%
t(P)     1.697   2.042   2.750   3.646
N(0,1)   1.645   1.960   2.576   3.291

(The last row gives the corresponding N(0,1) percentage points for comparison.)

7.2.1 Single Mean

Suppose we have a random sample of size n (small) from N(μ, σ²), where μ and σ² are both unknown. We wish to test

H0: μ = μ0 against H1: μ ≠ μ0.

Under H0, T = (X̄ − μ0)/√(S²/n) ~ t(n−1), so

t = (x̄ − μ0)/√(s²/n)

is an observation from t(n−1). The critical region will lie in the tails of the t(n−1) distribution.

A 100(1 − α)% confidence interval for μ is obtained from

P(−t* ≤ (X̄ − μ)/√(S²/n) ≤ t*) = 1 − α,

that is, x̄ ± t* √(s²/n), where P(T > t*) = α/2.


Example 5. The temperature of warm-water springs in a basin is reported to have a mean of 38°C. A sample of 12 springs from the west end of the basin had mean temperature 39.4 and variance 1.92. Have springs at the west end a different mean temperature? Give a 95% c.i. for the mean temperature.

Denote west-end spring temperature by X, where X has mean μ and variance σ². We estimate σ² by s² = 1.92 with 11 degrees of freedom.

H0: μ = 38 against H1: μ ≠ 38.

Under H0,

t = (39.4 − 38)/√(1.92/12) = 1.4/0.4 = 3.5

is an observation from t11. The upper 2.5% point of t11 is 2.201, so the critical region is |t| > 2.201 (2.5% in each tail). The test is significant at the 5% level and we conclude that west-end springs do have a different mean temperature. Since the upper 0.5% point of t11 is 3.106, the test is highly significant. (Alternatively, p-value = P(|T| > 3.5) = 2P(T > 3.5) = 2(1 − P(T < 3.5)) = 2(1 − 0.9975) = 0.005, or 0.5%, from Table 9.)

From

P(−2.201 ≤ (X̄ − μ)/√(1.92/12) ≤ 2.201) = 0.95,

a 95% c.i. for μ is 39.4 ± 2.201 × 0.4, that is, [38.52, 40.28].
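Example 5 in Python, using the tabulated critical value 2.201 rather than a distribution function:

```python
from math import sqrt

n, xbar, s2, mu0 = 12, 39.4, 1.92, 38.0
se = sqrt(s2 / n)                 # standard error, 0.4
t = (xbar - mu0) / se             # observation from t(11) under H0
t_crit = 2.201                    # upper 2.5% point of t(11), NCEST Table 10

lo, hi = xbar - t_crit * se, xbar + t_crit * se
print(round(t, 2))                 # 3.5, beyond 2.201: significant at 5%
print(round(lo, 2), round(hi, 2))  # 38.52 40.28
```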


7.2.2 Paired Comparison Test

In this case we are interested in the difference between two methods or properties where the observations occur naturally in pairs and taking the difference of the paired observations is valid. It is not possible to pair arbitrarily.

Example 6. Consider an experiment to compare the effects of two sleeping drugs A and B. There are 10 subjects and each subject receives treatment with each of the two drugs (the order of treatment being randomised). The number of hours slept by each subject is recorded. Is there any difference between the effects of the two drugs?

Subject:              1    2    3    4    5    6     7     8    9     10
Hours slept using A:  9.9  8.8  9.1  8.1  7.9  12.4  13.5  9.6  12.6  11.4
Hours slept using B:  8.7  6.4  7.8  6.8  7.9  11.4  11.7  8.8  8.0   10.0
Difference (A−B) x:   1.2  2.4  1.3  1.3  0.0  1.0   1.8   0.8  4.6   1.4

The paired sample data have been reduced to a single sample of differences. This will tend to cancel out any subject effect, assuming that the effect of the drug is additive. Assume the x values to be normally distributed with mean μ.

Σxi = 15.8, x̄ = 1.58, Σxi² = 38.58, s² = 1.513.

H0: μ = 0 against H1: μ ≠ 0.

Under H0,

t = (1.58 − 0)/√(1.513/10) = 4.06

is an observation from the t-distribution with 9 degrees of freedom. From Table 10 we have P(|T| > 2.262) = 0.05, so the critical region is |t| > 2.262 (2.5% in each tail). The test is significant at the 5% level.

The upper 0.5% point of t9 is 3.250, so the test is also significant at the 1% level. We are thus confident that there is a difference between the drugs, in particular that drug A induces more sleep than drug B on average. (Or p-value = P(|T| > 4.06) = 2(1 − P(T ≤ 4.06)) = 2(1 − 0.9986) ≈ 0.3%, from Table 9.)

A 95% confidence interval for the unknown mean difference is

(x̄ − 2.262 √(1.513/10), x̄ + 2.262 √(1.513/10)),

that is, [0.70, 2.46].
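The paired analysis of Example 6 reduces to a one-sample t-test on the differences, which can be sketched as:

```python
from statistics import mean, variance

a = [9.9, 8.8, 9.1, 8.1, 7.9, 12.4, 13.5, 9.6, 12.6, 11.4]
b = [8.7, 6.4, 7.8, 6.8, 7.9, 11.4, 11.7, 8.8, 8.0, 10.0]
d = [x - y for x, y in zip(a, b)]       # within-subject differences A - B

n = len(d)
se = (variance(d) / n) ** 0.5           # variance() is the sample variance s^2
t = mean(d) / se                        # observation from t(9) under H0: mu = 0
t_crit = 2.262                          # upper 2.5% point of t(9)

lo, hi = mean(d) - t_crit * se, mean(d) + t_crit * se
print(round(t, 2), round(lo, 2), round(hi, 2))   # 4.06 0.7 2.46
```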


7.2.3 Comparison of Two Means (small samples)

Suppose we have a random sample of size n1 (small, < 30) with sample mean x̄1 and sample variance s1² from a normal or approximately normal distribution with unknown mean μ1 and unknown variance σ², and similarly a random sample of size n2 with x̄2 and s2² from a distribution with unknown mean μ2 and the same variance σ². Note: the unknown population variances are assumed equal.

We wish to test H0: μ1 − μ2 = 0 against H1: μ1 − μ2 ≠ 0.

First, estimate σ² by the pooled sample variance

s² = ((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2).

Under H0,

T = (X̄1 − X̄2)/√(S²(1/n1 + 1/n2)) ~ t(n1 + n2 − 2).

The critical region lies in both tails, with critical points ±t* (2.5% in each tail at the 5% level).

A 100(1 − α)% confidence interval for μ1 − μ2 is obtained from

P(−t* ≤ (X̄1 − X̄2 − (μ1 − μ2))/√(S²(1/n1 + 1/n2)) ≤ t*) = 1 − α,

that is,

(x̄1 − x̄2) ± t* √(s²(1/n1 + 1/n2)), where P(T > t*) = α/2.


Example 7. Two methods of oxidation are used in an industrial process. Repeated measurements of the oxidation time are made to test the hypothesis that the oxidation time of method 2 is longer than that of method 1 on average.

            Sample size   Sample mean   Sample variance
Method 1    9             41.3          20.7
Method 2    8             48.9          34.2

We wish to test H0: μ1 = μ2, i.e. μ1 − μ2 = 0, against H1: μ1 < μ2, i.e. μ1 − μ2 < 0.

We shall assume that the unknown population variances are equal. (This can be tested using an F-test; see 7.3.1.)

The pooled variance estimate is

s² = (8(20.7) + 7(34.2))/(8 + 7) = 27.

Under H0,

t = (41.3 − 48.9)/√(27(1/9 + 1/8)) = −3.01,

an observation from t15. The test is one-sided, with the critical region in the left-hand 5% tail, t < −1.753.

Since the p-value = P(T < −3.01) ≈ 0.005, the test is highly significant at the 5% level. We are confident that the oxidation time for method 2 is longer.

A one-sided 95% confidence bound for μ1 − μ2 is obtained from

P((X̄1 − X̄2 − (μ1 − μ2))/√(S²(1/9 + 1/8)) ≥ −1.753) = 95%,

that is,

P(μ1 − μ2 ≤ X̄1 − X̄2 + 1.753 √(S²(1/9 + 1/8))) = 95%.

That is, (μ1 − μ2) < −3.2, i.e. (μ2 − μ1) > 3.2.
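The pooled-variance computation of Example 7 can be sketched as:

```python
from math import sqrt

n1, xbar1, s1sq = 9, 41.3, 20.7    # method 1
n2, xbar2, s2sq = 8, 48.9, 34.2    # method 2

# Pooled estimate of the common variance, (n1+n2-2) degrees of freedom
s2 = ((n1 - 1) * s1sq + (n2 - 1) * s2sq) / (n1 + n2 - 2)
t = (xbar1 - xbar2) / sqrt(s2 * (1 / n1 + 1 / n2))
print(round(s2, 1), round(t, 2))   # 27.0 -3.01; compare with -1.753,
                                   # the lower 5% point of t(15)
```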


7.2.4 Test of Slope and Intercept in the Linear Regression Model

In the linear regression model, suppose that the responses are normally distributed. That is,

Yi ~ N(α + βxi, σ²), i = 1, ..., n.

The least squares estimator of β is

β̂ = Σ(xi − x̄)(Yi − Ȳ) / Σ(xi − x̄)²,

and (Chapter 5)

E(β̂) = β, Var(β̂) = σ² / Σ(xi − x̄)².

Since β̂ is a linear combination of normal random variables, β̂ is also normally distributed. That is,

β̂ ~ N(β, σ² / Σ(xi − x̄)²).

This can be used to test hypotheses such as H0: β = β0 (specified) against H1: β ≠ β0, and also to set up a c.i. for the unknown slope β.
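A sketch of the slope test, under the assumption that the error variance σ² is known (as in the normal-theory result above); the data and σ² below are invented for illustration:

```python
from math import sqrt
from statistics import NormalDist

# Invented data; sigma2 is an assumed known error variance.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
sigma2 = 0.04

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
beta_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx

# beta_hat ~ N(beta, sigma2/Sxx); test H0: beta = 0
z = beta_hat / sqrt(sigma2 / sxx)
p = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(beta_hat, 2), z > 1.96)   # 1.99 True
```

When σ² is unknown and estimated from the residuals, the normal distribution above is replaced by a t-distribution with (n − 2) degrees of freedom.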

7.3 Tests Based on the F-distribution

The probability density function of the F-distribution with (ν1, ν2) degrees of freedom is

f(w) = (ν1/ν2)^(ν1/2) w^(ν1/2 − 1) / [B(ν1/2, ν2/2) (1 + ν1 w/ν2)^((ν1+ν2)/2)], w > 0,

where B(a, b) = Γ(a)Γ(b)/Γ(a + b) is the beta function and ν1, ν2 are positive integers.

Notes:

(i) For ν1 ≥ 3 the probability density function is unimodal and not symmetrical.

(ii) Given two independent random samples of sizes n1 and n2 from N(μ1, σ1²) and N(μ2, σ2²) respectively, the ratio

(S1²/σ1²)/(S2²/σ2²) ~ F(n1 − 1, n2 − 1),

where S1² and S2² are the sample variance random variables.

(iii) NCEST Tables 12(a)-12(f) tabulate percentage points for the right-hand tail only. Given P% (P = 10, 5, 2.5, 1, 0.5, 0.1), ν1 and ν2, read w(P), where P(F > w(P)) = P%.

Suppose we wish to find the lower percentage point wL(P) for the F(ν1, ν2) distribution. First find the upper percentage point wU(P) for F(ν2, ν1), that is, with the degrees of freedom interchanged. Then wL(P) = 1/wU(P).

Linear interpolation in ν1 or ν2 will be sufficient except when either ν1 ≥ 12 or ν2 ≥ 40, in which case harmonic interpolation should be used. (See the example later.)

7.3.1 Comparison of Two Variances

Suppose we have two random samples of sizes n1 and n2, with sample variances s1² and s2², from two independent normal populations with variances σ1² and σ2². We wish to test

H0: σ1²/σ2² = k0 (specified; usually k0 = 1, i.e. σ1² = σ2²)

against H1: σ1²/σ2² ≠ k0.

Under H0, the test statistic

F = S1²/(k0 S2²) ~ F(n1 − 1, n2 − 1).

The critical region lies in both tails of the F-distribution. (For H1: σ1²/σ2² > k0, use the right-hand tail only; for H1: σ1²/σ2² < k0, use the left-hand tail only.)

A 100(1 − α)% confidence interval for the variance ratio σ1²/σ2² is obtained from

P(wL ≤ (S1²/σ1²)/(S2²/σ2²) ≤ wU) = 1 − α,

that is,

P(S1²/(wU S2²) ≤ σ1²/σ2² ≤ S1²/(wL S2²)) = 1 − α,

so the interval is

[s1²/(wU s2²), s1²/(wL s2²)].

Example 8. We wish to compare the precisions of two technicians in titrations of CaCO3 content of raw meal. The following results were obtained:

1st technician: n1 = 31, s1² = 0.0388
2nd technician: n2 = 25, s2² = 0.0177

We wish to test H0: σ1² = σ2² against H1: σ1² ≠ σ2².

Under H0, the test statistic value

F = s1²/s2² = 0.0388/0.0177 = 2.19

is an observation from F(30, 24), with the critical region in both tails (2.5% in each).

In Table 12(c) there is no tabulated value for F(30, 24). We use harmonic interpolation in ν1, that is, linear interpolation in 1/ν1 or a multiple of 1/ν1, here 120/ν1:

Upper 2.5% point:   F(24, 24): 2.269   F(30, 24): wU   F(∞, 24): 1.935
120/ν1:             5                  4               0

wU = 1.935 + (4/5)(2.269 − 1.935) = 2.202.

For wL, we first find the upper 2.5% point of F(24, 30), which is 2.136. Then wL = 1/2.136 = 0.468.

The observed value of the test statistic, 2.19, does not fall in the critical region. We have no convincing evidence that the precisions are different.

OR: under H0, the test statistic F = s2²/s1² is an observation from F(24, 30), with critical region below 1/2.202 = 0.454 and above 2.136; again the observed value does not fall in the critical region.

In practice we calculate

F = (larger sample variance)/(smaller sample variance),

which will fall in the right-hand tail of the F-distribution. It is still a two-sided test, but we don't have to calculate the lower percentage point.

From

P(0.468 ≤ (S1²/σ1²)/(S2²/σ2²) ≤ 2.202) = 0.95,

that is,

P(S1²/(2.202 S2²) ≤ σ1²/σ2² ≤ S1²/(0.468 S2²)) = 0.95,

a 95% c.i. for σ1²/σ2² is [0.996, 4.682].
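Example 8 in code, using the interpolated percentage points from the tables:

```python
n1, s1sq = 31, 0.0388    # 1st technician
n2, s2sq = 25, 0.0177    # 2nd technician

f = s1sq / s2sq                     # observation from F(30, 24) under H0
w_upper, w_lower = 2.202, 0.468     # interpolated 2.5% points (Table 12(c))
significant = f > w_upper or f < w_lower
print(round(f, 2), significant)     # 2.19 False: not significant at 5%

# 95% c.i. for the variance ratio sigma1^2/sigma2^2
lo, hi = f / w_upper, f / w_lower
print(round(lo, 2), round(hi, 2))   # 1.0 4.68 (the notes quote [0.996, 4.682],
                                    # computed from the rounded ratio 2.19)
```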


7.3.2 Comparison of t Means

Suppose we have t samples of sizes n1, n2, ..., nt from independent normal distributions N(μ1, σ²), N(μ2, σ²), ..., N(μt, σ²) respectively, where μ1, μ2, ..., μt and σ² are unknown.

Sample i (i = 1, ..., t): data yi1, yi2, ..., y_ini; total Ti = Σj yij; mean ȳi = Ti/ni; variance si².

Let n = Σ ni, G = Σi Σj yij = Σi Ti (the grand total) and ȳ = G/n (the overall mean).

We wish to test H0: μ1 = μ2 = ... = μt against H1: the means are not all equal.

An estimate of σ² is

s² = SSE/(n − t), where SSE = Σi Σj (yij − ȳi)²

is the within-samples sum of squares (also called the error or residual sum of squares), with (n − t) degrees of freedom.

Also define

SST = Σi ni (ȳi − ȳ)².

SST is the sum of squares of deviations of the sample means from the overall mean and is referred to as the between-samples sum of squares or treatment sum of squares, with (t − 1) degrees of freedom.

Now consider the ratio

[SST/(t − 1)] / [SSE/(n − t)].

Under H0 this is an observation from F(t − 1, n − t). When H0 does not hold, some of the (ȳi − ȳ)² terms will tend to be larger than expected, resulting in a larger value of SST than expected and hence a larger value of the test statistic. So we set up the critical region in the right-hand tail only of the F-distribution. If the test statistic falls in this region, the null hypothesis of equality of means is rejected.

In practice we do not compute SSE and SST in the forms given above. First compute the total corrected sum of squares:

SSTC = Σi Σj (yij − ȳ)² = Σi Σj yij² − 2ȳ Σi Σj yij + nȳ² = Σi Σj yij² − nȳ²,

that is,

SSTC = Σi Σj yij² − G²/n.  (*)

Also,

SSTC = Σi Σj [(yij − ȳi) + (ȳi − ȳ)]²
     = Σi Σj (yij − ȳi)² + 2 Σi Σj (yij − ȳi)(ȳi − ȳ) + Σi Σj (ȳi − ȳ)²,

where the cross term vanishes:

Σi Σj (yij − ȳi)(ȳi − ȳ) = Σi (ȳi − ȳ) Σj (yij − ȳi) = Σi (ȳi − ȳ)(Ti − ni ȳi) = 0.

Hence SSTC = SSE + SST. We usually calculate SSTC using (*) and SST using

SST = Σi ni ȳi² − nȳ² = Σi Ti²/ni − G²/n,  (**)

and then obtain SSE by subtraction, that is, SSE = SSTC − SST.

We set out the calculations in an analysis of variance table:

Source of variation   df     SS      MS (mean square)     F ratio
Between samples       t−1    SST     SST/(t−1)            [SST/(t−1)]/[SSE/(n−t)]
Within samples        n−t    SSE     SSE/(n−t) (= s²)
Total                 n−1    SSTC

We can also set up confidence intervals for an unknown mean μi or a difference (μi − μj) using the t-distribution and s² with (n − t) degrees of freedom. We can test pairs of means for equality using a t-test.


Example 9. We wish to test if there is any difference in the average yield of a particular crop when treated with four different fertilisers: 1. Straw (S), 2. Straw + Nitrate (S+N), 3. Straw + Phosphate (S+P), 4. Straw + Nitrate + Phosphate (S+N+P). In particular we are interested in any difference between fertilisers 3 and 4. A properly designed experiment was carried out with the following results:

Fertiliser   Yields yij              ni   Total Ti   Mean ȳi   Ti²/ni
1. S         38.3 38.5 38.7 41.2     4    156.7      39.18     6138.7
2. S+N       38.8 43.4 38.9 39.1     4    160.2      40.05     6416.0
3. S+P       40.3 42.6 41.1 40.6     4    164.6      41.15     6773.3
4. S+N+P     62.7 61.0 54.8 51.7     4    230.2      57.55     13248.0
Totals                               16   711.7                32576.0

With Σi Σj yij² = 32679.9:

G²/n = 711.7²/16 = 31657.3
SSTC = 32679.9 − 31657.3 = 1022.6
SST = 32576.0 − 31657.3 = 918.7
SSE = 1022.6 − 918.7 = 103.9

Analysis of variance table:

Source of variation    df   SS       MS      F ratio
Between fertilisers    3    918.7    306.2   35.4
Within samples         12   103.9    8.66
Total                  15   1022.6

We test H0: μ1 = μ2 = μ3 = μ4 against H1: the means are not all equal, at the 5% level.

Under H0 the F ratio 35.4 is an observation from F(3, 12). The value 35.4 falls in the critical region; if W ~ F(3, 12), then from Table 12(f), P(W > 35.4) < 0.001. Hence the test is very highly significant at the 5% level, i.e. we are very confident that the fertilisers give different mean yields.

A 95% confidence interval for μi is based on the percentage points of the t12 distribution and s² = 8.66:

ȳi ± 2.179 √(s²/ni).

Fertiliser   Standard error √(8.66/4)   95% confidence interval
1            1.47                       (35.97, 42.39)
2            1.47                       (36.84, 43.26)
3            1.47                       (37.94, 44.36)
4            1.47                       (54.34, 60.76)

Clearly fertiliser 4 is different from 1, 2 and 3. To investigate fertilisers 3 and 4, consider H0: μ3 = μ4 against H1: μ3 ≠ μ4.

Under H0,

t = (ȳ3 − ȳ4 − 0)/√(s²(1/4 + 1/4)) = −7.88

is an observation from t12, with critical region |t| > 2.179 (2.5% in each tail). The test is very highly significant at the 5% level, that is, we are very confident that fertilisers 3 and 4 are different on average: from examination of the sample means, fertiliser 4 produces a higher yield on average than fertiliser 3.
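The whole of Example 9, including the follow-up comparison of fertilisers 3 and 4, can be sketched as:

```python
data = {
    "S":     [38.3, 38.5, 38.7, 41.2],
    "S+N":   [38.8, 43.4, 38.9, 39.1],
    "S+P":   [40.3, 42.6, 41.1, 40.6],
    "S+N+P": [62.7, 61.0, 54.8, 51.7],
}

k = len(data)                                    # number of samples (t = 4)
n = sum(len(v) for v in data.values())           # total observations (16)
G = sum(sum(v) for v in data.values())           # grand total (711.7)

ss_tc = sum(y * y for v in data.values() for y in v) - G * G / n          # (*)
ss_t = sum(sum(v) ** 2 / len(v) for v in data.values()) - G * G / n      # (**)
ss_e = ss_tc - ss_t                              # by subtraction
f = (ss_t / (k - 1)) / (ss_e / (n - k))
print(round(ss_t, 1), round(ss_e, 1), round(f, 1))   # 918.7 103.9 35.4

# Pairwise t-test of fertilisers 3 and 4, using s^2 with n-k = 12 df
s2 = ss_e / (n - k)
t34 = (sum(data["S+P"]) / 4 - sum(data["S+N+P"]) / 4) / (s2 * (1/4 + 1/4)) ** 0.5
print(round(t34, 2))                             # -7.88, compare with +-2.179
```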


7.4 Tests Based on the χ² Distribution

Notes:

(i) The χ² distribution with ν degrees of freedom takes positive values only, and its probability density function is not symmetrical.

(ii) Let S² be the sample variance random variable of a random sample of size n from N(μ, σ²). Then V = (n − 1)S²/σ² ~ χ²(n−1).

(iii) NCEST Table 8 tabulates χ² percentage points. Given ν and P%, where P = 99.95, 99.9, ..., 60% (page 40) or P = 50, ..., 0.05% (page 41), read v(P) where P(V > v(P)) = P%.


7.4.1 Single Variance σ²

Suppose we have a random sample of size n from a normal distribution with unknown variance σ². We wish to test

H0: σ² = σ0² (specified) against H1: σ² ≠ σ0² (or a one-sided alternative)

at a given significance level α.

Under H0, the test statistic V = (n − 1)S²/σ0² ~ χ²(n−1). The critical region lies in both tails of the χ² distribution. (For H1: σ² > σ0², use the right-hand tail only. For H1: σ² < σ0², use the left-hand tail only.)

A 100(1 − α)% confidence interval for σ² is obtained from

P(vL ≤ (n − 1)S²/σ² ≤ vU) = 1 − α,

that is,

P((n − 1)S²/vU ≤ σ² ≤ (n − 1)S²/vL) = 1 − α.

Therefore a 100(1 − α)% confidence interval for σ² is

[(n − 1)s²/vU, (n − 1)s²/vL].

Example 10. The precision of a measuring process is stated to be σ² = 0.025. A random sample of 30 measurements has sample variance s² = 0.032. Is the above statement justified?

H0: σ² = 0.025 against H1: σ² ≠ 0.025.

Assuming the measurements are normally distributed, then under H0 the test statistic value

(29 × 0.032)/0.025 = 37.12

is an observation from χ²(29).

The test is not significant at the 5% level (the lower and upper 2.5% points of χ²(29) are 16.05 and 45.72). We have insufficient evidence for rejecting H0 in favour of H1, that is, no reason to doubt the statement.

(In this case we wouldn't normally compute a c.i. for σ² since the test was not significant. However, a 95% c.i. could be computed as follows: from

P(16.05 ≤ 29S²/σ² ≤ 45.72) = 0.95, that is, P(29S²/45.72 ≤ σ² ≤ 29S²/16.05) = 0.95,

the 95% c.i. for σ² is [29 × 0.032/45.72, 29 × 0.032/16.05] = [0.0203, 0.0578].)
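Example 10's statistic and confidence interval as a sketch, using the tabulated χ²(29) percentage points:

```python
n, s2, sigma0_sq = 30, 0.032, 0.025
v = (n - 1) * s2 / sigma0_sq          # 37.12, an observation from chi^2(29)
v_lo, v_hi = 16.05, 45.72             # 2.5% points of chi^2(29), Table 8
print(round(v, 2), v_lo < v < v_hi)   # 37.12 True: not significant at 5%

# 95% c.i. for sigma^2
lo, hi = (n - 1) * s2 / v_hi, (n - 1) * s2 / v_lo
print(round(lo, 4), round(hi, 4))     # 0.0203 0.0578
```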


7.4.2 Goodness-of-fit Test for Classified Data

Suppose that a sample of n observations is classified into k mutually exclusive and exhaustive classes, that is, each observation belongs to one and only one class. Let Oi be the observed frequency in the ith class, with Σ Oi = n. Consider a null hypothesis H0 which specifies the probabilities of belonging to the k classes. Under H0, let Ei be the expected frequency in the ith class, with Σ Ei = n. Under H0, the goodness-of-fit test statistic

χ² = Σ (Oi − Ei)²/Ei

is approximately distributed χ²(ν), where ν = k − 1 − (number of independent parameters estimated from the data). The critical region lies in the right-hand tail only of the χ² distribution, since if H0 is not true we would expect the Ei's to be quite different from the Oi's, resulting in a larger than expected value of χ². (A small χ² results when the Ei's and Oi's are in good agreement, which is certainly not a reason to reject H0.)

Notes:

(i) The exact distribution of χ² is discrete and is approximated by the continuous χ² distribution. For this approximation to be reasonable, Ei should be > 5 for each class. If not, combine adjacent classes, with a resultant loss of one or more degrees of freedom.

(ii) In tests with only 1 degree of freedom, a better approximation is obtained by including Yates' continuity correction:

χ² = Σ (|Oi − Ei| − ½)²/Ei ~ approx. χ²(1).

Example 11. The geneticist Mendel evolved the theory that for a certain type of pea, the characteristics Round and Yellow (R+Y), Round and Green (R+G), Angular and Yellow (A+Y), Angular and Green (A+G) occur in the ratio 9:3:3:1. He classified 556 seeds and the observed frequencies were 315, 108, 101 and 32. Test Mendel's theory on the basis of these data.

Under H0, the class probabilities are 9/16, 3/16, 3/16 and 1/16, giving expected frequencies Ei of 312.75, 104.25, 104.25 and 34.75.

Under H0, χ² = 0.47 is an observation from χ²(3). The test is not significant at the 5% level, that is, there is no evidence for supporting the rejection of H0. (That is, the data are in agreement with the theory.)
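The goodness-of-fit statistic for Mendel's data in Example 11 can be computed as:

```python
observed = [315, 108, 101, 32]            # R+Y, R+G, A+Y, A+G
ratios = [9, 3, 3, 1]                     # Mendel's 9:3:3:1 theory
n = sum(observed)                         # 556 seeds

expected = [n * r / 16 for r in ratios]   # 312.75, 104.25, 104.25, 34.75
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2, 2))   # 0.47, an observation from chi^2(3) under H0;
                        # well below 7.81, the upper 5% point of chi^2(3)
```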


Example 12. In a random sample of 120 graduates, 78 spent 3 years at university and 42 more than 3 years. Test the hypothesis that 70% obtain a degree in 3 years. (See 7.1.3.1, where this was tested as a proportion using the normal distribution.)

H0: P(degree in 3 years) = p = 0.7 against H1: p ≠ 0.7.

                      Oi    Ei
Degree in 3 years     78    84
More than 3 years     42    36
Total                 120   120

Degrees of freedom = 2 − 1 = 1, therefore use Yates' continuity correction:

χ² = (|78 − 84| − ½)²/84 + (|42 − 36| − ½)²/36 = 1.2.

The test is not significant at the 5% level. No evidence to support the rejection of H0. (Alternative method: use the normal approximation, as in 7.1.3.1.)


Fitting and Testing the Goodness-of-fit of a Probability Distribution to Classified Data

This consists of the following steps:

- Decide which distribution is applicable.
- Find parameter values from H0 and/or by estimation.
- Calculate the probability pi for the ith class.
- The expected value in the ith class is Ei = n pi, where n is the number of observations in the sample.
- Carry out the χ² goodness-of-fit test, amalgamating adjacent classes if necessary (i.e. to make all Ei's at least 5).

For discrete distributions, the classes occur in a straightforward manner and the calculation of pi is based on evaluating the probability function at specified values. (Example 13: yeast cell data.)

For continuous distributions, the classes used are only one possible division of the real line into classes. If (ci, ci+1] defines the ith class, then

pi = ∫ from ci to ci+1 of f(x | θ1, ..., θk) dx.

Different divisions may lead to different values of χ². For this reason we often use other methods for testing the goodness-of-fit of a continuous distribution.


Example 14. Fitting and testing the goodness-of-fit of a Poisson distribution to the yeast cell data. (See Chapter 3, Grouped Data, for a description of the experiment and data.)

Cells per square i   Observed Oi   Probability pi   Expected Ei   (Oi − Ei)²/Ei
0                    0             0.0093           3.7           } 0.1
1                    20            0.0434           17.4          }
2                    43            0.1016           40.6          0.1
3                    53            0.1585           63.4          1.7
4                    86            0.1855           74.2          1.9
5                    70            0.1736           69.4          0.0
6                    54            0.1354           54.2          0.0
7                    37            0.0905           36.2          0.0
8                    18            0.0530           21.2          0.5
9                    10            0.0276           11.0          0.1
10                   5             0.0129           5.2           } 0.0
11                   2             0.0055           2.2           }
12                   2             0.0021           0.8           }
13+                  0             0.0011           0.4           }
Total                400           1.0000           399.9         4.4

A possible model for X, the number of yeast cells in a square, is a Poisson distribution. We thus want to test H0: the data arise from a Poisson distribution against H1: the data do not arise from a Poisson distribution, at a 5% significance level.

1. Estimate the Poisson parameter λ by the sample mean: x̄ = (1/400) Σ i Oi = 4.68.
2. Calculate the Poisson probabilities pi = P(X = i) = (4.68)^i e^(−4.68)/i! for i = 0, 1, ..., 12, and P(X ≥ 13) = 1 − Σ from i=0 to 12 of P(X = i).
3. Calculate the expected counts Ei = 400 pi.
4. Combine classes 0 and 1, and classes 10 to 13 (as braced in the table), so that all expected values exceed 5; combine the corresponding observed counts.
5. χ² = 4.4 with 10 − 1 − 1 = 8 degrees of freedom (since 10 classes were used in computing χ² and one parameter was estimated from the data). The upper 5% point of the χ² distribution with 8 degrees of freedom is 15.51. Hence we do not reject H0 in favour of H1, that is, we conclude that the Poisson distribution model provides an adequate fit to the yeast cell data.
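The five steps of Example 14 can be sketched end to end (exact Poisson probabilities are used here, so the statistic differs slightly from the table, which works with rounded Ei):

```python
from math import exp, factorial

observed = [0, 20, 43, 53, 86, 70, 54, 37, 18, 10, 5, 2, 2, 0]  # i = 0..13
n = sum(observed)                                     # 400 squares
lam = sum(i * o for i, o in enumerate(observed)) / n  # step 1: x-bar = 4.68

# Step 2: Poisson probabilities, with the tail P(X >= 13) as the last class
p = [lam ** i * exp(-lam) / factorial(i) for i in range(13)]
p.append(1 - sum(p))
expected = [n * pi for pi in p]                       # step 3

# Step 4: combine classes 0-1 and 10-13 so every expected count exceeds 5
obs_c = [observed[0] + observed[1]] + observed[2:10] + [sum(observed[10:])]
exp_c = [expected[0] + expected[1]] + expected[2:10] + [sum(expected[10:])]

# Step 5: the statistic, on 10 - 1 - 1 = 8 degrees of freedom
chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_c, exp_c))
print(round(lam, 2), len(obs_c), round(chi2, 1))      # 4.68 10 4.4
```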


7.4.3 Amalgamation of χ² Results

Theoretical result: if V1, ..., Vr are independent χ² random variables with degrees of freedom ν1, ..., νr respectively, then the random variable V = V1 + ... + Vr ~ χ²(ν), where ν = ν1 + ... + νr.

Example 15. Suppose 4 independent experiments are performed to test a null hypothesis H0, and the goodness-of-fit test statistic χ² is calculated in each case. Suppose also that the experimental results themselves cannot be combined.

The individual tests of H0 (experiments 1 to 4) are not significant at the 5% level. Using the additive property, however, under H0 the sum of the four χ² values, 50.2, is an observation from χ²(30), where 30 is the sum of the individual degrees of freedom. The upper 5% point of χ²(30) is 43.77. Hence the combined test is significant at the 5% level, so we have sufficient evidence for rejecting H0 in favour of H1.

