Chi-Square Tests and the ANOVA

Goodness of Fit

Chi-Square Distributions
Several important statistical tests use a probability distribution known as chi-square, denoted $\chi^2$.
[Figure: shape of the chi-square distribution for 1 or 2 d.f. and for 3 or more d.f.]

$\chi^2$ is a family of distributions. The graph of the distribution depends on the number of degrees of freedom (number of free choices) in a statistical experiment. The distributions are skewed right and are not symmetric. The value of $\chi^2$ is greater than or equal to 0.
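As a rough illustration of how the shape depends on the degrees of freedom, the sketch below evaluates the standard chi-square density $f(x) = x^{k/2-1} e^{-x/2} / (2^{k/2}\,\Gamma(k/2))$ at a few points. The helper name `chi2_pdf` and the evaluation points are assumptions chosen for illustration; they do not come from the slides.

```python
import math

def chi2_pdf(x: float, k: int) -> float:
    """Density of the chi-square distribution with k degrees of freedom at x > 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

# With 1 or 2 d.f. the density only decreases as x grows;
# with 3 or more d.f. it rises to a peak at x = k - 2 and then falls.
for k in (1, 2, 3, 5):
    values = [round(chi2_pdf(x, k), 4) for x in (0.5, 1.0, 3.0, 6.0)]
    print(f"k = {k}: {values}")
```

In every case the density is zero for negative values and has a long right tail, which matches the "skewed right" description above.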

Chi-Square Test for Goodness-of-Fit


Example: A social service organization claims 50% of all marriages are the first marriage for both bride and groom, 12% are first for the bride only, 14% for the groom only and 24% a remarriage for both.

First Marriage       %
Bride and Groom     50
Bride only          12
Groom only          14
Neither             24

H0: The distribution of first-time marriages is 50% for both bride and groom, 12% for the bride only, 14% for the groom only, and 24% remarriages for both.
Ha: The distribution of first-time marriages differs from the claimed distribution.

Goodness-of-Fit Test
Observed frequency, O, is the frequency of the category found in the sample.
Expected frequency, E, is the calculated frequency for the category using the specified distribution: $E_i = n p_i$, where n is the sample size and $p_i$ is the claimed probability of category i.

In a survey of 103 married couples, find the expected number E in each category.

First Marriage       %     E = np
Bride and Groom     50     103(0.50) = 51.50
Bride only          12     103(0.12) = 12.36
Groom only          14     103(0.14) = 14.42
Neither             24     103(0.24) = 24.72
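The expected frequencies can be reproduced with a few lines of Python. This is a minimal sketch; the variable names are illustrative and the only inputs are the claimed percentages and the sample size of 103 given above.

```python
# Claimed proportions from the example, as fractions of all marriages.
claimed = {
    "Bride and Groom": 0.50,
    "Bride only": 0.12,
    "Groom only": 0.14,
    "Neither": 0.24,
}
n = 103  # number of couples surveyed

# E_i = n * p_i for each category.
expected = {category: n * p for category, p in claimed.items()}

for category, e in expected.items():
    print(f"{category:16s} {e:6.2f}")
```

The expected frequencies are generally not whole numbers, and they always add up to the sample size.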

Chi-Square Test
If the observed frequencies are obtained from a random sample and each expected frequency is at least 5, the sampling distribution for the goodness-of-fit test is a chi-square distribution with $k - 1$ degrees of freedom (where k = the number of categories).

The test statistic is:

$\chi^2 = \sum \dfrac{(O - E)^2}{E}$

O = observed frequency in each category
E = expected frequency in each category

A social service organization claims 50% of all marriages are the first marriage for both bride and groom, 12% are first for the bride only, 14% for the groom only, and 24% a remarriage for both. The results of a study of 103 randomly selected married couples are listed in the table. Test the distribution claimed by the agency. Use $\alpha = 0.01$.

First Marriage       f
Bride and Groom     55
Bride only          12
Groom only          12
Neither             24

1. Write the null and alternative hypothesis.
H0: The distribution of first-time marriages is 50% for both bride and groom, 12% for the bride only, 14% for the groom only, and 24% remarriages for both.
Ha: The distribution of first-time marriages differs from the claimed distribution.

2. State the level of significance.
$\alpha = 0.01$

3. Determine the sampling distribution.
A chi-square distribution with 4 - 1 = 3 d.f.

4. Find the critical value.
$\chi^2_0 = 11.34$

5. Find the rejection region.
$\chi^2 > 11.34$
[Figure: chi-square distribution with 3 d.f., rejection region to the right of 11.34.]

6. Find the test statistic.

                     %      O       E      (O - E)^2   (O - E)^2/E
Bride and groom     50     55     51.50     12.2500      0.2379
Bride only          12     12     12.36      0.1296      0.0105
Groom only          14     12     14.42      5.8564      0.4061
Neither             24     24     24.72      0.5184      0.0210
Total              100    103    103.00                  0.6755

$\chi^2 = 0.6755$

7. Make your decision.
The test statistic 0.6755 does not fall in the rejection region, so fail to reject H0.

8. Interpret your decision.
There is not enough evidence at the 1% level to reject the claimed distribution of first-time marriages. The sample is consistent with the distribution specified by the organization.
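The goodness-of-fit calculation can be checked with a short script. This is a sketch under stated assumptions: the function name `chi_square_statistic` is hypothetical, and the critical value 11.34 is taken from the example as a table lookup rather than computed.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [55, 12, 12, 24]             # sample counts from the study
proportions = [0.50, 0.12, 0.14, 0.24]  # distribution claimed under H0
n = sum(observed)                       # 103 couples
expected = [n * p for p in proportions]

chi2 = chi_square_statistic(observed, expected)
critical_value = 11.34  # table value for 3 d.f. quoted in the example

# The rejection region lies to the right of the critical value.
reject_h0 = chi2 > critical_value
print(f"chi-square = {chi2:.4f}, reject H0: {reject_h0}")
```

Keeping the statistic in its own function makes it reusable for any list of observed and expected counts of equal length.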

Section 10.2

Independence

Test for Independence


A chi-square test may be used to determine whether two variables (e.g., gender and job performance) are independent. Two variables are independent if the occurrence of one of the variables does not affect the occurrence of the other.
The following contingency table reflects the gender and job performance evaluation of 220 accountants.

            Low   Average   Superior   Total
Male         22      81         9       112
Female       14      75        19       108
Total        36     156        28       220

Expected Values
Assuming the variables are independent, the expected value of each cell is:

$E_{r,c} = \dfrac{(\text{sum of row } r)(\text{sum of column } c)}{\text{sample size}}$

$E_{1,1} = (112)(36)/220 = 18.33$

$E_{1,2} = (112)(156)/220 = 79.42$

All other expected values can be found by subtracting from the total of the row or the column.

            Low     Average   Superior   Total
Male       18.33     79.42     14.25      112
Female     17.67     76.58     13.75      108
Total      36       156        28         220
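The expected values for every cell of the contingency table can be computed from the row and column totals. The following is a minimal sketch using the observed counts for the 220 accountants; the variable names are illustrative.

```python
observed = [
    [22, 81, 9],   # Male:   Low, Average, Superior
    [14, 75, 19],  # Female: Low, Average, Superior
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Under independence, E = (row total)(column total) / (sample size).
expected = [[r * c / n for c in col_totals] for r in row_totals]

for label, row in zip(("Male", "Female"), expected):
    print(label, [round(e, 2) for e in row])
```

Each row and column of the expected table keeps the same total as the observed table, which is why the remaining cells can also be found by subtraction.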

Sampling Distribution
The sampling distribution is a $\chi^2$ distribution with degrees of freedom equal to: (Number of rows - 1)(Number of columns - 1)

Example: Find the sampling distribution for a test of independence that has a contingency table of 4 rows and 3 columns.
The sampling distribution is a $\chi^2$ distribution with (4 - 1)(3 - 1) = 3 · 2 = 6 d.f.

Application
The following table reflects the gender and job performance evaluation of 220 accountants. Test the claim that gender and job performance are independent. Use $\alpha = 0.05$.

            Low   Average   Superior   Total
Male         22      81         9       112
Female       14      75        19       108
Total        36     156        28       220

1. Write the null and alternative hypothesis.
H0: Gender and job performance are independent.
Ha: Gender and job performance are not independent.

2. State the level of significance.
$\alpha = 0.05$

3. Determine the sampling distribution.
Since there are 2 rows and 3 columns, the sampling distribution is a chi-square distribution with (2 - 1)(3 - 1) = 2 d.f.

4. Find the critical value.
$\chi^2_0 = 5.99$

5. Find the rejection region.
$\chi^2 > 5.99$
[Figure: chi-square distribution with 2 d.f., rejection region to the right of 5.99.]

6. Find the test statistic.

Cell                  O       E      (O - E)^2   (O - E)^2/E
Male, Low            22     18.33     13.49        0.74
Male, Average        81     79.42      2.50        0.03
Male, Superior        9     14.25     27.61        1.94
Female, Low          14     17.67     13.49        0.76
Female, Average      75     76.58      2.50        0.03
Female, Superior     19     13.75     27.61        2.01
Total               220    220.00                  5.51

$\chi^2 = 5.51$

7. Make your decision.
The test statistic, 5.51, does not fall in the rejection region, so fail to reject H0.

8. Interpret your decision.
There is not enough evidence at the 5% level to conclude that gender and job performance are dependent. The data give no basis for hiring accountants based on their gender.
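The whole test of independence can be sketched in Python. The function name `independence_statistic` is hypothetical. The critical value and P-value lines rely on a special case: with exactly 2 d.f. the chi-square right-tail area is $e^{-x/2}$, so the critical value is $-2\ln\alpha$. That shortcut does not hold for other degrees of freedom, where a table or a statistics library would be needed.

```python
import math

def independence_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under H0
            chi2 += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

observed = [[22, 81, 9], [14, 75, 19]]  # rows: Male, Female
alpha = 0.05

chi2, df = independence_statistic(observed)

# Closed forms below are valid only because df == 2.
critical_value = -2 * math.log(alpha)
p_value = math.exp(-chi2 / 2)

reject_h0 = chi2 > critical_value
print(f"chi-square = {chi2:.2f}, d.f. = {df}, critical value = {critical_value:.2f}")
print(f"reject H0: {reject_h0}")
```

The statistic is close to the critical value here, so the decision would change at a somewhat larger level of significance.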

Analysis of Variance

ANOVA
One-way analysis of variance (ANOVA) is a hypothesis testing technique that is used to compare means from three or more populations.
H0: $\mu_1 = \mu_2 = \dots = \mu_k$ (All population means are equal.)
Ha: At least one of the means is different from the others.

The variance is calculated in two different ways and the ratio of the two values is formed.

1. MSB, Mean Square Between, the variance between samples, measures the differences related to the treatment given to each sample.
2. MSW, Mean Square Within, the variance within samples, measures the differences related to entries within the same sample. The variance within samples is due to sampling error.

Mean Square Between


Each group is given a different treatment. The variation from the grand mean (mean of all values in all groups) is measured. The treatment (or factor) is the variable that distinguishes members of one sample from another. First calculate SSB (the sum of squares between samples) and then divide by $k - 1$, the degrees of freedom, where k is the number of treatments or factors.

Mean Square Within


Calculate SSW (the sum of squares within samples) and divide by $N - k$, the degrees of freedom, where N is the total number of entries in all samples.

If MSB is close in value to MSW, the variation is not attributed to different effects the different treatments have on the variable, and the ratio of the two measures (the F-ratio, F = MSB/MSW) is close to 1. If MSB is much greater than MSW, the variation is probably due to differences in the treatments or factors, and the F-ratio will be substantially greater than 1.
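The formulas themselves are not shown in the text above. The standard definitions, which agree with the worked tables later in this section, can be written as follows. The notation is an assumption made here: $n_i$, $\bar{x}_i$ and $s_i^2$ are the size, mean and variance of sample i, and $\bar{\bar{x}}$ is the grand mean.

```latex
SS_B = \sum_{i=1}^{k} n_i \left(\bar{x}_i - \bar{\bar{x}}\right)^2,
\qquad MS_B = \frac{SS_B}{k - 1}

SS_W = \sum_{i=1}^{k} (n_i - 1)\, s_i^2,
\qquad MS_W = \frac{SS_W}{N - k}

F = \frac{MS_B}{MS_W}
```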

Analysis of Variance
The table shows the annual amount spent on reading (in dollars) for a random sample of American consumers from four regions. At $\alpha = 0.10$, can you conclude that the mean annual amounts spent are different?

Northeast   Midwest   South   West
   308        246      103     223
    58        169      143     184
   141        246      164     221
   109        158      119     269
   220        167       99     199
   144         76      214     171
   316                 108     204

1. Write the null and alternative hypothesis.
H0: $\mu_1 = \mu_2 = \mu_3 = \mu_4$ (All population means are equal.)
Ha: At least one of the means is different from the others.

2. State the level of significance.
$\alpha = 0.10$

3. Determine the sampling distribution.
An F distribution with d.f.N = 3, d.f.D = 23

4. Find the critical value.
$F_0 = 2.34$

5. Find the rejection region.
$F > 2.34$
[Figure: F distribution with the rejection region of area 0.10 to the right of 2.34.]

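6. Find the test statistic.

Calculate the mean and variance for each sample.

Sample          n     Mean     Variance
1 Northeast     7    185.14    9838.66
2 Midwest       6    177.00    4050.05
3 South         7    135.71    1741.39
4 West          7    210.14    1020.80

Calculate the mean of all values: the grand mean is 4779/27 = 177.00.

Mean Square Between

Sample    Mean     n    (mean - grand mean)^2    n(mean - grand mean)^2
1        185.14    7           66.26                    463.8
2        177.00    6            0.00                      0.0
3        135.71    7         1704.86                  11934.0
4        210.14    7         1098.26                   7687.8

SSB = 463.8 + 0.0 + 11934.0 + 7687.8 = 20085.6
MSB = SSB/(k - 1) = 20085.6/3 = 6695.2

Mean Square Within

Sample    n      s^2       (n - 1)s^2
1         7    9838.66      59031.9
2         6    4050.05      20250.2
3         7    1741.39      10448.4
4         7    1020.80       6124.8

SSW = 59031.9 + 20250.2 + 10448.4 + 6124.8 = 95855.3
MSW = SSW/(N - k) = 95855.3/23 = 4167.6

F = MSB/MSW = 6695.2/4167.6 ≈ 1.61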

[Figure: F distribution with the rejection region of area 0.10 shaded in the right tail.]

7. Make your decision.

Since F ≈ 1.61 does not fall in the rejection region (F > 2.34), fail to reject the null hypothesis.

8. Interpret your decision.
There is not enough evidence at the 10% level to support the claim that the means are not all equal. The data do not show that mean expenses for reading differ among the four regions.
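The F-ratio for the reading-expense data can be recomputed from the raw values with the Python standard library. This is a sketch, not the method used to produce the slides: the variable names are illustrative, and the critical value 2.34 is taken from the example as a table lookup. Because it works from unrounded means and variances, the sums of squares differ slightly from hand calculations based on rounded values.

```python
from statistics import mean, variance

samples = {
    "Northeast": [308, 58, 141, 109, 220, 144, 316],
    "Midwest": [246, 169, 246, 158, 167, 76],
    "South": [103, 143, 164, 119, 99, 214, 108],
    "West": [223, 184, 221, 269, 199, 171, 204],
}

k = len(samples)                             # number of treatments
N = sum(len(v) for v in samples.values())    # total number of entries
grand_mean = sum(sum(v) for v in samples.values()) / N

# Between-sample and within-sample sums of squares.
ssb = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in samples.values())
ssw = sum((len(v) - 1) * variance(v) for v in samples.values())

msb = ssb / (k - 1)
msw = ssw / (N - k)
f_ratio = msb / msw

critical_value = 2.34  # table value for d.f.N = 3, d.f.D = 23 at alpha = 0.10
reject_h0 = f_ratio > critical_value
print(f"SSB = {ssb:.0f}, SSW = {ssw:.0f}, F = {f_ratio:.2f}, reject H0: {reject_h0}")
```

`statistics.variance` computes the sample variance (divisor n - 1), which is the quantity the within-sample sum of squares requires.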

Minitab Output
One-way Analysis of Variance

Analysis of Variance
Source    DF       SS      MS      F       P
Factor     3    20085    6695   1.61   0.215
Error     23    95857    4168
Total     26   115942

Using the P-value method, fail to reject the null hypothesis, since 0.215 > 0.10. There is not enough evidence to conclude that the mean amount spent on reading is different in different regions.
