Goodness of Fit
Chi-Square Distributions
Several important statistical tests use a probability distribution known as chi-square, denoted χ².
[Figure: chi-square distribution curves, one for 1 or 2 d.f. and one for 3 or more d.f.]
χ² is a family of distributions. The graph of the distribution depends on the number of degrees of freedom (number of free choices) in a statistical experiment. The distributions are skewed right, so they are not symmetric. The value of χ² is greater than or equal to 0.
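The dependence of the curve's shape on the degrees of freedom can be checked numerically. The sketch below is illustrative only: the function names `chi2_pdf` and `peak_location` are assumptions introduced here, and the density formula is the standard chi-square density evaluated for x > 0.

```python
import math

def chi2_pdf(x, df):
    """Standard chi-square density at x > 0 with df degrees of freedom."""
    if x <= 0:
        raise ValueError("this sketch evaluates the density only for x > 0")
    half = df / 2
    return x ** (half - 1) * math.exp(-x / 2) / (2 ** half * math.gamma(half))

def peak_location(df, step=0.01, upper=30.0):
    """Grid point in (0, upper] where the density is largest (hypothetical helper)."""
    grid = [step * i for i in range(1, int(upper / step) + 1)]
    return max(grid, key=lambda x: chi2_pdf(x, df))

# For 1 or 2 d.f. the curve only decreases, so the peak sits at the left edge
# of the grid; for 3 or more d.f. the curve rises first and peaks near df - 2.
for df in (1, 2, 3, 5, 10):
    print(df, round(peak_location(df), 2))
```

With 1 or 2 d.f. the largest value is at the left edge of the grid, while for 3 or more d.f. the peak moves to the right as the degrees of freedom grow.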
First Marriage      %
Bride and groom     50
Bride only          12
Groom only          14
Neither             24
H0: The distribution of first-time marriages is 50% for both bride and groom, 12% for the bride only, and 14% for the groom only; 24% are remarriages for both.
Ha: The distribution of first-time marriages differs from the claimed distribution.
Goodness-of-Fit Test
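Observed frequency, O, is the frequency of the category found in the sample.
Expected frequency, E, is the calculated frequency for the category using the specified distribution:
Eᵢ = npᵢ
where n is the number of observations in the sample and pᵢ is the assumed probability of the ith category.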
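In a survey of 103 married couples, find the expected number E in each category.

First Marriage      %     E = np
Bride and groom     50    103(0.50) = 51.50
Bride only          12    103(0.12) = 12.36
Groom only          14    103(0.14) = 14.42
Neither             24    103(0.24) = 24.72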
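The expected frequencies E = np for the 103-couple survey can be reproduced with a few lines of Python. This is a minimal sketch; the names `claimed` and `expected_frequencies` are assumptions made for illustration.

```python
n = 103  # number of couples surveyed

# Claimed distribution, as proportions of all marriages
claimed = {
    "Bride and groom": 0.50,
    "Bride only": 0.12,
    "Groom only": 0.14,
    "Neither": 0.24,
}

def expected_frequencies(n, proportions):
    """Return E = n * p for every category."""
    return {category: n * p for category, p in proportions.items()}

expected = expected_frequencies(n, claimed)
for category, e in expected.items():
    print(f"{category:16s} {e:6.2f}")

# The chi-square approximation needs every expected frequency to be at least 5
print(all(e >= 5 for e in expected.values()))
```

Every expected frequency here is well above 5, so the condition for using the chi-square distribution is met.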
Chi-Square Test
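If the observed frequencies are obtained from a random sample and each expected frequency is at least 5, the sampling distribution for the goodness-of-fit test is a chi-square distribution with k − 1 degrees of freedom (where k = the number of categories). The test statistic is χ² = Σ (O − E)²/E, where O is the observed frequency and E is the expected frequency of each category.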
A social service organization claims that 50% of all marriages are the first marriage for both bride and groom, 12% are the first for the bride only, 14% are the first for the groom only, and 24% are a remarriage for both. The results of a study of 103 randomly selected married couples are listed in the table. Test the distribution claimed by the organization. Use α = 0.01.

First Marriage      f
Bride and groom     55
Bride only          12
Groom only          12
Neither             24

1. Write the null and alternative hypothesis.
H0: The distribution of first-time marriages is 50% for both bride and groom, 12% for the bride only, and 14% for the groom only; 24% are remarriages for both.
Ha: The distribution of first-time marriages differs from the claimed distribution.
2. State the level of significance.
α = 0.01
3. Determine the sampling distribution.
A chi-square distribution with 4 − 1 = 3 d.f.
4. Find the critical value.
χ²₀ = 11.34
5. Find the rejection region.
Reject H0 if χ² > 11.34.
6. Find the test statistic.

%      O      E        (O − E)²   (O − E)²/E
50     55     51.50    12.2500    0.2379
12     12     12.36     0.1296    0.0105
14     12     14.42     5.8564    0.4061
24     24     24.72     0.5184    0.0210
100    103    103.00              0.6755

χ² = Σ (O − E)²/E = 0.6755
7. Make your decision.
The test statistic, 0.6755, does not fall in the rejection region (χ² > 11.34), so fail to reject H0.
8. Interpret your decision.
There is not enough evidence to reject the claimed distribution; the sample is consistent with the specified distribution for first-time marriages.
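The goodness-of-fit calculation can be sketched in Python as follows. The helper name `chi_square_statistic` is an assumption for illustration, and the critical value 11.34 is taken from the worked example rather than computed.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [55, 12, 12, 24]            # bride and groom, bride only, groom only, neither
proportions = [0.50, 0.12, 0.14, 0.24]  # claimed distribution
n = sum(observed)                       # 103 couples
expected = [n * p for p in proportions]

chi2 = chi_square_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1  # k - 1
critical_value = 11.34                  # from the worked example (3 d.f.)

print(f"chi-square = {chi2:.4f} with {degrees_of_freedom} d.f.")
print("reject H0" if chi2 > critical_value else "fail to reject H0")
```

Because the statistic is far below the critical value, the sketch reaches the same decision as the worked example.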
Section 10.2
Independence
          Low    Average   Superior   Total
Male      22     81        9          112
Female    14     75        19         108
Total     36     156       28         220
Expected Values
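Assuming the variables are independent, the expected value of each cell is:
E(r, c) = (sum of row r)(sum of column c) / (sample size)
For example, the expected number of males rated Low is (112)(36)/220 ≈ 18.33.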
All other expected values can be found by subtracting from the total of the row or the column.

          Low      Average   Superior   Total
Male      18.33    79.42     14.25      112
Female    17.67    76.58     13.75      108
Total     36       156       28         220
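The expected values can be generated directly from the row and column totals. The following is a minimal sketch; the function name `expected_counts` is an assumption, and the observed counts are the gender by job-performance frequencies used in this section.

```python
# Observed frequencies: rows are Male, Female; columns are Low, Average, Superior
observed = [
    [22, 81, 9],
    [14, 75, 19],
]

def expected_counts(table):
    """Expected cell value = (row total * column total) / sample size."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

expected = expected_counts(observed)
for label, row in zip(("Male", "Female"), expected):
    print(label, [round(value, 2) for value in row])
```

Each row and column of the expected table adds up to the same total as in the observed table, which is why the remaining cells can also be found by subtraction.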
Sampling Distribution
The sampling distribution is a χ² distribution with degrees of freedom equal to: (Number of rows − 1)(Number of columns − 1)
Example: Find the sampling distribution for a test of independence that has a contingency table of 4 rows and 3 columns.
The sampling distribution is a χ² distribution with (4 − 1)(3 − 1) = 3 · 2 = 6 d.f.
Application
The following table reflects the gender and job performance evaluation of 220 accountants. Test the claim that gender and job performance are independent at level of significance α.

          Low    Average   Superior   Total
Male      22     81        9          112
Female    14     75        19         108
Total     36     156       28         220
1. Write the null and alternative hypothesis.
H0: Gender and job performance are independent.
Ha: Gender and job performance are not independent.
2. State the level of significance.
3. Determine the sampling distribution.
Since there are 2 rows and 3 columns, the sampling distribution is a chi-square distribution with (2 − 1)(3 − 1) = 2 d.f.
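Chi-Square Test

Cell                O      E
Male, Low           22     18.33
Male, Average       81     79.42
Male, Superior      9      14.25
Female, Low         14     17.67
Female, Average     75     76.58
Female, Superior    19     13.75
Total               220    220.00

χ² = Σ (O − E)²/E = 5.51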
The test statistic, 5.51, does not fall in the rejection region, so fail to reject H0.
8. Interpret your decision.
There is not enough evidence to conclude that gender and job performance are related. The sample gives no statistical basis for hiring accountants based on their gender.
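The test statistic for the accountants example can be recomputed from the observed table. This is a sketch under stated assumptions: the function name `independence_test` is hypothetical, and the P-value line relies on the fact that a chi-square distribution with exactly 2 d.f. has the right-tail area exp(−x/2), so it does not generalize to other table sizes.

```python
import math

def independence_test(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Rows: Male, Female; columns: Low, Average, Superior
observed = [[22, 81, 9], [14, 75, 19]]
chi2, df = independence_test(observed)

# Right-tail area; this closed form is valid only because df == 2 here
p_value = math.exp(-chi2 / 2)

print(f"chi-square = {chi2:.2f}, d.f. = {df}, P-value = {p_value:.3f}")
```

The statistic rounds to 5.51, matching the worked example. The P-value is roughly 0.06, which is consistent with failing to reject H0 at a level of significance of 0.05 or smaller.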
Analysis of Variance
ANOVA
One-way analysis of variance (ANOVA) is a hypothesis testing technique that is used to compare means from three or more populations.
H0: μ1 = μ2 = … = μk (All population means are equal.)
Ha: At least one of the means is different from the others.
The variance is calculated in two different ways and the ratio of the two values is formed.
1. MSB, Mean Square Between, the variance between samples, measures the differences related to the treatment given to each sample.
2. MSW, Mean Square Within, the variance within samples, measures the differences related to entries within the same sample. The variance within samples is due to sampling error.
If MSB is close in value to MSW, the variation is not attributed to different effects of the treatments on the variable, and the ratio of the two measures (the F-ratio, F = MSB/MSW) is close to 1. If MSB is significantly greater than MSW, the variation is probably due to differences in the treatments or factors, and the F-ratio will be significantly greater than 1.
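The two variance estimates and their ratio can be sketched in a few lines. The function name `f_ratio` and the three small samples below are made up for illustration and are not data from this section.

```python
def f_ratio(samples):
    """Return (MSB, MSW, F) for a list of samples in a one-way ANOVA."""
    k = len(samples)                          # number of samples
    n_total = sum(len(s) for s in samples)    # total number of entries
    grand_mean = sum(sum(s) for s in samples) / n_total
    means = [sum(s) / len(s) for s in samples]

    # Between-samples variation: sample means around the grand mean
    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Within-samples variation: entries around their own sample mean
    ss_within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)

    msb = ss_between / (k - 1)        # d.f. numerator = k - 1
    msw = ss_within / (n_total - k)   # d.f. denominator = N - k
    return msb, msw, msb / msw

# Made-up data: the third sample is clearly shifted, so MSB far exceeds MSW
toy_samples = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
msb, msw, f = f_ratio(toy_samples)
print(msb, msw, f)
```

In the toy data the third sample sits well above the other two, so the between-samples variance is much larger than the within-samples variance and the F-ratio is far from 1.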
Analysis of Variance
The table shows the annual amount spent on reading (in $) for a random sample of American consumers from four regions. At α = 0.10, can you conclude that the mean annual amounts spent are different?
Northeast   Midwest   South
308         246       103
58          169       143
141         246       164
109         158       119
220         167       99
144         76        214
316                   108

1. Write the null and alternative hypothesis.
H0: μ1 = μ2 = μ3 = μ4 (All population means are equal.)
Ha: At least one of the means is different from the others.
2. State the level of significance.
α = 0.10
3. Determine the sampling distribution.
An F distribution with d.f.N = k − 1 = 4 − 1 = 3 and d.f.D = N − k = 27 − 4 = 23.
4. Find the critical value.
F₀ = 2.34
5. Find the rejection region.
Reject H0 if F > 2.34.
Calculate the mean and variance for each sample, then calculate the mean of all values (the grand mean, 177.00).

Sample   n   mean     s²        (mean − grand mean)²   n(mean − grand mean)²
1        7   185.14             66.26                  463.8
2        6   177.00   4050.05   0.00                   0.0
3        7   135.71   1741.39   1704.86                11934.0
4        7   210.14   1020.80   1098.26                7687.8
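The summary statistics can be combined into an F-ratio as sketched below. Two things here are assumptions rather than values shown in this section: the variance of sample 1 is not listed, so it is recomputed from the seven amounts whose mean is 185.14 (the Northeast column of the data), and the resulting F value is a recomputation from rounded figures, not a quoted result.

```python
import statistics

n = [7, 6, 7, 7]
means = [185.14, 177.00, 135.71, 210.14]

# ASSUMPTION: sample 1's variance is missing from the summary, so it is
# recomputed from the seven data values whose mean is 185.14.
sample_1 = [308, 58, 141, 109, 220, 144, 316]
variances = [statistics.variance(sample_1), 4050.05, 1741.39, 1020.80]

k = len(n)         # number of samples
n_total = sum(n)   # 27 entries in all

grand_mean = sum(ni * mi for ni, mi in zip(n, means)) / n_total
ss_between = sum(ni * (mi - grand_mean) ** 2 for ni, mi in zip(n, means))
ss_within = sum((ni - 1) * vi for ni, vi in zip(n, variances))

msb = ss_between / (k - 1)        # d.f.N = 3
msw = ss_within / (n_total - k)   # d.f.D = 23
f = msb / msw

critical_value = 2.34  # from the worked example
print(f"grand mean = {grand_mean:.2f}, MSB = {msb:.1f}, MSW = {msw:.1f}, F = {f:.2f}")
print("reject H0" if f > critical_value else "fail to reject H0")
```

Under these assumptions the F-ratio stays below the critical value of 2.34, which agrees with the decision reported by the P-value method.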
8. Interpret your decision.
There is not enough evidence to support the claim that the means are not all equal. The sample does not show that mean reading expenses differ among the four regions.
Minitab Output
One-way Analysis of Variance
Source    P
Factor    0.215
Error
Total
Using the P-value method, fail to reject the null hypothesis, since 0.215 > 0.10. There is not enough evidence to support the claim that the mean amount spent on reading differs among the regions.