The chi-square (χ²) test is for qualitative data. It tests whether observed frequencies are close to hypothesized expected frequencies. Expected frequencies can be probabilities determined by chance or other values based on theory.
Two Tests
One-way χ²: tests observed frequencies against a null hypothesis of equal or specified proportions. Two-way χ²: tests observed frequencies against specified proportions across all cells of two cross-classified variables. Another way of saying this is that it tests for an interaction.
Frequencies
Observed frequencies (fo): the obtained frequency for each category in a study. Expected frequencies (fe): the hypothesized frequency for each category given a true null hypothesis.
Determine the expected frequencies. Are the differences between the expected and the observed frequencies large enough to qualify as a rare outcome? Calculate the χ² ratio. Compare it against the χ² table with the appropriate degrees of freedom.
                      Frequency
Blood type   Observed (fo)   Expected (fe)
O                 38              44
A                 38              41
B                 20              10
AB                 4               5
Total            100             100
\[ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} \]
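Calculating χ²
\[
\begin{aligned}
\chi^2 &= \sum \frac{(f_o - f_e)^2}{f_e} \\
       &= \frac{(38 - 44)^2}{44} + \frac{(38 - 41)^2}{41} + \frac{(20 - 10)^2}{10} + \frac{(4 - 5)^2}{5} \\
       &= \frac{(-6)^2}{44} + \frac{(-3)^2}{41} + \frac{(10)^2}{10} + \frac{(-1)^2}{5} \\
       &= \frac{36}{44} + \frac{9}{41} + \frac{100}{10} + \frac{1}{5} \\
       &= .82 + .22 + 10.00 + .20 \\
       &= 11.24
\end{aligned}
\]
df = categories (c) − 1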
Chi-Square Distribution
Look up the critical value for our df (c − 1 = 4 − 1 = 3) and significance level (e.g., p < .05). Is 11.24 greater than 7.81?
If yes, reject the null hypothesis. Conclude blood types are not distributed as in the general population.
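The one-way calculation and decision can be sketched in a few lines of Python. This is only an illustration using the blood-type counts from the table. The function name `chi_square` is hypothetical, and the critical value is copied from the text rather than computed.

```python
# Observed and expected counts from the blood-type table (n = 100).
observed = {"O": 38, "A": 38, "B": 20, "AB": 4}
expected = {"O": 44, "A": 41, "B": 10, "AB": 5}


def chi_square(observed, expected):
    """Sum of (fo - fe)^2 / fe over all categories."""
    return sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)


chi2 = chi_square(observed, expected)
df = len(observed) - 1      # categories (c) - 1
critical_value = 7.81       # table value quoted in the text for df = 3, .05 level

reject_null = chi2 > critical_value
print(f"chi-square = {chi2:.2f}, df = {df}, reject H0: {reject_null}")
```

Rounded to two decimals, the statistic matches the hand calculation (11.24), so the null hypothesis is rejected.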
About χ²
Because differences from expected values are squared, the value of χ² cannot be negative. Because differences are squared, the χ² test is nondirectional. A significant χ² is not necessarily due to big differences; small ones can add up.
Two-Way χ²
When observations are cross-classified according to two variables, a two-way test is used. The two-way test examines the relationship between two variables.
Returned Letters
             Yes    No   Total
Downtown      41    19      60
Suburbia      32    38      70
Campus        47    23      70
Total        120    80     200
H0: Type of neighborhood and return rate of lost letters are independent. H1: H0 is false.
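\[ f_e = \frac{(\text{row total})(\text{column total})}{\text{overall total}} \]
For example, for letters returned from Downtown:
\[ f_e = \frac{(60)(120)}{200} = 36 \]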
Calculating Two-Way χ²
Expected frequencies are based on the proportions found in the column and row totals. Degrees of freedom are limited by the column and row totals. Once expected frequencies and df have been found, calculate χ² the same as in a one-way test.
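As a rough sketch, the expected frequencies for the returned-letters table can be derived from the marginal totals. It assumes the usual rule that each cell's expected frequency is (row total × column total) / overall total. The variable names are illustrative only.

```python
# Observed counts from the returned-letters table: rows are neighborhoods,
# columns are (returned: yes, returned: no).
observed = {
    "Downtown": [41, 19],
    "Suburbia": [32, 38],
    "Campus": [47, 23],
}

row_totals = {name: sum(counts) for name, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

# fe = (row total * column total) / overall total, for every cell.
expected = {
    name: [row_totals[name] * c / grand_total for c in col_totals]
    for name in observed
}

for name, values in expected.items():
    print(name, values)
```

The resulting values (36 and 24 for Downtown, 42 and 28 for the other two neighborhoods) are the denominators used in the two-way calculation.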
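Calculating χ²
\[
\begin{aligned}
\chi^2 &= \sum \frac{(f_o - f_e)^2}{f_e} \\
       &= \frac{(41 - 36)^2}{36} + \frac{(32 - 42)^2}{42} + \frac{(47 - 42)^2}{42} + \frac{(19 - 24)^2}{24} + \frac{(38 - 28)^2}{28} + \frac{(23 - 28)^2}{28} \\
       &= 0.69 + 2.38 + 0.60 + 1.04 + 3.57 + 0.89 \\
       &= 9.17
\end{aligned}
\]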
df = (columns − 1)(rows − 1); here df = (2 − 1)(3 − 1) = 2. From the chi-square table, the critical value is 5.99. Our value of 9.17 exceeds 5.99, so reject the null. There is a relationship between neighborhood and letter return rate.
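The whole two-way test can be sketched as follows. This is a minimal illustration, not a replacement for a statistics package: the critical value is taken from the text, and the variable names are hypothetical.

```python
# Rows: Downtown, Suburbia, Campus. Columns: returned yes, returned no.
observed = [[41, 19], [32, 38], [47, 23]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n   # expected frequency for this cell
        chi2 += (fo - fe) ** 2 / fe

df = (len(col_totals) - 1) * (len(row_totals) - 1)   # (columns - 1)(rows - 1)
critical_value = 5.99   # table value quoted in the text for df = 2, .05 level

reject_null = chi2 > critical_value
print(f"chi-square = {chi2:.2f}, df = {df}, reject H0: {reject_null}")
```

Computed without intermediate rounding, the statistic comes out at about 9.18. The 9.17 in the hand calculation comes from summing terms already rounded to two decimals. The decision is the same either way.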
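Squared Cramér's phi coefficient: roughly estimates the proportion of explained variance (predictability) between two qualitative variables.
\[ \phi_c^2 = \frac{\chi^2}{n(k - 1)} \]
where n is the total sample size and k is the smaller of the number of rows or columns.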
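A short sketch of this effect-size estimate applied to the returned-letters result. It assumes the ratio is χ² / (n(k − 1)), with k taken as the smaller of the number of rows or columns. That definition of k is an assumption not spelled out in the text, and the function name is hypothetical.

```python
def squared_cramers_phi(chi2, n, n_rows, n_cols):
    """Chi-square divided by n(k - 1), with k = min(rows, columns) (assumed)."""
    k = min(n_rows, n_cols)
    return chi2 / (n * (k - 1))


# Returned-letters example: chi-square = 9.17, n = 200, 3 rows by 2 columns.
phi_c_sq = squared_cramers_phi(chi2=9.17, n=200, n_rows=3, n_cols=2)
print(f"{phi_c_sq:.3f}")
```

Under these assumptions the estimate is roughly .05, which suggests that the relationship, although statistically significant, explains only a small share of the variability.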
Precautions
Avoid small expected frequencies: each must be 5 or more. Avoid small sample sizes: they increase the danger of a Type II error (retaining a false null hypothesis). Avoid very large sample sizes: they can make trivially small differences statistically significant.
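The first precaution is easy to check before running a test. The helper below is a hypothetical sketch that flags any category whose expected frequency falls below the minimum of 5 stated above.

```python
def small_expected_cells(expected, minimum=5):
    """Return the categories whose expected frequency is below the minimum."""
    return {cat: fe for cat, fe in expected.items() if fe < minimum}


# Expected frequencies from the blood-type example.
blood_types = {"O": 44, "A": 41, "B": 10, "AB": 5}
flagged = small_expected_cells(blood_types)
print(flagged if flagged else "All expected frequencies are 5 or more.")
```

In the blood-type example the smallest expected frequency is exactly 5 (type AB), so no category is flagged.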
z-test: for use with normal distributions when σ is known. t-test: for use with one or two groups, when σ is unknown. F-test (ANOVA): for comparing means for multiple groups. Chi-square test: for use with qualitative data.
How you write the null and alternative hypotheses varies with the design of the study; so does the type of statistic. Which table you use to find the critical value depends on the test statistic (t, F, χ², U, T, H). t and z tests can be directional.
Is the design within or between subjects? How many independent variables (IVs or factors) are there?
Summary of t-tests
Single group t-test: for one sample compared to a population mean. Independent sample t-test: for comparing two groups in a between-subject design. Paired (matched) sample t-test: for comparing two groups in a within-subject design.
One-way ANOVA: for one IV, independent samples. Repeated measures ANOVA: for one or more IVs where samples are repeated, matched or paired. Two-way (factorial) ANOVA: for two or more IVs, independent samples. Mixed ANOVA: for two or more IVs, between and within subjects.
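The ANOVA summary can be read as a small decision rule. The sketch below encodes only the distinctions listed above. The function name and the design labels ("between", "within", "mixed") are hypothetical, chosen for illustration.

```python
def suggest_anova(n_ivs, design):
    """Map the number of IVs and the design onto the ANOVA types listed above.

    design: "between" (independent samples), "within" (repeated, matched or
    paired samples), or "mixed" (between and within subjects).
    """
    if n_ivs < 1:
        raise ValueError("at least one independent variable is required")
    if design == "within":
        return "repeated measures ANOVA"
    if design == "mixed":
        if n_ivs < 2:
            raise ValueError("a mixed design needs two or more IVs")
        return "mixed ANOVA"
    if design == "between":
        return "one-way ANOVA" if n_ivs == 1 else "two-way (factorial) ANOVA"
    raise ValueError(f"unknown design: {design!r}")


print(suggest_anova(1, "between"))
print(suggest_anova(2, "mixed"))
```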
One-way χ²: tests whether frequencies are equally distributed across the possible categories. Two-way χ²: tests whether there is an interaction (relationship) between the two variables.