
Categorical Data Analysis

Biostat 6651 Lecture 2 Fall 2013 30 Sep 2013 Dr. Lynn Eudey

Tables and Contingency Tables

We can perform inference from tables

Univariate
  - Estimate proportions within each category
  - Test whether the observed proportions fit our expectations

Bivariate (or higher) Contingency Tables
  - Estimate proportions within each cell or marginal categories
  - Test for independence of variables

Hypothesis Testing: the steps

1. Set up Null and Alternative Hypothesis
   - Use statistical notation
   - Translation of your hypothesis about the population
2. Set α, or level of significance
3. Choose an appropriate test statistic
4. State a decision rule
   - Default: if p-value < α then Reject H0
5. Collect your data and calculate observed value of the test statistic and p-value
6. State a statistical decision
7. Translate the statistical decision into everyday English

Example

M&Ms claims
  - 24% blue, 20% orange, 16% green, 14% yellow, 13% red, 13% brown

We want to test this claim
  - Count the number of each color out of 50 M&Ms from a large bag of well-mixed M&Ms
  - 13 blue, 8 orange, 10 green, 6 yellow, 8 red, 5 brown

Test
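Our proportions of the colors are not exactly what we would expect, but they are close.
Test Statistic: $\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$, where $O_i$ is the observed count and $E_i$ is the expected count in category $i$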

Test

See chalkboard
This is called a Goodness-of-fit test because we are seeing whether our data fit our proposed distribution of colors
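
The chalkboard calculation is not reproduced in these notes. As a minimal sketch of how it could go for the M&M counts, assuming the usual Pearson form (sum over colors of (observed - expected)^2 / expected) with expected count = 50 × claimed proportion. The variable names are illustrative, and the 5% critical value for 5 degrees of freedom (about 11.07) is taken from a standard chi-squared table rather than from the lecture:

```python
# Claimed color proportions and observed counts from the example.
claimed = {"blue": 0.24, "orange": 0.20, "green": 0.16,
           "yellow": 0.14, "red": 0.13, "brown": 0.13}
observed = {"blue": 13, "orange": 8, "green": 10,
            "yellow": 6, "red": 8, "brown": 5}

n = sum(observed.values())                           # 50 candies counted
expected = {color: n * p for color, p in claimed.items()}

# Pearson chi-squared statistic: sum of (O - E)^2 / E over the categories.
chi_sq = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in claimed)
df = len(claimed) - 1                                # categories minus one

# Assumption: 5% critical value for 5 d.f., read from a chi-squared table.
CRITICAL_5PCT_DF5 = 11.07
reject_h0 = chi_sq > CRITICAL_5PCT_DF5

print(f"chi-squared = {chi_sq:.2f} on {df} d.f., reject H0: {reject_h0}")
```

Under these assumptions the statistic comes out near 1.82, far below 11.07, so the sample gives no reason to reject the claimed color distribution.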

Bivariate Data

Both variables are categorical
  - Contingency Table
  - Tests for Independence

Independence: If the probabilities of one variable do not change over the different categories of the second variable, then the two variables are called independent.

Two variables are independent if and only if the probability of every cell equals the product of its row and column marginal probabilities: $\pi_{ij} = \pi_{i+}\,\pi_{+j}$ for all cells $(i, j)$.

Example: Seat belt use and fatality in accidents


                      Injury
                  Fatal    Not Fatal
Seat Belt On        510      412,527
No Seat Belt       1601      162,527
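
If seat belt use and fatality were independent, the proportion of fatal outcomes would be about the same in both rows. A minimal sketch that compares the two row proportions from the table (the dictionary layout and names are illustrative, not from the lecture):

```python
# Observed counts from the seat belt table.
counts = {
    "Seat Belt On": {"Fatal": 510, "Not Fatal": 412_527},
    "No Seat Belt": {"Fatal": 1601, "Not Fatal": 162_527},
}

# Proportion of fatal outcomes within each seat belt category.
fatal_rate = {
    row: cells["Fatal"] / sum(cells.values())
    for row, cells in counts.items()
}

for row, rate in fatal_rate.items():
    print(f"{row}: {rate:.4%} fatal")

# How many times higher the fatal proportion is without a seat belt.
ratio = fatal_rate["No Seat Belt"] / fatal_rate["Seat Belt On"]
print(f"ratio of fatal proportions: {ratio:.1f}")
```

The two proportions differ by a factor of roughly eight, which is what the formal test of independence goes on to assess.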

Test for Independence

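H0: $\pi_{ij} = \pi_{i+}\,\pi_{+j}$ for all cells (seat belt use and fatality are independent)
HA: $\pi_{ij} \neq \pi_{i+}\,\pi_{+j}$ for at least one cell

Here $\pi_{ij}$ is the probability of cell $(i, j)$, and $\pi_{i+}$ and $\pi_{+j}$ are the row and column marginal probabilities.

  - Use 5% level of significance
  - Use Chi-squared statistic, which has 1 d.f.
  - Decide to Reject H0 if observed Chi-squared > 3.8416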
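
The cutoff 3.8416 equals 1.96 squared: a chi-squared variable with 1 d.f. is the square of a standard normal variable, so its upper 5% point is the square of the two-sided 5% normal cutoff. A minimal sketch of that relationship using only the Python standard library (variable names are illustrative):

```python
from statistics import NormalDist

alpha = 0.05

# Two-sided normal cutoff: the 97.5th percentile of the standard normal.
z = NormalDist().inv_cdf(1 - alpha / 2)

# Upper 5% critical value of chi-squared with 1 d.f. is z squared.
critical_value = z ** 2

print(f"z = {z:.4f}, chi-squared critical value = {critical_value:.4f}")
```

This shortcut holds only for 1 d.f.; tables with more rows or columns need the chi-squared distribution with the matching degrees of freedom.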

Calculated Chi-Square

Row margins
  - Seat belt on: 413,037 / 577,165 or 0.7156
  - No seat belt: 164,128 / 577,165 or 0.2844

Column margins
  - Fatal: 2111 / 577,165 or 0.0037
  - Not Fatal: 575,054 / 577,165 or 0.9963

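Expected Frequencies (row total × column total / grand total)

                      Injury
                  Fatal    Not Fatal
Seat Belt On       1528      411,509
No Seat Belt        583      163,545

These values use the rounded fatal proportion 0.0037 (0.0037 × 413,037 ≈ 1528). Without rounding, the first cell is 413,037 × 2111 / 577,165 ≈ 1510.7.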
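
A minimal sketch of the row total × column total / grand total calculation, carried at full precision (list layout and names are illustrative). Because nothing is rounded along the way, the results need not match rounded table entries exactly:

```python
# Observed counts: rows are seat belt on / no seat belt,
# columns are fatal / not fatal.
observed = [[510, 412_527],
            [1601, 162_527]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for label, row in zip(["Seat Belt On", "No Seat Belt"], expected):
    print(f"{label}: fatal {row[0]:.1f}, not fatal {row[1]:.1f}")
```

The expected counts in each row and column add up to the observed margins, which is a quick check on the arithmetic.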

Deviations (observed - expected)

                      Injury
                  Fatal    Not Fatal
Seat Belt On      -1018         1018
No Seat Belt       1018        -1018

Chi-square statistic

$\chi^2 = \frac{1018^2}{1528} + \frac{1018^2}{411509} + \frac{1018^2}{583} + \frac{1018^2}{163545}$

$= 678.22 + 2.52 + 1777.57 + 6.34 \approx 2464.65$

  - Highly significant
  - Reject H0
  - Wearing seat belts and fatality in accidents are not independent.
  - Among people involved in an accident, those not wearing a seat belt are more likely to be fatally injured.
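
The whole test can be sketched end to end as follows. This is an illustration under stated assumptions, not the lecture's own calculation: expected counts are kept at full precision, so the statistic differs somewhat from a hand calculation with rounded expected counts, and the p-value formula `erfc(sqrt(x / 2))` applies only to 1 d.f. All names are illustrative.

```python
import math

# Observed counts: rows are seat belt on / no seat belt,
# columns are fatal / not fatal.
observed = [[510, 412_527],
            [1601, 162_527]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts under independence, at full precision.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-squared: sum of (O - E)^2 / E over all four cells.
chi_sq = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(2)
    for j in range(2)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# For 1 d.f. only: P(chi-squared > x) = erfc(sqrt(x / 2)).
# For a statistic this large the value is too small to represent as a float.
p_value = math.erfc(math.sqrt(chi_sq / 2))

CRITICAL_VALUE = 3.8416          # decision rule stated in the lecture
reject_h0 = chi_sq > CRITICAL_VALUE

print(f"chi-squared = {chi_sq:.1f} on {df} d.f., reject H0: {reject_h0}")
```

With unrounded expected counts the statistic is roughly 2340. It is still far above 3.8416, so the decision to reject H0 is unchanged.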