
Critical Values of the Chi Square (X²) Distribution
(rows: degrees of freedom, df; columns: upper-tail probability, alpha)

df      0.10      0.05      0.025     0.01      0.005
1       2.706     3.841     5.024     6.635     7.879
2       4.605     5.991     7.378     9.210    10.597
3       6.251     7.815     9.348    11.345    12.838
4       7.779     9.488    11.143    13.277    14.860
5       9.236    11.070    12.833    15.086    16.750
6      10.645    12.592    14.449    16.812    18.548
7      12.017    14.067    16.013    18.475    20.278
8      13.362    15.507    17.535    20.090    21.955
9      14.684    16.919    19.023    21.666    23.589
10     15.987    18.307    20.483    23.209    25.188
11     17.275    19.675    21.920    24.725    26.757
12     18.549    21.026    23.337    26.217    28.300
13     19.812    22.362    24.736    27.688    29.819
14     21.064    23.685    26.119    29.141    31.319
15     22.307    24.996    27.488    30.578    32.801
16     23.542    26.296    28.845    32.000    34.267
18     25.989    28.869    31.526    34.805    37.156
20     28.412    31.410    34.170    37.566    39.997
24     33.196    36.415    39.364    42.980    45.559
30     40.256    43.773    46.979    50.892    53.672
40     51.805    55.758    59.342    63.691    66.766
60     74.397    79.082    83.298    88.379    91.952
120   140.233   146.567   152.211   158.950   163.648

MODULE S7 - CHI SQUARE


The "t" test and the F test described in previous modules are called parametric tests. They assume certain conditions about the parameters of the population from which the samples are drawn. Parametric and nonparametric statistical procedures test hypotheses involving different assumptions. Parametric statistics test hypotheses based on the assumption that the samples come from populations that are normally distributed. Also, parametric statistical tests assume that there is homogeneity of variance (variances within groups are the same). The level of measurement for parametric tests is assumed to be interval or at least ordinal. Nonparametric statistical procedures test hypotheses that do not require normal distribution or variance assumptions about the populations from which the samples were drawn and are designed for ordinal or nominal data. The main weakness of nonparametric tests is that they are less powerful than parametric tests. They are less likely to reject the null hypothesis when it is false. When the assumptions of parametric tests can be met, parametric tests should be used because they are the most powerful tests available. There are, however, certain advantages of nonparametric techniques such as Chi Square (X2). For one thing, nonparametric tests are usually much easier to compute. Another unique value of nonparametric procedures is that they can be used to treat data which have been measured on nominal (classificatory) scales. Such data cannot, on any logical basis, be ordered numerically, hence there is no possibility of using parametric statistical tests which require numerical data. The general pattern of nonparametric procedures is much like that seen with parametric tests, namely, certain sample data are treated by a statistical model which yields a value or statistic. This value is then interpreted for the likelihood of its chance occurrence according to some type of statistical probability distribution. With Chi Square, a value is calculated from the data using Chi Square procedures and then compared to a critical value from a Chi Square table with degrees of freedom corresponding to that of the data. If the calculated value is equal to or greater than the critical value (table value), the null hypothesis is rejected. If the calculated value is less than the critical value, the null hypothesis (Ho) is accepted. This procedure is similar to that used with the "t" test and F test.

Purpose of Chi Square
The Chi Square (X²) test is undoubtedly the most important and most used member of the nonparametric family of statistical tests. Chi Square is employed to test the difference between an actual sample and another hypothetical or previously established distribution, such as that which may be expected due to chance or probability. Chi Square can also be used to test differences between two or more actual samples.

Basic Computational Equation

X² = Σ [ (Fo - Fe)² / Fe ]

where Fo is the observed frequency in a category and Fe is the expected frequency.

Example:

          Observed responses (Fo)   Expected responses (Fe)   Fo - Fe   (Fo - Fe)²   (Fo - Fe)²/Fe
A                   8                        (10)               -2          4            0.4
U                   8                        (10)               -2          4            0.4
D                  14                        (10)                4         16            1.6
                                                                                  X² =   2.4

Degrees of freedom = (number of levels - 1) = 2
X².05 = 5.991
2.4 < 5.991; therefore, accept the null hypothesis.
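A minimal Python sketch of this calculation (plain Python does the summing; SciPy, if installed, supplies the critical value that would otherwise be read from the Chi Square table):

from scipy.stats import chi2

observed = [8, 8, 14]    # Fo for categories A, U, D
expected = [10, 10, 10]  # Fe under the null hypothesis of equal proportions

chi_sq = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
df = len(observed) - 1               # number of levels - 1 = 2
critical = chi2.ppf(0.95, df)        # 5.991 at the .05 level

print(f"{chi_sq:.1f} vs {critical:.3f}")  # 2.4 vs 5.991 -> accept Ho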

One-Way Classification
The One-Way Classification (sometimes referred to as the Single Sample Chi Square Test) is one of the most frequently reported nonparametric tests in journal articles. The test is used when a researcher is interested in the number of responses, objects, or people that fall in two or more categories. This procedure is sometimes called a goodness-of-fit statistic. Goodness-of-fit refers to whether a significant difference exists between an observed number and an expected number of responses, people, or objects falling in each category designated by the researcher. The expected number is what the researcher expects by chance or according to some null hypothesis.

When there is only one degree of freedom, an adjustment known as Yates correction for continuity must be employed. To use this correction, a value of 0.5 is subtracted from the absolute value (irrespective of algebraic sign) of the numerator contribution of each cell to the basic computational formula. The basic chi square computational formula then becomes:

X² = Σ [ (|Fo - Fe| - 0.5)² / Fe ]

Example of a One-Way Classification (with Yates Correction): Suppose that we flip a coin 20 times and record the frequency of occurrence of heads and tails. We know from the laws of probability that we should expect 10 heads and 10 tails. We also know that because of sampling error we could easily come up with 9 heads and 11 tails or 12 heads and 8 tails. Let us suppose our coin-flipping experiment yielded 12 heads and 8 tails. We would enter our expected frequencies (10 - 10) and our observed frequencies (12 - 8) in a table.

          Observed   Expected   |Fo - Fe| - 0.5   (|Fo - Fe| - 0.5)²   (|Fo - Fe| - 0.5)²/Fe
Heads        12         10            1.5                2.25                 0.225
Tails         8         10            1.5                2.25                 0.225
Total        20         20                                            X² =    0.450

The calculation of X² in a one-way classification (Yates Correction) is very straightforward. The expected frequency in a category ("heads") is subtracted from the observed frequency, and since Yates Correction is being used, 0.5 is subtracted from the absolute value of Fo - Fe; the difference is then squared, and the square is divided by its expected frequency. This is repeated for the remaining categories, and as the formula for X² indicates, these results are summed for all categories. How does a calculated X² of 0.450 tell us if our observed results of 12 heads and 8 tails represent a significant deviation from an expected 10-10 split? The shape of the Chi Square sampling distribution depends upon the number of degrees of freedom. The degrees of freedom for a one-way classification X² is r - 1, where r is the number of levels. In our problem above r = 2, so there is obviously 1 degree of freedom.

From our statistical reference tables, a X² of 3.84 or greater is needed for X² to be significant at the .05 level, so we conclude that our X² of 0.450 in the coin-flipping experiment could have happened by sampling error, and the deviations between the observed and expected frequencies are not significant. With one degree of freedom, a calculated X² of 3.84 or greater would be expected no more than 5% of the time by chance alone when the null hypothesis is true, and our value falls well below that mark. Therefore, the observed difference is not statistically significant at the .05 level.
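A minimal Python sketch of the Yates-corrected calculation for the coin-flipping data (plain Python; the dictionary layout is simply one convenient way to hold the counts):

observed = {"heads": 12, "tails": 8}
expected = {"heads": 10, "tails": 10}

# Yates correction: subtract 0.5 from |Fo - Fe| before squaring and dividing by Fe.
chi_sq = sum(
    (abs(observed[k] - expected[k]) - 0.5) ** 2 / expected[k]
    for k in observed
)
print(f"{chi_sq:.3f}")  # 0.450, well below the critical value of 3.841 at df = 1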

Two-Way Classification
The two-way Chi Square is a convenient technique for determining the significance of the difference between the frequencies of occurrence in two or more categories with two or more groups. For example, we may see if there is any difference in the number of freshmen, sophomores, juniors, or seniors in regard to their preference for spectator sports (football, basketball, or baseball). This is called a two-way classification since we would need two bits of information from the students in the sample: their class and their sports preference.

Example of a Two-Way Classification: Suppose an investigator wishes to see if 100 boys and 100 girls respond differently to an attitudinal question regarding the educational value of extracurricular activities and observed the following (A = very valuable, U = uncertain, and D = little value).

Expected frequencies (Fe) for each cell, shown in parentheses in the table below, are determined by the following formula:

Fe = (row subtotal x column subtotal) / total number of observations

Example - For the cell "Boys - A", the corresponding row subtotal = 100, the corresponding column subtotal = 100, and the total number of observations = 200, so Fe = (100 x 100) / 200 = 50. NOTE: Row subtotals and column subtotals must have equal sums, and total expected frequencies must equal total observed frequencies.

                    Boys        Girls       Row Subtotals
A                  60 (50)     40 (50)          100
U                  20 (10)      0 (10)           20
D                  20 (40)     60 (40)           80
Column Subtotals     100         100             200

Calculated X² = 44
Degrees of Freedom = (Rows - 1)(Columns - 1) = (3 - 1)(2 - 1) = 2
Table value of X².05 with 2 degrees of freedom = 5.991
44 > 5.991; therefore, reject the null hypothesis.
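The following minimal Python sketch (plain Python; the variable names are illustrative) reproduces the expected frequencies and the calculated X² of 44 for this table:

# Rows A, U, D; columns Boys, Girls (observed frequencies from the table above).
observed = {
    "A": (60, 40),
    "U": (20, 0),
    "D": (20, 60),
}

column_subtotals = [sum(row[i] for row in observed.values()) for i in range(2)]  # [100, 100]
grand_total = sum(column_subtotals)                                              # 200

chi_sq = 0.0
for row in observed.values():
    row_subtotal = sum(row)
    for fo, column_subtotal in zip(row, column_subtotals):
        fe = row_subtotal * column_subtotal / grand_total  # Fe = row subtotal x column subtotal / N
        chi_sq += (fo - fe) ** 2 / fe

print(chi_sq)  # 44.0 > 5.991, so the null hypothesis is rejected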

Degrees of Freedom
A value of X² cannot be evaluated unless the number of degrees of freedom associated with it is known. The number of degrees of freedom associated with any X² may be easily computed.

If there is one independent variable, df = r - 1, where r is the number of levels of the independent variable. If there are two independent variables, df = (r - 1)(s - 1), where r and s are the number of levels of the first and second independent variables, respectively. If there are three independent variables, df = (r - 1)(s - 1)(t - 1), where r, s, and t are the number of levels of the first, second, and third independent variables, respectively.

Assumptions
Even though a nonparametric statistic does not require a normally distributed population, there still are some restrictions regarding its use.
1. Representative (random) sample.
2. The data must be in frequency form (nominal data) or greater.
3. The individual observations must be independent of each other.
4. Sample size must be adequate. In a 2 x 2 table, Chi Square should not be used if n is less than 20. In a larger table, no expected value should be less than 1, and not more than 20% of the cells should have expected values of less than 5.
5. Distribution basis must be decided on before the data are collected.
6. The sum of the observed frequencies must equal the sum of the expected frequencies.

SELF ASSESSMENT
1. Explain the purpose and importance of Chi Square as a nonparametric statistic.
2. Write the basic computational equation for Chi Square.
3. Explain the difference between a one-way classification and a two-way classification.
4. How do you compute the degrees of freedom for the following: X² with one independent variable; X² with two independent variables; X² with three independent variables?
6. What purpose does Yates correction for continuity serve?
7. What are some of the major differences between parametric and nonparametric statistics?
8. What level of measurement is required for Chi Square?
9. A public opinion polling team in a small town was interested in the type of sporting events that adults in the age bracket of 20-50 years prefer to watch on TV. A random sample of 120 was selected and asked, "Given your preference, would you prefer to watch baseball (P1), basketball (P2), or football (P3) on TV?" Of the respondents, 39 indicated a preference for baseball, 25 selected basketball, and 56 selected football.

Null Hypothesis
General - In the population being sampled, the proportions of people in each category are equal. Ho: P1 = P2 = P3
Specific - In the population being sampled, equal proportions of people prefer baseball, basketball, and football.

        Observed responses (Fo)   Expected responses (Fe)   Fo - Fe   (Fo - Fe)²
P1
P2
P3

Degrees of freedom = (number of levels - 1) = ____    X².05 = ____    Ho: Accept or Reject?

10. In a school with a merit system for pay raises, a random sample of the faculty was asked if they wished that system to be continued. Of the 10 faculty members responding, 7 wanted to continue and 3 did not want to continue. Use the one-sample case technique with Yates correction and determine if the difference in proportions is statistically significant at the 0.05 level.

              Observed   Expected   (|Fo - Fe| - 0.5)   (|Fo - Fe| - 0.5)²
Continue
Discontinue

Degrees of freedom = (number of levels - 1) = ____    X².05 = ____    Ho: Accept or Reject?

11. A representative of a major university was interested in whether undergraduate males and females responded differently to a question regarding a proposed athletic fee. Of the 100 males and 100 females who responded, 20 males and 60 females agreed, 70 males and 20 females disagreed, and 10 males and 20 females were undecided.

                    Males   Females   Row Subtotals
A (agree)
D (disagree)
U (undecided)
Column Subtotals

Degrees of Freedom = (Rows - 1)(Columns - 1) = ____    X².05 = ____    Ho: Accept or Reject?

The Chi Square Statistic


Types of Data: There are basically two types of random variables, and they yield two types of data: numerical and categorical. A chi square (X²) statistic is used to investigate whether distributions of categorical variables differ from one another. Basically, categorical variables yield data in categories and numerical variables yield data in numerical form. Responses to such questions as "What is your major?" or "Do you own a car?" are categorical because they yield data such as "biology" or "no." In contrast, responses to such questions as "How tall are you?" or "What is your G.P.A.?" are numerical. Numerical data can be either discrete or continuous. The table below may help you see the differences between these two variables.

Data Type     Question Type                            Possible Responses
Categorical   What is your sex?                        male or female
Numerical     Discrete - How many cars do you own?     two or three
Numerical     Continuous - How tall are you?           72 inches

Notice that discrete data arise from a counting process, while continuous data arise from a measuring process. The Chi Square statistic compares the tallies or counts of categorical responses between two (or more) independent groups. (Note: Chi square tests can only be used on actual numbers and not on percentages, proportions, means, etc.)

2 x 2 Contingency Table

There are several types of chi square tests depending on the way the data was collected and the hypothesis being tested. We'll begin with the simplest case: a 2 x 2 contingency table. If we set the 2 x 2 table to the general notation shown below in Table 1, using the letters a, b, c, and d to denote the contents of the cells, then we would have the following table:

Table 1. General notation for a 2 x 2 contingency table.

                          Variable 1
Variable 2        Category 1   Category 2   Total
Data type 1           a            c        a + c
Data type 2           b            d        b + d
Totals              a + b        c + d      a + b + c + d = N

For a 2 x 2 contingency table the Chi Square statistic is calculated by the formula:

X² = N(ad - bc)² / [(a + b)(c + d)(a + c)(b + d)]

Note: notice that the four components of the denominator are the four totals from the table columns and rows.

Suppose you conducted a drug trial on a group of animals and you hypothesized that the animals receiving the drug would show increased heart rates compared to those that did not receive the drug. You conduct the study and collect the following data:

Ho: The proportion of animals whose heart rate increased is independent of drug treatment.
Ha: The proportion of animals whose heart rate increased is associated with drug treatment.

Table 2. Hypothetical drug trial results.


               Heart Rate Increased   No Heart Rate Increase   Total
Treated                 36                      14               50
Not treated             30                      25               55
Total                   66                      39              105

Applying the formula above we get:

Chi square = 105[(36)(25) - (14)(30)]² / [(50)(55)(39)(66)] = 3.418
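A minimal Python sketch of this shortcut formula (the function name chi_square_2x2 is illustrative); it reproduces the 3.418 above and the 4.125 that appears in the what-if recalculation discussed in the next paragraph:

def chi_square_2x2(a, b, c, d):
    """Shortcut formula for a 2 x 2 table: N(ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Table 2: treated 36 increased / 14 not; not treated 30 increased / 25 not.
print(round(chi_square_2x2(a=36, b=30, c=14, d=25), 3))  # 3.418
# What-if: the untreated counts change to 29 increased / 26 not.
print(round(chi_square_2x2(a=36, b=29, c=14, d=26), 3))  # 4.125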

Before we can proceed we need to know how many degrees of freedom we have. When a comparison is made between one sample and another, a simple rule is that the degrees of freedom equal (number of columns minus one) x (number of rows minus one), not counting the totals for rows or columns. For our data this gives (2 - 1) x (2 - 1) = 1.

We now have our chi square statistic (X² = 3.418), our predetermined alpha level of significance (0.05), and our degrees of freedom (df = 1). Entering the Chi square distribution table with 1 degree of freedom and reading along the row, we find our value of X² (3.418) lies between 2.706 and 3.841. The corresponding probability is between the 0.10 and 0.05 probability levels. That means that the p-value is above 0.05 (it is actually 0.065). Since a p-value of 0.065 is greater than the conventionally accepted significance level of 0.05 (i.e., p > 0.05), we fail to reject the null hypothesis. In other words, there is no statistically significant difference in the proportion of animals whose heart rate increased.

What would happen if the number of control animals whose heart rate increased dropped to 29 instead of 30 and, consequently, the number of controls whose heart rate did not increase changed from 25 to 26? Try it. Notice that the new X² value is 4.125, and this value exceeds the table value of 3.841 (at 1 degree of freedom and an alpha level of 0.05). This means that p < 0.05 (it is now 0.04) and we reject the null hypothesis in favor of the alternative hypothesis - the heart rate of animals is different between the treatment groups. When p < 0.05 we generally refer to this as a significant difference.

Table 3. Chi Square distribution table.
                 Probability level (alpha)
Df      0.5      0.10     0.05     0.02     0.01     0.001
1       0.455    2.706    3.841    5.412    6.635    10.827
2       1.386    4.605    5.991    7.824    9.210    13.815
3       2.366    6.251    7.815    9.837    11.345   16.268
4       3.357    7.779    9.488    11.668   13.277   18.465
5       4.351    9.236    11.070   13.388   15.086   20.517


Chi Square Goodness of Fit (One Sample Test)
This test allows us to compare a collection of categorical data with some theoretical expected distribution. This test is often used in genetics to compare the results of a cross with the theoretical distribution based on genetic theory. Suppose you performed a simple monohybrid cross between two individuals that were heterozygous for the trait of interest:

Aa x Aa

The results of your cross are shown in Table 4.

Table 4. Results of a monohybrid cross between two heterozygotes for the 'a' gene.

            A     a     Totals
A          10    33       43
a          42    15       57
Totals     52    48      100

The phenotypic ratio is 85 of the A-type and 15 of the a-type (homozygous recessive). In a monohybrid cross between two heterozygotes, however, we would have predicted a 3:1 ratio of phenotypes. In other words, we would have expected to get 75 A-type and 25 a-type. Are our results different?

Calculate the chi square statistic X² by completing the following steps:

1. For each observed number in the table, subtract the corresponding expected number (O - E).
2. Square the difference [(O - E)²].
3. Divide the squares obtained for each cell in the table by the expected number for that cell [(O - E)²/E].
4. Sum all the values for (O - E)²/E. This is the chi square statistic.

For our example, the calculation would be:


           Observed   Expected   (O - E)   (O - E)²   (O - E)²/E
A-type        85         75         10        100        1.33
a-type        15         25        -10        100        4.00
Total        100        100                              5.33

X² = 5.33

We now have our chi square statistic (X² = 5.33), our predetermined alpha level of significance (0.05), and our degrees of freedom (df = 1). Entering the Chi square distribution table with 1 degree of freedom and reading along the row, we find our value of X² (5.33) lies between 3.841 and 5.412. The corresponding probability is between 0.05 and 0.02 (0.02 < P < 0.05). This is smaller than the conventionally accepted significance level of 0.05 or 5%, so the null hypothesis that the two distributions are the same is rejected. In other words, when the computed X² statistic exceeds the critical value in the table for a 0.05 probability level, we can reject the null hypothesis of equal distributions. Since our X² statistic (5.33) exceeded the critical value for the 0.05 probability level (3.841), we can reject the null hypothesis that the observed values of our cross are the same as the theoretical distribution of a 3:1 ratio.


To put this into context, it means that we do not have a 3:1 ratio of A_ to aa offspring.
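A minimal Python sketch of this goodness-of-fit test (assuming SciPy is installed; scipy.stats.chisquare applies the same sum of (O - E)²/E):

from scipy.stats import chisquare

observed = [85, 15]  # A-type and a-type offspring actually counted
expected = [75, 25]  # 3:1 ratio predicted for a monohybrid cross of two heterozygotes

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(round(stat, 2), round(p_value, 3))  # 5.33 0.021 -> p < 0.05, reject Ho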

Chi Square Test of Independence
For a contingency table that has r rows and c columns, the chi square test can be thought of as a test of independence. In a test of independence the null and alternative hypotheses are:

Ho: The two categorical variables are independent.
Ha: The two categorical variables are related.

We can use the equation

X² = Σ (fo - fe)² / fe

Here fo denotes the frequency of the observed data and fe is the frequency of the expected values. The general table would look something like the one below:
                Category I   Category II   Category III   Row Totals
Sample A             a            b              c          a+b+c
Sample B             d            e              f          d+e+f
Sample C             g            h              i          g+h+i
Column Totals      a+d+g        b+e+h          c+f+i        a+b+c+d+e+f+g+h+i = N

Now we need to calculate the expected values for each cell in the table, and we can do that using the row total times the column total divided by the grand total (N). For example, for cell a the expected value would be (a+b+c)(a+d+g)/N. Once the expected values have been calculated for each cell, we can use the same procedure as before for a simple 2 x 2 table, tabulating Observed, Expected, |O - E|, (O - E)², and (O - E)²/E for each cell.

Suppose you have the following categorical data set.

Table. Incidence of three types of malaria in three tropical regions.

             Asia   Africa   South America   Totals
Malaria A     31      14           45           90
Malaria B      2       5           53           60
Malaria C     53      45            2          100
Totals        86      64          100          250

We could now set up the following table:


Observed   Expected   |O - E|   (O - E)²   (O - E)²/E
   31        30.96      0.04       0.0016    0.0000516
   14        23.04      9.04      81.72      3.546
   45        36.00      9.00      81.00      2.25
    2        20.64     18.64     347.45     16.83
    5        15.36     10.36     107.33      6.99
   53        24.00     29.00     841.00     35.04
   53        34.40     18.60     345.96     10.06
   45        25.60     19.40     376.36     14.70
    2        40.00     38.00    1444.00     36.10

Chi Square = 125.516
Degrees of Freedom = (c - 1)(r - 1) = (2)(2) = 4

Reject Ho because 125.516 is greater than 9.488 (the critical value for alpha = 0.05 with 4 degrees of freedom). Thus, we would reject the null hypothesis that there is no relationship between location and type of malaria. Our data tell us there is a relationship between type of malaria and location, but that is all they tell us.
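A minimal Python sketch (assuming SciPy is installed) that reproduces this test of independence; scipy.stats.chi2_contingency derives the expected frequencies the same way, row total times column total divided by the grand total:

from scipy.stats import chi2_contingency

# Rows: Malaria A, B, C; columns: Asia, Africa, South America.
table = [
    [31, 14, 45],
    [ 2,  5, 53],
    [53, 45,  2],
]

stat, p_value, df, expected = chi2_contingency(table)
print(round(stat, 3), df)  # roughly 125.5 with 4 degrees of freedom (the hand calculation above gives 125.516)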

