I. Description
Analysts use statistical models to describe the data they have observed. Occasionally,
questions arise about how well a model fits: specifically, how close the observed values
are to the values the model predicts.
The Chi-square Goodness of Fit Test addresses this question. It compares the
frequencies expected in a population with the frequencies observed in a sample.
Furthermore, it gives the probability of obtaining a sample at least this far from the
expected proportions in each category if the sample really came from that population.
In short, it is a test commonly used to compare observed data with the data that would
be expected under a specific hypothesis.
II. Advantages
1. Compared with other non-parametric tests, the test statistic is easy to compute from
the frequencies.
2. The test allows a direct comparison between the frequencies predicted by the model
and the frequencies observed.
3. The Chi-square Goodness of Fit Test can be applied to any type of distribution,
discrete or continuous (the normal distribution, for example), as long as the data are
grouped into categories.
III. Disadvantages
1. The data must be frequency data. Values must be given as counts, not as percentages
or ratios.
2. The Chi-square statistic is sensitive to sample size. It is highly recommended that the
Chi-square Goodness of Fit Test not be used if the sample size is less than 50, because it
may give inaccurate results. In such cases, a randomization test or an exact test may be
used instead.
3. The test provides very little information about the strength of the relationship
between the model and the data values.
4. Observations must be independent. This means that one value cannot fall under
two populations or categories.
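To illustrate why counts are required and how the statistic reacts to sample size, here is a minimal Python sketch. The helper name chi_square_statistic and the scaled-up counts are assumptions made for illustration; only the 56/44 versus 50/50 figures come from this handout.

```python
def chi_square_statistic(observed, expected):
    """Sum of (o - e)^2 / e over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# The same 56% / 44% split tested against a 50:50 expectation,
# at two different sample sizes. The n = 1000 counts are hypothetical.
small = chi_square_statistic([56, 44], [50, 50])        # n = 100
large = chi_square_statistic([560, 440], [500, 500])    # n = 1000

print("n = 100 :", small)
print("n = 1000:", large)
```

The proportions are identical in both calls, yet the statistic is ten times larger for the larger sample. Percentages alone therefore cannot determine the result; the actual counts are needed.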
IV. When to Use
The Chi-square Goodness of Fit Test is used when dealing with nominal
variables that have two or more possible values. Nominal values are attributes or
categories, such as gender, color, and shape. The observed frequencies of these nominal
variables are compared with the expected frequencies.
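As a rough sketch of how observed frequencies are obtained from a nominal variable, the Python snippet below tallies a list of category labels. The color data and the equal-likelihood null hypothesis are hypothetical, and the sample is far too small for an actual test; it only shows the tallying step.

```python
from collections import Counter

# Hypothetical nominal observations (illustrative only).
colors = ["red", "blue", "red", "green", "blue", "red", "green", "red"]

observed = Counter(colors)        # observed frequency of each category
n = len(colors)
categories = sorted(observed)

# Assumed null hypothesis: every category is equally likely.
expected = {c: n / len(categories) for c in categories}

for c in categories:
    print(c, "observed:", observed[c], "expected:", expected[c])
```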
V. How to Use
1. Formulate the hypotheses: one null and one alternative ($H_0$ and $H_a$).
VI. Equations
$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$
Where:
$\chi^2$ = random variable / test statistic
$o_i$ = observed frequency of the ith cell
$e_i$ = expected frequency of the ith cell
$k$ = number of categories/groupings
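df = (# of columns - 1)(# of rows - 1). With the categories as columns and the actual
and expected counts as the two rows, this equals k - 1.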
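The statistic and degrees of freedom can be sketched in Python as follows. The function name goodness_of_fit and the three-category counts are assumptions for illustration. The sketch takes df = k - 1, which holds when no model parameters are estimated from the data.

```python
def goodness_of_fit(observed, expected):
    """Return (chi-square statistic, degrees of freedom) for k categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    # Sum of (o_i - e_i)^2 / e_i over the k categories.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # k - 1, assuming no parameters were estimated
    return statistic, df

# Hypothetical counts for three categories expected to be equally frequent.
stat, df = goodness_of_fit([30, 50, 40], [40, 40, 40])
print("chi-square:", stat, "df:", df)
```

The input checks are a design choice of this sketch: a zero expected frequency would cause a division by zero, so it is rejected up front.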
Actual Count      56    44
Expected Count    50    50

$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i} = \frac{(56 - 50)^2}{50} + \frac{(44 - 50)^2}{50} = 1.44$$
Since 1.44 is less than 3.841, the critical value for df = 1 at the 0.05 significance
level, the decision is to fail to reject the null hypothesis. There is not enough evidence
to reject the claim that there is a 50:50 ratio between employees educated in a public
elementary school and those educated in a private elementary school.
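The decision above can be checked with a short Python sketch that uses only the standard library. The variable names are assumptions. The p-value line relies on a closed form that is valid only for df = 1; a statistics library would be needed for other degrees of freedom.

```python
import math

observed = [56, 44]   # actual counts from the example
expected = [50, 50]   # expected counts under the 50:50 claim
critical_value = 3.841  # value quoted in the text (df = 1, alpha = 0.05)

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# For df = 1 only: upper-tail chi-square probability via the
# complementary error function.
p_value = math.erfc(math.sqrt(statistic / 2))

reject_null = statistic > critical_value
print("chi-square:", round(statistic, 2))
print("p-value:", round(p_value, 4))
print("reject null hypothesis:", reject_null)
```

Comparing the statistic with the critical value and comparing the p-value with 0.05 lead to the same decision; the p-value also shows how far the result is from the rejection threshold.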