Chi-Square Applications
Characteristics of
The Chi-Square Distribution
It is never negative
There is a family of chi-square
distributions
The shape of the chi-square distribution does
not depend on the size of the sample, but on
the number of categories used (k), which
determines the degrees of freedom
It is positively skewed
As the number of degrees of freedom (d.f.)
increases, the distribution begins to
approximate the normal distribution
[Figure: CHI-SQUARE DISTRIBUTION. Curves for df = 3, df = 5, and df = 10; horizontal axis: χ²]
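The skewness and the approach to the normal shape can be checked numerically. The sketch below relies on the standard moments of a chi-square distribution with df degrees of freedom (mean df, variance 2·df, skewness sqrt(8/df)). The function name `chi_square_moments` and the extra value df = 100 are illustrative and do not come from the slides.

```python
import math

def chi_square_moments(df):
    """Return (mean, variance, skewness) of a chi-square distribution."""
    if df <= 0:
        raise ValueError("df must be positive")
    return df, 2 * df, math.sqrt(8 / df)

# df values from the slide, plus a large df added here for contrast
for df in (3, 5, 10, 100):
    mean, var, skew = chi_square_moments(df)
    print(f"df={df:>3}  mean={mean}  variance={var}  skewness={skew:.3f}")
```

The skewness stays positive but shrinks toward 0 as df grows, which is the sense in which the curve moves toward the symmetric normal shape.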
Chi-Square Test
Compares several proportions (multinomial
test)
One of the nonparametric, or distribution-free,
tests of hypothesis
Data: nominal-scale or ordinal-scale
The test statistic is:
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
Chi-Square Test (cont.)
The chi-square test is used to:
Test whether an observed set of frequencies
could have come from a hypothesized
population distribution
Determine whether the sample observations
come from a particular distribution such as
the normal distribution
Test, by contingency table analysis, whether
two traits or characteristics are related
(test of independence)
Goodness-of-Fit Test:
Equal Expected Frequencies
The purpose of Goodness-of-Fit Test is to
compare an observed set of frequencies (fo)
to an expected set of frequencies (fe).
H0: there is no difference between fo and fe
H1: there is a difference between fo and fe
The critical value is a chi-square value with
(k - 1) degrees of freedom, where k is the
number of categories
Example: Football Player Jersey Sales
Player     Number Sold (fo)   Expected Number Sold (fe)
Owen              13                    20
Ronaldo           33                    20
Nesta             14                    20
Buffon             7                    20
Beckham           36                    20
Zidane            17                    20
TOTAL            120                   120
Example (cont.)
Player      fo    fe    (fo – fe)   (fo – fe)²   (fo – fe)² / fe
Owen        13    20       -7           49            2.45
Ronaldo     33    20       13          169            8.45
Nesta       14    20       -6           36            1.80
Buffon       7    20      -13          169            8.45
Beckham     36    20       16          256           12.80
Zidane      17    20       -3            9            0.45
TOTAL                       0                        34.40
χ² = 34.40
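The jersey calculation can be reproduced in a few lines. This is a minimal sketch, not part of the original slides: the helper name `chi_square_statistic` is hypothetical, and the equal expected frequencies are derived by spreading total sales evenly over the k players.

```python
def chi_square_statistic(observed, expected):
    """Sum of (fo - fe)^2 / fe over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# Jerseys sold per player (fo), in the order of the table
players = ["Owen", "Ronaldo", "Nesta", "Buffon", "Beckham", "Zidane"]
observed = [13, 33, 14, 7, 36, 17]

# Equal expected frequencies: total sales spread evenly over k players
k = len(observed)
expected = [sum(observed) / k] * k

chi_square = chi_square_statistic(observed, expected)
df = k - 1  # degrees of freedom for a goodness-of-fit test
print(f"chi-square = {chi_square:.2f}, df = {df}")
```

The statistic is then compared with the critical chi-square value for k - 1 = 5 degrees of freedom at the chosen significance level.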
Goodness-of-Fit Test:
Unequal Expected Frequencies
Example:
A lecturer expects the distribution of exam
grades to be A = 40%, B = 40%, and C = 20%.
The exam results show the following distribution:
A: 30 students   B: 20 students   C: 10 students
Test at the 10% level of significance whether
this grade distribution is consistent with the
lecturer's expectation.
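The slide leaves this question open. One way to work it is sketched below, under two assumptions that are not in the original: each expected frequency is the sample size times the hypothesized proportion, and the critical value uses the closed form for 2 degrees of freedom, where P(χ² > x) = exp(-x/2).

```python
import math

# Grade counts from the exam and the proportions the lecturer expects
observed = {"A": 30, "B": 20, "C": 10}
expected_share = {"A": 0.40, "B": 0.40, "C": 0.20}

n = sum(observed.values())
# Unequal expected frequencies: fe = n * hypothesized proportion
expected = {grade: n * share for grade, share in expected_share.items()}

chi_square = sum(
    (observed[g] - expected[g]) ** 2 / expected[g] for g in observed
)
df = len(observed) - 1

# For df = 2 only, P(chi-square > x) = exp(-x / 2), so the critical value
# at significance level alpha is -2 * ln(alpha).
alpha = 0.10
critical_value = -2 * math.log(alpha)

reject_h0 = chi_square > critical_value
print(f"fe = {expected}")
print(f"chi-square = {chi_square:.3f}, critical value = {critical_value:.3f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

With these numbers the statistic (2.5) is below the critical value (about 4.605), so under the stated assumptions H0 is not rejected at the 10% level.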
Limitations of Chi-Square
If there are only two cells, the expected
frequency in each cell should be 5 or more
For more than two cells, the chi-square test
should not be used if more than 20% of the
cells have an expected frequency less
than 5.
Example
Level of Management          fo     fe
Foreman                      30     32
Supervisor                  110    113
Manager                      86     87
Middle Manager               23     24
Assistant vice president      5      2
Vice president                5      4
Senior vice president         4      1
TOTAL                       263    263
Three of the seven cells (43%) have an expected frequency below 5, so the
three vice-president levels are combined into one category:
Level of Management          fo     fe
Foreman                      30     32
Supervisor                  110    113
Manager                      86     87
Middle Manager               23     24
Vice president               14      7
TOTAL                       263    263
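The 20% rule and the effect of combining cells can be checked as follows. This is a sketch only: the helper `share_of_small_cells` and the dictionary layout are illustrative choices, not something the slides prescribe.

```python
def share_of_small_cells(expected, minimum=5):
    """Fraction of cells whose expected frequency is below `minimum`."""
    small = sum(1 for fe in expected if fe < minimum)
    return small / len(expected)

# (fo, fe) per level of management, as in the first table
levels = {
    "Foreman": (30, 32),
    "Supervisor": (110, 113),
    "Manager": (86, 87),
    "Middle Manager": (23, 24),
    "Assistant vice president": (5, 2),
    "Vice president": (5, 4),
    "Senior vice president": (4, 1),
}

before = share_of_small_cells([fe for _, fe in levels.values()])

# Combine the three vice-president levels into a single category
vp_levels = ["Assistant vice president", "Vice president", "Senior vice president"]
merged = {name: cell for name, cell in levels.items() if name not in vp_levels}
merged["Vice president"] = (
    sum(levels[name][0] for name in vp_levels),
    sum(levels[name][1] for name in vp_levels),
)

after = share_of_small_cells([fe for _, fe in merged.values()])
print(f"small cells before: {before:.0%}, after: {after:.0%}")
print("combined cell (fo, fe):", merged["Vice president"])
```

Combining cells lowers k, so the degrees of freedom of the test drop accordingly.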
Goodness-of-Fit Test for Normality
Purpose: To test whether the observed
frequencies in a frequency distribution
match the theoretical normal distribution.
Procedure:
Determine the mean and standard deviation of
the frequency distribution.
Compute the z-value for the lower class limit
and the upper class limit for each class.
Determine fe for each category
Use the chi-square goodness-of-fit test to
determine if fo coincides with fe.
EXAMPLE: Distribution of Salary
Mean = 54.03, standard deviation = 13.76 ($ 000)
Salary ($ 000)   Frequency
20 – 30               4
30 – 40              20
40 – 50              41
50 – 60              44
60 – 70              29
70 – 80              16
80 – 90               2
90 – 100              4
TOTAL               160
Salary ($ 000)   z-value            Area       fe
Under 30         Under -1.75        0.0401      6.416
30 – 40          -1.75 to -1.02     0.1138     18.208
40 – 50          -1.02 to -0.29     0.2320     37.120
50 – 60          -0.29 to 0.43      0.2805     44.880
60 – 70          0.43 to 1.16       0.2106     33.696
70 – 80          1.16 to 1.89       0.0936     14.976
80 or more       Over 1.89          0.0294      4.704
TOTAL                               1.0000    160.000
where $z = \frac{x - \bar{x}}{s}$ for each class limit x
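The z-values, areas, and expected frequencies can be recomputed as sketched below. Assumptions not taken from the slides: the normal probabilities come from `math.erf` rather than a printed z table, and z is rounded to two decimals to mimic a table lookup, so results may differ from the slide in the third decimal.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mean, sd, n = 54.03, 13.76, 160   # values from the salary example ($000)

# Class limits separating the seven categories used in the table
limits = [30, 40, 50, 60, 70, 80]

# z-values rounded to two decimals, as when reading a printed z table
z_values = [round((x - mean) / sd, 2) for x in limits]

# Cumulative probabilities at -infinity, each limit, and +infinity
cumulative = [0.0] + [normal_cdf(z) for z in z_values] + [1.0]

# Area of each class = difference of neighbouring cumulative probabilities
areas = [hi - lo for lo, hi in zip(cumulative, cumulative[1:])]
expected = [n * area for area in areas]

labels = ["Under 30", "30-40", "40-50", "50-60", "60-70", "70-80", "80 or more"]
for label, area, fe in zip(labels, areas, expected):
    print(f"{label:<11} area={area:.4f}  fe={fe:.3f}")
```

By construction the areas sum to 1 and the expected frequencies sum to the sample size of 160.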
Calculation for Chi-Square
Salary ($ 000)    fo      fe      (fo – fe)   (fo – fe)²   (fo – fe)² / fe
Under 30           4     6.416     -2.416       5.837          0.910
30 – 40           20    18.208      1.792       3.211          0.176
40 – 50           41    37.120      3.880      15.054          0.406
50 – 60           44    44.880     -0.880       0.774          0.017
60 – 70           29    33.696     -4.696      22.052          0.654
70 – 80           16    14.976      1.024       1.049          0.070
80 or more         6     4.704      1.296       1.680          0.357
TOTAL            160   160                                     2.590
χ² = 2.590
If the mean and standard deviation of the
population are known and we wish to test
whether some sample data conform to the
normal distribution, then
d.f. = k – 1
If, on the other hand, the mean and standard
deviation of the population are not known
and we wish to test whether some sample
data follow the normal distribution, then
d.f. = k – p – 1
(where p is the number of population parameters
estimated from the sample data)
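As a small illustration of the two cases, the sketch below applies both formulas to the seven salary classes. The function name `goodness_of_fit_df` is hypothetical, and p = 2 assumes that both the mean and the standard deviation were estimated from the sample.

```python
def goodness_of_fit_df(k, estimated_parameters=0):
    """Degrees of freedom: k - p - 1, with p = 0 when nothing is estimated."""
    df = k - estimated_parameters - 1
    if df < 1:
        raise ValueError("too few categories for the parameters estimated")
    return df

k = 7  # salary classes after combining, "Under 30" through "80 or more"

# Population mean and standard deviation known: d.f. = k - 1
df_known = goodness_of_fit_df(k)

# Mean and standard deviation both estimated from the sample: p = 2
df_estimated = goodness_of_fit_df(k, estimated_parameters=2)

print(df_known, df_estimated)
```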
Contingency Table Analysis
Contingency table analysis is used to test whether
two traits or variables are related.
Two-way classification table
Each observation is classified according to two
variables.
d.f. = (number of rows – 1)(number of columns – 1)
The expected frequency (fe) is computed as:
$$f_e = \frac{(\text{Row total})(\text{Column total})}{\text{Grand total}}$$
Coefficient of contingency:
$$C = \sqrt{\frac{\chi^2}{\chi^2 + N}}$$
where N is the grand total
Example
Are men released from prison able to
socialize well in their hometown or
elsewhere?
Hypotheses:
H0: there is no relationship between the ability
to socialize and the place of residence after
release from prison
H1: there is a relationship between the ability
to socialize and the place of residence after
release from prison
Observed frequencies (socialization):
Residence after release   Outstanding   Good   Fair   Unsatisfactory   Total     %
Hometown                       27        35     33         25           120     60%
Elsewhere                      13        15     27         25            80     40%
Total                          40        50     60         50           200
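The expected frequencies, the chi-square statistic, the degrees of freedom, and the coefficient of contingency for this table can be computed as sketched below. This is an illustration in plain Python, not output shown on the slides, and the variable names are arbitrary.

```python
import math

# Observed frequencies: rows = place of residence after release,
# columns = the four socialization ratings, in table order
observed = [
    [27, 35, 33, 25],   # hometown
    [13, 15, 27, 25],   # elsewhere
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# fe = (row total)(column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (fo - fe) ** 2 / fe
    for obs_row, exp_row in zip(observed, expected)
    for fo, fe in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Coefficient of contingency, C = sqrt(chi^2 / (chi^2 + N))
contingency = math.sqrt(chi_square / (chi_square + grand_total))

print("expected:", expected)
print(f"chi-square = {chi_square:.3f}, df = {df}, C = {contingency:.3f}")
```

The resulting statistic is compared with the critical chi-square value for (2 - 1)(4 - 1) = 3 degrees of freedom at the chosen significance level.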
Expected values:
             A         B         C        Total
Upper      157.50    122.25    104.25     384.00
Middle     136.17    105.70     90.13     332.00
Lower      126.33     98.05     83.62     308.00
Total      420.00    326.00    278.00    1024.00
chi-square = 7.34
df = 4
p-value = .1190
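The reported p-value can be cross-checked from the statistic and its degrees of freedom. The sketch below uses a closed form for the upper-tail probability that holds only for even df. The function name is hypothetical, and a statistics library would normally be used instead.

```python
import math

def chi_square_p_value_even_df(x, df):
    """Upper-tail probability P(chi-square > x) for even df.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!,
    which is valid only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms

p_value = chi_square_p_value_even_df(7.34, 4)
print(f"p-value = {p_value:.4f}")
```

The result should land close to the reported .1190. Small differences are possible because 7.34 is itself a rounded value.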