The expected frequency for each cell can be calculated by using a simple formula:
n rn c
fe = n
where nr = total number in the row
nc = total number in the column
n = total sample size
chi-square Null Hypothesis: No association between variables
Alternate Hypothesis: Association between variables
Sex and internet usage
= 3.333
chi-square
Chi-square distribution shape is dependent on degrees of freedom
◦ As df increases, the distribution becomes more symmetrical
An important characteristic of the chi-square statistic is the number of degrees of freedom (df)
associated with it. That is, df = (r - 1) x (c -1).
◦ (2-1) x (2-1) = 1 degree of freedom.
The calculated chi-square statistic had a value of 3.333. Since this is less than the critical value
of 3.841, the null hypothesis of no association can not be rejected indicating that the association
is not statistically significant at the 0.05
chi-square Null Hypothesis: No association between variables
Sex and internet usage Alternate Hypothesis: Association between variables
c2
f
n
Takes the value of 0 when there is no association and 1 when there is perfect association
f = Squareroot (3.333/30) = 0.333
Contingency Coefficient
Contingency coefficient (C) can be used to assess the strength of association in a table of any size
◦ Unlike phi coefficient is specific to a 2 x 2 table
c2
C =
c2 + n
2
f
V=
min (r-1), (c-1)
Or
c2/n
V=
min (r-1), (c-1)
Lambda Coefficient
Asymmetric lambda measures the percentage improvement in predicting the value of the
dependent variable, given the value of the independent variable.
◦ varies between 0 and 1 -> 0 means no improvement in prediction &1 indicates that the prediction can
be made without error
◦ Asymmetric lambda is computed for each of the variables (treating it as the dependent
variable).
Tau b is the most appropriate with square tables in which the number of rows and the number
of columns are equal. Its value varies between +1 and -1.
Tau c should be used for a rectangular table in which the number of rows is different than the
number of columns,.
Gamma does not make an adjustment for either ties or table size. Gamma also varies between
+1 and -1 and generally has a higher numerical value than tau b or tau c.
Steps for cross Tabulation
While conducting cross-tabulation analysis in practice, it is useful to proceed along the following steps.
1. Test the null hypothesis that there is no association between the variables using the chi-square statistic.
1. If null hypothesis rejected, then there is no relationship.
2. If H0 is rejected
1. Determine the strength of the association using an appropriate statistic (phi-coefficient, contingency coefficient,
Cramer's V, lambda coefficient, or other statistics
2. Interpret the pattern of the relationship by computing the percentages in the direction of the independent
variable, across the dependent variable.
3. For ordinal variables use tau b, tau c, or Gamma as the test statistic.
Exercise: Chi-square
Sex and internet usage Null Hypothesis: No association between variables
Alternate Hypothesis: Association between variables
Observed Sex Expected Sex
frequency frequency
Internet Male Female Row Total Internet Male Female Row Total
Usage Usage
Light (1) 5 10 15 Light (1) 7.5 7.5 15
Heavy (2) 10 5 15 Heavy (2) 7.5 7.5 15
Column 15 15 30 Column 15 15 30
Total Total
Another Exercise
Q1. A cross tabulation and relevant test of {from Dell Case data}–
◦ Internet usage {Q1 -> To be coded as - Low (1 to 2), Medium (3 to 4), High (5,6)} with Gender {Q14 ->
Male (1), and Female (2)}
Thank You