
Chi-square Test in SPSS
Prof. Clarinda L. Berja
Department of Social Sciences
Learning objectives

Using SPSS,
■ Generate a contingency table and calculate a chi-square test statistic
■ Determine the significance of a chi-square test statistic
■ Apply and interpret measures of association: Lambda, Cramer’s V, Gamma,
and Kendall’s tau-b
Outline

■ Crosstabulation
■ Chi-square test statistic
■ Measures of association
– Phi, Cramer's V, Lambda, Gamma, Kendall's tau-b
One of the main objectives of social science is to make
sense out of human and social experience by uncovering
regular patterns among events. Therefore, the language
of relationships is at the heart of social science inquiry.
Concept of relationship

■ To state a relationship between the variables X and Y is to say that certain categories of the variable X go with certain categories of the variable Y.
Cross-tabulation
■ is a technique for analyzing the relationship between two variables (independent and dependent) organized in a table.
■ is a type of bivariate analysis: a method designed to detect and describe the relationship between two nominal or ordinal variables.

Note: Aside from establishing whether an association exists, its strength and, when appropriate, its direction can also be determined through nonparametric measures of association.
Two basic rules for computing and analyzing percentages in a cross-tabulation:

1. Calculate percentages within each category of the independent variable.
2. Interpret the table by comparing the percentage point difference for different categories of the independent variable.
Feeling of happiness * Religious denomination Crosstabulation
(cell entries: count and % within religious denomination; INC = Iglesia ni Cristo)

                          None     INC   Protestant  Roman Catholic    Total
Very happy        Count   4339      23       2241          5824        12427
                  %      27.8%   45.1%      41.1%         39.2%        34.6%
Rather happy      Count   9129      23       2302          7293        18747
                  %      58.5%   45.1%      42.2%         49.1%        52.1%
Not very happy    Count   1902       5        768          1557         4232
                  %      12.2%    9.8%      14.1%         10.5%        11.8%
Not at all happy  Count    238       0        138           170          546
                  %       1.5%    0.0%       2.5%          1.1%         1.5%
Total             Count  15608      51       5449         14844        35952
                  %     100.0%  100.0%     100.0%        100.0%       100.0%
Rule 1: Calculate percentages within each category of the independent variable.

• When the independent variable is arrayed in the columns, we compute percentages within each column separately.
• The frequency in each cell is divided by the total of the column in which it is located, so the percentages within each column sum to 100%.
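To make Rule 1 concrete, here is a minimal sketch in Python (pandas) using a small made-up data frame; the variable names and values are illustrative only, not the actual dataset:

    import pandas as pd

    # Hypothetical raw survey data; variable names are illustrative only.
    df = pd.DataFrame({
        "religion":  ["None", "Roman Catholic", "Protestant",
                      "None", "Roman Catholic", "None"],
        "happiness": ["Very happy", "Rather happy", "Very happy",
                      "Not very happy", "Very happy", "Rather happy"],
    })

    # Rule 1: with the independent variable (religion) in the columns,
    # normalize each column separately so it sums to 100%.
    col_pct = pd.crosstab(df["happiness"], df["religion"], normalize="columns") * 100
    print(col_pct.round(1))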
Rule 2: Comparing the Percentages Across Different Categories of the Independent Variable

■ tells us to compare how feeling of happiness varies across religion.
■ comparisons are made by examining differences between percentage points across different categories of the independent variable.
■ Some researchers limit their comparisons to categories with at least a 10 percentage point difference.
Feeling of happiness and religion

– In our comparison, we can see that there is an 11.4 percentage point difference between the percentage of very happy Roman Catholics (39.2%) and Nones (27.8%).
– In other words, in this group, Catholics are more likely to be very happy than those without religion (None). Therefore, we can conclude that one's religion appears to be associated with the likelihood of being very happy.
Are happiness and life satisfaction different across religious groups?
Chi-Square Test
■ pronounced "kai-square" and written as χ²
■ an inferential statistical technique designed to test for significant relationships between two variables organized in a bivariate table
■ has a variety of research applications
■ requires no assumptions about the shape of the population distribution from which a sample is drawn
■ can be applied to nominal or ordinal data (including grouped interval-level data).
Observed frequencies

        Male   Female   Total
Yes     2610     1270    3880
No      1367     3111    4478
Total   3977     4381    8358

Marginal totals only (inner cells left empty)

        Male   Female   Total
Yes        .        .    3880
No         .        .    4478
Total   3977     4381    8358

We can determine the values in the empty cells by assuming independence of the two variables → "expected frequencies".
Expected Frequencies
■ computed as if there is no difference between the groups, i.e., both groups have the same proportion as the total sample in each category of the test variable.
■ Since the proportion of subjects in each category of the group variable can differ, we take group category into account in computing expected frequencies as well.
■ the expected frequencies for each cell are computed to be proportional to both the breakdown for the test variable and the breakdown for the group variable.
Expected Frequency Calculation
The table of observed frequencies is the source of the information used to compute the expected frequencies. Percentages are computed for each row marginal and each column marginal. These percentages are then multiplied by the total sample size to compute the expected frequency for each cell in the table.

Expected frequency of a cell = row marginal % × column marginal % × total sample size,
which is equivalent to (row total × column total) / N.
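As a quick arithmetic check, here is a minimal sketch in Python (numpy) applying (row total × column total) / N to the observed smoking-by-sex table shown above:

    import numpy as np

    # Observed frequencies (rows: Yes, No; columns: Male, Female).
    observed = np.array([[2610, 1270],
                         [1367, 3111]])

    row_totals = observed.sum(axis=1, keepdims=True)  # 3880, 4478
    col_totals = observed.sum(axis=0, keepdims=True)  # 3977, 4381
    n = observed.sum()                                # 8358

    # Expected frequency of a cell = row total * column total / N,
    # which equals row% * column% * N from the slide.
    expected = row_totals @ col_totals / n
    print(expected.round(1))  # [[1846.2 2033.8] [2130.8 2347.2]]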
The difference between observed and expected frequencies is tested formally using the χ² statistic.
Expected frequencies

        Male     Female     Total
Yes   1846.2     2033.8    3880.0
No    2130.8     2347.2    4478.0
Total 3977.0     4381.0    8358.0

Observed frequencies

        Male   Female   Total
Yes     2610     1270    3880
No      1367     3111    4478
Total   3977     4381    8358

Basically,
Observed ≈ Expected → no association
Observed ≉ Expected → association

The chi-square test of independence plugs the observed frequencies and expected frequencies into a formula which computes how the pattern of observed frequencies differs from the pattern of expected frequencies.
The two variables in a chi-square test of independence each play a specific role.
■ The group variable is also known as the independent variable because it has an influence on the test variable.
■ The test variable is also known as the dependent variable because its value is believed to be dependent on the value of the group variable.

The chi-square test of independence is a test of the influence that a subject's value on one variable has on a second variable.
Step 1. Assumptions
■ The chi-square test of independence can be used for variables at any level of measurement, including interval-level variables grouped in a frequency distribution. It is most useful for nominal variables, for which we do not have another option.
■ Assumption: no cell has an expected frequency less than 5.
■ If this assumption is violated, the chi-square distribution will give us misleading probabilities.
Step 2. Hypotheses and alpha

■ The research hypothesis states that the two variables are dependent or related. This will be true if the observed counts for the categories of the variables in the sample are different from the expected counts.
■ The null hypothesis is that the two variables are independent. This will be true if the observed counts in the sample are similar to the expected counts.
■ The amount of difference needed to make a decision about difference or similarity is the amount corresponding to the alpha level of significance, which will be either 0.05 or 0.01. The value to use will be stated in the problem.
Step 3. Sampling distribution and test statistic
■ To test the relationship, we use the chi-square test statistic, which follows the chi-square distribution.
■ If we were calculating the statistic by hand, we would have to compute the degrees of freedom, df = (rows − 1) × (columns − 1), to identify the probability of the test statistic.
■ SPSS will print out the degrees of freedom and the probability of the test statistic for us.
Step 4. Computing the Test Statistic

■ Conceptually, the chi-square test of independence statistic is computed by summing, over every cell in the table, the squared difference between the observed and expected frequencies divided by the expected frequency for that cell:

χ² = Σ (fo − fe)² / fe

■ We identify the value and probability for this test statistic from the SPSS statistical output.
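A minimal sketch of this computation in Python (numpy/scipy), again using the smoking-by-sex table above; scipy.stats.chi2_contingency serves as a cross-check on the hand formula:

    import numpy as np
    from scipy.stats import chi2_contingency

    observed = np.array([[2610, 1270],
                         [1367, 3111]])
    expected = observed.sum(axis=1, keepdims=True) @ \
               observed.sum(axis=0, keepdims=True) / observed.sum()

    # chi-square = sum over all cells of (observed - expected)^2 / expected
    chi_sq = ((observed - expected) ** 2 / expected).sum()
    print(round(chi_sq, 3))  # ~1125.102, the value cited in the summary slide

    # SPSS prints the same statistic together with its df and p-value.
    chi2, p, dof, exp = chi2_contingency(observed, correction=False)
    print(round(chi2, 3), dof, p)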
Step 5. Decision and Interpretation

■ If the probability of the test statistic is less than or equal to the probability of the alpha error rate, we reject the null hypothesis and conclude that our data support the research hypothesis. We conclude that there is a relationship between the variables.
■ If the probability of the test statistic is greater than the probability of the alpha error rate, we fail to reject the null hypothesis. We conclude that there is no relationship between the variables, i.e., they are independent.
Which Cell or Cells Caused the Difference?
■ We are only concerned with this procedure if the result of the chi-square test was statistically significant.
■ One of the problems in interpreting chi-square tests is determining which cell or cells produced the statistically significant difference. Examination of percentages in the contingency table and expected frequency table can be misleading.
■ The residual, or the difference between the observed frequency and the expected frequency, is a more reliable indicator, especially if the residual is converted to a z-score and compared to a critical value equivalent to the alpha for the problem.
Standardized Residuals
■ SPSS prints out the standardized residual (converted to a z-score) computed for each cell. It does not produce the probability or significance.
■ Without a probability, we will compare the size of the standardized residuals to the critical values that correspond to an alpha of 0.05 (±1.96) or an alpha of 0.01 (±2.58). The problems will tell you which value to use. This is equivalent to testing the null hypothesis that the actual frequency equals the expected frequency for a specific cell versus the research hypothesis of a difference greater than zero.
■ There can be 0, 1, 2, or more cells with statistically significant standardized residuals to be interpreted.
Interpreting Standardized Residuals

■ Standardized residuals that have a positive value mean that the cell was over-represented in the actual sample, compared to the expected frequency, i.e., there were more subjects in this category than we expected.
■ Standardized residuals that have a negative value mean that the cell was under-represented in the actual sample, compared to the expected frequency, i.e., there were fewer subjects in this category than we expected.
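The standardized residual SPSS reports is (observed − expected) / √expected for each cell. A minimal sketch in Python (numpy), flagging cells beyond the 0.05 critical value of ±1.96, using the smoking-by-sex table above:

    import numpy as np

    observed = np.array([[2610, 1270],
                         [1367, 3111]])
    expected = observed.sum(axis=1, keepdims=True) @ \
               observed.sum(axis=0, keepdims=True) / observed.sum()

    # Standardized residual per cell: (observed - expected) / sqrt(expected).
    std_resid = (observed - expected) / np.sqrt(expected)
    print(std_resid.round(2))

    # Cells whose absolute residual exceeds 1.96 are significant at alpha = 0.05.
    print(np.abs(std_resid) > 1.96)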
Interpreting Cell Differences in a Chi-square Test - 1

A chi-square test of independence of the relationship between sex and marital status finds a statistically significant relationship between the variables.
Interpreting Cell Differences in a Chi-square Test - 2

Researchers often try to identify which cell or cells are the major contributors to the significant chi-square test by examining the pattern of column percentages.

Based on the column percentages, we would identify cells on the married row and the widowed row as the ones producing the significant result because they show the largest differences: 8.2 percentage points on the married row (50.9% − 42.7%) and 9.0 percentage points on the widowed row (13.1% − 4.1%).
Interpreting Cell Differences in a Chi-square Test - 3

Using a level of significance of 0.05, the critical values for a standardized residual would be −1.96 and +1.96. Using standardized residuals, we would find that only the cells on the widowed row are significant contributors to the chi-square relationship between sex and marital status.

If we interpreted the contribution of the married marital status, we would be mistaken. Basing the interpretation on column percentages can be misleading.
χ² Test in SPSS

■ Using SPSS, the Crosstabs command produces a table of different cells, with associated frequencies inserted in each cell, by crossing the levels of variable Y with the levels of variable X. To do this,
a. CLICK Analyze,
b. then CLICK Descriptive Statistics,
c. and SELECT Crosstabs.
Crosstabs dialogue box
To specify variables in the Crosstabs dialogue box:

a. CLICK the variable "g2" in the list at the left.
b. Then CLICK the uppermost of the right arrows [→] to indicate that g2 will be the row variable.
c. Then CLICK a second variable, "sex", and CLICK the middle right arrow (to indicate that it will be the column variable).

■ This will create a 2 × 2 table that contains 4 cells.
More on crosstabs - 1

Including a control variable
■ The lowest box in the window allows for crosstabulation of three or more variables. If, for instance, we wanted g2 by sex disaggregated by ur (urban or rural), you would
– CLICK the "ur" variable in the list of variables,
– then CLICK the lowest of the three right arrows. This would result in 2 tables: a crosstab of g2 by sex for urban areas and another for rural areas.
More on crosstabs - 2

 The Previous and Next to the left and right of Layer 1 of 1 are used if
you wanted a g2 by sex analysis for more than one variable. For
example, you wanted a breakdown for both ur and a1, you would
 CLICK ur, click the lowest right arrow,
 CLICK Next then CLICK a1, then CLICK the lowest right arrow again.
 This would produce 2 2 x 2 tables and another 2 2 x 2 tables.
It is rare for a researcher to want to compute only cell frequencies.
■ In addition to frequencies, it is possible to include within each cell a number of additional options.
■ Those most frequently used are listed below with a brief definition of each. When you press the Cells button, a new screen appears that allows you to select a number of options.
■ The Observed count is selected by default. Percentages are in most cases also desired. Inclusion of other values depends on the preference of the researcher.
Crosstabs: Cell Display
Cell Display Options
Observed count: The actual number of subjects or cases within each cell.

Expected count: The expected value for each cell.

Row percentages: The percent of values in each cell of that row.

Column percentages: The percent of values in each cell of that column.

Total percentages: The percent of values in each cell of the whole table.

Unstandardized residuals: Observed value minus expected value.
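For reference, these cell quantities can also be reproduced outside SPSS. A minimal sketch in Python (pandas/numpy) computing each of the options above for the smoking-by-sex table:

    import numpy as np
    import pandas as pd

    observed = pd.DataFrame([[2610, 1270], [1367, 3111]],
                            index=["Yes", "No"], columns=["Male", "Female"])

    n = observed.values.sum()
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n

    row_pct   = observed.div(observed.sum(axis=1), axis=0) * 100  # row percentages
    col_pct   = observed.div(observed.sum(axis=0), axis=1) * 100  # column percentages
    total_pct = observed / n * 100                                # total percentages
    residuals = observed - expected                               # unstandardized residuals
    print(residuals.round(1))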


Crosstabs: Statistics Window
χ² Post Hoc Tests
■ Many different tests of independence or association are listed in the Statistics window.
■ As in the Cells window, the procedure is to CLICK the small box to the left of the desired statistic before returning to the previous screen to conduct the analysis.
■ For example, to add Chi-square and Phi and Cramer's V analyses to your crosstabulation,
1. Tick the small boxes beside Chi-square and Phi and Cramer's V,
2. then press [Continue],
3. lastly, click [OK].
Steps in solving chi-square test of independence: post hoc problems - 1

The following is a guide to the decision process for chi-square test of independence post hoc problems:

Is the dependent and independent variable nominal, ordinal, dichotomous, or grouped interval?
– No → incorrect application of a statistic
– Yes → continue
Steps in solving chi-square test of independence: post hoc problems - 2

Compute the chi-square test of independence, requesting standardized residuals in the output.

Are any expected cell counts less than 5?
– Yes → incorrect application of a statistic
– No → continue

Is the p-value for the chi-square test of independence ≤ alpha?
– No → false
– Yes → continue
Steps in solving chi-square test of independence: post hoc problems - 3

Identify the cell in the crosstabs table that contains the specific relationship in the problem.

Is the value of the standardized residual for the specified cell larger (smaller) than the positive (negative) critical value given in the problem?
– No → false
– Yes → continue

Is the relationship correctly described?
– No → false
– Yes → true
SPSS Crosstabs Output - 1
SPSS Crosstabs Output - 2
Summary
■ The first step in interpreting a crosstabulation or chi-square analysis is to observe the actual values and the expected values within each cell.
■ Observation indicates that the observed values and the expected values are different: there are discrepancies in smoking by sex of respondent.
■ Even without looking at the chi-square statistics, you would anticipate that the chi-squared statistic would be significant; that is, g2 and sex in this sample are related.
■ The results support this observation, with a high chi-squared value (1125.102) and a significance less than 0.05 (sig. = .000).
■ Although the two variables are related, the degree of association is weak (Cramer's V = .367), and it is likewise significant (sig. = .000).
Hands-on Exercise:
■ Consider the two variables:
– V211 – How proud of nationality
– V242 – Age
■ Describe the patterns that you observe in the crosstabulation.
■ Perform a χ² test of independence.
■ Save your output.
■ Analysis
– Write your null and alternative hypotheses.
– Based on the chi-squared statistic, is there sufficient evidence to reject the null hypothesis at the .05 level of significance?
– What is the actual level of significance?
– If necessary, conduct a chi-square post hoc test. What does the result imply?
Nonparametric Tests
of Association
Prof. Clarinda L. Berja
Department of Social Sciences
Lambda: A Measure of Association for Nominal Variables
■ Lambda is an asymmetrical measure used to determine the strength of the
relationship between two nominal variables.
– An asymmetrical measure will vary depending on which variable is
considered the independent variable and which the dependent variable.

■ Lambda may range in value from 0.0 to 1.0. Zero indicates that there is
nothing to be gained by using the independent variable to predict the
dependent variable.
– A lambda of 1.0 indicates that by using the independent variable as a
predictor, we are able to predict the dependent variable without any
error.

Computational formula:

λ = (E1 − E2) / E1
Calculation of Lambda, λ

Formula: λ = (E1 − E2) / E1

■ Find E1, the errors of prediction made when the independent variable is ignored. To find E1, find the mode of the dependent variable and subtract its frequency from N.
– E1 = N − modal frequency
■ Find E2, the errors made when the prediction is based on the independent variable. To find E2, find the modal frequency for each category of the independent variable, subtract it from the category total to find the number of errors, and then add up all the errors (see the sketch below).
– E2 = Σk (Nk − modal frequency within category k)
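A minimal sketch of this E1/E2 computation in Python (numpy), using a small hypothetical table with the dependent variable in the rows and the independent variable in the columns:

    import numpy as np

    # Hypothetical crosstab: rows = dependent variable, columns = independent variable.
    table = np.array([[40, 10],
                      [20, 30]])

    n = table.sum()
    # E1 = N minus the modal frequency of the dependent variable (largest row total).
    e1 = n - table.sum(axis=1).max()
    # E2 = within each column, column total minus the column mode, summed over columns.
    e2 = (table.sum(axis=0) - table.max(axis=0)).sum()

    lam = (e1 - e2) / e1
    print(lam)  # (50 - 30) / 50 = 0.4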
Cramer’s V: A Chi-Square–Related Measure of Association for Nominal Variables

■ Cramer’s V is an alternative measure of association that can be used for nominal variables.
■ It is based on the value of chi-square and ranges from 0 to 1, with 0 indicating no association and 1 indicating perfect association. Because it cannot take negative values, it is considered a nondirectional measure. Unfortunately, Cramer’s V is somewhat limited because the results cannot be interpreted using the PRE framework.

Formula:

V = √( χ² / (N × m) ), where m = smaller of (r − 1) or (c − 1).
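A minimal sketch in Python (numpy/scipy), applying the formula to the smoking-by-sex table from earlier; it reproduces the .367 cited in the summary slide above:

    import numpy as np
    from scipy.stats import chi2_contingency

    observed = np.array([[2610, 1270],
                         [1367, 3111]])

    chi2, p, dof, expected = chi2_contingency(observed, correction=False)
    n = observed.sum()
    m = min(observed.shape[0] - 1, observed.shape[1] - 1)  # smaller of (r-1), (c-1)

    cramers_v = np.sqrt(chi2 / (n * m))
    print(round(cramers_v, 3))  # ~0.367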
Gamma and Kendall’s Tau-b: Symmetrical Measures of Association for Ordinal Variables

■ suitable for use with ordinal variables or with dichotomous nominal variables.
– This means that their value will be the same regardless of which variable is the independent variable or the dependent variable.
■ both can vary from 0.0 to ±1.0 and provide us with an indication of the strength and direction of the association between the variables.
■ can be positive or negative.
Interpretation of Gamma & Kendall’s tau-b
■ A gamma or Kendall’s tau-b of 1.0 indicates that the relationship
between the variables is positive and that the dependent variable can
be predicted without any errors based on the independent variable.
■ A gamma of −1.0 indicates a perfect, negative association between
the variables.
■ A gamma or a Kendall’s tau-b of zero reflects no association between
the two variables; hence, there is nothing to be gained by using the
independent variable to predict the dependent variable.
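A minimal sketch in Python (numpy/scipy) for a small hypothetical ordered table: gamma is computed from concordant and discordant pairs, and scipy.stats.kendalltau (which computes tau-b by default) is applied to the table expanded back into observations:

    import numpy as np
    from scipy.stats import kendalltau

    # Hypothetical crosstab with rows and columns in increasing ordinal order.
    table = np.array([[20, 10,  5],
                      [10, 20, 10],
                      [ 5, 10, 20]])

    # Concordant (C) and discordant (D) pairs for Goodman and Kruskal's gamma.
    C = D = 0
    r, c = table.shape
    for i in range(r):
        for j in range(c):
            C += table[i, j] * table[i+1:, j+1:].sum()  # cells below and to the right
            D += table[i, j] * table[i+1:, :j].sum()    # cells below and to the left
    gamma = (C - D) / (C + D)

    # Kendall's tau-b on the underlying observations.
    rows, cols = np.indices(table.shape)
    x = np.repeat(rows.ravel(), table.ravel())
    y = np.repeat(cols.ravel(), table.ravel())
    tau_b, p = kendalltau(x, y)

    print(round(gamma, 3), round(tau_b, 3))  # tau-b is smaller in magnitude than gamma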
Summary
■ Gamma A symmetrical measure of association suitable for use with
ordinal variables or with dichotomous nominal variables. It can vary from
0.0 to ±1.0 and provides us with an indication of the strength and
direction of the association between the variables. Gamma is also
referred to as Goodman and Kruskal’s gamma.
■ Kendall’s tau-b A symmetrical measure of association suitable for use with ordinal variables. It can vary from 0.0 to ±1.0. It provides an indication of the strength and direction of the association between the variables. Kendall’s tau-b will always be smaller in magnitude than gamma.
■ Symmetrical measure of association A measure whose value will be the
same when either variable is considered the independent variable or the
dependent variable.
