in SPSS
Prof. Clarinda L. Berja
Department of Social Sciences
Learning objectives
Using SPSS,
■ Generate a contingency table and calculate a chi-square test statistic
■ Determine the significance of a chi-square test statistic
■ Apply and interpret measures of association: Lambda, Cramer’s V, Gamma,
and Kendall’s tau-b
Outline
■ Crosstabulation
■ Chi-square test statistic
■ Measures of association
– Phi, Lambda, Gamma, Tau-b
One of the main objectives of social science is to make
sense out of human and social experience by uncovering
regular patterns among events. Therefore, the language
of relationships is at the heart of social science inquiry.
Concept of relationship
Note: Besides establishing whether two variables are associated, NPAR (nonparametric) tests also
determine the strength of the association and, when appropriate, its direction.
Two basic rules for computing and analyzing
percentages in a cross-tabulation:
• The frequencies within each cell and the row marginals are divided by
the total of the column in which they are located, and the column totals
should sum to 100%.
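As a minimal sketch of this rule in Python: the table below is hypothetical (it is not data from these slides), and its columns are assumed to be the categories of the independent variable. Each cell is divided by its column total, so every column sums to 100%.

```python
# Hypothetical 2 x 2 table of counts (illustrative numbers only).
# Rows: categories of the dependent variable.
# Columns: categories of the independent variable.
observed = [
    [30, 10],  # dependent = "Yes"
    [20, 40],  # dependent = "No"
]


def column_percentages(table):
    """Divide each cell by its column total and express it as a percentage."""
    col_totals = [sum(col) for col in zip(*table)]
    return [
        [100.0 * cell / col_totals[j] for j, cell in enumerate(row)]
        for row in table
    ]


col_pct = column_percentages(observed)
for row in col_pct:
    print(["%.1f%%" % p for p in row])
```

Reading across a row then compares the percentages for the different categories of the independent variable.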
Rule 2: Comparing the Percentages Across
Different Categories of the Independent Variable
Since the proportion of subjects in each category of the group variable can differ, we
take group category into account in computing expected frequencies as well.
The expected frequencies for each cell are therefore computed to be proportional to both the
breakdown for the test variable and the breakdown for the group variable.
Expected Frequency Calculation
The data from “Observed Frequencies for Sample Data” is the
source for information to compute the expected frequencies.
Percentages are computed for the column of all students and for
the row of all GPA’s. These percentages are then multiplied by
the total number of students in the sample (453) to compute the
expected frequency for each cell in the table.
Exp Freq of a cell = Row marginal% * Col marginal% * Total sample size
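The calculation can be sketched in Python as follows. The counts are hypothetical (the 453-student table itself is not reproduced here), and the marginal percentages are handled as proportions, so the formula is equivalent to row total × column total / N.

```python
# Hypothetical observed counts (illustrative only, not the slide data).
observed = [
    [30, 10],
    [20, 40],
]


def expected_frequencies(table):
    """Expected count per cell = row proportion * column proportion * N."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [
        [(r / n) * (c / n) * n for c in col_totals]
        for r in row_totals
    ]


expected = expected_frequencies(observed)
for row in expected:
    print(["%.1f" % e for e in row])
```

The expected frequencies keep the same row and column totals as the observed table; only the distribution inside the table changes.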
Difference between observed and expected frequencies
is tested formally using the χ² (chi-square) test
Expected frequencies
Observed frequencies
Basically,
Observed ≈ Expected --> No association
Observed ≉ Expected --> Association
The research hypothesis states that the two variables are dependent or related. This
will be true if the observed counts for the categories of the variables in the sample
are different from the expected counts.
The null hypothesis is that the two variables are independent. This will be true if the
observed counts in the sample are similar to the expected counts.
If we were calculating the statistic by hand, we would have to compute the degrees
of freedom to identify the probability of the test statistic.
SPSS will print out the degrees of freedom and the probability of the test
statistic for us.
Step 4. Computing the Test Statistic
We identify the value and probability for this test statistic from the
SPSS statistical output.
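For readers who want to see what lies behind the SPSS output, here is a hedged sketch of the Pearson chi-square computation, χ² = Σ (O − E)² / E, with df = (r − 1)(c − 1). The table is hypothetical. The p-value shortcut used here, erfc(√(χ²/2)), holds only for df = 1, and no continuity correction is applied.

```python
import math

# Hypothetical observed counts (illustrative only).
observed = [
    [30, 10],
    [20, 40],
]


def chi_square(table):
    """Return the Pearson chi-square statistic and its degrees of freedom."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency
            stat += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


stat, df = chi_square(observed)

# Closed-form upper-tail probability, valid for df = 1 only.
p_value = math.erfc(math.sqrt(stat / 2)) if df == 1 else None

print("chi-square = %.3f, df = %d" % (stat, df))
if p_value is not None:
    print("p = %.5f" % p_value)
```

For larger tables, the probability would come from a chi-square table or from statistical software such as SPSS.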
Step 5. Decision and Interpretation
The residual, or the difference, between the observed frequency and the expected
frequency is a more reliable indicator, especially if the residual is converted to a z-
score and compared to a critical value equivalent to the alpha for the problem.
Standardized Residuals
■ SPSS prints out the standardized residual (converted to a z-
score) computed for each cell. It does not produce the
probability or significance.
Standardized residuals that have a positive value mean that the cell
was over-represented in the actual sample, compared to the expected
frequency, i.e. there were more subjects in this category than we
expected.
Standardized residuals that have a negative value mean that the cell
was under-represented in the actual sample, compared to the
expected frequency, i.e. there were fewer subjects in this category
than we expected.
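A minimal sketch of the standardized residual, (O − E) / √E, for each cell follows. The table is hypothetical, and the critical value of 1.96 is an assumption corresponding to a two-tailed alpha of .05; a different alpha would use a different critical value.

```python
import math

# Hypothetical observed counts (illustrative only).
observed = [
    [30, 10],
    [20, 40],
]

CRITICAL_Z = 1.96  # assumed: two-tailed alpha = .05

n = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

std_res = []
for i, row in enumerate(observed):
    res_row = []
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n  # expected frequency
        res_row.append((o - e) / math.sqrt(e))  # standardized residual
    std_res.append(res_row)

# Cells whose residual exceeds the critical value in absolute terms.
flagged = [
    (i, j)
    for i, row in enumerate(std_res)
    for j, z in enumerate(row)
    if abs(z) > CRITICAL_Z
]

for row in std_res:
    print(["%+.2f" % z for z in row])
print("cells beyond the critical value:", flagged)
```

Positive residuals mark over-represented cells and negative residuals mark under-represented cells, as described above.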
Interpreting Cell Differences in a Chi-square Test - 1
A chi-square test of
independence of the
relationship between sex
and marital status finds a
statistically significant
relationship between the
variables.
Interpreting Cell Differences in a Chi-square Test - 2
Researchers often try to identify which cell or cells are the
major contributors to the significant chi-square test by examining the
pattern of column percentages.
Using SPSS, the Crosstabs command produces a table of different cells with
associated frequencies inserted in each cell by crossing the levels of variable Y with
the levels of variable X. To do this,
a. CLICK Analyze
b. Then CLICK Descriptive Statistics
c. and SELECT Crosstabs.
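To illustrate what "crossing the levels" means, the same kind of table can be sketched outside SPSS in a few lines of Python. The case records and variable names below are hypothetical and serve only to show how the cell frequencies are counted.

```python
from collections import Counter

# Hypothetical case-level records: (X = sex, Y = smokes).
cases = [
    ("Male", "Yes"),
    ("Male", "No"),
    ("Female", "No"),
    ("Female", "No"),
    ("Male", "Yes"),
    ("Female", "Yes"),
]

counts = Counter(cases)                    # frequency of each (X, Y) pair
x_levels = sorted({x for x, _ in cases})   # column categories
y_levels = sorted({y for _, y in cases})   # row categories

# One row per level of Y, one column per level of X.
table = [[counts[(x, y)] for x in x_levels] for y in y_levels]

print("columns:", x_levels)
for y, row in zip(y_levels, table):
    print(y, row)
```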
Crosstabs dialogue box
To specify variables in the Crosstab
The Previous and Next buttons to the left and right of Layer 1 of 1 are used if
you want a g2 by sex analysis broken down by more than one variable. For
example, if you wanted a breakdown by both ur and a1, you would
CLICK ur, CLICK the lowest right arrow,
CLICK Next, then CLICK a1, then CLICK the lowest right arrow again.
This would produce two 2 x 2 tables and another two 2 x 2 tables.
It is rare for a researcher to want to
compute only cell frequencies.
In addition to frequencies, it is possible to include within each cell a
number of additional options.
Those most frequently used are listed below with a brief definition of
each. When you press the Cells button, a new screen appears that
allows you to select a number of options.
The Observed count is selected by default. Percentages are in most cases
also desired. Inclusion of other values depends on the preference of the
researcher.
Crosstabs: Cell Display
Cell Display Options
Observed count: The actual number of subjects or cases within each cell.
Total percentages: The percentage of all cases in the table that fall in each cell.
Steps in solving chi-square test of
independence: post hoc problems - 2
[Flowchart: Expected cell counts less than 5? Yes: incorrect application of a statistic. No: continue.]
Steps in solving chi-square test of
independence: post hoc problems - 3
[Flowchart: Identify the cell in the crosstabs table that contains the specific relationship in the problem. Is the relationship correctly described? Yes: True. No: False.]
SPSS Crosstabs Output - 1
SPSS Crosstabs Output - 2
Summary
The first step in interpreting a crosstabulation or chi-square
analysis is to observe the actual values and the expected values
within each cell.
Observation indicates that the observed values and the expected
values are different. There are discrepancies in smoking by sex
of respondent.
Even without looking at the chi-square statistic, you would
anticipate that it would be significant; that
is, g2 and sex in this sample are related.
The results support this observation with a high chi-square
value (1125.102) and a significance level below 0.05 (sig. = .000).
Although the two variables are related, the degree of association
is weak (.367); it is likewise significant (sig. = .000).
Hands-on Exercise:
■ Consider the two variables:
– V211 – How proud of nationality
– V242 – Age
■ Describe the patterns that you observe in the crosstabulation.
■ Perform a χ² test of independence
■ Save your output.
■ Analysis
– Write your null and alternative hypothesis.
– Based on the chi-squared statistic, is there sufficient evidence to reject
the null hypothesis at .05 level of significance?
– What is the actual level of significance?
– If necessary, conduct a Chi-square Post Hoc test. What does the result
imply?
Nonparametric Tests
of Association
Lambda: A Measure of Association for Nominal Variables
■ Lambda is an asymmetrical measure used to determine the strength of the
relationship between two nominal variables.
– An asymmetrical measure will vary depending on which variable is
considered the independent variable and which the dependent variable.
■ Lambda may range in value from 0.0 to 1.0. Zero indicates that there is
nothing to be gained by using the independent variable to predict the
dependent variable.
– A lambda of 1.0 indicates that by using the independent variable as a
predictor, we are able to predict the dependent variable without any
error.
Computational Formula:
$$\lambda = \frac{E_1 - E_2}{E_1}$$
Calculation of Lambda, 𝜆
Formula:
$$\lambda = \frac{E_1 - E_2}{E_1}$$
■ Find E1, the errors of prediction made when the independent variable is ignored.
To find E1, find the mode of the dependent variable and subtract its frequency
from N.
– E1 = N − Modal frequency
■ Find E2, the errors made when the prediction is based on the independent
variable. To find E2, find the modal frequency for each category of the
independent variable, subtract it from the category total to find the number of
errors, and then add up all the errors.
– E2 = Σ (Nk − Modek), summed over the categories k of the independent variable
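The two steps above can be sketched in Python as follows. The table is hypothetical, with rows as categories of the dependent variable and columns as categories of the independent variable; the function name is illustrative.

```python
# Hypothetical counts (illustrative only).
# Rows: dependent variable. Columns: independent variable.
observed = [
    [30, 10],
    [20, 40],
]


def lambda_measure(table):
    """Lambda = (E1 - E2) / E1 for predicting the row variable."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]

    # E1: errors when the independent variable is ignored.
    e1 = n - max(row_totals)
    if e1 == 0:
        raise ValueError("lambda is undefined when E1 is zero")

    # E2: errors within each category of the independent variable, added up.
    e2 = sum(sum(col) - max(col) for col in zip(*table))

    return e1, e2, (e1 - e2) / e1


e1, e2, lam = lambda_measure(observed)
print("E1 = %d, E2 = %d, lambda = %.2f" % (e1, e2, lam))
```

Because lambda is asymmetrical, swapping the roles of rows and columns (transposing the table) can give a different value.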
Cramer’s V: A Chi-Square–Related Measure
of Association for Nominal Variables
■ Cramer’s V is an alternative measure of association that can be used for nominal
variables.
■ It is based on the value of chi-square and ranges from 0 to 1, with 0 indicating
no association and 1 indicating perfect association. Because it cannot take negative
values, it is considered a nondirectional measure. Unfortunately, Cramer’s V is
somewhat limited because the results cannot be interpreted using the proportional
reduction of error (PRE) framework.
Formula:
$$V = \sqrt{\frac{\chi^2}{N \cdot m}}$$
where m = smaller of (r − 1) or (c − 1).
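A minimal Python sketch of this formula follows. It assumes a hypothetical table of counts, computes the Pearson chi-square first, and then applies V = √(χ² / (N · m)).

```python
import math

# Hypothetical observed counts (illustrative only).
observed = [
    [30, 10],
    [20, 40],
]


def cramers_v(table):
    """Cramer's V from a table of counts, via the Pearson chi-square."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency
            chi2 += (o - e) ** 2 / e

    # m = smaller of (r - 1) or (c - 1)
    m = min(len(table) - 1, len(table[0]) - 1)
    return math.sqrt(chi2 / (n * m))


v = cramers_v(observed)
print("Cramer's V = %.3f" % v)
```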
Gamma and Kendall’s Tau-b: Symmetrical
Measures of Association for Ordinal Variables