Biostatistics
Larry Winner
Department of Statistics
University of Florida
SPSS Windows
• Data View
– Used to display data
– Columns represent variables
– Rows represent individual units or groups of units that share
common values of variables
• Variable View
– Used to display information on variables in dataset
– TYPE: Allows for various styles of displaying
– LABEL: Allows for longer description of variable name
– VALUES: Allows for longer description of variable levels
– MEASURE: Allows choice of measurement scale
• Output View
– Displays Results of analyses/graphs
Data Entry Tips I
• For variables that are not identifiers (such as name,
county, school, etc.), use numeric codes for levels and
use the VALUES option in VARIABLE VIEW to label
their levels. Some procedures require numeric codes
for levels. SPSS will print the VALUES labels on output
• For large datasets, use a spreadsheet such as EXCEL
which is more flexible for data entry, and import the
file into SPSS
• Give descriptive LABEL to variable names in the
VARIABLE VIEW
• Keep in mind that Columns are Variables; you don’t
want multiple columns holding the same variable
Data Entry/Analysis Tips II
• Previously published data being re-analyzed often contain
only a few distinct outcomes (especially with
categorical data), with many individuals sharing the
same outcome (as in contingency tables)
• For ease of data entry:
– Create one line for each combination of factor levels
– Create a new variable representing a COUNT of the number of
individuals sharing this “outcome”
• When analyzing data Click on:
– DATA → WEIGHT CASES → WEIGHT CASES BY
– Click on the variable representing COUNT
– All subsequent analyses treat that outcome as if it occurred
COUNT times
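The effect of weighting by COUNT can be sketched outside SPSS. The Python below is only an illustration: the variable names and counts are made up, and it shows that tallying with weights gives the same frequencies as expanding the data to one record per individual.

```python
from collections import Counter

# Hypothetical aggregated data: one row per combination of factor levels,
# plus a COUNT of individuals sharing that combination (values are made up).
rows = [
    {"treatment": 1, "outcome": 1, "count": 12},
    {"treatment": 1, "outcome": 2, "count": 8},
    {"treatment": 2, "outcome": 1, "count": 5},
    {"treatment": 2, "outcome": 2, "count": 15},
]

def weighted_frequencies(rows, variable, weight="count"):
    """Tally a variable, counting each row `weight` times."""
    totals = Counter()
    for row in rows:
        totals[row[variable]] += row[weight]
    return dict(totals)

def expand_cases(rows, weight="count"):
    """Return one record per individual, the layout that weighting emulates."""
    expanded = []
    for row in rows:
        record = {k: v for k, v in row.items() if k != weight}
        expanded.extend(dict(record) for _ in range(row[weight]))
    return expanded

outcome_freq = weighted_frequencies(rows, "outcome")
individuals = expand_cases(rows)
print(outcome_freq)
print(len(individuals))
```

Entering four weighted rows is far less work than typing 40 individual records, which is the point of the COUNT variable.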
Example 1.3 - Grapefruit Juice Study
crcl: 38, 66, 74, 99, 80, 64, 80, 120
• To import an EXCEL file, click on:
– FILE → OPEN → DATA, then change FILES OF TYPE to EXCEL (.xls)
• To import a TEXT or DATA file, click on:
– FILE → OPEN → DATA, then change FILES OF TYPE to TEXT (.txt) or DATA (.dat)
• You will be prompted through a series of dialog boxes to import the dataset
Descriptive Statistics-Numeric Data
• After Importing your dataset, and providing names to
variables, click on:
• ANALYZE → DESCRIPTIVE STATISTICS → DESCRIPTIVES
• Choose any variables to be analyzed and place them in box on right
• Options include:
– Mean: $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$
– Sum: $\sum_{i=1}^{n} y_i$
– Std. deviation: $S = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}}$
– Variance: $S^2$
– S.E. Mean: $S / \sqrt{n}$
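As a cross-check outside SPSS, the summary statistics listed above (mean, sum, standard deviation, variance, standard error of the mean) can be computed directly. This Python sketch applies the usual sample formulas with divisor n - 1 to the eight crcl values shown for Example 1.3; the function name `describe` is just an illustrative choice.

```python
import math

# crcl values from Example 1.3 (Grapefruit Juice Study)
crcl = [38, 66, 74, 99, 80, 64, 80, 120]

def describe(values):
    """Sample summary statistics using the n - 1 divisor for the variance."""
    n = len(values)
    total = sum(values)
    mean = total / n
    variance = sum((y - mean) ** 2 for y in values) / (n - 1)
    std_dev = math.sqrt(variance)
    se_mean = std_dev / math.sqrt(n)
    return {"n": n, "sum": total, "mean": mean,
            "variance": variance, "std_dev": std_dev, "se_mean": se_mean}

summary = describe(crcl)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```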
Example 1.3 - Grapefruit Juice Study
[SPSS output: Descriptive Statistics table]
Descriptive Statistics-General Data
• After Importing your dataset, and providing names to
variables, click on:
• ANALYZE → DESCRIPTIVE STATISTICS → FREQUENCIES
• Choose any variables to be analyzed and place them in box on
right
• Options include (For Categorical Variables):
– Frequency Tables
– Pie Charts, Bar Charts
• Options include (For Numeric Variables)
– Frequency Tables (Useful for discrete data)
– Measures of Central Tendency, Dispersion, Percentiles
– Pie Charts, Histograms
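A frequency table of the kind FREQUENCIES produces can be sketched in a few lines of Python. The coded values below are hypothetical (for instance 1, 2, 3 could stand for three smoking-status levels), and the column layout is only an approximation of a count, percent, cumulative percent table.

```python
from collections import Counter

def frequency_table(values):
    """Return (level, count, percent, cumulative percent) for each level."""
    counts = Counter(values)
    n = len(values)
    table = []
    cumulative = 0.0
    for level in sorted(counts):
        percent = 100.0 * counts[level] / n
        cumulative += percent
        table.append((level, counts[level], percent, cumulative))
    return table

# Hypothetical coded categorical variable (values are made up)
status = [1, 1, 2, 3, 2, 1, 3, 3, 2, 1]
table = frequency_table(status)
for level, count, percent, cumulative in table:
    print(f"{level}: n={count}  {percent:.1f}%  cumulative {cumulative:.1f}%")
```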
Example 1.4 - Smoking Status
[SPSS output: frequency table]
Vertical Bar Charts and Pie Charts
• After Importing your dataset, and providing names
to variables, click on:
• GRAPHS → BAR… → SIMPLE (Summaries for Groups
of Cases) → DEFINE
• Bars Represent N of Cases (or % of Cases)
• Put the variable of interest as the CATEGORY AXIS
[Bar chart: Count by OUTCOME (levels 1 to 5)]
Histograms
• After Importing your dataset, and providing names
to variables, click on:
• GRAPHS → HISTOGRAM
• Select Variable to be plotted
• Click on DISPLAY NORMAL CURVE if you want a
normal curve superimposed (see Chapter 3).
Example 1.6 - Drug Approval Times
[Histogram of MONTHS]
Side-by-Side Bar Charts
[Side-by-side bar chart: Count by TRT (1, 2), clustered by OUTCOME]
Scatterplots
[Scatterplot: THCLRNCE versus DRUG]
Scatterplots with 2 Independent Variables
[Scatterplot: THCLRNCE versus SUBJECT, with separate markers for DRUG (Tagamet, Pepcid, Placebo)]
Contingency Tables for Conditional
Probabilities
• After Importing your dataset, and providing names to
variables, click on:
• ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
• For ROWS, select the variable you are conditioning on
(Independent Variable)
• For COLUMNS, select the variable you are finding the conditional
probability of (Dependent Variable)
• Click on CELLS
• Click on ROW Percentages
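Row percentages are conditional probabilities of the column variable given the row variable, expressed per 100. A minimal Python sketch with made-up counts and hypothetical labels illustrates the calculation:

```python
def row_percentages(table):
    """Convert counts to percentages within each row (conditioning variable)."""
    result = {}
    for row_label, cells in table.items():
        row_total = sum(cells.values())
        result[row_label] = {col: 100.0 * count / row_total
                             for col, count in cells.items()}
    return result

# Hypothetical counts: rows = independent variable, columns = dependent variable
counts = {
    "exposed":   {"event": 30, "no event": 70},
    "unexposed": {"event": 10, "no event": 90},
}
row_pct = row_percentages(counts)
for row_label, cells in row_pct.items():
    print(row_label, cells)
```

Each row sums to 100%, so the "event" entry in a row estimates P(event | that row's level).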
Example 1.10 - Alcohol & Mortality
[SPSS output: crosstabulation with counts and row percentages]
Independent Sample t-Test
[SPSS output: group statistics and independent-samples test tables]
Wilcoxon Rank-Sum/Mann-Whitney Tests
• After Importing your dataset, and providing names
to variables, click on:
• ANALYZE → NONPARAMETRIC TESTS → 2
INDEPENDENT SAMPLES
• For TEST VARIABLE, Select the dependent (response)
variable(s)
• For GROUPING VARIABLE, Select the independent
variable. Then define the names of the 2 levels to be
compared (this can be used even when the full dataset has
more than 2 levels for independent variable).
• Click on MANN-WHITNEY U
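To show what the procedure computes, here is a small Python sketch of the rank-sum calculation. The data are made up, the function names are illustrative, and the z statistic uses the large-sample normal approximation without a tie correction, so it need not match SPSS output exactly.

```python
import math

def midranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    positions = {}
    for rank, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(rank)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def rank_sum_test(group1, group2):
    """Wilcoxon rank sum W, Mann-Whitney U, and an approximate two-sided p-value."""
    n1, n2 = len(group1), len(group2)
    ranks = midranks(list(group1) + list(group2))
    w1 = sum(ranks[:n1])                     # rank sum for group 1
    u1 = w1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    u = min(u1, u2)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # no tie correction
    z = (u - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return {"W1": w1, "U": u, "z": z, "p_two_sided": p_two_sided}

# Hypothetical responses for two independent groups
group_a = [1.2, 2.4, 3.1, 4.8]
group_b = [2.9, 5.5, 6.0, 7.3, 8.1]
result = rank_sum_test(group_a, group_b)
print(result)
```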
Example 3.6 - Levocabastine in Renal Patients
[SPSS output: ranks and Mann-Whitney test statistics tables]
Paired t-test
[SPSS output: paired samples statistics and paired samples test tables]
Wilcoxon Signed-Rank Test
[SPSS output: ranks and Wilcoxon signed-rank test statistics tables]
Relative Risks and Odds Ratios
• After Importing your dataset, and providing names to
variables, click on:
• ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
• For ROWS, Select the Independent Variable
• For COLUMNS, Select the Dependent Variable
• Under STATISTICS, Click on RISK
• Under CELLS, Click on OBSERVED and ROW PERCENTAGES
• NOTE: You will want to code the data so that the outcome present
(Success) category has the lower value (e.g. 1) and the outcome
absent (Failure) category has the higher value (e.g. 2). Similar for
Exposure present category (e.g. 1) and exposure absent (e.g. 2).
Use Value Labels to keep output straight.
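The quantities reported under RISK can be illustrated with a short Python sketch. The 2x2 counts are hypothetical, and the confidence intervals use the standard large-sample formulas on the log scale, which is an assumption about method rather than a statement of what SPSS does internally.

```python
import math

def risk_measures(a, b, c, d, z=1.96):
    """a, b: exposed with/without outcome; c, d: unexposed with/without outcome."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    rr = risk_exposed / risk_unexposed
    odds_ratio = (a * d) / (b * c)
    # Large-sample standard errors on the log scale
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    rr_ci = (rr * math.exp(-z * se_log_rr), rr * math.exp(z * se_log_rr))
    or_ci = (odds_ratio * math.exp(-z * se_log_or),
             odds_ratio * math.exp(z * se_log_or))
    return {"RR": rr, "RR_CI": rr_ci, "OR": odds_ratio, "OR_CI": or_ci}

# Hypothetical table with outcome present coded first and exposure present first
measures = risk_measures(a=20, b=80, c=10, d=90)
print(measures)
```

Putting the "present" categories first, as the NOTE recommends, makes the ratios compare exposed to unexposed for the outcome of interest rather than the reverse.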
Example 5.1 - Pamidronate Study
[SPSS output: crosstabulation with row percentages and risk estimate table]
Example 5.2 - Lip Cancer
[SPSS output: crosstabulation with row percentages and risk estimate table]
Fisher’s Exact Test
• After Importing your dataset, and providing names to
variables, click on:
• ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
• For ROWS, Select the Independent Variable
• For COLUMNS, Select the Dependent Variable
• Under STATISTICS, Click on CHI-SQUARE
• Under CELLS, Click on OBSERVED and ROW
PERCENTAGES
• NOTE: You will want to code the data so that the outcome
present (Success) category has the lower value (e.g. 1) and the
outcome absent (Failure) category has the higher value (e.g. 2).
Similar for Exposure present category (e.g. 1) and exposure
absent (e.g. 2). Use Value Labels to keep output straight.
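Fisher's exact test is based on the hypergeometric distribution of one cell given fixed margins. The Python sketch below uses made-up counts and defines the two-sided p-value as the sum of all table probabilities no larger than the observed one, which is one common convention.

```python
from math import comb

def fisher_exact(a, b, c, d):
    """Exact p-values for the 2x2 table [[a, b], [c, d]] with fixed margins."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    two_sided = sum(prob(x) for x in range(lo, hi + 1)
                    if prob(x) <= p_obs * (1 + 1e-9))
    upper_tail = sum(prob(x) for x in range(a, hi + 1))
    return {"p_observed": p_obs, "two_sided": two_sided, "upper_tail": upper_tail}

# Hypothetical small-sample table
fisher = fisher_exact(3, 1, 1, 3)
print(fisher)
```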
Example 5.5 - Antiseptic Experiment
[SPSS output: crosstabulation with row percentages and chi-square tests table, including Fisher's exact test]
McNemar’s Test
• After Importing your dataset, and providing names to
variables, click on:
• ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
• For ROWS, Select the outcome for condition/time 1
• For COLUMNS, Select the outcome for condition/time 2
• Under STATISTICS, Click on MCNEMAR
• Under CELLS, Click on OBSERVED and TOTAL
PERCENTAGES
• NOTE: You will want to code the data so that the outcome present
(Success) category has the lower value (e.g. 1) and the outcome
absent (Failure) category has the higher value (e.g. 2). Similar for
Exposure present category (e.g. 1) and exposure absent (e.g. 2).
Use Value Labels to keep output straight.
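McNemar's test depends only on the discordant pairs (outcome present under one condition but absent under the other). As a hedged illustration with made-up counts, this Python sketch computes an exact binomial p-value along with the uncorrected chi-square form:

```python
from math import comb

def mcnemar(b, c):
    """b, c: counts of the two kinds of discordant pairs."""
    n = b + c
    k = min(b, c)
    # Exact two-sided p-value: discordant pairs split 50/50 under the null
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    p_exact = min(1.0, 2 * tail)
    chi_square = (b - c) ** 2 / n          # large-sample version, 1 df
    return {"p_exact": p_exact, "chi_square": chi_square}

# Hypothetical discordant counts from a paired 2x2 table
mcnemar_result = mcnemar(b=10, c=3)
print(mcnemar_result)
```

The concordant cells of the table do not enter the test, which is why TOTAL percentages rather than ROW percentages are the natural summary here.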
Example 5.6 - Report of Implant Leak
[SPSS output: crosstabulation with total percentages and McNemar test P-value]
Cochran Mantel-Haenszel Test
• After Importing your dataset, and providing names to
variables, click on:
• ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
• For ROWS, Select the Independent Variable
• For COLUMNS, Select the Dependent Variable
• For LAYERS, Select the Strata Variable
• Under STATISTICS, Click on COCHRAN’S AND MANTEL-
HAENSZEL STATISTICS
• NOTE: You will want to code the data so that the outcome present
(Success) category has the lower value (e.g. 1) and the outcome absent
(Failure) category has the higher value (e.g. 2). Similar for Exposure
present category (e.g. 1) and exposure absent (e.g. 2). Use Value
Labels to keep output straight.
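A rough Python sketch of the stratified analysis: it pools 2x2 tables across strata to get the Mantel-Haenszel common odds ratio and a continuity-corrected chi-square statistic (1 df). The strata counts are hypothetical, and the continuity-corrected form is one common version of the statistic, assumed here for illustration.

```python
import math

def cochran_mantel_haenszel(strata):
    """strata: list of (a, b, c, d) 2x2 tables, one per stratum."""
    num = den = 0.0      # pieces of the common odds ratio
    diff = var = 0.0     # observed minus expected for cell a, and its variance
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
        diff += a - (a + b) * (a + c) / n
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))
    common_or = num / den
    statistic = (abs(diff) - 0.5) ** 2 / var
    p_value = math.erfc(math.sqrt(statistic / 2))   # chi-square upper tail, 1 df
    return common_or, statistic, p_value

# Hypothetical strata (for example, two age groups)
strata = [(10, 20, 5, 25), (15, 15, 8, 22)]
common_or, cmh_stat, cmh_p = cochran_mantel_haenszel(strata)
print(common_or, cmh_stat, cmh_p)
```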
Example 5.7 Smoking/Death by Age
[SPSS output: crosstabulation by stratum and Cochran/Mantel-Haenszel statistics]
Chi-Square Test
• After Importing your dataset, and providing names to
variables, click on:
• ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
• For ROWS, Select the Independent Variable
• For COLUMNS, Select the Dependent Variable
• Under STATISTICS, Click on CHI-SQUARE
• Under CELLS, Click on OBSERVED, EXPECTED, ROW
PERCENTAGES, and ADJUSTED STANDARDIZED
RESIDUALS
• NOTE: Large ADJUSTED STANDARDIZED RESIDUALS
(in absolute value) show which cells are inconsistent with the null
hypothesis of independence. A common rule of thumb is to flag
any cells with values >3 in absolute value
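The cell quantities requested above can be reproduced by hand. This Python sketch uses a made-up 2x3 table and computes expected counts under independence, the Pearson chi-square statistic, and adjusted standardized residuals using the usual formula (O - E) / sqrt(E (1 - row share)(1 - column share)).

```python
import math

def chi_square_independence(table):
    """Pearson chi-square, df, expected counts, and adjusted residuals."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    adjusted = [[(o - e) / math.sqrt(e * (1 - r / n) * (1 - c / n))
                 for o, e, c in zip(obs_row, exp_row, col_totals)]
                for obs_row, exp_row, r in zip(table, expected, row_totals)]
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected, adjusted

# Hypothetical counts: rows = independent variable, columns = dependent variable
observed = [[20, 30, 50],
            [30, 30, 40]]
chi2, df, expected, adjusted = chi_square_independence(observed)
print(chi2, df)
print(adjusted)
```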
Example 5.8 - Marital Status & Cancer
[SPSS output: crosstabulation with counts, expected counts, row percentages, and adjusted residuals, followed by chi-square tests table]
Goodman & Kruskal’s γ / Kendall’s τ_b
[SPSS output: crosstabulation and symmetric measures table]
Kruskal-Wallis Test
• After Importing your dataset, and providing names to
variables, click on:
• ANALYZE → NONPARAMETRIC TESTS → k INDEPENDENT
SAMPLES
• For TEST VARIABLE, Select Dependent Variable
• For GROUPING VARIABLE, Select Independent Variable, then
define range of levels of variable (Minimum and Maximum)
• Click on KRUSKAL-WALLIS H
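For readers who want to see the arithmetic behind the test, the following Python sketch computes the Kruskal-Wallis H statistic with the standard tie correction. The three groups are hypothetical data and the function names are illustrative.

```python
def midranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    positions = {}
    for rank, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(rank)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def kruskal_wallis(groups):
    """Return the tie-corrected H statistic and its degrees of freedom."""
    pooled = [y for g in groups for y in g]
    n = len(pooled)
    ranks = midranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n)
    tie_counts = {}
    for y in pooled:
        tie_counts[y] = tie_counts.get(y, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    return h / correction, len(groups) - 1

# Hypothetical responses for three treatment groups
groups = [[2.1, 3.4, 1.9], [4.4, 5.0, 4.8], [6.3, 7.7, 6.9]]
h_stat, h_df = kruskal_wallis(groups)
print(h_stat, h_df)
```

H is compared with a chi-square distribution on (number of groups - 1) degrees of freedom when samples are not too small.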
Example 5.11 - Antibiotic Delivery
[SPSS output: ranks and Kruskal-Wallis test statistics tables]
Note: This statistic makes the adjustment for ties. See Hollander and Wolfe (1973), p. 140.
Cohen’s κ
• After Importing your dataset, and providing names to
variables, click on:
• ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
• For ROWS, Select Rater 1
• For COLUMNS, Select Rater 2
• Under STATISTICS, Click on KAPPA
• Under CELLS, Click on TOTAL Percentages to get the observed
percentages in each cell (the first number under observed count in
Table 5.17).
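Kappa compares observed agreement with the agreement expected by chance from the raters' marginal totals. A minimal Python sketch, using a made-up 2x2 agreement table rather than the Siskel and Ebert data:

```python
def cohens_kappa(table):
    """table[i][j] = number of items rater 1 put in category i and rater 2 in j."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    observed = sum(table[i][i] for i in range(len(table))) / n
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical ratings: rows = Rater 1, columns = Rater 2
ratings = [[40, 10],
           [20, 30]]
kappa = cohens_kappa(ratings)
print(kappa)
```

Here the raters agree on 70% of items while 50% agreement is expected by chance, giving kappa = (0.7 - 0.5) / (1 - 0.5).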
Example 5.12 - Siskel & Ebert
[SPSS output: crosstabulation with total percentages and symmetric measures (kappa) table]
1-Factor ANOVA - Independent Samples
(Parallel Groups)
[SPSS output: ANOVA table and post hoc multiple comparisons table]
Kruskal-Wallis Test
• After Importing your dataset, and providing names to
variables, click on:
• ANALYZE → NONPARAMETRIC TESTS → k INDEPENDENT
SAMPLES
• For TEST VARIABLE, Select Dependent Variable
• For GROUPING VARIABLE, Select Independent Variable, then
define range of levels of variable (Minimum and Maximum)
• Click on KRUSKAL-WALLIS H
Example 6.2(a) - Thalidomide and HIV-1
[SPSS output: ranks and Kruskal-Wallis test statistics tables]
Randomized Block Design - F-test
• After Importing your dataset, and providing names to
variables, click on:
• ANALYZE → GENERAL LINEAR MODEL → UNIVARIATE
• Assign the DEPENDENT VARIABLE
• Assign the TREATMENT variable as a FIXED FACTOR
• Assign the BLOCK variable as a RANDOM FACTOR
• Click on MODEL, then CUSTOM, under BUILD TERMS choose
MAIN EFFECTS, move both factors to MODEL list
• Click on POST HOC and select the TREATMENT factor for
POST HOC TESTS and BONFERRONI and TUKEY (among
many choices)
• For PLOTS, Select the BLOCK factor for HORIZONTAL AXIS
and the TREATMENT factor for SEPARATE LINES, click ADD
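The main-effects model fitted here partitions the total sum of squares into treatment, block, and error pieces. The Python sketch below does that partition by hand for a small made-up data set (rows are blocks, columns are treatments) and forms the treatment F statistic; it does not attempt the post hoc comparisons.

```python
def randomized_block_anova(data):
    """data[i][j] = response for block i under treatment j (one value per cell)."""
    b, t = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (b * t)
    trt_means = [sum(row[j] for row in data) / b for j in range(t)]
    block_means = [sum(row) / t for row in data]
    ss_trt = b * sum((m - grand) ** 2 for m in trt_means)
    ss_block = t * sum((m - grand) ** 2 for m in block_means)
    ss_total = sum((y - grand) ** 2 for row in data for y in row)
    ss_error = ss_total - ss_trt - ss_block
    df_trt, df_block, df_error = t - 1, b - 1, (b - 1) * (t - 1)
    f_trt = (ss_trt / df_trt) / (ss_error / df_error)
    return {"SS_trt": ss_trt, "SS_block": ss_block, "SS_error": ss_error,
            "df": (df_trt, df_block, df_error), "F_trt": f_trt}

# Hypothetical responses: 3 blocks (subjects) by 3 treatments
responses = [[1, 2, 3],
             [2, 3, 5],
             [3, 5, 6]]
anova = randomized_block_anova(responses)
print(anova)
```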
Example 6.3 - Theophylline Clearance
[SPSS output: tests of between-subjects effects (Type III sums of squares) and post hoc multiple comparisons tables]
Example 6.3 - Theophylline Clearance
[SPSS output tables]
[Profile plot: Estimated Marginal Means of THEOPHCL versus SUBJECT (1 to 14), with separate lines for DRUG (Cimetidine, Famotidine, Placebo)]
Randomized Block Design - Friedman’s test
• After Importing your dataset, and providing names to
variables, click on:
• ANALYZE → NONPARAMETRIC TESTS → k RELATED
SAMPLES
• For TEST VARIABLES, select the variables representing the
treatments (each line is a subject/block)
• Click on FRIEDMAN
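Friedman's test ranks the treatments within each block and compares the treatment rank sums. The Python sketch below uses made-up data with one row per subject/block and one column per treatment, matching the data layout described above; it omits the tie correction, so it is exact only when there are no ties within a block.

```python
def midranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    positions = {}
    for rank, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(rank)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def friedman_statistic(data):
    """data[i][j] = response for block i under treatment j."""
    b, k = len(data), len(data[0])
    rank_sums = [0.0] * k
    for row in data:
        for j, r in enumerate(midranks(row)):   # rank within the block
            rank_sums[j] += r
    statistic = (12 / (b * k * (k + 1)) * sum(r ** 2 for r in rank_sums)
                 - 3 * b * (k + 1))
    return statistic, rank_sums

# Hypothetical responses: 4 subjects (rows) by 3 treatments (columns)
blocks = [[5.1, 6.2, 7.3],
          [4.0, 6.5, 5.9],
          [3.3, 4.4, 5.5],
          [6.1, 5.8, 7.0]]
friedman_stat, rank_sums = friedman_statistic(blocks)
print(friedman_stat, rank_sums)
```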
Example 6.4 - Absorption of Valproate Depakote
[SPSS output: ranks and Friedman test statistics tables]
[SPSS output: tests of between-subjects effects table (Type III sums of squares)]
[Profile plot: Estimated Marginal Means of CLRNCE versus ETHNIC (1, 2), with separate lines for GENDER (1, 2)]
Linear Regression
• After Importing your dataset, and providing names
to variables, click on:
• ANALYZE → REGRESSION → LINEAR
• Select the DEPENDENT VARIABLE
• Select the INDEPENDENT VARIABLE(S)
• Click on STATISTICS, then ESTIMATES,
CONFIDENCE INTERVALS, MODEL FIT
• For histogram of residuals, click on PLOTS, and
HISTOGRAM under STANDARDIZED RESIDUAL
PLOTS
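For a single independent variable, the ESTIMATES and MODEL FIT output boils down to the least squares slope, intercept, and R-square. The Python sketch below computes these from first principles for a small made-up data set; it is an illustration only and does not reproduce the standard errors or confidence intervals.

```python
def simple_linear_regression(x, y):
    """Least squares fit of y = b0 + b1 * x; returns estimates and residuals."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    ss_error = sum(e ** 2 for e in residuals)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    r_square = 1 - ss_error / ss_total
    return {"intercept": b0, "slope": b1, "r_square": r_square,
            "residuals": residuals}

# Hypothetical data (independent variable x, dependent variable y)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
fit = simple_linear_regression(x, y)
print(fit["intercept"], fit["slope"], fit["r_square"])
```

The residuals returned here are the raw material for the histogram of residuals mentioned above (SPSS standardizes them first).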
Examples 7.1-7.6 - Gemfibrozil Clearance
[SPSS output: coefficients table]
[Histogram of standardized residuals, Dependent Variable: CLGM, N = 17.00]
[SPSS output: ANOVA and model summary tables]
Example 7.8 - TB/Thalidomide in HIV
[SPSS output: coefficients and ANOVA tables]
Useful Regression Plots
• Scatterplot with Fitted (Least Squares) Line
– GRAPHS → INTERACTIVE → SCATTERPLOT
– Select DEPENDENT VARIABLE for UP/DOWN AXIS
– Select INDEPENDENT VARIABLE for RIGHT/LEFT AXIS
– Click on FIT Tab, then REGRESSION for METHOD
– NOTE: Be certain both variables are SCALE in VARIABLE
VIEW under MEASURE
• Partial Regression Plots (Multiple Regression) to observe
association of each Independent Variable with Y,
controlling for all others
– Fit REGRESSION model with all Independent Variables
– Click PLOTS, then PRODUCE ALL PARTIAL PLOTS
Example 7.1 - Gemfibrozil Scatterplot
[Scatterplot: clgm versus clcr with fitted linear regression line clgm = 460.83 + -3.22 * clcr, R-Square = 0.33]
Logistic Regression
[SPSS output: logistic regression coefficient tables]
Nonlinear Regression
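Model: $y = \frac{A x^{B}}{x^{B} + C^{B}}$, where $x$ = AUC(0-6h) and A, B, C are parameters
[SPSS output: parameter estimates table with columns Parameter, Estimate, Asymptotic Std. Error, and Asymptotic 95% Confidence Interval (Lower, Upper)]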
Survival Analysis -Kaplan-Meier Estimates
and Log-Rank Test
• After Importing your dataset, and providing names to
variables, click on:
• ANALYZE → SURVIVAL → KAPLAN-MEIER
• Select the variable representing the survival TIME of individual
• Select the variable representing the STATUS of individual
(whether or not event has occurred). NOTE: If the variable is an
indicator that the observation was CENSORED, then a value of 0
for that variable will mean the event has occurred.
• Select the variable representing the FACTOR containing the
groups to be compared
• Click on COMPARE FACTOR, select LOG-RANK, and POOL
ACROSS STRATA
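The Kaplan-Meier estimate multiplies, at each event time, the fraction of at-risk individuals who survive that time. The Python sketch below uses made-up times and assumes a status coding of 1 = event occurred and 0 = censored, which is the reverse of the CENSORED indicator described in the NOTE above. It does not compute standard errors or the log-rank test.

```python
def kaplan_meier(times, events):
    """events[i] is 1 if the event occurred at times[i], 0 if censored.

    Returns (time, number at risk, events, cumulative survival) per event time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)   # events plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, at_risk, deaths, survival))
        at_risk -= leaving
    return curve

# Hypothetical survival times with censoring indicators
times = [2, 3, 3, 5, 7, 8]
events = [1, 1, 0, 1, 0, 1]
km_curve = kaplan_meier(times, events)
for row in km_curve:
    print(row)
```

Censored individuals contribute to the number at risk up to their censoring time but never trigger a drop in the curve.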
Examples 9.1-2 - Navelbine and Taxol in Mice
Survival Analysis for TIME
Factor REGIMEN = 1
Time  Status  Cumulative Survival  Standard Error  Cumulative Events  Number Remaining
6 0 .9796 .0202 1 48
8 0 .9592 .0283 2 47
22 0 .9388 .0342 3 46
32 0 4 45
32 0 .8980 .0432 5 44
35 0 .8776 .0468 6 43
41 0 .8571 .0500 7 42
46 0 .8367 .0528 8 41
54 0 .8163 .0553 9 40
Factor REGIMEN = 2
Time  Status  Cumulative Survival  Standard Error  Cumulative Events  Number Remaining
8 0 .9333 .0644 1 14
10 0 .8667 .0878 2 13
27 0 .8000 .1033 3 12
31 0 .7333 .1142 4 11
34 0 .6667 .1217 5 10
35 0 .6000 .1265 6 9
39 0 .5333 .1288 7 8
47 0 .4667 .1288 8 7
57 0 .4000 .1265 9 6
Examples 9.1-2 - Navelbine and Taxol in Mice
[Plot: Survival Functions, Cum Survival versus TIME for REGIMEN 1 and 2, with censored observations marked]
[SPSS output table]
[Plot: Cum Survival versus REMSTIME by TRT (Placebo, 6MP)]