Quantifying risk
Definitions and formulas are based on the classic 2 × 2 or contingency table.

                                    Disease
                                    +     −
Risk factor or intervention    +    a     b
                               −    c     d
Incidence vs prevalence
Incidence rate = (# of new cases)/(# of people at risk) (during a specified time period).
Prevalence = (# of existing cases)/(total # of people in a population) (at a point in time).
Prevalence/(1 − prevalence) = incidence rate × average duration of disease.
Prevalence ≈ incidence for short duration disease (eg, common cold).
Prevalence > incidence for chronic diseases, due to large # of existing cases (eg, diabetes).
Incidence looks at new cases (incidents).
Prevalence looks at all current cases.
Prevalence ∼ pretest probability.
↑ prevalence → ↑ PPV and ↓ NPV.
[Figure: prevalence in relation to incidence, recurrence, mortality, and cure.]
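The incidence and prevalence formulas can be sketched in a few lines of Python. All counts and rates below are hypothetical, and the function names are illustrative only; the third function simply rearranges prevalence/(1 − prevalence) = incidence rate × average duration to solve for prevalence.

```python
def incidence_rate(new_cases, people_at_risk):
    """# of new cases / # of people at risk, over a specified time period."""
    return new_cases / people_at_risk


def prevalence(existing_cases, total_people):
    """# of existing cases / total # of people in a population, at a point in time."""
    return existing_cases / total_people


def prevalence_from_incidence(rate, average_duration):
    """Solve prevalence/(1 - prevalence) = rate x average duration for prevalence."""
    odds = rate * average_duration
    return odds / (1 + odds)


# Hypothetical population of 10,000 people followed for one year.
inc = incidence_rate(new_cases=50, people_at_risk=10_000)      # 0.005 per year
prev = prevalence(existing_cases=400, total_people=10_000)     # 0.04

# Hypothetical chronic disease: same incidence rate, average duration of 10 years.
chronic = prevalence_from_incidence(rate=0.005, average_duration=10)

print(f"incidence rate = {inc:.4f}, prevalence = {prev:.4f}")
print(f"chronic disease prevalence = {chronic:.4f}")
```

With a long average duration, the computed prevalence is well above the yearly incidence rate, which matches the chronic disease case described above.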
Precision vs accuracy
Precision (reliability)
The consistency and reproducibility of a test.
The absence of random variation in a test.
Random error ↓ precision in a test.
↑ precision → ↓ standard deviation.
↑ precision → ↑ statistical power (1 − β).
Accuracy (validity)
The trueness of test measurements.
The absence of systematic error or bias in a test.
Systematic error ↓ accuracy in a test.
[Figure: panels comparing high vs low accuracy.]
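As a rough illustration of the distinction, the sketch below treats the offset of the mean reading from a known true value as systematic error (accuracy) and the standard deviation of the readings as random error (precision). The true value and all readings are made up for the example.

```python
from statistics import mean, stdev

TRUE_VALUE = 100.0  # hypothetical reference value being measured

# Hypothetical repeated readings from three imaginary tests.
readings = {
    "accurate, precise": [99.8, 100.1, 100.0, 99.9, 100.2],
    "inaccurate, precise": [104.8, 105.1, 105.0, 104.9, 105.2],
    "accurate, imprecise": [92.0, 108.0, 100.0, 95.0, 105.0],
}


def systematic_error(values, true_value=TRUE_VALUE):
    """Offset of the mean reading from the true value (lower magnitude = more accurate)."""
    return mean(values) - true_value


def random_error(values):
    """Sample standard deviation of the readings (lower = more precise)."""
    return stdev(values)


for label, values in readings.items():
    print(f"{label}: bias = {systematic_error(values):+.2f}, SD = {random_error(values):.2f}")
```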
Statistical distribution
Measures of central tendency
Mean = (sum of values)/(total number of values). Most affected by outliers (extreme values).
Median = middle value of a list of data sorted from least to greatest. If there is an even number of values, the median will be the average of the middle two values.
Mode = most common value. Least affected by outliers.
Measures of dispersion
Standard deviation = how much variability exists in a set of values, around the mean of these values.
Standard error = an estimate of how much variability exists in a (theoretical) set of sample means around the true population mean.
σ = SD; n = sample size.
Variance = (SD)².
SE = σ/√n.
SE ↓ as n ↑.
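A short sketch using Python's standard `statistics` module shows these measures on a small made-up data set. Two assumptions to note: the data are hypothetical, and `stdev`/`variance` here are the sample (n − 1) versions, used as a stand-in for σ.

```python
from math import sqrt
from statistics import mean, median, mode, stdev, variance

values = [2, 3, 3, 4, 5, 7]      # hypothetical data
with_outlier = values + [60]     # same data plus one extreme value


def standard_error(sd, n):
    """SE = SD / sqrt(n)."""
    return sd / sqrt(n)


print(mean(values), median(values), mode(values))
print(mean(with_outlier), median(with_outlier), mode(with_outlier))

sd = stdev(values)
print(f"SD = {sd:.3f}, variance = {variance(values):.3f}")

# For a fixed SD, the standard error shrinks as the sample size grows.
for n in (6, 60, 600):
    print(f"n = {n}: SE = {standard_error(sd, n):.3f}")
```

Adding the single outlier moves the mean from 4 to 12, while the median only moves from 3.5 to 4 and the mode stays at 3.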
Normal distribution
Gaussian, also called bell-shaped.
Mean = median = mode.
Within ±1σ of the mean: 68% of values.
Within ±2σ of the mean: 95% of values.
Within ±3σ of the mean: 99.7% of values.
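The 68%, 95%, and 99.7% figures can be checked numerically. The sketch below uses the standard identity that the fraction of a normal distribution within k standard deviations of the mean equals erf(k/√2); the function name is illustrative.

```python
from math import erf, sqrt


def coverage_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(f"within ±{k}σ: {coverage_within(k):.1%}")
```

The computed values are approximately 68.3%, 95.4%, and 99.7%, which round to the figures quoted above.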
Nonnormal distributions
Bimodal
Suggests two different populations (eg, metabolic polymorphism such as fast vs slow acetylators; age at onset of Hodgkin lymphoma; suicide rate by age).
Positive skew
Typically, mean > median > mode.
Asymmetry with longer tail on right.
[Figure: positively skewed curve with mode, median, and mean marked from left to right.]
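A tiny made-up data set with a long right tail illustrates the typical ordering for positive skew. The values are hypothetical and chosen only so that the three measures are easy to verify by hand.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed data: most values are small, a few are large.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

m, med, mo = mean(right_skewed), median(right_skewed), mode(right_skewed)
print(f"mean = {m}, median = {med}, mode = {mo}")
```

Here the mean is 4.6, the median is 3, and the mode is 2, so the large values in the tail pull the mean furthest to the right.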
Statistical hypotheses
Null (H0)
Hypothesis of no difference or relationship (eg, there is no association between the disease and the risk factor in the population).
Alternative (H1)
Hypothesis of some difference or relationship (eg, there is some association between the disease and the risk factor in the population).

                      Reality
                      H1               H0
Study rejects H0      Power (1 − β)    α (Type I error)
Confidence interval
Range of values within which the true mean of the population is expected to fall, with a specified probability.
CI for population mean = x̄ ± Z(SE)
The 95% CI (corresponding to α = .05) is often used.
For the 95% CI, Z = 1.96.
For the 99% CI, Z = 2.58.
If the 95% CI for a mean difference between 2 variables includes 0, then there is no significant difference and H0 is not rejected.
If the 95% CI for odds ratio or relative risk includes 1, H0 is not rejected.
If the CIs between 2 groups do not overlap → statistically significant difference exists.
If the CIs between 2 groups overlap → usually no significant difference exists.
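The CI formula x̄ ± Z(SE) and the "does the interval include the null value" check can be sketched as follows. The paired differences are hypothetical, the helper names are illustrative, and the sample standard deviation is used as the estimate of σ when computing SE.

```python
from math import sqrt
from statistics import mean, stdev

Z_SCORES = {0.95: 1.96, 0.99: 2.58}  # Z values given in the text


def confidence_interval(values, level=0.95):
    """Return (low, high) for the mean using x̄ ± Z(SE), with SE = SD / sqrt(n)."""
    x_bar = mean(values)
    se = stdev(values) / sqrt(len(values))
    margin = Z_SCORES[level] * se
    return x_bar - margin, x_bar + margin


def includes(interval, null_value):
    """True if the interval contains the null value (H0 is then not rejected)."""
    low, high = interval
    return low <= null_value <= high


# Hypothetical mean differences between two paired measurements.
differences = [1.2, 0.8, 1.5, 0.3, 1.1, 0.9, 1.4, 0.6]

ci95 = confidence_interval(differences, 0.95)
ci99 = confidence_interval(differences, 0.99)
print(f"95% CI: ({ci95[0]:.2f}, {ci95[1]:.2f}), includes 0: {includes(ci95, 0)}")
print(f"99% CI: ({ci99[0]:.2f}, {ci99[1]:.2f}), includes 0: {includes(ci99, 0)}")
```

For a mean difference the null value is 0; for an odds ratio or relative risk the same check would be made against 1.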
Pearson correlation coefficient
r is always between −1 and +1. The closer the absolute value of r is to 1, the stronger the linear correlation between the 2 variables.
Positive r value → positive correlation (as one variable ↑, the other variable ↑).
Negative r value → negative correlation (as one variable ↑, the other variable ↓).
Coefficient of determination = r² (amount of variance in one variable that can be explained by variance in another variable).
[Figure: scatterplots for r = −0.8, r = −0.4, r = 0, r = +0.4, r = +0.8.]
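A minimal sketch of computing r and r² from paired data, using the usual definition of r as the covariance divided by the product of the standard deviations. The data points and the function name are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and root sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


xs = [1, 2, 3, 4, 5]
ys_rising = [2, 4, 5, 4, 5]    # hypothetical: tends to rise with x
ys_falling = [5, 4, 5, 4, 2]   # hypothetical: tends to fall with x

r_pos = pearson_r(xs, ys_rising)
r_neg = pearson_r(xs, ys_falling)
print(f"r = {r_pos:+.2f}, r² = {r_pos ** 2:.2f}")
print(f"r = {r_neg:+.2f}, r² = {r_neg ** 2:.2f}")
```

The two examples give r of about +0.77 and −0.77; in both cases r² is 0.6, since the coefficient of determination does not depend on the sign of r.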
BEHAVIORAL SCIENCE—ETHICS