

49th Annual Meeting

Applications of Biostatistics

Randy C. Hatton, BPharm, PharmD, FCCP, BCPS
Clinical Professor
University of Florida, College of Pharmacy

OWNING CHANGE: Taking Charge of Your Profession

Disclosure

I do not have a vested interest in or affiliation with any corporate organization offering financial support or grant monies for this continuing education activity, or any affiliation with an organization whose philosophy could potentially bias my presentation

Objectives

Apply commonly used biostatistical concepts to review literature
Summarize data using appropriate forms of biostatistical review
Provide practical approaches for simplifying biostatistics in daily practice

Internal Validity

Does the study measure what it is supposed to?
Is the outcome due to a random error [chance]?
Is the outcome due to a systematic error [other cause or bias]?

Issues with Internal Validity

Random error
Systematic error
  Selection bias
  Attrition bias
  Confounding
  Choice and timing of measures
  Measurement bias
  Reliability and validity of measurement

Random Error

All 100 pneumonia patients who received penicillin recovered, while 10 of the 100 pneumonia patients who received placebo died
What is the chance that the difference in mortality did not occur by chance but rather due to the penicillin exposure?


Theory of Science

[Figure: Sample, Population]

Control for Random Error

Study the universe of patients
  Rarely done
  US Census
Study a sample of the population and estimate the chance that the conclusions are wrong

Statistical Inference

The outcome happened by chance because the inference was made based on a sample
If the universe of patients were tested, no random error could occur

Random Error & Sample Size

                 Penicillin   No Penicillin
True mortality   0.1%         10%
N=10             0/10         1/10
N=100            0/100        10/100
N=1000           1/1000       100/1000

Types of Variables

Independent Variable (Exposure or Treatment)

Control variable
Predictor variable
Grouping variable
Manipulated variable
Treatments
Conditions

Types of Variables

Dependent Variable (Measurement or Outcome)

Response variable
Experimental variable
Primary Outcome [Dependent] Variable
  Only one
Secondary Outcomes
  Planned
Subset Analyses
  Planned or unplanned
Post-hoc analyses


Levels of Measurement

Nominal
Data that can be classified into mutually exclusive groups without any order or ranking of severity
Binary means two nominal outcomes
  Gender
  Race
  Risk factors
  Outcomes or endpoints
    Lived OR died (binary)
    Response OR no response

Levels of Measurement

Ordinal
Data that can be ranked without a consistent level of magnitude between ranks
Number assigned to ranking
  Likert scale
  Pain scale
  Quality of life scores

Levels of Measurement

Continuous
Data that are measured on a continuum with a consistent level of magnitude between units
Sometimes only measured in whole numbers
  Number of seizures, length-of-stay, heart rate
Interval: arbitrary zero
  Degrees centigrade
Ratio: absolute zero
  BP, body weight, half-life, drug concentration

Levels of Measurement

Keegan S. Ann Pharmacother 2009;43:19-27.

Levels of Measurement

If you don't understand a measurement, you can't possibly determine statistical or clinical significance
  APACHE II Score?
  SOFA Score?
  Child-Pugh Score?
  aPTT
Continuous variables are often surrogate outcomes
Thus, are lower levels of measurement clinically
The mathematical and clinical hierarchies of levels of measurement do not match

Levels of Measurement

Nominal < Ordinal < Continuous
You can use a statistical test designed for a lower level of measure of a dependent variable, but [in general] not the other way around
Somewhat controversial
  Using parametric methods to analyze pain measurements, which are generally ordinal variables [nonparametric]


Valid Measurements

Valid Outcomes

Appropriate for the question being asked
Measures what it is supposed to measure
  Compared the measurement to the reference [gold] standard
MUST be reliable
What do values or changes in values mean?
How much change is clinically significant

Types of Variables

Extraneous Variables or Confounders

Confounders are differences between the study and control groups due to chance or bias that could affect the dependent variables
Controlling confounders by design or analysis is important
  Randomization
  Stratified random allocation
  Cross-over design
  Mantel-Haenszel Chi-square

Descriptive Statistics

Measures of Central Tendencies

Mean
  Average
  Other means
    Geometric mean
Median
  Middle value
Mode
  Most common value

Descriptive Statistics

Applicability of Measures of Central Tendency

Characteristic             Mean   Median   Mode
Used for continuous data   Yes    Yes      Yes
Used for ordinal data      No?    Yes      Yes
Used for nominal data      No     No       Yes
Affected by outliers       Yes    No       No

Gaddis & Gaddis. Ann Emerg Med 1990;19:309-15.
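The "Affected by outliers" row can be illustrated with a minimal sketch. The lengths of stay here are hypothetical values chosen for illustration, not data from the presentation, and `summarize` is an illustrative helper name.

```python
import statistics

# Hypothetical lengths of stay in days (assumed values, not from the slides).
stays = [2, 3, 3, 4, 5]
stays_with_outlier = stays + [30]  # one unusually long admission


def summarize(values):
    """Return (mean, median, mode) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.mode(values),
    )


print("without outlier:", summarize(stays))
print("with outlier:   ", summarize(stays_with_outlier))
```

In this sketch the single outlier more than doubles the mean, while the median moves only slightly and the mode does not change.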

Descriptive Statistics

Measures of Variability

Standard deviation
Standard Error of the Mean (SEM)
  Measure of precision for the mean estimate
  Not a measure of variability
Interquartile range (IQR)

Descriptive Statistics

Box Plots

[Figure: box plot. Whisker = 1.5 IQR; Outer fence = 3 IQR]


Box Plots vs Frequency Histogram

Descriptive Statistics

Applicability of Measures of Dispersion

Characteristic                      Range**   IQR*   SD*   SEM*
Useful for continuous data          Yes       Yes    Yes   Yes
Useful for ordinal data             Yes       Yes    No?   No
Describes sample variability        Yes       Yes    Yes   No
Assists in statistical inferences   No        No     Yes   Yes
Used to calculate CIs               No        No     No    Yes

Gaddis & Gaddis. Ann Emerg Med 1990;19:309-15.
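As a rough sketch of how the four dispersion measures in the table are computed, the following uses hypothetical values (not data from the presentation). The 95% CI uses the normal multiplier 1.96 as a simplifying assumption; for a sample this small a t multiplier would usually be preferred.

```python
import math
import statistics

# Hypothetical measurements (assumed values, not from the slides).
values = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(values)
value_range = max(values) - min(values)

# Quartile cut points (Python's default "exclusive" method).
q1, _median, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

sd = statistics.stdev(values)   # sample SD: describes spread of the sample
sem = sd / math.sqrt(n)         # SEM: precision of the mean, not spread

mean = statistics.mean(values)
ci95 = (mean - 1.96 * sem, mean + 1.96 * sem)  # normal approximation

print(f"range={value_range}, IQR={iqr}, SD={sd:.2f}, SEM={sem:.2f}")
print(f"mean={mean}, approx. 95% CI={ci95[0]:.2f} to {ci95[1]:.2f}")
```

The SEM shrinks as n grows even when the SD stays the same, which is why the table lists it as useful for CIs but not as a description of sample variability.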

How to Interpret CIs

Confidence Intervals
  Used to describe a variable
  May be used instead of [or with] hypothesis testing
  Range of reasonable values for the parameter of
  90%, 95%, and 99% CIs
  Estimates the magnitude, direction, and certainty of a measurement

How to Interpret CIs

Differences: Zero in a CI
Ratios: One in a CI

Weaver SJ. J Am Pharm Assoc 2004;44:694-9.

Hypothesis Testing

Null Hypothesis
Alternate Hypothesis
One-tailed vs two-tailed
  Difference in either direction
  Easier to show superiority with a one-tailed test

The Null Hypothesis (H0)

Decision      The Truth
              H0 True        H0 False
(H0 false)    Type I error   Correct
(H0 true)     Correct        Type II error

The Null Hypothesis (H0)

Decision               The Truth
                       Equal          Different
Groups are Equal       Correct        Type II error
Groups are Different   Type I error   Correct

Type I and II Errors

There cannot be Type I and Type II errors at the same time, for the same dependent variable
Type I error is only a possible problem when there is statistical significance
Type II error is only a possible problem when there is no statistical significance
Small studies that find a difference have a sufficient sample size
  These studies are NOT under-powered

Sample Size (n=)

Alpha
Beta
  Power = 1 - beta
Effect size
  Clinically significant difference
Variability [continuous variables]
Additional adjustments
  Attrition rate
  Distribution

Sample Size (n=)

Alpha, by convention, is usually 0.05
  But can be lower or higher
Beta, by convention, is usually 0.2 or 0.1
  Power = 80% or 90%
Clinically significant differences are set by convention and/or must be defensible
If continuous variables are variable = larger samples
Attrition rates are based on other studies

Why Estimate Sample Size?

A sample that is too small is under-powered
  A sample that is too small may suggest an important difference, but it would not be statistically significant
    We could miss a potentially valuable intervention
  A sample that is too small may show no difference, but we don't know if there is really no difference or just did not have the power to detect a difference
  Too small a sample wastes resources [time and money] and may put patients at unnecessary risk

Why Estimate Sample Size?

A sample that is too large can result in over-powering
  Statistically significant differences that are clinically irrelevant
  Wastes resources [time and money] and puts patients at unnecessary risk


Why Estimate Sample Size?

"Based on a literature review, the event rate for the first primary outcome variable was predicted to be 20%. For the experimental therapy to be considered clinically beneficial, it was judged necessary to lower the event rate to 5%.... A sample size calculation indicated that 174 patients were needed to demonstrate a difference in .. [this].. outcome with a power of 80% and an overall alpha of 0.05."
Good experimental studies have an a priori sample size determination
Observational studies often use the data available, and under- and over-powering may be issues

Trachtman et al. JAMA Vol.290 No.10, 2003

P-Values

Calculated probability of making a Type I error
  Concluding there is a difference when there is not
Alpha: the pre-set acceptable Type I error
  < 0.05 most common
  Higher alpha accepted when you do not want to miss an effect [exploratory research = 0.10]
  Lesser values required
    Multiple testing
    Interim analyses
Statistical differences do not guarantee clinical importance
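To make the sample size ingredients concrete (alpha, power, and the clinically significant difference), here is a minimal sketch using a common normal-approximation formula for comparing two proportions. The formula choice and the function name `n_per_group` are assumptions for illustration. The quoted trial does not state its method here, so this sketch is not expected to reproduce the reported 174 patients exactly; adjustments such as a continuity correction or attrition would raise the number.

```python
import math
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate patients needed per group to compare two proportions.

    Two-sided test, normal approximation, unpooled variance.
    No continuity correction and no allowance for attrition.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Event rates from the quoted passage: 20% predicted, 5% judged beneficial.
print(n_per_group(0.20, 0.05))              # per group, 80% power
print(n_per_group(0.20, 0.05, power=0.90))  # more power needs more patients
print(n_per_group(0.20, 0.10))              # smaller difference needs more patients
```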

Differences Between Groups

Tx (20% mortality) vs Control (40% mortality)
RR = 0.5

N (per group)   95% CI         p=
10              0.11 to 2.14   > 0.2
100             0.32 to 0.79   < 0.05
1000            0.43 to 0.58   < 0.001

Random Error?

Comparison of aspirin (20%) and non-aspirin users (40%)
  10 patients in each group: 2 MIs versus 4
  100 patients in each group: 20 MIs versus 40
  1000 patients in each group: 200 MIs versus 400
Random error becomes smaller
  with a larger sample of patients
  with a larger baseline incidence
  with a larger difference between groups
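The narrowing of the 95% CI with sample size can be sketched with the usual log-transformed (Katz) interval for a relative risk. The choice of method is an assumption, since the slide does not say how its intervals were computed; with this method the results agree with the slide to within rounding.

```python
import math


def relative_risk_ci(events_tx, n_tx, events_ctl, n_ctl, z=1.96):
    """Relative risk with an approximate 95% CI on the log scale."""
    rr = (events_tx / n_tx) / (events_ctl / n_ctl)
    se_log_rr = math.sqrt(
        1 / events_tx - 1 / n_tx + 1 / events_ctl - 1 / n_ctl
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# 20% vs 40% event rates at three sample sizes per group.
for n in (10, 100, 1000):
    rr, lower, upper = relative_risk_ci(2 * n // 10, n, 4 * n // 10, n)
    print(f"N={n}: RR={rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

The point estimate stays at 0.5 throughout; only the width of the interval changes, and the interval excludes 1 once the groups are large enough.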

Interpretation of P-Values

P=0.2
  Denotes the probability of a Type I error [conclude a difference when there is not]
  20% chance of making a Type I error
A statistician would say that the p-value is the calculated probability that a test statistic would be as large [or as small] as observed, or more extreme, if the null hypothesis were true

Interpretation of P-Values

If the difference between the variables being compared is not clinically significant, then the p-value is irrelevant
  If the p-value is above 0.05?
  If less than 0.05, you have statistical significance that is clinically irrelevant


How to Interpret P-values?

If the difference is clinically significant
  Examine the sample size and number of events
  If p is small (e.g., p=0.001), it is very unlikely that the difference occurred by chance
  If the p is large (e.g., p>0.5), it is likely that the differences occurred by chance
  Do not obsess on a p<0.05, which is arbitrary
Remember, p-values do not address systematic error
  Studies are usually not refuted after publication because they had a large random error

Statistical Significance

% Responding to Treatment
New              Standard         P-Value   Statistical Significance
480/800 = 60%    416/800 = 52%    0.001     Yes
15/25 = 60%      13/25 = 52%      0.57      No
15/25 = 60%      9/25 = 36%       0.09      No
240/400 = 60%    144/400 = 36%    <0.0001   Yes

Braitman LE. Ann Intern Med 1991;114:515-7.
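The p-values in this table can be approximated with a two-sided z-test for two proportions using a pooled standard error. This is a sketch under that assumption; the cited article may have used a different but closely related test (for example, an uncorrected chi-square, which is equivalent for a 2x2 table).

```python
import math
from statistics import NormalDist


def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value for the difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Responders / patients, new vs standard, from the table.
rows = [(480, 800, 416, 800), (15, 25, 13, 25),
        (15, 25, 9, 25), (240, 400, 144, 400)]
for x1, n1, x2, n2 in rows:
    p = two_proportion_p_value(x1, n1, x2, n2)
    print(f"{x1}/{n1} vs {x2}/{n2}: p = {p:.4f}")
```

The same 8-point difference is significant with 800 patients per group and not with 25, which is the point of the table: the p-value reflects sample size as much as effect size.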

Estimation

% Responding to Treatment          Difference in %
New              Standard          Point   95% CI
480/800 = 60%    416/800 = 52%     8%      3% to 13%*
15/25 = 60%      13/25 = 52%       8%      -19% to 35%
15/25 = 60%      9/25 = 36%        24%     -3% to 51%
240/400 = 60%    144/400 = 36%     24%     17% to 31%*

Statistically significant = *

Braitman LE. Ann Intern Med 1991;114:515-7.

Statistical vs Clinical Significance

[Figure: Graph of 95% CI, with point estimates marked. Smallest Clinically Important Difference Assumed to be 15%; Zero Difference = NSD]
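The 95% CIs for the difference in response rates can be reproduced, to the nearest percent, with the simple normal-approximation (Wald) interval. The choice of the Wald method is an assumption; it happens to agree with the values in the table.

```python
import math


def difference_ci(x1, n1, x2, n2, z=1.96):
    """Difference in proportions (p1 - p2) with an approximate 95% CI."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, diff - z * se, diff + z * se


rows = [(480, 800, 416, 800), (15, 25, 13, 25),
        (15, 25, 9, 25), (240, 400, 144, 400)]
for x1, n1, x2, n2 in rows:
    diff, lower, upper = difference_ci(x1, n1, x2, n2)
    significant = lower > 0 or upper < 0  # zero outside the CI
    print(f"{diff:.0%} ({lower:.0%} to {upper:.0%}) significant={significant}")
```

A CI that excludes zero corresponds to statistical significance at the 0.05 level. The interval also shows whether a clinically important difference is still plausible, which a p-value alone does not.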

Relative vs Absolute Differences

Risk in exposed = Absolute Risk
Risk in unexposed = Absolute Risk

Relative Risk (RR) = Absolute Risk (exposed) / Absolute Risk (unexposed)

With RR you lose the numerator and denominator

Relative vs Absolute Differences

Risk of an adverse event

Drug       No Drug    RR     Increased risk   Absolute increase
500/1000   300/1000   1.67   67%              200 more per 1000
100/1000   60/1000    1.67   67%              40 more per 1000
5/1000     3/1000     1.67   67%              2 more per 1000
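A short sketch of the arithmetic behind the adverse-event table: the relative risk is identical in every row, while the absolute increase changes a hundredfold. The function name `risk_summary` is illustrative.

```python
def risk_summary(events_drug, n_drug, events_none, n_none):
    """Return (relative risk, extra events per 1000 patients exposed)."""
    risk_drug = events_drug / n_drug
    risk_none = events_none / n_none
    relative_risk = risk_drug / risk_none
    extra_per_1000 = (risk_drug - risk_none) * 1000
    return relative_risk, extra_per_1000


for events_drug, events_none in [(500, 300), (100, 60), (5, 3)]:
    rr, extra = risk_summary(events_drug, 1000, events_none, 1000)
    print(f"{events_drug}/1000 vs {events_none}/1000: "
          f"RR={rr:.2f}, {extra:.0f} more per 1000")
```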


Exercise: GUSTO Trial

Calculate the number-needed-to-treat to save 1 life
  Using alteplase (t-PA) instead of streptokinase
Calculate the CI for this NNT
What is the number-needed-to-harm (NNH) for t-PA instead of streptokinase for intracranial bleeds?
NNT = 1 / Absolute Risk Reduction (ARR)
  Number of patients tx with t-PA instead of streptokinase to achieve one positive outcome
  The smaller the number the better
NNH = 1 / Absolute Risk Increase [Attributable Risk]
  Number tx that results in one negative outcome

GUSTO

What is an unsuccessful outcome? death
How many got t-PA (alone)? 10,344
  How many died? 651
  Death as a % 6.3%
How many got streptokinase? 20,173
  How many died? 1473
  Death as a % 7.3%

The Gusto Investigators. NEJM 1993;329:673.

Relative Risk                   0.86
Relative Risk Reduction         13.6%*
  (CI)                          (5.9% to 21.3%)
Absolute Risk Reduction         1%
  (CI)                          (0.4% to 1.7%)
NNT                             100 (250 to 59)

* Based on CI given in article, not calculated 14%
Estimated based on CI of RRR

Hemorrhagic Stroke

104/20,023 = 0.52% for streptokinase
0.72% for t-PA
Attributable risk = 0.72% - 0.52% = 0.2%
NNH = 1 / AR = 1 / 0.002 = 500

The Gusto Investigators. NEJM 1993;329:673.
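The GUSTO exercise can be worked through with a few lines of arithmetic. This sketch uses the death counts and the rounded stroke percentages quoted on the slides; variable names are illustrative. The NNT confidence limits are simply the reciprocals of the ARR confidence limits shown on the slide.

```python
import math

# Deaths and patients from the slides.
deaths_tpa, n_tpa = 651, 10_344
deaths_sk, n_sk = 1_473, 20_173

risk_tpa = deaths_tpa / n_tpa    # about 6.3%
risk_sk = deaths_sk / n_sk       # about 7.3%

relative_risk = risk_tpa / risk_sk
arr = risk_sk - risk_tpa         # absolute risk reduction, about 1%
nnt = 1 / arr                    # patients treated to save one life

# Reciprocals of the ARR confidence limits (0.4% and 1.7%) from the slide.
nnt_ci = (1 / 0.004, 1 / 0.017)

# Hemorrhagic stroke risks, as rounded percentages from the slide.
stroke_tpa, stroke_sk = 0.0072, 0.0052
nnh = 1 / (stroke_tpa - stroke_sk)

print(f"RR={relative_risk:.2f}, ARR={arr:.1%}, NNT={math.ceil(nnt)}")
print(f"NNT CI: {nnt_ci[0]:.0f} to {nnt_ci[1]:.0f}")
print(f"NNH={nnh:.0f}")
```

Dividing 1000 by the NNT and the NNH gives roughly 10 extra survivors and 2 extra hemorrhagic strokes per 1000 patients treated with t-PA instead of streptokinase.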

GUSTO

Balancing Benefits and Risks
For every 1000 patients treated with t-PA instead of streptokinase
  10 more patients will survive
  2 more patients will have a hemorrhagic stroke

The Gusto Investigators. NEJM 1993;329:673.

Common Statistical Tests

It is beyond the scope of a 1-hour statistics overview to review all of the most common statistical tests
You can email me for a handout that will help you practice determining whether the appropriate tests are being used in the studies you read
Using the wrong test used to be common; it is not common now, but does occur rarely


The Appropriate Test

Study design
  [independent] or cross-over [related]
Number of groups being compared
Levels of measurement
Confounders
Assumptions of the test
  Use a test with different assumptions

Chi-square

Determine the level of measurement for the dependent variable
  Nominal
Determine the number of groups being compared
  Two or more independent variables
  Two or more outcomes
  Contingency table method
Determine whether the groups are independent or related
  Independent = Chi-square; Related = McNemar's

Chi-square

Determine whether the data meet the assumptions of the test
For a 2x2 table, consider using the Yates correction
  Most common
  Considered conservative
    More difficult to find a statistically significant difference
    Type II error

Chi-square

Assumptions
  Independent observations
  Expected cell frequencies (ECF) = RT * CT/GT
    No ECF < 1
    No more than 20% has an ECF < 5
      1 cell < 5 in a 2x2 table
If not met
  Descriptive stats
  Collapse categories
  Fisher's Exact Test

Chi-square

                             Bleeding   No Bleeding   Row Totals
Individualized Enoxaparin    1 (4.9)    49 (45.1)     50
Conventional Enoxaparin      9 (5.1)    44 (47.9)     53
Column Totals                10         93            103

Expected cell frequencies are shown in parentheses

Chi-Square p = 0.03
Fisher's Exact p = 0.02

Barras MA. Clin Pharmacol Ther 2008;83:882-8.
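The expected cell frequencies (ECF = RT * CT/GT) and the chi-square statistic for the enoxaparin table can be sketched as follows. The p-value uses the closed form for 1 degree of freedom via `math.erfc`. That the reported p = 0.03 reflects the Yates correction is an inference from the arithmetic, not something the slide states; the uncorrected statistic gives a smaller p-value.

```python
import math


def chi_square_2x2(table, yates=False):
    """Expected counts, chi-square statistic, and p-value for a 2x2 table."""
    (a, b), (c, d) = table
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    grand_total = a + b + c + d

    # ECF = row total * column total / grand total
    expected = [[rt * ct / grand_total for ct in col_totals]
                for rt in row_totals]

    statistic = 0.0
    for observed_row, expected_row in zip(table, expected):
        for observed, exp in zip(observed_row, expected_row):
            diff = abs(observed - exp)
            if yates:
                diff = max(diff - 0.5, 0.0)  # continuity correction
            statistic += diff ** 2 / exp

    # Upper-tail probability of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return expected, statistic, p_value


enoxaparin = [[1, 49], [9, 44]]  # rows: individualized, conventional
expected, stat, p = chi_square_2x2(enoxaparin, yates=True)
print([[round(e, 1) for e in row] for row in expected])
print(f"Yates chi-square = {stat:.2f}, p = {p:.3f}")
```

One expected count here is below 5 (about 4.9), which is the situation where the slides suggest considering Fisher's Exact Test.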


Chi-square

                 Relapsing Spasm   No Relapsing Spasm   Row Totals
Vigabatrin       0 (2)             16 (14)              16
ACTH             4 (2)             12 (14)              16
Column Totals    4                 28                   32

Expected cell frequencies are shown in parentheses

Chi Square p = 0.03
Fisher's Exact p = 0.10

Cossette P. Neurology 1999;52:1691-4.
Cossette P. Neurology 2000;54:539. [Erratum]
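When expected counts are this small, an exact test can be computed directly from the hypergeometric distribution. This is a minimal sketch of a two-sided Fisher's exact test that sums the probabilities of all tables no more likely than the observed one, which is one common definition of "two-sided" and an assumption here. Under that definition it reproduces the Fisher's Exact p-values shown for both 2x2 tables.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact p-value for a 2x2 table of counts."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def probability(k):
        # Hypergeometric probability of k events in the first row,
        # with all row and column totals held fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(total, col1)

    observed = probability(a)
    p_value = 0.0
    for k in range(max(0, col1 - row2), min(row1, col1) + 1):
        p_k = probability(k)
        # Small tolerance so equally likely tables are counted despite rounding.
        if p_k <= observed * (1 + 1e-9):
            p_value += p_k
    return p_value


print(round(fisher_exact_two_sided([[0, 16], [4, 12]]), 2))  # vigabatrin vs ACTH
print(round(fisher_exact_two_sided([[1, 49], [9, 44]]), 2))  # enoxaparin table
```

For the vigabatrin table the exact test is no longer significant at 0.05, while the uncorrected chi-square was, which illustrates why the test's assumptions matter.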

Fisher's Exact Test

Nominal data
Two groups and two outcomes [only]
More than 2x2?
  Split into several 2x2 tables
    Increases Type I error
  Freeman-Halton extension
Useful when a nominal outcome is rare
  Or for small samples

Multivariate Statistics

[Figure: Exposure (DRUG) and Outcome, with other explanatory variables that could affect outcome]

Multivariate Statistics

Detailed discussion beyond the scope of this presentation
Control for confounders
  Mantel-Haenszel Chi Square
  Multi-way (eg, 2-way) ANOVA
  Analysis of Covariance (ANCOVA)
  Multiple linear regression
  Multiple logistic regression
  Cox proportional hazard models

Logistic Regression

Odds Ratio (OR): controlling for other variables in the model, the odds of having the outcome of interest
  Over-estimates the relative risk
    Unless the outcome is rare
Crude OR: only look at exposure and outcome with no adjustment
Adjusted OR: adjusts for other extraneous variables
  Confounders
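The statement that the OR over-estimates the relative risk unless the outcome is rare can be checked with crude (unadjusted) calculations. This sketch reuses the 500/1000 vs 300/1000 and 5/1000 vs 3/1000 rows from the adverse-event table earlier in the presentation; the function name is illustrative, and no adjustment for confounders is attempted.

```python
def crude_rr_and_or(events_exp, n_exp, events_unexp, n_unexp):
    """Return (relative risk, crude odds ratio) from raw counts."""
    relative_risk = (events_exp / n_exp) / (events_unexp / n_unexp)
    odds_exposed = events_exp / (n_exp - events_exp)
    odds_unexposed = events_unexp / (n_unexp - events_unexp)
    return relative_risk, odds_exposed / odds_unexposed


common = crude_rr_and_or(500, 1000, 300, 1000)  # outcome is common
rare = crude_rr_and_or(5, 1000, 3, 1000)        # outcome is rare

print(f"common outcome: RR={common[0]:.2f}, OR={common[1]:.2f}")
print(f"rare outcome:   RR={rare[0]:.2f}, OR={rare[1]:.2f}")
```

With a common outcome the OR is noticeably further from 1 than the RR; with a rare outcome the two are nearly the same.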


Survival Analysis

Time to event analysis
  Death
  Clinical progression
Kaplan-Meier Plot
Log-Rank Test
Cox Proportional Hazard Model
  Hazard Ratio (HR)

Survival Analysis

[Figure: survival curve; y-axis: Cumulative Survival, starting at 1.0]

Kaplan-Meier Plot

[Figure: Kaplan-Meier survival curves]
640/850 = 75.3%
581/840 = 69.2%
ARR = 6.1%

Bernard, et al. N Engl J Med 2001;344(10):699

Summary

Understanding commonly used biostatistical concepts enables you to interpret the literature
Descriptive statistics represent the sample, and hopefully, the population of interest
Confidence intervals can be used to make inferences about comparative treatments
Incorrect statistical tests in published papers are
Interpreting the results of these tests is more important and identifies literature deficiencies