
Statistics for Diagnostic Tests

Andy Vail
Biostatistics
University of Manchester
FOCUS Training Day, 30th April 2012

Opener

• Assume DNA-fingerprinting is very good
– 1 in 20 million chance of incorrect match
• Defendant has DNA match
• Do the odds:
A. Overwhelmingly imply guilt?
B. Favour guilt?
C. Favour innocence?
Outline

• ‘Proof of concept’ studies
– Summary statistics
– Confidence intervals
– Sample size calculations
• Diagnostic accuracy studies
– Likelihood & odds ratios
– Receiver Operating Characteristic (ROC) curves
• Clinical/cost effectiveness studies

Definition

• A process by which a sample from an individual (patient, location, …) is assigned a value which is used to decide if a particular attribute (disease, microorganism, trait, genotype, ...) is present or absent.
• Underlying process may give a numerical value, but this is dichotomised to make a decision
Proof of concept (1)

• Given a new measurement, how to assess value?
• First show that measurement is reliable
– Between analyses on the same sample
• Different days/operators/reagent batches
• Different laboratories
– Between repeat samples on the same individual
• “Same” time
• Over time

• Always assess ‘blinded’ to earlier result(s)

Reliability analysis

• Categories
– Cohen’s kappa is a measure of the strength of agreement between two categorisations, adjusted for chance agreement
– kappa = (observed – chance) / (maximum – chance); see the sketch below
• Continuous
– Bland-Altman: plot differences against average value
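As a concrete illustration of the kappa formula above, here is a minimal Python sketch for a two-rater agreement table; the counts are hypothetical.

```python
import numpy as np

def cohens_kappa(table):
    """kappa = (observed - chance) / (maximum - chance), where maximum = 1."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    observed = np.trace(table) / n  # proportion of exact agreement
    # Chance agreement: sum over categories of the product of marginal proportions
    chance = (table.sum(axis=0) * table.sum(axis=1)).sum() / n**2
    return (observed - chance) / (1 - chance)

# Hypothetical table: rows = first categorisation, columns = second
print(cohens_kappa([[40, 5],
                    [10, 45]]))  # 0.70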
Proof of concept (2)

• Given reliability, does it detect barn-door differences?
• Identify clear cases and clear non-cases

[Histograms: frequency of test results (scale 0 to 10) in clear cases and in clear non-cases]

Discrimination analysis

• Typically see tests of association
– Mann-Whitney, t-test, chi-squared, Fisher’s
• Interest here is not in average difference
• Interest is extent of overlap
– Potential for mis-classification
Sensitivity and Specificity

                True Diagnosis
                 Y      N
Test Result  Y   a      b        Sensitivity - proportion of cases correctly identified = a/(a+c)
             N   c      d        Specificity - proportion of controls correctly identified = d/(b+d)

                True Diagnosis
                 Y      N
Test Result  Y   90     20       Sensitivity = 90/100 = 90%
             N   10     80       Specificity = 80/100 = 80%

Confidence intervals

• Estimates are no use alone
• Need idea of precision: “give or take a bit”
– “95% confidence interval” sounds more convincing!
• Sensitivity and specificity are just percentages
– Standard methods for CI of a proportion
– 8/10 = 80% (44% to 97%)
– 80/100 = 80% (71% to 87%)
– 800/1000 = 80% (77% to 82%)
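The slides do not say which interval method was used; as a sketch, both the Clopper-Pearson (‘exact’) and Wilson score intervals from statsmodels reproduce the figures above to within rounding.

```python
from statsmodels.stats.proportion import proportion_confint

for count, nobs in [(8, 10), (80, 100), (800, 1000)]:
    exact = proportion_confint(count, nobs, alpha=0.05, method='beta')    # Clopper-Pearson
    wilson = proportion_confint(count, nobs, alpha=0.05, method='wilson') # score interval
    print(f"{count}/{nobs}: exact ({exact[0]:.2f}, {exact[1]:.2f}), "
          f"wilson ({wilson[0]:.2f}, {wilson[1]:.2f})")
```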
95% Confidence Intervals

• If a study were repeated many times, 95% of such intervals would contain the true value

• Sadly not “95% chance that it contains true value”

• The bigger the study, the tighter the confidence interval will be
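The repeated-sampling interpretation can be checked directly by simulation; a minimal sketch, assuming a true proportion of 80% and studies of size 100.

```python
import numpy as np
from statsmodels.stats.proportion import proportion_confint

rng = np.random.default_rng(2012)
true_p, n, covered = 0.8, 100, 0

for _ in range(10_000):  # 10,000 hypothetical repeats of the same study
    count = rng.binomial(n, true_p)
    lo, hi = proportion_confint(count, n, method='wilson')
    covered += (lo <= true_p <= hi)

print(covered / 10_000)  # close to 0.95, the nominal coverage
```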

Sample size calculation

• In case-control studies, can fix number of each group
• Need some idea of desirable/plausible sens & spec

             ------- 95% confidence interval -------
Proportion   n=50          n=100         n=200
80%          67 to 89      71 to 87      74 to 85
90%          79 to 96      83 to 94      85 to 93
95%          85 to 98      89 to 98      91 to 97
100%         93 to 100     96 to 100     98 to 100
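A sketch that regenerates the table, assuming Wilson score intervals (which match the tabulated figures to within rounding):

```python
from statsmodels.stats.proportion import proportion_confint

for p in (0.80, 0.90, 0.95, 1.00):
    cells = []
    for n in (50, 100, 200):
        lo, hi = proportion_confint(round(p * n), n, method='wilson')
        cells.append(f"{100 * lo:.0f} to {100 * hi:.0f}")
    print(f"{p:.0%}:  " + "   ".join(cells))
```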
Diagnostic Accuracy Studies

• Barn-door differences: so what?
• Clinical diagnosis occurs in real, messy cases
• Need
– independent “Gold” or “Reference” Standard
– Usually prospective cases
– Follow-up of test negatives

Same table, more stats!

                True Diagnosis
                 Y      N
Test Result  Y   a      b
             N   c      d

• Sens: a/(a+c)
• Spec: d/(b+d)
• PPV: a/(a+b)
• NPV: d/(c+d)
• LR+: sens/(1-spec)
• LR-: (1-sens)/spec
• DOR: (a/c)/(b/d) = ad/bc
Predictive values (PPV & NPV)

• Depend on prevalence
– meaningless in Case-Control design
– may not transfer to different settings
• Positive Predictive Value
– Proportion of positive results that are correct
– PPV = (Prev x Sens) / [Prev x Sens + (1- Prev)(1-Spec)]

• Negative Predictive Value
– Proportion of negative results that are correct
– NPV = [(1-Prev) x Spec] / [(1-Prev) x Spec + Prev x (1-Sens)]
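Applying the two formulas above shows the prevalence dependence directly; a sketch using the example test from these slides (sensitivity 90%, specificity 80%).

```python
def ppv(prev, sens, spec):
    return prev * sens / (prev * sens + (1 - prev) * (1 - spec))

def npv(prev, sens, spec):
    return (1 - prev) * spec / ((1 - prev) * spec + prev * (1 - sens))

# The same test looks very different in low-prevalence settings
for prev in (0.5, 0.1, 0.01):
    print(f"prev {prev:5.2f}: PPV {ppv(prev, 0.9, 0.8):.2f}, "
          f"NPV {npv(prev, 0.9, 0.8):.3f}")
```

At 50% prevalence (as in the worked example) PPV is 82%, but at 1% prevalence it falls below 5%.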

Likelihood Ratios

• Measure of information contained in test result
• Unlike PPV & NPV, no direct dependence on prevalence
– But only useful in real context
• LR+
– How much more likely to test +ve if affected than if unaffected
• LR-
– How much more likely to test –ve if affected than if unaffected
Diagnostic odds ratio

• Attempt to summarise value as a single figure
• DOR = (LR+)/(LR-)
– How much higher the odds of testing +ve if affected
– How much higher the odds of being affected if testing +ve
• Increasing use
– logistic regression and meta-analysis
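On the logistic-regression point: the coefficient for the test result in a logistic regression of true status on test result equals log(DOR). A sketch, rebuilding individual-level data from the worked example's counts (a=90, b=20, c=10, d=80) and assuming statsmodels is available:

```python
import numpy as np
import statsmodels.api as sm

# Expand the 2x2 counts into one record per person
disease = np.array([1] * 90 + [0] * 20 + [1] * 10 + [0] * 80)
test    = np.array([1] * 110 + [0] * 90)  # first 110 are test-positive

fit = sm.Logit(disease, sm.add_constant(test)).fit(disp=0)
print(np.exp(fit.params[1]))  # 36.0, the diagnostic odds ratio
```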

Example

                True Diagnosis
                 Y      N
Test Result  Y   90     20
             N   10     80

• Sens: 90/(90+10) = 90%
• Spec: 80/(20+80) = 80%
• PPV: 90/(90+20) = 82%
• NPV: 80/(10+80) = 89%
• LR+: 90/20 = 4.5
• LR-: 10/80 = 0.125
• DOR: (90/10)/(20/80) = 36
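All seven statistics on this slide follow from the four cell counts; a quick sketch verifying the figures:

```python
a, b, c, d = 90, 20, 10, 80   # cells of the table above

sens = a / (a + c)            # 0.90
spec = d / (b + d)            # 0.80
ppv  = a / (a + b)            # 0.818 -> 82%
npv  = d / (c + d)            # 0.889 -> 89%
lr_pos = sens / (1 - spec)    # 4.5
lr_neg = (1 - sens) / spec    # 0.125
dor = lr_pos / lr_neg         # 36.0, identical to (a * d) / (b * c)

print(sens, spec, ppv, npv, lr_pos, lr_neg, dor)
```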
ROC curves

• In practice, usually have continuous score (assay)
• Need to make categorical decision (yes/no)
• Choose threshold value to determine diagnosis
• Higher threshold leads to:
– Fewer diagnoses
– Lower sensitivity (as fewer ‘affected’ testing positive)
– Higher specificity (as more ‘unaffected’ testing negative)
[Histogram: frequency of test results (scale 0 to 10), on which a threshold must be chosen]
Determining a cut-off

• Trade-off between sensitivity and specificity
• Produce table for all possible cut-off values
• Calculate and plot Sens v (100% - Spec) for each
• At minimum possible value
– Everyone tests positive, so Sens = 100%, Spec = 0%
• At maximum possible value
– Everyone tests negative, so Sens = 0%, Spec =100%

[ROC plot: Sensitivity v 1-Specificity. The curve runs from the maximum threshold at (0,0), where everyone tests negative, to the minimum threshold at (1,1), where everyone tests positive; a perfect test passes through the top-left corner, and guessing follows the diagonal]

[ROC plot: the upper-right part of the curve is preferable if high sensitivity is key, the lower-left part if high specificity is key; the point nearest the top-left corner is ‘optimal’ to minimise errors]
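A sketch of the ‘table for all possible cut-off values’ using hypothetical scores; by convention here a result at or above the cut-off is called positive.

```python
import numpy as np

# Hypothetical test results (scale 0-10) in affected and unaffected groups
affected   = np.array([4, 5, 6, 6, 7, 8, 8, 9, 9, 10])
unaffected = np.array([0, 1, 2, 2, 3, 3, 4, 5, 5, 6])

for cut in range(0, 12):
    sens = np.mean(affected >= cut)    # affected who test positive
    spec = np.mean(unaffected < cut)   # unaffected who test negative
    print(f"cut-off {cut:>2}: sens {sens:.2f}, 1-spec {1 - spec:.2f}")
# cut-off 0 gives sens 1.00, spec 0.00; cut-off 11 gives sens 0.00, spec 1.00
```

Plotting the sens column against the 1-spec column traces out the ROC curve.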

Area under ROC (AuROC)

• Perfect discrimination: AuROC=1
• Guesswork: AuROC=0.5
• AuROC = probability that randomly chosen affected person will have higher value than randomly chosen unaffected person
[ROC plot: four curves plotted as Sensitivity v 1-Specificity: a perfect test (AuROC=1.0), a strongly discriminating test (AuROC=0.94), a weakly discriminating test (AuROC=0.59), and guessing along the diagonal (AuROC=0.5)]
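The probability interpretation can be checked against a standard AuROC routine; a sketch assuming scikit-learn is available, reusing the hypothetical scores from the cut-off example.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

affected   = np.array([4, 5, 6, 6, 7, 8, 8, 9, 9, 10])
unaffected = np.array([0, 1, 2, 2, 3, 3, 4, 5, 5, 6])

# P(random affected scores higher than random unaffected), ties counting half
pairs = (affected[:, None] > unaffected[None, :]).mean() \
      + 0.5 * (affected[:, None] == unaffected[None, :]).mean()

labels = np.r_[np.ones(10), np.zeros(10)]
auc = roc_auc_score(labels, np.r_[affected, unaffected])
print(pairs, auc)  # both 0.935: the two definitions agree
```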

STARD

• STAndards for the Reporting of Diagnostic accuracy
• Checklist (with justification) for reporting:
– Title (1 item)
– Intro (1)
– Method (11)
– Results (11)
– Discussion (1)
STARD examples

• 12. Describe methods for calculating or comparing measures of diagnostic accuracy, and the statistical methods used to quantify uncertainty (e.g. 95% confidence intervals).
• 18. Report distribution of severity of disease (define
criteria) in those with the target condition; other
diagnoses in participants without the target condition.
• 22. Report how indeterminate results, missing
responses and outliers of the index tests were
handled.

Clinical/cost effectiveness

• Are sens, spec, PPV, NPV, AuROC, etc enough?
• What is a good diagnostic test?
– One that makes a clinical difference!
• Ultimately need comparison studies
– ‘Act on new test’ versus ‘Act in ignorance of new test’
– Sample size as for other randomised trials
– CONSORT rather than STARD reporting
– Remarkably rare!
Resources

• BMJ Statistical Notes series
– Altman & Bland. BMJ 1994;308:1552. Sensitivity and specificity.
– Altman & Bland. BMJ 1994;309:102. Predictive values.
– Altman & Bland. BMJ 1994;309:188. ROC plots.
– Altman & Bland. BMJ 2004;329:168. Likelihood ratios.
• Bland & Altman. Lancet 1986;i:307. Assessing agreement between two methods.
• STARD: http://www.stard-statement.org/

Prosecutor’s fallacy

             Guilty    Innocent      Total
DNA Match
No match
Total
Prosecutor’s fallacy

             Guilty    Innocent      Total
DNA Match
No match
Total        1         ~60 million   ~60 million

Prosecutor’s fallacy

             Guilty    Innocent      Total
DNA Match                            3
No match                             ~60 million
Total        1         ~60 million   ~60 million
Prosecutor’s fallacy

             Guilty    Innocent      Total
DNA Match    1         2             3
No match     0         ~60 million   ~60 million
Total        1         ~60 million   ~60 million

• Sens = 100%, 1-Spec = 1/20 million
• Prev = very low = 1/60 million
• Probability that guilty given positive match?
– PPV = 1/3
– Odds are in defendant’s favour!

Summary

• Statistics of diagnostic test accuracy are largely just percentages and ratios of percentages
• Difficulty in technical versus lay language
• Need to ensure statistics and interpretation are appropriate to phase of research
