An Introduction to the Bootstrap

Roger W. Johnson
South Dakota School of Mines & Technology, USA.
e-mail: rwjohnso@taz.sdsmt.edu

KEYWORDS: Teaching; Standard error; Confidence interval; Minitab; Bias; Mean square error.

Summary
This article presents bootstrap methods for estimation, using simple arguments. Minitab macros for implementing these methods are given.
INTRODUCTION
A bootstrap method of estimating the standard error of X̄ now involves a modification of the above procedure. In particular, use the sample as an approximation of our population. Specifically, take samples of size n with replacement from the data to approximate samples of size n from the population. If you think that this is akin to 'lifting yourself by your bootstraps', you are not alone! Here, then, is a bootstrap method for estimating the standard error of X̄:

This bootstrap method may be used with even smaller data sets than the one given above. Loosely speaking, however, the bootstrap idea of approximating the population by the sample becomes more questionable as the sample size, n, decreases. As with other statistical procedures, our trust in the bootstrap will grow with increased sample size.

In the previous example there was, of course, no
File: sedriver.txt
noecho
erase c10
let k1 = n(c1) # the data must have previously been put in column c1
let k2 = 200 # number of bootstrap samples, B
execute 'bootstrp.txt' k2
echo
let k3 = stdev(c10) # se of mean
print k3
end
File: bootstrp.txt
sample k1 c1 c11;
replace.
let k20 = mean(c11) # mean of this bootstrap sample
stack c10 k20 c10 # append it to the bootstrap means stored in c10
Use execute sedriver.txt at the Minitab prompt to run this bootstrap procedure
Fig 2. Bootstrap standard error code
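For readers without Minitab, the following is a minimal Python sketch of the same resampling loop as the Fig 2 macros: draw B samples of size n with replacement, compute the statistic on each, and take the standard deviation of the B values. The function name bootstrap_se, its parameters and the made-up data in the usage lines are hypothetical; the default B = 200 mirrors k2 in sedriver.txt, and the statistic argument is an assumption of this sketch that lets the same loop serve for statistics other than the mean.

```python
import random
import statistics
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def bootstrap_se(
    data: Sequence[T],
    statistic: Callable[[List[T]], float],
    B: int = 200,
    seed: Optional[int] = None,
) -> float:
    """Bootstrap standard error of `statistic` computed on `data`.

    Hypothetical helper mirroring sedriver.txt / bootstrp.txt (Fig 2).
    """
    if B < 2:
        raise ValueError("need at least two bootstrap samples")
    rng = random.Random(seed)
    n = len(data)  # k1: the sample size
    replicates: List[float] = []  # plays the role of column c10
    for _ in range(B):  # k2 = B bootstrap samples
        # like `sample k1 c1 c11; replace.`: n draws with replacement
        resample = rng.choices(data, k=n)
        replicates.append(statistic(resample))  # like `stack c10 k20 c10`
    # sample standard deviation (n - 1 divisor) of the replicates, like stdev(c10)
    return statistics.stdev(replicates)


if __name__ == "__main__":
    sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 6.1, 3.9, 4.4]  # made-up data
    print(bootstrap_se(sample, statistics.mean, B=200, seed=1))
```

For a bivariate statistic such as the correlation coefficient, one would pass a list of (x, y) pairs so that each pair is resampled as a unit, together with a statistic like `lambda p: statistics.correlation([a for a, _ in p], [b for _, b in p])` (Python 3.10 or later; it raises an error if a resample happens to have a constant coordinate).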
As a final example to illustrate the above bootstrap method of estimating standard errors, consider male mortality rate averaged over the years 1958–1964 for towns in England and Wales versus calcium (from Hand et al. 1994, pp. 5–6), shown as a scatter diagram in figure 3. The calcium concentration may be thought of as a measure of water hardness; the higher the calcium concentration, the harder the water. The correlation coefficient between male mortality and calcium concentration for the 61 data points shown is 0.655.

BOOTSTRAP CONFIDENCE INTERVALS

In this section we outline a bootstrap method for producing a confidence interval (see Rice 1995). As before, it is helpful to compare this method with a standard technique, so we start with the well-known case of estimating a mean with a 'large' sample size. Returning to the M1 motorway data, an approximate 95% confidence interval for the mean interarrival time μ, by the central limit theorem, is

$$\left(\bar{x} - 1.96\,\frac{s}{\sqrt{n}},\ \bar{x} + 1.96\,\frac{s}{\sqrt{n}}\right) = (5.36,\ 10.24)$$

Now we give the rationale for a bootstrap confidence interval. Suppose we can find values $c_1$ and $c_2$ so that

$$P(\bar{X} - \mu \le c_2) = 0.975 \quad\text{and}\quad P(\bar{X} - \mu \le c_1) = 0.025 \tag{1}$$
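The step from (1) to an interval can be sketched as follows. This is one derivation consistent with the Fig 4 macro, which reports 2*k8 − k7 and 2*k8 − k6; it is not necessarily the article's own presentation. The notation $\bar{x}^*_{(0.025)}$ and $\bar{x}^*_{(0.975)}$ for the 2.5th and 97.5th percentiles of the B bootstrap sample means is introduced only for this sketch. Subtracting the two probabilities in (1) and rearranging:

$$
\begin{aligned}
P(c_1 < \bar{X} - \mu \le c_2) &= 0.975 - 0.025 = 0.95,\\
P(\bar{X} - c_2 \le \mu < \bar{X} - c_1) &= 0.95,\\
c_1 \approx \bar{x}^*_{(0.025)} - \bar{x}, \qquad c_2 &\approx \bar{x}^*_{(0.975)} - \bar{x},\\
(\bar{x} - c_2,\ \bar{x} - c_1) &\approx \left(2\bar{x} - \bar{x}^*_{(0.975)},\ 2\bar{x} - \bar{x}^*_{(0.025)}\right).
\end{aligned}
$$

The third line is the bootstrap step: treating the sample as the population, $\bar{x}$ plays the role of $\mu$ and a bootstrap mean plays the role of $\bar{X}$, so the unknown $c_1$ and $c_2$ are estimated from the bootstrap distribution of $\bar{x}^* - \bar{x}$.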
File: cidriver.txt
noecho
erase c10
let k1 = n(c1) # the data must have previously been put in column c1
let k2 = 1000 # number of bootstrap samples, B
execute 'bootstrp.txt' k2
sort c10 c11
let k3 = 0.95 # desired confidence level
let k4 = round(k2*(1-k3)/2) # position of the lower percentile
let k5 = round(k2*(1+k3)/2) # position of the upper percentile
let k6 = c11(k4)
let k7 = c11(k5) # (k6,k7) is the percentile interval
let k8 = mean(c1) # sample mean
let k10 = 2*k8-k7
let k11 = 2*k8-k6
print k10 k11 # 100*k3% confidence interval (4) for mean
end
Use execute cidriver.txt at the Minitab prompt to run this bootstrap procedure
Fig 4. Bootstrap confidence interval code
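A minimal Python sketch of the Fig 4 macro follows, assuming the same defaults (B = 1000 bootstrap samples, level 0.95). The name bootstrap_ci_mean and the made-up data are hypothetical. It returns both the percentile interval (k6, k7) and the reflected interval (2*k8 − k7, 2*k8 − k6) that the macro prints. Two small departures are assumptions of this sketch: positions are clamped to 1..B so that a small B cannot index outside the sorted list, and Python's round() rounds exact halves to even, which may differ from Minitab at a tie (for B = 1000 and level 0.95 the positions are 25 and 975 either way).

```python
import random
import statistics
from typing import Optional, Sequence, Tuple

Interval = Tuple[float, float]


def bootstrap_ci_mean(
    data: Sequence[float],
    B: int = 1000,
    level: float = 0.95,
    seed: Optional[int] = None,
) -> Tuple[Interval, Interval]:
    """Return (percentile interval, reflected interval) for the mean.

    Hypothetical helper mirroring cidriver.txt (Fig 4); the second
    interval corresponds to (k10, k11) in the macro.
    """
    rng = random.Random(seed)
    n = len(data)
    # B bootstrap means, sorted (columns c10 and c11 in the macro)
    boot = sorted(statistics.mean(rng.choices(data, k=n)) for _ in range(B))
    # 1-based positions k4 and k5, clamped to 1..B (an addition of this sketch)
    k4 = min(max(round(B * (1 - level) / 2), 1), B)
    k5 = min(max(round(B * (1 + level) / 2), 1), B)
    k6, k7 = boot[k4 - 1], boot[k5 - 1]  # percentile interval (k6, k7)
    k8 = statistics.mean(data)  # sample mean
    return (k6, k7), (2 * k8 - k7, 2 * k8 - k6)


if __name__ == "__main__":
    sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 6.1, 3.9, 4.4]  # made-up data
    percentile, reflected = bootstrap_ci_mean(sample, seed=1)
    print("percentile interval:", percentile)
    print("reflected interval:", reflected)
```

The reflected interval is the mirror image of the percentile interval about the sample mean, so the two coincide when the bootstrap distribution of the mean is symmetric about the sample mean and differ when it is skewed.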
FURTHER DETAILS
When trying to assess the performance of an estimate θ̂ of θ we will, in general, be concerned

Acknowledgement
Thanks are due to the referee for comments that led to an improved presentation.
Teaching Statistics. Volume 23, Number 2, Summer 2001 . 53
References
Edgington, E. (1995). Randomization Tests (3rd revised edn). New York: Marcel Dekker.
Efron, B. and Tibshirani, R. (1986). Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science, 1(1), 54–77.
Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. London: Chapman & Hall.
Good, P. (2000). Permutation Tests: A Practical Guide to Resampling Methods for Testing Hypotheses (2nd edn). New York: Springer.
Hand, D., Daly, F., Lunn, A., McConway, K. and Ostrowski, E. (eds) (1994). A Handbook of Small Data Sets. London: Chapman & Hall.
Raspe, R.E. (1785). The Adventures of Baron Munchausen.
Reeves, J. (1995). Resampling stats. Teaching Statistics, 17(3), 101–3.
Rice, J. (1995). Mathematical Statistics and Data Analysis (2nd edn), pp. 271–2.
Ricketts, C. and Berry, J. (1994). Teaching statistics through resampling. Teaching Statistics, 16(2), 41–4.
Stout, W., Travers, K. and Marden, J. (1999). Statistics: Making Sense of Data (2nd edn). Rantoul, Illinois: Möbius Communications.
Taffe, J. and Garnham, N. (1996). Resampling, the bootstrap and Minitab. Teaching Statistics, 18(1), 24–5.