Introduction to Bootstrapping
James Guszcza, FCAS, MAAA
CAS Predictive Modeling Seminar
Chicago
September, 2005
More Concisely
Philosophy
Theoretical Picture
The true distribution in the sky
Sample 1: Y11, Y12, ..., Y1k -> Y1
Sample 2: Y21, Y22, ..., Y2k -> Y2
Sample 3: Y31, Y32, ..., Y3k -> Y3
...
Sample N: YN1, YN2, ..., YNk -> YN
The actual sample: Y1, Y2, ..., Yk
Re-sample 1: Y*11, Y*12, ..., Y*1k -> Y*1
Re-sample 2: Y*21, Y*22, ..., Y*2k -> Y*2
Re-sample 3: Y*31, Y*32, ..., Y*3k -> Y*3
...
Re-sample N: Y*N1, Y*N2, ..., Y*Nk -> Y*N
Y : Y*
[Figure: two panels. Density phi.ybar plotted against ybar (about 70 to 120); distribution of y.star.bar (about 98.5 to 101.0)]
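The two halves of the theoretical picture can be sketched side by side. This is only an illustration: the true distribution N(100, 10), the sample size k = 50, and N = 1000 are assumptions chosen for the sketch, not values taken from the slides.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
k <- 50                           # assumed sample size
N <- 1000                         # assumed number of samples / re-samples

# "In the sky": N fresh samples from the (hypothetical) true distribution
true.means <- replicate(N, mean(rnorm(k, mean = 100, sd = 10)))

# In practice: one actual sample, re-sampled with replacement N times
actual <- rnorm(k, mean = 100, sd = 10)
boot.means <- replicate(N, mean(sample(actual, k, replace = TRUE)))

# Spread of the sample mean under each scheme
c(sd.true = sd(true.means), sd.boot = sd(boot.means))
```

Under these assumptions the two spreads should be of similar size, which is the sense in which re-sampling from the actual sample stands in for sampling from the true distribution.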
Summary
Motivating Example
raw data
statistic    value
#obs         500
mean         4995.79
sd           98.78
2.5%ile      4812.30
97.5%ile     5195.58
[Figure: n(5000,100) data]
Resampling
Sample with replacement 500 data points from the original dataset S
Call this S*1
Now do this 999 more times!
S*1, S*2, ..., S*1000
Compute X-bar on each of these 1000 samples.
R Code
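# Simulate 500 draws from a normal distribution with mean 5000 and sd 100
norm.data <- rnorm(500, mean = 5000, sd = 100)
# Resample with replacement R times, recording each resample's mean and sd
boots <- function(data, R) {
  b.avg <- numeric(R); b.sd <- numeric(R)
  for (b in seq_len(R)) {
    ystar <- sample(data, length(data), replace = TRUE)
    b.avg[b] <- mean(ystar)
    b.sd[b] <- sd(ystar)
  }
  list(avg = b.avg, sd = b.sd)
}
res <- boots(norm.data, 1000)
b.avg <- res$avg; b.sd <- res$sd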
Results
raw data
statistic    value
#obs         500
mean         4995.79
sd           98.78
2.5%ile      4705.08
97.5%ile     5259.27

X-bar
statistic    theory     bootstrap
#obs         1,000      1,000
mean         5000.00    4995.98
sd           4.47       4.43
2.5%ile      4991.23    4987.60
97.5%ile     5008.77    5004.82

[Figure: distribution of X-bar (about 4985 to 5010)]
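The "theory" figures for X-bar can be reproduced with a few lines of arithmetic, assuming they come from the usual normal sampling distribution of the mean with n = 500, mean 5000 and sd 100 (the parameters used to simulate the data). The variable names below are illustrative.

```r
n <- 500; mu <- 5000; sigma <- 100

# Standard error of the mean: sigma / sqrt(n)
se <- sigma / sqrt(n)

# 2.5th and 97.5th percentiles of the sampling distribution of X-bar
ci <- mu + qnorm(c(0.025, 0.975)) * se

round(se, 2)   # 4.47
round(ci, 2)   # 4991.23 5008.77
```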
Percentile method
Just take the desired percentiles of the bootstrap histogram.
More reliable in cases of asymmetric bootstrap histograms.
mean(norm.data) - 2 * sd(b.avg)
[1] 4986.926
mean(norm.data) + 2 * sd(b.avg)
[1] 5004.661
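As a rough sketch, the percentile interval can be computed next to the mean plus or minus 2 standard errors interval shown in the output. The seed and the use of replicate() are choices made for this sketch, so the numbers will not match the slides exactly.

```r
set.seed(2005)                                    # arbitrary seed
norm.data <- rnorm(500, mean = 5000, sd = 100)

# 1000 bootstrap means, each from a resample of the full data with replacement
b.avg <- replicate(1000, mean(sample(norm.data, replace = TRUE)))

# Interval from the bootstrap standard error: mean +/- 2 * sd(b.avg)
se.ci <- mean(norm.data) + c(-2, 2) * sd(b.avg)

# Percentile method: read the 2.5th and 97.5th percentiles off the bootstrap means
pct.ci <- quantile(b.avg, probs = c(0.025, 0.975))

se.ci
pct.ci
```

For symmetric data such as these, the two intervals should nearly coincide; they can differ noticeably when the bootstrap histogram is skewed.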
And a Bonus
[Figure: sample.sd plotted against sample.mean]
Severity Data
2700 size-of-loss data points.
[Figure: severity distribution (losses from 0 to about 50000)]
[Figure: sample.sd plotted against sample.mean]
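The same resampling loop carries over to skewed severity data. The sketch below uses simulated lognormal losses as a hypothetical stand-in for the 2700 data points; the lognormal parameters are illustrative only and are not fitted to the data in the slides.

```r
set.seed(42)                                   # arbitrary seed
# Hypothetical severity data: 2700 lognormal losses (illustrative parameters)
sev <- rlnorm(2700, meanlog = 7, sdlog = 1.4)

R <- 1000
b.avg <- numeric(R); b.sd <- numeric(R)
for (b in seq_len(R)) {
  ystar <- sample(sev, length(sev), replace = TRUE)
  b.avg[b] <- mean(ystar)                      # bootstrap sample mean
  b.sd[b] <- sd(ystar)                         # bootstrap sample sd
}

# Percentile interval for the mean severity
quantile(b.avg, probs = c(0.025, 0.975))

# With right-skewed losses, resamples with a larger mean tend to have a larger sd
cor(b.avg, b.sd)
```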
[Figure: plot with axis labelled age]
.28
s.d.() .028
[Figure: plot against veh showing loess line, regression line, and density]
[Figure: bootstrap total LR (about 0.7 to 1.0)]
.78
s.d.(LR):
.05
.13
[Figure: bootstrap total LR (about 0.6 to 1.4), with Normal Q-Q Plot]
Distribution of Capped LR
Closer to frequency
LRtot = .79
LRclean = .58
LRRclean = -27%
LRother = .84
LRRother = +6%
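The relativities are consistent with dividing each segment's loss ratio by the total loss ratio and subtracting 1. That definition of LRR is an assumption here, since the slides do not state it, and the variable names are illustrative.

```r
lr.tot <- 0.79; lr.clean <- 0.58; lr.other <- 0.84

# Assumed definition: loss ratio relativity = segment LR / total LR - 1
lrr.clean <- lr.clean / lr.tot - 1
lrr.other <- lr.other / lr.tot - 1

round(100 * c(clean = lrr.clean, other = lrr.other))   # -27 and 6 (percent)
```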
[Figure: distribution of LRR_other - LRR_clean]
[Figure: distribution of LRR_other / LRR_clean]
Bootstrapping Reserves
Same size as S
=8; =1.3
Li+j = Li * (link + )
Bootstrapping Reserves
[Figure: distribution of bootstrapped reserves (19000 to 25000)]
Mean: $21.751M
Median: $21.746M
s.d.: $0.982M / 4.5%
95% confidence interval: (19.8M, 23.7M)
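A quick arithmetic check, taking the $0.982M figure as the standard deviation of the bootstrapped reserves: the 4.5% is then the ratio of standard deviation to mean, and the reported interval is consistent with a normal approximation. The slides do not say whether the interval was obtained this way or by the percentile method.

```r
m <- 21.751   # mean reserve, $M
s <- 0.982    # taken to be the standard deviation, $M

round(100 * s / m, 1)                        # 4.5 (percent of the mean)
round(m + qnorm(c(0.025, 0.975)) * s, 1)     # 19.8 23.7
```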
References