expenses (ALAE). In this study only the 1991 records relating to Plan type 4 are used. The size of this sample is 3,901.
Univariate descriptive statistics for the three variables are
provided in Table 1.
          Mean     Q25      Q50      Q75      Min.     Max.        St. dev.
LOSS      63,450   31,360   42,310   66,130   25,000   1,430,000   74,665
ALAE      18,920    8,088   13,940   22,850       23     568,000   22,073
Total     82,370   43,040   58,560   88,500   25,320   1,530,000   85,523
Table 1.
1.3 Data description
Philippe Lambert3
$$g(LOSS, ALAE) = \begin{cases} 0 & \text{if } LOSS \le R, \\ LOSS - R + \dfrac{LOSS - R}{LOSS}\, ALAE & \text{if } LOSS > R. \end{cases} \qquad (1)$$
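The payoff in equation (1) can be sketched in a few lines of Python. This is only an illustration: the function name `reinsurer_payment` and the claim amounts in the usage line are hypothetical and do not come from the data set.

```python
def reinsurer_payment(loss: float, alae: float, retention: float) -> float:
    """Payoff g(LOSS, ALAE) of equation (1) for a retention R."""
    if loss <= retention:
        # Claims below the retention cost the reinsurer nothing.
        return 0.0
    excess = loss - retention
    # The reinsurer pays the excess loss plus a pro-rata share of the expenses.
    return excess + excess / loss * alae


# Hypothetical claim: LOSS = 200,000, ALAE = 40,000, R = 100,000.
print(reinsurer_payment(200_000.0, 40_000.0, 100_000.0))  # 120000.0
```

The expense share is proportional to the fraction of the loss that exceeds the retention, so it vanishes continuously as LOSS approaches R from above.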
Figure 1. Bivariate plots of LOSS and ALAE on original and log-scale. [Panels: ALAE against LOSS; Ln(ALAE) against Ln(LOSS).]
Copulas
2.1.1 Definition
2.1.2 Sklar's construction
2.1.3
where the $F_i^{-1}$, $i = 1, 2$, are the quantile functions associated with the $F_i$.
If $C$ is considered to be a distribution function of two random variables $U_1$ and $U_2$, the first condition ensures that $U_1$
and $U_2$ have uniform marginal distributions. The second condition, known as the rectangular inequality, simply requires
that $C$ is a valid distribution function, that is $P(u_{11} \le U_1 \le u_{12},\, u_{21} \le U_2 \le u_{22}) \ge 0$.
Another important property is that copulas capture the dependence between two variables, regardless of the scale in
which each variable is measured. More precisely, given two
arbitrary functions $g_1$ and $g_2$ that are strictly increasing over the ranges
of $X_1$ and $X_2$, the transformed variables $g_1(X_1)$ and
$g_2(X_2)$ have the same copula as $X_1$ and $X_2$.
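A small numerical illustration of this invariance: a rank-based dependence measure such as the sample version of Kendall's tau does not change when both variables are log-transformed. The five data points below are made up for the example, and `kendall_tau` is a hypothetical helper written without tie correction.

```python
import math
from itertools import combinations


def kendall_tau(xs, ys):
    """Sample Kendall's tau: (concordant - discordant) / number of pairs."""
    score = 0
    pairs = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        prod = (x1 - x2) * (y1 - y2)
        score += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
        pairs += 1
    return score / pairs


# Made-up observations, for illustration only.
loss = [30_000.0, 45_000.0, 120_000.0, 26_000.0, 75_000.0]
alae = [9_000.0, 7_500.0, 30_000.0, 1_200.0, 22_000.0]

tau_raw = kendall_tau(loss, alae)
tau_log = kendall_tau([math.log(x) for x in loss], [math.log(y) for y in alae])
print(tau_raw, tau_log)  # identical: the logarithm preserves all orderings
```

Only the ordering of the observations enters the computation, which is why any strictly increasing transformation of either margin leaves the result unchanged.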
Probably the most frequently used copulas are the
Archimedean ones. Copulas in this family can be written in
the following way
$$F(x_1, x_2) = P(X_1 \le x_1, X_2 \le x_2).$$
Since $F_1(X_1)$ and $F_2(X_2)$ are uniformly distributed between
0 and 1, their joint distribution for every $(u_1, u_2) \in [0, 1]^2$ can
be expressed as
$$C(u_1, u_2) = F(F_1^{-1}(u_1), F_2^{-1}(u_2)). \qquad (3)$$
The traditional measure of dependence, the Pearson correlation coefficient, presents some drawbacks as a measure for
bivariate distributions (see Joe (1997), page 32, as well as Denuit and Dhaene in this volume). Measures with better properties include Kendall's $\tau$ and Spearman's $\rho$. These can be
expressed solely in terms of the copula. Specifically, the
following representations hold true:
$$\rho = 12 \int_0^1 \int_0^1 \{C(u, v) - uv\}\, du\, dv,$$
$$\tau = 4 \int_0^1 \int_0^1 C(u, v)\, dC(u, v) - 1.$$
2.2
2.2.2
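As a sanity check on the copula representation of Spearman's rho, the double integral of C(u, v) - uv can be approximated on a grid. The sketch below uses a plain midpoint rule (an arbitrary choice made for this illustration) and evaluates two reference cases: the independence copula C(u, v) = uv and the comonotonic copula C(u, v) = min(u, v). The function name `spearman_rho` and the grid size are assumptions.

```python
def spearman_rho(copula, m=200):
    """Approximate 12 * integral of {C(u, v) - u v} over the unit square."""
    h = 1.0 / m
    total = 0.0
    for i in range(m):
        u = (i + 0.5) * h  # midpoint of the i-th cell in u
        for j in range(m):
            v = (j + 0.5) * h  # midpoint of the j-th cell in v
            total += copula(u, v) - u * v
    return 12.0 * total * h * h


rho_indep = spearman_rho(lambda u, v: u * v)  # independence: exactly 0
rho_comon = spearman_rho(min)                 # comonotonicity: close to 1
print(rho_indep, rho_comon)
```

The two cases recover the extreme values expected of a dependence measure: no dependence gives 0 and perfect positive dependence gives 1, up to discretisation error.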
2.2.3
Some examples
Gumbel copula
Definition
$$C(u_1, u_2) = \exp\left\{ [\ln(u_1) + \ln(u_2)]\, A\!\left( \frac{\ln(u_2)}{\ln(u_1) + \ln(u_2)} \right) \right\}$$
Logistic model
$$A(w) = \left[ w^r + (1 - w)^r \right]^{1/r}$$
The asymmetric logistic model is very flexible and contains several of the existing models such as the logistic ($\theta = \phi = 1$), the biextremal,
the dual of the biextremal, the Gumbel, as well as a mixture
of logistic and independence models. Complete dependence
corresponds to $\theta = \phi = 1$ and $r = \infty$, whereas independence
corresponds to $\theta = 0$ or $\phi = 0$ or $r = 1$.
Mixed model
$$C(u_1, u_2) = u_1 u_2 \exp\left\{ -\theta\, \frac{\ln u_1 \ln u_2}{\ln(u_1 u_2)} \right\}$$
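A minimal sketch of the mixed model copula, assuming the parameter is written $\theta$ and that the corresponding dependence function has the usual mixed form $A(w) = 1 - \theta w + \theta w^2$ (this form is an assumption of the sketch, not taken from the text above). The code checks that the closed-form copula agrees with the general extreme value representation $\exp\{\ln(u_1 u_2)\, A(\cdot)\}$. All function names are hypothetical.

```python
import math


def mixed_copula(u1, u2, theta):
    """Mixed model copula for 0 < u1, u2 < 1 and 0 <= theta <= 1."""
    l1, l2 = math.log(u1), math.log(u2)
    return u1 * u2 * math.exp(-theta * l1 * l2 / (l1 + l2))


def ev_copula(u1, u2, dep_fun):
    """Extreme value copula built from a dependence function A."""
    l1, l2 = math.log(u1), math.log(u2)
    return math.exp((l1 + l2) * dep_fun(l2 / (l1 + l2)))


theta = 0.738  # illustrative value for the dependence parameter


def a_mixed(w):
    """Assumed dependence function of the mixed model."""
    return 1.0 - theta * w + theta * w * w


direct = mixed_copula(0.3, 0.7, theta)
via_a = ev_copula(0.3, 0.7, a_mixed)
print(direct, via_a)  # the two constructions agree
```

Setting `theta = 0` gives $A \equiv 1$ and the independence copula $u_1 u_2$; positive values of `theta` push the copula above $u_1 u_2$, reflecting positive dependence.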
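with
$$\lambda(v) = -\int_v^1 \left. \frac{\partial C(x_1, x_2)}{\partial x_1} \right|_{x_2 = x_2^{x_1 v}} dx_1$$
and $x_2^{x_1 v}$ such that $C(x_1, x_2^{x_1 v}) = v$.
3.2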
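Kendall's tau
$$\lambda(v) = (1 - \tau_A)\, v \log(v) \qquad (5)$$
1. Logistic model: $\tau_A = 1 - 1/r$.
2. Mixed model: $\tau_A = \dfrac{8 \arctan \sqrt{\theta/(4 - \theta)}}{\sqrt{\theta (4 - \theta)}} - 2$.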
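The two closed forms for Kendall's tau are easy to evaluate numerically. In the sketch below the mixed-model parameter is called `theta` (an assumed name), and the inputs 1.406 and 0.738 are read as the fitted values of r and of the mixed-model parameter reported in Table 2; that reading of the table is an assumption of this example.

```python
import math


def tau_logistic(r):
    """Kendall's tau under the logistic model."""
    return 1.0 - 1.0 / r


def tau_mixed(theta):
    """Kendall's tau under the mixed model, for 0 < theta <= 1."""
    root = math.sqrt(theta * (4.0 - theta))
    return 8.0 * math.atan(math.sqrt(theta / (4.0 - theta))) / root - 2.0


print(round(tau_logistic(1.406), 3))  # 0.289
print(round(tau_mixed(0.738), 3))     # 0.289
```

Both models imply essentially the same degree of concordance at these parameter values, even though their dependence functions differ in shape.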
$$\frac{\partial^2}{\partial u_1\, \partial u_2}\, C(u_1, u_2; p).$$
Parameter estimation
Assuming known margins, only $C$ depends on the parameters to be estimated and the function
$$L(p) = \sum_{i=1}^{n} \ldots \qquad (6)$$
Genest and Rivest (2000) suggest a general formula for computing the distribution function K of the random variable
V = F (X1 , X2 ) = C(F1 (X1 ), F2 (X2 )).
3.4 Nonparametric estimation of C
In contrast to the univariate case, it is not true that $V$ is uniformly distributed over [0, 1]. In fact, any distribution function
$K$ for $F(X_1, X_2)$ has to fulfill the inequalities
$$v \le K(v) \le 1$$
for $0 \le v \le 1$. Moreover, $K(v) = v$ and $K(v) = 1$ correspond to the two extreme cases where the random variables
$X_1$ and $X_2$ are in perfect positive dependence (comonotonicity) and in perfect negative dependence (countermonotonicity), respectively.
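The bounds can be checked numerically for the extreme value family. The sketch assumes the known result that, for an extreme value copula with Kendall's tau equal to $\tau$, $K(v) = v - (1 - \tau)\, v \log(v)$; this is how form (5) is read here, and the function name and the value 0.289 are illustrative.

```python
import math


def k_extreme_value(v, tau):
    """K(v) = v - (1 - tau) * v * log(v), for 0 < v <= 1."""
    return v - (1.0 - tau) * v * math.log(v)


tau = 0.289  # illustrative value of Kendall's tau
grid = [i / 100.0 for i in range(1, 101)]
values = [k_extreme_value(v, tau) for v in grid]

# Every value lies between the comonotonic bound v and the upper bound 1.
print(all(v <= k <= 1.0 + 1e-12 for v, k in zip(grid, values)))  # True
```

With `tau = 1` the function reduces to $K(v) = v$, the comonotonic case, and with `tau = 0` it gives $v - v \log(v)$, which is the distribution of $V$ under independence.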
$$\hat{C}(u_1, u_2) = \hat{F}(\hat{F}_1^{-1}(u_1), \hat{F}_2^{-1}(u_2))$$
where $\hat{F}$, $\hat{F}_1$ and $\hat{F}_2$ represent estimators of the bivariate distribution function $F$ and of the corresponding marginal distributions $F_1$ and $F_2$. In this work, the empirical distribution
function will be used to estimate each of the marginal functions $F_1$ and $F_2$, but kernel or parametric estimators (if proper
distributions are found to fit the margins) could also be used.
Note that a kernel estimator, based on the pseudo-observations
$$\hat{c}_i = \hat{C}(\hat{F}_1(x_{1i}), \hat{F}_2(x_{2i})),$$
can be used to estimate the function $K$.
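The pseudo-observations are straightforward to compute when empirical distribution functions are used throughout: in that case each pseudo-observation is the proportion of sample points lying below and to the left of observation i. The sketch below then estimates K with a plain empirical distribution function, which is a simpler stand-in for the kernel estimator mentioned in the text. Function names and data are hypothetical.

```python
def pseudo_observations(x1, x2):
    """Empirical joint distribution function evaluated at each observation."""
    n = len(x1)
    return [
        sum(1 for a, b in zip(x1, x2) if a <= x1[i] and b <= x2[i]) / n
        for i in range(n)
    ]


def empirical_k(c, v):
    """Empirical distribution function of the pseudo-observations at v."""
    return sum(1 for ci in c if ci <= v) / len(c)


# Made-up, perfectly comonotonic sample.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [10.0, 20.0, 30.0, 40.0]
c = pseudo_observations(x1, x2)
print(c)                    # [0.25, 0.5, 0.75, 1.0]
print(empirical_k(c, 0.5))  # 0.5
```

For this comonotonic sample the estimate of K follows the diagonal K(v) = v, in line with the lower bound discussed for K.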
[Figure 2 legend: Mixed, Logistic, Asym. logistic; vertical axis: estimated A(w).]
4 LOSS-ALAE analysis
4.1 Estimation results
Table 2 summarizes the estimates for the three parametric models discussed in Section 2.3. It gives not only the value of the objective function (6) but also an AIC goodness-of-fit
measure,
$$AIC = -2L + 2 n_p / n$$
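The AIC measure can be reproduced from the objective function values. The sketch assumes the formula reads AIC = -2L + 2 n_p / n, with L the value of the objective function (6), n_p the number of copula parameters and n the sample size; the minus sign and the meaning of n_p and n are assumptions of this example, as are the function and argument names.

```python
def aic(objective_value, n_params, n_obs):
    """AIC = -2 L + 2 n_p / n, under the reading described above."""
    return -2.0 * objective_value + 2.0 * n_params / n_obs


# Objective values as reported for the logistic (1 parameter) and
# asymmetric logistic (3 parameters) models, with n = 3,901.
print(round(aic(-7321.3, 1, 3901)))  # 14643
print(round(aic(-7315.7, 3, 3901)))  # 14631
```

With n = 3,901 the penalty term 2 n_p / n is negligible, so under this reading the AIC ranking is driven almost entirely by the objective function value.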
Figure 2.

Model        Parameters                                             $\tau_A$   (6)        AIC
Logistic     $r = 1.406$                                            0.289      -7,321.3   14,643
Asym. Log.   $\theta = 0.825$, $\phi = 0.983$, $r = 1.486$          0.288      -7,315.7   14,631
Mixed        $\theta = 0.738$                                       0.289      -7,327.3   14,654
Table 2.

4.2 Goodness-of-fit tests
In addition to the graphical tools, some more objective measures and tests are desirable to complete the goodness-of-fit analysis. First, we performed a general test proposed by
Ghoudi et al. (1998) to verify whether the copula belongs
to the EV family without specifying a particular dependence
function. It is based on the distribution function of the copula,
$K$; more precisely, the hypothesis $H_0: K(v) = K_\tau(v)$ with
$K_\tau$ of the form (5) for some $\tau \in (0, 1)$ is checked using the
test statistic
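$$S_n = \frac{8}{n(n-1)} \sum_{i \ne j} I_{ij} - \frac{9}{n(n-1)(n-2)} \sum_{i \ne j \ne k} I_{ij} I_{ik} - 1,$$
where
$$I_{ij} = I[(X_{1i}, X_{2i}), (X_{1j}, X_{2j})]$$
and
$$I[(x_1, y_1), (x_2, y_2)] = \begin{cases} 1 & \text{if } x_1 \ge x_2 \text{ and } y_1 \ge y_2, \\ 0 & \text{otherwise.} \end{cases}$$
$$\frac{n-1}{n} \sum_{i=1}^{n} \left( S_n^{i} - S_n \right)^2.$$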
Figure 3.
Figure 4.
[Surface plots over (F(x1), F(x2)). Panel titles: Independent model, Logistic model, Mixed model. Vertical axes: Cnp-Cind, Cnp-Cp.]
Test
K-S test statistic    0.0121   0.0118   0.0118
p-value               0.61     0.64     0.64
Table 3. [One column per model; of the column headings only Mixed (and possibly Logistic) survives, so the column order is not recoverable.]

Figure 5. Estimation of K. [Panels: Independent model, Mixed model, Logistic model, and a panel for the Asym. Logistic estimation. Each panel plots the K(ci) estimation against ci, comparing the nonparametric estimation with the model-based estimation.]

5 Reinsurance premiums
$$= \frac{1}{n} \sum_{t=1}^{n} g\left( LOSS_t, F_{ALAE}^{-1}(\ldots) \right)$$
$$= \frac{1}{n^2} \sum_{t=1}^{n} \sum_{t'=1}^{n} g(LOSS_t, ALAE_{t'}).$$
[Plot panel: Independent model. Vertical axis: Independent C quantiles.]
b.- or we estimate the premium under one of the copula models presented in Section 2.3. Under these models, the pure premium can be estimated by
$$\pi = E[g(LOSS, ALAE)] = \int\!\!\int g(l, a)\, f_{LOSS,ALAE}(l, a)\, dl\, da$$
[Plot panel: Mixed model. Axes: Kernel C quantiles, Mixed C quantiles.]
where
$$f_{LOSS,ALAE}(l, a) = f_{LOSS}(l)\, f_{ALAE}(a)\, \frac{\partial^2 C}{\partial u_1\, \partial u_2}[F_{LOSS}(l), F_{ALAE}(a)].$$
Since no parametric models have been posed for the margins, the marginal densities are estimated using kernel
methods. Once an estimator of the bivariate density function is available, the double integral is calculated using
numerical procedures. Since the integrand is a rather complicated function, powerful software able to perform analytical
calculations is necessary; in this case MathCad 2000 has
been used. Although the procedure is not complicated, it is
rather time consuming, mainly due to the evaluations of the
kernel estimators of the density and distribution functions.
The code is available from Ana Cebrian upon request.
[Plot panel: Logistic model. Axes: Kernel C quantiles, Logistic C quantiles.]
The premium values for R = 25,000, 50,000, 100,000, 500,000 and
1,000,000 estimated under the different models are shown in
Table 4. We see that substantial mispricing could result from
the independence hypothesis, while the comonotonic approximation is too conservative. Thus, as expected, independence
generates lower premiums than those taking into account the dependence in the data, which themselves are smaller than those
based on the comonotonic assumption. Two estimators considering dependence are shown: the one based on the empirical
dependence of the data and the one based on the extreme value
copula with the mixed model, which provided the best fit to the
bivariate sample. Comparing both estimators, we observe that
they are rather similar for small R values, but differ substantially for higher values of R. It must be remembered that
the main drawback of nonparametric estimation of the premium is
that for high R values the sample size is not large enough to
get reliable estimates.
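The three sample-based estimators compared here can be sketched as follows: the empirical premium averages g over the observed pairs, the independence premium averages g over all n^2 combinations of an observed LOSS with an observed ALAE, and the comonotonic premium pairs the t-th smallest LOSS with the t-th smallest ALAE. The comonotonic pairing through order statistics, the function names and the toy data are assumptions made for this illustration; the toy numbers are not taken from the SOA data.

```python
def g(loss, alae, retention):
    """Reinsurer's payment for one claim."""
    if loss <= retention:
        return 0.0
    excess = loss - retention
    return excess + excess / loss * alae


def premium_empirical(losses, alaes, retention):
    """Average of g over the observed (LOSS, ALAE) pairs."""
    n = len(losses)
    return sum(g(l, a, retention) for l, a in zip(losses, alaes)) / n


def premium_independence(losses, alaes, retention):
    """Average of g over all n^2 combinations of LOSS and ALAE."""
    n = len(losses)
    return sum(g(l, a, retention) for l in losses for a in alaes) / n**2


def premium_comonotonic(losses, alaes, retention):
    """Average of g after pairing LOSS and ALAE by rank."""
    n = len(losses)
    pairs = zip(sorted(losses), sorted(alaes))
    return sum(g(l, a, retention) for l, a in pairs) / n


# Toy data, chosen only to make the arithmetic easy to follow.
losses = [100.0, 200.0, 300.0, 400.0]
alaes = [40.0, 10.0, 30.0, 20.0]
print(premium_empirical(losses, alaes, 150.0))     # 120.0
print(premium_independence(losses, alaes, 150.0))  # 121.09375
print(premium_comonotonic(losses, alaes, 150.0))   # 123.75
```

Because g is increasing in both arguments, pairing by rank yields the largest of the three premiums; the toy sample happens to be negatively associated, which is why its empirical premium falls below the independence value, unlike the positively dependent LOSS-ALAE data.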
[Plot panel: A. Logistic model. Axes: Kernel C quantiles, A. Logistic C quantiles.]
6 Conclusions
To conclude, the results show the importance
of considering the dependence structure of the two variables: substantial mispricing can result from the independence assumption, while the comonotonic approximation seriously overestimates the premium.
Figure 6.
$$\frac{1}{n} \sum_{t=1}^{n} g(LOSS_t, ALAE_t);$$
R                   25,000     50,000     100,000    500,000   1,000,000
Independence        46,365.7   25,642.3   12,785.5   1,247.1   230.3
empirical           48,350.6   27,639.1   13,842.3   1,336.6   244.7
(# obs. LOSS > R)   (3901)     (1516)     (470)      (17)      (3)
Mixed model         48,357.5   27,804.6   14,188.3   1,431.6   269.3
Comonotonicity      50,318.8   29,760.7   15,374.9   1,570.6   302.1
Table 4.
ACKNOWLEDGEMENTS
Ana Cebrian thanks the Universite catholique de Louvain,
Louvain-la-Neuve, Belgium, for financial support through a
FSR research grant.
REFERENCES
[1] Caperaa, P., Fougeres, A.-L. and Genest, C. (1997). A nonparametric estimation procedure for bivariate extreme value copulas. Biometrika 84, 3,
567-577.
[2] Cebrian, A., Denuit, M., & Lambert, Ph. (2003). Generalized Pareto
fit to the society of Actuaries large claims database. North American
Actuarial Journal 7, 18-36.
[3] Denuit, M., Dhaene, J., and Ribas, C. (2001). Does positive dependence between individual risks increase stop-loss premiums? Insurance: Mathematics and Economics 28, 305-308.
[4] Frees, E.W. and Valdez, E. A. (1998). Understanding relationships using copulas. North American Actuarial Journal 2, 1-15.
[5] Genest, C. and Rivest, L.P. (2000). On the Multivariate Probability Integral Transformation. Statistics and Probability Letters 53, 391-399.
[6] Ghoudi, K., Khoudraji, A. and Rivest L.P. (1998). Proprietes Statistiques des Copules de Valeurs Extremes Bidimensionnelles. The Canadian Journal of Statistics 26 No 1, 187-197.
[7] Grazier, K.L., and G'Sell Associates (1997). Group Medical Insurance
Large Claims Database and Collection. SOA Monograph M-HB97-1.
[8] Joe, H. (1997). Multivariate Models and Dependence Concepts. Chapman & Hall, London
[9] Klugman, S. A. and Parsa, R. (1999). Fitting Bivariate Loss Distributions with Copulas. Insurance: Mathematics and Economics 24, 139-148.
[10] Nelsen, R.B. (1999). An Introduction to Copulas. Lecture Notes in
Statistics 139, Springer Verlag, New York.
[11] Tawn, J. (1988). Bivariate Extreme Value Theory: Models and Estimation. Biometrika 75, 397-415.
[12] Scaillet, O. (2000). Nonparametric Estimation of Copulas for Time Series. Downloadable on http://www.hec.unige.ch.