
STAT3010/6075 Statistical Methods in Insurance

Section 6: Generalised Linear Models


Outline
1. Two results from statistical theory
2. Exponential family
3. Normal regression model
4. Poisson regression model
5. Binomial regression model
6. Generalised linear models
7. Explanatory variables
8. Linear predictor
9. Scaled deviance
10. Residuals
Two results from statistical theory
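If
$$l(\theta, \phi; y) = \log f_Y(y; \theta, \phi)$$
is the log-likelihood function then, under some mild regularity conditions,
$$E_Y\left[\frac{\partial l}{\partial \theta}\right] = 0 \tag{1}$$
and
$$E_Y\left[\frac{\partial^2 l}{\partial \theta^2}\right] = -E_Y\left[\left(\frac{\partial l}{\partial \theta}\right)^2\right] \tag{2}$$
Proof
$$\frac{\partial l}{\partial \theta} = \frac{\partial}{\partial \theta} \log f_Y = \frac{1}{f_Y}\frac{\partial f_Y}{\partial \theta}$$
Hence
$$E_Y\left[\frac{\partial l}{\partial \theta}\right] = \int \frac{\partial l}{\partial \theta} f_Y \, dy = \int \frac{\partial f_Y}{\partial \theta} \, dy = \frac{\partial}{\partial \theta} \int f_Y \, dy = 0$$
Similarly
$$\frac{\partial^2 l}{\partial \theta^2} = \frac{\partial}{\partial \theta}\left(\frac{1}{f_Y}\frac{\partial f_Y}{\partial \theta}\right) = \frac{1}{f_Y}\frac{\partial^2 f_Y}{\partial \theta^2} - \frac{1}{f_Y^2}\left(\frac{\partial f_Y}{\partial \theta}\right)^2$$
Hence
$$E_Y\left[\frac{\partial^2 l}{\partial \theta^2}\right] = \int \frac{\partial^2 f_Y}{\partial \theta^2} \, dy - E_Y\left[\left(\frac{1}{f_Y}\frac{\partial f_Y}{\partial \theta}\right)^2\right] = 0 - E_Y\left[\left(\frac{\partial l}{\partial \theta}\right)^2\right]$$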
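Example: $Y \sim N(\mu, \sigma^2)$
$$l(\mu, \sigma^2; y) = -\frac{1}{2}\log\left(2\pi\sigma^2\right) - \frac{(y - \mu)^2}{2\sigma^2}$$
$$E_Y\left[\frac{\partial l}{\partial \mu}\right] = E_Y\left[\frac{Y - \mu}{\sigma^2}\right] = 0$$
$$E_Y\left[\frac{\partial^2 l}{\partial \mu^2}\right] = -\frac{1}{\sigma^2}$$
$$-E_Y\left[\left(\frac{\partial l}{\partial \mu}\right)^2\right] = -E_Y\left[\frac{(Y - \mu)^2}{\sigma^4}\right] = -\frac{1}{\sigma^2}$$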
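Results (1) and (2) can also be checked numerically. The sketch below does this for a Poisson distribution with mean $\mu$ (a distribution used later in this section), differentiating with respect to $\mu$ and summing over the probability function. The function names, the value `mu = 3.5` and the truncation point `y_max` are illustrative assumptions, not part of the original notes.

```python
import math

def poisson_pmf(y, mu):
    # Poisson probability function, computed on the log scale for stability
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

def score(y, mu):
    # dl/dmu for l = y*log(mu) - mu - log(y!)
    return y / mu - 1.0

def second_derivative(y, mu):
    # d^2 l / dmu^2
    return -y / mu ** 2

def expectation(g, mu, y_max=200):
    # E[g(Y, mu)], truncating the infinite sum at y_max (assumed large enough)
    return sum(g(y, mu) * poisson_pmf(y, mu) for y in range(y_max + 1))

mu = 3.5  # illustrative value
mean_score = expectation(score, mu)
mean_sq_score = expectation(lambda y, m: score(y, m) ** 2, mu)
mean_second = expectation(second_derivative, mu)

print("E[dl/dmu]       =", mean_score)      # Result (1): close to 0
print("E[d2l/dmu2]     =", mean_second)     # Result (2): these two lines
print("-E[(dl/dmu)^2]  =", -mean_sq_score)  # should agree
```

Both sides of Result (2) should agree up to floating-point error, since the truncated tail is negligible for this value of `mu`.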
Exponential family
A random variable Y has a distribution belonging to the exponential family if its p.d.f. can be written as
$$f_Y(y; \theta, \phi) = \exp\left\{\frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi)\right\}$$
Here
• $a(\phi)$, $b(\theta)$ and $c(y, \phi)$ are specified functions
• $\theta$ is the natural or canonical parameter
• $\phi$ is the scale or dispersion parameter
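Example: $Y \sim N(\mu, \sigma^2)$
$$\begin{aligned}
f_Y(y; \theta, \phi) &= \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(y - \mu)^2}{2\sigma^2}\right\} \\
&= \exp\left\{\frac{-y^2 + 2y\mu - \mu^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2)\right\} \\
&= \exp\left\{\frac{y\mu - \mu^2/2}{\sigma^2} - \frac{1}{2}\left[\frac{y^2}{\sigma^2} + \log(2\pi\sigma^2)\right]\right\} \\
&= \exp\left\{\frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi)\right\}
\end{aligned}$$
where
$\theta = \mu$, the canonical parameter
$a(\phi) = \phi = \sigma^2$, the dispersion parameter
$b(\theta) = \dfrac{\mu^2}{2} = \dfrac{\theta^2}{2}$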
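Example: $Y \sim \text{Po}(\mu)$
$$f_Y(y; \theta, \phi) = \frac{\mu^y e^{-\mu}}{y!} = \exp(y \log\mu - \mu - \log y!)$$
Hence
$\theta = \log\mu$
$a(\phi) = \phi = 1$
$b(\theta) = \mu = e^{\theta}$
$c(y, \phi) = -\log y!$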
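Example: $Z \sim \text{Binomial}(m, \mu)$
Consider $Y = Z/m$
As
$$f_Z(z; \theta, \phi) = \binom{m}{z} \mu^z (1 - \mu)^{m - z}$$
we have
$$\begin{aligned}
f_Y(y; \theta, \phi) &= \binom{m}{my} \mu^{my} (1 - \mu)^{m - my} \\
&= \exp\left\{my \log\mu + (m - my)\log(1 - \mu) + \log\binom{m}{my}\right\} \\
&= \exp\left\{m\{y \log\mu + (1 - y)\log(1 - \mu)\} + \log\binom{m}{my}\right\} \\
&= \exp\left\{m\left[y \log\left(\frac{\mu}{1 - \mu}\right) + \log(1 - \mu)\right] + \log\binom{m}{my}\right\}
\end{aligned}$$
Hence
$\theta = \log\left(\dfrac{\mu}{1 - \mu}\right)$. Note that $\mu = \dfrac{e^{\theta}}{1 + e^{\theta}}$
$\phi = m$ and $a(\phi) = \dfrac{1}{\phi} = \dfrac{1}{m}$
$b(\theta) = -\log(1 - \mu) = \log(1 + e^{\theta})$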
Mean
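If Y has a distribution belonging to the exponential family then the log-likelihood function for one observation is
$$l(\theta, \phi; y) = \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi)$$
To find $E(Y)$ note that by Result (1)
$$0 = E_Y\left[\frac{\partial l}{\partial \theta}\right] = E_Y\left[\frac{Y - b'(\theta)}{a(\phi)}\right]$$
Hence
$$E_Y(Y) = b'(\theta)$$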
Variance
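$$\begin{aligned}
\text{Var}(Y) &= E_Y\left[\{Y - E_Y(Y)\}^2\right] \\
&= E_Y\left[\{Y - b'(\theta)\}^2\right] \\
&= E_Y\left[\left(a(\phi)\frac{\partial l}{\partial \theta}\right)^2\right] && \text{since } \frac{\partial l}{\partial \theta} = \frac{y - b'(\theta)}{a(\phi)} \\
&= a^2(\phi)\, E_Y\left[\left(\frac{\partial l}{\partial \theta}\right)^2\right] \\
&= -a^2(\phi)\, E_Y\left[\frac{\partial^2 l}{\partial \theta^2}\right] && \text{by Result (2)} \\
&= a(\phi)\, b''(\theta) && \text{since } \frac{\partial^2 l}{\partial \theta^2} = -\frac{b''(\theta)}{a(\phi)}
\end{aligned}$$
To emphasise the dependence on the mean $\mu = E_Y(Y)$ we write
$$\text{Var}(Y) = a(\phi)V(\mu)$$
where the variance function $V(\mu) = b''(\theta)$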
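Examples (continued)
Normal: $b(\theta) = \theta^2/2$
$E(Y) = b'(\theta) = \theta = \mu$
$V(\mu) = b''(\theta) = 1$
$\text{Var}(Y) = a(\phi)V(\mu) = \sigma^2 \times 1 = \sigma^2$
Poisson: $b(\theta) = e^{\theta}$
$E(Y) = b'(\theta) = e^{\theta} = \mu$
$V(\mu) = b''(\theta) = e^{\theta} = \mu$
$\text{Var}(Y) = a(\phi)V(\mu) = 1 \times e^{\theta} = \mu$
Binomial: $b(\theta) = \log(1 + e^{\theta})$
$E(Y) = b'(\theta) = e^{\theta}/(1 + e^{\theta}) = \mu$
$V(\mu) = b''(\theta) = e^{\theta}/(1 + e^{\theta})^2 = \mu(1 - \mu)$
$\text{Var}(Y) = a(\phi)V(\mu) = \mu(1 - \mu)/m$
where $Y = Z/m$ and $Z \sim \text{Binomial}(m, \mu)$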
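The relations $E(Y) = b'(\theta)$ and $V(\mu) = b''(\theta)$ can be checked numerically by differentiating $b(\theta)$ with finite differences. This is only a sketch: the helper names `d1` and `d2`, the step size `h` and the evaluation point `theta = 0.3` are illustrative assumptions.

```python
import math

def d1(b, theta, h=1e-4):
    # Central finite-difference approximation to b'(theta)
    return (b(theta + h) - b(theta - h)) / (2 * h)

def d2(b, theta, h=1e-4):
    # Central finite-difference approximation to b''(theta)
    return (b(theta + h) - 2 * b(theta) + b(theta - h)) / h ** 2

# For each family: b(theta), the mean as a function of theta, and V(mu)
families = {
    "normal": (lambda t: t ** 2 / 2, lambda t: t, lambda mu: 1.0),
    "poisson": (math.exp, math.exp, lambda mu: mu),
    "binomial": (
        lambda t: math.log(1 + math.exp(t)),
        lambda t: math.exp(t) / (1 + math.exp(t)),
        lambda mu: mu * (1 - mu),
    ),
}

theta = 0.3  # illustrative value of the canonical parameter
results = {}
for name, (b, mean, V) in families.items():
    mu = mean(theta)
    # (numerical b', closed-form mean, numerical b'', closed-form V(mu))
    results[name] = (d1(b, theta), mu, d2(b, theta), V(mu))
    print(name, results[name])
```

In each row the first two entries should agree, as should the last two, up to the finite-difference error.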
Normal regression model
Suppose we wish to relate a response variable Y to a set of explanatory variables $x_1, \ldots, x_p$
If we can assume that Y is normally distributed then we usually consider the linear regression model:
$$Y \sim N(\mu, \sigma^2), \qquad -\infty < \mu < \infty$$
where
$$\mu = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$$
Ingredients
1. A distribution for the response variable Y
   • normal
2. A linear predictor
   • $\eta = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$
3. A link function $g(\mu)$ relating the linear predictor to $\mu = E(Y)$
   • the identity link $g(\mu) = \mu$, so $\mu = \eta$
Poisson regression model
Suppose we wish to relate a response variable Y, which we assume follows a Poisson distribution, to a set of explanatory variables $x_1, \ldots, x_p$
For example, the number of claims in a year may depend on age and sex
Here we can use a Poisson regression model:
$$Y \sim \text{Po}(\mu), \qquad \mu > 0$$
where
$$g(\mu) = \log(\mu) = \eta = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$$
Note that:
• The distribution is Poisson
• The link function is the log function
• $\log\mu$ is the natural or canonical parameter and hence the log link is called the natural or canonical link function for the Poisson regression model
• This model is also called a log-linear model
• $-\infty < \eta = \log(\mu) < \infty \iff e^{\eta} = \mu > 0$
Binomial regression model
Suppose we wish to relate a response variable Y, which we assume follows a binomial or Bernoulli distribution, to a set of explanatory variables $x_1, \ldots, x_p$
For example, the probability that a policyholder dies in a year may depend on age and sex
Here we can use a binomial or binary ($m = 1$) regression model:
$$Y \sim \text{Binomial}(m, \mu), \qquad 0 < \mu < 1$$
where
$$g(\mu) = \log\left(\frac{\mu}{1 - \mu}\right) = \eta = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$$
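Note that:
• The distribution is binomial
• The canonical link function is the logit function: $\log\left(\dfrac{\mu}{1 - \mu}\right) = \text{logit}(\mu)$
• This model is also called a logistic regression model or a logit model
• $-\infty < \eta = \log\left(\dfrac{\mu}{1 - \mu}\right) < \infty \iff 0 < \mu = \dfrac{e^{\eta}}{1 + e^{\eta}} < 1$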
Generalised linear models
The three models above are generalised linear models with canonical link functions
Sometimes it is more appropriate to use a non-canonical link function, e.g., the identity link in the Poisson regression model or the log link in the normal regression model
For the binomial regression model there are two other links that are sometimes used:
• probit: $g(\mu) = \Phi^{-1}(\mu)$, the inverse of the c.d.f. of a standard normal r.v.
• complementary log-log: $g(\mu) = \log\{-\log(1 - \mu)\}$
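The three binomial links can be written down directly, each paired with its inverse $\mu = g^{-1}(\eta)$. The sketch below uses only the Python standard library; the dictionary name `LINKS` and the test values of $\mu$ are illustrative assumptions.

```python
import math
from statistics import NormalDist

_std_normal = NormalDist()  # standard normal, mean 0 and standard deviation 1

# Each entry maps a link name to (g, g_inverse)
LINKS = {
    "logit": (
        lambda mu: math.log(mu / (1 - mu)),
        lambda eta: 1 / (1 + math.exp(-eta)),
    ),
    "probit": (_std_normal.inv_cdf, _std_normal.cdf),
    "cloglog": (
        lambda mu: math.log(-math.log(1 - mu)),
        lambda eta: 1 - math.exp(-math.exp(eta)),
    ),
}

for name, (g, g_inv) in LINKS.items():
    for mu in (0.1, 0.5, 0.9):
        eta = g(mu)
        # g maps (0, 1) onto the real line; g_inv maps it back to mu
        print(f"{name:8s} mu={mu:.1f} eta={eta:+.4f} back={g_inv(eta):.4f}")
```

The logit and probit links are symmetric about $\mu = 0.5$ (where $\eta = 0$), whereas the complementary log-log link is not.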
Once we have selected the distribution and the link function the focus is on the form of the linear predictor:
$$\eta = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$$
For a given set of explanatory variables, $x_1, \ldots, x_p$, we need to
• estimate the parameters $\beta_0, \beta_1, \ldots, \beta_p$
• decide which explanatory variables are required in the linear predictor and which are not
• assess whether or not the variables in the linear predictor adequately explain the variation in the response variable
• interpret the estimated parameters to ascertain the relationship between the explanatory variables and the response variable
Explanatory variables
These can be
• continuous
  e.g., age (in years) or experience (in years)
• discrete or ordinal (categorical)
  e.g., risk group (high, medium, low) or insurance group (1, 2, . . . )
• binary
  e.g., sex (male or female)
• categorical (nominal)
  e.g., type of car (saloon, hatchback, estate, 4 by 4)
• functions or combinations of the above
  e.g., age squared or an interaction between age and sex
Linear predictor: $\eta = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$
Continuous variables and sometimes discrete or ordinal variables are included directly in the linear predictor:
$$\eta = \beta_0 + \beta_1 x_1$$
where, e.g., $x_1$ is age
In this case we might write
$$\eta = \beta_0 + \beta_1 \text{AGE}$$
Similarly functions of continuous variables are included directly in the linear predictor:
$$\eta = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2$$
or
$$\eta = \beta_0 + \beta_1 \log x_1$$
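Binary variables can be included by defining a dummy variable that equals zero for one of the categories and one for the other category:
$$\eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$
where, e.g., $x_2 = 0$ for males and $x_2 = 1$ for females
Hence
$$\eta = \begin{cases} \beta_0 + \beta_1 x_1 & \text{for males} \\ \beta_0 + \beta_1 x_1 + \beta_2 & \text{for females} \end{cases}$$
Alternatively we can rewrite this model as
$$\eta = \alpha_i + \beta_1 x_1$$
where $i = 1$ for males and $i = 2$ for females
Hence
$$\eta = \begin{cases} \alpha_1 + \beta_1 x_1 & \text{for males} \\ \alpha_2 + \beta_1 x_1 & \text{for females} \end{cases}$$
i.e., $\alpha_1 = \beta_0$ and $\alpha_2 = \beta_0 + \beta_2$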
Categorical and sometimes ordinal variables can be included by defining two or more dummy variables that equal one if the individual is in a certain category and zero otherwise:
$$\eta = \beta_0 + \beta_1 x_1 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5$$
where, e.g.,
$x_3 = 1$ for hatchback and $x_3 = 0$ otherwise
$x_4 = 1$ for estate and $x_4 = 0$ otherwise
$x_5 = 1$ for 4 by 4 and $x_5 = 0$ otherwise
Hence
$$\eta = \begin{cases} \beta_0 + \beta_1 x_1 & \text{for saloons} \\ \beta_0 + \beta_1 x_1 + \beta_3 & \text{for hatchbacks} \\ \beta_0 + \beta_1 x_1 + \beta_4 & \text{for estates} \\ \beta_0 + \beta_1 x_1 + \beta_5 & \text{for 4 by 4s} \end{cases}$$
Note that if there are $k$ categories there are $k - 1$ dummy variables and the category without a dummy variable is called the baseline or reference category
• saloons in the example above
Again we can rewrite the model as
$$\eta = \alpha_i + \beta_1 x_1$$
where $i = 1$ for saloons, $i = 2$ for hatchbacks, etc.
Notice that for both the sex and type of car examples the effect of age of the individual was the same irrespective of which category they were in: male or female; or saloon, hatchback, estate or 4 by 4
To allow for different effects in different categories we can include interaction terms in the linear predictor
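The dummy-variable coding for a categorical variable with $k$ categories can be sketched as follows, taking the first category as the baseline. The function names and all coefficient values are hypothetical and chosen only for illustration.

```python
CAR_TYPES = ["saloon", "hatchback", "estate", "4 by 4"]  # first level is the baseline

def dummy_code(category, levels):
    # Return the k - 1 dummy variables; the baseline level maps to all zeros
    if category not in levels:
        raise ValueError(f"unknown category: {category!r}")
    return [1 if category == level else 0 for level in levels[1:]]

def linear_predictor(age, car_type, beta0, beta_age, beta_car):
    # eta = beta0 + beta_age * age + sum of (car coefficient * dummy variable)
    dummies = dummy_code(car_type, CAR_TYPES)
    return beta0 + beta_age * age + sum(b * x for b, x in zip(beta_car, dummies))

# Hypothetical coefficients: intercept, age effect, and one per non-baseline car type
beta0, beta_age, beta_car = -2.0, 0.25, [0.5, 0.25, 1.0]

for car in CAR_TYPES:
    print(car, dummy_code(car, CAR_TYPES),
          linear_predictor(40, car, beta0, beta_age, beta_car))
```

For a fixed age, the linear predictor of each non-baseline car type differs from that of saloons by exactly its own coefficient, which is how the parameters are interpreted.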
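Let $x_3 = x_1 x_2$, where $x_1$ is the age variable and $x_2$ is the sex dummy variable, and consider the model
$$\eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$
Here
$$\eta = \begin{cases} \beta_0 + \beta_1 x_1 & \text{for males} \\ \beta_0 + \beta_1 x_1 + \beta_2 + \beta_3 x_1 = \beta_0 + \beta_2 + (\beta_1 + \beta_3) x_1 & \text{for females} \end{cases}$$
Alternatively, we can write
$$\eta = \alpha^0_i + \alpha^1_i x_1$$
where $i = 1$ for males and $i = 2$ for females
$$\alpha^0_1 = \beta_0 \qquad \alpha^0_2 = \beta_0 + \beta_2$$
$$\alpha^1_1 = \beta_1 \qquad \alpha^1_2 = \beta_1 + \beta_3$$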
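The equivalence of the two parameterisations of the age by sex interaction can be sketched in a few lines. The function names and the coefficient values in `beta` are hypothetical.

```python
def eta_interaction(age, female, beta):
    # eta = b0 + b1*x1 + b2*x2 + b3*x3 with x3 = x1 * x2
    b0, b1, b2, b3 = beta
    x1 = age
    x2 = 1 if female else 0
    x3 = x1 * x2
    return b0 + b1 * x1 + b2 * x2 + b3 * x3

def group_coefficients(beta):
    # Separate (intercept, slope) pairs for each sex implied by the same model
    b0, b1, b2, b3 = beta
    return {"male": (b0, b1), "female": (b0 + b2, b1 + b3)}

beta = (-4.0, 0.5, 0.25, -0.125)  # hypothetical values of beta_0, ..., beta_3
coefs = group_coefficients(beta)

for sex in ("male", "female"):
    intercept, slope = coefs[sex]
    for age in (20, 30, 60):
        direct = eta_interaction(age, sex == "female", beta)
        print(sex, age, direct, intercept + slope * age)
```

The two printed values agree for every age, so the interaction model is the same as fitting a separate intercept and a separate age slope for each sex.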
If the categorical variable has $k$ categories then $k - 1$ variables of the form $x_1 x_i$ are required to specify the interaction
In this case it is easier to write the model as
$$\eta = \alpha^0_i + \alpha^1_i x_1$$
$i = 1$ for saloons, $i = 2$ for hatchbacks, etc.
We can also have interactions between two categorical variables, with $k_1$ and $k_2$ categories say
In this case $(k_1 - 1)(k_2 - 1)$ variables of the form $x_i x_j$ are required to specify the interaction
However, we can write the model as
$$\eta = \alpha^{12}_{ij}$$
where, e.g., $i = 1$ for males and $i = 2$ for females, and $j = 1$ for saloons, $j = 2$ for hatchbacks, etc.
Estimation
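The parameters, $\beta_0, \beta_1, \ldots, \beta_p$, are estimated using maximum likelihood estimation
Let $y_i$, $x_{i1}, \ldots, x_{ip}$ and $\theta_i$ denote the response, explanatory variables and canonical parameter for the $i$th individual and assume that $\phi$ is the same for all $n$ individuals
Then
$$l(\theta, \phi; y) = \sum_{i=1}^{n} \frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + \sum_{i=1}^{n} c(y_i, \phi)$$
With the canonical link
$$\theta_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}$$
$$\frac{\partial}{\partial \beta_j} l(\theta, \phi; y) = \sum_{i=1}^{n} \frac{\partial \theta_i}{\partial \beta_j} \frac{\partial}{\partial \theta_i}\left\{\frac{y_i\theta_i - b(\theta_i)}{a(\phi)}\right\} = \sum_{i=1}^{n} x_{ij}\, \frac{y_i - b'(\theta_i)}{a(\phi)} \qquad j = 0, \ldots, p$$
where $x_{i0} = 1$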
Setting these derivatives equal to zero and substituting for $\theta_i$ gives $p + 1$ equations with $p + 1$ unknowns:
$$\sum_{i=1}^{n} x_{ij}\{y_i - b'(\theta_i)\} = 0 \qquad j = 0, \ldots, p$$
Note that
• These equations do not depend on $\phi$ and hence it can be estimated separately
• These equations usually have to be solved numerically, except in the normal case where $b'(\theta) = \theta$ and we have the familiar normal equations
• Most statistical packages can solve these equations to obtain the m.l.e., $\hat\beta_0, \ldots, \hat\beta_p$
• The above equations are easily generalised to the case where $\phi_i$ differs between individuals but is known, e.g., $m_i Y_i = Z_i \sim \text{Binomial}(m_i, \mu_i)$
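As an illustration of solving the likelihood equations numerically, the sketch below applies Newton-Raphson to a Poisson regression with the canonical log link and a single explanatory variable, so that $b'(\theta_i) = \hat\mu_i = e^{\beta_0 + \beta_1 x_i}$. Newton-Raphson is one common choice and is not necessarily the method a given statistical package uses; the function name, the fixed iteration count and the data are all assumptions made for illustration.

```python
import math

def fit_poisson_log_link(x, y, n_iter=25):
    # Newton-Raphson for eta_i = b0 + b1 * x_i with the canonical log link
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: sum_i x_ij * (y_i - mu_i) for j = 0, 1 (with x_i0 = 1)
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix: sum_i x_ij * x_ik * mu_i
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: beta <- beta + I^{-1} U, using the explicit 2x2 inverse
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (i00 * u1 - i01 * u0) / det
    return b0, b1

# Hypothetical data: x could be an age band and y a claim count
x = [0, 1, 2, 3, 4, 5]
y = [1, 3, 2, 6, 8, 13]

b0_hat, b1_hat = fit_poisson_log_link(x, y)
mu_hat = [math.exp(b0_hat + b1_hat * xi) for xi in x]
print("estimates:", b0_hat, b1_hat)
print("score equations:",
      sum(yi - mi for yi, mi in zip(y, mu_hat)),
      sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu_hat)))
```

At convergence both score equations are zero up to rounding error. The first of them says that the fitted means sum to the observed total.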
Scaled deviance
A saturated model is a model where the fitted values equal the observed values: $\hat\mu_i = y_i$
We can assess (and test) the goodness of fit of a particular model by comparing it with the saturated model using the scaled deviance
The scaled deviance of Model $m$, $SD_m$, is defined as twice the difference between the log-likelihood of the saturated model, $l_s$, and that of Model $m$, $l_m$:
$$SD_m = \frac{D_m}{a(\phi)} = 2(l_s - l_m)$$
where $D_m$ is the deviance of Model $m$
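Example: $Y_i \sim N(\mu_i, \sigma^2)$
Recall $\theta_i = \mu_i$, $a(\phi) = \sigma^2$ and
$$l(\mu, \sigma^2; y) = -\frac{n}{2}\log\left(2\pi\sigma^2\right) - \sum_{i=1}^{n} \frac{(y_i - \mu_i)^2}{2\sigma^2}$$
Since for the saturated model $\hat\mu_i = y_i$
$$l_s = -\frac{n}{2}\log\left(2\pi\sigma^2\right)$$
and
$$l_m = -\frac{n}{2}\log\left(2\pi\sigma^2\right) - \sum_{i=1}^{n} \frac{(y_i - \hat\mu_i)^2}{2\sigma^2}$$
where here $\hat\mu_i$ are the m.l.e. under Model $m$
Hence
$$SD_m = 2(l_s - l_m) = \sum_{i=1}^{n} \frac{(y_i - \hat\mu_i)^2}{\sigma^2}$$
and
$$D_m = \sum_{i=1}^{n} (y_i - \hat\mu_i)^2$$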
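Example: $Y_i \sim \text{Po}(\mu_i)$
Recall $\theta_i = \log\mu_i$, $a(\phi) = 1$ and
$$l(\mu; y) = \sum_{i=1}^{n} (y_i \log\mu_i - \mu_i - \log y_i!)$$
Here
$$l_s = \sum_{i=1}^{n} (y_i \log y_i - y_i - \log y_i!)$$
and
$$l_m = \sum_{i=1}^{n} (y_i \log\hat\mu_i - \hat\mu_i - \log y_i!)$$
Hence
$$SD_m = 2(l_s - l_m) = 2\left\{\sum_{i=1}^{n} y_i \log(y_i/\hat\mu_i) - \sum_{i=1}^{n} y_i + \sum_{i=1}^{n} \hat\mu_i\right\} = D_m$$
Note that $\sum_{i=1}^{n} y_i = \sum_{i=1}^{n} \hat\mu_i$ if a constant $\beta_0$ is included in the model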
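The Poisson deviance above translates directly into code. The sketch below uses the usual convention that $y_i \log(y_i/\hat\mu_i) = 0$ when $y_i = 0$; the function name and the data are illustrative assumptions.

```python
import math

def poisson_deviance(y, mu_hat):
    # D_m = 2 * sum{ y_i * log(y_i / mu_i) - y_i + mu_i }
    total = 0.0
    for yi, mi in zip(y, mu_hat):
        # Convention: y * log(y / mu) is taken as 0 when y = 0
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - yi + mi
    return 2.0 * total

y = [0, 2, 3, 7]                 # hypothetical claim counts
mu_hat = [0.8, 1.9, 3.5, 5.8]    # hypothetical fitted means
print("deviance of fitted model:", poisson_deviance(y, mu_hat))
print("deviance of saturated model:", poisson_deviance([2, 3, 7], [2, 3, 7]))
```

The saturated model, with fitted values equal to the observations, has deviance zero, and any other set of fitted values gives a positive deviance.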
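Example: $Z_i \sim \text{Binomial}(m_i, \mu_i)$
Consider $Y_i = Z_i/m_i$
$$l(\mu; y) = \sum_{i=1}^{n} m_i\{y_i \log\mu_i + (1 - y_i)\log(1 - \mu_i)\} + \sum_{i=1}^{n} \log\binom{m_i}{m_i y_i}$$
Hence
$$\begin{aligned}
SD_m &= 2(l_s - l_m) \\
&= 2\left[\sum_{i=1}^{n} m_i\{y_i \log y_i + (1 - y_i)\log(1 - y_i)\} - \sum_{i=1}^{n} m_i\{y_i \log\hat\mu_i + (1 - y_i)\log(1 - \hat\mu_i)\}\right] \\
&= 2\sum_{i=1}^{n} m_i\left\{y_i \log\frac{y_i}{\hat\mu_i} + (1 - y_i)\log\left(\frac{1 - y_i}{1 - \hat\mu_i}\right)\right\}
\end{aligned}$$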
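Writing $SD_m$ in terms of $z_i$ gives
$$\begin{aligned}
SD_m &= 2\sum_{i=1}^{n} m_i\left\{\frac{z_i}{m_i} \log\frac{z_i/m_i}{\hat\mu_i} + \left(1 - \frac{z_i}{m_i}\right)\log\left(\frac{1 - z_i/m_i}{1 - \hat\mu_i}\right)\right\} \\
&= 2\sum_{i=1}^{n} \left\{z_i \log\frac{z_i}{m_i\hat\mu_i} + (m_i - z_i)\log\left(\frac{m_i - z_i}{m_i(1 - \hat\mu_i)}\right)\right\}
\end{aligned}$$
Note that we could have obtained this result directly by considering $l(\mu; z)$ and noting that under the saturated model $\hat\mu_i = z_i/m_i$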
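The binomial scaled deviance in terms of the counts $z_i$ can be sketched as below, again taking terms of the form $0 \log 0$ as zero. The function names and data are illustrative assumptions.

```python
import math

def xlog_ratio(a, b):
    # a * log(a / b), with the convention that the term is 0 when a = 0
    return a * math.log(a / b) if a > 0 else 0.0

def binomial_deviance(z, m, mu_hat):
    # SD_m = 2 * sum{ z log(z / (m mu)) + (m - z) log((m - z) / (m (1 - mu))) }
    total = 0.0
    for zi, mi, pi in zip(z, m, mu_hat):
        total += xlog_ratio(zi, mi * pi) + xlog_ratio(mi - zi, mi * (1 - pi))
    return 2.0 * total

z = [2, 7, 0]              # hypothetical numbers of deaths
m = [10, 12, 5]            # hypothetical numbers of policyholders
mu_hat = [0.3, 0.5, 0.1]   # hypothetical fitted probabilities
print("scaled deviance:", binomial_deviance(z, m, mu_hat))
```

Observed counts equal to their fitted values $m_i\hat\mu_i$ contribute nothing to the total, in line with the saturated model having $\hat\mu_i = z_i/m_i$.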
If Model $m$ fits the data well then $SD_m$ will be small, whereas if Model $m$ fits poorly it will be large
Under the null hypothesis that Model $m$ holds $SD_m$ is:
• $\chi^2_{n-p-1}$, if $Y_i$ is normal and $\sigma^2$ is known
• asymptotically $\chi^2_{n-p-1}$, if $Y_i$ is Poisson
• asymptotically $\chi^2_{n-p-1}$, if $Y_i$ is binomial and $m_i$ sufficiently large
Hence, a chi-squared goodness-of-fit test can sometimes be used to assess the overall fit of Model $m$
Two nested models, Model 1 $\subset$ Model 2 say, can be compared by looking at the difference in scaled deviance:
$$SD_1 - SD_2$$
Under the null hypothesis that the smaller model (Model 1) fits the data as well as the larger model (Model 2), this difference is (asymptotically) $\chi^2_d$, where $d$ is the number of additional parameters in the larger model
Hence, for Poisson and binomial regression models a chi-squared test can be used to assess whether Model 2 fits significantly better than Model 1
For normal regression models, $\sigma^2$ is unknown and must be estimated. Then an F-test can be used
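The comparison of two nested Poisson or binomial models can be sketched as follows. The upper tail probability of the $\chi^2_d$ distribution is computed with the standard library only, using the recurrence $Q(a + 1, t) = Q(a, t) + t^a e^{-t}/\Gamma(a + 1)$ for the regularised upper incomplete gamma function. The function names, the 5% level and the two scaled deviances are hypothetical.

```python
import math

def chi2_sf(x, df):
    # P(X > x) for X ~ chi-squared with a positive integer number of degrees of freedom
    if x <= 0:
        return 1.0
    t = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-t)               # Q(1, t)
    else:
        a, q = 0.5, math.erfc(math.sqrt(t))    # Q(1/2, t)
    while a < df / 2.0:
        q += t ** a * math.exp(-t) / math.gamma(a + 1.0)
        a += 1.0
    return q

def compare_nested(sd_small, sd_large, extra_params, alpha=0.05):
    # Difference in scaled deviance, referred to chi-squared with d = extra_params
    diff = sd_small - sd_large
    p_value = chi2_sf(diff, extra_params)
    return diff, p_value, p_value < alpha

# Hypothetical scaled deviances: Model 2 has two more parameters than Model 1
diff, p_value, significant = compare_nested(48.2, 40.1, 2)
print("difference:", diff, "p-value:", p_value, "reject Model 1:", significant)
```

A small p-value suggests that the additional parameters in Model 2 give a significantly better fit than Model 1.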
Residuals
The goodness-of-fit of a model for individual observations can be checked using residuals
Raw residuals:
$$r^R_i = y_i - \hat\mu_i$$
Pearson residuals (McCullagh & Nelder, 1989):
$$r^P_i = \frac{y_i - \hat\mu_i}{\sqrt{V(\hat\mu_i)}}$$
(Standardised) Pearson residuals:
$$r^S_i = \frac{y_i - \hat\mu_i}{\sqrt{\widehat{\text{Var}}(Y_i)}} = \frac{y_i - \hat\mu_i}{\sqrt{a(\phi_i)V(\hat\mu_i)}}$$
Deviance residuals:
$$r^D_i = \text{sign}(y_i - \hat\mu_i)\sqrt{d_i}$$
where $SD_m = \sum_{i=1}^{n} d_i$
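Examples
Normal: $V(\mu) = 1$, $a(\phi) = \sigma^2$
$$r^P_i = y_i - \hat\mu_i$$
$$r^S_i = r^D_i = \frac{y_i - \hat\mu_i}{\sigma}$$
although $\sigma$ usually has to be replaced by an estimate
Poisson: $V(\mu) = \mu$, $a(\phi) = 1$
$$r^P_i = r^S_i = \frac{y_i - \hat\mu_i}{\sqrt{\hat\mu_i}}$$
$$r^D_i = \text{sign}(y_i - \hat\mu_i)\sqrt{2\{y_i \log(y_i/\hat\mu_i) - y_i + \hat\mu_i\}}$$
Binomial: $V(\mu) = \mu(1 - \mu)$, $a(\phi) = 1/m$
$$r^P_i = \frac{y_i - \hat\mu_i}{\sqrt{\hat\mu_i(1 - \hat\mu_i)}}$$
$$r^S_i = \frac{\sqrt{m_i}\,(y_i - \hat\mu_i)}{\sqrt{\hat\mu_i(1 - \hat\mu_i)}} = \frac{\sqrt{m_i}\,(z_i/m_i - \hat\mu_i)}{\sqrt{\hat\mu_i(1 - \hat\mu_i)}} = \frac{z_i - m_i\hat\mu_i}{\sqrt{m_i\hat\mu_i(1 - \hat\mu_i)}}$$
$$r^D_i = \text{sign}(y_i - \hat\mu_i)\sqrt{d_i}$$
where
$$d_i = 2m_i\left\{y_i \log\left(\frac{y_i}{\hat\mu_i}\right) + (1 - y_i)\log\left(\frac{1 - y_i}{1 - \hat\mu_i}\right)\right\}$$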
Note that
• All these residuals have mean zero but in general only $r^S_i$ and $r^D_i$ have variance (approximately) equal to one
• $|r^S_i| > 2$ or $|r^D_i| > 2$ for the $i$th observation suggests that the model does not fit the $i$th observation well
• If the model fits, all the residuals ($r^R_i$, $r^P_i$, $r^S_i$ and $r^D_i$) are approximately normally distributed, except for the binomial case with small $m_i$
• Hence, model fit can also be assessed by checking whether or not these residuals appear to follow a normal distribution
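For the Poisson case, where the Pearson and standardised Pearson residuals coincide, the residuals and the $|r| > 2$ rule of thumb can be sketched as follows. The function name and the data are illustrative assumptions.

```python
import math

def poisson_residuals(y, mu_hat):
    # Returns one (raw, Pearson, deviance) tuple per observation
    out = []
    for yi, mi in zip(y, mu_hat):
        raw = yi - mi
        pearson = raw / math.sqrt(mi)  # V(mu) = mu and a(phi) = 1
        # d_i = 2{y log(y / mu) - y + mu}, with y log(y / mu) = 0 when y = 0
        d = 2.0 * ((yi * math.log(yi / mi) if yi > 0 else 0.0) - yi + mi)
        # max() guards against a tiny negative d caused by rounding
        deviance = math.copysign(math.sqrt(max(d, 0.0)), raw)
        out.append((raw, pearson, deviance))
    return out

y = [0, 4, 12]              # hypothetical observed counts
mu_hat = [1.0, 4.0, 5.0]    # hypothetical fitted means
res = poisson_residuals(y, mu_hat)

# Flag observations whose Pearson or deviance residual exceeds 2 in absolute value
flagged = [i for i, (_, rp, rd) in enumerate(res) if abs(rp) > 2 or abs(rd) > 2]
for i, r in enumerate(res):
    print(i, r)
print("poorly fitted observations:", flagged)
```

Only the third observation, where the count is far above its fitted mean, is flagged by the rule of thumb.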
