
AIC Supplement, Fitting Loss Distributions
HCM, 9/27/16
Copyright 2017 by Howard C. Mahler.


A short study note that replaces Section 16.5.3, Score-Based Approaches, in the Loss Models
textbook. This study note updates Section 16.5.3 by including material on the Akaike Information
Criterion (AIC), along with updated examples illustrating how to use the AIC. This note is effective
with the October 2016 exam administration. Download the note from the SOA webpage.1

AIC and BIC are each methods of comparing models fit via maximum likelihood.
In each case, a larger value is better.
AIC (Akaike Information Criterion):
The Akaike Information Criterion (AIC) is used to compare a set of models all fit via maximum
likelihood to the same data. The model with the largest AIC is preferred. For a particular model:
AIC = maximum loglikelihood - number of parameters = ln[L] - r.2
r is the number of parameters fit via maximum likelihood.
For example, assume we have three models fit to the same data:

Model #    Number of Parameters    Loglikelihood    AIC
   1                4                 -302.7        -302.7 - 4 = -306.7
   2                5                 -301.2        -301.2 - 5 = -306.2
   3                6                 -300.4        -300.4 - 6 = -306.4

We prefer Model #2, since it has the largest AIC.
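
A minimal Python sketch of the comparison above, using the Loss Models definition AIC = ln[L] - r;
the dictionary layout and the helper function aic below are illustrative, not notation from the study note:

    # Loglikelihoods and parameter counts from the three-model example above.
    models = {1: {"params": 4, "loglik": -302.7},
              2: {"params": 5, "loglik": -301.2},
              3: {"params": 6, "loglik": -300.4}}

    def aic(loglik, num_params):
        # Loss Models definition: loglikelihood minus number of parameters; larger is better.
        return loglik - num_params

    scores = {m: aic(v["loglik"], v["params"]) for m, v in models.items()}
    best = max(scores, key=scores.get)
    print(best)  # 2: Model #2 has the largest AIC, about -306.2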

1 https://www.soa.org/education/exam-req/edu-exam-c-detail.aspx
  Then click on the Oct. 2016 syllabus. The syllabus has a link to the new study note.
2 This is the definition in Loss Models.
  Most other textbooks define AIC = (-2)(maximum loglikelihood) + 2(number of parameters).
  In that case, the smaller AIC would instead be preferred. Which model is preferred is the same,
  regardless of which definition of AIC one uses.


BIC (Bayesian Information Criterion):


In Loss Models, BIC is just another name for the Schwarz Bayesian Criterion (SBC).3
The Bayesian Information Criterion can also be used to compare a set of models all fit via
maximum likelihood to the same data. The model with the largest BIC is preferred.
For a particular model:
BIC = maximum loglikelihood - (number of parameters / 2) ln(number of data points) =
ln[L] - (r/2) ln[n].4
n is the number of data points, and r is the number of parameters fit via maximum likelihood.
For example, assume we have three models fit to the same 100 data points:

Model #    Number of Parameters    Loglikelihood    BIC
   1                4                 -533.6        -533.6 - (4/2) ln[100] = -542.8
   2                5                 -530.1        -530.1 - (5/2) ln[100] = -541.6
   3                6                 -527.3        -527.3 - (6/2) ln[100] = -541.1

We prefer Model #3, since it has the largest BIC.
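
A minimal Python sketch of the BIC comparison above, using the Loss Models definition
BIC = ln[L] - (r/2) ln[n] with n = 100; the layout mirrors the AIC sketch and is illustrative only:

    import math

    n = 100  # number of data points
    models = {1: {"params": 4, "loglik": -533.6},
              2: {"params": 5, "loglik": -530.1},
              3: {"params": 6, "loglik": -527.3}}

    def bic(loglik, num_params, num_points):
        # Loss Models definition: ln[L] - (r/2) ln[n]; larger is better.
        return loglik - (num_params / 2) * math.log(num_points)

    scores = {m: bic(v["loglik"], v["params"], n) for m, v in models.items()}
    best = max(scores, key=scores.get)
    print(best)  # 3: Model #3 has the largest BIC, about -541.1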


The penalty for AIC is r, while the penalty for BIC is (r/2) ln[n]. For large data sets, the penalty for
BIC is larger.5 Thus, BIC prefers simpler models than does AIC.
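To see why, note that the BIC penalty exceeds the AIC penalty when (r/2) ln[n] > r, which is
equivalent to ln[n] > 2, i.e. n > e^2 ≈ 7.4. Thus for any data set of at least 8 points, BIC penalizes
each additional parameter more heavily than AIC does.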

3 The Schwarz Bayesian Criterion is covered in Mahler's Guide to Fitting Loss Distributions.
4 This is the definition in Loss Models.
  Most other textbooks define BIC = (-2)(maximum loglikelihood) + (number of parameters) ln(number of data points).
  In that case, the smaller BIC would instead be preferred. Which model is preferred is the same,
  regardless of which definition of BIC one uses.
5 For n ≥ 8, the penalty for BIC is larger than for AIC.

Problems:
Supp.1 (3 points) Five Models have been fit to the same set of 200 observations.
Model    Number of Fitted Parameters    LogLikelihood
  A                  3                     -359.17
  B                  4                     -357.84
  C                  5                     -356.42
  D                  6                     -354.63
  E                  7                     -353.85
Which model has the best AIC (Akaike Information Criterion)?
Supp.2 (3 points) Five different Models have been fit to the same set of 400 observations.
Model    Number of Fitted Parameters    LogLikelihood
  A                  1                     -730.18
  B                  2                     -726.24
  C                  3                     -723.56
  D                  4                     -721.02
  E                  5                     -717.50
Which model has the best BIC (Bayesian Information Criterion)?
Supp.3 (3 points) Use the following information on two Models fit to the same 100 data points:
Number of Fitted Parameters    Loglikelihood
             1                    -321.06
             2                    -319.83
(a) Based on AIC (Akaike Information Criterion), which model is preferred?
(b) Based on BIC (Bayesian Information Criterion), which model is preferred?
Supp.4 (3 points) Five Models have been fit to the same set of 200 observations.
Model    Number of Fitted Parameters    LogLikelihood
  A                  1                     -213.6
  B                  2                     -212.3
  C                  3                     -211.5
  D                  4                     -210.7
  E                  5                     -209.4
Which model has the best AIC (Akaike Information Criterion)?



Supp.5 (3 points) Five Models have been fit to the same set of 60 observations.
Model    Number of Fitted Parameters    LogLikelihood
  A                  2                     -220.18
  B                  3                     -217.40
  C                  4                     -214.92
  D                  5                     -213.25
  E                  6                     -211.03
Which model has the best BIC (Bayesian Information Criterion)?
Supp.6 (3 points) You are given:
(i) Sample size = 100
(ii) The negative loglikelihoods associated with five models are:
Model                   Number of Parameters    Negative Loglikelihood
Generalized Pareto                3                     219.1
Burr                              3                     219.2
Pareto                            2                     221.2
Lognormal                         2                     221.4
Inverse Exponential               1                     224.2
Which of the following is the best model, using the Akaike Information Criterion?
(A) Generalized Pareto  (B) Burr  (C) Pareto  (D) Lognormal  (E) Inverse Exponential
Comment: Data taken from 4, 11/00, Q.10.


Solutions to Problems:
Supp.1. D. AIC = maximum loglikelihood - number of parameters.
For example, AIC = -359.17 - 3 = -362.17.
Model    Number of Parameters    Loglikelihood      AIC
  A                3                -359.17       -362.17
  B                4                -357.84       -361.84
  C                5                -356.42       -361.42
  D                6                -354.63       -360.63
  E                7                -353.85       -360.85

Since AIC is largest for model D, model D is preferred.


Supp.2. B. BIC = maximum loglikelihood - (number of parameters / 2) ln[400].
For example, BIC = -730.18 - (1/2) ln[400] = -733.18.
Model    Number of Parameters    Loglikelihood      BIC
  A                1                -730.18       -733.18
  B                2                -726.24       -732.23
  C                3                -723.56       -732.55
  D                4                -721.02       -733.00
  E                5                -717.50       -732.48

Since BIC is largest for model B, model B is preferred.


Supp.3. (a) AIC = maximum loglikelihood - number of parameters.
For the first model, AIC = -321.06 - 1 = -322.06.
For the second model, AIC = -319.83 - 2 = -321.83.
Since AIC is larger for the second model, the second model is preferred.
(b) BIC = maximum loglikelihood - (number of parameters / 2) ln(number of data points).
For the first model, BIC = -321.06 - (1/2) ln[100] = -323.36.
For the second model, BIC = -319.83 - (2/2) ln[100] = -324.44.
Since BIC is larger for the first model, the first model is preferred.
Comment: An example where AIC and BIC lead to different conclusions.
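A quick Python check of this example, assuming the same Loss Models definitions of AIC and BIC
used above (the variable names are illustrative):

    import math

    n = 100
    models = {1: {"params": 1, "loglik": -321.06},
              2: {"params": 2, "loglik": -319.83}}

    aic = {m: v["loglik"] - v["params"] for m, v in models.items()}
    bic = {m: v["loglik"] - (v["params"] / 2) * math.log(n) for m, v in models.items()}

    print(max(aic, key=aic.get))  # 2: AIC prefers the two-parameter model
    print(max(bic, key=bic.get))  # 1: BIC prefers the one-parameter model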
Supp.4. B. AIC = maximum loglikelihood - number of parameters.
For example, AIC = -212.3 - 2 = -214.3.
Model    Number of Parameters    Loglikelihood     AIC
  A                1                -213.6       -214.6
  B                2                -212.3       -214.3
  C                3                -211.5       -214.5
  D                4                -210.7       -214.7
  E                5                -209.4       -214.4

Since AIC is largest for model B, model B is preferred.


Supp.5. C. BIC = maximum loglikelihood - (number of parameters / 2) ln[60].


For example, BIC = -220.18 - (2/2) ln[60] = -224.27.
Model    Number of Parameters    Loglikelihood      BIC
  A                2                -220.18       -224.27
  B                3                -217.40       -223.54
  C                4                -214.92       -223.11
  D                5                -213.25       -223.49
  E                6                -211.03       -223.31

Since BIC is largest for model C, model C is preferred.


Supp.6. A. AIC = maximum loglikelihood - number of parameters.
For example, AIC = -219.1 - 3 = -222.1.
Model                     Number of Parameters    Loglikelihood     AIC
A. Generalized Pareto               3                -219.1       -222.1
B. Burr                             3                -219.2       -222.2
C. Pareto                           2                -221.2       -223.2
D. Lognormal                        2                -221.4       -223.4
E. Inverse Exponential              1                -224.2       -225.2

Since AIC is largest for model A, model A is preferred.
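
A quick Python check of this solution; since the problem gives negative loglikelihoods, they are
negated before applying AIC = ln[L] - r (the dictionary below is illustrative):

    # (number of parameters, negative loglikelihood) for each model in Supp.6.
    models = {"Generalized Pareto": (3, 219.1), "Burr": (3, 219.2),
              "Pareto": (2, 221.2), "Lognormal": (2, 221.4),
              "Inverse Exponential": (1, 224.2)}

    aic = {name: -neg_loglik - r for name, (r, neg_loglik) in models.items()}
    print(max(aic, key=aic.get))  # Generalized Pareto, answer (A)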

