
Credit Scoring: from risk assessment to pricing, profits and portfolios


Lyn C. Thomas
Quantitative Financial Risk Management Centre
School of Management
University of Southampton
Santiago de Chile, June 11 2008

Structure

Recap of consumer credit and credit scoring

Methodologies for building default risk scorecards

Current Pressures

Issues Arising and Future Developments


Changing objectives of risk assessment bring new methodologies
Using survival analysis to build scorecards

Profitability modelling
Variable pricing
New issues in data cleaning and enhancing
Impact of new Basel Accord

Low default portfolios


Loss Given Default modelling
Need for models of credit risk of portfolios of consumer loans
Conclusions

History of consumer credit


Babylonians lent for seed to be repaid at harvest
Consumer lending has been a commercial activity for some 750 years, since the medieval pawnbrokers
1920s saw Ford/Sloan not only mass produce cars but also ways of financing them for the masses
1960s saw the arrival of credit cards and the start of the explosion in consumer credit. The same period saw the growth of home ownership in most Western countries
Now consumer credit is so ubiquitous that some argue it is a human right.

Current consumer credit levels


Chile (main information from Cox, Parrado, Ruiz-Tagle 2006)
Household debt is 60% of average annual income (US is 130%; UK, Canada etc. >100%)
75% own their home: 64% of consumer debt is mortgage, but it is held by only 16% of households (US 76%, Canada 69%, UK 73%)
3 million+ MasterCard credit cards
Private label credit/store cards: 5 million+

Comparison of US household and corporate debt 1974-2006

[Figure 1.1.1: Comparison of US household and business debt, 1970-2010, in $ billions (0 to 14,000). Series plotted: total household, mortgage, consumer credit, total business, corporate.]

Countries with largest MasterCard/Visa circulation 2003
VISA/MC (credit + debit) cards in circulation, 000's

Rank  Country    Cards
1     USA        755,300
2     China      177,359
3     Brazil     148,435
4     UK         125,744
5     Japan      121,281
6     Germany    109,482
7     S. Korea    94,632
8     Taiwan      60,330
9     Spain       56,239
10    Canada      51,100

TOP 10 TOTAL    1,699,902
GLOBAL TOTAL    2,362,042

Top 10 represent 72% of global VISA/MC cards

Recap of default-based credit scoring: application scoring

Two types of credit scoring:
application scoring and behavioural scoring

Application scoring:
Grant credit to a new applicant?
Information available:
applicant's application form details
credit reference agency check
application details/credit histories of previous applicants
No information is available on the credit histories of previous applicants who were rejected. This leads to bias.

Application scoring

Shaped by the original application: whether to accept a new customer

pragmatic philosophy: predict, not explain; no causal modelling

assumes creditworthiness is time independent over 2-3 years

redo the scorecard rather than put in dynamics

Objective is to rank applicants correctly; the default level forecast is secondary

reflected in performance measures: Gini coefficient, swap sets

specific risk: prob. of missing 3 consecutive months' payments in the next year.

50 years since the first commercial application scoring was introduced

Credit bureau data greatly improved decision making accuracy

different information levels available in different countries

Legal considerations:
what cannot be used (race/gender/age?);
what must be used (affordability in Australia)

Can there be a worldwide consumer credit risk system?

History of consumer credit modelling: behavioural scoring

Behavioural models arrived in the 1960s: the revolution that wasn't.
uses performance data as well as application data (the former dominates the latter in strength of characteristics)
what is behavioural scoring used for?
different decisions to application scoring (credit limit, cross selling).
Same risk: PD in the next 12 months
uses classification (static) models rather than modelling the dynamics of consumer credit risk behaviour
application scoring: snapshot to snapshot
behavioural scoring: video clip to snapshot
profit scoring: video clip to video clip

Classification methods used in credit granting
Take sample of previous applicants;
classify into good payers or defaulters one year later.
Classification methods find characteristics identifying two groups.
In future accept those with good characteristics; reject bad.
Existing credit scoring classification methods

discriminant analysis/ linear regression


logistic regression
Classification trees, random forests
linear programming

Developmental credit scoring approaches

neural networks
Support vector machines
expert systems
genetic algorithms
nearest neighbour methods
Bayesian learning networks

Graph of simple scorecard on age and income

[Scatter plot of goods and bads against age and income. A straight-line scorecard age + a(income) = b is not perfect, but has only two parameters; a better classifier is possible but needs lots more parameters.]

Linear regression and logistic regression

Discriminant analysis (LDF) is equivalent to linear regression if there are only two classification groups, so one can use least squares:
p_i = E[Y_i] = w1 X1 + .... + wp Xp
where Y_i = 1 if the ith applicant is good; 0 if bad
Logistic regression (LR) assumes
log(p_i / (1 - p_i)) = w1 X1 + .... + wp Xp
LR holds for a much wider class of models than LDF
In both cases one needs to coarse classify the variables to deal with non-monotonicity in the relationship with defaulting
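To make the logistic form concrete, a minimal sketch of turning a linear score into a probability of being good; the weights and attribute indicators here are hypothetical, not from any real scorecard.

```python
import math

def score_to_pg(weights, x):
    """Linear score s = w.x; logistic regression maps it to P(good) = 1/(1+e^-s)."""
    s = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-s))

# Hypothetical coarse-classified (binary) attributes and weights.
weights = [0.8, -0.5, 1.2]   # e.g. owner, age 18-21, years-at-bank > 2
x = [1, 0, 1]                # this applicant's attribute indicators
p_good = score_to_pg(weights, x)
print(round(p_good, 3))      # P(good) for score s = 2.0 -> 0.881
```

Note that the log-odds of the result recover the score exactly, which is why the score is additive in the attributes.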

Default risk with age

[Bar chart of default risk against age, 18 to 78: the relationship is not monotonic.]

Since risk is not linear in the continuous variables, make these variables categorical as well. So coarse classify all variables, categorical and continuous.

Age, for example, splits into bands: are you 18-21; 22-28; 29-36; 37-59; 60+?

[Second bar chart: default risk by age band 18-21, 22-28, 29-36, 37-59, 60+.]

All variables are then categorical.

Linear programming approach

Assume nG goods labelled i = 1, 2, ..., nG
and nB bads labelled i = nG+1, ..., nG+nB
Require weights wj, j = 1, 2, ..., p, and a cut-off value c such that
For goods:  w1 xi1 + w2 xi2 + ... + wp xip >= c
For bads:   w1 xi1 + w2 xi2 + ... + wp xip < c

We wish to minimise a function of the absolute errors:

Minimise  a1 + a2 + a3 + .... + a(nG+nB)

Subject to
w1 xi1 + w2 xi2 + ... + wp xip >= c - ai,   1 <= i <= nG
w1 xi1 + w2 xi2 + ... + wp xip <= c + ai,   nG+1 <= i <= nG+nB
ai >= 0,                                    1 <= i <= nG+nB

This minimises the absolute errors - which are the ai

Need to normalise the LP in some way to avoid the trivial solution c = 0, w = 0
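The LP above can be sketched with a general-purpose solver. This is a toy instance on hypothetical one-feature data; the normalisation chosen here (fixing the cut-off c = 1) is one of several ways to rule out the trivial solution.

```python
import numpy as np
from scipy.optimize import linprog

# Toy 1-feature sample: goods score high, bads score low (hypothetical data).
goods = np.array([[2.0], [3.0], [2.5]])
bads = np.array([[0.1], [0.3], [0.2]])
nG, nB, p = len(goods), len(bads), goods.shape[1]
n = nG + nB

# Variables z = (w_1..w_p, a_1..a_n); fix the cut-off c = 1 as normalisation.
c_obj = np.concatenate([np.zeros(p), np.ones(n)])      # minimise sum of errors a_i

# Goods: w.x_i >= c - a_i  ->  -w.x_i - a_i <= -1
A_goods = np.hstack([-goods, -np.eye(n)[:nG]])
# Bads:  w.x_i <= c + a_i  ->   w.x_i - a_i <= 1
A_bads = np.hstack([bads, -np.eye(n)[nG:]])
A_ub = np.vstack([A_goods, A_bads])
b_ub = np.concatenate([-np.ones(nG), np.ones(nB)])

bounds = [(None, None)] * p + [(0, None)] * n          # w free, a_i >= 0
res = linprog(c_obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.fun)   # total absolute error; 0 here, as this toy sample is separable
```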

Classification trees
grouping rather than scoring

Methods like classification trees, expert systems and neural nets classify applicants into groups rather than giving a scorecard.

Classification trees were developed both in statistics and computer science, so the method is also called the recursive partitioning algorithm.

Splits sample A into two subsets, using the attributes of one characteristic, so that the two subsets have maximum difference in bad rate.

Take each subset and repeat the process until one decides to stop.

Each terminal node is classified as Good or Bad.

A classification tree depends on:
Splitting rule - how to choose the best daughter subsets
Stopping rule - when one decides this is a terminal node
Assigning rule - which category (Good or Bad) for each terminal node
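The splitting rule can be sketched directly: for one characteristic, try each threshold and keep the one giving the daughters the biggest difference in bad rate. The single-feature sample below is hypothetical.

```python
def best_split(sample):
    """Splitting rule sketch: choose the threshold on one characteristic that
    maximises the difference in bad rate between the two daughter subsets."""
    def bad_rate(rows):
        return sum(1 for _, bad in rows if bad) / len(rows)

    best = None
    values = sorted({x for x, _ in sample})
    for t in values[1:]:                       # candidate thresholds
        left = [r for r in sample if r[0] < t]
        right = [r for r in sample if r[0] >= t]
        diff = abs(bad_rate(left) - bad_rate(right))
        if best is None or diff > best[1]:
            best = (t, diff)
    return best

# (age, is_bad) pairs: young applicants default more in this toy data
sample = [(19, 1), (20, 1), (23, 1), (30, 0), (41, 0), (55, 0)]
print(best_split(sample))   # -> (30, 1.0): splitting at age 30 separates perfectly
```

Recursing on each daughter subset until a stopping rule fires gives the tree.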

Classification tree: credit risk example

[Tree diagram: the whole sample is split first on residential status (owner / not owner). The owner branch splits on years at bank (> 2 / < 2), then on number of children (0 / 1+); the not-owner branch splits on age (< 26 / > 26), then on employment (prof / not prof), age (< 21 / > 21) and residential status (with parents / other). Terminal nodes are classified Good or Bad.]

Extend to random forests:
lots of such trees, each built on a subset of the sample data and a subset of the characteristics
Majority voting to classify

Neural network

[Artificial neuron: inputs X1 (yrs at address), X2 (income), ..., Xp (age), with weights W1, W2, ..., Wp, feed NET into an activation function giving OUT.]

OUT = f(NET) = f(w.x) = f(w1 x1 + ... + wp xp)

A neural network is a computer system consisting of a number of processing units,
connected together in layers.
For credit scoring, the characteristics form the input layer and the prediction of bads the output layer.

TWO LAYER NEURAL NETWORK

[Diagram: inputs X1 (age), X2 (yrs at bank), ..., Xp (income) connect with weights w11, w12, ..., w1q to hidden nodes K1, K2, K3, which feed the Good/Bad output.]

w11 x1 + w21 x2 + .... = s1
o1 = 1 / (1 + e^-s1)

If there are only input and output layers, then the network can be no better than linear regression
Train by pattern discrimination or backward propagation
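A forward pass through such a network is short enough to write out; all weights below are hypothetical and untrained, so the output is only an illustration of the structure.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def forward(x, W_hidden, w_out):
    """Two-layer network: hidden activations o_j = sigmoid(sum_i w_ij x_i),
    then the output is the sigmoid of the weighted hidden activations."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(col, x))) for col in W_hidden]
    return sigmoid(sum(k * h for k, h in zip(w_out, hidden)))

x = [1.0, 0.5, 2.0]                      # e.g. age, yrs at bank, income (scaled)
W_hidden = [[0.2, -0.4, 0.1],            # hypothetical weights into hidden node K1
            [-0.3, 0.6, 0.05],           # K2
            [0.1, 0.1, -0.2]]            # K3
w_out = [1.5, -2.0, 0.7]                 # hidden-to-output weights
print(forward(x, W_hidden, w_out))       # good/bad score, always in (0, 1)
```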

Problems when using neural networks in credit scoring

Can take too long to run
Do not meet legal requirements that one can give reasons for rejecting
Local minima
[Plot of error against training time, showing the search stuck at a local minimum A.]
How many hidden layers? - often only three
How many nodes in each layer?
How to interpret the weights, or restrict the connections?

Is there a best classification method?

Logistic regression is the industry norm
often used in conjunction with other approaches:
classification trees, linear regression, linear programming

Segmented population: a different scorecard for each segment

system reasons (e.g. new accounts)

statistical reasons (a way of dealing with interactions between variables)

strategic reasons (want to be able to deal differently with some groups)

Newer classification methods have been piloted

they don't have the transparency or robustness

Flat maximum effect: lots of almost equally good scorecards

Relative ranking of 17 methods on 8 consumer credit data sets (Baesens, JORS 2003)

[Chart: methods applied to 8 data sets using 3 measures (24 tests), counting the number of times each method was best out of the 17 tried, and the number of times it was statistically insignificantly different from the best. Methods compared include nearest neighbour, classification trees (best and other versions), neural nets, SVMs (best and other versions), linear programming, logistic regression and linear regression.]

Differences are in other features

The regression approach allows statistical tests to say how important each characteristic is to the classification
gives lean/mean scorecards
helps devise new application forms

Linear programming allows firms to set requirements on scores
e.g. score(age < 25) > score(age > 60)
deals more easily with large numbers of application characteristics

Classification trees, neural nets and support vector machines pick up relationships between variables which may not be obvious

Measuring scorecards in credit scoring

Three aspects of scorecard performance

Discriminatory power (only scorecard needed)
How good is the system at separating the two classes of goods and bads?
Divergence statistic
Mahalanobis distance
Somers' D-concordance statistic
Kolmogorov-Smirnov statistic
ROC curve
Gini coefficient

Calibration of forecast (scorecard plus population odds)
Not used much until the Basel requirements, so few tests
Chi-square (Hosmer-Lemeshow) test
Binomial and normal tests

Prediction error (scorecard + population odds + cut-off)
how many erroneous classifications
Error rates
Confusion matrix, swap sets, specificity, sensitivity

ROC curves and Gini coefficient

Gini coefficient G = 2 x (area between the ROC curve and the diagonal).
G = 1 means perfect discrimination; G = 0 means no discrimination.
K-S is the greatest vertical distance from the diagonal to the curve.

[ROC curve: F(s|B) plotted against F(s|G).]
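Both measures can be computed directly from two samples of scores. A sketch on hypothetical behavioural scores: the AUC is computed in its pairwise (Somers' D concordance) form, the Gini as 2*AUC - 1, and K-S as the largest gap between the empirical score distributions of bads and goods.

```python
def gini_and_ks(goods, bads):
    """Gini and Kolmogorov-Smirnov statistics from two score samples."""
    # AUC = fraction of (good, bad) pairs where the good scores higher (ties count 0.5)
    auc = sum((g > b) + 0.5 * (g == b) for g in goods for b in bads)
    auc /= len(goods) * len(bads)
    # KS = max gap between the two empirical distribution functions
    cuts = sorted(set(goods) | set(bads))
    ks = max(abs(sum(b <= c for b in bads) / len(bads)
                 - sum(g <= c for g in goods) / len(goods)) for c in cuts)
    return 2 * auc - 1, ks

goods = [620, 660, 700, 710, 750]   # hypothetical behavioural scores
bads = [500, 540, 640, 650]
gini, ks = gini_and_ks(goods, bads)
print(gini, ks)                     # both 0.8 on this toy sample
```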

Current pressures
Lenders
want to maximise profit, not minimise default rates
want to optimise all decisions in the customer relationship,
not just whether to accept the customer for a vanilla loan.

Consumers
market near saturation in some countries,
so take rates are dropping and attrition rates rising
want customized products
will they buy into risk-based pricing?

Industry
Basel New Capital Accord, begun in 2007, means IRB systems of consumer credit de rigueur
need models of the credit risk of portfolios of consumer loans
Basel II uses a corporate model; need consumer models for Basel III
securitization: bundling and pricing models are primitive

Changing objectives of risk assessment bring new methodologies

Changes in objectives are more likely than a need for improved accuracy to force changes in methodology
Move to assessing profitability, not just default risk
Need to estimate several events - default, cross selling, churn - and also when these events will occur
survival analysis approaches
Markov chain models
need for dynamic models which incorporate economic/market effects

Traditional approach to credit scoring

Take a fixed time horizon T
If default occurs within that time: Bad;
if no default within that time: Good/Indeterminate

Arbitrary: if the time horizon is T, default at T-1 is bad, default at T+1 is good (or at least indeterminate).

Loses information: indeterminates are left out.
Those who fail at 3 months are classified the same as those who fail at T-1 months.

Competing risks ignored: those who leave or pay off early during the outcome period are left out of default scorecard building, and vice versa.

Survival analysis: ask when

Ask when events happen - default, early repayment, purchase
deals with censored data easily
gives a handle on profit, as profit depends on the time until certain events occur (default, switching lenders)
does not require any choice of time horizon, so no arbitrariness or loss of information
uses the data on everyone, so no loss of information
allows competing risks models, so one can build default, purchase and attrition models on the same data.

Censoring mechanism

[Timeline diagram, 0 to 47 months on books, with the end-of-sample date marked. Example histories: Default; Censored (closed account); Censored (truncated at the sample end); Censored (truncated, and started after the start of the sample).]

Using survival analysis

How long do customers survive before they default?
How long do customers stay before they change companies?
How long until the customer makes the next purchase?
How long do deteriorating systems survive before failure?

Survival analysis is the analysis of lifetime data when there is censoring.

Lifetime T is the time before the loan defaults (or is repaid early, or a purchase is made).
Standard ways of describing the randomness of T are
distribution function F(t), where F(t) = Prob{T <= t}
(S(t) = 1 - F(t) is the survivor function)
density function f(t), where Prob{t <= T <= t + dt} = f(t) dt
hazard function h(t) = f(t)/(1 - F(t)), so h(t) dt = Prob{t <= T <= t + dt | T >= t}

Hazard function
T - r.v. representing failure time (time to default/early pay-off)
Hazard function

h(t) = lim_{dt -> 0} P(t <= T < t + dt | T >= t) / dt

If time is discrete, h(t) is the probability of default in period t given no default before.

Proportional hazards (PH) and accelerated life (AL) models

Explanatory variables allow for heterogeneity of the population.
Proportional hazard models and accelerated life models connect explanatory variables to failure times in survival analysis.
Let x = (x1, x2, ...., xN) be explanatory variables.
Accelerated life models assume

S(t, x) = S0(e^{b.x} t)  or  h(t, x) = e^{b.x} h0(e^{b.x} t)

S0, h0 are the baseline survivor/hazard rate functions, and the x's speed up or slow down 'ageing'.
Proportional hazard models assume

h(t, x) = e^{b.x} h0(t)

The explanatory variables have a multiplier effect on the base hazard rate.

Cox proportional hazards model (non-parametric approach)

Cox showed one can estimate b without knowledge of h0(t) by using the ranks of the failure and censored times.
If times are discrete, so there are 'lots of ties', one needs an approximation in the maximum likelihood estimator.
So if T is the r.v. representing failure time (time to default/early pay-off) and x the vector of covariates, h(t, x) is the hazard for an individual with characteristics x:

h(t, x) = e^{-s_h(x)} h0(t)
s_h(x) = -(b1 x1 + b2 x2 + .... + bn xn)

s_h(x) acts like a scorecard (the minus sign ensures a higher score means a better loan).

Comparison of logistic regression and survival analysis

For a borrower with characteristics x
Logistic regression
Performance horizon t*; if p = PG(t*, x), the score s is defined by

ln(p / (1 - p)) = w.x = s,  so  p = 1 / (1 + e^-s)

Proportional hazards
Can estimate PG(t, x) for any t and x.
Consider p = PG(t*, x):

p = c^{e^{-s_h}},  so (up to a constant)  s_h = -w.x = -log(-log(p))

Building a credit scorecard for estimating when customers default using proportional hazards

Take a sample of past customers with their application and bureau characteristics (as usual)
For each, give the time of default, or the time the history was censored (no further info in sample / time left lender)
Coarse classify the variables without using a time horizon
Check the need for time-dependent variables
Build the proportional hazards model
Statistical tests for validating the model

Coarse-classifying using the PH approach

Split the variable into n binary variables (each covering a category or, in the continuous-variable case, a range containing (1/n)th of the population).

Apply the PH model with these binary variables as characteristics

Chart the parameter estimates

Choose splits based on similarity of the parameter estimates

Note: it is important to do the splits separately for every type of failure. Here are estimates for default (left) and early repayment (right).

Comparing logistic regression and proportional hazards for estimating default risk

Two definitions of bad:

1. Defaulted on the loan in the first 12 months
2. Still repaying after 12 months but defaulted in the next twelve months

Two separate LR models, one for each definition.

One PH model predicting time to default.
So the LRs should be best, as each is designed for its specific definition of bad.
Compare the models' performance using ROC curves.

ROC curves for PH and LR predicting default

[Two ROC plots: PH vs LR 1 (1st 12 months); PH vs LR 2 (2nd 12 months).]

Application in Basel II
Basel II is the new regulation concerning how much capital banks need to set aside to cover credit losses
Use credit scoring to identify PD, the probability of default in the next 12 months, which feeds into the Basel formulae for how much to set aside
Low default portfolios (like mortgages) do not have enough bads over 12 months to build good models
Use longer time intervals: 'goes bad at any time'
How to recover the 12-month PD?
Answer: survival analysis

Profitability modelling
Emphasis moving away from minimising default to maximising profit
Acceptance decisions (no longer yes/no)
Several variants of the product
Customized product
price appropriately for profit
customize non-price features on line

Operational decisions
Credit limit adjustments
Cross sell or up sell
Counter-attrition measures
optimise the collections process for defaulters
Behavioural score on its own is not enough

Current profit approach: risk/reward matrix

Overdraft limit       Balance < $1000   Balance $1000-$5000   Balance > $5000
Behav score > 500     $20,000           $25,000               $30,000
Behav score 300-500   $4,000            $8,000                $20,000
Behav score < 300     No overdraft      $1,000                $5,000

Use behavioural score (risk) and average balance (return)

No recognition of the dynamics of customer behaviour
Subjective decision in each cell; no optimization model within each cell
Overcome this by using dynamic models: survival analysis and Markov chains

PH model to calculate profit on a fixed-term loan
Build a PH model to estimate the time to default, and hence
S(i): probability of no default before month i
Similarly build a PH model for the time until early repayment, and hence
E(i): probability of no early repayment before month i
L - loan amount; T - term of the loan; a - repayment per period; r - interbank lending rate;
he(i) - hazard that early repayment occurs in period i; R(r, L) - amount repaid on early repayment

Profit (no consideration of default/early repayment) = sum_{i=1}^{T} a/(1+r)^i - L

True profit = sum_{i=1}^{T} S(i) E(i-1) [ a/(1+r)^i + he(i) R(r, L)/(1+r)^i ] - L

Can generalise and allow r to be time dependent (yield curve) or stochastic
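A minimal sketch of the expected-NPV calculation, on hypothetical flat hazards; for simplicity the early-repayment balance term R(r, L) is omitted, so this understates the true profit.

```python
def expected_npv(L, a, r, S, E):
    """Expected NPV of a fixed-term loan: in month i the payment a is received
    if there is no default by i (prob S(i)) and no early repayment by i-1
    (prob E(i-1)). Sketch only: the early-repayment balance term is omitted."""
    T = len(S)
    pv = 0.0
    for i in range(1, T + 1):
        e_prev = E[i - 2] if i >= 2 else 1.0   # E(0) = 1: no one repays before month 1
        pv += S[i - 1] * e_prev * a / (1 + r) ** i
    return pv - L

# Hypothetical 12-month loan with flat default and early-repayment hazards.
S = [0.995 ** i for i in range(1, 13)]    # no-default probability by month i
E = [0.99 ** i for i in range(1, 13)]     # no-early-repayment probability by month i
print(expected_npv(L=1000.0, a=90.0, r=0.005, S=S, E=E))
```

The result is necessarily below the risk-free NPV of the same repayment stream, since every cash flow is discounted by survival probabilities below one.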

Plots of profits for loans of different durations

[Chart: profit against default score, for increasing default probability.]

Markov chain models

Already used for roll rate analysis

Overdue by   0 months   1 month   2 months   3+ months
0 months     0.95       0.05      0          0
1 month      0.4        0.2       0.4        0
2 months     0.3        0.1       0.1        0.5
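Rolling a delinquency distribution forward through this matrix is one matrix-vector product per month. A sketch using the matrix above; treating 3+ months overdue as absorbing is an added assumption, since the slide gives no row for that state.

```python
# Roll-rate transition matrix over delinquency states (from the example):
# states: current, 1 month overdue, 2 months overdue, 3+ months overdue.
P = [
    [0.95, 0.05, 0.0, 0.0],
    [0.40, 0.20, 0.40, 0.0],
    [0.30, 0.10, 0.10, 0.50],
    [0.00, 0.00, 0.00, 1.00],   # 3+ months treated as absorbing (an assumption)
]

def step(dist, P):
    """One month of the chain: new_j = sum_i dist_i * P[i][j]."""
    return [sum(d * row[j] for d, row in zip(dist, P)) for j in range(len(P))]

dist = [1.0, 0.0, 0.0, 0.0]      # portfolio starts fully current
for _ in range(3):
    dist = step(dist, P)
print([round(x, 4) for x in dist])   # share in each state after 3 months
```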

Extend to more general states: BS1-BS4 (behavioural score bands), Closed, Overlimit, Default

[Transition matrix over the states BS1, BS2, BS3, BS4, Closed, Overlimit and Default, with a row of transition probabilities for each behavioural score band (e.g. band 1 stays in BS1 with probability .85).]

Credit limits set by a Markov chain profitability model
(Capital One; Interfaces 2003)
State s, credit limit L
Estimate monthly profit r(s, L) and transition probability p(s'|s, L)
Markov decision process: Vn(s, L) is the optimal profit over n periods starting in state s with limit L

Vn(s, L) = max_{L' >= L} [ r(s, L') + sum_{s'} p(s'|s, L') V_{n-1}(s', L') ]
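The recursion can be solved by backward induction. A toy sketch: two behavioural-score bands, two limits, and hypothetical rewards and transition probabilities (none of these numbers are from the Capital One model); the constraint that limits can only be raised is read from the max over L' >= L.

```python
# Value-iteration sketch of the credit-limit Markov decision process:
# V_n(s, L) = max_{L' >= L} [ r(s, L') + sum_s' p(s'|s, L') V_{n-1}(s', L') ]
states = [0, 1]                  # e.g. two behavioural-score bands (0 = better)
limits = [1000, 5000]

def r(s, L):                     # monthly profit (toy numbers)
    return {0: {1000: 10, 5000: 25}, 1: {1000: 2, 5000: -5}}[s][L]

def p(s2, s, L):                 # transition probabilities (toy numbers)
    good_stay = 0.9 if L == 1000 else 0.8
    if s == 0:
        return good_stay if s2 == 0 else 1 - good_stay
    return 0.3 if s2 == 0 else 0.7

V = {(s, L): 0.0 for s in states for L in limits}
for _ in range(12):              # 12-period horizon, working backwards
    V = {(s, L): max(r(s, L2) + sum(p(s2, s, L2) * V[(s2, L2)] for s2 in states)
                     for L2 in limits if L2 >= L)
         for s in states for L in limits}
print(V)                         # optimal 12-period profit per (state, limit)
```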

Can improve the model by

Second order Markov chain (s(t) = (BS(t), BS(t-1)))

Including economic variables in the transition matrix
Including the age of the loan in the transition matrix
Segmenting the population
Mover/stayer
Revolver/transactor

Pricing
Surprising that for 40 years consumer lending has had only one price
The decision was: is the risk acceptable / not acceptable?
Now beginning to price for risk

Company                        APR rate advertised
Yourpersonalloan.co.uk         6.3%
Bradford and Bingley           6.7%
GE Money                       6.9%
Sainsbury                      7.1%
Northern Rock                  7.4%
Royal Bank of Scotland         7.8%
Nat West Bank                  8.0%
Halifax                        8.7%
Nationwide Building Society    8.9%
Intelligent Finance            9.7%
Tesco                          9.9%
Lloyds TSB                     11.4%
Autocredit                     16.7%
Citi Finance                   23.1%
Provident                      177.0%

Key points in developing pricing models

Lost quote data is valuable
Find out who did not take the offer (and if possible why)

Regulations will set constraints on minimum and maximum prices

Could say 'take everyone' but set the price for some so high that no one will accept - but there are always idiots (whom the regulations will protect).

Market changes much faster than economic changes

response scorecards need to be rebuilt faster than risk ones

Utilization of the product is important for profitability

Pre-payment and re-financing need modelling
Prices always will be negotiable once they are variable.
Adverse selection
An offer at interest rate 6% does not get the normal population mix, but more of those who could not get a better offer than 6%

Take probability q(p, r)

Profitability depends vitally on the take probability
The take probability is a function of
the risk probability p of the borrower (prob of being good)
the rate offered r

The take probability can also depend on other features

Need to estimate this probability

Cannot estimate it without considering adverse selection (i.e. it does depend on p, and more so than you may estimate)

[Plot: Pr{Take} as a function of Pr{Good}.]

Common risk-free take functions q(r)

q(r) - fraction who will take the loan at rate r; dq/dr <= 0
w(r) - density function of maximum willingness to pay

integral_r^inf w(u) du = fraction of population willing to pay r or more = q(r)

Linear response function

q(r) = max{0, 1 - b(r - rL)}  for r - rL > 0

Logistic response function

q(r) = e^{a - br} / (1 + e^{a - br}),  i.e.  ln( q(r) / (1 - q(r)) ) = a - br = s_response

Optimal price for a risk-free response function

Max_r E[PA(r)] = q(r) ( (r - rF) p - (lD + rF)(1 - p) )

First-order condition:
q'(r) ( (r - rF) p - (lD + rF)(1 - p) ) + q(r) p = 0

so the optimal rate satisfies
r = rF + (lD + rF)(1 - p)/p - q(r)/q'(r)

Example with logistic response
a = 4, b = 32, rF = 0.05, lD = 0.5

Probability of being good, p   Optimal interest rate r (%)   Take probability q(r) (%)
0.5                            63.1                          0.000009
0.6                            44.8                          0.003
0.7                            31.7                          0.2
0.8                            22.0                          4.5
0.9                            15.5                          28.0
0.94                           13.7                          40.2
0.96                           13.0                          45.7
0.98                           12.4                          50.5
0.99                           12.2                          52.7
1.00                           11.9                          54.7
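The optimal rate can be found by simply searching over r rather than solving the first-order condition; a sketch with the slide's parameters, checked against the table's last row (p = 1.00 gives r of about 11.9%).

```python
import math

def expected_profit(r, p, a=4.0, b=32.0, rF=0.05, lD=0.5):
    """E[P(r)] = q(r) * ((r - rF) p - (lD + rF)(1 - p)) with logistic take
    probability q(r) = e^(a - b r) / (1 + e^(a - b r))."""
    q = 1.0 / (1.0 + math.exp(-(a - b * r)))
    return q * ((r - rF) * p - (lD + rF) * (1 - p))

def optimal_rate(p):
    rates = [i / 10000 for i in range(500, 10000)]   # grid over 5% .. 100%
    return max(rates, key=lambda r: expected_profit(r, p))

r_star = optimal_rate(p=1.0)
print(round(100 * r_star, 1), "%")   # ~11.9%, matching the table's p = 1.00 row
```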

Risky response rates q(r, p)

Same principle, but now have to worry about
Adverse selection
Affordability
p~(r, p) is the probability of the borrower being good if the interest rate charged is r, where p is the probability of being good at the benchmark interest rate

E[PA(r, p)] = q(r, p) ( (r(p) - rF) p~(r, p) - (lD + rF)(1 - p~(r, p)) )

First-order condition:
(dq/dr) ( (r(p) - rF) p~(r, p) - (lD + rF)(1 - p~(r, p)) ) + q(r, p) ( p~(r, p) + (r(p) + lD) dp~/dr ) = 0

which solves to
r(p) = -lD + [ (lD + rF) dq/dr - p~(r, p) q(r, p) ] / [ d( p~(r, p) q(r, p) )/dr ]

Example with logistic response
a = 4, b = 32, rF = 0.05, lD = 0.5 and c = 50

Probability of      Optimal interest   Take probability   Take probability from equivalent
being good, p       rate r (%)         q(r) (%)           risk-free logistic response (%)
0.5                 84.6               87.3               0.000009
0.6                 68.6               88.4               0.003
0.7                 53.3               87.4               0.2
0.8                 38.5               84.2               4.5
0.9                 24.4               76.5               28.0
0.94                19.1               70.6               40.2
0.96                16.6               66.5               45.7
0.98                14.2               61.3               50.5
0.99                13.0               58.2               52.7
1.00                11.9               54.7               54.7

Risky response rate

q(r, p) = e^{a - br + c(1-p)} / (1 + e^{a - br + c(1-p)}),  i.e.  ln( q / (1 - q) ) = a - br + c(1-p) = s_response

Profit scoring and pricing

Profit scoring involves much more of the organisation than default-based scoring
Risk-based pricing needs much more careful modelling and parameter estimation
Adverse selection
Cannibalisation

Other features, not just the price (interest rate charged), might affect the response rate
Dynamic price modelling will come
...proved successful in airlines, hotels, car rentals

It has arrived in consumer credit

HBOS claim benefits of 7 million per year already

Storing up trouble: data cleaning and parameter estimation

Reject inference:
sample biased because of those rejected in the past
Well established problem, with controversial but standard techniques used by industry
resurgence of interest; new ideas suggested and old approaches revisited. Some ideas coming from the economics literature
Surely 'accept 1 in n(s)' must be a satisfactory compromise

Drop/withdrawal (churn) inference
this group can be 2 to 5 times larger than the reject group
should they be in the sample? Could one make the product attractive to them?

Policy inference
Customer scores are used in more operating decisions, which will affect the subsequent performance of the customer, including default risk
Can one (and how to) construct what the performance/risk would have been under a vanilla operating policy?

New Basel Capital Accord
(started parallel implementation Jan 1 2007; started for real 1 January 2008)

The Basel committee of banking regulators (Fed etc.) required banks to set aside 8% of loans (the capital requirement) to cover risks of losses.

The new system is based on using banks' internal risk rating systems.

Risks are split into market, credit and operational; capital is set aside to cover each.

For credit risk, the minimum capital requirement can be set using an internal ratings based (IRB) model as well as the standard (fixed %) model.

In IRB models, segment the portfolio of loans and for each segment give
PD (long run average probability of default in the next 12 months)
LGD (downturn loss given default)
EAD (expected exposure at default)

These are used in the Basel formulae to calculate the capital needed to cover UL (unexpected loss due to credit risk).
EL (expected loss due to credit risk) should be covered by provisions.

For consumer lending, IRB is credit scoring.

Basel forces scores to forecast accurately, not just rank accurately.

Credit risk weighted assets for corporate and retail exposures

Capital needed is

Capital K = LGD . N( (1-R)^{-1/2} N^{-1}(PD) + (R/(1-R))^{1/2} N^{-1}(0.999) ) - PD . LGD,
multiplied for corporate exposures by the maturity adjustment (1 + (M - 2.5) b) / (1 - 1.5 b)

where N is the cumulative normal distribution, N^{-1} its inverse, and R the correlation.
Only covers unexpected risk; so if R = 0, K = 0; if R = 1, K = LGD(1 - PD).

Retail exposures
M = 1 (the maturity term disappears)
For mortgages R = 0.15
For revolving credit R = 0.04
For other retail

R = 0.03 (1 - e^{-35 PD})/(1 - e^{-35}) + 0.16 ( 1 - (1 - e^{-35 PD})/(1 - e^{-35}) )

Corporate exposures
b = (0.11852 - 0.05478 ln(PD))^2

R = 0.12 (1 - e^{-50 PD})/(1 - e^{-50}) + 0.24 ( 1 - (1 - e^{-50 PD})/(1 - e^{-50}) )
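The retail capital formula can be evaluated directly with the standard library's normal distribution; a sketch for a mortgage segment, with illustrative PD and LGD values (the correlation R = 0.15 is the mortgage value from the slides).

```python
from statistics import NormalDist

N = NormalDist().cdf
Ninv = NormalDist().inv_cdf

def retail_capital(pd_, lgd, R):
    """Basel retail capital per unit exposure (M = 1, no maturity adjustment):
    K = LGD * N( (1-R)^-0.5 * Ninv(PD) + (R/(1-R))^0.5 * Ninv(0.999) ) - PD * LGD."""
    k = N((1 - R) ** -0.5 * Ninv(pd_) + (R / (1 - R)) ** 0.5 * Ninv(0.999))
    return lgd * k - pd_ * lgd

# Mortgage example: R = 0.15, with illustrative PD = 2% and LGD = 0.5
print(round(retail_capital(0.02, 0.5, 0.15), 4))
```

Note the sanity check from the slide: as R goes to 0 the capital K goes to 0, since only unexpected loss is covered.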

Updated Basel capital requirements K to cover UL (LGD = 0.5)

[Chart: K (0 to 0.25) against PD (0 to 1.2) for residential, revolving, other retail and corporate exposures.]

Impact of the Basel Accord on credit scoring development

Need to estimate the calibration of the scorecard, not just its discrimination
Small numbers of defaults (180 days overdue) mean taking all defaults, not just the ones in a 12-month period
Estimate risk with data of different time periods
Cox's proportional hazards models

Loss given default (or recovery rate) is a completely new problem, where the outcome is a mix of
decisions by lenders (collect in house / use agent / sell off debt)
uncertainty about the borrower being willing/able to pay back

Stress testing and the need for long run average PD mean one has to incorporate economic variables into the default models, or at least into the dynamics of the default models
Mimic corporate credit risk models??

Problems with validating low default portfolios (LDP)

Problems:
Very few defaults to use in back testing, so one extra default makes a huge difference
Procyclicality will be more obvious
Subprime market is always in recession

Solutions:
Use as much data as you can
Make prudent assumptions

Low default portfolios: Pluto and Tasche (2005)

No defaults, assumption of independence
Use the largest set possible
Take the PD value whose lowest confidence limit is 0
Rating grades A, B, C with nA, nB, nC obligors
Assume the borrower ranking to be correct:
PD_A <= PD_B <= PD_C
The most prudent estimate of PD_A is obtained under the assumption
PD_A = PD_B = PD_C

Determine a confidence region for PD_A at confidence level gamma (e.g. gamma = 90%)

The confidence region is the set of values of PD_A such that the probability of not observing any default is higher than 1 - gamma
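The zero-defaults case reduces to one line: the most prudent PD is the largest value for which no defaults among n independent obligors is still plausible, i.e. (1 - PD)^n >= 1 - gamma. The obligor counts below are hypothetical.

```python
def prudent_pd(n_obligors, gamma=0.9):
    """Pluto-Tasche most prudent estimate with zero observed defaults:
    the largest PD for which observing no default among n independent
    obligors still has probability >= 1 - gamma: (1 - PD)^n >= 1 - gamma."""
    return 1.0 - (1.0 - gamma) ** (1.0 / n_obligors)

# Grade A under PD_A = PD_B = PD_C: pool all obligors (hypothetical counts).
nA, nB, nC = 100, 300, 600
print(round(prudent_pd(nA + nB + nC), 5))   # 90% confidence bound on PD_A
```

Pooling the grades makes the bound tighter, which is exactly why the most prudent ordering assumption is used for the top grade.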

Confidence limits: usual, and Pluto and Tasche

Usual approach: take the best estimate of PD from the actual data, with lower and upper gamma-confidence limits (if the estimate were true, what could happen in gamma of cases).

Pluto and Tasche approach: take the highest value of PD whose lower gamma-confidence limit still agrees with the PD from the actual data.

Low default portfolios: using survival analysis directly

Problem: how to calculate PD as default in the first 12 months, if using data that includes defaults and bads at any time?

Answer: use survival analysis - proportional hazard models - to estimate when the loan will default/go bad, rather than the probability it goes bad in 12 months. Take data on the whole portfolio for as long as you have it.

Use Cox's proportional hazard models to estimate the hazard function h(s, x) for a loan with characteristics x:

S(12) = e^{ - integral_0^12 h(s, x) ds }

So obtain the PD for a 12-month time horizon.
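With a piecewise-constant monthly hazard the integral is just a sum, so the 12-month PD falls out directly. The baseline hazard and scorecard value below are hypothetical.

```python
import math

def pd_12m(hazard_rates):
    """PD over a 12-month horizon from a piecewise-constant monthly hazard:
    S(12) = exp(-integral_0^12 h(s, x) ds), PD = 1 - S(12)."""
    integral = sum(hazard_rates[:12])      # each rate applies for one month
    return 1.0 - math.exp(-integral)

# Hypothetical Cox-style hazard: baseline scaled by the borrower's score.
baseline = [0.004] * 24                    # estimated from the whole portfolio
s_h = -0.5                                 # hypothetical scorecard value s_h(x)
hazard = [math.exp(-s_h) * h for h in baseline]   # h(s, x) = e^{-s_h(x)} h0(s)
print(round(pd_12m(hazard), 4))            # 12-month PD for this borrower
```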

Modelling loss given default

Very little work was done on modelling this until the mid 90s
Regression models used for corporate loan LGD models
Modelling approaches:
Regression on type of loan/company, economic conditions (needs lots of data points)
Segment and use historic averages (needs lots of defaults)
Build a model of the collections process

For consumer loans, modelling the collections process is the only option
LGD models in consumer lending have a mix of random events (defaulter will not pay, cannot pay) and decisions by the lender (what collection strategy to use)

Collections strategy
Strategic level:
Collect in house: 0 <= LGD <= 1 (though it can exceed both bounds)
Use an agency (who keep 40% of what they collect): 0.4 <= LGD <= 1
Sell off the debt (say at 5p in the £): LGD = 0.95

Operational level:
What sequence of contacts to make

Telephone contact possible?

Arrange repayment schedule
Letters: nice
Letters: nasty
Legal proceedings

LGD model for credit cards: decision tree approach

[Decision tree: after default, attempt to trace the defaulter (no trace leads to sell off). If traced, collect in house; if not satisfactory, pass to an agent; if still not satisfactory, to a second agent; and finally sell off the debt. Each stage ends either 'satisfactory' or passes the debt down the chain.]

Distribution of LGD for in-house collections

[Histogram of LGD, ranging from -0.100 to 1.125.]

Actual LGD can stray outside 0 to 1
LGD has spikes at LGD = 1 and LGD = 0
For agent/sold debt, the spike at LGD = 1 predominates
The distribution for 0 < LGD < 1 is not normal
Very poor R^2

Modelling LGD: mixture models

Need to model LGD as a mixed distribution; here there seemed to be three classes:

Class 1: LGD <= 0 - spike
Class 2: 0 < LGD < 0.4 - uniform distribution
Class 3: LGD >= 0.4 - regression

Class 1: agree and abide by a repayment schedule, LGD = 0
Class 2: pay back a reduced amount, LGD < 0.4
Class 3: no repayment schedule agreed to and abided by

Cumulative logistic regression to predict which class a borrower is in

Generalized linear model to estimate LGD within a class
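Once the two stages are fitted, the predicted LGD for a defaulter is a probability-weighted mix across the three classes. A sketch with toy class probabilities and a toy within-class regression prediction (none of these numbers come from a fitted model); the class-2 mean of 0.2 follows from the uniform distribution on (0, 0.4).

```python
def expected_lgd(class_probs, reg_lgd):
    """Mixture sketch: class 1 (spike at LGD = 0), class 2 (uniform on
    (0, 0.4), mean 0.2), class 3 (LGD >= 0.4, from a regression model).
    Expected LGD is the probability-weighted mix."""
    p1, p2, p3 = class_probs
    return p1 * 0.0 + p2 * 0.2 + p3 * reg_lgd

# e.g. cumulative-logistic class probabilities for one defaulter (toy values)
print(round(expected_lgd((0.3, 0.3, 0.4), reg_lgd=0.7), 2))   # -> 0.34
```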

Credit Risk models of corporate loan portfolios

Corporate credit risk models have been developed for last decade and some include economic
parameters which can be used for stress testing

Corporate credit risk models split into four classes

Structural models
Assume companies default when debts exceed assets ( Merton model)
Try to model the dynamics of their assets
Basel formula based on very simple version of this

Reduced form models


Cut to the chase: when will firms default, as a function of economic conditions?

Hazard (survival analysis) or intensity models build a model of the hazard rate

h_i(t): the chance that firm i defaults at time t, given it has not done so before
Markov chain rating-based models: model how firms' credit ratings change dynamically, with
one rating being defaulted

Actuarial models

Models at segment level, not individual level. Estimate default rate and LGD rate using actuarial
distributions and historic parameter estimates.
Very few assumptions, so can be used in the retail area; but where are the economic variables?

Scorecard based

Z-scores: less successful than consumer credit scoring, and no economic effects in them
Can we use these models to build stress tests for consumer loan portfolios?

No. The assumptions and the available data are so different, but the approach might work.

Introduce economic variables into consumer credit risk models

Introducing economic variables into credit risk models allows:
Estimating Long run average PD for Basel
Stress testing required by Basel
Ways of building correlation between defaults of different loans
Pricing portfolios for securitization

Comparison of retail and corporate risk environments and models

Corporate loans vs consumer loans:
Objective is to price bonds vs objective is to rank borrowers
Well established market vs no established market (only occasional securitization sales)
Market price continuously available vs no price available, as there are no public sales
Bonds only infrequently withdrawn vs consumers often leave the lender (attrition)
Contingent claim model says default occurs when loans exceed assets vs default caused by cash flow (the consumer has no idea of assets, nor can realise them)
Correlation of defaults related to correlation of assets, in turn related to correlation of share prices vs no share price surrogate for correlation of defaults
Economic conditions built into models vs economic conditions not in models

Corporate credit risk modelling


Corporate credit risk models include

Structural models
Assume default when debts exceed assets (Merton model)
Model dynamics of their assets (the Basel formula is a simple version)
Basel model: default when R_it < c_t, where R is the (standardised) assets of firm i and c reflects its loans:
R_it = w·F_t + √(1 − w²)·U_it, where F is a systemic factor (world economy) and U_i is idiosyncratic
c_t = α + β·z_{t−1}, where z are economic factors

p_{i,t}(f, z_{t−1}) = Pr{D_it = 1 | f, z_{t−1}} = N( (α + β·z_{t−1} − w·f_t) / √(1 − w²) )

Reduced form models

p_{i,t}(z_{t−1}) = N(α + β·z_{t−1})

Default mode: hazard (survival analysis) or intensity models build a model of the hazard rate
Mark to market: Markov chain rating-based models

Actuarial models
Models at segment level, not individual level. Estimate PD and LGD
using actuarial distributions / historic parameter estimates.
Factors (risks) used to give dynamics and correlations
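The conditional default probability in the one-factor structural model above can be evaluated directly; a small sketch, with illustrative (not calibrated) values for α, β, w and the factor values:

```python
import math

def norm_cdf(x):
    """Standard normal distribution function N(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def conditional_pd(f_t, z_prev, alpha=-2.0, beta=0.3, w=0.4):
    """p_{i,t}(f, z_{t-1}) = N((alpha + beta*z_{t-1} - w*f_t) / sqrt(1 - w^2)):
    the chance that standardised assets R_it = w*F_t + sqrt(1-w^2)*U_it
    fall below the threshold c_t = alpha + beta*z_{t-1}, given the
    systemic factor f_t and the economic factors z_{t-1}."""
    return norm_cdf((alpha + beta * z_prev - w * f_t) / math.sqrt(1.0 - w * w))

# A downturn (negative systemic factor) raises the conditional PD:
print(conditional_pd(f_t=0.0, z_prev=0.0))   # benign systemic conditions
print(conditional_pd(f_t=-2.0, z_prev=0.0))  # stressed systemic factor
```

This is the calculation the Basel formula performs, with f_t fixed at a bad (99.9th percentile) draw of the systemic factor.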

Structural Models for Consumer Credit
Reputation
Use behavioural score as measure of credit worthiness
Attach a value to creditworthiness
Default if debt > Value of credit worthiness
Translates into behavioural score above debt cut-off
Model dynamics of behavioural score

Affordability
Repay if cash flow means can afford repayment
Model dynamics of cash flow

Consumer credit default mode reduced form models
Extend Cox Proportional Hazard Models to get these
If t is time since loan started, the hazard of default at
time t for a person i with economic conditions EcoVar(t)
and behavioural score BehScr(t) is

h_i(t) = h_0(t) · exp( a·BehScr_i(t) + b·EcoVar_i(t) + c·Vintage_i )

h_0(t): months-on-books factor
BehScr_i(t): idiosyncratic risk
EcoVar_i(t): systemic risk
Vintage_i: vintage factor

Cox regression estimates the coefficients a, b and c; the Kaplan-Meier form of the distribution function is then used to recover the baseline hazard h_0(t)
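A sketch of how the fitted model would then be used, assuming the coefficients a, b, c and the baseline hazard have already been estimated (every number below is a hypothetical placeholder, not an output of a real Cox regression):

```python
import math

# Hypothetical fitted Cox coefficients and baseline hazard by month on books
a, b, c = -0.004, 0.5, 0.1                       # beh. score, economy, vintage
baseline = [0.002, 0.004, 0.005, 0.005, 0.004]   # h0(t) for t = 1..5

def hazard(t, beh_score, eco_var, vintage):
    """h_i(t) = h0(t) * exp(a*BehScr_i(t) + b*EcoVar_i(t) + c*Vintage_i)."""
    return baseline[t - 1] * math.exp(a * beh_score + b * eco_var + c * vintage)

def survival(months, beh_scores, eco_vars, vintage):
    """P(no default in the first `months` months) as the product of
    (1 - h_i(t)), with time-varying behavioural score and economic
    covariates, as in the hazard formula above."""
    s = 1.0
    for t in range(1, months + 1):
        s *= 1.0 - hazard(t, beh_scores[t - 1], eco_vars[t - 1], vintage)
    return s

beh = [620, 615, 600, 590, 585]   # declining behavioural score path
eco = [0.0, 0.1, 0.3, 0.5, 0.5]   # a worsening economy raises the hazard
print(survival(5, beh, eco, vintage=1))
```

Because the economic covariates enter the hazard directly, stress testing amounts to replacing the `eco` path with a stressed scenario and recomputing the survival curve.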

Consumer credit reduced form mark to market model: Markov chain approach
Think of rating grades as states of Markov Chain
So state is score band or default status ( 0,1,2,3+ overdue)
At least one state corresponds to default

The Markov assumption is that the state of the system describes all the information concerning the credit risk of the customer
Estimate transition probabilities of moving from state i
to state j in next time period
Use logistic regression to get transition probabilities to
be functions of economic variables
In stress test choose the economic variables for a stressed
scenario ( scenario could last over several periods)
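A toy version of this approach, with three score bands plus an absorbing default state (the logistic coefficients and the stressed scenario are invented for illustration, not estimated from data):

```python
import math

STATES = ["high band", "low band", "delinquent", "default"]  # default absorbs

def transition_matrix(z):
    """Transition probabilities as functions of an economic variable z:
    each downgrade probability is a logistic function of z, so a stressed
    economy (negative z) pushes probability mass towards default."""
    def logit(x):
        return 1.0 / (1.0 + math.exp(-x))
    d_high = 0.05 * logit(-z)   # high band -> low band
    d_low = 0.10 * logit(-z)    # low band -> delinquent
    d_del = 0.30 * logit(-z)    # delinquent -> default
    return [
        [1 - d_high, d_high, 0.0, 0.0],
        [0.05, 0.95 - d_low, d_low, 0.0],
        [0.0, 0.10, 0.90 - d_del, d_del],
        [0.0, 0.0, 0.0, 1.0],
    ]

def default_prob(start_dist, scenario):
    """Push the portfolio distribution through a scenario (one z value per
    period, so the scenario can last several periods) and read off the
    probability mass that has reached the default state."""
    dist = start_dist[:]
    for z in scenario:
        P = transition_matrix(z)
        dist = [sum(dist[i] * P[i][j] for i in range(4)) for j in range(4)]
    return dist[3]

start = [0.7, 0.3, 0.0, 0.0]  # portfolio split across the score bands
print(default_prob(start, scenario=[1.0] * 4))    # benign economy
print(default_prob(start, scenario=[-2.0] * 4))   # stressed scenario
```

The stress test is exactly the comparison in the last two lines: the same starting portfolio run through a benign and a stressed economic path.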

Are securitization problems due to credit scoring?
Securitized products were priced top down
What was the market paying last week?

Assumption that all products were essentially the same (or could be made so)
Little investigation of borrowers' credit scores and individual product features
No model of correlation between default risks
The previous portfolio credit risk models would allow a bottom-up approach

US subprime mortgage crisis: the other half of the disaster

Main reason was a conspiracy of optimism
Lenders: scores were low, but no one had defaulted for ages
Borrowers: house prices will go up, so we can refinance before the payments get high

Some lessons for scoring


Products had a hike in repayments
Allow for affordability in default probability (recall pricing)
Survival analysis (allow for rate terms)
If the scorecard is known, borrowers will work the system
"Scorecard doctors" guaranteed to increase a FICO score by 150 points

Conclusions

Profit scoring, pricing and customizing products, and the credit risk of portfolios of consumer loans are just a few of the new problems in the area.

It is still an exciting area where many different statistical, probability and OR techniques (Markov chains, survival analysis, support vector machines, Brownian processes) prove very useful.

After 50 years, research in credit scoring is as vital as ever, and will continue.

"All progress is based upon a universal innate desire of every organism to live beyond its income."
(Samuel Butler)
