Expected Shortfall Backtest

Testing Expected Shortfall
C. Acerbi and B. Szekely

MSCI Inc.
Workshop on systemic risk and regulatory market risk measures
Pullach, Germany, June 2014
Carlo Acerbi and Balazs Szekely
June 2014
1 / 59
Outline
Motivation and goals
Testing setting
Basel VaR backtest
Three tests for ES. Plus one
Results
Conclusions
Post Scriptum
Motivation and goals
Motivation
in the VaR/ES debate, backtesting has always been the main problem with ES. See for instance Yamai and Yoshiba (01)
it was the last obstacle to the adoption of ES in Basel N, which finally occurred in 2013, but model testing is still based on VaR
rich literature on VaR backtesting: Basel I (96), Kupiec (95), Christoffersen (98), Berkowitz (00), Engle and Manganelli (04), among others
few works on ES backtesting: notably Kerkhof and Melenberg (04) and Angelidis and Degiannakis (06)
Why is it difficult to test ES?
Fundamental reasons? Practical aspects? Power of the test? Model risk?
Confusion
"The nice thing about VaR is it's more or less transparently back-testable. You know what you're getting. With ES it's all clouded up with assumptions about distribution and arbitrary choices. When have you breached it? What exactly are you testing? When you go into the tail you are never quite sure..."
RISK Magazine, last week
The drama of non-elicitability of ES

Gneiting (11): VaR is elicitable, ES is not
"This negative result may challenge the use of the ES functional as a predictive measure of risk, and may provide a partial explanation for the lack of literature on the evaluation of ES forecasts, as opposed to quantile or VaR forecasts"
elicitability is a subtle concept: a statistic $x^*$ of $Y$ is elicitable if, for some scoring function $S$, it minimizes the expected score:
$x^* = \arg\min_x \mathbb{E}[S(x, Y)]$
What most people understood

ES is not backtestable, at all
a magnum champagne bottle gift for the VaR nostalgics
panic followed
"ES cannot be back-tested because it fails to satisfy elicitability ... If you held a gun to my head and said: 'We have to decide by the end of the day if Basel 3.5 should move to ES, or do we stick with VaR', I would say: 'Stick with VaR'"
Paul Embrechts, Imperial College, 2013
(certainly not a VaR fanatic!)
Examples of elicitable statistics

the mean is elicitable:
$\mu = \arg\min_m \mathbb{E}_X[S(m, X)]$, with $S(m, x) = (x - m)^2$
a quantile is elicitable:
$q_\alpha = \arg\min_q \mathbb{E}_X[S(q, X)]$, with $S(q, x) = (x - q)\,\left(\alpha - \mathbf{1}(x - q < 0)\right)$
when $\alpha = 1/2$ we retrieve the median $M$:
$M = \arg\min_m \mathbb{E}_X[S(m, X)]$, with $S(m, x) = |x - m|$
there is no scoring function $S$ that elicits ES:
$ES_\alpha = \arg\min_c \mathbb{E}_X[S(c, X)]$: such an $S(c, x)$ does not exist
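A minimal numerical sketch of the two positive examples, assuming a small made-up sample and a brute-force grid search (both our choices, purely illustrative): the point minimizing the average squared score is the sample mean, and the point minimizing the average pinball score $S(q, x) = (x - q)(\alpha - \mathbf{1}(x - q < 0))$ is an $\alpha$-quantile. We take $\alpha = 0.3$ so that the quantile of a 5-point sample is unique.

```python
def squared_score(m, x):
    # S(m, x) = (x - m)^2 elicits the mean
    return (x - m) ** 2

def pinball_score(q, x, alpha):
    # S(q, x) = (x - q)(alpha - 1{x - q < 0}) elicits the alpha-quantile
    return (x - q) * (alpha - (1.0 if x - q < 0 else 0.0))

def argmin_on_grid(avg_score, grid):
    # brute-force minimizer of an average score over candidate points
    return min(grid, key=avg_score)

sample = [-3.0, -1.0, 0.0, 2.0, 7.0]          # hypothetical data
grid = [i / 100 for i in range(-500, 801)]     # candidates -5.00, -4.99, ..., 8.00

best_mean = argmin_on_grid(
    lambda m: sum(squared_score(m, x) for x in sample) / len(sample), grid)
best_q = argmin_on_grid(
    lambda q: sum(pinball_score(q, x, 0.3) for x in sample) / len(sample), grid)
print(best_mean, best_q)  # sample mean and 30% sample quantile
```

No analogous single-argument score exists for ES, which is exactly the negative result above.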
Something is not quite right
if elicitable means backtestable, isn't it a bit strange that
banks have always backtested VaR, but never by exploiting its elicitability?
even the standard deviation is not elicitable?
Kerkhof and Melenberg, back in (04), had found that
"...contrary to common belief, ES is not harder to backtest than VaR if we adjust the level of ES. Furthermore, the power of the test for ES is considerably higher than that of VaR."
as a matter of fact, others reacted quite differently:
"ES is not elicitable. So, what?"
Dirk Tasche
Testing setting
Setting
we look at ES backtesting from a regulatory point of view
profit-loss: independent (but not i.i.d.) $X_t \sim F_t$, the real distributions, $t = 1, \dots, T$ ($T = 250$)
$P_t$: predicted (model) distributions
VaR and ES (with Basel confidence levels):
$VaR_\alpha = -P^{-1}(\alpha)$, with $\alpha = 1\%$
$ES_\alpha = -\frac{1}{\alpha}\int_0^\alpha P^{-1}(q)\,dq$, with $\alpha = 2.5\%$
we assume $P_t$ continuous and strictly monotonic (just for simplicity; inessential here). Then
$ES_\alpha = -\mathbb{E}[X \mid X + VaR_\alpha < 0]$
the assumption can easily be removed at the cost of heavier notation
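To make the two definitions concrete, here is a small sketch assuming a standard Gaussian model $P = N(0, 1)$ (a hypothetical choice; the results section uses Student-t). It evaluates $VaR_\alpha = -P^{-1}(\alpha)$ and $ES_\alpha = -\frac{1}{\alpha}\int_0^\alpha P^{-1}(q)\,dq$ with a midpoint rule, and compares the integral with the conditional-expectation form, which for a Gaussian has the closed form $\varphi(P^{-1}(\alpha))/\alpha$.

```python
from statistics import NormalDist

def var_es(dist, alpha, n_steps=100_000):
    """VaR_alpha = -P^{-1}(alpha); ES_alpha = -(1/alpha) * int_0^alpha P^{-1}(q) dq (midpoint rule)."""
    var = -dist.inv_cdf(alpha)
    h = alpha / n_steps
    integral = sum(dist.inv_cdf((k + 0.5) * h) for k in range(n_steps)) * h
    return var, -integral / alpha

std = NormalDist()                   # hypothetical model distribution P = N(0, 1)
var_1, _ = var_es(std, 0.01)         # Basel VaR level, alpha = 1%
var_25, es_25 = var_es(std, 0.025)   # Basel ES level, alpha = 2.5%
# Gaussian closed form of ES_alpha = -E[X | X + VaR_alpha < 0]
es_closed = std.pdf(std.inv_cdf(0.025)) / 0.025
print(round(var_1, 3), round(var_25, 3), round(es_25, 3), round(es_closed, 3))
```

The integral and the conditional expectation agree to numerical precision, as the continuity assumption guarantees.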
ES estimators
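standard estimator of ES for $N$ i.i.d. draws $X_i \sim P$, where $X_{i:N}$ denotes the $i$-th smallest draw:
$\widehat{ES}^{(N)}_\alpha(\vec X) = -\frac{1}{N\alpha}\left(\sum_{i=1}^{[N\alpha]} X_{i:N} + (N\alpha - [N\alpha])\,X_{[N\alpha]+1:N}\right)$
coherent for all $N$ and $\alpha$; consistent, asymptotically normal, with known variance
generalizes the idea of the average of the $N\alpha$ worst cases to $N\alpha \notin \mathbb{N}$
but biased: it always underestimates risk for finite $N$. No unbiased estimator is known for unknown $P$
conditional estimator, assuming VaR is known exactly:
$\widetilde{ES}^{(N)}_\alpha(\vec X) = -\frac{\sum_{i=1}^N X_i\,\mathbf{1}_{X_i + VaR_\alpha < 0}}{\sum_{i=1}^N \mathbf{1}_{X_i + VaR_\alpha < 0}}$
is unbiased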
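A direct transcription of the two estimators, as a sketch on a hypothetical toy sample (the function names are ours). The fractional weight $(N\alpha - [N\alpha])$ on the next order statistic is what extends "average of the $N\alpha$ worst cases" to non-integer $N\alpha$.

```python
import math

def es_standard(xs, alpha):
    """-(1/(N alpha)) * (sum of the [N alpha] worst outcomes + (N alpha - [N alpha]) * the next one)."""
    s = sorted(xs)                 # order statistics X_{1:N} <= ... <= X_{N:N}
    k = len(s) * alpha             # N * alpha, possibly non-integer
    m = math.floor(k)              # [N * alpha]
    total = sum(s[:m])
    if m < len(s):
        total += (k - m) * s[m]    # fractional weight on X_{[N alpha]+1:N}
    return -total / k

def es_conditional(xs, var):
    """Conditional estimator, VaR known exactly: minus the average of the exceptions."""
    tail = [x for x in xs if x + var < 0]
    if not tail:
        raise ValueError("no exceptions: estimator undefined")
    return -sum(tail) / len(tail)

xs = list(range(-4, 6))           # toy P&L sample: -4, -3, ..., 5
print(es_standard(xs, 0.25))      # N alpha = 2.5: (4 + 3 + 0.5 * 2) / 2.5 = 3.2
print(es_conditional(xs, 2.5))    # exceptions -4 and -3: (4 + 3) / 2 = 3.5
```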
Hypothesis testing
Goal
testing $VaR_t$ and $ES_t$ predictions against observed profit-loss realizations $x_t$
$H_0$: $P_t = F_t$
$H_1$: $F_t$ is riskier than $P_t$: $ES^F_{\alpha,t} > ES^P_{\alpha,t}$
we test only in the direction of risk underestimation
more specific $H_1$s in the following, for computing test power
Model-free test
We avoid any assumption on the nature of the predicted distributions $P_t$ (no location-scale family, no parametric models, ...)
We do not assume asymptotic convergence of any statistics either
Basel VaR backtest
Basel test for VaR exceptions (96)
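$H_0$: $b_t = \mathbf{1}_{x_t + VaR_{\alpha,t} < 0} \sim$ i.i.d. Bernoulli($\alpha$), $\forall t$
test statistic: $B = \sum_{t=1}^T b_t \sim \mathcal{B}(T, \alpha)$: yearly number of exceptions
one expects $\mathbb{E}[B] = T\alpha$
$H_1$: $VaR^P_{\alpha,t} = VaR^F_{\alpha',t}$ with $\alpha' > \alpha$ $\Rightarrow$ $B \sim \mathcal{B}(T, \alpha')$
one says that coverage is not $1 - \alpha = 99\%$ but only $1 - \alpha'$ (say 98%)
traffic-light system: yellow zone from the 95% significance level and red zone from 99.99%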
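A minimal sketch of the traffic light, assuming the zone of an observed count $b$ is read off the cumulative binomial probability $P(B \le b)$ under $H_0$ against the 95% and 99.99% levels above (the function names are ours). With $T = 250$ and $\alpha = 1\%$ this should reproduce the familiar Basel zones: green up to 4 exceptions, yellow from 5 to 9, red from 10.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(B <= k) for B ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

def basel_zone(exceptions, T=250, alpha=0.01):
    """Traffic light for the yearly number of VaR exceptions, B ~ B(T, alpha) under H0."""
    cum = binom_cdf(exceptions, T, alpha)
    if cum >= 0.9999:
        return "red"
    if cum >= 0.95:
        return "yellow"
    return "green"

for b in range(12):
    print(b, f"{binom_cdf(b, 250, 0.01):.4%}", basel_zone(b))
```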
Basel VaR test: power vs coverage
Figure: Fundamental review of the trading book: a revised market risk framework,
Basel Committee 2013
Basel VaR test: traffic light system
Criticism
Basel test addresses only unconditional coverage
independence of the exception arrival times should be tested separately
Christoffersen (98): likelihood ratio test for conditional coverage
$LR_{cc} = LR_{uc} + LR_{ind}$
in most practical cases, however, independence testing is left to visual inspection, which helps interpret exception clusters. Basel did not introduce any formal independence test
in the following we assume that independence is tested separately. We focus on unconditional ES coverage
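The slide only names Christoffersen's decomposition, so here is a minimal sketch of the standard likelihood-ratio statistics computed from a 0/1 exception sequence (function name and toy sequences are ours). $LR_{ind}$ uses the first-order Markov alternative on the $T - 1$ transitions and, as on the slide, $LR_{cc}$ is taken as the sum, which matches Christoffersen's joint test up to the treatment of the first observation. Each statistic would be compared with a chi-square quantile (1 degree of freedom for $LR_{uc}$ and $LR_{ind}$, 2 for $LR_{cc}$).

```python
import math

def _loglik(n0, n1, p):
    # log of (1 - p)^n0 * p^n1, with the convention 0 * log(0) = 0
    ll = 0.0
    if n0:
        ll += n0 * math.log(1 - p)
    if n1:
        ll += n1 * math.log(p)
    return ll

def christoffersen(hits, alpha):
    """Return (LR_uc, LR_ind, LR_cc) for a 0/1 list of VaR exceptions."""
    n1 = sum(hits)
    n0 = len(hits) - n1
    lr_uc = -2 * (_loglik(n0, n1, alpha) - _loglik(n0, n1, n1 / len(hits)))

    # transition counts n[i, j] = #{t : hits[t-1] = i, hits[t] = j}
    n = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
    for prev, cur in zip(hits, hits[1:]):
        n[prev, cur] += 1
    pi01 = n[0, 1] / max(n[0, 0] + n[0, 1], 1)
    pi11 = n[1, 1] / max(n[1, 0] + n[1, 1], 1)
    pi = (n[0, 1] + n[1, 1]) / (len(hits) - 1)
    lr_ind = -2 * (
        _loglik(n[0, 0] + n[1, 0], n[0, 1] + n[1, 1], pi)
        - _loglik(n[0, 0], n[0, 1], pi01)
        - _loglik(n[1, 0], n[1, 1], pi11)
    )
    return lr_uc, lr_ind, lr_uc + lr_ind

spread = [0] * 100
for t in (20, 50, 80):
    spread[t] = 1
clustered = [0] * 100
for t in (50, 51, 52):
    clustered[t] = 1
print(christoffersen(spread, 0.01), christoffersen(clustered, 0.01))
```

Both toy sequences have the same number of exceptions, hence the same $LR_{uc}$; only $LR_{ind}$ reacts to the cluster.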
Visual inspection
Three tests for ES. Plus one
Test 1
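test ES after having tested VaR
from
$\mathbb{E}\left[\frac{X_t + ES_{\alpha,t}}{ES_{\alpha,t}} \,\middle|\, X_t + VaR_{\alpha,t} < 0\right] = 0$
denoting $I_t = \mathbf{1}_{X_t + VaR_{\alpha,t} < 0}$, introduce
Test statistic 1
$Z_1(\vec X) = \frac{\sum_{t=1}^T \frac{X_t I_t}{ES_{\alpha,t}}}{\sum_{t=1}^T I_t} + 1$
$\mathbb{E}_{H_0}[Z_1] = 0$. ES underestimated if $Z_1 < 0$
the test averages over exceptions; it is insensitive to an excess of exceptions
$Z_1$ is defined as a pure number to sterilize changes in portfolio size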
Computing a p-value
under $H_0$, the distribution $P_{Z_1}$ of $Z_1(\vec X)$ is simulated by drawing independent $X_t \sim P_t$, $\forall t$
the realization $Z_1(\vec x)$ provides a p-value $p = F_{Z_1}(Z_1(\vec x))$, where $F_{Z_1}$ is the cumulative distribution function of $P_{Z_1}$
acceptance/rejection based on a chosen significance level, say 5%
type-2 error probabilities and test power are computed based on specific alternatives $H_1$
Main difficulty
storage of the tail of each distribution $P_t$, to simulate $Z_1$ under $H_0$. Technologically elementary, but a challenge for auditing
the observations on this slide apply to all the tests proposed in the following
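The simulation recipe above can be sketched as follows, assuming Gaussian predictions $P_t = N(0, \sigma_t^2)$ so that $VaR_{\alpha,t}$ and $ES_{\alpha,t}$ have closed forms (a hypothetical choice; the method itself is model-free and only needs samples from each $P_t$). Because $Z_1$ is undefined when a simulated year has no exceptions, the sketch conditions on at least one exception; that convention is our assumption, not taken from the slides.

```python
import random
from statistics import NormalDist

ALPHA = 0.025
STD = NormalDist()
Q_ALPHA = STD.inv_cdf(ALPHA)            # alpha-quantile of N(0, 1), negative
ES_UNIT = STD.pdf(Q_ALPHA) / ALPHA      # ES_alpha of N(0, 1)

def z1(xs, sigmas):
    """Z1 when P_t = N(0, sigma_t^2): VaR_t = -sigma_t * Q_ALPHA, ES_t = sigma_t * ES_UNIT."""
    hits = [x / (s * ES_UNIT) for x, s in zip(xs, sigmas) if x - s * Q_ALPHA < 0]
    return sum(hits) / len(hits) + 1.0 if hits else None

def p_value_z1(observed, sigmas, n_sims=5000, seed=0):
    """Monte Carlo p-value F_Z1(observed), drawing X_t ~ P_t independently under H0."""
    rng = random.Random(seed)
    sims = []
    while len(sims) < n_sims:
        xs = [rng.gauss(0.0, s) for s in sigmas]
        z = z1(xs, sigmas)
        if z is not None:               # assumption: condition on at least one exception
            sims.append(z)
    return sum(z <= observed for z in sims) / n_sims

sigmas = [1.0] * 250                    # hypothetical flat volatility forecasts
p = p_value_z1(-0.2, sigmas, n_sims=1000)
print(p, "reject" if p < 0.05 else "accept")
```

Storing `sigmas` is enough here only because the Gaussian family is assumed; in general one would have to store (the tail of) each $P_t$, which is exactly the auditing difficulty above.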
Test 2
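direct test for ES
from the unconditional expectation
$ES_{\alpha,t} = -\mathbb{E}\left[\frac{X_t I_t}{\alpha}\right]$
introduce
Test statistic 2
$Z_2(\vec X) = \sum_{t=1}^T \frac{X_t I_t}{T\alpha\,ES_{\alpha,t}} + 1$
the test averages over all days; it detects an excess of exceptions
$Z_2 = (Z_1 - 1)\,\frac{\sum_t I_t}{T\alpha} + 1$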
Test 3
direct test for ES
consider the r.v.s $U_t = P_t(X_t)$. Under $H_0$, $U_t \sim$ i.i.d. $U(0, 1)$
Berkowitz (01) proposes to test the tail of the empirical distribution of the $x_t$ for uniformity
We use this pseudo-uniform sample $\vec U$ to estimate ES
Test statistic 3
$Z_3(\vec X) = -\frac{1}{T}\sum_{t=1}^T \frac{\widehat{ES}^{(T)}_\alpha\left(P_t^{-1}(\vec U)\right)}{\mathbb{E}_V\left[\widehat{ES}^{(T)}_\alpha\left(P_t^{-1}(\vec V)\right)\right]} + 1$
where $\vec V \sim$ i.i.d. $U(0, 1)$
notice that the denominator is not $ES_{\alpha,t}$ but the expected value of the finite-sample estimator, to compensate for its bias. Analytical expressions are available for any $P_t$
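A minimal sketch of $Z_3$, assuming Gaussian models $P_t$ given as `statistics.NormalDist` objects (a hypothetical choice) and estimating the denominator $\mathbb{E}_V[\widehat{ES}^{(T)}_\alpha(P_t^{-1}(\vec V))]$ by Monte Carlo instead of the analytical expressions the slide mentions. It relies on the fact that $P_t^{-1}(\vec V)$ with $\vec V$ i.i.d. uniform is just an i.i.d. sample of size $T$ from $P_t$.

```python
import math
import random
from statistics import NormalDist

def es_hat(xs, alpha):
    """Standard ES estimator: -(1/(N alpha)) * (sum of the [N alpha] smallest values
    + (N alpha - [N alpha]) * the next one)."""
    s = sorted(xs)
    k = len(s) * alpha
    m = math.floor(k)
    total = sum(s[:m]) + ((k - m) * s[m] if m < len(s) else 0.0)
    return -total / k

def z3(us, models, alpha=0.025, n_sims=1000, seed=0):
    """Test statistic 3 for Gaussian models P_t (hypothetical choice).

    us     -- pseudo-uniforms U_t = P_t(x_t), each strictly between 0 and 1
    models -- one statistics.NormalDist per day t
    """
    T = len(us)
    rng = random.Random(seed)
    ratios = {}  # cache: identical models give identical ratios
    total = 0.0
    for p in models:
        key = (p.mean, p.stdev)
        if key not in ratios:
            numerator = es_hat([p.inv_cdf(u) for u in us], alpha)
            # Monte Carlo estimate of E_V[ES_hat(P_t^{-1}(V))]
            denominator = sum(
                es_hat([rng.gauss(p.mean, p.stdev) for _ in range(T)], alpha)
                for _ in range(n_sims)
            ) / n_sims
            ratios[key] = numerator / denominator
        total += ratios[key]
    return -total / T + 1.0

T = 250
models = [NormalDist(0.0, 1.0)] * T          # hypothetical constant predictions
grid = [(i + 0.5) / T for i in range(T)]      # evenly spread pseudo-uniforms
print(z3(grid, models, n_sims=300), z3([u * u for u in grid], models, n_sims=300))
```

Squaring the pseudo-uniforms pushes them toward 0, i.e. a heavier realized left tail than predicted, and $Z_3$ turns clearly negative; the evenly spread sample stays close to 0.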
Test 4
similar to Berkowitz (01), we can directly test the tail density, via the ES of the uniform distribution
Test statistic 4
$Z_4(\vec X) = \frac{\widehat{ES}^{(T)}_\alpha(\vec U)}{\mathbb{E}_V\left[\widehat{ES}^{(T)}_\alpha(\vec V)\right]} - 1$
where $\vec V \sim$ i.i.d. $U(0, 1)$
$\mathbb{E}_{H_0}[Z_4] = 0$. Risk underestimated if $Z_4 < 0$
not a test of the ES of the model, but a generic test of the tail density
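For $Z_4$ the denominator has a simple closed form: the standard estimator is linear in the order statistics, and for i.i.d. uniforms $\mathbb{E}[V_{i:T}] = i/(T+1)$. A sketch under that observation (function names are ours):

```python
import math

def es_hat(xs, alpha):
    """Standard ES estimator: -(1/(N alpha)) * (sum of [N alpha] smallest + fractional next)."""
    s = sorted(xs)
    k = len(s) * alpha
    m = math.floor(k)
    total = sum(s[:m]) + ((k - m) * s[m] if m < len(s) else 0.0)
    return -total / k

def expected_es_hat_uniform(T, alpha):
    """E_V[ES_hat(V)] for V i.i.d. U(0, 1), using E[V_{i:T}] = i / (T + 1)."""
    k = T * alpha
    m = math.floor(k)
    total = sum(i / (T + 1) for i in range(1, m + 1))
    if m < T:
        total += (k - m) * (m + 1) / (T + 1)
    return -total / k

def z4(us, alpha=0.025):
    """Test statistic 4: ES_hat(U) / E_V[ES_hat(V)] - 1; negative values signal a fatter tail."""
    return es_hat(us, alpha) / expected_es_hat_uniform(len(us), alpha) - 1.0

T = 250
grid = [i / (T + 1) for i in range(1, T + 1)]   # pseudo-uniforms sitting at their expectations
print(z4(grid), z4([u * u for u in grid]))
```

Both ES values of a uniform sample are negative, so their ratio is positive; pseudo-uniforms crowding near 0 shrink the numerator in absolute value and drive $Z_4$ below zero.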
Observations
tests 2 and 3 can naturally be extended to all spectral measures

test 1 can be extended to simple spectral measures, with piecewise
constant spectrum
Results
H0: Student-t; H1: scaled distributions
$H_0$: $F_t = P_t$, Student-t distribution
$H_1$: $F_t(x) = P_t(x/\gamma)$, scaled distribution ($\gamma > 1$)
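The power of a test against scaled alternatives can be estimated by simulation. Below is a sketch for Test 2, assuming a Gaussian model $P_t = N(0, 1)$ in place of the Student-t used on these slides, and an empirical 5% critical value obtained from simulated $H_0$ years; it does not reproduce the figures that follow. Under $H_1$ the real P&L is $X_t = \gamma Z_t$, which is exactly $F_t(x) = P_t(x/\gamma)$.

```python
import random
from statistics import NormalDist

ALPHA, T = 0.025, 250
STD = NormalDist()                          # hypothetical model P_t = N(0, 1) for all t
VAR = -STD.inv_cdf(ALPHA)                   # VaR_alpha of P
ES = STD.pdf(STD.inv_cdf(ALPHA)) / ALPHA    # ES_alpha of P

def z2(xs):
    """Test statistic 2 with constant predictions: sum_t X_t I_t / (T alpha ES) + 1."""
    return sum(x / (T * ALPHA * ES) for x in xs if x + VAR < 0) + 1.0

def simulate_z2(gamma, n_sims, rng):
    """Draws of Z2 when the real distribution is F(x) = P(x / gamma)."""
    return [z2([gamma * rng.gauss(0.0, 1.0) for _ in range(T)]) for _ in range(n_sims)]

def power(gamma, level=0.05, n_sims=2000, seed=0):
    """Share of H1 years rejected at the empirical `level` critical value of Z2 under H0."""
    rng = random.Random(seed)
    h0 = sorted(simulate_z2(1.0, n_sims, rng))
    critical = h0[int(level * n_sims)]      # empirical lower quantile of Z2 under H0
    h1 = simulate_z2(gamma, n_sims, rng)
    return sum(z < critical for z in h1) / n_sims

for gamma in (1.0, 1.1, 1.25, 1.5):
    print(gamma, power(gamma, n_sims=500))
```

At $\gamma = 1$ the rejection rate should hover around the 5% size; it grows with $\gamma$.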
H0: Student-t, ν = 100; H1: scaled distributions
H0: Student-t; H1: ES-coverage 95%, 90%
$H_1$: $F_t(x) = P_t(x/\gamma)$, again a scaled distribution, but labeled in terms of ES coverage:
$ES^P_\alpha = ES^F_{\alpha'}$, with $\alpha' = 5\%, 10\%$
analogous to the Basel VaR coverage tables
H0: Student-t, ν = 100; H1: ES-coverage 95%, 90%
H0: Student-t, ν = 100; H1: ν = 10, 5, 3
$H_1$: Student-t distribution with lower $\nu$
notice that the standard deviation is larger: $\sigma = \sqrt{\nu/(\nu - 2)}$
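A quick evaluation of $\sigma = \sqrt{\nu/(\nu - 2)}$ for the degrees of freedom used here shows how much wider the lower-$\nu$ alternatives are before any normalization (a sketch; the function name is ours):

```python
import math

def student_t_std(nu):
    """Standard deviation of a standard Student-t with nu > 2 degrees of freedom."""
    if nu <= 2:
        raise ValueError("the variance is infinite for nu <= 2")
    return math.sqrt(nu / (nu - 2))

for nu in (100, 10, 5, 3):
    print(nu, round(student_t_std(nu), 3))
```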
H0: Student-t, ν = 10; H1: ν = 5, 3
H0: Normalized Student-t, ν = 100; H1: ν = 10, 5, 3
$H_0$: $F_t = P_t$, Student-t distribution with $\sigma = 1$
$H_1$: normalized Student-t distribution with lower $\nu$ and $\sigma = 1$
H0: Normalized Student-t, ν = 10; H1: ν = 5, 3
H0: Normalized Student-t; H1: fixed VaR 97.5%
$H_0$: $F_t = P_t$, Student-t distribution with $\sigma = 1$
$H_1$: normalized Student-t distribution with lower $\nu$ and $\sigma = 1$
the distributions are offset so that they all have the same VaR 97.5%
alternative hypotheses built to analyze test 1
H0: Student-t, ν = 100; H1: fixed VaR 97.5%
H0: Student-t, ν = 10; H1: fixed VaR 97.5%
H0: Norm. Student-t, ν = 100; H1: fixed VaR 97.5%
H0: Norm. Student-t, ν = 10; H1: fixed VaR 97.5%
Summary of results
all tests for ES 97.5% generally display more power than the Basel test
for VaR 99% in identical conditions
test 1 is subordinated to testing VaR, but has strong power for model
misspecifications in the tail
test 2 and test 3 excel in different cases. Test 2 is more powerful on
scaled distributions. Test 3 is more powerful on distributions with different
tail index
Test 2: a very practical test

Test 2 has critical levels that are almost invariant with respect to the tail properties, in a range $\nu \in [5, +\infty)$ that spans all realistic cases of a firm-wide bank portfolio
it allows one to define a traffic light system that does not require the collection of the entire tail of $P_t$, but just the three numbers $x_t$, $ES_t$ and $I_t$

Critical levels
significance    Test 1          Test 2          Test 3
                5%      10%     5%      10%     5%      10%
ν = 3           -0.43   -0.27   -0.82   -0.59   -0.49   -0.32
ν = 5           -0.26   -0.17   -0.74   -0.55   -0.30   -0.22
ν = 10          -0.17   -0.12   -0.71   -0.53   -0.21   -0.16
ν = 100         -0.12   -0.08   -0.70   -0.53   -0.15   -0.12
Gaussian        -0.11   -0.08   -0.70   -0.53   -0.15   -0.11
Conclusions
Our results
ES is backtestable; this is certainly not a new result, but surprisingly it's worth reaffirming
we propose three tests for ES: the novelty of these tests is that they are non-parametric and contain no model assumptions. For this reason they represent valid proposals for regulatory purposes
all of these tests display superior power to the standard Basel VaR backtesting methodology
the main difficulty with backtesting ES is that you need to store the tail of all predictive distributions $P_t$. While this is not a conceptual problem, and certainly no longer a technological one either, it is still a challenge for an auditable process. This is the only difference between backtesting ES and VaR
one of the proposed tests displays a remarkable stability of the critical levels, which provides an opportunity to set up practical tests for which the storage of the predictive distributions is not needed
Elicitability
Elicitability of VaR has no relevance in the regulatory debate
Elicitability allows you to compare models which forecast the exact same process, based on point forecasts only. But to score the performance of a model against an absolute significance level, one still needs (or at least we don't see how one would not) either model assumptions or the recording of all predictive distributions
It's no coincidence that VaR in banks is backtested without exploiting its elicitability
Post Scriptum
By the way, ES is elicitable

well, not exactly, but consider the scoring function
$S(v, e, x) = \alpha\frac{e^2}{2} - e v\left(\alpha - \mathbf{1}_{x + v < 0}\right) + \left(e x - 2(v^2 - x^2)\right)\mathbf{1}_{x + v < 0} + 2\alpha v^2$
then you have
$\{VaR_\alpha, ES_\alpha\} = \arg\min_{v,e} \mathbb{E}_F[S(v, e, Y)]$
the only condition is that $4\,VaR_\alpha > ES_\alpha$, which is always true in non-crazy cases
this means that you can set up a contest among models that jointly forecast VaR and ES
we could call it joint elicitability of VaR and ES
Lambert, Pennock, Shoham (08) call this property 2-elicitability and prove it for variance and mean
General score function
most general scoring function, for all $W$:
$S^W(v, e, x) = \alpha\left(\frac{e^2}{2} + \frac{W v^2}{2} - e v\right) + \left(e(v + x) + \frac{W(x^2 - v^2)}{2}\right)\mathbf{1}_{x + v < 0}$
with
$ES_\alpha < W\,VaR_\alpha$
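A numerical check of joint elicitability, as a sketch under two assumptions of ours: the real distribution is $F = N(0, 1)$, which gives the expected score in closed form through $P(X < -v) = \Phi(-v)$, $\mathbb{E}[X\,\mathbf{1}_{X<-v}] = -\varphi(v)$ and $\mathbb{E}[X^2\,\mathbf{1}_{X<-v}] = \Phi(-v) + v\,\varphi(v)$, and the minimizer is found by a plain grid search. With $W = 4$ this is the scoring function of the previous slide, and the minimizer should land on $(VaR_\alpha, ES_\alpha)$ within the grid step.

```python
from statistics import NormalDist

STD = NormalDist()

def expected_score(v, e, alpha, W):
    """E[S^W(v, e, X)] for X ~ N(0, 1), with
    S^W = alpha (e^2/2 + W v^2/2 - e v) + 1{x + v < 0} (e (v + x) + W (x^2 - v^2) / 2)."""
    tail_p = STD.cdf(-v)                   # P(X + v < 0)
    tail_x = -STD.pdf(v)                   # E[X 1{X < -v}]
    tail_x2 = tail_p + v * STD.pdf(v)      # E[X^2 1{X < -v}]
    return (alpha * (e * e / 2 + W * v * v / 2 - e * v)
            + e * (v * tail_p + tail_x)
            + W * (tail_x2 - v * v * tail_p) / 2)

def grid_minimizer(alpha=0.025, W=4.0, step=0.005):
    """Brute-force argmin over v in [1.5, 2.5] and e in [1.8, 2.8]."""
    vgrid = [1.5 + step * i for i in range(201)]
    egrid = [1.8 + step * i for i in range(201)]
    return min(((v, e) for v in vgrid for e in egrid),
               key=lambda ve: expected_score(ve[0], ve[1], alpha, W))

v_star, e_star = grid_minimizer()
var = -STD.inv_cdf(0.025)
es = STD.pdf(STD.inv_cdf(0.025)) / 0.025
print(v_star, var, e_star, es)
```

The condition $ES_\alpha < W\,VaR_\alpha$ is what makes the stationary point a minimum in the $v$ direction; here $2.34 < 4 \times 1.96$ holds comfortably.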
A scoring function of VaR and ES
Thanks!