
1.8. Large deviation and some exponential inequalities.


The theory of large deviations (Varadhan, 1984), concerning the chances of rare events, which typically decay exponentially, constitutes a major development in probability theory in the past few decades. Its original idea may be traced back to the Laplace principle in mathematics: for any Borel set $B \subset \mathbb{R}^d$ and measurable function $g(\cdot)$,
$$\lim_{t\to\infty} \frac{1}{t}\log\int_B e^{-t g(x)}\,dx = -\operatorname*{ess\,inf}_{x\in B} g(x), \qquad \lim_{t\to\infty} \frac{1}{t}\log\int_B e^{t g(x)}\,dx = \operatorname*{ess\,sup}_{x\in B} g(x),$$
should the integrals be finite. The classical form of large deviations in terms of iid random variables is due to Harald Cramér (1938). We present a brief account in the simplest form.
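As a quick numerical illustration of the Laplace principle, here is a minimal sketch assuming the toy choices $B = [-1,2]$ and $g(x) = x^2$ (so $\operatorname*{ess\,sup}_B g = 4$); these choices are not from the text, and the integral is evaluated in log-space to avoid overflow.

```python
# Numerical check of the Laplace principle for B = [-1, 2], g(x) = x^2;
# (1/t) log \int_B e^{t g(x)} dx should increase toward ess sup g = 4.
import math

def log_integral(t: float, n_grid: int = 200_000) -> float:
    """log of the integral of e^{t g(x)} over B, via a midpoint rule in log-space."""
    lo, hi = -1.0, 2.0
    dx = (hi - lo) / n_grid
    log_terms = [t * (lo + (k + 0.5) * dx) ** 2 for k in range(n_grid)]
    m = max(log_terms)  # log-sum-exp trick: factor out the largest term
    return m + math.log(sum(math.exp(v - m) for v in log_terms) * dx)

for t in [1, 10, 100, 1000]:
    print(f"t={t:5d}   (1/t) log integral = {log_integral(t) / t:.4f}")
```

The printed values climb toward 4 as $t$ grows, slowly, since the boundary maximum contributes only a polynomial prefactor.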
(i). Large deviation for iid r.v.s.
Example 1.12 (Value-at-risk) Suppose a portfolio is worth $W_0 = 1$ million dollars at inception. Assume the returns of the $i$-th trading period are $X_i$, which are iid. Then the portfolio is worth $W_n = \prod_{i=1}^n X_i$ at the end of the $n$-th trading period. The so-called value-at-risk, VaR, as a measurement of the risk of the portfolio, is defined as follows: the $n$-trading-period $p$-percentage VaR of the portfolio is the $c_n > 0$ such that
$$P(W_0 - W_n > c_n) = P(W_n < 1 - c_n) = p.$$
In other words, $c_n$ is the amount that the portfolio may lose, as much as or more, with chance $p$. In the financial industry, $p$ is commonly set to be small, for example 5% or 1%.
Consider a standard setup with $X_i$ being log-normal, i.e., $\log X_i \sim N(\mu, \sigma^2)$. The critical fact that we shall use in this example is that, for any $x < \mu$ such that $n(x-\mu)^2$ is large,
$$\frac{1}{n}\log P\Big(\frac{1}{n}\sum_{i=1}^n \log X_i < x\Big) = \frac{1}{n}\log P\big(N(\mu, \sigma^2/n) < x\big) = \frac{1}{n}\log \Phi\big(\sqrt{n}\,(x-\mu)/\sigma\big) \approx -\frac{(x-\mu)^2}{2\sigma^2},$$
where $\Phi(\cdot)$ and $\phi(\cdot)$ are the cdf and density of $N(0,1)$, since $\Phi(s) \approx \phi(s)/|s|$ as $s \to -\infty$. Then, for small $p$, we have
$$\frac{1}{n}\log p = \frac{1}{n}\log P(W_n < 1 - c_n) = \frac{1}{n}\log P\big(\log W_n < \log(1 - c_n)\big)$$
$$= \frac{1}{n}\log P\Big(\frac{1}{n}\sum_{i=1}^n \log X_i < \frac{\log(1 - c_n)}{n}\Big) \approx -\frac{[(1/n)\log(1 - c_n) - \mu]^2}{2\sigma^2}.$$
Suppose $(1/n)\log(1 - c_n) \approx a < \min(0, \mu)$ with $n(a-\mu)^2$ large. Then $p \approx e^{-nq}$, where
$$q = \frac{(a-\mu)^2}{2\sigma^2} \quad\text{and}\quad a = \mu - \sigma\sqrt{2q} < 0.$$
In other words, the portfolio may shrink at an average compound rate of $|a|$ over $n$ periods with chance $p \approx e^{-nq}$. For example, suppose $\mu = 0$, $\sigma = 1$, $q = 1/2$, $n = 6$. Then $a = -1$, and the portfolio may shrink to $e^{-6}$ with approximate chance $e^{-3}$.
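As a quick numerical check of this example, here is a sketch assuming the values $\mu = 0$, $\sigma = 1$, $q = 1/2$ used above, comparing $(1/n)\log p$ with $-q$ using only the exact normal cdf (via `math.erfc`):

```python
# Check of Example 1.12: p = P(W_n < e^{na}) computed exactly from
# log W_n ~ N(n*mu, n*sigma^2); compare (1/n) log p against -q.
import math

def std_normal_cdf(x: float) -> float:
    """Phi(x), via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

mu, sigma, q = 0.0, 1.0, 0.5
a = mu - sigma * math.sqrt(2.0 * q)  # average compound rate, a = -1

for n in [6, 50, 500]:
    # The event {W_n < 1 - c_n} with (1/n) log(1 - c_n) ~ a is {log W_n < n a}.
    p = std_normal_cdf((n * a - n * mu) / (sigma * math.sqrt(n)))
    print(f"n={n:4d}   (1/n) log p = {math.log(p) / n:+.4f}   -q = {-q:+.4f}")
```

For $n = 6$ the exponential approximation is still crude ($(1/6)\log p \approx -0.82$ versus $-0.5$), which is typical: large deviation statements are accurate only on the exponential scale, and the agreement improves as $n$ grows.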
We note that $(x-\mu)^2/(2\sigma^2)$ is the so-called rate function, or Cramér or entropy function, for $N(\mu, \sigma^2)$. The above calculation takes advantage of the log-normality assumption. For a general population distribution of the $X_i$, the limiting relation between $c_n$ and $p$ is answered by the large deviation theorem below.
Example 1.13 (Cramér's actuarial problem) Suppose $n$ clients have each paid a premium of $c$ dollars for life insurance over a period of time. Assume the claims are iid nonnegative random variables $X_1, \dots, X_n$. Suppose the total premium $nc$ is all the insurance company has with which to pay out the claims. The chance that the insurance company goes bankrupt is
$$P\Big(\sum_{i=1}^n X_i > cn\Big) = P(S_n/n > c) = P(S_n/n - \mu > c - \mu),$$
where $\mu$ is the common mean of the $X_i$. By the weak law of large numbers, this chance is close to 1 if $c < \mu$. Normally, the insurance company sets the premium $c > \mu$, and the chance of bankruptcy is then close to 0. Since bankruptcy is a life-and-death issue for the company, it is critical to have a precise estimate of this chance. By the following Cramér's theorem (Theorem 1.9), under suitable conditions,
$$P\Big(\sum_{i=1}^n X_i > cn\Big) = P(S_n/n > c) \approx e^{-nI(c)} \quad\text{for } c > \mu \text{ and large } n,$$
where $I(x) = \sup_t [xt - \log\psi(t)]$ and $\psi$ is the moment generating function of $X_i$, defined as follows.
Definition For a r.v. $X$ (or its distribution function), its moment generating function is $\psi(t) = E(e^{tX})$, $t \in (-\infty, \infty)$.

Note that $\psi(0) = 1$ for any r.v. or distribution, but the moment generating function is not necessarily finite everywhere. Should $\psi(\cdot)$ be finite in a neighborhood of 0, the $k$-th derivative of $\psi$ at 0 is the $k$-th moment of $X$, i.e.,
$$\psi^{(k)}(0) = E(X^k).$$
This explains why it is called the moment generating function.
The following are the moment generating functions of some commonly used distributions:

Binomial $B(n,p)$: $\psi(t) = (1 - p + pe^t)^n$;  Normal $N(\mu, \sigma^2)$: $\psi(t) = e^{\mu t + \sigma^2 t^2/2}$;
Poisson $P(\lambda)$: $\psi(t) = e^{-\lambda + \lambda e^t}$;  Exponential $E(\lambda)$: $\psi(t) = \lambda/(\lambda - t)$ for $t < \lambda$.
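As a sanity check of this table, here is a small Monte Carlo sketch assuming the illustrative choices $\lambda = 2$ and $t = 1/2$; the Poisson sampler is a hypothetical cdf-inversion helper, not a library routine.

```python
# Monte Carlo check of two mgf table entries, assuming lambda = 2, t = 0.5.
import math
import random

random.seed(0)
lam, t, reps = 2.0, 0.5, 200_000

# Exponential E(lambda): psi(t) = lambda / (lambda - t) for t < lambda.
mc_exp = sum(math.exp(t * random.expovariate(lam)) for _ in range(reps)) / reps
print(f"Exponential: MC {mc_exp:.4f}  vs exact {lam / (lam - t):.4f}")

def poisson_draw(lam: float) -> int:
    """Draw Poisson(lam) by inverting the cdf (fine for small lam)."""
    u, k, p = random.random(), 0, math.exp(-lam)
    cdf = p
    while u > cdf:
        k += 1
        p *= lam / k
        cdf += p
    return k

# Poisson P(lambda): psi(t) = exp(-lambda + lambda * e^t).
mc_pois = sum(math.exp(t * poisson_draw(lam)) for _ in range(reps)) / reps
print(f"Poisson:     MC {mc_pois:.4f}  vs exact {math.exp(-lam + lam * math.exp(t)):.4f}")
```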
Lemma Suppose a r.v. $X$ has finite moment generating function $\psi(\cdot)$ on $(-\infty, \infty)$. Then the rate function
$$I(x) = \sup_t\,(tx - \log\psi(t)), \qquad x \in (-\infty, \infty),$$
is a convex function with minimum 0 at $x = E(X)$.

We omit the proof. The essential part is that $tx - \log\psi(t)$ is concave in $t$, since $\log\psi(\cdot)$ is convex.
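A minimal check of the lemma, assuming $X \sim N(\mu, \sigma^2)$ with the illustrative values $\mu = 1$, $\sigma = 2$: here $\log\psi(t) = \mu t + \sigma^2 t^2/2$ and the supremum has the closed form $I(x) = (x-\mu)^2/(2\sigma^2)$, the rate function met in Example 1.12. The grid search is a crude stand-in for the exact maximization.

```python
# Grid-search the rate function I(x) = sup_t (tx - log psi(t)) for N(mu, sigma^2)
# and compare with the closed form (x - mu)^2 / (2 sigma^2).
import math

mu, sigma = 1.0, 2.0

def I_grid(x: float) -> float:
    """sup over a grid of t in [-5, 5] of t*x - log psi(t)."""
    return max(t * x - (mu * t + 0.5 * sigma**2 * t**2)
               for t in [k / 1000.0 for k in range(-5000, 5001)])

for x in [-1.0, 0.0, 1.0, 2.0, 3.0]:
    exact = (x - mu) ** 2 / (2 * sigma**2)
    print(f"x={x:+.1f}   grid I(x) = {I_grid(x):.4f}   exact = {exact:.4f}")
# The minimum 0 is attained at x = E(X) = mu, and the values are convex in x.
```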
Theorem 1.9 (Cramér's Theorem) Suppose $X, X_1, X_2, \dots$ are iid with mean $\mu$ and finite moment generating function $\psi(\cdot)$ on $(-\infty, \infty)$. Then,
$$\frac{1}{n}\log P(S_n/n > x) \to -I(x) \ \text{ for } x > \mu; \quad\text{and}\quad \frac{1}{n}\log P(S_n/n < x) \to -I(x) \ \text{ for } x < \mu.$$
Proof The proof uses the moment generating function and the Chebyshev/Markov inequality. First, for $x > \mu$ and $t > 0$,
$$P(S_n/n > x) = P(S_n > nx) \le e^{-ntx} E(e^{tS_n}) = e^{-ntx}\,\psi(t)^n = e^{-n(tx - \log\psi(t))}.$$
Therefore,
$$\frac{1}{n}\log P(S_n/n > x) \le -\sup_{t \ge 0}(tx - \log\psi(t)) = -\sup_{t \in (-\infty,\infty)}(tx - \log\psi(t)), \quad\text{for } x > \mu.$$
(The two suprema agree because $\log\psi(t) \ge t\mu$ by Jensen's inequality, so $tx - \log\psi(t) \le t(x - \mu) \le 0$ for $t \le 0$, while the value at $t = 0$ is 0.)
Let $F$ be the common distribution function of the $X_i$. For a fixed $t > 0$, let $X^*_i$ be iid with common cdf
$$F^*(s) = \frac{1}{\psi(t)} \int_{-\infty}^{s} e^{ta}\,dF(a).$$
Then, for any $y > x$ and $t > 0$,
$$P(S_n/n > x) \ge P(ny > S_n > nx) \ge e^{-nty} E\big(e^{tS_n} 1_{\{ny > S_n > nx\}}\big)$$
$$= e^{-nty} \int 1_{\{ny > \sum_{i=1}^n x_i > nx\}}\, e^{tx_1}\,dF(x_1) \cdots e^{tx_n}\,dF(x_n)$$
$$= e^{-nty}\,\psi(t)^n \int 1_{\{ny > \sum_{i=1}^n x_i > nx\}}\, \frac{e^{tx_1}}{\psi(t)}\,dF(x_1) \cdots \frac{e^{tx_n}}{\psi(t)}\,dF(x_n)$$
$$= e^{-n(ty - \log\psi(t))} \int 1_{\{ny > \sum_{i=1}^n x_i > nx\}}\, dF^*(x_1) \cdots dF^*(x_n)$$
$$= e^{-n(ty - \log\psi(t))}\, P\Big(y > \sum_{i=1}^n X^*_i/n > x\Big).$$
Choose $t$ such that $E(X^*_i) \in (x, y)$. Then $P(y > \sum_{i=1}^n X^*_i/n > x) \to 1$ by the WLLN, so
$$\liminf_{n\to\infty} \frac{1}{n}\log P(S_n/n > x) \ge -(ty - \log\psi(t)).$$
Now let $y \downarrow x$; then $t \to t_0$, where $t_0$ is such that $t_0 x - \log\psi(t_0) = \sup_t(tx - \log\psi(t))$. As a result,
$$\liminf_{n\to\infty} \frac{1}{n}\log P(S_n/n > x) \ge -(t_0 x - \log\psi(t_0)) = -\sup_t(tx - \log\psi(t)) = -I(x).$$
The first inequality of the theorem is proved. The other inequality, for $x < \mu$, can be shown analogously. □
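To see Theorem 1.9 at work in the setting of Example 1.13, here is a sketch assuming Exp(1) claims ($\mu = 1$, $\psi(t) = 1/(1-t)$ for $t < 1$, hence $I(c) = c - 1 - \log c$) and the illustrative premium $c = 1.5$. The tail of $S_n$ is exact here: for a sum of $n$ iid Exp(1) variables, $P(S_n > s) = e^{-s}\sum_{k=0}^{n-1} s^k/k!$ (the Erlang survival function), evaluated in log-space.

```python
# Exact check of Cramer's theorem for Exp(1) summands and c = 1.5:
# (1/n) log P(S_n/n > c) should converge to -I(c) = -(c - 1 - log c).
import math

def log_erlang_tail(n: int, s: float) -> float:
    """log P(S_n > s) for S_n a sum of n iid Exp(1) variables."""
    log_terms = [-s + k * math.log(s) - math.lgamma(k + 1) for k in range(n)]
    m = max(log_terms)  # log-sum-exp to avoid under/overflow
    return m + math.log(sum(math.exp(v - m) for v in log_terms))

c = 1.5
rate = c - 1 - math.log(c)                  # I(c) for Exp(1)
for n in [10, 100, 1000]:
    lhs = log_erlang_tail(n, n * c) / n     # (1/n) log P(S_n/n > c)
    print(f"n={n:5d}   (1/n) log P = {lhs:+.4f}   -I(c) = {-rate:+.4f}")
```

The slow convergence reflects the polynomial prefactor that the exponential rate ignores.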
(ii) Some exponential inequalities.
The above large deviation results require a common distribution for the r.v.s. If the r.v.s are independent but not necessarily identically distributed, generalizing large deviations is not that simple. However, some exponential inequalities, which can be derived relatively easily and readily generalized to U-statistics or martingales, are often useful.
Theorem 1.10 (Bernstein's inequality) Suppose $X_n$, $n \ge 1$, are independent with mean 0 and variances $\sigma_n^2$, satisfying
$$E(|X_n|^k) \le \frac{k!}{2}\,\sigma_n^2\, c^{k-2}, \qquad k \ge 2,$$
for some constant $c > 0$. Then, for all $x > 0$,
$$P(S_n/n > x) \le e^{-nx^2/[2(s_n^2/n + cx)]},$$
where $s_n^2 = \sum_{j=1}^n \sigma_j^2$.
Proof. The proof again uses the moment generating function. Write, for $|t| < 1/c$,
$$E(e^{tX_n}) \le 1 + E(tX_n) + \sum_{j=2}^\infty E(|tX_n|^j)/j! \le 1 + \frac{t^2\sigma_n^2}{2}\big(1 + |t|c + t^2c^2 + |t|^3c^3 + \cdots\big) = 1 + \frac{t^2\sigma_n^2}{2}\cdot\frac{1}{1 - |t|c} \le e^{t^2\sigma_n^2/(2 - 2c|t|)}.$$
Applying Chebyshev's inequality, for $0 < t < 1/c$,
$$P(S_n/n > x) = P(e^{tS_n} > e^{tnx}) \le e^{-tnx} E(e^{tS_n}) = e^{-tnx} \prod_{i=1}^n E(e^{tX_i}) \le e^{-tnx}\, e^{\sum_{i=1}^n t^2\sigma_i^2/(2 - 2ct)} = e^{-tnx + t^2 s_n^2/(2 - 2ct)}.$$
Choosing $t = nx/(s_n^2 + cnx)$, Bernstein's inequality follows. □
Remark. A slightly sharper inequality, called Bennett's inequality, can be obtained by choosing $t$ in the above proof to minimize $-tnx + t^2 s_n^2/(2 - 2ct)$:
$$P(S_n/n > x) \le \exp\Big\{-\frac{nx^2}{(s_n^2/n)\big(1 + \sqrt{1 + 2cx/(s_n^2/n)}\,\big) + cx}\Big\}.$$
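A small numerical comparison of the two bounds against the exact tail, assuming iid Rademacher signs $X_i = \pm 1$ (so $\sigma_i^2 = 1$, and the moment condition of Theorem 1.10 holds with $c = 1$); the values $n = 200$ and $x = 0.25$ are illustrative only.

```python
# Bernstein vs Bennett bounds for Rademacher sums, against the exact
# binomial tail: S_n/n > x  <=>  #{X_i = +1} > n(1 + x)/2.
import math

n, x, c, var = 200, 0.25, 1.0, 1.0   # var = s_n^2 / n

bernstein = math.exp(-n * x**2 / (2 * (var + c * x)))
bennett = math.exp(-n * x**2 / (var * (1 + math.sqrt(1 + 2 * c * x / var)) + c * x))

k0 = int(n * (1 + x) / 2)            # threshold count of +1's
exact = sum(math.comb(n, k) for k in range(k0 + 1, n + 1)) / 2**n

print(f"exact P(S_n/n > x) = {exact:.3e}")
print(f"Bernstein bound    = {bernstein:.3e}")
print(f"Bennett bound      = {bennett:.3e}")
```

Both bounds hold with considerable room to spare (exponential inequalities trade sharpness for generality), with Bennett's slightly tighter, as expected.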
Corollary Suppose $X_n$, $n \ge 1$, are independent with mean 0 and $P(|X_n| \le c) = 1$ for some $c > 0$ and all $n \ge 1$. Then, for $0 < x < c$,
$$\max\big\{P(S_n/n > x),\ P(S_n/n < -x)\big\} \le e^{-nx^2/(4c^2)}.$$
The proof of this corollary is straightforward, by observing that $s_n^2/n \le c^2$ and applying Bernstein's inequality to $\pm X_n$.
An important implication of the above corollary is that, for uniformly bounded random variables $X_i$ with mean 0,
$$\limsup_{n\to\infty} \frac{|S_n/n|}{\sqrt{(\log n)/n}} < \infty \quad\text{a.s.}$$
This can be proved by citing the Borel–Cantelli lemma and choosing, for large $n$, $x = C\sqrt{(\log n)/n}$ for a large constant $C$ in the inequality of the above corollary; a quick simulation follows below. Notice that the same convergence was also shown in Section 1.7 for iid r.v.s.
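Here is that quick simulation, a sketch assuming fair coin flips $X_i = \pm 1$ (bounded by $c = 1$, with mean 0); it is an illustration, not a proof.

```python
# Track |S_n/n| / sqrt(log n / n) along one path of fair coin flips.
import math
import random

random.seed(1)
s, checkpoints = 0, {10**k for k in range(2, 7)}
for n in range(1, 10**6 + 1):
    s += random.choice((-1, 1))
    if n in checkpoints:
        ratio = abs(s / n) / math.sqrt(math.log(n) / n)
        print(f"n={n:>8d}   |S_n/n| / sqrt(log n / n) = {ratio:.3f}")
# The ratio stays bounded; in fact it is typically much smaller than the
# constant that the Borel-Cantelli argument provides.
```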
The corollary's inequality is an essential building block in a technique, called empirical approximation, used to prove uniform convergence of random functions. We illustrate it with the following example.
Example 1.14 Let $X_1, \dots, X_n, \dots$ be iid with common cdf $F(\cdot)$ and empirical distribution $F_n(\cdot)$, i.e., $F_n(t) = \sum_{i=1}^n 1_{\{X_i \le t\}}/n$. Then,
$$\limsup_{n\to\infty} \frac{\sup_t |F_n(t) - F(t)|}{\sqrt{(\log n)/n}} \le 4 \quad\text{a.s.}$$
Proof Without loss of generality, assume $F(\cdot)$ is continuous. Since the variables $1_{\{X_i \le t\}} - F(t)$ are bounded by 1 with mean 0, the above corollary (with $c = 1$) implies, for every $t$ and all large $n$,
$$P\big(|F_n(t) - F(t)| > 4\sqrt{(\log n)/n}\big) \le 2e^{-4\log n} = 2n^{-4}.$$
Let $t_0 < t_1 < \cdots < t_{n^2}$ be such that $F(t_k) - F(t_{k-1}) = n^{-2}$ for $k = 1, \dots, n^2$. Then,
$$\sum_{n=1}^\infty P\Big(\sup_{1\le j\le n^2} |F_n(t_j) - F(t_j)| > 4\sqrt{(\log n)/n}\Big) \le \sum_{n=1}^\infty \sum_{j=1}^{n^2} P\big(|F_n(t_j) - F(t_j)| > 4\sqrt{(\log n)/n}\big) \le \sum_{n=1}^\infty 2n^2\, n^{-4} < \infty.$$
By the Borel–Cantelli lemma,
$$\limsup_{n\to\infty} \frac{\sup_{1\le j\le n^2} |F_n(t_j) - F(t_j)|}{4\sqrt{(\log n)/n}} \le 1 \quad\text{a.s.}$$
It then follows from the monotonicity of $F_n(\cdot)$ and $F(\cdot)$ and the fact that $F(t_k) - F(t_{k-1}) = n^{-2}$ that
$$\limsup_{n\to\infty} \frac{\sup_t |F_n(t) - F(t)|}{\sqrt{(\log n)/n}} \le 4 \quad\text{a.s.} \qquad\square$$
Remark. The rate in this example is not sharp: by the law of the iterated logarithm, the actual rate of convergence of $\sup_t |F_n(t) - F(t)|$ is $\sqrt{(\log\log n)/n}$ rather than $\sqrt{(\log n)/n}$.
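As a numerical companion to Example 1.14, here is a sketch assuming Uniform(0,1) samples (so $F(t) = t$), for which $\sup_t |F_n(t) - F(t)|$ is computable exactly from the order statistics:

```python
# Normalized Kolmogorov discrepancies sup|F_n - F| / sqrt(log n / n)
# for Uniform(0,1) samples; they should stay below the constant 4.
import math
import random

random.seed(2)
for n in [100, 1000, 10_000, 100_000]:
    xs = sorted(random.random() for _ in range(n))
    # For a sorted sample, the sup is attained at a jump of F_n.
    d_n = max(max((i + 1) / n - xs[i], xs[i] - i / n) for i in range(n))
    print(f"n={n:>7d}   sup|F_n - F| / sqrt(log n / n) = "
          f"{d_n / math.sqrt(math.log(n) / n):.3f}")
```

The normalized discrepancies sit well below 4, consistent with the remark that the sharp rate is in fact of the smaller order $\sqrt{(\log\log n)/n}$.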