
TAIL RISK RESEARCH PROGRAM

Gini estimation under infinite variance


Andrea Fontanari - TU Delft and CWI
Nassim Nicholas Taleb - Tandon School of Engineering, NYU
Pasquale Cirillo - TU Delft

Abstract—Under infinite variance, the Gini coefficient cannot be reliably estimated using conventional non-parametric methods.
We study different approaches to the estimation of the Gini index in the presence of a heavy-tailed data generating process, that is, one with Paretian tails and/or in the stable distribution class with finite mean but non-finite variance (with tail index α ∈ (1, 2)).
While the Gini index is itself a measure of fat-tailedness, little attention has been paid to a significant downward bias in conventional applications, one that increases with lower values of α.
First, we show how the "non-parametric" estimator of the Gini index undergoes a phase transition in the symmetry structure of its asymptotic distribution as the data distribution shifts from the domain of attraction of a light-tailed distribution to the domain of attraction of a fat-tailed, infinite-variance one.
Second, we show how the maximum likelihood estimator outperforms the "non-parametric" one, requiring a much smaller sample size to reach efficiency.
Finally, we provide a simple correction mechanism for the small-sample bias of the "non-parametric" estimator, based on the distance between the mode and the mean of its asymptotic distribution, for the case of a heavy-tailed data generating process.

I. INTRODUCTION

Wealth inequality studies represent a field of economics and statistics exposed to fat-tailed data generating processes, often with infinite variance. This is not surprising, especially if we recall that the prototype of a fat-tailed distribution, the Pareto distribution, was first derived to model household income data [4].

However, the fat-tailedness of data can be problematic in the context of statistical estimation, since most of the basic results on efficiency and consistency no longer hold.

The scope of this work is, accordingly, to show how fat tails affect the estimation of one of the most celebrated measures of income inequality: the Gini index.

The literature concerning the Gini index estimator is wide and comprehensive; however, strangely enough, almost no attention has been paid to its behavior in the presence of fat tails. This is all the more curious since the Gini index is itself a measure of fat-tailedness and of its effect on the first moment of the distribution.

The commonly applied method (so-called "non-parametric") consists in simply measuring the Gini index from the empirical distribution of the available data (for precision, see Eq. 3).

This paper aims to close this gap by deriving the limiting distribution of the non-parametric Gini estimator when the data are fat tailed, and by analyzing possible strategies to reduce the impact of fat tails on the estimation. In particular, we will show how a maximum likelihood approach, despite the risk of misspecification of the model for the data, needs far fewer observations to reach efficiency than a "non-parametric" one.¹

By heavy tails we mean data generated by a power law distribution that generalizes the type I Pareto distribution, with density

    f(x) = \rho\, x_m^{\rho}\, x^{-\rho-1},  x > x_m.   (1)

Note that in this paper we assume x_m = 1 without loss of generality. In particular, we restrict our focus to distributions with finite mean and infinite variance, and therefore limit the class of power laws to those with tail exponent ρ ∈ (1, 2).

Table I and Figure 1 present our story numerically and graphically, and suggest its conclusion. Table I compares the Gini index obtained by the non-parametric estimator with the one obtained via Maximum Likelihood Estimation (MLE) of the tail exponent. As the first column shows, the convergence of the "non-parametric" estimator to the true value g = 0.833 is extremely slow, the estimate approaching it monotonically from below, when the data distribution has infinite variance. This suggests an issue not only in the tail structure of the distribution of the "non-parametric" estimator but also in its symmetry.

As we shall see, in order to avoid the poor convergence of the "non-parametric" estimator to the true value of the Gini index in the presence of an infinite-variance data generating process, we suggest an MLE approach, which recovers normality of the asymptotic distribution and improves the speed of convergence. We believe that in such a heavy-tailed framework the issues of a parametric approach are offset by the gains in the quality of the approximation.

Figure 1 provides additional evidence that the limiting distribution of the "non-parametric" Gini index loses its properties of normality and symmetry, shifting to a fatter-tailed and skewed limit as the distribution of the data enters the infinite-variance domain.

¹ We observe a similar bias affecting the nonparametric measurement of quantile contributions (of the type "the top 1% owns x% of the total wealth") [8]. This paper extends the problem to the more widespread Gini coefficient, and goes deeper by making links with the limit theorems.


TABLE I: Comparison of the "non-parametric" Gini estimator to the ML estimator, assuming a tail exponent α = 1.1.

    n (obs.)    Non-Parametric          MLE               Error
                Mean       Bias         Mean      Bias    Ratio
    10^3        0.711      -0.122       0.8333    0       1.4
    10^4        0.750      -0.083       0.8333    0       3
    10^5        0.775      -0.058       0.8333    0       6.6
    10^6        0.790      -0.043       0.8333    0       156
    10^7        0.802      -0.033       0.8333    0       > 10^5

As we will prove in Section II, this is exactly what happens: when the data generating process is in the domain of attraction of a fat-tailed distribution, the asymptotic distribution of the Gini index moves away from Gaussianity towards a totally right-skewed α-stable limit.

This change of behavior is responsible for the main problem of the "non-parametric" estimator of the Gini index: a downward bias, for almost every sample size, when data are heavy tailed.

This result also suggests another possible way to improve the quality of the "non-parametric" estimator, in case a maximum likelihood approach is not preferred. The idea is to correct for the skewness of the distribution of the "non-parametric" estimator in order to place its mode on the true value of the Gini index. This correction, we will show, improves the consistency and the bias of the estimator, while still allowing the use of a non-parametric approach. The correction reduces the risk of underestimating the Gini index.

The remainder of the paper is organized as follows. In Section II we derive the asymptotic distribution of the sample Gini index when the data distribution has infinite variance. We subsequently provide an example with Pareto distributed data and compare the quality of the limit distribution of the maximum likelihood estimator with that of the "non-parametric" one. Finally, in Section III we propose a simple correction mechanism, based on the mode-mean distance of the asymptotic distribution of the "non-parametric" estimator, to adjust the bias present in small samples.

II. ASYMPTOTIC AND PRE-ASYMPTOTIC RESULTS ON THE "NON-PARAMETRIC" ESTIMATOR

In this Section, we derive the asymptotic distribution of the "non-parametric" estimator of the Gini index when the data generating process is fat tailed with finite mean but infinite variance.

The so-called stochastic representation of the Gini index, denoted by g, is

    g = \frac{1}{2} \frac{E(|X' - X''|)}{\mu},   (2)

where X' and X'' are independent, identically distributed copies of a random variable X with c.d.f. F(x), support [c, ∞), c > 0, and finite mean E[X] = µ. The quantity E(|X' − X''|) is known as the "Gini mean difference" (GMD). The Gini index is thus the mean expected deviation between any two independent draws from a random variable, scaled by twice its mean [10], [1].

What we call the "non-parametric" estimator of the Gini index of a sample (X_i)_{1≤i≤n} is

    G^{NP}(X_n) = \frac{\sum_{1 \le i < j \le n} |X_i - X_j|}{(n-1) \sum_{i=1}^{n} X_i},   (3)

which can also be expressed as

    G^{NP}(X_n) = \frac{\sum_{i=1}^{n} \left( 2\frac{i-1}{n-1} - 1 \right) X_{(i)}}{\sum_{i=1}^{n} X_{(i)}} = \frac{\frac{1}{n}\sum_{i=1}^{n} Z_{(i)}}{\frac{1}{n}\sum_{i=1}^{n} X_i},   (4)

where X_{(1)}, X_{(2)}, ..., X_{(n)} are the order statistics of X_1, ..., X_n, such that X_{(1)} < X_{(2)} < ... < X_{(n)}, and Z_{(i)} = \left( 2\frac{i-1}{n-1} - 1 \right) X_{(i)}.

The asymptotic normality of (4) under the finite-variance assumption has already been shown by different authors. The result follows from the properties of the U-statistics and the L-estimators involved in formulation (4) (for additional details see for example [4]).
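For concreteness, here is a minimal sketch (ours) of the estimator in Eqs. (3)–(4); the two functions implement the pairwise and the order-statistics forms and agree up to floating-point error.

```python
import numpy as np

def gini_nonparametric(x):
    """'Non-parametric' Gini estimator in the order-statistics form of Eq. (4)."""
    x = np.sort(np.asarray(x, dtype=float))        # X_(1) <= ... <= X_(n)
    n = x.size
    i = np.arange(1, n + 1)
    weights = 2.0 * (i - 1) / (n - 1) - 1.0        # coefficients of X_(i) in Eq. (4)
    return np.sum(weights * x) / np.sum(x)

def gini_pairwise(x):
    """Equivalent pairwise form of Eq. (3); O(n^2), for small samples only."""
    x = np.asarray(x, dtype=float)
    n = x.size
    sum_abs_diff = np.abs(x[:, None] - x[None, :]).sum() / 2.0   # sum over i < j
    return sum_abs_diff / ((n - 1) * x.sum())
```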
We introduce the following notation for α-stable distributions. A random variable X is distributed according to an α-stable distribution if

    X \sim S(\alpha, \beta, \gamma, \delta),

where α ∈ (0, 2) is the tail parameter, β ∈ [−1, 1] is the skewness, γ ∈ R⁺ is the scale and δ ∈ R is the location. Further, the standardized α-stable random variable is written

    Z_{\alpha,\beta} \sim S(\alpha, \beta, 1, 0).   (5)

α-stable distributions are a subclass of infinitely divisible distributions; thanks to their stability under summation, they can be used to describe the entire class of limit distributions arising from Central Limit Theorem type arguments. In particular, for α = 2 we obtain the normal distribution, the limit of the most classical central limit theorem; for other values of α we obtain other types of limiting distributions. We use the following notation for a random variable in the domain of attraction of an α-stable distribution: X ∈ DA(S_α) [3].

Given the possible confusion that can arise in the parametrization of α-stable distributions, we clarify which definition we are using. We adopt the parametrization presented in [7], also called the S1 parametrization [6].


Fig. 1: Empirical distribution of the Gini "non-parametric" estimator for type I Pareto distributed data with different tail indices (results have been centered to ease comparison).

Under this parametrization the characteristic function of the α-stable law is given by

    E(e^{itX}) = \exp\!\left( -\gamma^{\alpha} |t|^{\alpha} \left( 1 - i\beta\, \mathrm{sign}(t)\, \tan\tfrac{\pi\alpha}{2} \right) + i\delta t \right),   α ≠ 1,
    E(e^{itX}) = \exp\!\left( -\gamma |t| \left( 1 + i\beta\, \tfrac{2}{\pi}\, \mathrm{sign}(t)\, \ln|t| \right) + i\delta t \right),   α = 1.

Since we are dealing with finite-mean distributions, we restrict ourselves to the case α ∈ (1, 2). Our parametrization therefore defines a location-scale family for every choice of α ∈ (1, 2), and there is no need to use the S0 parametrization introduced by J. Nolan in [6]. Additionally, for α ∈ (1, 2) the expected value of an α-stable random variable coincides with the location parameter: E(X) = δ. For more references, see [7], [6].

A. The α-stable asymptotic limit for the Gini index

We are now ready to present our main result: the asymptotic distribution of the Gini index estimator, as presented in (4), when the data generating process is fat tailed and, more specifically, has infinite variance. More formally, we assume that our observations (X_i)_{1≤i≤n} are i.i.d., generated by a random variable X in the maximum domain of attraction of a Fréchet distribution [2], X ∈ MDA(Φ(ρ)) with ρ ∈ (1, 2), such that

    P(\max(X_1, ..., X_n) \le x) \overset{d}{\to} e^{-x^{-\rho}}.

The result is divided into two theorems: Theorem II.1 takes care of the limiting distribution of the "Gini Mean Difference" (GMD), the numerator in Equation 3, while Theorem II.2 completes the proof for the whole Gini index.

Theorem II.1. Consider a sequence (X_i)_{1≤i≤n} of i.i.d. random variables from a distribution X on [c, +∞), c > 0, such that F is in the maximum domain of attraction of a Fréchet random variable, X ∈ MDA(Φ(ρ)), where we restrict to the case ρ ∈ (1, 2). Then the sample Gini mean difference (GMD), GMD_n = \frac{1}{n}\sum_{i=1}^{n} Z_{(i)}, satisfies the following limit in distribution:

    \frac{n^{-\frac{1}{\rho}} \left( \sum_{i=1}^{n} Z_{(i)} - n\theta \right)}{L_0(n)} \overset{d}{\to} Z_{\rho,1},   (6)

where θ = E[Z_{(i)}], L_0(n) is a slowly varying function such that Equation 7 holds, and Z_{ρ,1} is a standardized α-stable random variable as defined in (5).

Proof. Theorem 3.1 (ii) in [5] proves the existence of a weak limit for the GMD, prior to the identification of the right scaling sequences, applied to the sequence of i.i.d. random variables Z_i = (2F(X_i) − 1)X_i, where F(X) is the integral probability transform of X ∼ F. Therefore what is left to prove is the characterization of the scaling sequences and of the limiting distribution.

To start, recall that by assumption X ∈ MDA(Φ(ρ)). A standard result in extreme value theory [2] characterizes the tail of a distribution in the Fréchet domain as P(|X| > x) ∼ L(x) x^{−ρ}, where L(x) is a slowly varying function. Recall [3] that this is a characterization of distributions in the domain of attraction of an α-stable distribution as well.


Therefore X ∈ MDA(Φ(ρ)) is equivalent to X ∈ DA(S_ρ) with ρ ∈ (1, 2). This result enables us to use a Central Limit Theorem type argument for the convergence of the sum in the estimator.

However, we first need to prove that the r.v. Z belongs to DA(S_α) as well, i.e. that P(|Z| > z) ∼ L(z) z^{−ρ} with ρ ∈ (1, 2) and L(z) slowly varying. Note that

    P(|\tilde{Z}| > z) \le P(|Z| > z) \le P(2X > z),

where \tilde{Z} = (2U − 1)X with U ∼ unif[0, 1] and U ⊥ X. The first bound holds because of the positive dependence between X and F(X), and can be proven rigorously by noting that 2UX ≤ 2F(X)X by the rearrangement inequality. The upper bound is trivial. By assumption, P(2X > z) ∼ 2^{ρ} L(z) z^{−ρ}.²

² Note that L(z/2) ∼ L(z) by the definition of a slowly varying function.

In order to show that, in addition, \tilde{Z} ∈ DA(S_α), we use Breiman's Theorem [9], which ensures the stability of the Fréchet class under products, as long as the second random variable is not too heavy tailed. To apply the theorem we rewrite P(|\tilde{Z}| > z) as

    P(|\tilde{Z}| > z) = P(\tilde{Z} > z) + P(-\tilde{Z} > z) = P(\tilde{U}X > z) + P(-\tilde{U}X > z),

where \tilde{U} ∼ unif[−1, 1] and \tilde{U} ⊥ X. We focus on P(\tilde{U}X > z), since for P(−\tilde{U}X > z) the procedure is the same. We have

    P(\tilde{U}X > z) = P(\tilde{U}X > z \mid \tilde{U} > 0)\, P(\tilde{U} > 0) + P(\tilde{U}X > z \mid \tilde{U} \le 0)\, P(\tilde{U} \le 0).

For z → +∞, P(\tilde{U}X > z | \tilde{U} ≤ 0) → 0, while, applying Breiman's Theorem, P(\tilde{U}X > z | \tilde{U} > 0) becomes

    P(\tilde{U}X > z \mid \tilde{U} > 0) \to E(\tilde{U}^{\rho} \mid \tilde{U} > 0)\, P(X > z)\, P(\tilde{U} > 0).

Therefore

    P(|\tilde{Z}| > z) \to \frac{1}{2} E(\tilde{U}^{\rho} \mid \tilde{U} > 0)\, P(X > z) + \frac{1}{2} E((-\tilde{U})^{\rho} \mid \tilde{U} \le 0)\, P(X > z),

from which

    P(|\tilde{Z}| > z) \to P(X > z) \left[ \frac{1}{2} E(\tilde{U}^{\rho} \mid \tilde{U} > 0) + \frac{1}{2} E((-\tilde{U})^{\rho} \mid \tilde{U} \le 0) \right] = \frac{2^{\rho}}{1-\rho}\, P(X > z) \sim \frac{2^{\rho}}{1-\rho}\, L(z)\, z^{-\rho}.

We then conclude, by the squeeze theorem, that

    P(|Z| > z) \sim L(z)\, z^{-\rho}

as z → ∞, and therefore Z ∈ DA(S_α) with α = ρ. We are now ready to invoke the generalized Central Limit Theorem [2] for the sequence Z_i:

    \frac{\sum_{i=1}^{n} Z_i - n\theta}{L_0(n)\, n^{\frac{1}{\rho}}} \overset{d}{\to} Z_{\rho,\beta},

where θ = E(Z_i) and L_0(n) = c_n n^{-\frac{1}{\rho}}, with c_n a sequence which must satisfy the relation

    \lim_{n \to \infty} \frac{n L(c_n)}{c_n^{\rho}} = \frac{1-\rho}{\Gamma(2-\rho) \cos\!\left(\frac{\pi\rho}{2}\right)} = C_{\rho},   (7)

and Z_{ρ,β} is a standardized α-stable r.v. The skewness parameter β must satisfy

    \frac{P(Z > z)}{P(|Z| > z)} \to \frac{1+\beta}{2}.

Recalling that by construction Z ∈ [−c, +∞), the above expression reduces to

    \frac{P(Z > z)}{P(Z > z) + P(-Z > z)} \to 1 \to \frac{1+\beta}{2},   (8)

and therefore β = 1. Hence, by applying result (ii) of Theorem 3.1 in [5], we conclude the existence of the same limiting α-stable distribution also for the ordered version Z_{(i)}. □

Theorem II.2. Under the same assumptions as Theorem II.1, the estimated Gini index G^{NP}(X_n) satisfies the following limit in distribution:

    n^{\frac{\rho-1}{\rho}} \left( \frac{G^{NP}(X_n) - g}{L_0(n)} \right) \overset{d}{\to} Q,   (9)

where g = E(G^{NP}(X_n)) and Q is an α-stable random variable S(ρ, 1, 1/µ, 0).

Proof. In Theorem II.1 we proved that \sum Z_{(i)} \overset{d}{\to} Z_{\rho,\beta} if \sum Z_i \overset{d}{\to} Z_{\rho,\beta}. Recall that by (4) the Gini index can be written as \sum Z_{(i)} / \sum X_{(i)}. Therefore it is sufficient to prove that \sum Z_i / \sum X_i \overset{d}{\to} \Lambda in order to prove that \sum Z_{(i)} / \sum X_{(i)} \overset{d}{\to} \Lambda. We achieve this through a Slutsky-type argument.

Call Y_n the sequence \frac{\sum_{i=1}^{n} Z_{(i)} - n\theta}{n^{1/\rho} L_0(n)}. By Theorem II.1 we have that Y_n \overset{d}{\to} Z_{\rho,1}. By the Weak Law of Large Numbers we also have that m_n = \frac{\sum X_i}{n} \overset{p}{\to} \mu. By Slutsky's Theorem, \frac{Y_n}{m_n} \overset{d}{\to} \frac{1}{\mu} Z_{\rho,1}.

What is left to prove is that (9) also converges in distribution to the same limit. A well-known theorem in probability theory [3] states that if a sequence W_n \overset{d}{\to} \Lambda and W_n − V_n = o_p(1), with V_n another sequence, then V_n \overset{d}{\to} \Lambda.


Take W_n = \frac{Y_n}{m_n} and let V_n be the sequence defined by V_n = n^{\frac{\rho-1}{\rho}} \left( \frac{G^{NP}(X_n) - g}{L_0(n)} \right); we prove that

    \frac{Y_n}{m_n} - V_n \overset{p}{\to} 0,   (10)

which reduces to showing that

    \frac{n^{\frac{\rho-1}{\rho}}\, \theta}{L_0(n)} \left( \frac{n}{\sum_i X_i} - \frac{1}{\mu} \right) \overset{p}{\to} 0.   (11)

Thanks to the continuous mapping theorem, \frac{n}{\sum_i X_i} \overset{p}{\to} \frac{1}{\mu}; in particular \frac{n}{\sum_i X_i} - \frac{1}{\mu} = o_p(n^{-1}). Therefore Equation (11) goes to zero as n → ∞, given that ρ ∈ (1, 2) by assumption.

We conclude the proof by noting that an α-stable random variable is closed under scaling by a constant [7]. In particular, the parameters change as follows: the tail parameter is unchanged and equal to ρ, the skewness parameter is β = 1, and the scale parameter becomes γ = 1/µ. □

The next remark highlights some insight into the behavior of the asymptotic distribution of the "non-parametric" estimator provided by Theorem II.2.

Remark 1. In view of equation (8), in the case of fat tails the asymptotic distribution of the Gini index estimator is always totally right skewed, regardless of the distribution of the data generating process.

Comparing this result with the case of a finite-variance data generating process (leading to a Gaussian limit distribution), we see that the limiting distribution of the estimator undergoes a phase transition in its skewness when the variance becomes infinite, shifting from a symmetric Gaussian to a totally skewed α-stable. A fat-tailed data distribution therefore not only induces a fatter-tailed limit, but also changes its shape. Hence the estimator, whose asymptotic consistency is still guaranteed even in the fat-tailed case [5], will approach its true value more slowly and from below. Evidence for these behaviors is given in Table I. A less problematic outcome would have occurred if the limiting distribution had exhibited only fat tails but still symmetric behavior, with β = 0. Unfortunately, this is not the case.

B. The Maximum Likelihood estimator

Theorem II.2 shows that the "non-parametric" estimator of the Gini index is not the best option when dealing with infinite-variance parent distributions, due to the skewness and "fatness" of its limiting distribution. A way out could be to seek estimators that still preserve asymptotic normality under fat tails. In general this is not possible, in view of the α-stable Central Limit Theorem into which any "non-parametric" estimator will eventually fall. However, a possible solution is to adopt parametric techniques.

Theorem II.3 shows how, once a parametric family for the data generating process has been identified, it is possible to estimate the Gini index via Maximum Likelihood (ML) over the parameters of the chosen model for the parent distribution. The estimator so obtained is not only asymptotically normal but also asymptotically efficient.

Theorem II.3. Let X ∼ F_θ, where F_θ is a parametric family belonging to the exponential family. Then the Gini index obtained by plug-in of the maximum likelihood estimator of θ, G^{ML}(X_n)_θ, is asymptotically normal and efficient. Namely,

    \sqrt{n}\, \left( G^{ML}(X_n)_\theta - g_\theta \right) \overset{d}{\to} N\!\left(0,\ g_\theta'^{\,2}\, I^{-1}(\theta)\right),   (12)

where g_\theta' = \frac{dg_\theta}{d\theta} and I(θ) is the Fisher information.

Proof. The result follows easily from the asymptotic efficiency of MLE estimators for the exponential family and from the invariance principle of the MLE. In particular, the validity of the invariance principle for the Gini index is granted by the continuity and monotonicity of g_θ with respect to θ. The asymptotic variance is obtained by an application of the delta method. □

C. A Paretian illustration

We provide an illustration of the above results using a Pareto type I distribution for the data. More formally, consider the standard Pareto distribution for a random variable X with density given in Equation (1).

Corollary 1. Let (X_i) be a sequence of i.i.d. observations with a type I Pareto distribution with tail index ρ ∈ (1, 2). Then the "non-parametric" Gini estimator has the following limit:

    G^{NP}(X_n) - g \ \overset{d}{\sim}\ S\!\left( \rho,\ 1,\ \frac{(\rho-1)\, d}{\rho\, n^{\frac{\rho-1}{\rho}}},\ 0 \right).   (13)

Proof. The result is a mere application of Theorem II.2, recalling that a Pareto distribution satisfies the conditions for the domain of attraction of α-stable random variables with slowly varying function L(x) = 1. Therefore the sequence c_n satisfying Equation (7) becomes c_n = n^{\frac{1}{\rho}} C_\rho^{-\frac{1}{\rho}}, and hence L_0(n) = C_\rho^{-\frac{1}{\rho}}, independent of n; for convenience we call C_\rho^{-\frac{1}{\rho}} = d. Additionally, the mean of the distribution is also a function of ρ: µ = \frac{\rho}{\rho-1}. □
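As a sanity check of Corollary 1 (a Monte Carlo sketch of ours, not reported in the paper), one can simulate Pareto samples with ρ ∈ (1, 2) and look at the deviations G^{NP}(X_n) − g: they are markedly right skewed and mostly negative, which is precisely the downward bias documented in Table I.

```python
import numpy as np

rng = np.random.default_rng(1)
rho, n, n_sim = 1.5, 1_000, 2_000          # tail index, sample size, number of replications
g_true = 1.0 / (2 * rho - 1)               # true Gini of the type I Pareto

def gini_nonparametric(x):
    x = np.sort(x)
    m = x.size
    w = 2.0 * (np.arange(1, m + 1) - 1) / (m - 1) - 1.0   # Eq. (4) weights
    return np.sum(w * x) / np.sum(x)

devs = np.array([gini_nonparametric(rng.uniform(size=n) ** (-1.0 / rho)) - g_true
                 for _ in range(n_sim)])

# Right-skewed deviations with most of the mass below zero: the estimator
# approaches the true value slowly and from below, as Theorem II.2 predicts.
print(devs.mean(), np.median(devs), (devs < 0).mean())
```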


Corollary 2. Let the sample (X_i) be distributed as in Corollary 1, and let G^{ML}_θ be the MLE of the Gini index as defined in Theorem II.3; in particular, G^{ML}_\rho = \frac{1}{2\rho^{ML} - 1}. Then the asymptotic distribution of G^{ML}_\rho is

    G^{ML}_\rho(X_n) - g \ \sim\ N\!\left( 0,\ \frac{4\rho^2}{n\, (2\rho-1)^4} \right).   (14)

Proof. The result follows from the fact that the Pareto distribution belongs to the exponential family and therefore satisfies the "regularity" conditions necessary for the asymptotic normality and efficiency of the maximum likelihood estimator. Recall also that the Fisher information of the Pareto distribution is \frac{1}{\rho^2}. □
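Under the same Pareto assumptions, the plug-in ML route of Corollary 2 is straightforward to sketch (our code, not the paper's): with x_m = 1 known, the standard ML estimate of the tail exponent is ρ̂ = n / Σ ln X_i, the Gini index follows by plug-in, and Eq. (14) supplies the delta-method standard error.

```python
import numpy as np

def gini_mle_pareto(x):
    """Plug-in ML Gini for type I Pareto data with x_m = 1:
    rho_hat = n / sum(log x_i), then G^ML = 1 / (2 * rho_hat - 1) (Corollary 2)."""
    x = np.asarray(x, dtype=float)
    rho_hat = x.size / np.log(x).sum()
    return 1.0 / (2.0 * rho_hat - 1.0), rho_hat

def gini_mle_sd(rho, n):
    """Asymptotic standard deviation implied by Eq. (14): 2*rho / (sqrt(n) * (2*rho - 1)^2)."""
    return 2.0 * rho / (np.sqrt(n) * (2.0 * rho - 1.0) ** 2)

rng = np.random.default_rng(2)
rho, n = 1.5, 1_000
x = rng.uniform(size=n) ** (-1.0 / rho)     # type I Pareto sample
print(gini_mle_pareto(x), gini_mle_sd(rho, n))
```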
Now that we have worked out both asymptotic distributions, we can show how much better the quality of the convergence is in the MLE case compared to the "non-parametric" one. In particular, we approximate the distribution of the deviations from the true value of the Gini index for finite sample sizes by Equation 13 and Equation 14.

To start, we visualize in Figure 2 how the noise around the mean of the two different types of estimators is distributed, and how these distributions change as the number of observations increases. In particular, to ease the comparison between the MLE and the "non-parametric" estimator, we fix the number of observations in the MLE case and vary it in the "non-parametric" one. We perform this study for different tail indices, showing how large an impact the tail index has on the consistency of the estimator. It is worth pointing out that, as the tail index approaches 1 (the threshold for infinite mean), the mode of the distribution of the "non-parametric" estimator shifts farther and farther away from the mean of the distribution (centered on 0 by definition). This effect is responsible for the small-sample bias observed in applications. The phenomenon is not present in the MLE case, thanks to the normality of the limit for every value of the tail parameter.

We now make our argument more rigorous by assessing the number of observations needed for the "non-parametric" estimator to be as good as the MLE one, where the following concentration measure is taken as the quantitative meaning of "as good as":

    \int 1_{|x|>c}\, dP_i = P_i(|X| > c),   (15)

with P_i, i ∈ {S, N}, the distribution of each estimator as in Equation 13 and Equation 14, and 1_A the indicator function. More precisely, we wish to compare, for a fixed number of observations in (14), how many observations are required to reach the same value of the concentration measure in (13). The problem can be rephrased in the following way. Consider the function

    r(c, n) = \frac{P_{S_n}(|X| > c)}{P_N(|X| > c)};

we wish to find ñ such that r(c, ñ) = 1 for fixed c. Table II displays the results for some thresholds and some tail parameters. In particular, we can see how, already for a sample size of n = 100, the MLE estimator outperforms the "non-parametric" one: a much larger number of observations is needed to obtain similar tail probabilities for the "non-parametric" estimator.

One thing to notice is that the number of observations needed to match the tail probability does not vary uniformly with the threshold. This is to be expected, since as the threshold goes to infinity, lim_{c→∞} P(|X| > c), or to zero, lim_{c→0} P(|X| > c), the tail probabilities are the same for every n. Therefore, given the unimodality of the distributions, we expect that there is a threshold which maximizes the number of observations needed to match the tail probabilities, while for all other threshold levels the required number of observations is smaller.

Figure 3 additionally shows some examples of how the parity of the tail probabilities is reached for different thresholds c and different tail indices.

We conclude that, in the presence of a fat-tailed wealth distribution with infinite variance, a plug-in MLE-based estimator should be preferred to the "non-parametric" one, due to its normality and its more efficient use of the observations.

TABLE II: The optimal number of observations ñ needed in order to match the tail probability of the asymptotic MLE distribution with fixed n = 100.

    Threshold c:      0.005      0.01       0.015      0.02
    α = 1.8           321        1242       2827       2244
    α = 1.5           844        836        2925       23036
    α = 1.2           402200     194372     111809     73888
III. SMALL SAMPLE CORRECTION

An alternative approach, instead of assuming a parametric data distribution and computing the Gini index through MLE, is to exploit the information given by Theorem II.2 to build a correction mechanism for the bias of the "non-parametric" estimator, which arises especially for small sample sizes.

The key idea is to recognize that, for unimodal distributions, most of the observations come from around the mode of the distribution. In symmetric distributions the mode and the mean coincide, and therefore we expect that most of the observations will be close to the mean value as well. For skewed distributions this is not the case.


[Figure 2: three panels, "Limit distribution, MLE vs Non-Parametric", for (a) α = 1.8, (b) α = 1.5, (c) α = 1.2. Each panel plots the MLE limit density (n fixed at 100) against the "non-parametric" limit densities for n = 100, 500, 1000; the horizontal axis is the deviation from the mean value.]

Fig. 2: Comparison between the MLE and "non-parametric" asymptotic distributions for different values of the tail index α. The number of observations for the MLE is fixed at n = 100. Note that, although all the distributions have mean zero, the mode of the "non-parametric" one differs from it.

In particular, for right-skewed continuous unimodal distributions the mode is lower than the mean. Therefore, given that the distribution of the "non-parametric" Gini index is right skewed, we expect that the realized (i.e. observed) value of the Gini index will usually be lower than the true value of the Gini index, which sits at the mean level. We can quantify this difference (i.e. the bias) by looking at the distance between the mode and the mean of its distribution; once this distance is known, we can adjust our estimate of the Gini index by adding it.

More formally, we would like to derive a "corrected non-parametric" estimator G^C(X_n) such that

    G^C(X_n) = G^{NP}(X_n) + \left\| m\!\left(G^{NP}(X_n)\right) - E\!\left(G^{NP}(X_n)\right) \right\|,   (16)

where \| m(G^{NP}(X_n)) - E(G^{NP}(X_n)) \| is the distance between the mode and the mean of the distribution of the "non-parametric" Gini estimator G^{NP}(X_n); according to our reasoning above, this is what we will call the correction term.

In particular, performing the type of correction described in Equation 16 is equivalent to shifting the distribution of G^{NP}(X_n) so as to place its mode on the true value of the Gini index, in order to increase the probability of observing values of the estimator closer to it.

Ideally, we would like to measure this mode-mean distance on the exact distribution of the Gini index, to get the most accurate correction. However, the finite-sample distribution is not always easily derivable, and it requires assumptions on the parametric structure of the data generating process. Therefore, we propose to use the limiting distribution of the "non-parametric" Gini estimator obtained in Section II to approximate the finite-sample distribution, and to estimate on it the mode-mean distance to be used as the correction term.


[Figure 3: four panels (a)-(d).]

Fig. 3: Speed of convergence of the probability ratio r(c, n) = P_{S_n}(|X| > c) / P_N(|X| > c) as n grows; note that the number of MLE observations is fixed at n = 100.

This procedure allows for more freedom in the modeling assumptions, and potentially decreases the number of parameters to estimate, since the limiting distribution only depends on the tail index ρ of the data and possibly on the mean µ, which can however be assumed to be a function of the tail index itself, as in the Pareto case, i.e. µ = \frac{\rho}{\rho-1}.

In particular, by exploiting the location-scale property of α-stable distributions and Equation 9, we approximate the finite-sample distribution of G^{NP}(X_n) by

    G^{NP}(X_n) \sim S(\rho, 1, \gamma(n), g),   (17)

where \gamma(n) = \frac{1}{\mu} \frac{L_0(n)}{n^{\frac{\rho-1}{\rho}}} is the scale parameter of the limiting distribution.

Recalling once again the location-scale property of α-stable distributions, we can reduce the approximate version of the correction term in Equation 16 in the following way:

    \left\| m\!\left(G^{NP}(X_n)\right) - E\!\left(G^{NP}(X_n)\right) \right\| \approx \left\| m(\rho, \gamma(n)) + g - g \right\| = \left\| m(\rho, \gamma(n)) \right\|,

where m(ρ, γ(n)) is the mode function of the α-stable distribution in Equation 17. This means that, in order to obtain the correction term, knowledge of the true Gini index is not necessary, since m(ρ, γ(n)) does not depend on g.

We then proceed to compute the correction term, obtained as

    \zeta(\rho, \gamma(n)) = \arg\max_x f(x),   (18)

where f(x) is the numerical density³ of the α-stable distribution in Equation 17, but centered at 0. Recalling that α-stable distributions are unimodal continuous distributions, we conclude that ζ(ρ, γ(n)) = \arg\max_x f(x) = m(ρ, γ(n)).

³ Note also that for α-stable distributions the mode is not available in closed form; however, it can be computed numerically by optimizing the numerical density [6].


[Figure 4: three panels, "Corrected vs Original Estimator", for data with tail index (a) α = 1.2, (b) α = 1.5, (c) α = 1.8. Each panel plots the estimator values against the sample size (up to 2000), showing the corrected estimator, the original estimator and the true value.]

Fig. 4: Comparison between the corrected "non-parametric" estimator (in red) and the usual "non-parametric" estimator (black); it is clear how, especially for small sample sizes, the corrected one improves the quality of the estimation.

Hence our "corrected non-parametric" estimator will parametric" one. Indeed note that:
have the following form:

GC ( Xn ) = G NP ( Xn ) + ζ (ρ, γ(n)) (19) lim | Ḡ ( Xn )C − G NP ( Xn )|


n→∞

and asymptotic distribution: = lim | G NP ( Xn ) + ζ (ρ, γ(n)) − G NP ( Xn )


n→∞
C
G ( Xn ) ∼ S (ρ, 1, γ(n), g + ζ (ρ, γ(n))) (20) = lim |ζ (ρ, γ(n))| → 0
n→∞

where the mode (i.e. the correction term ζ (ρ, γ(n))) However, because of the correction, GC ( Xn ) will be-
cannot be expressed in closed form. have better in small samples. Note also that, from 20 the
Note that the correction term ζ (ρ, γ(n)) is a function of distribution of the corrected estimator have now mean
the tail index of the data ρ and is connected to the sample g + ζ (ρ, γ(n)) which converges to the true Gini g as
size n by the scale parameter γ(n) of the associated n → ∞ but its mode will be not placed in g. Therefore
limiting distribution . In particular it is important to note the estimator is placing most of its probability mass to
that it is decreasing in n, and that limn→∞ ζ (ρ, γ(n)) → 0. values closed to the true Gini value.
This happens because as n increases the distribution In general, the quality of this correction depends on
described in 17 becomes more and more centered around the distance between the exact distribution of G NP ( Xn )
its mean value pushing to zero the distance between and its α-stable limit, more the two are close to each other
the mode and the mean. This ensure the asymptotic better the approximation of the mode-mean distance of
equivalence of the "corrected" estimator and the "non- the finite sample Gini distribution with its asymptotic


Additionally, as is clear from the notation, the correction term depends on the tail index ρ of the data and possibly also on their mean. These parameters, if not assumed to be known a priori, must be estimated as well, either from the same dataset or through some other calibration procedure. The additional uncertainty due to this estimation will therefore also affect the quality of the correction.

We conclude the Section by showing the effect of this correction procedure through the following experiment. While we assume that the data come from a Pareto distribution whose tail index ρ is known, such an assumption can be weakened without significant effect. We simulate 1000 samples of increasing size, from n = 10 to n = 2000, and for each sample size we compute both the original "non-parametric" estimator G^{NP}(X_n) and the corrected one G^C(X_n). We repeat the experiment for different tail indices ρ.

Figure 4 presents our results. It is clear that in all the examples the corrected estimator performs equally well or better than the original one. In the case of a small sample size, n ≤ 500, the gain is quite remarkable. As expected, the difference between the estimators decreases as a function of the sample size and of the tail index, reflecting the fact that the correction term is decreasing both in n and in the tail index ρ. This effect is due to the fact that, as the tail index of an α-stable distribution approaches 2, the skewness parameter β loses its influence on the distribution, reducing the skewness and hence the difference between the mode and the mean. Ultimately, when the tail index equals 2, we obtain the symmetric Gaussian distribution and the two estimators coincide.
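A compact sketch (ours) of the experiment just described, with fewer replications than the paper's 1000 to keep it light; it assumes the helper functions from the earlier sketches are available, e.g. collected in a local module of our own making named gini_sketch.py.

```python
import numpy as np
# Hypothetical local module collecting the helpers defined in the earlier sketches.
from gini_sketch import gini_nonparametric, correction_term

rng = np.random.default_rng(3)
rho = 1.5
g_true = 1.0 / (2 * rho - 1)
sizes = np.unique(np.geomspace(10, 2000, 20).astype(int))

for n in sizes:
    zeta = correction_term(rho, n)                    # depends only on rho and n
    est = np.mean([gini_nonparametric(rng.uniform(size=n) ** (-1.0 / rho))
                   for _ in range(200)])
    print(n, round(est, 4), round(est + zeta, 4), round(g_true, 4))
```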

IV. CONCLUSION

In this paper we addressed the issue of the asymptotic behavior of the Gini index estimator in the presence of a distribution with infinite variance, an issue that has been curiously ignored by the literature. The central mistake in the nonparametric methods used is to believe that asymptotic consistency translates into equivalent pre-asymptotic properties.

We showed that a parametric approach provides better asymptotic results, thanks to the asymptotic properties of maximum likelihood. In view of the above results, we strongly suggest that, if the collected data are suspected to be fat-tailed, parametric methods should be preferred to strict "non-parametric" estimation.

Finally, in the event that a parametric approach cannot be used, we propose a simple correction mechanism for the "non-parametric" estimator based on the distance between the mode and the mean of its asymptotic distribution. We show through an experiment with Pareto data how this correction improves the quality of the estimation. However, we suggest caution in its use, because of the possible additional uncertainty deriving from the estimation of the correction term.

REFERENCES

[1] I. Eliazar, I. M. Sokolov, Measuring statistical evenness: A panoramic overview, Physica A 391 (2012) 1323-1353.
[2] P. Embrechts, C. Kluppelberg, T. Mikosch, Modelling Extremal Events for Insurance and Finance, Springer (2003).
[3] W. Feller, An Introduction to Probability Theory and Its Applications, Vol. 2, John Wiley and Sons (2008).
[4] C. Kleiber, S. Kotz, Statistical Size Distributions in Economics and Actuarial Sciences, Wiley (2003).
[5] D. Li, M. Bhaskara Rao, R. J. Tomkins, The law of the iterated logarithm and central limit theorem for L-statistics, Journal of Multivariate Analysis 78 (2001) 2.
[6] J. P. Nolan, Parameterizations and modes of stable distributions, Statistics and Probability Letters 38.2 (1998) 187-195.
[7] G. Samorodnitsky, M. S. Taqqu, Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance, Vol. 1, CRC Press (1994).
[8] N. N. Taleb, R. Douady, On the super-additivity and estimation biases of quantile contributions, Physica A: Statistical Mechanics and its Applications 429 (2015) 252-260.
[9] Y. Yang, S. Hu, T. Wu, The tail probability of the product of dependent random variables from max-domains of attraction, Statistics and Probability Letters 81 (2011) 1876-1882.
[10] S. Yitzhaki, E. Schechtman, The Gini Methodology: A Primer on a Statistical Methodology, Springer Science & Business Media (2012).

