
Non white sample covariance matrices.

S. Péché, Université Grenoble 1,
joint work with D. Féral, Uni. Bordeaux, O. Ledoit, Uni. Zurich
03/12/2009, Institut Henri Poincaré
Plan
I. Eigenvalues/eigenvectors of sample covariance matrices: problem and motivations.
II. Eigenvalues: global behavior, extreme eigenvalues.
III. Eigenvectors: the white case and the non white case.
IV. Conclusion.
- 1 -
: : The problem
Model
We consider sample covariance matrices:
M_N(Σ) = (1/p) Σ^{1/2} X X* Σ^{1/2},
where
- X is a N × p random matrix s.t. the entries X_{ij} are i.i.d. complex (or real) random variables with distribution μ, with ∫ x dμ(x) = 0 and ∫ |x|² dμ(x) = 1;
- p = p(N) with p/N → γ ∈ (0, ∞) as N → ∞;
- Σ is a N × N Hermitian deterministic (or random) matrix, Σ > 0, with bounded spectral radius. Σ is independent of X.
What can be said about the spectrum (eigenvalues and eigenvectors) as N → ∞?
- 2 -
: : The problem
Motivations I.
Statistics: knowing M_N(Σ), what can be said about Σ?
- if N is fixed and p → ∞: M_N(Σ) is a good estimator of Σ;
- in high dimension (genetics, finance, ...)?
Understand e.g. the behavior of PCA in such a setting.
[Figure: density of the eigenvalues of M_N(Σ) when Σ = Id, for γ = 10, 100, 1000.]
Dispersion of the eigenvalues: M_N(Σ) is NOT a good estimator of Σ (e.g. smallest and largest eigenvalues).
- 3 -
: : The problem
Motivations II.
Communication theory (CDMA): received signal r = Σ_{k=1}^K b_k s_k + w,
with K the number of users, s_k ∈ C^N the signature,
b_k ∈ C, E b_k = 0, E|b_k|² = p_k the transmitted signal,
and w ∈ C^N a Gaussian white noise with i.i.d. N(0, σ²) components.
One has to decode/estimate the signal b_k. A measure of the performance of the communication channel is the so-called SIR (Signal to Interference Ratio): for a linear receiver x_1 = c_1* r,

SIR = |c_1* s_1|² p_1 / ( |c_1|² σ² + Σ_{i≥2} |c_1* s_i|² p_i ).

=> as N, K → ∞ with K/N → γ, the SIR depends on the eigenvalues AND the eigenvectors of S D S*, where S = [s_2, . . . , s_K] is the signature matrix (random) and D = diag(p_2, . . . , p_K).
- 4 -
: :
Eigenvalues.
- 5 -
: : Eigenvalues
The eigenvalues I
We denote by τ_1 ≥ τ_2 ≥ · · · ≥ τ_N the eigenvalues of Σ and suppose that

H_N(Σ) := (1/N) Σ_{i=1}^N δ_{τ_i} → H a.s.,

where H is a probability measure.
Let λ_1 ≥ λ_2 ≥ · · · ≥ λ_N be the eigenvalues of M_N(Σ); μ_N = (1/N) Σ_{i=1}^N δ_{λ_i}.
Theorem (Marchenko-Pastur (67)). A.s. lim_{N→∞} μ_N = μ_MP, where the Stieltjes transform of μ_MP, given for z ∈ C with ℑ(z) > 0 by

m_μ(z) := ∫ 1/(λ − z) dμ_MP(λ),

satisfies

m_μ(z) = ∫_{−∞}^{+∞} [ τ (1 − 1/γ − (1/γ) z m_μ(z)) − z ]^{−1} dH(τ).
- 6 -
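The self-consistent equation above is easy to probe numerically. The sketch below is my own illustration (not from the talk): it solves the equation by fixed-point iteration in the simplest case H = δ_1 and compares the result with the empirical Stieltjes transform of a simulated white sample covariance matrix. The sizes N, p, the seed, the evaluation point z, and the iteration count are arbitrary choices; the iteration is started at −1/z, the Stieltjes transform of δ_0, and converges in practice for z well inside the upper half-plane.

```python
import numpy as np

def mp_stieltjes(z, gamma, n_iter=500):
    """Solve m(z) = 1 / (1 - 1/gamma - (z/gamma) m(z) - z), i.e. the
    Marchenko-Pastur equation for H = delta_1, by fixed-point iteration."""
    m = -1.0 / z  # start at the Stieltjes transform of delta_0
    for _ in range(n_iter):
        m = 1.0 / (1.0 - 1.0 / gamma - z * m / gamma - z)
    return m

rng = np.random.default_rng(0)
N, p = 1000, 2000                      # gamma = p/N = 2
gamma = p / N
X = rng.standard_normal((N, p))
lam = np.linalg.eigvalsh(X @ X.T / p)  # Sigma = Id, so M_N = XX^T/p

z = 1.0 + 1.0j
m_emp = np.mean(1.0 / (lam - z))       # empirical Stieltjes transform of mu_N
m_lim = mp_stieltjes(z, gamma)         # solution of the limiting equation
```

The two values agree up to finite-size fluctuations, which is exactly the a.s. convergence μ_N → μ_MP read through Stieltjes transforms.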
: : Eigenvalues
The eigenvalues II
If Σ = Id, one knows explicitly the density of the Marchenko-Pastur distribution (γ ≥ 1):

dμ_MP/du = ( γ / (2πu) ) √( (u₊ − u)(u − u₋) ) 1_{[u₋, u₊]}(u),

with u_± = (1 ± 1/√γ)².
Valid for both complex and real random matrices.
For general H, the relationship between μ_MP and H is not simple; determining H from μ_MP is not easy. El Karoui (2008) gives a consistent estimator (using convex approximation).
Assume that H has been estimated: can we improve our knowledge of Σ?
- 7 -
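As a sanity check of the closed form (again an illustrative simulation, not part of the talk), one can compare the spectrum of a simulated white sample covariance matrix with the support [u₋, u₊] and verify that the reconstructed density integrates to 1; the choice γ = 4, the sizes, and the grid resolution are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 800, 3200                          # gamma = p/N = 4
gamma = p / N
X = rng.standard_normal((N, p))
lam = np.linalg.eigvalsh(X @ X.T / p)     # Sigma = Id

u_minus = (1 - 1 / np.sqrt(gamma)) ** 2   # lower edge, 0.25
u_plus = (1 + 1 / np.sqrt(gamma)) ** 2    # upper edge, 2.25

# Marchenko-Pastur density on [u_-, u_+] for gamma >= 1
u = np.linspace(u_minus, u_plus, 20001)
dens = gamma / (2 * np.pi * u) * np.sqrt((u_plus - u) * (u - u_minus))
total_mass = np.sum(dens) * (u[1] - u[0])  # should integrate to 1
```

The eigenvalues fill out [u₋, u₊] (and only that interval, up to edge fluctuations), while the empirical mean stays at 1 by the trace normalization.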
: : Eigenvalues
The usual behavior of largest eigenvalues
Assume that Σ = Id.

Theorem (Johnstone (2001), Johansson (2000), El Karoui (2005): μ = N(0, 1); Soshnikov (2001), Péché (2007): μ non-Gaussian symmetric distribution with sub-Gaussian tails, i.e. ∃ C > 0, ∀ k > 0, ∫ |x|^{2k} dμ(x) ≤ (Ck)^k.)

Let λ_1 be the largest eigenvalue of M_N(Id) = M_N and u₊^N = (1 + 1/√γ_N)². Then

lim_{N→∞} P( N^{2/3} γ_N^{1/2} (u₊^N)^{−2/3} ( λ_1(M_N) − u₊^N ) ≤ x ) = F_{TW 2(1)}(x),

the Tracy-Widom distribution.
[Figure: density of the Tracy-Widom distribution.]
- 8 -
: : Eigenvalues
A slight perturbation of the true covariance
Let Σ = diag(π_1, π_2, . . . , π_r, 1, . . . , 1), with π_i ≥ π_{i+1} > 1 for i ≤ r − 1, and r independent of N.
Σ is a finite rank perturbation of the identity matrix: H = δ_1.
What is the impact of the π_i's on the spectrum?
The global behavior of the spectrum is unchanged, but the largest eigenvalues are impacted.
Studied by: Baik-Ben Arous-Péché (2005), complex N(0, 1); Bai-Yao (2008) and Féral-Péché (2008) for more general ensembles.
Baik-Silverstein (2006): a.s. limit of the largest and smallest eigenvalues for very general ensembles.
El Karoui (2007): X Gaussian, finite rank perturbation of a deterministic Σ_0 ≠ I_N.
- 9 -
: : Eigenvalues
Phase Transition
We set

w_c := 1 + 1/√γ,  ρ(π_1) = π_1 ( 1 + (1/γ)/(π_1 − 1) ),  σ(π_1) = π_1 √( 1 − (1/γ)/(π_1 − 1)² ).

(F.-P.) If π_1 < w_c, and μ is symmetric with sub-Gaussian tails, then

lim_{N,p→∞} P( N^{2/3} γ_N^{1/2} (u₊^N)^{−2/3} ( λ_1(M_N(Σ)) − u₊^N ) ≤ x ) = F_{TW 2(1)}(x),

the Tracy-Widom distribution (complex or real). Same as if Σ = I_N.

(Bai-Yao) If π_1 = . . . = π_k > w_c with π_{k+1} < π_1, and E|X_{ij}|⁴ < ∞, then

lim_{N,p→∞} P( (√N / σ(π_1)) ( λ_1(M_N(Σ)) − ρ(π_1) ) ≤ x ) = G_k(x),

where G_k is the distribution of the largest eigenvalue of the k × k GUE H = (H_{ij})_{i,j=1}^k with i.i.d. complex N(0, σ(π_1)) entries.
- 10 -
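The phase transition is easy to observe numerically. The sketch below (my own illustration with arbitrary sizes, seed, and spike values, using real Gaussian entries) places one spike below w_c and one above it, then compares the largest sample eigenvalue with the bulk edge u₊ and with the predicted location ρ(π_1).

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 1000, 2000                        # gamma = p/N = 2
gamma = p / N
w_c = 1 + 1 / np.sqrt(gamma)             # critical spike size, ~1.707
u_plus = (1 + 1 / np.sqrt(gamma)) ** 2   # bulk edge, ~2.914

def largest_eig(pi1):
    """Largest eigenvalue of M_N(Sigma) for Sigma = diag(pi1, 1, ..., 1)."""
    d = np.ones(N)
    d[0] = pi1
    X = rng.standard_normal((N, p))
    Y = np.sqrt(d)[:, None] * X          # Sigma^{1/2} X
    return np.linalg.eigvalsh(Y @ Y.T / p)[-1]

pi_big = 5.0
rho = pi_big * (1 + (1 / gamma) / (pi_big - 1))  # predicted limit, 5.625
lam_sub = largest_eig(1.2)     # spike below w_c: sticks to the bulk edge
lam_sup = largest_eig(pi_big)  # spike above w_c: separates from the bulk
```

Below the threshold the largest eigenvalue is indistinguishable from the bulk edge; above it, it detaches and concentrates near ρ(π_1).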
: : Eigenvalues
Remarks
- Spikes in the true covariance can be detected if they are large enough.
- Actually the true conjecture is that the first point should hold provided only that E|X_{ij}|⁴ < ∞.
- If π_1 = . . . = π_k = w_c and π_{k+1} < w_c, the limiting distribution of λ_1(M_N(Σ)) is also determined (in particular λ_1(M_N(Σ)) → u₊^N).
- The asymptotic fluctuations of the smallest eigenvalues are expected to exhibit the same behavior (Baik-Silverstein (2006)).
- The proof of these results relies on the explicit computation of the distribution of the largest eigenvalues (Gaussian case). The extension to other ensembles is based on the moment approach due to Soshnikov (Féral-Péché) and on the resolvent and a Central Limit Theorem (Bai-Yao, Baik-Silverstein).
- No result for non-Gaussian μ if H ≠ δ_1.
- 11 -
: :
Eigenvectors: the white case.
- 12 -
: : The white case
Gaussian sample
Suppose that Σ = Id and X_{ij} i.i.d. N(0, 1), complex or real.
M_N = M_N(Id) is a so-called white Wishart matrix.
Let (U, D) be a diagonalization of M_N: M_N = U D U* with U ∈ U(N) and D a real diagonal matrix.
U is Haar distributed.
Proof: Gram-Schmidt + rotational invariance of the Gaussian distribution.
Conjecture: if Σ = Id and if X has non-Gaussian entries with E|X_{ij}|⁴ < ∞, the matrix of eigenvectors of M_N shall asymptotically be Haar distributed.
Idea: no direction is preferred.
Question: how to define "asymptotically Haar distributed"?
- 13 -
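A minimal numerical illustration of the "no preferred direction" idea (my own sketch; the sizes, seed, and the choice of a coordinate vector x are arbitrary): for a white Wishart matrix, the coordinates of a fixed unit vector in the eigenbasis behave like a uniform point on the sphere, so each N|y_i|² is of order 1 and no coordinate dominates.

```python
import numpy as np

rng = np.random.default_rng(3)
N, p = 500, 1000
X = rng.standard_normal((N, p))
M = X @ X.T / p                      # white Wishart matrix, Sigma = Id
_, U = np.linalg.eigh(M)             # M = U D U^T

x = np.zeros(N)
x[0] = 1.0                           # a fixed deterministic direction
y = U.T @ x                          # coordinates of x in the eigenbasis

# For a Haar-distributed U, y is uniform on the unit sphere: the weights
# N*|y_i|^2 have mean exactly 1 and a maximum of order log N, not N.
weights = N * y ** 2
```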
: : The white case
Non Gaussian matrices I.
Silverstein's idea (95): U is asymptotically Haar distributed if, given an arbitrary vector x ∈ S^{N−1} = {x ∈ R^N, |x| = 1}, y = Ux is asymptotically uniformly distributed on the unit sphere. Or, setting

Y_N(t) := √(N/2) Σ_{i=1}^{[Nt]} ( |y_i|² − 1/N ),

Y_N(t) shall converge in distribution to a Brownian bridge if y is uniformly distributed (y = Z/|Z|_2 with Z Gaussian).
Consider instead X_N(t) = Y_N(F_N(t)) = √(N/2) ( F_1^N(t) − F_N(t) ), with

F_N(t) = (1/N) Σ_{i=1}^N 1_{λ_i ≤ t}

the cumulative distribution function (c.d.f.) of the spectral measure of M_N(Σ), and

F_1^N(t) = Σ_{i=1}^N |y_i|² 1_{λ_i ≤ t}, with y = U* x,

also a c.d.f. (but combining the eigenvectors).
- 14 -
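The two c.d.f.'s can be compared directly in simulation (an illustrative sketch; the sizes, seed, and the choice x = (1, ..., 1)/√N are arbitrary): for a white sample the weighted c.d.f. F_1^N tracks F_N up to fluctuations of order N^{−1/2}, consistent with the Brownian-bridge scaling.

```python
import numpy as np

rng = np.random.default_rng(4)
N, p = 500, 1000
X = rng.standard_normal((N, p))
lam, U = np.linalg.eigh(X @ X.T / p)   # ascending eigenvalues, Sigma = Id

x = np.ones(N) / np.sqrt(N)            # an arbitrary unit vector
w = (U.T @ x) ** 2                     # weights |y_i|^2; they sum to 1

# F_N jumps by 1/N at each lambda_i, F_1^N jumps by |y_i|^2.
# Evaluating both just after each (sorted) eigenvalue:
F1 = np.cumsum(w)
FN = np.arange(1, N + 1) / N
F_diff = np.max(np.abs(F1 - FN))       # should be O(N^{-1/2})
```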
: : The white case
Non Gaussian matrices II.
Let

G_N(t) = √N ( F_1^N(t) − F_{γ_N}(t) ),

where F_{γ_N} is the c.d.f. of μ_MP computed with ratio γ_N = p/N and with H replaced by H_N(Σ), the spectral measure of Σ.
Here G_N ≈ X_N and should be close to B(F(t)) if B is a Brownian bridge.
Let also g be analytic on [u₋, u₊].

Theorem (Bai-Miao-Pan (2007)). Assume also that E|X_{ij}|⁴ = 2 and x* (Σ − zI)^{−1} x → ∫ 1/(λ − z) dH(λ). Then as N → ∞, ∫ g(x) dG_N(x) converges to a Gaussian random variable (centered and with known variance).
Remark: extension to non-white matrices, but with the additional assumption on x* (Σ − zI)^{−1} x.
- 15 -
: : The white case
A few explanations
- The finiteness of E|X_{ij}|⁴ ensures that the largest eigenvalues have the same asymptotic behavior as for a Gaussian sample (conjecture). If this moment is not finite, the eigenvectors associated to the largest eigenvalues are actually determined by the largest entries of X (Biroli-Bouchaud-Potters (2007), Auffinger-Ben Arous-Péché (2009)).
- The fact that the fourth moment needs to equal that of a Gaussian random variable was proved by Silverstein (81). One needs a certain proximity with the Gaussian distribution!
- The assumption on x ensures that the projection of x on the eigenvectors of M_N(Σ) does not see the lack of rotational invariance. It also ensures that F_1^N(t) → F(t), where F is the c.d.f. of the Marchenko-Pastur distribution μ_MP.
- 16 -
: :
Eigenvectors: the non-white case.
- 17 -
: : The non white case
Preliminary remarks
Even for a Gaussian sample, the distribution of the eigenvectors is unknown if Σ ≠ Id. It is NOT expected that the matrix of eigenvectors is Haar distributed.
Only known result, due to D. Paul (2006):
Σ = diag(π_1, 1, . . . , 1) with π_1 > 1 + 1/√γ.
Let u_1 (resp. e_1) be the normalized eigenvector of M_N(Σ) (resp. of Σ) associated to λ_1 (resp. π_1):

lim_{N→∞} |⟨u_1, e_1⟩| = √( ( 1 − (1/γ)/(π_1 − 1)² ) / ( 1 + (1/γ)/(π_1 − 1) ) ) a.s.

Idea: perturbation of the eigenvector associated to π_1 (the largest eigenvalue of Σ) by a random matrix.
- 18 -
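D. Paul's formula can be checked in simulation (a sketch with arbitrary sizes, seed, and spike π_1 = 4, using real Gaussian entries): the overlap of the top sample eigenvector with e_1 concentrates near the predicted value.

```python
import numpy as np

rng = np.random.default_rng(5)
N, p = 1000, 2000                     # gamma = p/N = 2
gamma = p / N
pi1 = 4.0                             # spike above w_c = 1 + 1/sqrt(gamma)

d = np.ones(N)
d[0] = pi1                            # Sigma = diag(pi1, 1, ..., 1)
X = rng.standard_normal((N, p))
Y = np.sqrt(d)[:, None] * X           # Sigma^{1/2} X
lam, U = np.linalg.eigh(Y @ Y.T / p)
u1 = U[:, -1]                         # eigenvector of the largest eigenvalue

overlap = abs(u1[0])                  # |<u_1, e_1>|, since e_1 = (1,0,...,0)
c = 1 / gamma
predicted = np.sqrt((1 - c / (pi1 - 1) ** 2) / (1 + c / (pi1 - 1)))
```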
: : The non white case
Another approach (Ledoit-Peche (2009))
The idea is to study the functionals

Θ_N(g) := (1/N) Tr ( g(Σ) (M_N(Σ) − zI)^{−1} ),

with
- z ∈ C⁺ = {z ∈ C, ℑz > 0},
- g a regular function (bounded with a finite number of discontinuities, or analytic),
- g(Σ) = V diag(g(τ_1), . . . , g(τ_N)) V*, if V is the matrix of eigenvectors of Σ.
Aim: understand how the eigenvectors of M_N(Σ) project onto those of Σ.
Remarks:
- if g ≡ 1, then Θ_N is just the Stieltjes transform of μ_N.
- if Σ = Id: useless. We thus concentrate on the case where H ≠ δ_1.
- 19 -
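The remark that Θ_N(1) is the Stieltjes transform of μ_N can be verified directly (an illustrative sketch; the uniform spectrum chosen for a diagonal Σ, the sizes, and the point z are arbitrary). The same code evaluates Θ_N(g) for g(x) = x, the choice used later for estimation.

```python
import numpy as np

rng = np.random.default_rng(6)
N, p = 300, 600
tau = rng.uniform(1.0, 3.0, size=N)    # spectrum of a diagonal Sigma
X = rng.standard_normal((N, p))
M = np.sqrt(tau)[:, None] * (X @ X.T / p) * np.sqrt(tau)[None, :]

z = 1.0 + 1.0j
R = np.linalg.inv(M - z * np.eye(N))   # resolvent (M_N(Sigma) - zI)^{-1}

theta_1 = np.trace(R) / N              # Theta_N(g) for g = 1
m_muN = np.mean(1.0 / (np.linalg.eigvalsh(M) - z))  # Stieltjes transform of mu_N
theta_x = np.trace(np.diag(tau) @ R) / N            # Theta_N(g) for g(x) = x
```

For g ≡ 1 the trace of the resolvent is exactly the average of 1/(λ_i − z), so the two quantities coincide up to rounding.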
: : The non white case
A theoretical result
Assume that the support of H is included in [a_1, a_2] with a_1 > 0, and that E|X_{ij}|^{12} < ∞ (a bound independent of N and p).

Theorem (Ledoit-Péché (2009)). Let g be a bounded function with a finite number of discontinuities on [a_1, a_2]. Then Θ_N(g) → Θ_g a.s. as N → ∞, where

∀ z ∈ C⁺, Θ_g(z) = ∫_{−∞}^{+∞} [ τ (1 − 1/γ − (1/γ) z m_μ(z)) − z ]^{−1} g(τ) dH(τ).

Remark: the same kernel [ τ (1 − 1/γ − (1/γ) z m_μ(z)) − z ]^{−1} arises as in the Marchenko-Pastur theorem.
- 20 -
: : The non white case
Corollary 1.
Question: How much do the eigenvectors of M_N(Σ) deviate from those of Σ?
We set g = 1_{(−∞,τ)} and

Φ_N(λ, τ) = (1/N) Σ_{i=1}^N Σ_{j=1}^N |u_i* v_j|² 1_{[λ_i,+∞)}(λ) 1_{[τ_j,+∞)}(τ).

Let v_j be the normalized eigenvector of Σ associated to τ_j. The average of N |u_i* v_j|², bearing on the eigenvectors associated to sample eigenvalues (resp. eigenvalues of the true covariance) in the interval [λ, λ′] (resp. [τ, τ′]), is

( Φ_N(λ′, τ′) − Φ_N(λ, τ′) − Φ_N(λ′, τ) + Φ_N(λ, τ) ) / ( [F_N(λ′) − F_N(λ)] [H_N(τ′) − H_N(τ)] ),

where F_N (resp. H_N) is the c.d.f. of the spectral measure of M_N(Σ) (resp. Σ).
If one can choose λ, λ′ and τ, τ′ arbitrarily close, then one gets precise information!
- 21 -
: : The non white case
Corollary 1.
Theorem. Φ_N(λ, τ) → Φ(λ, τ) a.s. at any point of continuity of Φ, where, for all (λ, τ) ∈ R²,

Φ(λ, τ) = ∫_{−∞}^{λ} ∫_{−∞}^{τ} φ(l, t) dH(t) dμ_MP(l), with

φ(l, t) = (1/γ) l t / ( (a t − l)² + b² t² ), where a + i b := 1 − 1/γ − (1/γ) l m̆(l), if l > 0;
φ(l, t) = 1 / ( (1 − γ) [1 + m̆(0) t] ) if l = 0 and γ < 1;
φ(l, t) = 0 otherwise.

Here m̆(0) = lim_{z→0} m̆(z) and m̆ is the limiting Stieltjes transform of the companion matrix X* Σ X/p (which has the same nonzero eigenvalues as M_N(Σ)).
Thus in principle one can obtain precise information on the eigenvectors (but this assumes that one knows the c.d.f. of H_N).
- 22 -
: : The non white case
Corollary 2.
Question: how does M_N(Σ) differ from Σ, and how can we improve the initial estimator of Σ given by M_N(Σ)?
One seeks an estimator of Σ of the kind U D_N U*, with D_N diagonal, i.e. an estimator which has the same eigenvectors as M_N(Σ).
The best such estimator (in Frobenius norm) is D̃_N = diag(d̃_1, . . . , d̃_N), where, for i = 1, . . . , N,

d̃_i = u_i* Σ u_i.

Can we say a few things on the d̃_i's? Yes, asymptotically, by choosing g(x) = x.
- 23 -
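The oracle D̃_N is easy to compute in simulation. The sketch below uses the talk's mixture H_N = 0.2 δ_1 + 0.4 δ_3 + 0.4 δ_10 but my own (arbitrary) choice of N, p and seed. Since M_N(Σ) itself belongs to the class {U D U* : D diagonal}, the oracle can never do worse than the sample covariance matrix in Frobenius norm.

```python
import numpy as np

rng = np.random.default_rng(7)
N, p = 200, 400                        # gamma = 2
tau = np.repeat([1.0, 3.0, 10.0], [40, 80, 80])  # 0.2/0.4/0.4 mixture
Sigma = np.diag(tau)

X = rng.standard_normal((N, p))
M = np.sqrt(tau)[:, None] * (X @ X.T / p) * np.sqrt(tau)[None, :]
lam, U = np.linalg.eigh(M)

d_tilde = np.sum(U * (Sigma @ U), axis=0)   # d_i = u_i^T Sigma u_i
oracle = (U * d_tilde) @ U.T                # U diag(d_tilde) U^T

err_sample = np.linalg.norm(M - Sigma)      # Frobenius loss of M_N(Sigma)
err_oracle = np.linalg.norm(oracle - Sigma) # loss of the oracle U D~ U^T
prial = 100 * (1 - err_oracle ** 2 / err_sample ** 2)
```

The gap between the two losses is exactly what the PRIAL of the next slides quantifies on average.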
: : The non white case
Corollary 2.
We set, for x ∈ R,

Ψ_N(x) = (1/N) Σ_{i=1}^N d̃_i 1_{[λ_i,+∞)}(x) = (1/N) Σ_{i=1}^N u_i* Σ u_i 1_{[λ_i,+∞)}(x).

Then one has, for i = 1, . . . , N,

d̃_i = lim_{ε→0⁺} ( Ψ_N(λ_i + ε) − Ψ_N(λ_i − ε) ) / ( F_N(λ_i + ε) − F_N(λ_i − ε) ).

Theorem. For all x ≠ 0, Ψ_N(x) → Ψ(x). Moreover Ψ(x) = ∫_{−∞}^x ψ(λ) dF(λ), with, for λ ∈ R,

ψ(λ) = λ / | 1 − 1/γ − (1/γ) λ m̆(λ) |²  if λ > 0;
ψ(λ) = γ / ( (1 − γ) m̆(0) )  if λ = 0 and γ < 1;
ψ(λ) = 0 otherwise.
- 24 -
: : The non white case
An improved estimator
We consider the improved estimator S̃_N := U D̃ U*, where

D̃_i = λ_i / | 1 − 1/γ − (1/γ) λ_i m̆(λ_i) |².

We ran 10,000 simulations with H_N(Σ) = 0.2 δ_1 + 0.4 δ_3 + 0.4 δ_10, γ = 2, and increasing the number of variables p from 5 to 100. For each simulation, we calculate the Percentage Relative Improvement in Average Loss (PRIAL): if M is an estimator of Σ_N and |A|²_F = Tr A A* (Frobenius norm),

PRIAL(M) = 100 × ( 1 − E |M − U_N D̃_N U_N*|²_F / E |M_N(Σ) − U_N D̃_N U_N*|²_F ).
- 25 -
: : The non white case
Simulations
Even for small sizes, p = 40, the PRIAL is about 95%.
[Figure: Relative Improvement in Average Loss (from 86% to 100%) against sample size (10 to 200), for γ = 2.]
- 26 -
: : Conclusion
Remarks and conclusion
Eigenvalues
- Using the techniques introduced by Tao-Vu (2009), the universality results can surely be improved for the largest and smallest eigenvalues (condition number).
Eigenvectors
- Θ_N(g) is a new tool that allows one to study the average behavior of the eigenvectors: for instance, we cannot recover D. Paul's result for the eigenvector associated to the largest eigenvalue separating from the bulk.
- In general we cannot say anything on the eigenvectors associated to extreme eigenvalues: only the average behavior of the eigenvectors.
- For the moment, theoretical results only: one has first to define appropriate estimators for m̆, H_N, . . .
- 27 -