
A NEW STRONG INVARIANCE PRINCIPLE FOR SUMS OF INDEPENDENT RANDOM VECTORS

UWE EINMAHL
Abstract. We provide a strong invariance principle for sums of independent, identically distributed random vectors which need not have finite second absolute moments. Various applications are indicated. In particular, we show how one can re-obtain some recent LIL type results from this invariance principle.
1. Introduction
Let X, X_1, X_2, ... be independent, identically distributed (i.i.d.) random vectors in R^d and set S_n = Σ_{i=1}^n X_i, n ≥ 1, S_0 := 0. If the random vectors have mean zero and a finite covariance matrix it follows from the multidimensional central limit theorem that

(1.1)  S_n/√n →_d Y ~ normal(0, Γ),

where →_d stands for convergence in distribution.
There is also a much more general weak convergence result available, namely Donsker's theorem. To formulate this result we first have to recall the definition of the partial sum process sequence S^(n) : Ω → C_d[0, 1]:

S^(n)(t) = S_k if t = k/n, 0 ≤ k ≤ n, linearly interpolated elsewhere.

Let W(t), t ≥ 0 be a standard d-dimensional Brownian motion and denote the Euclidean norm on R^d by |·|. Then the d-dimensional version of Donsker's theorem can be formulated as follows.

Theorem 1.1 (Donsker). Let X, X_1, X_2, ... be i.i.d. random vectors such that E|X|² < ∞ and EX = 0. Let Σ be the positive definite, symmetric matrix satisfying Σ² = cov(X) =: Γ. Then we have,

S^(n)/√n →_d Σ W,

where W(t), 0 ≤ t ≤ 1 is the restriction of W to [0, 1].

In order to prove this result one can use a coupling argument, that is, one can construct the random variables X_1, X_2, ... and a d-dimensional Brownian motion {W(t) : t ≥ 0} on a suitable probability space so that one has

(1.2)  ‖S^(n) − Σ W^(n)‖/√n →_P 0,

where W^(n)(t) = W(nt), 0 ≤ t ≤ 1, →_P stands for convergence in probability, and ‖·‖ is the sup-norm on C_d[0, 1].
2000 Mathematics Subject Classification. Primary 60F17, 60F15.
Relation (1.2) clearly implies Donsker's theorem since we have W^(n)/√n =_d W.
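As an illustration of these definitions, here is a minimal simulation sketch in dimension d = 1 with Rademacher increments (so that σ = 1); the helper name partial_sum_process and all sample sizes are our own choices. It evaluates S^(n) on a grid and compares the sup-norm functional of S^(n)/√n with that of a Brownian path, two quantities which should be close in distribution for large n by Theorem 1.1.

```python
import numpy as np

rng = np.random.default_rng(0)

def partial_sum_process(X, grid):
    # S^(n)(k/n) = S_k for 0 <= k <= n, linearly interpolated in between (d = 1).
    n = len(X)
    S = np.concatenate(([0.0], np.cumsum(X)))
    return np.interp(grid, np.arange(n + 1) / n, S)

n, reps = 2000, 500
grid = np.linspace(0.0, 1.0, n + 1)

# Sup-norm functional of S^(n)/sqrt(n) for Rademacher increments versus the
# same functional of a Brownian path sampled on the grid.
sup_S = [np.abs(partial_sum_process(rng.choice([-1.0, 1.0], n), grid)).max() / np.sqrt(n)
         for _ in range(reps)]
sup_W = [np.abs(np.cumsum(rng.normal(scale=1.0 / np.sqrt(n), size=n))).max()
         for _ in range(reps)]
print(np.mean(sup_S), np.mean(sup_W))   # close for large n, as Theorem 1.1 predicts
```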
It is natural now to ask whether one can replace convergence in probability by almost sure convergence. This is not only a formal improvement of the above coupling result, but it also makes it possible to infer almost sure convergence results for partial sum processes from the corresponding results for Brownian motion. This was pointed out in the classical paper by Strassen [15] who obtained a functional law of the iterated logarithm for general partial sum processes along these lines. So one can pose the following
Question 1.2. Given a monotone sequence c_n, when is a construction possible such that with probability one,

‖S^(n) − Σ W^(n)‖ = O(c_n) as n → ∞?

If such a construction is possible, one speaks of a strong invariance principle with rate O(c_n).
We first look at the 1-dimensional case. (Then Σ is simply the standard deviation σ of X.) Though it was already known at an early stage that no better convergence rate than O(log n) is feasible unless of course the variables X, X_1, X_2, ... are normally distributed, it had been an open question for a long time whether a strong invariance principle with such a rate is actually attainable. Very surprisingly, Komlós, Major and Tusnády [10] eventually were able to show that such a construction is possible in dimension 1 if and only if the moment generating function of X is finite and if X has mean zero. More generally, they proved that a strong invariance principle with rate O(c_n) is possible for any sequence c_n of positive real numbers such that c_n/n^α is decreasing for some α < 1/3 and c_n/log n is non-decreasing, if and only if

(1.3)  Σ_{n=1}^∞ P{|X| ≥ c_n} < ∞ and EX = 0.
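For instance, for c_n = n^α with α > 0, condition (1.3) amounts to

Σ_{n=1}^∞ P{|X|^{1/α} ≥ n} < ∞ and EX = 0,

which holds if and only if E|X|^{1/α} < ∞ and EX = 0, since a non-negative random variable has a finite mean exactly when its tail probabilities along the integers are summable. For c_n = √n this is the classical assumption EX² < ∞, EX = 0.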
Major [11] obtained analogous results for sequences c_n satisfying: c_n/n^α is non-increasing for some α < 1/2 and c_n/n^{1/3} is non-decreasing. This includes especially the sequences c_n = n^α, 1/3 < α < 1/2. For sequences c_n in this range one can also get a strong invariance principle with rate o(c_n) rather than O(c_n). Moreover, it is well known that it is impossible to obtain an analogous result for the sequence c_n = √n. Note that in this case condition (1.3) is equivalent with the classical condition EX² < ∞ and EX = 0. In this case the best possible strong invariance principle is of order o(√(n log log n)). The remaining gap, namely the determination of the optimal convergence rates for "big" sequences c_n of order o(√n) where no α < 1/2 exists such that c_n/n^α is non-decreasing, was closed by Einmahl [3]. (Note that this includes all sequences of the form √(n/h(n)) where h : [1, ∞[ → ]0, ∞[ is slowly varying at infinity and h(x) → ∞ as x → ∞.) We next mention the work of Major [12] who showed that under the classical condition EX² < ∞ and EX = 0 a strong approximation with rate o(√n) is possible if one replaces the Brownian motion by a slightly different Gaussian process.

Following up the ideas from [12, 3], Einmahl and Mason [9] finally obtained the following strong invariance principle.
Theorem 1.3. Let X, X_1, X_2, ... be i.i.d. random variables satisfying condition (1.3) for a non-decreasing sequence c_n of positive real numbers such that c_n/n^{1/3} is eventually non-decreasing and c_n/√n is eventually non-increasing. If the underlying probability space is rich enough, one can construct a 1-dimensional Brownian motion such that with probability one,

‖S^(n) − σ_n W^(n)‖ = o(c_n) as n → ∞,

where σ_n² = E[X² I{|X| ≤ c_n}].
Using this result, one can easily determine the optimal convergence rate for the strong invariance principle in its classical formulation for all sequences c_n in this range. (See the subsequent Corollary 2.2 for more details.) Note that Theorem 1.3 only applies if EX² < ∞. This follows from the fact that c_n = O(√n) under the above assumptions and the second moment is finite if condition (1.3) holds for such a sequence. Very recently, Einmahl [6] showed that Theorem 1.3 also has a version in the infinite variance case, and he used this version to prove new functional LIL type results in this setting.
We return to the multidimensional case. Most of the results for (1-dimensional) random variables have been extended to random vectors by now. We mention the work of Philipp [13], who extended Strassen's strong invariance principle with rate o(√(n log log n)) to the d-dimensional case (actually also to Banach space valued random elements), and that of Berger [1], who generalized Major's result from [11] to the d-dimensional case. This led to the best possible rate of o(n^{1/3}) in the multidimensional invariance principle at that time. This rate was further improved in [3] to o(n^α), for α > 1/4. The next major step was taken by Einmahl [5], who was able to extend all the results of Komlós, Major and Tusnády [10] up to order O((log n)²) to the multivariate case. Moreover, it was shown in this article that under an extra smoothness assumption on the distribution of X, strong approximations with even better rates, especially with rate O(log n), are possible in higher dimensions as well. Zaitsev [16] finally showed that such constructions are also possible for random vectors which do not satisfy the extra smoothness condition, so that we now know that all the results of [10] have versions in higher dimensions.

Given all this work, one now has a fairly complete picture of the strong invariance principle for sums of i.i.d. random vectors. In the present paper we shall close one of the remaining gaps. We shall show that it is also possible to extend Theorem 1.3 to the d-dimensional case. Actually, this is not too difficult if one proves it as the original result is stated above, but as we have indicated, there is also a version of this result in the infinite variance case. The purpose of this paper is to establish a general multidimensional version of Theorem 1.3 which also applies if E|X|² = ∞. In this case the problem becomes more delicate since one has to use truncation arguments which lead to random vectors with possibly very irregular covariance matrices. Most of the existing strong approximation techniques for sums of independent random vectors require some conditions on the ratio of the largest and smallest eigenvalues of the covariance matrices (see, for instance, [4, 16]) and, consequently, they cannot be applied in this case. Here a new strong approximation method, which is due to Sakhanenko [14], will come in handy.
2. The main result and some corollaries.

We first state our new strong invariance principle, where we only assume that E|X| < ∞. (This follows from the subsequent assumption (2.1) since all sequences c_n considered are of order O(n). If condition (2.1) is satisfied for such a sequence, we have E|X| < ∞.)
Theorem 2.1. Let X, X_1, X_2, ... be i.i.d. mean zero random vectors in R^d. Assume that

(2.1)  Σ_{n=1}^∞ P{|X| ≥ c_n} < ∞,

where c_n is a non-decreasing sequence of positive real numbers such that

(2.2)  ∃ α ∈ ]1/3, 1[ : c_n/n^α is eventually non-decreasing, and

(2.3)  ∀ ε > 0 ∃ m_ε ≥ 1 : c_n/c_m ≤ (1 + ε)(n/m), m_ε ≤ m < n.

If the underlying probability space is rich enough, one can construct a d-dimensional standard Brownian motion W(t), t ≥ 0 such that with probability 1,

(2.4)  ‖S^(n) − Σ_n W^(n)‖ = o(c_n) as n → ∞,

where Σ_n is the sequence of positive semidefinite, symmetric matrices determined by

(2.5)  Σ_n² = (E[X^(i) X^(j) I{|X| ≤ c_n}])_{1≤i,j≤d}.
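As a numerical illustration of (2.5), the following sketch estimates Σ_n² from an i.i.d. sample by truncation at a level c and extracts the positive semidefinite square root via a spectral decomposition; the helper truncated_sigma and the heavy-tailed test distribution are our own choices and are not part of the theorem.

```python
import numpy as np

def truncated_sigma(sample, c):
    # Estimate Sigma_n^2 = (E[X^(i) X^(j) I{|X| <= c_n}])_{i,j} from a sample
    # and return its positive semidefinite square root Sigma_n.
    kept = sample[np.linalg.norm(sample, axis=1) <= c]
    M = kept.T @ kept / len(sample)          # truncated second-moment matrix
    w, V = np.linalg.eigh(M)                 # spectral decomposition of M
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

rng = np.random.default_rng(1)
# Heavy-tailed coordinates (infinite variance for df <= 2): the regime in
# which Theorem 2.1 goes beyond the finite variance setting of Theorem 1.3.
X = rng.standard_t(df=1.5, size=(100_000, 2))
for c in (10.0, 100.0, 1000.0):
    print(c)
    print(truncated_sigma(X, c))
```

For such heavy-tailed X the entries of Σ_n keep growing as the truncation level increases, which is exactly the infinite variance behavior the theorem has to accommodate.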
As a first application of our above strong invariance principle, we show how one can re-obtain the main results of [3] from it. Here we are assuming that E|X|² < ∞ so that cov(X) (= covariance matrix of X) exists.
Corollary 2.2. Let X, X_1, X_2, ... be i.i.d. mean zero random vectors in R^d and assume that E|X|² < ∞. Let Σ be the positive semidefinite, symmetric matrix satisfying Σ² = cov(X). Assume that condition (2.1) is satisfied for a sequence c_n such that c_n/√n is eventually non-increasing and (2.2) holds. Then a construction is possible such that we have with probability one:

(2.6)  ‖S^(n) − Σ W^(n)‖ = o(c_n ∨ c_n² √(log log n/n)).

Furthermore, we have,

(2.7)  ‖S^(n) − Σ W^(n)‖/c_n →_P 0.
Remark 2.3. We get the following results due to [3] from (2.6):

(1) If c_n additionally satisfies c_n = O(√(n/log log n)), then we also have the almost sure rate o(c_n) for the standard approximation by Σ W^(n).

(2) Set β_n = c_n/√(n/log log n). If liminf_{n→∞} β_n > 0, we get the rate o(c_n β_n), where the extra factor β_n is sharp (see [3]).

Note also that (2.7) (with c_n = √n) immediately implies Donsker's theorem.
To formulate the following corollary we need somewhat more notation: For any (d,d)-matrix A we set ‖A‖ := sup{|Av| : |v| ≤ 1}. We recall that ‖A‖² is equal to the largest eigenvalue of the symmetric matrix A^t A. This is due to the well-known fact that the largest eigenvalue λ(C) of a positive semidefinite, symmetric (d,d)-matrix C satisfies λ(C) = sup{⟨v, Cv⟩ : |v| ≤ 1}, where ⟨·,·⟩ is the standard scalar product on R^d. Furthermore, let for any t ≥ 0,

H(t) := sup{E[⟨v, X⟩² I{|X| ≤ t}] : |v| ≤ 1}.

If we look at the matrices Σ_n we see that ‖Σ_n‖² = H(c_n). Similarly as in [8] we set for any sequence c_n as in Theorem 2.1,

α_0 = sup{α ≥ 0 : Σ_{n=1}^∞ n^{−1} exp(−α² c_n²/(2nH(c_n))) = ∞}.

Using Theorem 2.1 we now can give a very short proof of Theorem 3 of [8] in the finite-dimensional case. This result is the basis for all the LIL type results in [7, 8] and, consequently, we can prove all these results in the finite-dimensional case via Theorem 2.1.
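The quantity H(t) and the series defining α_0 can be explored numerically. In the sketch below (sample size, trial sequence c_n, trial values of α and the guard against H = 0 are all our own choices), H(t) is estimated as the largest eigenvalue of the empirical truncated second-moment matrix, using the identity ‖Σ_n‖² = H(c_n).

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_t(df=1.5, size=(50_000, 2))     # heavy-tailed test sample
r = np.linalg.norm(X, axis=1)

def H_hat(t):
    # Largest eigenvalue of the truncated second-moment matrix: an estimate
    # of H(t) = sup{ E[<v,X>^2 I{|X| <= t}] : |v| <= 1 } = ||Sigma_n||^2.
    kept = X[r <= t]
    return float(np.linalg.eigvalsh(kept.T @ kept / len(X))[-1])

# Partial sums of the series defining alpha_0 for trial values of alpha and a
# trial sequence c_n; rapid growth of the partial sum suggests alpha < alpha_0.
n = np.arange(10, 2000)
c = np.sqrt(n * np.log(np.log(n)))               # trial sequence c_n
H = np.maximum([H_hat(t) for t in c], 1e-12)     # guard against H = 0
for alpha in (0.5, 1.0, 2.0):
    print(alpha, np.exp(-alpha**2 * c**2 / (2 * n * H)).dot(1.0 / n))
```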
Corollary 2.4. Let X, X_1, X_2, ... be i.i.d. mean zero random vectors in R^d. Assume that condition (2.1) holds for a non-decreasing sequence c_n of positive real numbers such that c_n/√n is eventually non-decreasing and condition (2.3) is satisfied. Then we have with probability one,

(2.8)  limsup_{n→∞} |S_n|/c_n = α_0.
We finally show how the general law of the iterated logarithm (see Corollary 2.5) follows directly from Corollary 2.4. (In [7, 8] we had obtained this result as a corollary to another more general result, the law of a very slowly varying function, which also follows from Corollary 2.4, but requires a more delicate proof.)

As usual, we set Lt = log(t ∨ e) and LLt = L(Lt), t ≥ 0.
Corollary 2.5 (General LIL). Let X, X_1, X_2, ... be i.i.d. random vectors in R^d. Let p ≥ 1 and λ ≥ 0. Then the following are equivalent:

(a) We have with probability one, limsup_{n→∞} |S_n|/√(2n(LLn)^p) = λ.

(b) limsup_{t→∞} H(t)/(LLt)^{p−1} = λ² and E[X] = 0.
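For instance, if E|X|² < ∞ and EX = 0, then H(t) → ‖cov(X)‖ as t → ∞, so that (b) holds with p = 1 and λ² = ‖cov(X)‖; part (a) then gives the classical d-dimensional Hartman–Wintner law of the iterated logarithm, limsup_{n→∞} |S_n|/√(2nLLn) = ‖cov(X)‖^{1/2} with probability one.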
Note that we do not explicitly assume that

Σ_{n=1}^∞ P{|X| ≥ √(n(LLn)^p)} < ∞,

or, equivalently, E[|X|²/(LL|X|)^p] < ∞. In the finite-dimensional case this condition follows from (b). This was already pointed out in the 1-dimensional case (see, for instance, [6]), and we shall give here a detailed proof of this fact in arbitrary finite dimension. We mention that this implication does not hold in the infinite-dimensional setting, so that one has an extra condition in this case (see [8]).
The remaining part of this paper is organized as follows: The proof of Theorem 2.1
will be given in Section 3 and then we shall show in Section 4 how the corollaries
can be obtained.
3. Proof of the strong invariance principle.

3.1. Some auxiliary results. Our proof is based on the following strong approximation result which follows from the work of Sakhanenko [14]. (See his Corollary 3.2.)
Theorem 3.1. Let X'_j, 1 ≤ j ≤ n be independent mean zero random vectors in R^d such that E|X'_j|³ < ∞, 1 ≤ j ≤ n. Let x > 0 be fixed. If the underlying probability space is rich enough, one can construct independent normal(0, I)-distributed random vectors Y_j, 1 ≤ j ≤ n such that

(3.1)  P{max_{1≤k≤n} |Σ_{j=1}^k (X'_j − A_j Y_j)| ≥ x} ≤ C Σ_{j=1}^n E|X'_j|³/x³,

where A_j is the positive semidefinite, symmetric matrix satisfying A_j² = cov(X'_j), 1 ≤ j ≤ n, and C is a positive constant depending on d only.
Proof. From Corollary 3.2 in [14] we get independent random vectors Y_1, ..., Y_n so that the probability in (3.1) is

≤ C' x^{−3} Σ_{j=1}^n (E|X'_j|³ + E|Y'_j|³),

where Y'_j := A_j Y_j, 1 ≤ j ≤ n, and C' is a positive constant depending on d only. Writing Y'_j = (Y'_{j,1}, ..., Y'_{j,d})^t and using the inequality |v|³ ≤ d^{1/2} Σ_{i=1}^d |v_i|³, v ∈ R^d (which follows from the Hölder inequality), we get for 1 ≤ j ≤ n,

E|Y'_j|³ ≤ d^{1/2} Σ_{i=1}^d E|Y'_{j,i}|³ = d^{1/2} E|Z|³ Σ_{i=1}^d (E|Y'_{j,i}|²)^{3/2} = d^{1/2} E|Z|³ Σ_{i=1}^d (E|X'_{j,i}|²)^{3/2} ≤ d^{1/2} E|Z|³ Σ_{i=1}^d E|X'_{j,i}|³ ≤ d^{3/2} E|Z|³ E|X'_j|³,

where Z : Ω → R is standard normal. Thus we have,

(3.2)  E|Y'_j|³ ≤ C'' E|X'_j|³, 1 ≤ j ≤ n,

where C'' is a positive constant depending on d only, and Theorem 3.1 has been proved.
Corollary 3.2. Let X'_n, n ≥ 1 be a sequence of independent mean zero random vectors in R^d such that we have, for a non-decreasing sequence c_n of positive real numbers which converges to infinity,

Σ_{n=1}^∞ E|X'_n|³/c_n³ < ∞.

If the underlying probability space is rich enough, one can construct a sequence of independent normal(0, I)-distributed random vectors Y_n such that with probability one,

Σ_{j=1}^n (X'_j − A_j Y_j) = o(c_n) as n → ∞,

where A_n is the sequence of positive semidefinite, symmetric matrices satisfying A_n² = cov(X'_n), n ≥ 1.
Proof. We employ a similar argument as on p. 95 of [4]. It is easy to see that one can find another non-decreasing sequence c̃_n so that c̃_n → ∞, c̃_n = o(c_n) as n → ∞ and still

(3.3)  Σ_{n=1}^∞ E|X'_n|³/c̃_n³ < ∞.

Set

m_0 := 1, m_n := min{k : c̃_k ≥ 2 c̃_{m_{n−1}}}, n ≥ 1.

By the definition of the subsequence m_n we have

c̃_{m_n − 1}/c̃_{m_{n−1}} ≤ 2 ≤ c̃_{m_n}/c̃_{m_{n−1}}, n ≥ 1.

Theorem 3.1 enables us to define independent normal(0, I)-distributed random vectors {Y_j : m_{n−1} ≤ j < m_n} in terms of the random vectors {X'_j : m_{n−1} ≤ j < m_n} (for any n ≥ 1) such that

(3.4)  P{max_{m_{n−1} ≤ k < m_n} |Σ_{j=m_{n−1}}^k (X'_j − A_j Y_j)| ≥ c̃_{m_{n−1}}} ≤ C Σ_{j=m_{n−1}}^{m_n−1} E|X'_j|³/c̃_{m_{n−1}}³ ≤ 8C Σ_{j=m_{n−1}}^{m_n−1} E|X'_j|³/c̃_j³.

The resulting sequence {Y_n : n ≥ 1} consists of independent random vectors since the blocks {X'_j : m_{n−1} ≤ j < m_n} are independent.

Recalling (3.3) and using the Borel–Cantelli lemma, we see that we have with probability one,

max_{m_{n−1} ≤ k < m_n} |Σ_{j=m_{n−1}}^k (X'_j − A_j Y_j)| ≤ c̃_{m_{n−1}} eventually.

Employing the triangle inequality and adding up the above inequalities, we get with probability one,

|Σ_{j=1}^k (X'_j(ω) − A_j Y_j(ω))| ≤ K(ω) + Σ_{i=1}^{n−1} c̃_{m_i} ≤ K(ω) + 2 c̃_{m_{n−1}}, m_{n−1} ≤ k < m_n,

and we see that our corollary holds.
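The blocking scheme in the above proof is easily computed explicitly. A small sketch (block_boundaries is a hypothetical helper; indices are 1-based as in the proof): it returns m_0 = 1 and then each m_n as the first index at which c̃ has doubled since m_{n−1}, so the boundaries grow geometrically.

```python
def block_boundaries(c_tilde):
    # m_0 = 1 and m_n = min{k : c_tilde[k] >= 2 * c_tilde[m_{n-1}]}, for a
    # non-decreasing list c_tilde (1-based indices, as in the proof).
    m = [1]
    k = 1
    while k <= len(c_tilde):
        target = 2 * c_tilde[m[-1] - 1]
        while k <= len(c_tilde) and c_tilde[k - 1] < target:
            k += 1
        if k > len(c_tilde):
            break
        m.append(k)
    return m

# Example: c_tilde_n = n^0.4 yields geometrically growing block boundaries.
print(block_boundaries([n ** 0.4 for n in range(1, 20_000)]))
```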
The following lemma collects some more or less known facts.

Lemma 3.3. Let X : Ω → R^d be a random vector such that (2.1) holds for a non-decreasing sequence c_n of positive real numbers.

(a) If c_n satisfies condition (2.2), we have:

Σ_{n=1}^∞ E[|X|³ I{|X| ≤ c_n}]/c_n³ < ∞.

(b) If c_n satisfies condition (2.3), we have

E[|X| I{|X| > c_n}] = o(c_n/n) as n → ∞.

(c) If E[X] = 0, and both conditions (2.2), (2.3) are satisfied, we have:

Σ_{k=1}^n |E[X I{|X| ≤ c_k}]| = o(c_n) as n → ∞.
Proof. First observe that setting p_j = P{c_{j−1} < |X| ≤ c_j}, j ≥ 1, where c_0 = 0, we have by our assumption (2.1),

(3.5)  Σ_{j=1}^∞ j p_j < ∞.

To prove (a) we note that we have on account of (2.2):

c_j/j^α ≤ c_n/n^α for n ≥ j ≥ j_0 (say),

which in turn implies that c_j/j^α ≤ K_1 c_n/n^α, 1 ≤ j ≤ n, n ≥ 1, where K_1 > 0 is a suitable constant. It follows that

(3.6)  c_j/c_n ≤ K_1 (j/n)^α, 1 ≤ j ≤ n, n ≥ 1.

We now see that

Σ_{n=1}^∞ E[|X|³ I{|X| ≤ c_n}]/c_n³ ≤ Σ_{n=1}^∞ Σ_{j=1}^n c_j³ p_j/c_n³ = Σ_{j=1}^∞ (Σ_{n=j}^∞ (c_j/c_n)³) p_j ≤ K_1³ Σ_{j=1}^∞ (Σ_{n=j}^∞ n^{−3α}) j^{3α} p_j ≤ K_2 Σ_{j=1}^∞ j p_j < ∞.

Here we have used the fact that Σ_{n=j}^∞ n^{−3α} = O(j^{1−3α}) as j → ∞, which follows easily by comparing this series with the integral ∫_j^∞ x^{−3α} dx < ∞. (Recall that α > 1/3.)

To prove (b) we observe that

(3.7)  n E[|X| I{|X| > c_n}]/c_n ≤ Σ_{j=n+1}^∞ n (c_j/c_n) p_j ≤ K_3 Σ_{j=n+1}^∞ j p_j,

where we have used the fact that c_j/c_n ≤ K_3 j/n, j ≥ n, for some positive constant K_3. (This easily follows from condition (2.3).) Recalling (3.5) we readily obtain (b).

We turn to the proof of (c). Let ε > 0 be fixed and choose an m_ε ≥ 1 so that m E[|X| I{|X| > c_m}]/c_m ≤ ε for m ≥ m_ε, which is possible due to (b). Since EX = 0, we trivially have E[X I{|X| ≤ c_m}] = −E[X I{|X| > c_m}] and we can conclude that

Σ_{k=1}^n |E[X I{|X| ≤ c_k}]|/c_n ≤ m_ε E|X|/c_n + Σ_{k=m_ε+1}^n ε c_k/(k c_n).

Due to (3.6) we further have,

Σ_{k=m_ε+1}^n ε c_k/(k c_n) ≤ ε K_1 Σ_{k=m_ε+1}^n k^{α−1}/n^α ≤ ε K_1/α.

Consequently, we have,

limsup_{n→∞} Σ_{k=1}^n |E[X I{|X| ≤ c_k}]|/c_n ≤ ε K_1/α.

This implies (c) since we can choose ε arbitrarily small.
The next lemma gives us more information on the matrices Σ_n.

Lemma 3.4. Let the sequence Σ_n be defined as in Theorem 2.1. Then we have for n ≥ m ≥ 1,

(a) Σ_n − Σ_m is positive semidefinite.

(b) ‖Σ_n − Σ_m‖² ≤ E[|X|² I{c_m < |X| ≤ c_n}].
Proof. By definition we have

⟨v, (Σ_n² − Σ_m²)v⟩ = E[⟨X, v⟩² I{c_m < |X| ≤ c_n}] ≥ 0, v ∈ R^d,

which clearly shows that Σ_n² − Σ_m² is positive semidefinite. This in turn implies that this also holds for Σ_n − Σ_m since f(t) = √t, t ≥ 0 is an operator monotone function (see Proposition V.1.8, [2]). We thus have proved (a).

Furthermore, we can conclude from the above formula that

‖Σ_n² − Σ_m²‖ ≤ E[|X|² I{c_m < |X| ≤ c_n}].

Here we have used the fact that if A is a positive semidefinite, symmetric (d,d)-matrix, we have ‖A‖ = sup{⟨v, Av⟩ : |v| ≤ 1}.

Finally, noting that by Theorem X.1.1, [2],

‖Σ_n − Σ_m‖² ≤ ‖Σ_n² − Σ_m²‖,

we see that (b) also holds.
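Both facts from [2] that are used here, the operator monotonicity of the square root and the inequality ‖√B − √A‖² ≤ ‖B − A‖, can be checked numerically on random positive semidefinite matrices; in the sketch below the matrix size, the random model and the tolerances are our own choices.

```python
import numpy as np

rng = np.random.default_rng(3)

def psd_sqrt(M):
    # Positive semidefinite square root via the spectral decomposition of M.
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

def op_norm(M):
    return np.linalg.norm(M, 2)      # largest singular value

# Random PSD pair with B - A PSD, mirroring Sigma_n - Sigma_m in Lemma 3.4:
G1, G2 = rng.normal(size=(2, 4, 4))
A = G1 @ G1.T
B = A + G2 @ G2.T                    # B >= A in the PSD order

# (a) operator monotonicity: sqrt(B) - sqrt(A) should be PSD
print(np.linalg.eigvalsh(psd_sqrt(B) - psd_sqrt(A)).min() >= -1e-10)
# (b) the Theorem X.1.1 bound: ||sqrt(B) - sqrt(A)||^2 <= ||B - A||
print(op_norm(psd_sqrt(B) - psd_sqrt(A)) ** 2 <= op_norm(B - A) + 1e-10)
```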
3.2. Conclusion of the proof.

(i) Set X'_n = X_n I{|X_n| ≤ c_n} and X''_n = X'_n − EX'_n, n ≥ 1. Then we clearly have by assumption (2.1),

(3.8)  Σ_{n=1}^∞ P{X_n ≠ X'_n} < ∞,

which via the Borel–Cantelli lemma trivially implies that with probability one, Σ_{j=1}^n (X_j − X'_j) = o(c_n) as n → ∞. Recalling Lemma 3.3(c), we see that with probability one,

(3.9)  S_n − Σ_{j=1}^n X''_j = o(c_n) as n → ∞.

(ii) Noting that E|X''_n|³ ≤ 8E[|X|³ I{|X| ≤ c_n}], n ≥ 1, we get from Lemma 3.3(a) that

(3.10)  Σ_{n=1}^∞ E|X''_n|³/c_n³ < ∞.
In view of Corollary 3.2 we now can find a sequence Y_n of independent normal(0, I)-distributed random vectors such that with probability one,

(3.11)  Σ_{j=1}^n (X''_j − A_j Y_j) = o(c_n) as n → ∞,

where A_n are the positive semidefinite symmetric matrices satisfying A_n² = cov(X''_n) = cov(X'_n).

(iii) We next claim that with probability one,

(3.12)  Σ_{j=1}^n (Σ_j − A_j) Y_j = o(c_n) as n → ∞.

In order to prove that, it is sufficient to show that

(3.13)  Σ_{j=1}^∞ E[|(Σ_j − A_j) Y_j|²]/c_j² < ∞.

To see that we argue as follows: Using a standard 1-dimensional result on random series componentwise, we then can conclude that the random series Σ_{j=1}^∞ (Σ_j − A_j) Y_j/c_j is convergent in R^d with probability one, which in turn via Kronecker's lemma (applied componentwise) implies (3.12).
Next observe that E[|(Σ_j − A_j) Y_j|²] ≤ d ‖Σ_j − A_j‖², j ≥ 1, so that (3.13) follows once we have shown that

(3.14)  Σ_{j=1}^∞ ‖Σ_j − A_j‖²/c_j² < ∞.

From the definition of these matrices we immediately see that for any v ∈ R^d,

⟨v, (Σ_j² − A_j²)v⟩ = (E[⟨X, v⟩ I{|X| ≤ c_j}])²,

which on account of E[⟨X, v⟩] = 0 implies,

‖Σ_j² − A_j²‖ = sup_{|v|≤1} (E[⟨X, v⟩ I{|X| > c_j}])² ≤ (E[|X| I{|X| > c_j}])².

Using once more Theorem X.1.1 in [2] and recalling Lemma 3.3(b), we find that

‖Σ_j − A_j‖² ≤ ‖Σ_j² − A_j²‖ ≤ ε_j c_j²/j², j ≥ 1,

where ε_j → 0 as j → ∞. This trivially implies (3.14).
(iv) Combining relations (3.9), (3.11) and (3.12), we see that with probability one,

S_n − Σ_{j=1}^n Σ_j Y_j = o(c_n) as n → ∞.

This of course implies that with probability one,

(3.15)  max_{1≤k≤n} |S_k − Σ_{j=1}^k Σ_j Y_j| = o(c_n) as n → ∞.

Set

Δ_n := max_{1≤k≤n} |Σ_{j=1}^k (Σ_n − Σ_j) Y_j|, n ≥ 1.
We claim that with probability one,

(3.16)  Δ_n/c_n → 0 as n → ∞.

We first show that with probability one,

(3.17)  Δ_{2^ℓ}/c_{2^ℓ} → 0 as ℓ → ∞.

To that end we note that by combining Lévy's inequality and the Markov inequality, we get for any δ > 0,

P{Δ_{2^ℓ} ≥ δ c_{2^ℓ}} ≤ 2P{|Σ_{j=1}^{2^ℓ} (Σ_{2^ℓ} − Σ_j)Y_j| ≥ δ c_{2^ℓ}} ≤ 2δ^{−2} c_{2^ℓ}^{−2} Σ_{j=1}^{2^ℓ} E[|(Σ_{2^ℓ} − Σ_j)Y_j|²].

As we have E[|(Σ_{2^ℓ} − Σ_j)Y_j|²] ≤ d ‖Σ_{2^ℓ} − Σ_j‖², it suffices to show,

(3.18)  Σ_{ℓ=1}^∞ Σ_{j=1}^{2^ℓ} ‖Σ_{2^ℓ} − Σ_j‖²/c_{2^ℓ}² < ∞.

Using the inequality ‖Σ_{2^ℓ} − Σ_j‖² ≤ E[|X|² I{c_j < |X| ≤ c_{2^ℓ}}] (see Lemma 3.4(b)), we can prove this by essentially the same argument as on page 908 in [6]. (Note that we now only have that c_j²/c_{2^ℓ}² is bounded below by a constant multiple of (j/2^ℓ)², so that one has to modify the last two bounds on this page slightly.)
(v) Let 2^ℓ ≤ n < 2^{ℓ+1}. Then we have by the triangle inequality,

Δ_n ≤ max_{1≤k≤n} |Σ_{j=1}^k (Σ_{2^{ℓ+1}} − Σ_j)Y_j| + max_{1≤k≤n} |(Σ_{2^{ℓ+1}} − Σ_n) Σ_{j=1}^k Y_j|,

which in turn is

≤ Δ_{2^{ℓ+1}} + ‖Σ_{2^{ℓ+1}} − Σ_{2^ℓ}‖ max_{1≤k≤2^{ℓ+1}} |Σ_{j=1}^k Y_j|.

Here we have used the fact that ‖Σ_{2^{ℓ+1}} − Σ_n‖ ≤ ‖Σ_{2^{ℓ+1}} − Σ_{2^ℓ}‖, 2^ℓ ≤ n ≤ 2^{ℓ+1}, which follows from Lemma 3.4(a).

Using obvious modifications of the proof of relation (3.11) in [6], we can conclude that with probability one,

(3.19)  ‖Σ_{2^{ℓ+1}} − Σ_{2^ℓ}‖ max_{1≤k≤2^{ℓ+1}} |Σ_{j=1}^k Y_j| = o(c_{2^ℓ}) as ℓ → ∞.
Combining relations (3.17) and (3.19), we see that (3.16) holds.

(vi) In view of (3.15) and (3.16) we have with probability one,

max_{1≤k≤n} |S_k − Σ_n Σ_{j=1}^k Y_j| = o(c_n) as n → ∞.

Letting T^(n) : Ω → C_d[0, 1] be the partial sum process sequence based on the sums Σ_{j=1}^n Y_j, n ≥ 1, we see that with probability one,

(3.20)  ‖S^(n) − Σ_n T^(n)‖ = o(c_n) as n → ∞.

If the underlying probability space is rich enough, we can find a d-dimensional Brownian motion {W(t) : t ≥ 0} such that W(n) = Σ_{j=1}^n Y_j, n ≥ 1. Using the corresponding result in the 1-dimensional case (see [9]) componentwise, we find that with probability one,

‖T^(n) − W^(n)‖ = O(√(log n)) as n → ∞,

and consequently we have with probability one,

(3.21)  ‖Σ_n T^(n) − Σ_n W^(n)‖ ≤ ‖Σ_n‖ ‖T^(n) − W^(n)‖ = O(‖Σ_n‖ √(log n)) = o(c_n),

where we have used the fact that ‖Σ_n‖² ≤ E[|X|² I{|X| ≤ c_n}] ≤ c_n E|X| and (2.2).

Combining (3.20) and (3.21), we obtain the assertion and the theorem has been proved.
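The step in part (vi) that produces a Brownian motion with W(n) = Σ_{j=1}^n Y_j can be made concrete by the classical bridge construction: on each interval [n−1, n] one adds an independent Brownian bridge to the linear interpolation of the prescribed values. A discretized sketch in dimension 1 (grid sizes and helper names are our own choices):

```python
import numpy as np

rng = np.random.default_rng(4)

def brownian_bridge(rng, steps):
    # Standard Brownian bridge on [0,1] via a pinned random walk:
    # B(t) = W(t) - t * W(1), sampled on a uniform grid of `steps` points.
    t = np.arange(1, steps + 1) / steps
    W = np.cumsum(rng.normal(scale=1.0 / np.sqrt(steps), size=steps))
    return W - t * W[-1]

def bm_through_points(rng, Y, steps=1000):
    # A Brownian path with W(n) = Y_1 + ... + Y_n: on each [n-1, n] add an
    # independent Brownian bridge to the linear interpolation of the endpoints.
    path = [np.zeros(1)]
    level = 0.0
    for y in Y:
        t = np.arange(1, steps + 1) / steps
        path.append(level + t * y + brownian_bridge(rng, steps))
        level += y
    return np.concatenate(path)

Y = rng.normal(size=10)                  # the coupled normal(0, I) increments (d = 1)
W = bm_through_points(rng, Y, steps=1000)
print(np.allclose(W[::1000][1:], np.cumsum(Y)))   # W(n) matches the partial sums
```

The final check confirms that the constructed path passes exactly through the partial sums Σ_{j≤n} Y_j at integer times.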
4. Proofs of the corollaries

4.1. Proof of Corollary 2.2. We need the following lemma.

Lemma 4.1. Let X : Ω → R^d be a mean zero random vector with E|X|² < ∞. Assume that (2.1) holds, where c_n is a non-decreasing sequence of positive real numbers such that c_n/√n is eventually non-increasing. Then we have for Σ_n defined as in Theorem 2.1,

‖Σ_n² − cov(X)‖ = o(c_n²/n) as n → ∞.
Proof. We have,

(4.1)  ‖Σ_n² − cov(X)‖ = sup_{|v|≤1} ⟨v, (cov(X) − Σ_n²)v⟩ = sup_{|v|≤1} E[⟨v, X⟩² I{|X| > c_n}] ≤ E[|X|² I{|X| > c_n}].

Furthermore, using the fact that c_m²/m is eventually non-increasing, we get for large n,

E[|X|² I{|X| > c_n}] ≤ Σ_{k=n}^∞ c_{k+1}² P{c_k < |X| ≤ c_{k+1}} ≤ (c_n²/n) Σ_{k=n}^∞ (k + 1) P{c_k < |X| ≤ c_{k+1}},

which is of order o(c_n²/n) since the series Σ_{k=1}^∞ k P{c_k < |X| ≤ c_{k+1}} converges by (2.1).
We next show that ‖Σ_n − Σ‖ is of the same order. This is trivial in dimension 1, but in higher dimensions one needs some extra arguments.

Lemma 4.2. Let Σ be the positive semidefinite symmetric matrix satisfying Σ² = cov(X). Under the assumptions of Lemma 4.1 we have:

‖Σ_n − Σ‖ = o(c_n²/n) as n → ∞.
Proof. We first look at the case where cov(X) is not positive definite. Set d_1 = rank(cov(X)) and choose an orthonormal basis v_1, ..., v_d of R^d consisting of eigenvectors of cov(X), where the vectors v_i, i > d_1 correspond to the eigenvalue 0. Let S be the orthogonal matrix with column vectors v_1, ..., v_d. Then we clearly have,

S^t cov(X) S = ( C 0 ; 0 0 ),

where C is a positive definite symmetric (d_1, d_1)-matrix. (C is actually a diagonal matrix.) Choosing the unique positive definite symmetric (d_1, d_1)-matrix √C such that (√C)² = C, we readily obtain (by uniqueness of the square root matrix) that

Σ = S ( √C 0 ; 0 0 ) S^t.

Noticing that E[⟨X, v_j⟩²] = 0, j > d_1, we see that we have also for the matrices Σ_n²,

S^t Σ_n² S = ( C_n 0 ; 0 0 ),

where C_n are positive semidefinite symmetric (d_1, d_1)-matrices (not necessarily diagonal). This implies that

Σ_n = S ( √C_n 0 ; 0 0 ) S^t,

where √C_n are the positive semidefinite symmetric matrices satisfying (√C_n)² = C_n.

If cov(X) is positive definite, we set √C = Σ, C = cov(X), √C_n = Σ_n, C_n = Σ_n², n ≥ 1.

Using Theorem X.1.1 in [2] we can conclude from Lemma 4.1 that

‖√C_n − √C‖ = ‖Σ_n − Σ‖ ≤ ‖Σ_n² − cov(X)‖^{1/2} → 0 as n → ∞.

This implies that √C_n is positive definite for large n. Moreover, the smallest eigenvalue λ_n of √C_n converges to that of √C, which is equal to the smallest positive eigenvalue of Σ. If we denote this eigenvalue by λ, we find that λ_n ≥ λ/2 > 0 for large n.

Applying Theorem X.3.7 in [2] (with A = C_n, B = C and δ = λ²/4) we see that for large n,

‖Σ_n − Σ‖ = ‖√C_n − √C‖ ≤ λ^{−1} ‖C_n − C‖ = λ^{−1} ‖Σ_n² − cov(X)‖,

which in conjunction with Lemma 4.1 implies the above assertion.
Now we can conclude the proof of Corollary 2.2 by a simple application of the triangle inequality. Just observe that by Theorem 2.1, with probability one,

‖S^(n) − Σ W^(n)‖ ≤ ‖S^(n) − Σ_n W^(n)‖ + ‖(Σ − Σ_n) W^(n)‖ ≤ o(c_n) + ‖Σ − Σ_n‖ ‖W^(n)‖.

Note that we can apply Theorem 2.1 since we are assuming that c_n/√n is eventually non-increasing and we thus have for some m_0 ≥ 1, c_n/√n ≤ c_m/√m, m_0 ≤ m ≤ n, which implies that condition (2.3) holds.

By the law of the iterated logarithm for Brownian motion we have with probability one,

‖Σ − Σ_n‖ ‖W^(n)‖ = O(‖Σ − Σ_n‖ √(n log log n)),

which is in view of Lemma 4.2 of order o(c_n² √(log log n/n)).

Since W^(n)/√n =_d W, where W(t), 0 ≤ t ≤ 1 is the Brownian motion on the compact interval [0, 1], we also have,

‖Σ − Σ_n‖ ‖W^(n)‖ = O_P(‖Σ − Σ_n‖ √n) = o_P(c_n²/√n) = o_P(c_n).

Corollary 2.2 has been proved.
4.2. Proof of Corollary 2.4. We shall use the following d-dimensional version of Lemma 3 of [6]. The proof is almost the same as in dimension 1 and it is omitted. Recall that H(c_n) = sup{E[⟨v, X⟩² I{|X| ≤ c_n}] : |v| ≤ 1} = ‖Σ_n‖², n ≥ 1.

Lemma 4.3. Let X : Ω → R^d be a mean zero random vector and assume that condition (2.1) holds for a sequence c_n of positive real numbers such that c_n/√n is non-decreasing. Whenever n_k is a subsequence satisfying for large enough k,

1 < a_1 ≤ n_{k+1}/n_k ≤ a_2 < ∞,

we have:

(4.2)  Σ_{k=1}^∞ exp(−α² c_{n_k}²/(2 n_k ‖Σ_{n_k}‖²))  = ∞ if α < α_0,  < ∞ if α > α_0.
4.2.1. The upper bound part. W.l.o.g. we can assume that α_0 < ∞.

We first show that under the assumptions of the corollary we have with probability one,

(4.3)  limsup_{n→∞} |S_n|/c_n ≤ α_0.

To that end it is sufficient to show that we have for any ρ > 0 and n_k = n_k(ρ) = [(1 + ρ)^k], k ≥ 1, with probability one,

(4.4)  limsup_{k→∞} max_{1≤n≤n_k} |S_n|/c_{n_k} ≤ α_0.

Note that we trivially have,

max_{n_{k−1} ≤ n ≤ n_k} |S_n|/c_n ≤ (c_{n_k}/c_{n_{k−1}}) max_{1≤n≤n_k} |S_n|/c_{n_k}.

Moreover, it follows from condition (2.3) and the definition of n_k that

limsup_{k→∞} c_{n_k}/c_{n_{k−1}} ≤ limsup_{k→∞} n_k/n_{k−1} = 1 + ρ.

Combining these two observations with (4.4) we get for any ρ > 0 with probability one,

limsup_{n→∞} |S_n|/c_n ≤ α_0 (1 + ρ),

which clearly implies (4.3).

In view of our strong invariance principle, (4.4) follows if we can show that with probability one,

(4.5)  limsup_{k→∞} ‖Σ_{n_k} W^{(n_k)}‖/c_{n_k} ≤ α_0.
In order to prove the last relation, we need a deviation inequality for max_{0≤t≤1} |W(t)|. The following simple (suboptimal) inequality will be sufficient for our purposes.

Lemma 4.4. Let {W(t) : t ≥ 0} be a standard d-dimensional Brownian motion and let ε be a positive constant. Then there exists a constant C_ε = C_ε(d) > 0 which depends only on ε and d such that

(4.6)  P{max_{0≤t≤1} |W(t)| ≥ u} ≤ C_ε exp(−u²/(2 + 2ε)), u ≥ 0.

Proof. Since −W =_d W, we can infer from the Lévy inequality that for u ≥ 0,

P{max_{0≤t≤1} |W(t)| ≥ u} ≤ 2P{|W(1)| ≥ u}.

The random variable |W(1)|² has a chi-square distribution with d degrees of freedom and thus we have

P{|W(1)| ≥ u} = 2^{−d/2} Γ(d/2)^{−1} ∫_{u²}^∞ x^{d/2−1} exp(−x/2) dx ≤ K u^{d−2} exp(−u²/2), u ≥ 1,

where K > 0 is a constant depending on d only. Obviously we can find a positive constant C'_ε so that the last term is bounded above by

C'_ε exp(−u²/(2 + 2ε)).

Setting C_ε = 2C'_ε e^{1/(2+2ε)}, we see that inequality (4.6) holds for any u ≥ 0 and the lemma has been proved.
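Inequality (4.6) is easy to test by simulation. The sketch below (with d = 2, ε = 1, u = 2.5 and a crude grid discretization of the Brownian path, all our own choices) compares a Monte Carlo estimate of the left-hand side with the exponential factor on the right; the constant C_ε is not estimated.

```python
import numpy as np

rng = np.random.default_rng(5)

d, steps, reps, u, eps = 2, 1000, 2000, 2.5, 1.0
hits = 0
for _ in range(reps):
    # one Brownian path on [0, 1], sampled on a grid of `steps` points
    path = np.cumsum(rng.normal(scale=1.0 / np.sqrt(steps), size=(steps, d)), axis=0)
    if np.linalg.norm(path, axis=1).max() >= u:
        hits += 1
print(hits / reps, np.exp(-u * u / (2.0 + 2.0 * eps)))
```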
We are ready to prove (4.5). Let δ > 0 be fixed and set α' = (1 + δ)(α_0 + δ). Recall that (W^(n)(t)/√n)_{0≤t≤1} =_d (W(t))_{0≤t≤1}. Then we can infer from Lemma 4.4 (applied with ε = δ) and (4.2) that

Σ_{k=1}^∞ P{‖Σ_{n_k} W^{(n_k)}‖ ≥ α' c_{n_k}} ≤ Σ_{k=1}^∞ P{‖Σ_{n_k}‖ ‖W^{(n_k)}‖ ≥ α' c_{n_k}} ≤ C_δ Σ_{k=1}^∞ exp(−(1 + δ)(α_0 + δ)² c_{n_k}²/(2 n_k ‖Σ_{n_k}‖²)) < ∞.

This implies via the Borel–Cantelli lemma that with probability one,

limsup_{k→∞} ‖Σ_{n_k} W^{(n_k)}‖/c_{n_k} ≤ (1 + δ)(α_0 + δ).

Since this holds for any δ > 0 we get (4.5) and consequently (4.3).
4.2.2. The lower bound part. We assume that α_0 > 0. Otherwise, there is nothing to prove.

Furthermore, we can assume that c_n/√n → ∞. If c_n = O(√n), then we have α_0 = ∞ unless of course X = 0 with probability one. Applying Corollary 2.4 with c_n = √n (log log n)^{1/4}, it follows that even limsup_{n→∞} |S_n|/(√n (log log n)^{1/4}) = ∞ if X is non-degenerate. This trivially implies Corollary 2.4 for any sequence c_n of order O(√n).
We need the following lemma. Since the proof is almost identical with that in the 1-dimensional case (see Lemma 1, [7]) it is omitted. An inspection of this proof also reveals that one need not assume that X has a finite mean, and thus we have:

Lemma 4.5. Let X : Ω → R^d be a random vector satisfying condition (2.1) for a sequence c_n of positive real numbers such that c_n/√n is non-decreasing and converges to infinity. Then we have,

(4.7)  E[|X|² I{|X| ≤ c_n}] = o(c_n²/n) as n → ∞.
Let θ ∈ ]0, 1[ be fixed and let m ≥ 1 + θ^{−1} be a natural number. Consider the subsequence n_k = m^k, k ≥ 1. We first show that if 0 < α(1 + θ) < α_0 we have with probability one,

(4.8)  limsup_{k→∞} |S_{n_{k+1}} − S_{n_k}|/c_{n_{k+1}} ≥ α.

Rewriting S_{n_{k+1}} − S_{n_k} as S^{(n_{k+1})}(1) − S^{(n_{k+1})}(1/m), we see that Theorem 2.1 implies that (4.8) holds if and only if one has with probability one,

(4.9)  limsup_{k→∞} |Σ_{n_{k+1}}(W(n_{k+1}) − W(n_k))|/c_{n_{k+1}} ≥ α.

Consider the independent events

A_k := {|Σ_{n_{k+1}}(W(n_{k+1}) − W(n_k))| ≥ α c_{n_{k+1}}}, k ≥ 1.

As ‖Σ_{n_{k+1}}‖ is the largest eigenvalue of Σ_{n_{k+1}}, we can find a unit vector v_{k+1} ∈ R^d so that Σ_{n_{k+1}} v_{k+1} = ‖Σ_{n_{k+1}}‖ v_{k+1} and we can conclude that

P(A_k) ≥ P{|⟨v_{k+1}, Σ_{n_{k+1}}(W(n_{k+1}) − W(n_k))⟩| ≥ α c_{n_{k+1}}} = P{‖Σ_{n_{k+1}}‖ √(n_{k+1} − n_k) |Z| ≥ α c_{n_{k+1}}},

where Z : Ω → R is standard normal.

Employing the trivial inequality P{|Z| ≥ t} ≥ exp(−t²(1 + θ)/2), t ≥ t_θ, where t_θ is a positive constant depending on θ only, we see that for large k,

P(A_k) ≥ exp(−α²(1 + θ) c_{n_{k+1}}²/(2(n_{k+1} − n_k)‖Σ_{n_{k+1}}‖²)).

We can apply the above inequality for large k since by Lemma 4.5,

‖Σ_n‖² ≤ E[|X|² I{|X| ≤ c_n}] = o(c_n²/n) as n → ∞,

and, consequently,

c_{n_{k+1}}/(√(n_{k+1} − n_k) ‖Σ_{n_{k+1}}‖) → ∞ as k → ∞.

Since we have chosen m ≥ 1 + θ^{−1}, it follows that n_{k+1} − n_k = n_{k+1}(1 − 1/m) ≥ n_{k+1}(1 + θ)^{−1}. We can conclude that for large enough k,

P(A_k) ≥ exp(−α²(1 + θ)² c_{n_{k+1}}²/(2 n_{k+1} ‖Σ_{n_{k+1}}‖²)),

and, consequently, we have on account of (4.2),

Σ_{k=1}^∞ P(A_k) = ∞.

Using the Borel–Cantelli lemma, we see that (4.9) holds, which in turn implies (4.8).
If α_0 = ∞, we use the trivial inequality

limsup_{k→∞} |S_{n_{k+1}} − S_{n_k}|/c_{n_{k+1}} ≤ 2 limsup_{k→∞} |S_{n_k}|/c_{n_k},

which in conjunction with (4.8) (where we set θ = 1/2 and m = 3) implies that we have for any α > 0 with probability one,

limsup_{k→∞} |S_{n_k}|/c_{n_k} ≥ α/2.

It is now obvious that limsup_{n→∞} |S_n|/c_n = α_0 = ∞ with probability one.
If α_0 < ∞ we get from the upper bound part and the definition of n_k with probability one,

limsup_{k→∞} |S_{n_k}|/c_{n_{k+1}} ≤ α_0 limsup_{k→∞} c_{n_k}/c_{n_{k+1}} ≤ 2α_0/√m.

Combining this with (4.8) we see that we have, if α(1 + θ) < α_0, for any m ≥ 1 + θ^{−1} with probability one,

limsup_{n→∞} |S_n|/c_n ≥ α − 2α_0/√m.

Since we can choose m arbitrarily large, θ arbitrarily small and α arbitrarily close to α_0/(1 + θ), we see that limsup_{n→∞} |S_n|/c_n ≥ α_0 with probability one, and Corollary 2.4 has been proved.
4.3. Proof of Corollary 2.5. We only show how (b) implies (a), and we do this if p > 1. For the implication (a) ⟹ (b) we refer to [7]. We need another lemma.

Lemma 4.6. Let X : Ω → R^d be a random vector and set

H̃(t) = E[|X|² I{|X| ≤ t}] ∨ 1, t ≥ 0.

Then we have for any δ > 0: E[|X|²/(H̃(|X|))^{1+δ}] < ∞.
Proof. Without loss of generality we can assume that E|X|² = ∞ and consequently that H̄(t) → ∞ as t → ∞, where H̄(t) = E[|X|² I{|X| ≤ t}], t ≥ 0. Obviously, H̄ is right continuous and non-decreasing. Therefore there exists a unique Lebesgue–Stieltjes measure μ on the Borel subsets of R_+ satisfying,

μ(]a, b]) = H̄(b) − H̄(a), 0 ≤ a < b < ∞.

Let G be the generalized inverse function of H̄, i.e.

G(u) = inf{x ≥ 0 : H̄(x) ≥ u}, 0 < u < ∞.

As H̄ is right continuous, the above infimum is actually a minimum. In particular we have H̄(G(u)) ≥ u, u > 0. Moreover:

(4.10)  G(u) ≤ x ⟺ u ≤ H̄(x).

Let λ̄ be the Lebesgue measure on the Borel subsets of R_+. From (4.10) it easily follows that μ is equal to the image measure λ̄_G.

Next set τ = G(1) so that H̃(x) = 1, x < τ and H̃(x) = H̄(x), x ≥ τ. It trivially follows that

E[|X|²/H̃(|X|)^{1+δ}] ≤ E[|X|² I{|X| ≤ τ}] + ∫_{]τ,∞[} H̄(x)^{−1−δ} μ(dx).

The first term is obviously finite. As for the second term we have

∫_{]τ,∞[} H̄(x)^{−1−δ} μ(dx) = ∫_{]τ,∞[} H̄(x)^{−1−δ} λ̄_G(dx) = ∫_{H̄(τ)}^∞ H̄(G(u))^{−1−δ} du ≤ ∫_1^∞ u^{−1−δ} du < ∞

and the lemma has been proved.
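The generalized inverse G can be computed for a tabulated non-decreasing function by a simple search; the following sketch (the discretization and the test function H̄(x) = x² are our own choices) also illustrates the construction, since searchsorted returns the first grid point x with H̄(x) ≥ u.

```python
import numpy as np

def gen_inverse(H_vals, x_grid, u):
    # G(u) = min{x >= 0 : H(x) >= u} for a right-continuous non-decreasing H
    # tabulated as H_vals on x_grid (a discretized sketch of the proof's G).
    idx = np.searchsorted(H_vals, u, side="left")
    return x_grid[idx] if idx < len(x_grid) else np.inf

x = np.linspace(0.0, 10.0, 100_001)
H = x ** 2                               # a toy non-decreasing test function
for u in (0.5, 2.0, 9.0):
    print(u, gen_inverse(H, x, u))       # approximately sqrt(u), as expected
```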
As we trivially have H̃(t) ≤ dH(t) ∨ 1, t ≥ 0, we get from (b) that H̃(t) = O((LLt)^{p−1}) as t → ∞, and we readily obtain that for some positive constant C,

E[|X|²/(LL|X|)^p] ≤ C E[|X|²/(H̃(|X|))^{p/(p−1)}],

which is finite in view of Lemma 4.6 (applied with 1 + δ = p/(p−1); note that δ = 1/(p−1) > 0 since p > 1).

Consequently, we have,

Σ_{n=1}^∞ P{|X| ≥ √(n(LLn)^p)} < ∞.

We can apply Corollary 2.4 with c_n = √(n(LLn)^p) and we see that with probability one,

limsup_{n→∞} |S_n|/√(2n(LLn)^p) = α_0/√2,

where

α_0 = sup{α ≥ 0 : Σ_{n=1}^∞ (1/n) exp(−α²(LLn)^p/(2H(√(n(LLn)^p)))) = ∞}.

It remains to show that α_0 = λ√2.
Consider α = λ_2√2, where λ_2 > λ. If λ_1 ∈ ]λ, λ_2[, we clearly have by (b) for large n,

H(√(n(LLn)^p)) ≤ λ_1²(LLn)^{p−1},

and it follows that

(1/n) exp(−α²(LLn)^p/(2H(√(n(LLn)^p)))) ≤ 1/(n(Ln)^{(λ_2/λ_1)²}),

which leads to a convergent series. Thus, we have α_0 ≤ λ√2.
As for the opposite inequality, we can and do assume that λ > 0.

Consider α = λ_1√2, where 0 < λ_1 < λ. Let further λ_2, λ_3 be positive numbers such that λ_1 < λ_2 < λ_3 < λ. Choose a sequence t_k ↗ ∞ such that

H(t_k) ≥ λ²(1 − 1/k)(LLt_k)^{p−1}.

Set

m_k = min{m : t_k ≤ √(m(LLm)^p)}.

It is easy to see that t_k ~ √(m_k(LLm_k)^p) as k → ∞ and we thus have for large k,

H(√(m_k(LLm_k)^p)) ≥ λ_3²(LLm_k)^{p−1}.

Here we have used the fact that LLt ~ LL(t²) as t → ∞, from which we can also infer that for large k,

(LLn)^p ≤ (λ_2/λ_1)²(LLm_k)^p, m_k ≤ n ≤ m_k² =: n_k.

Recalling that α = λ_1√2, we get for large k,

Σ_{n=m_k}^{n_k} (1/n) exp(−α²(LLn)^p/(2H(√(n(LLn)^p)))) ≥ Σ_{n=m_k}^{n_k} (1/n)(Lm_k)^{−(λ_2/λ_3)²} ≥ (Lm_k)^{1−(λ_2/λ_3)²}.

The last term converges to infinity and thus the series in the definition of α_0 diverges, which means that α_0 ≥ λ_1√2 for any λ_1 < λ. Thus we have α_0 ≥ λ√2 and the corollary has been proved.
Acknowledgements

The author would like to thank D. Mason for carefully checking a first version of this paper and making a number of useful suggestions. Thanks are also due to J. Kuelbs for some helpful comments on this earlier version.
References

[1] E. Berger, Fast sichere Approximation von Partialsummen unabhängiger und stationärer ergodischer Folgen von Zufallsvektoren. Dissertation, Universität Göttingen (1982).
[2] R. Bhatia, Matrix Analysis. Springer, New York, 1997.
[3] U. Einmahl, Strong invariance principles for partial sums of independent random vectors. Ann. Probab. 15 (1987), 1419-1440.
[4] U. Einmahl, A useful estimate in the multidimensional invariance principle. Probab. Theory Relat. Fields 76 (1987), 81-101.
[5] U. Einmahl, Extensions of results of Komlós, Major and Tusnády to the multivariate case. J. Multivar. Analysis 28 (1989), 20-68.
[6] U. Einmahl, A generalization of Strassen's functional LIL. J. Theoret. Probab. 20 (2007), 901-915.
[7] U. Einmahl and D. Li, Some results on two-sided LIL behavior. Ann. Probab. 33 (2005), 1601-1624.
[8] U. Einmahl and D. Li, Characterization of LIL behavior in Banach space. Trans. Am. Math. Soc. 360 (2008), 6677-6693.
[9] U. Einmahl and D. M. Mason, Rates of clustering in Strassen's LIL for partial sum processes. Probab. Theory Relat. Fields 97 (1993), 479-487.
[10] J. Komlós, P. Major and G. Tusnády, An approximation of partial sums of independent r.v.'s and the sample d.f. II. Z. Wahrsch. Verw. Gebiete 34 (1976), 33-58.
[11] P. Major, The approximation of partial sums of independent r.v.'s. Z. Wahrsch. Verw. Gebiete 35 (1976), 213-220.
[12] P. Major, An improvement of Strassen's invariance principle. Ann. Probab. 7 (1979), 55-61.
[13] W. Philipp, Almost sure invariance principles for sums of B-valued random variables. In: Probability in Banach Spaces II, Lecture Notes in Math. 709, Springer, Berlin (1979), 171-193.
[14] A. I. Sakhanenko, A new way to obtain estimates in the invariance principle. In: High Dimensional Probability II, Progress in Probability 47, Birkhäuser, Boston (2000), 223-245.
[15] V. Strassen, An invariance principle for the law of the iterated logarithm. Z. Wahrsch. Verw. Gebiete 3 (1964), 211-226.
[16] A. Zaitsev, Multidimensional version of the results of Komlós, Major and Tusnády for vectors with finite exponential moments. ESAIM Probab. Statist. 2 (1998), 41-108.
Department of Mathematics, Free University of Brussels (VUB), Pleinlaan 2, B-1050
Brussels, Belgium
E-mail address: ueinmahl@vub.ac.be
