Serena N G
1. INTRODUCTION
Consider a series {Xt }Tt=1 with mean and standard deviation . Let r = E[(x )r ] be the rth central moment of Xt
with 2 = 2 . The coefficients of skewness and kurtosis are
defined as
=
3
E[(x )3 ]
=
3
E[(x )2 ]3/2
(1)
E[(x )4 ]
4
=
.
4
E[(x )2 ]2
(2)
and
d
d
then T N(0, 6) and T( 3) N(0, 24) (see, e.g.,
Kendall and Stuart 1969). This article presents the limiting distributions for and when the data are weakly dependent.
The tests can be applied to the observed data whose population mean and variance are unknown, as well as least squares
regression residuals.
Whether time series data exhibit skewed behavior has been
an issue of macroeconomic interest. Some authors (e.g., Neftci
1984; Hamilton 1989) have used parametric models to see
whether economic variables behave similarly during expansions and recessions. Others use simple statistics to test skewness. In a well-known article, Delong and Summers (1985)
studied whether business cycles are symmetrical by applying
the skewness coefficient to GDP, industrial production, and the
unemployment rate. However, because the sampling distribution of the skewness coefficient for serially correlated data is
not known, these authors obtained critical values by simulating
an AR(3) model with normal errors. These critical values are
correct only if the AR(3) model is the correct data-generating
process and the errors are indeed normal. The results developed
50
method to obtain the critical values, or use martingale transformation methods (as in Bai 2003) to obtain distribution-free
tests. Empirical implementation of the KS test in a time series
setting thus can be rather computationally intensive. The tests
that we consider in this article are, in contrast, very simple to
construct.
Several authors have used the generalized methods of moments (GMM) to test the null hypothesis that the data are
Gaussian. Richardson and Smith (1993) considered testing multivariate normality, focusing on the case when the data are
cross-sectionally correlated (and also discussed how serial correlation can be accommodated). Their test is based on the overidentifying restrictions from matching the first four moments of
the data with those implied by the normal distribution. More
recently, Bontemps and Meddahi (2002) considered testing the
validity of the so-called Stein equations, which are moment
conditions that should hold under normality. Both of these normality tests are GMM-based tests of overidentifying restrictions. As is well known, the optimal weighting matrix used in
GMM is the inverse of the long-run variance of the moments
under consideration, which can be consistently estimated using
kernel methods. We directly test the skewness and kurtosis coefficients, properly scaled by their corresponding long-run variances, in view of the fact that the data are serially correlated.
Because we also estimate long-run variances, it might appear as
though we are also implementing GMM. This is not the case,
however. Testing overidentifying restrictions by GMM is not
equivalent to testing skewness and kurtosis, at least for nonnormal distributions. As we explain later, even for normal distribution, the theoretical weighing matrix must be used to interpret
testing overidentifying restrictions as testing skewness and kurtosis.
We do not assume normal distribution in deriving the skewness and kurtosis tests. The results can be used to test any given
value of skewness and kurtosis coefficients. Testing normality
is no more than a joint test that can be conveniently obtained
within our framework. The tests used to determine whether the
data are Gaussian can be easily amended to determine whether
the data are consistent with, for example, the lognormal (which
has skewness and kurtosis coefficients of 6.18 and 113.9). This
is in contrast with the tests just described, which are designed
for testing normality. The motivation of our tests is closer to that
of Dufour, Khalaf, and Beaulieu (2002), although we allow the
data (or residuals) to be serially correlated. Although our focus
is on asymptotic results, these authors implemented exact tests
via the bootstrap. Simplicity and generality are two noteworthy
aspects of our tests.
2.
=
r
,
r r
r
r
" 1 !T
#
r
T
t=1 (Xt X) r
=
r
"
#
r (( 2 )r/2 ) (( 2 )r/2 )
r
.
Lemma A.1 in Appendix A provides large-sample approximations to the central moments. These are used to obtain the sampling distributions of and .
2.1
We first derive the limiting distribution of the sample skewness coefficient under arbitrary (not necessarily 0), and then
specialize the general result to = 0. Throughout, we assume
that the central limit theorem holds for the 4 1 vector series
W&t = [Xt , (Xt )2 2 , (Xt )3 3 , (Xt )4 4 ]
(t = 1, . . . , T). This requires finite (8 + )th ( > 0) moment
and some mixing conditions. When testing symmetry, finite
(6 + )th moment and some mixing conditions are sufficient.
Theorem 1. Suppose that Xt is stationary up to sixth order.
Then
T
1 $
T( ) = 3
Zt + op (1),
T t=1
where
and
1
T
!T
t=1 Zt
= [ 1 3 2 32 ] ,
(Xt )3 3
Zt = (Xt ) ,
(Xt )2 2
d
N(0, "), where " = limT TE(Z Z & ),
" &
d
T( ) N 0, 6 .
Zt =
.
(3)
T t=1
T t=1 (Xt )
This leads to the following result.
2 " 22 &2
d
,
T N 0,
6
(4)
d
T 3 N(0, 2 " 22 &2 ).
Bai and Ng: Skewness, Kurtosis, and Normality for Time Series Data
51
T 3
T d
=
N(0, 1).
3 =
s( 3 )
s( )
That is, 3 and are the same. To construct 3 , one need only a
consistent estimate of this long-run variance, which can be obtained nonparametrically by kernel estimation. The test is valid
even if the null distribution is not normally distributed, albeit
symmetric. Simple calculations show that if Xt is iid normal,
then the variance of is 6.
Depending on the distribution under investigation, a large
number of observations might be required to detect symmetry.
The possibility of low power can be remedied in two ways.
The first is to exploit the fact that most economic time series
are bounded below by 0. Hence one can test symmetry against
positive skewness. Second, the odd moments of symmetric distributions are 0, if they exist. Therefore, in theory one can construct a joint test of several odd moments to increase power. To
illustrate, consider a joint test of two odd moments, r1 and r2 .
Let
+ 1 !T
r1 ,
t=1 (Xt X)
T
YT =
.
!T
r2
1
t=1 (Xt X)
T
where
"
1
=
0
0 r1 r1 1
1 r2 r2 1
and
Zt =
(Xt )r1
(Xt )r2
(Xt )
have YT N(0, " & ) under the null hypothesis of symmetry. Let " & be a consistent estimate for " & (which is
easy to obtain); we then have
.1
d
r1 ,r2 = Y&T " & YT 22 .
where
and
Let
1 $
T( ) = 4
Wt + op (1),
T t=1
= [ 1 43 2 2 ] ,
(Xt )4 4
Wt = (Xt ) ,
(Xt )2 2
(5)
!T
d
&
1
t=1 Wt N(0, $) with $ = limT TE(WW ).
T
be consistent estimates of 2 and $. Then
2 and $
T( ) d
4 () =
N(0, 1),
s()
& / 8 )1/2 .
where s()
= ( $
2.2
(Xt )3
gt ( ) = (Xt ) .
(Xt )2 2
2 ), which
Note that in general, GMM is not equal to = (X,
is what we use in our tests. Indeed, in the present case, estimating by GMM entails solving a nonlinear equation. Define
JT = T g T ( GMM )& " g T ( GMM ),
52
34 = 32 + 42 22 .
An asymptotically equivalent test, based directly on the third
and fourth central moments, can be constructed as follows. Let
!T
,
+
3
1
t=1 (Xt X)
T
.
YT =
!T
4
1
2 )2 ]
t=1 [(Xt X) 3(
T
T
1 $
YT =
Zt + op (1),
T t=1
#
0
1 0
3 2
,
and
=
0
6 2 0 1
(6)
(Xt )
(Xt )2 2
,
Zt =
(Xt )3
(Xt )4 3 4
!T
d
&
t=1 Zt N(0, ') and ' = limT TE(ZZ ).
"
with
1
T
d
be consistent estimaThus YT N(0, ' & ). Let and '
tors of and '; then we have
d
& )1 YT
22 .
34 = Y&T ( '
and
34 22
d
34 22 .
Bai and Ng: Skewness, Kurtosis, and Normality for Time Series Data
T t=1
T t=1
j = 2, 3, 4. (7)
53
Skewness
We consider both one-tailed and two-tailed tests for symmetry (denoted by 3 and 3 ). Results are reported at the 5%
level without loss of generality. The critical values are 1.64
(one-tailed) and 1.96 (two-tailed). We also consider 35 , a joint
test of the third and fifth central moments, and the 5% critical value is 5.99. Three sample sizes are considered: T = 100,
200, and 500. Table 1 indicates that 3 has accurate size even
for small T, but the 35 statistic rejects less often than 3 under the null. The size of the tests are, however, quite robust
to the degree of persistence in the data. It is useful to remark
that in theory, the 35 test requires the data to have tenth moments. Even though the t5 distribution does not have sixth moment, the test still has good size. When a large number of
observations are available for testing symmetry, the 35 test
can be a good complementary test to 3 because it is easy to
implement.
The two-sided test has low power, but imposing a priori
information on the direction of skewness leads to substantial
power gains. All of the tests considered have low power for
A4, A5, and A6 unless the sample size is large (say, more than
200 observations). The statistics developed by Randles et al.
(1980) for testing symmetry in iid data also have low power
for the same distributions, all of which have large kurtosis. In
general, 35 has very good power even for T as small as 50.
However, whereas the size of the test is quite robust to serial
correlation in the data, the power function is quite sensitive to
persistence. Results for = .8 reveal that the power of the tests
drops significantly when the degree of persistence
!increases.
The reason for this is that for AR(1) models, yt = tj=1 j utj
(assuming
that y0 = 0). In the limit when = 1, yt (scaled
54
Table 1. Size and Power of the Test Symmetry for = 0 (NeweyWest kernel, no prewhitening)
T = 100
3
35
T = 200
3
35
T = 500
3
35
=0
S1
S2
S3
S4
S5
S6
S7
A1
A2
A3
A4
A5
A6
A7
A8
0
0
0
0
0
0
0
6.18
2.0
2.0
.5
1.5
2.0
3.16
3.8
3.0
9.0
2.5
3.0
6.0
11.6
126.0
113.9
9.0
9.0
2.2
7.5
21.2
23.8
40.7
.04
.03
.04
.03
.03
.02
.03
.43
.74
.75
.85
.67
.23
.56
.53
.05
.05
.05
.04
.04
.04
.04
.63
.88
.89
.93
.85
.43
.76
.71
.02
.02
.03
.01
.02
.02
.01
.62
.94
.96
.64
.79
.22
.79
.76
.05
.04
.05
.04
.04
.04
.02
.52
.90
.91
1.00
.85
.43
.73
.67
.06
.05
.06
.05
.05
.05
.04
.69
.96
.97
1.00
.94
.64
.87
.82
.03
.03
.04
.03
.02
.02
.02
.81
1.00
1.00
.99
.99
.55
.96
.95
.06
.03
.05
.05
.04
.03
.03
.68
.98
.98
1.00
.98
.72
.87
.83
.06
.04
.06
.05
.06
.04
.04
.81
.99
.99
1.00
1.00
.85
.94
.92
.04
.02
.04
.04
.03
.03
.02
.95
1.00
1.00
1.00
1.00
.93
1.00
.99
= .5
S1
S2
S3
S4
S5
S6
S7
A1
A2
A3
A4
A5
A6
A7
A8
0
0
0
0
0
0
0
6.18
2.0
2.0
.5
1.5
2.0
3.16
3.8
3.0
9.0
2.5
3.0
6.0
11.6
126.0
113.9
9.0
9.0
2.2
7.5
21.2
23.8
40.7
.03
.03
.02
.03
.03
.03
.04
.43
.62
.65
.30
.44
.19
.48
.46
.05
.05
.05
.04
.04
.05
.05
.64
.84
.86
.54
.70
.38
.72
.67
.03
.02
.01
.02
.01
.02
.01
.62
.77
.78
.16
.32
.15
.54
.52
.04
.04
.04
.04
.04
.04
.02
.52
.87
.87
.70
.76
.37
.67
.64
.05
.05
.06
.05
.06
.05
.04
.71
.94
.95
.83
.90
.56
.84
.81
.02
.03
.02
.03
.02
.03
.02
.81
.99
1.00
.48
.85
.38
.90
.92
.05
.04
.04
.05
.05
.04
.03
.68
.97
.97
.99
.96
.69
.86
.82
.05
.04
.05
.06
.05
.05
.04
.82
1.00
.99
1.00
.99
.82
.94
.92
.04
.03
.03
.04
.03
.03
.03
.94
1.00
1.00
.97
1.00
.86
.99
.98
= .8
S1
S2
S3
S4
S5
S6
S7
A1
A2
A3
A4
A5
A6
A7
A8
0
0
0
0
0
0
0
6.18
2.0
2.0
.5
1.5
2.0
3.16
3.8
3.0
9.0
2.5
3.0
6.0
11.6
126.0
113.9
9.0
9.0
2.2
7.5
21.2
23.8
40.7
.02
.04
.02
.02
.02
.04
.04
.07
.01
.01
.06
.04
.06
.02
.04
.05
.06
.04
.03
.04
.06
.05
.20
.05
.06
.18
.16
.22
.11
.15
.01
.02
.01
.01
.04
.04
.03
.04
.03
.03
.04
.29
.13
.12
.12
.20
.15
.18
.22
.06
.06
.05
.05
.04
.06
.04
.55
.28
.27
.25
.41
.33
.43
.49
.01
.02
.01
.04
0
.03
.02
.25
.17
.16
.06
.05
.11
.07
.14
.03
.04
.05
.05
.04
.06
.04
.64
.73
.73
.32
.64
.45
.72
.72
.04
.05
.06
.05
.05
.06
.05
.79
.88
.89
.47
.83
.68
.88
.88
.02
.03
.03
.04
.02
.04
.03
.81
.77
.78
.17
.51
.42
.82
.89
.03
.02
.06
.04
.05
.03
0
.04
.01
.01
3.2
Table 2 reports results for 4 (), which tests the true population value of , and 4 (3), which tests = 3 as would be the
case under normality. There are two notable results. First, there
are large size distortions, so that a large type I error could be
expected if one were to test = 0 . Second, although the test
has power to reject = 3, the power is very low for the sample
sizes considered. In many cases, serial correlation in the data
further reduces the power of the test. For example, consider
case A3 with = 9. With T = 1,000, the power is .90 when
= 0, falls to .84 when = .5, and is only .14 when = .8.
One needs more than 5,000 observations to reject the hypothesis that = 3 when = .8. For this reason, we report results for
sample sizes much larger than for to highlight the problems
with testing for kurtosis in finite samples.
Bai and Ng: Skewness, Kurtosis, and Normality for Time Series Data
55
Table 2. Size and Power of the Test Kurtosis: NeweyWest Kernel, No Prewhitening
T = 100
4 ( 3)
4 ( )
0
0
0
0
0
0
0
6.18
2.0
2.0
.5
1.5
2.0
3.16
3.8
3.0
9.0
2.5
3.0
6.0
11.6
126.0
113.9
9.0
9.0
2.2
7.5
21.2
23.8
40.7
.02
.68
.01
.02
.35
.60
.43
.15
.24
.25
.02
.36
.61
.34
.30
= .5
S1 0
S2 0
S3 0
S4 0
S5 0
S6 0
S7 0
A1 6.18
A2 2.0
A3 2.0
A4
.5
A5 1.5
A6 2.0
A7 3.16
A8 3.8
3.0
9.0
2.5
3.0
6.0
11.6
126.0
113.9
9.0
9.0
2.2
7.5
21.2
23.8
40.7
.01
.70
= .8
S1 0
S2 0
S3 0
S4 0
S5 0
S6 0
S7 0
A1 6.18
A2 2.0
A3 2.0
A4
.5
A5 1.5
A6 2.0
A7 3.16
A8 3.8
3.0
9.0
2.5
3.0
6.0
11.6
126.0
113.9
9.0
9.0
2.2
7.5
21.2
23.8
40.7
=0
S1
S2
S3
S4
S5
S6
S7
A1
A2
A3
A4
A5
A6
A7
A8
.02
.35
.66
.48
.29
.46
.44
0
.43
.63
.42
.38
T = 500
4 ( 3)
4 ( )
T = 1,000
4 ( 3)
4 ( )
.02
.74
.01
.03
.37
.69
.48
.32
.29
.29
.01
.38
.81
.57
.56
.02
.13
.44
.03
.16
.28
.28
.28
.38
.41
.81
.20
.25
.34
.34
.02
.69
.01
.01
.37
.66
.37
.69
.29
.27
.01
.34
.77
.66
.78
.02
.53
.80
.01
.66
.63
.51
.41
.75
.73
.99
.65
.52
.57
.51
.02
.64
.01
.01
.32
.58
.25
.85
.24
.26
.02
.31
.74
.58
.73
.02
.70
.96
.01
.87
.73
.58
.49
.92
.90
1.00
.84
.63
.66
.60
.01
.01
.84
.01
.03
.07
.01
0
.07
.11
.24
.14
.13
.19
.01
.05
.09
.11
.01
.85
.01
.22
.19
.01
.06
.43
.44
.39
.60
.57
.38
.20
.33
.45
.44
.01
.81
.02
.01
.61
.80
.54
.91
.70
.71
.13
.65
.87
.80
.89
.01
.56
.28
.01
.50
.68
.57
.49
.87
.83
.68
.63
.59
.62
.59
.02
.02
.23
.02
.03
.08
.10
.19
.12
.13
.52
.04
.07
.16
.16
.06
.02
.01
.03
.09
.03
.02
.12
0
.01
.01
.03
.01
.53
.01
.28
.50
.41
.28
.23
.23
0
.36
.52
.32
.26
T = 200
4 ( 3)
4 ( )
0
0
0
0
0
0
0
0
0
0
0
.01
.01
.01
.01
.01
.49
.82
.62
.42
.58
.56
0
.52
.84
.65
.63
.01
.77
.01
.50
.79
.62
.48
.65
.66
.63
.76
.60
.59
.01
.01
.01
0
.01
.01
.04
0
.01
0
0
0
0
0
0
0
.01
.59
.83
.60
.76
.67
.63
.01
.61
.90
.83
.91
.96
.81
.96
.83
.72
.94
.91
.08
.88
.96
.90
.91
To understand the properties of the kurtosis tests, Table 3 reports the average estimates of and at the different sample
sizes. Three results are noteworthy. First, both and are generally downward biased, with biases increasing in . However,
the biases are substantially larger for .
Second, even with T
as large as 5,000, cannot be estimated precisely from serially
correlated data. In some cases, is severely biased for even iid
data (see, e.g., case A1). This result has the important empirical
implication that the sample kurtosis measure is generally unreliable and should always be viewed with caution. Third, the one
exception when can be well estimated is when = 3. This is
important in interpreting the results for testing normality.
Table 4 reports the results for testing normality. Except for
the first row, which is based on the normal distribution and thus
indicates size, all other rows indicate power. The 34 test is
generally more powerful than the 34 test. Because the kurtosis
0
0
.01
0
0
.01
.05
.22
.03
.03
.01
0
.03
.01
.07
0
0
.95
.02
.92
.97
.86
.88
.97
.98
.49
.95
.97
.96
.96
.02
.01
0
0
.09
.22
.43
.16
.14
.02
.01
.13
.16
.28
T = 2,500
4 ( 3)
4 ( )
.01
.61
.01
.25
.53
.11
.78
.17
.16
.01
.23
.67
.51
.67
.01
.82
1.00
.01
.94
.84
.67
.60
.98
.97
1.00
.97
.75
.82
.75
T = 5,000
4 ( 3)
4 ( )
.01
.24
.46
.05
.75
.15
.15
.02
.18
.62
.46
.61
.01
.88
1.00
.01
.99
.89
.70
.67
1.00
1.00
1.00
.99
.81
.88
.79
.81
.40
0
.78
.79
.28
.88
.85
.86
1.00
.78
.83
.79
.83
.01
.58
0
.86
.92
0
.98
.88
.70
.68
1.00
.99
1.00
.98
.80
.88
.80
0
.59
.01
0
.20
.82
.68
.66
.94
.94
.06
.52
.75
.83
.77
.84
.10
0
.69
.80
.37
.91
.78
.78
.76
.72
.86
.80
.87
.96
.20
.96
.96
.79
.98
.99
.99
.99
.97
.97
.97
.98
.80
.62
0
.90
.84
.66
.60
.97
.96
.98
.95
.75
.80
.75
.19
.02
.02
.53
.59
.57
.64
.67
.03
.09
.55
.67
.69
.97
.75
.99
.97
.75
.97
1.00
1.00
1.00
.99
.96
.97
.98
test has such low power, the results for normality by and large
reflect the results for the tests for skewness. The tests have low
power when a distribution is symmetric. We also considered
prewhitening, as discussed by Andrews and Monahan (1992).
Prewhitening lowers the power of the tests when in fact the data
are not serially correlated, but raises power when the data are
genuinely serially correlated. These additional results are available in the working version of this article.
The fact that the kurtosis test has large size distortions might
appear problematic for the normality test. Interestingly, however, this is not the case. This is because precisely estimates
when = 3, and size distortion is not an issue. Thus, although the kurtosis test is itself not very useful per se, we can
still use it to test normality, as was done by Bera and Jarque
(1981). It is interesting to note that although the fourth moment
56
100
200
500
1,000
2,500
5,000
100
200
500
1,000
2,500
5,000
=0
S1
S2
S3
S4
S5
S6
S7
A1
A2
A3
A4
A5
A6
A7
A8
0
0
0
0
0
0
0
6.18
2.0
2.0
.5
1.5
2.0
3.16
3.8
.01
.01
.01
.01
.02
.03
.03
3.15
1.78
1.76
.48
1.33
1.34
2.33
2.58
0
.02
0
0
.02
.02
.04
3.81
1.90
1.89
.50
1.43
1.49
2.60
2.89
0
.02
0
0
0
.02
.06
4.44
1.95
1.98
.51
1.47
1.72
2.84
3.21
0
.01
0
0
.01
0
.01
4.82
1.97
1.97
.51
1.49
1.80
3.01
3.56
0
.02
0
0
0
.01
.03
5.34
1.99
2.00
.51
1.51
1.87
3.11
3.66
0
.02
0
0
0
.01
.08
5.71
1.99
1.99
.51
1.52
1.97
3.13
3.72
3.0
9.0
2.5
3.0
6.0
11.6
126.0
113.9
9.0
9.0
2.2
7.5
21.2
23.8
40.7
2.95
5.38
2.47
2.96
4.95
6.70
8.96
17.24
7.14
6.98
2.19
5.86
8.30
10.81
12.65
2.95
6.11
2.49
2.97
5.39
7.66
12.49
25.71
8.05
7.94
2.21
6.53
10.00
13.64
16.43
3.00
6.66
2.49
2.99
5.63
8.92
17.62
37.63
8.47
8.85
2.22
6.90
12.66
16.98
20.86
2.99
7.88
2.50
2.99
5.75
9.80
19.64
46.55
8.70
8.70
2.22
7.16
14.14
19.83
27.93
3.00
7.87
2.50
3.00
5.90
10.71
28.66
63.72
8.91
9.02
2.22
7.30
15.74
21.92
31.22
3.00
8.12
2.50
3.00
5.88
11.09
35.93
79.20
8.93
8.91
2.22
7.42
18.68
22.58
32.50
= .5
S1
S2
S3
S4
S5
S6
S7
A1
A2
A3
A4
A5
A6
A7
A8
0
0
0
0
0
0
0
6.18
2.0
2.0
.5
1.5
2.0
3.16
3.8
0
.02
0
.01
.02
.03
.02
2.23
1.21
1.21
.35
.96
.99
1.65
1.85
0
.02
.01
.01
.01
.01
.05
2.77
1.35
1.35
.36
1.05
1.09
1.90
2.09
0
.02
0
0
0
.01
.05
3.27
1.42
1.45
.37
1.09
1.27
2.10
2.36
0
.01
0
0
0
0
0
3.57
1.45
1.45
.38
1.10
1.34
2.22
2.62
0
.01
0
0
0
.01
.03
3.95
1.47
1.48
.38
1.12
1.39
2.31
2.71
0
.01
0
0
0
.01
.06
4.24
1.47
1.47
.38
1.13
1.46
2.32
2.75
3.0
9.0
2.5
3.0
6.0
11.6
126.0
113.9
9.0
9.0
2.2
7.5
21.2
23.8
40.7
2.91
4.31
2.63
2.91
4.05
5.06
6.43
10.92
5.14
5.09
2.43
4.53
6.00
7.31
8.44
2.94
4.81
2.67
2.94
4.34
5.72
8.57
16.11
5.80
5.76
2.47
5.04
7.10
9.17
10.67
2.98
5.16
2.68
2.98
4.56
6.51
11.65
23.49
6.18
6.39
2.51
5.32
8.72
11.29
13.54
2.99
5.91
2.70
2.99
4.64
7.08
12.91
29.03
6.36
6.38
2.52
5.45
9.65
13.01
17.74
2.99
5.88
2.70
3.00
4.75
7.61
18.27
39.22
6.51
6.57
2.53
5.55
10.68
14.33
19.83
3.00
6.09
2.70
3.00
4.73
7.87
22.72
48.85
6.52
6.52
2.53
5.65
12.42
14.68
20.60
= .8
S1
S2
S3
S4
S5
S6
S7
A1
A2
A3
A4
A5
A6
A7
A8
0
0
0
0
0
0
0
6.18
2.0
2.0
.5
1.5
2.0
3.16
3.8
0
.01
0
.01
.01
.02
.01
.84
.20
.20
.32
.42
.56
.53
.67
0
.01
0
0
.01
0
.03
1.81
.70
.73
.25
.61
.76
1.14
1.29
0
.01
0
0
0
0
.01
2.06
.79
.79
.24
.64
.78
1.27
1.50
0
.01
0
0
0
0
.02
2.31
.85
.85
.23
.66
.83
1.35
1.59
0
.01
0
0
0
.01
.04
2.51
.86
.86
.23
.67
.87
1.37
1.63
3.0
9.0
2.5
3.0
6.0
11.6
126.0
113.9
9.0
9.0
2.2
7.5
21.2
23.8
40.7
2.80
3.21
2.71
2.78
3.14
3.48
3.92
5.01
3.71
3.72
2.77
3.17
3.67
4.07
4.40
2.90
3.53
2.79
2.89
3.35
3.81
4.80
6.87
3.90
3.86
2.79
3.46
4.23
4.85
5.19
2.95
3.72
2.84
2.95
3.50
4.18
6.03
9.86
4.07
4.15
2.82
3.73
4.95
5.76
6.46
2.99
4.02
2.88
2.98
3.57
4.44
6.53
12.09
4.19
4.19
2.82
3.82
5.29
6.47
8.08
2.98
4.03
2.88
2.99
3.63
4.64
8.49
15.84
4.25
4.26
2.83
3.90
5.77
7.09
9.03
2.99
4.12
2.89
3.00
3.63
4.78
10.13
19.67
4.27
4.26
2.82
3.97
6.45
7.20
9.37
.01
.01
.02
0
0
.05
1.36
.49
.48
.27
.52
.62
.88
.98
is less reliably estimated, the higher odd moments of symmetric distributions, which are 0, if they exist, appear to be well
estimated. This is why the symmetry test 35 performs well
even for random variables that do not have finite fifth moment.
For nonsymmetric distributions, the bias in high moments often translates into better power for 35 . In summary, whereas
the performance of the kurtosis test raises concerns about the
use of high moments in statistical testing, this concern is less
important when the objective is symmetry. As also pointed out
earlier, the behavior of the kurtosis test does not undermine the
performance of the normality test.
4. EMPIRICAL APPLICATIONS
We applied our tests to 21 macroeconomic time series. Data
for GDP, the GDP deflator, the consumption of durables, final
sales, the consumption of nondurables, residential investment,
and nonresidential investment are taken from the national accounts and are quarterly data. The unemployment rate, employment, M2, and CPI are monthly series. The 30-day interest rate
and M2 are weekly data. (All data are taken from the Economic
Time Series Page, available at www.economagic.com.) With the
exception of the interest rate and the unemployment rate (for
which we do not take logs), we take the first difference of the
logarithm of the data before applying the tests. We also considered three exchange rates (in logged first differences), and the
value as well as the equally weighted CRSP daily stock returns.
These data are not transformed. The sample skewness and kurtosis coefficients for the 21 series are also reported, along with
tests for skewness, kurtosis, and normality.
The first column of Table 5 reports tests for symmetry, that
is, testing = 0, with given in the fourth column. Several aspects of the results are of note. Following Delong and Summers
(1985), we fail to reject symmetry in output and industrial pro-
Bai and Ng: Skewness, Kurtosis, and Normality for Time Series Data
57
Table 4. Size and Power of the Test Normality: NeweyWest Kernel, No Prewhitening
T = 100
34
T = 200
34
0
0
0
0
0
0
0
6.18
2.0
2.0
.5
1.5
2.0
3.16
3.8
3.0
9.0
2.5
3.0
6.0
11.6
126.0
113.9
9.0
9.0
2.2
7.5
21.2
23.8
40.7
.05
.03
.27
.03
.06
.07
.09
.84
.99
.99
1.00
.81
.22
.95
.94
.02
.02
.16
.02
.03
.06
.07
.48
.71
.74
.99
.60
.27
.59
.56
.09
.11
.58
.08
.13
.22
.20
.94
1.00
1.00
1.00
1.00
.52
.99
.99
.03
.09
.35
.03
.12
.16
.17
.55
.88
.88
1.00
.83
.53
.75
.68
.08
.39
.90
.07
.53
.50
.37
.99
1.00
1.00
1.00
1.00
.87
1.00
1.00
.03
.32
.72
.03
.43
.45
.35
.69
.98
.96
1.00
.98
.76
.86
.82
.09
.62
.98
.07
.82
.67
.55
1.00
1.00
1.00
1.00
1.00
.97
1.00
1.00
.03
.57
.94
.03
.74
.61
.46
.80
1.00
1.00
1.00
.99
.86
.92
.88
.07
.81
1.00
.06
.94
.85
.64
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
.02
.76
1.00
.03
.89
.77
.56
.84
1.00
1.00
1.00
1.00
.94
.97
.94
.06
.89
1.00
.06
.98
.91
.71
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
.03
.80
1.00
.02
.96
.83
.61
.90
1.00
1.00
1.00
1.00
.94
.98
.98
= .5
S1
S2
S3
S4
S5
S6
S7
A1
A2
A3
A4
A5
A6
A7
A8
0
0
0
0
0
0
0
6.18
2.0
2.0
.5
1.5
2.0
3.16
3.8
3.0
9.0
2.5
3.0
6.0
11.6
126.0
113.9
9.0
9.0
2.2
7.5
21.2
23.8
40.7
.03
.02
.06
.03
.01
.04
.04
.73
.71
.71
.35
.21
.10
.45
.51
.01
.02
.02
.01
.01
.03
.04
.48
.51
.54
.21
.20
.11
.41
.41
.05
.05
.21
.06
.04
.09
.11
.93
1.00
1.00
.90
.83
.35
.97
.97
.01
.04
.05
.02
.02
.07
.08
.57
.83
.84
.75
.65
.36
.67
.64
.09
.20
.49
.09
.22
.34
.33
.97
1.00
1.00
1.00
1.00
.79
1.00
1.00
.02
.13
.11
.02
.07
.24
.27
.69
.97
.97
1.00
.95
.73
.85
.82
.09
.48
.67
.09
.59
.59
.50
.99
1.00
1.00
1.00
1.00
.96
1.00
1.00
.02
.36
.20
.02
.23
.55
.43
.80
1.00
.99
1.00
.99
.84
.91
.88
.08
.76
.93
.07
.90
.82
.62
.99
1.00
1.00
1.00
1.00
1.00
1.00
1.00
.01
.72
.44
.02
.78
.74
.57
.85
1.00
1.00
1.00
1.00
.94
.97
.94
.05
.87
1.00
.05
.98
.89
.70
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
.02
.80
.79
.01
.95
.83
.62
.90
1.00
1.00
1.00
1.00
.95
.98
.98
= .8
S1
S2
S3
S4
S5
S6
S7
A1
A2
A3
A4
A5
A6
A7
A8
0
0
0
0
0
0
0
6.18
2.0
2.0
.5
1.5
2.0
3.16
3.8
3.0
9.0
2.5
3.0
6.0
11.6
126.0
113.9
9.0
9.0
2.2
7.5
21.2
23.8
40.7
.01
.01
.01
.01
0
.01
.01
.02
.01
.02
.01
0
.01
0
0
.01
.01
.02
.02
.04
.03
.01
.01
.01
.01
.04
.03
.08
.06
.02
.06
.09
.88
.56
.60
.31
.46
.36
.73
.86
.01
.02
.01
.02
.01
.04
.05
.66
.60
.61
.13
.40
.35
.64
.71
.07
.09
.14
.08
.06
.14
.22
.97
.99
.99
.70
.93
.73
1.00
.99
.02
.03
.03
.02
.01
.08
.14
.77
.95
.96
.38
.85
.69
.85
.85
.09
.26
.23
.10
.25
.55
.53
.99
1.00
1.00
.97
1.00
.98
1.00
1.00
.01
.12
.02
.02
.04
.32
.42
.85
1.00
1.00
.85
1.00
.92
.95
.94
.07
.64
.31
.08
.67
.78
.64
.99
1.00
1.00
1.00
1.00
1.00
1.00
1.00
.01
.36
.03
.02
.11
.67
.58
.90
1.00
1.00
.99
1.00
.94
.98
.97
0
0
0
0
.01
.01
.03
.01
.01
.02
.03
.16
.07
.06
.06
.04
.06
.03
.05
.02
.02
.30
.08
.06
.03
.04
.05
.07
.11
34
T = 5,000
34
S1
S2
S3
S4
S5
S6
S7
A1
A2
A3
A4
A5
A6
A7
A8
34
T = 2,500
34
34
0
0
34
T = 1,000
34
=0
34
T = 500
34
34
Results for the normality test are reported in the third column
of Table 5. We reject normality in the U.S.Japan exchange rate,
unemployment rate, CPI inflation, 30-day interest rate, and two
stock return series. With the exception of the 30-day interest
rate, which failed the kurtosis test but not the skewness test, series that exhibit non-Gaussian behavior all failed the symmetry
test. This accords with our observation that the power of the
normality test is derived from asymmetry.
5. CONCLUDING COMMENTS
The goal of this article was to obtain tests for skewness, kurtosis, and a joint test of normality suited for time series observations. Our results depend only on stationarity and the existence
of some moments. Monte Carlo simulations accord with our
prior that tests for kurtosis will have low power because of the
58
Series
1.505
.682
2.532
.913
1.195
2.108
1.420
2.618
.376
.096
1.858
.596
1.590
1.510
1.890
.387
.035
1.654
1.019
2.747
3.782
34
.359
.915
1.567
1.323
1.417
.188
1.335
1.484
1.814
.278
1.722
1.341
1.659
1.305
1.845
1.317
1.421
.828
1.240
3.063
2.632
2.438
1.641
7.444
4.853
2.305
4.529
2.356
6.858
4.084
.202
3.499
3.623
3.761
2.379
3.580
3.771
2.688
2.775
2.058
10.219
16.349
.226
.134
.481
.308
.994
.870
.561
.942
.415
.017
.791
.212
.280
.732
1.644
.117
.017
.409
.457
.481
.990
3.139
3.499
3.905
8.621
13.274
3.284
4.717
4.491
11.861
3.116
5.023
4.721
3.733
5.254
10.543
5.857
5.404
3.680
5.134
5.187
6.943
The 5% critical values are 1.96 for 3 and 4 and 5.99 for 34 .
T
T
1 $
1 $
=
(Xt )r r r1
(Xt ) + op (1).
T t=1
T t=1
- 2 r/2
. r
T ( ) ( 2 )r/2 = ( 2 )r/21 T[ 2 2 ] + op (1).
2
Proof. We show (without loss of generality) the derivations
for r = 3. The generalization is immediate.
T
1 $
3
(Xt X)
T t=1
T
1 $
3
=
(Xt + X)
T t=1
T
T
1 $
1$
3
=
(Xt ) + 3
(Xt )2 T( X)
T
T t=1
t=1
1 $
2
3.
+ 3( X)
(Xt ) + T( X)
T t=1
Proof of Theorem 1
=
3 3
3 3
3 3
3 3
3
3
" 2 3/2
#
" 1 !T
3 3 #
(Xt X)
( ) ( 2 )3/2
= T t=1
3
3
1 T
.
1 1 $(Xt )3 3
= 3
T
=
t=1
,
2
T
1$
2
3
(Xt ) (X )
T
t=1
Bai and Ng: Skewness, Kurtosis, and Normality for Time Series Data
3 ( 2 2 )
+ op (1)
2 3
1 T
.
1 1 $= 3
(Xt )3 3
T
From
T
1$
(Xt )2
3
T
t=1
,+
T
1$
Xt
T
t=1
!T
(xt
t=1 e
x ) = 0, we have
T
1 $ 2
e t
T t=1
t=1
59
T
T
1 $
1$
2
=
(et e ) + 2
et (xt x ) T( )
T
T t=1
t=1
,2
2
T
3 $
2
2
(Xt ) + op (1)
2T 3
t=1
!T [(X )3 ]
t
3
t=1
!T
1
3 1
2
= 3 [ 1 3
(Xt )
2 ]
T !T t=1
2
2
t=1 [(Xt ) ]
+ op (1)
T
1$
3
Zt + op (1).
T
T
1 $
[(xt x )& ( )]2 .
+
T t=1
!
The last term is Op (T 1/2 ), because T1 Tt=1 )xt )2 = Op (1)
and ) )2 = Op (T 1 ). The second term is also Op (T 1/2 ),
!
because T1 Tt=1 et xt = Op (T 1/2 ) and e x = Op (T 1/2 ). For
j = 3,
e 3t = (et e )3 3(et e )2 (xt x )& ( )
t=1
and thus
T
T
1 $ 3
1 $
e t =
(et e )3
T t=1
T t=1
Proof of Theorem 4
The convergence of 34 in distribution to 22 follows from
d
the central limit theorem for Zt as defined in (6). To see 34
2
2 , it suffices to argue that 3 and 4 are asymptotically independent, so that the squared value of 3 and 4 are also asymptotically independent. The limit of 3 is determined by the
partial sums of Zt [see (3)], whereas the limit of 4 is determined by the partial sums of Wt [see (5), with the second component of Wt dropped off because the second component of
is 0 under normality], where
#
#
"
"
(Xt )4 4
(Xt )3
and
Wt =
Zt =
.
(Xt )
(Xt )2 2
It is easy to verify that EZt W&t = 0 for all t. Moreover,
EZtj W&t = 0 for all j. To see this, using the projection theory
for normal random variables, we can write Xt = c(Xtj
) + t for a constant c, where t is a mean-0 normal random
variable independent of Xtj . Thus powers of (Xt ) can be
expressed as those of (Xtj ) and t . The desired result follows from EZtj W&tj = 0 for all j and from the property of t .
The foregoing analysis shows that 3 and 4 are asymptotically
uncorrelated. Because they are asymptotically normal, they are
also asymptotically independent.
Proof of Theorem 5
It is sufficient to prove (7). By direct calculations, e t = et
e + (xt x )& ( ). For j = 2,
e 2t = (et e )2 + 2(et e )(xt x )( ) + [(xt x )& ( )]2 .
1$
(et e )2 (xt x )& T( ) + rT
T
t=1
t=1
t=1
and thus
T
T
1 $
1 $ 4
e t =
(et e )4
T t=1
T t=1
1$
4
(et e )3 (xt x )& T( ) + rT ,
T
t=1
60
A5:
A6:
A7:
A8:
REFERENCES
Andrews, D. W. K. (1991), Heteroskedastic and Autocorrelation Consistent
Matrix Estimation, Econometrica, 59, 817854.
Andrews, D. W. K., and Monahan, J. (1992), An Improved Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimator, Econometrica, 60, 953966.