AbstractRecently, it was suggested that spectrum estimation wavelet coefficients, and then inverting the DWT. The technique
can be accomplished by applying wavelet denoising methodology has already been used in helioseismology [9]. The tapers reduce
to wavelet packet coefficients derived from the logarithm of a spec- side-lobe leakage in a spectrum estimator for a time series with
trum estimate. The particular algorithm we consider consists of
computing the logarithm of the multitaper spectrum estimator, ap- a spectrum that has a large dynamic range and/or is rapidly
plying an orthonormal transform derived from a wavelet packet varying. The use of multiple tapers gives a consistent spectrum
tree to the log multitaper spectrum ordinates, thresholding the em- estimator with a variance inversely proportional to the number
pirical wavelet packet coefficients, and then inverting the trans- of tapers used. The resulting multitaper estimator is thus
form. For a small number of tapers, suitable transforms/partitions superior to the periodogram in terms of reduced leakage-bias
for the logarithm of the multitaper spectrum estimator are de-
rived using a method matched to statistical thresholding proper- and variance. The empirical wavelet coefficients derived from
ties. The partitions thus derived starting from different stationary the log multitaper spectrum estimator are correlated, and it was
time series are all similar and easily derived, and any differences noted that the known variances of these coefficients increase
between the wavelet packet and discrete wavelet transform (DWT) with the level of the DWT; this acts in accordance with the
approaches are minimal. For a larger number of tapers, where the wavelet thresholding paradigm to suppress small-scale noise
chosen parameters satisfy the conditions of a proven theorem, the
simple DWT again emerges as appropriate. Hence, using our ap- while leaving informative coarse-scale coefficients relatively
proach to thresholding and the method of partitioning, we conclude unattenuated. Wavelet denoising is adaptive and is extremely
that the DWT approach is a very adequate wavelet-based approach rapidly implemented using the fast pyramid algorithm for the
and that the use of wavelet packets is unnecessary. DWT and its inverse. Since the variances are known in our
Index TermsEstimation, spectral analysis, statistics, time se- context (and define the thresholds), there are no unknown fixed
ries, wavelet transforms. smoothing parameters requiring optimization, such as those
that occur with spline or kernel smoothing.
If the DWT is applied to an ordinary time series with
I. INTRODUCTION
unit sample interval, the wavelet coefficients at level are
In view of the approach in [11], and the fact that the DWT is b) For Haar wavelet filters, as the number of tapers increases,
a particular orthonormal transform chosen from the many avail- the partition variances (which determine the threshold
able from a wavelet packet tree for partitioning the pseudo-fre- levels) theoretically converge to those of the DWT.
quency support , an obvious question arises as to what, c) The DWT approach is a very adequate wavelet-based ap-
if any, benefits would arise from replacing the DWT in [20] by proach, and the use of wavelet packets is unnecessary.
a wavelet packet approach, i.e., the algorithm of interest is as Section II gives a brief introduction to the multitaper spec-
follows. trum estimator and its covariance structure. Wavelet packets and
orthonormal partitions are presented in Section III. Section IV
shows that the thresholding step will depend on i) the wavelet
Compute the logarithm of the multitaper packet variances and ii) the particular wavelet packet partition
spectrum estimator. used. The wavelet packet variances are derived in Section V.
Apply an orthonormal transform derived Example partitions are given in Section VI. Section VII presents
from a wavelet packet tree to the log a theorem and examples showing that for the Haar wavelet
multitaper spectrum ordinates. filter and a sufficiently large number of tapers, the wavelet
Threshold the empirical wavelet packet packet variances become constant over DWT subbands. The
coefficients. white noise packet selection algorithm is given and applied in
Invert the transform. Section VIII. Results of the full wavelet-packet-based spectrum
estimation algorithm are given in Section IX.
cov cov
cov (3)
(4)
Fig. 1. Illustration of first- and second-level DWPTs. Put together, they form
where has unit periodicity. a wavelet packet tree. # 2 denotes downsampling by 2.
A simple approximation for was formulated in [20],
namely, and Given the sequence of length , where
denotes the integer part operator, we produce , which
are the coefficients at the next level, using
if (5)
otherwise.
(7)
B. Orthonormal Transforms/Partitions
In addition to the th level DWPTs, we can extract many
other orthonormal transforms, or disjoint dyadic decomposi-
tions, from a WP tree [14, p. 213]. Such an -packet partition
of the pseudo-frequency in-
terval , where , and
for is associated with an -dimen-
sional orthonormal matrix with rows given by the rows of
matrices .
By way of example, a 16-packet partition of , which
will be discussed further later, is given by
(11)
for and
.
(13)
where
(16)
(14)
for . We can map the variable to a scale with
provides a conservative threshold for coefficients in packet
unit periodicity by setting
and, hence, defines a level-and-subband dependent
universal threshold. (17)
C. Requirements This has the shape of Fejrs kernel. The variable is our
To set up the thresholding step, we thus require the wavelet pseudo-frequency variable.
packet variances for (14) and the particular WP partition A slightly more accurate version of is obtained by
to use. Fourier transforming in (4); we denote this latter case, by
shown for ten sine tapers, as the continuous line
V. VARIANCE OF NOISE COEFFICIENTS in Fig. 3(a) and (b).
Wavelet packet variances can be easily computed VI. PARTITIONING
as follows [5]. Since the covariance matrix is cir-
cular, , where the th row of contains Suppose for a moment that the transform used in (11) was
. denotes Hermitian the DWT. If the noise was statistically independent and iden-
transpose, and is a diagonal matrix with diagonal elements tically distributed (iid) zero-mean Gaussian with variance ,
, which is the DFT of (10). then the sequence would be as well (since the DWT is
From (11), the row of , whose multiplication with orthonormal), and then, the threshold is appro-
yields , is given by the row vector defined in (7). priate and has optimality properties [3]. If any other orthonormal
Let us denote the DFT of by wavelet packet transform was employed instead, and the noise
; then, we have that . Thus, was statistically iid Gaussian, then again, so would be the se-
var is quence (since the packet transform is orthogonal), and the
same threshold is again entirely appropriate.
By way of contrast, if we use the DWT and the noise is
correlated Gaussian (as in our application here), then the
(15) variance of the noise transform coefficients is no longer
the same for all packets composing the DWT, and moreover,
Note that var does not depend on . the threshold , where is the variance
CRISTN AND WALDEN: MULTITAPER POWER SPECTRUM ESTIMATION AND THRESHOLDING 2981
for packet , is conservative (Section IV-B), i.e., will tapers used. Hence, the partition must likewise depend only on
result in less noise remaining at the expense of possible damage the tapers used.
to the signal. How conservative this threshold is depends on the Since the log multitaper spectrum ordinates derived from
nature of the correlation of the the noise transform coefficients. AR(2) and RC processes (and other processes studied but not
First, we wish to discover whether there is an orthonormal discussed here) all give rise to very similar resulting partitions
wavelet packet transform where the Gaussian noise transform of the pseudo-frequency interval , it is clear that the
coefficients in the packets have minimal correlation so that single process-independent approximated by or
would again be an appropriate threshold, is widely applicable.
and second, how similar the resulting spectrum estimate is to
that found using the DWT. VII. EFFECT OF NUMBER OF TAPERS
We can deduce from [8, Sec. 2.2] that there will be little or For a level- DWPT, the variances can be approximated
no correlation between the coefficients in different packets as
from (15) by inserting of (16) to give
these are generated essentially by bandpass filters with near dis-
joint support. Further, and importantly, we can select the wavelet
packet partition so that the within-packets elements of are (18)
a white noise sample for each . This is the basis
of the scheme we use, which will be discussed later in Sec-
tion VIII-B. Let so that gives . Fig. 4
Using ten tapers, we derived a WP partition for the log mul- shows, for Haar scaling and wavelet filters, the values of ,
titaper spectrum ordinates resulting from each of two stochastic for 10, 22, and 30 tapers. Fig. 4 also shows
processes used previously by [12] and [20]: the corresponding , of which the are weighted aver-
ages. We see that as the number of tapers increases, the values
1) the AR(2) process specified by of become constant over some subbands on the pseudo-fre-
, where is zero mean white noise; quency axis. We can make a general statement on this behavior
2) a typical mobile radio communications (RC) process via Theorem 1.
with spectrum being a superposition of two bandlimited, Theorem 1: Given and Haar scaling and wavelet
fading, mobile radio signals, white background noise, and filters, suppose we carry out the DWPT to level
a narrowband interference term with Gaussian spectrum. such that . Then, for every , all the
The spectrum of the AR(2) process is smooth and slowly variances of the subbands , where
varying. Hence, it satisfies the assumptions necessary for the , take the form
log multitaper spectrum estimator to depend only on the tapers
used [see (3) and (4)]. The spectrum of the RC process is (19)
rapidly varying and nonsmooth in places, so in this case, the
assumptions do not hold perfectly; however, as we shall see,
this appears to have minimal effect. Using the method set out Proof: This is lengthy and involved. Details can be found
in Section VIII-C, an 18-packet partition resulted for the log at http://stats.ma.ic.ac.uk/~atw.
multitaper spectrum ordinates derived from the AR(2) process, Consequently, if the number of tapers is large enough, then for
and a 16-packet partition [see (8) and Fig. 2] resulted for the every , all the variances over the subband
log multitaper spectrum ordinates derived from the RC process. are identical and equal
The partition intervals are shown on the top of the to (19).
pseudo-frequency axis of Fig. 3(a) for the 18-packet partition Consider Fig. 4, which shows a level decomposition.
and Fig. 3(b) for the 16-packet partition. The result for the log For the theorem to hold, , i.e., . Hence, the
multitaper spectrum ordinates derived from the AR(2) process theorem is applicable for Fig. 4(c) (30 tapers), and we see indeed
is only slightly different from that derived from the RC process. that the subbands of constant values are the same as those for the
usual DWT to level 5, namely, .
The corresponding variance values , computed using Then, for example, all the variances of the subbands making up
(15) with , are also shown in Fig. 3(a) and (b). We see are identical, which means that the average
that the correspond to averaging over the pseudo- value of is the same in all these subbands so that in this
frequency partition intervals , in line with (15). sense, they may be taken to be homogeneous and composited
The role of the partition is to select the widths of the pseudo- together and treated as the single band .
frequency intervals so that changes in the shape of Hence, the practical significance of Theorem 1 is that if our
can be captured; roughly speaking, where it is slowly changing, choice of level and number of tapers satisfy those of the
the pseudo-frequency intervals are wide, and where it is rapidly theorem (and we use Haar scaling and wavelet filters), DWPT
changing, they are relatively narrower. However, is the thresholding appears to offer no advantages over DWT thresh-
DFT of the autocovariance of , and we already know that olding since the subbands of constant variance are the same as
subject to the stated approximations in Section II, the covariance those for the usual DWT. However, Fig. 4(a) suggests that for a
structure of the noise , and, hence, the covariance structure smaller number of tapers, a good partition selection method is
of the log multitaper spectrum estimator, depends only on the potentially useful, and so this is considered in Section VIII.
2982 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 50, NO. 12, DECEMBER 2002
Fig. 4. Approximate wavelet packet variances , n = 0; . . . ; 31, on a decibel scale (top row), along with the corresponding DFTs S (p) (bottom row) for
(a) and (d) 10, (b) and (e) 22, and (c) and (f) 30 tapers.
VIII. WAVELET PACKET PARTITION SELECTION METHOD form for small , the wavelet packet transform of (20) is essen-
Here, we will illustrate the characteristics of wavelet packet tially a combination of these two cases, except for frequencies
partition selection by white noise testing for our particular close to 0 or 1/2, for which our theoretical development does
problem using a simulation study. not apply anyway.
The wavelet packet coefficients in are thresholded
A. Matching to Circular Nature of DWPT resulting in, say, . The inverse orthonormal wavelet packet
transform gives . Since the thresholded multitaper
The covariance matrix was formulated using the non-neg- ordinates need not be symmetric about , we have
ative frequencies , . Since
chosen to average and , where
is an even periodic function with unit period, with the
is the wavelet-packet-thresholded multitaper
final spectrum estimation in mind, we can, as done in [20], use
spectrum estimate, i.e., our estimator has the form
all frequencies because then, we compute
the transform of a complete period of the log multitaper spec-
trum ordinates, which better matches the circular nature of the
DWPT. Therefore, in what follows, we apply the wavelet packet
transform to an expanded version of (9), namely, :
(21)
.. .. ..
. . . (20)
Note that we still use for thresholding. This level is ap- B. White Noise Test Algorithm
propriate for the non-negative and nonpositive cases separately, We wish to determine a suitable orthonormal transform or,
and because of the localized nature of the wavelet packet trans- equivalently, the wavelet packet partition to apply to the log mul-
CRISTN AND WALDEN: MULTITAPER POWER SPECTRUM ESTIMATION AND THRESHOLDING 2983
titaper spectrum ordinates. In our approach, we make use of an We then base our test for white noise on the Kolmogorov good-
algorithm proposed in [15]. Why choose this method? As stated ness of fit test for a completely specified distribution (in this
in Section VI, we wish to find an orthonormal wavelet packet case a uniform on ). If we let
transform where the Gaussian noise transform coefficients in
the packets have minimal correlation. This is exactly what the
proposed partition selection algorithm achieves.
Consider the white noise hypothesis test for the
wavelet packet noise coefficients for a par-
ticular wavelet packet label of the form
we can reject the null hypothesis of white noise at the level
is a white noise sample, versus of significance if exceeds , which is
is not a white noise sample the upper point for the distribution of under the
null hypothesis. A simple approximation for is given in
The white noise test depends entirely on second-order (correla- [17].
tion) properties. The fact that in carrying out the test on we The top-down nature of this algorithm is appropriate since
must make use of the observed sequence does not matter once a packet is identified for which the noise coefficients are
since both have the same correlation structure. To see this, note deemed white, further splitting is pointless.
that , ,
and , where denotes expected C. Partition Selection From the Wavelet Packet Tree
value. Then For the log multitaper spectrum estimator derived from the
AR(2) and RC processes, the following steps were repeated
cov 2000 times.
1) A sample was generated from the
process, with .
2) The log multitaper spectrum ordinates were calcu-
cov (22) lated using sinusoidal tapers, then
Hence, we can test for being a white noise sample by was calculated, and , (20) was formed up.
working with . 3) An orthonormal partition was selected using the three
With defined in (6), the algorithm takes the form: steps set out in Section VIII-B; the significance of the
white noise test was set to , and in line with
1) Set the level , and create the results in [20], we took 8 and 6 for the AR(2) and
wavelet packet coefficients , RC processes, respectively.
for 1. For each of the processes, the 2000 orthonormal partitions found
2) Carry out the white noise hypothesis in this way were stacked one on top of the other, and a grey-scale
test on each , and if is not re- plot was produced of the result, so that dark areas represent a
jected, then make the subband an ele- pseudo-frequency partition subband selected in most of
ment of , else split , i.e., create the 2000 cases, whereas a light grey area marks an interval se-
the wavelet packet coefficients and lected only occasionally. These grey-scale stack plots of DWPT
. partitions for the log multitaper spectrum ordinates derived from
3) For (where is the the AR(2) process are shown in Fig. 5 and, for the RC process,
deepest level of the transform), repeat in Fig. 6.
step 2 on all those coefficient vectors We wish to extract from each of the two stack plots a
created by the previous step. representative partition. The rule applied is that if ei-
ther of the children subbands and
is darker (occurs more often) than its parent subband
A cumulative periodogram method can be used to carry out , then the parent subband
the white noise test. Let be the magnitude squared of is split into its children subbands. We will refer to Fig. 6
the DFT of at the pseudo-frequencies , and denote as our example. Starting with the pseudo-frequency sub-
the number of these frequencies satisfying band , we see that this occurs more often
by . Then, the normalized cumulative (darker) than the children subbands and
periodogram is given by ; likewise, the latter occur more often than
their descendants; therefore, the subband
is not split but becomes part of the representative partition.
It is thus marked on the top of the pseudo-frequency axes of
Figs. 3(b) and 6 and is a filled segment of Fig. 2. Moving
on, we see that occurs more often than its
parent and more often than its descendants.
2984 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 50, NO. 12, DECEMBER 2002
Fig. 5. Grey-scale stack plot of DWPT partition for the log multitaper Fig. 6. Grey-scale stack plot of DWPT partition for the log multitaper
spectrum ordinates derived from the AR2 process. The partition intervals spectrum ordinates derived from the RC process. The partition intervals I
I are shown along the top of the pseudo-frequency axis. are shown along the top of the pseudo-frequency axis.
ACKNOWLEDGMENT
Helpful comments by the referees and associate editor led to
a much improved exposition.
REFERENCES
[1] M. S. Bartlett and D. G. Kendall, The statistical analysis of variance-
heterogeneity and the logarithmic transformation, Suppl. J. R. Statist.
Soc., vol. 8, pp. 128138, 1946.
[2] I. Daubechies, Ten Lectures on Wavelets. Philadelphia, PA: SIAM,
1992.
[3] D. L. Donoho and I. M. Johnstone, Ideal spatial adaptation by wavelet
shrinkage, Biometrika, vol. 81, pp. 425455, 1994.
Fig. 7. Representative spectrum estimates for (a) AR(2) and (b) RC processes [4] M. Fligge and S. K. Solanki, Noise reduction in astronomical spectra
using wavelet packet thresholding. (c) Representative spectrum estimate for the using wavelet packet, Astron. Astrophys. Suppl. Series, vol. 124, pp.
RC process using instead cubic spline smoothing. The thick line shows the true 57987, 1997.
power spectrum. [5] W. A. Fuller, Introduction to Statistical Time Series, 2nd ed. New York:
Wiley, 1996.
[6] H.-Y. Gao, Wavelet estimation of spectral densities in time series anal-
For each of the 2000 simulations, the root mean square error ysis, Ph.D. dissertation, Dept. Statist., Univ. Calif., Berkeley, 1993.
in the log spectrum estimate was calculated. Figs. 7(a) and (b) [7] , Choice of thresholds for wavelet shrinkage estimate of the spec-
show, respectively, the estimates of the AR(2) and RC spectra trum, J. Time Series Anal., vol. 18, pp. 231251, 1997.
[8] I. M. Johnstone and B. W. Silverman, Wavelet threshold estimators for
that have closest to the average root mean square error and data with correlated noise, J. R. Statist. Soc. B, vol. 59, pp. 319551,
may thus be considered as representative spectrum estimates. 1997.
Clearly, wavelet packet thresholding of multitaper spectrum es- [9] R. W. Komm, Y. Gu, F. Hill, P. B. Stark, and I. K. Fodor, Multitaper
spectral analysis and wavelet denoising applied to helioseismic data,
timates using the scheme outlined works extremely well for both Astrophys. J., vol. 519, pp. 407421, 1999.
processes. Comparing the results with those of [20], who used [10] C. V. Lambrecht and M. Karrakchou, Wavelet packets-based high-
the DWT, we find that the average root mean square error is 2% resolution spectral estimation, Signal Process., vol. 47, pp. 135144,
1995.
smaller for the wavelet packet method for the AR(2) process
[11] B. Lumeau, J. C. Pesquet, J. F. Bercher, and L. Louveau, Optimiza-
but is 1% larger for the RC process. Hence, any differences tion of bias-variance trade-off in nonparametric spectral analysis by de-
appear minimal at best. By way of comparison, for the diffi- composition into wavelet packets, Progress Wavelet Anal. Applicat.,
cult-to-estimate RC spectrum, we also give the equivalent result pp. 28590, 1993.
[12] P. Moulin, Wavelet thresholding techniques for power spectrum esti-
using cubic spline smoothing of the log multitaper spectrum es- mation, IEEE Trans. Signal Processing, vol. 42, pp. 31263136, Nov.
timate; here, the average root mean square error is 3% larger 1994.
than achieved using the DWT; see Fig. 7(c). [13] D. B. Percival and A. T. Walden, Spectral Analysis for Physical Appli-
cations. Cambridge: Cambridge Univ. Press, 1993.
[14] , Wavelet Methods for Time Series Analysis. Cambridge, U.K.:
Cambridge Univ. Press, 2000.
[15] D. B. Percival, S. Sardy, and A. C. Davison, Wavestrapping time se-
X. SUMMARY AND CONCLUSIONS ries: Adaptive wavelet-based bootstrapping, in Nonlinear and Nonsta-
tionary Signal Processing, W. J. Fitzgerald, R. L. Smith, A. T. Walden,
For a fairly small number of tapers, as often used in prac- and P. C. Young, Eds. Cambridge, U.K.: Cambridge Univ. Press, 2000,
tice (such as ), suitable partitions for the log multi- pp. 442471.
[16] K. S. Riedel and A. Sidorenko, Minimum biased multitaper spectral
taper spectrum ordinates derived from different stationary time estimation, IEEE Trans. Signal Processing, vol. 43, pp. 188195, Jan.
series will all be similar and could be simulated from, say, the 1995.
AR(2) process, as was done here, but applied to an unknown [17] M. A. Stephens, EDF statistics for goodness of fit and some compar-
isons, J. Amer. Statist. Assoc., vol. 69, pp. 730737, 1974.
process; any differences between the spectrum estimates using [18] D. J. Thomson, Spectrum estimation and harmonic analysis, Proc.
the DWPT and DWT approaches have been found to be min- IEEE, vol. 70, pp. 10551096, 1982.
imal. [19] A. T. Walden, E. J. McCoy, and D. B. Percival, The effective bandwidth
of a multitaper spectral estimator, Biometrika, vol. 82, pp. 201214,
For a larger number of tapers, where the chosen parameters 1995.
(number of tapers, transform level, etc.) satisfy the conditions [20] A. T. Walden, D. B. Percival, and E. J. McCoy, Spectrum estimation
of Theorem 1, then, subject to the approximations and use of by wavelet thresholding of multitaper estimators, IEEE Trans. Signal
Processing, vol. 46, pp. 31533165, Dec. 1998.
the Haar filters, the simple DWT emerges as appropriate for [21] M. V. Wickerhauser, Adapted Wavelet Analysis from Theory to Software
thresholding. Algorithms. Wellesley, MA: A. K. Peters, 1994.
2986 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 50, NO. 12, DECEMBER 2002
Alberto Contreras Cristn received the B.Sc. Andrew T. Walden (A86) received the B.Sc.
degree in actuarial science from the National degree in mathematics from the University of Wales,
Autonomous University of Mexico, Mexico City, Bangor, in 1977 and the M.Sc. and Ph.D. degrees
in 1990 and the M.Sc. degree in statistics from the in statistics from the University of Southampton,
Mathematics Research Center, Guanajuato, Mexico, Southampton, U.K., in 1979 and 1982.
in 1993. He received the Ph.D. degree in statistics He was a research scientist at British Petroleum
from Imperial College, University of London, from 1981 to 1990, specializing in statistical signal
London, U.K., in 1998. processing for oil and gas exploration. From 1985
He joined the Institute for Research in Applied to 1986, he was a visiting Assistant Professor with
Mathematics and Systems (IIMAS), National the University of Washington, Seattle, teaching and
Autonomous University of Mexico, in 1998, where researching in both the Statistics and Geophysics
he is currently researcher and lecturer. His research interests include time Departments. He joined the Department of Mathematics, Imperial College,
series, spectrum analysis and wavelets, and Bayesian inference. London, U.K., in 1990, where he is currently Professor of statistics. His re-
search interests are in time series, spectrum analysis, and wavelets, particularly
with applications to geophysics and oceanography. He is coauthor of the
recent texts (with D. Percival) Spectral Analysis for Physical Applications and
Wavelet Methods for Time Series Analysis.