An example spectral analysis: buoy time series of downwelling LW radiation.
Let’s take a nice rich time series (2-minute data for 7 days) as a quasi-continuous, quasi-infinite “truth” series (Remember, this means the truth is the same 7 days repeated periodically to infinity! More about padding it with zeros below, in the convolution part…). Let’s see its variance or power spectrum, and what happens in time and frequency space as we truncate the spectrum, or mangle the data in the time domain (undersample it, average it into coarser time bins, or quantize its values).
The data x(t) are longwave downwelling radiation from a TAO buoy on the equator at 95W. You can perhaps sense a diurnal cycle (maybe due to temperature or water vapor?), plus some higher-frequency spikes of high downwelling radiation as clouds advect overhead. I removed the mean (409 W m^-2). The variance (mean of squared values) is 110.9 (W m^-2)^2, and the standard deviation is the square root of that, 10.5 W m^-2.
A simple fft(x) yields a complex number array, the ‘spectrum’ xhat(f), which can be translated into sine and cosine components, or into amplitude and phase if you prefer, at each frequency f. The squared magnitude of xhat is called Power (or variance per frequency interval), P(f). The frequencies are discrete because the time interval is finite (and assumed periodic). They are equally spaced: 1, 2, 3, ... cycles within the data period (1 week). I divide these integers by 7.0 to express them as cycles per day (cpd). The highest frequency resolvable is (1 cycle)/(4 minutes), since it takes at least two 2-minute data points to indicate a minimal ‘cycle’ (a zig-zag).
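A sketch of this bookkeeping in Python/numpy (assuming the de-meaned 2-minute series is already in a numpy array x; the variable names here are mine, for illustration only):

```python
import numpy as np

# x: de-meaned 2-minute longwave series, 7 days long (7*24*30 = 5040 points)
dt_days = 2.0 / (60.0 * 24.0)            # 2 minutes, expressed in days
N = x.size                               # 5040 for the 7-day record

xhat = np.fft.rfft(x)                    # complex spectrum (non-negative frequencies)
freq = np.fft.rfftfreq(N, d=dt_days)     # frequencies in cycles per day (cpd), up to 360

# Power normalized so the sum over frequencies equals the variance of x
P = np.abs(xhat)**2 / N**2
P[1:-1] *= 2.0                           # fold in the negative frequencies rfft omits

print(P.sum(), x.var())                  # these should agree (Parseval)
```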
Here is the Power spectrum P(f):
P(f) is a spiky thing; most of the power is in the lowest few frequencies. The area under this spiky curve is the total variance (110.9 W^2 m^-4), but that is really hard to see. Sometimes people use log plots to squash the spikes and/or the frequency scale and bring out the long tail, but then the ‘area under the curve’ aspect is lost.
I like to plot the cumulative sum ΣP(f), which asymptotes to the total variance, against log(f), which helps emphasize the lowest frequencies, which are so important. Here, about half the power is in frequencies lower than 2 cycles per day. The big step up at 1 cycle per day indicates the diurnal cycle. The red triangles emphasize the discrete nature of the spectrum, but a line connects them for eyeball convenience. The second-lowest frequency (2 cycles / 7 days) contains a lot of power, but that is not a very well resolved period... you’d want a longer record to conclude anything.
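A sketch of that cumulative-sum plot, continuing from the P and freq arrays in the sketch above (the plotting details are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

csum = np.cumsum(P)                        # running total of variance, low to high f

plt.semilogx(freq[1:], csum[1:], 'r^-')    # skip f=0 so the log axis is happy
plt.axhline(x.var(), color='k', ls=':')    # the asymptote: total variance
plt.xlabel('frequency (cpd)')
plt.ylabel('cumulative variance (W^2 m^-4)')
plt.show()
```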
We can reconstruct the time series by adding up the Fourier harmonics, either all of them or only with certain frequency bands included (“filtering”). Here’s an example of several frequency bands, then a rainbow-colored cumulative reconstruction using more and more frequencies until the full time series is recovered.
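One way to do such band-by-band reconstruction is sketched below, continuing with x and dt_days from the first sketch; the band edges in cpd are just examples:

```python
import numpy as np

def band_reconstruct(x, dt_days, f_lo, f_hi):
    """Return the part of x built only from harmonics with f_lo <= f < f_hi (in cpd)."""
    xhat = np.fft.rfft(x)
    freq = np.fft.rfftfreq(x.size, d=dt_days)
    keep = (freq >= f_lo) & (freq < f_hi)
    return np.fft.irfft(np.where(keep, xhat, 0.0), n=x.size)

low   = band_reconstruct(x, dt_days, 0.0, 1.0)    # slower than diurnal
daily = band_reconstruct(x, dt_days, 1.0, 2.0)    # the diurnal band
fast  = band_reconstruct(x, dt_days, 2.0, 1e9)    # everything higher
# low + daily + fast reproduces x (to roundoff), since the bands tile all frequencies
```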
There is a phase spectrum too, in addition to amplitude (or power). In the [-π, π] range:
What if we reconstruct the time series by keeping P(f) but scrambling the phase? We get time series with the identical power spectrum and identical total variance. But the distribution changes: with random phasing of the Fourier components, the total time series is now the sum of many i.i.d. variables (sines and cosines), so the PDF is more Gaussian, instead of skewed like the original. In other words, all that skew in the original is encoded somehow in the phase spectrum, and is easily lost in phase scrambling.
In fact, the red and blue curves here come from just periodically shifting the phase array by 1 and 2 positions (see the blown-up corner of the phase spectrum with black, red, blue): for red, I mis-assigned the phase of frequency 2520 to frequency 1, 1 to 2, 2 to 3, and so on. Slight change, drastic result! Phase information is delicate and subtle.
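A sketch of the phase-scrambling experiment, continuing with the same x: keep the amplitudes |xhat|, but replace or circularly shift the phases.

```python
import numpy as np

rng = np.random.default_rng(0)

xhat  = np.fft.rfft(x)
amp   = np.abs(xhat)
phase = np.angle(xhat)

# (a) fully random phases: same P(f), but the skewness of the original is destroyed
random_phase = rng.uniform(-np.pi, np.pi, size=phase.size)
x_scrambled = np.fft.irfft(amp * np.exp(1j * random_phase), n=x.size)

# (b) circularly shift the phase array by one position (the "red curve" idea)
x_shift1 = np.fft.irfft(amp * np.exp(1j * np.roll(phase, 1)), n=x.size)

print(x.var(), x_scrambled.var(), x_shift1.var())   # nearly identical variances
```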
How about when we set all the phases to 0 (the green line in the phase spectrum plot)? Now all the cosines interfere constructively at t=0, but they cancel elsewhere. Again, same total variance (phase information is independent of amplitude or Power), but now the time series is mainly one huge spike near t=0.
This is the autocovariance function, which is equal to the autocorrelation function multiplied by an amplitude factor that ensures the total variance is 110.9 W^2 m^-4. The power spectrum is the same for both the data and its autocorrelation:

x(t)  <-- fft / ifft with the original phases -->  P(f)  <-- fft / ifft with phases = 0 -->  autocorr(lag)
(I shifted the data to put t=0 in the middle for a clearer picture – since data are treated as periodic in Fourier analysis, I get to do that.)
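That zero-phase picture can be checked numerically: inverse-transforming the power spectrum (a real, zero-phase array) reproduces the circular autocovariance of x (the Wiener–Khinchin relation). A sketch:

```python
import numpy as np

Pfull = np.abs(np.fft.fft(x))**2 / x.size**2      # power at all N frequencies
acov_from_P = np.fft.ifft(Pfull).real * x.size    # ifft of a real, zero-phase spectrum

# Direct circular autocovariance, for comparison
acov_direct = np.array([np.mean(x * np.roll(x, -lag)) for lag in range(x.size)])

print(acov_from_P[0], acov_direct[0], x.var())    # all ~110.9 (W m^-2)^2
print(np.allclose(acov_from_P, acov_direct))      # True
```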
OK, so there’s our reference time series and its spectral representation. What happens to the spectral signature as we degrade the time series in various ways?
1. Undersampling: Suppose we only grab a value every 3 hours. The total variance of the undersampled series (red, 117.564) is almost the same as the original (actually a bit more), yet the frequencies resolved are only from 1 cycle/week to 1 cycle/6h, so we know that the power in the higher frequencies of the “real” data is somehow being improperly mapped or folded (aliased) into the resolved part of the spectrum.
Here’s the view of the undersampled case in spectral space:
The undersampled series (red) has more power at most frequencies (due to aliasing), and some inaccuracy (noise from the undersampling). The diurnal peak gets moved by one frequency bin (from 7 cycles/7 days = 1 cpd to 8 cycles/7 days), as the high values in the latter part of already-peculiar day 6 got missed by the sampling -- see the 2nd panel in the rainbow graphs above for the pure diurnal harmonic.
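A sketch of the grab-sampling experiment (every 90th 2-minute value mimics 3-hourly sampling; x and dt_days as in the first sketch):

```python
import numpy as np

stride = 90                              # 3 hours / 2 minutes = 90
x_sub  = x[::stride]                     # "grab" sampling, no averaging
dt_sub = dt_days * stride                # 3 hours, in days

P_sub = np.abs(np.fft.rfft(x_sub))**2 / x_sub.size**2
P_sub[1:-1] *= 2.0
f_sub = np.fft.rfftfreq(x_sub.size, d=dt_sub)   # now only up to 4 cpd (1 cycle / 6 h)

print(x.var(), x_sub.var())   # nearly the same variance, crammed into far fewer frequencies
```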
Using averages rather than samples every 3 hours is MUCH better: there is less total variance, as there should be since high frequencies are killed, but the spectrum in the resolved frequencies is almost exactly right (red right on top of black).
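And the 3-hourly averaging version, continuing from the sketch above (block means of each 90-point, 3-hour bin):

```python
import numpy as np

x_avg = x[:x.size - x.size % stride].reshape(-1, stride).mean(axis=1)   # 3-h block means

P_avg = np.abs(np.fft.rfft(x_avg))**2 / x_avg.size**2
P_avg[1:-1] *= 2.0

print(x_sub.var(), x_avg.var(), x.var())
# Averaging removes high-frequency variance instead of aliasing it,
# so x_avg.var() is the smallest of the three.
```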
Now let’s quantize the data (red): only six distinct values occur (-10, 0, 10, 20, 30, 40). The total variance is almost unchanged (it increased a bit in this case); how about the spectrum?
Nothing drastic: a bit of spurious power, much of it at f > 10 cpd, associated with the sharp edges of the square peaks and valleys.
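A sketch of one plausible quantization recipe (rounding to the nearest 10 W m^-2; the notes don't state the exact recipe used, so this is an assumption):

```python
import numpy as np

x_q = 10.0 * np.round(x / 10.0)          # quantize to the nearest 10 W m^-2

P_q = np.abs(np.fft.rfft(x_q))**2 / x_q.size**2
P_q[1:-1] *= 2.0

print(np.unique(x_q))                    # the handful of allowed values
print(x.var(), x_q.var())                # total variance barely changes
```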
How about a finite data record? Well, it depends a bit on which segment you happen to sample. Here are the first 2 days.
There is 117 W^2 m^-4 of variance, but much of that is in a big trend, so treating this segment as periodic (as xhat = fft(x) implies) gives a big spurious 0.5 cpd signal. Recall the cautions about the lowest frequencies...
This is why spectral analysis usually starts with “detrending” the data. Estimating a trend can be done in various ways; I like just regressing x on t and removing that linear regressed part before taking the fft.
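A sketch of that detrending step (a least-squares line via numpy's polyfit; x_seg here stands for whatever finite segment is being analyzed, a made-up name):

```python
import numpy as np

def detrend(x, t):
    """Remove the least-squares linear fit of x on t before taking the fft."""
    slope, intercept = np.polyfit(t, x, deg=1)
    return x - (slope * t + intercept)

t_seg  = np.arange(x_seg.size) * dt_days   # time axis for the finite segment, in days
x_detr = detrend(x_seg, t_seg)
xhat_seg = np.fft.rfft(x_detr)             # lowest frequencies no longer dominated by the trend
```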
Looking only at days 2-3 gives a very different picture: less trend (but still some), and no clear diurnal power. Total variance is only 90 W^2 m^-4 in this case.
Back to “spectrum estimation” – trying to understand the error bars in frequency that result from various imperfections and finiteness problems with the data in time.
We saw that taking a finite time segment of length T makes the power spectrum discrete, with integer frequencies 1, 2, 3, ... [UNITS: cycles/T]. Padding with zeroes at both ends, out to infinity, makes the spectrum continuous.
This padded series is just the product, in time, of an infinitely repeating set of copies of the time series on [0,T] with a boxcar(t) masking function which is 0 everywhere except on [0,T]. Transforming this product into spectral space, the multiplication becomes a convolution, or smoothing process, with a kernel that is the Fourier transform of the boxcar function.
TIME:  zero-padded series on [0,T]  =  x(t) × boxcar(time window)
FREQ:  FT( zero-padded series )     =  x_hat(f) ⊛ FT(boxcar)      (⊛ denotes convolution)
The ESTIMATED spectrum (left side) we get from transforming our finite record padded with zeros will be the TRUE SPECTRUM x_hat(f), convolved with (smeared or smoothed by) the FT(boxcar) function. In other words, at every frequency in the true spectrum, you replace the actual value with the wide fft(boxcar) kernel shown below, and add ’em up.
A nice sharp spectral peak in the true xhat(f) will become a smeared peak, with the side lobes smearing its power very widely across the frequency domain. “Leakage” refers to this long-range smearing: power getting shifted a long way in frequency space. It’s not good.
Might it be better to smoothly taper the ends of our padded data sequence, that is, multiply the infinite data record by some kind of a rounded window rather than the square boxcar (mask) function above? Let’s guess from the fft of different window functions:
Not bad – weaker side lobes. How about going fully smooth? WOW! The Fourier transform of a Gaussian is a Gaussian. No side lobes at all.
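A sketch of that window-comparison experiment: the magnitude of the FFT of a boxcar, a tapered (Hann) window, and a Gaussian, each zero-padded so the spectral shapes are finely sampled. The specific lengths are arbitrary examples:

```python
import numpy as np
import matplotlib.pyplot as plt

Nwin, Npad = 256, 8192                          # window length and padded length (examples)

windows = {
    "boxcar":   np.ones(Nwin),
    "hann":     np.hanning(Nwin),               # smoothly tapered to zero at the ends
    "gaussian": np.exp(-0.5 * ((np.arange(Nwin) - Nwin/2) / (Nwin/8))**2),
}

for name, w in windows.items():
    W = np.abs(np.fft.rfft(w, n=Npad))          # zero-padding gives a finely sampled picture
    plt.semilogy(W / W.max(), label=name)       # boxcar: big side lobes; hann: weaker; gaussian: essentially none

plt.xlim(0, 400)
plt.legend(); plt.xlabel("frequency bin"); plt.ylabel("|FT(window)| (normalized)")
plt.show()
```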
When we transform to spectral space, the effect of multiplying x(t) by the 3-hourly shah function in time is to convolve xhat(f) with ft(shah) in the frequency domain.
In this case, our spectrum is the thing that is compact around the origin (as geophysical spectra usually are: “red”), while the sampling-related spectrum is the thing that extends to infinity. So it is clearer to call our spectrum the “kernel” and fft(shah) the thing being ‘smoothed’ by that kernel. But remember, convolution commutes, so labeling one function the “kernel” in a convolution is just for our convenience and intuitive comfort.
Here’s the spectrum of the full data, plotted as amplitude (not power), and symmetric (not just the positive frequencies). Remember, it’s just the same information as the “power spectrum” from the notes above (figure repeated at right).
Let’s build the convolution now:
The red-black gap at bottom here is a vertical “error bar”. 3h sampling gives tiny error on that lowest frequency peak (2 cycles/7 days). For the diurnal peak (1 cpd), 3h sampling produces roughly 100% error bars (spurious, aliased power about equal to true power). For f > 1 cpd, the spurious power is several times the true power. That’s consistent with the results of the one particular grab-sample realization I showed in those first notes: the low frequency is about right, the diurnal is problematic, and the rest is mostly spurious.
“ERROR BARS” in frequency space arising from data shortcomings in time
Taking the Fourier transform of a finite segment of data x(t1…t2) (of length T) to get x_hat(f) tacitly assumes that the patterns in x(t1…t2) repeat periodically, forever in all t. The finiteness of T corresponds to a lower bound on the lowest frequency (1 cycle per record length T) where we can obtain a value of x_hat. It also discretizes the frequencies where we get an x_hat (1, 2, 3, ... cycles per T).
This discrete frequency step width of 1 cycle/T is, in spectrum-estimation terms, a form of horizontal “error bar” in frequency space. Spectral “leakage” is the continuous version of this discretization “error bar.”
In data-analysis terms, leakage corresponds to taking our finite-length segment x(t1…t2) and, instead of assuming it is periodic, padding it with zeroes on both ends to make an infinite (or at least much longer) sequence.
For example, say we pad the T-length data sequence with 5T worth of zeros on each end, giving a total padded record length of 11T. Analyzing this longer sequence (which, again, assumes this longer 11T sequence is periodically repeated to infinity) will give 11x finer spectral resolution (an xhat value will be available every 1 cycle/11T instead of every 1 cycle/T).
But does this mean we will be able to actually discriminate frequencies that are closer together than 1 cycle/T? No, because “leakage” or smearing will still be a horizontal (in frequency space) error bar. Information-wise, you can’t get something for nothing (where padding with zeros is the nothing).
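A sketch demonstrating that point: two sine waves only 0.5 cycle/T apart are smeared into one lump by a T-length record, and zero-padding to 11T samples that lump more finely without separating the two tones. The record length and frequencies are arbitrary examples:

```python
import numpy as np
import matplotlib.pyplot as plt

Nrec = 1024
t = np.arange(Nrec) / Nrec                           # one record length, T = 1
two_tones = np.sin(2*np.pi*20.0*t) + np.sin(2*np.pi*20.5*t)   # 20 and 20.5 cycles/T

A_raw = np.abs(np.fft.rfft(two_tones))               # values every 1 cycle/T
A_pad = np.abs(np.fft.rfft(two_tones, n=11*Nrec))    # zero-padded: values every 1/11 cycle/T

f_raw = np.fft.rfftfreq(Nrec, d=1.0/Nrec)            # frequencies in cycles per T
f_pad = np.fft.rfftfreq(11*Nrec, d=1.0/Nrec)

plt.plot(f_pad, A_pad, label="padded to 11T (finer grid, same smeared lump)")
plt.plot(f_raw, A_raw, "k^", label="raw T-length record")
plt.xlim(10, 30); plt.legend(); plt.xlabel("frequency (cycles per T)")
plt.show()
```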
When we have a discretely sampled sequence of data values at times separated by dt, this gives an upper bound on the frequencies we can resolve: (1 cycle per 2 dt), the Nyquist frequency.
There is folding (aliasing) of power across frequency, which is a vertical error bar on our estimates of the power spectrum, as some of the power in an undersampled spectrum is spurious (aliased).
A couple of you mentioned Doppler radar, where there is a “Nyquist velocity.” Doppler velocity estimation comes from a pulse-pair calculation: one pulse of radar energy (waves) is sent out with known phase (let’s call this phase “0 degrees”).
It reflects off a target and returns, and its phase is measured. If the distance to the target is precisely 1 km, and the wavelength is 10 cm, the returned phase will be 0 degrees (since the round-trip distance is 2 km: exactly 20,000 wavelengths to the target and back).
Now a second pulse is sent, say 1/100 s later. Suppose its phase comes in at 36 degrees (1/10 of 360 degrees).
That means the target is now 1 km + 0.5 cm away (giving a 20,000.1-wavelength travel distance for the pulse). So the round-trip path has lengthened by 36/360 = 1/10 of a wavelength = 1 cm, meaning the target has moved 0.5 cm in 1/100 s: a radial velocity of 0.5 m/s. But maybe the target is 1 km + 5.5 cm away (20,001.1 wavelengths of travel distance), implying a speed of 5.5 m/s? That would also give a 36-degree phase angle – we can’t tell.
So all the other radial velocities that might exist in nature (5.5, 10.5, 15.5, ... m/s) are folded or aliased back into the instrument’s resolved range, the Nyquist interval, set by the pulse repetition frequency. Sending pulses more often would solve the problem (but creates others).
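A little arithmetic sketch of the pulse-pair logic, using the numbers from this example (the Nyquist-velocity formula λ/(4·Δt) is the standard one, not something specific to these notes):

```python
wavelength = 0.10          # m  (10 cm)
t_between  = 0.01          # s  (pulses 1/100 s apart)
dphase_deg = 36.0          # measured phase change between the two pulses

# Phase change maps to target displacement through the ROUND-TRIP path:
# one wavelength of path change = half a wavelength of target motion.
dr = (dphase_deg / 360.0) * wavelength / 2.0      # 0.005 m = 0.5 cm
v  = dr / t_between                               # 0.5 m/s

# Unambiguous (Nyquist) velocity: a quarter wavelength of target motion per pulse interval
v_nyquist = wavelength / (4.0 * t_between)        # 2.5 m/s
aliases = [v + k * 2 * v_nyquist for k in range(1, 4)]   # 5.5, 10.5, 15.5 m/s also fit

print(v, v_nyquist, aliases)
```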