An example spectral analysis: buoy time series of downwelling LW radiation.
Let’s take a nice rich time series (2-minute data for 7 days) as a quasi-continuous, quasi-infinite “truth” series (Remember, this means the truth is the same 7 days repeated periodically to infinity! More about padding it with zeros below, in the convolution part…). Let’s see its variance or power spectrum, and what happens in time and frequency space as we truncate the spectrum, or mangle the data in the time domain (undersample it, average it into coarser time bins, or quantize its values).
The data x(t) are longwave downwelling radiation from a TAO buoy on the equator at 95W. You can perhaps sense a diurnal cycle (maybe due to temperature or water vapor?), plus some higher-frequency spikes of high downwelling radiation as clouds advect overhead. I removed the mean (409 W m^-2). The variance (mean of squared values) is 110.9 (W m^-2)^2, and the standard deviation is the square root of that, 10.5 W m^-2.
A simple fft(x) yields a complex number array, the ‘spectrum’ xhat(f), which can be translated into sine and cosine components, or into amplitude and phase if you prefer, at each frequency f. The squared magnitude of xhat is called Power (or variance per frequency interval), P(f). The frequencies are discrete because the time interval is finite (and assumed periodic). They are equally spaced: 1, 2, 3, ... cycles within the data period (1 week). I divide these integers by 7.0 to express them as cycles per day (cpd). The highest frequency resolvable is (1 cycle)/(4 minutes), since it takes at least two 2-minute data points to indicate a minimal ‘cycle’ (a zig-zag).
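A sketch of this bookkeeping in Python/numpy (assuming the de-meaned 2-minute series is already in a numpy array x; the variable names here are mine, for illustration only):

```python
import numpy as np

# x: de-meaned 2-minute longwave series, 7 days long (7*24*30 = 5040 points)
dt_days = 2.0 / (60.0 * 24.0)            # 2 minutes, expressed in days
N = x.size                               # 5040 for the 7-day record

xhat = np.fft.rfft(x)                    # complex spectrum (non-negative frequencies)
freq = np.fft.rfftfreq(N, d=dt_days)     # frequencies in cycles per day (cpd), up to 360

# Power normalized so the sum over frequencies equals the variance of x
P = np.abs(xhat)**2 / N**2
P[1:-1] *= 2.0                           # fold in the negative frequencies rfft omits

print(P.sum(), x.var())                  # these should agree (Parseval)
```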
Here is the Power spectrum P(f):
P(f) is a spiky thing; most of the power is in the lowest few frequencies. The area under this spiky curve is the total variance (110.9 W^2 m^-4), but that is really hard to see. Sometimes people use log plots to squash the spikes and/or the frequency scale and bring out the long tail, but then the ‘area under the curve’ aspect is lost.
I like to plot the cumulative sum ΣP(f), which asymptotes to the total variance, against log(f), which helps emphasize the lowest frequencies, which are so important. Here, about half the power is in frequencies lower than 2 cycles per day. The big step up at 1 cycle per day indicates the diurnal cycle. The red triangles emphasize the discrete nature of the spectrum, but a line connects them for eyeball convenience. The second-lowest frequency (2 cycles / 7 days) contains a lot of power, but that is not a very well resolved period... you’d want a longer record to conclude anything.
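A sketch of that cumulative-sum plot, continuing from the P and freq arrays in the sketch above (the plotting details are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

csum = np.cumsum(P)                        # running total of variance, low to high f

plt.semilogx(freq[1:], csum[1:], 'r^-')    # skip f=0 so the log axis is happy
plt.axhline(x.var(), color='k', ls=':')    # the asymptote: total variance
plt.xlabel('frequency (cpd)')
plt.ylabel('cumulative variance (W^2 m^-4)')
plt.show()
```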
We can reconstruct the time series by adding up the Fourier harmonics, either all of them or only with certain frequency bands included (“filtering”). Here’s an example of several frequency bands, then a rainbow-colored cumulative reconstruction using more and more frequencies until the full time series is recovered.
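One way to do such band-by-band reconstruction is sketched below, continuing with x and dt_days from the first sketch; the band edges in cpd are just examples:

```python
import numpy as np

def band_reconstruct(x, dt_days, f_lo, f_hi):
    """Return the part of x built only from harmonics with f_lo <= f < f_hi (in cpd)."""
    xhat = np.fft.rfft(x)
    freq = np.fft.rfftfreq(x.size, d=dt_days)
    keep = (freq >= f_lo) & (freq < f_hi)
    return np.fft.irfft(np.where(keep, xhat, 0.0), n=x.size)

low   = band_reconstruct(x, dt_days, 0.0, 1.0)    # slower than diurnal
daily = band_reconstruct(x, dt_days, 1.0, 2.0)    # the diurnal band
fast  = band_reconstruct(x, dt_days, 2.0, 1e9)    # everything higher
# low + daily + fast reproduces x (to roundoff), since the bands tile all frequencies
```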
There is a phase spectrum too, in addition to amplitude (or power). In the [-π, π] range:
What if we reconstruct the time series by keeping P(f) but scrambling the phase? We get time series with the identical power spectrum and identical total variance. But the distribution changes: with random phasing of the Fourier components, the total time series is now the sum of many i.i.d. variables (sines and cosines), so the PDF is more Gaussian, instead of skewed like the original. In other words, all that skew in the original is encoded somehow in the phase spectrum, and is easily lost in phase scrambling.
In fact, the red and blue curves here come from just periodically shifting the phase array by 1 and 2 positions (see the blown-up corner of the phase spectrum with black, red, blue): for red, I mis-assigned the phase of frequency 2520 to frequency 1, 1 to 2, 2 to 3, and so on. Slight change, drastic result! Phase information is delicate and subtle.
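A sketch of the phase-scrambling experiment, continuing with the same x: keep the amplitudes |xhat|, but replace or circularly shift the phases.

```python
import numpy as np

rng = np.random.default_rng(0)

xhat  = np.fft.rfft(x)
amp   = np.abs(xhat)
phase = np.angle(xhat)

# (a) fully random phases: same P(f), but the skewness of the original is destroyed
random_phase = rng.uniform(-np.pi, np.pi, size=phase.size)
x_scrambled = np.fft.irfft(amp * np.exp(1j * random_phase), n=x.size)

# (b) circularly shift the phase array by one position (the "red curve" idea)
x_shift1 = np.fft.irfft(amp * np.exp(1j * np.roll(phase, 1)), n=x.size)

print(x.var(), x_scrambled.var(), x_shift1.var())   # nearly identical variances
```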
How about when we set all the phases to 0 (the green line in the phase spectrum plot)? Now all the cosines interfere constructively at t=0, but they cancel elsewhere. Again, same total variance (phase information is independent of amplitude or Power), but now the time series is mainly one huge spike near t=0.
This is the autocovariance function, which is equal to the autocorrelation function multiplied by an amplitude factor that ensures the total variance is 110.9 W^2 m^-4. The power spectrum is the same for both the data and its autocorrelation:

x(t)  <-- fft / ifft with the original phases -->  P(f)  <-- fft / ifft with phases = 0 -->  autocorr(lag)
(I shifted the data to put t=0 in the middle for a clearer picture – since data are treated as periodic in Fourier analysis, I get to do that.)
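That zero-phase picture can be checked numerically: inverse-transforming the power spectrum (a real, zero-phase array) reproduces the circular autocovariance of x (the Wiener–Khinchin relation). A sketch:

```python
import numpy as np

Pfull = np.abs(np.fft.fft(x))**2 / x.size**2      # power at all N frequencies
acov_from_P = np.fft.ifft(Pfull).real * x.size    # ifft of a real, zero-phase spectrum

# Direct circular autocovariance, for comparison
acov_direct = np.array([np.mean(x * np.roll(x, -lag)) for lag in range(x.size)])

print(acov_from_P[0], acov_direct[0], x.var())    # all ~110.9 (W m^-2)^2
print(np.allclose(acov_from_P, acov_direct))      # True
```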
OK, so there’s our reference time series and its spectral representation. What happens to the spectral signature as we degrade the time series in various ways?
1. Undersampling: Suppose we only grab a value every 3 hours. The total variance of the undersampled series (red, 117.564) is almost the same as the original (actually a bit more), yet the frequencies resolved are only from 1 cycle/week to 1 cycle/6h, so we know that the power in the higher frequencies of the “real” data is somehow being improperly mapped or folded (aliased) into the resolved part of the spectrum.
Here’s the view of the undersampled case in spectral space:
The undersampled series (red) has more power at most frequencies (due to aliasing), and some inaccuracy (noise from the undersampling). The diurnal peak gets moved by one frequency bin (from 7 cycles/7 days = 1 cpd to 8 cycles/7 days), as the high values in the latter part of already-peculiar day 6 got missed by the sampling -- see the 2nd panel in the rainbow graphs above for the pure diurnal harmonic.
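A sketch of the grab-sampling experiment (every 90th 2-minute value mimics 3-hourly sampling; x and dt_days as in the first sketch):

```python
import numpy as np

stride = 90                              # 3 hours / 2 minutes = 90
x_sub  = x[::stride]                     # "grab" sampling, no averaging
dt_sub = dt_days * stride                # 3 hours, in days

P_sub = np.abs(np.fft.rfft(x_sub))**2 / x_sub.size**2
P_sub[1:-1] *= 2.0
f_sub = np.fft.rfftfreq(x_sub.size, d=dt_sub)   # now only up to 4 cpd (1 cycle / 6 h)

print(x.var(), x_sub.var())   # nearly the same variance, crammed into far fewer frequencies
```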
Using averages rather than samples every 3 hours is MUCH better: there is less total variance, as there should be since high frequencies are killed, but the spectrum in the resolved frequencies is almost exactly right (red right on top of black).
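And the 3-hourly averaging version, continuing from the sketch above (block means of each 90-point, 3-hour bin):

```python
import numpy as np

x_avg = x[:x.size - x.size % stride].reshape(-1, stride).mean(axis=1)   # 3-h block means

P_avg = np.abs(np.fft.rfft(x_avg))**2 / x_avg.size**2
P_avg[1:-1] *= 2.0

print(x_sub.var(), x_avg.var(), x.var())
# Averaging removes high-frequency variance instead of aliasing it,
# so x_avg.var() is the smallest of the three.
```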
Now let’s quantize the data (red): only six distinct values occur (-10, 0, 10, 20, 30, 40). The total variance is almost unchanged (it increased a bit in this case); how about the spectrum?
Nothing drastic: a bit of spurious power, much of it at f > 10 cpd, associated with the sharp edges of the square peaks and valleys.
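A sketch of one plausible quantization recipe (rounding to the nearest 10 W m^-2; the notes don't state the exact recipe used, so this is an assumption):

```python
import numpy as np

x_q = 10.0 * np.round(x / 10.0)          # quantize to the nearest 10 W m^-2

P_q = np.abs(np.fft.rfft(x_q))**2 / x_q.size**2
P_q[1:-1] *= 2.0

print(np.unique(x_q))                    # the handful of allowed values
print(x.var(), x_q.var())                # total variance barely changes
```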
How about a finite data record? Well, it depends a bit on which segment you happen to sample. Here are the first 2 days.
There is 117 W^2 m^-4 of variance, but much of that is in a big trend, so treating this segment as periodic (as xhat = fft(x) implies) gives a big spurious 0.5 cpd signal. Recall the cautions about the lowest frequencies...
This is why spectral analysis usually starts with “detrending” the data. Estimating a trend can be done in various ways; I like just regressing x on t and removing that linear regressed part before taking the fft.
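A sketch of that detrending step (a least-squares line via numpy's polyfit; x_seg here stands for whatever finite segment is being analyzed, a made-up name):

```python
import numpy as np

def detrend(x, t):
    """Remove the least-squares linear fit of x on t before taking the fft."""
    slope, intercept = np.polyfit(t, x, deg=1)
    return x - (slope * t + intercept)

t_seg  = np.arange(x_seg.size) * dt_days   # time axis for the finite segment, in days
x_detr = detrend(x_seg, t_seg)
xhat_seg = np.fft.rfft(x_detr)             # lowest frequencies no longer dominated by the trend
```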
Looking only at days 2-3 gives a very different picture: less trend (but still some), and no clear diurnal power. Total variance is only 90 W^2 m^-4 in this case.
Back to “spectrum estimation” – trying to understand the error bars in frequency that result from various imperfections and finiteness problems with the data in time.
We saw that taking a finite time segment of length T makes the power spectrum discrete, with integer frequencies 1, 2, 3, ... [UNITS: cycles/T]. Padding with zeroes at both ends, out to infinity, makes the spectrum continuous.
This padded series is just the product, in time, of an infinitely repeating set of copies of the time series on [0,T] with a boxcar(t) masking function which is 0 everywhere except on [0,T]. Transforming this product into spectral space, the multiplication becomes a convolution, or smoothing process, with a kernel that is the Fourier transform of the boxcar function.
TIME:  zero-padded series on [0,T]  =  x(t) × boxcar(time window)
FREQ:  FT( zero-padded series )     =  x_hat(f) ⊛ FT(boxcar)      (⊛ denotes convolution)
The ESTIMATED spectrum (left side) we get from transforming our finite record padded with zeros will be the TRUE SPECTRUM x_hat(f), convolved with (smeared or smoothed by) the FT(boxcar) function. In other words, at every frequency in the true spectrum, you replace the actual value with the wide fft(boxcar) kernel shown below, and add ’em up.
A nice sharp spectral peak in the true xhat(f) will become a smeared peak, with the side lobes smearing its power very widely across the frequency domain. “Leakage” refers to this long-range smearing: power getting shifted a long way in frequency space. It’s not good.
Might it be better to smoothly taper the ends of our padded data sequence, that is, multiply the infinite data record by some kind of a rounded window rather than the square boxcar (mask) function above? Let’s guess from the fft of different window functions:
Not bad – weaker side lobes. How about going fully smooth? WOW! The Fourier transform of a Gaussian is a Gaussian. No side lobes at all.
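A sketch of that window-comparison experiment: the magnitude of the FFT of a boxcar, a tapered (Hann) window, and a Gaussian, each zero-padded so the spectral shapes are finely sampled. The specific lengths are arbitrary examples:

```python
import numpy as np
import matplotlib.pyplot as plt

Nwin, Npad = 256, 8192                          # window length and padded length (examples)

windows = {
    "boxcar":   np.ones(Nwin),
    "hann":     np.hanning(Nwin),               # smoothly tapered to zero at the ends
    "gaussian": np.exp(-0.5 * ((np.arange(Nwin) - Nwin/2) / (Nwin/8))**2),
}

for name, w in windows.items():
    W = np.abs(np.fft.rfft(w, n=Npad))          # zero-padding gives a finely sampled picture
    plt.semilogy(W / W.max(), label=name)       # boxcar: big side lobes; hann: weaker; gaussian: essentially none

plt.xlim(0, 400)
plt.legend(); plt.xlabel("frequency bin"); plt.ylabel("|FT(window)| (normalized)")
plt.show()
```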
When we transform to spectral space, the effect of multiplying x(t) by the 3-hourly shah function in time is to convolve xhat(f) with ft(shah) in the frequency domain.
In this case, our spectrum is the thing that is compact around the origin (as geophysical spectra usually are: “red”), while the sampling-related spectrum is the thing that extends to infinity. So it is clearer to call our spectrum the “kernel” and fft(shah) the thing being ‘smoothed’ by that kernel. But remember, convolution commutes, so labeling one function the “kernel” in a convolution is just for our convenience and intuitive comfort.
Here’s the spectrum of the full data, plotted as amplitude (not power), and symmetric (not just the positive frequencies). Remember, it’s just the same information as the “power spectrum” from the notes above (figure repeated at right).
Let’s build the convolution now:
The red-black gap at bottom here is a vertical “error bar”. 3h sampling gives tiny error on that lowest frequency peak (2 cycles/7 days). For the diurnal peak (1 cpd), 3h sampling produces roughly 100% error bars (spurious, aliased power about equal to true power). For f > 1 cpd, the spurious power is several times the true power. That’s consistent with the results of the one particular grab-sample realization I showed in those first notes: the low frequency is about right, the diurnal is problematic, and the rest is mostly spurious.
“ERROR BARS” in frequency space arising from data shortcomings in time
Taking the Fourier transform of a finite segment of data x(t1…t2) (of length T) to get x_hat(f) tacitly assumes that the patterns in x(t1…t2) repeat periodically, forever in all t. The finiteness of T corresponds to a lower bound on the lowest frequency (1 cycle per record length T) where we can obtain a value of x_hat. It also discretizes the frequencies where we get an x_hat (1, 2, 3, ... cycles per T).
This discrete frequency step width of 1 cycle/T is, in spectrum-estimation terms, a form of horizontal “error bar” in frequency space. Spectral “leakage” is the continuous version of this discretization “error bar.”
In data-analysis terms, leakage corresponds to taking our finite-length segment x(t1…t2) and, instead of assuming it is periodic, padding it with zeroes on both ends to make an infinite (or at least much longer) sequence.
For example, say we pad the T-length data sequence with 5T worth of zeros on each end, giving a total padded record length of 11T. Analyzing this longer sequence (which, again, assumes this longer 11T sequence is periodically repeated to infinity) will give 11x finer spectral resolution (an xhat value will be available every 1 cycle/11T instead of every 1 cycle/T).
But does this mean we will be able to actually discriminate frequencies that are closer together than 1 cycle/T? No, because “leakage” or smearing will still be a horizontal (in frequency space) error bar. Information-wise, you can’t get something for nothing (where padding with zeros is the nothing).
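A sketch demonstrating that point: two sine waves only 0.5 cycle/T apart are smeared into one lump by a T-length record, and zero-padding to 11T samples that lump more finely without separating the two tones. The record length and frequencies are arbitrary examples:

```python
import numpy as np
import matplotlib.pyplot as plt

Nrec = 1024
t = np.arange(Nrec) / Nrec                           # one record length, T = 1
two_tones = np.sin(2*np.pi*20.0*t) + np.sin(2*np.pi*20.5*t)   # 20 and 20.5 cycles/T

A_raw = np.abs(np.fft.rfft(two_tones))               # values every 1 cycle/T
A_pad = np.abs(np.fft.rfft(two_tones, n=11*Nrec))    # zero-padded: values every 1/11 cycle/T

f_raw = np.fft.rfftfreq(Nrec, d=1.0/Nrec)            # frequencies in cycles per T
f_pad = np.fft.rfftfreq(11*Nrec, d=1.0/Nrec)

plt.plot(f_pad, A_pad, label="padded to 11T (finer grid, same smeared lump)")
plt.plot(f_raw, A_raw, "k^", label="raw T-length record")
plt.xlim(10, 30); plt.legend(); plt.xlabel("frequency (cycles per T)")
plt.show()
```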
When we have a discretely sampled sequence of data values at times separated by dt, this gives an upper bound on the frequencies we can resolve: (1 cycle per 2 dt), the Nyquist frequency.
There is folding (aliasing) of power across frequency, which is a vertical error bar on our estimates of the power spectrum, as some of the power in an undersampled spectrum is spurious (aliased).
A couple of you mentioned Doppler radar, where there is a “Nyquist velocity.” Doppler velocity estimation comes from a pulse-pair calculation: one pulse of radar energy (waves) is sent out with known phase (let’s call this phase “0 degrees”).
It reflects off a target and returns, and its phase is measured. If the distance to the target is precisely 1 km, and the wavelength is 10 cm, the returned phase will be 0 degrees (since the round-trip distance is 2 km: exactly 20,000 wavelengths to the target and back).
Now a second pulse is sent, say 1/100 s later. Suppose its phase comes in at 36 degrees (1/10 of 360 degrees).
That means the target is now 1 km + 0.5 cm away (giving a 20,000.1-wavelength travel distance for the pulse). So the round-trip path has lengthened by 36/360 = 1/10 of a wavelength = 1 cm, meaning the target has moved 0.5 cm in 1/100 s: a radial velocity of 0.5 m/s. But maybe the target is 1 km + 5.5 cm away (20,001.1 wavelengths of travel distance), implying a speed of 5.5 m/s? That would also give a 36-degree phase angle – we can’t tell.
So all the other radial velocities that might exist in nature (5.5, 10.5, 15.5, ... m/s) are folded or aliased back into the instrument’s resolved range, the Nyquist interval, set by the pulse repetition frequency. Sending pulses more often would solve the problem (but creates others).
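A little arithmetic sketch of the pulse-pair logic, using the numbers from this example (the Nyquist-velocity formula λ/(4·Δt) is the standard one, not something specific to these notes):

```python
wavelength = 0.10          # m  (10 cm)
t_between  = 0.01          # s  (pulses 1/100 s apart)
dphase_deg = 36.0          # measured phase change between the two pulses

# Phase change maps to target displacement through the ROUND-TRIP path:
# one wavelength of path change = half a wavelength of target motion.
dr = (dphase_deg / 360.0) * wavelength / 2.0      # 0.005 m = 0.5 cm
v  = dr / t_between                               # 0.5 m/s

# Unambiguous (Nyquist) velocity: a quarter wavelength of target motion per pulse interval
v_nyquist = wavelength / (4.0 * t_between)        # 2.5 m/s
aliases = [v + k * 2 * v_nyquist for k in range(1, 4)]   # 5.5, 10.5, 15.5 m/s also fit

print(v, v_nyquist, aliases)
```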