Exploiting Signal and Noise Statistics for Fixed Point FFT Design Optimization in OFDM Systems
Sthanunathan Ramakrishnan, Jaiganesh Balakrishnan and Karthik Ramasubramanian
email: {sthanu,jai,karthikr}

Abstract: Scaling the different stage outputs in an FFT appropriately is crucial for getting a good Signal to Quantization
Noise Ratio (SQNR) in fixed point FFT design. Traditional designs
have handled this either through the Convergent Block Floating
Point (CBFP) technique, which has memory, computation and
latency penalties, or through time-consuming simulations. In this
paper, we consider the special case of FFT design for OFDM
transceivers. We exploit the Gaussian nature of OFDM signals
to predict the bit-growth of the signal through the various
stages of the FFT and propose a technique to scale the signal
appropriately. Additionally, we investigate the quantization error
profile and propose a technique to improve SQNR by exploiting
the presence of null tones at the band edges. With the proposed
techniques, the performance comes close to the CBFP design, with
no increase in complexity compared to existing static designs.
Simulation results illustrating the performance improvements of
the proposed technique are presented.

Fast Fourier transforms (FFTs) are widely used in various fields for computing the Discrete Fourier Transform (DFT) of a signal. A lot of work has gone into designing efficient architectures such as radix-2, radix-$2^2$ [1], radix-$2^3$ [2] and the split-radix algorithm. Fixed point FFT designs have largely used the complex but optimal Convergent Block Floating Point (CBFP) method [3] or its variants [4]. This method essentially scales the output of every stage appropriately to maximally utilise the dynamic range and hence gets the best Signal to Quantization Noise Ratio (SQNR) for a given number of bits. The disadvantage of this method is its higher complexity and latency. Designers often use simulations to decide the scaling factors instead, at the expense of increased design time.

Orthogonal frequency division multiplexing (OFDM) is now used in a number of wireless systems such as WLAN, WiMAX, DVB, DAB and UWB, and has also been proposed for future cellular technologies such as LTE. In OFDM, the data is sent on orthogonal tones; an IFFT at the transmitter and an FFT at the receiver multiplex and demultiplex the data, respectively. Many FFT designs for OFDM systems have also used the CBFP technique, as reported in [3], [5].

In this paper we exploit the statistical properties of OFDM signals to derive a near-optimal scaling scheme. We also discuss the effects of truncation and rounding and show ways of increasing SQNR even with the low-complexity truncation operation. In section II we present a short introduction to OFDM. Section III discusses the traditional techniques for fixed point design and points out their advantages and disadvantages. In section IV, we derive the statistics of the OFDM signal as it passes through the different stages of the FFT and derive optimal and near-optimal scaling schemes. In section V, we obtain the nature of the quantization noise affecting the FFT outputs and exploit it to get higher SQNR. Finally, simulation results are presented in section VI to quantify the gains of the proposed technique.

Orthogonal frequency division multiplexing is a method in which data tones are modulated onto orthogonal subcarriers and transmitted through a channel. The time domain signal x(n) for one symbol is generated from the frequency domain tones X(k) using the IFFT relationship

$$x(n) = \sum_{k=0}^{N-1} X(k) \exp\left(\frac{j 2\pi k n}{N}\right), \quad n = 0, \ldots, N-1 \qquad (1)$$

At the receiver, the transmitted tones, multiplied by the channel coefficient at that frequency, are recovered using an FFT on the time domain signal:

$$X(k) = \sum_{n=0}^{N-1} x(n) \exp\left(-\frac{j 2\pi k n}{N}\right), \quad k = 0, \ldots, N-1 \qquad (2)$$
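As an illustration of the IFFT/FFT pair above, the following minimal sketch builds one symbol from a handful of tones and recovers them with a direct DFT. The 8-tone QPSK symbol, the ideal (distortion-free) channel and the function names `idft` and `dft` are assumptions made for this example only; neither transform is normalised here, so the round trip scales each tone by N.

```python
import cmath

def idft(tones):
    """Time samples x(n) from tones X(k) via a direct, unnormalised inverse DFT."""
    size = len(tones)
    return [sum(tones[k] * cmath.exp(2j * cmath.pi * k * n / size) for k in range(size))
            for n in range(size)]

def dft(samples):
    """Tones X(k) from time samples x(n) via a direct, unnormalised DFT."""
    size = len(samples)
    return [sum(samples[n] * cmath.exp(-2j * cmath.pi * k * n / size) for n in range(size))
            for k in range(size)]

# Hypothetical 8-tone symbol: QPSK data on six tones, two null tones.
tones = [0, 1 + 1j, -1 + 1j, 1 - 1j, 0, -1 - 1j, 1 + 1j, -1 + 1j]
samples = idft(tones)

# With no 1/N factor in either direction, the round trip scales every tone by N.
recovered = [value / len(tones) for value in dft(samples)]
max_error = max(abs(r - t) for r, t in zip(recovered, tones))
print(f"largest tone error after the round trip: {max_error:.2e}")
```

A direct O(N^2) transform is used only to keep the sketch self-contained; a practical design would use one of the FFT architectures discussed in this paper.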


For reasonably large N, the time domain samples x(n) are approximately Gaussian distributed due to the central limit theorem. This implies that the Peak-to-Average Ratio (PAR) of OFDM signals is high. OFDM has become hugely popular due to its inherent ability to combat multipath and to scale to higher data rates. Higher data rates typically imply higher-order constellations for a given channel bandwidth and hence a higher SNR requirement on the system. This makes the problem of fixed point FFT design more challenging, as a higher SQNR is needed to realise these systems while keeping the implementation complexity and latency low. Design cycle times also need to be kept low to ensure that systems can be productized quickly to keep pace with market dynamics.
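The Gaussian behaviour of the time domain samples, and the resulting high PAR, can be checked numerically. The sketch below is only illustrative: the choice of 64 tones, QPSK data on every tone, 50 symbols and a fixed random seed are all assumptions of this example, not parameters taken from the paper. It reports the kurtosis of the in-phase samples (a Gaussian gives 3) and the mean per-symbol PAR.

```python
import cmath
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

SIZE = 64      # assumed number of tones, all carrying QPSK data
SYMBOLS = 50   # assumed number of OFDM symbols to average over
TWIDDLES = [cmath.exp(2j * cmath.pi * m / SIZE) for m in range(SIZE)]

def ofdm_symbol():
    """One time-domain OFDM symbol built from random QPSK tones via a direct IDFT."""
    tones = [complex(random.choice((-1, 1)), random.choice((-1, 1))) for _ in range(SIZE)]
    return [sum(tones[k] * TWIDDLES[(k * n) % SIZE] for k in range(SIZE))
            for n in range(SIZE)]

real_parts = []
par_db = []
for _ in range(SYMBOLS):
    symbol = ofdm_symbol()
    powers = [abs(v) ** 2 for v in symbol]
    par_db.append(10 * math.log10(max(powers) / (sum(powers) / SIZE)))
    real_parts.extend(v.real for v in symbol)

mean = sum(real_parts) / len(real_parts)
variance = sum((v - mean) ** 2 for v in real_parts) / len(real_parts)
kurtosis = sum((v - mean) ** 4 for v in real_parts) / len(real_parts) / variance ** 2
mean_par_db = sum(par_db) / len(par_db)

print(f"kurtosis of I samples: {kurtosis:.2f} (a Gaussian gives 3)")
print(f"mean PAR over {SYMBOLS} symbols: {mean_par_db:.1f} dB")
```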
Many architectures such as radix-2, radix-4, radix-$2^2$, radix-$2^3$ or a mix thereof can be used for FFT design in OFDM systems. An example signal-flow graph for an 8 point FFT with a radix-2 implementation is shown in figure 1. Pipelined architectures are preferred for their low latency, high processing element utilization and low memory requirements. A radix-r pipelined FFT stage using single-path delay feedback, operating on an Nth order FFT, needs N/r memory elements, r adders and one multiplier, and generates r independent streams of length N/r, which are then passed on to the next stage (an N/r FFT) for further processing [1].

Fig. 1. Signal flow graph of a radix-2 8 point FFT

In fixed point design, the usual technique to optimally scale the outputs at different stages is the Convergent Block Floating Point (CBFP) method [3]. In this method each of the N/r output streams is assigned a scale factor so as to optimally utilise the bit-widths before passing on to the next stage. At the end of all the stages, each FFT output has an associated scale factor, like a floating point representation. To get back to fixed point, the common exponent is extracted and all the outputs are quantized to the desired bit-widths. The advantage of this method is that it gives the best SQNR for the bit-widths allocated. But this method has a penalty in terms of memory and latency. Each stage needs N/r memory elements in full precision to store the outputs of the stage before the scale factor for that block can be computed and the outputs scaled. For example, in a 128 point radix-4 FFT design, this would introduce an extra overhead of 32 + 8 + 2 = 42 memory elements. This reduces the effectiveness of a pipelined design and increases the latency through the FFT. There have been some variants where this memory requirement is traded off for more complicated butterfly stages that equalise the scale factors on the input data on the fly [4]. Due to the increased complexity of CBFP, traditional fixed point designs have relied on simulations to fix the scaling requirement at each stage.

To design a static scaling scheme, we need to know how the signal behaves as it passes through the multiple stages of the FFT. In a Nyquist-sampled OFDM system, the time domain samples at the receiver are independent and Gaussian distributed, as shown in section II. Now let the signal at the output of butterfly stage m be $S_m(n)$, with $S_0(n) = x(n)$, and let $S_{final}(n)$ be the FFT output. Any FFT architecture using a radix-r structure takes in r inputs and gives out N/r sets of r data values that are then processed independently. In each butterfly itself, the operation is performed on r inputs to give r outputs. The output of stage m can be expressed in terms of the input $S_{m-1}(n)$, a transformation matrix $T$, which performs some operation on the inputs, and a twiddle factor matrix $T_{tw}$, a diagonal matrix containing the appropriate twiddle factors for the input set, as $S_m = T_{tw} T S_{m-1}$. Note that since $T_{tw}$ is a diagonal matrix containing complex exponentials, it is unitary. The matrix $T$ depends on the radix r used to implement the FFT. For example, for a 4 point FFT implemented using radix 2, the matrix is given by

$$T = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix}$$

For a radix-$2^2$ FFT, each stage has two butterflies; let the corresponding matrices be $T_1$ and $T_2$. The output is given by $S_m = T_{tw} T_2 T_1 S_{m-1}$. It can be seen that

$$T_1 = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix}, \quad T_2 = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -j \\ 0 & 0 & 1 & j \end{bmatrix}$$
These transformations are orthogonal (up to a scale factor), and given that the input to the FFT is zero mean i.i.d. Gaussian, the output of every stage is also zero mean i.i.d. Gaussian. To show this, it is enough to show that if the input has a correlation matrix of the form $\sigma^2 I$, then the output also has a similar form. Assume that the input $S_{m-1}$ is i.i.d. with variance $\sigma_{m-1}^2$. Then the correlation matrix of the output is given by

$$E[S_m S_m^H] = T_{tw} T \, E[S_{m-1} S_{m-1}^H] \, T^H T_{tw}^H = \sigma_{m-1}^2 \, T_{tw} T T^H T_{tw}^H = r \sigma_{m-1}^2 I_r \qquad (3)$$

where r is the radix of the FFT implementation and $I_r$ is the identity matrix of order r. So the rms value of the output of a stage increases by $\sqrt{r}$. A radix-$2^2$ stage can be considered as a cascade of two stages where, at each stage, the rms value of the signal increases by $\sqrt{2}$. The analysis shown here for one butterfly stage can be simply extended to all the outputs of a particular stage. In other words, given that the output of one stage of the FFT is i.i.d. Gaussian, the output of the next stage is also i.i.d. Gaussian, with the variance increasing by a factor of r. Note that the twiddle factor multiplication does not affect the scaling as it is a unitary transformation.
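The variance growth argument can be checked directly on small butterfly matrices. The sketch below assumes the 4 point radix-2 butterfly matrix and a second radix-$2^2$ butterfly with -j and +j in its last two rows; those signs are an assumption chosen so that the cascade equals a 4 point DFT with bit-reversed outputs. The helper names are hypothetical. It verifies that $T T^H$ is a scaled identity, with scale 2 for one radix-2 butterfly and 4 for the full radix-$2^2$ stage.

```python
import cmath

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def hermitian(a):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[complex(a[i][j]).conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]

def is_scaled_identity(m, scale, tol=1e-12):
    """True when m equals scale times the identity matrix."""
    return all(abs(m[i][j] - (scale if i == j else 0)) < tol
               for i in range(len(m)) for j in range(len(m)))

# Radix-2 butterfly on 4 points, and the second radix-2^2 butterfly (assumed signs).
T1 = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, -1, 0], [0, 1, 0, -1]]
T2 = [[1, 1, 0, 0], [1, -1, 0, 0], [0, 0, 1, -1j], [0, 0, 1, 1j]]

gram_radix2 = matmul(T1, hermitian(T1))        # 2 * I: variance doubles
stage = matmul(T2, T1)
gram_radix4 = matmul(stage, hermitian(stage))  # 4 * I: variance grows by 4

# The cascade should equal the 4-point DFT with outputs in bit-reversed order.
dft4 = [[cmath.exp(-2j * cmath.pi * k * n / 4) for n in range(4)] for k in (0, 2, 1, 3)]
matches_dft = all(abs(stage[i][j] - dft4[i][j]) < 1e-12
                  for i in range(4) for j in range(4))

print(is_scaled_identity(gram_radix2, 2), is_scaled_identity(gram_radix4, 4), matches_dft)
```

Because the Gram matrix is a scaled identity, an i.i.d. input with variance $\sigma^2$ stays uncorrelated at the output, with variance 2 or 4 times larger respectively.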
In appendix VIII, it is shown that for any given number of bits, there is a certain rms value that gives the best SQNR. The exact rms value at which the SQNR is maximised depends on the Peak-to-Average Ratio (PAR) of the signal. The PAR of an OFDM signal in turn typically depends on the number of tones used to generate the OFDM signal. At the input of the FFT, the best SQNR for the given number of bits is typically assured by the use of an Automatic Gain Control (AGC) block. To retain the same rms value at the different stages of the FFT, all we need to do is to scale the signal down by $\sqrt{r}$ after each stage. Note that this scaling need not be precise, as the SQNR degrades gradually from the peak. For a radix-4 architecture, the optimal scaling factor is 2 and hence is trivial to implement. By choosing the same number of bits at every stage and using the above scaling before quantizing, the SQNR contribution from each stage is made equal. This is essentially due to the fact that the above scaling factor makes the operation in each stage a unitary transformation. When each stage gives the same SQNR, the implementation complexity is minimized for a given total SQNR requirement (see footnote 1).

Footnote 1: The output of the last few stages of the FFT may not be Gaussian, as the FFT outputs would be close to the constellation points. This means that the PAR of those outputs would be lower. So the SQNR vs. signal rms curve would increase further beyond 0.2 and would fall at a higher rms value (as against the SQNR of a Gaussian signal shown in figure 8). So the proposed method does not get the best SQNR for the last stages, but it does not degrade the performance. In practice this is insignificant, as our simulation results indicate.

For a radix-2 stage or the individual butterflies of a radix-$2^2$ stage, a lower complexity method, referred to as the Sub-OPtimally Static scaled FFT (SOPS FFT), can be employed, where the input is scaled down by 2 once every two stages. Note that this method is static and hence does away with large simulation times and has no additional complexity. In section VI, it is shown that the performance of the SOPS FFT is close to that of CBFP.

Many OFDM systems are implemented with 2x or 4x oversampling to relax the requirements on the analog filters. In this case, the independence assumption in section IV is not true. However, adjacent input samples affect the FFT output only in the last stage. Hence for a 2x FFT the independence assumption is true till the last but one stage, and for a 4x FFT, till the last but two stages. So for a 2x FFT, the scaling by 2 should be done once every two stages until the last but one stage, and a scaling by 2 should be done for the last stage.

Quantization noise gets added at different points in the FFT. For simplicity, we assume that the noise gets added at the end of every stage. We will illustrate the case of a radix-2 FFT, but the analysis can be extended to other cases as well with some modifications. In a radix-2 FFT, as shown in figure 2, the quantization noise at the output of every stage passes through a lower order FFT before affecting the output. In the example shown, the quantization noise at the input affects the output through an 8-point FFT. The quantization noise at the second stage passes through a 4-point FFT before affecting the output. The quantization noise in subsequent stages goes through lower order FFTs, and finally the quantization noise at the output affects the signal directly.

Fig. 2. Impact of Quantization noise on FFT output

Let $w_i$ denote the vector of quantization noise affecting the ith stage output (i = 0 corresponds to the quantization noise at the input) and let $W_m^i$ denote an mth order FFT of the noise affecting the ith stage. Let Q(k) denote the quantization noise affecting the kth tone at the FFT output; then

$$Q(k) = \sum_{i=0}^{\log_2 N} W^i_{N/2^i}\left(\left\lfloor k/2^i \right\rfloor\right) \qquad (4)$$



The equation represents the fact that the output is computed using the Decimation-in-Frequency method. The quantization noise at the DC tone in the FFT output is the sum of the DC values of the quantization noises added in the different stages. The quantization noise at the first tone (k = 1) in the FFT output is the sum of the first tone of the input stage (i = 0) quantization noise and the DC tones of the quantization noise of the other stages (i > 0), and so on. To obtain the quantization noise statistics, let us assume that the quantization noise at any stage and index is independent of the quantization noise at any other index or stage. For simplicity we assume that the noise process is also identically distributed, although the results can easily be extended to the cases where the quantization noise sources at different stages have different means and variances. Let the mean and variance of the individual quantization noise samples in $w_i$ for any stage i be $\mu$ and $\sigma^2$ respectively. Then, for an M-point FFT $W_M$ of such a noise vector,

$$E[W_M(k)] = M \mu \, \delta(k) \qquad (5)$$

where $\delta(k)$ is 1 for k = 0 and 0 otherwise. The above equation just reflects the fact that the mean of the FFT output is the same as the FFT of the mean of the input, which is just a non-zero value at DC and zero elsewhere.

$$E\left[\, \left| W_M(k) - E[W_M(k)] \right|^2 \,\right] = M \sigma^2 \qquad (6)$$

The above equation shows that the variance of the FFT output is identical for all tones due to the i.i.d. assumption and the fact that the FFT is an orthogonal transformation. M represents the processing gain of the FFT. The quantization noise power is given by

$$E\left[ |Q(k)|^2 \right] = \sum_{i=0}^{\log_2 N} \frac{N}{2^i} \sigma^2 + \left( \sum_{i=0}^{\log_2 N} \frac{N}{2^i} \, \mu \, \delta\left( \left\lfloor k/2^i \right\rfloor \right) \right)^2 \qquad (7)$$

The equation shows that the variance of the quantization noise is the same for all the output tones, but the mean is different for different tones, thereby causing a non-uniform noise power profile. Now let us consider the two most common quantization schemes: truncation and rounding. Both truncation and rounding have the same variance but their mean values are different. Suppose b fractional bits are either truncated or rounded to the nearest integer; it can be shown that the mean error (in magnitude) with truncation is $0.5 - 0.5 \cdot 2^{-b}$ and with rounding is $0.5 \cdot 2^{-b}$.
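The mean and variance of the two quantizers can be confirmed by enumerating every possible fractional bit pattern. The sketch below is an illustration under stated assumptions: the fractional part is taken to be uniformly distributed over its $2^b$ patterns, rounding is round-half-up, the error is defined as the quantized value minus the exact value (so the truncation mean comes out negative), and the function name `error_stats` is hypothetical.

```python
from fractions import Fraction
import math

def error_stats(frac_bits, mode):
    """Exact mean and variance of (quantized - exact) over all fractional patterns.

    mode is "trunc" (floor, as in two's complement truncation) or "round"
    (round half up). Only the fractional part matters, so x covers [0, 1).
    """
    levels = 2 ** frac_bits
    errors = []
    for j in range(levels):
        x = Fraction(j, levels)
        if mode == "trunc":
            quantized = math.floor(x)
        else:
            quantized = math.floor(x + Fraction(1, 2))
        errors.append(quantized - x)
    mean = sum(errors) / levels
    variance = sum((e - mean) ** 2 for e in errors) / levels
    return mean, variance

for b in (1, 2, 4):
    trunc_mean, trunc_var = error_stats(b, "trunc")
    round_mean, round_var = error_stats(b, "round")
    print(b, trunc_mean, round_mean, trunc_var, round_var)
```

For b = 2 this gives a truncation mean of -3/8 and a rounding mean of 1/8, matching $0.5 - 0.5 \cdot 2^{-b}$ and $0.5 \cdot 2^{-b}$ in magnitude, with an identical variance of 5/64 for both schemes.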
The quantization noise power profile at the FFT output for a 16 point FFT is plotted in figure 3 for both truncation and rounding, based on equation 7, assuming 2 fractional bits at every stage before rounding/truncating to an integer. The plot shows that the quantization noise is high near the DC tones while it is significantly smaller near the edge tones. The trend is the same for rounding as well, but the difference between the edge tones and the DC tone is smaller due to the lower mean of the rounding noise. This analysis disregards quantization within a stage as well as quantization of the twiddle factors. To get a more realistic view of the quantization noise impact at the receiver, we took a 128 point FFT design as in figure 5 and plotted the quantization noise at the output with both truncation and rounding. The plot in figure 4 shows a high quantization noise near the DC tone, as expected. But the quantization noise increases again near the highest tones, giving a near U-shape to the spectrum. This is due to twiddle factor quantization and other factors that we have disregarded in our simple analysis.

Fig. 3. Theoretical plot of Quantization noise power over tones at the output of a FFT (panel title: "Effect of quantization noise on 16 pt FFT"; axes: Noise Pwr vs. Tone index)

Fig. 4. Quantization noise over tones at the output of a FFT - Simulation (panel title: "Plot of Mean absolute value of error vs tone index"; axes: Mean absolute value of error vs. Tone Index)

Fig. 5. Legacy 128 point FFT (block diagram labels: one radix-$2^2$ stage built from butterflies BF 1 and BF 2, stage number i; formats <13,0,tc> at Stage 1, <13,2,tc> at Stage 3, <13,4,tc> at Stage 5, then <13,6,tc>; a final radix-2 stage; 14 bits)
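The theoretical profile can be tabulated with a few lines of code. The sketch below assumes that the noise power at tone k is the sum of the per-stage variances, $(N/2^i)\sigma^2$ for stages i = 0 to $\log_2 N$, plus the square of the summed per-stage means that reach that tone, where stage i contributes $(N/2^i)\mu$ only when $\lfloor k/2^i \rfloor = 0$. This reading of equation 7, the function name and the parameter names are assumptions of this example. The values of $\mu$ and $\sigma^2$ are those of 2 fractional bits.

```python
def noise_power_profile(size, mean, variance):
    """Noise power per output tone of a radix-2 DIF FFT under the stated assumptions."""
    stages = size.bit_length() - 1              # log2(size) for a power of two
    gains = [size // 2 ** i for i in range(stages + 1)]
    total_variance = sum(gains) * variance      # identical for every tone
    profile = []
    for k in range(size):
        # Stage i reaches tone k through the DC bin only when floor(k / 2^i) == 0.
        tone_mean = sum(gain * mean for i, gain in enumerate(gains) if k // 2 ** i == 0)
        profile.append(total_variance + tone_mean ** 2)
    return profile

SIZE = 16
VARIANCE = 5 / 64                               # 2 fractional bits, either scheme
trunc = noise_power_profile(SIZE, 0.375, VARIANCE)   # |mean| = 0.5 - 0.5 * 2**-2
rounded = noise_power_profile(SIZE, 0.125, VARIANCE)  # |mean| = 0.5 * 2**-2

for k in range(SIZE):
    print(k, round(trunc[k], 4), round(rounded[k], 4))
```

Under these assumptions the DC tone carries the largest noise power and tones N/2 and above carry the smallest, and the DC-to-edge ratio is far larger for truncation than for rounding.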
A. Rotated FFT
In almost all OFDM systems, there are a number of zero tones at the band edges to prevent spectral leakage into adjacent bands. For example, WLAN has 12 guard tones, 6 on either side of the spectrum. Oversampling at the receiver also introduces null tones at high frequencies. At the receiver, these zero tones fall in the low quantization noise region of the U-shaped curve, while the desired tones fall in the high quantization noise region. This causes a degradation in the SQNR of the desired tones. So for systems with oversampling or with guard tones, an easy way to increase SQNR would be to cyclically shift the desired tones to the highest SQNR region. This is easily accomplished by multiplying the inputs by $(-1)^n$, so that the FFT outputs are cyclically shifted by N/2 tones. We call this method the Rotated FFT (R-FFT). It can be used to improve the performance of any FFT design that uses truncation, achieving performance gains without the additional complexity of rounding (see footnote 2). In section VI, we demonstrate the gains due to this method when applied in tandem with the earlier proposed methods.
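The cyclic shift produced by the $(-1)^n$ multiplication is a standard DFT property and can be verified on a small example. In the sketch below the 8 sample test vector and the direct `dft` helper are arbitrary choices made for illustration only.

```python
import cmath

def dft(samples):
    """Direct DFT, adequate for a small illustrative size."""
    size = len(samples)
    return [sum(samples[n] * cmath.exp(-2j * cmath.pi * k * n / size) for n in range(size))
            for k in range(size)]

SIZE = 8
samples = [complex(n % 3, -n) for n in range(SIZE)]   # arbitrary test input

plain = dft(samples)
rotated = dft([value * (-1) ** n for n, value in enumerate(samples)])

# Tone k of the rotated FFT holds tone (k + N/2) mod N of the plain FFT.
shift = SIZE // 2
max_error = max(abs(rotated[k] - plain[(k + shift) % SIZE]) for k in range(SIZE))
print(f"largest mismatch after undoing the N/2 shift: {max_error:.2e}")
```

Since the shift is by exactly N/2, the DC tone of the plain FFT appears at index N/2 of the rotated FFT and vice versa, which is what moves the desired tones away from the noisy DC region.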
The simulations were performed for an 802.11a/g system, oversampled by a factor of 2. There are a total of 128 tones, with 52 data tones, 12 guard tones which are zero and 64 zero tones due to oversampling. The modulation mode used is 54 Mbps, where the transmitted symbols are chosen from a 64-QAM constellation. A legacy 128 point FFT design, based on a radix-$2^2$ architecture for all stages but the last, is chosen as the starting point for evaluating our proposals. The input for each stage is 13 bits and the final FFT output is 14 bits. The legacy FFT, shown in figure 5, increases the number of integer bits by 2 for every radix-$2^2$ stage, which is the worst case bit growth possible (see footnote 3). The SOPS technique is now applied to this design by modifying the scale factor (or equivalently the number of integer bits) at the different stages, while retaining the same complexity. The block diagram of the new design is shown in figure 6. The Rotated-SOPS (R-SOPS) FFT is obtained by modifying the SOPS FFT to include the trivial multiplication of the inputs by $(-1)^n$. The CBFP FFT uses 13 output bits at every stage and uses the CBFP algorithm to obtain the scaling factors for each output. The final FFT outputs are again converted to fixed point by extracting a common scale factor for all outputs. The comparison is based on the cdf of the SQNR, obtained from 1000 simulation runs.
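The difference between the legacy and SOPS scaling can be made concrete by listing the integer bits at the input of every second radix-2 stage. The sketch below is illustrative: the function names are hypothetical, and it assumes a Gaussian signal whose rms grows by $\sqrt{2}$ per radix-2 stage, with each extra integer bit doubling the full-scale range.

```python
def integer_bits_schedule(num_radix2_stages, bits_per_pair):
    """Integer bits at the input of radix-2 stages 1, 3, 5, ... (1-based numbering)."""
    return [bits_per_pair * pair for pair in range((num_radix2_stages + 1) // 2)]

def relative_rms(schedule):
    """RMS relative to full scale at each listed stage input, normalised to stage 1.

    The rms grows by 2 per pair of radix-2 stages, while every additional
    integer bit doubles the full-scale range.
    """
    return [2.0 ** pair / 2.0 ** int_bits for pair, int_bits in enumerate(schedule)]

STAGES = 7  # 128 point FFT: three radix-2^2 stages plus one radix-2 stage
legacy = integer_bits_schedule(STAGES, 2)   # worst case growth: 2 bits per pair
sops = integer_bits_schedule(STAGES, 1)     # SOPS: scale by 2 once every two stages

print("legacy integer bits:", legacy, "relative rms:", relative_rms(legacy))
print("SOPS integer bits:  ", sops, "relative rms:", relative_rms(sops))
```

Under these assumptions the legacy schedule lets the rms fall to one eighth of its initial level relative to full scale by the last stage input, which wastes about three bits of precision, while the SOPS schedule keeps it constant.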
Figure 7 compares the performance of fixed-point-input floating-point-output, CBFP (rounded), SOPS (truncated), SOPS (rounded), R-SOPS (truncated) and legacy FFT designs. Rounding was employed in CBFP to compare the proposed designs against the best possible design. The figure shows a significant performance gain of about 16 dB with the R-SOPS FFT compared to the legacy design. Note that the rotated SOPS (R-SOPS) design is 1.5 dB better than the rounded SOPS design, although only truncation is employed in the R-SOPS design. This can be explained using figure 4, where the error profile for the rounding case near the DC tones is higher than the error profile for the truncation case at the edge tones. The rotated SOPS design also performs within about 0.5 dB of the CBFP design. It must be borne in mind that rotation gives such large gains because the signal is oversampled by a factor of 2. For Nyquist sampled systems, the gains due to rotation would be smaller. The SQNR curves were similar for all modulation modes and in the presence of multipath.

Footnote 2: Rounding requires an extra adder. Multiplication by $(-1)^n$ can be absorbed into the FFT's first stage and will only cause a control overhead. Even if a separate adder is needed for multiplying the input by -1, it still needs to be done only at the input and not at every stage.

Footnote 3: The format for the numbers is as follows: <13,1,tc> means a two's complement number with 13 total bits and 1 integer bit.

Fig. 6. 128 point FFT design using sub optimal scaling method (block diagram labels: one radix-$2^2$ stage built from butterflies BF 1 and BF 2, stage number i; formats <13,0,tc> at Stage 1, <13,1,tc> at Stage 3, <13,2,tc> at Stage 5, then <13,3,tc>; a final radix-2 stage; 14 bits)

Fig. 7. CDF of SQNRs for different FFT designs (panel title: "CDF of SQNR for different ffts"; x-axis: SQNR in dB; legend: legacy fft, fixed in float out, SOPS trunc, SOPS round)

Fig. 8. SQNR vs. signal RMS level (panel title: "SQNR as a function of signal RMS level and bitwidth"; surviving curve labels: 8 bits, 16 bits)
In this paper we have presented a technique to design
a fixed point FFT that does not require highly complex
block floating point implementation. We have investigated the
quantization error profile at the output of FFT and based on
this have proposed a technique to improve the SQNR. A legacy
128 point FFT design has been modified according to these
principles and shown to achieve near-CBFP performance.

For a reasonably large number of subcarriers, the OFDM signal has a probability density function (pdf) that is approximately complex Gaussian. Let us assume that for both I and Q, the full-scale fixed-point range is [-1, 1) and that I and Q are each quantized to an N-bit signed two's complement representation. In the following, we derive the expression for the SQNR that results from quantization and clipping, as a function of the RMS signal level s and the number of bits N. The mean-squared error due to loss of precision at the LSB is given by $\sigma_q^2 = \Delta^2/12 = 2^{-2N}/3$, where $\Delta = 2^{-N+1}$ is the bin-width of each of the $2^N$ possible quantized levels. Using the Gaussian assumption for the pdf of the signal, the mean-squared error due to clipping at +/-1 is given as

$$\sigma_c^2 = \frac{2}{\sqrt{2\pi s^2}} \int_{v=1}^{\infty} (v-1)^2 \, e^{-v^2/(2 s^2)} \, dv \qquad (8)$$

The above integral can be simplified using the complementary error function, and the composite SQNR can be written as

$$\mathrm{SQNR} = \frac{s^2}{\sigma_q^2 + \sigma_c^2} = \frac{s^2}{\dfrac{2^{-2N}}{3} + (s^2+1)\,\mathrm{erfc}\left(\dfrac{1}{\sqrt{2}\,s}\right) - s\sqrt{\dfrac{2}{\pi}}\; e^{-1/(2 s^2)}} \qquad (9)$$
Figure 8 shows the plot of SQNR as a function of s for different bit-widths. From the figure, it can be observed that for a given bit-width, as the signal RMS level increases, the SQNR improves up to a certain point, but beyond that it starts decreasing steeply. Therefore, the signal RMS level has to be chosen appropriately to get the best SQNR.
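The shape of the SQNR curve can be reproduced numerically. The sketch below assumes the closed form obtained by evaluating the clipping integral for symmetric clipping at +/-1 of a zero mean Gaussian with rms s, namely $\sigma_c^2 = (s^2+1)\,\mathrm{erfc}(1/(\sqrt{2}s)) - s\sqrt{2/\pi}\,e^{-1/(2s^2)}$, together with $\sigma_q^2 = 2^{-2N}/3$. The function names and the 0.005 search grid are choices made for this example only.

```python
import math

def sqnr_db(rms, bits):
    """SQNR in dB for a Gaussian signal of the given rms, full scale [-1, 1)."""
    quant = 2.0 ** (-2 * bits) / 3.0
    clip = ((rms ** 2 + 1.0) * math.erfc(1.0 / (math.sqrt(2.0) * rms))
            - rms * math.sqrt(2.0 / math.pi) * math.exp(-1.0 / (2.0 * rms ** 2)))
    clip = max(clip, 0.0)   # guard against round-off when clipping is negligible
    return 10.0 * math.log10(rms ** 2 / (quant + clip))

def best_rms(bits, steps=200):
    """Grid search over rms = 0.005 ... 1.0 for the level that maximises the SQNR."""
    candidates = [0.005 * i for i in range(1, steps + 1)]
    return max(candidates, key=lambda rms: sqnr_db(rms, bits))

for bits in (8, 13, 16):
    rms = best_rms(bits)
    print(f"{bits} bits: best rms ~ {rms:.3f}, SQNR ~ {sqnr_db(rms, bits):.1f} dB")
```

Below the optimum the SQNR rises by about 6 dB per doubling of the rms because only the fixed quantization noise matters; above it the clipping term grows rapidly and the SQNR falls steeply.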
[1] S. He and M. Torkelson, "A new approach to pipeline FFT processor," in Proc. IEEE IPPS, 1996.
[2] S. He and M. Torkelson, "Designing pipeline FFT processor for OFDM (de)modulation," in Proc. IEEE International Symposium on Signals and Systems, 1998.
[3] E. Bidet, C. Joanblanq, and P. Senn, "A fast single chip implementation of 8192 complex points FFT," in Proc. IEEE Custom Integrated Circuits Conference, 1994.
[4] T. Lenart and V. Owall, "A 2048 complex point FFT processor using a novel data scaling approach," in Proc. IEEE International Symposium on Circuits and Systems, 2003.
[5] S. H. Park, D. H. Kim, and D. S. Han, "Sequential design of a complex 8192 point FFT in OFDM receiver," in IEEE AP-ASIC, 1999.