Theoretical Review of FFT Implementations for Digital Signal Processors
Abstract
The discrete Fourier transform (DFT) is one of the most pivotal tools in digital signal processing, and the
Fast Fourier Transform (FFT) is a powerful algorithmic optimization of the DFT. The world is fast moving
from analog to digital, and the FFT is a key enabler of that shift. Though the outputs of the DFT and the FFT are
the same, the FFT is organized to eliminate redundant calculations. Several algorithms
have been developed to improve the computation time of the FFT; the overall aim remains the same, i.e. to
reduce the number of complex arithmetic operations. This paper aims to throw light on different implementations
through which the efficiency of the FFT can be improved in order to design more powerful signal processors.
1. Introduction

1.1. Fourier Series
A Fourier series is a representation of a periodic function as a sum of sines and cosines.
Solving for the coefficients gives,
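For reference, one standard form of the series and its coefficients is sketched below. It assumes f has period 2π and is integrable over [-π, π]; the interval and the symbols a_n and b_n are assumptions made for illustration, and other periods simply rescale the arguments.

```latex
f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \bigl[ a_n \cos(nx) + b_n \sin(nx) \bigr]

a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x)\,\cos(nx)\,dx, \qquad n = 0, 1, 2, \dots

b_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x)\,\sin(nx)\,dx, \qquad n = 1, 2, 3, \dots
```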
1.2. Fourier Series Transform
Decomposing a signal into its individual frequency components is known as a Fourier transform.
Applications include audio processing, in which individual sounds in a recording are picked out using this
transform.
1.3. Discrete Fourier Transform
Given a sequence of N samples f(n), indexed by n = 0, ..., N-1, the Discrete Fourier transform (DFT) is defined
as F(k), where k = 0, ..., N-1:
F(k) are often called the 'Fourier Coefficients' or 'Harmonics'.
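A direct evaluation of the definition can be sketched as follows. The sketch assumes the common unnormalized convention with a negative exponent, F(k) = sum over n of f(n) * exp(-j*2*pi*n*k/N); other texts place a 1/N factor on the forward transform, so the scaling here is an assumption. The function name `dft` is illustrative.

```python
import cmath


def dft(f):
    """Direct O(N^2) DFT of the sequence f.

    Assumed convention: F(k) = sum_n f(n) * exp(-2j*pi*n*k/N), no 1/N factor.
    """
    N = len(f)
    return [
        sum(f[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
        for k in range(N)
    ]


if __name__ == "__main__":
    # One cycle of a cosine sampled four times: energy lands in bins 1 and 3.
    samples = [1.0, 0.0, -1.0, 0.0]
    for k, Fk in enumerate(dft(samples)):
        print(k, round(Fk.real, 6), round(Fk.imag, 6))
```

Each of the N outputs needs N complex multiply-adds, which is the N^2 cost that FFT algorithms avoid.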
1.4. Fast Fourier Transform
An FFT computes the DFT and produces exactly the same result as evaluating the DFT definition directly; the
most important difference is that an FFT is much faster. The DFT is defined by the formula:
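In one common convention (assumed here: unnormalized forward transform with a negative exponent), the definition and the resulting multiplication counts can be written as below. The radix-2 count includes trivial multiplications by 1 and applies when N is a power of two.

```latex
F(k) = \sum_{n=0}^{N-1} f(n)\, e^{-j 2\pi n k / N}, \qquad k = 0, \dots, N-1

\text{Direct evaluation: } N^2 \text{ complex multiplications}
\qquad
\text{Radix-2 FFT: } \frac{N}{2} \log_2 N \text{ complex multiplications}

N = 1024: \quad N^2 = 1\,048\,576, \qquad \frac{N}{2} \log_2 N = 512 \times 10 = 5\,120
```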
2. Problem Analysis
2.1. Complexity Bounds
The lower bounds on the complexity of FFTs, along with their exact operation counts, remain open problems in signal processing. Although today's computers have robust caching mechanisms and optimized process queuing, the arithmetic operation count of an FFT is still pivotal. It is not yet firmly established whether the DFT truly requires Ω(N log N) operations, i.e. order N log N or more. Analysis of complexity bounds has so far focused on the ordinary complex-data case because it is the simplest, but complex-data FFTs are closely related to real-data FFTs, so results for one bear directly on the other.
2.2. Approximation & Accuracy
The trade-off between approximation error and speed is another problem area associated with FFT algorithms. It is illustrated by Guo and Burrus' wavelet-based approximate FFT, which handles sparse inputs and outputs more efficiently than an exact FFT can. If the data are sparse, that is, if only K of the N Fourier coefficients are nonzero, the complexity can be reduced to O(K log(N) log(N/K)). Another computational issue linked to FFT algorithms is accuracy. In fixed-point arithmetic, the finite-precision errors accumulated by FFT algorithms are critical, and implementations rescale at each intermediate stage of a decomposition such as Cooley-Tukey.
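The per-stage rescaling mentioned above can be illustrated with a minimal sketch of a radix-2 decimation-in-time FFT on integers. Everything specific here is an assumption for illustration: the Q15 twiddle format, the one-bit right shift after every butterfly, and the function names. Python integers do not overflow, so the shift only models what a fixed-width DSP would need; the output approximates DFT(x)/N.

```python
import cmath
import math

Q = 15  # fractional bits used for twiddle factors (illustrative Q15 choice)


def bit_reverse(values):
    """Return a copy of values in bit-reversed index order (len is 2^m, m >= 1)."""
    n = len(values)
    bits = n.bit_length() - 1
    out = [0] * n
    for i, v in enumerate(values):
        out[int(format(i, "0{}b".format(bits))[::-1], 2)] = v
    return out


def fixed_point_fft(re, im):
    """Integer radix-2 DIT FFT that halves the data after every stage.

    Returns integer lists (re, im) approximating DFT(x) / N.
    """
    n = len(re)
    re, im = bit_reverse(re), bit_reverse(im)
    size = 2
    while size <= n:
        half = size // 2
        for start in range(0, n, size):
            for j in range(half):
                angle = -2.0 * math.pi * j / size
                wr = int(round(math.cos(angle) * (1 << Q)))
                wi = int(round(math.sin(angle) * (1 << Q)))
                a, b = start + j, start + j + half
                # Twiddle multiply, then drop the Q fractional bits.
                tr = (re[b] * wr - im[b] * wi) >> Q
                ti = (re[b] * wi + im[b] * wr) >> Q
                # Butterfly followed by a 1-bit right shift (divide by 2).
                re[a], re[b] = (re[a] + tr) >> 1, (re[a] - tr) >> 1
                im[a], im[b] = (im[a] + ti) >> 1, (im[a] - ti) >> 1
        size *= 2
    return re, im


if __name__ == "__main__":
    n = 8
    x = [int(round(10000 * math.cos(2 * math.pi * i / n))) for i in range(n)]
    fr, fi = fixed_point_fft(x, [0] * n)
    for k in range(n):
        ref = sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n)) / n
        print(k, fr[k], fi[k], round(ref.real, 2), round(ref.imag, 2))
```

Each shift discards one low-order bit, so the small differences between the integer result and the floating-point reference grow with the number of stages.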
3. Design Requirement, Specifications & Proposed Solutions
There are multiple ways to decompose an FFT, of which Radix-2 is the simplest. It has been shown,
though, that the Radix-4 FFT has a fair advantage in encrypted-domain implementations. In fact,
for large transforms Radix-4 is about 20% more efficient than Radix-2. Radix-2 and Radix-4 remain the
most common FFTs. Radix-8 is rarely used because its added complexity, notably in hardware implementations,
yields only a slight gain in overall efficiency. Some illustrations for Radix-2 & Radix-4 FFTs:
3.1. Common-Factor FFTs
Also called Cooley-Tukey FFTs, common-factor FFTs are the most common class of FFTs. The factors of N used in the decomposition share common factor(s). Radix-r and mixed-radix are the two sub-categories of common-factor FFTs. For radix-r, N = r^k, where k is a positive integer, and the same radix-r butterflies are used in each stage; for mixed-radix, N is not necessarily of the form r^k and the radices of the component butterflies are not all equal.
Data flow diagram for N = 8: a decimation-in-time radix-2 FFT breaks a length-N DFT into two length-N/2 DFTs followed by a combining stage consisting of many size-2 DFTs called "butterfly" operations (so called because of the shape of the data-flow diagrams).
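The decimation-in-time split described above can be sketched recursively. This is a minimal illustration, not an optimized implementation; it assumes N is a power of two and uses the unnormalized, negative-exponent DFT convention. The name `fft_radix2` is illustrative.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT (len(x) must be a power of two)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])  # length-N/2 DFT of even-indexed samples
    odd = fft_radix2(x[1::2])   # length-N/2 DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: a size-2 DFT combining even[k] with the twiddled odd[k].
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


if __name__ == "__main__":
    for k, value in enumerate(fft_radix2([1, 2, 3, 4, 0, 0, 0, 0])):
        print(k, round(value.real, 4), round(value.imag, 4))
```

For N = 8 the recursion produces three combining stages of four butterflies each, matching the structure of the data flow diagram.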
3.2. Prime-Factor FFTs
The transform length must be a product of numbers that are relatively prime. The advantage of these FFTs is the absence of W_N twiddle-factor multiplications. Their drawbacks are irregular sorting of the input and output data and irregular addressing for the butterflies. Prime-factor FFTs re-index the input and output arrays, which are then substituted into the DFT to obtain a two-dimensional DFT. Suppose that N = N_1 N_2, where N_1 and N_2 are relatively prime. The re-indexing of the input n and output k can then be written as:
Substituting this re-indexing into the DFT formula, we get
The inner and outer sums denote DFTs of size N_2 and N_1, respectively.
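One way to realize this re-indexing is the Good-Thomas mapping, sketched below as an assumption about which index maps are intended: the input uses n = (n1*N2 + n2*N1) mod N and the output uses the Chinese-remainder map k = (k1*inv(N2)*N2 + k2*inv(N1)*N1) mod N, where inv(N2) is the inverse of N2 modulo N1 and inv(N1) the inverse of N1 modulo N2. Function names are illustrative, and `pow(a, -1, m)` requires Python 3.8 or later.

```python
import cmath
from math import gcd


def dft(x):
    """Direct DFT (unnormalized, negative exponent), used for the short transforms."""
    n = len(x)
    return [
        sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
        for k in range(n)
    ]


def pfa_fft(x, n1, n2):
    """Prime-factor FFT for len(x) == n1 * n2 with gcd(n1, n2) == 1."""
    n = len(x)
    if n != n1 * n2 or gcd(n1, n2) != 1:
        raise ValueError("need len(x) == n1 * n2 with n1, n2 relatively prime")
    # Input map: scatter the 1-D sequence onto an n1-by-n2 grid.
    grid = [[x[(i1 * n2 + i2 * n1) % n] for i2 in range(n2)] for i1 in range(n1)]
    # Inner sums: size-n2 DFTs along each row (index i2 -> k2).
    rows = [dft(row) for row in grid]
    # Outer sums: size-n1 DFTs along each column (index i1 -> k1).
    cols = [dft([rows[i1][k2] for i1 in range(n1)]) for k2 in range(n2)]
    # Output map via the Chinese remainder theorem; no twiddle factors needed.
    inv2 = pow(n2, -1, n1)  # inverse of n2 modulo n1
    inv1 = pow(n1, -1, n2)  # inverse of n1 modulo n2
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            out[(k1 * inv2 * n2 + k2 * inv1 * n1) % n] = cols[k2][k1]
    return out


if __name__ == "__main__":
    data = [complex(i, 0) for i in range(15)]
    direct = dft(data)
    fast = pfa_fft(data, 3, 5)
    print(max(abs(a - b) for a, b in zip(direct, fast)))
```

Between the row and column transforms there is no multiplication by twiddle factors, which is the advantage noted above; the cost is the irregular index arithmetic in the two maps.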
3.3. Other FFTs
3.3.1. Split-radix FFTs have N = p^k, where p is a small prime number and k is a positive integer; this method can be more efficient than standard radix-p FFTs. Butterfly for the SRFFT algorithm:
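A minimal sketch of the split-radix idea follows, assuming the common power-of-two case (p = 2, N = 2^k). The even-indexed samples go through one half-length transform and the odd-indexed samples through two quarter-length transforms, which are then combined in an L-shaped butterfly. The function name is illustrative and the code favors clarity over speed.

```python
import cmath


def split_radix_fft(x):
    """Recursive split-radix FFT for len(x) a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    if n == 2:
        return [x[0] + x[1], x[0] - x[1]]
    u = split_radix_fft(x[0::2])   # length N/2: samples x[2m]
    z = split_radix_fft(x[1::4])   # length N/4: samples x[4m+1]
    zp = split_radix_fft(x[3::4])  # length N/4: samples x[4m+3]
    q = n // 4
    out = [0j] * n
    for k in range(q):
        a = cmath.exp(-2j * cmath.pi * k / n) * z[k]       # w^k  * Z[k]
        b = cmath.exp(-2j * cmath.pi * 3 * k / n) * zp[k]  # w^3k * Z'[k]
        s = a + b
        d = -1j * (a - b)
        out[k] = u[k] + s
        out[k + 2 * q] = u[k] - s
        out[k + q] = u[k + q] + d
        out[k + 3 * q] = u[k + q] - d
    return out


if __name__ == "__main__":
    for k, value in enumerate(split_radix_fft([1, 2, 3, 4, 0, 0, 0, 0])):
        print(k, round(value.real, 4), round(value.imag, 4))
```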
3.3.2. The Winograd Fourier Transform Algorithm (WFTA) is a type of prime-factor algorithm built from short DFT blocks that use a highly efficient convolution algorithm; it requires many additions but only on the order of N multiplications.
3.3.3. The Goertzel DFT is not considered a normal FFT in that its computational complexity is still of order N^2; it does, however, allow a subset of the DFT's N output terms to be calculated efficiently.
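The Goertzel recurrence for a single output term can be sketched as follows. The sketch assumes the unnormalized, negative-exponent DFT convention and an integer bin index k; the function name `goertzel` is illustrative.

```python
import cmath
import math


def goertzel(samples, k):
    """Return DFT bin k of samples using the second-order Goertzel recurrence."""
    n = len(samples)
    omega = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(omega)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        # s[m] = x[m] + 2*cos(omega)*s[m-1] - s[m-2]
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # Final step: F(k) = exp(j*omega) * s[N-1] - s[N-2]
    return cmath.exp(1j * omega) * s_prev - s_prev2


if __name__ == "__main__":
    signal = [1.0, 0.0, -1.0, 0.0]
    print(goertzel(signal, 1))  # one bin only, O(N) work
```

Each bin costs O(N) operations, so computing all N bins this way costs O(N^2); the method pays off only when a few bins are needed, for example in tone detection.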
4. Conclusion
Several research areas remain to be addressed in order to extend FFT research to
emerging standards and applications. Though the major area of application for FFTs remains digital signal
processing, they are also used extensively in the aerospace industry, energy management systems, image
processing, etc. Thus, the challenges related to the computational efficiency of FFTs remain the focus of
ongoing research in this field.