
Efficient Radix-4 and Radix-8 Butterfly Elements

Weidong Li and Lars Wanhammar
Electronics Systems, Department of Electrical Engineering
Linköping University, SE-581 83 Linköping, Sweden
Tel.: +46 13 28 {1721, 1344} Fax: +46 13 139282
E-mail: {weidongl, larsw}@isy.liu.se

Abstract: In this paper, we present a class of high-radix butterfly elements that use (m, n)-counters to replace the adders in the conventional realization of butterfly elements. With these butterfly elements, we reduce the hardware complexity, the delay, and the power consumption.

1. INTRODUCTION
The FFT/IFFT is one of the most important algorithms in digital signal processing. In recent years, it has frequently been applied in modern communication systems because of its efficiency for OFDM (Orthogonal Frequency Division Multiplexing) implementation. Many applications, such as xDSL modems, HDTV, and mobile radio terminals, use an FFT/IFFT processor as a key component. The butterfly elements are among the basic building blocks of an FFT/IFFT processor. Since FFT/IFFT processors using a radix-4 architecture require fewer multiplications than processors using radix-2, radix-4 architectures are often used. Higher radices tend to reduce the memory access rate, the arithmetic workload, and, hence, the power consumption [2] [3]. Efficient design of high-radix butterfly elements is therefore important.

In the following section, we give a short review of the conventional implementation of butterfly elements. We introduce the carry-save based butterfly in Section 3 and present the results in Section 4.

2. A 4-POINT DFT WITH A CONVENTIONAL BUTTERFLY ELEMENT


2.1 4-point DFT
The 4-point DFT is defined as

$$X(k) = \sum_{n=0}^{3} x(n)\, e^{-j 2\pi nk/4} \tag{1}$$

where $k = 0, 1, 2, 3$. Since $e^{-j 2\pi i/4} = \pm 1$ or $\pm j$ for any integer $i$, the multiplications are trivial, i.e., they can be realized simply with bypass, inversion, and/or swap for two's-complement numbers. Hence, no multiplier is required to construct a butterfly element for a 4-point DFT (radix-4 butterfly).

2.2 Conventional butterfly elements for 4-point DFT
We can rewrite equation (1) in matrix form

$$\begin{bmatrix} X(0) \\ X(1) \\ X(2) \\ X(3) \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -j & -1 & j \\ 1 & -1 & 1 & -1 \\ 1 & j & -1 & -j \end{bmatrix} \begin{bmatrix} x(0) \\ x(1) \\ x(2) \\ x(3) \end{bmatrix} \tag{2}$$
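The claim that the factors in eq. (2) need no multiplier can be illustrated with a minimal sketch. It assumes complex samples stored as (re, im) integer pairs; the helper names are hypothetical and not taken from the paper.

```python
# Sketch: 4-point DFT of eq. (2) using only add, negate, and swap.
# Complex samples are (re, im) integer pairs; helper names are illustrative.

def mul_neg_j(z):
    """Multiply by -j: (a + jb) * (-j) = b - ja, a swap plus one negation."""
    re, im = z
    return (im, -re)

def mul_pos_j(z):
    """Multiply by +j: (a + jb) * j = -b + ja, a swap plus one negation."""
    re, im = z
    return (-im, re)

def neg(z):
    """Multiply by -1: negate both parts."""
    re, im = z
    return (-re, -im)

def add(*terms):
    """Add any number of (re, im) pairs component-wise."""
    return (sum(t[0] for t in terms), sum(t[1] for t in terms))

def dft4_trivial(x):
    """Evaluate the four rows of the matrix in eq. (2)."""
    x0, x1, x2, x3 = x
    X0 = add(x0, x1, x2, x3)
    X1 = add(x0, mul_neg_j(x1), neg(x2), mul_pos_j(x3))
    X2 = add(x0, neg(x1), x2, neg(x3))
    X3 = add(x0, mul_pos_j(x1), neg(x2), mul_neg_j(x3))
    return [X0, X1, X2, X3]
```

For example, `dft4_trivial([(3, -1), (2, 5), (-4, 7), (1, -6)])` evaluates every output with additions, negations, and swaps only.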

Using the numerical strength reduction technique at the word level [4], we obtain the signal-flow graph shown in Fig. 1.

[Figure 1: inputs x(0), x(2), x(1), x(3); outputs X(0), X(2), X(1), X(3); the branches labeled j denote complex multiplication.]

Figure 1 Signal-flow graph for 4-point DFT.

A conventional butterfly element based on an isomorphic mapping of the signal-flow graph above therefore requires 8 adders/subtractors and has a delay of 2 additions/subtractions.
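The adder count and the two-addition delay can be made concrete with a small sketch of such a two-stage flow graph. It assumes the common factorization into two stages of add/subtract pairs and counts each complex addition or subtraction as one adder/subtractor; the function name and the operation counter are illustrative only.

```python
# Sketch of the conventional radix-4 butterfly as a two-stage flow graph.
# Assumption: the stage grouping below is the common two-stage
# factorization; the op counter is purely illustrative.

def radix4_conventional(x0, x1, x2, x3):
    ops = 0  # number of complex additions/subtractions

    # Stage 1: four complex adders/subtractors
    a = x0 + x2
    b = x0 - x2
    c = x1 + x3
    d = (x1 - x3) * (-1j)   # in hardware, -j is a swap and a negation
    ops += 4

    # Stage 2: four more complex adders/subtractors
    X0 = a + c
    X2 = a - c
    X1 = b + d
    X3 = b - d
    ops += 4

    return (X0, X1, X2, X3), ops
```

Every output passes through one stage-1 and one stage-2 adder, which is the sequential dependency that the carry-save approach of Section 3 removes.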

3. CARRY-SAVE BASED BUTTERFLY ELEMENTS


3.1 Principle of carry-save based butterfly elements for a 4-point DFT
For the sake of simplicity, we consider only the real part of one output, i.e., $X_{re}(0)$, for the 4-point butterfly operation. From eq. (2), $X_{re}(0)$ is

$$X_{re}(0) = 1 \cdot x_{re}(0) + 1 \cdot x_{re}(1) + 1 \cdot x_{re}(2) + 1 \cdot x_{re}(3) \tag{3}$$

Consider also a real multiplication $Y = B \cdot 15$. The multiplication can be expressed as

$$Y = 1 \cdot (2^{0} B) + 1 \cdot (2^{1} B) + 1 \cdot (2^{2} B) + 1 \cdot (2^{3} B) \tag{4}$$

Comparing eq. (3) with eq. (4), we find that the two equations are of the same nature, i.e., both the butterfly operation and the multiplication are additions of multiple addends. To speed up the butterfly operation, we can apply the same technique as for carry-save multipliers [1], that is, use an adder tree for the summation of the partial products and a fast adder for the vector-merging addition. In a more general scheme, the inner adders/subtractors are replaced by (m, n)-counters. This reduces the hardware complexity. Since the inputs no longer have to be added/subtracted sequentially, the execution time is reduced as well. A simplified notation for the conventional and the carry-save based butterfly element is shown in Fig. 2.

[Figure 2: (i) Conventional; (ii) Carry-save based, with (4,2)-counter outputs feeding a fast adder. Legend: input, output, intermediate results.]

Figure 2 Simplified notation for the conventional and the carry-save butterfly element.
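The analogy between eq. (3) and eq. (4) can be sketched at the word level. The sketch below assumes that a (4,2)-counter is modelled as two cascaded carry-save (3,2) stages operating on Python integers; this is a behavioural illustration with hypothetical function names, not the gate-level circuit of the paper.

```python
# Sketch: summing four addends with a carry-save tree plus one merging adder.
# Assumption: a word-level (4,2)-counter is modelled as two cascaded
# carry-save (3,2) stages; function names are illustrative.

def csa(a, b, c):
    """Carry-save (3,2) stage: three words in, sum and carry words out."""
    s = a ^ b ^ c                               # bitwise sum, no carry ripple
    carry = ((a & b) | (a & c) | (b & c)) << 1  # carries move one bit left
    return s, carry

def counter_4_2(a, b, c, d):
    """Reduce four words to two without propagating carries."""
    s1, c1 = csa(a, b, c)
    return csa(s1, c1, d)

def sum4(a, b, c, d):
    s, carry = counter_4_2(a, b, c, d)
    return s + carry        # the single vector-merging (fast) addition

# Eq. (3): X_re(0) as a sum of four inputs
x_re = [9, -4, 7, 13]
X_re0 = sum4(*x_re)

# Eq. (4): Y = 15 * B as a sum of four shifted copies of B
B = 11
Y = sum4(B << 0, B << 1, B << 2, B << 3)
```

In both cases only the final `s + carry` involves carry propagation, which is why one fast adder suffices regardless of how the four addends were produced.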

3.2 Implementation of carry-save based butterfly element for 4-point DFT
The carry-save based butterfly element can be realized with (m, n)-counters and a fast adder. From eq. (2), we find that some outputs require both additions and subtractions. Since we use two's-complement number representation, a subtraction can be realized by adding the negative number. The negative value is obtained by adding 1 at the LSB to the bit-complement. Hence, there are more than four inputs at the LSB, which would require (m, 2)-counters with m > 4. Since all bits except the LSB have only four inputs, using (m, 2)-counters with m > 4 is not efficient. Because either zero or two inputs have to be negated simultaneously, we retain (4,2)-counters by adding the carry inputs at the final merging adder instead of adding two correction terms in the carry-save tree (see Fig. 3).

[Figure 3 panels: (i) Straightforward realization; (ii) New realization. Labels: (4,2) counter, (6,2) counter, "1 or 0" correction inputs, input, output, intermediate result.]
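A word-level sketch may help to see why the two correction ones need not enter the carry-save tree. It assumes Python integers (unbounded two's complement), models the (4,2)-counter as two carry-save stages, and places one correction in the unused LSB of the carry word and the other at the carry-in of the merging adder. That placement is an assumption chosen for illustration and may differ from the arrangement drawn in Fig. 3.

```python
# Sketch: two subtractions folded into a (4,2)-counter tree.
# Assumptions: word-level model with Python integers (unbounded two's
# complement); the placement of the two correction ones is one possible
# arrangement, not necessarily the one drawn in Fig. 3.

def csa(a, b, c):
    """Carry-save (3,2) stage."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry

def counter_4_2(a, b, c, d):
    """Word-level (4,2)-counter built from two carry-save stages."""
    s1, c1 = csa(a, b, c)
    return csa(s1, c1, d)

def add_sub_add_sub(a, b, c, d):
    """Compute a - b + c - d with bit-complements and deferred +1 terms."""
    s, carry = counter_4_2(a, ~b, c, ~d)   # ~v equals -v - 1
    assert (carry & 1) == 0                # LSB of the carry word is free
    carry |= 1                             # first correction: free LSB slot
    carry_in = 1                           # second correction: adder carry-in
    return s + carry + carry_in            # vector-merging addition
```

The counter itself still sees exactly four inputs per bit position, so no (6,2)-counter is needed at the LSB.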

Figure 3 Solution for subtractions.

For the parallel implementation of the radix-4 butterfly element, the subtractions are known in advance, and this can be used to simplify the implementation further. The XOR gates that produce the negative values in a general adder/subtractor can be replaced with inverters. The resulting butterfly element is shown in Fig. 4. The critical path is reduced from a path consisting of two fast adders to a path consisting of an inverter, a (4,2)-counter, and a fast adder.

[Figure 4: inputs xre(0), xim(0), ..., xre(3), xim(3); outputs Xre(0), Xim(0), ..., Xre(3), Xim(3); built from inverters, (4,2)-counters, and fast adders.]

Figure 4 Parallel radix-4 butterfly element.

This technique can be applied to other butterfly architectures as well. For a split-radix (SR) butterfly element [5], the odd-indexed terms can use this technique, which results in a simpler structure (see Fig. 5). In a simplified butterfly element [6], it can also give an efficient implementation [7] (see Fig. 5).
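The parallel radix-4 butterfly described above can be sketched as eight independent four-operand sums, one per real output, each with an inversion pattern read off the matrix in eq. (2). The sketch assumes a word-level integer model; the function names and the way the pattern is tabulated are illustrative, not the authors' netlist.

```python
# Sketch of a parallel radix-4 butterfly: every real output is one
# four-operand sum with a fixed, known-in-advance inversion pattern.
# Assumptions: word-level integer model; names and layout are illustrative.

def csa(a, b, c):
    """Carry-save (3,2) stage."""
    s = a ^ b ^ c
    return s, ((a & b) | (a & c) | (b & c)) << 1

def signed_sum4(operands, negate):
    """Sum four words, inverting those flagged in `negate` (fixed inverters)."""
    words = [~v if n else v for v, n in zip(operands, negate)]
    s1, c1 = csa(words[0], words[1], words[2])
    s, carry = csa(s1, c1, words[3])
    return s + carry + sum(negate)   # merging adder absorbs the +1 corrections

def radix4_parallel(x):
    """x: four (re, im) integer pairs -> four (re, im) output pairs."""
    (r0, i0), (r1, i1), (r2, i2), (r3, i3) = x
    N, P = True, False               # N: inverted input, P: passed through
    X0 = (signed_sum4((r0, r1, r2, r3), (P, P, P, P)),
          signed_sum4((i0, i1, i2, i3), (P, P, P, P)))
    X1 = (signed_sum4((r0, i1, r2, i3), (P, P, N, N)),
          signed_sum4((i0, r1, i2, r3), (P, N, N, P)))
    X2 = (signed_sum4((r0, r1, r2, r3), (P, N, P, N)),
          signed_sum4((i0, i1, i2, i3), (P, N, P, N)))
    X3 = (signed_sum4((r0, i1, r2, i3), (P, N, N, P)),
          signed_sum4((i0, r1, i2, r3), (P, P, N, N)))
    return [X0, X1, X2, X3]
```

Each pattern contains either zero or two inversions, and each output passes through inversion, carry-save reduction, and one merging addition, mirroring the critical path stated in the text.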
[Figure 5: (i) Split-radix BE; (ii) Simplified BE, with inputs x(0) to x(3), outputs X(0) to X(3), and control signals. Labelled blocks: (4,2)-counter, fast adder, XOR, multiplication, addition, subtraction.]

Figure 5 Split-radix and simplified butterfly element.

3.3 Efficient implementation for radix-8 butterfly elements

High-radix butterfly elements can be realized according to the signal-flow graph with the same technique as described above. However, the routing cost for high-radix butterfly elements becomes excessive. They are therefore often implemented by cascading lower-radix butterfly elements (with twiddle factor multipliers in between). Hence, we choose to use the radix-8 butterfly element as the largest building block.

The signal-flow graph for a radix-8 DIF (Decimation In Frequency) butterfly is shown in Fig. 6 [8]. By moving the two complex multipliers forward according to the dashed arrows, the radix-8 butterfly element can be regarded as two radix-4 butterfly elements and four radix-2 butterfly elements with twiddle factor multiplication in between. By combining carry-save radix-4 butterfly elements with constant multiplication by $[\sqrt{2}\,(1 - j)]/2$, we can implement the radix-8 butterfly element more efficiently.

[Figure 6: inputs x(0), x(4), x(2), x(6), x(3), x(7), x(1), x(5); outputs X(0), X(4), X(5), X(1), X(2), X(6), X(3), X(7); branch labels $e^{-j\pi/4}$ and j denote complex multiplication.]

Figure 6 Signal-flow graph for radix-8 DIF butterfly element.

The radix-8 butterfly element can also be implemented with a split-radix structure. The signal-flow graph for the split-radix butterfly element [9] differs slightly from that of Fig. 6. It requires in all three SR radix-4 butterfly elements and three radix-2 butterfly elements.
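The decomposition into radix-2 and radix-4 butterflies with a single non-trivial constant can be checked numerically. The sketch below assumes the textbook decimation-in-frequency split (radix-2 stage, twiddle factors, then two 4-point DFTs); the stage ordering and node numbering are assumptions and may not match the graph in Fig. 6 exactly.

```python
# Sketch: an 8-point DFT from four radix-2 butterflies, twiddle factors,
# and two radix-4 butterflies (decimation in frequency).
# Assumption: textbook DIF split; node ordering may differ from Fig. 6.
import cmath

def dft4(x):
    """4-point DFT with trivial factors only, as in eq. (2)."""
    x0, x1, x2, x3 = x
    return [x0 + x1 + x2 + x3,
            x0 - 1j * x1 - x2 + 1j * x3,
            x0 - x1 + x2 - x3,
            x0 + 1j * x1 - x2 - 1j * x3]

C = cmath.sqrt(2) * (1 - 1j) / 2      # equals exp(-j*pi/4)

def radix8_dif(x):
    # Twiddles W8^n for n = 0..3: only C is a non-trivial constant,
    # because W8^2 = -j and W8^3 = -j * C.
    twiddles = [1, C, -1j, -1j * C]
    a = [x[n] + x[n + 4] for n in range(4)]                   # radix-2 sums
    b = [(x[n] - x[n + 4]) * twiddles[n] for n in range(4)]   # differences, twiddled
    X = [0j] * 8
    X[0::2] = dft4(a)                 # X(0), X(2), X(4), X(6)
    X[1::2] = dft4(b)                 # X(1), X(3), X(5), X(7)
    return X
```

The only general multiplications left are the two by `C`, which is a constant and can therefore be realized with shifts and additions rather than a full complex multiplier.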

4. RESULTS
In this section, we present synthesis results for the butterfly elements in AMS 0.8 µm technology to demonstrate the efficiency. Table 1 shows the area cost and delay of the main components of the butterfly element, obtained with the synthesis tool AutoLogic II from Mentor Graphics.

| Component | Area cost | Delay @ 3.3 V, 25 °C |
|---|---|---|
| VMA (15 bits) | 584.24 | 5.708 ns |
| VMA (16 bits) | 643.80 | 5.747 ns |
| VMA (17 bits) | 689.50 | 6.015 ns |
| 4-2 counter | 20.32 | 3.441 ns |
| inverter | 1.64 | 0.405 ns |

Table 1: Performance of key components in butterfly elements.

With the results from Table 1, we can calculate the area and delay for different radix-4 butterfly elements. The results show that the area saving can be up to 21% for the carry-save radix-4 butterfly element and 38% for the SR radix-4 butterfly element with carry-save. The delay can be reduced by 22%.

| Architecture | Area cost | Delay @ 3.3 V, 25 °C |
|---|---|---|
| Conventional | 10504.16 | 12.32 ns |
| Carry-save | 8266.48 | 9.59 ns |
| SR with carry-save | 6494.80 | 9.59 ns |

Table 2: Comparison of different 15-bit radix-4 butterfly elements.

For the radix-8 butterfly elements, if we exclude the cost of the multipliers, we can summarize the results in the following table. These results can be improved if we apply the carry-save technique wherever possible.

| Architecture | Area cost (a) | Delay (a) @ 3.3 V, 25 °C |
|---|---|---|
| Conventional | 32250.40 | 18.74 ns |
| Carry-save | 27774.88 | 16.01 ns |
| SR with carry-save | 27915.84 | 16.01 ns |

Table 3: Comparison of different 15-bit radix-8 butterfly elements (excluding the multipliers).
a. Excluding the multipliers.
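The quoted percentages can be recomputed directly from the table entries. The sketch below copies the numbers from Tables 1 to 3; the helper names are hypothetical, and the choice of the 16-bit VMA for the critical-path estimate is an assumption made because it reproduces the tabulated 9.59 ns.

```python
# Sketch: recomputing the relative savings from Tables 2 and 3.
# The numbers are copied from the tables; helper names are illustrative.

def saving_percent(reference, improved):
    return 100.0 * (reference - improved) / reference

radix4 = {  # architecture: (area cost, delay in ns)
    "conventional": (10504.16, 12.32),
    "carry_save": (8266.48, 9.59),
    "sr_carry_save": (6494.80, 9.59),
}
radix8 = {  # multipliers excluded
    "conventional": (32250.40, 18.74),
    "carry_save": (27774.88, 16.01),
    "sr_carry_save": (27915.84, 16.01),
}

def report(table):
    """Return {architecture: (area saving %, delay saving %)} vs. conventional."""
    ref_area, ref_delay = table["conventional"]
    return {name: (round(saving_percent(ref_area, area), 1),
                   round(saving_percent(ref_delay, delay), 1))
            for name, (area, delay) in table.items() if name != "conventional"}

radix4_savings = report(radix4)
radix8_savings = report(radix8)

# Assumption: critical path = inverter + (4,2)-counter + 16-bit VMA (Table 1).
carry_save_path_ns = 0.405 + 3.441 + 5.747
```

For radix-4 this gives area savings of about 21.3% and 38.2% and a delay saving of about 22.2%, in line with the figures quoted in the text. For radix-8 the same calculation gives about 13.9% (carry-save) and 13.4% (SR with carry-save) in area and about 14.6% in delay.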

5. CONCLUSION
In this paper, we have presented an efficient method for realizing higher-radix butterfly elements with the carry-save technique. The method has advantages in terms of chip area, execution time, and power consumption.

ACKNOWLEDGEMENT
The authors would like to thank Thomas Johansson and Dr. Kent Palmkvist for fruitful discussions. This project is financed by SSF, the Foundation for Strategic Research in Sweden, under the program of INTELECT.

REFERENCES
[1] L. Wanhammar, DSP Integrated Circuits, Academic Press, 1999.
[2] J. Melander, Design of SIC FFT Architectures, Linköping Studies in Science and Technology, Thesis No. 618, Linköping University, Sweden, 1997.
[3] T. Widhe, Efficient Implementation of FFT Processing Elements, Linköping Studies in Science and Technology, Thesis No. 619, Linköping University, Sweden, 1997.
[4] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, John Wiley & Sons, 1999.
[5] P. Duhamel and H. Hollmann, Split-Radix FFT Algorithm, Electronics Letters, Vol. 20, No. 1, pp. 14-16, 1984.
[6] G. Bi and E. V. Jones, A Pipelined FFT Processor for Word-Sequential Data, IEEE Trans. on Acoust., Speech, and Signal Process., Vol. ASSP-37, No. 12, pp. 1982-1985, 1989.
[7] W. Li and L. Wanhammar, A Pipeline FFT Processor, accepted for publication at IEEE Workshop on Signal Processing Systems (SiPS), 1999.
[8] T. Widhe, J. Melander, and L. Wanhammar, Design of efficient radix-8 butterfly PEs for VLSI, IEEE Intern. Symp. on Circuits and Systems (ISCAS), Vol. 3, pp. 2084-2087, 1997.
[9] H. V. Sorensen, M. T. Heideman, and C. S. Burrus, On Computing the Split-Radix FFT, IEEE Trans. on Acoust., Speech, and Signal Process., Vol. ASSP-34, No. 1, pp. 152-156, 1986.
