
Hybrid Floating Point Technique Yields 1.2 Giga-sample per Second, 32 to 2048 Point Floating Point FFT in a Single FPGA
HPEC 2006 Poster Session B.4, 20 September 2006

Ray Andraka, P.E., President, Andraka Consulting Group, Inc. (ray@andraka.com)


The Andraka Consulting Group, Inc.
Copyright 2006 Andraka Consulting Group, Inc. All Rights Reserved

Floating point addition & subtraction is resource intensive


[Block diagram: floating point adder/subtractor. Mantissa A and mantissa B pass through an exchange network and a barrel shift (denormalize), then the mantissa add/subtract, leading zeros detect, barrel shift (renormalize), and rounding produce the result mantissa. Exponent A and exponent B feed an exponent difference and an exponent adder to produce the result exponent.]


Apply floating point to larger functions


- Floating point is typically applied at the add and multiply level
- Instead, construct higher order operations from fixed point operators
  - Phase rotator
  - FFT
- Apply floating point to those more complicated operators
  - Denormalize to convert mantissas to fixed point plus a common scale
  - Pass the exponent around a series of fixed point operations
  - Renormalize after several operations rather than after each one
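The denormalize / fixed point / renormalize flow can be sketched in software. This is an illustrative model only (the `denormalize` and `renormalize` helpers are hypothetical names, not the FPGA implementation):

```python
import math

def denormalize(values, mant_bits=24):
    """Convert floats to fixed-point mantissas sharing one common exponent."""
    # The common exponent is set by the largest magnitude in the block.
    common_exp = max(math.frexp(v)[1] for v in values)
    # Each mantissa is scaled so all values share common_exp.
    mants = [round(v * 2 ** (mant_bits - common_exp)) for v in values]
    return mants, common_exp

def renormalize(mant, common_exp, mant_bits=24):
    """Convert a fixed-point result back to a float."""
    return mant * 2.0 ** (common_exp - mant_bits)

# Several fixed point adds between one denormalize/renormalize pair:
mants, e = denormalize([1.5, 0.375, -0.25])
total = sum(mants)              # pure integer arithmetic
print(renormalize(total, e))    # 1.625
```

One renormalize at the end replaces a barrel shifter and leading-zeros detector per operation, which is the source of the hardware savings described above.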


Apply floating point to larger functions


[Block diagram: mantissas pass through a barrel shift (denormalize), the fixed point function, leading zeros detect, barrel shift (renormalize), and rounding to produce the result mantissa. Exponents feed an exponent difference and a max exponent, and an exponent adder produces the result exponent.]



Floating point sum has only as much precision as larger addend


An add requires both addends to have the same scale:
- Radix points must align
- Addition is inherently fixed point

Examples:

Different exponents (LSBs of B are lost):
  A = 1.101 x 2^5
  B = 1.101 x 2^3 = 0.01101 x 2^5
  A + B = (1.101 + 0.011) x 2^5 = 10.000 x 2^5

Renormalizing (sum LSBs are filled with 0s):
  A = 1.101 x 2^5
  B = 1.011 x 2^5
  A - B = (1.101 - 1.011) x 2^5 = 0.010 x 2^5 = 1.000 x 2^3

The smaller addend's mantissa is right shifted until its exponent matches the larger:
- Exponent increments with each shift
- Right shift truncates LSBs
- Truncated LSBs are lost

The sum is left shifted to left justify:
- LSBs are zero filled
- No improvement to precision
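Both effects are easy to observe in ordinary IEEE arithmetic. This uses Python doubles (53-bit significand) rather than single precision, but the mechanism is identical:

```python
# Adding a value whose bits fall entirely below the larger addend's
# LSB changes nothing: the small addend is truncated to zero during
# denormalization.
a = 2.0 ** 53
b = 1.0
print(a + b == a)        # True: all of b's bits are lost

# Cancellation: the renormalized difference zero-fills its low bits,
# so the result carries only one significant bit.
c = (1.0 + 2.0 ** -52) - 1.0
print(c == 2.0 ** -52)   # True
```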


Phase rotation does not change amplitude


re(y) = re(x) * cos(w) - im(x) * sin(w)
im(y) = re(x) * sin(w) + im(x) * cos(w)

- Magnitudes of the individual I and Q components change, but the complex magnitude is not altered
- No loss of precision from treating I and Q with a common exponent
  - The complex operation is limited to the precision of the larger component
- Using a common exponent for I and Q reduces hardware
  - Single copy of the exponent logic
  - No rescaling of I with respect to Q
- Simplifies the rotator
  - Fixed point complex multiply (the smaller of I or Q is denormalized)
  - Fixed point sines and cosines
  - Output renormalize is a +/-1 bit shift
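A quick numerical check that phase rotation leaves the complex magnitude unchanged, whatever the rotation angle:

```python
import cmath

x = 3.0 + 4.0j                    # |x| = 5
for w in (0.1, 1.0, 2.5):         # arbitrary rotation angles (radians)
    y = x * cmath.exp(1j * w)     # the individual re/im parts change...
    print(abs(y))                 # ...but |y| stays 5 (to rounding)
```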


FFT butterflies are only as precise as largest input


Cooley-Tukey FFT butterfly:
- Sum and difference of a pair of complex inputs
- One input is rotated by the twiddle factor phasor w^k = cos(w) + j sin(w)

[Diagram: FFT butterfly. Complex inputs; one is multiplied by the twiddle factor w^k, then the pair is summed and differenced to produce the complex outputs.]

- Rotation does not affect scale
- Smaller input is right shifted
  - Shift to match scale
  - LSBs are lost
- Both outputs have the same LSB weight before renormalizing
- Renormalizing does not add precision (zero fills LSBs)
- Output is 1 bit wider than input
  - Sum of similar sized addends
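A minimal software model of the radix-2 butterfly (the `butterfly` function is illustrative, not the hardware design; it uses the conventional twiddle W_N^k = e^(-j2*pi*k/N)):

```python
import cmath

def butterfly(a, b, k, N):
    """Radix-2 decimation-in-time butterfly: rotate b by the twiddle
    factor W_N^k, then form the sum and difference with a."""
    w = cmath.exp(-2j * cmath.pi * k / N)
    t = w * b                 # rotation: does not change |b|
    return a + t, a - t       # each output is 1 bit wider than the inputs

# A 2-point FFT is a single butterfly with k = 0:
print(butterfly(1 + 0j, 2 + 0j, 0, 2))   # ((3+0j), (-1+0j))
```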

FFT output is only as precise as largest input


- Cascade of butterfly elements
- Each output is essentially an adder tree with phase rotators
  - Rotators don't change scale
  - Inputs are right shifted to match the scale of the largest input
  - Intermediate renormalizing is not effective
  - A term from every FFT input contributes

[Diagram: cascade of butterflies with twiddle factor (w^k) rotators between stages.]

- 1 bit growth per stage
  - Renormalize maintains width
  - Alternative: grow word width
- Similar effect in other FFTs (Winograd, Sande-Tukey, Singleton, etc.)
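The 1 bit per radix-2 stage growth can be checked numerically; an all-ones input concentrates all the energy in bin 0, giving the worst-case output magnitude of N (a quick NumPy illustration):

```python
import numpy as np

# A length-N DFT of values bounded by 1 can produce outputs as large
# as N, i.e. log2(N) bits of growth: 1 bit per radix-2 stage.
N = 8
x = np.ones(N)                    # all-ones input maximizes bin 0
X = np.fft.fft(x)
print(max(abs(X)))                # 8.0: 3 radix-2 stages, 3 bits of growth
```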

Fixed Point FFT Replaces Floating Point FFT


- Denormalize inputs
  - Shift each input right to match the scale of the largest
- Perform a fixed point FFT
  - Pass the common exponent around it
  - Input width = mantissa bits
  - Maximum 1 bit growth per equivalent radix 2 stage
- Renormalize outputs
  - Add the common exponent to the delta exponent from the renormalize

[Block diagram: input mantissas are denormalized (right shift by n), pass through the fixed point FFT, and are renormalized (left shift). The max exponent is carried around the FFT and added to the renormalize shift to form the output exponent.]
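A hypothetical software model of this wrapper, using NumPy's floating point FFT as a stand-in for the fixed point core (the `block_float_fft` name and structure are illustrative assumptions):

```python
import numpy as np

def block_float_fft(x, mant_bits=24):
    """Denormalize to a common exponent, run an integer-mantissa FFT,
    renormalize once at the end."""
    x = np.asarray(x, dtype=complex)
    # Denormalize: common scale set by the largest I or Q component.
    peak = max(np.max(np.abs(x.real)), np.max(np.abs(x.imag)))
    scale = 2.0 ** (mant_bits - 1) / peak
    mants = np.round(x * scale)    # integer mantissas, shared exponent
    X = np.fft.fft(mants)          # stand-in for the fixed point FFT core
    return X / scale               # renormalize: fold the exponent back in

rng = np.random.default_rng(0)
x = rng.standard_normal(32) + 1j * rng.standard_normal(32)
err = np.max(np.abs(block_float_fft(x) - np.fft.fft(x)))
print(err)   # small: limited by input quantization, not the algorithm
```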


Advantages and Limitations


Advantages:
- Large reduction in required hardware
- Less complexity means higher clock rates and smaller parts

Limitations:
- Word width grows for each radix 2 stage
  - Becomes excessive for large FFTs
- Max exponent is needed at the beginning of the set
  - A problem for large sequential FFTs
- Use periodic renormalization to manage word widths
  - A few bits of growth don't significantly affect timing
  - Words are not limited to specific widths in an FPGA
  - Fixed width assets like DSP48s limit practical word sizes
  - Find a balance between precision, growth, and renormalizing stages

Small FFTs as building blocks


- Larger FFT constructed from small FFTs with a mixed radix algorithm
  - Similar to a Cooley-Tukey decomposition
  - Arbitrarily large FFTs using small off-the-shelf kernels
- Combination uses an FFT plus a phase rotator and reorder memory
- In-place operation (results written to the same memory locations)

[Diagram: fill along rows, FFT down columns, multiply by e^(-j2*pi*k*n/N), FFT along rows, read down columns.]
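The row/column flow above can be modeled in a few lines. This is a NumPy sketch of the standard mixed-radix "four-step" decomposition, not the hardware pipeline; `four_step_fft` is an illustrative name:

```python
import numpy as np

def four_step_fft(x, R, C):
    """N = R*C point FFT built from R- and C-point FFTs plus a
    twiddle (phase rotator) stage between them."""
    N = R * C
    a = np.asarray(x, dtype=complex).reshape(R, C)  # fill along rows
    a = np.fft.fft(a, axis=0)                       # FFT down columns
    k = np.arange(R).reshape(R, 1) * np.arange(C)   # twiddle indices k*n
    a = a * np.exp(-2j * np.pi * k / N)             # mult by e^(-j2*pi*k*n/N)
    a = np.fft.fft(a, axis=1)                       # FFT along rows
    return a.T.reshape(N)                           # read down columns

x = np.random.default_rng(1).standard_normal(32)
print(np.allclose(four_step_fft(x, 4, 8), np.fft.fft(x)))   # True
```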


Winograd FFT

- Different factorization that minimizes multiplies
  - Advantageous for hardware implementation
  - 74 adds and 18 real multiplies for a 16 pt Winograd
  - 176 adds and 72 real multiplies for a 16 pt Cooley-Tukey
- Irregular data sequence
  - Difficult for shared memory
  - Easy when reorder memory is distributed

[Diagram: Winograd FFT structure with reorder stages surrounding a multiply-by-weights stage.]


32 to 2048 point mixed radix FFT


- 2K FFT is 8 x 256 mixed radix
- 256 point is 16 x 16 mixed radix
- Combined algorithms: 2K = 8 x 16 x 16
  - Data arranged in a cube, FFT along each dimension
  - Reorder at input and output (not shown)
- Kernel is a proprietary 1/4/8/16 Winograd kernel
- Each kernel has a floating point wrapper
- 32/64/128/256 point FFT

[Pipeline diagram: 1/8 point FFT -> phase rotator -> data reorder (4K sample BRAM) -> 4/8/16 point FFT -> phase rotator -> data reorder (512 sample BRAM) -> 8/16 point FFT.]


32-2K point FFT statistics


Speed:
- 400 MS/sec per FFT engine (3 in FPGA)
- 400 MHz clock in XC4VSX55-10 (slowest speed grade)
- 1 complex sample per clock in and out, continuous

Latency: ~430 + 3 x FFT length + (32, 64, 128, or 256) clocks

Utilization: less than 30% of an XC4VSX55
- DSP48s: 151
- Slice flip-flops: 9707
- RAMB16s: 69
- LUTs: 7736 (4975 are SRL16)

Precision:
- 30-35 bit mantissa internal, 8 bit exponents
- IEEE single precision input and output
- Matches MATLAB FFT to +/-1 LSB of the output mantissa

1.2 GSample/sec IEEE floating point FFT

[Block diagram: an input buffer feeds three parallel 32 to 2K pt floating pt FFT engines, which feed an output buffer.]


Who is Andraka Consulting Group?


- Exclusively FPGAs since 1994
- Leading industry expert on DSP in FPGAs
- Charter Xilinx Xperts partner
- First published FIR filter in FPGAs (1992)
- Fastest single threaded FFT kernel for FPGA

Other current projects:
- Beamforming digital receiver: 10 25 MHz channels, 260 antennas, 500 MS/sec input sample rate
- Cylindrical sonar array processor
- Other digital receiver and radar projects


Floating Point Format


- Floating point dedicates part of the word to indicate scale (the exponent)
  - Tracks the radix point position as part of the data
  - Compare to fixed point, where the radix point is at an implied fixed location
- Trades precision for dynamic range
- Useful when the data range is unknown or spans a large range

The IEEE single precision floating point standard is a 32 bit word:
- The leftmost bit is the sign bit, S: 1 is negative, 0 is positive
- The next 8 bits are the exponent E, in excess-127 format
- The rightmost 23 bits are the fraction F
  - There is an implicit 1 bit to the left of the fraction except in special cases
  - The fraction's radix point is between the implied 1 and the leftmost fraction bit

S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF

Number = (-1)^S x 2^(E-127) x 1.F
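The decoding rule can be verified with a few lines of Python (`decode_ieee754` is an illustrative helper; it handles normal numbers only, not the special cases):

```python
import struct

def decode_ieee754(x):
    """Unpack an IEEE single precision float into sign, exponent, fraction,
    and the value reconstructed from them."""
    bits, = struct.unpack('>I', struct.pack('>f', x))
    s = bits >> 31                 # sign bit
    e = (bits >> 23) & 0xFF        # excess-127 exponent
    f = bits & 0x7FFFFF            # 23-bit fraction
    # Normal numbers: value = (-1)^S * 2^(E-127) * (1 + F/2^23)
    return s, e, f, (-1) ** s * 2.0 ** (e - 127) * (1 + f / 2 ** 23)

# -6.5 = -1.101b * 2^2, so S=1, E=127+2=129, F=.101b << 20
print(decode_ieee754(-6.5))   # (1, 129, 5242880, -6.5)
```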


