Gallagher
P188/MAPLD2004
Architectural Considerations
FPGA architectures are vendor specific
Unlike ASICs, no two are alike
Vendor-independent HDL can be written, but it usually achieves mediocre results in clock speed and instantiated design size
[Figure: two low-pass filter (LPF) blocks]
[Figure: adder-chain filter structures labeled Semi-Parallel and Serial, arranged by what each is optimized for: speed or area]
[Figure: filter structures with coefficients k0 to k3 and adder chains, shown for 4 channels and for 9 channels]
The next slide shows the filter design needed to decimate by 25 in one step; the total coefficient count is 184
The two slides after that show the two filters needed to decimate in two stages, decimating by 5 in each stage; the total coefficient count is 11 + 43 = 54
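The saving can be checked with simple arithmetic. The sketch below is illustrative only: it assumes each decimating FIR computes only the outputs it keeps, and it assumes the 11-tap filter is the first stage and the 43-tap filter the second (the slides give the two counts, not their order). The helper name `macs_per_input_sample` is hypothetical.

```python
def macs_per_input_sample(stages):
    """Multiply-accumulates per original input sample for a cascade of
    decimating FIR stages given as (taps, decimation_factor) pairs.
    Assumes each stage computes only the outputs it keeps."""
    total = 0.0
    rate = 1.0  # sample rate relative to the original input rate
    for taps, factor in stages:
        rate /= factor        # output rate of this stage
        total += taps * rate  # 'taps' MACs per output sample
    return total

single_stage = [(184, 25)]
two_stage = [(11, 5), (43, 5)]  # stage order is an assumption

coeffs_single = sum(taps for taps, _ in single_stage)
coeffs_two = sum(taps for taps, _ in two_stage)
work_single = macs_per_input_sample(single_stage)
work_two = macs_per_input_sample(two_stage)
```

Under these assumptions the two-stage design stores 54 coefficients instead of 184 and needs roughly half the multiply rate per input sample.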
Balancing latencies is a common requirement in DSP designs. The Sync block uses SRL16s (very efficient) to automatically balance pipeline delays.
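As a rough software model of what latency balancing involves, the sketch below pads every path up to the latency of the slowest one and estimates how many look-up tables each padding delay costs per data bit when one LUT serves as a shift register of up to 16 stages. The path names and latency values are hypothetical, and this is not a description of how the Sync block is implemented.

```python
import math

def balance_delays(path_latencies):
    """Return the extra delay (in clock cycles) each path needs so that
    all paths match the slowest one."""
    target = max(path_latencies.values())
    return {name: target - lat for name, lat in path_latencies.items()}

def srl16_luts_per_bit(delay):
    """LUTs needed per data bit for a delay line when one LUT acts as a
    shift register of up to 16 stages."""
    return math.ceil(delay / 16)

latencies = {"filter": 12, "mixer": 3, "bypass": 0}  # hypothetical values
extra = balance_delays(latencies)
luts = {name: srl16_luts_per_bit(d) for name, d in extra.items()}
```

Here the bypass path needs 12 extra cycles, which still fits in a single LUT per bit instead of 12 flip-flops per bit.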
Channelizer Design
The following design is a 64-channel channelizer based on the technique known as a polyphase decimation filter with a DFT bank.
The design basebands and decimates 64 channels simultaneously.
The polyphase decimation uses the same structure as the previous design, hence very efficient device utilization.
This filter structure uses the on-chip RAM blocks of the Xilinx device to store the coefficients.
This technique requires a tapped shift register that needs 6272 registers (3136 slices).
However, Xilinx's patented ability to turn the logic look-up table into a 16-bit shift register reduces this requirement by more than an order of magnitude.
The whole design is less than 1700 slices.
The DFT is implemented with a streaming FFT core.
The streaming mode allows the FFT to keep up with the data rate.
Individual channels out of the FFT are demuxed using the implied clocking technique seen in the previous design.
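The equivalence behind this technique can be checked in a few lines of software. The sketch below is a toy floating-point model, not the FPGA design: it uses 4 channels and arbitrary taps rather than 64 channels and a designed prototype filter, and the function names are hypothetical. With the sign convention assumed here, the sum across phases is an unnormalized inverse DFT; an FFT core computes the same set of sums up to index ordering. The reference function mixes one channel to baseband, filters it, and decimates, so the two results can be compared.

```python
import cmath

def polyphase_channelizer(x, h, M):
    """Critically sampled M-channel channelizer.
    x: input samples, h: prototype low-pass taps (length a multiple of M).
    Returns a list of output frames, each holding M channel values."""
    taps_per_phase = len(h) // M
    frames = []
    for n in range(len(x) // M):
        # Polyphase filter: phase p uses taps h[p], h[p+M], h[p+2M], ...
        v = []
        for p in range(M):
            acc = 0j
            for k in range(taps_per_phase):
                idx = n * M - p - k * M
                if idx >= 0:
                    acc += h[p + k * M] * x[idx]
            v.append(acc)
        # DFT-type sum across the M phase outputs, one bin per channel
        frames.append([
            sum(v[p] * cmath.exp(2j * cmath.pi * c * p / M) for p in range(M))
            for c in range(M)
        ])
    return frames

def direct_channel(x, h, M, c, n):
    """Reference: mix channel c to baseband, filter with h, keep sample n*M."""
    acc = 0j
    for m in range(len(h)):
        idx = n * M - m
        if idx >= 0:
            acc += h[m] * x[idx] * cmath.exp(-2j * cmath.pi * c * idx / M)
    return acc

M = 4
h = [0.1 * (i + 1) for i in range(8)]  # toy taps, not a designed filter
x = [complex((3 * i) % 7, i % 3) for i in range(16)]
frames = polyphase_channelizer(x, h, M)
```

The polyphase form does the filtering once for all channels and only at the decimated rate, which is why it maps so compactly onto hardware compared with one mixer and filter per channel.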
Filter coefficients are stored in on-chip block RAMs. A new phase of the 64-phase polyphase filter is rotated into the multipliers on every clock cycle. There are 64 phases x 8 taps = 512 coefficients.
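One way to picture the coefficient rotation is as a table with one row per phase, read out one row per clock cycle. The sketch below assumes the usual polyphase ordering, in which phase p holds prototype taps p, p + 64, p + 128, and so on, and assumes the phases are visited in ascending order; the slide states only that one phase is presented per clock. Names and placeholder values are hypothetical.

```python
PHASES = 64
TAPS_PER_PHASE = 8

def build_phase_table(h):
    """Split a prototype filter into per-phase coefficient sets.
    Row p holds the taps that phase p presents to the multipliers."""
    assert len(h) == PHASES * TAPS_PER_PHASE
    return [[h[p + k * PHASES] for k in range(TAPS_PER_PHASE)]
            for p in range(PHASES)]

def coefficients_for_cycle(table, cycle):
    """Coefficient set driving the multipliers on a given clock cycle."""
    return table[cycle % PHASES]

h = list(range(PHASES * TAPS_PER_PHASE))  # placeholder values 0..511
table = build_phase_table(h)
first = coefficients_for_cycle(table, 0)
wrapped = coefficients_for_cycle(table, 65)
```

With placeholder values equal to their own indices, each row shows directly which prototype taps feed the eight multipliers on that cycle.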
Conclusion
Efficient FPGA instantiation of DSP algorithms requires exploitation of the FPGA vendor's architecture.
Xilinx's Virtex-II architecture is especially amenable to systolic computation structures.
FPGA architectures may present non-obvious instantiation choices that are more efficient than a typical textbook approach.
Algorithms can and should be modified for parallelized data flow instantiation.