
Methods for Implementation of Feedback Loops in High Speed FPGA Applications


Nima Safari, Volker Mauer, Shahin Gheitanchi
Wireless SSE
Altera Corporation
High Wycombe, UK
nsafari@altera.com, vmauer@altera.com, sgheitan@altera.com
Abstract: In many Digital Signal Processing (DSP) modules,
increasing the number of pipelining stages to achieve higher
throughput may break the module functionality if a feedback
loop exists in the algorithm. This paper presents a novel
algorithmic-level technique to modify the implementation of
feedback loops to allow deeper pipelining while sustaining the
module functionality. An equivalent model for a first-order
Infinite Impulse Response (IIR) filter can be obtained by a
cascade model including a higher order repeated-pole IIR filter
followed by a Finite Impulse Response (FIR) filter. The order of
the repeated-pole IIR filters, and hence the number of pipelining
stages can be chosen to meet the Fmax requirements. The model
is further developed to include a class of mathematical recursive
functions to cover many different DSP applications.
Keywords: FPGA, Fmax, IIR filters, feedback loop, recursive
functions, pipelining.

I. INTRODUCTION
FPGAs are required to provide higher throughput to
support high sampling rate applications. Digital Front End
(DFE) modules in the next generation of wireless/mobile
communication systems need to support 100 MHz bandwidth
for multi-standard multi-carrier applications. This bandwidth
requirement calls for at least a 5x sampling rate to be able to
run DFE modules such as Crest Factor Reduction (CFR)
and/or Digital Predistortion (DPD). Increasing the number of
pipelining stages is a common approach to meeting timing
constraints throughout a digital design in FPGAs. However,
modules containing feedback loops are particularly challenging.
In high bandwidth feedback loops, all the closed-loop
calculations must be completed within one sample period, so
increasing the number of pipelining stages in the loop to
achieve higher performance can lead to functional failure.
Feedback loops are widely used in DSP applications such as
IIR filters, Phase-Locked Loops (PLLs), Proportional-Integral
(PI) controllers, carrier-phase trackers, Automatic Gain
Controllers (AGCs), Max and Min functions, etc. Therefore,
modifying the implementation of feedback loops so that
arbitrary pipeline registers can be inserted while sustaining the
loop functionality may lead to a significant breakthrough in
achieving the desired Fmax. Fmax is usually defined as the
maximum clock rate at which a device can run without
violating timing constraints.
One solution to the problem stated above is to use an
FIR approximation of the recursive function [1]-[2] to
remove the recursion and thereby allow the design to be
pipelined. This solution is an approximation and may lead to
a significant increase in resource usage.
Another solution is signal decimation. This
solution is applicable if the sample rate is lower than the clock
rate, so that multiple clock cycles can be used to complete the
feedback chain computations [3].
In [4]-[5], a technique called Scattered-Lookahead was
presented for IIR filter pipelining by adding extra poles and
zeros to the original filter. However, the technique is
limited to IIR filters.
In this paper, we propose a mathematically equivalent
system in which the number of pipelining stages can be
arbitrarily chosen to meet the Fmax requirement. The modified
structure preserves the stability conditions of the original IIR
filter, i.e. if the original IIR filter is stable, the pipelined
structure will also be stable.
The method was originally developed to resolve the
pipelining problem in IIR filters; however, the algorithm is further
generalized to cover recursive functions satisfying associative,
distributive and commutative properties. This generalization is
particularly important as the method can then be deployed
for many math functions such as max, min, norm, multiply,
etc.
It should be pointed out that the benefit of the new
structure is obtained at the expense of higher resource usage.
As discussed later, the logic usage increases linearly with the
number of pipelining stages required to hit the desired Fmax.
The increase in resource count is insignificant when compared
against overall design utilization.
The rest of the paper is organized as follows: Section II
states the problem of feedback loops in DSP applications. The
proposed solutions to meet the timing constraints in IIR filters
and generalized recursive functions are presented in Section III.
In Section IV, we give a Crest Factor Reduction (CFR) example
to illustrate the benefit of using the proposed architecture for
high speed DSP applications in FPGAs. Finally, conclusions
are drawn in Section V.

II. FEEDBACK-LOOPS IN DSP APPLICATIONS


Single-cycle feedback loops are used in many DSP
applications. An accumulator or a single-pole IIR filter can be
represented by:

(1)

The Z-transform of the system impulse response is then given by:


(2)
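As a sketch of the model described here, assuming an input x[n], an output y[n] and a constant loop gain g (the symbol the paper later uses for the gain), a single-pole recursion and its transfer function can be written as follows. The paper's own notation and sign convention may differ.

```latex
\begin{align*}
y[n] &= x[n] + g\,y[n-1] \\
H(z) &= \frac{Y(z)}{X(z)} = \frac{1}{1 - g\,z^{-1}}
\end{align*}
```

With g = 1 this reduces to a plain accumulator.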
When the sample rate equals the system clock rate, the
module must finish the closed-loop computations in one
clock cycle. In the accumulator with the transfer function given
above, the closed-loop computation includes one multiplication
and one addition (or subtraction). These operations typically
require several pipeline stages when implemented in high
Fmax applications. However, if the sample rate is the same as
the clock rate, only one register stage can be inserted without
breaking the functionality of this loop. It is therefore highly
desirable to have a method that lets designers insert pipelining
registers in single-cycle IIR filters to achieve the desired speed
while sustaining the loop functionality.
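The effect of an extra register inside the loop can be seen in a small behavioural simulation. This is a Python sketch rather than HDL; the function name `accumulate`, the gain `g` and the parameter `loop_delay` (the number of registers in the feedback path) are illustrative assumptions, and the registers are assumed to reset to 0.

```python
def accumulate(xs, g=1, loop_delay=1):
    """Simulate y[n] = x[n] + g * y[n - loop_delay] with registers reset to 0.

    loop_delay=1 models the single-cycle loop; a larger value models
    naively adding registers inside the loop without changing the algorithm.
    """
    ys = []
    for n, x in enumerate(xs):
        feedback = ys[n - loop_delay] if n >= loop_delay else 0
        ys.append(x + g * feedback)
    return ys


samples = [1, 1, 1, 1, 1]
single_cycle = accumulate(samples)                 # intended running sum
two_registers = accumulate(samples, loop_delay=2)  # naive extra register

print(single_cycle)   # [1, 2, 3, 4, 5]
print(two_registers)  # [1, 1, 2, 2, 3]
```

With two registers the loop splits into two interleaved accumulators, so the output no longer matches the intended running sum.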
III. PIPELINED FEEDBACK-LOOPS

In this section, we first focus on IIR filters and modify the
transfer function of the filter to reach an equivalent
mathematical expression with more delays in the feedback
loop. Later we generalize the technique to cover more
operations inside the feedback loop to pipeline a number of
recursive functions.

A. IIR Filters

The difference of two powers can be factorized as:

(3)

Repeating the factorization, we have:

(4)

Thus, the transfer function of the single-cycle accumulator in
(2) can be rewritten as follows:

(5)

The equivalent model is a higher order IIR filter that can be
implemented using a cascade of a multi-cycle IIR filter followed
by an FIR filter. The multi-cycle IIR filter can now be
arbitrarily pipelined by selecting the K value. The FIR filter
contains no loop and hence places no limitation on
pipelining.

Figure 1 shows the Direct-form II structure for a single-pole
IIR filter implementation. The equivalent model allows
pipelining in the feedback loop, and can therefore achieve a
higher Fmax. It should be mentioned that the new structure
demands more logic. In fact the logic usage
increases linearly with the number of extra pipelining registers
inserted in the feedback loop. However, since the feedback
structures only account for a small portion of the overall
design, increasing the size of this structure does not have a
significant impact on the overall size.

Stability is always a concern when using IIR filters. It is worth
mentioning that the proposed model is stable if the original
IIR filter is stable, as no extra different poles are added in the
transfer function.

The other advantage of the proposed model is that it is
mathematically equivalent to the original model and therefore
the outputs of the original and the proposed models are
identical. This technique can be generalized to construct n-th
order IIR filters, as any IIR filter of order n can be
reconstructed using a cascade/parallel realization of first order
IIR filters.

Figure 1 The equivalent IIR filter implementation for loop pipelining.

B. Generalized Technique for Recursive Functions

The technique proposed earlier for IIR filters is a special case
of a more generalized architecture. Here, we show that the
technique can be generalized to cover many different recursive
functions. In fact the summation operation used in
accumulators can be replaced by a more general operation.

Claim: If the operation satisfies the following properties:
1. Commutative
2. Associative
3. Factorization
then the single-cycle recursive structure given by (please refer to
Figure 2 for notations)

(6)

can be represented by:

(7)

The proof of this claim is given in the Appendix. Figure 2
shows the equivalent model for more general operations
satisfying the mentioned properties. As the figure shows, the
proposed model consists of a feedback structure with the
desired number of pipelining registers followed by a
feed-forward architecture that can be pipelined for the desired Fmax.
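The idea behind the claim can be sketched by unrolling the recursion. The notation below is an assumption rather than the paper's own: x[n] and y[n] denote the loop input and output, g the constant gain, \oplus the loop operation, and K the number of registers placed in the loop. Property 3 is read here as g(a \oplus b) = ga \oplus gb.

```latex
\begin{align*}
y[n] &= x[n] \oplus g\,y[n-1] \\
     &= x[n] \oplus g\,\bigl(x[n-1] \oplus g\,y[n-2]\bigr) \\
     &= x[n] \oplus g\,x[n-1] \oplus g^{2}\,y[n-2] \\
     &\;\;\vdots \\
     &= \Bigl(\bigoplus_{k=0}^{K-1} g^{k}\,x[n-k]\Bigr) \oplus g^{K}\,y[n-K]
\end{align*}
```

The last line feeds back y[n-K] rather than y[n-1], so K registers fit inside the loop, while the bracketed term contains no feedback. Commutativity and associativity are what allow the terms to be regrouped, and the same three properties allow the loop-free term to be placed after the loop instead of before it.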
Besides IIR filters, the generalized model can be used to
reconstruct pipelined loops with different operations and
hence for a variety of applications.
Max/Min functions are another set of useful operations that
satisfy all three properties, and thus the proposed structure can
be used to add pipelining registers in the feedback loop.

Figure 2 Equivalent model for loop-pipelining in recursive structures.

The multiplication operation satisfies the first and second
properties, but not the last one; therefore the method may not
be applied. However, if the constant gain is set to 1 (g = 1), the
technique can be applied to recursive multiplications as well.
This leads to applications such as Factorial and Geometric-Mean
calculations.
It should be mentioned that the subtraction operation can be
realized by the summation operation with a negative g, so
the method can also be applied to subtractions.
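The cases discussed in this section can be checked numerically. The following Python sketch is a behavioural model, not the authors' implementation: the function names, the start-up convention (an empty loop register passes the input through unchanged) and the reading of the factorization property as g(a op b) = (ga) op (gb) are assumptions. It compares a single-cycle recursion against a K-register loop followed by a loop-free feed-forward stage, using exact rational arithmetic so the outputs can be compared for equality.

```python
from fractions import Fraction
from functools import reduce
import operator


def recursive_single_cycle(xs, op, g):
    """y[n] = op(x[n], g * y[n-1]); the loop register starts empty, so y[0] = x[0]."""
    ys = []
    for n, x in enumerate(xs):
        ys.append(x if n == 0 else op(x, g * ys[n - 1]))
    return ys


def recursive_pipelined(xs, op, g, K):
    """K-register loop followed by a loop-free feed-forward stage.

    Loop:         w[n] = op(x[n], g**K * w[n-K])
    Feed-forward: y[n] = op over g**k * w[n-k] for k = 0 .. K-1
    """
    gK = g ** K
    w = []
    for n, x in enumerate(xs):
        w.append(x if n < K else op(x, gK * w[n - K]))
    ys = []
    for n in range(len(xs)):
        taps = [g ** k * w[n - k] for k in range(min(K, n + 1))]
        ys.append(reduce(op, taps))
    return ys


def matches(xs, op, g, K):
    """True when the pipelined model reproduces the single-cycle output."""
    return recursive_pipelined(xs, op, g, K) == recursive_single_cycle(xs, op, g)


xs = [Fraction(v) for v in (3, -1, 4, 1, -5, 9, 2, 6, -5, 3)]
half, one = Fraction(1, 2), Fraction(1)

for K in range(1, 6):
    assert matches(xs, operator.add, half, K)   # first-order IIR / accumulator
    assert matches(xs, operator.add, -half, K)  # subtraction via negative g
    assert matches(xs, max, one, K)             # running maximum
    assert matches(xs, min, one, K)             # running minimum
    assert matches(xs, operator.mul, one, K)    # running product, g = 1

# Multiplication with g != 1 violates the factorization property.
assert not matches(xs, operator.mul, Fraction(2), 2)
```

The final assertion illustrates why multiplication only qualifies when g = 1: with g = 2 the pipelined output diverges from the single-cycle output from the third sample onwards.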
IV. CFR EXAMPLE

A radio Digital Front End (DFE) supporting Long-Term
Evolution Advanced (LTE-A) is required to run at a sample
rate of 491 MHz to support bandwidths up to 100 MHz. One
of the most challenging algorithms to run at the sample rate is
crest factor reduction (CFR), as it contains a feedback loop.
CFR modules are used in DFEs to detect and cancel the peaks
in the transmitted modulated signal to mitigate distortion when
using nonlinear power amplifiers.
Figure 3 shows the block diagram of the peak searching
sub-module in the CFR implementation. The peak searching module
needs to select the maximum value in a continuous stream of
samples. This is typically done by comparing the incoming
sample against the previously found maximum and storing the
result as the new maximum.

Figure 3 The realization of Max(.) function in CFR.

As the figure shows, the loop has only a single register;
therefore the closed-loop calculations, including a comparator
and a multiplexer, must be carried out in a single clock cycle.
Increasing the number of registers would alter the algorithm and
therefore break the overall functionality.

Figure 4 Realization of Max(.) function with the proposed
technique. Two registers are added in the feedback loop to hit the
desired Fmax.
TABLE I. FMAX ACHIEVEMENT IN QUARTUS, TOGETHER WITH LOGIC USAGES.

Model             CFR IP Fmax, SV    Resource Usage (ALMs)
Original Model    396 MHz            4069
Proposed Model    495 MHz            4267

The feedback loop including the Max(.) function can be modified
according to the proposed structure to allow more pipelining
registers to be inserted in the loop. Figure 4 shows the
proposed structure with 2 delays (K = 2). The CFR module is
implemented using Altera's high level synthesis tool, DSP
Builder Advanced (DSPBA) [6]. Table I shows the Fmax and
resource estimation results after the CFR modification
according to the proposed technique. For the resource
estimation we ran both designs with a 491 MHz system clock. As
the results show, the module with three iteration CFR blocks
can reach an Fmax of 495 MHz, and hence an increase of around 100
MHz is achieved at the expense of an increase of around 5% in
resource utilization. As the CFR module is a relatively small
portion of a full radio DFE, the increase in overall resource
utilization is kept below 1%, while meeting the desired Fmax
requirement.
It should be pointed out that the 491 MHz requirement is met
by adding 2 registers in the feedback loop; if a higher
Fmax is required, more registers can be inserted
according to (7) to meet the requirement.
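The rounded figures quoted in the text can be checked directly against Table I. The short Python sketch below only repeats that arithmetic; the variable names are illustrative.

```python
# Figures taken from Table I (Fmax in MHz, resource usage in ALMs).
fmax_original, fmax_proposed = 396, 495
alms_original, alms_proposed = 4069, 4267
target_clock = 491  # required sample rate in MHz

fmax_gain = fmax_proposed - fmax_original
fmax_gain_pct = 100 * fmax_gain / fmax_original
alm_increase = alms_proposed - alms_original
alm_increase_pct = 100 * alm_increase / alms_original

print(f"Fmax gain: {fmax_gain} MHz ({fmax_gain_pct:.1f} %)")
print(f"ALM increase: {alm_increase} ({alm_increase_pct:.1f} %)")
print(f"Margin over {target_clock} MHz: {fmax_proposed - target_clock} MHz")
```

This gives a gain of 99 MHz (25.0 %) for 198 additional ALMs (4.9 %), with a 4 MHz margin over the 491 MHz target.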

V. CONCLUSIONS

In this paper we presented a novel algorithmic-level
technique to significantly increase Fmax by modifying
feedback loops used in many DSP applications across FPGA
vertical markets. The proposed technique is flexible: the
closed-loop latency, and hence the number of pipelining
stages, can be adjusted to achieve the desired Fmax for
various targets.
It works by increasing the pipeline stages while maintaining
the original functionality. This technique can be utilized by
engineers and can also be built into high level synthesis tools
(such as DSP Builder Advanced) to eliminate Fmax
bottlenecks in feedback loops and get close to maximum
silicon speed.

ACKNOWLEDGMENT

The authors would like to thank Altera's Wireless System
Solutions Group and DSP Builder team for their kind support,
guidance and advice.

APPENDIX

Please refer to Figure 2 for the notations.
Starting from the output of the multi-cycle equivalent
recursive model we have:

(8)

(9)

(10)

The second argument in the function can then be rewritten
using property 2 as:

(11)

(12)

which is identical to the output of the single-cycle
recursive structure.

REFERENCES

[1] G. Beylkin, "On Factored FIR Approximation of IIR Filters,"
Applied and Computational Harmonic Analysis, vol. 2, pp. 293-298,
1995.
[2] Y. Yamamoto, B. D. O. Anderson, M. Nagahara, and Y.
Koyanagi, "Optimizing FIR Approximation for Discrete-Time
IIR Filters," IEEE Signal Processing Letters, vol. 10, no. 9,
September 2003.
[3] A. Krukowski and I. Kale, DSP System Design: Complexity
Reduced IIR Filter Implementation for Practical Applications,
Boston: Kluwer Academic Publishers, 2003.
[4] M. Francis, "Infinite Impulse Response Filter Structures in
Xilinx FPGAs," Xilinx white paper, Aug. 2009.
[5] K. K. Parhi and D. G. Messerschmitt, "Pipeline Interleaving and
Parallelism in Recursive Digital Filters, Parts I and II," IEEE
Trans. Acoustics, Speech, and Signal Processing.
[6] "Altera DSPBA," [Online]. Available:
http://www.altera.co.uk/technology/dsp/advanced-blockset/dspadvanced-blockset.html.
