DESIGN OF PARALLEL BIQUAD FOR IMPLEMENTING IIR FILTERS
PRASANTH KRISHNA K Dept. of Electronics and Communication, Amrita University, Amrita School of Engineering Kollam, Kerala - 690 525, India
ABSTRACT: IIR filters of order greater than four are generally implemented as a cascade of second-order biquads due to concerns over stability. As the order of the filter increases, the number of biquads that have to be cascaded also increases, which restricts the maximum speed achievable by the filter. In this paper I present a three-parallel implementation of a digital biquad, which can be used as a building block for higher-order IIR filters. Since the proposed biquad block is three-parallel, an IIR filter built from it can process 3 inputs at a time to give 3 outputs, effectively reducing the time taken to produce one output to one third. The same structure can be used to reduce power in applications with modest performance requirements. The simple biquad structure and the three-parallel biquad structure were modelled in MATLAB and implemented in Verilog. The results of the Verilog implementation were verified through FFT analysis and by vector matching against the results of the MATLAB model.
KEYWORDS: Biquad, IIR Filter, FIR Filter, Quantization, Fixed Point Representation, Booth Recoding, Wallace Tree, Convergent Rounding, Unfolding, FFT.
INTRODUCTION
Compared to FIR filters, IIR filters are less complex and take much less area, since they use fewer multipliers and adders to achieve the same specification. However, since IIR filters have feedback, they are nonlinear in phase and can be unstable. Applications that do not require linear phase can use IIR filters in place of FIR filters, which often need a higher order to meet the specification and therefore consume more power and area. Practical IIR filters are of order greater than four and are implemented as cascades of biquads due to concerns over stability. Biquads, which are second-order recursive filters containing two poles and two zeros, can be implemented in various forms such as Direct Form I, Direct Form II and the Transpose forms, Francis (2009) and Karen et al. (2013). Of these structures, Transpose Direct Form II is canonical, as it uses the minimum number of delay elements, multipliers and adders. Apart from the above-mentioned structures there are other structures that are optimized for performance or area. In this paper the biquad is implemented in the Transpose Direct Form II structure, since it is hardware efficient and can be modularized for reuse. The simple biquad designed in the Direct Form II structure forms the heart of the 3-parallel biquad, which can be used as a building block of an IIR filter. The design steps are algorithmic in nature and involve a number of iterations. The resulting system design is optimal in terms of hardware requirement and meets the maximum error requirement for the design under consideration.
IIR FILTER BIQUAD SYSTEM DESIGN
An IIR filter biquad generally comprises adders and multipliers along with delay elements. As discussed earlier, IIR filters have a feedback path, so ensuring stability is an important design consideration. An IIR biquad can be represented mathematically as in (1), where b0 to b2 are the feed-forward coefficients and a1 and a2 are the feedback coefficients. All the coefficients are normalized such that the coefficient a0 is unity. x(n) and y(n) are the input and the output of the biquad. The feedback terms are subtracted, following the MATLAB sign convention under which the coefficients in Table 2 were generated.
$$y(n) = b_0\,x(n) + b_1\,x(n-1) + b_2\,x(n-2) - a_1\,y(n-1) - a_2\,y(n-2) \tag{1}$$
To design and verify the biquad structures, a standard set of filter specifications was used. The filter specifications are tabulated in Table 1. The coefficients of the filter (b0, b1, b2, a1, a2) were generated using the Filter Design and Analysis Tool (FDA Tool) provided by MATLAB. The FDA Tool also gives a gain value by which the input signal is multiplied before it is processed. The coefficients, tabulated in Table 2, were computed for a low-pass Butterworth filter realized in the Transpose Direct Form II structure. Apart from the specifications mentioned in Table 1, other filter design specifications can also be added to generate the coefficients. The basic biquad structure is shown in Fig. 1. The Direct Form II implementation requires a total of six multipliers and three adders along with two delay elements. The gain can be adjusted so that the coefficients b0 to b2 are powers of two. Multiplication by a power of two can be implemented efficiently as a shift in the digital domain, thereby reducing the number of multipliers required for the implementation.
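The structure described above can be sketched in a few lines of Python. This is a minimal double-precision model, not the MATLAB or Verilog code of this work: the function names `biquad_tdf2` and `biquad_direct` are hypothetical, the coefficient values are those of Table 2, and the feedback terms are assumed to be subtracted (MATLAB sign convention, a0 = 1).

```python
# Table 2 values (double precision); MATLAB sign convention assumed.
B = (1.0, 2.0, 1.0)
A = (1.0, -1.9555782403150355, 0.95654367651120342)
GAIN = 0.00024135904904198073


def biquad_tdf2(x, b=B, a=A, gain=GAIN):
    """Filter the sequence x with one biquad in Transposed Direct Form II."""
    b0, b1, b2 = b
    _, a1, a2 = a              # a0 is assumed to be normalized to 1
    s1 = s2 = 0.0              # the two delay elements
    y = []
    for sample in x:
        v = gain * sample      # input scaling by the section gain
        out = b0 * v + s1
        s1 = b1 * v - a1 * out + s2
        s2 = b2 * v - a2 * out
        y.append(out)
    return y


def biquad_direct(x, b=B, a=A, gain=GAIN):
    """Reference: evaluate the difference equation term by term."""
    b0, b1, b2 = b
    _, a1, a2 = a
    v = [gain * s for s in x]
    y = []
    for n in range(len(v)):
        acc = b0 * v[n]
        if n >= 1:
            acc += b1 * v[n - 1] - a1 * y[n - 1]
        if n >= 2:
            acc += b2 * v[n - 2] - a2 * y[n - 2]
        y.append(acc)
    return y
```

Feeding a unit step into `biquad_tdf2` should settle close to 1, since the gain in Table 2 normalizes the DC gain of the low-pass section to unity.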
Biquad – Data Representation
Signed fixed-point representation was used for the input and output of the biquad. Since the input data range and the coefficients are fixed, fixed-point representation reduces the hardware requirement of the biquad while maintaining the required accuracy. In fixed-point representation the integer part and the fractional part are each represented in a fixed number of bits. Compared with floating-point representation, this can reduce the area required for implementation of the biquad by 2x - 3x, as mentioned in Andrew (2011).
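As an illustration of signed fixed-point representation, the sketch below converts a real value to an integer code and back. The helper names `to_fixed` and `to_real` are hypothetical, and saturation at the range limits is an assumption made here for safety; the source does not say how out-of-range values are handled. The sign bit is counted as part of the integer bits.

```python
def to_fixed(value, int_bits, frac_bits):
    """Quantize a real value to a signed fixed-point integer code (saturating)."""
    total = int_bits + frac_bits              # sign bit is counted in int_bits
    lo, hi = -(1 << (total - 1)), (1 << (total - 1)) - 1
    code = round(value * (1 << frac_bits))    # nearest representable step
    return max(lo, min(hi, code))             # clamp instead of wrapping


def to_real(code, frac_bits):
    """Convert an integer code back to the real value it represents."""
    return code / (1 << frac_bits)
```

With 2 integer bits and 26 fractional bits, for example, the representable range is -2 to just under +2 in steps of 2^-26, so an input in the range -1 to +1 fits without saturation.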
Figure 1. Biquad Structure – Direct Form II.
Table 1. IIR low pass filter specifications.
| Filter Parameter | Value |
|---|---|
| Pass band frequency | 100 kHz |
| Stop band frequency | 500 kHz |
| Sampling frequency | 20 MHz |
Table 2. Biquad Coefficients - Double Precision.
| Coefficient | Value |
|---|---|
| b0 | 1 |
| b1 | 2 |
| b2 | 1 |
| a0 | 1 |
| a1 | -1.9555782403150355 |
| a2 | 0.95654367651120342 |
| Gain | 0.00024135904904198073 |
Figure 2. Filter input quantization algorithm.
Biquad – Input Quantisation
As the IIR filter is implemented in the digital domain, the input bit width needs to be fixed. Since the input signal ranges from -1 to +1, and to minimize area, signed fixed-point representation was used for the data. Given the input data range, only two bits are required to represent the integer part. The bit width of the fractional part is computed so that the error introduced by quantizing the input is 0.001 dB or less. The error was computed using (2), where IdealFilterOut is the ideal filter output and InputQuantizedFilterOut is the output of the filter with the input quantized to a fixed number of bits.
(2)
The algorithm used for finding the width is shown in Fig. 2. A sine wave in the pass band (50 kHz) and a signal in the transition band (150 kHz) were used to calculate the required precision. After passing the two inputs to the algorithm, the maximum value of ‘N’ was taken as the length required for representing the input. The total input width came to 28 bits: 2 bits for the integer part including the sign, and 26 bits for the fractional part. Since the input is 28 bits the output word length is also 28 bits. The same design algorithm can be used to find the input and output word length for a different error requirement. As the permitted error decreases, the word length increases, resulting in a higher hardware requirement.
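The iterative search for the fractional width can be sketched as follows. This is only an illustration of the idea: the error metric used here (difference in RMS level, in dB, between the ideal output and the output with a quantized input) is an assumption standing in for (2), and the test amplitude, the number of samples and all function names are hypothetical. The word length it returns depends on those choices and need not match the 26 fractional bits reported above.

```python
import math

FS = 20e6                                    # sampling frequency from Table 1
B = (1.0, 2.0, 1.0)                          # Table 2 coefficients
A1, A2 = -1.9555782403150355, 0.95654367651120342
GAIN = 0.00024135904904198073


def biquad(x):
    """Double-precision biquad (Transposed Direct Form II)."""
    s1 = s2 = 0.0
    y = []
    for sample in x:
        v = GAIN * sample
        out = B[0] * v + s1
        s1 = B[1] * v - A1 * out + s2
        s2 = B[2] * v - A2 * out
        y.append(out)
    return y


def quantize(x, frac_bits):
    """Round every sample to frac_bits fractional bits."""
    scale = 1 << frac_bits
    return [round(s * scale) / scale for s in x]


def rms(x):
    return math.sqrt(sum(s * s for s in x) / len(x))


def error_db(ideal, quantized):
    """Assumed error metric: RMS level difference in dB."""
    r_ideal, r_quant = rms(ideal), rms(quantized)
    if r_ideal == 0.0 or r_quant == 0.0:
        return math.inf
    return abs(20.0 * math.log10(r_quant / r_ideal))


def fraction_bits_needed(freq_hz, tol_db=0.001, n_samples=4000, max_bits=40):
    """Smallest number of fractional bits whose error is within tol_db."""
    x = [0.9 * math.sin(2 * math.pi * freq_hz * n / FS) for n in range(n_samples)]
    ideal = biquad(x)
    for bits in range(1, max_bits + 1):
        if error_db(ideal, biquad(quantize(x, bits))) <= tol_db:
            return bits
    raise ValueError("tolerance not reached within max_bits")


def input_fraction_bits(freqs=(50e3, 150e3), tol_db=0.001):
    """Take the larger requirement over the test frequencies."""
    return max(fraction_bits_needed(f, tol_db) for f in freqs)
```

Taking the maximum over the pass band and transition band tones mirrors the step in which the larger value of ‘N’ is kept.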
Biquad – Coefficient and Gain Quantization
The gain and coefficients generated by the FDA Tool are in double-precision format. To implement the filter, the coefficients and gain have to be quantized to a finite word length in a way that minimizes the quantization error, thereby preserving precision, while also minimizing the hardware required to meet that precision. As the biquad has a feedback path, the stability of the design also needs to be taken into consideration while the coefficients are quantized. The algorithm used for quantization of the coefficients and gain is shown in Fig. 3.
A pass band signal and a transition band signal were used for finding the width of the gain and coefficients. After completing the iterations, the coefficient width was computed to be 13 bits and the gain width to be 22 bits. Since the biquad uses signed fixed-point representation, and for re-usability of the design, two bits are allotted to the integer part and the rest to the fractional part for both gain and coefficients. The stability of the quantized filter was verified by importing the quantized filter into the FDA Tool.
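The effect of the chosen widths can be checked with a short sketch. It does not reproduce the iterative search of Fig. 3; it simply applies the reported widths (13 bits for the coefficients, 22 bits for the gain, 2 integer bits each) to the double-precision values of Table 2, assuming round-to-nearest, and checks the pole radii as a stand-in for the stability check performed in the FDA Tool. The function names are hypothetical.

```python
import cmath

COEFFS = {"a1": -1.9555782403150355, "a2": 0.95654367651120342}
GAIN = 0.00024135904904198073
INT_BITS = 2  # integer part including sign, as stated in the text


def quantize(value, total_bits, int_bits=INT_BITS):
    """Round value to a signed fixed-point grid with total_bits bits."""
    frac_bits = total_bits - int_bits
    scale = 1 << frac_bits
    return round(value * scale) / scale


def poles(a1, a2):
    """Roots of z^2 + a1*z + a2, the denominator of the biquad."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return (-a1 + disc) / 2.0, (-a1 - disc) / 2.0


def is_stable(a1, a2):
    """A biquad is stable when both poles lie inside the unit circle."""
    return all(abs(p) < 1.0 for p in poles(a1, a2))


a1_q = quantize(COEFFS["a1"], 13)
a2_q = quantize(COEFFS["a2"], 13)
gain_q = quantize(GAIN, 22)
```

Under these assumptions the quantized values correspond to the integer codes -4005/2^11, 1959/2^11 and 253/2^20, and the quantized poles remain inside the unit circle.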
Biquad – Data-path Truncation
As illustrated in the biquad architecture, adders and multipliers form the integral part of the filter. When two signed ‘N’ bit numbers are multiplied the result can need up to 2N bits (2N-1 bits unless both operands are the most negative value), and if further multiplications are performed the word length grows to the point where it cannot be implemented in a practical digital system. The same applies to adders: when two ‘N’ bit signed numbers are added, the result should be of length N+1 to avoid overflow. In the biquad the input and the feedback values are multiplied by the coefficients, so the word length of each result becomes the combined size of the two operands. This increases the size of the data-path. As the results of the multipliers go to the adders, their size also needs to be increased, which in turn increases the number of flip-flops required in the delay elements. To limit this, the data path has to be rounded in such a way that the error reflected at the output of the filter is less than 0.01 dB. Unbiased rounding, also called convergent rounding, was used as the rounding algorithm. This reduces the bias error that would be introduced if other rounding algorithms were used. The generic convergent rounding algorithm explained in Freescale Semiconductors (2005) was implemented in the design.
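The word-length growth described above can be illustrated numerically. The sketch below, with hypothetical helper names, computes the worst-case two's-complement width of a product or sum from the extreme operand values; it is a check on the arithmetic, not part of the original design flow.

```python
import operator


def signed_width(value):
    """Minimum two's-complement width that can hold the integer value."""
    magnitude = value if value >= 0 else -value - 1
    return magnitude.bit_length() + 1


def worst_case_width(op, n_bits, m_bits):
    """Widest result of op over all signed n_bits by m_bits operand pairs.

    Products and sums are extreme at the corners of the operand ranges,
    so only the four corner combinations need to be evaluated.
    """
    lo_n, hi_n = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    lo_m, hi_m = -(1 << (m_bits - 1)), (1 << (m_bits - 1)) - 1
    corners = [op(a, b) for a in (lo_n, hi_n) for b in (lo_m, hi_m)]
    return max(signed_width(c) for c in corners)


product_bits = worst_case_width(operator.mul, 28, 13)   # data times coefficient
sum_bits = worst_case_width(operator.add, 28, 28)       # two data words added
```

For a 28-bit data word and a 13-bit coefficient this gives a 41-bit product, and adding two 28-bit words gives 29 bits, which is why the data path is rounded back after multiplication.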
Figure 3. Filter coefficient quantization algorithm
IIR FILTER BIQUAD SYSTEM DESIGN RESULTS
The algorithms discussed above were implemented in MATLAB. Test sinusoids of different frequencies, one in the pass band and the other in the transition band, were generated and given as input to the algorithms. The results of the system design are consolidated in Table 3. The bit width represents the total number of bits required to represent each quantity. These digital hardware specifications serve as inputs to the digital implementation of the biquad in Verilog.
Table 3. Biquad Coefficients - Quantized Values and Bit Widths.
| Coefficient | Value | Bit Width |
|---|---|---|
| b0 | 1 | - |
| b1 | 2 | - |
| b2 | 1 | - |
| a0 | 1 | - |
| a1 | -1.95556640625 | 13 |
| a2 | 0.95654296875 | 13 |
| Gain | 0.00024127960205078125 | 22 |
| Input | - | 28 |
| Output | - | 28 |
MATLAB MODELING OF BIQUAD
Using the system design results, the IIR filter was modelled in MATLAB. The structure shown in Fig. 1, the Transpose Direct Form II implementation, was modelled because this topology is canonical in terms of the delay elements required. The multiplier used in the design uses the Modified Booth Recoding algorithm to reduce the number of partial products, along with a Wallace Tree for adding the partial products, to reduce the computation time. To limit the data-path word length, unbiased rounding was also implemented in the design. Special care was taken in designing the multiplier for performance and efficiency, as multipliers take up a large chip area and add delay to the critical path.
Multiplier Implementation
A multiplier computes its result by generating a set of partial products and then summing them. Since the multiplier deals with signed numbers, the partial products have to be appropriately sign extended to give the correct result. In addition, the last partial product, which corresponds to the sign bit of the multiplier, has to be negated because the sign bit carries negative weight. An excessive number of partial products and the required sign extension can result in high hardware requirements, which directly increases area, power and delay. To avoid this, the Modified Booth algorithm in Neil & David (2010) was implemented, which reduces the number of partial products to approximately half. To further speed up the addition of the generated partial products, the Wallace Tree approach was implemented. The partial products are added together in stages, thereby reducing the total delay. The stage-wise addition involves grouping the partial products into sets of three rows. If there are ‘r’ rows of partial products then 3 × ⌊r/3⌋ rows are grouped and the remaining rows are passed to the next stage. The grouped rows are summed using a full adder where a column has three bits, or a half adder where it has only two. The sum and carry thus generated are passed on to the next stage. This is repeated until only two rows are left. The method can be generalized and extended to any number of partial products. The height of the partial product matrix at stage n+1 is given by ⌈2r/3⌉, where ‘r’ is the height of the partial product matrix at stage n. Sign extension of the partial products was avoided by using the algorithm described in Israel (2005).
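Both ideas can be illustrated at the integer level. The sketch below shows radix-4 (modified) Booth recoding of a signed multiplier and the row count after each Wallace reduction stage. It is a behavioural illustration with hypothetical function names, not the gate-level multiplier of this work, and it does not model the sign-extension avoidance scheme of Israel (2005).

```python
def booth_radix4_digits(multiplier, width):
    """Recode a signed multiplier into radix-4 Booth digits in {-2..2}."""
    if width % 2:
        width += 1                              # sign-extend to an even width
    assert -(1 << (width - 1)) <= multiplier < (1 << (width - 1))
    bits = multiplier & ((1 << width) - 1)      # two's-complement bit pattern
    digits, prev = [], 0                        # prev is the bit below the group
    for i in range(0, width, 2):
        low = (bits >> i) & 1
        high = (bits >> (i + 1)) & 1
        digits.append(prev + low - 2 * high)
        prev = high
    return digits                               # least significant digit first


def booth_multiply(multiplicand, multiplier, width):
    """Sum one partial product per Booth digit."""
    digits = booth_radix4_digits(multiplier, width)
    return sum(d * multiplicand * 4 ** i for i, d in enumerate(digits))


def wallace_heights(rows):
    """Partial-product matrix height after each Wallace reduction stage."""
    heights = [rows]
    while rows > 2:
        rows = 2 * (rows // 3) + rows % 3       # each group of 3 rows becomes 2
        heights.append(rows)
    return heights
```

A 13-bit coefficient is recoded into 7 digits, so 7 partial products are generated instead of 13, and `wallace_heights(7)` traces the reduction 7, 5, 4, 3, 2, in agreement with the ⌈2r/3⌉ rule.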
Rounding of Multiplier Results
Most DSP processors have a rounding algorithm in place to reduce the data-path width after a multiplication operation. Implementing this has a large advantage in terms of the area required for the design. Since some data is lost in the rounding process, it adds noise to the system, so care must be taken that the error does not have a substantial impact on the filter result. Rounding can be implemented in two ways: biased rounding and unbiased (convergent) rounding. Freescale Semiconductors (2005) describes an unbiased rounding implementation that reduces the average noise imparted to the result. The rounding algorithm is shown in Fig. 4.
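The difference between the two rounding modes can be sketched on integer codes. The functions below are a generic round-half-up and round-half-to-even pair with hypothetical names; they follow the usual definition of convergent rounding and are not a transcription of the DSP56300 procedure. Saturation after rounding is not modelled.

```python
def round_half_up(value, drop_bits):
    """Biased rounding: add half an LSB, then truncate."""
    return (value + (1 << (drop_bits - 1))) >> drop_bits


def round_convergent(value, drop_bits):
    """Unbiased rounding: ties go to the nearest even result."""
    half = 1 << (drop_bits - 1)
    remainder = value & ((1 << drop_bits) - 1)  # discarded bits, always >= 0
    result = value >> drop_bits                 # floor, also for negatives
    if remainder > half or (remainder == half and result & 1):
        result += 1
    return result


def accumulated_error(round_fn, drop_bits, n_steps=4):
    """Sum of rounding errors over n_steps full output steps."""
    return sum((round_fn(v, drop_bits) << drop_bits) - v
               for v in range(n_steps << drop_bits))
```

Summed over whole output steps, the errors of the convergent version cancel, whereas round-half-up accumulates a positive offset because every tie is pushed upward. Dropping 16 bits from a 32-bit product corresponds to `drop_bits=16`.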
Figure 4. Unbiased Rounding - 32-Bit to 16-Bit
Unfolding of Biquad
The simple biquad can process only one input sample at a time. This fundamental limitation bounds the maximum frequency at which the biquad can be operated. Unlike an FIR filter structure, pipelining or parallelization cannot be applied directly to the biquad because of the feedback path. To implement parallelism in a biquad the system has to be unfolded. Parhi (2007) explains an unfolding algorithm, which was applied to the simple biquad. The resulting unfolded structure is shown in Fig. 5. Unlike a feed-forward parallel design, the unfolded design uses the same number of delay elements as the original design. The design can process 3 input samples at a time to give 3 outputs, thereby increasing the effective speed for generating an output.
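The input/output behaviour of a 3-unfolded biquad can be modelled in software as three chained copies of the biquad update per iteration, with only the two state values carried between iterations. The sketch below is such a behavioural model under assumed names and double-precision arithmetic; it does not reproduce the exact wiring of Fig. 5 or the unfolding procedure of Parhi (2007).

```python
B = (1.0, 2.0, 1.0)                          # Table 2 coefficients
A1, A2 = -1.9555782403150355, 0.95654367651120342
GAIN = 0.00024135904904198073


def biquad_step(sample, s1, s2):
    """One Transposed Direct Form II update: (output, new s1, new s2)."""
    v = GAIN * sample
    out = B[0] * v + s1
    return out, B[1] * v - A1 * out + s2, B[2] * v - A2 * out


def biquad_serial(x):
    """Reference model: one sample per iteration."""
    s1 = s2 = 0.0
    y = []
    for sample in x:
        out, s1, s2 = biquad_step(sample, s1, s2)
        y.append(out)
    return y


def biquad_unfolded3(x):
    """Consume 3 samples per iteration; only s1 and s2 persist between them."""
    assert len(x) % 3 == 0, "pad the input to a multiple of 3"
    s1 = s2 = 0.0
    y = []
    for k in range(0, len(x), 3):
        # Three chained copies of the biquad logic within one iteration.
        y0, t1, t2 = biquad_step(x[k], s1, s2)
        y1, u1, u2 = biquad_step(x[k + 1], t1, t2)
        y2, s1, s2 = biquad_step(x[k + 2], u1, u2)
        y.extend((y0, y1, y2))
    return y
```

Because the same operations are applied in the same order, the unfolded model produces exactly the same output sequence as the serial one, while the number of stored state values stays at two.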
IIR FILTER IMPLEMENTATION USING BIQUAD
After modelling the IIR filter biquad in MATLAB and verifying the results, the biquad structure shown in Fig. 1 was implemented in Verilog. The parallel biquad was also implemented, using the biquad designed earlier. The architecture of the unfolded design is shown in Fig. 5. Since the design processes 3 input samples at a time and gives 3 outputs, it leaves room for reducing VDD, and thereby the power consumption, depending on the implementation requirement.
IIR FILTER VERIFICATION
To verify the designed filter, sinusoidal inputs of different frequencies, each of length 65000, were created in MATLAB. These test vectors were given as inputs to the MATLAB model and the Verilog model. The FFTs of both responses were compared with that of the input to verify the correctness of the design. Fig. 6 shows the FFT plots of the input test sinusoids and the output sinusoids of the biquad and the unfolded design. The input is a combination of three sinusoids of frequency 50 kHz, 150 kHz and 600 kHz and amplitude 0.33. From the filter magnitude response, a 50 kHz signal has a magnitude of -0.3377. This corresponds to an output signal amplitude of 0.3179. This theoretical result agrees with the one computed from the response. Similar validation was performed for the 150 kHz and 600 kHz signals, and the results were found to agree with those computed theoretically.
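The frequency-domain check can be sketched without an FFT library by evaluating a single DFT bin per tone and comparing it with the magnitude response evaluated on the unit circle. Everything below is an illustrative assumption: a double-precision model with the Table 2 coefficients, each tone given an amplitude of 0.33, a shorter record than the 65000 samples used in the paper, and hypothetical function names. It is not expected to reproduce the specific figures quoted above.

```python
import cmath
import math

FS = 20e6                                    # sampling frequency from Table 1
B = (1.0, 2.0, 1.0)                          # Table 2 coefficients
A1, A2 = -1.9555782403150355, 0.95654367651120342
GAIN = 0.00024135904904198073


def biquad(x):
    """Double-precision biquad (Transposed Direct Form II)."""
    s1 = s2 = 0.0
    y = []
    for sample in x:
        v = GAIN * sample
        out = B[0] * v + s1
        s1 = B[1] * v - A1 * out + s2
        s2 = B[2] * v - A2 * out
        y.append(out)
    return y


def tone_amplitude(signal, freq_hz):
    """Amplitude of one tone from a single DFT bin (needs whole cycles)."""
    n = len(signal)
    w = 2 * math.pi * freq_hz / FS
    acc = sum(s * cmath.exp(-1j * w * k) for k, s in enumerate(signal))
    return 2 * abs(acc) / n


def magnitude_response(freq_hz):
    """|H| of the biquad evaluated on the unit circle."""
    z1 = cmath.exp(-1j * 2 * math.pi * freq_hz / FS)    # z^-1
    num = GAIN * (B[0] + B[1] * z1 + B[2] * z1 * z1)
    return abs(num / (1 + A1 * z1 + A2 * z1 * z1))


TONES = (50e3, 150e3, 600e3)
SETTLE, WINDOW = 4000, 8000       # skip the transient, then analyse 8000 samples
x = [sum(0.33 * math.sin(2 * math.pi * f * n / FS) for f in TONES)
     for n in range(SETTLE + WINDOW)]
y = biquad(x)[SETTLE:]
measured = {f: tone_amplitude(y, f) for f in TONES}
expected = {f: 0.33 * magnitude_response(f) for f in TONES}
```

The 8000-sample analysis window holds a whole number of cycles of all three tones at 20 MHz, so each bin isolates one tone and `measured` can be compared directly with `expected`. Vector matching between two models would instead compare their output samples one by one.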
Figure 5. 3 Level Unfolded Filter
CONCLUSIONS
The simple biquad and the parallel biquad were implemented in Verilog. The Verilog implementation was synthesized using Encounter RTL Compiler with the 45 nm NanGate Open Cell Library. Because the HDL was written in a highly parameterized way, the MATLAB code combined with the Verilog code can be used for code generation, which also improves the reusability of the HDL code. As a future extension to this work, the multiplier topology can be changed to use multi-bit recoding, which can give better performance at the cost of extra hardware.
Figure 6. FFT plot of Filter Test Input and Output
REFERENCES
Andrew, Rushton (2011). VHDL for Logic Synthesis, John Wiley & Sons.
Francis, M. (2009). “Infinite Impulse Response Filter Structures in Xilinx FPGAs”, Xilinx White Paper.
Freescale Semiconductors (2005). DSP56300 Family Manual, 54-58.
Neil, H. E. Weste, David Money, Harris (2010). CMOS VLSI Design - A Circuits and Systems Perspective, Addison-Wesley: Pearson.
Israel, Koren (2005). Computer Arithmetic Algorithms, Universities Press, 141-175.
Karen M.G.V. Gettings, Andrew K. Bolstad, Michael N. Ericson & Xiao Wang (2013). “Biquad Implementation of an IIR filter for IQ mismatch correction in an SoC RF receiver”, High Performance Extreme Computing Conference (HPEC), 1-5.
Keshab, K. Parhi (2007). VLSI Digital Signal Processing Systems: Design and Implementation, John Wiley & Sons.