CHAPTER 5
DESIGN AND IMPLEMENTATION OF A NOVEL BI-
RECODER BASED DIGITAL FIR FILTER
N 1
y out n coeff xin
p
n 1 (5.1)
p 0
Where, xin(n) represents the input samples of FIR filter, yout (n) represents the
output samples of FIR filter, N is the order of the filter or length of the filter
and Coeffp denotes the coefficient of filters. Impulse response of FIR filter must
be finite and therefore, filtrations also based on some finite numerical values.
Periodical multiplication and accumulation structures are used to maintain the
impulse response of FIR filter as finite.
100
Square Root Carry Select Adder (SQRT CSLA) is one of the best VLSI
based adders, because it utilizes less hardware complexity and high speed. In
[Patel, S, 2014], area efficient CSLA architecture is developed for binary
addition process. The combination of Ripple Carry Adder (RCA) and Binary to
Excess1 Conversion (BEC) unit is used in Patel, S., (2014) to reduce the
propagation delay of addition process. In SQRT CSLA, N-bit data can be
divided into groups for performing parallel addition process. Ramkumar,
B., (2012) explained low-power and area-efficient carry select adder. It uses a
simple and efficient gate-level modification to significantly reduce the area and
power of CSLA. Thakare, A.P., (2013) designed a high efficiency carry select
adder using SQRT technique. The RCA with BEC in the structure is a great
advantage to reduction in the number of gates. The modified SQRT CSLA has
slightly large area for lower order bit which reduces for higher order bit and
also delay is reduced to a greater extent. Ramkumar, B., (2012) and Thakare,
A.P., (2013), group structures for N-bit BEC based SQRT CSLA are explained
with necessary diagrams. Reduced complexity Wallace multiplier is developed
in Waters, R.S., (2010) for the design of digital FIR filter. In reduced
complexity Wallace multiplier, the partial products are arranged in an inverted
triangle order. The matrix of triangular order outputs are divided into three row
groups. Full Adders (FAs) are used for adding three bits and Single bit and a
group of two bits are moved to the next stage directly. In final stage of Wallace
tree multiplier require sufficient N-bit binary adder for performing
accumulation operation. Paradhasaradhi, D., (2014), Efficient CSLA circuit is
used for addition part of reduced complexity Wallace multiplier. A Non-Linear
Carry Select Adder (Square root carry select adder) using ripple carry adder is
developed but it offers some speed penalty. Parallel Prefix Han-Carlson Adder
is used in [Priyatharshne, T. N], for addition part of reduced complexity
Wallace multiplier. At an operating frequency of 200 MHz. Each layer of the
tree multiplier has been reduced. Also in Gowrishankar, V., (2013), both
reduced complexity Wallace multiplier and CSLA circuits are incorporated into
101
digital FIR filter. FIR filter is designed to increase the speed of addition and
decrease the power taken by the multiplier unit. Different types of multipliers
are used in various literatures to design FIR filter. Chen, J., (2014) explained
low complexity programmable FIR filters based on extended double base
number system. It presents radically different approach to this problem for the
minimization of the TM-MCM block of programmable FIR digital filters. Lai,
X., (2014) described ICMEE- p method converts the highly nonconvex
constrained Lp magnitude error design of FIR NPS filter. For instance, in
Chen, J., (2014) and Lai, X., (2014), extended double base number system is
used for designing a low complexity programmable FIR filter. Similarly,
Multiple Constant Multiplication (MCM) technique is used in Kumar, C.U.,
(2014) for multiplication part of digital FIR filter.
Ripple Carry Adder (RCA) is one of the basic VLSI based adders which
is largely affected by Carry Propagation Delay (CPD). In [chang, T.Y. 1998]
explained Carry-select adder scheme using an add-one circuit to replace one
ripple carry adder. The carry select adder that needs a single ripple carry adder
with zero carry-in, an add-one circuit, and a multiplexer. To reduce the CPD of
circuit, Carry Select Adder (CSLA) is developed in past. In CSLA circuit, N-
bit data is divided into groups to provide the parallelism. Hence, this circuit
is named as SQRT CSLA. Splitted each and every group can operate instantly
at same time. However, RCA circuits of SQRT CSLA reduce the performance
in terms of speed. Hence, one set of RCA circuits is replaced by BEC circuits
(have same functionality with less number of gates) to increase the speed of the
adder significantly. The circuit diagram of 16-bit BEC based SQRT CSLA
circuit is illustrated in figure.5.2. Every group structures have RCA, BEC and
Multiplexer circuits, hence most essential components to design group
structures of SQRT CSLA are Full Adders (FAs), Half Adders (HAs), Logic
Gates (AND, EX-OR and NOT) and Multiplexers.
outputs. Carry input (Cin) is given to the selection input of first group of
Multiplexers. Remaining groups get the Carry inputs from previous groups.
Hence, final stage of SQRT CSLA only cause little CPD than traditional RCA
circuit.
In this paper, the complexity of BEC circuits and multiplexer circuits are
realized and re-constructed to increase the performance in terms of silicon area
and power consumption. Redundant logic function of each group structures are
identified and eliminated to reduce the hardware complexity. Hence, the
developed adder circuit is named as “Reduced Complexity SQRT CSLA”. The
circuit diagram of reduced complexity SQRT CSLA for 4-bit addition (group-4
structure) is illustrated in Figure.5.4. Similarly, we can extend and compress
the circuit of Figure.5.4 for group-5 structure and group-2, group-3 structures.
When compared to the traditional group structures of BEC based SQRT CSLA,
developed group structures of reduced complexity SQRT CSLA reduces the
gate count value significantly. Gate count for traditional group structures of
SQRT CSLA and developed group structures of reduced complexity SQRT
CSLA are demonstrated in table 1. Theoretically, 38% of gate counts are
reduced in reduced complexity SQRT CSLA than traditional SQRT CSLA
adder circuits. Further, the performance of reduced complexity SQRT CSLA is
compared with Compressor based adder circuits.
A[15:11] B[15:11] A[10:7] B[10:7] A[6:4] B[6:4] A[3:2] B[3:2] A[1:0] B[1:0]
4 4 3 3 2 2 2 2
5 5
2 10 2 8 2 6 2 4
5 4 3 2 2
a b a b a b a b a b
F H F F H
C3 Sum3 Sum2
C6 Sum6 Sum5 Sum4
Figure 5.3 Group-2 and Group-3 Structure for 16-bit SQRT CSLA
106
B3 A3 B2 A2 B1 A1 B0 A0
HA HA HA FA
Cin
C S3 S2 S1 S0
This proposed work consider 3-tap FIR filter for 8-bit word length of
input sample x[n]. Fixed coefficients also considered as 8-bit word length,
which is indicated as h[0], h[1],..h [N] in Figure.5.5. The dotted line of
108
Y[n]
From the Table 5.2, it is clear that a novel reduced complexity SQRT
CSLA based Bi-Recoder Multiplier offers 6.4% reduction in silicon area and
36.58% reduction in delay and 29.95% reduction of power consumption than
compressors based Bi-Recoder Multiplier.
Due to reducing the computational path and switching activities of Bi-
Recoder Multiplier using compressor, delay of proposed Bi-Recoder Multiplier
using Reduced Complexity SQRT CSLA is reduced to 19.955ns from
31.469ns. Due to reducing the number of gate counts in Bi-Recoder Multiplier
using compressor, the area of proposed Bi-Recoder Multiplier using Reduced
complexity SQRT CSLA, the slices are reduced to 72 from 77, and the LUTs
are reduced to 130 from 145. Power consumption of Bi-Recoder Multiplier
using reduced complexity SQRT CSLA has been reduced to 685mw from
978mw, due to reducing the access terminal of intermediate input data paths.
P-MOS and N-MOS Counts
Figure 5.13 Synthesis result (area) for traditional FIR filter using Bi-
Recoder Multiplier with compressor
116
The synthesis result of timing report and power of traditional FIR filter
using Bi-Recoder Multiplier with compressor is illustrated in Figure 5.14. and
Figure 5.15.
Figure 5.15 Synthesis result (Power) for traditional FIR filter using
Bi-Recoder Multiplier with compressor
Figure 5.16 Synthesis result (area) for proposed FIR filter using
Bi-Recoder Multiplier with Reduced Complexity SQRT CSLA
119
The synthesis result of timing report and power of proposed FIR filter
using Bi-Recoder Multiplier with Reduced Complexity SQRT CSLA is
illustrated in Figure.5.17 and Figure. 5.18.
Figure 5.17 Timing report for proposed FIR filter using Bi-Recoder
Multiplier with Reduced Complexity SQRT CSLA
120
Figure 5.18 Synthesis result (power) for proposed FIR filter using Bi-
Recoder Multiplier with Reduced Complexity SQRT CSLA
Table 5.3 Comparison of FIR filter with developed and existing Bi-
Recoder Multipliers
From table 3, it is clear that proposed digital FIR filter offers 19.51%
reduction in silicon area, 13.60% reduction in delay and 3.12% reduction in
power consumption than existing digital FIR filter. Hence, a novel Bi-Recoder
multiplier supports digital FIR filter absolutely to increase the performance of
FIR filter. Further the comparisons are graphically illustrated in figure.5.19.
300
256
248
250
P-MOS and N-MOS Counts
200 Delay(ns)
Slices
150
LUT
Power(mw)
100
59
48
50 41
33
10.945 9.456
0
FIR filter using Bi-Recoder multiplier FIR filter using Bi-Recoder multiplier
with compressor with reduced complexity SQRT CSLA
Figure 5.19 Performance of FIR filter with developed and existing Bi-
Recoder Multiplier
5.5 Summary