
International Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue6- June 2013

ISSN: 2231-5381 http://www.ijettjournal.org Page 2657



DA Based FIR Filter Using APC-OMS Technique
K. Shareef Babu¹, H.D. Praveena², K. Charan Kumar³
¹M.Tech (DECS), ³M.Tech (VLSI) Students, ³Assistant Professor, ECE Department,
Sree Vidyanikethan Engineering College (Autonomous), A.Rangampet, Tirupati.


Abstract- In FPGA design, the implementation of FIR
filters for DSP applications plays an important role. The
FPGA area is mainly decided by the number of LUTs
occupied. Hence, for any design, if the area is optimised
in terms of LUTs, the delay will also reduce. Four basic
techniques are used to optimise filters that use LUTs for
memory based multiplication, of which the combination
of two, APC and OMS, gave better optimisation results.
Further, if the Distributed Arithmetic (DA) technique is
utilised in the filter design approach, an efficient area
implementation can be achieved. In this paper, filters
with bit widths L=2 to 8 are designed and synthesised
using Xilinx ISE 10.1i. Nearly 40% area improvement is
achieved for approximately the same delay.
Keywords- Field Programmable Gate Array (FPGA),
Odd Multiple Storage (OMS), Anti Symmetric Product
Coding (APC), Distributed Arithmetic (DA) and Look
Up Table (LUT).
I. INTRODUCTION
In the design of digital processors and application
specific systems, digital operations are very important
[1]. Arithmetic circuits are an important class of
digital systems. Nowadays many complex circuits that
were once unthinkable have become easy to realise,
thanks to the remarkable progress in very large scale
integration (VLSI) circuit technology [2]. Semiconductor
devices have become prominent in every field due to
the rapid development of technology. These devices
operate very fast, consume less power, occupy less
area and are more efficient with respect to several
factors such as reliability, flexibility and scaling. This
has led to significant growth and improvement, and the
devices have become cheaper [3].
Embedded semiconductor memory has a dominating
presence in SOCs, exceeding 90% of the total SOC.
Compared to logic components, semiconductor memory
devices have a high transistor packing density, which
is increasing at a fast rate. Apart from that, memory
based computing structures offer other advantages
over multiply-accumulate structures, such as greater
potential for high throughput, low latency
implementation and less dynamic power consumption.
Memory based computing is well suited to the many
digital signal processing (DSP) applications that involve
multiplication with a fixed set of coefficients. The block
diagram in Fig. 1 shows the conventional look up table
based multiplier.

Fig. 1 Conventional LUT based multiplier
In most DSP processors, memory based computing
structures are mainly concerned with the multiplier
and accumulator structures. To reduce the
computational complexity of multiplication, operations
are simplified by using LUTs that directly store the
precomputed values. Look-up tables provide better
performance in terms of speed and effective area
utilization [4]. Odd Multiple Storage (OMS) and Anti
Symmetric Product Coding (APC) are the optimizing
schemes of LUT based FPGA design for the multiply
and accumulate structures used in DSP cores [4,5,6].
In the OMS mechanism only the odd multiples are
stored in the LUT, and the even multiples are derived
by shifting the available odd multiples using a barrel
shifter. APC is the mechanism used to reduce the
required number of LUT bit positions [5].
LUT optimization using APC and OMS is the primary
factor in the design of LUT based FIR filters for DSP
applications. It was previously observed that, when the
Anti-symmetric product coding approach is combined
with the Odd multiple storage technique, the two's
complement operations can be very much simplified,
since the odd integer representation is always used for
the input and output address transformation. It was
also observed that the two cannot be combined since
the words generated are odd numbers.




II. APC-OMS DESIGN FOR LUT BASED
MULTIPLIER DESIGN
A discrete FIR filter consists of delay element,
coefficient and adder blocks, as shown in Fig. 2.

Fig. 2 General Representation of FIR filter
For a discrete-time FIR filter, the output is a
weighted sum of the current and a finite number of
previous values of the input. The operation is
described by eq. (1), which defines the output
sequence y[n] in terms of the input sequence x[n].

$$y[n] = b_0 x[n] + b_1 x[n-1] + \cdots + b_N x[n-N] = \sum_{i=0}^{N} b_i \, x[n-i] \qquad (1)$$
Here $b_i$ are the filter coefficients, also known as tap
weights, that make up the impulse response, and N is
the filter order. An Nth-order filter has (N+1) terms
on the right-hand side. The x[n-i] in these terms are
commonly referred to as taps, based on the structure
of a tapped delay line that in many implementations
or block diagrams provides the delayed inputs to the
multiplication operations. One may speak of a 5th
order/6-tap filter, for instance.
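The weighted sum of eq. (1) can be illustrated with a minimal software sketch. This is only a behavioural model, not the hardware structure; the function name `fir_filter` and the coefficient values are hypothetical, and samples before n = 0 are assumed to be zero.

```python
def fir_filter(b, x):
    """Direct-form FIR: y[n] = sum over i of b[i] * x[n - i].

    b holds the N+1 tap weights of an Nth-order filter.
    Assumption: x[n] is taken as 0 for n < 0.
    """
    y = []
    for n in range(len(x)):
        acc = 0
        for i, coeff in enumerate(b):
            if n - i >= 0:          # skip taps that would read before the start
                acc += coeff * x[n - i]
        y.append(acc)
    return y


# Hypothetical 5th-order / 6-tap coefficient set, for illustration only.
b = [1, 2, 3, 2, 1, 1]
impulse = [1, 0, 0, 0, 0, 0, 0, 0]
response = fir_filter(b, impulse)   # the impulse response reproduces the taps
```

Feeding a unit impulse returns the tap weights themselves, which is why the coefficients are said to make up the impulse response.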
III. LOOK UP TABLE (LUT)
The multiplication tables are pre-calculated and
stored in memory. LUTs are used to access these
values quickly from memory, saving computational
complexity. In digital logic, an n-bit LUT can be
implemented with a multiplexer whose select lines are
the inputs of the LUT and whose data inputs are
constants. An n-bit LUT can encode any n-input
Boolean function by modelling it as a truth table [7].
LUTs with 4-6 bits of input are the key component of
modern FPGAs, and this is an efficient way of
encoding functions. The general representation of a
LUT for multiplication is shown in Fig. 3.

Fig. 3 General form of LUT multiplier
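The statement that an n-bit LUT can encode any n-input Boolean function can be sketched in software as follows. The helper names `make_lut` and `lut_read` and the example function are hypothetical and chosen only for illustration; the sketch models the truth-table idea, not any particular FPGA primitive.

```python
def make_lut(func, n):
    """Precompute the truth table of an n-input Boolean function (2**n bits)."""
    table = []
    for address in range(2 ** n):
        # bits[0] is the least significant input of the address word
        bits = [(address >> k) & 1 for k in range(n)]
        table.append(func(*bits) & 1)
    return table


def lut_read(table, *inputs):
    """Evaluate the function by addressing the table, like a multiplexer
    whose select lines are the inputs and whose data inputs are constants."""
    address = sum(bit << k for k, bit in enumerate(inputs))
    return table[address]


# Hypothetical 4-input function: output is 1 when at least three inputs are 1.
def at_least_three(a, b, c, d):
    return int(a + b + c + d >= 3)


lut4 = make_lut(at_least_three, 4)   # 16 stored constants for a 4-input LUT
```

Once the table is filled, evaluating the function costs a single memory read regardless of how complex the original logic was.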
In a general LUT multiplier, the input X is a word of
length L bits and the output is the product AX, where
A is the constant whose multiples are stored in the
LUT. 2^L words are required for multiplying an L-bit X
with the constant. The LUT size therefore increases
exponentially with the input size.
For an input of word length L=4, the LUT requires 16
words, one for each 4-bit address word, as shown in
Table 1.
TABLE 1
LUT REPRESENTATION
Address word, X   Product word   Address word, X   Product word
0000              0              1000              8A
0001              A              1001              9A
0010              2A             1010              10A
0011              3A             1011              11A
0100              4A             1100              12A
0101              5A             1101              13A
0110              6A             1110              14A
0111              7A             1111              15A
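The conventional LUT multiplier of Table 1 can be modelled with a short sketch. The names `build_lut` and `lut_multiply` and the constant A = 13 are hypothetical; the point is only that all 2^L products are stored and the input word is used directly as the address.

```python
def build_lut(A, L):
    """Store every product x*A for the 2**L possible L-bit input words."""
    return [x * A for x in range(2 ** L)]


def lut_multiply(lut, x):
    """Multiplication becomes a single table read addressed by x."""
    return lut[x]


A = 13                      # hypothetical fixed coefficient
lut = build_lut(A, 4)       # 16 words for L = 4, as in Table 1
product = lut_multiply(lut, 0b1010)   # reads the word stored for 10A
```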
TABLE 2
OMS BASED REDUCTION SCHEME FOR LUT MULTIPLIER
Address word   Product word
0001           A
0011           3A
0101           5A
0111           7A
1001           9A
1011           11A
1101           13A
1111           15A

In the OMS scheme only the odd multiples are stored
in the LUT, and the even multiples are derived by left
shifting the odd multiples using a barrel shifter. The
barrel shifter needs to produce a maximum of (L-1)
left shifts to generate the even multiples.
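A minimal behavioural sketch of the OMS idea for Table 2 is given below. The function names and the constant A = 13 are hypothetical, and returning zero directly for a zero input is an assumption made here for simplicity (the hardware would handle this case with a separate reset).

```python
def build_oms_lut(A, L):
    """Store only the odd multiples A, 3A, ..., (2**L - 1)A: 2**(L-1) words."""
    return [x * A for x in range(1, 2 ** L, 2)]


def oms_multiply(oms_lut, x):
    """Recover x*A from the odd-multiple table plus left shifts."""
    if x == 0:
        return 0                    # assumption: zero input handled separately
    shifts = 0
    while x % 2 == 0:               # strip trailing zeros to reach an odd word
        x >>= 1
        shifts += 1
    stored = oms_lut[x >> 1]        # odd word 2k+1 is stored at index k
    return stored << shifts         # barrel-shifter step: at most L-1 shifts


A = 13                              # hypothetical fixed coefficient
oms_lut = build_oms_lut(A, 4)       # 8 words instead of 16 for L = 4
```

For L = 4 the largest shift count occurs for the input 1000, which is the odd word 0001 shifted left three times, matching the (L-1) bound stated above.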





TABLE 3
APC WORDS FOR DIFFERENT INPUT WORDS FOR L=4
Input X'        Product   No. of   Shifted      Stored     Address
(x2' x1' x0')   value     shifts   input, X''   APC word   d2 d1 d0
001             A         0        001          P0 = A     000
010             2 x A     1        001          P0 = A     000
100             4 x A     2        001          P0 = A     000
011             3A        0        011          P1 = 3A    001
110             2 x 3A    1        011          P1 = 3A    001
101             5A        0        101          P2 = 5A    010
111             7A        0        111          P3 = 7A    011

Fig. 4 APC-OMS approach for L=4
The proposed APC-OMS combined LUT for the memory
based multiplier uses two techniques, APC and OMS,
as shown in Fig. 4. This method is expected to reduce
the area to one fourth. The multiplier uses four blocks
[8] [5] [6] [7]. The address generation block converts
the input to the address d0, d1, d2, which is produced
by combining the APC and OMS methods. The 3-to-8
address line decoder converts the address d0, d1, d2 to
the LUT address from w1 to w7. The memory array is
an LUT, and the barrel shifter converts the LUT output
to the desired output. The control circuit produces the
controls s0 and s1, which are used in the subsequent
blocks [2] [7] [8] [11]. The control and reset circuit
can be designed as
S0=x0+(x1+x2) (2)
S1=(x0+x1) (3)
Reset=x3 and x2x1 (4)
The control elements reset, s0 and s1 are produced
using basic gates. The barrel shifter performs a
circular right shift according to the control values
(s0 s1), thus producing the address (d0 d1 d2) used in
the next sections. The address generator circuit
therefore consists of a barrel shifter and some basic
gates; it converts the input to an address d0 d1 d2
obtained by combining the two methods, anti-symmetric
product coding (APC) and odd multiple storage (OMS).
IV. DISTRIBUTED ARITHMETIC BASED FIR
FILTER
A basic DA architecture, for a length-N sum-of-
products computation, accepts one bit from each of the
N words. If two bits per word are accepted, the
computational speed can be essentially improved. The
maximum speed can be achieved with the fully
pipelined word-parallel architecture. For maximum
speed, a separate ROM (with identical content) should
be provided for each bit vector xb[n]. The combined
approach of the FIR filter using the DA technique for
L=8 using APC-OMS techniques with L=4 is shown in
Fig. 5.

Fig.5 Combined approach of FIR filter using DA technique for L=8
using APC-OMS techniques with L=4.
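The basic bit-serial DA principle described in this section can be sketched as follows. This is a generic textbook-style model, not the architecture of Fig. 5: it assumes unsigned input samples, and the names `build_da_lut` and `da_inner_product` as well as the coefficient and sample values are hypothetical.

```python
def build_da_lut(coeffs):
    """Precompute, for every N-bit address, the sum of the coefficients
    whose address bit is 1 (2**N entries for N coefficients)."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(2 ** n)]


def da_inner_product(coeffs, samples, bits):
    """Sum of products without multipliers: one bit from each of the N
    words forms a LUT address, and the LUT outputs are shift-accumulated.
    Assumption: samples are unsigned integers that fit in `bits` bits."""
    lut = build_da_lut(coeffs)
    acc = 0
    for b in range(bits):                      # process bit plane b of all words
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> b) & 1) << k        # bit b of word k selects address bit k
        acc += lut[addr] << b                  # weight the partial sum by 2**b
    return acc


coeffs = [3, -1, 4, 2]            # hypothetical tap weights
samples = [5, 200, 17, 255]       # hypothetical unsigned 8-bit samples
result = da_inner_product(coeffs, samples, bits=8)
```

The loop over bit planes corresponds to the one-bit-per-word serial case; a word-parallel version would read all bit planes at once from separate identical tables.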

V. RESULTS AND DISCUSSIONS

Fig 6 Simulated results for L=8
Fig. 6 shows the waveforms generated using Xilinx
ISE for the combined approach FIR filter using the DA
technique for L=8, built from two L=4 LUT designs
using the combined APC approach. The inputs applied
and the output generated are as follows. For the 8-bit
input operand, X, Data_in, Address, W and P are given
the values 8'h05, 8'h02, 8'h04, 8'h004 and 8'h04
respectively; at 421 ns the output Q is obtained as
8'h001.
Fig. 7 compares the number of 4-input LUTs used for
different binary word lengths by the combination of
APC-OMS and DA techniques against the APC-OMS
technique alone. The combined technique uses fewer
LUTs than the APC-OMS technique, which implies a
reduction in the area utilized on the FPGA.


Fig. 7 Comparison of the number of 4-input LUTs utilized versus binary word length (4 to 8 bits) for DA and APC-OMS
Fig. 8 compares the number of slices used for
different binary word lengths by the combination of
APC-OMS and DA techniques against the APC-OMS
technique alone. The combined technique uses fewer
slices than the APC-OMS technique, which implies a
reduction in the area utilized on the FPGA.


Fig. 8 Comparison of the number of slices utilized versus binary word length (4 to 8 bits) for DA and APC-OMS
Fig. 9 compares the delay in nanoseconds for
different binary word lengths for the combination of
APC-OMS and DA techniques against the APC-OMS
technique alone. Even though there is a slight increase
in delay, there is a 40% decrease in area utilization
with the combination of APC-OMS and DA techniques
compared to using only the APC-OMS technique.

Fig. 9 Comparison of delay in nanoseconds versus binary word length (4 to 8 bits) for DA and APC-OMS

Fig.10 Spartan 3E results for L=8
The same code is implemented on a Spartan 3E
FPGA kit with the input 8'h05, and the output
obtained in hexadecimal is 01, as shown in Fig. 10.
VI. CONCLUSION
LUT based multipliers can be used to implement
constant multiplication for DSP applications. The full
advantage of the proposed LUT based design can be
derived if the LUTs are implemented as NAND or NOR
read-only memories. The OMS-APC-based LUTs can be
used for higher input sizes with different forms such
as parallel and pipelined addition schemes for suitable
area-delay trade-offs. Finite impulse response filters
play an important role in many Digital Signal
Processing applications. In this method a multiplier-
less FIR filter is implemented using the DA technique.
This architecture provides an area-efficient
implementation of the FIR filter, with less latency and
less area compared with existing FIR filters. Filters
with bit widths L=4 to 8 are designed and simulated
using Xilinx ISE 10.1i. The performance of the filter
can be improved further by pipelining all the input and
partition tables for higher input sizes.
REFERENCES
[1] K. K. Parhi, VLSI Digital Signal Processing Systems:
Design and Implementation. New York: Wiley, 1999.
[2] Hanho Lee, Gerald E. Sobelman, "FPGA-based digit-
serial CSD FIR filter for image signal format
conversion," Microelectronics Journal 33(5-6) (2002)
501-508.
[3] Narender Singh Pal, Harjit Pal Singh, R.K. Sarin,
Sarbjeet Singh, "Implementation of High Speed FIR
Filter Using Serial and Parallel Distributed Arithmetic
Algorithm," International Journal of Computer
Applications (0975-8887), Volume 25, No. 7, July
2011.
[4] P.K. Meher, "LUT Optimization for Memory-Based
Computation," IEEE Transactions on Circuits and
Systems II: Express Briefs, vol. 57, no. 4, April 2010.
[5] International Technology Roadmap for Semiconductors.
[Online]. Available: http://public.itrs.net.

[6] Jiafeng Xie, Jianjun He, Guanzheng Tan, "FPGA
Realization of FIR filters for high-speed and medium-
speed by using modified distributed arithmetic
architectures," Microelectronics Journal 41 (2010) 365-
370.
[7] Valeria Garofalo, "Fixed-width multipliers for the
implementation of efficient digital FIR filters,"
Microelectronics Journal 39(12) (2008) 1491-1498.
[8] Eldho John, P. Dinesh Kumar, "Modified APC-OMS
Combined LUT for Memory Based Computation,"
International Journal of Systems, Algorithms &
Applications, Volume 2, Issue 3, March 2012, ISSN
Online: 2277-2677.
