
DESIGN OF EFFICIENT MULTIPLIER USING VHDL

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
Master of Technology in Electronics and Communication Engineering

By ARUN SHARMA

Department of Electronics and Communication Engineering
J.M.I.T. RADAUR
2010

CERTIFICATE

This is to certify that the thesis entitled "TO DESIGN EFFICIENT MULTIPLIER USING VHDL", submitted by Mr. Arun Sharma in partial fulfillment of the requirements for the award of Master of Technology Degree in Electronics and Communication Engineering at J.M.I.T. Radaur, is an authentic work carried out by him under my supervision and guidance. To the best of my knowledge, the matter embodied in the thesis has not been submitted to any other University / Institute for the award of any Degree.

Date: .

Mr. Manoj Arora
H.O.D., E.C.E. Deptt.
J.M.I.T. Radaur

CONTENTS

Abstract
List of Figures
List of Tables
Introduction
VHDL
Filters
Type of filters
Adders
Binary multipliers
Results
Conclusion
References

Abstract
There are different entities that one would like to optimize when designing a VLSI circuit. These entities often cannot be optimized simultaneously; one can usually be improved only at the expense of one or more of the others. The design of a multiplier circuit that is efficient in power, area, and speed simultaneously has become a very challenging problem. Power dissipation is recognized as a critical parameter in modern VLSI design, and low-power design is a necessity in Very Large Scale Integration. Multiplication occurs frequently in finite impulse response filters, fast Fourier transforms, convolution, and other important DSP and multimedia kernels. The objective of a good multiplier is to provide a physically compact, high-speed, low-power chip. To save significant power in a VLSI design, it is a good direction to reduce its dynamic power, which is the major part of total power dissipation. In this thesis, we propose high-speed, low-power multiplier algorithms. The Booth multiplier reduces the number of partial products generated by a factor of 2. The adder avoids unwanted additions and thus minimizes the switching power dissipation. The proposed high-speed, low-power multiplier can attain speed improvement and power reduction in the Booth encoder when compared with conventional array multipliers.

This thesis presents an efficient implementation of a high-speed multiplier using the array multiplier, the shift-and-add algorithm, the Booth multiplier, the pyramid algorithm, and the modified pyramid algorithm. We compare the working of these multipliers by implementing each of them separately.

Chapter-1
INTRODUCTION

LOW POWER CONSUMPTION

Multipliers are key components of many high-performance systems such as FIR filters, microprocessors, and digital signal processors. A system's performance is generally determined by the performance of the multiplier, because the multiplier is generally the slowest element in the system. Furthermore, it is generally the most area-consuming. Hence, optimizing the speed and area of the multiplier is a major design issue. However, area and speed are usually conflicting constraints, so that improving speed mostly results in larger area. As a result, a whole spectrum of multipliers with different area-speed constraints has been designed, with fully parallel multipliers at one end of the spectrum and fully serial multipliers at the other end. In between are digit-serial multipliers, where single digits consisting of several bits are operated on. These multipliers have moderate performance in both speed and area. However, existing digit-serial multipliers have been plagued by complicated switching systems and/or irregularities in design. Radix 2^n multipliers, which operate on digits in a parallel fashion instead of bits, bring the pipelining to the digit level and avoid most of the above problems. They were introduced by M. K. Ibrahim in 1993. These structures are iterative and modular. The pipelining done at the digit level brings the benefit of constant operation speed irrespective of the size of the multiplier. The clock speed is determined only by the digit size, which is fixed before the design is implemented.

CHAPTER 2

BASICS OF MULTIPLICATION

3.1. Basic binary multiplier
The operation of multiplication is rather simple in digital electronics. It originates from the classical algorithm for the product of two binary numbers, which uses addition and shift-left operations to calculate the product. Two examples are presented below.

Unsigned, 10 x 8 = 80:

            1 0 1 0    Multiplicand
          x 1 0 0 0    Multiplier
          ---------
            0 0 0 0    PP row 1
          0 0 0 0      PP row 2
        0 0 0 0        PP row 3
      1 0 1 0          PP row 4
      -------------
      1 0 1 0 0 0 0    Product

Signed, -6 x 4 = -24:

              1 0 1 0    Multiplicand
            x 0 1 0 0    Multiplier
      ---------------
      0 0 0 0 0 0 0 0    PP row 1
      0 0 0 0 0 0 0      PP row 2
      1 1 1 0 1 0        PP row 3 (sign extended)
      0 0 0 0 0          PP row 4
      ---------------
      1 1 1 0 1 0 0 0    Product

Figure 3.1.1: Basic binary multiplication

The first example shows the multiplication procedure for two unsigned binary numbers, while the second is for signed multiplication. The first number is called the multiplicand and the second the multiplier. The only difference between signed and unsigned multiplication is that the sign bit has to be extended in the signed case, as depicted in PP row 3 of the signed example. Based upon the above procedure, we can deduce an algorithm for any kind of multiplication, which is shown in Figure 3.1.2. Here, we assume that the MSB represents the sign of the number.

Figure 3.1.2: Signed multiplication algorithm

3.2. Partial product generation
Partial product generation is the very first step in a binary multiplier. The partial products are the intermediate terms generated from the value of the multiplier. If the multiplier bit is 0, the partial product row is also zero, and if it is 1, the multiplicand is copied as it is. From the second multiplier bit onwards, each partial product row is shifted one position to the left, as shown in the example above. In signed multiplication, the sign bit is also extended to the left. Partial product generators for a conventional multiplier consist of a series of logic AND gates, as shown in Figure 3.2.1.

[Figure 3.2.1: Partial product generation logic. Inputs X7 to X0 and Yi; outputs PPi7 to PPi0.]

Careful optimization of the partial-product generation can lead to some substantial delay and area reduction [1].
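The AND-gate structure described in this section can be mimicked in software to check the arithmetic. The following Python sketch is a behavioural model only and is not part of the thesis design; the function name partial_products and the parameter n (the operand width) are assumptions introduced for illustration, and only unsigned operands are handled.

```python
def partial_products(x, y, n=8):
    """Return the n partial-product rows of the unsigned product x * y.

    Row i is the multiplicand ANDed bit by bit with multiplier bit Yi,
    then shifted left by i places (its weight in the final sum).
    """
    rows = []
    for i in range(n):
        yi = (y >> i) & 1                  # multiplier bit Yi
        row = 0
        for k in range(n):
            xk = (x >> k) & 1              # multiplicand bit Xk
            row |= (xk & yi) << k          # PPi,k = Xk AND Yi
        rows.append(row << i)
    return rows


rows = partial_products(10, 8, n=4)
print([format(r, "07b") for r in rows])    # ['0000000', '0000000', '0000000', '1010000']
print(sum(rows))                           # 80
```

Summing the rows reproduces the unsigned example of Section 3.1 (10 x 8 = 80); the adder structures discussed in the later chapters differ only in how this sum is carried out.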

Chapter 4
Type Of Adders

ADDER
In electronics, an adder is a digital circuit that performs addition of numbers. In modern computers, adders reside in the arithmetic logic unit (ALU), where other operations are also performed. Although adders can be constructed for many numerical representations, such as binary-coded decimal or excess-3, the most common adders operate on binary numbers. In cases where two's complement is used to represent negative numbers, it is trivial to modify an adder into an adder-subtracter.

Types of adders
For single-bit adders, there are two general types. A half adder has two inputs, generally labeled A and B, and two outputs, the sum S and the carry C. S is the two-input XOR of A and B, and C is the AND of A and B. Essentially, the output of a half adder is the sum of two one-bit numbers, with C being the more significant of the two outputs. The second type of single-bit adder is the full adder. The full adder takes a carry input into account, so that multiple adders can be used to add larger numbers. To remove ambiguity between the input and output carry lines, the carry in is labeled Ci or Cin, while the carry out is labeled Co or Cout.

Half adder

Fig-4.1 Half adder circuit diagram

A half adder is a logical circuit that performs an addition operation on two binary digits. The half adder produces a sum and a carry value which are both binary digits.

Following is the logic table for a half adder:

  Input     Output
  A   B     C   S
  0   0     0   0
  0   1     0   1
  1   0     0   1
  1   1     1   0

Table.1

Full adder

Inputs: {A, B, Carry In} Outputs: {Sum, Carry Out} Fig-4.2 Circuit Of Full Adder

Fig-4.3 Schematic symbol for a 1-bit full adder

A full adder is a logical circuit that performs an addition operation on three binary digits. The full adder produces a sum and a carry value, which are both binary digits. It can be combined with other full adders (see below) or work on its own.

  Input         Output
  A   B   Ci    Co  S
  0   0   0     0   0
  0   0   1     0   1
  0   1   0     0   1
  0   1   1     1   0
  1   0   0     0   1
  1   0   1     1   0
  1   1   0     1   0
  1   1   1     1   1

Table.2

Note that the final OR gate before the carry-out output may be replaced by an XOR gate without altering the resulting logic. This is because the only discrepancy between OR and XOR gates occurs when both inputs are 1; for the adder shown here, one can check that this is never possible. Using only two types of gates is convenient if one desires to implement the adder directly using common IC chips.

A full adder can be constructed from two half adders by connecting A and B to the inputs of one half adder, connecting the sum from that to an input of the second half adder, connecting Ci to the other input, and ORing the two carry outputs. Equivalently, S could be made the three-input XOR of A, B, and Ci, and Co could be made the three-input majority function of A, B, and Ci. The output of the full adder is the two-bit arithmetic sum of three one-bit numbers.
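As a quick check of this construction, the following Python sketch models a full adder built from two half adders and an OR gate, and verifies it against the arithmetic sum for all eight input combinations. The function names half_adder and full_adder are illustrative assumptions; the model is behavioural and is not the gate-level circuit of Fig-4.2.

```python
from itertools import product


def half_adder(a, b):
    """Return (sum, carry) for two one-bit inputs."""
    return a ^ b, a & b


def full_adder(a, b, ci):
    """Return (sum, carry_out) using two half adders and an OR gate."""
    s1, c1 = half_adder(a, b)       # first half adder: A + B
    s, c2 = half_adder(s1, ci)      # second half adder: partial sum + Ci
    assert not (c1 and c2)          # both carries are never 1 together
    return s, c1 | c2               # so OR and XOR give the same carry out


for a, b, ci in product((0, 1), repeat=3):
    s, co = full_adder(a, b, ci)
    assert 2 * co + s == a + b + ci  # two-bit sum of three one-bit inputs
    print(a, b, ci, "->", co, s)
```

The inner assertion is the reason the final OR gate may be swapped for an XOR gate: the case in which the two gates differ never occurs.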

CHAPTER 5
Multiplier types
Multipliers are categorized according to their applications, their architecture, and the way the partial products are produced and summed up. Based on all these, a designer might find the following types of multipliers.

5.1. Array multipliers
In array multipliers, the counters and compressors are connected in a serial fashion for all bit slices of the partial product parallelogram. As can be seen in Figure 5.1, the array topology is a two-dimensional structure that fits nicely on the VLSI planar process [2].

[Figure 5.1: Array multiplier mechanism (partial products PP0 to PP4 feeding a chain of full adders; output S0)]

There are several possible array topologies, including simple, double and higher-order arrays.

5.2. Simple array multiplier
In this type of array, the output of each row of counters (3:2 compressors) is the input to the next row of counters [2]. In the simple array, each row of [3:2] compressors adds a partial product to the partial sum, generating a new partial sum and a sequence of carries. The delay of the array depends on the depth of the array. Therefore, the summing time for the simple array is N-2 [3:2] compressor delays, where N is the number of partial products. The drawback of this type of array is that the hardware is underutilized. The counters are used only once in the calculation of the result; for the remaining time, they are idle. This drawback can be diminished by pipelining the array so that several multiplications can occur simultaneously. Pipelining would increase the throughput of the multiplier, but would also increase the latency and area of the multiplier. A fully pipelined array is normally avoided, since the array would be faster than the clock of the processor. Figure 5.2 depicts the layout of a simple array topology. The dots represent the partial products.

Figure-5.2: Layout Of Simple Array Technology

5.3. Serial/Parallel Multiplier
In a serial/parallel multiplier, the multiplicand X arrives bit-serially while the multiplier a is applied in a bit-parallel format. A common approach used in such multipliers is to generate a row or diagonal of bit products in each time slot and perform the additions concurrently. Suppose the data is positive, X > 0. Using a carry-save adder, the shift-and-add algorithm can be applied as shown in the figure.

Figure-5.3: Serial/parallel multiplier


Since X is processed bit-serially and the coefficient a is processed bit-parallel, this type of multiplier is called a serial/parallel multiplier.

5.5. Shift-and-Add Multiplier
Shift-and-add multiplication is similar to the multiplication performed by paper and pencil. This method adds the multiplicand X to itself Y times, where Y denotes the multiplier. To multiply two numbers by paper and pencil, the algorithm is to take the digits of the multiplier one at a time from right to left, multiplying the multiplicand by a single digit of the multiplier and placing the intermediate product in the appropriate position to the left of the earlier results. As an example, consider the multiplication of two unsigned 4-bit numbers, 8 (1000) and 9 (1001).

  Multiplicand          1 0 0 0
  Multiplier          x 1 0 0 1
                      ---------
                        1 0 0 0
                      0 0 0 0
                    0 0 0 0
                  1 0 0 0
                  -------------
  Product         1 0 0 1 0 0 0

In the case of binary multiplication, since the digits are 0 and 1, each step of the multiplication is simple. If the multiplier digit is 1, a copy of the multiplicand (1 x multiplicand) is placed in the proper position; if the multiplier digit is 0, a number of 0 digits (0 x multiplicand) are placed in the proper position. Consider the multiplication of positive numbers. The first version of the multiplier circuit, which implements the shift-and-add multiplication method for two n-bit numbers, is shown in Figure 5.5.1.

Figure 5.5.1: First version of the multiplier circuit

The 2n-bit product register (A) is initialized to 0. Since the basic algorithm shifts the multiplicand register (B) left one position each step to align the multiplicand with the sum being accumulated in the product register, we use a 2n-bit multiplicand register with the multiplicand placed in the right half of the register and with 0 in the left half.

Figure 5.5.2.: The first version of the multiplication algorithm.

Figure 5.5.2 shows the basic steps needed for the multiplication. The algorithm starts by loading the multiplicand into the B register, loading the multiplier into the Q register, and initializing the A register to 0. The counter N is initialized to n. The least significant bit of the multiplier register (Q0) determines whether the multiplicand is added to the product register. The left shift of the multiplicand has the effect of shifting the intermediate products to the left, just as when multiplying by paper and pencil. The right shift of the multiplier prepares the next bit of the multiplier to examine in the following iteration.

Example 1
Using 4-bit numbers, perform the multiplication 9 x 12 (1001 x 1100).

Answer
Table 3 shows the value of the registers for each step of the multiplication algorithm.

Table 3. Multiply example using the first version of the algorithm.

  Step   Operation         A            Q      B
  0      Initialization    0000 0000    1100   0000 1001
  1      Shift left B      0000 0000    1100   0001 0010
         Shift right Q     0000 0000    0110   0001 0010
  2      Shift left B      0000 0000    0110   0010 0100
         Shift right Q     0000 0000    0011   0010 0100
  3      Add B to A        0010 0100    0011   0010 0100
         Shift left B      0010 0100    0011   0100 1000
         Shift right Q     0010 0100    0001   0100 1000
  4      Add B to A        0110 1100    0001   0100 1000
         Shift left B      0110 1100    0001   1001 0000
         Shift right Q     0110 1100    0000   1001 0000
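The register-level steps of the first version of the algorithm can be traced with a short behavioural model. The Python sketch below is an illustration under stated assumptions: the function name shift_add_v1 and the trace format are introduced here and do not appear in the thesis, and only unsigned operands are handled.

```python
def shift_add_v1(multiplicand, multiplier, n=4):
    """First-version shift-and-add multiplier for two unsigned n-bit numbers.

    A is the 2n-bit product register, B the 2n-bit multiplicand register
    and Q the n-bit multiplier register. Returns the product and a trace
    of (operation, A, Q, B) tuples.
    """
    mask = (1 << (2 * n)) - 1
    a, b, q = 0, multiplicand, multiplier
    trace = [("Initialization", a, q, b)]
    for _ in range(n):                     # the counter N runs n times
        if q & 1:                          # Q0 = 1: add the multiplicand
            a = (a + b) & mask
            trace.append(("Add B to A", a, q, b))
        b = (b << 1) & mask                # align B with the next multiplier bit
        trace.append(("Shift left B", a, q, b))
        q >>= 1                            # expose the next multiplier bit
        trace.append(("Shift right Q", a, q, b))
    return a, trace


result, trace = shift_add_v1(9, 12)
for op, a, q, b in trace:
    print(f"{op:15s} A={a:08b} Q={q:04b} B={b:08b}")
print(result)                              # 108
```

For 9 x 12 the trace has eleven rows and ends with A = 0110 1100, which is 108.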

The original algorithm shifts the multiplicand left with zeros inserted in the new positions, so the least significant bits of the product cannot change after they are formed. Instead of shifting the multiplicand left, we can shift the product to the right. The multiplicand is then fixed relative to the product, and since we are adding only n bits, the adder needs to be only n bits wide. Only the left half of the 2n-bit product register is changed during the addition. Another observation is that the product register has an empty space with the size equal to that of the multiplier. As the empty space in the product register disappears, so do the bits of the multiplier. In consequence, the final version of the multiplier circuit combines the product (A register) with the multiplier (Q register). The A register is only n bits wide, and the product is formed in the A and Q registers. Figure 5.5.3 shows the new version of the circuit.

Figure 5.5.3: Final version of the multiplier circuit.

The final version of the multiplication algorithm is shown in Figure 5.5.3.
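The final version can be modelled in the same way. The following Python sketch is a behavioural illustration only; the function name shift_add_final is an assumption, and the model additionally assumes that the carry-out of the n-bit adder is shifted into the top of the A register, a detail the text does not spell out.

```python
def shift_add_final(multiplicand, multiplier, n=4):
    """Final-version shift-and-add multiplier (unsigned n-bit operands).

    The n-bit register A and the multiplier register Q together hold the
    growing product. The pair is shifted right each step, so only an
    n-bit addition (plus its carry-out) is needed.
    """
    mask = (1 << n) - 1
    a, q, b = 0, multiplier & mask, multiplicand & mask
    for _ in range(n):
        carry = 0
        if q & 1:                          # Q0 = 1: add B into the upper half
            total = a + b
            carry, a = total >> n, total & mask
        # shift the (carry, A, Q) chain one place to the right
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (carry << (n - 1))
    return (a << n) | q                    # product sits in A (high) and Q (low)


print(shift_add_final(9, 12))              # 108
```

Compared with the first version, the multiplicand register and the adder shrink from 2n bits to n bits, which is the hardware saving the paragraph above describes.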

Booth Multiplication Algorithm


Booth algorithm gives a procedure for multiplying binary integers in signed 2's complement representation. I will illustrate the Booth algorithm with the following example:

Example: 2 ten x (-4) ten, that is, 0010 two x 1100 two

Step 1: Making the Booth table
I. From the two numbers, pick the one with the fewest changes between consecutive bits, and make it the multiplier.
   0010: from 0 to 0 no change, 0 to 1 one change, 1 to 0 another change, so there are two changes in this number.
   1100: from 1 to 1 no change, 1 to 0 one change, 0 to 0 no change, so there is only one change in this number.
   Therefore, in the multiplication 2 x (-4), 2 ten (0010 two) is the multiplicand and (-4) ten (1100 two) is the multiplier.
II. Let X = 1100 (multiplier) and Y = 0010 (multiplicand). Take the 2's complement of Y and call it -Y: -Y = 1110.
III. Load the X value in the table.
IV. Load 0 for the X-1 value; it should be the previous least significant bit of X.
V. Load 0 in U and V, which will hold the product of X and Y at the end of the operation.
VI. Make four rows, one for each cycle, because we are multiplying four-bit numbers.

  U      V      X      X-1
  0000   0000   1100   0      Load the value
                              1st cycle
                              2nd cycle
                              3rd cycle
                              4th cycle

Table.4

Step 2: Booth Algorithm
The Booth algorithm requires examination of the multiplier bits and shifting of the product. Prior to the shifting, the multiplicand may be added to the partial product, subtracted from the partial product, or left unchanged according to the following rules:

I. Look at the least significant bit of the multiplier X and the previous least significant bit of the multiplier, X-1:
   00  Shift only
   11  Shift only
   01  Add Y to U, and shift
   10  Subtract Y from U (that is, add -Y to U), and shift
II. Take U and V together and apply an arithmetic right shift, which preserves the sign bit of a 2's complement number. Thus a positive number remains positive, and a negative number remains negative.
III. Apply a circular right shift to X, because this avoids using two registers for the X value.

Repeat the same steps until the four cycles are completed.

  U      V      X      X-1
  0000   0000   1100   0      Load the value
  0000   0000   0110   0      1st cycle: shift only
  0000   0000   0011   0      2nd cycle: shift only
  1110   0000   0011   0      3rd cycle: add -Y (0000 + 1110 = 1110)
  1111   0000   1001   1      3rd cycle: shift
  1111   1000   1100   1      4th cycle: shift only

We have finished four cycles, so the answer is shown in the last row of U and V, which is 11111000 two (-8 ten).
Note: By the fourth cycle, the two algorithms have the same values in the Product register.
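The cycle-by-cycle procedure can be checked with a small behavioural model. The Python sketch below follows the U, V, X and X-1 registers of the example; the function name booth_multiply and the variable names are assumptions introduced for illustration. Because U is only n bits wide, the model assumes the multiplicand is not the most negative n-bit value (-2**(n-1)), whose negation does not fit in n bits.

```python
def booth_multiply(y, x, n=4):
    """Radix-2 Booth multiplication of two n-bit two's-complement integers.

    y is the multiplicand and x the multiplier. U:V is the 2n-bit product
    register pair and x_prev plays the role of X-1.
    Assumes y != -2**(n-1).
    """
    mask = (1 << n) - 1
    u, v = 0, 0
    xr, x_prev = x & mask, 0
    for _ in range(n):
        x0 = xr & 1
        if (x0, x_prev) == (0, 1):         # 01: add Y to U
            u = (u + y) & mask
        elif (x0, x_prev) == (1, 0):       # 10: subtract Y from U
            u = (u - y) & mask
        # arithmetic right shift of U:V (the sign bit of U is preserved)
        sign = u >> (n - 1)
        v = (v >> 1) | ((u & 1) << (n - 1))
        u = (u >> 1) | (sign << (n - 1))
        # circular right shift of X; the bit shifted out becomes X-1
        x_prev = x0
        xr = (xr >> 1) | (x0 << (n - 1))
    result = (u << n) | v
    if result >> (2 * n - 1):              # read U:V as a signed 2n-bit value
        result -= 1 << (2 * n)
    return result


print(booth_multiply(2, -4))               # -8
```

For 2 x (-4) the model ends with U:V = 1111 1000, the same value as the last row of the table.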

Booth multiplication algorithm for radix 4
One of the solutions for realizing high-speed multipliers is to enhance parallelism, which helps to decrease the number of subsequent calculation stages. The original version of the Booth algorithm (Radix-2) had two drawbacks:
(i) The number of add-subtract operations and the number of shift operations become variable, which is inconvenient in designing parallel multipliers.
(ii) The algorithm becomes inefficient when there are isolated 1s.
These problems are overcome by using the modified Radix-4 Booth algorithm, which scans strings of three bits with the algorithm given below:
1) Extend the sign bit by 1 position if necessary to ensure that n is even.
2) Append a 0 to the right of the LSB of the multiplier.
3) According to the value of each vector, each partial product will be 0, +y, -y, +2y or -2y.
The negative values of y are made by taking the 2's complement, and in this paper carry-look-ahead (CLA) fast adders are used. The multiplication of y by 2 is done by shifting y one bit to the left. Thus, in designing an n-bit parallel multiplier, only n/2 partial products are generated.

  X(i)   X(i-1)   X(i-2)   Partial product
  0      0        0        +0
  0      0        1        +y
  0      1        0        +y
  0      1        1        +2y
  1      0        0        -2y
  1      0        1        -y
  1      1        0        -y
  1      1        1        +0

Table 6: Radix-4 Modified Booth algorithm scheme for odd values of i.
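Table 6 can be exercised directly as a lookup table. The Python sketch below is a behavioural illustration under stated assumptions: the names BOOTH_RADIX4, booth_radix4_digits and booth_radix4_multiply are introduced here, the operand width n is assumed to be even, and the partial products are summed with ordinary integer arithmetic instead of the CLA adders mentioned in the text.

```python
# Table 6 as a lookup: (X(i), X(i-1), X(i-2)) -> multiple of y
BOOTH_RADIX4 = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_radix4_digits(x, n=8):
    """Recode an n-bit two's-complement multiplier (n even) into n/2 digits.

    Digit j has weight 4**j and takes a value in {-2, -1, 0, +1, +2}.
    """
    bits = [(x >> k) & 1 for k in range(n)]    # bits[k] = X(k)
    padded = [0] + bits                        # the 0 appended right of the LSB
    digits = []
    for j in range(n // 2):
        # padded[2j], padded[2j+1], padded[2j+2] hold X(2j-1), X(2j), X(2j+1)
        triple = (padded[2 * j + 2], padded[2 * j + 1], padded[2 * j])
        digits.append(BOOTH_RADIX4[triple])
    return digits


def booth_radix4_multiply(y, x, n=8):
    """Sum the n/2 partial products d * y * 4**j."""
    return sum(d * y * 4 ** j for j, d in enumerate(booth_radix4_digits(x, n)))


print(booth_radix4_digits(-4))                 # [0, -1, 0, 0]
print(booth_radix4_multiply(2, -4))            # -8
```

An 8-bit multiplier yields four digits, so only four partial products have to be added, which is the n/2 reduction stated above.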

Chapter-6 EXPERIMENTAL

VHDL: The Language

Many applications demand high throughput and real-time response, performance constraints that often dictate unique architectures with high levels of concurrency. DSP designers need the capability to manipulate and evaluate complex algorithms to extract the necessary level of concurrency. Performance constraints can also be addressed by applying alternative technologies. A change at the implementation level of design by the insertion of a new technology can often make viable an existing marginal algorithm or architecture. The VHDL language supports these modeling needs at the algorithm or behavioral level, and at the implementation or structural level. It provides a versatile set of description facilities to model DSP circuits from the system level to the gate level. Recently, we have also noticed efforts to include circuit-level modeling in VHDL. At the system level we can build behavioral models to describe algorithms and architectures. We would use concurrent processes with constructs common to many high-level languages, such as if, case, loop, wait, and assert statements. VHDL also includes user-defined types, functions, procedures, and packages. In many respects VHDL is a very powerful, high-level, concurrent programming language. At the implementation level we can build structural models using component instantiation statements that connect and invoke subcomponents. The VHDL generate statement provides ease of block replication and control. A dataflow level of description offers a combination of the behavioral and structural levels of description. VHDL lets us use all three levels to describe a single component. Most importantly, the standardization of VHDL has spurred the development of model libraries and design and development tools at every level of abstraction. VHDL, as a consensus description language and design environment, offers design tool portability, easy technical exchange, and technology insertion.



An entity declaration, or entity, combined with an architecture or body constitutes a VHDL model. VHDL calls the entity-architecture pair a design entity. By describing alternative architectures for an entity, we can configure a VHDL model for a specific level of investigation. The entity contains the interface description common to the alternative architectures. It communicates with other entities and the environment through ports and generics. Generic information particularizes an entity by specifying environment constants such as register size or delay value. For example,

  entity A is
    generic (delay: time);
    port (x, y: in real; z: out real);
  end A;

The architecture contains declarative and statement sections. Declarations form the region before the reserved word begin and can declare local elements such as signals and components. Statements appear after begin and can contain concurrent statements. For instance,

  architecture B of A is
    component M
      port (j: in real; k: out real);
    end component;
    signal a, b, c: real := 0.0;
  begin
    "concurrent statements"
  end B;

The variety of concurrent statement types gives VHDL the descriptive power to create and combine models at the structural, dataflow, and behavioral levels into one simulation model. The structural type of description makes use of component instantiation statements to invoke models described elsewhere. After declaring components, we use them in the component instantiation statement, assigning ports to local signals or other ports and giving values to generics:

  invert: M port map (j => a, k => c);

We can then bind the components to other design entities through configuration specifications in VHDL's architecture declarative section or through separate configuration declarations. The dataflow style makes wide use of a number of types of concurrent signal assignment statements, which associate a target signal with an expression and a delay. The list of signals appearing in the expression is the sensitivity list; the expression must be evaluated for any change on any of these signals. The target signals obtain new values after the delay specified in the signal assignment statement. If no delay is specified, the signal assignment occurs during the next simulation cycle:

  c <= a + b after delay;

VHDL also includes conditional and selected signal assignment statements. It uses block statements to group signal assignment statements and makes them synchronous with a guarded condition. Block statements can also contain ports and generics to provide more modularity in the descriptions. We commonly use concurrent process statements when we wish to describe hardware at the behavioral level of abstraction. The process statement consists of declarations and procedural types of statements that make up the sequential program. Wait and assert statements add to the descriptive power of the process statements for modeling concurrent actions:

  process
    variable i : real := 1.0;
  begin
    wait on a;
    i := b * 3.0;
    c <= i after delay;
  end process;

Other concurrent statements include the concurrent assertion statement, concurrent procedure call, and generate statement. Packages are design units that permit types and objects to be shared.

Arithmetic operations dominate the execution time of most Digital Signal Processing (DSP) algorithms, and currently the time it takes to execute a multiplication operation is still the dominating factor in determining the instruction cycle time of a DSP chip and of Reduced Instruction Set Computers (RISC). Among the many methods of implementing high-speed parallel multipliers, there is one basic approach, namely the Booth algorithm. Power consumption in VLSI DSPs has gained special attention due to the proliferation of high-performance portable battery-powered electronic devices such as cellular phones and laptop computers. DSP applications require high computational speed and, at the same time, suffer from stringent power dissipation constraints. Multiplier modules are common to many DSP applications. The fastest types of multipliers are parallel multipliers. Among these, the Wallace multiplier is among the fastest. However, Wallace multipliers suffer from poor regularity. Hence, when regularity, high performance and low power are primary concerns, Booth multipliers tend to be the primary choice. Booth multipliers allow operation on signed operands in 2's complement. They derive from array multipliers where, for each bit in a partial product line, an encoding scheme is used to determine whether this bit is positive, negative or zero. The Modified Booth algorithm achieves a major performance improvement through radix-4 encoding. In this algorithm each partial product line operates on 2 bits at a time, thereby reducing the total number of partial products. This is particularly true for operands using 16 bits or more.

ANALYSIS
VHDL code for sixteen bit adder

----------------------------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_SIGNED.ALL;

---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity sixteenbit_fa is
    Port ( a    : in  STD_LOGIC_VECTOR (15 downto 0);
           b    : in  STD_LOGIC_VECTOR (15 downto 0);
           -- cin : in STD_LOGIC;
           yout : out STD_LOGIC_VECTOR (15 downto 0);
           cout : out STD_LOGIC);
end sixteenbit_fa;

architecture Behavioral of sixteenbit_fa is
    signal s      : std_logic_vector(15 downto 0);
    signal carryi : std_logic_vector(16 downto 0);   -- carry chain; carryi(0) is the carry in

    COMPONENT twobit_add
        PORT( a    : IN  std_logic;
              b    : IN  std_logic;
              cin  : IN  std_logic;
              sum  : OUT std_logic;
              cout : OUT std_logic);
    END COMPONENT;
begin
    carryi(0) <= '0';

    -- one adder cell per bit; the carry out of bit i feeds bit i+1
    gi : for i in 0 to 15 generate
        f0 : twobit_add PORT MAP(a(i), b(i), carryi(i), yout(i), carryi(i+1));
        -- inter_carr<=carry(i+1);
    end generate gi;

    cout <= carryi(16);
end Behavioral;
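The carry chain built by the generate loop can be checked against a simple reference model. The Python sketch below is an illustration only: the function name ripple_carry_add is an assumption, and it further assumes that the twobit_add component, whose body is not listed in this thesis, behaves as a one-bit full adder.

```python
def ripple_carry_add(a, b, width=16):
    """Bit-level model of a ripple-carry adder built from full adders.

    Returns (yout, cout). The carry into bit 0 is tied to 0, as in the
    sixteenbit_fa listing.
    """
    carry = 0
    yout = 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s = ai ^ bi ^ carry                               # full-adder sum
        carry = (ai & bi) | (ai & carry) | (bi & carry)   # full-adder carry
        yout |= s << i
    return yout, carry


print(ripple_carry_add(0xFFFF, 0x0001))    # (0, 1)
print(ripple_carry_add(1234, 4321))        # (5555, 0)
```

A model like this can serve as the expected-value generator in a testbench for the adder, since yout and cout correspond to the two output ports of the entity.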

VHDL code for array multiplier

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity mult64 is
    Port ( clk  : in  std_logic;
           --rst:in std_logic;
           a    : in  std_logic_vector(7 downto 0);
           b    : in  std_logic_vector(7 downto 0);
           prod : out std_logic_vector(15 downto 0));
end mult64;

architecture Behavioral of mult64 is
    constant n : integer := 8;
    subtype plary is std_logic_vector(n-1 downto 0);
    type pary is array(0 to n) of plary;
    signal pp, pc, ps : pary;   -- partial products, carries, partial sums
begin
    -- partial product generation: pp(j)(k) = a(k) and b(j)
    pgen : for j in 0 to n-1 generate
        pgen1 : for k in 0 to n-1 generate
            pp(j)(k) <= a(k) and b(j);
        end generate;
        pc(0)(j) <= '0';
    end generate;

    ps(0)   <= pp(0);
    prod(0) <= pp(0)(0);

    -- carry-save adder rows
    addr : for j in 1 to n-1 generate
        addc : for k in 0 to n-2 generate
            ps(j)(k) <= pp(j)(k) xor pc(j-1)(k) xor ps(j-1)(k+1);
            pc(j)(k) <= (pp(j)(k) and pc(j-1)(k)) or
                        (pp(j)(k) and ps(j-1)(k+1)) or
                        (pc(j-1)(k) and ps(j-1)(k+1));
        end generate;
        prod(j)    <= ps(j)(0);
        ps(j)(n-1) <= pp(j)(n-1);
    end generate;

    -- final ripple-carry row
    pc(n)(0) <= '0';
    addlast : for k in 1 to n-1 generate
        ps(n)(k) <= pc(n)(k-1) xor pc(n-1)(k-1) xor ps(n-1)(k);
        pc(n)(k) <= (pc(n)(k-1) and pc(n-1)(k-1)) or
                    (pc(n)(k-1) and ps(n-1)(k)) or
                    (pc(n-1)(k-1) and ps(n-1)(k));
    end generate;

    prod(2*n-1)          <= pc(n)(n-1);
    prod(2*n-2 downto n) <= ps(n)(n-1 downto 1);
end Behavioral;

VHDL code for Booth multiplier (top-level entity booth_mult, which instantiates the booth_encoder and sixteenbit_fa components)

----------------------------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_SIGNED.ALL;

---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity booth_mult is
    Port ( a    : in  STD_LOGIC_VECTOR (7 downto 0);
           b    : in  STD_LOGIC_VECTOR (7 downto 0);
           yout : out STD_LOGIC_VECTOR (15 downto 0);
           ovf  : out std_logic);
end booth_mult;

architecture Behavioral of booth_mult is
    COMPONENT booth_encoder
        PORT( a     : IN  std_logic_vector(7 downto 0);
              arg   : IN  std_logic_vector(2 downto 0);
              pprod : OUT std_logic_vector(15 downto 0));
    END COMPONENT;

    COMPONENT sixteenbit_fa
        PORT( a    : IN  std_logic_vector(15 downto 0);
              b    : IN  std_logic_vector(15 downto 0);
              yout : OUT std_logic_vector(15 downto 0);
              cout : OUT std_logic);
    END COMPONENT;

    signal pp1, pp2, pp3, pp4, s1, s2, s3, sum1, sum2, sum3 : std_logic_vector(15 downto 0);
    signal st : std_logic_vector(2 downto 0);
    signal k1, k2, k3 : std_logic;
begin
    -- first group: the two lowest multiplier bits with a 0 appended on the right
    st <= b(1 downto 0) & '0';

    -- one encoder per overlapping group of three multiplier bits
    u0 : booth_encoder PORT MAP(a, st, pp1);
    u1 : booth_encoder PORT MAP(a, b(3 downto 1), pp2);
    u2 : booth_encoder PORT MAP(a, b(5 downto 3), pp3);
    u3 : booth_encoder PORT MAP(a, b(7 downto 5), pp4);

    -- weight the partial products by 4, 16 and 64
    s1 <= pp2(13 downto 0) & "00";
    s2 <= pp3(11 downto 0) & "0000";
    s3 <= pp4(9 downto 0) & "000000";

    -- add the four partial products
    u4 : sixteenbit_fa PORT MAP(pp1, s1, sum1, k1);
    u5 : sixteenbit_fa PORT MAP(sum1, s2, sum2, k2);
    u6 : sixteenbit_fa PORT MAP(sum2, s3, yout, ovf);
end Behavioral;

VHDL code for booth multiplier radix 4

--------------------------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_SIGNED.ALL;
use ieee.numeric_std.all;

---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity booth_encoder is
  -- generic(N : integer:=8);
  Port ( a     : in  std_logic_vector(7 downto 0);
         arg   : in  std_logic_vector(2 downto 0);
         pprod : out std_logic_vector(15 downto 0));
end booth_encoder;

architecture Behavioral of booth_encoder is

  -- Returns the 9-bit two's-complement partial product selected by the
  -- 3-bit Booth group: 0, +A, +2A, -2A or -A
  function encoder(arg1 : std_logic_vector(2 downto 0);
                   data : std_logic_vector(7 downto 0))
    return std_logic_vector is
    variable temp, temp1, temp2 : std_logic_vector(8 downto 0);
    variable sign : std_logic;
  begin
    case arg1 is
      when "001" | "010" =>                    -- +A: sign-extend to 9 bits
        if data < 0 then
          temp := '1' & data;
        else
          temp := '0' & data;
        end if;
      when "011" =>                            -- +2A: shift left by one
        if data < 0 then
          temp1 := '1' & data;
          temp := temp1(7 downto 0) & '0';
        else
          temp := '0' & data(6 downto 0) & '0';
        end if;
      when "100" =>                            -- -2A: negate, then shift left
        if data < 0 then
          temp1 := '1' & data;
          temp2 := (not temp1) + "000000001";
          temp := (temp2(7 downto 0) & '0');
        else
          temp1 := '0' & data;
          temp2 := (not temp1) + "000000001";
          temp := (temp2(7 downto 0) & '0');
        end if;
      when "101" | "110" =>                    -- -A: two's complement
        if data < 0 then
          temp1 := '1' & data;
          temp := not(temp1) + "000000001";
        else
          temp1 := '0' & data;
          temp := (not temp1) + "000000001";
        end if;
      when others =>                           -- "000" and "111": zero
        temp := "000000000"; -- (others => '0');
    end case;
    return temp;
  end encoder;

  signal s1 : std_logic_vector(8 downto 0);
  signal s2 : std_logic;

begin

  s1 <= encoder(arg, a);
  --s2 <= s1(8);
  --pprod <= s2&s2&s2&s2&s2&s2&s2&s2&s1(7 downto 0);
  pprod <= sxt(s1, 16);   -- sign-extend the 9-bit result to 16 bits

end Behavioral;
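The selection table implemented by the encoder function, and the effect of its 9-bit intermediate width, can be illustrated with a short Python model. This is a sketch under stated assumptions rather than the thesis code: the name `booth_encoder_model` is hypothetical, and the model assumes that the VHDL function behaves as a 9-bit two's-complement computation of 0, +A, +2A, -2A or -A followed by sign extension.

```python
def booth_encoder_model(a, arg):
    """Partial product for a signed 8-bit `a` and a 3-bit Booth group `arg`.

    The intermediate is wrapped to 9 bits, as in the VHDL function; sign
    extension to 16 bits does not change the numeric value.
    """
    def wrap(value, bits):
        # Keep the low `bits` bits and read them as two's complement.
        value &= (1 << bits) - 1
        return value - (1 << bits) if value >> (bits - 1) else value

    digit = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
             0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}[arg]
    return wrap(digit * a, 9)


print(booth_encoder_model(5, 0b011))     # 10   (+2A)
print(booth_encoder_model(5, 0b100))     # -10  (-2A)
print(booth_encoder_model(-128, 0b100))  # -256 (+256 does not fit in 9 bits)
```

Under these assumptions every case is exact for operands from -127 to 127. The one corner case is a = -128 with the group "100": -2A is +256, which needs ten signed bits, so a 9-bit intermediate wraps to -256.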

RESULTS

RESULTS OF DIFFERENT MULTIPLIERS

Comparison of FPGA resources utilized by different multiplication algorithms

Device Utilization Summary

Parameter                                 Shift and Add algorithm   Array Multiplier          Booth Radix-4 Multiplier  Pyramid Algorithm
Selected Device                           3s200ft256-4              3s200ft256-4              3s200ft256-4              3s200ft256-4
Number of Slices                          18 out of 1920 (1%)       64 out of 1920 (3%)       92 out of 1920 (4%)       189 out of 1920 (9%)
Number of 4-input LUTs                    32 out of 3840 (1%)       123 out of 3840 (3%)      168 out of 3840 (4%)      377 out of 3840 (9%)
Number of bonded IOBs                     34 out of 173 (19%)       32 out of 173 (18%)       33 out of 173 (19%)       65 out of 173 (37%)
Total equivalent gate count for design    495                       738                       1107                      4326

Power Analysis

Parameter                          Shift and Add algorithm   Array Multiplier   Booth Radix-4 Multiplier   Pyramid Algorithm
Selected Device                    3s200ft256-4              3s200ft256-4       3s200ft256-4               3s200ft256-4
Total Power Consumption (in mW)    37                        50                 37                         37

Power analysis of different multiplication algorithms

Time analysis

Algorithm                    Time
Shift and Add algorithm      36.435 ns
Array Multiplier             5.01 ns
Booth Radix-4 Multiplier     32.001 ns
Pyramid Algorithm            37.291 ns

CONCLUSION
Our project gives a clear picture of different multipliers and their implementation in a tap-delay FIR filter. We found that parallel multipliers are a much better option than serial multipliers. We concluded this from the results for power consumption and total area. In parallel multipliers the total area is much less than in serial multipliers, and hence the power consumption is also lower, as our results show. Parallel multiplication also speeds up the calculation and makes the system faster. Comparing the radix-2 and radix-4 Booth multipliers, we found that radix-4 consumes less power than radix-2, because it needs almost half the number of iterations and adders. When all these multipliers were compared, we found that array multipliers consume the most power and occupy the largest area, because they use a large number of adders. This also slows the system down, because it has to perform many more additions.
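The "almost half" relation between radix-2 and radix-4 Booth recoding can be illustrated by counting the recoded digits of an 8-bit multiplier, since each digit corresponds to one partial product or iteration. The following Python sketch is an illustration only; the function names `radix2_digits` and `radix4_digits` are hypothetical and do not come from the thesis.

```python
def radix2_digits(b, bits=8):
    """Radix-2 Booth digits of a signed integer, least significant first.

    Digit i is b[i-1] - b[i], with b[-1] taken as 0; each digit is -1, 0 or 1.
    """
    p = (b & ((1 << bits) - 1)) << 1          # append the 0 below bit 0
    return [((p >> i) & 1) - ((p >> (i + 1)) & 1) for i in range(bits)]


def radix4_digits(b, bits=8):
    """Radix-4 Booth digits, least significant first.

    Digit k is b[2k-1] + b[2k] - 2*b[2k+1]; each digit lies in -2..2.
    """
    p = (b & ((1 << bits) - 1)) << 1          # append the 0 below bit 0
    return [((p >> i) & 1) + ((p >> (i + 1)) & 1) - 2 * ((p >> (i + 2)) & 1)
            for i in range(0, bits, 2)]


b = 6
print(radix2_digits(b))   # [0, -1, 0, 1, 0, 0, 0, 0]  (8 digits)
print(radix4_digits(b))   # [-2, 2, 0, 0]              (4 digits)
```

For an 8-bit operand the radix-2 recoding yields eight digits and the radix-4 recoding four, and both sets of digits, weighted by powers of 2 and 4 respectively, add up to the original value.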

Multipliers are among the most important components of many systems, so there is always a need for better multiplier designs. A multiplier should consume as little power and occupy as little area as possible. Through our project we tried to determine which of these algorithms works best, and we conclude that the radix-4 modified Booth algorithm works best.

