

1.1 BACKGROUND: In today's fast-developing technological world, the trend has been toward the construction of small, portable devices. As the number of these battery-operated, processor-driven devices grows and their expected performance rises, there is a need to increase their processing speed and reduce their power dissipation. In such a consumer-driven scenario, these demands call for a serious look at how the devices are built. In the processors used for such purposes, major operations such as FIR filtering and the DCT are carried out through multipliers. Since multipliers are the major components of a DSP, optimizing the multiplier design will lead to a better-performing DSP.

1.2 IMPORTANCE OF MULTIPLIER: The computational performance of a DSP system is limited by its multiplication performance; since multiplication dominates the execution time of most DSP algorithms, a high-speed multiplier is much desired. Currently, multiplication time is still the dominant factor in determining the instruction cycle time of a DSP chip. With an ever-increasing quest for greater computing power on battery-operated mobile devices, the design emphasis has shifted from optimizing conventional delay, time and area to minimizing power dissipation while still maintaining high performance. Traditionally, the shift-and-add algorithm has been used, but it is not suitable for VLSI implementation, also from a delay point of view. Important algorithms proposed in the literature for VLSI-implementable fast multiplication are the array multiplier and the Wallace tree multiplier; this work presents the fundamental technical aspects behind these approaches. Low-power, high-speed VLSI can be implemented with different logic styles. The three important considerations for VLSI design are power, area and delay. Many logic styles for low power dissipation and high speed have been proposed, and each logic style has its own advantages in terms of speed and power.

1.3 MULTIPLIER SCHEMES: There are two basic schemes in the multiplication process: serial multiplication and parallel multiplication. In serial (shift-add) multiplication, a set of partial products is computed and then summed together; the implementations are primitive, with simple architectures, and are used when there is no dedicated hardware multiplier. In parallel multiplication, the partial products are generated simultaneously; parallel implementations are used in high-performance machines where computation latency must be minimized. Comparing the two, parallel multiplication has the advantage: it requires fewer steps than serial multiplication and therefore performs faster.

1.4 MULTIPLIER FEATURES: The features of the multiplier are

1.4.1 PIPELINING: Pipelining allows this multiplier to accept a new set of data and start its partial multiplication even though part of another multiplication is still taking place.

1.4.2 MIXED ARCHITECTURE: A mixed-type architecture incorporating a Wallace tree multiplier has been considered. This allows the design to take advantage of the low delay of the Wallace multiplier.

1.4.3 CLOCKING: Clocking has been arranged so that the multiplier works at its highest clock frequency without compromising the proper flow of partial products through the structure.

1.4.4 DATA RANGE: The data range has been extended from the initial 4x4 bits to 16x16 bits, which is the working data range actually required by many DSP processors.

1.4.5 STRUCTURAL MODELLING: This ensures the best implementation of the multiplier, be it on an ASIC or in an FPGA, and removes any chance of redundant hardware being generated.


2.1 ADDER: In electronics, an adder is a digital circuit that performs addition of numbers. In modern computers, adders reside in the arithmetic logic unit (ALU), where other arithmetic operations are also performed. Although adders can be constructed for many numerical representations, such as binary-coded decimal or excess-3, the most common adders operate on binary numbers. In cases where two's complement is used to represent negative numbers, it is trivial to modify an adder into an adder-subtractor.

2.2 TYPES OF ADDERS: For single-bit adders, there are two general types. A half adder has two inputs, generally labeled A and B, and two outputs, the sum S and the carry C. S is the XOR of A and B, and C is the AND of A and B; essentially, the output of a half adder is the two-bit sum of two one-bit numbers, with C being the more significant of the two output bits. The second type of single-bit adder is the full adder. The full adder takes into account a carry input, so that multiple adders can be used to add larger numbers. To remove ambiguity between the input and output carry lines, the carry in is labeled Ci or Cin, while the carry out is labeled Co or Cout.
Half adder

Fig 1: Half adder circuit diagram

A half adder is a logic circuit that performs an addition operation on two binary digits. The half adder produces a sum and a carry value, which are both binary digits.

Following is the logic table for a half adder:

TABLE 1: HALF ADDER
A B | C S
0 0 | 0 0
0 1 | 0 1
1 0 | 0 1
1 1 | 1 0
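The table corresponds to S = A XOR B and C = A AND B. As a quick sanity check, the logic can be modelled in a few lines of Python (this model is illustrative only; the report's hardware is described in Verilog HDL):

```python
def half_adder(a, b):
    """Half adder on two bits: sum is XOR, carry is AND."""
    return a ^ b, a & b  # (S, C)

# reproduce Table 1 row by row
for a in (0, 1):
    for b in (0, 1):
        s, c = half_adder(a, b)
        print(f"A={a} B={b} C={c} S={s}")
```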

Fig 2: Full adder circuit diagram

Schematic symbol for a 1-bit full adder

A full adder is a logic circuit that performs an addition operation on three binary digits. The full adder produces a sum and a carry value, which are both binary digits. It can be combined with other full adders (see below) or work on its own.

TABLE 2: FULL ADDER
A B Ci | Co S
0 0 0  | 0  0
0 0 1  | 0  1
0 1 0  | 0  1
0 1 1  | 1  0
1 0 0  | 0  1
1 0 1  | 1  0
1 1 0  | 1  0
1 1 1  | 1  1

Note that the final OR gate before the carry-out output may be replaced by an XOR gate without altering the resulting logic. This is because the only discrepancy between OR and XOR occurs when both inputs are 1, and for the adder shown here one can check that this case never arises. Using only two types of gates is convenient if one wishes to implement the adder directly with common IC chips.

A full adder can be constructed from two half adders by connecting A and B to the inputs of one half adder, connecting its sum to one input of the second half adder, connecting Ci to the other input, and ORing the two carry outputs. Equivalently, S could be made the three-bit XOR of A, B and Ci, and Co the three-bit majority function of A, B and Ci. The output of the full adder is the two-bit arithmetic sum of three one-bit numbers.
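The two-half-adder construction and the XOR/majority formulation described above are equivalent, which a small Python model can confirm (illustrative only, not the report's Verilog code):

```python
def half_adder(a, b):
    """Half adder: sum is XOR, carry is AND of the two input bits."""
    return a ^ b, a & b

def full_adder(a, b, cin):
    """Full adder built from two half adders, with the two carries ORed."""
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, cin)
    return s, c1 | c2  # (S, Cout)

# check against the three-bit XOR / majority formulation from the text
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = full_adder(a, b, cin)
            assert s == a ^ b ^ cin
            assert cout == (a & b) | (a & cin) | (b & cin)
```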



LITERATURE SURVEY

3.1 BASIC MULTIPLIER ARCHITECTURES:

3.1.1 INTRODUCTION: A basic multiplier consists of ANDed terms (as shown in Fig 3) and an array of full adders and/or half adders arranged so as to obtain partial products at each level. These partial products are then added to obtain the final result. It is the different arrangements and construction changes in these adders that lead to the various basic multiplier structures.

Fig 3: AND gate

The full adder (FA) implementation takes the two bits (A, B) and the carry in (Ci) as inputs and produces the sum (S) and carry out (Cout) as outputs.

3.2 BINARY MULTIPLIER: A binary multiplier is an electronic hardware device, used in digital electronics, a computer or another electronic device, that performs rapid multiplication of two numbers in binary representation. It is built using binary adders. The rules for binary multiplication can be stated as follows:
1. If the multiplier digit is a 1, the multiplicand is simply copied down and represents the product.
2. If the multiplier digit is a 0, the product is also 0.
For designing a multiplier circuit we should have circuitry to provide or do the following things:
1. It should be capable of identifying whether a bit is 0 or 1.
2. It should be capable of shifting the partial products left.
3. It should be able to add all the partial products to give the final product as the sum of the partial products.
4. It should examine the sign bits. If they are alike, the sign of the product will be positive; if the sign bits are opposite, the product will be negative. The sign bit of the product determined by this rule should be displayed along with the product.
From the above discussion we observe that it is not necessary to wait until all the partial products have been formed before summing them. In fact, the addition of a partial product can be carried out as soon as it is formed.
Notations: a = multiplicand, b = multiplier, p = product
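Rules 1-2 and the add-as-soon-as-formed accumulation can be sketched as follows (a behavioural Python model, not the hardware itself):

```python
def shift_add_multiply(a, b, n=8):
    """Serial shift-add multiplication of two unsigned n-bit integers.

    Each multiplier bit selects either the shifted multiplicand or zero,
    and each partial product is added as soon as it is formed.
    """
    product = 0
    for i in range(n):
        if (b >> i) & 1:           # multiplier bit is 1: copy the multiplicand
            product += a << i      # shifted partial product, added immediately
        # a multiplier bit of 0 contributes nothing
    return product

assert shift_add_multiply(13, 11) == 143
```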

Binary multiplication (e.g. n = 4):
P = a x b, where
a = an-1 an-2 ... a1 a0 (multiplicand)
b = bn-1 bn-2 ... b1 b0 (multiplier)
p = p2n-1 p2n-2 ... p1 p0 (product)

          x x x x      (multiplicand a)
      x   x x x x      (multiplier b)
      -----------
          x x x x      b0.a.2^0  (partial products)
        x x x x        b1.a.2^1
      x x x x          b2.a.2^2
    x x x x            b3.a.2^3
  ----------------
  x x x x x x x x      p (partial sum)

3.2.1 BASIC HARDWARE MULTIPLIER: In binary, the partial products are trivial: if the multiplier bit is 1, copy the multiplicand; else use 0. Each partial-product bit is therefore produced by a single AND gate.


3.2.2 MULTIPLY-ACCUMULATE CIRCUITS: Multiplication followed by accumulation is a common operation in many digital systems, particularly highly interconnected ones like digital filters, neural networks and data quantizers. A typical MAC (multiply-accumulate) architecture is illustrated in the figure. It multiplies two values, then adds the result to the previously accumulated value, which must then be stored back in the registers for future accumulations. Another feature of a MAC circuit is that it must check for overflow, which might happen when the number of MAC operations is large. This design can be built from components, because each of the units shown in the figure has already been designed; however, since it is a relatively simple circuit, it can also be designed directly. In any case, the MAC circuit as a whole can be used as a component in applications like digital filters and neural networks.

3.3 WALLACE TREE MULTIPLIER: A Wallace tree is an efficient hardware implementation of a digital circuit that multiplies two integers. For an NxN-bit multiplication, the partial products are formed by N^2 AND gates. Next, the N rows of partial products are grouped together in sets of three rows each; any additional rows that are not members of these groups are transferred to the next level without modification. For a column containing three partial products, a full adder is used, with the sum dropped down to the same column and the carry out brought to the next higher column. For a column with two partial products, a half adder is used in place of a full adder. At the final stage, a carry-propagate adder adds up all the propagating carries to get the final result. The tree can also be implemented using carry-save adders and is sometimes combined with Booth encoding. Various other efforts have been made to reduce the number of adders for higher-order bit widths such as 16 and 32. Applications include use in DSPs for performing the FFT, FIR filtering, etc.
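The grouping-and-reduction procedure just described can be modelled in a few lines of Python (a behavioural sketch only; zero partial-product bits are dropped early purely to keep the model short, whereas the hardware wires them through its AND gates and adder cells):

```python
def wallace_multiply(a, b, n=4):
    """Sketch of Wallace-tree multiplication of two unsigned n-bit integers.

    Partial-product 1-bits are kept as per-column lists; each reduction
    layer compresses three bits of a column with a full adder (two bits
    with a half adder), the carry moving one column left, until at most
    two bits remain per column. A final carry-propagate addition of the
    two remaining rows yields the product.
    """
    width = 2 * n
    cols = [[] for _ in range(width)]
    for i in range(n):                       # N^2 AND-gate partial products
        for j in range(n):
            if ((a >> i) & 1) & ((b >> j) & 1):
                cols[i + j].append(1)

    while any(len(c) > 2 for c in cols):     # one reduction layer per pass
        nxt = [[] for _ in range(width)]
        for w, c in enumerate(cols):
            while len(c) >= 3:               # full adder: 1+1+1 -> sum 1, carry 1
                c.pop(); c.pop(); c.pop()
                nxt[w].append(1)
                nxt[w + 1].append(1)
            if len(c) == 2:                  # half adder: 1+1 -> sum 0, carry 1
                c.pop(); c.pop()
                nxt[w + 1].append(1)
            nxt[w].extend(c)                 # a lone bit passes through
        cols = nxt

    # final carry-propagate addition of the two remaining rows
    row0 = sum(1 << w for w, c in enumerate(cols) if len(c) > 0)
    row1 = sum(1 << w for w, c in enumerate(cols) if len(c) > 1)
    return row0 + row1

# exhaustive check for 4-bit operands
assert all(wallace_multiply(x, y) == x * y for x in range(16) for y in range(16))
```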


Fig 4: Wallace tree hardware architecture


3.3.2 FUNCTION: The Wallace tree has three steps:
1. Multiply (that is, AND) each bit of one of the arguments by each bit of the other, yielding n^2 results. Depending on the position of the multiplied bits, the wires carry different weights; for example, the wire carrying the result of a2b3 has weight 32.
2. Reduce the number of partial products to two through layers of full and half adders.
3. Group the wires into two numbers, and add them with a conventional adder.

3.3.3 EXAMPLE: Suppose two numbers are being multiplied:

                       a3   a2   a1   a0
             x         b3   b2   b1   b0
   _____________________________________
                     a3b0 a2b0 a1b0 a0b0
                a3b1 a2b1 a1b1 a0b1
           a3b2 a2b2 a1b2 a0b2
      a3b3 a2b3 a1b3 a0b3
   _____________________________________

Arranging the partial products in the form of a tree structure, each column of weight k collects the terms aibj with i + j = k:

col 6: a3b3
col 5: a3b2 a2b3
col 4: a3b1 a2b2 a1b3
col 3: a3b0 a2b1 a1b2 a0b3
col 2: a2b0 a1b1 a0b2
col 1: a1b0 a0b1
col 0: a0b0


3.3.4 ADDER ELEMENTS Half Adder:


Full Adder:

3.3.5 ADVANTAGES: Each layer of the tree reduces the number of partial-product rows by a factor of 3:2, giving minimum propagation delay. The benefit of the Wallace tree is that there are only O(log n) reduction layers, whereas summing the partial products with a tree of ordinary carry-propagate adders would require O(log^2 n) time.

3.3.6 DISADVANTAGES: Wallace trees do not provide any advantage over ripple adder trees in many FPGAs. Due to their irregular routing, they may actually be slower and are certainly more difficult to route. The adder structure also grows for larger bit widths.




4. ARRAY MULTIPLIER: This is the most basic form of binary multiplier construction. Its basic principle is exactly that of pen-and-paper multiplication. It consists of a highly regular array of full adders, the exact number depending on the length of the binary numbers to be multiplied. Each row of this array generates a partial product, whose value is then added to the sum and carry generated in the next row. The final result of the multiplication is obtained directly after the last row. The ANDed terms are generated using logic AND gates. The full adder (FA) implementation takes the two bits (A, B) and the carry in (Ci) as inputs and produces the sum (S) and carry out (Co) as outputs.
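The row-by-row operation described above can be modelled at cell level in Python (illustrative only; the report's implementation is in HDL):

```python
def full_adder(x, y, cin):
    """One adder cell: returns (sum, carry-out)."""
    return x ^ y ^ cin, (x & y) | (x & cin) | (y & cin)

def array_multiply(a, b, n=4):
    """Cell-level sketch of an n x n array multiplier.

    Row i ripples one AND-gated partial product (a AND b_i, weighted by
    2^i) into a 2n-bit running sum, just as the rows of full adders in
    the array do, with each row's carries feeding the next columns.
    """
    acc = [0] * (2 * n)                        # running sum bits, LSB first
    for i in range(n):                         # one adder row per multiplier bit
        carry = 0
        for j in range(n):                     # ripple across the row
            pp = ((a >> j) & 1) & ((b >> i) & 1)   # AND term a_j . b_i
            acc[i + j], carry = full_adder(acc[i + j], pp, carry)
        k = i + n                              # let the row's final carry ripple on
        while carry and k < 2 * n:
            acc[k], carry = full_adder(acc[k], 0, carry)
            k += 1
    return sum(bit << w for w, bit in enumerate(acc))

# exhaustive check for 4-bit operands
assert all(array_multiply(x, y) == x * y for x in range(16) for y in range(16))
```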


Fig 5: Hardware architecture


4.2 EXAMPLE: 4 x 4 bit multiplication

                       a3   a2   a1   a0
             x         b3   b2   b1   b0
   _____________________________________
                     a3b0 a2b0 a1b0 a0b0
                a3b1 a2b1 a1b1 a0b1
           a3b2 a2b2 a1b2 a0b2
      a3b3 a2b3 a1b3 a0b3
   _____________________________________
      p7   p6   p5   p4   p3   p2   p1   p0




Fig 6: Array multiplier


Due to its highly regular structure, the array multiplier is very easily constructed and can be densely implemented in VLSI, taking little space. But compared with the multiplier structures proposed later, it shows a high computational time: the delay grows on the order of O(N) in the operand width, one of the highest of any multiplier structure.

4.4 BAUGH-WOOLEY MULTIPLIER: The Baugh-Wooley multiplier is used for both unsigned and signed number multiplication. Signed operands are represented in 2's-complement form. The partial products are adjusted so that the negative signs move to the last step, which in turn maximizes the regularity of the multiplication array. The Baugh-Wooley multiplier operates on signed operands with 2's-complement representation so as to make the signs of all partial products positive; the numerical value of the product of two 2's-complement numbers, say X and Y, can then be obtained from product terms each made of one AND gate.

Variables with bars denote prior inversion. Inverters are connected before the inputs of the full adders or the AND gates, as required by the algorithm.

Each column represents an addition in accordance with the respective weight of its product terms.

4.5 BAUGH-WOOLEY HARDWARE ARCHITECTURE:

Fig 7: Signed 2's-complement Baugh-Wooley multiplier


The Baugh-Wooley multiplication algorithm is an efficient way to handle the sign bits. The technique was developed in order to design regular multipliers suited for 2's-complement numbers. Dr. Gebali has extended this basic idea and developed efficient fast inner-product processors capable of performing double-precision multiply-accumulate operations without a speed penalty. Let us consider two n-bit numbers, A and B, to be multiplied. A and B can be represented as


where the ai's and bi's are the bits of A and B, respectively, and an-1 and bn-1 are the sign bits. The product, P = A x B, is then given by the following equation:

It indicates that the final product is obtained by subtracting the last two terms, the sign-bit cross products, from the first two positive terms.
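That term-by-term reading of the product can be checked numerically. The sketch below evaluates the two positive terms and the two subtracted sign-bit cross terms directly and compares the result against ordinary signed multiplication (a Python model with illustrative helper names, not the Baugh-Wooley gate array itself):

```python
def baugh_wooley_product(a_bits, b_bits):
    """Evaluate the 2's-complement product term by term, as in the text:
    two positive sum terms minus the two sign-bit cross terms."""
    n = len(a_bits)
    pos = sum(a_bits[i] * b_bits[j] << (i + j)
              for i in range(n - 1) for j in range(n - 1))
    pos += a_bits[n - 1] * b_bits[n - 1] << (2 * n - 2)   # sign x sign term
    neg = sum(a_bits[i] * b_bits[n - 1] << (i + n - 1) for i in range(n - 1))
    neg += sum(a_bits[n - 1] * b_bits[j] << (j + n - 1) for j in range(n - 1))
    return pos - neg

def to_bits(x, n):
    """2's-complement bit list of x, LSB first (Python's >> is arithmetic)."""
    return [(x >> i) & 1 for i in range(n)]

# check against ordinary signed multiplication for all 4-bit operands
n = 4
for a in range(-8, 8):
    for b in range(-8, 8):
        assert baugh_wooley_product(to_bits(a, n), to_bits(b, n)) == a * b
```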



Fig 8: Block diagram of Baugh-Wooley multiplier

4.8 ADVANTAGES: Minimum complexity. Easily scalable. Easily pipelined. Regular shape, easy to place and route.

4.9 DISADVANTAGES: High power consumption. More digital gates, resulting in large chip area.




5.1 PROPOSED MULTIPLIER DESIGN: Mathematics is the mother of all sciences, and it is full of magic and mysteries. The ancient Indians were able to understand these mysteries and develop simple keys to solve them. Thousands of years ago the Indians used these techniques in different fields like construction of temples, astrology, medicine and science, due to which India emerged as one of the richest countries in the world. The Indians called this system of calculation Vedic mathematics. Vedic mathematics is much simpler and easier to understand than conventional mathematics. The ancient system of Vedic mathematics was reintroduced to the world by Swami Bharati Krishna Tirthaji Maharaj, Shankaracharya of Goverdhan Peath; Vedic Mathematics was the name given by him. Bharati Krishna, who was himself a scholar of Sanskrit, mathematics, history and philosophy, was able to reconstruct the mathematics of the Vedas. According to his research, all of mathematics is based on sixteen sutras, or word-formulae, and thirteen sub-sutras. According to Mahesh Yogi, the sutras of Vedic mathematics are the software for the cosmic computer that runs this universe. Vedic mathematics offers wonderful applications to arithmetical computations, theory of numbers, compound multiplications, algebraic operations, factorizations, simple quadratic and higher-order equations, simultaneous quadratic equations, partial fractions, calculus, squaring, cubing, square roots, cube roots, coordinate geometry and the wonderful Vedic numerical code. Conventional mathematics is an integral part of engineering education, since most engineering system designs are based on various mathematical approaches. All the leading manufacturers of microprocessors have developed architectures suitable for conventional binary arithmetic methods. The need for faster processing speed is continuously driving major improvements in processor technologies, as well as the search for new algorithms.
The Vedic mathematics approach is totally different and considered very close to the way a human mind works. A multiplier is one of the key

hardware blocks in most applications, such as digital signal processing, encryption and decryption algorithms in cryptography, and other logical computations. With advances in technology, many researchers have tried to design multipliers that offer high speed, low power consumption, regularity of layout and hence less area, or a combination of these properties. The Vedic multiplier is considered here to satisfy these requirements. In this work, we present multiplication based on Urdhva Tiryagbhyam in binary, designed using a newly proposed 4-bit adder and implemented in an HDL. The rest of this work is organized as follows: the Vedic multiplication method based on the Urdhva Tiryagbhyam sutra for binary numbers is discussed; a new 4-bit adder is proposed, together with the design and implementation of the multiplier; finally, the experimental results obtained are summarized, along with the conclusions of the work.

5.2 VEDIC MULTIPLIER: Digital signal processors (DSPs) are very important in various engineering disciplines, and fast multiplication is crucial in DSPs for convolution, Fourier transforms, etc. A fast method for multiplication based on ancient Indian Vedic mathematics is proposed in this work. Among the various methods of multiplication in Vedic mathematics, Urdhva Tiryakbhyam is discussed in detail. Urdhva Tiryakbhyam is a general multiplication formula applicable to all cases of multiplication. This algorithm is applied to digital arithmetic and a multiplier architecture is formulated. It is a highly modular design in which smaller blocks are used to build larger ones. The coding is done in Verilog HDL and synthesis is done using Altera Quartus-II. The combinational delay obtained after synthesis is compared with the performance of the Baugh-Wooley and Wallace tree multipliers, which are fast multipliers. This Vedic multiplier can bring about a great improvement in DSP performance.

5.3 IMPORTANCE OF VEDIC MATHEMATICS: Among the various methods of multiplication in Vedic mathematics, Urdhva Tiryagbhyam, being a general multiplication formula, is equally applicable to all cases of multiplication. It is more efficient for the multiplication of large numbers with respect to speed and area. In this work, a 4 x 4 binary multiplier is designed using this sutra. The multiplier can be used in applications such as digital signal processing, encryption and decryption algorithms in cryptography, and other logical computations. The design is implemented in Verilog HDL.

5.4 Urdhva Tiryakbhyam Sutra: The given Vedic multiplier is based on a Vedic multiplication formula (sutra). This sutra has traditionally been used for the multiplication of two numbers. Urdhva Tiryakbhyam is a general multiplication formula applicable to all cases of multiplication; it means "vertically and crosswise". The digits on the two ends of each line are multiplied and the result is added to the previous carry. When there are more lines in one step, all the results are added to the previous carry. The least significant digit of the number thus obtained acts as one of the result digits and the rest act as the carry for the next step. Initially the carry is taken to be zero. The line diagram for the multiplication of two 4-bit numbers is as shown in the figure.


To illustrate this multiplication scheme, let us consider the multiplication of two decimal numbers (325 x 728). The line diagram for the multiplication is shown in the figure. The digits on the two ends of each line are multiplied and the result is added to the previous carry. When there are more lines in one step, all the results are added to the previous carry. The least significant digit of the number thus obtained acts as one of the result digits and the rest act as the carry for the next step. Initially the carry is taken to be zero.

Fig 9: Multiplication of two decimal numbers by Urdhva tiryagbhyam sutra
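The vertically-and-crosswise steps of the decimal example can be traced with a short Python model (the function name is illustrative):

```python
def urdhva_multiply(x_digits, y_digits):
    """Vertically-and-crosswise (Urdhva Tiryagbhyam) multiplication of two
    equal-length decimal digit lists, most significant digit first."""
    n = len(x_digits)
    x = x_digits[::-1]                   # work least significant digit first
    y = y_digits[::-1]
    carry, result = 0, []
    for step in range(2 * n - 1):        # one crosswise column per step
        total = carry + sum(x[i] * y[step - i]
                            for i in range(n) if 0 <= step - i < n)
        result.append(total % 10)        # least significant digit -> result
        carry = total // 10              # the rest -> carry for the next step
    while carry:                         # the final carry completes the product
        result.append(carry % 10)
        carry //= 10
    return int("".join(map(str, result[::-1])))

assert urdhva_multiply([3, 2, 5], [7, 2, 8]) == 236600
```

For 325 x 728 the successive column totals are 40, 30, 66, 26 and 23, giving the result digits 0, 0, 6, 6, 3 plus a final carry of 2, i.e. 236600.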

The Urdhva Tiryagbhyam sutra, used above for two decimal numbers, applies to binary multiplication as shown in the figure. The 4-bit binary numbers to be multiplied are written on two consecutive sides of the square as shown in the figure. The square is divided into rows and columns, where each row/column corresponds to one digit of the multiplier or of the multiplicand. Thus, each bit of the multiplier has a small box common with each digit of the multiplicand. Each bit of the multiplier is then independently multiplied (logical AND) with every bit of the multiplicand and the product is written in the common box. All the bits lying on a crosswise dotted line are added to the previous carry. The least significant bit of the obtained number acts as a result bit and the rest as the carry for the next step. The carry for the first step (i.e., the dotted line on the extreme right) is taken to be zero. The method extends naturally to higher-order binary numbers.

Fig 10: Multiplication of two 4-bit binary numbers by the Urdhva Tiryagbhyam sutra

Now we extend this sutra to the binary number system. For the multiplication algorithm, let us consider the multiplication of two 8-bit binary numbers A7A6A5A4A3A2A1A0 and B7B6B5B4B3B2B1B0. As the result of this multiplication would be more than 8 bits, we express its bits as R14R13...R1R0 together with a final carry. As in the previous case, the digits on the two sides of each line are multiplied and added with the carry from the previous step. This generates one of the bits of the result and a carry. This carry is added in the next step, and the process continues. If there is more than one line in a step, all the results are added to the previous carry. In each step, the least significant bit acts as the result bit and all the other bits act as the carry.

For example, if in some intermediate step we get 011, then 1 will act as the result bit and 01 as the carry. Thus we get the following expressions:

R0 = A0B0
C1R1 = A0B1 + A1B0
C2R2 = C1 + A0B2 + A2B0 + A1B1
C3R3 = C2 + A3B0 + A0B3 + A1B2 + A2B1
C4R4 = C3 + A4B0 + A0B4 + A3B1 + A1B3 + A2B2
C5R5 = C4 + A5B0 + A0B5 + A4B1 + A1B4 + A3B2 + A2B3
C6R6 = C5 + A6B0 + A0B6 + A5B1 + A1B5 + A4B2 + A2B4 + A3B3
C7R7 = C6 + A7B0 + A0B7 + A6B1 + A1B6 + A5B2 + A2B5 + A4B3 + A3B4
C8R8 = C7 + A7B1 + A1B7 + A6B2 + A2B6 + A5B3 + A3B5 + A4B4
C9R9 = C8 + A7B2 + A2B7 + A6B3 + A3B6 + A5B4 + A4B5
C10R10 = C9 + A7B3 + A3B7 + A6B4 + A4B6 + A5B5
C11R11 = C10 + A7B4 + A4B7 + A6B5 + A5B6
C12R12 = C11 + A7B5 + A5B7 + A6B6
C13R13 = C12 + A7B6 + A6B7
C14R14 = C13 + A7B7

with C14R14R13R12R11R10R9R8R7R6R5R4R3R2R1R0 being the final product. Hence this is the general mathematical formula applicable to all cases of multiplication. All the partial products are calculated in parallel, and the delay involved is mainly the time taken by the carry to propagate through the adders that form the multiplication array. Thus, this is not an efficient algorithm for the multiplication of very large numbers, as a lot of propagation delay is involved in such cases. To overcome this problem, the Nikhilam sutra presents an efficient method of multiplying two large numbers.
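The recurrence above can be evaluated directly to confirm that it reproduces ordinary multiplication (a behavioural Python check, not the hardware):

```python
def urdhva_8bit(A, B):
    """Step k sums every cross product Ai*Bj with i + j = k plus the
    previous carry; the low bit of the total is result bit Rk and the
    remaining bits form the carry Ck for the next step."""
    a = [(A >> i) & 1 for i in range(8)]
    b = [(B >> i) & 1 for i in range(8)]
    carry, result = 0, 0
    for k in range(15):                          # produces R0 .. R14
        total = carry + sum(a[i] * b[k - i]
                            for i in range(8) if 0 <= k - i < 8)
        result |= (total & 1) << k               # Rk: least significant bit
        carry = total >> 1                       # Ck: everything above it
    return result | (carry << 15)                # final carry tops off the product

assert all(urdhva_8bit(x, y) == x * y for x in (0, 1, 37, 200, 255)
                                      for y in (0, 19, 128, 255))
```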

5.5 THE MULTIPLIER ARCHITECTURE: The multiplier architecture is based on the Urdhva Tiryakbhyam sutra. The advantage of this algorithm is that the partial products and their sums are calculated in parallel; this parallelism makes the multiplier clock-independent. The other main advantage of this multiplier compared with other multipliers is its regularity; due to this modular nature, the layout design is easy. The architecture can be explained with two 4-bit numbers, i.e. the multiplier and multiplicand are 4-bit numbers. The multiplicand and the multiplier are each split into 2-bit blocks, which are in turn handled by 2-bit multiplier blocks. According to the algorithm, the 4 x 4 (A x B) bit multiplication proceeds as follows, writing A = AH : AL and B = BH : BL for the high and low blocks:

A = A3A2A1A0 B = B3B2B1B0

AH = A3A2, AL = A1A0 BH = B3B2, BL = B1B0



Fig 11: Vedic algorithm

By the algorithm, the product can be obtained as follows.

Product A x B = AH x BH . 2^4 + (AH x BL + AL x BH) . 2^2 + AL x BL, where the weights 2^4 and 2^2 restore each block product to its proper position. The parallel multiplications are:


The 4 x 4 bit multiplication can in turn be reduced to 2 x 2 bit multiplications. Each 2-bit block of the multiplicand and the multiplier is divided into its high and low bits, e.g. AH = AHH : AHL and BH = BHH : BHL.

AH x BH = AHH x BHH . 2^2 + (AHH x BHL + AHL x BHH) . 2^1 + AHL x BHL. Here the parallel multiplications are:
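Putting the two levels of decomposition together, a 4 x 4 multiply built from four 2 x 2 Urdhva blocks can be sketched as follows (a Python model with illustrative names; the weights 2^4, 2^2 and 2^1 restore each block product to its position):

```python
def mul2x2(x, y):
    """2 x 2 base case written as the Urdhva columns: vertical, crosswise
    (which may carry), vertical."""
    x0, x1 = x & 1, (x >> 1) & 1
    y0, y1 = y & 1, (y >> 1) & 1
    r0 = x0 & y0                      # vertical column, weight 1
    t1 = (x1 & y0) + (x0 & y1)        # crosswise column, weight 2 (may carry)
    t2 = (x1 & y1) + (t1 >> 1)        # vertical column plus carry, weight 4
    return (t2 << 2) | ((t1 & 1) << 1) | r0

def vedic4x4(a, b):
    """4 x 4 multiply from four 2 x 2 block products of the halves."""
    ah, al = a >> 2, a & 0b11         # AH = a3a2, AL = a1a0
    bh, bl = b >> 2, b & 0b11         # BH = b3b2, BL = b1b0
    return ((mul2x2(ah, bh) << 4)
            + ((mul2x2(ah, bl) + mul2x2(al, bh)) << 2)
            + mul2x2(al, bl))

# exhaustive check for 4-bit operands
assert all(vedic4x4(x, y) == x * y for x in range(16) for y in range(16))
```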


5.7 ADVANTAGE OF VEDIC METHODS: The usefulness of Vedic mathematics lies in the fact that it reduces the typical calculations of conventional mathematics to very simple ones. This is because the Vedic formulae are claimed to be based on the natural principles on which the human mind works. Vedic mathematics is a methodology of arithmetic rules that allows more efficient, higher-speed implementations. It is a very interesting field and presents effective algorithms that can be applied to various branches of engineering, such as computing.




6.1 VERILOG LANGUAGE 6.1.1 Introduction to Verilog HDL

Verilog HDL has evolved into a standard hardware description language and offers many useful features. Verilog HDL is a general-purpose hardware description language that is easy to learn and easy to use. It is similar in syntax to the C programming language, so designers with C programming experience will find it easy to learn. Verilog HDL allows different levels of abstraction to be mixed in the same model; thus, a designer can define a hardware model in terms of switches, gates, RTL, or behavioral code. A designer also needs to learn only one language for stimulus and hierarchical design. Most popular logic synthesis tools support Verilog HDL, which makes it the language of choice for designers. All fabrication vendors provide Verilog HDL libraries for post-logic-synthesis simulation; thus, designing a chip in Verilog HDL allows the widest choice of vendors. The Programming Language Interface (PLI) is a powerful feature that allows the user to write custom C code to interact with the internal data structures of Verilog, so designers can customize a Verilog HDL simulator to their needs.

6.2 Importance of HDLs

HDLs have many advantages over traditional schematic-based design. Designs can be described at a very abstract level by use of HDLs: designers can write their RTL description without choosing a specific fabrication technology, and logic synthesis tools can automatically convert the design to any fabrication technology. If a new technology emerges, designers do not need to redesign their circuit; they simply feed the RTL description to the logic synthesis tool and create a new gate-level netlist for the new fabrication technology. The logic synthesis tool then optimizes the circuit in area and timing for the new technology. By describing designs in HDLs, functional verification of the design can be done early in the design cycle. Since designers work at the RTL level, they can optimize and modify the RTL description until it meets the desired functionality; most design bugs are eliminated at this point. This cuts design cycle time significantly, because the probability of hitting a functional bug later in the gate-level netlist or physical layout is minimized. Designing with HDLs is analogous to computer programming: a textual description with comments is an easier way to develop and debug circuits, and it provides a more concise representation of the design than gate-level schematics, which are almost incomprehensible for very complex designs. HDL-based design is here to stay. With the rapidly increasing complexity of digital circuits and increasingly sophisticated EDA tools, HDLs are now the dominant method for large digital designs; no digital circuit designer can afford to ignore HDL-based design. New tools and languages focused on verification have emerged in the past few years and are better suited for functional verification; however, for logic design, HDLs continue to be the preferred choice.

6.3 Trends in HDLs

The speed and complexity of digital circuits have increased rapidly, and designers have responded by designing at higher levels of abstraction: they have to think only in terms of functionality, while EDA tools take care of the implementation details. With designer assistance, EDA tools have become sophisticated enough to achieve a close-to-optimum implementation. The most popular trend currently is to design in an HDL at the RTL level, because logic synthesis tools can create gate-level netlists from an RTL-level design. Behavioral synthesis allowed engineers to design directly in terms of algorithms and the behavior of the circuit, and then use EDA tools to do the translation and optimization in each phase of the design; however, behavioral synthesis did not gain widespread acceptance. Today, RTL design continues to be very popular.

Verilog HDL is also being constantly enhanced to meet the needs of new verification methodologies. Formal verification and assertion-checking techniques have emerged. Formal verification applies formal mathematical techniques to verify the correctness of Verilog HDL descriptions and to establish equivalence between RTL and gate-level netlists; however, the need to describe a design in Verilog HDL will not go away. Assertion checkers allow checks to be embedded in the RTL code, a convenient way to do checking in the most important parts of a design. New verification languages have also gained rapid acceptance. These languages combine the parallelism and hardware constructs of HDLs with the object-oriented nature of C++, and provide support for automatic stimulus creation, checking and coverage. However, they do not replace Verilog HDL; they simply boost the productivity of the verification process, and Verilog HDL is still needed to describe the design. For very high-speed and timing-critical circuits like microprocessors, the gate-level netlist provided by logic synthesis tools is not optimal. In such cases, designers often mix gate-level descriptions directly into the RTL description to achieve optimum results. This practice runs counter to the high-level design paradigm, yet it is frequently used for high-speed designs, because designers need to squeeze the last bit of timing out of the circuits, and EDA tools sometimes prove insufficient to achieve the desired results. Another technique used for system-level design is a mixed bottom-up methodology in which designers use existing Verilog HDL modules, basic building blocks, or vendor-supplied core blocks to bring up their system simulation quickly. This is done to reduce development costs and compress design schedules. For example, consider a system with a CPU, a graphics chip, an I/O chip and a system bus. The CPU designers would build the next-generation CPU themselves at the RTL level, but they would use behavioral models for the graphics chip and the I/O chip, and would buy a vendor-supplied model for the system bus. Thus, the system-level simulation for the CPU could be up and running very quickly, long before the RTL descriptions for the graphics chip and the I/O chip are complete.




7.1 INTRODUCTION OF FPGA: A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by a customer or a designer after manufacturing, hence "field-programmable". The FPGA configuration is generally specified using a hardware description language (HDL), similar to that used for an application-specific integrated circuit (ASIC); circuit diagrams were previously used to specify the configuration, as they were for ASICs, but this is increasingly rare. Contemporary FPGAs have large resources of logic gates and RAM blocks with which to implement complex digital computations. Because FPGA designs employ very fast I/Os and bidirectional data buses, it becomes a challenge to verify correct timing of valid data within setup and hold times. Floor planning enables resource allocation within the FPGA to meet these timing constraints.

FPGAs can be used to implement any logical function that an ASIC could perform. The ability to update the functionality after shipping, the partial reconfiguration of a portion of the design, and the low non-recurring engineering costs relative to an ASIC design (notwithstanding the generally higher unit cost) offer advantages for many applications. FPGAs contain programmable logic components called "logic blocks", and a hierarchy of reconfigurable interconnects that allow the blocks to be "wired together", somewhat like many (changeable) logic gates that can be inter-wired in (many) different configurations. Logic blocks can be configured to perform complex combinational functions, or merely simple logic gates like AND and XOR. In most FPGAs, the logic blocks also include memory elements, which may be simple flip-flops or more complete blocks of memory.

Some FPGAs have analog features in addition to digital functions. The most common analog feature is programmable slew rate and drive strength on each output pin, allowing the engineer to set slow rates on lightly loaded pins that would otherwise ring unacceptably, and to set stronger, faster rates on heavily loaded pins on high-speed channels that would otherwise run too slowly. Another relatively common analog feature is differential comparators on input pins designed to be connected to differential signaling channels. A few "mixed-signal FPGAs" have integrated peripheral analog-to-digital converters (ADCs) and digital-to-analog converters (DACs) with analog signal conditioning blocks, allowing them to operate as a system-on-a-chip. Such devices blur the line between an FPGA, which carries digital ones and zeros on its internal programmable interconnect fabric, and a field-programmable analog array (FPAA), which carries analog values on its internal programmable interconnect fabric.

7.2 FPGA architecture: The most common FPGA architecture consists of an array of logic blocks (called Configurable Logic Block, CLB, or Logic Array Block, LAB, depending on vendor), I/O pads, and routing channels. Generally, all the routing channels have the same width (number of wires). Multiple I/O pads may fit into the height of one row or the width of one column in the array. An application circuit must be mapped into an FPGA with adequate resources. While the number of CLBs/LABs and I/Os required is easily determined from the design, the number of routing tracks needed may vary considerably even among designs with the same amount of logic. For example, a crossbar switch requires much more routing than a systolic array with the same gate count. Since unused routing tracks increase the cost (and decrease the performance) of the part without providing any benefit, FPGA manufacturers try to provide just enough tracks so that most designs that will fit in terms of lookup tables (LUTs) and I/Os can be routed. This is determined by estimates such as those derived from Rent's rule or by experiments with existing designs.
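Rent's rule, mentioned above, relates the number of external terminals T of a logic partition to its block count G as T = t * G^p, where t is the average terminals per block and p is the Rent exponent. The following Python sketch is purely illustrative; the constants t and p are hypothetical values chosen only to show why irregular interconnect (crossbar-like, high p) demands more routing than regular interconnect (systolic-like, low p):

```python
# Illustrative model of Rent's rule: T = t * G^p.
# T: external terminals, G: number of logic blocks,
# t: average terminals per block, p: Rent exponent.
# The constants below are hypothetical, for illustration only.

def rent_terminals(gates: int, t: float = 4.0, p: float = 0.6) -> float:
    """Estimate the external terminal count for a partition of `gates` blocks."""
    return t * gates ** p

# A design with irregular interconnect (e.g. a crossbar) behaves like a
# higher Rent exponent than a regular systolic array with the same gate count,
# so it needs more routing tracks.
crossbar_like = rent_terminals(10_000, p=0.75)   # irregular interconnect
systolic_like = rent_terminals(10_000, p=0.5)    # regular, local interconnect
assert crossbar_like > systolic_like
```

This mirrors the crossbar-versus-systolic-array example in the text: same logic capacity, very different routing demand.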

In general, a logic block (CLB or LAB) consists of a few logical cells (called ALM, LE, Slice, etc.). A typical cell consists of a 4-input LUT, a full adder (FA) and a D-type flip-flop, as shown below. In this figure, the LUT is split into two 3-input LUTs. In normal mode these are combined into a 4-input LUT through the left mux; in arithmetic mode, their outputs are fed to the FA. The selection of mode is programmed into the middle multiplexer. The output can be either synchronous or asynchronous, depending on the programming of the mux on the right in the figure example. In practice, all or part of the FA is folded into the LUTs as logic functions in order to save space.
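As a rough behavioral sketch of such a logic cell, a 4-input LUT can be modeled in software as a 16-entry truth table, and the two-3-input-LUT decomposition from the figure can be modeled as two 8-entry tables selected by a mux. The helper names below are illustrative, not any vendor's API:

```python
# Behavioral sketch of a 4-input LUT: a 16-entry truth table indexed
# by the four input bits (a is the most significant select bit here).

def make_lut4(truth_table):
    """truth_table: list of 16 bits, one per input combination."""
    assert len(truth_table) == 16
    def lut(a, b, c, d):
        index = (a << 3) | (b << 2) | (c << 1) | d
        return truth_table[index]
    return lut

# The same function split into two 3-input LUTs selected by `a`,
# as in the figure (normal mode: the mux recombines them).
def make_lut4_from_lut3s(truth_table):
    lo, hi = truth_table[:8], truth_table[8:]   # a=0 half, a=1 half
    def lut(a, b, c, d):
        index = (b << 2) | (c << 1) | d
        return (hi if a else lo)[index]
    return lut

# Configure the LUT as a 4-input AND gate: output 1 only for input 1111.
and4 = make_lut4([0] * 15 + [1])
assert and4(1, 1, 1, 1) == 1
assert and4(1, 0, 1, 1) == 0
```

Both constructions compute the same function for every input combination, which is exactly why the split-LUT hardware can double as a single 4-input LUT in normal mode.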

Fig 12: FPGA architecture

ALMs and Slices usually contain 2 or 4 structures similar to the example figure, with some shared signals. CLBs/LABs typically contain a few ALMs/LEs/Slices. In recent years, manufacturers have started moving to 6-input LUTs in their high-performance parts, claiming increased performance. Since clock signals (and often other high-fan-out signals) are normally routed via special-purpose dedicated routing networks in commercial FPGAs, they are managed separately from other signals.

7.3 Cyclone II FPGA Family: Altera's Cyclone II FPGA family is designed on an all-layer copper, low-k, 1.2-V SRAM process and is optimized for the smallest possible die size. Built on TSMC's highly successful 90-nm process technology using 300-mm wafers, the Cyclone II FPGA family offers higher densities, more features, exceptional performance, and the benefits of programmable logic at ASIC prices.

7.4 Altera's Cyclone II FPGA Family Features:

1. Cost-Optimized Architecture: The Cyclone II architecture is optimized for the lowest cost and offers up to 68,416 logic elements (LEs), more than 3x the density of first-generation Cyclone FPGAs. The logic resources in Cyclone II FPGAs can be used to implement complex applications.

2. High Performance: Cyclone II FPGAs are 60 percent faster than competing low-cost 90-nm FPGAs, making them the highest-performing low-cost 90-nm FPGAs on the market.

3. Low Power: Cyclone II FPGAs consume half the power of competing low-cost 90-nm FPGAs, dramatically reducing both static and dynamic power.


4. Process Technology: Cyclone II FPGAs are manufactured on 300-mm wafers using TSMC's leading-edge 90-nm, low-k dielectric process technology.

5. Embedded Multipliers: Cyclone II FPGAs offer up to 150 18 x 18 multipliers that are ideal for low-cost digital signal processing (DSP) applications. These multipliers are capable of implementing common DSP functions such as finite impulse response (FIR) filters, fast Fourier transforms (FFTs), correlators, encoders/decoders, and numerically controlled oscillators (NCOs).

6. Fast-On Capability: Select Cyclone II FPGAs offer fast-on capability, allowing them to be operational soon after power-up, making them ideal for automotive and other applications where quick start-up time is essential. Cyclone II FPGAs that offer a faster power-on reset (POR) time are designated with an "A" in the device ordering code (EP2C5A, EP2C8A, EP2C15A, and EP2C20A).
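To illustrate the kind of DSP function these embedded 18 x 18 multipliers accelerate, here is a minimal software model of a direct-form FIR filter: each output sample requires one multiply-accumulate per tap, which is exactly the operation a hardware multiplier block performs. The coefficients and samples below are arbitrary illustrative values:

```python
# Minimal software model of a direct-form FIR filter, the kind of DSP
# function embedded hardware multipliers implement: each output sample
# needs one multiply-accumulate per filter tap.

def fir_filter(samples, coeffs):
    """y[n] = sum over k of coeffs[k] * x[n - k], with x[n] = 0 for n < 0."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]   # one multiply per tap
        out.append(acc)
    return out

# 3-tap filter with unit coefficients (a simple sliding sum).
print(fir_filter([3, 6, 9, 12], [1, 1, 1]))  # -> [3, 9, 18, 27]
```

In hardware, each tap's multiply maps onto one embedded multiplier, so the taps can all be computed in parallel rather than in a software loop.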





Fig 13: Simulation result for the Baugh-Wooley multiplier


Fig 14: RTL schematic of the Baugh-Wooley multiplier



8.1.2 POWER:




Fig 15: Simulation result of the Vedic multiplier


Fig 16: RTL schematic of the Vedic multiplier




8.2.3 POWER:


(Delay measurement charts for the two multipliers; time axis in ns.)








DELAY: Baugh-Wooley: 22.314 ns; Vedic: 16.939 ns



(Bar chart: total logic elements for the Baugh-Wooley and Vedic multipliers, scale 0 to 45.)



POWER: Baugh-Wooley: 195.68 mW; Vedic: 195.18 mW



CONCLUSION: Multipliers are among the most important components of many systems, so better multiplier designs are always needed, and in particular designs that consume less power. Through this project we tried to determine which of the two algorithms performs best. The project gives a clear view of the different multipliers and of their implementation in the Altera Quartus II tool.

We found that the Vedic multiplier is a much better option than the Baugh-Wooley multiplier. We concluded this from the results for power consumption and total area. The total area of the Vedic multiplier is much less than that of the Baugh-Wooley multiplier, and hence its power consumption is also lower, as clearly depicted in our results; this speeds up the calculation and makes the system faster. When the two multipliers were compared, the Baugh-Wooley multiplier consumed more power and occupied the larger area, because it uses a large number of adders; as a result it slows down the system, which must now perform many more additions. In the end we determined that the Urdhva Tiryakbhyam algorithm works best.
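The Urdhva Tiryakbhyam ("vertically and crosswise") scheme named above can be sketched in software as follows. This is an algorithmic model on digit lists, not the hardware design compared in this project: column i of the product gathers all crosswise partial products a[j] * b[i - j], and in the hardware multiplier those columns are generated in parallel, which is where the speed advantage comes from.

```python
# Sketch of Urdhva Tiryakbhyam ("vertically and crosswise") multiplication
# on digit lists (least significant digit first), decimal for readability.

def urdhva_multiply(a_digits, b_digits, base=10):
    """Multiply two numbers given as digit lists, least significant first."""
    n, m = len(a_digits), len(b_digits)
    cols = [0] * (n + m)
    for i in range(n + m - 1):                 # one column per output position
        for j in range(n):
            k = i - j
            if 0 <= k < m:
                cols[i] += a_digits[j] * b_digits[k]   # crosswise products
    carry = 0
    for i in range(n + m):                     # resolve carries column by column
        total = cols[i] + carry
        cols[i] = total % base
        carry = total // base
    return cols

# 23 * 14 = 322; digits are least-significant first.
result = urdhva_multiply([3, 2], [4, 1])
assert result == [2, 2, 3, 0]
```

In the binary hardware version the same column structure applies with base 2, and each column's products come straight from AND gates feeding an adder tree.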




REFERENCES:
[1] Sumit Vaidya and Deepak Dandekar, "Delay-Power Performance of Multipliers in VLSI Design", International Journal of Computer Networks & Communications (IJCNC), Vol. 2, No. 4, July 2010.
[2] Prof. Dr. K. K. Mahapatra, "Design and Implementation of Different Multipliers Using VHDL", Dept. of Electronics and Communication Engineering, National Institute of Technology, Rourkela, 2007.
[3] Harpreet Singh Dhillon and Abhijit Mitra, "A Reduced-Bit Multiplication Algorithm for Digital Arithmetic", International Journal of Computational and Mathematical Sciences, 2:2, 2008.
[4] Krishnaveni D. (Department of TCE, A.P.S College of Engineering) and Umarani T. G. (Department of ECE, A.P.S College of Engineering, Somanahalli), "VLSI Implementation of Vedic Multiplier with Reduced Delay", International Journal of Advanced Technology & Engineering Research (IJATER), National Conference on Emerging Trends in Technology (NCET-Tech), ISSN.
[5] Pouya Asadi and Keivan Navi, "A New Low Power 32x32-bit Multiplier", World Applied Sciences Journal, IDOSI Publications.
[6] Himanshu Thapliyal and Hamid R. Arbania, "A Novel Parallel Multiply and Accumulate (V-MAC) Architecture Based on Ancient Indian Vedic Mathematics".
[7] C. Senthilpari, Ajay Kumar Singh and K. Diwadkar, "Low Power and High Speed 8x8 bit Multiplier Using Non-clocked Pass Transistor Logic", IEEE, 2007.
[8] Kiat-Seng Yeo and Kaushik Roy, "Low-Voltage, Low-Power VLSI Subsystems", McGraw-Hill Publication.
[9] Jong Duk Lee, Yong Jin Yoon, Kyong Hwa Lee and Byung-Gook Park, "Application of Dynamic Pass-Transistor Logic to an 8-Bit Multiplier", Journal of the Korean Physical Society, March 2001.
[10] C. F. Law, S. S. Rofail and K. S. Yeo, "A Low-Power 16x16-Bit Parallel Multiplier Utilizing Pass-Transistor Logic", IEEE Journal of Solid-State Circuits, October 1999.
[11] C. N. Marimuthu and P. Thiangaraj, "Low Power High Performance Multiplier", ICGST-PDCS, Volume 8, December 2008.
[12] Pravinkumar Parate, "ASIC Implementation of 4 Bit Multipliers", IEEE Computer Society, ICETET, 2008.

Books referred:
1. VHDL by B. Bhaskar.
2. Verilog HDL: A Guide to Digital Design and Synthesis, Second Edition, by Samir Palnitkar, Prentice Hall PTR, February 21, 2003.