
BIT-SERIAL MULTIPLIER USING VERILOG HDL

A Mini Project Report
Submitted in Partial Fulfillment of the Requirements
for the Award of the Degree of
BACHELOR OF TECHNOLOGY
IN
ELECTRONICS AND COMMUNICATION ENGINEERING

Submitted By
K. BHARGAV (11885A0401)
P. DEVSINGH (11885A0404)

Under the Guidance of
Mr. S. RAJENDAR
Associate Professor
Department of ECE

VARDHAMAN COLLEGE OF ENGINEERING (AUTONOMOUS)
(Approved by AICTE, Affiliated to JNTUH & Accredited by NBA)
Estd. 1999
2013-14

CERTIFICATE

This is to certify that the mini project report entitled "Bit-Serial Multiplier Using Verilog HDL", carried out by Mr. K. Bhargav, Roll Number 11885A0401, and Mr. P. Devsingh, Roll Number 11885A0404, is submitted to the Department of Electronics and Communication Engineering in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in Electronics and Communication Engineering during the year 2013-2014.

Mr. S. Rajendar, Associate Professor
Dr. J. V. R. Ravindra, Head, ECE

Kacharam (V), Shamshabad (M), Ranga Reddy (Dist.) 501 218, Hyderabad, A.P.
Ph: 08413-253335, 253201, Fax: 08413-253482, www.vardhaman.org

ACKNOWLEDGEMENTS

The satisfaction that accompanies the successful completion of a task would be incomplete without mention of the people who made it possible, whose constant guidance and encouragement crown all efforts with success.

I express my heartfelt thanks to Mr. S. Rajendar, Associate Professor and mini project supervisor, for his suggestions in selecting and carrying out the in-depth study of the topic. His valuable guidance, encouragement, and critical reviews really helped to shape this report to perfection.

I wish to express my deep sense of gratitude to Dr. J. V. R. Ravindra, Head of the Department, for his able guidance and useful suggestions, which helped me in completing the mini project on time.

I also owe my special thanks to our Director, Prof. L. V. N. Prasad, for his intense support, encouragement, and for having provided all the facilities and support.

Finally, thanks to all my family members and friends for their continuous support and enthusiastic help.

K. Bhargav (11885A0401)
P. Devsingh (11885A0404)

ABSTRACT

Bit-serial arithmetic is attractive in view of its smaller pin count, reduced wire length, and lower floor space requirement in VLSI. In fact, the compactness of the design may allow us to run a bit-serial multiplier at a clock rate high enough to make the unit almost competitive with much more complex designs with regard to speed. In addition, in certain application contexts inputs are supplied bit-serially anyway. In such a case, using a parallel multiplier would be quite wasteful, since the parallelism may not lead to any speed benefit. Furthermore, in applications that call for a large number of independent multiplications, multiple bit-serial multipliers may be more cost-effective than a complex highly pipelined unit.

Bit-serial multipliers can be designed as systolic arrays: synchronous arrays of processing elements that are interconnected by only short, local wires, thus allowing very high clock rates. This report begins by introducing a semisystolic multiplier, so named because its design involves broadcasting a single bit of the multiplier x to a number of circuit elements, thus violating the short, local wires requirement of pure systolic design.

CONTENTS

Acknowledgements
Abstract
List of Figures

1 INTRODUCTION
1.1 The Context of Computer Arithmetic
1.3 Multiplication

2 VLSI
2.1 Introduction
2.6 Conclusion

3 VERILOG HDL
3.1 Introduction
3.3 Synthesis
3.4 Conclusion

4 BIT-SERIAL MULTIPLIER
4.1 Multiplier
4.2 Background

5 IMPLEMENTATION
5.1 Tools Used

6 CONCLUSIONS

REFERENCES

LIST OF FIGURES

3.2 Synthesis process
5.2 Simulation window
5.3 Waveform window

CHAPTER 1

INTRODUCTION

1.1 The Context of Computer Arithmetic

Advances in computer architecture over the past two decades have allowed the

performance of digital computer hardware to continue its exponential growth, despite

increasing technological difficulty in speed improvement at the circuit level. This

phenomenal rate of growth, which is expected to continue in the near future, would not

have been possible without theoretical insights, experimental research, and tool-building

efforts that have helped transform computer architecture from an art into one of the most

quantitative branches of computer science and engineering. Better understanding of the

various forms of concurrency and the development of reasonably efficient and user-friendly programming models have been key enablers of this success story.

The downside of exponentially rising processor performance is an unprecedented

increase in hardware and software complexity. The trend toward greater complexity is not

only at odds with testability and verifiability but also hampers adaptability, performance

tuning, and evaluation of the various trade-offs, all of which contribute to soaring

development costs. A key challenge facing current and future computer designers is to

reverse this trend by removing layer after layer of complexity, opting instead for clean,

robust, and easily certifiable designs, while continuing to try to devise novel methods for

gaining performance and ease-of-use benefits from simpler circuits that can be readily

adapted to application requirements.

In the computer designers' quest for user-friendliness, compactness, simplicity, high performance, low cost, and low power, computer arithmetic plays a key role. It is one of the oldest subfields of computer architecture. The bulk of hardware in early digital computers resided in arithmetic/logic circuits, so first-generation computer designers were motivated to simplify and share hardware to the extent possible and to carry out detailed cost-performance analyses before proposing a design. Many of the ingenious design methods that we use today have their roots in the bulky, power-hungry machines of 30-50 years ago.

In fact, computer arithmetic has been so successful that it has, at times, become transparent. Arithmetic circuits are no longer dominant in terms of complexity; registers, memory and memory management, instruction issue logic, and pipeline control have become the dominant consumers of chip area in today's processors. Correctness and high performance of arithmetic circuits are routinely expected, and episodes such as the Intel Pentium division bug, recounted below, are rare.

The preceding context is changing for several reasons. First, at very high clock

rates, the interfaces between arithmetic circuits and the rest of the processor become

critical. Arithmetic units can no longer be designed and verified in isolation. Rather, an

integrated design optimization is required, which makes the development even more

complex and costly. Second, optimizing arithmetic circuits to meet design goals by taking

advantage of the strengths of new technologies, and making them tolerant to the

weaknesses, requires a reexamination of existing design paradigms. Finally, incorporation

of higher-level arithmetic primitives into hardware makes the design, optimization, and

verification efforts highly complex and interrelated.

This is why computer arithmetic is alive and well today. Designers and

researchers in this area produce novel structures with amazing regularity. Carry-lookahead adders are a case in point. We used to think, in the not-so-distant past,

that we knew all there was to know about carry-lookahead fast adders. Yet, new designs,

improvements, and optimizations are still appearing. The ANSI/IEEE standard floating-point format has removed many of the concerns with compatibility and error control in

floating-point computations, thus resulting in new designs and products with mass-market

appeal. Given the arithmetic-intensive nature of many novel application areas (such as

encryption, error checking, and multimedia), computer arithmetic will continue to thrive

for years to come.

A sequence of events, begun in late 1994 and extending into 1995, embarrassed

the world's largest computer chip manufacturer and put the normally dry subject of

computer arithmetic on the front pages of major newspapers. The events were rooted in

the work of Thomas Nicely, a mathematician at Lynchburg College in Virginia, who

is interested in twin primes (consecutive odd numbers such as 29 and 31 that are both

prime). Nicely's work involves the distribution of twin primes and, particularly, the sum of their reciprocals S = 1/5 + 1/7 + 1/11 + 1/13 + 1/17 + 1/19 + 1/29 + 1/31 + ... + 1/p + 1/(p + 2) + .... While it is known that the infinite sum S has a finite value, no one knows what the value is.

Nicely was using several different computers for his work and in March 1994

added a machine based on the Intel Pentium processor to his collection. Soon he began

noticing inconsistencies in his calculations and was able to trace them back to the values

computed for 1 / p and 1 / (p + 2) on the Pentium processor. At first, he suspected his own

programs, the compiler, and the operating system, but by October, he became convinced

that the Intel Pentium chip was at fault. This suspicion was confirmed by several other

researchers following a barrage of e-mail exchanges and postings on the Internet. The

diagnosis finally came from Tim Coe, an engineer at Vitesse Semiconductor. Coe built a

model of the Pentium's floating-point division hardware based on the radix-4 SRT algorithm

and came up with an example that produces the worst-case error. Using double-precision

floating-point computation, the ratio c = 4 195 835 / 3 145 727 = 1.333 820 44... is computed as 1.333 739 06 on the Pentium. This latter result is accurate to only 14 bits; the error is even larger than that of single-precision floating-point and more than 10 orders of magnitude worse than what is expected of double-precision computation.

The rest, as they say, is history. Intel at first dismissed the severity of the problem

and admitted only a subtle flaw, with a probability of 1 in 9 billion, or once in 27,000

years for the average spreadsheet user, of leading to computational errors. It nevertheless

published a white paper that described the bug and its potential consequences and

announced a replacement policy for the defective chips based on customer need; that is,

customers had to show that they were doing a lot of mathematical calculations to get a

free replacement. Under heavy criticism from customers, manufacturers using the

Pentium chip in their products, and the on-line community, Intel later revised its policy to

no-questions-asked replacement.

Whereas supercomputing, microchips, computer networks, advanced applications

(particularly chess-playing programs), and many other aspects of computer technology

have made the news regularly in recent years, the Intel Pentium bug was the first instance

of arithmetic (or anything inside the CPU for that matter) becoming front-page news.

While this can be interpreted as a sign of pedantic dryness, it is more likely an indicator

of stunning technological success. Glaring software failures have come to be routine

events in our information-based society, but hardware bugs are rare and newsworthy.

Within the hardware realm, we will be dealing with both general-purpose

arithmetic/logic units (ALUs), of the type found in many commercially available

processors, and special-purpose structures for solving specific application problems. The

differences in the two areas are minor as far as the arithmetic algorithms are concerned.

However, in view of the specific technological constraints, production volumes, and

performance criteria, hardware implementations tend to be quite different. General-purpose processor chips that are mass-produced have highly optimized custom designs.

Implementations of low-volume, special-purpose systems, on the other hand, typically

rely on semicustom and off-the-shelf components. However, when critical and strict

requirements, such as extreme speed, very low power consumption, and miniature size, preclude the use of semicustom or off-the-shelf components, the much higher cost of a

custom design may be justified even for a special-purpose system.

1.3 Multiplication

Multiplication (often denoted by the cross symbol "×", or by the absence of a symbol) is the third basic mathematical operation of arithmetic, the others being addition, subtraction, and division (division is listed fourth because it requires multiplication to be defined). The multiplication of two whole numbers is equivalent to the addition of one of them with itself as many times as the value of the other one; for example, 3 multiplied by 4 (often said as "3 times 4") can be calculated by adding 4 copies of 3 together: 3 × 4 = 3 + 3 + 3 + 3 = 12. Here 3 and 4 are the "factors" and 12 is the "product". One of the main properties of multiplication is that the result does not depend on which factor is repeatedly added (the commutative property): 3 multiplied by 4 can also be calculated by adding 3 copies of 4 together, 3 × 4 = 4 + 4 + 4 = 12. The multiplication of integers (including negative numbers), rational numbers (fractions), and real numbers is defined by a systematic generalization of this basic definition.

Multiplication can also be visualized as counting objects arranged in a rectangle (for whole numbers) or as finding the area of a rectangle whose sides have given lengths. The area of a rectangle does not depend on which side is measured first, which illustrates the commutative property. In general, multiplying two measurements gives a new kind of quantity, depending on the measurements. For instance: 2.5 meters × 4.5 meters = 11.25 square meters, and 11 meters/second × 9 seconds = 99 meters.

The inverse operation of multiplication is division. For example, since 4 multiplied by 3 equals 12, 12 divided by 3 equals 4. Multiplication by 3, followed by division by 3, yields the original number (since the division of a number other than 0 by itself equals 1). Multiplication is also defined for other types of numbers, such as complex numbers, and for more abstract constructs, like matrices. For these more abstract constructs, the order in which the operands are multiplied sometimes does matter.

Multiplication, often realized by k cycles of shifting and adding, is a heavily used arithmetic operation that figures prominently in signal processing and scientific applications. In this part, after examining shift/add multiplication schemes and their various implementations, we note that there are but two ways to speed up the underlying multioperand addition: reducing the number of operands to be added leads to high-radix multipliers, and devising hardware multioperand adders that minimize the latency and/or maximize the throughput leads to tree and array multipliers. Of course, speed is not the only criterion of interest. Cost, VLSI area, and pin limitations favor bit-serial designs, while the desire to use available building blocks leads to designs based on additive multiply modules. Finally, the special case of squaring is of interest as it leads to considerable simplification.

This report starts with an introduction to computer arithmetic and then introduces multiplication. It then explains the implementation of one such multiplier, the bit-serial multiplier.

Chapter 1: Introduction. This chapter explains the importance of computer arithmetic and multiplication in computations.

Chapter 2: VLSI. This chapter focuses on VLSI and its evolution, along with its applications and advantages.

Chapter 3: Verilog HDL. This chapter explains how HDLs reduce the VLSI design cycle and how automation enables faster implementation.

Chapter 4: Bit-serial multiplier. This chapter explains multipliers and their types, and how the bit-serial multiplier is useful.

Chapter 5: Implementation. This chapter explains the implementation flow of the bit-serial multiplier, its Verilog code, and the output waveforms.

Chapter 6: Conclusions. This chapter summarizes the bit-serial multiplier and its future improvements.

CHAPTER 2

VLSI

2.1 Introduction

Very-large-scale integration (VLSI) is the process of creating integrated circuits by combining thousands of transistors into a single chip. VLSI began in the 1970s when complex semiconductor and communication technologies were being developed. The microprocessor is a VLSI device. The term is no longer as common as it once was, as chips have increased in complexity into the hundreds of millions of transistors.

The first semiconductor chips held one transistor each. Subsequent advances

added more and more transistors, and, as a consequence, more individual functions or

systems were integrated over time. The first integrated circuits held only a few devices, perhaps as many as ten diodes, transistors, resistors and capacitors, making it

possible to fabricate one or more logic gates on a single device. Now known

retrospectively as "small-scale integration" (SSI), improvements in technique led to

devices with hundreds of logic gates, known as medium-scale integration (MSI). Additional improvements led to large-scale integration (LSI), i.e. systems with at least a thousand logic gates. Current technology has moved far past

this mark and today's microprocessors have many millions of gates and hundreds of

millions of individual transistors.

At one time, there was an effort to name and calibrate various levels of large-scale integration above VLSI. Terms like ultra-large-scale integration (ULSI) were

used. But the huge number of gates and transistors available on common devices has

rendered such fine distinctions moot. Terms suggesting greater than VLSI levels of

integration are no longer in widespread use. Even VLSI is now somewhat quaint,

given the common assumption that all microprocessors are VLSI or better.

As of early 2008, billion-transistor processors are commercially available, an

example of which is Intel's Montecito Itanium chip. This is expected to become more

commonplace as semiconductor fabrication moves from the current generation of 65 nm

processes to the next 45 nm generations (while experiencing new challenges such as

increased variation across process corners).

This microprocessor is unique in that its 1.4 billion transistor count,

capable of a teraflop of performance, is almost entirely dedicated to logic (Itanium's

transistor count is largely due to the 24MB L3 cache). Current designs, as opposed to

the earliest devices, use extensive design automation and automated logic synthesis to

lay out the transistors, enabling higher levels of complexity in the resulting logic

functionality. Certain high-performance logic blocks like the SRAM cell, however, are

still designed by hand to ensure the highest efficiency (sometimes by bending or

breaking established design rules to obtain the last bit of performance by trading

stability).

VLSI stands for "Very Large Scale Integration". This is the field which involves

packing more and more logic devices into smaller and smaller areas.

Design/manufacturing of extremely small, modified semiconductor material: an integrated circuit (IC) may contain millions of transistors, each a few micrometers in size.

o early 60s: Small Scale Integration (SSI) - 10s of transistors on a chip
o late 60s: Medium Scale Integration (MSI) - 100s of transistors on a chip
o early 70s: Large Scale Integration (LSI) - 1,000s of transistors on a chip
o early 80s: VLSI - 10,000s of transistors on a chip (later 100,000s and now 1,000,000s)

While we will concentrate on integrated circuits, the properties of integrated circuits (what we can and cannot efficiently put in an integrated circuit) largely determine the architecture of the entire system. Integrated circuits improve system characteristics in several critical ways. ICs have three key advantages over digital circuits built from discrete components:

Size: Integrated circuits are much smaller; both transistors and wires are shrunk to micrometer sizes, compared to the millimeter or centimeter scales of discrete components. Small size leads to advantages in speed and power consumption, since smaller components have smaller parasitic resistances, capacitances, and inductances.

Speed: Signals can be switched between logic 0 and logic 1 much quicker within a chip than they can between chips. Communication within a chip can occur hundreds of times faster than communication between chips on a printed circuit board. The high speed of circuits on-chip is due to their small size: smaller components and wires have smaller parasitic capacitances to slow down the signal.

Power consumption: Logic operations within a chip also take much less power. Once again, lower power consumption is largely due to the small size of circuits on the chip: smaller parasitic capacitances and resistances require less power to drive them.

These advantages of integrated circuits translate into advantages at the system

level:

Smaller physical size: Smallness is often an advantage in itself; consider portable televisions or handheld cellular telephones.

Lower power consumption: Replacing a handful of standard parts with a single chip reduces total power consumption. Reducing power consumption has a ripple effect on the rest of the system: a smaller, cheaper power supply can be used; since less power consumption means less heat, a fan may no longer be necessary; and a simpler cabinet with less electromagnetic shielding may be feasible, too.

Reduced cost: Reducing the number of components, the power supply requirements, cabinet costs, and so on, will inevitably reduce system cost. The ripple effect of integration is such that the cost of a system built from custom ICs can be less, even though the individual ICs cost more than the standard parts they replace.

Understanding why integrated circuit technology has such profound influence

on the design of digital systems requires understanding both the technology of IC

manufacturing and the economics of ICs and digital systems.

Electronic systems now perform a wide variety of tasks in daily life. Electronic systems in some cases have replaced mechanisms that operated mechanically, hydraulically, or by other means; electronics are usually smaller, more flexible, and easier to service. In other cases electronic systems have created totally new applications. Electronic systems perform a variety of tasks, some of them visible, some more hidden. Electronic systems in cars operate stereo systems and displays; they also control fuel injection systems and adjust suspensions to varying driving conditions. Digital electronics compress and decompress video on-the-fly in consumer electronics, and such devices contain complex digital logic despite their dedicated function. Personal computers and workstations provide word processing, financial analysis, and games. Computers include both central processing units (CPUs) and special-purpose hardware for disk access, faster screen display, etc. Medical electronic systems measure bodily functions and apply complex processing algorithms to warn about unusual conditions. The availability of these complex systems, far from overwhelming consumers, only creates demand for even more complex systems.

2.6 Conclusion

The growing sophistication of applications continually pushes the design and manufacturing of integrated circuits and electronic systems to new levels of complexity. And perhaps the most amazing characteristic of this collection of systems is its variety: as systems become more complex, we build not a few general-purpose computers but an ever wider range of special-purpose systems. Our ability to do so is a testament to our growing mastery of both integrated circuit manufacturing and design, but the increasing demands of customers continue to test the limits of design and manufacturing.

CHAPTER 3

VERILOG HDL

3.1 Introduction

Verilog HDL is a hardware description language that can be used to model a

digital system at many levels of abstraction ranging from the algorithmic-level to the

gate-level to the switch-level. The complexity of the digital system being modeled

could vary from that of a simple gate to a complete electronic digital system, or

anything in between. The digital system can be described hierarchically and timing

can be explicitly modeled within the same description.

The Verilog HDL language includes capabilities to describe the behavioral

nature of a design, the dataflow nature of a design, a design's structural composition,

delays and a waveform generation mechanism including aspects of response monitoring

and verification, all modeled using one single language. In addition, the language

provides a programming language interface through which the internals of a design can

be accessed during simulation including the control of a simulation run.

The language not only defines the syntax but also defines very clear simulation

semantics for each language construct. Therefore, models written in

this language

can be verified using a Verilog simulator. The language inherits many of its operator

symbols and constructs from the C programming language. Verilog HDL provides an extensive range of modeling capabilities, some of which are quite difficult to comprehend initially. However, a core subset of the language is quite easy to learn and use. This is sufficient to model most applications.

The Verilog HDL language was first developed by Gateway Design Automation in 1983 as a hardware modeling language for their simulator product. At that time, it was a proprietary language. Because of the popularity of the simulator product, Verilog HDL gained acceptance as a usable and practical language by a number of designers. In an effort to increase the popularity of the language, the language was placed in the public domain in 1990.

Open Verilog International (OVI) was formed to promote Verilog. In 1992 OVI

decided to pursue standardization of Verilog HDL as an IEEE standard. This effort was

successful and the language became an IEEE standard in 1995. The complete standard is

described in the Verilog Hardware Description Language Reference Manual. The standard is called IEEE Std 1364-1995.

Listed below are the major capabilities of the Verilog hardware description language:

- Primitive logic gates, such as and, or, and nand, are built into the language.
- Flexibility of creating a user-defined primitive (UDP); such a primitive can either be a combinational logic primitive or a sequential logic primitive.
- Switch-level modeling primitives, such as pmos and nmos, are also built into the language.
- A design can be modeled in three different styles or in a mixed style. These styles are: behavioral style, modeled using procedural constructs; dataflow style, modeled using continuous assignments; and structural style, modeled using gate and module instantiations (a sketch of all three follows this list).
- There are two data types in Verilog HDL: the net data type and the register data type. The net type represents a physical connection between structural elements, while a register type represents an abstract data storage element.
- Figure 3.1 shows the mixed-level modeling capability of Verilog HDL; that is, in one design, each module may be modeled at a different level.
- Verilog HDL also has built-in logic functions such as & (bitwise-and) and | (bitwise-or).
- High-level programming language constructs, such as conditionals, case statements, and loops, are available in the language.
- The language is non-deterministic under certain situations; that is, a model may produce different results on different simulators. For example, the ordering of events on an event queue is not defined by the standard.
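
To make the three modeling styles concrete, here is a small illustrative sketch (ours, not from the original report; the module names are hypothetical) that models the same 2-to-1 multiplexer in dataflow, behavioral, and structural styles:

// Dataflow style: a continuous assignment describes the function.
module mux2_dataflow (output y, input a, b, sel);
  assign y = sel ? b : a;
endmodule

// Behavioral style: a procedural always block computes the output.
module mux2_behavioral (output reg y, input a, b, sel);
  always @(a, b, sel)
    if (sel) y = b;
    else     y = a;
endmodule

// Structural style: built-in gate primitives are instantiated and wired.
module mux2_structural (output y, input a, b, sel);
  wire nsel, w0, w1;
  not g0 (nsel, sel);
  and g1 (w0, a, nsel);
  and g2 (w1, b, sel);
  or  g3 (y, w0, w1);
endmodule

All three modules are functionally equivalent; a synthesis tool would map each of them to the same multiplexer hardware.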

3.3 SYNTHESIS

Synthesis is the process of constructing a gate-level netlist from a register-transfer-level model of a circuit described in Verilog HDL. Figure 3.2 shows such a process. A synthesis system may, as an intermediate step, generate a netlist that is composed of register-transfer-level blocks such as flip-flops, arithmetic logic units, and multiplexers, interconnected by wires. In such a case, a second program called the RTL module builder is necessary. The purpose of this builder is to build, or acquire from a library of predefined components, each of the required RTL blocks in the user-specified target technology.

The above figure shows the basic elements of Verilog HDL and the elements used in hardware. A mapping mechanism or a construction mechanism has to be provided that translates the Verilog HDL elements into their corresponding hardware elements, as shown in Figure 3.3.
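
As a brief illustration of this mapping (an added sketch with hypothetical signal names, not a figure from the report), the two most common synthesizable constructs correspond directly to hardware elements:

module map_demo (input a, b, d, clk, output y, output reg q);
  assign y = a & b;   // a continuous assignment maps to combinational logic (an AND gate)
  always @(posedge clk)
    q <= d;           // an edge-triggered always block maps to a D flip-flop
endmodule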

3.4 Conclusion

The Verilog HDL language includes capabilities to describe the behavioral nature of a design, the dataflow nature of a design, a design's structural composition, delays, and a waveform generation mechanism including aspects of response monitoring and verification, all modeled using one single language. The language not only defines the syntax but also defines very clear simulation semantics for each language construct. Therefore, models written in this language can be verified using a Verilog simulator.


CHAPTER 4

BIT-SERIAL MULTIPLIER

4.1 Multiplier

Multipliers are key components of many high-performance systems such as FIR filters, microprocessors, digital signal processors, etc. A system's performance is generally determined by the performance of the multiplier, because the multiplier is generally the slowest element in the system. Furthermore, it is generally the most area consuming. Hence, optimizing the speed and area of the multiplier is a major design issue. However, area and speed are usually conflicting constraints, so that improving speed results mostly in larger areas. As a result, whole spectrums of multipliers with different area-speed trade-offs have been designed, with fully parallel processing at one end of the spectrum and fully serial processing at the other. In between are digit-serial multipliers where single digits consisting of several bits are operated on. These multipliers have moderate performance in both speed and area. However, existing digit-serial multipliers have been plagued by complicated switching systems and/or irregularities in design. Radix-2^n multipliers, which operate on digits in a parallel fashion instead of bits, bring the pipelining to the digit level and avoid most of the above problems. They were introduced by M. K. Ibrahim in 1993. These structures are iterative

and modular. The pipelining done at the digit level brings the benefit of constant

operation speed irrespective of the size of the multiplier. The clock speed is only

determined by the digit size which is already fixed before the design is implemented.

The growing market for fast floating-point co-processors, digital signal processing

chips, and graphics processors has created a demand for high speed, area-efficient

multipliers. Current architectures range from small, low-performance shift and add

multipliers, to large, high performance array and tree multipliers. Conventional linear

array multipliers achieve high performance in a regular structure, but require large

amounts of silicon. Tree structures achieve even higher performance than linear arrays

but the tree interconnection is more complex and less regular, making them even larger

than linear arrays. Ideally, one would want the speed benefits of a tree structure, the

regularity of an array multiplier, and the small size of a shift and add multiplier.

4.2 Background

Webster's dictionary defines multiplication as "a mathematical operation that at its simplest is an abbreviated process of adding an integer to itself a specified number of times." A number (multiplicand) is added to itself a number of times as specified by another number (multiplier) to form a result (product). In elementary school, students learn to multiply by placing the multiplicand on top of the multiplier. The multiplicand is

then multiplied by each digit of the multiplier beginning with the rightmost, Least

Significant Digit (LSD). Intermediate results (partial-products) are placed one atop the

other, offset by one digit to align digits of the same weight. The final product is

determined by summation of all the partial-products. Although most people think of

multiplication only in base 10, this technique applies equally to any base, including

binary. Figure 4.1 shows the data flow for the basic multiplication technique just

described. Each black dot represents a single digit.

4.2.1 Binary Multiplication

In the binary number system the digits, called bits, are limited to the set {0, 1}. The

result of multiplying any binary number by a single binary bit is either 0, or the original

number. This makes forming the intermediate partial-products simple and efficient.

Summing these partial-products is the time-consuming task for binary multipliers. One

logical approach is to form the partial-products one at a time and sum them as they are

generated. Often implemented by software on processors that do not have a hardware

multiplier, this technique works fine, but is slow because at least one machine cycle is

required to sum each additional partial-product.

For applications where this approach does not provide enough performance,

multipliers can be implemented directly in hardware.
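
To make the shift-and-add scheme concrete before moving to hardware, the following behavioral Verilog sketch (our illustration; the module and parameter names are not from the report) forms one partial product per iteration and accumulates it, exactly as described above:

// Illustrative shift-and-add multiplier: one partial product per iteration.
module shift_add_mult #(parameter K = 4)
                       (input [K-1:0] a, x, output reg [2*K-1:0] p);
  integer i;
  always @(a, x) begin
    p = 0;
    for (i = 0; i < K; i = i + 1)
      if (x[i])               // the partial product is either 0 or a
        p = p + (a << i);     // add the appropriately shifted multiplicand
  end
endmodule

A multiplier built this way needs k iterations for a k-bit multiplier, which motivates the faster architectures discussed next.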

4.2.2 Hardware Multipliers

Direct hardware implementations of shift and add multipliers can increase

performance over software synthesis, but are still quite slow. The reason is that as each

additional partial-product is summed, a carry must be propagated from the least significant bit (LSB) to the most significant bit (MSB). This carry propagation is time consuming, and it must be repeated for each partial-product to be summed.

One method to increase multiplier performance is by using encoding techniques to

reduce the number of partial products to be summed. Just such a technique was first

proposed by Booth. The original Booth's algorithm skips over contiguous strings of 1s by using the property that 2^n + 2^(n-1) + 2^(n-2) + ... + 2^(n-m) = 2^(n+1) - 2^(n-m). Although Booth's algorithm produces at most N/2 encoded partial products from an N-bit operand, the number of partial products produced varies. This has caused designers to use modified versions of Booth's algorithm for hardware multipliers. Modified 2-bit Booth encoding halves the number of partial products to be summed.
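
As a worked illustration of this identity (our example, not from the original text): the run of 1s in bit positions 6 down to 4 has value 2^6 + 2^5 + 2^4 = 112, and 2^7 - 2^4 = 128 - 16 = 112 as well. A multiplier containing this run therefore needs only one subtraction of the multiplicand (weighted by 2^4) and one addition (weighted by 2^7) instead of three separate additions.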

Since the resulting encoded partial-products can then be summed using any suitable method, modified 2-bit Booth encoding is used on most modern floating-point chips [LU 88], [MCA 86]. A few designers have even turned to modified 3-bit Booth encoding, which reduces the number of partial products to be summed by a factor of three [BEN 89]. The problem with 3-bit encoding is that the carry-propagate addition required to form the 3X multiples often overshadows the potential gains of 3-bit Booth encoding.

To achieve even higher performance, advanced hardware multiplier architectures search for faster and more efficient methods of summing the partial-products. Most increase performance by eliminating the time-consuming carry-propagate additions. To

accomplish this, they sum the partial-products in a redundant number representation. The

advantage of a redundant representation is that two numbers, or partial-products, can be

added together without propagating a carry across the entire width of the number. Many

redundant number representations are possible. One commonly used representation is

known as carry-save form. In this redundant representation two bits, known as the carry

and sum, are used to represent each bit position. When two numbers in carry-save form

are added together any carries that result are never propagated more than one bit position.

This makes adding two numbers in carry-save form much faster than adding two normal

binary numbers where a carry may propagate. One common method that has been

developed for summing rows of partial products using a carry-save representation is the

array multiplier.
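
To make carry-save form concrete, here is a one-bit slice of a carry-save adder in Verilog (an illustrative sketch of ours, not code from the report). Three input bits are reduced to a sum bit and a carry bit, and no carry chain runs across bit positions within the addition step:

// One bit position of a carry-save adder (a 3:2 compressor).
// The carry is handed to the next bit position as part of a separate
// carry vector; it is never rippled across the word in this step.
module csa_bit (input a, b, c, output sum, carry);
  assign sum   = a ^ b ^ c;                    // parity of the three inputs
  assign carry = (a & b) | (b & c) | (a & c);  // majority of the three inputs
endmodule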

4.2.3 Array Multipliers

Conventional linear array multipliers consist of rows of carry-save adders (CSA).

A portion of an array multiplier with the associated routing can be seen in Figure 4.2.

In a linear array multiplier, as the data propagates down through the array, each

row of CSAs adds one additional partial-product to the partial sum. Since the

intermediate partial sum is kept in a redundant, carry-save form there is no carry

propagation. This means that the delay of an array multiplier is only dependent upon the

depth of the array, and is independent of the partial-product width. Linear array

multipliers are also regular, consisting of replicated rows of CSAs. Their high

performance and regular structure have perpetuated the use of array multipliers for VLSI

math co-processors and special purpose DSP chips.

The biggest problem with full linear array multipliers is that they are very large.

As operand sizes increase, linear arrays grow in size at a rate equal to the square of the

operand size. This is because the number of rows in the array is equal to the length of the

multiplier, with the width of each row equal to the width of multiplicand. The large size

of full arrays typically prohibits their use, except for small operand sizes, or on special

purpose math chips where a major portion of the silicon area can be assigned to the

multiplier array.

Another problem with array multipliers is that the hardware is underutilized. As

the sum is propagated down through the array, each row of CSAs computes a result only

once, when the active computation front passes that row. Thus, the hardware is doing

useful work only a very small percentage of the time. This low hardware utilization in

conventional linear array multipliers makes performance gains possible through increased

efficiency. For example, by overlapping calculations, pipelining can achieve a large gain in throughput. Figure 4.3 shows a full array pipelined after each row of CSAs. Once the

partial sum has passed the first row of CSAs, represented by the shaded row of CSAs in cycle 1, a subsequent multiply can be started on the next cycle. In cycle 2, the first partial sum has passed to the second row of CSAs, and the second multiply, represented by the

cross hatched row of CSAs, has begun. Although pipelining a full array can greatly

increase throughput, both the size and latency are increased due to the additional latches.

While high throughput is desirable, for general purpose computers size and latency tend

to be more important; thus, fully pipelined linear array multipliers are seldom found.

We do not always synthesize our multipliers from scratch but may desire, or be

required, to use building blocks such as adders, small multipliers, or lookup tables.

Furthermore, limited chip area and/or pin availability may dictate the use of bit-serial

designs. In this chapter, we discuss such variations and also deal with modular

multipliers, the special case of squaring, and multiply-accumulators.

Divide-and-Conquer Designs

Bit-Serial Multipliers

Modular Multipliers

Bit-serial arithmetic is attractive in view of its smaller pin count, reduced wire

length, and lower floor space requirements in VLSI. In fact, the compactness of the

design may allow us to run a bit-serial multiplier at a clock rate high enough to make the

unit almost competitive with much more complex designs with regard to speed. In

addition, in certain application contexts inputs are supplied bit-serially anyway. In such a

case, using a parallel multiplier would be quite wasteful, since the parallelism may not

lead to any speed benefit. Furthermore, in applications that call for a large number of

independent multiplications, multiple bit-serial multipliers may be more cost-effective

than a complex highly pipelined unit.

Bit-serial multipliers can be designed as systolic arrays: synchronous arrays of

processing elements that are interconnected by only short, local wires thus allowing very

high clock rates. Let us begin by introducing a semisystolic multiplier, so named because

its design involves broadcasting a single bit of the multiplier x to a number of circuit

elements, thus violating the short, local wires requirement of pure systolic design.

Figure 4.4 shows a semisystolic 4 x 4 multiplier. The multiplicand a is supplied in

parallel from above and the multiplier x is supplied bit-serially from the right, with its

least significant bit arriving first. Each bit x_i of the multiplier is multiplied by a and the result added to the cumulative partial product, kept in carry-save form in the carry and

sum latches. The carry bit stays in its current position, while the sum bit is passed on to

the neighboring cell on the right. This corresponds to shifting the partial product to the

right before the next addition step (normally the sum bit would stay put and the carry bit

would be shifted to the left). Bits of the result emerge serially from the right as they

become available.

A k-bit unsigned multiplier x must be padded with k zeros to allow the carries to

propagate to the output, yielding the correct 2k-bit product. Thus, the semisystolic

multiplier of Figure 4.4 can perform one k x k unsigned integer multiplication every 2k

clock cycles. If k-bit fractions need to be multiplied, the first k output bits are discarded

or used to properly round the most significant k bits.
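
As a concrete illustration (our numbers, chosen to match the testbench in Chapter 5): with k = 4, multiplicand a = 1101 (13) applied in parallel, and multiplier x = 1001 (9) supplied LSB-first followed by four 0 padding bits, the product 13 × 9 = 117 = 01110101 emerges serially over 8 clock cycles, least significant bit first.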

To make the multiplier of Figure 4.4 fully systolic, we must remove the

broadcasting of the multiplier bits. This can be accomplished by a process known as

systolic retiming, which is briefly explained below

Consider a synchronous (clocked) circuit, with each line between two functional

parts having an integral number of unit delays (possibly 0). Then, if we cut the circuit into

two parts CL and CR, we can delay (advance) all the signals going in one direction and

advance (delay) the ones going in the opposite direction by the same amount without

affecting the correct functioning or external timing relations of the circuit. Of course, the

primary inputs and outputs to the two parts CL and CR must be correspondingly advanced

or delayed, too.

For the retiming to be possible, all the signals that are advanced by d must have

had original delays of d or more (negative delays are not allowed). Note that all the

signals going into CL have been delayed by d time units. Thus, CL will work as before,

except that everything, including output production, occurs d time units later than before

retiming. Advancing the outputs by d time units will keep the external view of the circuit

unchanged.

We apply the preceding process to the multiplier circuit of Figure 4.4 in three

successive steps corresponding to cuts 1, 2, and 3, each time delaying the left-moving

signal by one unit and advancing the right-moving signal by one unit. Verifying that the

retimed multiplier works correctly is left as an exercise. This new version of our

multiplier does not have the fan-out problem of the design in Figure 4.4 but it suffers

from long signal propagation delay through the four FAs in each clock cycle, leading to

inferior operating speed. Note that the culprits are zero-delay lines that lead to signal

propagation through multiple circuit elements.

One way of avoiding zero-delay lines in our design is to begin by doubling all the

delays in Figure 4.4. This is done by simply replacing each of the sum and carry flip-flops

with two cascaded flip-flops before retiming is applied. Since the circuit is now operating

at half its original speed, the multiplier x must also be applied on alternate clock cycles.

The resulting design is fully systolic, inasmuch as signals move only between adjacent

cells in each clock cycle. However, twice as many cycles are needed.

The easiest way to derive a multiplier with both inputs entering bit-serially is to

allow k clock ticks for the multiplicand bits to be put into place in a shift register and then

use the design of Figure 4.4 to compute the product. This increases the total delay by k

cycles.
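
A sketch of that input arrangement (our illustration, with hypothetical names): the multiplicand is assembled in a k-bit shift register over the first k clock ticks, after which it drives the parallel a input of the Figure 4.4 design.

// Assemble the bit-serial multiplicand into a parallel word over K ticks.
module a_loader #(parameter K = 4)
                (input clk, input a_bit, output reg [K-1:0] a);
  // Shift right, inserting each new serial bit at the top; after K clock
  // ticks the first (least significant) bit has reached a[0] and the
  // register holds the complete multiplicand.
  always @(posedge clk)
    a <= {a_bit, a[K-1:1]};
endmodule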

Figure 4.5 uses dot notation to show the justification for the bit-serial multiplier design above, depicting the meanings of the various partial operands and results.

CHAPTER 5

IMPLEMENTATION

5.1 Tools Used

1) PC installed with the Linux operating system
2) Cadence tools installed

Project setup:

1) Create the directory structure for the project as below.
2) Write the RTL code in a text file and save it with a .v extension in the RTL directory.
3) Write the code for the testbench and store it in the TB directory.

The commands used in Cadence for execution are:

1) Initially, mount the server using mount -a.
2) Go to the C shell environment with the command csh.
3) Source the setup file with the command source /root/cshrc.
4) The next command is to go to the Cadence labs directory: cd .../../cadence_digital_labs/
5) Then check the file for errors with the command ncvlog ../rtl/filename.v -mess.
6) Then execute the files using ncverilog +access+rwc ../rtl/filename.v ../tb/file_tb.v +nctimescale+1ns/1ps (here +access+rwc grants read/write access to simulation objects, and the GUI is the graphical user interface).
7) After running the program, open the simulation window with the command simvision &.
8) After the simulation, the waveforms are shown in the waveform window.

// Full adder used as the building block of the serial multiplier.
// One always block computes the combinational sum and carry; the
// posedge-rst block clears the outputs when the multiplier is reset.
module fulladder(output reg cout, sum, input a, b, cin, rst);
  always @(a, b, cin)
    {cout, sum} = a + b + cin;   // 2-bit result: carry and sum bits

  always @(posedge rst)
  begin
    sum  <= 0;
    cout <= 0;
  end
endmodule

// Testbench for the full adder.
module full_adder_tb;
  wire cout, sum;
  reg a, b, cin, rst;
  parameter period = 10;   // delay unit for the stimulus

  // device under test
  fulladder fa(cout, sum, a, b, cin, rst);

  initial
  begin
    #2 rst = 1'b1;
    #(period/2) rst = 1'b0;
    a = 1'b1;
    b = 1'b0;
    cin = 1'b1;
    #5 a = 1'b0;
    b = 1'b1;
    cin = 1'b1;
    #period $finish;   // give the last vector time to be observed
  end
endmodule

// 4x4 bit-serial (semisystolic) multiplier of Figure 4.4.
// The multiplicand a is applied in parallel; the multiplier bits arrive
// serially on b, least significant bit first. Product bits emerge
// serially on 'product', least significant bit first.
module serial_mult(output product, input [3:0] a, input b, clk, rst);
  wire s1, s2, s3;          // sum outputs of the full adders
  reg  s1o, s2o, s3o;       // latches for sums at the various stages
  wire c0, c1, c2, c3;      // carry outputs
  reg  c0o, c1o, c2o, c3o;  // latches for carries at the various stages
  wire a3o, a2o, a1o, a0o;  // partial-product bits a[i] AND b
  reg  s;                   // constant-0 sum input of the leftmost adder

  // Chain of full adders keeping the partial product in carry-save form;
  // each sum bit moves one position to the right every clock cycle.
  fulladder fa0(c0, product, a0o, s1o, c0o, rst);
  fulladder fa1(c1, s1,      a1o, s2o, c1o, rst);
  fulladder fa2(c2, s2,      a2o, s3o, c2o, rst);
  fulladder fa3(c3, s3,      a3o, s,   c3o, rst);

  // Multiply each bit of a by the current multiplier bit b.
  and n0(a0o, a[0], b);
  and n1(a1o, a[1], b);
  and n2(a2o, a[2], b);
  and n3(a3o, a[3], b);

  always @(posedge clk, posedge rst)
  begin
    if (rst)
    begin                   // clear all state latches
      s   = 0;
      c0o <= 1'b0;
      c1o <= 1'b0;
      c2o <= 1'b0;
      c3o <= 1'b0;
      s1o <= 1'b0;
      s2o <= 1'b0;
      s3o <= 1'b0;
    end
    else                    // latch all sums and carries for the next cycle
    begin
      c0o <= c0;
      c1o <= c1;
      c2o <= c2;
      c3o <= c3;
      s1o <= s1;
      s2o <= s2;
      s3o <= s3;
    end
  end
endmodule

// Testbench for the serial multiplier: a = 1101 (13) in parallel,
// multiplier x = 1001 (9) fed LSB-first on b, then padded with zeros.
module serial_mult_tb;
  reg  [3:0] a;
  reg  b;
  wire product;
  reg  clk, rst;
  parameter period = 10;

  serial_mult dut(product, a, b, clk, rst);   // device under test

  // clock generation
  initial clk = 0;
  always #period clk = ~clk;

  initial
  begin
    #2 rst = 1'b1;
    #(period/2) rst = 1'b0;
    a = 4'b1101;           // multiplicand, applied in parallel
    b = 1;                 // x0 = 1
    @(posedge clk) b = 0;  // x1 = 0
    @(posedge clk) b = 0;  // x2 = 0
    @(posedge clk) b = 1;  // x3 = 1
    @(posedge clk) b = 0;  // four 0s pad the multiplier so the
    @(posedge clk) b = 0;  // carries can propagate to the output
    @(posedge clk) b = 0;
    @(posedge clk) b = 0;
    #period $finish;
  end
endmodule
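
For reference (our worked check, not a waveform from the report): with a = 1101 (13) and the serial stream encoding x = 1001 (9), the expected product is 13 × 9 = 117 = 01110101, so the waveform window should show the product bits emerging LSB-first as 1, 0, 1, 0, 1, 1, 1, 0 over eight clock cycles.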

CHAPTER 6

CONCLUSIONS

Multipliers play an important role in today's digital signal processing and various other applications. With advances in technology, many researchers have tried and are trying to design multipliers which offer either of the following design targets: high speed, low power consumption, regularity of layout and hence less area, or even a combination of them in one multiplier, thus making them suitable for various high-speed, low-power, and compact VLSI implementations. The common multiplication method is the add-and-shift algorithm. In parallel multipliers, the number of partial products to be added is the main parameter that determines the performance of the multiplier. To reduce the number of partial products to be added, the Modified Booth algorithm is one of the most popular algorithms. To achieve speed improvements, the Wallace tree algorithm can be used to reduce the number of sequential adding stages. Further, by combining both the Modified Booth algorithm and the Wallace tree technique, we can see the advantage of both algorithms in one multiplier. However, with increasing parallelism, the amount of shifts between the partial products and intermediate sums to be added will increase, which may result in reduced speed, an increase in silicon area due to irregularity of structure, and also increased power consumption due to the increase in interconnect resulting from complex routing. On the other hand, serial multipliers achieve better performance in area and power consumption. The selection of a parallel or serial multiplier actually depends on the nature of the application.

A key challenge facing current and future computer designers is to reverse the

trend by removing layer after layer of complexity, opting instead for clean, robust, and

easily certifiable designs, while continuing to try to devise novel methods for gaining

performance and ease-of-use benefits from simpler circuits that can be readily adapted to

application requirements.

This is achieved by using bit-serial multipliers.

REFERENCES

[1] Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Designs, Oxford University Press, 2009.

[2] Sadiq M. Sait and Gerhard Beckoff, "A Novel Technique for Fast Multiplication," Proc. IEEE Fourteenth Annual International Phoenix Conference on Computers and Communications, pp. 109-114, 1995.

[3] C. Ghest, "Multiplying Made Easy for Digital Assemblies," Electronics, Vol. 44, pp. 56-61, November 22, 1971.

[4] P. Ienne and M. A. Viredaz, "Bit-Serial Multipliers and Squarers," IEEE Trans. Computers, Vol. 43, No. 12, pp. 1445-1450, 1994.

[5] Samir Palnitkar, Verilog HDL: A Guide to Digital Design and Synthesis, Prentice Hall Professional, 2003.
