
Dr.Y.Narasimmha Murthy Ph.D
yayavaram@yahoo.com

UNIT III : CASE STUDIES

[CPLD & FPGA ARCHITECTURE & APPLICATIONS]

INTRODUCTION:

A Field Programmable Gate Array (FPGA) consists of an array of programmable logic blocks,
including general logic, memory and multiplier blocks, surrounded by a programmable routing
fabric that allows the blocks to be interconnected. The array is surrounded by programmable
input/output blocks, labeled I/O in the figure, that connect the chip to the outside world. Here
the term programmable indicates the ability to program a function into the chip after silicon
fabrication is complete. This is made possible by the programming technology, a method of
changing the behavior of the pre-fabricated chip in the field, where system users create designs.
The first programmable logic devices used very small fuses as the programming technology.
Every FPGA depends on a programming technology to control the programmable switches that
give FPGAs their programmability.

Programming Technologies

There are a number of programming technologies that have been used for reconfigurable
architectures. Each of these technologies has different characteristics, which have a significant
effect on the programmable architecture. Some of the well-known technologies are

(i). SRAM-based Programming Technology
(ii). Flash Programming Technology (EEPROM)
(iii). Anti-fuse based Programming Technology

SRAM-Based Programming Technology

Static memory cells are the basic cells used for SRAM-based FPGAs. Most commercial vendors,
such as Xilinx, Lattice and Altera, use static memory (SRAM) based programming technology in
their devices. These devices use static memory cells which are distributed throughout the FPGA
to provide configurability. An example of such a memory cell is shown below. In an SRAM-based
FPGA, SRAM cells are mainly used for the following purposes

(i). To program the routing interconnect of FPGAs, which is generally steered by small multiplexers.


(ii). To program Configurable Logic Blocks (CLBs) that are used to implement logic functions.

There are two primary uses for the SRAM cells. Most are used to set the select lines to
multiplexers that steer interconnect signals. The majority of the remaining SRAM cells are used
to store the data in the lookup-tables (LUTs) that are typically used in SRAM-based FPGAs to
implement logic functions. Historically, SRAM cells were used to control the tri-state buffers
and simple pass transistors that were also used for programmable interconnect.
SRAM-based programming technology has become the dominant approach for FPGAs because
of its re-programmability and its use of standard CMOS process technology. Because no special
processing steps are needed, SRAM-based FPGAs benefit from the increased integration, higher
speed and lower dynamic power consumption of new processes with smaller geometries.
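
The two uses of SRAM cells described above can be illustrated with a minimal Python sketch. It is
a behavioural model only, not any vendor's circuit: the function names, the bit ordering and the
example configurations are assumptions made for illustration.

```python
from typing import Sequence

def lut_read(config_bits: Sequence[int], inputs: Sequence[int]) -> int:
    """Return the LUT output: the inputs form an address into the SRAM bits."""
    assert len(config_bits) == 2 ** len(inputs)
    address = 0
    for bit in reversed(inputs):       # inputs[0] is the least significant address bit
        address = (address << 1) | bit
    return config_bits[address]

def routing_mux(select_bits: Sequence[int], sources: Sequence[int]) -> int:
    """Steer one of several interconnect signals using SRAM-held select bits."""
    assert len(sources) == 2 ** len(select_bits)
    index = 0
    for bit in reversed(select_bits):  # select_bits[0] is the least significant bit
        index = (index << 1) | bit
    return sources[index]

# Assumed example: a 2-input LUT loaded with the truth table of XOR.
XOR_CONFIG = [0, 1, 1, 0]
# Assumed example: a 4:1 routing multiplexer whose select bits encode source 2.
SELECT_CONFIG = [0, 1]
```

Reprogramming the device corresponds to nothing more than writing different values into
XOR_CONFIG and SELECT_CONFIG; the surrounding hardware model is unchanged.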

There are, however, a number of drawbacks associated with SRAM-based programming
technology. For example, an SRAM cell typically requires six transistors, which makes this
technology costly in terms of area compared to other programming technologies.

Further, SRAM cells are volatile in nature, and external devices are required to permanently
store the configuration data. These external devices add to the cost and area overhead of
SRAM-based FPGAs.

Security of the configuration data is also a problem. Since the configuration information must
be loaded into the device at power-up, there is the possibility that the configuration information
could be intercepted and stolen for use in a competing system. To overcome this problem, the
configuration data can be encrypted.

The electrical properties of pass transistors are also not ideal. SRAM-based FPGAs typically
rely on pass transistors to implement multiplexers. However, pass transistors are far from ideal
switches, as they have significant on-resistance and present an appreciable capacitive load. As
FPGAs migrate to smaller device geometries these issues may be exacerbated.

Flash Programming Technology

An important alternative to SRAM-based programming technology is flash or EEPROM based
programming technology. This technology injects charge onto a gate that floats above the
transistor, the approach used in flash and EEPROM memory cells. These cells are non-volatile;
they do not lose information when the device is powered down. With modern IC fabrication
processes, it has become possible to use the floating gate cells directly as switches. Flash
memory cells, in particular, are now used because of their improved area efficiency. The
widespread use of flash memory cells for non-volatile memory chips ensures that flash
manufacturing processes will benefit from steady decreases in process geometries.

Flash-based programming technology offers several advantages. For example, it is non-volatile
in nature, and it is also more area efficient than SRAM-based programming technology. It has
its own disadvantages as well. Unlike SRAM-based devices, flash-based devices cannot be
reprogrammed an unlimited number of times. Also, flash-based technology requires a
non-standard CMOS process.

The most important advantage of flash-based programming technology is non-volatility. This
feature eliminates the need for the external resources required to store and load configuration
data when SRAM-based programming technology is used. Additionally, a flash-based device can
function immediately upon power-up instead of having to wait for the loading of configuration
data. The flash approach is also more area efficient than SRAM-based technology, which requires
up to six transistors to implement the programmable storage. The programming circuitry, such as
the high and low voltage buffers needed to program the cell, contributes an area overhead not
present in SRAM-based devices. However, this cost is relatively modest as it is amortized across
numerous programmable elements.

In comparison to anti-fuses, an alternative non-volatile programming technology, flash-based
FPGAs are reconfigurable and can be programmed without being removed from a printed circuit
board. The use of a floating gate to control the switching transistor adds design complexity,
because care must be taken to ensure the source-drain voltage remains sufficiently low to prevent
charge injection into the floating gate. Since newer processes require lower voltage levels, this
issue may become less of a concern in the future.

One disadvantage of flash-based devices is that they cannot be reprogrammed an unlimited number
of times. Charge buildup in the oxide eventually prevents a flash-based device from being
properly erased and programmed. Devices such as the Actel ProASIC3 are rated for only 500
programming cycles. For most uses of FPGAs this programming count is more than sufficient; in
many cases FPGAs are programmed for only one use. Another significant disadvantage of flash
devices is the need for a non-standard CMOS process. Also, like the static memory-based
technology, this programming technology suffers from relatively high resistance and capacitance
due to the use of transistor-based switches.

One trend that has recently emerged is the use of flash storage in combination with SRAM
programming technology.
In devices from Altera, Xilinx and Lattice, on-chip flash memory is used to provide non-
volatile storage while SRAM cells are still used to control the programmable elements in the
design. This addresses the problems associated with the volatility of pure-SRAM approaches,
such as the cost of additional storage devices or the possibility of configuration data interception,
while maintaining the infinite re-configurability of SRAM-based devices.
It is important to recognize that, since the programming technology is still based on SRAM cells,
the devices are no different than pure-SRAM based devices from an FPGA architecture
standpoint. However, the incorporation of flash memory generally means that the processing
technology will not be as advanced as pure-SRAM devices. Additionally, the devices incur more
area overhead than pure-SRAM devices since both flash and SRAM bits are required for every
programmable element.

Anti-fuse Programming Technology

An alternative to SRAM and floating gate-based technologies is anti-fuse programming
technology. This technology is based on structures which exhibit very high resistance under
normal circumstances but can be programmably "blown" (in reality, connected) to create a low
resistance link.


An anti-fuse is a two-terminal device whose unprogrammed state presents a very high resistance
between its terminals. When a high voltage (from 11 to 20 volts, depending on the type of
anti-fuse) is applied across its terminals, the anti-fuse will blow and create a low resistance
link. This link is permanent. Anti-fuses in use today are built either using an
Oxide-Nitride-Oxide (ONO) dielectric between N+ diffusion and polysilicon, or using amorphous
silicon between metal layers or between polysilicon and the first layer of metal.

Programming an anti-fuse requires extra circuitry to deliver the high programming voltage and a
relatively high current of 5 mA or more. This is done through fairly sizable pass transistors
that provide addressing to each anti-fuse. Anti-fuse technology is used in the FPGAs from Actel,
Quicklogic, and Crosspoint.

A major advantage of the anti-fuse is its small size, little more than the cross-section of two
metal wires. But this advantage is limited by the large size of the necessary programming
transistors, which handle large currents, and by the inclusion of isolation transistors that are
sometimes needed to protect low voltage transistors from high programming voltages.

A second major advantage of an anti-fuse is its relatively low series resistance. The
on-resistance of the ONO anti-fuse is 300 to 500 ohms, while that of the amorphous silicon
anti-fuse is 50 to 100 ohms. Additionally, the parasitic capacitance of an unprogrammed
amorphous silicon anti-fuse is significantly lower than for other programming technologies.

The limitations of this technology are that it does not use a standard CMOS process and that
anti-fuse based devices cannot be reprogrammed.

The ideal technology would be re-programmable, non-volatile, and built on a standard CMOS
process. None of the above technologies satisfies all of these conditions. However, SRAM-based
programming technology is the most widely used, mainly because it uses a standard CMOS
process. For this reason it is expected that SRAM will continue to dominate the other two
programming technologies.


Comparison of Programming Technologies

Programming   Re-Programmable   Volatile   Series Resistance   Capacitance   Cell Area
Technology                      Storage    (ohms)              (pF)
Static RAM    In-circuit        Yes        1 K                 15            5X
Anti-fuse     No                No         50-500              1.2-5.0       1X
EPROM         Outside circuit   No         2 K                 10            1X
EEPROM        In-circuit        No         2 K                 10            2X

XILINX XC3000 FPGA Device

Xilinx introduced the first FPGA family, the XC2000 series, in 1985, and subsequently offered
further series including the XC3000, XC4000 and XC5000. The first device of this modern era
had 64 logic blocks and 58 inputs and outputs. The XC3000 series followed in 1987 and became
the most successful family of FPGAs. The XC3000 architecture includes enhancements to the
XC2000 architecture to improve performance, density and usability. The XC3000 architecture
was developed with manual tools for design implementation, and the architecture also shows a
bias towards manual design. The XC3000 family covers a range of nominal device densities from
2,000 to 9,000 gates, with practically achievable densities from 1,000 to 6,000 gates and up to
144 user-definable I/Os. Device speeds, described in terms of maximum guaranteed toggle
frequencies, range from 70 to 125 MHz.

The XC3000 Configurable Logic Block is substantially larger than that of the XC2000. Each of
its two lookup tables has four inputs and requires 16 bits of configuration memory. The two
lookup tables can be combined with a multiplexer to produce any function of five inputs and
some functions of up to seven inputs. The XC3000 architecture allows faster logic
implementation with a minimum of CLBs in series.
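
The claim that two 4-input lookup tables plus a multiplexer give any function of five inputs
follows from splitting the function on its fifth input. The Python sketch below illustrates
this under stated assumptions: the function names, the address bit order and the majority
function used as an example are all hypothetical, and the model ignores the CLB's other
operating modes.

```python
from itertools import product

def lut4(config_bits, a, b, c, d):
    """4-input LUT: 16 configuration bits addressed by the inputs (a is the LSB)."""
    return config_bits[a | (b << 1) | (c << 2) | (d << 3)]

def build_five_input(func):
    """Split a 5-input function into two 16-bit LUT images, one per value of e."""
    f_bits = [func(a, b, c, d, 0) for d, c, b, a in product((0, 1), repeat=4)]
    g_bits = [func(a, b, c, d, 1) for d, c, b, a in product((0, 1), repeat=4)]
    return f_bits, g_bits

def clb_five_input(f_bits, g_bits, a, b, c, d, e):
    """The multiplexer picks the F or G table output according to e."""
    f_out = lut4(f_bits, a, b, c, d)
    g_out = lut4(g_bits, a, b, c, d)
    return g_out if e else f_out

def majority5(a, b, c, d, e):
    """Example target function: 1 when at least three inputs are 1."""
    return int(a + b + c + d + e >= 3)

F_BITS, G_BITS = build_five_input(majority5)
```

Each table holds 16 bits, matching the 16 bits of configuration memory per lookup table
mentioned above; any other 5-input function could be passed to build_five_input in the same way.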

There are now four distinct families within the XC3000 Series of FPGA devices

XC3000A Family
XC3000L Family
XC3100A Family
XC3100L Family


All four families share a common architecture, development software, design and programming
methodology, and also common package pin-outs.
XC3000A Family : The XC3000A is an enhanced version of the basic XC3000 family,
featuring additional interconnect resources and other user-friendly enhancements.

XC3000L Family : The XC3000L is identical in architecture and features to the XC3000A
family, but operates at a nominal supply voltage of 3.3 V. The XC3000L is the right solution for
battery-operated and low-power applications.

XC3100A Family : The XC3100A is a performance-optimized relative of the XC3000A
family. While both families are bit stream and footprint compatible, the XC3100A family
extends toggle rates to 370 MHz and in-system performance to over 80 MHz. The XC3100A
family also offers one additional array size, the XC3195A.

XC3100L Family : The XC3100L is identical in architecture and features to the XC3100A
family, but operates at a nominal supply voltage of 3.3 V.

The basic LCA (Logic Cell Array) of the XC3000 consists of three components: Programmable
I/O Blocks, Configurable Logic Blocks and Programmable Interconnect. In addition to these, a
small amount of configurable memory is also present.

Programmable I/O Block

Each user-configurable IOB, as shown below, provides an interface between the external
package pin of the device and the internal user logic. Each IOB includes both registered and
direct input paths. Each IOB provides a programmable 3-state output buffer, which may be
driven by a registered or direct output signal. Configuration options allow each IOB an
inversion, a controlled slew rate and a high impedance pull-up. Each input circuit also provides
input clamping diodes to provide electrostatic protection, and circuits to inhibit latch-up
produced by input currents.


Each IOB includes input and output storage elements and I/O options selected by configuration
memory cells. A choice of two clocks is available on each die edge. The polarity of each clock
line (not each flip-flop or latch) is programmable. A clock line that triggers the flip-flop on the
rising edge is an active Low Latch Enable (Latch transparent) signal and vice versa. Passive pull-
up can only be enabled on inputs, not on outputs. All user inputs are programmed for TTL or
CMOS thresholds.

Configurable Logic Block

Each CLB includes a combinatorial logic section, two flip-flops, and multiplexers, controlled by
program memory, that select the function. It has the following inputs and outputs

Five logic variable inputs A, B, C, D, and E
A direct data input DI
An enable clock EC
A clock (invertible) K
An asynchronous direct reset RD
Two outputs X and Y


XC3000 CLB
Each CLB has a combinatorial logic section, two flip-flops, and an internal control section. The
CLB has five logic inputs (A, B, C, D and E), a common clock input (K), an asynchronous
direct RESET input (RD) and an enable clock (EC), as shown in the block diagram. Each CLB
also has two outputs (X and Y) which may drive interconnect networks. Data input for the
flip-flops within a CLB is supplied from the function F or G outputs of the combinatorial logic,
or from the block input, DI. Both flip-flops in each CLB share the asynchronous RD which, when
enabled, is dominant over clocked inputs. All flip-flops are reset by the active-Low chip input,
RESET, or during the configuration process. The flip-flops share the enable clock (EC) which,
when Low, recirculates the flip-flops' present states and inhibits response to the data-in or
combinatorial function inputs on a CLB. The user may enable these control inputs and select
their sources. The user may also select the clock net input (K), as well as its active sense within
each CLB. This programmable inversion eliminates the need to route both phases of a clock
signal throughout the device.
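
The priority among RD, EC and the data input described above can be summarized in a small
behavioural model. This is a sketch only: the class and method names are hypothetical, RD is
assumed to be active High, and the selection of the data source (F, G or DI) is assumed to have
been made already.

```python
class ClbFlipFlop:
    """Behavioural sketch of one CLB flip-flop."""

    def __init__(self):
        self.q = 0          # flip-flops are reset during configuration

    def clock_edge(self, d, ec=1, rd=0):
        """Apply one active clock edge and return the new state."""
        if rd:              # RD is dominant over clocked inputs
            self.q = 0
        elif ec:            # EC high: load the selected data (F, G or DI)
            self.q = d
        # EC low: the present state recirculates, so q is unchanged
        return self.q

    def async_reset(self):
        """RD asserted between clock edges (assumed active High)."""
        self.q = 0
        return self.q
```

For example, after loading a 1, a clock edge with ec=0 leaves the stored 1 in place regardless
of the value presented on d.
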
Programmable Interconnect :

Programmable interconnection resources in the Field Programmable Gate Array provide routing
paths to connect inputs and outputs of the IOBs and CLBs into logic networks. Interconnections
between blocks are composed of a two-layer grid of metal segments. Specially designed pass
transistors, each controlled by a configuration bit, form programmable interconnect points (PIPs)
and switching matrices used to implement the necessary connections between selected metal
segments and block pins.

Three types of metal resources are provided to accommodate various network interconnect
requirements.

General Purpose Interconnect
Direct Connection
Long Lines (multiplexed busses and wide AND gates)

XC3000 Interconnect

XILINX XC4000 FPGA Device : The XC4000 features a Configurable Logic Block (CLB)
that is based on look-up tables (LUTs). A LUT is a small one-bit-wide memory array, where the
address lines for the memory are the inputs of the logic block and the one-bit output from the
memory is the LUT output. A LUT with K inputs therefore corresponds to a 2^K x 1 bit memory,
and can realize any logic function of its K inputs by programming the logic function's truth
table directly into the memory. The XC4000 CLB contains three separate LUTs, in the
configuration shown below. There are two 4-input LUTs that are fed by CLB inputs, and the third
LUT can be used in combination with the other two. This arrangement allows the CLB to
implement a wide range of logic functions of up to nine inputs, two separate functions of four
inputs, or other possibilities. Each CLB also contains two flip-flops.
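
As an illustration of how three LUTs can reach nine inputs, the sketch below builds a nine-input
parity function. It assumes that the third LUT has three inputs: the outputs of the two 4-input
LUTs plus one additional CLB input. That arrangement, and all names in the code, are assumptions
made for this example rather than details taken from the text above.

```python
def lut(config_bits, *inputs):
    """K-input LUT modelled as a 2**K x 1 memory; inputs[0] is the address LSB."""
    address = sum(bit << i for i, bit in enumerate(inputs))
    return config_bits[address]

def parity_table(k):
    """Truth table (2**k bits) of the k-input XOR function."""
    return [bin(address).count("1") % 2 for address in range(2 ** k)]

F_BITS = parity_table(4)   # first 4-input LUT
G_BITS = parity_table(4)   # second 4-input LUT
H_BITS = parity_table(3)   # third LUT, combining F, G and one extra CLB input

def clb_parity9(bits):
    """Nine-input parity built from the three LUTs of one CLB."""
    f_out = lut(F_BITS, *bits[0:4])
    g_out = lut(G_BITS, *bits[4:8])
    return lut(H_BITS, f_out, g_out, bits[8])
```

Only certain nine-input functions decompose this way; parity is one of them because XOR is
associative.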

Xilinx XC4000 Configurable Logic Block (CLB).

To provide high density devices that support the integration of entire systems, the XC4000
chips have system oriented features. For example, each CLB contains circuitry that allows it to
efficiently perform arithmetic (i.e., a circuit that can implement a fast carry operation for adder-
like circuits) and also the LUTs in a CLB can be configured as read/write RAM cells. A new
version of this family, the 4000E, has the additional feature that the RAM can be configured as a
dual port RAM with a single write and two read ports. In the 4000E, RAM blocks can be
synchronous RAM. Also, each XC4000 chip includes very wide AND-planes around the
periphery of the logic block array to facilitate implementing circuit blocks such as wide
decoders.


The other important feature of this FPGA is its interconnect structure. The XC4000
interconnect is arranged in horizontal and vertical channels. Each channel contains some number
of short wire segments that span a single CLB (the number of segments in each channel depends
on the specific part number), longer segments that span two CLBs, and very long segments that
span the entire length or width of the chip. Programmable switches are available to connect the
inputs and outputs of the CLBs to the wire segments, or to connect one wire segment to another.
The figure below shows only the wire segments in a horizontal channel, and does not show the
vertical routing channels, the CLB inputs and outputs, or the routing switches. The salient feature
about the Xilinx interconnect is that signals must pass through switches to reach one CLB from
another, and the total number of switches traversed depends on the particular set of wire
segments used. Thus, speed-performance of an implemented circuit depends in part on how the
wire segments are allocated to individual signals by CAD tools.

Actel FPGAs

In contrast to Xilinx FPGAs, the devices manufactured by Actel are based on anti-fuse
technology. Actel offers three main families: Act 1, Act 2, and Act 3. Actel devices are based
on a structure similar to traditional gate arrays; the logic blocks are arranged in rows and
there are horizontal routing channels between adjacent rows. This architecture is shown in the
figure below. The logic blocks in the Actel devices are relatively small in comparison to the
LUT-based ones, and are based on multiplexers. The figure illustrates the logic block in the
Act 3 and shows that it comprises an AND and an OR gate that are connected to a
multiplexer-based circuit block. The multiplexer circuit is arranged such that, in combination
with the two logic gates, a very wide range of functions can be realized in a single logic block.
About half of the logic blocks in an Act 3 device also contain a flip-flop.
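
The idea of realizing many functions from one multiplexer circuit can be sketched in Python.
The module below is hypothetical: it is a 4:1 multiplexer with an AND gate on one select line
and an OR gate on the other, chosen to mirror the description above, and it is not the actual
Act 3 schematic. Different functions are obtained only by tying the pins to signals or constants.

```python
def mux2(select, in0, in1):
    """2:1 multiplexer: returns in1 when select is 1, otherwise in0."""
    return in1 if select else in0

def mux_module(s_a, s_b, d0, d1, d2, d3):
    """Hypothetical module: 4:1 mux with AND(s_a) and OR(s_b) as its select lines."""
    sel0 = s_a[0] & s_a[1]      # AND gate on one select line
    sel1 = s_b[0] | s_b[1]      # OR gate on the other select line
    low = mux2(sel0, d0, d1)
    high = mux2(sel0, d2, d3)
    return mux2(sel1, low, high)

# The same module configured as three different gates by pin assignment alone.
def and2(a, b):
    return mux_module((a, b), (0, 0), 0, 1, 0, 0)

def or2(a, b):
    return mux_module((1, 1), (a, b), 0, 0, 1, 1)

def xor2(a, b):
    return mux_module((a, 1), (b, 0), 0, 1, 1, 0)
```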

Actel FPGA structure.

Actel's interconnect is organized in horizontal routing channels. The channels consist of wire
segments of various lengths, with anti-fuses to connect logic blocks to wire segments or one
wire to another. Also, Actel chips have vertical wires that overlay the logic blocks, for signal
paths that span multiple rows. In terms of speed-performance, Actel chips are not fully
predictable, because the number of anti-fuses traversed by a signal depends on how the wire
segments are allocated during circuit implementation by CAD tools. However, Actel provides a
rich selection of wire segments of different lengths in each channel and has developed
algorithms that guarantee strict limits on the number of anti-fuses traversed by any two-point
connection in a circuit, which improves speed-performance significantly.

Quicklogic pASIC FPGAs :


Quicklogic is the main competitor of Actel in anti-fuse based FPGAs. It produces two
families of devices, called pASIC and pASIC-2. The pASIC-2 is an enhanced version of the
pASIC. The pASIC consists of a regular two-dimensional array of blocks called pASIC Logic
Blocks (pLBs). The logic capacity of the first generation of Quicklogic FPGAs is between 48
and 380 pLBs, or 500 to 4000 equivalent MPGA (mask-programmable gate array) gates.

As shown in the figure below, the pASIC has similarities to other FPGAs: the overall structure
is array-based like Xilinx FPGAs, the logic blocks use multiplexers similar to Actel FPGAs, and
the interconnect consists of only long lines, as in the Altera FLEX 8000. Note that the pASIC
architecture is now also developed independently by Cypress.

Structure of Quicklogic pASIC FPGA.


Quicklogic's anti-fuse structure is called ViaLink. It consists of a top layer of metal, an
insulating layer of amorphous silicon, and a bottom layer of metal. When compared to Actel's
PLICE anti-fuse, ViaLink offers a very low on-resistance of about 50 ohms (PLICE is about
300 ohms) and a low parasitic capacitance. The ViaLink anti-fuses are present at every crossing
of logic block pins and interconnect wires, providing generous connectivity.

Quicklogic (Cypress) Logic Cell


The pASIC's multiplexer-based logic block is shown in the above figure. It is more complex than
Actel's Logic Module, with more inputs and wide (6-input) AND gates on the multiplexer select
lines. Every logic block also contains a flip-flop.

Altera FLEX 8000 and FLEX 10000 FPGAs :

The first FPGA chips from Altera were simple arrays of logic cells, relatively simple logic
elements (LEs), each comprising a three-input look-up table (LUT) to generate logic functions,
a single configurable flip-flop, and multiplexers for routing the signals and selecting clocks.
The logic cells were connected by switch boxes instead of fixed interconnect. The general
architecture of Altera's FPGAs is shown in the diagram below.


There are two high performance FPGA series, called the FLEX series. Altera's FLEX 8000 series
consists of a three-level hierarchy similar to that of CPLDs. However, the lowest level of the
hierarchy consists of a set of lookup tables, rather than an SPLD-like block, and so the FLEX
8000 is categorized here as an FPGA. It should be noted, however, that the FLEX 8000 is a
combination of FPGA and CPLD technologies. FLEX 8000 is SRAM-based and features a four-input
LUT as its basic logic block. Logic capacity ranges from about 4,000 gates to more than 15,000
for the 8000 series.

The architecture of the FLEX 8000 is shown in the figure below. The basic logic block, called a
Logic Element (LE), contains a four-input LUT, a flip-flop, and special-purpose carry circuitry
for arithmetic circuits (similar to the Xilinx XC4000). The LE also includes cascade circuitry
that allows for efficient implementation of wide AND functions.

In the FLEX 8000, LEs are grouped into sets of 8, called Logic Array Blocks (LABs, a term
borrowed from Altera's CPLDs). As shown in the figure below, each LAB contains local
interconnect, and each local wire can connect any LE to any other LE within the same LAB.

Architecture of Altera FLEX 8000 FPGAs.


Altera FLEX 8000 Logic Element (LE).

Local interconnect also connects to the FLEX 8000's global interconnect, called FastTrack.
FastTrack is similar to Xilinx long lines in that each FastTrack wire extends the full width or
height of the device. However, a major difference between FLEX 8000 and Xilinx chips is that
FastTrack consists of only long lines. This makes the FLEX 8000 easy for CAD tools to
automatically configure. All horizontal FastTrack wires are identical, and so interconnect
delays in the FLEX 8000 are more predictable than in FPGAs that employ many shorter segments,
because there are fewer programmable switches in the longer paths. Predictability is further
aided by the fact that connections between horizontal and vertical lines pass through active
buffers.

The FLEX 8000 architecture has been extended in the state-of-the-art FLEX 10000 family.
FLEX 10000 offers all of the features of FLEX 8000, with the addition of variable-sized blocks
of SRAM, called Embedded Array Blocks (EABs). Each row in a FLEX 10000 chip has an EAB on
one end. Each EAB is configurable to serve as an SRAM block with a variable aspect ratio:
256 x 8, 512 x 4, 1K x 2, or 2K x 1. In addition, an EAB can alternatively be configured to
implement a complex logic circuit, such as a multiplier, by employing it as a large multi-output
lookup table. Altera provides, as part of its CAD tools, several macro-functions that implement
useful logic circuits in EABs. Counting the EABs as logic gates, FLEX 10000 offers the highest
logic capacity of any FPGA, although it is hard to provide an accurate number.
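
To make the "large multi-output lookup table" idea concrete, the following Python sketch fills a
256 x 8 memory with the products of two 4-bit numbers, so that a multiplication becomes a single
memory read. The choice of a 4-bit by 4-bit multiplier, the address layout and all names are
assumptions for illustration; this is not Altera's macro-function.

```python
EAB_BITS = 2048  # every listed aspect ratio holds the same number of bits

ASPECT_RATIOS = [(256, 8), (512, 4), (1024, 2), (2048, 1)]  # (words, bits per word)

def build_multiplier_table():
    """Fill a 256 x 8 memory so that address (a, b) holds the product a * b."""
    table = [0] * 256
    for a in range(16):
        for b in range(16):
            address = (a << 4) | b      # two 4-bit operands form the 8-bit address
            table[address] = a * b      # largest product is 15 * 15 = 225, fits in 8 bits
    return table

MULT_TABLE = build_multiplier_table()

def eab_multiply(a, b):
    """Look up the 4-bit x 4-bit product with a single memory read."""
    return MULT_TABLE[(a << 4) | b]
```

The 256 x 8 organization is the natural fit here because the two operands supply 8 address bits
and the product needs 8 output bits.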


Concurrent Logic FPGA Device : The manufacturer Concurrent Logic offers the CFA6006
FPGA device, which is based on a two-dimensional array of identical blocks, where each block is
symmetrical on its four sides. The array holds 3136 such blocks, providing a total logic
capacity of about 5000 equivalent gates. Connections are formed using multiplexers that are
configured by a static RAM programming technology.

The structure of the Concurrent Logic block is shown in the diagram below. It comprises
user-configurable multiplexers, basic gates and a D-type flip-flop. The Concurrent FPGA is
especially suitable for register-intensive and arithmetic applications, since the logic block
can easily implement a half-adder and a register bit.
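
A short Python sketch shows why a half-adder plus a register bit per block suits
register-intensive arithmetic: chaining such blocks gives a counter with one block per bit. The
Block class and the counter built from it are hypothetical illustrations, not the actual
Concurrent Logic cell.

```python
def half_adder(a, b):
    """Return (sum, carry) of two bits using an XOR and an AND gate."""
    return a ^ b, a & b

class Block:
    """Hypothetical logic block: one half-adder feeding one D-type register bit."""

    def __init__(self):
        self.q = 0

    def next_state(self, carry_in):
        """Combinational part: add carry_in to the stored bit."""
        return half_adder(self.q, carry_in)

def clock_counter(blocks):
    """Advance a ripple-carry counter by one; blocks[0] is the least significant bit."""
    carry = 1                                   # adding 1 on every clock
    new_bits = []
    for block in blocks:
        sum_bit, carry = block.next_state(carry)
        new_bits.append(sum_bit)
    for block, bit in zip(blocks, new_bits):    # all registers update together
        block.q = bit

def value(blocks):
    """Read the counter as an integer."""
    return sum(block.q << i for i, block in enumerate(blocks))
```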

There are two direct connections, A and B, formed by routing signals through the multiplexers
within the blocks. Long connections are implemented using a bussing network, in which wires of
various lengths are superimposed on the array of logic blocks.

Crosspoint Solutions FPGAs:

The Crosspoint FPGAs differ from other FPGAs because they are configurable at the
transistor level, as opposed to the logic block level in other FPGAs. Basically, the architecture
consists of rows of transistor pairs, where the rows are separated by horizontal wiring
segments. Vertical wiring segments are also available, for connections among the rows.


Each transistor row comprises two lines of series-connected transistors, with one line being
NMOS and the other PMOS. The wiring resources allow individual transistor pairs to be
interconnected to implement CMOS logic gates. The programming technology used for the
programmable switches is similar to the ViaLink anti-fuse, which is based on amorphous
silicon.

The structure of the transistor pair rows is shown in the diagram below. The diagram shows the
implementation of a NOR gate and a NAND gate using the transistor lines. The transistor gates,
drains and sources can be programmably interconnected to other transistors and also to power
and ground. The series connection along a line is broken where necessary by permanently
holding a transistor in its OFF state. A wide range of logic gates can be implemented by the
transistor lines and the interconnection patterns.
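
How NMOS and PMOS transistors combine into NAND and NOR gates can be sketched at switch level
in Python. This is a generic model of standard CMOS gate structure, with hypothetical function
names; it does not reproduce the Crosspoint wiring itself.

```python
def nmos(gate):
    """An NMOS transistor conducts when its gate is 1."""
    return gate == 1

def pmos(gate):
    """A PMOS transistor conducts when its gate is 0."""
    return gate == 0

def cmos_output(pull_up_conducts, pull_down_conducts):
    """Resolve the output node from the state of the two networks."""
    assert pull_up_conducts != pull_down_conducts, "networks must be complementary"
    return 1 if pull_up_conducts else 0

def nand2(a, b):
    pull_down = nmos(a) and nmos(b)     # two NMOS in series to ground
    pull_up = pmos(a) or pmos(b)        # two PMOS in parallel to power
    return cmos_output(pull_up, pull_down)

def nor2(a, b):
    pull_down = nmos(a) or nmos(b)      # two NMOS in parallel to ground
    pull_up = pmos(a) and pmos(b)       # two PMOS in series to power
    return cmos_output(pull_up, pull_down)
```

The series and parallel arrangements are what the programmable wiring would have to create
between the transistors of a row.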

The FPGA currently offered by Crosspoint Solutions has a total logic capacity of 4200
gates. The chip has 256 rows of transistor pairs, and an additional 64 rows of multiplexer-like
structures are provided. With their row-based architecture, anti-fuse programming technology
and multiplexers, the Crosspoint FPGAs are most similar to Actel FPGAs.

ALGOTRONIX CAL-1024
This design has a two-dimensional mesh array structure which resembles the gate-array
sea-of-gates architecture identified in an earlier figure. Like the Xilinx architecture,
Algotronix uses static RAM programming technology to specify the function performed by each
logic cell and to control the switching of connections between cells. The CAL1024 design
contains 1024 identical logic cells arranged in a 32 x 32 matrix. The design is considered to be
a mesh-connected architecture since each cell is directly connected to its nearest north, south,
east, and west neighbors. In addition to these direct connections, two global interconnect
signals are routed to each cell to distribute clock and other control signals that require low
skew. Figure 19 shows the basic array architecture, indicating both nearest neighbor and global
connections to the logic cells. In addition to these logical connections, row select lines and
bit select lines, which are not shown in the figure, are connected to program each cell's SRAM
bits.

ALGOTRONIX Array Architecture


The basic building block of the Algotronix design is a configurable cell containing multiplexers
and a function unit. As indicated in the figure, the function unit is preceded by multiplexers
which select the source for the X1 and X2 inputs. The function unit is capable of generating any
logic function of the two inputs, or of operating as a D-type latch. Not shown in the figure are
four additional multiplexers which select the function output or one of the external inputs for
routing to each of the four outputs (north, south, east, and west).
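
A function unit that generates any function of two inputs needs only four configuration bits,
one per row of the truth table. The Python sketch below models such a cell. It assumes that the
X1 and X2 multiplexers choose among the four neighbor inputs, and it omits the latch mode and
the four output multiplexers; the class and constant names are hypothetical.

```python
DIRECTIONS = ("north", "south", "east", "west")

class Cell:
    """Hypothetical cell: two input multiplexers and a 2-input function unit."""

    def __init__(self, x1_source, x2_source, truth_table):
        assert x1_source in DIRECTIONS and x2_source in DIRECTIONS
        assert 0 <= truth_table < 16     # 4 SRAM bits select one of 16 functions
        self.x1_source = x1_source
        self.x2_source = x2_source
        self.truth_table = truth_table

    def evaluate(self, neighbours):
        """neighbours maps each direction to the bit arriving from that side."""
        x1 = neighbours[self.x1_source]
        x2 = neighbours[self.x2_source]
        return (self.truth_table >> (x1 | (x2 << 1))) & 1

AND_TABLE = 0b1000   # output 1 only for x1 = x2 = 1
XOR_TABLE = 0b0110   # output 1 when exactly one input is 1
```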

A unique feature in the Algotronix I/O pad design is its capability to provide simultaneous input
and output on the same pin when communicating with another Algotronix chip. This is done
through a 3-level (ternary) logic signaling scheme in which I/O pads sense whenever two outputs
are driving each other via a contention scheme. Even during contention, the pad can deduce the
correct input value and pass it along to the internal circuitry. This makes it easier to partition a
single design across multiple FPGAs because the increased connectivity reduces pin limitations
on communications bandwidth.
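
The idea behind simultaneous input and output on one pin can be illustrated with a purely logical model. The sketch assumes that the wire settles at a low level when both pads drive 0, a high level when both drive 1, and a middle level when they disagree; the names `LOW`, `MID`, `HIGH`, `line_level` and `received_bit` are invented for this example, and the real electrical scheme is not described in the text.

```python
LOW, MID, HIGH = 0, 1, 2  # three line levels (illustrative names)


def line_level(drive_a, drive_b):
    """Level on a wire driven by two pads at once; each pad drives 0 or 1."""
    if drive_a == drive_b:
        return HIGH if drive_a else LOW
    return MID  # contention: the two drivers disagree


def received_bit(own_drive, level):
    """Recover the bit sent by the other chip from the level and our own drive."""
    if level == MID:
        # Disagreement means the other side sent the opposite of our bit.
        return 1 - own_drive
    return 1 if level == HIGH else 0


# Both directions of a transfer happening at the same time on one pin.
a_sends, b_sends = 1, 0
level = line_level(a_sends, b_sends)
a_receives = received_bit(a_sends, level)
b_receives = received_bit(b_sends, level)
```

Because each pad knows what it is driving, the three levels are enough for both sides to recover the other side's bit in every case.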
AMD Mach: AMD offers a CPLD family comprising five subfamilies called Mach. Each Mach
device consists of multiple PAL-like blocks (or optimized PALs). Mach 1 and 2 consist of
optimized 22V16 PALs, Mach 3 and 4 consist of several optimized 34V16 PALs, and Mach 5 is
similar to Mach 3 and 4 but offers enhanced speed performance. All Mach chips use EEPROM
technology, and together the five subfamilies provide a wide range of selection, from small,
inexpensive chips to larger, state-of-the-art ones. We will focus on Mach 4 because it represents
the most advanced currently available parts in the family.

Figure (a) below depicts a Mach 4 chip, showing the multiple 34V16 PAL-like blocks and the
interconnect, called the central switch matrix. The in-circuit programmable chips range in size
from 6 to 16 PAL-like blocks, corresponding roughly to 2,000 to 5,000 equivalent gates. All
connections between PAL-like blocks (even from a PAL-like block to itself) pass through the
central switch matrix. Thus, the device is not merely a collection of PAL-like blocks but a single,
large device. Since all connections travel through the same path, circuit timing delays are
predictable. Figure (b) illustrates a Mach 4 PAL-like block. It has 16 outputs and a total of
34 inputs (16 of which are the fed-back outputs), so it corresponds to a 34V16 PAL. However,
there are two key differences between this block and a normal PAL: 1) a product term (PT)
allocator between the AND plane and the macrocells (each macrocell comprises an OR gate, an
EXOR gate, and a flip-flop), and 2) an output switch matrix between the OR gates and the I/O
pins. These features make a Mach 4 chip easier to use because they decouple sections of the
PAL-like block. More specifically, the product term allocator distributes and shares product
terms from the AND plane to the OR gates that require them, allowing much more flexibility than
the fixed-size OR gates in regular PALs. The output switch matrix enables any macrocell output
(OR gate or flip-flop) to drive any I/O pin connected to the PAL-like block, again providing
greater flexibility than a PAL, in which each macrocell can drive only one specific I/O pin.
Mach 4's combination of in-system programmability and high flexibility allows easy hardware
design changes.
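
The benefit of a product term allocator over fixed-size OR gates can be illustrated with a small behavioural sketch. Everything here is a simplification under stated assumptions: product terms are modelled as dictionaries of required input values, the allocator is assumed to be able to steer any free term to any macrocell (a real device may restrict which terms can be shared), and the names `fits_fixed`, `allocate` and `eval_macrocell` are hypothetical.

```python
def eval_product_term(term, inputs):
    """A product term is true when every listed signal has the required value."""
    return all(inputs[name] == value for name, value in term.items())


def eval_macrocell(terms, inputs):
    """OR together the product terms steered to one macrocell."""
    return int(any(eval_product_term(term, inputs) for term in terms))


def fits_fixed(functions, terms_per_gate):
    """Regular PAL: every OR gate has the same fixed number of product terms."""
    return all(len(terms) <= terms_per_gate for terms in functions.values())


def allocate(functions, total_terms):
    """Simplified allocator: only the total demand has to fit in the block."""
    needed = sum(len(terms) for terms in functions.values())
    if needed > total_terms:
        raise ValueError("not enough product terms in the block")
    return {name: list(terms) for name, terms in functions.items()}


# Two macrocells sharing a block with 4 product terms in total.
functions = {
    "m0": [{"a": 1, "b": 1}, {"c": 1}, {"a": 0, "d": 1}],  # needs 3 terms
    "m1": [{"b": 0}],                                       # needs 1 term
}
fixed_ok = fits_fixed(functions, terms_per_gate=2)  # 2 + 2 fixed split
allocation = allocate(functions, total_terms=4)     # shared pool of 4

inputs = {"a": 0, "b": 1, "c": 0, "d": 1}
m0_out = eval_macrocell(allocation["m0"], inputs)
m1_out = eval_macrocell(allocation["m1"], inputs)
```

In this example the design does not fit a fixed two-terms-per-gate split, but it does fit once unused terms can be handed to the macrocell that needs them.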

AMD Mach 4 structure

FPGA Design Flow:


Early PLD and FPGA designs were performed largely by hand, but today's
complex programmable logic devices require the use of an integrated Computer-Aided Design
(CAD) system. Both commercial CAD tool vendors and FPGA companies offer appropriate
tools. For example, traditional Electronic Design Automation (EDA) vendors such as Cadence,
Mentor Graphics, Synopsys, and Viewlogic offer tools to support FPGA design. These
tools are typically used for the front-end design entry and simulation operations and provide the
necessary interfaces to vendor-specific back-end tools for chip placement and routing.
Examples of vendor-specific tools are the Xilinx XACT system and the Altera
MAX+PLUS II software. Altera's MAX+PLUS II software supports the entire design flow
on either PC or workstation platforms.
The first step in the design process is the description of the logic circuit, which can be done
either with a schematic capture tool (Cadence, Viewlogic, OrCAD, etc.) or with Boolean
expressions. This is followed by a translation that converts the original circuit description
into a standard format used by the CAD tools (e.g., the Xilinx CAD tools). The circuit is then
passed through CAD programs that partition it into appropriate logic blocks, select a specific
location in the FPGA for each logic block, and form the required interconnections.

The performance of the implemented circuit can then be checked and its functionality
verified. Finally, a bitmap is generated and downloaded in a serial fashion to configure the FPGA.

Initial Design Entry: The detailed description of the logic circuit is entered using a schematic
capture program. In the design entry phase, RTL or schematic entry is used to create the logic
to be implemented in the device. Pin assignments can also be made, including pin placement
information, along with any timing constraints that might be necessary for building a functioning
design. In the design entry step, a schematic or Block Design File (.bdf) is created as the
top-level design. Library of parameterized modules (LPM) functions are added, and Verilog HDL
code is used to add a logic block.

The library may be supplied either by the vendor of the schematic capture program or by an FPGA
vendor (such as Xilinx or Altera). An alternative way to specify the logic circuit is to use
Boolean expressions or a state machine language. This is done without the graphical interface.
Sometimes it is possible to use a mixture of both schematics and Boolean expressions.

Translation to XNF Format: After the logic circuit is successfully designed and merged into
one circuit, it is translated into a special format that is understood by the CAD tools. For Xilinx
this format is called the Xilinx Netlist Format, or XNF. The translation utility is supplied by
Xilinx or by the vendor of the logic entry tool. The translation process may also involve
automatic optimizations of the circuit.

Partition: The XNF circuit is partitioned into logic cells (this partition is also known as
technology mapping). Technology mapping converts the XNF circuit, which is a netlist of
basic logic gates, into a netlist of Xilinx logic cells. The logic cell used depends on which Xilinx
product the circuit is to be implemented in. The XACT tools also attempt to optimize the circuit
during this step. For example, circuitry associated with unused logic block inputs or outputs is
eliminated from the design. In addition, the partitioning program attempts to minimize either the
total number of CLBs used or the number of logic stages in the critical delay path. In other words,
the mapping procedure optimizes the resulting circuit either for the total number of logic cells
required or for the number of stages of logic cells in time-critical circuitry.
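
The elimination of unused circuitry mentioned above can be sketched as a reachability sweep over the netlist. This is a generic illustration and not the XACT algorithm: the netlist representation (a dictionary from each gate's output name to its input names) and the function name `prune_unused` are assumptions made for the example.

```python
def prune_unused(netlist, primary_outputs):
    """Keep only gates that feed a primary output, directly or indirectly.

    netlist maps each gate's output signal to the list of its input signals.
    Signals that are not keys of netlist are treated as primary inputs.
    """
    keep = set()
    stack = list(primary_outputs)
    while stack:
        signal = stack.pop()
        if signal in netlist and signal not in keep:
            keep.add(signal)
            stack.extend(netlist[signal])  # walk back towards the inputs
    return {name: ins for name, ins in netlist.items() if name in keep}


netlist = {
    "n1": ["a", "b"],
    "n2": ["n1", "c"],
    "dead": ["a", "c"],  # drives nothing, so it can be removed
    "out": ["n2"],
}
pruned = prune_unused(netlist, ["out"])
```

After the sweep, the gate named `dead` is gone because nothing on the way to `out` depends on it.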

Place and Route: This step is performed using the CAD tools, manually by the user, or by a
mixture of the two. The first step is placement, in which each logic cell generated during the
partition step is assigned to a specific location in the FPGA. Automatic placement can be done
using the simulated annealing algorithm.
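
A minimal sketch of placement by simulated annealing is given below. It is not the algorithm used by any particular vendor tool: the cost function (total Manhattan length of two-pin nets), the cooling schedule, the default parameter values and all names are assumptions chosen to keep the example short.

```python
import math
import random


def wirelength(placement, nets):
    """Total Manhattan distance over all two-pin nets."""
    total = 0
    for a, b in nets:
        (r1, c1), (r2, c2) = placement[a], placement[b]
        total += abs(r1 - r2) + abs(c1 - c2)
    return total


def anneal_placement(cells, nets, rows, cols, seed=0,
                     temp=5.0, cooling=0.95, moves_per_temp=50, min_temp=0.01):
    """Assign each cell to a distinct grid site, reducing total wirelength."""
    rng = random.Random(seed)
    all_sites = [(r, c) for r in range(rows) for c in range(cols)]
    start_sites = all_sites[:]
    rng.shuffle(start_sites)
    placement = dict(zip(cells, start_sites))      # random initial placement
    occupant = {site: cell for cell, site in placement.items()}
    cost = wirelength(placement, nets)
    best, best_cost = dict(placement), cost

    while temp > min_temp:
        for _ in range(moves_per_temp):
            cell = rng.choice(cells)
            target = rng.choice(all_sites)
            source = placement[cell]
            if target == source:
                continue
            other = occupant.get(target)           # None if the site is free
            # Apply the move (a swap if the target site is occupied).
            placement[cell], occupant[target] = target, cell
            if other is not None:
                placement[other], occupant[source] = source, other
            else:
                del occupant[source]
            delta = wirelength(placement, nets) - cost
            # Always accept improvements; accept worse moves with
            # probability exp(-delta / temp), which shrinks as temp falls.
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                cost += delta
                if cost < best_cost:
                    best, best_cost = dict(placement), cost
            else:
                # Undo the move.
                placement[cell], occupant[source] = source, cell
                if other is not None:
                    placement[other], occupant[target] = target, other
                else:
                    del occupant[target]
        temp *= cooling
    return best, best_cost


cells = ["c0", "c1", "c2", "c3"]
nets = [("c0", "c1"), ("c1", "c2"), ("c2", "c3")]
best, best_cost = anneal_placement(cells, nets, rows=3, cols=3)
```

The occasional acceptance of a worse placement at high temperature is what lets the search escape local minima that a purely greedy swap procedure would get stuck in.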

After placement, the required interconnections among the logic cells must be realized by
selecting wire segments and routing switches within the FPGA interconnection resources. An
automatic routing algorithm based on the maze routing algorithm is used for this task.
Placement and routing are generally done automatically, but sometimes they are done
manually by the user. With the physical placement and routing completed, exact timing
values can now be used to determine chip performance. The XACT tools provide a critical path
timing analyzer which provides delay information for the paths through the chip, from the longest
to the shortest. In addition, the physical layout timing information can be back-annotated to the
schematics to get more accurate simulation results. The final step in the Xilinx design
flow is the creation of the BIT file, which contains the binary programming data needed to
configure the SRAM bits of the target chip. This file is then downloaded to configure the chip
for final functional and timing tests of the programmed chip.
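
The maze routing idea mentioned above can be sketched as a breadth-first wave expansion on a grid, in the style of Lee's algorithm. This is a textbook illustration under simplifying assumptions: a single two-pin net, a uniform grid in which 1 marks a blocked site, and a hypothetical function name `maze_route`. Real FPGA routers work on the device's actual wire segments and switches.

```python
from collections import deque


def maze_route(grid, start, goal):
    """Return a shortest path from start to goal as a list of (row, col), or None.

    grid[r][c] == 1 marks a blocked site; moves are north, south, west, east.
    """
    rows, cols = len(grid), len(grid[0])
    previous = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Trace back from the goal to the start.
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        r, c = current
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in previous):
                previous[(nr, nc)] = current
                queue.append((nr, nc))
    return None                        # the goal cannot be reached


grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],   # a wall with a single gap on the right
    [0, 0, 0, 0],
]
path = maze_route(grid, (0, 0), (2, 0))
```

Because the wave expands one step at a time, the first time the goal is reached the path found is a shortest one, which is the property that makes this approach attractive for routing.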

After the design is created, it must be compiled. Compilation converts the design into a bitstream
that can be downloaded into the FPGA. The most important output of compilation is an SRAM
Object File (.sof), which is used to program the device. The software also generates report
files that provide information about the code as it compiles.

Simulation is an important part of the design flow to learn, and there are entire
applications devoted to simulating hardware designs. There are two types of simulation: RTL
and timing. RTL (or functional) simulation allows you to verify that your code is functionally
correct. Timing (or post place-and-route) simulation verifies that the design meets timing and
functions appropriately in the device.

After completion of the design, its performance is checked either by downloading the
configuration bits into the FPGA or by using an interface to a timing simulation program. If the
performance is not satisfactory, suitable modifications are made at the appropriate point in the
design flow. Once the timing and functionality are verified, the implementation is complete.
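
In its simplest form, the performance check described above amounts to finding the slowest path through the placed and routed circuit. The sketch below shows one generic way to do that on an acyclic circuit graph. The data layout (a delay per node and a fan-in list per node), the delay values and the name `critical_path` are assumptions made for the example, and it does not reproduce the behaviour of any vendor's timing analyzer.

```python
def critical_path(node_delay, fanin):
    """Return (worst arrival time, path) for an acyclic circuit graph.

    node_delay maps each node to its delay; fanin maps a node to the
    list of nodes driving it. Nodes without fan-in are primary inputs.
    """
    arrival = {}
    worst_driver = {}

    def visit(node):
        if node in arrival:
            return arrival[node]
        worst, driver = 0, None
        for source in fanin.get(node, []):
            t = visit(source)
            if driver is None or t > worst:
                worst, driver = t, source   # remember the slowest input
        arrival[node] = worst + node_delay[node]
        worst_driver[node] = driver
        return arrival[node]

    for node in node_delay:
        visit(node)

    end = max(arrival, key=arrival.get)     # node with the latest arrival
    worst_time = arrival[end]
    path = []
    while end is not None:
        path.append(end)
        end = worst_driver[end]
    return worst_time, path[::-1]


# Made-up delays in arbitrary time units.
node_delay = {"in": 0, "g1": 3, "g2": 5, "g3": 2, "out": 1}
fanin = {"g1": ["in"], "g2": ["in"], "g3": ["g1", "g2"], "out": ["g3"]}
worst_time, path = critical_path(node_delay, fanin)
```

If the worst arrival time exceeds the required clock period, the nodes on the returned path are the natural place to start when modifying the design.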

---------------------xxxxxx------------------

