Introduction to FPGA Devices

ECE 645 – Computer Arithmetic George Mason University


World of Integrated Circuits

Integrated Circuits
• Full-Custom ASICs
• Semi-Custom ASICs
• User Programmable
  • PLD: PAL, PLA, PML
  • FPGA: LUT (Look-Up Table), MUX, Gates


Two competing implementation approaches

ASIC (Application Specific Integrated Circuit)
• designs must be sent for expensive and time-consuming fabrication in a semiconductor foundry
• designed all the way from behavioral description to physical layout

FPGA (Field Programmable Gate Array)
• bought off the shelf and reconfigured by designers themselves
• no physical layout design; design ends with a bitstream used to configure a device


What is an FPGA?

[Figure: FPGA layout showing Configurable Logic Blocks, I/O Blocks, and Block RAMs]


Which Way to Go?

ASICs
• High performance
• Low power
• Low cost in high volumes

FPGAs
• Off-the-shelf
• Low development cost
• Short time to market
• Reconfigurability


Other FPGA Advantages

• The manufacturing cycle for an ASIC is very costly and lengthy, and engages a lot of manpower
• Mistakes not detected at design time have a large impact on development time and cost
• FPGAs are perfect for rapid prototyping of digital circuits
• Easy upgrades, as in the case of software
• Unique applications
  • reconfigurable computing


Major FPGA Vendors

SRAM-based FPGAs
• Xilinx, Inc.
• Altera Corp.
  (Xilinx and Altera together share over 60% of the market)
• Atmel
• Lattice Semiconductor

Flash & antifuse FPGAs
• Actel Corp.
• QuickLogic Corp.

Xilinx
• Primary products: FPGAs and the associated CAD software
  • Programmable Logic Devices
  • ISE Alliance and Foundation Series Design Software
• Main headquarters in San Jose, CA
• Fabless* Semiconductor and Software Company
  • UMC (Taiwan) {*Xilinx acquired an equity stake in UMC in 1996}
  • Seiko Epson (Japan)
  • TSMC (Taiwan)


Xilinx FPGA Families
• Old families
  • XC3000, XC4000, XC5200
  • Old 0.5µm, 0.35µm and 0.25µm technology. Not recommended for modern designs.
• High-performance families
  • Virtex (0.22µm)
  • Virtex-E, Virtex-EM (0.18µm)
  • Virtex-II, Virtex-II PRO (0.13µm)
  • Virtex-4 (0.09µm)
• Low Cost Family
  • Spartan/XL – derived from XC4000
  • Spartan-II – derived from Virtex
  • Spartan-IIE – derived from Virtex-E
  • Spartan-3


Xilinx FPGA Block Diagram


CLB Structure


CLB Slice Structure
• Each slice contains two sets of the following:
  • Four-input LUT
    • Any 4-input logic function,
    • or 16-bit x 1 sync RAM
    • or 16-bit shift register
  • Carry & Control
    • Fast arithmetic logic
    • Multiplier logic
    • Multiplexer logic
  • Storage element
    • Latch or flip-flop
    • Set and reset
    • True or inverted inputs
    • Sync. or async. control


LUT (Look-Up Table) Functionality
• Look-Up tables are primary elements for logic implementation
• Each LUT can implement any function of 4 inputs

[Figure: LUT with inputs x1, x2, x3, x4 and output y, shown with two example truth tables]

x1 x2 x3 x4   y (example 1)   y (example 2)
0  0  0  0    1               0
0  0  0  1    1               1
0  0  1  0    1               0
0  0  1  1    1               0
0  1  0  0    1               0
0  1  0  1    1               1
0  1  1  0    1               0
0  1  1  1    1               1
1  0  0  0    1               0
1  0  0  1    1               1
1  0  1  0    1               0
1  0  1  1    1               0
1  1  0  0    0               1
1  1  0  1    0               1
1  1  1  0    0               0
1  1  1  1    0               0
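
A LUT can be read as a 16-entry, 1-bit memory addressed by x1..x4. The following is a minimal behavioral sketch of that idea, assuming the first example function from this slide's truth table (y = 0 only when x1 = x2 = 1). The names LUT4_EXAMPLE and TRUTH_TABLE are illustrative and do not come from the slides.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

entity LUT4_EXAMPLE is
    port(
        X1, X2, X3, X4 : in  STD_LOGIC;
        Y              : out STD_LOGIC
    );
end LUT4_EXAMPLE;

architecture BEHAVIORAL of LUT4_EXAMPLE is
    -- One bit per truth-table row. Index 0 is the leftmost character;
    -- the index is x1 x2 x3 x4 read as a binary number.
    constant TRUTH_TABLE : STD_LOGIC_VECTOR(0 to 15) := "1111111111110000";
    signal ADDR : STD_LOGIC_VECTOR(3 downto 0);
begin
    ADDR <= X1 & X2 & X3 & X4;
    -- The "logic function" is just a table read.
    Y <= TRUTH_TABLE(to_integer(unsigned(ADDR)));
end BEHAVIORAL;
```

Changing the function means changing only the constant; the structure stays the same, which is the point of a LUT-based architecture.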


5-Input Functions implemented using two LUTs
• One CLB Slice can implement any function of 5 inputs
• Logic function is partitioned between two LUTs
• F5 multiplexer selects LUT

[Figure: slice detail with two LUTs (each usable as ROM or RAM, address inputs A1 to A4, output D), slice inputs F1 to F4 and BX, and the F5 multiplexer that selects between the two LUT outputs]


5-Input Functions implemented using two LUTs
The 16 rows with X5 = 0 are stored in one LUT and the 16 rows with X5 = 1 in the other; X5 selects which LUT output drives OUT.

X5 X4 X3 X2 X1 Y
0 0 0 0 0 0
0 0 0 0 1 1
0 0 0 1 0 0
0 0 0 1 1 0
0 0 1 0 0 1
0 0 1 0 1 1
0 0 1 1 0 0
0 0 1 1 1 0
0 1 0 0 0 1
0 1 0 0 1 0
0 1 0 1 0 0
0 1 0 1 1 1
0 1 1 0 0 1
0 1 1 0 1 1
0 1 1 1 0 1
0 1 1 1 1 1
1 0 0 0 0 0
1 0 0 0 1 0
1 0 0 1 0 0
1 0 0 1 1 0
1 0 1 0 0 0
1 0 1 0 1 0
1 0 1 1 0 0
1 0 1 1 1 1
1 1 0 0 0 0
1 1 0 0 1 1
1 1 0 1 0 0
1 1 0 1 1 1
1 1 1 0 0 0
1 1 1 0 1 1
1 1 1 1 0 0
1 1 1 1 1 0
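
The partitioning can be sketched behaviorally as two 16-entry tables plus a 2-to-1 multiplexer. This is a sketch under stated assumptions, not the slice netlist: the entity name FUNC5_TWO_LUTS and the constants LUT_X5_0 and LUT_X5_1 are made up for illustration, and the table contents are transcribed from this slide's truth table (the two rows whose outputs were displaced in the slide are read as 01000 → 1 and 01001 → 0).

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

entity FUNC5_TWO_LUTS is
    port(
        X : in  STD_LOGIC_VECTOR(5 downto 1);  -- X(5) is X5, ..., X(1) is X1
        Y : out STD_LOGIC
    );
end FUNC5_TWO_LUTS;

architecture BEHAVIORAL of FUNC5_TWO_LUTS is
    -- Each constant models one 4-input LUT. Index 0 is the leftmost
    -- character; the index is X4 X3 X2 X1 read as a binary number.
    constant LUT_X5_0 : STD_LOGIC_VECTOR(0 to 15) := "0100110010011111";
    constant LUT_X5_1 : STD_LOGIC_VECTOR(0 to 15) := "0000000101010100";
    signal F, G : STD_LOGIC;
begin
    -- Both LUTs see the same four low-order inputs.
    F <= LUT_X5_0(to_integer(unsigned(X(4 downto 1))));
    G <= LUT_X5_1(to_integer(unsigned(X(4 downto 1))));
    -- F5 multiplexer: the fifth input selects which LUT output is used.
    Y <= G when X(5) = '1' else F;
end BEHAVIORAL;
```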


Distributed RAM
• CLB LUT configurable as Distributed RAM
  • A LUT equals 16x1 RAM
  • Implements Single and Dual-Ports
  • Cascade LUTs to increase RAM size
• Synchronous write
• Synchronous/Asynchronous read
  • Accompanying flip-flops used for synchronous read

[Figure: a LUT shown as equivalent to the distributed RAM primitives RAM16X1S (D, WE, WCLK, A0 to A3, O), RAM32X1S (D, WE, WCLK, A0 to A4, O), RAM16X2S (D0, D1, WE, WCLK, A0 to A3, O0, O1), and RAM16X1D (D, WE, WCLK, A0 to A3, DPRA0 to DPRA3, SPO, DPO)]
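
The write and read behavior listed above (synchronous write, asynchronous read) can be described behaviorally as follows. This is only a sketch: the entity name RAM_16X1_INFERRED is made up for illustration, and whether a given synthesis tool maps this description onto a LUT-based RAM depends on the tool and the target device.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

entity RAM_16X1_INFERRED is
    port(
        CLK      : in  STD_LOGIC;
        WE       : in  STD_LOGIC;
        ADDR     : in  STD_LOGIC_VECTOR(3 downto 0);
        DATA_IN  : in  STD_LOGIC;
        DATA_OUT : out STD_LOGIC
    );
end RAM_16X1_INFERRED;

architecture BEHAVIORAL of RAM_16X1_INFERRED is
    -- 16 one-bit cells, the capacity of a single 4-input LUT.
    signal MEM : STD_LOGIC_VECTOR(15 downto 0) := (others => '0');
begin
    -- Synchronous write: the addressed cell changes on the rising clock edge.
    process(CLK)
    begin
        if rising_edge(CLK) then
            if WE = '1' then
                MEM(to_integer(unsigned(ADDR))) <= DATA_IN;
            end if;
        end if;
    end process;

    -- Asynchronous read: the output follows the address without a clock.
    DATA_OUT <= MEM(to_integer(unsigned(ADDR)));
end BEHAVIORAL;
```

Registering DATA_OUT in a clocked process would give the synchronous-read variant that uses the accompanying flip-flop.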


Shift Register
• Each LUT can be configured as shift register
  • Serial in, serial out
• Dynamically addressable delay up to 16 cycles
• For programmable pipeline
• Cascade for greater cycle delays
• Use CLB flip-flops to add depth

[Figure: a LUT drawn as a chain of D flip-flops sharing CLK and CE, with input IN, output OUT, and DEPTH[3:0] selecting the tap]
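
A behavioral sketch of a dynamically addressable shift register with up to 16 cycles of delay is shown below. The entity name SHIFT_REG_16 and the port names are illustrative assumptions; DEPTH = 0 gives a one-cycle delay and DEPTH = 15 gives sixteen. Whether a tool packs this description into a single LUT is tool and device dependent.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

entity SHIFT_REG_16 is
    port(
        CLK   : in  STD_LOGIC;
        CE    : in  STD_LOGIC;
        DIN   : in  STD_LOGIC;
        DEPTH : in  STD_LOGIC_VECTOR(3 downto 0);
        DOUT  : out STD_LOGIC
    );
end SHIFT_REG_16;

architecture BEHAVIORAL of SHIFT_REG_16 is
    signal SR : STD_LOGIC_VECTOR(15 downto 0) := (others => '0');
begin
    -- Serial in: a new bit enters at position 0 on every enabled clock edge.
    process(CLK)
    begin
        if rising_edge(CLK) then
            if CE = '1' then
                SR <= SR(14 downto 0) & DIN;
            end if;
        end if;
    end process;

    -- Serial out: DEPTH picks the tap, so the delay can change at run time.
    DOUT <= SR(to_integer(unsigned(DEPTH)));
end BEHAVIORAL;
```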


Shift Register
• Register-rich FPGA
  • Allows for addition of pipeline stages to increase throughput
• Data paths must be balanced to keep desired functionality

[Figure: two parallel 64-bit data paths. One runs Operation A (4 cycles) followed by Operation B (8 cycles), 12 cycles in total; the other runs Operation C (3 cycles), leaving a 9-cycle imbalance between them]


Carry & Control Logic

[Figure: slice with two Look-Up Tables (inputs G1 to G4 and F1 to F4), each followed by carry and control logic and a D flip-flop; carry chain from CIN to COUT; outputs Y, YB, X, XB; other inputs BY, F5IN, SR, CLK, CE]


Fast Carry Logic
• Each CLB contains separate logic and routing for the fast generation of sum & carry signals
  • Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters
• Carry logic is independent of normal logic and routing resources

[Figure: carry logic routing running from LSB to MSB]


Accessing Carry Logic

All major synthesis tools can infer carry
logic for arithmetic functions
• Addition (SUM <= A + B)
• Subtraction (DIFF <= A - B)
• Comparators (if A < B then…)
• Counters (count <= count + 1)
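
The four constructs listed above can be written with the standard NUMERIC_STD operators, as in the sketch below. The entity name CARRY_INFERENCE_EXAMPLE, the 8-bit width, and the port names are illustrative assumptions; whether the dedicated carry chain is actually used is decided by the synthesis tool.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

entity CARRY_INFERENCE_EXAMPLE is
    port(
        CLK    : in  STD_LOGIC;
        A, B   : in  STD_LOGIC_VECTOR(7 downto 0);
        SUM    : out STD_LOGIC_VECTOR(7 downto 0);
        DIFF   : out STD_LOGIC_VECTOR(7 downto 0);
        A_LT_B : out STD_LOGIC;
        COUNT  : out STD_LOGIC_VECTOR(7 downto 0)
    );
end CARRY_INFERENCE_EXAMPLE;

architecture BEHAVIORAL of CARRY_INFERENCE_EXAMPLE is
    signal COUNT_REG : UNSIGNED(7 downto 0) := (others => '0');
begin
    -- Addition and subtraction (results wrap modulo 2**8).
    SUM  <= std_logic_vector(unsigned(A) + unsigned(B));
    DIFF <= std_logic_vector(unsigned(A) - unsigned(B));

    -- Comparator.
    A_LT_B <= '1' when unsigned(A) < unsigned(B) else '0';

    -- Counter.
    process(CLK)
    begin
        if rising_edge(CLK) then
            COUNT_REG <= COUNT_REG + 1;
        end if;
    end process;
    COUNT <= std_logic_vector(COUNT_REG);
end BEHAVIORAL;
```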


Block RAM
• Most efficient memory implementation
  • Dedicated blocks of memory
• Ideal for most memory requirements
  • 4 to 104 memory blocks
  • 18 kbits = 18,432 bits per block
  • Use multiple blocks for larger memories
• Builds both single and true dual-port RAMs

[Figure: Spartan-II True Dual-Port Block RAM with Port A and Port B]

Spartan-3 Block RAM Amounts

Block RAM Port Aspect Ratios

Configuration    Address range
16k x 1          0 to 16,383
8k x 2           0 to 8,191
4k x 4           0 to 4,095
2k x (8+1)       0 to 2,047
1024 x (16+2)    0 to 1,023


Dual Port Block RAM


Dual-Port Bus Flexibility
• Each port can be configured with a different data bus width
• Provides easy data width conversion without any additional logic

[Figure: RAMB4_S4_S16 dual-port block RAM.
 Port A (1K depth, 18-bit width): inputs WEA, ENA, RSTA, CLKA, ADDRA[9:0], DIA[17:0]; output DOA[17:0].
 Port B (2K depth, 9-bit width): inputs WEB, ENB, RSTB, CLKB, ADDRB[10:0], DIB[8:0]; output DOB[8:0].]

Two Independent Single-Port RAMs
• Added advantage of True Dual-Port
  • No wasted RAM Bits
• Can split a Dual-Port 16K RAM into two Single-Port 8K RAMs
• Simultaneous independent access to each RAM
• To access the lower RAM
  • Tie the MSB address bit to Logic Low
• To access the upper RAM
  • Tie the MSB address bit to Logic High

[Figure: RAMB4_S1_S1 used as two single-port RAMs.
 Port A (8K depth, 1-bit width): WEA, ENA, RSTA, CLKA, DIA[0], DOA[0]; address formed from VCC and ADDR[12:0].
 Port B (8K depth, 1-bit width): WEB, ENB, RSTB, CLKB, DIB[0], DOB[0]; address formed from GND and ADDR[12:0].]


New 18 x 18 Embedded Multiplier
• Fast arithmetic functions
  • Optimized to implement multiply / accumulate modules
• 18 x 18 signed multiplier
• Fully combinatorial
• Optional registers with CE & RST (pipeline)
• Independent from adjacent block RAM


18 x 18 Multiplier
• Embedded 18-bit x 18-bit multiplier
• 2’s complement signed operation
• Multipliers are organized in columns

[Figure: 18 x 18 Multiplier with inputs Data_A (18 bits) and Data_B (18 bits) and a 36-bit Output]

Note: See Virtex-II Data Sheet for updated performances
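
A signed 18 x 18 multiplication with an output register can be described behaviorally as in the sketch below. The entity name MULT_18X18_REG and the port names are illustrative assumptions, and the mapping of this description onto an embedded multiplier block is left to the synthesis tool rather than guaranteed.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

entity MULT_18X18_REG is
    port(
        CLK    : in  STD_LOGIC;
        CE     : in  STD_LOGIC;
        RST    : in  STD_LOGIC;
        DATA_A : in  STD_LOGIC_VECTOR(17 downto 0);
        DATA_B : in  STD_LOGIC_VECTOR(17 downto 0);
        P      : out STD_LOGIC_VECTOR(35 downto 0)
    );
end MULT_18X18_REG;

architecture BEHAVIORAL of MULT_18X18_REG is
begin
    -- Registered output with clock enable and synchronous reset.
    process(CLK)
    begin
        if rising_edge(CLK) then
            if RST = '1' then
                P <= (others => '0');
            elsif CE = '1' then
                -- signed * signed yields 18 + 18 = 36 bits in 2's complement.
                P <= std_logic_vector(signed(DATA_A) * signed(DATA_B));
            end if;
        end if;
    end process;
end BEHAVIORAL;
```

Dropping the clocked process and assigning the product directly gives the fully combinatorial form.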


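Basic I/O Block Structure

[Figure: IOB with three flip-flops, each with D, Q, EC and SR pins: a three-state flip-flop (Three-State FF Enable, Three-State Control), an output flip-flop (Output FF Enable, Output Path), and an input flip-flop (Input FF Enable, Input Path, with Direct Input and Registered Input); Clock and Set/Reset are also shown]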


IOB Functionality

• IOB provides interface between the package pins and CLBs
• Each IOB can work as uni- or bi-directional I/O
• Outputs can be forced into High Impedance
• Inputs and outputs can be registered
  • advised for high-performance I/O
• Inputs can be delayed
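
The bidirectional, three-state, and registered behavior listed above can be sketched in VHDL as follows. The entity name IOB_EXAMPLE and the port names are illustrative assumptions; placing the two registers inside the IOB itself is controlled by the implementation tools and is not shown here.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity IOB_EXAMPLE is
    port(
        CLK  : in    STD_LOGIC;
        OE   : in    STD_LOGIC;   -- '1' drives the pad, '0' releases it
        DOUT : in    STD_LOGIC;   -- value from internal logic
        DIN  : out   STD_LOGIC;   -- registered value read from the pad
        PAD  : inout STD_LOGIC    -- package pin
    );
end IOB_EXAMPLE;

architecture BEHAVIORAL of IOB_EXAMPLE is
    signal OUT_REG, IN_REG : STD_LOGIC := '0';
begin
    -- Registered output and registered input.
    process(CLK)
    begin
        if rising_edge(CLK) then
            OUT_REG <= DOUT;
            IN_REG  <= PAD;
        end if;
    end process;

    -- Three-state driver: high impedance when the output is disabled.
    PAD <= OUT_REG when OE = '1' else 'Z';
    DIN <= IN_REG;
end BEHAVIORAL;
```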

Routing Resources

[Figure: 3 x 3 array of CLBs with a Programmable Switch Matrix (PSM) at each routing intersection between them]


Clock Distribution


Spartan-3 FPGA Family Members


FPGA Nomenclature


Device Part Marking
We’re Using: XC3S100-4FG256

Virtex-II 1.5V Architecture

[Figure: Virtex-II layout showing Configurable Logic Blocks, I/O Blocks, and Block RAMs placed next to 18 x 18 Multipliers]


Virtex-II 1.5V
Device    CLB Array  Slices  Maximum I/O  BlockRAM (18kb)  Multiplier Blocks  Distributed RAM bits
XC2V40    8x8        256     88           4                4                  8,192
XC2V80    16x8       512     120          8                8                  16,384
XC2V250   24x16      1,536   200          24               24                 49,152
XC2V500   32x24      3,072   264          32               32                 98,304
XC2V1000  40x32      5,120   432          40               40                 163,840
XC2V1500  48x40      7,680   528          48               48                 245,760
XC2V2000  56x48      10,752  624          56               56                 344,064
XC2V3000  64x56      14,336  720          96               96                 458,752
XC2V4000  80x72      23,040  912          120              120                737,280
XC2V6000  96x88      33,792  1,104        144              144                1,081,344
XC2V8000  112x104    46,592  1,108        168              168                1,490,944


Virtex-II Block SelectRAM
• Virtex-II BRAM is 18 kbits
• Additional “parity” bits available in selected configurations

Width  Depth   Address  Data    Parity
1      16,384  [13:0]   [0]     N/A
2      8,192   [12:0]   [1:0]   N/A
4      4,096   [11:0]   [3:0]   N/A
9      2,048   [10:0]   [7:0]   [0]
18     1,024   [9:0]    [15:0]  [1:0]
36     512     [8:0]    [31:0]  [3:0]

[Figure: dual-port block symbol.
 Port A: WEA, ENA, SSRA, CLKA, ADDRA[# : 0], DIA[# : 0], DIPA[# : 0], DOA[# : 0], DOPA[# : 0].
 Port B: WEB, ENB, SSRB, CLKB, ADDRB[# : 0], DIB[# : 0], DIPB[# : 0], DOB[# : 0], DOPB[# : 0].]


Using Library Components in
VHDL Code


RAM 16x1 (1)
library IEEE;
use IEEE.STD_LOGIC_1164.all;

library UNISIM;
use UNISIM.all;

entity RAM_16X1_DISTRIBUTED is
    port(
        CLK      : in  STD_LOGIC;
        WE       : in  STD_LOGIC;
        ADDR     : in  STD_LOGIC_VECTOR(3 downto 0);
        DATA_IN  : in  STD_LOGIC;
        DATA_OUT : out STD_LOGIC
    );
end RAM_16X1_DISTRIBUTED;


RAM 16x1 (2)
architecture RAM_16X1_DISTRIBUTED_STRUCTURAL of RAM_16X1_DISTRIBUTED is

    -- Initial contents; the label must match the instance label used in the architecture body
    attribute INIT : string;
    attribute INIT of RAM_16X1_S_1: label is "F0C1";

    -- Component declaration of the "ram16x1s(ram16x1s_v)" unit
    -- File name contains "ram16x1s" entity: ./src/unisim_vital.vhd
    component ram16x1s
        generic(
            INIT : BIT_VECTOR(15 downto 0) := X"0000");
        port(
            O    : out std_ulogic;
            A0   : in  std_ulogic;
            A1   : in  std_ulogic;
            A2   : in  std_ulogic;
            A3   : in  std_ulogic;
            D    : in  std_ulogic;
            WCLK : in  std_ulogic;
            WE   : in  std_ulogic);
    end component;


RAM 16x1 (3)
begin

    RAM_16X1_S_1: ram16x1s generic map (INIT => X"F0C1")
        port map
            (O    => DATA_OUT,
             A0   => ADDR(0),
             A1   => ADDR(1),
             A2   => ADDR(2),
             A3   => ADDR(3),
             D    => DATA_IN,
             WCLK => CLK,
             WE   => WE
            );

end RAM_16X1_DISTRIBUTED_STRUCTURAL;


RAM 16x8 (1)
library IEEE;
use IEEE.STD_LOGIC_1164.all;

library UNISIM;
use UNISIM.all;

entity RAM_16X8_DISTRIBUTED is
    port(
        CLK      : in  STD_LOGIC;
        WE       : in  STD_LOGIC;
        ADDR     : in  STD_LOGIC_VECTOR(3 downto 0);
        DATA_IN  : in  STD_LOGIC_VECTOR(7 downto 0);
        DATA_OUT : out STD_LOGIC_VECTOR(7 downto 0)
    );
end RAM_16X8_DISTRIBUTED;


RAM 16x8 (2)
architecture RAM_16X8_DISTRIBUTED_STRUCTURAL of RAM_16X8_DISTRIBUTED is

    -- INIT for each generated instance is supplied through the generic map
    -- in the architecture body. An attribute specification placed here could
    -- not name an instance label that is declared inside a generate statement.

    -- Component declaration of the "ram16x1s(ram16x1s_v)" unit
    -- File name contains "ram16x1s" entity: ./src/unisim_vital.vhd
    component ram16x1s
        generic(
            INIT : BIT_VECTOR(15 downto 0) := X"0000");
        port(
            O    : out std_ulogic;
            A0   : in  std_ulogic;
            A1   : in  std_ulogic;
            A2   : in  std_ulogic;
            A3   : in  std_ulogic;
            D    : in  std_ulogic;
            WCLK : in  std_ulogic;
            WE   : in  std_ulogic);
    end component;


RAM 16x8 (3)
begin

    -- One 16x1 RAM per data bit; all eight share address, clock and write enable
    GENERATE_MEMORY:
    for I in 0 to 7 generate
        RAM_16X1_S_1: ram16x1s generic map (INIT => X"0000")
            port map
                (O    => DATA_OUT(I),
                 A0   => ADDR(0),
                 A1   => ADDR(1),
                 A2   => ADDR(2),
                 A3   => ADDR(3),
                 D    => DATA_IN(I),
                 WCLK => CLK,
                 WE   => WE
                );
    end generate;

end RAM_16X8_DISTRIBUTED_STRUCTURAL;


ROM 16x1 (1)
library IEEE;
use IEEE.STD_LOGIC_1164.all;

library UNISIM;
use UNISIM.all;

entity ROM_16X1_DISTRIBUTED is
    port(
        ADDR     : in  STD_LOGIC_VECTOR(3 downto 0);
        DATA_OUT : out STD_LOGIC
    );
end ROM_16X1_DISTRIBUTED;


ROM 16x1 (2)
architecture ROM_16X1_DISTRIBUTED_STRUCTURAL of ROM_16X1_DISTRIBUTED is

    -- ROM contents; the label must match the instance label used in the architecture body
    attribute INIT : string;
    attribute INIT of ROM_16X1_S_1: label is "F0C1";

    component ram16x1s
        generic(
            INIT : BIT_VECTOR(15 downto 0) := X"0000");
        port(
            O    : out std_ulogic;
            A0   : in  std_ulogic;
            A1   : in  std_ulogic;
            A2   : in  std_ulogic;
            A3   : in  std_ulogic;
            D    : in  std_ulogic;
            WCLK : in  std_ulogic;
            WE   : in  std_ulogic);
    end component;

    -- Constant '0' used to tie off the write port so the RAM behaves as a ROM
    signal Low : std_ulogic := '0';


ROM 16x1 (3)
begin

    ROM_16X1_S_1: ram16x1s generic map (INIT => X"F0C1")
        port map
            (O    => DATA_OUT,
             A0   => ADDR(0),
             A1   => ADDR(1),
             A2   => ADDR(2),
             A3   => ADDR(3),
             D    => Low,
             WCLK => Low,
             WE   => Low
            );

end ROM_16X1_DISTRIBUTED_STRUCTURAL;