Kuruvilla Varghese
DESE
Indian Institute of Science
Topics
Field Programmable Gate Arrays
Commercial FPGAs
Xilinx
Spartan-3, Spartan-6
Virtex-4, Virtex-5, Virtex-6
Artix-7, Kintex-7, Virtex-7, Zynq
Altera
Cyclone, Cyclone II, Cyclone III, Cyclone IV, Cyclone V
Arria II, Arria V
Stratix II, Stratix III, Stratix IV, Stratix V
Actel
Axcelerator (Antifuse)
IGLOO, IGLOOE (Flash)
ProASIC Plus (Flash)
ProASIC3, ProASIC3E (Flash)
RTAX (Radiation Tolerant, Anti-fuse)
RTSX -SU (Radiation Tolerant, Anti-fuse)
SmartFusion, SmartFusion2 (ARM Cortex-M3)
Structure of an FPGA
Detailed View (figure label: SB)
Switch Block
FPGA
Programmable Connections
Pass Transistor with configuration cell (figure labels: Flip-Flop, Write Transistor, Pass Transistor)
Flash Transistor
Flash Cell Erase
Anti-fuse
Programmable Connections
Coarse grain
Owing to the area of the SRAM-based interconnection (6 transistors), the logic blocks in SRAM-based FPGAs are made large
Utilization is kept high through configurability within the logic block
Fine grain
Since the antifuse occupies less area and adds less delay, antifuse-based FPGAs employ smaller logic blocks
Logic Cell Structure: Coarse Grain
Design Methodology
Design flow (figure, reconstructed from its labels):
HDL Source -> Functional Simulation
HDL Source -> Equations/Netlists
Equations/Netlists + Constraints -> PAR/Fitting -> Static Timing Analysis
PAR/Fitting -> Configuration File -> Programming
PAR/Fitting -> Timing Model -> Timing Simulation
Structure of an FPGA
Commercial Tools
Simulators
ModelSim (Mentor Graphics)
Active HDL (Aldec)
Synthesis Tools
Synplify Pro (Synopsys)
Precision Synthesis (Mentor Graphics)
Vendor Tools
Xilinx ISE (Synthesis, Simulation, PAR, Programming)
Xilinx Vivado (Synthesis, Simulation, PAR, Programming)
Altera Quartus II (Synthesis, Simulation, PAR, Programming)
Actel Libero (Synthesis, Simulation, PAR, Programming)
Cadence Suite
Synopsys Suite
Mentor Graphics Suite
Xilinx Virtex FPGA
Virtex CLB
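LUT
Example: a 2-input LUT implementing X XOR Y. X and Y drive the address lines A1 and A0; the addressed cell drives the output D0.
A1 (X)  A0 (Y)  D0
0       0       0
0       1       1
1       0       1
1       1       0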
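The XOR example above treats the LUT as a tiny memory: the inputs form the address and the stored bit is the output. A minimal VHDL sketch of that idea, widened to four inputs, could look as follows. The entity name lut4 and the generic INIT are illustrative assumptions, not vendor primitives; the default pattern x"6666" stores I1 xor I0 for every value of I3 and I2.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity lut4 is
  generic (
    -- INIT(k) is the output for input combination k (hypothetical generic).
    -- x"6666" repeats the nibble 0110, i.e. the XOR of I1 and I0.
    INIT : std_logic_vector(15 downto 0) := x"6666"
  );
  port (
    i : in  std_logic_vector(3 downto 0);  -- I3..I0 form the address
    o : out std_logic
  );
end entity lut4;

architecture rtl of lut4 is
begin
  -- The LUT is a 16x1 lookup: the inputs select one stored bit
  o <= INIT(to_integer(unsigned(i)));
end architecture rtl;
```

Changing only INIT changes the implemented function; the structure and delay stay the same, which is the point of a LUT-based logic cell.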
FPGA Configuration / Programming
Virtex Family
Important Specifications
Structure of an FPGA
Virtex CLB
4-input LUT and Flip-Flops
(figure: two 4-input LUTs, each with inputs I3..I0 and output O, each feeding a flip-flop with D, CK, S, AR and Q)
5-input LUT
(figure: two 4-input LUTs sharing inputs I3..I0; their outputs O are combined by the F5 mux, with I4 as select)
Two 4-input LUTs are multiplexed by the F5 mux to form a 5-input LUT.
The select line is connected to BX, so the bottom FF cannot be used
independently. The F5 mux output is connected to this FF.
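The F5 construction amounts to a Shannon expansion about the fifth input. As a sketch, with f_0 and f_1 (names introduced here for illustration) denoting the functions programmed into the two 4-input LUTs, and assuming the LUT holding f_1 is the one selected when I4 = 1:

```latex
f(I_4, I_3, I_2, I_1, I_0)
  = \overline{I_4}\, f_0(I_3, I_2, I_1, I_0) + I_4\, f_1(I_3, I_2, I_1, I_0),
\qquad
f_0 = f\big|_{I_4 = 0}, \quad f_1 = f\big|_{I_4 = 1}
```

Any 5-input function can be split this way, so the two LUTs plus the mux implement an arbitrary function of five inputs.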
6-input LUT
(figure: two 5-input LUTs, each made of two 4-input LUTs and an F5 mux on I4, combined by the F6 mux on I5)
Two 5-input LUTs are multiplexed by the F6 mux to form a 6-input LUT. The
select line is connected to BY, so the top FF cannot be used independently.
The F6 mux output is connected to this FF.
Cascading LUTs
Y = ABCDE or ABCDF
Y = (ABCD) and (E or F)
ABCD = X
Y = X and (E or F)
(figure: the first LUT computes X from A, B, C, D; the second LUT computes Y from X, E, F)
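The factoring of Y = ABCDE or ABCDF can be written so that the two-LUT structure is visible in the source. This is a minimal sketch; the entity name cascade and the signal name x are illustrative, and a synthesis tool remains free to map the logic differently.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity cascade is
  port (a, b, c, d, e, f : in std_logic; y : out std_logic);
end entity cascade;

architecture rtl of cascade is
  signal x : std_logic;  -- output of the first LUT
begin
  x <= a and b and c and d;  -- first LUT: 4 inputs (A, B, C, D)
  y <= x and (e or f);       -- second LUT: 3 inputs (X, E, F)
end architecture rtl;
```

Six inputs are covered with two 4-input LUTs because the function factors; the price is two LUT delays plus the routing between them.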
5 inputs using 2 cascaded LUTs
Y = ABCDE
Y = (ABCD) and E
ABCD = X
Y = X and E
(figure: the first LUT computes X from A, B, C, D; the second LUT computes Y from X and E)
Y = ABCDE or AB/CDE/
Y = (ABCD) and E
ABCD = X
Y = X and E
(figure: first LUT with inputs A, B, C, D and output X; second LUT with inputs X and E)
5 inputs using 5-input LUT
Y = ABCD xor E
ABCD = Z
Y = ZE/ or Z/E
(figure: two 4-input LUTs, each with A, B, C, D on I3..I0; their outputs O are combined by the F5 mux to give Y)
LUT as RAM
(figure: 4-input LUT with inputs I3..I0 and output O, with a LUT RAM write block)
General routing lines can be used to write the LUT through the LUT RAM
write-control circuit, so that the LUT can be used as distributed RAM
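A common way to obtain LUT-based RAM is to describe a small memory with a synchronous write and an asynchronous read, and let the synthesis tool infer it. The following is a minimal sketch under that assumption; the entity name lut_ram16x1 and the port names are illustrative, and whether a given tool maps this to distributed RAM depends on the tool and its settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity lut_ram16x1 is
  port (
    clk  : in  std_logic;
    we   : in  std_logic;                     -- write enable
    addr : in  std_logic_vector(3 downto 0);  -- the four LUT inputs
    din  : in  std_logic;
    dout : out std_logic
  );
end entity lut_ram16x1;

architecture rtl of lut_ram16x1 is
  type ram_t is array (0 to 15) of std_logic;
  signal ram : ram_t := (others => '0');
begin
  -- Synchronous write through the write-control path
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(unsigned(addr))) <= din;
      end if;
    end if;
  end process;

  -- Asynchronous read: the normal LUT lookup
  dout <= ram(to_integer(unsigned(addr)));
end architecture rtl;
```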
Carry Chain
Adder
$S_i = A_i \oplus B_i \oplus C_i$
$C_{i+1} = A_i B_i + (A_i \oplus B_i) C_i$
Carry Chain
(figure labels: LUT with inputs Ai and Bi; carry multiplexer with inputs 0 and 1; Ci, Ci+1, Si)
$C_{i+1} = A_i B_i + (A_i \oplus B_i) C_i$
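The carry equation can be realised with a multiplexer: when Ai xor Bi is 1 the incoming carry is passed on, and when it is 0 both operand bits are equal, so either of them is the carry out. A minimal VHDL sketch of an N-bit adder built this way is given here; the entity name carry_adder, the generic N and the signal names p and c are illustrative assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity carry_adder is
  generic (N : positive := 4);
  port (
    a, b : in  std_logic_vector(N-1 downto 0);
    cin  : in  std_logic;
    s    : out std_logic_vector(N-1 downto 0);
    cout : out std_logic
  );
end entity carry_adder;

architecture rtl of carry_adder is
  signal p : std_logic_vector(N-1 downto 0);  -- LUT outputs: Ai xor Bi
  signal c : std_logic_vector(N downto 0);    -- carry chain
begin
  c(0) <= cin;

  stage : for i in 0 to N-1 generate
    p(i)   <= a(i) xor b(i);                   -- computed in the LUT
    s(i)   <= p(i) xor c(i);                   -- Si = Ai xor Bi xor Ci
    c(i+1) <= c(i) when p(i) = '1' else a(i);  -- carry mux, select = LUT output
  end generate stage;

  cout <= c(N);
end architecture rtl;
```

In everyday designs one would normally write an arithmetic "+" on numeric_std types and leave the mapping onto the carry chain to the synthesis tool; the explicit form is only meant to mirror the equations.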
Source: Xilinx Data Sheets
Carry Chain
Control of Sequential Circuits
(figure: an FSM / controller drives the enable en (RA_L) of a register / counter; clk is common)
Clock Gating
(figure: register with data input D7:0; RA_L is gated with CLK to produce the register clock CK; labels RA_E, RA_L, CLK; waveforms of CLK and RA_L)
Re-circulating Multiplexer
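(figure: a 2-to-1 multiplexer, selected by RA_L, feeds the register's D input with either the new data D7:0 or the register's own output Q; CLK drives CK directly; waveforms of CLK and RA_L)
Re-circulating Multiplexer
(figure: the multiplexer and flip-flop on the left are equivalent to a flip-flop with a clock-enable input CE on the right)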
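In HDL, the re-circulating multiplexer is what an enabled register describes: the clock is left untouched and the load signal decides whether the register takes new data or keeps its value. A minimal sketch, with the entity name reg_ce chosen here for illustration and ra_l standing for the load signal from the controller:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity reg_ce is
  port (
    clk  : in  std_logic;
    ra_l : in  std_logic;                     -- load enable from the controller
    d    : in  std_logic_vector(7 downto 0);
    q    : out std_logic_vector(7 downto 0)
  );
end entity reg_ce;

architecture rtl of reg_ce is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if ra_l = '1' then
        q <= d;   -- load new data
      end if;     -- otherwise q keeps its value (the re-circulating path)
    end if;
  end process;
end architecture rtl;
```

Synthesis tools typically map the inner "if" onto the flip-flop's clock-enable pin, so no gate is placed in the clock path.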
Clock Gating for low power
(figure labels: D7:0, RA_L, RA_E, CLK, CLK1, CLK2, CK; waveforms of CLK, RA_L, CLK1 and CLK2)
Sequential Circuit Mapping
(figure: FF -> Comb -> FF; NSL -> FF -> OL)
Virtex IOB
Various I/O standards
LVTTL
LVCMOS33, LVCMOS25
LVCMOS18, LVCMOS15, LVCMOS12
PCI33, PCI66
Some I/O standards require a reference voltage for inputs
I/O pins are grouped into banks; each bank supports some of the I/O standards
Bus hold
The bus-hold circuit holds the previous state of the bus, but provides only a
weak drive, so that the bus can still be driven to 0 or 1.
This avoids unnecessary switching of inputs due to noise, which could occur
if the bus were left in high impedance.
Detailed View (figure label: SB)
Virtex Routing
Bus Lines
Fitting Example: FSM
CLBs, FSM (figure: NSL -> FF -> OL)
Fitting Example: Counter
CLB, Counter (figure: +1 incrementer feeding FF)
Signal Paths in CLB
library ieee;
use ieee.std_logic_1164.all;

entity test is
  port (a, b, c, d, e, f, g, h: in std_logic; z: out std_logic);
end entity test;

architecture arch_test of test is
begin
  -- a: asynchronous reset, b: clock, c: clock enable
  process (a, b)
  begin
    if (a = '1') then
      z <= '0';
    elsif (b'event and b = '1') then
      if (c = '1') then
        -- 5-input function: 4-input AND followed by XOR with h
        z <= (d and e and f and g) xor h;
      end if;
    end if;
  end process;
end arch_test;
(figure: CLB signal paths for this design; d, e, f, g go to both 4-input LUTs, with h and the flip-flop controls a, b, c completing the path to z)
Virtex DPRAM
Metastability
(figure: D flip-flop with D, Q and CLK)
ts: Setup time: the minimum time the input must be valid before the active clock edge
Minimum Clock period
Data path
(figure: D flip-flop -> Comb -> D flip-flop, both clocked by clk)
Here we consider the data path from the first flip-flop to the next, and
estimate the minimum clock period for proper latching of data into the
second flip-flop
(figure: sequential circuit; Inputs and the present state PS feed Comb, which produces the next state NS at the D input of the state flip-flops; Clock on CK, Reset on AR)
Clock skew
Clock Skew: Max path
(figure: D flip-flop -> Comb -> D flip-flop; clk reaches the flip-flops as CLK1 and CLK2; waveforms of CLK1 and CLK2 marking tclk, tco, tcomb, ts, tskew and slack)
$t_{clk} - t_{skew} > t_{co,max} + t_{comb,max} + t_{setup}$
$t_{clk} > t_{co,max} + t_{comb,max} + t_{setup} + t_{skew}$
$\text{slack} = t_{clk} - (t_{co,max} + t_{comb,max} + t_{setup} + t_{skew})$
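As a worked illustration of the max-path condition, assume (purely for the sake of the arithmetic; these numbers are not taken from any data sheet) a clock period of 10 ns, a maximum clock-to-output delay of 1 ns, a maximum combinational delay of 6 ns, a setup time of 0.5 ns and a skew of 0.5 ns:

```latex
\begin{align*}
t_{co,max} + t_{comb,max} + t_{setup} + t_{skew} &= 1 + 6 + 0.5 + 0.5 = 8\ \text{ns} \\
\text{slack} = t_{clk} - 8\ \text{ns} &= 10 - 8 = 2\ \text{ns}
\end{align*}
```

The slack is positive, so under these assumptions the path meets timing, and the clock period could be reduced to just above 8 ns before the setup requirement is violated.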
Clock Skew: Max path
(figure: D flip-flop -> Comb -> D flip-flop; clk reaches the flip-flops as CLK1 and CLK2, with CLK2 delayed by tskew; waveforms marking tclk, tco, tcomb, th and tskew)
Same edge: $t_{co,min} + t_{comb,min} > t_{skew,max} + t_{hold}$
Next edge: $t_{clk} > t_{co} + t_{comb} + t_{setup} - t_{skew}$
Clock Skew: Min path
Here, an analysis like that of the max path (i.e. from one clock edge at the
first flip-flop to the next clock edge at the second flip-flop) would result in
a smaller clock period, as the clock edge arrives late at the second flip-flop
But now the real danger is that the data launched from the first flip-flop by
the current edge appears within the hold-time window of the same edge at the
second flip-flop
If that happens, the only solutions are to add extra delay to the data path
between these flip-flops, or to route the clock in the opposite direction
In practice, this can happen in shift registers, as there may be no
combinational delay between the flip-flops
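A small numerical sketch of the hold check for a shift-register stage follows. The values are assumptions chosen only to show the arithmetic: minimum clock-to-output delay 0.3 ns, no combinational logic, hold time 0.2 ns, maximum skew 0.5 ns; t_d is a name introduced here for the extra delay to be inserted in the data path.

```latex
\begin{align*}
t_{co,min} + t_{comb,min} &= 0.3 + 0 = 0.3\ \text{ns} \\
t_{skew,max} + t_{hold} &= 0.5 + 0.2 = 0.7\ \text{ns} \\
0.3\ \text{ns} &\not> 0.7\ \text{ns} \quad \text{(hold violation)} \\
t_{d} &> 0.7 - 0.3 = 0.4\ \text{ns} \quad \text{(extra data-path delay needed)}
\end{align*}
```

Note that the clock period does not appear anywhere in this check, so slowing the clock down cannot cure a hold violation.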
Clock routing
Requirement
Minimum relative delay (skew) between any two flip-flops, or at least between
flip-flops that are connected by a data path
Solution
Balance the number of buffers and keep the wire length from the clock input
to each flip-flop approximately equal
H Clock Tree
Virtex Clock Tree
DLL
(figure: DLL between CLKIN and CLKOUT, with labels CLKI and CLKO; waveforms of CLKIN and CLKOUT marking tskew and tadd)
DLL / PLL
Current FPGAs
PLL
Digital Clock Manager (DCM)
DLL for de-skewing
Phase shifter
Frequency multiplication / division
Clock Buffers, Muxes (Glitchless)
All these can be connected in clock path
Clock pins, Clock tree
Special Resources Usage
Resources
Buffers
DLL / PLL
Block RAMs
DSP Blocks
Usage
Vendor library components
Inferred by synthesis tool, when possible
VHDL attributes with code
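As an illustration of the last two usage styles, a memory can be written so that the tool is able to infer a block RAM, with an attribute added as a hint. This is a sketch under stated assumptions: the entity name bram_1kx8 and the sizes are made up for the example, and ram_style is the attribute name used by Xilinx synthesis tools; other vendors use different attribute names and values.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity bram_1kx8 is
  port (
    clk  : in  std_logic;
    we   : in  std_logic;
    addr : in  std_logic_vector(9 downto 0);
    din  : in  std_logic_vector(7 downto 0);
    dout : out std_logic_vector(7 downto 0)
  );
end entity bram_1kx8;

architecture rtl of bram_1kx8 is
  type ram_t is array (0 to 1023) of std_logic_vector(7 downto 0);
  signal ram : ram_t;

  -- Tool-specific hint (assumed here: Xilinx synthesis attribute)
  attribute ram_style : string;
  attribute ram_style of ram : signal is "block";
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(unsigned(addr))) <= din;
      end if;
      -- Registered read, which block RAMs generally require
      dout <= ram(to_integer(unsigned(addr)));
    end if;
  end process;
end architecture rtl;
```

Inference keeps the code portable across devices; instantiating a vendor library component gives full control over the primitive at the cost of portability.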
Virtex Configuration
Virtex Configuration: Serial PROM
Serial Configuration
SelectMAP Scheme
FPGA Controls while configuring
Spartan 6: Bit Stream encryption
The bitstream is AES-encrypted with a 256-bit key using the BitGen tool
The encryption key is programmed into the FPGA device through JTAG for
decryption
Once programmed, the FPGA can be configured for no readback; the
configuration also cannot be read back
The AES key can be permanently fused in the FPGA, or stored in an SRAM
with external battery backup
Spartan 6: Multi Boot
Spartan 6: DSP48A1 Slice
Xilinx ChipScope Pro
Virtex Pins
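(figure: state machine; Inputs and the present state PS feed the Next State Logic, whose output NS drives the D input of the state flip-flops; Output Logic produces Outputs; Clock on CK, Reset on AR)
$t_{clk} > t_{co} + t_{logic} + t_{setup}$
(figure: the same structure drawn with a single Logic block; labels Inputs, Outputs, NS, PS, Clock, Reset)
One hot encoding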
(figure: state Si goes to state Sj on condi; state Sj stays in Sj on condj)
Dj = condi . Qi + condj . Qj
NSL: 5 + 2 inputs (Worst Case)
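The equation for Dj translates directly into one flip-flop per state. The sketch below shows a two-state machine in that style. The Dj term is the one given above; the equation for Di (stay in Si while condi is 0, return from Sj when condj is 0), the reset state and all names are assumptions added to make the example complete.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity onehot_fsm is
  port (
    clk, reset   : in  std_logic;
    condi, condj : in  std_logic;
    in_si, in_sj : out std_logic   -- one output per state flip-flop
  );
end entity onehot_fsm;

architecture rtl of onehot_fsm is
  signal qi, qj : std_logic;       -- one-hot state bits
begin
  process (clk, reset)
  begin
    if reset = '1' then
      qi <= '1';                   -- assumed reset state: Si
      qj <= '0';
    elsif rising_edge(clk) then
      -- Di: assumed complement of the transitions into Sj
      qi <= ((not condi) and qi) or ((not condj) and qj);
      -- Dj = condi . Qi + condj . Qj
      qj <= (condi and qi) or (condj and qj);
    end if;
  end process;

  in_si <= qi;
  in_sj <= qj;
end architecture rtl;
```

Each next-state equation depends only on the conditions and the state bits of the predecessor states, which keeps the logic per flip-flop small enough to fit in few LUTs.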
One-hot encoding: Output logic
State encoding
Sequential, gray, one-hot-one, one-hot-zero
One-hot one, One-hot zero
Altera Stratix
Actel 54SX-A, C Cell
Actel 54SX-A
Actel 54SX-A Probe
ProASIC Plus, Logic Tile
Latch / FF
(figure: FF with Latches; two D latches in series, D -> Q -> D -> Q, with enable inputs C driven from CLK)
ProASIC Plus Routing
Fast Connect
Short Lines (1, 2, 4), Long Lines
Clock Tree
Pad Ring (Pin Locking)
SRAM Blocks
Programming Tech: Flash
Non-volatile
Static Timing Analysis (STA)
(figure: Input -> D flip-flop -> Comb -> D flip-flop -> Output, both flip-flops clocked by CLK)
The register-to-register path decides the clock frequency. But if either of the other two paths (input to register, register to output) has a larger delay, the maximum of the three must be taken as the minimum clock period.
In real life this is often not a great concern, since many a time we are designing IPs that go inside the chip and are interfaced to other blocks close by. Even when the inputs and outputs are brought out to external pins, proper placement should take care of these delays.
Static Timing Analysis: Sequential Circuit
False Paths
Improbable Paths
Static Paths (e.g. Input Registers)
Paths between clock domains
(figure: D flip-flop -> Comb -> D flip-flop, with clock enables CE1 and CE2 and common clock clk)
Critical Path
(figure: FF1 -> C1 -> C2 -> FF2, with clock enables CE1 and CE2 and common clock clk)
Constraint editor
I/O constraints
I/O locations
I/O standards (LVTTL, PCI66-3, LVDS ..)
Drive strength (current)
Slew rate
I/O termination (pull up, pull down, hold)
Input delay
Timing constraints
Global
Clock period, pad to setup, clock to pad
Per port
pad to setup, clock to pad
Per group (by net and clock)
Pad to setup, Clock to pad
FROM TO, FROM THRU TO
False Paths
Multi-cycle paths