Reiner Hartenstein
University of Kaiserslautern
Infinite amount of gates not yet available on a chip
3 million gates (10 million in 2003?): far away from "infinite"
Bleeding edge designs only with sophisticated EDA tools
Excessive optimization needed
Hardware expertise is inevitable for the designer.
improve and simplify the design flow for the user
provide rich configware libraries of soft IP cores
APPLICATIONS:
1. control applications,
2. networking,
3. wireless telecommunication,
4. data communication,
5. embedded and consumer markets.
1. For libraries, creation and reuse of configware
2. To search for IPs see: List of all available IP
3. The AllianceCORE program is a cooperation between Xilinx and third-party core developers
4. The Xilinx Reference Design Alliance Program
5. The Xilinx University Program
6. LogiCORE soft IP with LogiCORE PCI Interface.
7. Consultants
Select EDA quality / productivity, not FPGA architectures
EDA often has massive software quality problems
Customer: highest priority EDA center of excellence
1. collecting EDA expertise and EDA user experience
2. to assemble best possible tool environments
3. for optimum support of design teams
4. to cope with interoperability problems
5. to keep track of the EDA scene as a rapidly moving target
being fabless, FPGA vendors spend most qualified manpower in development of EDA, IP cores, applications, support
Xilinx and Altera are morphing into EDA companies.
OS for FPGAs
separate EDA software market, comparable to the compiler / OS market in computers,
Cadence, Mentor, Synopsys just jumped in.
< 5% Xilinx / Altera income from EDA SW
Changing EDA Tools Market
Full design flow from Cadence, Mentor, & Synopsys
Xilinx Software AllianceEDA Program:
1. Alliance Series Development System.
2. Foundation Series Development Systems.
3. Xilinx Foundation Series ISE (Integrated Synthesis Environment)
4. free WebPOWERED SW w. WebFitter & WebPACK ISE
5. StateCAD XE and HDL Bencher
6. Foundation Base Express
7. Foundation ISE Base Express
ModelSim Xilinx Edition (ModelSim XE)
Forge Compiler
Modular Design
Chipscope ILA
The Xilinx System Generator
XPower
JBits SDK
The Xilinx XtremeDSP Initiative
MathWorks / Xilinx Alliance
System Generator
Wind River / Xilinx alliance
Altera EDA
Altera was founded in June 1983
EDA: synthesis, place & route, and verification
Quartus II: APEX, Excalibur, Mercury, FLEX 6000 families
MAX+PLUS II: FLEX, ACEX & MAX families
Flow with Quartus II: Mentor Graphics, Synopsys, Synplicity deliver design software to support Altera SOPC solutions.
Mentor: only EDA vendor w. complete design environment f. APEX II incl. IP, design capture, simulation, synthesis, and h/s co-verification
Configware: Altera offers over a hundred IP cores
Third party IP core design services and consultants
Cadence
FPGA Designer: top-down FPGA design system, high-level mapping, architecture-specific optimization, Verilog, VHDL, schematic-level design entry.
Verilog, VHDL to Synergy (logic synthesis) and FPGA Designer
FPGAs simulated by themselves using Cadence's Verilog-XL or Leapfrog VHDL simulators and simulated with rest of the system design with Logic Workbench board/system verification environment.
Libraries for the leading FPGA manufacturers.
Mentor Graphics
System Design and Verification.
PCB design and analysis:
IC Design and Verification
shifts ASIC design flow to FPGAs (Altera, Xilinx)
by FPGA Advantage with IP support
by ModuleWare, Xilinx CORE Generator, Altera MegaWizard integration,
Synopsys
FPGA Compiler II
Version of ASIC Design Compiler Ultra
Block Level Incremental Synthesis (BLIS)
ASIC <-> FPGA migration
Actel, Altera, Atmel, Cypress, Lattice, Lucent, Quicklogic, Triscend, Xilinx
[Pie chart, $3.7 Bio: Xilinx 42%, Altera 37%, Lattice 15%, Actel 6%]

Vendor        1998   1999
Xilinx         629    899
Altera         654    837
Lattice        206    410
Actel          154    172
Lucent         100    120
Cypress         41     43
Quicklogic      30     40
Atmel           32     38

Meanwhile, Xilinx acquired Philips' MOS PLD business, Lattice purchased Vantis.
[Dataquest] PLD market > $7 billion by 2003.
fastest growing segment of semiconductor market.
IP reuse and "pre-fabricated" components for the efficiency of design and use for PLDs
FPGAs are going into every type of application.
Xilinx
fabless FPGA semi vendor, San Jose, Ca, founded 1984
key patents on FPGAs (expiring in a few years)
Fortune 2001: No. 14 Best Company to work for (Intel: no. 42, HP: no. 64, TI: no. 65).
DARPA grant (Nov 99) to develop JBits API tools for internet reconfigurable / upgradable logic (w. VT)
Less brilliant early/mid 90ies (president Curt Wozniak): 1995 market share from 84% down to 62% [Dataquest]
As designs get larger, Xilinx lost its advantage (bugfixes did not require to burn new chips)
meanwhile, weeks of expensive debug time needed
Xilinx Flexware
Virtex, Virtex-II, first w. 1 mio system gates.
Virtex-E series > 3 mio system gates.
Virtex-EM on a copper process & addit. on chip memory f. network switch appl.
The Virtex XCV3200E > 3 million gates, 0.15-micron technology,
Spartan, Spartan-XL, Spartan-II for low-cost, high volume applications as ASIC replacements
Multiple I/O standards, on-chip block RAM, digital delay lock loops eliminate phase lock loops, FIFOs, I/O xlators, system bus drivers
XC4000XV, XC4000XL/XLA, CPLD: low-cost families
rapid development, longer system life, robust field upgradability
support In-System Programming (ISP), in-board debugging, test during manufacturing, field upgrades, full JTAG compliant interface
CoolRunner: low power, high speed/density, standby mode.
Military & Aerospace: QPRO high-reliability QML certified
Configuration Storage Devices
Altera Flexware
Newer families: APEX 20KE, APEX 20KC, APEX II, MAX 7000B, ACEX 1K, Excalibur, Mercury families.
Apex EP20K1500E (0.18-micron), up to 2.4 million system gates,
APEX II (all-copper 0.13-micron) f. data path applications, supports many I/O standards. 1-Gbps True-LVDS performance
wQ2001, an ARM-based Excalibur device
Altera mainstream: MAX 7000A, 3000A; FLEX 6000, 10KA, 10KE; APEX 20K families.
Mature and other: Classic, MAX 7000, 7000S, 9000; FLEX 8000, 10K families.
Triscend CSoC
[Figure [Kean]: Configurable system logic (Digital Filter, Display Interface, Viterbi, A/D Interface) attached via CSI Sockets to the Configurable System Interconnect (CSI) Bus, with ARM and Memory]
[Figures, à la S. Guccione: design flows. HDL -> Netlister -> Netlist -> Place and Route -> Bitstream; HLL -> Compiler; Schematics/HDL -> Netlister -> Netlist -> Place and Route -> Bitstream alongside User Code -> Compiler -> Executable; the same flow with the JBits API and User Java Code -> Java Compiler -> Executable; HLL compilers for FPGA core, CPU core and Memory core]
on-board microprocessor CPU is available anyhow - even along with a little RTOS
use this CPU for configuration management
Run-Time Reconfiguration
[Figure: HLL compilers for FPGA core, CPU core and Memory core; User Java Code -> Java Compiler -> Executable using the JBits API]
RTR: Divides application into a series of sequentially executed stages, each implemented as a separate execution module.
Partial RTR partitions these stages into finer-grain sub-modules to be swapped in as needed.
Without RTR, all conf. platforms just ASIC emulators.
needs a new kind of application development environments.
directly support development and debugging of RTR appl.
essential for the advancement of configurable computing
will also heavily influence the future system organization
Xilinx, VT, BYU work on run-time kernels, run-time support, RTR debugging tools and other associated tools.
smaller, faster circuits, simplified hardware interfacing, fewer IOBs; smaller, cheaper packages, simplified software interfaces.
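The staged execution described for RTR can be pictured with a small software model. This is only a sketch under stated assumptions: `FakeDevice`, `run_rtr` and the three stage functions are hypothetical names invented for illustration, a "configuration" is a plain Python function standing in for a bitstream, and nothing here corresponds to a real FPGA or JBits API.

```python
from typing import Callable, List, Tuple

# Hypothetical model: each stage is (name, function); the function stands in
# for the execution module that a bitstream would implement on the device.
Stage = Tuple[str, Callable[[int], int]]


class FakeDevice:
    """Toy stand-in for a reconfigurable device holding one module at a time."""

    def __init__(self) -> None:
        self.loaded: str = ""
        self.module: Callable[[int], int] = lambda x: x
        self.reconfigurations = 0

    def configure(self, name: str, module: Callable[[int], int]) -> None:
        # Swap in a new module only if it is not already resident.
        if name != self.loaded:
            self.loaded, self.module = name, module
            self.reconfigurations += 1

    def run(self, data: int) -> int:
        return self.module(data)


def run_rtr(device: FakeDevice, stages: List[Stage], data: int) -> int:
    """Execute the stages sequentially, reconfiguring the device between them."""
    for name, module in stages:
        device.configure(name, module)
        data = device.run(data)
    return data


stages: List[Stage] = [
    ("scale", lambda x: x * 3),
    ("offset", lambda x: x + 4),
    ("clip", lambda x: min(x, 20)),
]
device = FakeDevice()
result = run_rtr(device, stages, 5)
print(result, device.reconfigurations)
```

In this model the whole device is swapped per stage; partial RTR would correspond to replacing only part of the resident logic, which the sketch does not attempt to model.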
Run-time Mapping
1. Run-time reconfigurable are: Xilinx VIRTEX FPGA family
2. RAs being part of Chameleon CS2000 series systems
3. Using such devices changes many of the basic assumptions in the HW/SW co-design process:
4. Host/RL interaction is dynamic, needs a tiny OS like eBIOS, also to organize RL reconfiguration under host control
5. Typical goal is minimization of reconfiguration latency (especially important in communication processors), to hide configuration loading latency, and,
6. Scheduling to find best schedule for eBIOS calls (C~side).
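To illustrate what hiding configuration loading latency can buy, here is a minimal timing sketch. It assumes, purely for illustration, that the next configuration can be loaded while the current one executes (for example on a device with a second configuration context); the function names and the task timings are made up and do not describe eBIOS or any specific device.

```python
from typing import List, Tuple

# Each task is (configuration load time, execution time) in arbitrary units.
Task = Tuple[int, int]


def total_without_prefetch(tasks: List[Task]) -> int:
    """Load, then execute, strictly one after the other."""
    return sum(load + run for load, run in tasks)


def total_with_prefetch(tasks: List[Task]) -> int:
    """Overlap loading of task i+1 with execution of task i.

    Assumption: the load of the next task starts when the current task
    starts executing, and the next task starts once both have finished.
    """
    if not tasks:
        return 0
    total = tasks[0][0]  # the first load cannot be hidden
    for i, (_, run) in enumerate(tasks):
        next_load = tasks[i + 1][0] if i + 1 < len(tasks) else 0
        total += max(run, next_load)
    return total


tasks: List[Task] = [(4, 10), (6, 3), (5, 8)]
sequential = total_without_prefetch(tasks)
overlapped = total_with_prefetch(tasks)
print(sequential, overlapped)
```

A load is fully hidden only when the preceding execution lasts at least as long as the load; in the second step of this example the load (5) outlasts the execution (3), so part of it stays visible. This is the kind of trade-off a scheduler has to weigh.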
2. Quickturn (Cadence), IKOS (Synopsys), Celaro (Mentor)
3. From rack to board to chip (from other vendors, e.g. Virtex and VirtexE family; emulate up to 3 million gates)
4. Easy configuration using SmartMedia FLASH cards
5. ASIC emulators will become obsolete within years
6. By RTR: in-circuit execution debugging instead of emulation
It is recommendable to set up an interwoven competence in both scenes, the EM scene and the highly commercialized EDA scene
5. EH should be done by EDA people, rather than EM freaks.
BRASS (1)
1. UC Berkeley, the BRASS group: Prof. Dr. John Wawrzynek
2. The Pleiades Project, Prof. Jan Rabaey, ultra-low power high performance multimedia computing through reconfiguration of heterogeneous system modules, reducing energy by overhead elimination, programmability at just right granularity, parallelism, pipelining, dynamic voltage scaling.
3. Garp integrates processor and FPGA; developed in parallel with compiler - software compile techniques (VLIW SW pipelining): simple pipelining functionalities, broad class of loops.
4. SCORE, a stream-based computation model - a unifying computational model. Fast Mapping for Datapaths: by a tree-parsing compiler tool for datapath module mapping
BRASS (2)
HSRA. new FPGA (& related tools) supports pipelining, w. retiming capable CLB architecture, implemented in a 0.4um DRAM process supporting 250MHz operation
OOCG. Object Oriented Circuit-Generators in Java
MESCAL (GSRC), the goal is: to provide a programmer's model and software development environment for efficient implementation of an interesting set of applications onto a family of fully-programmable architectures / microarchitectures.
1. SCORE, a stream-based computation model: the BRASS group claims having solved the problem of primary impediment to wide-spread reconfigurable computing, by a unifying computational model.
2. Remark: clean stream-based model introduced ~1980: Systolic Array
3. 1995: Rainer Kress introduces reconfigurable stream-based model
4. Fast Mapping for Datapaths (SCORE): BRASS claims having introduced in 1998 "the first tree-parsing compiler tool for datapath module mapping. Further, it is the first work to integrate simultaneous placement with module mapping in a way that preserves linear time complexity."
1. Remark: The DPSS (Data Path Synthesis System), using tree covering with simultaneous datapath placement and routing, was published in 1995 by Rainer Kress
2. Chip-in-a-Day Bee Project. Prof. Dr. Bob Brodersen's radical rethink of the ASIC design flow aimed at shortening design time, relying on stream-based DPU arrays. [published in 2000]
3. Remark: the KressArray, a scalable rDPU array [1995], is stream-based
Hardware/Software Co-Design of media and stream processors
and others ....
Chip-in-a-Day Project.
Prof. Dr. Bob Brodersen, BWRC: targeting a radical rethink of the ASIC design flow aimed at shortening design time.
Relying on stream-based DPU arrays (not rDPU) and related EDA tools.
Davis: ... 50x decrease in power requirements over typical TI C64X design.
1. New design flow to break up the highly iterative EDA process, allowing designers to spend more time defining the device and far less time implementing it in silicon. ... developers to start by creating data flow graphs rather than C code,
2. It is stream-based computing by DPU array (hardwired DPA)
3. For hardwired and reconfigurable DPU array and rDPU array
1. Stanford: Prof. Flynn went emeritus, Oskar Mencer moved to Bell Labs. no activities seen other than YAFA (yet another FPGA application)
2. UCLA: Prof. Jason Cong, expert on FPGA architectures and R&P algorithms. 9 projects, mult. sponsors under California MICRO Program
3. Prof. Majid Sarrafzadeh directs the SPS project: "versatile IPs, a new routing architecture, architecture-aware CAD, IP-aware SPS compiler"
4. USC: Prof. Viktor Prasanna (EE dept.) works 20% on reconfigurable computing: MAARC project, DRIVE project and Efficient Self-Reconfiguration.
5. Prof. Dubois: RPM Project, FPGA-based emulation of scalable multiprocessors. DEFACTO proj.: compilation - architecture-independent at all levels
6. MIT. MATRIX web pages removed '99. RAW project: a conglomerate
7. VT. Prof. Athanas: JBits API f. internet RTR logic ($2.7 mio DARPA). w. Prof. Brad Hutchings, BYU on programming approaches for RTR Systems
8. BYU. Prof. Brad Hutchings works on the JHDL (JAVA Hardware Description Language) and compilation of JHDL sources into FPGAs.
ASICs Are Already Dead
My Position [Jonathan Rose]
You have to fabricate an ASIC
Very hard, getting harder
An FPGA is pre-fabricated
A standard part
immense economic advantages

[Jonathan Rose]
Instant Fabrication
Get to Market Fast
Fix 'em quick
Custom IC Designer Can Make Logic 20x Faster, 20x Smaller than Programmable

Embedded FPGA Fabric
Still Have to Make the Chip
Need Two Sets of Software to Build It
The ASIC Flow
The PLD Flow
Have No Idea What to Connect the PLD Pins to
Chances Are, You Are Going to Get It Wrong!

[Jonathan Rose]
Pre-Fabricated
One CAD Tool Flow!
Can Connect Anything to Anything
PLDs are built for general connectivity

[Jonathan Rose]
High-Speed Processors Integrated with PLDs
[Figure: Dual-Port RAM, Single-Port RAM, ARM 922T Core, General Purpose PLD]
[Jonathan Rose]
Available Today!
[Figure: HLL Compiler, FPGA with soft CPU and Memory core]
core          architecture                 platform
Leon 25 MHz   SPARC
ARM7 clone    ARM
uP1232 8-bit  CISC, 32 reg.
REGIS
Nios          16-bit instr. set            Altera Mercury
Nios 50 MHz   32-bit instr. set            Altera, 22 D-MIPS
Nios          8 bit                        Altera Mercury
Reliance-1    12 bit DSP                   Lattice 4 isp30256, 4 isp1016
1Popcorn-1    8 bit CISC
gr1040        16-bit
gr1050        32-bit
My80          i8080A                       FLEX10K30 or EPF6016
YARD-1A       16-bit RISC, 2 opd. instr.
DSPuva16      16 bit DSP                   Spartan-II
xr16          RISC integer C               SpartanXL
Acorn-1                                    1 Flex 10K20
Xilinx: up to 100 on one FPGA
Core         Description              Language    Implementation
Reliance 1                            Schematic
PopCorn 1                             Verilog
Acorn 1                               VHDL
16-bit DSP                            VHDL        Xilinx XC4000
Free-6502                             VHDL
DLX                                   VHDL
DLX2                                  VHDL
GL85         i8085 clone              VHDL
AMD 2901                              VHDL
AMD 2910                              VHDL
i8051        8-bit micro-controller   VHDL        Synopsys
i8051                                 VHDL        Mentor Graphics, Synopsys

Mälardalen University, Eskilstuna, Sweden
Chalmers University, Göteborg, Sweden
Cornell University
Gray Research
Georgia Tech
Hiroshima City University, Japan
Michigan State
Universidad de Valladolid, Spain
Virginia Tech
Washington University, St. Louis
New Mexico Tech
UC Riverside
Tokai University, Japan
[Figure, à la S. Guccione: Processor-Memory Performance Gap, 1980 to 2000, performance axis 1 to 1000. Proc: 60%/yr. DRAM: 7%/yr. Gap grows 50% / year.]
[Figure, à la S. Guccione: rDPU Array with data streams, or, from / to embedded memory banks]
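The three growth figures in the performance-gap chart are consistent with each other, which a short calculation shows. This is a sketch that simply assumes constant compound growth at the rates quoted in the chart (60%/yr for processors, 7%/yr for DRAM); the variable and function names are illustrative only.

```python
# Annual growth rates as quoted in the chart.
proc_growth = 0.60   # processor performance: +60% per year
dram_growth = 0.07   # DRAM performance: +7% per year

# The gap is the ratio of the two, so it grows by the ratio of the factors.
gap_factor = (1 + proc_growth) / (1 + dram_growth)
gap_growth = gap_factor - 1


def gap_after(years: int) -> float:
    """Relative processor/DRAM gap after the given number of years."""
    return gap_factor ** years


print(f"gap grows about {gap_growth:.1%} per year")
print(f"after 10 years the gap is about {gap_after(10):.0f}x")
```

Under these assumptions the ratio 1.60 / 1.07 is roughly 1.495, which matches the "grows 50% / year" label, and compounding it over a decade gives a gap of more than fifty times.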
[Figures, à la S. Guccione: HLL Compiler targeting soft CPU or CPU, rDPU array, Memory and miscellaneous]
>> HLLs
Configware Market
FPGA Market
Embedded Systems (Co-Design)
Hardwired IP Cores on Board
Run-Time Reconfiguration (RTR)
Rapid Prototyping & ASIC Emulation
Evolvable Hardware (EH)
Academic Expertise
ASICs dead
Soft CPU
HLLs
Problems to be solved
[Figures, à la S. Guccione: HLL Compiler and System Design; HLL compilers for FPGA core, CPU core and Memory core]
Jbit Environment [à la S. Guccione]: RTP Core Library, JRoute API, JBits API, User Code, BoardScope Debugger, XHWIF, TCP/IP, Device Simulator
[Figures, à la S. Guccione: HLL Compiler and System Design; HLL compilers for FPGA core, CPU core and Memory core; FPGA with soft CPU and Memory core]
Target technologies
Processing units
Power efficiency of target technologies
ASICs
Processors
Energy efficiency
Code size efficiency and code compaction
Run-time efficiency
DSP processors
Multimedia processors
Very long instruction word (VLIW) & EPIC machines
Micro-controllers
Reconfigurable Hardware
Memory
Reconfigurable Logic
Full custom chips may be too expensive, software too slow.
Combine the speed of HW with the flexibility of SW
HW with programmable functions and interconnect.
Use of configurable hardware; common form: field programmable gate arrays (FPGAs)
Applications: bit-oriented algorithms like
encryption,
fast object recognition (medical and military)
Adapting mobile phones to different standards.
Very popular devices from
XILINX (XILINX Virtex II are very recent devices)
Actel and others
Example:
a b c d | G
0 0 0 0 | 0
0 0 0 1 | 1
0 0 1 0 | 1
0 0 1 1 | 0
0 1 0 0 | 1
0 1 0 1 | 0
0 1 1 0 | 0
0 1 1 1 | 1
1 0 0 0 | 1
1 0 0 1 | 0
1 0 1 0 | 0
1 0 1 1 | 1
1 1 0 0 | 0
1 1 0 1 | 1
1 1 1 0 | 1
1 1 1 1 | 0
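FPGAs typically realize such a function as a lookup table (LUT): the 16 output bits are stored, and the four inputs form the address. The sketch below stores the G column of the example truth table and checks it against a candidate formula. The observation that G equals the XOR (odd parity) of the four inputs is not stated on the slide; it is simply what the check confirms. `lut4` and `G_COLUMN` are illustrative names.

```python
from itertools import product

# G column of the example truth table, in row order abcd = 0000 ... 1111.
G_COLUMN = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0]


def lut4(table, a: int, b: int, c: int, d: int) -> int:
    """Evaluate a 4-input LUT: the inputs form the address into the table."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]


# Compare the stored table with the XOR of the four inputs for all 16 rows.
matches_xor = all(
    lut4(G_COLUMN, a, b, c, d) == a ^ b ^ c ^ d
    for a, b, c, d in product((0, 1), repeat=4)
)
print(matches_xor)
```

Reprogramming the function means rewriting the 16 stored bits; the lookup logic itself stays the same, which is the sense in which the hardware has programmable functions.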
Virtex II (Pro) Slice
Dedicated OR chain for computing sum of products
Number of resources available in Virtex II Pro devices
Embedded Multipliers
A Virtex-II Pro multiplier block is an 18-bit by 18-bit signed multiplier.
Multipliers are connected to a switch matrix, share some bits with RAM (MAC instruction).

Device      Columns   Multipliers
XC2VP2                12
XC2VP4                28
XC2VP7                44
XC2VP20               88
XC2VP30               136
XC2VPX20              88
XC2VP40     10        192
XC2VP50     12        232
XC2VP70     14        328
XC2VPX70    14        308
XC2VP100    16        444
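What an 18-bit by 18-bit signed multiplier computes can be sketched in a few lines. This is a behavioural model only, assuming two's-complement operands; `to_signed` and `mult18x18` are illustrative names and say nothing about how the hardware block is built or accessed.

```python
WIDTH = 18  # operand width in bits


def to_signed(value: int, width: int = WIDTH) -> int:
    """Interpret the low `width` bits of value as a two's-complement number."""
    value &= (1 << width) - 1
    return value - (1 << width) if value & (1 << (width - 1)) else value


def mult18x18(a_bits: int, b_bits: int) -> int:
    """Signed product of two 18-bit two's-complement bit patterns."""
    return to_signed(a_bits) * to_signed(b_bits)


# Bit pattern 0x3FFFF is -1 in 18-bit two's complement.
minus_two = mult18x18(0x3FFFF, 2)

# The largest magnitude arises from the most negative operand squared.
most_negative = 1 << (WIDTH - 1)          # pattern for -2**17
largest_product = mult18x18(most_negative, most_negative)
print(minus_two, largest_product)
```

The largest product, 2**34, fits in a 36-bit signed result, so two 18-bit signed operands never overflow a 36-bit product.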
Hierarchical Routing Resources
Interconnect
Summary
Processing units
Power efficiency of target technologies
ASICs
Processors
Energy efficiency
Code size efficiency and code compaction
Run-time efficiency
DSP processors
Multimedia processors
Very long instruction word (VLIW) machines
Micro-controllers
Reconfigurable Hardware
Memory
Covered today