Outline
Synthesis from standard HDL (Verilog) [L. Lavagno et al Async00]
Subset for asynchronous specification
Data-path/control partitioning
Circuit architecture. Control generation
Synthesis from asynchronous HDL (CSP, Tangram)
CSP for control generation [A. Martin et al, Caltech]
Tangram for silicon compilation [K. van Berkel et al, Philips]
Control synthesis using FSMs [K. Yun, S. Nowick]
Burst-mode machines
Comparison with STGs
Motivation
Language-based design is a key enabler of synchronous logic success
Use HDL as a single language for:
specification
logic simulation and debugging
synthesis
post-layout simulation
HDL must support multiple levels of abstraction
Control-data partitioning
Splitting of asynchronous control and synchronous data path
Automated insertion of bundling delays
[Figure: CONTROL UNIT and DATA PATH blocks linked by request and acknowledge signals, with a delay element]
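The bundling constraint behind delay insertion can be illustrated with a small calculation: the delay on the request line has to cover the worst-case delay of the data path. The following is a minimal Python sketch, not the flow's actual algorithm. The function name, the fixed-delay delay cell, and the safety margin are all illustrative assumptions that do not come from the slides.

```python
def delay_cells_needed(datapath_delay_ps, cell_delay_ps, margin_percent=10):
    """Number of delay cells whose total delay covers the data path.

    All parameters are illustrative assumptions:
    datapath_delay_ps -- worst-case data-path delay reported by timing analysis
    cell_delay_ps     -- delay of one (hypothetical) delay cell
    margin_percent    -- safety margin added on top of the worst case
    """
    # Target delay including the margin, rounded up to a whole picosecond.
    target_ps = -((-datapath_delay_ps * (100 + margin_percent)) // 100)
    # Ceiling division: smallest cell count whose total delay >= target.
    return -((-target_ps) // cell_delay_ps)


cells = delay_cells_needed(datapath_delay_ps=3200, cell_delay_ps=250)
print(cells)  # 15
```

Integer picoseconds are used so that the rounding is exact; a real flow would take these numbers from the timing-analysis step.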
Design flow
[Figure: design flow. HDL specification; Control/data splitting; STG (control), Synthesis (petrify), Logic implementation; Synthesizable HDL (data), Synthesis (Synopsys), Logic delays; Timing analysis (Synopsys); Delay insertion; HDL implementation]
[Figure: control unit (C.U.) with blocks SMP and RES and signals start, done, end]
begin-end for sequencing, fork-join for concurrency, if-else for input choice
[Figure: flow diagram with stages Petri Net, Reductions, Trace Expressions, Synthesis, Circuit]
Trace expression example: ( a || ( b ; c) ) || (d e)
Reduction Example
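[Figure: Petri net reduced step by step to trace expressions; recoverable expressions: d;a; ( b || f ) and g; h;e]
Concurrency in TE: b and f have a common parallel father
[Figure: trace expression trees with ; and || operator nodes over leaves a, b, c, f]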
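The "common parallel father" rule can be sketched as a tree query: two events are concurrent when their lowest common ancestor in the trace expression tree is a || node. The Python sketch below is one reading of that rule. The tuple encoding of trace expressions and the function names are assumptions made for illustration, and leaf names are assumed to be unique.

```python
def path_to(tree, leaf, path=()):
    """Return the child-index path from the root to `leaf`, or None if absent."""
    if isinstance(tree, str):
        return path if tree == leaf else None
    _op, *children = tree
    for i, child in enumerate(children):
        found = path_to(child, leaf, path + (i,))
        if found is not None:
            return found
    return None


def concurrent(tree, u, v):
    """True if the lowest common ancestor of leaves u and v is a 'par' node."""
    pu, pv = path_to(tree, u), path_to(tree, v)
    if pu is None or pv is None or pu == pv:
        return False
    node = tree
    for i, j in zip(pu, pv):
        if i != j:
            # The paths diverge here, so `node` is the lowest common ancestor.
            return node[0] == "par"
        node = node[1 + i]  # children start at index 1, after the operator
    return False


# d;a;( b || f ) encoded as (operator, child, child, ...)
TE = ("seq", "d", "a", ("par", "b", "f"))
print(concurrent(TE, "b", "f"))  # True
print(concurrent(TE, "a", "b"))  # False
```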
Synthesis
Place-based encoding (based on a David-cell approach)
Transformations to improve area and performance
Structural methods to derive a circuit
[Pastor et al.] Transactions on CAD, Nov98
Place-based encoding
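[Figure: Petri net fragment with places p1 to p4 and transitions t1, t2, together with the place signal transitions p1+ to p4+ and p1- to p4-]
Marking codes: 1100, 0010, 0001
ER(t1) = 111-
ER(t2) = --11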
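The idea of encoding a marking with one bit per place can be sketched in a few lines of Python. This is only an illustration: the place order p1 to p4, the helper names, and the firing order (set the successor places first, then reset the predecessors) are assumptions, chosen to be consistent with the codes 1100, 0010 and 0001 listed above.

```python
PLACES = ["p1", "p2", "p3", "p4"]  # assumed bit order of the code


def encode(marking):
    """One bit per place: 1 if the place is marked, 0 otherwise."""
    return "".join("1" if p in marking else "0" for p in PLACES)


def fire(marking, pre, post):
    """Codes visited while firing a transition with input places `pre`
    and output places `post` (assumed order: set outputs, then reset inputs)."""
    current = set(marking)
    codes = [encode(current)]
    for p in post:
        current.add(p)
        codes.append(encode(current))
    for p in pre:
        current.discard(p)
        codes.append(encode(current))
    return codes


print(encode({"p1", "p2"}))                                  # 1100
print(fire({"p1", "p2"}, pre=["p1", "p2"], post=["p3"]))     # ['1100', '1110', '0110', '0010']
```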
Place encoding
[Figure: STG over signals DSr, LDS, LDTACK, D, DTACK with place signals p1 to p11 and their transitions p1+ ... p11-]
Reductions
Transforms
[Figure: the STG and its place signals after reductions and transformations]
Next-state function of signal y?
[Figure: Petri net with places p1 to p7 annotated with cubes, and transitions of signals x, y, z]
Place cubes: p1 = 000, p2 = 1-0, p3 = 1-1, p4 = 0-1, p5 = -0-, p6 = -1-, p7 = 010
y = x + z
Conclusion
Initial prototype of an automated flow without state explosion for ASIC design
From HDLs (control / data splitting)
Existing tools for data-path synthesis
Direct synthesis guarantees implementation (HDL to Petri net, Petri-net-based encoding)
Synthesis of large controllers by efficient spec models (free-choice Petri nets + trace expressions)
Exploration of the design space (optimization) by property-preserving transformations
Logic synthesis by structural methods
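Q element
[Figure: Q element with ports li, lo, ri, ro]
STG: [Figure: STG of the Q element; recoverable transitions li+, ro+, ri+]
CSP: *[[li];ro+;[ri];ro-;[not ri];lo+;[not li];lo-]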
; = sequencing operator
ro+ = ro goes high; ro- = ro goes low
[li] = wait until li is high; [not li] = wait until li is low
Production rules:
li -> ro+; ri -> ro-; not ri -> lo+; not li -> lo-
[Figure: circuit for the production rules, with signals li, ri, ro and a weak feedback element]
Conflict elimination
CSP: *[[li];ro+;[ri];x+;[x];ro-;[not ri];lo+;[not li];x-;[not x];lo-]
Production rules:
not x and li -> ro+; x or not li -> ro-; x and not ri -> lo+; not x or ri -> lo-; ri -> x+; not li -> x-
[Figure: circuit with signals li, ri, ro, lo, x, not x and a flip-flop (FF)]
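To see how the production rules with the state variable x reproduce the CSP program, they can be executed as guarded commands. The Python sketch below reads the rules as: not x and li -> ro+; x or not li -> ro-; x and not ri -> lo+; not x or ri -> lo-; ri -> x+; not li -> x-. The four environment rules (a left side that raises li when lo is low, and a right side where ri follows ro) are assumptions added only to close the loop, and all signals are assumed to start low.

```python
RULES = [
    # (event name, guard on the current state, variable, value after firing)
    ("ro+", lambda s: not s["x"] and s["li"], "ro", 1),
    ("ro-", lambda s: s["x"] or not s["li"], "ro", 0),
    ("lo+", lambda s: s["x"] and not s["ri"], "lo", 1),
    ("lo-", lambda s: not s["x"] or s["ri"], "lo", 0),
    ("x+", lambda s: s["ri"], "x", 1),
    ("x-", lambda s: not s["li"], "x", 0),
    # Assumed environment: four-phase handshakes on both sides.
    ("li+", lambda s: not s["lo"], "li", 1),
    ("li-", lambda s: s["lo"], "li", 0),
    ("ri+", lambda s: s["ro"], "ri", 1),
    ("ri-", lambda s: not s["ro"], "ri", 0),
]


def simulate(steps):
    """Fire enabled rules one at a time and return the event trace."""
    state = dict.fromkeys(["li", "lo", "ri", "ro", "x"], 0)
    trace = []
    for _ in range(steps):
        # A rule is enabled if its guard holds and firing it changes the state.
        enabled = [(n, v, t) for n, g, v, t in RULES if g(state) and state[v] != t]
        # With this environment exactly one rule is enabled in every state.
        assert len(enabled) == 1, f"conflict or deadlock: {enabled}"
        name, var, target = enabled[0]
        state[var] = target
        trace.append(name)
    return trace


print(simulate(10))
# ['li+', 'ro+', 'ri+', 'x+', 'ro-', 'ri-', 'lo+', 'li-', 'x-', 'lo-']
```

The trace follows the order of the CSP program, and after ten events the state is back to all signals low, so the cycle repeats.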
Conclusions
Generating circuits from a CSP control program is similar to STG synthesis
One can be reduced to the other
The particular technique may vary. Direct CSP program transformations can be (and were) used instead of methods based on state space generation
See reference list for more details
[Figure: Buffer example showing a passive port, an active port, a sequencing element (;) implemented with a Q element, variable x, and the data path]
Summary
Tangram program is partitioned into data path and control
Data path is implemented as dual or single rail
Control is mapped to a composition of standard elements (; || etc.)
Each standard element is mapped to a circuit
Post-optimization is done
Composing islands of control elements and re-synthesis with STG can give more aggressive optimization
Philips made a few chips using Tangram, including a product: an 8051 micro-controller in the low-power pager Myna (25 weeks of battery life from one AAA battery)
Similar approach used in Balsa (Manchester Univ., public domain)
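[Figure: burst-mode state graph with states s1 to s4; arc labels: a+b+/y+, b-/x, a-/x+y-, c+/y]
[Figure: extended burst-mode (XBM) state graph with states s1 to s4; arc labels: a+b*/y+, <b+>a-/x+y-, c-/y+, b-/x-, <b+>c+/y-]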
Synthesis of XBM
Next state and output functions free of functional and logic hazards
Sequential feedbacks should not introduce new hazards
State assignment
one state of the BM spec is mapped to one layer of the Karnaugh map
compatible layers are merged
layers are compatible if merging does not introduce CSC violations or hazards
Layers are encoded using race-free encoding
[Figure: XBM specification (states s1 to s4; arcs a+b*/y+, <b+>a-/x+y-, c-/y+, <b+>c+/y-, b-/x-) shown next to the corresponding STG]
Summary
Specification: XBM is a subclass of STGs
Synthesis: techniques are extensions of synchronous state assignment and logic minimization
Timing:
environment is limited to fundamental mode (difficult for pipelined and highly concurrent systems)
internals are delay insensitive
See reference list for details