
History-based VLSI Legalization using Network Flow

Minsik Cho, Haoxing Ren, Hua Xiang, Ruchir Puri


IBM T. J. Watson Research Center, Yorktown Heights, NY 10598
{minsikcho, haoxing,huaxiang,ruchir}@us.ibm.com
ABSTRACT
In VLSI placement, legalization is an essential step where the overlaps between gates/macros must be removed. In this paper, we introduce a history-based legalization algorithm with min-cost network flow optimization. We find a legal solution with the minimum deviation from a given placement to fully honor/preserve the initial placement, by solving a gate-centric network flow formulation in an iterative manner. In order to realize a flow into gate movements, we develop efficient techniques which solve an approximated Subset-sum problem. Over the iterations, we factor into our formulation the history which captures a set of likely-to-fail gate movements. Such a history-based scheme enables our algorithm to intelligently legalize highly complex designs. Experimental results on over 740 real cases show that our approach is significantly superior to the existing algorithms in terms of failure rate (no failure) as well as quality of results (55% less max-deviation).
Categories and Subject Descriptors
B.7.2 [Hardware, Integrated Circuit]: Design Aids
General Terms
Algorithms, Design, Performance
Keywords
VLSI, Placement, Legalization, Network Flow
1. INTRODUCTION
With aggressive technology scaling and complex functionalities in modern designs, chip density is increasing rapidly. Such high density affects the entire physical synthesis, but placement is the first step where it has to be well taken care of, in order to reduce the complexity of the following stages like buffering, gate sizing, and routing.
The main goal of placement is to physically locate all the objects (e.g., gates, macros, and so on) without any overlap, while satisfying critical design objectives including wirelength, timing, congestion, power, and so on. In modern design flows, placement consists of two steps: global placement, where the objects are spread over the chip for wirelength or timing, and detailed placement, where local refinement is performed for further improvement. Legalization is an important step between global and detailed placement where all the overlaps among objects have to be removed with minimum perturbation/impact to the global placement.
Legalization is also required as an essential step right after timing optimization, including buffer insertion or gate sizing, in order to repair the potential overlaps after such optimization. High-quality legalization is critical to make such timing optimization more effective. For example, poor legalization may displace a buffer from the optimal location with long wirelength, which in turn requires another buffer to break the unexpectedly long interconnect.
Due to such significance, there has been a considerable amount of effort on VLSI legalization, including density-balancing techniques using network flow, diffusion, or Delaunay triangulation [2, 4, 12, 14] as well as heuristic/greedy/local cell movement [1, 5, 9, 10, 15, 16]. However, there are a few limitations in these prior works: (a) complex concurrent gate movements cannot be performed [1, 5, 9, 15, 16]; (b) flow solutions can be too complex to be realized into gate movements [2, 4]; (c) average/maximum gate movement is not directly optimized [1, 5, 9, 10, 15, 16]; (d) due to lack of the global picture, solution quality can be suboptimal [1, 5, 9, 10, 15, 16]; (e) final results still have overlaps [12, 14] (see footnote 1).
In this paper, we propose a history-based legalization algorithm using a novel min-cost network flow formulation, which finds a legal placement with the minimal deviation from an input placement in an iterative way. Once a flow solution is found, each flow is efficiently realized into corresponding gate movements by solving an approximated Subset-sum problem. During iterations, if legalization fails due to unrealizable flows, our history engine learns such attempts and helps the future iterations smartly avoid the potentially unrealizable flows. The major contributions of this paper include the following.
• We propose a novel gate-centric min-cost flow formulation which provides three advantages over the previous works: (a) it optimizes the deviation for each gate better, (b) it reduces the complexity of flow realization, (c) our history scheme can be integrated smoothly.
• We incorporate a history-based technique into our new network flow formulation to legalize highly complex and dense cases over multiple iterations.
• We propose efficient techniques to realize a flow into gate movements based on a Subset-sum problem. When the total number of gates involved in the flow realization is limited, our technique reduces the problem size effectively.

Footnote 1: [12, 14] are not legalizers strictly speaking, as both first roughly migrate gates to meet the maximum bin density constraint, and then rely on a legalizer for final overlap removal. Hence, ours can be complementary to them.

Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. To copy
otherwise, to republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee.
DAC'10, June 13-18, 2010, Anaheim, California, USA
Copyright 2010 ACM 978-1-4503-0002-5 /10/06...$10.00

Figure 1: Illustration of the notations in Table 1. (a) 5 unblocked regions. (b) Gate deviations.
The rest of the paper is organized as follows. Section 2
provides preliminaries, and Section 3 describes our problem
formulation. Section 4 presents our proposed algorithm. Ex-
perimental results are in Section 5, followed by conclusion
in Section 6.
Table 1: The notations in this paper.
$I$                         the set of all the gates (indexed by $i$)
$(\bar{x}_i, \bar{y}_i)$    initial lower-left corner of a gate $i$
$(x_i, y_i)$                current lower-left corner of a gate $i$
$w_i$                       width of a gate $i$
$R$                         the set of regions (indexed by $r$)
$(X_r, Y_r)$                lower-left corner of a region $r$
$W_r$                       width of a region $r$
$G_r$                       a set of gates assigned to a region $r$
$O_r$                       total width needed for $G_r$ ($\sum_{i \in G_r} w_i$)
$D_i(r)$                    deviation of gate $i$ in case $i \in G_r$
2. PRELIMINARIES
The notations in this paper are listed in Table 1 and illustrated in Fig. 1. We assume that a rectangular chip is partitioned into equally-sized circuit rows and each row is further divided into block-free regions. Fig. 1 (a) shows that the 3rd row is divided into regions p and q due to the grey blockage, leading to a total of 5 regions in the chip. Every movable gate has the same height as a circuit row, but may have a different width as in Fig. 1 (b). During legalization, we may move a gate from its initial location to a new overlap-free location as in Fig. 1 (b), and the Manhattan distance between these two positions is defined as a deviation. We regard the region $r$ as overflowed by $O_r - W_r$ when $O_r > W_r$.
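The two quantities defined above, the deviation of a gate and the overflow of a region, can be illustrated with a minimal Python sketch. The `Gate` record and the function names are hypothetical and chosen only for this illustration; they are not part of the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gate:
    # Hypothetical record for illustration only.
    x0: float  # initial lower-left corner
    y0: float
    x: float   # current lower-left corner (x_i, y_i)
    y: float
    w: float   # width w_i


def deviation(g: Gate) -> float:
    """Manhattan distance between the initial and the current location."""
    return abs(g.x - g.x0) + abs(g.y - g.y0)


def overflow(region_width: float, gates: List[Gate]) -> float:
    """O_r - W_r when the region is overflowed (O_r > W_r), otherwise 0."""
    needed = sum(g.w for g in gates)  # O_r, total width needed
    return max(0.0, needed - region_width)
```

For instance, a gate moved from (0, 0) to (2, 3) has deviation 5, and a region of width 4 holding two gates of width 3 is overflowed by 2.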
3. PROBLEM FORMULATION
Assuming the input placement is already fully optimized
for design closure (e.g., timing, wirelength, congestion, and
so on) except being illegal, a general legalization problem
can be formulated as follows:
$$\min:\ \alpha\,\frac{\sum_{i \in G_r} D_i(r)}{|I|} + (1-\alpha)\,M \qquad (0 < \alpha \le 1) \tag{1}$$
$$\text{s.t.}:\ x_i = X_r \qquad \forall i \in G_r,\ \forall r \tag{2}$$
$$y_i \text{ such that no overlap} \qquad \forall i \in G_r,\ \forall r$$
$$M \ge D_i(r) = |x_i - \bar{x}_i| + |y_i - \bar{y}_i| \qquad \forall i \in G_r,\ \forall r \tag{3}$$
$$O_r = \sum_{i \in G_r} w_i \le W_r \qquad \forall r \tag{4}$$
The objective is to minimize the weighted sum of the average and maximum deviation ($M$). Regarding the constraints, in a standard-cell design, a gate should be completely aligned with a circuit row as in Eq. (2) under the non-overlapping constraints. This problem can be viewed as an assignment problem with capacity constraints as in Eq. (4). If $w_i$ is identical for every gate and the assignment problem is formulated as network flow optimization, it can be solved in polynomial time. However, such a design with a single-sized gate is hardly found in real design, thus the problem complexity is NP-hard in practice. Nevertheless, we can rely on the network flow optimization as a legalization framework based on the following observations [2]:
• If each gate is sliced into units, the network flow optimization can still be leveraged to perform global legalization planning.
• After solving the network flow formulation, if a gate needs to move partially (e.g., 50% of a gate), it requires a special step called flow realization.

Figure 2: Overall flow of the proposed legalizer. (Min-cost network flow optimization, then flow realization and region placement; if all flows are realized, post-optimization, otherwise history learning for unrealized flows and another iteration.)
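As a small illustration of the objective in Eq. (1), the sketch below evaluates the weighted sum of average and maximum deviation for a list of per-gate deviations. The function name is hypothetical, and the deviations are assumed to be already computed as in Eq. (3).

```python
from typing import Sequence


def legalization_objective(deviations: Sequence[float], alpha: float) -> float:
    """alpha * (average deviation) + (1 - alpha) * (maximum deviation M)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must satisfy 0 < alpha <= 1")
    if not deviations:
        return 0.0
    average = sum(deviations) / len(deviations)
    maximum = max(deviations)  # M, the tightest value allowed by Eq. (3)
    return alpha * average + (1 - alpha) * maximum
```

With deviations [1, 2, 3, 6] and alpha = 0.5, the average is 3 and the maximum is 6, so the objective evaluates to 4.5.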
4. ALGORITHM
In this section, we propose our iterative legalization algorithm illustrated in Fig. 2. First, we solve a network flow problem with every gate sliced into units, then realize a flow if a gate needs to move partially. If such flow realization cannot be accomplished, legalization fails. Hence, we proceed to a history learning step which will make the flow formulation for the next iteration more flow-realization friendly. Otherwise, we complete legalization with post-optimization.
Our algorithm is superior to the previous approaches as illustrated in Fig. 3, where a region p is overflowed by 2 with the gate A. Fig. 3 (a) shows the case which is legalizable only if the gates A and E are moved simultaneously. Hence, a greedy/local algorithm may fail but a network flow-based approach (with a decent flow realization algorithm!) can solve it easily. Meanwhile, a greedy legalizer can solve (b) trivially as A fits perfectly into the empty space in the region r, but a network flow-based one may have difficulty.
Figure 3: Motivating examples for our legalization where $w_A = 3$, $w_E = 1$ and $O_p - W_p = 2$. (a) A network flow-based legalization may succeed, but a greedy legalization will fail. (b) A greedy legalization may succeed, but a network flow-based legalization will fail.

Figure 4: Network flow graph to minimize the deviation of each gate individually. (Region r, with $O_r - W_r$, connects to its gates i and j through edges $(w_i, 0)$ and $(w_j, 0)$; each gate connects to regions p and q, with $O_p - W_p$ and $O_q - W_q$, through edges $(w_i, cost(i,r,p))$, $(w_i, cost(i,r,q))$, $(w_j, cost(j,r,p))$ and $(w_j, cost(j,r,q))$.)
The optimal movement in terms of network flow (not legalization) is to move the overflowed part of A from p to q. However, as such movement is not achievable (unrealizable), if the same movement is insisted on over the iterations, the legalization process will be stuck and eventually fail.
For both cases, our algorithm can find legal solutions. Because our framework is based on network flow, the case (a) is easily legalized. Also, thanks to our novel history learning, our framework will learn that the optimal movement (in terms of network flow) for (b) is not realizable, and thus start discouraging the same (unrealizable) flow in future iterations, eventually moving A to r.
In the rest of this section, we propose a novel gate-centric formulation in Section 4.1 where the deviation of each gate is individually minimized. In order to realize a flow solution into gate movements, we leverage an efficiently approximated Subset-sum problem in Section 4.2, followed by region placement in Section 4.3. Also, we have a novel step, history learning, to handle complex cases based on previous legalization failures in Section 4.4. Finally, we have a post-optimization step to improve the solution as in Section 4.5.
4.1 Network Flow Formulation
In our formulation, we model the deviation of each gate individually as shown in Fig. 4. First, each region becomes either a source or a sink node: an overflowed region becomes a source node, and any other region a sink node. For each gate in $G_r$, a corresponding node is created and connected to regions as in Fig. 4. In our formulation, each gate can jump to any other region at the cost computed by Eq. (6) (ignore $h[w_i][r][p]$ till Section 4.4), where $D_i(r)$ is the current deviation at the current region $r$ and $D_i(p)$ is the expected deviation when the gate $i$ moves to another region $p$.
$$D_i(p) \approx |X_p - \bar{x}_i| + \left|Y_p + \frac{W_p}{2} - \bar{y}_i\right| \tag{5}$$
$$cost(i, r, p) = h[w_i][r][p] \cdot \frac{D_i(p)^{1/\alpha} - D_i(r)^{1/\alpha}}{w_i} \tag{6}$$
The main idea in Eq. (6) is to compute the change in terms of deviation by moving a gate $i \in G_r$ to another region $p$. The current deviation can be computed accurately by Eq. (3). However, the deviation after moving to $p$ can only be approximated, as $y_i$ cannot be computed yet: we do not know what other gates will move to $p$ later (network flow optimization moves multiple gates simultaneously). Therefore, we estimate a new deviation based on the center of $p$ as in Eq. (5). Note that $y_i$ will be finalized in Region Placement in Section 4.3. As it is hard to minimize the maximum deviation ($M$ in Eq. (3)) in a flow formulation, we have $\alpha$ (from Eq. (1)) in Eq. (6) to penalize bigger deviation more (e.g., when $\alpha = 0.5$, quadratically penalized).
The example in Fig. 5 (a) illustrates how our cost is computed where an inverter was initially at the 4th circuit row (we ignore y deviation for easier explanation). The arrows represent potential movements of the inverter during legalization, and the corresponding costs in our network flow graph are shown in the circles. In the 1st iteration, since the current deviation is zero, the quadratic distance becomes the cost. Assuming the inverter migrates to the 2nd circuit row after the 1st iteration, the costs in the following iteration can be computed by Eq. (6) as in Fig. 5 (b). In detail, as moving this inverter to the 3rd, 4th, and 5th circuit rows will reduce the deviation, some costs become negative. Also, it shows that moving the inverter to the 4th circuit row (which was the initial location of the inverter) will reduce the cost the most, but moving to the 6th circuit row makes no change. Fig. 5 (c) summarizes all possible costs based on Eq. (6).

Figure 5: Example of cost computation ($\alpha = 0.5$). (a) In the 1st iteration. (b) In the 2nd iteration. (c) Edge costs from Eq. (6), with the current row r as the table row and the target row p as the table column:
r \ p    1    2    3    4    5    6    7
1        -   -5   -8   -9   -8   -5    0
2        5    -   -3   -4   -3    0    5
3        8    3    -   -1    0    3    8
4        9    4    1    -    1    4    9
5        8    3    0   -1    -    3    8
6        5    0   -3   -4   -3    -    5
7        0   -5   -8   -9   -8   -5    -
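The edge costs of Fig. 5 (c) can be reproduced with a short Python sketch of Eq. (6). The sketch assumes, as in the example, a unit distance between adjacent circuit rows, a gate of width 1, no y deviation, and a history factor of 1; the function names are hypothetical.

```python
def edge_cost(d_new, d_cur, width, alpha=0.5, history=1):
    """Eq. (6): history * (D(p)^(1/alpha) - D(r)^(1/alpha)) / width."""
    exponent = 1.0 / alpha  # alpha = 0.5 gives a quadratic penalty
    return history * (d_new ** exponent - d_cur ** exponent) / width


def cost_table(initial_row, rows, width=1, alpha=0.5):
    """Cost of moving from every current row r to every other row p."""
    table = {}
    for r in rows:
        for p in rows:
            if p == r:
                continue
            d_cur = abs(r - initial_row)  # current deviation D(r)
            d_new = abs(p - initial_row)  # expected deviation D(p)
            table[(r, p)] = edge_cost(d_new, d_cur, width, alpha)
    return table
```

Calling `cost_table(4, range(1, 8))` yields, for example, a cost of -4 for moving from row 2 back to the initial row 4 and a cost of 0 for moving from row 2 to row 6, in line with the table.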
When our formulation is solved by a min-cost network flow algorithm, a flow will occur between gates and regions such that the total cost based on Eq. (6) is minimized. Our formulation has the following advantages over the previous ones [2, 4], which have unbounded flows between only adjacent zones at non-negative costs:
• Over the iterations, as a gate $i$ deviates away from $(\bar{x}_i, \bar{y}_i)$, the cost on the edge toward $(\bar{x}_i, \bar{y}_i)$ becomes smaller (more negative), effectively controlling the deviation of $i$ over the multiple iterations.
• The maximum flow is bounded by the maximum gate size in the library, which will make flow realization more efficient, as discussed in Section 4.2.
• It is easier to incorporate the history scheme in Section 4.4 into our formulation due to gate-level control.
4.2 Flow Realization
Since a network flow-based legalizer cannot model the discrete sizes of gates in the library, a flow may need a realization step [4]. For example, although there is a flow $f = 2$ from p to q in Fig. 3 (a), the gate A cannot move to q as $w_A = 3$. Thus, we need to find an alternative way of realizing $f = 2$, which is moving E to p at the same time so that $f = w_A - w_E = 2$. Finding such an alternative flow realization can be formulated as a Binary Knapsack problem or equivalently a Subset-sum problem [6].
Algorithm 1 Flow
Require: a set of regions R
1: for each region r ∈ R do
2:   Add a node N_r with O_r − W_r
3: end for
4: for each region r ∈ R do
5:   for each gate i ∈ G_r do
6:     Add a node N_i
7:     Insert an edge N_r → N_i with (w_i, 0)
8:     for each region p ∈ R and p ≠ r do
9:       Insert an edge N_i → N_p with (w_i, cost(i, r, p))
10:    end for
11:  end for
12: end for
13: Solution F = Solve min-cost network flow
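The graph construction in lines 1 to 12 of Algorithm 1 can be sketched in Python as follows. This is only an illustration of the node and edge bookkeeping: the input layout (a dictionary from region name to its width and gate list), the tuple-based node names, and the `cost` callback are assumptions made here, and the min-cost flow solve of line 13 is deliberately left to an external solver.

```python
def build_flow_graph(regions, cost):
    """Build the gate-centric flow graph of Algorithm 1 (lines 1 to 12).

    regions: dict mapping region name -> (W_r, [(gate_id, w_i), ...])
    cost:    callable cost(gate_id, r, p) giving the unit cost of Eq. (6)
    Returns (supply, edges) where supply[node] = O_r - W_r for region nodes
    and edges is a list of (tail, head, capacity, unit_cost).
    """
    supply = {}
    edges = []
    for r, (width_r, gates) in regions.items():
        needed = sum(w for _, w in gates)           # O_r
        supply[("region", r)] = needed - width_r    # > 0 means overflowed
    for r, (_, gates) in regions.items():
        for gate, w in gates:
            # Region -> gate edge with capacity w_i and zero cost.
            edges.append((("region", r), ("gate", gate), w, 0))
            for p in regions:
                if p != r:
                    # Gate -> other region edge with capacity w_i.
                    edges.append((("gate", gate), ("region", p), w,
                                  cost(gate, r, p)))
    return supply, edges
```

For a toy instance with region p of width 1 holding one gate of width 3, and empty regions q and r of widths 2 and 3, the region nodes receive the values 2, -2 and -3, and three edges are created.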
The runtime of flow realization depends on the flow size. Therefore, a preprocessing step is suggested in [4] (where a flow is unbounded) to reduce the flow size by moving gates greedily. However, in our case, every flow is limited by the corresponding gate width, and thus by the biggest width in a library. With such a bounded flow, the flow realization problem can be solved highly efficiently [13].
Further speedup is possible if we limit the number of gates involved in flow realization (which is defined as $\beta$ in Algorithm 3). Fig. 6 illustrates our flow realization where $f = 5$ is required between regions r and p with $\beta = 3$. $|G_r| = 7$, where the widths are {6, 3, 3, 3, 3, 1, 1}. We first bucket the gates in $G_r$ and $G_p$ respectively by the width. Since we will accomplish $f = 5$ (which is positive) with at most 3 gates, we never move more than 3 gates from any bucket. Therefore, we can exclude one gate whose width is 3 from $G_r$ (which is crossed out in Fig. 6). For the same reason, and because the flow is always positive, we can drop two gates whose width is 2 and one gate whose width is 1 from $G_p$ (e.g., if all 3 gates are picked from p, the flow will be negative). After that, we build the final set $T$ in Fig. 6 where widths from p become negative. Then, any subset whose cardinality is at most 3 ($\beta$) and whose sum is 5 ($f$) can realize the flow as in Fig. 6. Among these subsets, we pick the one which will incur the least cost based on Eq. (6). Generally, if the number of different sizes in the library is $L$, the size of the final set is $|T| \le (2\beta - 1)L$, which allows a very efficient implementation. Our flow realization is detailed in Algorithm 3.
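The Fig. 6 example can be checked with a small brute-force Python sketch. It only illustrates the bucketing and the subset search: it enumerates subsets exhaustively rather than using the bounded-weight algorithm of [13], it ignores the cost-based choice among candidate subsets, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_candidates(widths_r, widths_p, beta):
    """Keep at most beta gates per width from r and beta - 1 per width from p.

    Widths coming from p are negated, since moving them reduces the flow.
    """
    t = []
    kept_r = Counter()
    for w in widths_r:
        if kept_r[w] < beta:
            kept_r[w] += 1
            t.append(w)
    kept_p = Counter()
    for w in widths_p:
        if kept_p[w] < beta - 1:
            kept_p[w] += 1
            t.append(-w)
    return t


def realizations(t, flow, beta):
    """All distinct multisets of at most beta entries of t summing to flow."""
    found = set()
    for k in range(1, beta + 1):
        for combo in combinations(t, k):
            if sum(combo) == flow:
                found.add(tuple(sorted(combo, reverse=True)))
    return found
```

With the widths {6, 3, 3, 3, 3, 1, 1} in r, the widths {4, 2, 2, 2, 2, 1, 1, 1} in p and beta = 3, `build_candidates` returns the 11-element set T of Fig. 6, and `realizations(T, 5, 3)` returns exactly the five subsets listed there.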
Figure 6: Example of flow realization using Subset-sum with $\beta = 3$. Gates in r are bucketed by width as {6}, {3,3,3,3}, {1,1} and gates in p as {4}, {2,2,2,2}, {1,1,1}. After dropping the excess gates, T = {6,3,3,3,1,1,-4,-2,-2,-1,-1}. Subsets realizing the flow of 5: {6,-1}, {3,1,1}, {6,1,-2}, {6,3,-4}, {3,3,-1}.

Algorithm 2 Realize, Region Placement, History Learning
Require: a flow solution F, a set of regions R
1: for each flow f ∈ F for a gate i from r to p do
2:   if f = w_i then
3:     Move i to p
4:   else if Realize(f, r, p) fails then
5:     h[w_i][r][p]++ // history learning
6:   end if
7: end for
8: for each region r ∈ R do
9:   Region Placement for r
10:  Update (x_i, y_i), ∀i ∈ G_r
11: end for

4.3 Region Placement
When a gate $i$ is assigned to a region $r$, $x_i$ is fixed as in Eq. (2), but $y_i$ needs to be computed. As finding $y_i$, $\forall i \in G_r$, for minimum deviation is NP-hard, we consider a more restrictive problem by prefixing the order of $i \in G_r$. We order the gates according to the center location $(x_i, y_i + \frac{w_i}{2})$ rather than $(x_i, y_i)$, as it provides less deviation in our experience. In case a region placement is not feasible for an overflowed region $r$ ($O_r > W_r$), we temporarily scale down $w_i$, $\forall i \in G_r$, by $\frac{W_r}{O_r}$ to just make region placement feasible. Since our region placement is a well-studied problem known as single-row or 1D linear placement, we omit the details and refer the reader to [3, 4, 7, 11].
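To make the ordering and the temporary width scaling of Section 4.3 concrete, here is a minimal Python sketch. It is not the optimal fixed-order single-row placement of [3, 4, 7, 11]: as a stand-in, it uses a simple two-pass greedy packing (an assumption made for this illustration) that keeps the center-based order and removes overlaps inside the region. The function name and input layout are hypothetical.

```python
def place_region(y_start, width, gates):
    """Greedy fixed-order placement of gates inside one region.

    gates: list of (gate_id, desired_y, w). Returns dict gate_id -> y.
    Widths are temporarily scaled by W_r / O_r if the region is overflowed.
    """
    total = sum(w for _, _, w in gates)                 # O_r
    scale = min(1.0, width / total) if total > 0 else 1.0
    # Order by the center location y + w / 2, as suggested in the text.
    order = sorted(gates, key=lambda g: g[1] + g[2] / 2.0)
    placed = []
    cursor = y_start
    for gid, y, w in order:
        y_new = max(y, cursor)           # forward pass: push past predecessor
        placed.append([gid, y_new, w * scale])
        cursor = y_new + w * scale
    limit = y_start + width
    for item in reversed(placed):        # backward pass: pull inside region
        if item[1] + item[2] > limit:
            item[1] = limit - item[2]
        limit = item[1]
    return {gid: y for gid, y, _ in placed}
```

For a region spanning 0 to 10 and two gates of width 3 that want to sit at 8 and 9, the sketch keeps their order and places them at 4 and 7, so that both fit without overlap.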
4.4 History Learning
If there is any overflowed region after Section 4.3, it is due to flow realization failure. Therefore, we perform history learning to avoid similar flow realization attempts in the next iteration. Since the formulation in Fig. 4 does not consider the potential flow realization issue, it may insist on some unrealizable flows over the iterations, as it seeks the optimal solution in terms of network flow (not legalization). We observe that this issue makes legalization converge poorly by shuttling between a few optimal but unrealizable flow solutions. In order to enhance convergence, we need a new, more flow-realization friendly formulation in the following iterations. We accomplish this by the history factor $h[w_i][r][p]$ in Eq. (6), which is indexed by three terms: the width $w_i$ of the failing gate $i$, its current region $r$ ($i \in G_r$), and the region $p$ it attempts to move to. We initialize the history factor as 1 in the beginning. If the same flow realization keeps failing, the corresponding history factor increases accordingly as in line 5 of Algorithm 2, and eventually makes such flows expensive enough to be excluded from the solution.

Figure 7: History learning for Fig. 3 (b) ($w_A = 3$). Region p (2) feeds gate A through an edge (3, 0); A connects to region q (-2) through an edge (3, h[3][p][q] × 1/3) and to region r (-3) through an edge (3, h[3][p][r] × 4/3).

Algorithm 3 Realize
Require: a flow f from regions r to p
1: T = ∅
2: for each gate i ∈ G_r do
3:   if the number of w_i in T < β then
4:     Add w_i to T
5:   end if
6: end for
7: for each gate i ∈ G_p do
8:   if the number of −w_i in T < β − 1 then
9:     Add −w_i to T
10:  end if
11: end for
12: S = Find the cheapest Subset-sum solution from T
13: if S ≠ ∅ then
14:   for each number w ∈ S do
15:     if w > 0 then
16:       Move the cheapest i ∈ G_r whose w_i = w to p
17:     else
18:       Move the cheapest i ∈ G_p whose w_i = −w to r
19:     end if
20:   end for
21:   return Success
22: end if
23: return Failure

Algorithm 4 PostOptimization
1: for a gate with the maximum deviation i ∈ I do
2:   E = a set of regions within the box between (x_i, y_i) and (x̄_i, ȳ_i)
3:   r = a region s.t. i ∈ G_r
4:   for each region p ∈ E, p ≠ r do
5:     if Realize(0, r, p) succeeds then
6:       break
7:     end if
8:   end for
9:   if No improvement then
10:    break
11:  end if
12: end for

Fig. 7 shows how our history learning helps Fig. 3 (b), which is hard for a network flow-based approach. In the first iteration where $h[3][p][q] = h[3][p][r] = 1$, the optimal network flow solution is $f = 2$ from p → A → q, which is again unrealizable. Then, we do $h[3][p][q]$++ and solve repeatedly for the next three iterations where the graph temporarily stays static (no movement). In the fifth iteration where $h[3][p][q] = 5$, the optimal solution is $f = 2$ from p → A → r, which can be realized as discussed in Fig. 3 (b).
We find our history learning is a very effective way of guiding a network flow-based legalizer to the more realizable solution space. Regarding history update, there can be various policies, but we observe even a simple policy (e.g., increasing by 1 for each failure) is sufficient.
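The Fig. 7 walkthrough can be mimicked with a deliberately simplified Python sketch. Several assumptions are made here and are not from the paper: the min-cost flow solve is replaced by picking the single cheapest outgoing edge of gate A, flow realization is replaced by a check that the whole gate fits into the free space of the target region, and ties are broken in favor of the earlier-listed region. The base costs 1/3 and 4/3 follow Eq. (6) with alpha = 0.5, width 3, and distances of 1 and 2 rows.

```python
from fractions import Fraction


def simulate_history(width, base_cost, free_space, max_iters=10):
    """Repeat 'pick cheapest target, try to realize, learn on failure'.

    base_cost:  dict target region -> cost of Eq. (6) with history factor 1
    free_space: dict target region -> free width in that region
    Returns (iteration, chosen region, history factors) on success, else None.
    """
    history = {p: 1 for p in base_cost}   # history factors start at 1
    for iteration in range(1, max_iters + 1):
        # Stand-in for the flow solve: the cheapest edge wins (first on ties).
        target = min(base_cost, key=lambda p: history[p] * base_cost[p])
        # Stand-in for Realize: the whole gate must fit into the target.
        if free_space[target] >= width:
            return iteration, target, history
        history[target] += 1              # history learning on failure
    return None
```

Running `simulate_history(3, {"q": Fraction(1, 3), "r": Fraction(4, 3)}, {"q": 2, "r": 3})` keeps choosing q for four iterations and switches to r in the fifth iteration, when the history factor for q has reached 5.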
4.5 Post-Optimization
Since we minimize the total deviation as well (the first term in Eq. (1)) while finding a legal solution, it is possible that some gates have a large deviation. In order to minimize the maximum deviation, we greedily move the gates toward their initial locations. In such post-optimization, we utilize the flow realization again with a zero flow as shown in Algorithm 4. Since the gate with the maximum deviation has the most negative cost, once such a zero flow is realized, the gate will migrate toward its initial location and reduce the maximum deviation, while keeping the solution legal (no change in $O_r$, $\forall r \in R$).
4.6 Speedup Techniques
The complexity of min-cost network flow optimization is $O(|V|^2 \sqrt{|E|})$ where $|V| = |I| + |R|$ and $|E| = |I||R|$. For speedup, we apply the following techniques.
• For a small set of logically tightly coupled gates whose total width is less than that of the largest gate in the library, we can bundle them (e.g., as a single node in the flow graph). This will effectively reduce the number of gates ($|I|$).
• In most cases, a gate migrates to the nearby regions. Thus, we can insert edges from a gate only to the regions within a user-defined proximity parameter. This will reduce the number of edges in the flow graph.
• By taking a hierarchical approach, we can effectively reduce $|R|$. After partitioning, we can apply a pre-legalizer such as [12, 14] among partitions, and then ours can legalize each partition.
5. EXPERIMENTAL RESULTS
We implemented our legalizer in C++ based on the efficient min-cost network flow algorithm/enhancements [6, 8]. We collected 741 industrial legalization problem instances (35K gates) in 45nm with mixed-sized blockages and fixed gates for reliable statistics across the entire utilization spectrum. All these benchmarks are from one industrial global placer. All the experiments were performed on a 2.4GHz Linux machine with 4G RAM. For comprehensive results, we compared our legalizer with other legalizers in NTUPlace3-LE [5] (NTUP) which extends the patent in [9], FastPlace3 [16] (FastP), and Dragon2006 [15] (DragP), whose binaries are publicly available online. Also, we implemented the network flow-based legalizer in [2, 4] (BonnL) using our framework for a complete study. In our experiments, we enable only the legalization parts (to disable wirelength optimization) in NTUP, FastP, and DragP by providing proper inputs. For our legalizer, we set $\alpha = 0.5$, $\beta = 3$.
Table 2 summarizes the overall result from ours, NTUP, FastP, DragP, and BonnL by reporting the number of failed benchmarks. The first two columns show the distribution of benchmarks over the complete utilization spectrum. It clearly shows that only our legalizer completes all the benchmarks thanks to the history scheme. NTUP also completes all the cases except 4 complex and dense cases in 0.9x utilization. BonnL performs well under 0.7x utilization, but fails many dense cases. We observe that BonnL nicely spreads out gates with minimal movement and finds a near-legal solution, but has difficulty in realizing a couple of flows at the last moment for highly utilized benchmarks. FastP performs relatively well under 0.8x utilization, but beyond that, yields even worse results than DragP, which shows a rather uniform failure rate regardless of utilization.

Table 2: Failure rate comparison.
circuit      | Ours      | NTUP [5]  | FastP [16] | DragP [15] | BonnL [2]
util   num   | fail   %  | fail   %  | fail   %   | fail   %   | fail   %
0.1x    19   |   0    0  |   0    0  |   3   16   |   4   21   |   0    0
0.2x    23   |   0    0  |   0    0  |   0    0   |   3   13   |   0    0
0.3x    88   |   0    0  |   0    0  |   2    2   |   7    8   |   0    0
0.4x   147   |   0    0  |   0    0  |  10    7   |  30   20   |   2    1
0.5x   107   |   0    0  |   0    0  |   1    1   |  22   21   |   1    1
0.6x    89   |   0    0  |   0    0  |   1    1   |  14   16   |   4    4
0.7x    95   |   0    0  |   0    0  |   4    4   |  32   34   |  15   16
0.8x   121   |   0    0  |   0    0  |  23   19   |  17   14   |  41   34
0.9x    34   |   0    0  |   4   12  |  18   53   |   9   26   |  14   41

Table 3: Quality of legalization comparison.
ckt   | Ours                          | Ours w/o PostOpt              | NTUP [5]                      | FastP [16]
util  | avg(a) max(b) mov(%)(c) cpu(s) | avg    max    mov(%)  cpu(s)  | avg    max    mov(%)  cpu(s)  | avg    max    mov(%)  cpu(s)
0.1x  |  9.6   47.6   62.8    1.29    | 10.4   55.2   62.9    0.11    | 11.1   55.1   98.8    0.66    | 21.4   89.3   74.1    0.13
0.2x  | 11.6   49.0   52.8    2.89    | 11.7   51.8   52.7    1.14    | 14.6   48.5   97.0    0.82    | 21.8   87.1   68.6    0.11
0.3x  |  3.3   35.1   21.5    6.97    |  3.4   40.1   21.5    1.49    |  5.7   35.4   88.4    2.88    | 18.5  162.0   60.2    0.51
0.4x  |  2.9   33.2   22.1    8.31    |  3.1   39.9   22.3    1.60    |  5.8   51.0   89.4    3.05    | 20.0  181.1   65.8    0.54
0.5x  |  3.2   28.9   19.6    3.63    |  3.5   32.2   19.8    1.38    |  6.2   37.0   81.9    2.11    | 26.9  185.1   67.4    0.45
0.6x  |  3.8   35.2   16.4    3.63    |  3.8   38.2   16.6    1.63    |  6.6   62.8   74.4    1.73    | 19.2  144.2   56.3    0.36
0.7x  |  3.5   38.7   16.9   17.71    |  3.7   43.3   17.5    2.43    |  4.6   90.6   81.1    2.69    | 16.4  173.2   61.6    0.61
0.8x  |  1.4   33.2   11.5    6.96    |  1.6   41.1   12.3    3.74    |  3.7  150.5   70.9    3.18    | 19.4  239.0   61.5    0.91
0.9x  | 11.6   59.9   26.3   14.05    | 11.1   68.4   23.8    6.69    | 18.0  274.1   73.5    3.88    | 38.8  372.3   81.8    1.28
avg   |  5.67  40.10  27.8    7.27    |  5.82  45.58  27.7    2.24    |  8.48  89.43  84.0    2.33    | 22.49 181.48  66.5    0.54
ratio |  1      1      1      3.12    |  1.03   1.14   1.0    0.96    |  1.50   2.23   3.0    1       |  3.97   4.53   2.4    0.23
(a) average Manhattan distance in tracks from the initial to the final legal location for all movable gates
(b) maximum Manhattan distance in tracks from the initial to the final legal location for all movable gates
(c) the percentage (%) of moved gates during legalization out of the total movable gates
Table 3 analyzes the quality of legalization from ours, NTUP, and FastP. We further show the results from ours without the post-optimization (see Section 4.5). We drop DragP and BonnL due to their large number of failures in dense cases. Also, note that we cannot reflect the failed cases from NTUP and FastP in Table 3. Across the entire utilization spectrum, ours (even without post-optimization) outperforms NTUP by a wide margin. Average deviations from the initial location are 33% and 31% less than NTUP with and without post-optimization, respectively. The max deviations are cut by about half: 55% and 49% less than NTUP. Also, ours touches substantially fewer gates for legalization, about 67% fewer than NTUP, even while completing all the cases. Overall results demonstrate that ours finishes legalization with significantly less disturbance to the initial placement. FastP has the largest average/maximum deviation but moves fewer gates than NTUP.
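The percentages quoted above follow directly from the "avg" row of Table 3. The short Python sketch below redoes that arithmetic; the helper name is hypothetical, and the inputs are the averages reported in the table.

```python
def percent_less(ours, baseline):
    """Relative reduction of ours against a baseline, in whole percent."""
    return round(100 * (1 - ours / baseline))


# Averages taken from the "avg" row of Table 3 (NTUP is the baseline).
avg_dev = (percent_less(5.67, 8.48), percent_less(5.82, 8.48))
max_dev = (percent_less(40.10, 89.43), percent_less(45.58, 89.43))
moved = percent_less(27.8, 84.0)
```

Here `avg_dev` evaluates to (33, 31), `max_dev` to (55, 49), and `moved` to 67, matching the figures in the text.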
Regarding runtime, FastP is the fastest, but again Table 3 does not reflect dozens of failed cases by FastP. Ours without post-optimization is even 4% faster than NTUP, but when the post-optimization is enabled, the quality improves at a cost of a 3x slow-down. Ours without post-optimization is on average 22% faster than NTUP up to 0.8x utilization, but gets slower at 0.9x utilization due to history learning, an essential part of completing these dense cases (NTUP shows a 12% failure rate for 0.9x utilization as in Table 2). Overall, our approach (with post-optimization) is slower than the competing algorithms, but the lower failure rate and the better quality of legalization will eventually make a design converge faster (e.g., design closure) by reducing the number of iterations.
6. CONCLUSION
In order to cope with high demand for efficient VLSI legalization, we propose a history-based legalization using min-cost network flow optimization. Our novel flow formulation minimizes the deviation of each gate effectively, and the proposed history-learning scheme guides network flow optimization to the more realizable solution space. We first show that the history scheme is helpful in finding a legal solution from a highly dense input placement, and then demonstrate that our legalization technique delivers higher quality legalization than other existing approaches by fully honoring design intentions, all of which will expedite critical design closure.
7. REFERENCES
[1] A. Agnihotri, M. C. Yildiz, A. Khatkhate, A. Mathur, S. Ono, and P. H. Madden. Fractional cut: Improved recursive bisection placement. In Proc. Int. Conf. on Computer Aided Design, 2003.
[2] U. Brenner, A. Pauli, and J. Vygen. Almost optimum placement legalization by minimum cost flow and dynamic programming. In Proc. Int. Symp. on Physical Design, 2004.
[3] U. Brenner and J. Vygen. Faster optimal single-row placement with fixed ordering. In Proc. Design, Automation and Test in Europe, 2000.
[4] U. Brenner and J. Vygen. Legalizing a placement with minimum total movement. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 23(12):1597-1613, Dec 2004.
[5] T.-C. Chen, Z.-W. Jiang, T.-C. Hsu, H.-C. Chen, and Y.-W. Chang. NTUplace3: An Analytical Placer for Large-Scale Mixed-Size Designs With Preplaced Blocks and Density Constraints. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 27(7):1228-1240, Jul 2008.
[6] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms, Second Edition. MIT Press and McGraw-Hill, 2001.
[7] M. R. Garey, R. E. Tarjan, and G. T. Wilfong. One-processor scheduling with symmetric earliness and tardiness penalties. Math. Oper. Res., 13(2), 1988.
[8] A. V. Goldberg. An Efficient Implementation of a Scaling Minimum-Cost Flow Algorithm. Journal of Algorithms, 22:1-29, 1997.
[9] D. Hill. Method and system for high speed detailed placement of cells within an integrated circuit design. In U.S. Patent 6370673, Apr 2002.
[10] S. W. Hur and J. Lillis. Mongrel: hybrid techniques for standard cell placement. In Proc. Int. Conf. on Computer Aided Design, 2000.
[11] A. Kahng, P. Tucker, and A. Zelikovsky. Optimization of linear placements for wirelength minimization with free sites. In Proc. Asia and South Pacific Design Automation Conf., 1999.
[12] T. Luo, H. Ren, C. J. Alpert, and D. Z. Pan. Computational geometry based placement migration. In Proc. Int. Conf. on Computer Aided Design, 2005.
[13] D. Pisinger. Linear time algorithms for knapsack problems with bounded weights. Journal of Algorithms, 33:1-14, 1999.
[14] H. Ren, D. Z. Pan, C. J. Alpert, and P. Villarrubia. Diffusion-based placement migration. In Proc. Design Automation Conf., 2005.
[15] T. Taghavi, X. Yang, B.-K. Choi, M. Wang, and M. Sarrafzadeh. Dragon2006: blockage-aware congestion-controlling mixed-size placer. In Proc. Int. Symp. on Physical Design.
[16] N. Viswanathan, M. Pan, and C. Chu. Fastplace 3.0: A fast multilevel quadratic placement algorithm with placement congestion control. In Proc. Asia and South Pacific Design Automation Conf., 2007.