Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. To copy
otherwise, to republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee.
DAC'10, June 13-18, 2010, Anaheim, California, USA
Copyright 2010 ACM 978-1-4503-0002-5 /10/06...$10.00
Figure 1: Illustration of the notations in Table 1. (a) 5 unblocked regions: circuit rows divided into regions such as p, q, and r, where a region r has lower-left corner (Xr, Yr) and width Wr. (b) Gate deviations: a gate i of width wi moves from its initial lower-left corner (x̄i, ȳi) to its current corner (xi, yi), and similarly for a gate j.
the total number of gates involved in the flow realization is limited, our technique reduces the problem size effectively.
The rest of the paper is organized as follows. Section 2 provides preliminaries, and Section 3 describes our problem formulation. Section 4 presents our proposed algorithm. Experimental results are in Section 5, followed by the conclusion in Section 6.
Table 1: The notations in this paper.
I : the set of all the gates (indexed by i)
(x̄i, ȳi) : initial lower-left corner of a gate i
(xi, yi) : current lower-left corner of a gate i
wi : width of a gate i
R : the set of regions (indexed by r)
(Xr, Yr) : lower-left corner of a region r
Wr : width of a region r
Gr : the set of gates assigned to a region r
Or : total width needed for Gr (Σi∈Gr wi)
Di(r) : deviation of a gate i in case i ∈ Gr
2. PRELIMINARIES
The notations in this paper are listed in Table 1 and illustrated in Fig. 1. We assume that a rectangular chip is partitioned into equally-sized circuit rows and each row is further divided into block-free regions. Fig. 1 (a) shows that the 3rd row is divided into regions p and q due to the grey blockage, leading to a total of 5 regions in the chip. Every movable gate has the same height as a circuit row, but may have a different width as in Fig. 1 (b). During legalization, we may move a gate from its initial location to a new overlap-free location as in Fig. 1 (b), and the Manhattan distance between these two positions is defined as a deviation. We regard the region r as overflowed by Or - Wr when Or > Wr.
3. PROBLEM FORMULATION
Assuming the input placement is already fully optimized for design closure (e.g., timing, wirelength, congestion, and so on) except being illegal, a general legalization problem can be formulated as follows:

min: α · (Σr Σi∈Gr Di(r)) / |I| + (1 - α) · M,  0 < α ≤ 1   (1)
s.t.: xi = Xr, ∀i ∈ Gr, ∀r   (2)
      yi such that no overlap, ∀i ∈ Gr, ∀r
      M ≥ Di(r) = |xi - x̄i| + |yi - ȳi|, ∀i ∈ Gr, ∀r   (3)
      Or = Σi∈Gr wi ≤ Wr, ∀r   (4)
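As a small sketch of how the objective in Eq. (1) trades off its two terms, the weighted sum can be computed as below; the deviation values are hypothetical, chosen only for illustration:

```python
def legalization_objective(deviations, alpha):
    # Eq. (1): alpha * (average deviation) + (1 - alpha) * (maximum deviation)
    avg = sum(deviations) / len(deviations)
    return alpha * avg + (1 - alpha) * max(deviations)

# Hypothetical deviations of |I| = 4 gates:
devs = [0, 2, 2, 8]
print(legalization_objective(devs, 1.0))   # average only: 3.0
print(legalization_objective(devs, 0.5))   # 0.5 * 3 + 0.5 * 8 = 5.5
```

Smaller α emphasizes the maximum deviation M, which is exactly the term the flow formulation alone cannot minimize directly.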
Figure 2: Overall flow of the proposed legalizer: min-cost network flow optimization, followed by flow realization and region placement; if all flows are realized, we finish with post-optimization, otherwise we perform history learning for the unrealized flows and iterate.

The objective is to minimize the weighted summation of the average and the maximum deviation (M). Regarding the constraints, in a standard-cell design, a gate should be completely aligned with a circuit row as in Eq. (2) under the non-overlapping
constraints. This problem can be viewed as an assignment problem with capacity constraints as in Eq. (4). If wi were identical for every gate and the assignment problem were formulated as a network flow optimization, it could be solved in polynomial time. However, such a design with a single-sized gate is hardly found in real designs, thus the problem is NP-hard in practice. Nevertheless, we can rely on network flow optimization as a legalization framework based on the following observations [2]:

- If each gate is sliced into units, the network flow optimization can still be leveraged to perform global legalization planning.

- After solving the network flow formulation, if a gate needs to move partially (e.g., 50% of a gate), it requires a special step called flow realization.
4. ALGORITHM
In this section, we propose our iterative legalization algorithm illustrated in Fig. 2. First, we solve a network flow problem with every gate sliced into units, then realize a flow if a gate needs to move partially. If such flow realization cannot be accomplished, legalization fails; hence, we proceed to a history learning step which will make the flow formulation for the next iteration more flow-realization friendly. Otherwise, we complete legalization with post-optimization.
Our algorithm is superior to the previous approaches, as illustrated in Fig. 3 where a region p is overflowed by 2 with the gate A. Fig. 3 (a) shows a case which is legalizable only if the gates A and E are moved simultaneously. Hence, a greedy/local algorithm may fail, but a network flow-based approach (with a decent flow realization algorithm) can solve it easily. Meanwhile, a greedy legalizer can solve (b) trivially as A can perfectly fit into the empty space in the region r, but a network flow-based one may have difficulty.
Figure 3: Motivating examples for our legalization where wA = 3, wE = 1 and Op - Wp = 2. (a) A network flow-based legalization may succeed, but a greedy legalization will fail. (b) A greedy legalization may succeed, but a network flow-based legalization will fail.
Figure 4: Network flow graph to minimize the deviation of each gate individually. Region r (supply Or - Wr) connects to its gates i and j through edges with (capacity, cost) = (wi, 0) and (wj, 0); each gate connects to the other regions p (Op - Wp) and q (Oq - Wq) through edges (wi, cost(i, r, p)), (wi, cost(i, r, q)), (wj, cost(j, r, p)), and (wj, cost(j, r, q)).
The optimal movement in terms of network flow (not legalization) is to move the overflowed part of A from p to q. However, as such a movement is not achievable (unrealizable), if the same movement is insisted on over the iterations, the legalization process will be stuck and eventually fail.
For both cases, our algorithm can find legal solutions. Because our framework is based on network flow, the case (a) is easily legalized. Also, thanks to our novel history learning, our framework will learn that the optimal movement (in terms of network flow) for (b) is not realizable, thus start discouraging the same (unrealizable) flow in the future iterations, eventually moving A to r.
In the rest of this section, we propose a novel gate-centric formulation in Section 4.1 where the deviation of each gate is individually minimized. In order to realize a flow solution into gate movements, we leverage an efficiently approximated Subset-sum problem in Section 4.2, followed by region placement in Section 4.3. We also have a novel step, history learning, to handle complex cases based on previous legalization failures in Section 4.4. Finally, we have a post-optimization step to improve the solution as in Section 4.5.
4.1 Network Flow Formulation
In our formulation, we model the deviation of each gate individually as shown in Fig. 4. First, each region becomes either a source or a sink node: an overflowed region becomes a source node; otherwise, it becomes a sink node. For each gate in Gr, a corresponding node is created and connected to regions as in Fig. 4. In our formulation, each gate can jump to any other region at the cost computed by Eq. (6) (ignore h[wi][r][p] until Section 4.4), where Di(r) is the current deviation at the current region r and Di(p) is the expected deviation when the gate i moves to another region p.
Di(p) ≈ |Xp - x̄i| + |Yp + Wp/2 - ȳi|   (5)

cost(i, r, p) = h[wi][r][p] · (Di(p)^(1/α) - Di(r)^(1/α)) · wi   (6)
The main idea in Eq. (6) is to compute the change in deviation caused by moving a gate i ∈ Gr to another region p. The current deviation can be computed by Eq. (3) accurately. However, the deviation after moving to p can only be approximated, as yi cannot be computed yet: we do not know what other gates will move to p later (network flow optimization moves multiple gates simultaneously). Therefore, we estimate the new deviation based on the center of p as in Eq. (5). Note that yi will be finalized in Region Placement in Section 4.3. As it is hard to minimize the maximum deviation (M in Eq. (3)) in a flow formulation, we have α (from Eq. (1)) in Eq. (6) to penalize bigger deviations more (e.g., when α = 0.5, quadratically penalized).
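To make the penalty concrete, Eq. (6) can be sketched as below for the inverter example of Fig. 5, assuming a unit gate width and a history factor of 1 (as in the early iterations); with α = 0.5 the exponent 1/α = 2 yields the quadratic penalty:

```python
def cost(dev_p, dev_r, alpha=0.5, w_i=1, history=1):
    # Eq. (6): h[wi][r][p] * (Di(p)^(1/alpha) - Di(r)^(1/alpha)) * wi
    e = 1 / alpha
    return history * (dev_p ** e - dev_r ** e) * w_i

# 1st iteration: the inverter sits at its initial 4th row, so Di(r) = 0.
first = [cost(abs(p - 4), 0) for p in (1, 2, 3, 5, 6, 7)]    # 9, 4, 1, 1, 4, 9
# 2nd iteration: the inverter has migrated to the 2nd row, so Di(r) = 2.
second = [cost(abs(p - 4), 2) for p in (1, 3, 4, 5, 6, 7)]   # 5, -3, -4, -3, 0, 5
```

The second list reproduces the negative costs toward the 3rd, 4th, and 5th rows discussed below: a gate that has drifted away is rewarded for flowing back toward its initial location.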
Figure 5: Example of cost computation (α = 0.5). (a) Edge costs in the 1st iteration. (b) Edge costs in the 2nd iteration. (c) Edge costs from Eq. (6) for every pair of circuit rows r (row) and p (column):

r\p   1    2    3    4    5    6    7
 1    -   -5   -8   -9   -8   -5    0
 2    5    -   -3   -4   -3    0    5
 3    8    3    -   -1    0    3    8
 4    9    4    1    -    1    4    9
 5    8    3    0   -1    -    3    8
 6    5    0   -3   -4   -3    -    5
 7    0   -5   -8   -9   -8   -5    -

The example in Fig. 5 (a) illustrates how our cost is computed, where an inverter was initially at the 4th circuit row
(we ignore the y deviation for easier explanation). The arrows represent potential movements of the inverter during legalization, and the corresponding costs in our network flow graph are shown in the circles. In the 1st iteration, since the current deviation is zero, the quadratic distance becomes the cost. Assuming the inverter migrates to the 2nd circuit row after the 1st iteration, the costs in the following iteration can be computed by Eq. (6) as in Fig. 5 (b). In detail, as moving this inverter to the 3rd, 4th, and 5th circuit rows will reduce the deviation, some costs become negative. Also, it shows that moving the inverter to the 4th circuit row (which was the initial location of the inverter) will reduce the cost the most, while moving to the 6th circuit row makes no change. Fig. 5 (c) summarizes all possible costs based on Eq. (6).
When our formulation is solved by a min-cost network flow algorithm, a flow will occur between gates and regions such that the total cost based on Eq. (6) is minimized. Our formulation has the following advantages over the previous ones [2, 4], which have unbounded flows between only adjacent zones at non-negative costs:

- Over the iterations, as a gate i deviates away from (x̄i, ȳi), the cost on the edge toward (x̄i, ȳi) becomes smaller (more negative), effectively controlling the deviation of i over the multiple iterations.

- The maximum flow is bounded by the maximum gate size in the library, which will make flow realization more efficient, as discussed in Section 4.2.

- It is easier to incorporate the history scheme in Section 4.4 into our formulation due to gate-level control.
4.2 Flow Realization
Since a network flow-based legalizer cannot model the discrete sizes of gates in the library, a flow may need a realization step [4]. For example, although there is a flow f = 2 from p to q in Fig. 3 (a), the gate A cannot move to q as wA = 3. Thus, we need to find an alternative way of realizing f = 2, which is moving E to p at the same time so that f = wA - wE = 2. Finding such an alternative flow realization can be formulated as a Binary Knapsack problem or, equivalently, a Subset-sum problem [6].
Algorithm 1 Flow
Require: a set of regions R
1: for each region r ∈ R do
2:   Add a node Nr with supply Or - Wr
3: end for
4: for each region r ∈ R do
5:   for each gate i ∈ Gr do
6:     Add a node Ni
7:     Insert an edge Nr → Ni with (wi, 0)
8:     for each region p ∈ R with p ≠ r do
9:       Insert an edge Ni → Np with (wi, cost(i, r, p))
10:     end for
11:   end for
12: end for
13: Solution F = Solve the min-cost network flow
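A minimal sketch of the graph construction in Algorithm 1, using plain dictionaries; the region/gate encodings and the cost callback here are illustrative assumptions, not the paper's data structures:

```python
def build_flow_graph(regions, cost):
    """regions: {r: (Wr, {i: wi, ...})}; cost(i, r, p) follows Eq. (6).
    Returns node supplies and edges mapped to (capacity, cost) pairs."""
    supplies, edges = {}, {}
    for r, (Wr, gates) in regions.items():
        supplies[("R", r)] = sum(gates.values()) - Wr    # Or - Wr
        for i, wi in gates.items():
            supplies[("G", i)] = 0
            edges[("R", r), ("G", i)] = (wi, 0)          # staying in r is free
            for p in regions:
                if p != r:
                    edges[("G", i), ("R", p)] = (wi, cost(i, r, p))
    return supplies, edges

# Toy instance: region p is overflowed by 2, region q has slack 3.
regions = {"p": (2, {"A": 3, "B": 1}), "q": (5, {"C": 2})}
supplies, edges = build_flow_graph(regions, lambda i, r, p: 0)
```

Overflowed regions get positive supply and underfilled regions negative supply, so any feasible min-cost flow drains exactly the overflow through gate nodes whose capacities equal the gate widths.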
The runtime of flow realization depends on the flow size. Therefore, a preprocessing step is suggested in [4] (where a flow is unbounded) to reduce the flow size by moving gates greedily. However, in our case, every flow is limited by the corresponding gate width, and thus by the biggest width in the library. With such a bounded flow, the flow realization problem can be solved highly efficiently [13].
Further speedup is possible if we limit the number of gates involved in flow realization (defined as φ in Algorithm 3). Fig. 6 illustrates our flow realization where f = 5 is required between regions r and p with φ = 3, and |Gr| = 7 with widths {6, 3, 3, 3, 3, 1, 1}. We first bucket the gates in Gr and Gp respectively by their widths. Since we will accomplish f = 5 (which is positive) with at most 3 gates, we never move more than 3 gates from any bucket. Therefore, we can exclude one gate whose width is 3 from Gr (which is crossed out in Fig. 6). For the same reason, and because the flow is always positive, we can drop two gates whose width is 2 and one gate whose width is 1 from Gp (e.g., if all 3 picked gates came from p, the flow would be negative). After that, we build the final set T in Fig. 6, where the widths from p become negative. Then, any subset whose cardinality is at most 3 (φ) and whose summation is 5 (f) can realize the flow as in Fig. 6. Among these subsets, we pick the one which will incur the least cost based on Eq. (6). Generally, if the number of different sizes in the library is L, the size of the final set |T| ≤ (2φ - 1)L, which allows a very efficient implementation. Our flow realization is detailed in Algorithm 3.
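The bucket pruning and bounded Subset-sum search can be sketched as follows, writing φ (phi) for the bound on the number of moved gates; this brute-forces subsets of size at most φ, which is feasible since the pruned set stays small, and omits the least-cost tie-breaking of Eq. (6) for brevity:

```python
from itertools import combinations

def realize_flow(widths_r, widths_p, f, phi):
    """Return signed widths realizing flow f from r to p with at most phi
    gates (positive: r -> p, negative: p -> r), or None if unrealizable."""
    T = []
    for w in widths_r:            # keep at most phi gates per width bucket
        if T.count(w) < phi:
            T.append(w)
    for w in widths_p:            # p's widths enter negated, at most phi - 1 each
        if T.count(-w) < phi - 1:
            T.append(-w)
    for k in range(1, phi + 1):   # prefer moving fewer gates
        for subset in combinations(T, k):
            if sum(subset) == f:
                return subset
    return None

# Fig. 6 example: widths {6,3,3,3,3,1,1} in r, {4,2,2,2,2,1,1,1} in p, f = 5.
print(realize_flow([6, 3, 3, 3, 3, 1, 1], [4, 2, 2, 2, 2, 1, 1, 1], 5, 3))
# -> (6, -1): move the width-6 gate to p and a width-1 gate back to r
```

The pruning reproduces the set T of Fig. 6 exactly, and any returned subset is one of the figure's candidate realizations.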
4.3 Region Placement
When a gate i is assigned to a region r, xi is fixed as in Eq. (2), but yi needs to be computed. As finding yi, ∀i ∈ Gr for the minimum deviation is NP-hard, we consider a more restrictive problem by prefixing the order of i ∈ Gr. We
Figure 6: Example of flow realization using Subset-sum with φ = 3 for a flow of 5 from region r (+) to region p (-). The widths in r are bucketed as {6}, {3,3,3,3}, {1,1} and those in p as {4}, {2,2,2,2}, {1,1,1}; after pruning, T = {6,3,3,3,1,1,-4,-2,-2,-1,-1}, and the candidate subsets are {6,-1}, {3,1,1}, {6,1,-2}, {6,3,-4}, and {3,3,-1}.
Algorithm 2 Realize, Region Placement, History Learning
Require: a flow solution F, a set of regions R
1: for each flow f ∈ F for a gate i from r to p do
2:   if f = wi then
3:     Move i to p
4:   else if Realize(f, r, p) fails then
5:     h[wi][r][p]++  //history learning
6:   end if
7: end for
8: for each region r ∈ R do
9:   Region Placement for r
10:  Update (xi, yi), ∀i ∈ Gr
11: end for
order the gates according to the center location (xi, yi + wi/2) rather than (xi, yi), as it provides less deviation according to our experience. In case a region placement is not feasible for an overflowed region r (Or > Wr), we temporarily scale down wi, ∀i ∈ Gr by Wr/Or to just make the region placement feasible. Since our region placement is a well-studied problem, known as single-row or 1D linear placement, we omit the details; please refer to [3, 4, 7, 11].
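One simple greedy variant of this restricted placement can be sketched as follows: gates are processed in the fixed center order and each is placed as close to its preferred coordinate as legality allows, with widths scaled by Wr/Or when the region is overflowed. This is an illustrative one-dimensional sketch, not the paper's (uncited) single-row placer:

```python
def region_placement(gates, Yr, Wr):
    """gates: {name: (yi, wi)} with yi the preferred coordinate along the row.
    Returns overlap-free positions inside [Yr, Yr + Wr]."""
    Or = sum(wi for _, wi in gates.values())
    scale = min(1.0, Wr / Or)                 # temporary shrink when Or > Wr
    order = sorted(gates, key=lambda g: gates[g][0] + gates[g][1] / 2)
    remaining = Or * scale                    # width still to be packed
    placed, cursor = {}, Yr
    for g in order:
        yi, wi = gates[g]
        w = wi * scale
        remaining -= w
        # as close to yi as possible without overlap or boundary violation
        cursor = min(max(cursor, yi), Yr + Wr - remaining - w)
        placed[g] = cursor
        cursor += w
    return placed

placed = region_placement({"a": (0, 2), "b": (1, 2), "c": (9, 2)}, 0, 10)
# a keeps y = 0, b is pushed to y = 2, c is pulled back to y = 8
```

Reserving `remaining` width for the unplaced suffix guarantees every later gate still fits, so the greedy pass always produces a legal packing when Or ≤ Wr.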
4.4 History Learning
If any region remains overflowed after Section 4.3, it is due to a flow realization failure. Therefore, we may perform history learning to avoid similar flow realization attempts in the next iteration. Since the formulation in Fig. 4 does not consider the potential flow realization issue, it may insist on some unrealizable flows over the iterations, as it seeks the optimal solution in terms of network flow (not legalization). We observe that this issue makes legalization converge poorly by shuttling between a few optimal but unrealizable flow solutions. In order to enhance convergence, we need a new, more flow-realization friendly formulation in the following iterations. We accomplish this with the history factor h[wi][r][p] in Eq. (6), which is indexed by three terms: the width wi of the failing gate i ∈ Gr, its current region r, and the region p it attempted to move to. We initialize the history factor to 1 in the beginning. If the same flow realization keeps failing, the corresponding history factor increases accordingly as in line 5 of Algorithm 2, and eventually makes such flows expensive enough to be excluded from the solution.
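The mechanism can be sketched with a two-destination scenario: the edge toward q starts cheaper, but each realization failure increments its history factor until the edge toward r wins. The base costs 1 and 4 are illustrative assumptions of this sketch:

```python
from collections import defaultdict

history = defaultdict(lambda: 1)                 # h[wi][r][p], initialized to 1

def best_target(wi, r, candidates):
    # pick the destination with the cheapest history-weighted edge cost
    return min(candidates, key=lambda p: history[wi, r, p] * candidates[p])

candidates = {"q": 1, "r": 4}                    # base edge costs out of region p
picks = []
for _ in range(5):
    p = best_target(3, "p", candidates)
    picks.append(p)
    if p == "q":                                 # flow toward q is unrealizable:
        history[3, "p", "q"] += 1                # learn from the failure
print(picks)   # ['q', 'q', 'q', 'q', 'r']
```

The fifth iteration switches to r once h reaches 5, mirroring how the legalizer escapes a shuttling, unrealizable optimum.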
Fig. 7 shows how our history learning helps in Fig. 3 (b), which is hard for a network flow-based approach. In the first iteration, where h[3][p][q] = h[3][p][r] = 1, the optimal network flow solution is f = 2 from p → A → q, which is again unrealizable. Then, we do h[3][p][q]++ and solve repeatedly for the next three iterations, where the graph temporarily stays static (no movement). In the fifth iteration, where h[3][p][q] = 5, the optimal solution is f = 2 from
Figure 7: History learning for Fig. 3 (b) (wA = 3). The gate A in the overflowed region p has an edge toward region q with (3, 1 · h[3][p][q]) and an edge toward region r with (3, 4 · h[3][p][r]).
Algorithm 3 Realize
Require: a flow f from region r to region p
1: T = ∅
2: for each gate i ∈ Gr do
3:   if the number of wi in T < φ then
4:     Add wi to T
5:   end if
6: end for
7: for each gate i ∈ Gp do
8:   if the number of -wi in T < φ - 1 then
9:     Add -wi to T
10:  end if
11: end for
12: S = Find the cheapest Subset-sum solution from T
13: if S ≠ ∅ then
14:   for each number w ∈ S do
15:     if w > 0 then
16:       Move the cheapest i ∈ Gr whose wi = w to p
17:     else
18:       Move the cheapest i ∈ Gp whose wi = -w to r
19:     end if
20:   end for
21:   return Success
22: end if
23: return Failure
Algorithm 4 PostOptimization
1: for a gate with the maximum deviation i ∈ I do
2:   E = the set of regions within the box (xi, yi) - (x̄i, ȳi)
3:   r = the region s.t. i ∈ Gr
4:   for each region p ∈ E with p ≠ r do
5:     if Realize(0, r, p) succeeds then
6:       break
7:     end if
8:   end for
9:   if no improvement then
10:    break
11:  end if
12: end for
p → A → r, which can be realized as discussed in Fig. 3 (b).
We find our history learning is a very effective way of guiding a network flow-based legalizer to the more realizable solution space. Regarding the history update, there can be various policies, but we observe that even a simple policy (e.g., increasing by 1 for each failure) is sufficient.
4.5 Post-Optimization
Since we also minimize the total deviation (the first term in Eq. (1)) while finding a legal solution, it is possible that some gates have a large maximum deviation. In order to minimize the maximum deviation, we greedily move the gates toward their initial locations. In this post-optimization, we utilize the flow realization again with a zero flow, as shown in Algorithm 4. Since the gate with the maximum deviation has the most negative cost, once such a zero flow is realized, the gate will migrate toward its initial location and reduce the maximum deviation, while keeping the solution legal (no change in Or, ∀r ∈ R).
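Because the call uses f = 0, any subset of signed widths summing to zero (for instance a pair {w, -w}, i.e., swapping two equal-width gates between the regions) keeps every Or intact while letting the deviated gate migrate. A self-contained sketch of this zero-flow search, with illustrative inputs:

```python
from itertools import combinations

def zero_flow_subsets(widths_r, widths_p, phi):
    """All subsets of signed widths summing to 0 with at most phi elements;
    each one is a legality-preserving exchange between regions r and p."""
    T = list(widths_r) + [-w for w in widths_p]
    found = []
    for k in range(2, phi + 1):       # a single gate can never sum to zero
        for s in combinations(T, k):
            if sum(s) == 0:
                found.append(s)
    return found

# Swapping the width-2 gates between r and p keeps both Or values unchanged;
# (2, -1, -1) trades one width-2 gate for two width-1 gates instead.
swaps = zero_flow_subsets([2, 3], [2, 1, 1], 3)
```

Among such exchanges, picking the one with the most negative Eq. (6) cost pulls the worst-deviated gate back toward its initial location, as Algorithm 4 intends.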
4.6 Speedup Techniques
The complexity of min-cost network flow optimization is O(|V|^2