Guide: Prof. Sachin Patkar Department of Electrical Engineering Indian Institute of Technology Bombay
Acknowledgment
I would like to thank my guide, Prof. Sachin Patkar, for supporting me on the academic and personal fronts during my seminar work. His constant encouragement, valuable suggestions and patience have played an instrumental role in this area of research. -Uttam Sikaria
Contents
Acknowledgment
Contents
List of Figures
Introduction
1 Technology Mapping
  1.1 Technology Mapping in Lookup-Table based FPGA Architecture
  1.2 Algorithms for LUT-based FPGA Mapping
2 FlowMap
  2.1 Problem Formulation
  2.2 Assumptions and Preliminaries
  2.3 Difficulties
    2.3.1 Monotone Clustering Constraint
  2.4 The Algorithm
    2.4.1 Labeling Phase
    2.4.2 Mapping Phase
  2.5 Area optimization
    2.5.1 Maximising the Cut Volume During Mapping
    2.5.2 Post processing Operations for K-LUT Reduction
3 Flow Networks
  3.1 Flow
  3.2 Maximum Flow
  3.3 Flow and Cut
  3.4 Augmenting Path and Flow Augmentation
  3.5 Ford-Fulkerson's Algorithm for computing maximum flow
4 Conclusion
  4.1 Complexity Analysis
  4.2 Further Scope
Autumn 2011 Supervised Research Exposition
List of Figures
Figure 1: Mapping a Boolean network to a 3-LUT network
Figure 2: Constraint on the number of inputs to LUT is not monotone (K=3)
Figure 3: Computing the label l(t) of node t (K=3). (a) partial network (b) construction of $N_t$ and the highest 3-feasible cut (c) determining l(t)
Figure 4: Network transformations in computing a minimum height K-feasible cut in $N_t$ (K=3)
Figure 5: Predecessor Packing
Figure 6: Gate decomposition
Figure 7: The flow-pack operation
Figure 8: A Flow Network. Numbers along the edges are their respective capacities
Figure 9: A Flow in a network. $|f| = 6$
Figure 10: Maximum Flow in a network
Figure 11: The flow across the two cuts shown is the same and equal to the flow in the network
Introduction
The seminar work consists of two aspects: implementation and reading. As part of the implementation, a compilation of various VLSI CAD algorithms has been created to serve as a demonstration object and to help understand the algorithms better through actual execution on varied inputs. It covers calculation of the minimum Sum of Products for a Boolean function, conversion between the SoP and PoS forms of a Boolean function, creating a BDD for a given Boolean function, complementing an SoP, a stuck-at-fault detector, and static hazards. As part of the reading work, I read about flow networks and their application in technology mapping. Specifically, Jason Cong's work in the field of delay-optimal mapping for FPGA based Boolean networks was explored. FlowMap, an algorithm for delay optimization in Lookup-Table based FPGA designs, was studied thoroughly. The rest of the report highlights the reading work carried out. The implementation work is software and needs no explanation herein.
1 Technology Mapping
Logic synthesis is often taken as a two-step process: technology independent optimization of a set of logic equations, followed by technology dependent mapping into a feasible circuit. The latter step, technology mapping, caters to two essential aspects of logic implementation: minimizing the area of the circuit and satisfying the maximum critical path delay. It finishes the synthesis of the circuit by performing the final gate selection from a particular library. Technology mapping doesn't change the structure of the circuit radically; that is achieved by the preceding technology independent optimization. This separation simplifies the process of logic synthesis considerably. In general, a good technology mapping algorithm must:
1. Adapt easily to different libraries
2. Support irregular collections of logic functions
3. Handle detailed technology-dependent cost-functions
4. Be time efficient
With the advent of FPGAs and their popularity in VLSI ASIC designs, the library of logic functions available is often well known. For instance, one popular FPGA architecture is based on Lookup-Tables (LUTs) and is used by several FPGA manufacturers including Xilinx and AT&T. This makes it advantageous to explore efficient and fast technology mapping algorithms meant specifically for LUT-based FPGAs. It also renders the first two requirements of a good technology mapping algorithm less important, at least for such specific purposes, leaving us to concentrate on the other two.
1.1 Technology Mapping in Lookup-Table based FPGA Architecture
... technology mapping algorithm, we need an algorithm which optimizes the critical path delay as well as the area of the chip. The first constraint translates to obtaining a minimum depth K-LUT network, while the second one translates to minimizing the number of K-LUTs used.
1.2 Algorithms for LUT-based FPGA Mapping
2 FlowMap
FlowMap is arguably a major accomplishment in the field of technology mapping: it computes a depth-optimal mapping for LUT-based FPGAs on general Boolean networks in polynomial time, in contrast to conventional library-based technology mapping, which is NP-hard for general Boolean networks. This chapter discusses the algorithm in detail.
2.1 Problem Formulation
The problem of technology mapping in K-LUT based FPGA design is best described as covering a given K-bounded Boolean network with K-feasible cones, or K-LUTs. The solution is a Directed Acyclic Graph (DAG) where:
- Each node is a K-feasible cone (or K-LUT)
- An edge (Cu, Cv) exists only if u is in input(Cv), where Cv is the K-LUT rooted at v
The primary objective is to minimize the critical path delay by minimizing the depth of the resultant DAG. A secondary goal is to minimize the chip area of the solution by minimizing the number of K-LUTs used in the solution.
Figure 1: Mapping a Boolean network to a 3-LUT network
2.2 Assumptions and Preliminaries
Next, the sources of delay are assumed to be two, viz. the propagation time of K-LUTs and the delay in interconnection paths. To simplify the model, a reasonable unit delay model is assumed:
- Each K-LUT is assumed to contribute a constant delay, equal to its propagation time, independent of the function implemented by it
- Each edge or interconnection path contributes a constant delay irrespective of how it is routed
Under this model, the delay of the circuit, determined by the critical path, depends solely on the depth of the mapped solution. Before discussing the algorithm, please note that all terminology and definitions (other than the most obvious ones in networks) have been included in Appendix I.
The FlowMap algorithm is applicable only to K-bounded Boolean networks. This however is not a constraint, as any Boolean network can be transformed into a K-bounded Boolean network using Roth-Karp decomposition [2]. A network can always be transformed into a simple gate network by representing each complex gate in the sum-of-products form. Thereon, DMIG [3] can be used to decompose each multiple-input simple gate into a tree of two-input simple gates. Such a transformation arguably enables the mapping algorithm to pack more gates along critical paths into one K-LUT, resulting in smaller depths in the solution.
Henceforth in the discussion, we shall assume the network to be K-bounded. Although a transformation into a network of 2-input simple gates is performed, optimality of the solution doesn't rely on it. The solution is optimal as long as the network is K-bounded.
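Under the unit delay model, comparing two networks by delay amounts to comparing their depths. The following is a minimal Python sketch of that computation, together with a K-boundedness check. The dictionary-of-fanins representation, the function names `node_levels` and `is_k_bounded`, and the small example network are all assumptions made for illustration; they do not come from the FlowMap paper.

```python
def node_levels(fanins):
    """Level of every node: length of the longest path from any primary input.

    fanins maps each node to the list of nodes feeding it; primary inputs
    map to an empty list. The network is assumed to be acyclic.
    """
    levels = {}

    def level(v):
        if v not in levels:
            ins = fanins[v]
            levels[v] = 1 + max(level(u) for u in ins) if ins else 0
        return levels[v]

    for v in fanins:
        level(v)
    return levels


def is_k_bounded(fanins, k):
    """True if no node has more than k fanins."""
    return all(len(ins) <= k for ins in fanins.values())


# Hypothetical 2-bounded network: three primary inputs and three gates.
network = {
    "a": [], "b": [], "c": [],
    "g1": ["a", "b"],
    "g2": ["g1", "c"],
    "out": ["g1", "g2"],
}
levels = node_levels(network)
depth = max(levels.values())  # depth of the network = largest node level
```

The same function applies unchanged to a mapped K-LUT network, whose depth is the quantity FlowMap minimizes.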
2.3 Difficulties
... to K-LUT based FPGA mapping, achieving significant results, but was not optimal.
2.4 The Algorithm
The FlowMap algorithm runs in two phases. In the first phase, it computes a label for each node, reflecting the level of the K-LUT that implements it in an optimal solution. In the second phase, the mapping solution is generated based on the node labels computed in the first phase.
Figure 3: Computing the label l(t) of node t (K=3). (a) partial network (b) construction of $N_t$ and the highest 3-feasible cut (c) determining l(t)
In Figure 3(a), node t is to be labeled. We modify $N_t$ by including an auxiliary node s and connecting it to all PIs, so that s serves as the only source. $N_t$ now has one source s and one sink t. Figure 3(b) shows the construction of the network $N_t$ rooted at t. Let LUT(t) in Figure 3(c) be the 3-LUT implementing t in an optimal mapping solution of $N_t$. Let $\bar{X}$ be the set of nodes in LUT(t) and $X$ be the set of remaining nodes. Then $(X, \bar{X})$ forms a 3-feasible cut between s and t. Let u be the node with maximum label in $X$. The level of LUT(t) in the optimal mapping solution of $N_t$ is then $l(u) + 1$. Now, the height of the cut $(X, \bar{X})$ is $h(X, \bar{X}) = l(u)$. Therefore, minimizing the level of LUT(t) requires finding a minimum height K-feasible cut $(X, \bar{X})$ in $N_t$. And so,

$$l(t) = \min_{(X, \bar{X})\ \text{is K-feasible}} h(X, \bar{X}) + 1$$
The label computed in this way is the minimum depth of any mapping solution of $N_t$. In Figure 3(b), we have a minimum height 3-feasible cut in $N_t$ of height 1, and so we have $l(t) = 2$. In the preceding discussion, we have essentially reduced the problem of labeling the nodes to that of finding a minimum height K-feasible cut. An important contribution of Cong et al. through FlowMap is an $O(Km)$ time algorithm for finding the minimum height K-feasible cut in $N_t$, where $m$ is the number of edges and $n$ is the number of nodes in $N_t$.
An important property of the label thus obtained is that $p \le l(t) \le p + 1$, where p is the maximum label of the nodes in $input(t)$. A rigorous proof can be found in [1]. Intuitively, for every input, node t can either belong to the same LUT as the input or to the successive LUT. Hence the label of a node is greater than or equal to the maximum label of its inputs. The worst case arises when t is implemented in the LUT immediately following that of its maximum-label input, and hence the upper bound on $l(t)$ is $p + 1$.
So, to find the minimum height K-feasible cut, it suffices to check for the existence of a K-feasible cut of height $p - 1$ in $N_t$. If there is such a cut $(X, \bar{X})$, we assign $l(t) = p$. We use one K-LUT for the entire $\bar{X}$. Otherwise we assign $l(t) = p + 1$, and the minimum height cut (of height $p$) is $(N_t - \{t\}, \{t\})$. A new K-LUT is used for node t.
Whether or not there is a K-feasible cut of height $p - 1$ can be tested as follows. Recall that $p$ is the maximum label of the nodes in $input(t)$. We apply a network transformation on $N_t$ that collapses all the nodes in $N_t$ with label $p$, together with t, into a sink $t'$. This gives a modified network $N'_t$. Now, it can be seen that $N_t$ has a K-feasible cut of height $p - 1$ iff $N'_t$ has a K-feasible cut. The above modification ensures that the only cuts that exist in $N'_t$ are those in which all nodes with label $p$ are grouped into $\bar{X}'$. Thus any cut in $N'_t$ has a corresponding cut of height $p - 1$ in $N_t$. Figures 4(a) and 4(b) respectively show $N_t$ and $N'_t$.
Further, to determine a K-feasible cut in $N'_t$, we apply another standard network transformation, called the node-splitting transformation, to obtain $N''_t$, thus reducing the node cut-size constraint to an edge cut-size constraint. For each node $v$ in $N'_t$ other than $s$ or $t'$, we introduce two nodes $v_1$ and $v_2$ connected by an edge of capacity 1. All other edges are given a capacity of $\infty$. Figure 4(c) shows $N''_t$ corresponding to $N_t$ and $N'_t$ in Figures 4(a) and 4(b). Now, a K-feasible cut in $N'_t$ corresponds to a cut in $N''_t$ with edge cut-size no more than K. In fact, with edge capacities as mentioned above, this further reduces to whether the maximum flow from $s$ to $t'$ in $N''_t$ is of value $K$ or smaller.
Thus the labeling phase in the FlowMap algorithm is reduced to the well established problem of finding the maximum flow in a flow network. FlowMap uses the augmenting path algorithm to compute the maximum flow.
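The labeling phase described above (collapse the label-p nodes into a sink, split the remaining nodes, then test whether the maximum flow is at most K) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the names `bounded_max_flow` and `flowmap_labels`, the fanin-dictionary input format, and the example network are hypothetical. Infinite capacities are replaced by K + 1, which does not change the answer to the question "is the maximum flow at most K?". The sketch computes labels only and does not record the cuts needed by the mapping phase.

```python
from collections import defaultdict


def bounded_max_flow(cap, s, t, bound):
    """Augmenting-path max flow that stops once the value exceeds bound."""
    res = defaultdict(int)          # residual capacities
    adj = defaultdict(set)          # adjacency, including backward edges
    for (u, v), c in cap.items():
        res[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)
    value = 0
    while value <= bound:
        parent, stack = {s: None}, [s]
        while stack and t not in parent:      # DFS in the residual graph
            u = stack.pop()
            for v in adj[u]:
                if v not in parent and res[(u, v)] > 0:
                    parent[v] = u
                    stack.append(v)
        if t not in parent:
            break                             # no augmenting path left
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        delta = min(res[e] for e in path)
        for u, v in path:
            res[(u, v)] -= delta
            res[(v, u)] += delta
        value += delta
    return value


def flowmap_labels(fanins, K):
    """Label every node of a K-bounded DAG; primary inputs get label 0."""
    INF = K + 1                     # stands in for infinite capacity
    order, seen = [], set()

    def visit(v):                   # DFS post-order is a topological order
        if v not in seen:
            seen.add(v)
            for u in fanins[v]:
                visit(u)
            order.append(v)

    for v in fanins:
        visit(v)

    label = {}
    for t in order:
        if not fanins[t]:
            label[t] = 0
            continue
        cone, stack = {t}, [t]      # N_t: t and all of its predecessors
        while stack:
            for u in fanins[stack.pop()]:
                if u not in cone:
                    cone.add(u)
                    stack.append(u)
        p = max(label[u] for u in fanins[t])
        collapsed = {v for v in cone if v == t or label[v] >= p}

        def end(v, side):           # collapsed nodes all become the sink "T"
            return "T" if v in collapsed else (v, side)

        cap = {}
        for v in cone:
            if v not in collapsed:
                cap[((v, "in"), (v, "out"))] = 1      # node splitting
            if not fanins[v]:
                cap[("S", end(v, "in"))] = INF        # auxiliary source
            for u in fanins[v]:
                if end(u, "out") != end(v, "in"):
                    cap[(end(u, "out"), end(v, "in"))] = INF
        flow = bounded_max_flow(cap, "S", "T", K)
        label[t] = p if flow <= K else p + 1
    return label


# Hypothetical chain of 2-input gates over five primary inputs.
chain = {"a": [], "b": [], "c": [], "d": [], "e": [],
         "g1": ["a", "b"], "g2": ["g1", "c"],
         "g3": ["g2", "d"], "g4": ["g3", "e"]}
labels_k3 = flowmap_labels(chain, 3)
labels_k2 = flowmap_labels(chain, 2)
```

With K = 3 the gates g1 and g2 share one LUT level and g3 and g4 share the next, while with K = 2 every gate needs its own level.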
2.5 Area optimization
The secondary objective of the algorithm is area optimization i.e. to minimize the number of K-LUTs in the mapping solution. This is done by maximizing the volume of each cut during the mapping process and by post-processing operations for K-LUT reduction.
... $vol(X, \bar{X})$ maximized, where $(X, \bar{X})$ is the corresponding cut in $N_t$. Thus we need to find a cut $(X'', \bar{X}'')$ in $N''_t$ such that $e(X'', \bar{X}'') \le K$ and $vol(X'', \bar{X}'')$ is maximum. In other words, we want a min-cut in $N''_t$ of the maximum volume. It turns out that there is a unique maximum volume min-cut in any network, and this shall be our choice of $(X'', \bar{X}'')$. A proof of its unique existence can be found in [1].
3 Flow Networks
A flow network is a directed graph where each edge has a capacity and each edge receives a flow. The amount of flow on an edge cannot exceed the capacity of the edge. A flow network consists of:
- A weighted directed graph G with non-negative integer weights, where the weight c(e) of an edge e is called its capacity
- Two distinct vertices S and T, the source and sink of the network respectively
Figure 8: A Flow Network. Numbers along the edges are their respective capacities
3.1 Flow
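A flow f for a network N is an integer assignment f(e) to each edge e such that it satisfies:
- Capacity rule: for each edge e, $0 \le f(e) \le c(e)$
- Conservation rule: for each vertex v other than S and T, the total flow on the edges entering v equals the total flow on the edges leaving v
The value of a flow f, written $|f|$, is the total flow leaving the source S, which equals the total flow entering the sink T.
Figure 9: A Flow in a network. $|f| = 6$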
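The two conditions on a flow, that no edge carries more than its capacity and that flow is conserved at every vertex other than the source and sink, can be checked mechanically. The sketch below assumes a flow network stored as a dictionary from edges (u, v) to integers. The function names `is_valid_flow` and `flow_value` and the small example network are hypothetical, and the example is not the network of Figure 9.

```python
def is_valid_flow(capacity, flow, source, sink):
    """Check the capacity and conservation rules for an integer flow."""
    # Flow may only be assigned to edges that exist in the network.
    if any(e not in capacity for e in flow):
        return False
    # Capacity rule: 0 <= f(e) <= c(e) on every edge.
    for e, c in capacity.items():
        if not 0 <= flow.get(e, 0) <= c:
            return False
    # Conservation rule: inflow equals outflow at every internal vertex.
    balance = {}
    for (u, v), f in flow.items():
        balance[u] = balance.get(u, 0) - f
        balance[v] = balance.get(v, 0) + f
    return all(b == 0 for x, b in balance.items() if x not in (source, sink))


def flow_value(flow, source):
    """Net flow leaving the source."""
    out_flow = sum(f for (u, v), f in flow.items() if u == source)
    in_flow = sum(f for (u, v), f in flow.items() if v == source)
    return out_flow - in_flow


# Hypothetical network with source "s" and sink "t".
capacity = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1,
            ("a", "t"): 2, ("b", "t"): 3}
flow = {("s", "a"): 2, ("s", "b"): 1, ("a", "b"): 1,
        ("a", "t"): 1, ("b", "t"): 2}
ok = is_valid_flow(capacity, flow, "s", "t")
value = flow_value(flow, "s")
```

Raising the flow on edge ("a", "t") to 2 without changing anything else would break conservation at vertex "a", and the check would then fail.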
3.2 Maximum Flow
A flow for a network N is said to be maximum if its value is the largest of all flows for N. Figure 9 shows an example of a flow, while Figure 10 gives a maximum flow of the same flow network.
Figure 10: Maximum Flow in a network
3.3 Flow and Cut
Figure 11: The flow across the two cuts shown is the same and equal to the flow in the network
3.4 Augmenting Path and Flow Augmentation
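Let $\pi$ be an augmenting path for flow f in network N. There exists a flow $f'$ for N of value $|f'| = |f| + \Delta_f(\pi)$, where $\Delta_f(\pi)$ is the residual capacity of $\pi$, i.e. the smallest residual capacity among its edges.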
Figure 13 shows the flow augmentation of the example in Figure 12.
3.5 Ford-Fulkerson's Algorithm for computing maximum flow
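Repeating the flow augmentation step until no augmenting path remains gives the Ford-Fulkerson method. The following Python sketch shows one way to write it, using depth-first search to find augmenting paths in the residual graph. The function name `max_flow`, the edge-dictionary representation, and the example network are assumptions made for illustration, and the example is not the network of Figure 8.

```python
from collections import defaultdict


def max_flow(capacity, source, sink):
    """Ford-Fulkerson: augment along residual paths until none is left.

    capacity maps edges (u, v) to non-negative integers.
    Returns the value of a maximum flow from source to sink.
    """
    residual = defaultdict(int)
    adj = defaultdict(set)
    for (u, v), c in capacity.items():
        residual[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)               # backward edge, residual capacity 0

    def find_augmenting_path():
        parent = {source: None}
        stack = [source]
        while stack and sink not in parent:
            u = stack.pop()
            for v in adj[u]:
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    stack.append(v)
        if sink not in parent:
            return None
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        return path[::-1]

    value = 0
    while True:
        path = find_augmenting_path()
        if path is None:
            return value
        # Residual capacity of the path: its smallest edge residual.
        delta = min(residual[e] for e in path)
        for u, v in path:
            residual[(u, v)] -= delta   # push flow forward
            residual[(v, u)] += delta   # allow it to be undone later
        value += delta


# Hypothetical network with source "s" and sink "t".
capacity = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1,
            ("a", "t"): 2, ("b", "t"): 3}
best = max_flow(capacity, "s", "t")
```

With integer capacities each augmentation raises the flow value by at least one, so the loop terminates. FlowMap relies on this same bound when it only needs to know whether the flow exceeds K.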
4 Conclusion
FlowMap achieves technology mapping for depth minimization in LUT-based FPGA designs quite efficiently, in polynomial time. To get a fair idea of how fast the algorithm works, we carry out a brief complexity analysis.
4.1 Complexity Analysis
Let the network have $m$ edges and $n$ nodes.
Since we only need to decide whether the maximum flow is of value no more than K, labeling of each node takes $O(Km)$ time.
Since there are n nodes, the FlowMap algorithm arrives at the optimal technology mapping in $O(Kmn)$ time. Thus the FlowMap algorithm gives an optimal mapping for delay optimization in LUT-based FPGAs in polynomial time.
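The bound can be spelled out as a short derivation. It assumes integer capacities, so that each augmentation raises the flow value by at least one and the search can stop after at most $K + 1$ augmentations, and it assumes that one search for an augmenting path costs $O(m)$. The names $T_{\text{node}}$ and $T_{\text{labeling}}$ are introduced here only for illustration.

```latex
\begin{align*}
T_{\text{node}} &= (K + 1) \cdot O(m) = O(Km) \\
T_{\text{labeling}} &= \sum_{t=1}^{n} T_{\text{node}} = n \cdot O(Km) = O(Kmn)
\end{align*}
```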
4.2 Further Scope
In Section 2.2 we assumed a unit delay model for the Boolean network. One immediate extension of the FlowMap algorithm could be to use a more general delay model than the unit delay model. For example, [5] exhibits the use of a nominal delay model in FPGA designs, where the interconnection delay of a single net is estimated by the number of fanouts of the net. The publication shows that this model estimates the interconnection delay quite well.
Another possible extension is to combine area and depth optimization in the mapping procedure. In a FlowMap solution, the depth of every node is minimum, while only the depths of nodes on the critical path need to be minimized. This slack in non-critical node depths can be exploited for area minimization, without affecting the depth optimality, through certain delay relaxation operations on non-critical nodes [6].
Appendix I - Terminologies
K-feasible cone at a node v, $C_v$, is a subgraph containing v and its predecessors such that $|input(C_v)| \le K$. Also, any path connecting a node in $C_v$ to v should lie entirely in $C_v$
Level of a node is the length of the longest path to the node from any Primary Input
Depth of a network is the largest node level
K-bounded Boolean network is one where $|input(v)| \le K$ for every node v
Height of a cut $(X, \bar{X})$ is the maximum node label in $X$: $h(X, \bar{X}) = \max\{l(x) : x \in X\}$
Node cut-size of a cut $(X, \bar{X})$ is the number of nodes in $X$ that are adjacent to some node in $\bar{X}$: $n(X, \bar{X}) = |\{x : (x, y) \in E(N), x \in X, y \in \bar{X}\}|$, where E(N) is the edge set of the network N
K-feasible cut is a cut with node cut-size less than or equal to K
Edge cut-size of a cut is the sum of the capacities of its forward edges
Volume of a cut $(X, \bar{X})$ is the number of nodes in $\bar{X}$
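The cut measures defined in this appendix translate directly into short Python functions. The sketch below assumes a network given as a collection of directed edges (x, y) and a cut given as two node sets; the function names, the label assigned to the auxiliary source "s", and the example network are hypothetical choices made for illustration.

```python
def node_cut_size(edges, X, Xbar):
    """Number of nodes in X adjacent to some node in Xbar."""
    return len({x for (x, y) in edges if x in X and y in Xbar})


def edge_cut_size(capacity, X, Xbar):
    """Sum of the capacities of the forward edges, from X to Xbar."""
    return sum(c for (x, y), c in capacity.items() if x in X and y in Xbar)


def cut_height(label, X):
    """Maximum node label in X."""
    return max(label[x] for x in X)


def cut_volume(Xbar):
    """Number of nodes in Xbar."""
    return len(Xbar)


def is_k_feasible(edges, X, Xbar, k):
    """A cut is K-feasible if its node cut-size is at most K."""
    return node_cut_size(edges, X, Xbar) <= k


# Hypothetical N_t: source s, primary inputs a, b, c, gate g1, sink t.
edges = [("s", "a"), ("s", "b"), ("s", "c"),
         ("a", "g1"), ("b", "g1"), ("g1", "t"), ("c", "t")]
# Assumption: the auxiliary source is given label 0 alongside the inputs.
label = {"s": 0, "a": 0, "b": 0, "c": 0, "g1": 1}

wide = ({"s", "a", "b", "c"}, {"g1", "t"})        # cut through a, b, c
narrow = ({"s", "a", "b", "c", "g1"}, {"t"})      # cut through g1, c
```

Here the cut `wide` has the smaller height and the larger volume, while `narrow` has the smaller node cut-size, which illustrates the tension between depth, area and the input limit K.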
Bibliography
[1] J. Cong and Y. Ding (1994). FlowMap: An Optimal Technology Mapping Algorithm for Delay Optimization in Lookup-Table Based FPGA Designs. IEEE Trans.
[2] J. P. Roth and R. M. Karp (1962). Minimization over Boolean Graphs. IBM J. Res. Devel.
[3] K. C. Chen, J. Cong, Y. Ding, A. B. Kahng, and P. Trajmar (1992). DAG-Map: Graph-based FPGA technology mapping for delay optimization. IEEE Design and Test of Computers
[4] E. L. Lawler, K. N. Levitt, and J. Turner (1969). Module clustering to minimize delay in digital networks. IEEE Trans. Computers
[5] M. Schlag, P. Chan and J. Kong (1991). Empirical evaluation of multilevel logic minimization tools for a field programmable gate array technology. Proc. 1st Int. Workshop on Field Programmable Logic and Applications
[6] J. Cong and Y. Ding (1993). On area/depth trade-off in LUT-based FPGA technology mapping. Proc. 30th ACM/IEEE Design Automation Conf.