Evaluating Cost and Time-to-Market Requirements of MCM Designs

Evaluation and Improvement of MCM Design Flow
Vineet Sahula, Deptt. of ECE, Regional Engineering College, Jaipur (sahula@ieee.org)
C. P. Ravikumar∗, Texas Instruments, Bangalore (ravikumar@india.ti.com)
Abstract
MCM design is a preferred methodology for implementing large systems in telecommunication
applications, e.g. a wireless mobile transceiver. It permits manufacturability considerations,
for example yield, to be included in the design process. Design for manufacturability
(DFM) and design for packageability (DFP) are the two factors considered; both
facilitate the objectives of low cost and short time-to-market. For a design
methodology, cost and time-to-market (TTM) are the two major issues for end-user appliances
such as a mobile wireless handset. Design time constitutes the major portion of TTM. In
our work, we address both cost and TTM, and take minimization of
cost and design time as the design objectives.
In this paper, we formally analyze the cost and time-to-market requirements of MCM designs.
We first present the issues involved in design for manufacturability and design for packageability
in Section 1. In Section 2, we discuss earlier efforts that have considered DFM and DFP during the design
stages of VLSI systems. In Section 3, we present two MCM design flows, namely
∗ This work was carried out at IIT, Delhi. The authors wish to thank Motorola, USA for funding this research.
a manufacturability-oriented flow and a performance-oriented flow. We take yield into
account during partitioning in the design flow. Section 4 illustrates a procedure for parameterizing
the task completion time of partitioning. Using the HCFG approach, we evaluate the two
flows for design completion time in Section 5. We present the completion time comparison
in Section 6 and indicate the situations in which one of the two flows provides a better completion time.
1 Design for manufacturability
Design planning for large system design involves several decisions related to manufacturability,
testability and packageability with primary objectives of cost and performance optimization.
The choices considered for implementation are either SOC (System-on-chip) or MCM (Multi
Chip Modules) packaging. SOC implementation results in better system performance but at a
higher manufacturing cost. MCM implementation provides a lower system cost at relatively
lower performance as compared to an SOC. For both methodologies, choice of package type
(QFP, PGA, BGA etc.) and material (ceramic, plastic etc.) is influenced by cost-performance
trade-offs. The choice of substrate material and chip attachment technique for an MCM based
system is also guided by cost-performance considerations.
MCM technology is used to realize electronic systems in order to achieve one or more of the
following objectives: high performance, small size, light weight, high reliability, short time-to-
market and low cost. The type of application decides the priority of objectives to be realized. For
example, for design of a general purpose microprocessor, major objectives are high performance
and low cost. Table 1 illustrates the dependency between objectives and factors. To achieve high
performance, interconnect delay needs to be minimized: for deep submicron feature sizes below
0.18 μm, interconnect delay may account for more than 60% of the total delay in a digital system. High
volume production and reuse of predesigned library components should be considered to
achieve a low cost of system realization. Optimization of yield should be considered early in the
design cycle so as to minimize the manufacturing cost. Improvement in yield also results in fewer
design iterations, which leads to a shorter TTM.
Table 1: Design objectives dependency for MCM based design

Objective | Factors to be considered
Small size | Interconnect density; I/O counts; component placement; second level interconnect technology
High performance | Interconnect delay; noise
Low cost | Standard parts (high volume production)
Shorter time to market | Manufacturability (fabrication technology)
High reliability | Power; noise; testability; thermal management
From the manufacturability perspective, the MCM methodology has an advantage over SOC. Table 2
compares the MCM and SOC packaging styles on several factors such as yield and manufacturing
cost. These general guidelines cannot help a designer in making manufacturing
and packaging related decisions at an early stage. In an SOC, the whole system is implemented
on a single chip and has to be fabricated in a single manufacturing technology indicated by λ, the
minimum feature width in μm. Thus the yield and related cost of the chip are decided and remain
fixed. However, an MCM based system can be implemented in more cost effective ways by opting
for different fabrication technologies for different system partitions, which could result in a low overall
system cost. This is because different fabrication processes have been optimized to produce cost
effective solutions for different types of logic systems such as microprocessor, memory and ASIC.
Figure 1 illustrates the SOC and MCM design flows and the various considerations to be incorporated.
The SOC design methodology is chosen when the sole criterion is performance. For
achieving a mix of both cost and performance, the MCM methodology is followed. When cost is
the more dominant objective, flow 2 is adopted, where yield is a constraint and performance is
Table 2: Comparison of MCM and SOC Implementations of a VLSI System

No. | Factor considered | MCM packaging style | SOC packaging style
1 | Yield | Individual dies mounted on the MCM are smaller in size and their yields are expected to be larger, resulting in an overall yield that is larger. | Due to large chip area, yield may be expected to be lower.
2 | Manufacturing steps | If there are n chips on the MCM, there are n manufacturing steps. | There is only one manufacturing step.
3 | Testing steps | The individual dies have to be separately tested before mounting them on the surface. This is followed by system level test. | There is a single testing step.
4 | Interconnect models | Interconnect models for system-level interconnect are more elaborate, e.g. they must include inductance effects. | On a relative scale, interconnect models are simpler and inductance effects may often be ignored.
5 | Length of interconnect | Average length of system-level interconnect is longer than that of SOC, resulting in a lower performance. | Average length of interconnect is smaller than that of MCM, providing an opportunity to obtain higher performance.
6 | Fabrication technology | The designer can select alternate implementation technologies (different values of λ) for different dies to trade off cost. | The SOC must be implemented in a single technology.
an objective function. Similarly, when performance is the dominant objective, flow 3 is adopted,
where performance is a constraint and yield is an objective function. In the performance-oriented
flow (flow 3), there is a larger probability of substantial yield loss at the die fabrication stage and thus
a larger probability of design iterations. This implies a longer design time and hence a longer
time-to-market. The yield-oriented design flow (flow 2) has a lower probability of redesign, as the probability
of yield loss is minimized. This is made possible by relating yield prediction models at earlier
design stages to the information available at the physical design and fabrication stages.
[Diagram: from system requirements and conceptual design, three flows branch out. Flow 1: SOC design methodology, performance oriented. Flow 2: MCM design methodology, yield oriented. Flow 3: MCM design methodology, performance oriented. Each flow proceeds through design planning, detailed design and design tapeout, prototype fabrication/yield simulation, and package design and packaging. Planning choices shown: package material = {Plastic, Ceramic}; package type = {QFP, PGA, BGA}; die attach method = {WB, TAB, FC}; package substrate = {MCM-C, MCM-D, MCM-L}; performance = {cutset, die pads, die size}.]
Figure 1: Design planning for packageability and manufacturability
2 Related work
There is general consensus on the benefits of taking manufacturing and packaging considerations
into account during the design stages of a VLSI system. In other words, the designer must take
cognizance of the available manufacturing technologies and packaging styles during the early
stages of the design process in order to optimize the cost and performance of the product. For
instance, the VLSI system designer may be confronted by a question such as “Will packaging
the system as a multi-chip module (MCM) result in better system performance and cost than a
monolithic system-on-chip (SOC) packaging?” If so, how should the system be partitioned?
There is no easy way to answer these questions. Some general guidelines are available in
the literature to compare VLSI, MCM and WSI (Wafer Scale Integration) implementations of a
system [1, 2]. There has been earlier work on taking manufacturing and packaging issues into
account during the design stages of VLSI systems [3, 4, 5].

It has been shown that an MCM package for the MicroSPARC processor is of lower cost [3] than
an SOC package; packaging related information was considered during functional partitioning.
Sandborn et al. [5] provide an excellent discussion of packaging related trade-offs during system
design. Khan et al. [4] have presented an MILP (mixed integer linear programming) formulation
of constrained system partitioning for MCMs.

In all of the above approaches, however, time-to-market is not explicitly addressed and the
yield model considered is not valid for sub-micron technologies. In the next section, we describe
an approach for including yield considerations during the partitioning task of an MCM design flow.
3 Design flow with yield oriented partitioning
In our work, we formally analyze the cost requirement and time-to-market requirement of MCM
designs [6]. In order to achieve the objectives of minimum cost and short TTM, the manufacturability
of individual dice has to be addressed. The die yields affect the system objectives
in two ways. Firstly, at the manufacturing stage, the yield of a die may not be sufficient, which
increases the overall projected system cost. Secondly, if the resulting
cost is unacceptable, the system has to be redesigned, and the overall system cost may
still increase due to the cost of the redesign effort.
We consider two alternate MCM design flows: (a) a yield-oriented design flow, and (b) a
performance-oriented design flow, marked as “flow 2” and “flow 3”, respectively, in Figure 1.
These two flows have been redrawn in Figure 2 using the hardware-software (HW-SW) codesign
methodology for system design. We compare the subgraphs of these two flows which have been
[Diagram: two flow charts. Both proceed through specification generation, algorithm development, partitioning, detailed software and hardware design, cosimulation (subflow time T2), prototype fabrication/yield simulation, and the test "Is yield acceptable?", leading to the product on YES. (a) Design flow with yield constrained partitioning: partitioning time TY, low transition probability p on the NO branch. (b) Design flow without yield constrained partitioning: partitioning time TNY, high transition probability p on the NO branch.]
Figure 2: MCM design flows using HW-SW codesign methodology
shown shaded in the figure. We presume that the subgraphs contained in the dashed-line boxes of
the two flows in Figure 2 are identical, so that they require equal completion time. We designate
each of the subgraphs by G2, and their completion time by T2. The subgraphs under consideration
are shown dark shaded and have been reproduced in Figure 3 for clarity.
In the next part of this section, we illustrate the calibration procedure for the HCFG model.
We describe the parameterization of the completion time of the partitioning task. A similar procedure may be
followed for the rest of the tasks of the design flow for complete calibration of the model.
[Diagram: (a) Design flow 1: yield constrained partitioning (TY) followed by subflow G2, detailed design (T2); completion time TP1. (b) Design flow 2: partitioning without yield constraint (TNY) followed by subflow G2, detailed design (T2), with branch probabilities (1 − p) and p; completion time TP2.]
Figure 3: MCM flows under consideration, redrawn from Figure 2
3.1 Algorithm for functional partitioning
A given design description is partitioned either at a structural level or at a behavioral level of
abstraction, and the individual partitions are implemented on separate dice. Known good dice
are assembled on an MCM surface using a technology such as flip-chip bonding. Different dice
may be implemented in different nanometer technologies. For simplicity, we do not consider
HW/SW issues during partitioning. Our objective is to consider yield at partitioning, and as
such we presume that the circuit is all digital and HW. Various techniques and algorithms are available in
the literature for graph partitioning based on different criteria such as min-cut and ratio-cut. The
partitioning algorithm divides the components of a large circuit into several smaller subcircuits.
It attempts to minimize the given cost function while satisfying certain constraints. For example,
it may minimize the signal nets (cutset) across the partitions under the constraint that each
of the partitions must be larger than a given size.
The cutset is defined as the sum of the number of signal nets cut between all possible pairs of partitions.
Partitioning being an NP-hard problem, exhaustive search for even a small circuit is too time
consuming. We consider partitioning being performed with two constraints - cutset and yield for
flow 1 of Figure 3, whereas for flow 2 partitioning is performed under the cutset constraint only.
The performance of a system is qualitatively related to the cutset of its partitions: performance
is better if the cutset is small, and vice versa.
We have chosen simulated annealing, a heuristic algorithm, for partitioning. Algorithm 1
provides a description of this heuristic.
Algorithm 1 Simulated Annealing for partitioning

Anneal()
  So : Initial Partition
  T : Initial Temperature
  WHILE (constraint not satisfied) LOOP
    WHILE ( T > stopping temperature ) LOOP
      Initialize COUNT=0
      WHILE ( COUNT < COUNT LIMIT ) LOOP
        randomly perturb So to generate S
        ΔE = Cost(S) − Cost(So)
        P(ΔE) = min(1, e^(−ΔE / (kB T)))
        IF (rand(0, 1) ≤ P(ΔE))
          So = S
          Reset COUNT=0
        ELSE
          COUNT = COUNT + 1
        ENDIF
      ENDLOOP
      Update T
    ENDLOOP
  ENDLOOP
  Output partition
So is an initial partition, chosen randomly from among the set of partitions. T is the temperature.
We presume a cooling schedule according to the relation Tnew = βT, where 0 < β < 1, with β
chosen in the range 0.8-0.98. A small value of β implies a faster cooling rate; in that case the algorithm may
settle on a solution that is far from optimal. On the other
hand, a larger value of β means slower cooling, which implies a longer execution time of the algorithm.
COUNT LIMIT is an upper limit on the number of iterations allowed at a given
temperature while inferior solutions are being successively rejected.
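The annealing loop described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the cost function is the cutset alone (standing in for the manufacturing cost used in the paper), the perturbation swaps two cells so that partition sizes stay fixed (standing in for a size constraint), ΔE is taken as Cost(S) − Cost(So), the outer constraint loop is omitted, and `max_moves` is a hypothetical safety cap that Algorithm 1 does not contain. All function and parameter names are invented for the example.

```python
import math
import random


def cutset(assign, nets):
    """Count nets whose cells are spread over more than one partition."""
    return sum(1 for net in nets if len({assign[c] for c in net}) > 1)


def anneal(n_cells, nets, n_parts=2, t_init=5.0, t_stop=0.05,
           beta=0.9, count_limit=30, max_moves=2000, seed=0):
    """Return (partition, cost); partition[c] is the die index of cell c."""
    rng = random.Random(seed)
    # So: balanced initial partition in random order. Swapping cells keeps
    # partition sizes fixed, which stands in for a size constraint.
    current = [i % n_parts for i in range(n_cells)]
    rng.shuffle(current)
    cost = cutset(current, nets)
    best, best_cost = current[:], cost
    t = t_init
    while t > t_stop:
        rejected = 0  # COUNT: successive rejections at this temperature
        for _ in range(max_moves):  # assumed safety cap, not in Algorithm 1
            if rejected >= count_limit:
                break
            a, b = rng.sample(range(n_cells), 2)
            current[a], current[b] = current[b], current[a]
            delta = cutset(current, nets) - cost  # Cost(S) - Cost(So)
            # Metropolis rule: always accept improvements, otherwise accept
            # with probability exp(-delta / t).
            if delta <= 0 or rng.random() <= math.exp(-delta / t):
                cost += delta
                rejected = 0
                if cost < best_cost:
                    best, best_cost = current[:], cost
            else:
                current[a], current[b] = current[b], current[a]  # undo move
                rejected += 1
        t *= beta  # cooling schedule T_new = beta * T
    return best, best_cost
```

For instance, `anneal(8, nets)` with `nets` given as a list of cell-index tuples returns a balanced two-way partition together with its cutset. Adding a yield term to the cost, as in flow 1, would only change the function that evaluates a candidate partition.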
3.2 Yield and cost estimates
The cost function used in the partitioning algorithm is the total manufacturing cost of the system. Each
of the subcircuits in the partition would be realized on a separate die for an MCM implementation.
Thus the manufacturing cost of an MCM is the sum of the individual die costs. Knowing the
processed wafer cost and die size, the cost of a die can be computed. This would be the projected
cost of a die. Yield must be taken into account to arrive at the real cost of the die. In this section, we
discuss models and estimates of wafer cost, die yield and number of dice per wafer.
3.2.1 Wafer cost
The wafer cost model given by Maly in [7] states
$$C_W = C_0 \cdot X^{(1-\lambda)} \qquad (2 < X < 3) \qquad (1)$$
$\lambda$ is the chosen manufacturing technology for the wafer in μm. $C_0$ is the cost of a standard wafer
of diameter 6”, processed in 1 μm technology. $X$ is a factor which corresponds to the rate of cost
increase per technology generation. Different fabrication units would have different values of
$X$; it indicates the increase in investment in a fabrication facility for a new technology. We
assume $X$ to be 2.9.
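As a small illustration, equation 1 can be evaluated directly. The sketch below reads the exponent as (1 − λ) with λ in μm and uses the paper's values X = 2.9 and C0 = 700 cost units as defaults; the function name is hypothetical.

```python
def wafer_cost(lam_um, c0=700.0, x=2.9):
    """Equation 1: C_W = C0 * X**(1 - lambda), with lambda in micrometres."""
    return c0 * x ** (1.0 - lam_um)
```

At λ = 1 μm the function returns the base cost C0, and the cost grows as λ shrinks.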
3.2.2 Yield estimate
Faults on a layout surface may be of two types: (1) a short caused by an extra material
deposit and (2) an open caused by a missing material defect. Faults arise from several sources
of contamination, e.g. mask scratching, particle contamination and mishandling of the wafer.
Defects caused by contaminating particles may be two dimensional or three dimensional in nature.
We presume defects due to two dimensional discs of extra or missing material;
they are characterized as spot defects. We consider the Poisson yield model as it is appropriate for
moderate size dice.
The Poisson yield model is given as
$$Y_{die} = e^{-A_o D}.$$
Here $A_o$ is the die area and $D$ is the defect density. This expression needs to be extended for various
types and sizes of defects. We consider the yield model given in [8, 9], reproduced in equation 2.
$$Y_{die} = \prod_{i=1}^{N} e^{-\int_0^{\infty} A_{cr_i}(r) \cdot D_i(r)\, dr} \qquad (2)$$
$Y_{die}$ is the yield of a single die. $N$ is the number of defect types. $A_{cr_i}$ is the critical area for type-$i$
defects of radius $r$. The defect density for type-$i$ defects of radius $r$ is given by $D_i(r)$, as expressed in
equation 3. $D_{oi}$ is the specific occurrence of the defect type under consideration, and $f_i(r)$ is the
defect distribution function. Various distribution functions have been proposed for defect densities.
We adopt the relation given by equation 4 as the defect distribution function. Here $k$ is a constant
and the value of $p$ varies between 3 and 5.
$$D_i(r) = D_{oi} \cdot f_i(r) \qquad (3)$$
$$f_i(r) = \frac{k}{r_i^{p}} \qquad (3 < p < 5) \qquad (4)$$
$$D_i(r) = \frac{k D_{oi}}{r_i^{p}} \qquad (3 < p < 5) \qquad (5)$$
The nature of the distribution of $D_i(r)$ is illustrated in Figure 4.
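The yield model of equations 2 to 5 can be evaluated numerically. The sketch below is illustrative only: it truncates the integral to a finite range [r_min, r_max] (an assumption made here, since the 1/r^p density diverges at r = 0), uses a plain trapezoidal rule, and expects the caller to supply the critical area function for each defect type. All names are hypothetical.

```python
import math


def defect_density(r, d0, k=1.0, p=3.5):
    """Equation 5: D(r) = k * D_o / r**p."""
    return k * d0 / r ** p


def die_yield(defect_types, r_min, r_max, steps=10000):
    """Equation 2 with the integral truncated to [r_min, r_max] (assumption).

    defect_types is a list of (critical_area_fn, d0, k, p) tuples, one per
    defect type; critical_area_fn maps a defect radius to a critical area.
    """
    h = (r_max - r_min) / steps
    exponent = 0.0
    for a_cr, d0, k, p in defect_types:
        def integrand(r):
            return a_cr(r) * defect_density(r, d0, k, p)
        # Trapezoidal rule over the truncated radius range.
        total = 0.5 * (integrand(r_min) + integrand(r_max))
        total += sum(integrand(r_min + j * h) for j in range(1, steps))
        exponent += total * h
    return math.exp(-exponent)
```

Because the per-type factors multiply, adding a second defect type with the same parameters squares the yield.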
[Plot: defect density Di(r) versus defect radius r, with Ro marked on the radius axis.]
Figure 4: Defect distribution curve
3.2.3 Number of dice per Wafer
We have used the following function to compute the number of dice per wafer.
$$N_{die} = \sum_{i=0}^{(2R_w/h)-1} \frac{2}{w} \cdot \min(r_i, r_{i+1}) \qquad (6)$$
where $r_i$ is given by the following expression.
$$r_i = \sqrt{R_w^2 - (R_w - i \cdot h)^2}$$
Here
$N_{die}$ is the total number of dice on the wafer,
$R_w$ is the effective radius of the wafer,
$h$, $w$ are the height and width of the die, respectively,
$r_i$ is the half length of the $i$th row of dice.
$\frac{2R_w}{h} - 1$ provides the number of rows of dice on the wafer. $2r_i$ is the length of the row at the $i$th
level of height from the center of the wafer.
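The row-by-row count of equation 6 can be sketched as below, reading r_i as the half chord length sqrt(R_w² − (R_w − i·h)²). Rounding each row down to a whole number of dice is an assumption added here, since a fractional die cannot be used; the function name is hypothetical.

```python
import math


def dice_per_wafer(r_w, h, w):
    """Equation 6: sum over rows of the dice that fit in the shorter chord."""
    rows = int(2 * r_w // h)  # i runs from 0 to (2*R_w/h) - 1

    def half_len(i):
        d = r_w - i * h  # signed distance of row boundary i from the centre
        return math.sqrt(max(r_w * r_w - d * d, 0.0))

    return sum(math.floor(2.0 * min(half_len(i), half_len(i + 1)) / w)
               for i in range(rows))
```

With the Table 3 wafer (6 inch diameter, 0.5 inch unusable border, so an effective radius of 2.5 inch if the border is taken off each side) and a 0.5 inch square die, the function counts 64 dice.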
3.2.4 Die cost
Knowing the values of $C_W$, $Y_{die}$ and $N_{die}$ from equations 1, 2 and 6, the cost of a die can be computed
as follows.
$$C_{die} = \frac{C_W}{N_{die} \cdot Y_{die}} \qquad (7)$$
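A minimal sketch of equation 7, together with the MCM cost as the sum of the individual die costs stated in Section 3.2, is given below. The function names and the example inputs are hypothetical; test, bumping and packaging costs are left out.

```python
def die_cost(wafer_cost, n_die, y_die):
    """Equation 7: C_die = C_W / (N_die * Y_die)."""
    if n_die <= 0 or not 0.0 < y_die <= 1.0:
        raise ValueError("need n_die > 0 and 0 < y_die <= 1")
    return wafer_cost / (n_die * y_die)


def mcm_cost(dies):
    """Sum of die costs; dies is a list of (wafer_cost, n_die, y_die)."""
    return sum(die_cost(c_w, n, y) for c_w, n, y in dies)
```

For example, a wafer costing 700 units that holds 64 dice at a yield of 0.8 gives a die cost of 700 / (64 × 0.8), about 13.7 units.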
Table 3: Die parameters

Parameter/property | Value
Minimum feature size, λ | 0.09-0.99 μm
Wafer diameter | 6 inches
Unusable wafer border | 0.5 inches
Wafer defect density, Do | 5 defects/sq. inch
C0 | 700 money units
Die test cost | 0.01 units/pin
Die pad/bump cost (flip-chip, wire bond) | 0.0005 units/bump
Table 3 illustrates typical parameters related to die and wafer, which have been presumed in
these calculations. The actual values of the parameters listed in first column are normally to be
provided by the user. C0 , as indicated earlier, is the base cost of a 6” wafer processed in 1 μm
technology. We consider scaling of λ from 0.99 to 0.09 by a scaling factor α = 0.8. For this value
of α, we get 10 possible technologies for implementation in the range of λ under consideration.
For experimental calibration purposes, we have used values of λ which are not exactly the
values available through fabrication houses. The wafer defect density Do for year 2001, as projected in the
technology roadmap [10], is approximately 2 defects/sq. in., whereas we have chosen a value of 5
defects/sq. in. We assume flip-chip bonding of the dice to the substrate. Other bonding techniques
such as wire bonding or tape automated bonding could equally have been used. We also do
not take the packaging cost into our computation, assuming that both the MCM implementations
employ identical packaging and have approximately same number of die pads. The equivalence
of number of die pads for the two implementations is ensured when we consider same cutset
constraint for both the flows.
3.3 Critical area computation
[Diagram: array of metal lines labelled with w and s; a defect with r > s/2 causes a fault, a defect with r < s/2 does not.]
Figure 5: Faults on array of metal lines in a chip layout
For a particular defect type and size, the critical area is defined as the area on a layer of a layout where
the occurrence of a defect will definitely cause a fault. We have presumed spot defects created by
two dimensional discs. A defect causes a fault which is either a short or an open. Both of these types
of faults are catastrophic, i.e. they cause functional failure of the IC. Figure 5 illustrates the occurrence of
spot defects on the metal layer of a layout. The occurrence of a fault depends on the size and
location of the defect. The critical area $A_{cr}(r)$ for a circular defect of radius $r$ is the area of
the die in which a fault will occur if the center of the defect is deposited there. In Figure 5, the critical
area is illustrated by shaded regions between the array of metal lines of the layout. Let $A_o$
be the area of the array of metal lines. As illustrated in this figure, a defect of radius $r < \frac{s}{2}$ will not
cause any fault. Similarly, each defect of radius $r > (s + \frac{w}{2})$ would cause a fault if it lies
anywhere on the die. Hence $A_{cr}(r)$ is given by equation 8.
$$A_{cr}(r) = \begin{cases} 0 & \text{for } r < \frac{s}{2} \\ A_o & \text{for } r > s + \frac{w}{2} \end{cases} \qquad (8)$$
For the given array of metal lines, the variation of $A_{cr}$ is illustrated in Figure 6. While equation 8 is
valid for all types of layouts, simple or complex, Figure 6 is valid only for the present example
layout of an array of metal lines. The nature of the variation of $A_{cr}$ between $r = \frac{s}{2}$ and $r = s + \frac{w}{2}$
depends on the complexity of the layout and the layer under consideration.
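Equation 8 fixes the critical area only at the two ends of the radius range. The sketch below fills the intermediate range with a linear ramp, which is purely an assumption made for illustration (the text notes that the actual shape depends on the layout and layer); the function name is hypothetical.

```python
def critical_area(r, s, w, a_o):
    """Equation 8, with an assumed linear ramp between s/2 and s + w/2."""
    lo, hi = s / 2.0, s + w / 2.0
    if r <= lo:
        return 0.0  # defect too small to cause a fault
    if r >= hi:
        return a_o  # defect causes a fault anywhere on the array
    return a_o * (r - lo) / (hi - lo)  # assumed interpolation
```

A function of this form can be passed as the critical area term when evaluating the yield integral of equation 2.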
[Plot: normalized critical area (0 to 1) versus defect radius r.]
Figure 6: Critical area dependence on defect size
4 Model calibration
4.1 Characterization of task time for partitioning
During partitioning, we utilize a library containing cell layouts of basic gates and modules. We
compute and store the critical areas of these gates and modules for combinations of different technologies
λ and defect sizes r. We consider three layers of an IC layout (polysilicon, metal 1 and
metal 2) for the computation of critical areas and hence the yield of an IC. We presume a scaling factor
α = 0.8 for uniform scaling down of layouts for smaller λ.
We have chosen a fast floating point adder (FP adder) and Texas Instruments’ DSP chip
TMS320C30 as demonstration vehicles. They represent a very small and a moderate size
design, respectively. We assume a standard cell based implementation [11]. Results and data
after partitioning are organized in Tables 4 and 5. Table 6 shows the resulting parameters
when each of these two designs is implemented as a single chip. We consider the constrained
yield to vary between 0.78 and 0.90 in steps of 0.02, and the constrained cutset to vary in steps of
200.
Table 4: Results of partitioning FP adder for MCM implementation (32-bit Fast FP adder)

Minimum yield constr. | Maximum cutset constr. | Tp | Cost | # Dice | Cutset
0.78 | 500 | 26 | 0.31 | 2 | 123
0.78 | 700 | 93 | 0.30 | 2 | 119
0.78 | 1000 | 93 | 0.30 | 2 | 119
0.990 | 500 | 27 | 0.63 | 3 | 291
0.990 | 700 | 89 | 0.61 | 3 | 265
0.990 | 900 | 90 | 0.57 | 3 | 222
0.990 | 1200 | 92 | 0.58 | 4 | 308
0.991 | 600 | 55 | 0.60 | 4 | 300
0.991 | 800 | 116 | 0.61 | 3 | 219
0.991 | 1200 | 116 | 0.7 | 6 | 496
0.993 | 600 | 132 | 0.72 | 4 | 354
0.993 | 700 | 151 | 0.75 | 6 | 543
0.993 | 800 | 195 | 0.78 | 6 | 583
0.993 | 1000 | 210 | 0.75 | 6 | 510
0.994 | 800 | 210 | 0.78 | 5 | 441
0.994 | 1200 | 251 | 0.8 | 6 | 521
None | 500 | 12 | 0.30 | 2 | 123
None | 800 | 45 | 0.30 | 2 | 119
None | 1600 | 48 | 0.29 | 2 | 119
We observe that for a particular yield constraint Y , relaxing cutset constraint Cs results in
Table 5: Results of partitioning TMS320C30 for MCM implementation

Minimum yield constr. | Maximum cutset constr. | Tp | Cost | # Dice | Cutset
0.78 | 600 | 33 | 3.27 | 2 | 484
0.78 | 800 | 62 | 2.92 | 2 | 698
0.78 | 1000 | 106 | 1.80 | 5 | 992
0.82 | 800 | 50 | 6.81 | 2 | 576
0.82 | 1000 | 110 | 3.20 | 3 | 766
0.82 | 1200 | 144 | 2.22 | 4 | 952
0.82 | 1600 | 310 | 1.93 | 6 | 1182
0.86 | 1000 | 114 | 3.04 | 4 | 974
0.86 | 1400 | 201 | 2.39 | 6 | 1204
0.86 | 1800 | 384 | 2.40 | 6 | 11260
0.88 | 1200 | 182 | 5.53 | 4 | 1190
0.88 | 1400 | 265 | 2.65 | 6 | 1330
0.88 | 1600 | 360 | 3.55 | 5 | 1438
0.88 | 1800 | 362 | 2.89 | 6 | 1398
0.9 | 1200 | 183 | 4.10 | 5 | 1194
0.9 | 1400 | 358 | 4.00 | 5 | 1378
None | 1200 | 48 | 1.21 | 3 | 438
None | 2000 | 105 | 1.09 | 3 | 284
None | 2500 | 113 | 0.96 | 3 | 290
reduced cost. The interpretation is as follows. Relaxing the cutset constraint degrades system performance,
but it increases the number of dice used for implementation. For smaller
dice, yield improves and hence the cost of the MCM implementation decreases. For example, in Table 5, for Y =
0.82, the cost of the 2-dice implementation is more than 3 times the cost of the 6-dice implementation. But
this comes at reduced performance, as the cutset constraint doubles from 800 to 1600. This implies a
cost-performance trade-off. The reason for the performance degradation is off-chip interconnect delay. As
we relax the cutset constraint, the number of dice increases and thus the number of off-chip interconnections
increases. This cost-performance trade-off is similar for different sizes of systems. We
observe that the cutset actually obtained is sometimes much smaller than the cutset constraint
considered for partitioning, for both designs considered; e.g. in Table 5, for a yield constraint
of 0.82 and cutset constraints of 800 and 1000, the cutsets actually obtained are 576
Table 6: SOC implementation results

Design application | Cost of manf. | Yield | λ (μm) | Chip area (mm²)
TMS320C30 | 3.3 | 0.20 | 0.19 | 302
FP Adder | 0.024 | 0.43 | 0.09 | 21
[Plot: MCM implementation of TMS320C30 at Y = 0.82; production cost, chip-bumping cost and implementation cost versus number of dice (2 to 6).]
Figure 7: Cost variation for an MCM implementation
and 766. The explanation lies in the nature of the partitioning algorithm used. The algorithm is a
heuristic and not an exact algorithm, and hence the whole partition space is not explored. We also
observe that system cost decreases as the number of partitions is increased. The production cost
variation of an MCM implementation with the number of partitions is shown in Figure 7, whereas
the total system cost and performance variations are illustrated in Figures 7 and 8. We see that
increasing the number of partitions results in an increased cutset and hence increased bumping cost.
The cost variation with fabrication technology λ, for an SOC implementation, is shown in Figure
9. The cost for larger λ is high because the number of dice on the wafer is small and the yield is poor, as
the critical area of an individual die is larger. For very small λ the cost is high due to the very high
[Plot: MCM implementation of TMS320C30 at Y = 0.82; total cost and cutset of dies versus number of dice (2 to 6).]
Figure 8: Cost and performance trade-off for an MCM implementation
density of small size particles. This results in very poor yield and hence larger cost.
We parameterize the task completion time of partitioning, $T_{partg}$, in terms of the following design
characteristics: (1) yield constraint $Y$, (2) cutset constraint $C_s$ and (3) design size. Using regression
fitting on the data of Tables 4 and 5, we obtained the expression for $T_{partg}$ given by equation 9.
$$T_{partg} = T_A + \frac{T_B \, C^{\left(\frac{1}{1-Y}\right)}}{D + E^{C_s}} \qquad (9)$$
As indicated earlier, $Y$ is the minimum yield constraint and $C_s$ is the maximum cutset constraint. The
regression coefficients $T_B$, $C$ and $E$ can be parameterized, where one of the parameters is design size.
The regression coefficients $T_A$, $T_B$, $C$, $D$ and $E$ obtained for the TMS320C30 and FP adder chips are
listed in Table 7. In Table 7, we see that $T_B$, $C$ and $E$ are parameters which show an expected trend
in their variation with the design size. The goodness of fit can be inferred from the knowledge
of variance. The fitted curves and partitioning data are shown in Figures 10
and 11.
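The fitted expression can be checked against the tabulated partitioning times. The sketch below reads equation 9 as T_A + T_B · C^(1/(1−Y)) / (D + E^Cs), which is an interpretation of the printed formula, and uses the TMS320C30 coefficients of Table 7. The function and dictionary names are hypothetical.

```python
def partitioning_time(y, cs, t_a, t_b, c, d, e):
    """Equation 9 as read here: T_A + T_B * C**(1/(1-Y)) / (D + E**Cs)."""
    if not 0.0 <= y < 1.0:
        raise ValueError("yield constraint must satisfy 0 <= y < 1")
    return t_a + t_b * c ** (1.0 / (1.0 - y)) / (d + e ** cs)


# Regression coefficients for TMS320C30 as listed in Table 7.
TMS320C30 = dict(t_a=27.0, t_b=3.6e-3, c=1.25, d=1e-4, e=0.987)
```

Under this reading, `partitioning_time(0.9, 1400, **TMS320C30)` evaluates to about 362 time units, close to the value of 358 listed in Table 5 for that pair of constraints.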
[Plot: SOC implementation of TMS320C30; production cost and predicted yield versus λ (μm).]
Figure 9: Cost variation for an SOC implementation
We propose the following linear regression expression for characterizing $T_A$, $T_B$, $C$, $D$ and $E$ in
terms of design size.
$$x_i \in \{T_A, T_B, C, D, E\}$$
$$x_i = X_i + Y_i G$$
Here $X_i$ and $Y_i$ are linear regression coefficients and $G$ represents the size of the design in equivalent
number of gates. A treatment based on statistical methods has been provided in [12, 13, 14] for the
estimation of design schedule and productivity. The design schedule estimation performed in [14] is
based on statistical data fitting on example design data from various design houses.
[Plot: design example TI’s DSP TMS320C30, Yield = 0.82; regression fitting curve and partitioning results; time of partitioning Tpartg versus maximum cutset constraint Cs.]
Figure 10: Regression fitting of partitioning time during cutset constraint
Table 7: Regression fit coefficients

Coefficient | TMS320C30 | FP-Adder
TA | 27 units | 74 units
TB | 3.6 × 10^-3 units | 0.48 × 10^-3 units
C | 1.25 | 1.01
D | 10^-4 | 10^-4
E | 0.987 | 0.981
5 Model evaluation
5.1 HCFG representation of candidate flows
Time-to-market is the duration from conceptualization of a product up to its actual production.

For MCM-based systems design, time-to-market includes the time taken for completion of sys-
tem design, the time spent in testing and time taken in manufacturing (i.e. fabrication). For
both of the design flows under consideration, we assume that manufacturing and testing times
[Plot: design example TI’s DSP TMS320C30, Cutset = 1400; regression fitting curve and partitioning results; time of partitioning Tpartg versus minimum yield constraint Y.]
Figure 11: Regression fitting of partitioning during yield constraint
are the same. Also T2, the average value of design completion time for subflow G2, is the same for both of
the design flows. The HCFG equivalents of the flows under consideration are shown in
Figure 12.
5.2 Completion time computation
We denote the partitioning time with yield constraint by $T_Y$ and the partitioning time without yield
constraint by $T_{NY}$. Let $T_2$ denote the expected value of completion time and $\mathcal{T}_2$ the graph
transmittance of subgraph $G_2$. Obviously, $\mathcal{T}_Y = z^{T_Y}$ and $\mathcal{T}_{NY} = z^{T_{NY}}$. Let $\mathcal{T}_{I,F1}$ and $\mathcal{T}_{I,F2}$ be the graph
transmittances and $T_{P1}$ and $T_{P2}$ be the process completion times of flow graphs 1 and 2, respectively.
For the flow of Figure 12(a), the graph transmittance $\mathcal{T}_{I,F1}$ is given by
$$\mathcal{T}_{I,F1} = \mathcal{T}_Y \cdot \mathcal{T}_2. \qquad (10)$$
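[Diagram: two HCFG graphs from node I to node F. Project flow 1: branches labelled zTY and T2; process completion time TP1. Project flow 2: branches labelled zTNY and T2, an exit branch labelled (1 − p) and a feedback branch labelled p.zTNY; process completion time TP2.]
Figure 12: HCFG equivalent of design flows in Figure 3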
Whereas for the flow of Figure 12(b), the graph transmittance is given by
$$\mathcal{T}_{I,F2} = (1-p) \cdot \frac{\mathcal{T}_{NY} \cdot \mathcal{T}_2}{1 - p \cdot \mathcal{T}_{NY} \cdot \mathcal{T}_2}. \qquad (11)$$
The completion times of the tasks of flows 1 and 2 have been chosen as deterministic values so that
the process completion time can be obtained by manipulating equations only.
Given the model parameters $T_Y$, $T_{NY}$, $p$ and the parameters related to graph $G_2$, the HCFG
approach can compute the graph transmittances $\mathcal{T}_{I,F1}$ and $\mathcal{T}_{I,F2}$. In equations 12 and 16, we
present estimates of the expected values of $T_{P1}$ and $T_{P2}$ analytically within the framework of the HCFG
approach.
$$E[T_{P1}] = T_Y + T_2 \qquad (12)$$
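$$\mathcal{T}_{I,F2} = (1-p) \cdot z^{(T_{NY}+T_2)} \cdot \left[1 - p z^{(T_{NY}+T_2)}\right]^{-1} \qquad (13)$$
$$= (1-p) \left[ z^{(T_{NY}+T_2)} + p z^{2(T_{NY}+T_2)} + p^2 z^{3(T_{NY}+T_2)} + \ldots \right] \qquad (14)$$
Combining equation 14 with the definition of the expected value (the derivative of the transmittance
with respect to $z$, evaluated at $z = 1$), we get the following.
$$E[T_{P2}] = (1-p) \cdot (T_{NY}+T_2) \left[ 1 + 2p + 3p^2 + 4p^3 + \ldots \right] \qquad (15)$$
$$= \frac{T_{NY}+T_2}{1-p} \qquad (16)$$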
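The closed form of equation 16 can be cross-checked with a small Monte Carlo simulation of the redesign loop, in which each pass takes T_NY + T_2 and is repeated with probability p. This is an illustrative sketch with hypothetical function names and task times, assuming deterministic task times as in the text.

```python
import random


def expected_time_flow1(t_y, t_2):
    """Equation 12: E[T_P1] = T_Y + T_2."""
    return t_y + t_2


def expected_time_flow2(t_ny, t_2, p):
    """Equation 16: E[T_P2] = (T_NY + T_2) / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("transition probability must satisfy 0 <= p < 1")
    return (t_ny + t_2) / (1.0 - p)


def simulate_flow2(t_ny, t_2, p, runs=100000, seed=1):
    """Average completion time when each pass is repeated with probability p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        elapsed = t_ny + t_2
        while rng.random() < p:  # yield not acceptable: go round again
            elapsed += t_ny + t_2
        total += elapsed
    return total / runs
```

With, say, T_NY = 100, T_2 = 300 and p = 0.3, the simulated mean should settle near the closed-form value of 400 / 0.7.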
6 Results
Given the expressions for the expected completion time of the two flows, we can now explore the situations
under which one of the flows provides a smaller completion time. First, we compute the
value of p at which the completion times of the two flows would be equal. We denote this value of
p by pcritical. Hence, at p = pcritical, equation 17 follows.
$$E[T_{P_1}] = E[T_{P_2}] \tag{17}$$
$$T_Y + T_2 = \frac{T_{NY} + T_2}{1 - p_{critical}} \tag{18}$$
$$p_{critical} = \frac{T_Y - T_{NY}}{T_Y + T_2} \tag{19}$$
In general, at any other value of $p$, the ratio of completion times can be written as in
equation 21. The ratio is illustrated graphically in Figure 13.
\begin{align}
\frac{E[T_{P_1}]}{E[T_{P_2}]} &= (1 - p)\,\frac{T_Y + T_2}{T_{NY} + T_2} \tag{20} \\
&= \frac{1 - p}{1 - p_{critical}} \tag{21}
\end{align}
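The step from equation 20 to equation 21 can be checked numerically. The sketch below evaluates equation 19 and both forms of the ratio for a few assumed parameter sets; the function names and the sample values are hypothetical and chosen only for illustration.

```python
import math


def p_critical(t_y, t_ny, t_2):
    """Equation 19: the value of p at which both flows take equally long."""
    return (t_y - t_ny) / (t_y + t_2)


def ratio_eq20(t_y, t_ny, t_2, p):
    """Equation 20: E[T_P1] / E[T_P2] written with the task times."""
    return (1 - p) * (t_y + t_2) / (t_ny + t_2)


def ratio_eq21(t_y, t_ny, t_2, p):
    """Equation 21: the same ratio written with p_critical."""
    return (1 - p) / (1 - p_critical(t_y, t_ny, t_2))


if __name__ == "__main__":
    # Assumed sample values (days); any T_Y > T_NY >= 0 and T_2 > 0 would do.
    for t_y, t_ny, t_2 in [(10, 2, 10), (10, 2, 100), (30, 5, 8)]:
        for p in (0.0, 0.25, 0.5, 0.75):
            a = ratio_eq20(t_y, t_ny, t_2, p)
            b = ratio_eq21(t_y, t_ny, t_2, p)
            assert math.isclose(a, b)
            print(t_y, t_ny, t_2, p, round(a, 4))
```

At $p = p_{critical}$ both forms evaluate to 1, which is the equal-completion-time condition of equation 17.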
We base our criterion for choosing the better flow on the value of the ratio in equation 21,
and propose the following:
Figure 13: Relative variation in completion time with p. The plot, titled "Ratio of E[TP1] and
E[TP2]", shows $E[T_{P_1}]/E[T_{P_2}]$ (vertical axis, 0 to 3.5) against the transition
probability $p$ (horizontal axis, 0 to 1) for $p_{critical} = 0.7$ and $p_{critical} = 0.2$.
1. If $E[T_{P_1}]/E[T_{P_2}] \le 1/2$, then flow 1 is better ($p > p_{critical}$).
2. If $E[T_{P_1}]/E[T_{P_2}] \ge 2$, then flow 2 is better ($p < p_{critical}$).
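The two-threshold criterion can be written as a small decision function. This is a minimal sketch under stated assumptions: the function name is hypothetical, and the label returned for ratios strictly between 1/2 and 2 is an assumption, since the text does not say what to do in that band.

```python
def recommend_flow(p, p_crit):
    """Apply the criterion to the ratio of equation 21.

    Returns "flow 1" when E[T_P1]/E[T_P2] <= 1/2, "flow 2" when it is >= 2,
    and "no strong preference" otherwise (this last label is an assumption).
    """
    if not 0 <= p < 1 or not 0 <= p_crit < 1:
        raise ValueError("p and p_crit must lie in [0, 1)")
    ratio = (1 - p) / (1 - p_crit)      # equation 21
    if ratio <= 0.5:
        return "flow 1"
    if ratio >= 2.0:
        return "flow 2"
    return "no strong preference"


if __name__ == "__main__":
    # Sample points for the two p_critical values used in Figure 13.
    for p_crit in (0.2, 0.7):
        for p in (0.0, 0.3, 0.7, 0.9):
            print(p_crit, p, recommend_flow(p, p_crit))
```

With these thresholds, $p_{critical} = 0.2$ never yields a flow 2 recommendation, because the ratio cannot exceed $1/0.8 = 1.25$ even at $p = 0$.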
Hence, to decide which flow has a larger probability of providing a shorter design completion
time, we should know $p$ a priori, as well as $T_2$. $T_2$ can be computed using the HCFG approach
if the individual task times and transition probabilities for subflow $G_2$ are known. Given the
design size, the number of I/O pins etc., we may estimate $T_Y$ and $T_{NY}$ from equation 9.
When there is a large difference in the partitioning times of the two flows, i.e. $T_Y - T_{NY}$
is large, the value of $T_2$ decides which option is better: (1) for large $T_2$ ($T_2 \gg T_Y$),
flow 1 is recommended, as $p_{critical}$ is small; (2) for small $T_2$ ($T_2 \le T_Y$, comparable
with $T_Y$), however, flow 2 may be chosen, as $p_{critical}$ is larger.
In general,
• For $p < p_{critical}$, flow 1 is not advised
• For other values of $p$ ($p \ge p_{critical}$), flow 1 is always preferred
For example, in Figure 13, for $p_{critical} = 0.2$ flow 2 cannot realistically provide a better
completion time than flow 1: achieving $p < p_{critical}$ is too difficult, and even when it is
achieved, the completion time of flow 2 is only 20% better than that of flow 1 at $p \approx 0$.
However, for large values of $p_{critical}$, e.g. $p_{critical} = 0.7$ in Figure 13, flow 2 can be
considered better when $p < 0.4$. We observe in Figure 13 that $E[T_{P_1}]/E[T_{P_2}]$ reaches 2
at $p = 0.4$ and exceeds 2 for smaller $p$.
We illustrate the above procedure through an example. We consider the aforesaid MCM flows
for an example design. When KGDs are available, $T_Y$ is large in comparison to $T_2$ and
$p_{critical}$ is nearly 1, which implies that flow 1 cannot be recommended. For illustration, we
assume $T_Y = 10$ days and $T_{NY} = 2$ days, and take the expected completion time $T_2$ of
subgraph $G_2$ to be 10 days for an FPGA (or standard cell) based design and 100 days for a custom
ASIC design. Table 8 shows that flow 1 is strongly recommended for custom ASIC designs, whereas
for FPGA/standard-cell based designs, flow 2 may result in a design time comparable to that of
flow 1.
Table 8: Design time comparison for design styles (times in days)

Design style          E[T2]  pcritical  E[TP1]  p       E[TP2]     Level of difficulty  Flow 2
                                                                   (to achieve p)       advised?
FPGAs, standard cell  10     0.4        20      ≤ 0.4   12/(1−p)   Not difficult        √
                                                > 0.4   12/(1−p)   Feasible             ×
Custom                100    0.08       110     ≤ 0.08  102/(1−p)  Very difficult       ?
                                                > 0.08  102/(1−p)  Feasible             ××
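The numeric columns of Table 8 follow from equations 12, 16 and 19 with $T_Y = 10$ days and $T_{NY} = 2$ days. The sketch below recomputes them; the function name and dictionary keys are hypothetical. For the custom case the formula gives 8/110, roughly 0.07, which is close to the 0.08 listed in the table.

```python
def table_row(t_y, t_ny, t_2):
    """Recompute the numeric entries of one design-style row of Table 8."""
    return {
        "p_critical": (t_y - t_ny) / (t_y + t_2),   # equation 19
        "E_TP1": t_y + t_2,                         # equation 12
        # Equation 16: E[T_P2] = numerator / (1 - p); only the numerator is fixed.
        "E_TP2_numerator": t_ny + t_2,
    }


if __name__ == "__main__":
    T_Y, T_NY = 10, 2                               # days, as assumed in the text
    for style, t_2 in [("FPGA / standard cell", 10), ("Custom ASIC", 100)]:
        row = table_row(T_Y, T_NY, t_2)
        print(f"{style:22s} E[T2]={t_2:3d}  "
              f"p_critical={row['p_critical']:.3f}  "
              f"E[TP1]={row['E_TP1']}  "
              f"E[TP2]={row['E_TP2_numerator']}/(1-p)")
```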
7 Conclusions
The objectives of this chapter were to apply HCFG and to stress the need for design planning in
MCM based design of large systems. We analyzed two design flows, consisting of partitioning
with and without manufacturability considerations. The model proposed in Chapter ?? has been
used for computation of the design completion time. Our results indicate that for large systems,
taking manufacturing considerations into account during functional partitioning can reduce the
number of design iterations, resulting in lower cost and shorter time-to-market. However, in this
chapter we considered deterministic values for the model parameters, whereas in the next chapter,
i.e. Chapter ??, we treat the model parameters of the design flow as stochastic variables.
References
[1] P. Dehkordi, K. Ramamurthi, D. Bouldin, H. Davidson, and P. A. Sandborn, “Impact
of packaging technology on system partitioning: A case study,” in IEEE International
Conference on Multichip Module, 1995.
[2] E. E. Swartzlander, “VLSI, MCM and WSI : A design comparison,” IEEE Design and Test
Magazine, pp. 28–34, Sept. 1998.
[3] P. Dehkordi and D. Bouldin, “Design for packageability : Early consideration of packaging
from a VLSI designer’s viewpoint,” IEEE Computer Magazine, pp. 76–81, Apr. 1993.
[4] S. Khan and V. K. Madisetti, “System partitioning of MCMs for low power,” IEEE Design
and Test Magazine, pp. 41–52, Spring 1995.
[5] P. A. Sandborn and M. Vertal, “Analyzing packaging trade-offs during system design,”
IEEE Design and Test Magazine, pp. 10–19, July-Sept. 1998.
[6] V. Sahula and C. P. Ravikumar, “Yield oriented design planning for MCM based systems,”
in IMAPS Conference on Emerging Microelectronics and Interconnection Technology, In-
dia, 2000.
[7] W. Maly, “Cost of silicon viewed from VLSI design perspective,” in Proceedings of
IEEE/ACM Design Automation Conference, 1994.
[8] J. P. de Gyvez and D. K. Pradhan, Design for Manufacturability: The art of process and
design integration, IEEE Press, 1998.
[9] H. T. Heineken and W. Maly, “Performance-Manufacturability tradeoffs in IC design,” in
Proceedings of IEEE/ACM Design Automation and Test in Europe Conference, 1998.
[10] “International Technology Roadmap for Semiconductors,” http://pub.itrs.net, 1999.
[11] S. Gupta, A. Srivastava, Chandrashekher, and C. P. Ravikumar, “System partitioning and
technology selection,” in 2nd IEEE VLSI Design and Test Workshops, INDIA, Aug. 1998.
[12] C. F. Fey, “Custom LSI/VLSI chip design productivity,” IEEE Journal of Solid State
Circuits, vol. SC-20, no. 2, pp. 555–561, Apr. 1985.
[13] C. F. Fey and D. E. Paraskevopoulos, “Studies in VLSI technology economics IV: Models
for gate array design productivity,” IEEE Journal of Solid State Circuits, vol. 24, no. 4, pp.
1085–1091, Aug. 1989.
[14] D. E. Paraskevopoulos and C. F. Fey, “Studies in LSI technologies III: Design schedule for
application specific integrated circuits,” IEEE Journal of Solid State Circuits, vol. SC-22,
no. 2, pp. 223–229, Apr. 1987.