
System on Chip (SoC)

Dr. Bharat Garg


Assistant Professor, ECED
Thapar Institute of Engineering & Technology
Syllabus: SOC (PVL333)

Course Learning Outcomes

The student will be able to


1. Acquire knowledge about Top-down SoC design flow.
2. Understand the ASIC Design flow and EDA tools.
3. Acquire knowledge about Front-end and back-end chip
design.
4. Understand the design of communication networks.
5. Understand the design space exploration.
6. Understand the design methodologies for SoC.

Recommended Books

1. Wolf, W., Modern VLSI Design: System-on-chip Design,


Prentice Hall (2002) 3rd ed.
2. Nekoogar, F. and Nekoogar, F., From ASICs to SOCs: A
Practical Approach, Prentice Hall (2003).
3. Uyemura, J.P., Modern VLSI Design – SOC Design, Prentice
Hall (2001).
4. Rajsuman, R., System-on-a-chip: Design and Test, Artech
House (2000).
5. Ashenden, P.J. and Mermet, J., System-on-Chip Methodologies
and Design Languages, Kluwer Academic (2002).

Evaluation Scheme

Introduction to SOC

Introduction to SoC – History
• First-generation chips contained only a few transistors.
• Today's silicon technology allows us to build chips consisting of
hundreds of millions of transistors (e.g., the 0.09-micron Intel Pentium IV).
This technology has enabled new levels of system integration onto a
single chip.
• Mobile phones, portable computers and internet applications will be
built using a single chip.
• The demand for more powerful products and the huge capacity of
today's silicon technology have moved System-on-Chip (SoC)
design from the leading edge to the mainstream design process.
• SoC technology puts the maximum amount of technology into the
smallest possible space.

Evolution of Microelectronics: the SOC
Paradigm

Yesterday's chips are today's functional blocks

System On Chip

• An IC that integrates multiple components of a system onto a


single chip.
• A multiprocessor SoC (MPSoC) addresses demanding performance requirements.

S3C6410 based Mobile Processor

A Representative 2G/2.5G Cell Phone

Example of Complex SoC


SOC Concept

Paradigm Shift in SoC Design

What is SOC

• SoC: more of a system than a chip
– In addition to the IC, an SoC consists of software and an interconnection
structure for integration.
• An SoC may consist of all or some of the following:
– Processor/CPU cores
– On-chip interconnect (buses, networks, etc.)
– Analog circuits
– Accelerators or application-specific hardware modules
– ASIC logic
– Software: OS, applications, etc.
– Firmware

SOC (Cont.)

• Technological advances
– Today's chips can contain 100M transistors
– Transistor gate lengths are now measured in nanometers
– Approximately every 18 months the number of transistors on a chip
doubles – Moore's Law
• The consequences
– Components connected on a Printed Circuit Board can now be
integrated onto a single chip
– Hence the development of System-on-Chip design

Major SOC Applications
• Speech Signal Processing
• Image and Video Signal Processing
• Information Technologies
• PC interfaces (USB, PCI, PCI-Express, IDE, etc.), computer
peripherals (printer controllers, LCD monitor controllers, DVD
controllers, etc.)
• Data Communication
• Wired communication: 10/100Base-T, xDSL, Gigabit
Ethernet, etc.
• Wireless communication: Bluetooth, WLAN, 2G/3G/4G,
WiMAX, UWB, etc.

Current Mobile SOCs

TI OMAP5430 SOC

Moore’s Law

The performance of an IC, including the number of components
on it, doubles every 18-24 months at the same chip price – Gordon Moore
The size advantage

Evolution of Semiconductor Device
Technology

Benefits

• There are several benefits in integrating a large digital system
into a single integrated circuit.
• These include:
– Lower cost per gate
– Lower power consumption
– Faster circuit operation
– More reliable implementation
– Smaller physical size
– Greater design security

Drawbacks

• The principal drawbacks of SoC design are associated with the
design pressures imposed on today's engineers, such as:
– Time-to-market demands
– Exponential fabrication cost
– Increased system complexity
– Increased verification complexity

Gap in Current Technology Demand and
Supply

System on Chip Cores

• One solution to the design productivity gap is to make ASIC


designs more standardized by reusing segments of previously
manufactured chips.
• These segments are known as Blocks, Macros, Cores or Cells.
• The blocks can either be developed in-house or licensed from
an IP company
• Cores are the basic building blocks for SOCs.

Intellectual Property

• In today's rapidly evolving technology, time to market is a
very important factor in the electronic design business.
• From a business point of view, a manufacturer always needs to
bring new products to market as early as possible.
• The reuse of existing, verified designs in a new product is
key to saving design time.

Cont…

• IC designers typically use predesigned modules to avoid


reinventing the wheel for every new product.
• By practicing design-reuse techniques—that is, using blocks
that have been designed, verified, and used previously—
various blocks of a large ASIC/SOC can be assembled quite
rapidly.
• Another advantage of reusing existing blocks is that it reduces the
risk of failure that comes with designing and verifying a
block for the first time. These predesigned modules are
commonly called Intellectual Property (IP) cores or Virtual
Components (VC).

Cont…
• Designing an IP block generally requires greater effort and
higher cost. However, due to its reusable architecture, once an
IP is designed and verified, its reuse in future designs saves
significant time and effort in the long run.
• Designers can either source these reusable blocks from
third-party IP vendors or design them in-house.

Resources Vs Number of Uses Plot
IP Core Licensing

• Licensing IP cores from IP provider companies
has become more popular in the electronics industry
than designing in-house reusable blocks, for the
following reasons:
– Lack of expertise in designing application-specific reusable building
blocks.
– Savings in time and cost to produce more complex designs when using
third-party IP cores.
– Ease of integration of available IP cores into more complicated
systems.
– Commercially available IP cores are pre-verified and reduce the design
risk.
– Significant improvement to the product design cycle.

Intellectual Property Categories

• To provide various levels of flexibility for reuse and
optimization, IP cores are classified into three distinct
categories:
– Soft IP
– Firm IP
– Hard IP

Soft IP Core
• Soft IP cores are delivered as RTL (VHDL/Verilog code) to
provide functional descriptions of IPs.
• These cores offer maximum flexibility and re-configurability
to match the requirements of a specific design application.
• Although soft cores provide the maximum flexibility for
changing their features, they must be synthesized, optimized,
and verified by their user before integration into designs.
• Some of these tasks could be performed by IP providers;
however, it's not possible for the provider to support all the
potential libraries.
• Therefore, the quality of a soft IP is highly dependent on the
effort needed in the IP integration stage of SOC design.

Firm IP Core

• These cores are delivered in the form of netlists targeted to
specific physical libraries, after going through synthesis but
without performing the physical layout.

• They can be optimized at the placement and routing level.

Hard IP Core

• Hard IP cores consist of hard layouts using particular physical
design libraries and are delivered as mask-level designed
blocks (GDSII format).
• These cores offer optimized implementation and the highest
performance for their chosen physical library.
• The integration of hard IP cores is quite simple and the core
can be dropped into an SOC physical design with minor
integration effort.
• However, hard cores are technology dependent and provide
minimum flexibility and portability in reconfiguration and
integration across multiple designs and technologies.

Comparison of Different IP Formats
IP Format   Representation     Optimization   Technology               Reusability
Soft        RTL                Low            Technology independent   Very High
Firm        Targeted netlist   High           Technology generic       High
Hard        GDSII              Very High      Technology dependent     Low

Examples of IPs

Category                   Intellectual Property

Processors                 ARM7, ARM9, and ARM10, ARC

Application-specific DSP   ADPCM, CELP, MPEG-2, MPEG-4, Turbo Code, Viterbi,
                           Reed-Solomon, AES

Mixed Signal               ADCs, DACs, Audio Codecs, PLLs, OpAmps, Analog MUX

I/Os                       PCI, USB, 1394, 1284, E-IDE, IrDA

Miscellaneous              UARTs, DRAM Controller, Timers, Interrupt Controller, DMA
                           Controller, SDRAM Controller, Flash Controller, Ethernet 10/100 MAC

SOC Design Flow

SOC Design Flow

• The different and conflicting requirements in SOC design are
increasing design size, deep submicron (DSM) effects and
the necessity for shorter and more predictable implementation times.

• To meet these requirements, SOC designs are implemented by
following one of two design flows:
– Bottom-up design flow
– Top-down design flow
Bottom Up Design Approach
• The design team starts by partitioning the system design into
various subsystems and system components (blocks). The
subsystems are targeted to ASICs, FPGAs, or microprocessors.

• Since these subsystem designs are usually on the critical path


to completing the design, the team starts on these immediately,
developing the other system components in parallel. Each
block is designed and verified based on its own requirements.
When all blocks are complete, system verification begins.

• The bottom-up design approach has the advantages of


focusing on the initial product delivery and of allowing
work to begin immediately on critical portions of the
system.
Bottom Up Design Approach (Cont.…)
• With the bottom-up approach, system-level design errors do not
surface until late in the design cycle and may require costly
design iterations – Disadvantage

• Furthermore, while related products can reuse lower-level


components, they cannot have any system-level similarities in
design architecture, intellectual property, or verification
environment -Disadvantage

• Finally, bottom-up design requires commitment to a


semiconductor technology process early on and hinders the
ability to reuse designs in other technology processes-
Disadvantage
Top Down Design Approach
• The alternative approach is the top-down design approach. In
this approach, the design team invests time in developing
system-level models and verification environment.
• Using the system models, the team is able to analyze trade-offs
in system performance, feature set, partitioning, and
packaging.
• Furthermore, a system-level verification environment ensures
that system requirements are met and provides the
infrastructure for verifying the subsystems and system
components.
• The top-down design approach results in higher confidence
that the completed design will meet the original schedule
and system specifications.
Top Down Design Approach (Cont.)
• Basing the starting point of the system design on a single
verified model ensures that critical design issues surface early
in the process and reduces false starts in the concurrent design
of ASICs, PCBs, and systems.
• The design team can discover and manage system-level issues
up front, rather than having to redesign the system at the end
of the design cycle--Advantage
• Because each subsystem is designed and verified within the
context of the system verification environment, the overall
system functionality is preserved--Advantage.
• The top-down design approach also effectively leverages the
initial product development in the design of related products.
The related projects begin with the system environment in
place. The design team can reuse and re-verify alternative
designs, packages, or implementations without having to
rebuild a new context or infrastructure.
Basic Principles of Top Down Approach

• Understanding the basic principles of top-down design is the


first step toward implementing the best design practices. The
top-down design approach is based on the following
principles:
– Use an HDL or other high-level programming language to create system
and subsystem models as well as reusable cores.
– Validate designs early by developing a system-level verification
environment up front. A system verification environment includes a set
of testbenches and models and a detailed, formal test plan for validation
of the system. The models and testbenches are a "golden" representation
of the design that the team can use to qualify the design of the
components (a minimal sketch follows this list).
– Develop a design for test (DFT) strategy.
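To make the golden-model idea concrete, here is a minimal, hypothetical Verilog sketch: a behavioral reference ("golden") model, a stand-in for the RTL block being qualified, and a self-checking testbench that compares the two. All module and signal names are invented for illustration and are not tied to any particular SoC.

// Behavioral "golden" reference model (not meant for synthesis).
module sat_add8_golden (input [7:0] a, b, output [7:0] y);
  wire [8:0] sum = a + b;                 // 9-bit sum keeps the carry-out
  assign y = sum[8] ? 8'hFF : sum[7:0];   // saturate instead of wrapping
endmodule

// Stand-in for the synthesizable RTL block under qualification.
module sat_add8 (input [7:0] a, b, output reg [7:0] y);
  always @(*) begin
    if ({1'b0, a} + {1'b0, b} > 9'd255)
      y = 8'hFF;
    else
      y = a + b;
  end
endmodule

// Self-checking testbench: drives random stimuli into both the RTL under
// test and the golden model, and flags any mismatch.
module tb_sat_add8;
  reg  [7:0] a, b;
  wire [7:0] y_dut, y_gold;
  integer i, errors;

  sat_add8        dut  (.a(a), .b(b), .y(y_dut));
  sat_add8_golden gold (.a(a), .b(b), .y(y_gold));

  initial begin
    errors = 0;
    for (i = 0; i < 1000; i = i + 1) begin
      a = $random; b = $random;
      #1;                                // let combinational outputs settle
      if (y_dut !== y_gold) begin
        errors = errors + 1;
        $display("MISMATCH a=%h b=%h dut=%h gold=%h", a, b, y_dut, y_gold);
      end
    end
    $display("Done: %0d mismatches in 1000 vectors", errors);
    $finish;
  end
endmodule

In a real SoC flow the golden model is typically a system-level (e.g. transaction-level) model and the test plan is formal and much larger, but the structure is the same: a reference model and the implementation driven by a common verification environment.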
Top-Down Design

• System specification
• Refine architecture/algorithm
• Decompose into blocks
• Design or select macros
• Integrate macros
• Deliver to next level integration
• Verify

SOC Design Process

• Addresses these problems concurrently


– Functionality
– Timing
– Physical design and
– Verification

• Incrementally improving as design converges

• Top-down to combination of top-down and bottom-up


– Bottom-up with critical low-level blocks, reuse soft or hard macros

SOC Methodology Evolving ...
How to Design an SOC
Key to SOC Design Process
• Iteration is an inevitable part of the design process
• The problem is how large the loop is
• Goal
– Minimize the overall design time
• But How
– Plan for iterations
– Minimize the number of iterations (especially major loops)
– Local loops are preferred (e.g. coding, verifying, synthesizing small
blocks)
– IPs clearly help because they are pre-verified
– Parameterized blocks offer more trade-offs between area, performance
and functionality.
• A carefully designed spec is the best way to minimize the loops

System Design Flow

• SoC challenges force chip designers to alter design flow.


– Waterfall model to spiral model
– Top-down methodology to Top-down/Bottom-up combination

Waterfall Model

• Step-to-step handoff
• Fewer feedback paths in the flow
– Possibilities for re-iteration exist

Waterfall Flow Limitations

• Handoff between teams rarely clean


– May have to tell the system designers algorithm will not work

• Doesn’t work for large, deep submicrometer designs


– Software and hardware must be developed concurrently
– Physical design issues must be considered early to meet performance
goals.

Spiral Development Model

• Response to increased complexity and shorter time-to-market


pressure

• Multiple aspects of design are worked on simultaneously.

• Each area is incrementally improved together as the entire


project is completed.

Spiral Methodology

Spiral Methodology

Spiral Design Characteristics

• Concurrent HW & SW development


• Parallel verification and synthesis.
• Floor-planning and place-and-route included in the synthesis
process.
• Modules developed only if a predesigned hard or soft macro is
not available.
• Planned iteration throughout.

Design Metrics
Design Challenges
• Digital integrated circuits experience exponential growth in
complexity (Moore’s law) and performance

• Design in the deep submicron (DSM) era creates new


challenges
– Devices become somewhat different and hence new models are
required for accurate simulation.
– Global clocking becomes more challenging
– Interconnect effects play a more significant role
– Power dissipation may be the limiting factor
Design Issue of SOC (Cont…)

• Due to the use of various hard, firm, and soft cores from
multiple vendors, the SoC design may involve a very high
level of integration complexity, interfacing and
synchronization issues, data management issues, design
verification and test issues, and architectural and system-level issues.

• The use of a wide variety of logic, memory, and analog/mixed-


signal cores from different vendors can cause a wide range of
problems in the design of SoC.
Design Issue of SOC (Cont…)
• Portability/Methodology-related Issues
– Non-netlisted cores
– Layout-dependent step sizes
– Aspect ratio misfits
– Hand-crafted layout

• Timing Issues
– Clock redistribution
– Hard core width and spacing disparities
– Antenna rules mismatch
– Timing reverification
Major Design Issues
• Microscopic issues
– ultra-high speeds
– power dissipation and supply rail drop
– growing importance of interconnect
– noise, crosstalk
– reliability, manufacturability
– clock distribution

• Macroscopic issues
– time-to-market
– design complexity (millions of gates)
– high levels of abstractions
– design for test
– reuse and IP, portability
– tool interoperability

Year   Tech. (µm)   Complexity   Frequency   3-Yr. Design Staff Size   Staff Costs
1997   0.35         13 M Tr.     400 MHz     210                       $90 M
1998   0.25         20 M Tr.     500 MHz     270                       $120 M
1999   0.18         32 M Tr.     600 MHz     360                       $160 M
2002   0.13         130 M Tr.    800 MHz     800                       $360 M
Fundamental Design Metrics
• Functionality
• Cost
– NRE (fixed) costs - design effort
– RE (variable) costs - cost of parts, assembly, test
• Reliability, robustness
– Noise margins
– Noise immunity
• Performance
– Speed (delay)
– Power consumption; energy
• Time-to-market
• Reusability
Cost of Integrated Circuits
• NRE (non-recurring engineering) costs
– Fixed cost to produce the design
• design effort
• design verification effort
• mask generation
– Influenced by the design complexity and designer productivity
– More pronounced for small volume products
• Recurring costs – proportional to product volume
– silicon processing
• also proportional to chip area
– assembly (packaging)
– test
fixed cost
cost per IC = variable cost per IC + -----------------
volume
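As a numerical illustration of this formula (the figures below are assumed for illustration only, not taken from the slides, with an NRE cost of $1,000,000 and a variable cost of $5 per IC):

\[
  \text{cost per IC} = \text{variable cost per IC} + \frac{\text{fixed cost}}{\text{volume}}
\]
\[
  \text{volume}=10^{5}:\;\; 5 + \frac{10^{6}}{10^{5}} = \$15
  \qquad\qquad
  \text{volume}=10^{6}:\;\; 5 + \frac{10^{6}}{10^{6}} = \$6
\]

This is the sense in which NRE cost is far more pronounced for small-volume products.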
NRE Cost is Increasing
Silicon Wafer

Figure: silicon wafer containing many dies; a single die is cut from the wafer
(from http://www.amd.com)
Recurring Costs

cost of wafer
cost of die = -----------------------------------
dies per wafer × die yield

\[
  \text{dies per wafer} =
  \frac{\pi \times (\text{wafer diameter}/2)^{2}}{\text{die area}}
  \;-\;
  \frac{\pi \times \text{wafer diameter}}{\sqrt{2 \times \text{die area}}}
\]
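A worked example with assumed values (300 mm wafer, 100 mm² die, $5000 wafer cost, 80% die yield; these numbers are illustrative only):

\[
  \text{dies per wafer} = \frac{\pi\,(300/2)^{2}}{100} - \frac{\pi \times 300}{\sqrt{2 \times 100}}
  \approx 707 - 67 \approx 640
\]
\[
  \text{cost of die} = \frac{\text{cost of wafer}}{\text{dies per wafer} \times \text{die yield}}
  = \frac{5000}{640 \times 0.8} \approx \$9.8
\]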
Reliability

• Noise – unwanted variations of voltages and currents at the
logic nodes
• From two wires placed side by side:
– capacitive coupling: a voltage change on one wire can influence the signal on the
neighboring wire (cross talk)
– inductive coupling: a current change on one wire can influence the signal on the
neighboring wire
• From noise on the power and ground supply rails (VDD/GND):
– can influence signal levels in the gate
Example of Capacitive Coupling
• Signal wire glitches as large as 80% of the supply voltage will
be common due to crosstalk between neighboring wires as
feature sizes continue to scale.
Crosstalk vs. Technology

Figure: glitch strength vs. technology for a pulsed signal coupling onto a quiet line,
shown for 0.12 µm, 0.16 µm, 0.25 µm and 0.35 µm CMOS (black line: quiet victim,
red lines: pulsed aggressors). From Dunlop, Lucent, 2000.


Static Gate Behavior
• Steady-state parameters of a gate – static behavior – tell how
robust a circuit is with respect to both variations in the
manufacturing process and to noise disturbances.
• Digital circuits perform operations on Boolean variables x ∈ {0, 1}
• A logical variable is associated with a nominal voltage level
for each logic state:
1 ⇔ VOH and 0 ⇔ VOL, with VOH = !(VOL) and VOL = !(VOH)
• The difference between VOH and VOL is the logic or signal swing Vsw
Noise Margins
• For robust circuits, we want the "0" and "1" intervals to be as
large as possible.

Noise Margin High: NMH = VOH - VIH
Noise Margin Low:  NML = VIL - VOL

(VOH and VOL are defined at the gate output, VIH and VIL at the gate input;
the region between VIL and VIH is undefined.)

• Large noise margins are desirable.
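A small worked example with assumed logic levels (values chosen for illustration, not from the slides): VOH = 2.4 V, VOL = 0.4 V, VIH = 2.0 V, VIL = 0.8 V.

\[
  NM_H = V_{OH} - V_{IH} = 2.4 - 2.0 = 0.4\ \text{V}
  \qquad
  NM_L = V_{IL} - V_{OL} = 0.8 - 0.4 = 0.4\ \text{V}
\]

Any noise spike smaller than 0.4 V superimposed on a legal output level is still interpreted correctly by the receiving gate.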


Noise Immunity
• Noise margin expresses the ability of a circuit to overpower a
noise source
– Noise sources: supply noise, cross talk, interference, offset

• Absolute noise margin values are deceptive
– A floating node is more easily disturbed than a node driven by a low
impedance (in terms of voltage)

• Noise immunity expresses the ability of the system to process
and transmit information correctly in the presence of noise

• For good noise immunity, the signal swing (i.e., the difference
between VOH and VOL) and the noise margin have to be large
enough to overpower the impact of fixed sources of noise
Fan-In and Fan-Out

• Fan-out – the number of load gates (N)
connected to the output of the driving gate
– gates with large fan-out are slower

• Fan-in – the number of inputs (M) to the gate
– gates with large fan-in are bigger and slower
Time To Market

• The time-to-market for an SOC design should be kept as short as possible.

• Design reusability contributes to reducing the time-to-market of an
SOC design.
Design Deliverables: Soft IP Core
• Synthesizable Verilog/VHDL
• Example synthesis script
• RTL compiled module
• Structural compiled module
• Design, timing, and synthesis cells
• Functional simulation testbench
• Installation script
• Bus functional models and monitors used in testbenches
• Testbenches with sample verification tests
• Cycle-based simulation or emulation models
• Bus functional models
• Application note that describes signal slew rate at the inputs, clock
skew tolerance, output-loading range, and test methodology
Design Deliverables: Hard IP Core

• Installation scripts
• ISA (instruction set architecture) or behavioral model of the
core
• Bus functional and fully functional models for the core
• Cycle-based emulation model (on request)
• Floor planning, timing, and synthesis models
• Functional simulation test bench
• Bus functional models and monitors used in test benches
• Test benches with verification tests
• Manufacturing tests
• GDSII with technology file
Improvement Techniques For Specific
Design Metrics
Fundamental Design Metrics

• Functionality (No. Of features)


• Performance
• Power consumption
• On Chip Testability
• Time-to-market
• Reusability
Improving/Adding Features

• The addition of features contributes to the complexity of an SOC
design. The need for data exchange between different
macros also makes the integration process complex.

• To address these issues, the following design techniques are used:
– Using a suitable design flow to handle the complexity of the design
– Using a standard bus architecture for communication between
macros
Design Flow for complex SOC Designs

• To handle the complexity of SOC designs, the designer must follow
an appropriate design flow instead of going with the traditional ASIC
design flow.

• The traditional model for ASIC development is often called the
waterfall model. In a waterfall model, the project transits from
one phase to another phase in a step function, never returning
to the activities of the previous phase.
Waterfall Model
Weaknesses of the Waterfall Model
• Software development team can make little progress before hardware
model is available for debugging purpose. Thus, hardware and software
development are essentially serialized.
• Design handoffs from one team to the next are rarely clean. For
example, the RTL design team may have to go back to the system
designer and tell him that the algorithm is not implementable, or the
synthesis team may have to go back to the RTL team and inform them
that the RTL must be modified to meet timing.
• For large, deep submicron designs, this waterfall methodology simply
does not work. Large systems have sufficient software content that
hardware and software must be developed concurrently to ensure
correct system functionality.
• Physical design issues must be considered early in the design process to
ensure that the design can meet its performance goals.
• For the above reasons, a top-down design flow, or a combination of top-
down and bottom-up flows, is used for complex SOC designs.
Performance

• Meeting performance/timing goals is a major requirement for


an SOC design.

• Designers need to deal with performance/timing related issues
at two different levels, namely:
– Microscopic level (at gate level)
– Macroscopic level (at module/IP level)
Gate level timing issues

• In DSM technologies, the RC delay for the wires between the


gates can be much larger than the intrinsic delay of the gate.
• Wire load models provide estimates of these wire delays for
synthesis, but these are only estimates.
• As blocks become larger, the variance between the average
delay and the actual delay on worst case wires can become
quite large.
• The problem is that the architect and designer do not know
which wires will require additional buffering until physical
design.
• If timing problems are severe enough to require architectural
changes, such as increasing the pipeline depth, then other
blocks, and even software, may be affected.
Addressing the problem of Gate level timing
• To meet timing constraints, it may be necessary to increase the
drive strengths of cells driving long wires.
 For very long wires, additional buffers must be inserted at intermediate
points between the gates to assure acceptable rise and fall times as well
as delays.
• Timing-driven place and route tools can help deal with some
of these timing problems by attempting to place critical timing
paths so as to minimize total wire length.
• Physical synthesis, which combines synthesis with timing-
driven placement, has done a good job in managing the
problems of achieving timing closure in deep submicron
designs.
• But these tools cannot correct for fundamental architectural
errors, such as an insufficient number of pipeline stages.
Addressing the problem of Module level timing
• The on-chip communication architecture must provide latency
or bandwidth guarantees to ensure that the application
performance constraints are satisfied.
• A latency guarantee implies that a data unit must traverse the
communication architecture and reach its destination within a
finite amount of time, determined by a latency bound (e.g., 40
ns from source to destination).
• A bandwidth guarantee implies that a group of data units must
traverse a portion of the communication architecture at a
certain data rate, as determined by the bandwidth requirements
(e.g., 100 megabits/second from source to destination).
• Depending on the performance requirements of an application,
various types of on-chip communication architectures are
required.
Need of Standard Bus Architecture
• The modules in an MPSoC design invariably need to communicate
with each other during application execution. For instance, a 𝜇p
fetches instructions from memory components, or writes to external
memories by sending data to an on-chip memory controller.
• Designing separate communication channels (routes) from each module
to every other module, with different transmit data sizes, increases the
complexity of the design exponentially.
• Standard Bus architectures are preferred for intra chip
communications.
• A few examples of standard bus architectures are:
– AMBA
– AHB (part of the AMBA family)
AMBA (Advanced Microcontroller Bus
Architecture)
• ARM's AMBA protocol is an open-standard, on-chip
interconnect specification for the connection and management
of functional blocks in an SoC.
• It facilitates the development of multi-processor designs with
large numbers of controllers and peripherals.
• AMBA promotes design re-use by defining common interface
standards for SoC modules.
Design Techniques For Low Power

• Traditionally, design teams have used full-custom design


techniques to achieve low power (such as multi Vt MOS), but
this approach does not give the technology portability required
for reuse-based design.

• Techniques used for both low-power and reusable designs are
as follows:
– Lowering the Supply Voltage
– Reducing Capacitance and Switching Activity
– Memory Architecture
– Clock Distribution
– Sizing
Lowering the Supply Voltage
• Running the core of the chip at the lowest possible voltage (consistent with
correct functionality) is the first step in achieving a very low power design.
• Unfortunately, lowering the supply voltage has several adverse effects
which must be overcome in other areas of design.
• The primary problem with lowering the supply voltage is that it slows the
timing performance of the chip.
– To compensate for this factor, designers typically use pipelining and
parallelism to increase the inherent performance of the design.
Although this increases area of the design, and thus the overall
capacitance, the end result can lower power significantly .
• I/O voltages must meet the requirements of the board design, and are
usually higher (3.3v to 5v) than the minimum voltage that the process will
support. Most designers run the I/O at the required voltage, and use a
separate, lower voltage power supply for the core logic of the chip.
Reducing capacitance and Switching Activity

• The standard cell library provider can use a variety of


techniques to produce a low power library.

• Once we have selected a good low-power library, we can use


architectural and design techniques to reduce system power.

• In real chips, memory design, I/O cells, and the clocking


network often dominate the overall power.
Memory Architecture

• Reducing power in the on-chip memories again involves both


circuit and architectural techniques.

• Most silicon providers have memory compilers that can


produce a variety of memory designs that trade off area,
power, and speed.

• The memory architecture itself can reduce power significantly.


Instead of using a single, deep memory, it may be possible to
partition the memory into several blocks.
Memory Architecture (Cont…)

• Only the block being accessed is powered up. This approach


again produces redundant logic (in extra decode logic), so it
reduces power at the expense of (slightly) increasing area.

• The technique is shown in Figure below.

Figure: Multi-block RAM architecture
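The multi-block idea can be sketched in RTL roughly as follows: a hypothetical 4-bank wrapper in which the upper address bits select one bank, so only that bank is accessed in a given cycle. Widths, depth and signal names are assumptions for illustration; in a real chip each bank would be a memory-compiler macro rather than a register array.

// Hypothetical 4-bank RAM wrapper: only the addressed bank is enabled,
// so the unaddressed banks do not dissipate access power.
module banked_ram #(parameter AW = 12, DW = 32) (
  input               clk,
  input               en,         // overall access enable
  input               we,
  input  [AW-1:0]     addr,       // addr[AW-1:AW-2] selects one of 4 banks
  input  [DW-1:0]     wdata,
  output reg [DW-1:0] rdata
);
  localparam BANK_AW = AW - 2;

  // One small memory array per bank (a compiled macro in practice).
  reg [DW-1:0] bank0 [0:(1<<BANK_AW)-1];
  reg [DW-1:0] bank1 [0:(1<<BANK_AW)-1];
  reg [DW-1:0] bank2 [0:(1<<BANK_AW)-1];
  reg [DW-1:0] bank3 [0:(1<<BANK_AW)-1];

  wire [1:0]         sel   = addr[AW-1:AW-2];   // the extra decode logic
  wire [BANK_AW-1:0] baddr = addr[BANK_AW-1:0];

  always @(posedge clk) begin
    if (en) begin
      case (sel)                                // only one bank is touched
        2'd0: if (we) bank0[baddr] <= wdata; else rdata <= bank0[baddr];
        2'd1: if (we) bank1[baddr] <= wdata; else rdata <= bank1[baddr];
        2'd2: if (we) bank2[baddr] <= wdata; else rdata <= bank2[baddr];
        2'd3: if (we) bank3[baddr] <= wdata; else rdata <= bank3[baddr];
      endcase
    end
  end
endmodule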


Clock Distribution

• In pipelined designs, a significant portion of the overall power


is in the clock, so reducing power in the clock distribution
network is important.

• As few different clocks as possible should be used. Single


clock, flop-based designs can reduce power by 50%.

• Shutting down clock distribution to part of the circuit by clock


gating can significantly reduce chip power. Clock gating,
however, can be very technology dependent; careful design is
required to assure a portable, reusable design.
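Clock gating is commonly built around a latch-based integrated clock gating (ICG) structure; a minimal hand-written sketch is shown below. The names are assumptions, and in a real flow the synthesis tool usually inserts a characterized ICG cell from the standard cell library rather than this hand-coded latch, which is what keeps the gating technology-portable.

// Latch-based clock gate: the enable is captured while the clock is low,
// so the gated clock cannot glitch when 'enable' changes.
module clock_gate (
  input  clk,
  input  enable,    // functional enable from the block's idle/active logic
  output gclk       // gated clock driving the block's flip-flops
);
  reg enable_latched;

  always @(clk or enable)
    if (!clk)
      enable_latched <= enable;   // transparent latch, open when clk is low

  assign gclk = clk & enable_latched;
endmodule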
Sizing

• Gate sizing can produce a significant power savings in many


designs.

• This technique consists of reducing the drive strength of gates


to the lowest level that meets the timing requirements for the
design.

• Synthesis tools can do this automatically, without any


requirement for changing the RTL code.
On Chip Testability
• The design team must develop a strategy for the bring-up and debug
of the SoC design at the beginning of the design process. The most
effective debug strategies usually require specific features to be
designed into the chip.
• Adding debug features early in the design cycle greatly reduces the
incremental cost of these features, in terms of design effort and
schedule.
• Adding debug features after the basic functionality is designed can
be difficult or impossible, and very time consuming.
• Without effective debug structures, even the simplest of bugs can be
very difficult to troubleshoot on a large SoC design.
• Controllability and observability are the keys to an easy debug
process.
Controllability & Observability
• Controllability: By controllability, from a DFT point of view, we mean that both
'0' and '1' can be propagated to each and every node within the target
patterns. A point is said to be controllable if both '0' and '1' can be
propagated to it through scan patterns. To achieve DFT coverage for a node, it
needs to be controllable.

• What if a node is not controllable: If a node is not controllable, it cannot be


tested. For production mode devices, it is necessary to have a certain minimum
percentage of nodes controllable to ensure reliable devices for the customers.
So, fewer controllable nodes mean less DFT coverage and, hence, a less
reliable device.
Controllability & Observability

– Inserting control points (enhancing controllability): A node
can be made controllable by inserting control points.

– If the test coverage target is not being met through the target number
of patterns, control points are inserted to increase the test
coverage.
Controllability & Observability
• Observability implementation is a major
problem for SoC designs. We’d like to be able
to put logic analyzer probes on internal nodes
in the chip, and debug the chip the way we
debug boards.
– For SoC designs, we can come close to this by
adding additional circuitry on the chip to aid
observability.
– We can add circuitry to monitor buses to check
data transactions and detect illegal transactions.
– Another useful approach is to provide a
mechanism for observing the internal bus(es)
on the chip's I/O pins. This is often done by
muxing the bus onto existing I/O pins (see the sketch below).
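A minimal sketch of that last technique, assuming a hypothetical debug-mode select and a 16-bit slice of an internal bus shared with existing GPIO outputs (all names are invented for illustration):

// When debug_mode is asserted, a slice of the internal bus is driven onto
// existing GPIO pins so it can be observed with a logic analyzer.
module debug_obs_mux (
  input         debug_mode,        // test/debug mode select (e.g. a strap pin)
  input  [15:0] internal_bus,      // internal signals chosen for observation
  input  [15:0] gpio_func_out,     // the pins' normal functional outputs
  output [15:0] gpio_out           // value actually driven onto the pads
);
  assign gpio_out = debug_mode ? internal_bus : gpio_func_out;
endmodule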
Reusability

• To provide the highest reuse benefits, IP should have these
features:
– Configurable to meet the requirements of many different
designs
– Standard interfaces
– Complete set of deliverables to facilitate integration into a
chip design
Configurability
• Most IP has to be configurable to meet the needs of many
different designs (and if it doesn’t meet the needs of many
different designs, it is not worth making the investment
required to make it reusable). For example:
– Processors may offer different implementations of multipliers,
caches, and cache controllers.
– Interface blocks like USB may support multiple configurations
(low-speed, full-speed, high-speed) and multiple interfaces for
different physical layer interfaces.
– Buses and peripherals may support configurable address and
data bus widths, arbitration schemes, and interrupt capability.

• Configurability is key to the usability of IP, but also poses


great challenges, since it makes the core harder to verify.
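As a hedged illustration of configurability at the RTL level, the hypothetical parameterized FIFO below exposes its data width and depth as Verilog parameters, so a single verified description can serve many designs. Names and default values are assumptions, and the depth is assumed to be a power of two.

// Parameterized synchronous FIFO: width and depth are set per instance.
module param_fifo #(
  parameter WIDTH = 32,                    // data bus width
  parameter DEPTH = 16                     // number of entries (power of 2)
) (
  input                  clk,
  input                  rst_n,
  input                  wr_en,
  input      [WIDTH-1:0] wr_data,
  input                  rd_en,
  output     [WIDTH-1:0] rd_data,
  output                 full,
  output                 empty
);
  localparam AW = $clog2(DEPTH);

  reg [WIDTH-1:0] mem [0:DEPTH-1];
  reg [AW:0]      wr_ptr, rd_ptr;          // one extra bit to tell full/empty

  assign full  = (wr_ptr[AW] != rd_ptr[AW]) &&
                 (wr_ptr[AW-1:0] == rd_ptr[AW-1:0]);
  assign empty = (wr_ptr == rd_ptr);
  assign rd_data = mem[rd_ptr[AW-1:0]];

  always @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      wr_ptr <= 0;
      rd_ptr <= 0;
    end else begin
      if (wr_en && !full) begin
        mem[wr_ptr[AW-1:0]] <= wr_data;
        wr_ptr <= wr_ptr + 1'b1;
      end
      if (rd_en && !empty)
        rd_ptr <= rd_ptr + 1'b1;
    end
  end
endmodule

// Two differently configured instances of the same verified core:
//   param_fifo #(.WIDTH(8),  .DEPTH(64)) uart_fifo (...);
//   param_fifo #(.WIDTH(64), .DEPTH(16)) dma_fifo  (...);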
Standard Interfaces
• In an SOC design, several components have an interface to the
outside world consisting of a set of pins that are responsible
for sending/receiving addresses, data, and control information
to/from other components.

• The choice of pins at the interface is governed by the particular


bus protocol of the communication architecture.

• In order to seamlessly integrate all these components into an


SOC design, it is necessary to have some kind of a standard
interface definition for the components.
Standard Interfaces (Cont…)

• Without a standard interface, the component interfaces will not


be compatible with the bus architecture implementation, and
consequently will not function correctly.

• In such a scenario, the components will require the design of


logic wrappers at their interfaces to correctly interface with the
bus architecture being used.

• These logic wrappers require additional area on the chip and


can be time consuming to design and verify.
ASIC Design Flow
Design Flow
• Adapt circuit design style to market requirement
• Parameters:
– Cost
– Performance
– Volume
• Full Custom
– Maximal freedom
– High performance blocks
– Slow
• Semi-custom
– Standard Cells
– Gate Arrays
– Mask Programmable (MPGAs)
– Field Programmable (FPGA)



Full Custom IC Design



ASIC Design Flow



Semi-custom IC Design

• Almost all digital ICs are designed and manufactured using
semicustom IC methodologies.

• Typically this refers to HDL-based IC design with automatic
synthesis and place-and-route EDA tools.

• Semi-custom IC design methodology has three major


categories:
– Standard cell based ASIC design
– Gate array
– FPGA



Overview of Semicustom ASIC Design Flow

• Front End VLSI Design Flow


– HDL Coding
– Functional Verification
• ASIC Netlist Generation Flow
– Synthesis
– DFT
– Formal Verification
– Pre-layout STA
• ASIC Physical Design Flow
– Place and Route
– Extraction
– Post-layout STA
– Physical Verification



ASIC Design Flow
Figure: ASIC design flow – behavioral model in VHDL/Verilog (verify function) →
synthesis, with DFT/BIST insertion and ATPG producing test vectors → gate-level
netlist (verify function) → transistor-level netlist for full-custom ICs, or
standard-cell IC / FPGA-CPLD netlist (verify function and timing) →
map/place/route → physical layout, with DRC & LVS verification and timing
verification → IC mask data / FPGA configuration file
RTL design flow
Figure: RTL design flow – HDL source (simulate) → RTL synthesis (or manual
design) → netlist → logic optimization using libraries/module generators →
optimized netlist → physical design → layout
Input and output from ASIC synthesis flow



Synthesis Flow
Physical Design: Overall Conceptual Flow



Floorplanning

• Output from partitioning is used for floorplanning

• Inputs:
– Blocks with well-defined shapes and area
– Blocks with approximated area and no particular shape
– Netlist specifying block connections

• Outputs:
– Locations for all blocks

Floorplanning problem
• Objectives
– Minimize area
– Reduce wire-length
– Maximize routability
– Determine shapes of flexible blocks
• Constraints
– Shape of each block
– Area of each block
– Pin locations for each block
– Aspect ratio

Placement

• The process of arranging circuit components on a layout


surface

• Inputs : Set of fixed modules, netlist

• Output : Best position for each module based on various cost


functions

• Cost functions include wirelength, wire routability, hotspots,


performance, I/O pads

Good placement vs Bad placement

• Good placement
– No congestion
– Shorter wires
– Fewer metal levels
– Smaller delay
– Lower power dissipation

• Bad placement
– Congestion
– Longer wire lengths
– More metal levels
– Longer delay
– Higher power dissipation
Routing
• Connect the various standard cells using wires

• Input:
– Cell locations, netlist

• Output:
– Geometric layout of each net connecting various standard cells

• Two-step process
– Global routing
– Detailed routing

FPGA vs. ASIC



Major EDA Companies and their tools



ASIC Back End Design Flow
Introduction

The ASIC back-end design flow consists of the following steps:
– Acquiring Gate Level Netlist and Design Constraints
– Partitioning
– Floorplanning
– Placement
– Routing
– Verification
Gate Level Netlist

• Physical design team gets Gate Level netlist from Front End
Design team. The netlist is a logical description of the ASIC

• Verilog gate-level netlists are widely used owing to their ease


of understanding and clear syntax.

• Although the behavioral Verilog language has a vast number of


keywords, there are only a few that may be used to represent
the entire circuit function and connectivity.
Netlist Example

Structural Verilog Format
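A hedged illustration of what such a structural netlist looks like: a 2-to-1 mux feeding a flip-flop, built purely from cell instances. The cell names (INV_X1, AND2_X1, OR2_X1, DFF_X1) and their pin names are placeholders standing in for whatever the target standard cell library actually provides; their Verilog models come with the vendor library.

// Structural (gate-level) Verilog: only module instances and nets,
// no behavioral code.
module mux_dff (input d0, d1, sel, clk, output q);
  wire sel_n, t0, t1, mux_out;

  INV_X1  u_inv  (.A(sel),  .ZN(sel_n));              // sel_n = ~sel
  AND2_X1 u_and0 (.A1(d0),  .A2(sel_n), .ZN(t0));     // t0 = d0 & ~sel
  AND2_X1 u_and1 (.A1(d1),  .A2(sel),   .ZN(t1));     // t1 = d1 & sel
  OR2_X1  u_or   (.A1(t0),  .A2(t1),    .ZN(mux_out));// 2:1 mux output
  DFF_X1  u_ff   (.D(mux_out), .CK(clk), .Q(q));      // registered output
endmodule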


Design Constraints

• Design constraints are ASIC design specifications that are


applied during logic and physical synthesis.

• Each tool attempts to meet two general design constraints:
– Timing constraints: STA, DTA
– Design rule constraints
Timing Analysis

There are two ways to perform Timing Analysis


• Dynamic Timing Analysis requires a set of input vectors to
check the timing characteristics of the paths in the design. If
we have N inputs then we need 2^N simulation
combinations to get a full timing analysis.

• Static Timing Analysis checks timing violations without
simulations. This is faster but doesn't check functionality
issues.



Static Timing Analysis

• Static Timing Analysis is a method for determining if a circuit


meets timing constraints without having to simulate so it is
much faster than timing-driven, gate-level simulation.

• EDA tools check setup, hold and removal constraints, clock


gating constraints, maximum frequency and any other design
rules.

• They take design netlist, timing libraries, delay information


and timing constraints as Inputs to perform static timing
analysis.



Timing Path

• There are four timing paths as shown in the figure:


– Input to Register path
– Register to Output path
– Register to Register path
– Input to Output path
• We also can divide timing constraints into 3 categories:
– Clocking Requirements
– Boundaries Settings
– Timing Exceptions



Clocks
A clock can be defined as follows:
– Clock Source, may be a "Port", "Net", "Pin" or "Virtual" clock
– Clock Period
– Duty Cycle
– Clock Skew, Uncertainty
– Clock Latency, due to clock tree propagation
– Rise & Fall time
• // clock A: 10 ns period with 70% duty cycle
• create_clock -period 10 -name ClkA -waveform {0 7} [get_ports A]
• // clock B: 20 ns period with 50% duty cycle and a 5 ns phase shift w.r.t. clock A
• create_clock -period 20 -name ClkB -waveform {5 15} [get_ports B]



Port Delay

• The timing constraints are applied to input and output ports. The
main target is to leave a timing budget for the signal outside the
block. The designer should specify the time at which the inputs
become available at the block, and the time for which a signal
travels outside the block for outputs.
• Input/output Delay: input arrival and output required times
should be considered in the timing constraints as follows
• # assume that T_CLKtoQ + TM = 10 ns
• set_input_delay -clock CLOCK -max 10 [get_ports D]
• # assume that TN + T_setup = 2 ns
• set_output_delay -clock CLOCK -max 2 [get_ports D]



Timing Constraints
• Timing constraints are an important part of designing an ASIC/FPGA to make sure
that it will behave correctly after manufacturing.

• Timing constraints are user-specified and are related to period,


frequency, net skew, maximum delay between end points, or
maximum net delay…

• The most basic timing constraints are as follows:
– System clock definition and clock delays
– Multiple cycle paths
– Input and output delays
– Minimum and maximum path delays
– Input transition and output load capacitance
– False paths
System Clock Definition and Clock Delays
• System clocks, and their delays, are extremely important
constraints in an ASIC. System clocks are supplied externally.

• All delays, especially in a synchronous ASIC design are


dependent upon the system clocks.

• Most logic synthesis tools consider the clock network delays
to be ideal (i.e. a clock with fixed latency and zero skew)
during design synthesis.

• The physical design tools use the system clock definition to


perform Clock Tree Synthesis (CTS) and try to meet the clock
networks’ delay constraints.
Multi Cycle Paths
• Multiple cycle paths are for ASIC designs that have a non-single
cycle clock timing requirement.

• Sometimes a designer might need to provide some additional cycles
before the data is to be captured. A multicycle path does not
limit the system frequency, so we declare it as another timing
exception.

• This directs the physical design tools to avoid optimization of data


paths that have non-single clock behavior.
• set_multicycle_path -setup 2 -from [get_cells FF4] -to [get_cells FF5]
• set_multicycle_path -hold 1 -from [get_cells FF4] -to [get_cells FF5]
Input & Output Delays

• Input and output delays are used to constrain the boundary of


external paths in an ASIC design.

• These constraints specify point-to-point delays from external


inputs to the first registers and from registers to the outputs of
an ASIC design.
Minimum And Maximum Path Delays

• Minimum and maximum path delays provide greater flexibility


for physical synthesis tools that have a point-to-point
optimization capability.

• This means that one can specify timing constraints from one
specific point (e.g. pin or port) in the ASIC design to another,
provided such a path exists between the two specified points.
I/P Transitions and O/P Load Capacitance

• Input transition and output capacitance loads are used to


constrain the input slew rate and output capacitance of an
ASIC device. The constraints have direct effect on the final
design timing.

• The values of these constraints are set to zero during physical


design and PnR activity to ensure that the actual ASIC design
timing is calculated independent of external conditions and to
make sure register to register timing is met.

• Once that is achieved, these external conditions can be applied


to the design for input and output timing optimization.
False Path

• If any path does not affect the output and does not contribute
to the delay of the circuit then that path is called false path.

• Every false path needs to be informed to the STA tool.

• This is because the STA tool considers every path that originates at a valid
start point and ends on a valid end point to be a valid timing path
that needs to be met.
Timing Constraints Commands
• These commands are related to timing specifications of the
design which contains
– Clock definition: create_clock
– Generated clock: create_generated_clock
– Clock transition: set_clock_transition
– Clock uncertainty: set_clock_uncertainty
– Clock latency: set_clock_latency
– Propagated clock: set_propagated_clock
– Disable timing: set_disable_timing
– False path: set_false_path
– Input/Output delay: set_input_delay & set_output_delay
– Min/Max delay: set_min_delay / set_max_delay
– Multicycle path: set_multicycle_path
Design Rule Constraints
• Design rule constraints are imposed upon ASIC designs by
requirements specified in a given standard cell library or
within physical design tools.

• Design rule constraints have precedence over timing


constraints because they have to be met in order to realize a
functional ASIC design.

• There are four types of major design rule constraints:
– Maximum number of fan-outs
– Maximum transition time
– Maximum capacitance
– Maximum wire length
Maximum No. Of Fan Outs

• Maximum number of fan-outs specify the number of


destinations that one cell can connect to for each standard cell
in the library.

• A large fan-out can increase the amount of current passing


through a node beyond safety limits.

• This constraint can also be applied at the ASIC design level


during the physical synthesis to control the number of
connections one cell can make.
Maximum Transition Time

• The transition time of a net is the longest time required for its
driving pin to change logic values.

• Transition time is decided on the basis of rise time and fall
time. This constraint is based on the library data.

• There are two models to calculate output transition time:
– CMOS Delay Model: Delay = Driver R x Load C (see the worked example below)
– Non-Linear Delay Model: transition time from table look-up and
interpolation/extrapolation.
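A quick worked example of the CMOS delay model, with assumed values of 2 kΩ driver resistance and 50 fF load (illustrative only):

\[
  t_{delay} \approx R_{driver} \times C_{load}
  = 2\ \text{k}\Omega \times 50\ \text{fF} = 100\ \text{ps}
\]

Doubling the load (for example by doubling the fan-out) roughly doubles the transition time, which is why maximum-transition, maximum-capacitance and maximum-fan-out rules are enforced together.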
Maximum Capacitance

• This is the maximum output capacitance load that an O/P pin


can drive.

• Total capacitance comprises the load pin capacitance and the
interconnect capacitance.

• This constraint is only available for O/P pins.

• The max capacitance value can vary with the frequency (it may
happen that the library is characterized for multiple frequencies).
Maximum Wire Length

• Maximum wire length constraint is useful for controlling the


length of wire to reduce the possibility of two parallel long
wires of the same type.

• Parallel long wires of the same type may have a negative


impact on the noise injection and may cause crosstalk.
Design Planning

• There are two alternative styles for design implementation of
an ASIC:
– Flat
– Hierarchical

• For small to medium ASICs, flattening the design is most
suited.

• For very large and/or concurrent ASIC designs, partitioning


the design into sub-designs, or hierarchical style, is preferred.
Flat Design Implementation
• The flat implementation style provides better area usage and requires less effort
during physical design and timing closure compared to the hierarchical
style.

• The area advantage is mainly due to there being no need to reserve extra
space around each sub-design partition for power, ground, and resources
for the routing.

• Timing analysis efficiencies arise from the fact that the entire design can be
analyzed at once rather than analyzing each sub-circuit separately and then
analyzing the assembled design later.

• The disadvantage of this method is that it requires a large memory space


for data and run time increases rapidly with design size.
Hierarchical Design Implementation
• The hierarchical implementation style is mostly used for very large and/or
concurrent ASIC designs where there is a need for a substantial amount of
computing capability for data processing.

• In addition, it is used when sub-circuits are designed individually.

• This may degrade the performance of the final ASIC because the
components forming the critical path may reside in different partitions
within the design thereby extending the length of the critical path.

• Therefore, when using a hierarchical design implementation style one


needs to assign the critical components to the same partition or generate
proper timing constraints in order to keep the critical timing components
close to each other and thus minimize the length of the critical path within
the ASIC.
Partitioning

• In a hierarchical design flow, the ASIC design can be partitioned in the
following ways:
– Logical Partitioning
– Physical Partitioning
Logical Partitioning

• Logical partitioning takes place in the early stages of ASIC


design (i.e. RTL coding).

• The design is partitioned according to its logical functions, as


well as physical constraints, such as interconnectivity to other
partitions or sub-circuits within the design.

• In logical partitioning, each partition is placed and routed


separately and is placed as a macro, or block, at the ASIC top
level.
Physical Partitioning

• Physical partitioning is performed during the physical design


activity.
• Once the entire ASIC design is imported into physical design
tools, partitions can be created where a large circuit can be
partitioned into several sub-circuits.
• Most often, these partitions are formed by recursively
partitioning a rectangular area containing the design using
vertical or horizontal cut lines.
• Physical partitioning is used for minimizing delay and
satisfying timing and other design requirements in a small
number of sub-circuits.
Floorplanning

• Regardless of the physical design implementation style, after


physical database creation using the imported netlist and
corresponding library and technology files, the first step is to
determine ASIC device and core width and height.

• The goals of floorplanning are to:
– Arrange the blocks on a chip
– Decide the location of I/O pads
– Decide the location and number of power pads
– Decide the type of power distribution
– Decide the location and type of clock distribution
Floorplanning (Cont…)
• Standard cell rows and I/O pad sites are created. The rows and
I/O pad sites are for standard cell and I/O pad placement.

• The height of a row is equal to the height of the standard cells


in the library.

• If there are any multiple-height standard cells in the library,


they will occupy multiple rows.

• The standard rows are oriented in alternating 180-degree


rotation or are flipped along the X-axis so that the standard
cells can share power and ground busses.
Pad Placement

• Correct I/O pad placement and selection is important for the


correct function of any ASIC design.

• For a given ASIC design, there are three types of I/O pads:
power, ground, and signal.

• It is critical to the functional operation of an ASIC design to ensure
that the pads have adequate power and ground connections and
are placed properly in order to eliminate electromigration and
current-switching noise related problems.
Electromigration Damage
• EM is the movement or molecular transfer of metal from one area to
another area that is caused by an excessive electrical current in the direction
of current flow.
• EM currents exceeding recommended guidelines can result in premature
ASIC device failure.
• Exceeding EM current density limits can create voids or hillocks, resulting
in increased metal resistance, or shorts between wires, and can impair ASIC
performance.
• Figure shows EM damage due to excess current captured by Electron
Scanning Microscopy (ESM) with 10K magnification.



Power Planning

• The next step is to plan and create power and ground


structures for both I/O pads and core logic.
• The I/O pads’ power and ground busses are built into the pad
itself.
• For core logic, there is a core ring enclosing the core with one
or more sets of power and ground rings.
• A horizontal metal layer is used to define the top and bottom
sides, or any other horizontal segment, while the vertical metal
layer is utilized for left, right, and any other vertical segment.
• These vertical and horizontal segments are connected through
an appropriate via cut.
Cont…
• If these strips run both vertically and horizontally at regular intervals, then
the style is known as power mesh.

• The total number of strips and interval distance is solely dependent on the
ASIC core power consumption.

• As the ASIC core power consumption (dynamic and static) increases, the
number of power and ground strips increases (i.e. the strip interval
distance decreases).

• This increase in the number of power and ground strips is used mainly to
reduce the overall ASIC voltage drop, thereby improving ASIC design
performance.

• When both analog and digital blocks are present in an ASIC design, there is
a need for special care to insure that there is no noise injection from digital
blocks or core into the sensitive circuits of analog blocks through power
and ground supply connections.
Macro Placement
• Macros may be memories, analog blocks, or in the case of hierarchical
style, an individually placed and routed subcircuit.

• Macro placement can be manual or automatic. Manual macro placement is


more efficient when there are few macros to be placed and their
relationship with the rest of the ASIC design is known.

• Automatic macro placement is more appropriate if the number of macros is


large.

• During the macro placement step, one needs to make sure that there is
enough area between blocks for interconnections. This process (commonly
known as channel allocation or channel definition) can be manual or can be
accomplished by floorplan tools.

• The slicing tree is used by the floorplan algorithm for slicing floorplan
during macro placement and to define routing channels between the blocks.
Clock Planning

• Clock planning is considered another part of floorplanning.

• The clock distribution network is implemented to provide the
clock to all clocked elements in the design in a symmetrically-
structured manner.

• For very high-performance and synchronized designs, one


needs to implement the distributed clock networks manually in
order to minimize the skew between communicating elements
due to their line resistance and capacitance.
Clock Planning (Cont…)

• The idea of the implementation of clock distribution networks


is to provide clock to all clocked elements in the design in a
symmetrically-structured manner.
• However, it can present a systematic clock skew.



Clock Planning (Cont…)

• Another aspect of clock planning is that it is well suited to hierarchical physical design.
• In this approach, the clock distribution is manually crafted at the chip level, providing the clock to each sub-block that is placed and routed individually.
• To minimize the clock skew among all leaf nodes, the clock (insertion) delay of each sub-block must be determined and the top-level clock network planned accordingly (see the sketch below).
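• A minimal Python sketch of this balancing step is shown below; the sub-block names and insertion delays are invented for illustration only.

```python
# Sketch: balancing a hand-crafted chip-level clock distribution over sub-blocks.
# Each sub-block has an already-characterized internal clock insertion delay; the
# top level adds delay so every block sees the same total latency. Numbers are made up.

def top_level_padding(block_insertion_delays_ns):
    """Return the extra top-level delay needed per block to equalize total latency."""
    target = max(block_insertion_delays_ns.values())
    return {blk: target - d for blk, d in block_insertion_delays_ns.items()}

if __name__ == "__main__":
    delays = {"cpu": 0.42, "dsp": 0.35, "mem_ctrl": 0.50, "periph": 0.28}  # ns
    for blk, pad in top_level_padding(delays).items():
        print(f"{blk:9s}: add {pad:.2f} ns of delay at the top level")
```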



Clock Tree Synthesis
• The concept of CTS is the automatic insertion of buffers/inverters
along the clock paths of the ASIC design to balance the clock delay
to all clock inputs.
• Generally the clock signals are considered global (or ideal) nets.
• These nets exhibit high resistance and capacitance due to their being
very long wires.
• The principle of CTS is to reduce the delay associated with these
long wires.
• These long wires can be modeled as distributed RC networks, where Vi is the voltage at node i of the wire, and R and C are the resistance and capacitance of each wire segment.



CTS (Cont…)
• Using a discrete (Elmore) analysis of this distributed RC network, the signal delay of a wire split into n segments can be approximated by
  t_delay ≈ R·C·n(n + 1)/2
  where R and C are the resistance and capacitance of each segment.

• As the number of segments increases, the above equation reduces to
  t_delay ≈ r·c·L²/2
  where L is the length of the wire, and r and c are the resistance and capacitance per unit length.

• The propagation delay of a long wire is thus determined by this RC effect (a short numerical sketch follows).
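• The small Python sketch below compares the n-segment (Elmore) approximation with the distributed limit; the r, c, and L values are illustrative, not from any particular technology.

```python
# Sketch: lumped (Elmore) delay of an n-segment RC ladder versus the distributed limit.
# r and c are per-unit-length resistance/capacitance; the values below are illustrative.

def elmore_segments(r, c, length, n):
    """Elmore delay of the wire split into n equal RC segments: R*C*n*(n+1)/2."""
    R = r * length / n           # resistance of one segment
    C = c * length / n           # capacitance of one segment
    return R * C * n * (n + 1) / 2

def distributed_limit(r, c, length):
    """Limit as n -> infinity: r*c*L^2 / 2."""
    return r * c * length ** 2 / 2

if __name__ == "__main__":
    r, c, L = 0.1, 0.2e-15, 1000.0    # ohm/um, F/um, um (illustrative)
    for n in (1, 2, 10, 100):
        print(f"n={n:3d}: delay {elmore_segments(r, c, L, n):.3e} s")
    print(f"distributed limit: {distributed_limit(r, c, L):.3e} s")
```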



CTS (Cont…)
• One method to reduce this RC effect is to insert intermediate buffers, or repeaters, along the wire.

• If the interconnect is segmented into N equal sections, the wire delay of each section scales quadratically with its shorter length, so the total delay becomes
  t_total = N·t_b + r·c·L²/(2N)
  where t_b is the propagation delay of a repeater; the reduction in wire delay is sufficient to offset the extra delay introduced by the repeaters.

• To obtain the optimal number of buffers or repeaters, one can set dt_total/dN = 0 and solve for N, giving
  N_opt = L·√(r·c/(2·t_b))
  (a small numerical sketch follows).
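• The Python sketch below evaluates the total delay as a function of N and the closed-form optimum; the r, c, L, and t_b values are illustrative assumptions.

```python
# Sketch: total delay of a repeated wire versus the number of repeaters, and the
# closed-form optimum N = L * sqrt(r*c / (2*t_b)). Values are illustrative.
import math

def total_delay(N, r, c, length, t_b):
    """N repeater delays plus the distributed delay of N shorter wire sections."""
    return N * t_b + r * c * length ** 2 / (2 * N)

def optimal_repeaters(r, c, length, t_b):
    return length * math.sqrt(r * c / (2 * t_b))

if __name__ == "__main__":
    r, c, L, t_b = 0.1, 0.2e-15, 5000.0, 25e-12    # ohm/um, F/um, um, s
    n_opt = optimal_repeaters(r, c, L, t_b)
    print(f"optimal number of repeaters ~ {n_opt:.1f}")
    for N in (1, 2, int(round(n_opt)), 10):
        print(f"N={N:2d}: total delay {total_delay(N, r, c, L, t_b) * 1e12:.1f} ps")
```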



CTS (Cont…)

• The actual sizes of these tapering buffers may not be identical.

• To obtain the optimal propagation delay, these buffers may be selected to monotonically increase in drive strength d by a factor 𝜶 at each level of the clock tree path (see the sketch below).

• The buffers used for tapering the clock paths are required to have equal rise and fall delay times:
  – to maintain the original duty cycle, and
  – to ensure that there is no clock signal overlap due to any difference in propagation delays.
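• As a simple illustration of such tapering, the Python sketch below lists buffer drive strengths growing by a factor alpha per level; the starting drive and alpha are arbitrary example values, not library data.

```python
# Sketch: tapering clock buffers whose drive strength grows by a factor alpha per level.
# The starting drive d0 and alpha are illustrative assumptions, not library values.

def tapered_drives(d0=1.0, alpha=2.0, levels=5):
    """Drive strength at level k is d0 * alpha**k."""
    return [d0 * alpha ** k for k in range(levels)]

if __name__ == "__main__":
    for lvl, d in enumerate(tapered_drives()):
        print(f"clock tree level {lvl}: buffer drive strength x{d:g}")
```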
CTS (Cont…)
• Clock signal overlap becomes important when dealing with very high-speed ASIC designs.
• These types of buffers are known as clock or balance buffers and have different attributes from the normal buffers in the standard cell library.
• The proper usage of balance buffers or inverters during clock tree
synthesis is extremely important, especially when dealing with very
high-speed clocking requirements.
• If the clock buffers or inverters are not selected correctly they may
cause the clock pulse width to degrade as the clock propagates
through them before reaching the final destination.

Figure: Clock pulse width degradation effect



CTS (Cont…)

• It is difficult to achieve perfect signal rise and fall time balance in the context of an ASIC design during clock tree synthesis.

• One remedy to overcome clock pulse narrowing is to use inverters rather than buffers.

Figure: Inverter based clock tree giving equal rise and fall time



Buffer Based Clock Tree

• A buffer can be designed with two equal-sized inverters connected in cascade.
• However, to reduce area, the first inverter is usually designed with a lower drive strength than the second and is placed close to the second inverter.



Buffer Based Clock Tree

• It can be observed that the delay of the first inverter is dominated by the input capacitance of the second inverter, since the wire between them is short.
• The load of the second inverter, in contrast, consists of the input capacitance of the first inverter of the next buffer in the tree plus the wire capacitance.
• This asymmetry introduces unequal rise and fall times and hence unequal high and low pulse widths.



Buffer Based Clock Tree
• If we can balance the loads seen by the first and second inverters, we might be able to obtain the same rise and fall times.

• However, even though most standard cell libraries provide symmetrical buffers, there can still be a difference of a few picoseconds between a buffer’s rise and fall times, which creates variations in the high and low pulse widths.

• This variation in duty cycle increases for deeper clock trees (see the sketch below).

• Most clock tree synthesis algorithms use either the Sum (𝝨) or the Pi (𝝥) configuration when inserting clock buffers along each clock path.
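• The Python sketch below shows how a few picoseconds of per-buffer rise/fall mismatch accumulates into duty-cycle distortion as the tree gets deeper; the clock period and mismatch values are illustrative.

```python
# Sketch: how a few picoseconds of rise/fall mismatch per buffer distorts the duty
# cycle of a 1 GHz clock as the tree gets deeper. The mismatch value is illustrative.

def duty_cycle_after_tree(period_ps=1000.0, mismatch_ps=3.0, depth=8):
    """Each buffer stretches the high phase by mismatch_ps (rise faster than fall)."""
    high_ps = period_ps / 2 + depth * mismatch_ps
    return 100.0 * high_ps / period_ps

if __name__ == "__main__":
    for depth in (2, 4, 8, 12):
        print(f"tree depth {depth:2d}: duty cycle ~{duty_cycle_after_tree(depth=depth):.1f} %")
```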



Sum Configuration

• In the Sum configuration, the total number of inserted buffers is the sum of all buffers per level.

• This type of structure exhibits unbalanced trees due to the different number of buffers and wire lengths.

• In this method, the systematic skew is minimized through delay matching along each clock path and, due to its non-symmetrical nature, is highly process-corner-dependent.



Sum Configuration (Cont…)

• In this configuration, the total number of clock buffers is given by
  Ntotal = nlevel0 + nlevel1 + … + nlevelN-1 + nlevelN

Figure: Sum configuration


Pi Configuration

• In the Pi configuration, the number of buffers inserted along the clock paths at each level is a multiple of the number at the previous level.
• This type of structure uses the same number of buffers and geometrically matched wires at each level, and relies on matching the delay components at each level of the clock tree.

Figure: Pi Configuration
Pi Configuration (Cont…)

• The Pi-structure clock tree is considered to be balanced, or symmetrical.

• In contrast to the Sum configuration, in a Pi configuration the systematic skew is minimized due to symmetry, and the variation in clock skew is determined by process uniformity rather than by process corner.



CTS (Cont…)

• In today’s ASIC designs, where dynamic power consumption plays an important role, Sum configurations are most often used regardless of the number of clock domains.
  – This is because the total number of buffers inserted during clock tree synthesis is less than the total number used in the Pi configuration (compare the counts in the sketch below).
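• The Python sketch below contrasts the two styles with a simplified counting model: for the Sum configuration the per-level buffer counts are simply added, while for the Pi configuration each level is a fanout multiple of the previous one; the counts and fanout are illustrative assumptions.

```python
# Sketch: comparing total buffer counts for the Sum and Pi insertion styles.
# The per-level counts and the fanout below are illustrative, not from a real design.

def sum_config_total(buffers_per_level):
    """Sum configuration: total is simply the sum of buffers inserted per level."""
    return sum(buffers_per_level)

def pi_config_total(fanout, levels):
    """Pi configuration: each level multiplies the previous one by the fanout."""
    return sum(fanout ** k for k in range(levels))

if __name__ == "__main__":
    sum_levels = [1, 2, 3, 4, 6]     # buffers added only where delay matching needs them
    print("Sum configuration total:", sum_config_total(sum_levels))
    print("Pi  configuration total:", pi_config_total(fanout=4, levels=5))
```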

• Regardless of which configuration is used to insert buffers during clock tree synthesis, the objective that governs the clock tree quality is achieving minimal skew while maintaining an acceptable clock signal propagation delay.



CTS (Cont…)

• The difference in clock arrival times at the leaf registers with respect to the clock source, i.e. the clock skew, can be expressed as
  𝛿 = 𝑡1 − 𝑡2
• where 𝛿 is the clock skew between two leaf registers (leaf registers are connected at the last level of the clock tree) whose clock propagation delays along two different clock paths are t1 and t2, as shown in the figure.



CTS (Cont…)

• For an ASIC design to perform properly, the following condition must be satisfied:
  𝛿 ≤ 𝑡𝑙
  where 𝑡𝑙 is the computational (logic) delay between the two registers.
• To prevent any erroneous result related to the computational time 𝑡𝑙 at Register2, the lower bound on the source clock period is given by
  𝑇 ≥ 𝑡𝑙 − 𝛿
  where T is the clock source period (a small numerical check is sketched below).
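• The small Python sketch below checks these two conditions for illustrative numbers, assuming (as above) that t_l denotes the register-to-register computational delay.

```python
# Sketch: clock skew between two leaf registers and the timing bounds from the slides.
# t1, t2 are clock path delays; t_l is the register-to-register computational (logic)
# delay. All numbers are illustrative.

def skew(t1_ns, t2_ns):
    return t1_ns - t2_ns

def skew_ok(delta_ns, t_logic_ns):
    """Condition from the slide: skew must not exceed the computational delay."""
    return delta_ns <= t_logic_ns

def min_clock_period(t_logic_ns, delta_ns):
    """Lower bound on the clock source period: T >= t_l - delta."""
    return t_logic_ns - delta_ns

if __name__ == "__main__":
    t1, t2, t_logic = 0.60, 0.45, 2.10              # ns
    d = skew(t1, t2)
    print(f"skew = {d:.2f} ns, condition satisfied: {skew_ok(d, t_logic)}")
    print(f"minimum clock period: {min_clock_period(t_logic, d):.2f} ns")
```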



CTS (Cont…)

• Clock skew (𝛿) can be local, in the case of multiple clock domains, or global, in the case of a single clock domain.
• The root cause of clock skew can be systematic, due to non-uniformity of the clock routing, and/or random, caused by differences in delay components.
• The variation in delay components is mainly due to differences in device sizes, process gradients, etching effects, supply voltage variation, temperature gradients, or mismatched minimum transistor channel lengths.
• Any one of these effects can influence the clock propagation delay along each clock path.



CTS (Cont…)

• In the past, variation in component delay was observed from lot to lot (a lot is a set of wafers that undergo processing at the same time).

• As ASIC manufacturing processes advanced, variations became apparent both wafer-to-wafer and die-to-die.

• With current deep-submicron processes, variation in component delays can also be seen within a single die, known as on-chip variation (OCV).



