
Designing for Performance
fpga23000-10-wkbf-rev1
Xilinx is disclosing this Document and Intellectual Property (hereinafter the Design) to
you for use in the development of designs to operate on, or interface with Xilinx FPGAs.
Except as stated herein, none of the Design may be copied, reproduced, distributed,
republished, downloaded, displayed, posted, or transmitted in any form or by any means
including, but not limited to, electronic, mechanical, photocopying, recording, or otherwise,
without the prior written consent of Xilinx. Any unauthorized use of the Design may violate
copyright laws, trademark laws, the laws of privacy and publicity, and communications
regulations and statutes.
Xilinx does not assume any liability arising out of the application or use of the Design; nor
does Xilinx convey any license under its patents, copyrights, or any rights of others. You
are responsible for obtaining any rights you may require for your use or implementation of
the Design. Xilinx reserves the right to make changes, at any time, to the Design as
deemed desirable in the sole discretion of Xilinx. Xilinx assumes no obligation to correct
any errors contained herein or to advise you of any correction if such be made. Xilinx will
not assume any liability for the accuracy or correctness of any engineering or technical
support or assistance provided to you in connection with the Design.
THE DESIGN IS PROVIDED "AS IS" WITH ALL FAULTS, AND THE ENTIRE RISK AS
TO ITS FUNCTION AND IMPLEMENTATION IS WITH YOU. YOU ACKNOWLEDGE
AND AGREE THAT YOU HAVE NOT RELIED ON ANY ORAL OR WRITTEN
INFORMATION OR ADVICE, WHETHER GIVEN BY XILINX, OR ITS AGENTS OR
EMPLOYEES. XILINX MAKES NO OTHER WARRANTIES, WHETHER EXPRESS,
IMPLIED, OR STATUTORY, REGARDING THE DESIGN, INCLUDING ANY
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE,
TITLE, AND NONINFRINGEMENT OF THIRD-PARTY RIGHTS.
IN NO EVENT WILL XILINX BE LIABLE FOR ANY CONSEQUENTIAL, INDIRECT,
EXEMPLARY, SPECIAL, OR INCIDENTAL DAMAGES, INCLUDING ANY LOST DATA
AND LOST PROFITS, ARISING FROM OR RELATING TO YOUR USE OF THE
DESIGN, EVEN IF YOU HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH
DAMAGES. THE TOTAL CUMULATIVE LIABILITY OF XILINX IN CONNECTION WITH
YOUR USE OF THE DESIGN, WHETHER IN CONTRACT OR TORT OR OTHERWISE,
WILL IN NO EVENT EXCEED THE AMOUNT OF FEES PAID BY YOU TO XILINX
HEREUNDER FOR USE OF THE DESIGN. YOU ACKNOWLEDGE THAT THE FEES, IF
ANY, REFLECT THE ALLOCATION OF RISK SET FORTH IN THIS AGREEMENT AND
THAT XILINX WOULD NOT MAKE AVAILABLE THE DESIGN TO YOU WITHOUT
THESE LIMITATIONS OF LIABILITY.
The Design is not designed or intended for use in the development of on-line control
equipment in hazardous environments requiring fail-safe controls, such as in the
operation of nuclear facilities, aircraft navigation or communications systems, air traffic
control, life support, or weapons systems (High-Risk Applications). Xilinx specifically
disclaims any express or implied warranties of fitness for such High-Risk Applications.
You represent that use of the Design in such High-Risk Applications is fully at your risk.
© 2008 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated
brands included herein are trademarks of Xilinx, Inc. PCI, PCIe and PCI Express are
trademarks of PCI-SIG and used under license. The PowerPC name and logo are
registered trademarks of IBM Corp. and used under license. All other trademarks are the
property of their respective owners.
Facilitator Guide
Table of Contents
INTRODUCTORY MATERIAL
Getting Started
  About This Guide
  The Program in Perspective
  Program Preparation
Training At A Glance
Quick Reference Material
MODULES
Course Agenda
  Course Agenda
Review of Fundamentals of FPGA Design
  Apply Your Knowledge
  Answers
Designing with Virtex-5 FPGA Resources
  Introduction
  Overview
  I/O
  Block RAMs and FIFO
  XtremeDSP Solution Cores
  Other Features
  Summary
CORE Generator Software System
  Introduction
  Overview
www.xilinx.com
1-877-XLX-CLAS
Page i
  Using the CORE Generator Software System
  CORE Generator Software Design Flows
  Summary
Lab 1: CORE Generator Software System
  Lab
Designing Clock Resources
  Introduction
  Overview
  Clock Management Tile
  Clock Networks
  Summary
Lab 2: Designing Clock Resources
  Lab
FPGA Design Techniques
  Introduction
  Duplicating Flip-Flops
  Pipelining
  I/O Flip-Flops
  Synchronization Circuits
  Summary
Synthesis Techniques
  Introduction
  Achieving Breakthrough Performance
  Synthesis Options
  XST Synthesis Options
  Summary
Lab 3: Synthesis Techniques
  Lab
Day One Summary
  Day One Summary
Course Agenda Day Two
Achieving Timing Closure
  Introduction
  Timing Reports
  Interpreting Timing Reports
  Report Options
  Summary
Lab 4: Review of Global Timing Constraints
  Lab
Timing Groups and OFFSET Constraints
  Introduction
  Overview
  Creating Groups
  OFFSET Constraints
  Summary
Path-Specific Timing Constraints
  Introduction
  Inter-Clock Domain Constraints
  Multicycle Paths
  False Paths
  Miscellaneous Constraints
  Summary
Lab 5: Achieving Timing Closure
  Lab
Advanced Implementation Options
  Introduction
  Overview
  Advanced MAP and Place & Route Options
  Xplorer
  SmartGuide and Partitions
  Power Optimization
  Summary
Lab 6: Designing for Performance
  Lab
Power Estimation
  Introduction
  Overview
  XPower Estimator
  Using the XPower Analyzer Software
  Summary
Lab 7: FPGA Editor Demo
  Lab
ChipScope Pro Software
  Introduction
  Importance of Debug
  ChipScope Pro Software Cores
  Design Flows
  Summary
Lab 8: ChipScope Pro Software
  Lab
Course Summary
  Course Summary
Appendixes
Appendix A: Basic HDL Coding Techniques*
Appendix B: Spartan-3 FPGA HDL Coding Techniques*
Appendix C: Virtex-5 FPGA HDL Coding Techniques*
Appendix D: Synthesis Techniques*
  Inferring Logic and Flip-Flop Resources
  Inferring Memory
  Inferring I/Os and Global Resources
  Inferring DSP48 Resources
Appendix E: Spartan-3E FPGA 1600E MicroBlaze Processor Development Kit Demo Board Introduction*
* Not included in the printed workbook, but available via
ftp://ftp.xilinx.com/pub/documentation/education/fpga23000-10-rev1-xlnx_lab_files.zip
Getting Started
About This Guide
What's the Purpose of This Guide?
This facilitator guide provides a master reference document to help
you prepare for and deliver the Designing for Performance course.
What Will I Find in the Guide?
This facilitator guide is a comprehensive package that contains:
The course delivery sequence
Checklists of any necessary materials and equipment
Presentation scripts and key points to cover
Instructions for managing exercises, case studies, and other
instructional activities
How Is This Guide Organized?

This section, Getting Started, contains all of the preparation
information for the Designing for Performance course, such as
learning objectives, prework, required materials, and room setup.
Following this section is the Training At A Glance table. This
table can serve as your overview reference, showing the module
names, timings, and process descriptions for the entire program.
Finally, the course itself is divided into modules, each of which
comprises one or more lessons. A module is a self-contained
portion of the program, usually lasting anywhere from 20 to 90
minutes, while a lesson is a shorter (typically 5-20 minutes) topic
area. Each module begins with a one-page summary showing the
Purpose, Time, Process, and Lessons for the module. Use these
summary pages to get an overview of the module that follows.
About This Guide, continued

How Is the Text Laid Out in This Guide?
Every action in the program is described in this guide by a text
block like this one, with a margin icon, a title line, and the actual
text. The icons are designed to help catch your eye and draw quick
attention to what to do and how to do it. For example, the icon
to the left indicates that you, the instructor, say something next.
The title line gives a brief description of what to do, and is
followed by the actual script, instruction set, key points, etc. that
are needed to complete the action.
A complete list of the margin icons used in this guide is provided
on the following page.
TRAINER NOTE
You may also occasionally find trainer notes such as this one in the
text of this guide. These shaded boxes provide particularly
important information in an attention-getting format.
About This Guide, continued

Graphic Cues
Overhead
Participant Workbook
Lab Exercise
Projected Image
Key Points
Time
Transition
Flipchart
Handouts
Summary
Module Process
Break / Lunch
Group Activity
Role Play
Where Can I Learn More?
Materials Required
Audio Tape
Case Study
Instructional Game
Answers
To Say
Video Tape
Assessment / Quiz/Test
Question & Answer
Custom 5
Key points
Computer/CDROM
Tool
Custom 6
Module Purpose
Welcome
The Program in Perspective

Why a Designing for Performance Course?
Attending the Designing for Performance class will help you create
more efficient designs. This course can help you fit your design
into a smaller FPGA or a lower speed grade to reduce system
costs. In addition, by mastering the tools and the design
methodologies presented in this course, you will be able to create
your design faster, shorten your development time, and lower
development costs.
Learning Objectives
After completing this comprehensive training, you will have the
necessary skills to:
Describe a flow for obtaining timing closure
Describe the architectural features of the Virtex-5 FPGA
Describe the features of the Digital Clock Manager (DCM) and
Phase-Locked Loop (PLL) and how they can be used to improve
performance
Increase performance by duplicating registers and pipelining
Describe different synthesis options and how they can improve
performance
Create and integrate cores into your design flow by using the
CORE Generator software system
Run behavioral simulation on an FPGA design that contains cores
Pinpoint design bottlenecks by using Timing Analyzer reports
Apply advanced timing constraints to meet your performance goals
Use advanced implementation options to increase design
performance
Program Timing
2 days
Program Preparation
Prerequisites
The Fundamentals of FPGA Design course or equivalent knowledge of:
  FPGA architecture features
  The Xilinx implementation software flow and implementation options
  Reading timing reports
  Basic FPGA design techniques
  Global timing constraints and the Constraints Editor
Intermediate HDL knowledge (VHDL or Verilog)
Solid digital design background
The following recorded e-Learning modules are recommended:
  Basic HDL Coding Techniques
  Spartan-3 FPGA HDL Coding Techniques
  Virtex-5 FPGA HDL Coding Techniques
Required Materials
Designing for Performance facilitator guide
PowerPoint files
Instructor Preparation
Read through the trainer notes
Read the lab setup guide
Lab Setup
Software Requirements
Xilinx ISE Foundation design tools 10.1 SP1, including ISE Simulator
  www.xilinx.com/support/download
ChipScope Pro tool 10.1 SP1 if you are running the optional
ChipScope Pro Software lab
Synplicity Synplify software 9.2 if you are running the Synplify
version of the Synthesis Techniques lab
Exemplar is no longer supported as part of the course
Program Preparation, continued
Lab Files/Data Installed
ftp://ftp.xilinx.com/pub/documentation/education/fpga23000-10-rev1-xlnx_lab_files.zip
Hardware Requirements
Note: The demo board is only required for the optional
ChipScope Pro Software lab.
PC running Windows XP Professional (32-bit) with 2 GB RAM
Spartan-3E FPGA 1600E MicroBlaze processor development board
  User Guide:
  www.xilinx.com/support/documentation/boards_and_kits/ug257.pdf
  (included in the lab zip file)
  USB cable for configuration (Type A to Type B, included in the kit)
  Platform Cable USB NOT required
  Power supply for the Spartan-3E FPGA board (included in the kit)
Optional:
  Serial cable (DB9 male/female) for computers with serial
  ports, or a USB-to-RS-232 adapter cable for computers
  lacking a serial port
  HyperTerminal or equivalent
Special Instructions
None
Training At A Glance
Time | Module | Description
5 minutes | Course Agenda | This module covers the agenda for the course.
15 minutes | Review of Fundamentals of FPGA Design | This module reviews the Virtex-5 FPGA architecture and some of the primary functions of the ISE tools.
60 minutes | Designing with Virtex-5 FPGA Resources | This module describes the latest features of the newest FPGA from Xilinx.
20 minutes | CORE Generator Software System | This module describes the basics of designing with the CORE Generator software.
30 minutes | Lab 1: CORE Generator Software System | This lab illustrates how to build a block RAM memory with the CORE Generator software.
45 minutes | Designing Clock Resources | This module describes how to design a complete FPGA clocking scheme.
40 minutes | Lab 2: Designing Clock Resources | This lab illustrates how to build a multiple clock system with the ISE Architecture Wizard tool.
40 minutes | FPGA Design Techniques | This module describes how to build a reliable and fast FPGA design.
40 minutes | Synthesis Techniques | This module describes how to synthesize a fast and efficient FPGA design by using the advanced capabilities of the synthesis tools.
30 minutes | Lab 3: Synthesis Techniques | This lab illustrates how to synthesize a design by taking advantage of some of the advanced synthesis options available in the newest synthesis tools.
10 minutes | Day One Summary | This module reviews day one of the course.
5 minutes | Course Agenda Day Two | This module covers the day two agenda for the course.
45 minutes | Achieving Timing Closure | This module describes how to read the Timing Analyzer reports and use the information to gain timing closure.
45 minutes | Lab 4: Review of Global Timing Constraints | This lab illustrates how to use global timing constraints and the Timing Analyzer to find the timing-critical paths of a design and develop a strategy for gaining timing closure.
45 minutes | Timing Groups and OFFSET Constraints | This module describes the best ways to group path endpoints to make the most efficient path-specific timing constraints.
45 minutes | Path-Specific Timing Constraints | This module describes some of the most common applications for path-specific timing constraints and how to make them with the Xilinx Constraints Editor.
45 minutes | Lab 5: Achieving Timing Closure | This lab illustrates how to make path-specific timing constraints on a design and use some of the advanced implementation options in the ISE tools.
30 minutes | Advanced Implementation Options | This module describes the advanced implementation options available in the ISE tools.
30 minutes | Lab 6: Designing for Performance | This lab illustrates how to improve design performance and maximize results solely with advanced implementation options.
30 minutes | Power Estimation | This optional module describes the power estimation capabilities included with the ISE tools.
30 minutes | Lab 7: FPGA Editor Demo | This optional demonstration illustrates how to locate logic, view the contents of an FPGA design, and insert a probe with the FPGA Editor.
30 minutes | ChipScope Pro Software | This optional module describes how to use the Core Inserter and Core Generator tool flows and plan for debugging with the ChipScope Pro software.
60 minutes | Lab 8: ChipScope Pro Software | This optional lab illustrates how to use the ChipScope Pro software to add the Analyzer ILA core and prepare for debugging.
10 minutes | Course Summary | This module reviews day two of the course and provides a summary of the course.
Education Services Quick Reference

From the Xilinx Education Services Designing for Performance course.
For more information on Xilinx courses, please visit www.xilinx.com/education.
Synthesis
Tip: Reduce fanout
Action:
  Manually duplicate logic and flip-flops
  Preferred method over letting the synthesis tool perform the duplication
  Set your synthesis tool to keep the redundant logic
  Name duplicate logic _A, _B, not _1, _2
Tip: Timing-driven synthesis
Action:
  Replicate the necessary logic in an effort to build logic that is in parallel, not serial
  Try not to over-constrain
  Will likely increase the size of the design
Tip: Hierarchy management
Action:
  Optimization across hierarchical boundaries can make node names change and/or disappear
  This makes simulating and debugging later in the design flow difficult
  Only allow this to be done across as few boundaries as possible and as a last effort to gain timing closure
  Maintain critical nodes with the KEEP attribute (or equivalent)
Tip: Retiming
Action:
  Move registers forward/backward along a datapath to decrease the number of LUTs in series
  Can make node names change or disappear
  Maintain critical nodes with the KEEP attribute
Tip: FSM extraction
Action:
  Optimizes your FSM by re-encoding your design based on the number of states and inputs
  Results can be good, but testing each encoding technique manually is not difficult and allows determination of which has the best speed and size
Tip: Verify good HDL coding style was used
Action:
  Poor HDL coding style can add logic levels to any datapath; make certain that good style was used on your timing-critical paths (see the HDL Coding Style Recorded e-Learning modules)
Tip: Access Verilog and VHDL language templates
Action:
  From the Project Navigator menu, select Edit > Language Templates
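To make the "Reduce fanout" tip concrete, the Python sketch below splits the loads of one high-fanout register across several manually duplicated copies. It names them with letter suffixes (_A, _B, ...) rather than _1, _2, following the naming tip. The function name, the `max_fanout` limit, and the assignment of consecutive loads to the same copy are illustrative assumptions. In a real design you write the duplication in HDL and tell the synthesis tool to keep the redundant registers.

```python
import string


def duplicate_register(name, loads, max_fanout):
    """Split the loads of one register across manually duplicated copies.

    Returns a dict mapping each duplicate's name (name_A, name_B, ...) to the
    list of loads it drives, with no copy driving more than max_fanout loads.
    """
    if max_fanout < 1:
        raise ValueError("max_fanout must be at least 1")
    # Ceiling division: the number of copies needed to respect max_fanout.
    copies = max(1, -(-len(loads) // max_fanout))
    if copies > len(string.ascii_uppercase):
        raise ValueError("more copies than single-letter suffixes")
    names = [f"{name}_{letter}" for letter in string.ascii_uppercase[:copies]]
    groups = {dup: [] for dup in names}
    # Assign consecutive loads to the same copy; if the load list happens to
    # be ordered by location, neighboring loads end up sharing a duplicate.
    for index, load in enumerate(loads):
        groups[names[index // max_fanout]].append(load)
    return groups


if __name__ == "__main__":
    loads = [f"ff{i}" for i in range(10)]
    for dup, driven in duplicate_register("ce_reg", loads, 4).items():
        print(dup, driven)
```

With ten loads and a limit of four, the sketch produces three copies: ce_reg_A, ce_reg_B, and ce_reg_C. The last copy drives only two loads.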
© 2008 Xilinx, Inc. All rights reserved. All Xilinx trademarks, registered trademarks, patents, and disclaimers are as listed at http://www.xilinx.com/legal.htm.
All other trademarks and registered trademarks are the property of their respective owners. All specifications are subject to change without notice.
www.xilinx.com
1-800-255-7778
FPGA23000-10-QR (v1.0) June 20, 2008

Quick Reference Card Page 1 of 3

Reading Timing Reports
Tip: Look for a single long delay
Action:
  If a high-fanout net, duplicate the source of the net
  If a low-fanout net, try to obtain a better placement with timing-driven packing or MPPR
  If there is no single long delay, the path probably has too many logic levels (go back to synthesis or pipeline the datapath)
Tip: Use the Timing Improvement Wizard in the Timing Analyzer
Action:
  Click the Wizard icon when a constraint fails
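The decision flow in the "Look for a single long delay" tip can be written as a small function. This is a minimal sketch under stated assumptions. The (delay, fanout) tuple format is a hypothetical way of transcribing a report by hand, not a Timing Analyzer export format. The "dominant delay" threshold of 50 percent of the path and the fanout cutoff of 16 are illustrative values, not Xilinx guidance.

```python
def triage_path(elements, dominant_fraction=0.5, high_fanout=16):
    """Suggest a next step for a failing path.

    elements: list of (delay_ns, fanout) tuples, one per net or logic element
    on the path (a hypothetical, hand-transcribed format).
    """
    total = sum(delay for delay, _ in elements)
    if total <= 0:
        return "no delay data"
    # The single largest delay on the path and the fanout of that element.
    delay, fanout = max(elements, key=lambda element: element[0])
    if delay / total >= dominant_fraction:
        if fanout >= high_fanout:
            return "duplicate the source of the high-fanout net"
        return "improve placement (timing-driven packing or MPPR)"
    return "too many logic levels: revisit synthesis or pipeline the datapath"


if __name__ == "__main__":
    print(triage_path([(0.3, 1), (2.5, 40), (0.4, 2)]))
```

Here one 2.5 ns net on a 40-load net dominates the 3.2 ns path, so the sketch recommends duplicating its source.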
Timing Constraints
Tip: Paths that cross unrelated clock domains are not covered by PERIOD constraints
Action:
  Use the CLKA and CLKB groups that were created when you entered PERIOD constraints
  Specify a Slow/Fast Path Exception between CLKA and CLKB
  Take care not to create a metastability problem; consider using a FIFO or synchronization circuit
Tip: Bidirectional buses usually create false paths
Action:
  Group logic by component and place a TIG on paths that can be ignored
Tip: Multicycle paths are usually associated with clock enable nets
Action:
  Create a MULTI_CYCLE group containing the clock enable net
  Specify a multicycle path constraint from MULTI_CYCLE to MULTI_CYCLE
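The multicycle tip comes down to simple arithmetic. Suppose a clock enable is asserted once every N cycles of a clock with period T. The registers in the MULTI_CYCLE group then capture new data only every N cycles, so their requirement can be relaxed from T to N × T. The sketch below computes that value and formats it as a UCF-style FROM/TO timing specification. The TIMESPEC line is modeled on the UCF `TIMESPEC "TS_name" = FROM "group" TO "group" value;` form but is illustrative only; check it against the Constraints Guide for your ISE version before using it.

```python
def multicycle_requirement_ns(period_ns, enable_every_n):
    """Relaxed requirement for registers that capture only when a clock
    enable fires once every enable_every_n clock cycles."""
    if period_ns <= 0 or enable_every_n < 1:
        raise ValueError("period must be positive and enable_every_n >= 1")
    return period_ns * enable_every_n


def multicycle_timespec(name, period_ns, enable_every_n):
    """Format a UCF-style FROM/TO timing specification (illustrative only)."""
    if not name.startswith("TS"):
        raise ValueError("UCF timing specification names start with TS")
    requirement = multicycle_requirement_ns(period_ns, enable_every_n)
    return (f'TIMESPEC "{name}" = FROM "MULTI_CYCLE" TO "MULTI_CYCLE" '
            f"{requirement:.3f} ns;")


if __name__ == "__main__":
    # 100 MHz clock (10 ns period) with an enable asserted every second cycle.
    print(multicycle_timespec("TS_MC", 10.0, 2))
```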

Tip: MAP: Timing-driven packing can improve performance by up to 5 percent
Action:
  Most effective if unrelated logic has been packed together, which happens when there is high device utilization (over 80 percent)
  Check the Map Report Design Summary: Number of Slices Containing Unrelated Logic
Tip: PAR: Increasing the Overall Effort Level can improve performance by up to 5 percent
Action:
  Runtime can increase by 100 percent or more
Tip: PAR: Extra Effort can improve performance by up to 3 percent
Action:
  Runtime can increase by 200 percent or more
Tip: PAR: Multi-Pass Place & Route (MPPR) can improve speed by up to 3 percent
Action:
  Runtime per pass is nearly the same, but multiple implementations are run; this is not recommended for Virtex-5 FPGA designs
  Remember not to run Cost Table 1
Tip: Use Xplorer to automatically try different implementation options
Action:
  Requires several implementations
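The idea behind Xplorer can be sketched as a loop: run several implementations with different options and keep the best result. The sketch below is hypothetical throughout. The `run_implementation` callback and the option names are stand-ins, the slack values are made up for the demonstration, and Xplorer itself is a Xilinx script whose strategies this sketch does not reproduce.

```python
import math


def pick_best_run(option_sets, run_implementation):
    """Try each option set and keep the one with the best worst-case slack.

    run_implementation is a caller-supplied (hypothetical) function that
    implements the design with the given options and returns worst-case
    slack in ns; negative means the constraints failed. Ties keep the
    earlier option set.
    """
    best_options, best_slack = None, -math.inf
    for options in option_sets:
        slack = run_implementation(options)
        if slack > best_slack:
            best_options, best_slack = options, slack
    return best_options, best_slack


if __name__ == "__main__":
    # Made-up results for illustration only; a real flow would run MAP and PAR.
    fake_slack = {"standard": -0.42, "high_effort": -0.11, "extra_effort": 0.05}
    print(pick_best_run(list(fake_slack), fake_slack.get))
```

Keeping the earlier option set on ties reflects the runtime costs listed above: if two runs meet timing equally well, the cheaper one is the better choice.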
Timing Closure
Course Agenda
Purpose

Time
5 minutes
Process

Lessons
Course Agenda
Show Slide 1:

Course Agenda
Show Slide 2:
Day One Objectives

After completing this module, you will be able to:

Describe the features of the Digital Clock Manager (DCM) and Phase-Locked Loop (PLL) and how they can be used to improve performance
Create and integrate cores into your design flow by using the CORE
Generator software system
Run behavioral simulation on an FPGA design that contains cores
Key Points
This course builds upon the fundamental techniques of
designing into Xilinx FPGAs, which are taught in the
Fundamentals of FPGA Design course.
The modules and labs were developed with version 10.1i of the
Xilinx software, with no service packs. If you have installed a
different version or service pack level, lab results may differ.
Show Slide 3:
Day Two Objectives


Apply advanced timing constraints to meet your performance goals
Use advanced implementation options to increase design performance
Show Slide 4:
Prerequisites
The Fundamentals of FPGA Design course or equivalent knowledge of:
  FPGA architecture features
  The Xilinx implementation software flow and implementation options
  Reading timing reports
  Basic FPGA design techniques
  Global timing constraints and the Constraints Editor
Intermediate HDL knowledge (VHDL or Verilog)
Solid digital design background
The following recorded e-Learning modules are recommended
Show Slide 5:
Day One Agenda

Show Slide 6:
Day Two Agenda

Power Estimation (Optional)
Lab 7: FPGA Editor Demo (Optional)
ChipScope Pro Software (includes lab) (Optional)
Course Summary
Key Points
The day two agenda includes three optional sections. The
instructor may skip these sections if students are not interested
in the topics, or if time is running short.
Show Slide 7:
Where Are We Going?
What should you know about using Xilinx software right now?
Synchronous design techniques

How to specify global design constraints
The basics of using the Xilinx implementation tools
What will you know by the end of this class?
How to use HDL coding techniques

Software options
Constraints
Systematic design flow to obtain your performance objectives
Show Slide 8:
Appendix
Note that this course also includes the following appendixes:
Appendix A: Designing with Virtex-5 FPGA Resources
Appendix B: Designing Clock Resources
Appendix C: Synthesis Techniques
To reduce size, the appendixes are not included in the printed workbook.
The appendixes are included in a supplemental folder with the lab files
and are available via
ftp://ftp.xilinx.com/pub/documentation/education/fpga23000-10-rev1-xlnx_lab_files.zip
Show Slide 9:
Latest Product Information

Please visit the following resources for the most current information
on the Xilinx devices described in this course.
For the latest user design information, see the user guides
For the latest characteristics, such as timing, performance, etc., see the
data sheets
For the latest design and software issues or bugs, see the Answer Record
database: Search by FPGA family or software tool
www.xilinx.com/support Answer Browser (under Support Quicklinks)
www.xilinx.com/xlnx/xil_ans_browser.jsp
TRAINER NOTE
Take a moment to click the link above and browse the Records.
Transition to Review of Fundamentals of FPGA Design
Review of Fundamentals of FPGA Design
Purpose
This module
Time
15 minutes
Process
This module reviews the Virtex-5 FPGA architecture and some of
the primary functions of the ISE tools.
Lessons
Show Slide 10:
Review of Fundamentals of
FPGA Design
Show Slide 11:
Apply Your Knowledge
1) What is the basic building block of an FPGA?
2) List some Virtex-5 FPGA features
3) List the implementation processes
4) Name the global timing constraints
Show Slide 12:
Answer
Answers
1) What is the basic building block of an FPGA?
Slices are the basic building block of FPGAs.
Each slice contains:
  6-input LUTs: combinatorial logic, Shift Register LUT (SRL), distributed memory
  Flip-flops
  Carry logic
  Multiplexers
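A 6-input LUT is a 64-entry truth table: the six inputs form an address that selects one of 64 configuration bits. This lets a single LUT implement any Boolean function of six inputs. The Python sketch below models that behavior and builds the 64-bit table for an arbitrary function. Bit i of the integer holds the output for input address i, with I0 as the least significant address bit. This mirrors how the LUT6 INIT attribute is usually described, but the function names here are illustrative, not a Xilinx API.

```python
from typing import Callable


def lut6_init(func: Callable[..., int]) -> int:
    """Build a 64-bit truth table for a 6-input function.

    Bit `address` of the result holds func(I0, ..., I5), where I0 is the
    least significant bit of the address.
    """
    init = 0
    for address in range(64):
        inputs = [(address >> k) & 1 for k in range(6)]  # I0..I5
        if func(*inputs):
            init |= 1 << address
    return init


def lut6_eval(init: int, *inputs: int) -> int:
    """Evaluate the LUT: the inputs form an address into the truth table."""
    if len(inputs) != 6:
        raise ValueError("a LUT6 has exactly six inputs")
    address = sum((bit & 1) << k for k, bit in enumerate(inputs))
    return (init >> address) & 1


if __name__ == "__main__":
    parity = lut6_init(lambda *bits: sum(bits) % 2)  # 6-input XOR
    print(hex(parity))
    print(lut6_eval(parity, 1, 0, 1, 1, 0, 0))  # three ones, so odd parity: 1
```

A 6-input XOR fits in one LUT just as easily as a 6-input AND. The cost of a function in a LUT depends on its number of inputs, not on its complexity.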
Answers
2) List some Virtex-5 FPGA features.
Digital Clock Manager (DCM)
Phase-Locked Loop (PLL)
Global clock buffers (BUFGCTRL)
Regional clock resources (BUFIO and BUFR)
Dedicated DSP blocks (DSP48E)
Block RAM (RAMB36)
Dedicated FIFOs (FIFO36)
SERDES interface
RocketIO multi-gigabit transceivers
PowerPC embedded processors
Ethernet MAC
3) List the implementation processes.
Translate
MAP
Place & Route
4) Name the global timing constraints.
PERIOD
PAD-TO-PAD
OFFSET IN and OFFSET OUT
Key Points
Valid endpoints for timing paths are:
  I/O pins
  Internal synchronous points (flip-flops, latches, and RAM components)
Key Points
Each global constraint covers a different type of path:
  PERIOD: Begins and ends at internal synchronous points
  PAD-TO-PAD: Begins and ends at I/O pins
  OFFSET IN: Begins at I/O pins; ends at internal synchronous points
  OFFSET OUT: Begins at internal synchronous points; ends at I/O pins
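Every timing path starts and ends at one of the two kinds of valid endpoints listed earlier. The four global constraints therefore divide all paths by the types of their source and destination endpoints. The lookup below is a minimal Python sketch with illustrative names that encodes that mapping.

```python
from enum import Enum


class Endpoint(Enum):
    """The two kinds of valid timing-path endpoints."""
    PAD = "I/O pin"
    SYNC = "internal synchronous point (flip-flop, latch, or RAM)"


# Each (source, destination) pair is covered by exactly one global constraint.
GLOBAL_CONSTRAINT = {
    (Endpoint.SYNC, Endpoint.SYNC): "PERIOD",
    (Endpoint.PAD, Endpoint.PAD): "PAD-TO-PAD",
    (Endpoint.PAD, Endpoint.SYNC): "OFFSET IN",
    (Endpoint.SYNC, Endpoint.PAD): "OFFSET OUT",
}


def covering_constraint(source: Endpoint, destination: Endpoint) -> str:
    """Return the global constraint that covers a path with these endpoints."""
    return GLOBAL_CONSTRAINT[(source, destination)]


if __name__ == "__main__":
    print(covering_constraint(Endpoint.PAD, Endpoint.SYNC))  # OFFSET IN
```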
Transition to Designing with Virtex-5 FPGA Resources
Designing with Virtex-5 FPGA Resources
Purpose
Describe the I/O features of the Virtex-5 FPGA
Describe block RAM and FIFO resources
Explain XtremeDSP solution DSP48 resources
List other resources available in Virtex-5 FPGAs
Time
60 minutes
Process
This module describes the latest features of the newest FPGA from
Xilinx.
Lessons
Introduction
Overview
I/O
Block RAMs and FIFO
Other Features
Summary
Introduction
Show Slide 13:
Designing with Virtex-5 FPGA Resources
Show Slide 14:
Objectives
Describe the I/O features of the Virtex-5 FPGA

Describe block RAM and FIFO resources
Explain XtremeDSP solution DSP48 resources
List other resources available in Virtex-5 FPGAs
Overview
Show Slide 15:
Lessons
Overview
I/O
Block RAMs and FIFO
Other Features
Summary
Show Slide 16:
Virtex Family Product and Process Evolution
Show Slide 17:
Virtex-5 Family: The Ultimate System Integration Platform
[Figure: Virtex-5 platforms (Logic; Logic/Serial; DSP/Serial; Embedded/Serial) compared by logic, on-chip RAM, DSP capabilities, parallel I/Os, serial I/Os, and PowerPC processor]
Built on the success of ASMBL
Key Points
The Virtex-5 family is architected as a multi-platform FPGA
family. It is based on the ASMBL architecture that was
introduced in the Virtex-4 FPGA. The ASMBL architecture is a
column-based architecture that allows resources (such as logic,
on-chip RAM, DSP, and I/O) to be mixed in different proportions
to better match your design requirements.
This approach provides an optimal mix of resources for your
needs and helps you to lower your system costs: you only pay
for the resource mix that you need.
The Virtex-5 family has four platforms that are optimized for
logic resources, logic with serial I/O, DSP with serial I/O, and
embedded processing with serial I/O.
Show Slide 18:
Virtex-5 FPGA Platform Feature Overview
[Figure: device floorplan showing CLB, BRAM, I/O, CMT, BUFGMUX, DSP48E, and BUFIO & BUFR columns]
Key Points
Basic topology: Note the column-based architecture. The
Advanced Silicon Modular Block (ASMBL) architecture allows
Xilinx to assemble multiple programmable platforms with an
optimal blend of features for target application domains,
meaning that Xilinx has built subfamilies of the Virtex-5 FPGA
for particular markets.
There are multiple columns of block RAM dispersed across the
device.
IOB banks (a left bank, a right bank, and a center bank) are
available via flip-chip technology.
There are two columns of regions (you will see later that each
region is 20 CLBs tall and half the die in width), but the width
can vary with the device, which is described in more detail
later.
Note that the LXT, SXT, and FXT platforms have the same basic
topology except that the dedicated resources (EMAC, PCI Express,
and MGT) are all placed on the right side of the die.
Show Slide 19:
Virtex-5 LXT Devices
Industry's widest and most flexible offering with embedded MGTs
Wide range of options in RAM, DSP slices, and MGTs
LX330T device is 2x as large as any other FPGA with MGTs in the industry
EasyPath technology support
Embedded hard IP: PCI Express core, Ethernet MACs

Device          LX20T    LX50T    LX85T    LX110T    LX330T
Logic Cells     19,968   46,080   82,944   110,592   331,776
RAM (kb)        936      2,160    3,888    5,328     11,664
DSP Slices      24       48       48       64        192
MGTs            4        12       12       16        24
Transceiver speeds: 500 Mbps to 3.75 Gbps (down to 100 Mbps with
integrated over-sampling circuitry)

Key Points
Not all family members are shown in this table. Other device
sizes are: LX30T, LX155T, and LX220T.
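The figures in these device tables can be cross-checked with two Virtex-5 conventions. Xilinx counts 6.4 logic cells per slice, and each block RAM holds 36 Kb. Logic cells divided by 6.4 and RAM (kb) divided by 36 should therefore both be whole numbers (slices and block RAMs), which makes this a quick way to catch transcription errors. The Python sketch below performs that check with exact fractions to avoid floating-point rounding; the function names are illustrative.

```python
from fractions import Fraction

LOGIC_CELLS_PER_SLICE = Fraction(32, 5)  # 6.4, kept exact
KB_PER_BLOCK_RAM = 36


def implied_resources(logic_cells, ram_kb):
    """Return (slices, block_rams) as Fractions implied by the table values."""
    slices = Fraction(logic_cells) / LOGIC_CELLS_PER_SLICE
    block_rams = Fraction(ram_kb, KB_PER_BLOCK_RAM)
    return slices, block_rams


def is_consistent(logic_cells, ram_kb):
    """True when both implied counts are whole numbers."""
    slices, block_rams = implied_resources(logic_cells, ram_kb)
    return slices.denominator == 1 and block_rams.denominator == 1


if __name__ == "__main__":
    # LX20T from the table: 19,968 logic cells and 936 kb of block RAM.
    slices, block_rams = implied_resources(19_968, 936)
    print(slices, block_rams)
```

For the LX20T this gives 3,120 slices and 26 block RAMs. A figure that does not divide evenly, such as 110,582 logic cells, is flagged as inconsistent.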
Show Slide 20:
Virtex-5 SXT Devices
Same features as the LXT platform
More block RAM and DSP resources per logic cell compared to the LXT
platform

Device          SX35T    SX50T    SX95T
Logic Cells     34,816   52,224   94,208
RAM (kb)        3,024    4,752    8,784
DSP Slices      192      288      640
MGTs            8        12       16
Transceiver speeds: 500 Mbps to 3.75 Gbps

Key Points
Notice that the smallest SXT device has roughly as much block
RAM as a mid-sized LXT device (3,024 kb, between the LX50T and
LX85T) and the same number of DSP slices as the largest LXT
device (192).
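The "more resources per logic cell" claim can be checked by normalizing block RAM and DSP slices to logic cells. The Python sketch below uses figures copied from the LXT and SXT tables. The choice of "per 1,000 logic cells" as the unit and the subset of devices are arbitrary.

```python
# (logic cells, block RAM in kb, DSP slices), copied from the LXT and SXT tables
DEVICES = {
    "LX50T": (46_080, 2_160, 48),
    "LX330T": (331_776, 11_664, 192),
    "SX35T": (34_816, 3_024, 192),
    "SX95T": (94_208, 8_784, 640),
}


def per_thousand_cells(name):
    """Return (kb of block RAM, DSP slices) per 1,000 logic cells."""
    cells, ram_kb, dsp = DEVICES[name]
    scale = cells / 1000
    return ram_kb / scale, dsp / scale


if __name__ == "__main__":
    for name in DEVICES:
        ram, dsp = per_thousand_cells(name)
        print(f"{name:7s} {ram:6.1f} kb RAM  {dsp:5.2f} DSP slices per 1k cells")
```

The SX35T has roughly 87 kb of block RAM and 5.5 DSP slices per 1,000 logic cells. The LX330T has about 35 kb and 0.6 DSP slices per 1,000 logic cells.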
Show Slide 21:
Virtex-5 FXT Devices
Faster GTX transceiver
PowerPC 440 embedded processor

Device          FX30T    FX70T    FX100T   FX130T   FX200T
Logic Cells     32,768   71,680   102,400  131,072  196,608
RAM (kb)        2,448    5,328    8,208    10,728   16,416
DSP Slices      64       128      256      320      384
MGTs                     16       16       20       24
PPC Processors
Transceiver speeds: 750 Mbps to 6.5 Gbps*

Key Points
*Transceiver speeds up to 6.5 Gbps are only possible with the -3
speed grade. Consult the Virtex-5 FPGA data sheets and
switching characteristics documents for more information
about GTX transceiver speeds.
I/O
Show Slide 22:
Lessons
Overview
I/O
Block RAMs and FIFO
Other Features
Summary
Show Slide 23:
I/O Banking Architecture
[Figure: LX330 and LX30 layouts showing regions, I/O banks, Clock Management Tiles (CMT), global clock inputs (GClk), and the dedicated configuration bank (CFG)]
Eight to 24 regions per device
  Each region spans halfway across the chip
Each region has one bank
  Each bank has 40 I/Os and four I/O clocks
Additional I/O banks in the center column
  Each bank has 20 I/Os
Dedicated configuration bank (CFG)
More and smaller banks compared to the Virtex-4 FPGA
Show Slide 24:
SelectIO Interface Versatility
Each pin can be input and (3-stateable) output
Each pin can be individually configured for
  ChipSync technology, XCITE termination, drive strength, input threshold, and weak pull-up or pull-down
Each input can be 3.3-V tolerant; limited by its Vcco
  No 5-V tolerance, unless a current-limiting resistor is used
Each I/O can have the same performance
  Up to 700 Mbps single-ended and 1.25 Gbps differential LVDS
Each I/O supports 40-plus voltage and protocol standards, including
  LVCMOS (3.3 V, 2.5 V, 1.8 V, 1.5 V, and 1.2 V)
  LVDS, bus LVDS, extended LVDS
  LVPECL
  PCI, PCI-X
  HyperTransport (LDT)
  HSTL (1.8 V, 1.5 V, Classes I, II, III, IV)
    HSTL_I_12 (unidirectional only)
    DIFF_HSTL_I_18, DIFF_HSTL_I_18_DCI
    DIFF_HSTL_I, DIFF_HSTL_I_DCI
  RSDS_25 (point-to-point)
  SSTL (2.5 V, 1.8 V, Classes I, II)
    DIFF_SSTL2_I, DIFF_SSTL2_I_DCI
    DIFF_SSTL18_I, DIFF_SSTL18_I_DCI
  GTL, GTL+
Versatile, fast, and homogeneous user I/Os

Key Points
Support for the following standards has been removed:
  DIFF_*_DCI
  LVDS_25_DCI, LVDSEXT_25_DCI, and ULVDS_DCI
  CSE complementary single-ended outputs (replaced with differential drivers)
Show Slide 25:
Enhancements
Input and Output Buffers
  All I/O are lower-capacitance I/O
    No differentiation between the center column and other columns
  Design improvements in the output buffer
    LVDS output buffer is available on all I/O
    Higher performance for single-ended I/O (700 Mbps versus 600 Mbps)
    Higher performance for differential I/O (1.25 Gbps versus 1.1 Gbps)
    LVDS SDR unchanged at 710 Mbps
  Reduction in input differential termination (DIFF_TERM) variation
  Inputs from pad can be optionally inverted
    Inverters from the IOB to the fabric are removed
Show Slide 26:
Enhancements
ChipSync Technology Enhancements
  New IODELAY
    Used for input or output delay (IDELAY and ODELAY)
    Includes separate input from the fabric
    General use of the delay line
      Enables building oscillators
  Flexibility in REFCLK frequency
    Can be any frequency between 175 MHz and 225 MHz
  Fabric access to IDELAY with optional inverter
  IDELAY improvements
    Simplified IDELAYCTRL RESET (now edge triggered)
    Separate ISERDES/OSERDES reset control
    IDELAYCTRL is auto placed to match the IODELAY instance
Show Slide 27:
Easy Interface to Source-Synchronous Memory
ChipSync technology
  Fast regional and I/O clocks
  Programmable IDELAY and ODELAY
  Integrated I/O SERDES
Embedded ECC logic
  Reduces logic resources
  Increases performance
Proven memory interfaces
  DDR-II DRAM and QDR/QDR-II, for example
XCITE: Internal impedance control
[Figure: Data and a forwarded CLK/DQS pass between the memory and the Virtex-5 FPGA through the SelectIO interface and ChipSync technology blocks]

Key Points
XCITE: Digitally Controlled Impedance (DCI).
Series, parallel, or differential termination is supported.
Temperature and voltage compensation is digitally controlled.
Fewer resistors on the board result in easier PCB design.
Termination at the source or load is available.
Compatibility with all I/O standards (HSTL and SSTL, for example) is supported.
Show Slide 28:
ISERDES Manages Incoming Data
Frequency division
  Data width to 10 bits
Dynamic signal alignment
  Bit alignment
  Word alignment
  Clock alignment
  Supports Dynamic Phase Alignment (DPA)
[Figure: Data and CLK enter the ChipSync ISERDES; BUFIO drives CLK and BUFR drives CLKDIV; n-bit parallel data goes to the FPGA fabric]

Key Points
ChipSync technology provides two major functions: frequency reduction and alignment. Although the term DPA is specifically used in SPI-4.2 (where it covers both bit and word alignment), the term is also used more broadly. Because every signal has this circuitry, including clocks, clocks can be aligned as well, making this the most flexible solution available.
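A small behavioral sketch helps show what word alignment means. This Python model is not the ISERDES primitive or its ports. `deserialize`, `align`, and `to_bits` are hypothetical helpers, the first received bit is assumed to be the MSB of each word, and a bit-slip step is modeled as dropping one leading bit so that the word boundary moves.

```python
def to_bits(word, width):
    """Serialize one word, first bit = MSB (assumed bit order)."""
    return [(word >> (width - 1 - i)) & 1 for i in range(width)]


def deserialize(bits, width, slip=0):
    """Group a serial bit list into parallel words of `width` bits.

    Dropping `slip` leading bits moves the word boundary, which is the
    effect a bit-slip step has on the parallel output.
    """
    stream = bits[slip:]
    words = []
    for i in range(0, len(stream) - width + 1, width):
        word = 0
        for b in stream[i:i + width]:
            word = (word << 1) | b
        words.append(word)
    return words


def align(bits, width, training):
    """Return the slip count at which the training word appears, else None."""
    for slip in range(width):
        if training in deserialize(bits, width, slip):
            return slip
    return None


# Three stray bits arrive before a repeating 8-bit training word.
stream = [1, 0, 1] + to_bits(0x5C, 8) * 4
print(align(stream, 8, 0x5C))  # 3
```

Once the slip count is found, every later word comes out on the correct boundary. This is the word-alignment half of the DPA functions listed above.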
Show Slide 29:
OSERDES Simplifies Frequency Multiplication
Two separate SERDES included
  Data SERDES: 2, 3, 4, 5, 6, 7, 8, 10 bits
  Three-state SERDES: 1, 2, 4 bits
    Ideal for memories
[Figure: n-bit data and m-bit three-state control from the FPGA fabric enter the ChipSync OSERDES; CLK and CLKDIV come from BUFIO/BUFR or a DCM/PMCD]

Key Points
The figure shows data leaving the chip. Just as data was divided down upon entering the chip, it must be multiplied up when leaving. The OSERDES performs this function.
The OSERDES block also allows three-state control to be sped up, primarily for memory buses. The 1-bit, 2-bit, and 4-bit settings cover all the various memory configurations.
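The frequency relationship that the OSERDES handles can be sketched numerically. In this hypothetical Python model, `serialize` flattens parallel words into serial bit order (first bit = MSB, an assumption). `clock_plan` returns the fast and divided clock rates for a given line rate: DDR moves two bits per CLK period, SDR moves one, and CLKDIV runs at the parallel word rate.

```python
DATA_WIDTHS = {2, 3, 4, 5, 6, 7, 8, 10}  # data SERDES widths from the slide


def serialize(words, width):
    """Flatten parallel words into serial bits, first bit = MSB (assumed order)."""
    if width not in DATA_WIDTHS:
        raise ValueError(f"unsupported data width {width}")
    bits = []
    for w in words:
        if not 0 <= w < (1 << width):
            raise ValueError(f"word {w:#x} does not fit in {width} bits")
        bits.extend((w >> (width - 1 - i)) & 1 for i in range(width))
    return bits


def clock_plan(line_rate_mbps, width, ddr=True):
    """Return (CLK, CLKDIV) in MHz for a serial line rate.

    DDR sends two bits per CLK period and SDR sends one; CLKDIV runs at
    the parallel word rate, i.e. the line rate divided by the width.
    """
    bits_per_clk = 2 if ddr else 1
    return line_rate_mbps / bits_per_clk, line_rate_mbps / width


print(serialize([0xA, 0x3], 4))   # [1, 0, 1, 0, 0, 0, 1, 1]
print(clock_plan(800, 8))         # (400.0, 100.0)
```

For example, an 8-bit word stream at 800 Mbps DDR needs only a 100 MHz fabric clock. That is the "multiplied up when leaving" relationship described in the key points.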
Show Slide 30:
Data Output Alignment
64 delay elements of ~70 to 89 ps
ODELAY can only be used in FIXED mode
The calibration clock can be internal or external
[Figure: data from the FPGA fabric passes through the OSERDES and the ChipSync ODELAY to the pin; the ODELAY CNTRL block, clocked by a 175 to 225 MHz calibration clock, and an INC/DEC state machine control the delay]

Key Points
TIODELAYRESOLUTION = 1/(64 x FREF x 1e6) seconds, where FREF is the calibration clock frequency in MHz (for example, 200 MHz gives about 78 ps per tap).
The calibration clock range for the IDELAYCNTRL and ODELAYCNTRL has changed from the Virtex-4 FPGA.
The IODELAY element can now be used independently with the direct input from the fabric. Also, the delay element can be used for input or output delay. There is only one delay element shared by the direct input from the fabric, input logic, and output logic.
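The resolution formula is easy to evaluate across the 175 to 225 MHz calibration range. This Python sketch reproduces the ~70 to 89 ps per-tap figures. `delay_ps` is a hypothetical helper that multiplies the tap count by the resolution and ignores any fixed insertion delay of the element.

```python
TAPS = 64  # delay elements per IODELAY, from the slide


def tap_resolution_ps(f_ref_mhz):
    """Delay per tap in ps: 1 / (64 x FREF), with FREF in MHz."""
    return 1e12 / (TAPS * f_ref_mhz * 1e6)


def delay_ps(taps, f_ref_mhz):
    """Total delay for a tap setting in 0..63 (insertion delay ignored)."""
    if not 0 <= taps < TAPS:
        raise ValueError("tap setting must be 0..63")
    return taps * tap_resolution_ps(f_ref_mhz)


for f in (175, 200, 225):
    print(f"{f} MHz: {tap_resolution_ps(f):.1f} ps/tap, "
          f"max {delay_ps(TAPS - 1, f) / 1000:.2f} ns")
```

A faster calibration clock gives finer steps but a shorter total range. This is the trade-off to keep in mind when choosing REFCLK.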
Show Slide 31:
Use Examples
SDR resources utilizing ILOGIC and OLOGIC resources can be inferred
IDDR can be inferred
ODDR, ISERDES, and OSERDES resources must be instantiated
  See Xilinx Answer Record 15776
Instantiate primitives
IP (CORE Generator & Architecture Wizard): ChipSync Wizard
Memory Interface Generator (MIG)
  Virtex-5 FPGA support is available in v1.6

Key Points
The ChipSync Wizard configures a group of I/O blocks into an interface for use in memory, networking, or any other type of bus interface. The ChipSync Wizard creates HDL code with these features configured according to your input.
Show Slide 32:
ChipSync Wizard
Memory Applications: General and Data Setup

Key Points
Single data rate:
  If the fabric data width is greater than 1, this selection sets the DATA_RATE attribute to SDR for the ISERDES and OSERDES blocks in the resulting configuration.
  If the fabric data width is 1, IFD and OFD blocks (flip-flops) are used instead of ISERDES and OSERDES blocks.
Double data rate:
  If the fabric data width is greater than 2, this selection sets the DATA_RATE attribute to DDR for the ISERDES and OSERDES blocks in the resulting configuration.
  If the fabric data width is 2, IDDR and ODDR blocks are used instead of ISERDES and OSERDES blocks.
Number of data bits per clock/strobe:
  Specifies the number of data bits in the bus that will be clocked by each clock or strobe.
DDR_CLK_EDGE property setting:
  This option appears only when the data rate is set to double data rate and the fabric data width is set to 2 in the General Setup dialog box.
  For a description of the DDR modes specified by the DDR_CLK_EDGE property, refer to Chapter 7, "SelectIO Logic Resources," in the Virtex-5 User Guide.
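The wizard's choice between flip-flops, IDDR/ODDR, and ISERDES/OSERDES follows directly from the data rate and fabric data width described in these key points. This Python sketch encodes those rules. The function names are hypothetical, and the sketch does not model any other wizard options.

```python
def chipsync_primitives(data_rate, fabric_width):
    """Return (input block, output block, DATA_RATE value or None)."""
    if data_rate == "SDR":
        if fabric_width == 1:
            return "IFD", "OFD", None
        if fabric_width > 1:
            return "ISERDES", "OSERDES", "SDR"
    elif data_rate == "DDR":
        if fabric_width == 2:
            return "IDDR", "ODDR", None
        if fabric_width > 2:
            return "ISERDES", "OSERDES", "DDR"
    raise ValueError(f"no mapping for {data_rate} with fabric width {fabric_width}")


def shows_ddr_clk_edge(data_rate, fabric_width):
    """DDR_CLK_EDGE is offered only for DDR with a fabric width of 2."""
    return data_rate == "DDR" and fabric_width == 2


print(chipsync_primitives("DDR", 8))  # ('ISERDES', 'OSERDES', 'DDR')
```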
Show Slide 33:
Memory Interface Generator
Generates a complete memory controller and interface design
  Output: RTL, UCF, documentation, and timing analysis
Choose from a predefined catalog of available devices and interfaces
Checks SSO and all pin selection rules
VHDL or Verilog
Included with the CORE Generator software

Key Points
MIG is also available when the CORE Generator software is run in standalone mode.
Show Slide 34:
1) Describe the I/O features of the Virtex-5 FPGA
Block RAMs and FIFO

Show Slide 35:
Lessons
Overview
I/O
Block RAMs and FIFO
Other Features
Summary
Show Slide 36:
Virtex-5 FPGA Block RAM and FIFO Enhancements
36-kb size
  One 36-kb block RAM or FIFO
  Two independent 18-kb RAMs
  One 18-kb RAM and one 18-kb FIFO
Performance up to 550 MHz
Multiple configurations
  True dual port, simple dual port, single port
  Maximum data width = 72
  Byte-write enable
  64K x 1 integrated cascade logic
Enhances PowerPC processor memory interfacing
Integrated 64-bit error correction
No issues with synchronous clocks on FIFO18/FIFO36
Reduced power
[Figure: dual-port block RAM or FIFO]

Key Points
Maximum frequency increased 10 percent
Setup and clock-to-out delays reduced 20 percent
Dynamic power reduced
Capacity doubled
More features added
Show Slide 37:
Virtex-5 FPGA Block RAM Architecture
[Figure: a block RAM tile five CLBs high, flanked by CLBs, containing two 18-kb RAMs (each built from two 9-kb RAMs) plus ECC and interconnect, I/O and control logic, and FIFO logic]

Key Points
Note that each 18-kb RAM is divided into two 9-kb RAMs. This distinguishing feature helps to reduce power and heat in that location.
Show Slide 38:
Independent 18-kb Block RAM and FIFO
Virtex-5 FPGA block RAM and FIFO can operate as
  One 36-kb block RAM or FIFO, or
  Two independent 18-kb block RAMs, or one 18-kb block RAM and an independent 18-kb FIFO
Backwards compatible with the Virtex-4 FPGA
One ECC per tile
[Figure: one 36-kb block RAM or FIFO with 36-bit ports, or two 18-kb halves: an 18-kb block RAM and an 18-kb block RAM or FIFO]
Show Slide 39:
Simple Dual-Port or Single-Port Block RAM
Three different styles
  Single-port block RAM: one address driving both ports
    Configurations: 32K x 1, 16K x 2, 8K x 4, 4K x 9, 2K x 18, 1K x 36
  Simple dual-port block RAM: one read port, one write port
    Configurations: 32K x 1, 16K x 2, 8K x 4, 4K x 9, 2K x 18, 1K x 36, 512 x 72
    512 x 72 uses both 18-kb block RAMs as 512 x 36
[Figure: 36-kb memory array with Port A (Addr A, 36-bit Wdata A, 36-bit Rdata A) and Port B (Addr B, 36-bit Wdata B, 36-bit Rdata B)]
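The aspect ratios trade depth for width while keeping the array the same size. This Python sketch computes the address width and bit counts for each listed configuration. Treating widths that are multiples of 9 as eight data bits plus one parity bit per byte is an assumption stated in the code.

```python
import math

# Depth x width options from the slide (depths are powers of two).
CONFIGS = [(32768, 1), (16384, 2), (8192, 4), (4096, 9),
           (2048, 18), (1024, 36), (512, 72)]


def describe(depth, width):
    """Return (address bits, total bits, data bits) for one configuration.

    Widths that are multiples of 9 are assumed to carry one parity bit
    per byte, so only 8 of every 9 bits count as data.
    """
    addr_bits = int(math.log2(depth))
    total = depth * width
    data = depth * (width // 9) * 8 if width % 9 == 0 else total
    return addr_bits, total, data


for depth, width in CONFIGS:
    a, t, d = describe(depth, width)
    print(f"{depth:>5} x {width:<2}  address bits={a:2}  total={t:5}  data={d}")
```

Every configuration holds the same 32,768 data bits. The x9-based widths simply expose the extra bits that make up the rest of the 36-kb array.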
Show Slide 40:
True Dual-Port Block RAM
Three different styles
  True dual-port block RAM: unrestricted flexibility
    Can perform read and write operations simultaneously and independently on Port A and Port B
    In one clock cycle, a total of four operations can be performed using both Port A and Port B
    Read before write, write before read, or no change
    Wide range of configurations: 32K x 1, 16K x 2, 8K x 4, 4K x 9, 2K x 18, 1K x 36
    Largest width in the Virtex-4 FPGA is 512 x 36
[Figure: 36-kb memory array with Port A (Addr A, Wdata A, Rdata A) and Port B (Addr B, Wdata B, Rdata B), each 36 bits wide]
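The three write behaviors differ only in what the data output shows during a write. This Python sketch models a single port. The names WRITE_FIRST, READ_FIRST, and NO_CHANGE are used here as labels for write before read, read before write, and no change. The model ignores the second port and any address-collision behavior.

```python
class BlockRamPort:
    """One port of a block RAM, modeling what DO shows on a write cycle."""

    MODES = ("WRITE_FIRST", "READ_FIRST", "NO_CHANGE")

    def __init__(self, depth, mode):
        if mode not in self.MODES:
            raise ValueError(mode)
        self.mode = mode
        self.mem = [0] * depth
        self.dout = 0

    def clock(self, addr, we=False, din=0):
        """One rising clock edge; returns the data output after the edge."""
        old = self.mem[addr]
        if not we:
            self.dout = old                 # plain read
            return self.dout
        self.mem[addr] = din
        if self.mode == "WRITE_FIRST":      # write before read: new data appears
            self.dout = din
        elif self.mode == "READ_FIRST":     # read before write: old data appears
            self.dout = old
        # NO_CHANGE: the output keeps its last value during a write
        return self.dout


for mode in BlockRamPort.MODES:
    port = BlockRamPort(16, mode)
    port.clock(0, we=True, din=5)
    print(mode, port.clock(0, we=True, din=9))
```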
Show Slide 41:
Block RAM is Cascadable
Built-in cascade logic for 64K x 1
  Cascade two adjacent 32-kb block RAMs without using external CLB logic or compromising performance
Cascade option for larger arrays using external CLB logic
  128 kb, 256 kb, 512 kb, 1 Mb, ...
  For depth or width expansion
  Example: Cascade eight block RAMs to build a 256-kb memory
[Figure: two block RAMs (DI, A[13:0], DO) cascaded through their RAM_Extension ports; address bit A14 drives the WE_Control logic that selects which RAM performs the write operation and the multiplexer that selects the read data; one cascade input is not used]
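Depth expansion works by splitting the address. The upper bits select a RAM, which gates its write enable and steers the output multiplexer, and the lower bits address a location within that RAM. This Python sketch models the idea for any number of 32K x 1 RAMs. `CascadedRam` is a hypothetical name, and the sketch does not model the timing of the dedicated cascade path.

```python
class CascadedRam:
    """Depth expansion: `count` RAMs of `depth` x 1 behave as one larger RAM.

    The upper address bits pick one RAM; only that RAM's write enable is
    asserted, and a multiplexer returns that RAM's output on reads.
    """

    def __init__(self, count=2, depth=32768):
        self.depth = depth
        self.rams = [[0] * depth for _ in range(count)]

    @property
    def size_bits(self):
        return len(self.rams) * self.depth

    def _split(self, addr):
        if not 0 <= addr < self.size_bits:
            raise IndexError(addr)
        return divmod(addr, self.depth)   # (RAM select, local address)

    def write(self, addr, bit):
        sel, local = self._split(addr)
        self.rams[sel][local] = bit & 1   # only RAM `sel` sees its write enable

    def read(self, addr):
        sel, local = self._split(addr)
        return self.rams[sel][local]      # output multiplexer


ram64k = CascadedRam()                    # two 32K x 1 RAMs -> 64K x 1
ram256k = CascadedRam(count=8)            # eight RAMs -> 256 kb
print(ram64k.size_bits, ram256k.size_bits)  # 65536 262144
```

With two RAMs this corresponds to the built-in 64K x 1 cascade. With eight RAMs it matches the 256-kb example, where the selection logic would be built from CLBs.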
Show Slide 42:
Output Register Set/Reset
Latch mode (DO_REG = 0)
  Operation is the same as in the Virtex-4 FPGA
  SSR and EN will set/reset the output latch to SRVAL
REG mode (DO_REG = 1)
  SSR and EN will set/reset the output register to SRVAL
Block RAM can be read or written by the other port during SSR
[Figure: DATA_IN, SSR, EN[A/B], and REGCE[A/B] control a 36-kb block RAM memory array whose output passes through a latch (SSR) and an optional register (SSR, DO_REG = 1)]

Key Points
Block RAM can also be read or written by the other port during SSR in latch mode (DO_REG = 0).
Show Slide 43:
FIFO18/36 Top-Level View
550-MHz maximum frequency
Full featured
  2x performance increase over soft implementations
Synchronous or asynchronous read and write clocks
  No phase relationship required
Four flags
  Full, empty, programmable almost full, and programmable almost empty
Optional First Word Fall Through (FWFT)
  Immediate availability of the first word after empty
FIFO configurations (same width for read and write)
  FIFO36: 8K x 4, 4K x 9, 2K x 18, 1K x 36, 512 x 72
  FIFO18: 4K x 4, 2K x 9, 1K x 18, 512 x 36
Utilizes RAMB18/36 for memory in simple dual-port style
  FIFO write port is block RAM Port B
  FIFO read port is block RAM Port A
[Figure: FIFO18/36 ports. Inputs: DIN bus, WREN, WRCLK, RDEN, RDCLK, RESET. Outputs: DOUT bus, FULL, AFULL, EMPTY, AEMPTY, RDERR, WRERR, RDCOUNT<11:0>, WRCOUNT<11:0>]
Show Slide 44:
FIFO18/36
18-kb or 36-kb configuration
  If used in 18-kb mode, the other 18 kb can only be used as block RAM
Two modes
  Multirate or Synchronous
  Attribute: EN_SYN
Not supported
  Independent read/write port width
  Byte write enable
  Dedicated cascade logic

Key Points
Reading data from the FIFO is synchronous to the rising edge of RDCLK.
Writing data to the FIFO is synchronous to the rising edge of WRCLK.
The Full and Almost Full flags are synchronous to the write clock (WRCLK).
The Empty and Almost Empty flags are synchronous to the read clock (RDCLK).
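A behavioral model helps when explaining the four flags and the error outputs. This Python sketch is a single-clock simplification: flags update immediately instead of being synchronized to WRCLK and RDCLK. The almost-full and almost-empty threshold semantics are assumptions chosen for illustration, not the FIFO18/36 offset attributes.

```python
from collections import deque


class FifoModel:
    """Behavioral FIFO with FULL, EMPTY, AFULL, AEMPTY and error flags.

    Flags update immediately; the real FIFO synchronizes them to their
    own clocks, so this sketch ignores that latency.
    """

    def __init__(self, depth, afull_offset, aempty_offset):
        self.depth = depth
        self.afull_offset = afull_offset    # AFULL when free locations <= this
        self.aempty_offset = aempty_offset  # AEMPTY when stored words <= this
        self.data = deque()
        self.wrerr = False
        self.rderr = False

    @property
    def full(self):
        return len(self.data) == self.depth

    @property
    def empty(self):
        return not self.data

    @property
    def afull(self):
        return self.depth - len(self.data) <= self.afull_offset

    @property
    def aempty(self):
        return len(self.data) <= self.aempty_offset

    def write(self, word):
        """Write side (WRCLK domain): a write while full is dropped and flagged."""
        self.wrerr = self.full
        if not self.wrerr:
            self.data.append(word)

    def read(self):
        """Read side (RDCLK domain): a read while empty returns None and is flagged."""
        self.rderr = self.empty
        return None if self.rderr else self.data.popleft()


fifo = FifoModel(depth=8, afull_offset=2, aempty_offset=2)
for i in range(6):
    fifo.write(i)
print(fifo.afull, fifo.full, fifo.aempty)
```

In the real device, FULL/AFULL would be observed in the WRCLK domain and EMPTY/AEMPTY in the RDCLK domain, as the key points state.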
Show Slide 45:
Two Modes
Multirate (asynchronous clocks)
  Can be used in Standard or FWFT mode
  EN_SYN = FALSE (default)
  DO_REG = 1
Synchronous
  Can be used in Standard mode only
    FIRST_WORD_FALL_THROUGH = FALSE (default)
  EN_SYN = TRUE
  DO_REG = 0, 1
    If DO_REG = 1, adds a pipeline stage to flags and output, improving Tcko
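The mode rules on this slide can be captured as a quick configuration check. This Python sketch is a hypothetical lint-style helper based only on the rules listed on the slide.

```python
def check_fifo_mode(en_syn, fwft, do_reg):
    """Return a list of problems with an EN_SYN / FWFT / DO_REG combination.

    Multirate (EN_SYN = FALSE) needs DO_REG = 1; synchronous
    (EN_SYN = TRUE) is Standard mode only and accepts DO_REG = 0 or 1.
    """
    problems = []
    if do_reg not in (0, 1):
        problems.append("DO_REG must be 0 or 1")
    if en_syn:
        if fwft:
            problems.append("FWFT is not available when EN_SYN = TRUE")
    elif do_reg != 1:
        problems.append("multirate mode (EN_SYN = FALSE) requires DO_REG = 1")
    return problems


print(check_fifo_mode(en_syn=False, fwft=True, do_reg=1))  # []
```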
Show Slide 46:
Virtex-5 FPGA FIFOs are Cascadable
Flexible FIFO configuration
  No dedicated cascade logic
  Expand width, depth, or both using fabric logic
[Figure, Width Cascade: a 1K x 72 FIFO built from two 36-bit FIFOs that share WREN, RDEN, WRCLK, and RDCLK; one takes DIN<35:0> and drives DOUT<35:0>, the other takes DIN<71:36> and drives DOUT<71:36>, with EMPTY and AFULL brought out]
[Figure, Depth Cascade: a 16K x 4 FIFO built from two 4-bit FIFOs in series; DOUT<3:0> of FIFO #1 feeds DIN<3:0> of FIFO #2 through fabric handshake logic (Data_Avail, Data_Taken)]
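In the width cascade, each FIFO stores part of every word, so the pair behaves like one wide FIFO as long as both are written and read together. In this Python sketch, `Fifo36` and `WidthCascade72` are hypothetical names. Reporting EMPTY or AFULL when either half asserts it is an assumption about how the fabric logic combines the flags.

```python
from collections import deque

MASK36 = (1 << 36) - 1


class Fifo36:
    """Minimal 36-bit-wide FIFO with EMPTY, AFULL, and FULL flags."""

    def __init__(self, depth=1024, afull_offset=4):
        self.depth, self.afull_offset = depth, afull_offset
        self.q = deque()

    @property
    def empty(self):
        return not self.q

    @property
    def afull(self):
        return self.depth - len(self.q) <= self.afull_offset

    @property
    def full(self):
        return len(self.q) == self.depth

    def push(self, word):
        self.q.append(word & MASK36)

    def pop(self):
        return self.q.popleft()


class WidthCascade72:
    """Two 36-bit FIFOs sharing WREN/RDEN act as one 72-bit FIFO."""

    def __init__(self, depth=1024):
        self.lo, self.hi = Fifo36(depth), Fifo36(depth)

    # Assumed flag combining: report a flag if either half asserts it.
    @property
    def empty(self):
        return self.lo.empty or self.hi.empty

    @property
    def afull(self):
        return self.lo.afull or self.hi.afull

    def write(self, word72):
        if self.lo.full or self.hi.full:
            raise OverflowError("FIFO full")
        self.lo.push(word72)            # DIN<35:0>
        self.hi.push(word72 >> 36)      # DIN<71:36>

    def read(self):
        if self.empty:
            raise IndexError("FIFO empty")
        return (self.hi.pop() << 36) | self.lo.pop()
```

A depth cascade would instead chain two FIFOs, moving words from the first to the second whenever the first is not empty and the second is not full.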
Show Slide 47:
Block RAM and FIFO Use
Inference of block RAM is possible
  Specific coding techniques are required
  Most block RAM capabilities are available
    Dual port, individual clocks, separate read/write ports, output register, set/reset
  See "RAMs and ROMs" in the XST User Guide
  Examples: ftp://ftp.xilinx.com/pub/documentation/misc/examples_v8.zip
Inference of FIFO18/36 is not possible
  Xilinx suggests that you use IP (CORE Generator & Architecture Wizard)

Key Points
Xilinx suggests instantiation of memory cores for the following reasons.
Portability: If you change to the latest device, you can swap out new cores to utilize new features. In addition, each family and/or vendor will have different memory capabilities.
The cores that were created by the IP (CORE Generator & Architecture Wizard) tool will:
  Create nearly any size memory and automatically include any extra logic that is required for connecting or cascading
  Specify the required attributes based on the GUI selections
  Only bring out the necessary ports, greatly simplifying HDL instantiation
Show Slide 48:
IP (CORE Generator & Architecture Wizard)

Key Points
The memories that were created from the CORE Generator and FIFO Generator software automatically include the necessary constraints and attributes, making it easy to instantiate the resulting core into your code with minimal effort.
Show Slide 49:
2) Compare the following I/O resources in the Virtex-5 FPGA to the Virtex-4 FPGA:
  Banking
  ChipSync technology resources
Show Slide 50:
Lessons
Overview
I/O
Block RAMs and FIFO
Other Features
Summary
Show Slide 51:
Virtex-4 FPGA DSP48 Slice
[Figure: Virtex-4 DSP48 slice with 18-bit A and B inputs, a multiplier (M), a 48-bit P output, CARRYIN, SUBTRACT, and OPMODE controls, 17-bit shifts, and BCIN/BCOUT and PCIN/PCOUT cascades]
18x18 two's complement multiplier
48-bit adder/subtractor/accumulator
Dynamic user-controlled operating modes
17-bit right shift for multi-precision multiplies
Optional input/pipeline/output registers
Symmetric rounding support
Cascading 18-bit B bus and 48-bit P bus

Key Points
As a reminder, here is the DSP48 slice in the Virtex-4 FPGA. It will be used as a comparison with the Virtex-5 FPGA DSP48E slice.
Show Slide 52:
Virtex-5 FPGA (25x18) Multiplier
[Figure: DSP48E slice with 25-bit A and 18-bit B multiplier inputs, C input, 48-bit P output, 17-bit shifts, CARRYIN, SUBTRACT, and OPMODE controls, and BCIN/BCOUT and PCIN/PCOUT cascades]
More efficient for 25x25 applications
  35x25 in two DSP48E slices (vs. 35x35 in four DSP48s)
  Single precision floating point multiplication; 24x24 unsigned
  High-end audio and image processing
More efficient for complex 25x18 multipliers
  Low power FFTs (4G wireless)

Key Points
USE_MULT replaces LEGACY_MODE. USE_MULT = NONE disables the multiplier to lower power.
ADD/ACC/MACC extension; 96-bit ADD/ACC via two cascaded DSP48Es:
  Primarily used in CIC filters
  Also useful for single precision floating point addition
  Internal CARRYOUT signal (CARRYCASCOUT/CARRYCASCIN) facilitates 96-bit ACC
  Fabric output register is needed to align the lower 48-bit P with the upper if pipelining
  Two-deep A:B provides upper DSP48E slice input alignment
More headroom for 25x18 MACC
  Ternary adder used in MACC function (single CARRYOUT not sufficient)
  Additional CARRYOUT signal needed to extend the MACC internally (MULTSIGNOUT/MULTSIGNIN)
  MULTSIGNOUT not available as fabric output
  Special OPMODE[6:0] = 1001000 for upper DSP48E
  Lower DSP48E uses normal OPMODE setting for MACC
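The 96-bit extension is ordinary multi-word addition, with the carry passed between slices. This Python sketch shows the arithmetic only: the lower 48-bit adder's carry-out feeds the upper adder, standing in for CARRYCASCOUT/CARRYCASCIN. Pipelining and register alignment are not modeled.

```python
MASK48 = (1 << 48) - 1


def add96(a, b):
    """Add two 96-bit unsigned values with two 48-bit adders.

    Returns (96-bit sum, carry-out of the upper adder).
    """
    lo = (a & MASK48) + (b & MASK48)
    carry = lo >> 48                                       # lower slice carry-out
    hi = ((a >> 48) & MASK48) + ((b >> 48) & MASK48) + carry  # upper slice carry-in
    return ((hi & MASK48) << 48) | (lo & MASK48), hi >> 48


def accumulate96(values):
    """96-bit accumulator: P <= P + value each cycle, wrapping at 2**96."""
    acc = 0
    for v in values:
        acc, _ = add96(acc, v)
    return acc


print(hex(accumulate96([2**47] * 4)))  # 0x2000000000000
```

In hardware, the two halves are also offset in time. That is why the key points call for a fabric register to realign the lower 48 bits when pipelining.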
Show Slide 53:
Independent C Input
[Figure: DSP48E slice with its own 48-bit C input]
Virtex-5 FPGA DSP: Independent C input
  Eliminates Virtex-4 FPGA issues such as
    MAP problems with DSP48 slices within a tile
    Simulation issues in cases where two DSP48s are in a tile and only one uses the C input
    The need for DRC checks
    The need to understand the rules and regulations of using the C input
Show Slide 54:
SIMD and Logic Unit
[Figure: DSP48E slice with 30-bit A, 18-bit B, and 48-bit C inputs, ALUMODE, OPMODE, and CARRYIN controls, and BCIN/BCOUT and PCIN/PCOUT cascades]
A:B expanded to 48 bits (36 bits in the Virtex-4 FPGA)
SIMD (Single Instruction Multiple Data)
  48-bit adder is splittable into segments
    Quad 12-bit or dual 24-bit configurations
  Common control/instruction: OPMODE and ALUMODE
  CARRYOUTs for each segment (2-input arithmetic)
  CARRYINs only available to the lowest segment
Bit-wise logic operations available
  XOR, XNOR, AND, NAND, OR, NOR, NOT
  Controlled dynamically by ALUMODE
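SIMD mode can be pictured as the 48-bit adder with its carry chain cut at lane boundaries. This Python sketch models that behavior for quad 12-bit and dual 24-bit lanes. `simd_add` is a hypothetical helper that returns the packed result and one carry-out per lane, with CARRYIN entering only the lowest lane.

```python
def simd_add(x, z, lanes=4, carryin=0):
    """Add two 48-bit words as independent lanes (4 x 12-bit or 2 x 24-bit).

    Carries do not cross lane boundaries; each lane reports its own
    carry-out, and CARRYIN feeds only lane 0.
    """
    if lanes not in (1, 2, 4):
        raise ValueError("lanes must be 1, 2, or 4")
    width = 48 // lanes
    mask = (1 << width) - 1
    result, carryouts = 0, []
    for i in range(lanes):
        xs = (x >> (i * width)) & mask
        zs = (z >> (i * width)) & mask
        s = xs + zs + (carryin if i == 0 else 0)
        carryouts.append(s >> width)            # per-segment CARRYOUT
        result |= (s & mask) << (i * width)     # carry is dropped at the boundary
    return result, carryouts


r, c = simd_add(0x001002003FFF, 0x001001001001)
print(hex(r), c)  # 0x2003004000 [1, 0, 0, 0]
```

The overflow in the lowest lane shows up only as that lane's carry-out; it does not disturb the lane above. That is the defining difference from a single 48-bit add.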
Show Slide 55:
Two-Input Logic Functions
ALUMODEs
[Figure: the X multiplexer selects 0, P, or A:B; the Y multiplexer contributes 0 or all ones under OPMODE[3:2]; the Z multiplexer selects 0, PCIN, P, or C; ALUMODE[3:0] selects the logic unit function]

Logic Unit Mode    OPMODE[3:2]   ALUMODE[3:0]
X XOR Z            00            0100
X XNOR Z           00            0101
X XNOR Z           00            0110
X XOR Z            00            0111
X AND Z            00            1100
X AND (NOT Z)      00            1101
X NAND Z           00            1110
(NOT X) OR Z       00            1111
X XNOR Z           10            0100
X XOR Z            10            0101
X XOR Z            10            0110
X XNOR Z           10            0111
X OR Z             10            1100
X OR (NOT Z)       10            1101
X NOR Z            10            1110
(NOT X) AND Z      10            1111
Key Points
This table shows how the ALU can be configured for two-input operations where the multiplier output is not used. If OPMODE[3:2] is set to 00, the Y multiplexer contributes a value of 0 to the 3-input adder. If OPMODE[3:2] is set to 10, the Y multiplexer contributes an all-1s value to the adder.
48-bit dynamic ALU-like functionality
  Limited shift capability:
    1-bit left shift, 17-bit right shift, but no 1-bit right shift
    1-bit barrel shift
  Additional logic operations:
    48-bit bitwise XOR, XNOR, AND, NAND, OR, NOR, NOT
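The logic-unit table can be treated as a lookup from (OPMODE[3:2], ALUMODE[3:0]) to a bitwise function of X and Z. This Python sketch encodes the table directly and masks results to 48 bits. It reproduces only the input-output behavior, not how the slice implements the functions internally.

```python
MASK48 = (1 << 48) - 1

# (OPMODE[3:2], ALUMODE[3:0]) -> bitwise function of X and Z, per the table.
LOGIC_TABLE = {
    ("00", "0100"): lambda x, z: x ^ z,
    ("00", "0101"): lambda x, z: ~(x ^ z),
    ("00", "0110"): lambda x, z: ~(x ^ z),
    ("00", "0111"): lambda x, z: x ^ z,
    ("00", "1100"): lambda x, z: x & z,
    ("00", "1101"): lambda x, z: x & ~z,
    ("00", "1110"): lambda x, z: ~(x & z),
    ("00", "1111"): lambda x, z: ~x | z,
    ("10", "0100"): lambda x, z: ~(x ^ z),
    ("10", "0101"): lambda x, z: x ^ z,
    ("10", "0110"): lambda x, z: x ^ z,
    ("10", "0111"): lambda x, z: ~(x ^ z),
    ("10", "1100"): lambda x, z: x | z,
    ("10", "1101"): lambda x, z: x | ~z,
    ("10", "1110"): lambda x, z: ~(x | z),
    ("10", "1111"): lambda x, z: ~x & z,
}


def logic_unit(opmode_3_2, alumode, x, z):
    """Apply the selected 48-bit bitwise function to X and Z."""
    fn = LOGIC_TABLE[(opmode_3_2, alumode)]
    return fn(x & MASK48, z & MASK48) & MASK48   # Python ints are unbounded; mask to 48 bits


print(bin(logic_unit("00", "0100", 0b1100, 0b1010)))  # 0b110
```

The table makes the pattern visible: switching OPMODE[3:2] from 00 to 10 turns each function into its dual (AND to OR, XOR to XNOR).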
Show Slide 56:
A Input Cascade
[Figure: DSP48E slice with an ACIN/ACOUT cascade in addition to BCIN/BCOUT and PCIN/PCOUT]
Lower power consumption
Dedicated routing within the DSP column
Allows efficient adaptive filter implementation
  Loads coefficients serially in a shadow register while the filter is still operating
  New coefficients loaded to the filter register in parallel
  Separate 2-deep A/B CE facilitates wave CE
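The adaptive-filter use of the A cascade amounts to double-buffered coefficients. A shadow chain is loaded serially while the active set keeps filtering, and then the whole set is swapped in at once. This Python sketch is a hypothetical behavioral model of that idea, not a DSP48E configuration. It loads coefficients last tap first so that they land in order.

```python
class ShadowCoefficientFir:
    """FIR whose coefficients can be reloaded without stopping the filter.

    shift_in() moves new coefficients serially through a shadow chain
    while filter() keeps using the active set; commit() copies the whole
    shadow chain into the active registers at once.
    """

    def __init__(self, coeffs):
        self.active = list(coeffs)
        self.shadow = [0] * len(coeffs)
        self.history = [0] * len(coeffs)

    def shift_in(self, c):
        # Serial load: each new value pushes the chain along by one tap.
        self.shadow = [c] + self.shadow[:-1]

    def commit(self):
        # Parallel load into the filter registers.
        self.active = list(self.shadow)

    def filter(self, sample):
        self.history = [sample] + self.history[:-1]
        return sum(c * h for c, h in zip(self.active, self.history))


fir = ShadowCoefficientFir([1, 0, 0])
print(fir.filter(5))      # 5
for c in (0, 3, 2):       # new set [2, 3, 0], loaded last tap first
    fir.shift_in(c)
print(fir.filter(7))      # 7: still using [1, 0, 0]
fir.commit()
print(fir.filter(1))      # 23 = 2*1 + 3*7 + 0*5
```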
Show Slide 57:
Pattern Detector
[Figure: DSP48E slice with a PATTERN_DETECT output that compares P against C or MC, with ACIN/ACOUT, BCIN/BCOUT, and PCIN/PCOUT cascades]
Extend symmetric rounding to multi-precision operations
Support for convergent rounding
  Requires fabric logic; dynamic rounding point
Overflow/underflow implemented in DSP48E
Support for accumulator terminal count
  Counter auto-reset
Support for saturation logic
Pattern detector outputs slower than P

Key Points
The pattern detector at the output of the DSP48E slice provides support for convergent rounding, overflow/underflow, block floating point, and accumulator terminal count (counter auto reset). The pattern detector can detect whether the output of the DSP48E slice matches a pattern as qualified by a mask. This enables functions such as A:B NAND C == 0 or A:B (bitwise logic) C == Pattern to be implemented.
For more information on pattern detection, refer to the Virtex-5 XtremeDSP Design Considerations User Guide.
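The mask-qualified comparison fits in one line. This Python sketch assumes the convention that a MASK bit of 1 means "ignore this bit"; confirm the exact attribute semantics in the user guide before relying on it. The sketch also shows the accumulator terminal-count use, where a match resets the counter. `counter_with_auto_reset` is a hypothetical name.

```python
MASK48 = (1 << 48) - 1


def pattern_detect(p, pattern, mask):
    """True when P equals PATTERN on every bit where MASK is 0 (assumed polarity)."""
    return ((p ^ pattern) & ~mask & MASK48) == 0


def counter_with_auto_reset(terminal, steps):
    """Count up and clear on the cycle after P matches the terminal count."""
    p, seen = 0, []
    for _ in range(steps):
        seen.append(p)
        p = 0 if pattern_detect(p, terminal, 0) else p + 1
    return seen


print(counter_with_auto_reset(3, 6))  # [0, 1, 2, 3, 0, 1]
```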
Show Slide 58:
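Multiply (35 X 25)
[Figure: a signed 35x25 multiply built from two cascaded DSP48E slices]
DSP48_0: OPMODE 0000101, ALUMODE 0000
  Inputs: A[24:0] (25 bits) and 0,B[16:0] (18 bits)
  Output: P[16:0] = OUT[16:0]; P cascades to DSP48_1
DSP48_1: OPMODE 1010101, ALUMODE 0000
  Inputs: A through ACIN (25 bits) and B[34:17] (18 bits), plus PCIN shifted right by 17
  Output: P[42:0] = OUT[59:17]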
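The decomposition can be verified numerically. This Python sketch splits the 35-bit operand into a zero-extended B[16:0] and a signed B[34:17], forms the two partial products as the two slices do, and adds the lower product, shifted right by 17, into the upper one. `mult_35x25` and `signed` are hypothetical names.

```python
def signed(value, bits):
    """Interpret the low `bits` of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mult_35x25(a, b):
    """Two-slice 35x25 signed multiply (a: 25-bit signed, b: 35-bit signed)."""
    b_lo = b & ((1 << 17) - 1)     # 0,B[16:0]: zero-extended, always positive
    b_hi = signed(b >> 17, 18)     # B[34:17]: signed upper 18 bits
    p0 = a * b_lo                  # DSP48_0: A x {0,B[16:0]}
    out_lo = p0 & ((1 << 17) - 1)  # P[16:0] = OUT[16:0]
    p1 = a * b_hi + (p0 >> 17)     # DSP48_1: A x B[34:17] + Shift(PCIN)
    return (p1 << 17) | out_lo     # P[42:0] = OUT[59:17]


print(mult_35x25(-12345, 2**34 - 1) == -12345 * (2**34 - 1))  # True
```

The zero MSB on the lower B slice is what lets an 18-bit signed multiplier port carry 17 unsigned bits. The 17-bit shift then lines the lower partial product up with the upper one.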
Show Slide 59:
Implement or Accelerate DSP Functions
[Table: DSP Operation versus Logic and DSP48E implementation]
  Fast Fourier Transform (FFT)
  Finite Impulse Response (FIR)
  Infinite Impulse Response (IIR)
  Cascaded Integrator-Comb (CIC)
  Quadrature Filter
  Decimating Filter
  Interpolating Filter
  Linear Phase Filter
  CORDIC Functions
  Butterworth Function
  Chebyshev Function
  Bessel Function
  Forward Error Correction (FEC)
  Pre-distortion
  Encoding
  Encryption
  Compression
Show Slide 60:
IP Support
IP (CORE Generator & Architecture Wizard)
  IP is currently supported in the IP (CORE Generator & Architecture Wizard) tool for the ISE 10.1i software
  A sampling of cores to be supported:
    Multiplier
    Adder
    Multiply and Accumulate (MAC)
    Dynamic Control
    MAC FIR
    MAD
    Serial Divider
    CORDIC
    FFT
    SIN COS LUT
    DDS
    Multiplier Generator
Show Slide 61:
High Precision, High Bandwidth

High-Precision Function                    Virtex-5 FPGA Solution   Instances in V5LX330   Maximum Bandwidth @ 550 MHz
25x18 MACC                                 1 DSP48E slice           192                    105 GMACCs/sec
25x18 Multiply plus Addition/Subtraction   1 DSP48E slice           192                    210 GOPs/sec
48+48 Addition/Subtraction                 1 DSP48E slice           192                    105 GOPs/sec
35x25 Complex Multiplication               4 DSP48E slices          48                     26 GOPs/sec
24x24 Single Precision Floating Point      2 DSP48E slices          96                     53 GOPs/sec
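The bandwidth column is the instance count times operations per cycle times the clock rate. This Python sketch recomputes it for the 192 DSP48E slices in the V5LX330. A 550 MHz clock is assumed because it lands close to the listed figures; for example, 192 x 550 MHz is about 105.6 GMACCs/sec.

```python
TOTAL_SLICES = 192  # DSP48E slices in the V5LX330, from the table
CLOCK_MHZ = 550     # assumed clock; it reproduces the table's figures closely

# function: (DSP48E slices per instance, operations per instance per cycle)
FUNCTIONS = {
    "25x18 MACC": (1, 1),
    "25x18 multiply plus add/subtract": (1, 2),
    "48+48 add/subtract": (1, 1),
    "35x25 complex multiplication": (4, 1),
    "24x24 single precision floating point": (2, 1),
}


def peak_rate(slices_per_instance, ops_per_cycle, clock_mhz=CLOCK_MHZ):
    """Return (instances, giga-operations per second) for the whole device."""
    instances = TOTAL_SLICES // slices_per_instance
    return instances, instances * ops_per_cycle * clock_mhz / 1000


for name, (slices, ops) in FUNCTIONS.items():
    n, gops = peak_rate(slices, ops)
    print(f"{name:40s} {n:4d} instances  {gops:6.1f} G/s")
```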
Show Slide 62:

Dynamically Reconfigurable DSP OPMODEs

X multiplexer
OPMODE[1:0]   X Select   Notes
00            0          Default
01            M
10            P
11            A:B

Y multiplexer
OPMODE[3:2]   Y Select            Notes
00            0                   Default
01            M                   Must select with OPMODE[1:0] = 01
10            48'hFFFFFFFFFFFF    Used mainly for ALU bitwise operations
11            C

Z multiplexer
OPMODE[6:4]   Z Select      Notes
000           0             Default
001           PCIN
010           P
011           C
100           P             Used for MACC extend only
101           Shift(PCIN)
110           Shift(P)
111           -             Illegal selection

Add/Subtract output: Z +/- (X + Y + CARRYIN), or -Z + (X + Y + CARRYIN) - 1

3) Given this OPMODE table, what is the OPMODE for the following functions?
  C + A:B
  (A x B) + C
  P + C + PCIN
Key Points
There are over 40 different modes. Each DSP48E slice is individually controllable. Logic-driven or memory-driven operation can be changed in a single clock cycle, enabling resource sharing for maximum utilization.
Note: The add/subtract functionality also depends on the ALUMODE selected. For example, if ALUMODE = 0001, CARRYIN = 1, and Y = 0, the function implemented is X minus Z.
M = multiplier output
P = P registers
C = C input
A = A input
B = B input
A:B = A concatenated with B
PCIN = Cascaded PCOUT from previous DSP48E slice
Shift (PCIN) = 17-bit shifted PCIN
Shift (P) = 17-bit shifted P
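A small encoder makes it easy to check the OPMODE tables and the ALUMODE note. In this Python sketch, OPMODE is assembled from the tables as Z (bits 6:4), Y (bits 3:2), and X (bits 1:0); the selection names such as "ONES" and "P_MACC_EXTEND" are hypothetical labels. The ALUMODE 0000 and 0011 add/subtract encodings are assumptions taken from the DSP48E documentation as we understand it. ALUMODE 0001 matches the X minus Z example in the note.

```python
MASK48 = (1 << 48) - 1

X_SEL = {"0": 0b00, "M": 0b01, "P": 0b10, "A:B": 0b11}
Y_SEL = {"0": 0b00, "M": 0b01, "ONES": 0b10, "C": 0b11}   # ONES = 48'hFFFFFFFFFFFF
Z_SEL = {"0": 0b000, "PCIN": 0b001, "P": 0b010, "C": 0b011,
         "P_MACC_EXTEND": 0b100, "SHIFT_PCIN": 0b101, "SHIFT_P": 0b110}
# Z = 111 is the illegal selection, so it has no entry.


def opmode(z, y, x):
    """Return OPMODE[6:0] as a 7-character bit string."""
    # The table pairs Y = M with X = M; enforcing the reverse too is an assumption.
    if (x == "M") != (y == "M"):
        raise ValueError("M must be selected on both X and Y")
    return format((Z_SEL[z] << 4) | (Y_SEL[y] << 2) | X_SEL[x], "07b")


def add_sub(alumode, x, y, z, carryin=0):
    """48-bit add/subtract output for three ALUMODE settings (assumed encodings)."""
    s = x + y + carryin
    if alumode == "0000":
        result = z + s              # Z + (X + Y + CARRYIN)
    elif alumode == "0011":
        result = z - s              # Z - (X + Y + CARRYIN)
    elif alumode == "0001":
        result = -z + s - 1         # -Z + (X + Y + CARRYIN) - 1
    else:
        raise ValueError(f"ALUMODE {alumode} not modeled")
    return result & MASK48


print(opmode("C", "M", "M"))                        # selections for (A x B) + C
print(add_sub("0001", x=10, y=0, z=3, carryin=1))   # X minus Z
```

The same encoder reproduces the MACC-extend setting quoted earlier, opmode("P_MACC_EXTEND", "ONES", "0"), which gives 1001000.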
Other Features
Show Slide 63:
Lessons
Overview
I/O
Block RAMs and FIFO
Other Features
Summary
Show Slide 64:
Virtex-5 FPGA Tri-Mode EMAC Description
Second-generation Tri-Mode 10/100/1000 Mbps Ethernet MAC blocks
  UNH compliance tested
Four integrated TEMACs in every Virtex-5 LXT and SXT device
Can be used with the RocketIO GTP transceivers to build a fully integrated 1000BASE-X interface
Saves programmable logic resources
Full or half duplex
[Figure: four EMAC blocks]

Key Points
Fully integrated 10/100/1000 Mbps Ethernet Media Access Controller:
  The TEMAC supports a configurable full-duplex operation in 10/100/1000 Mbps. It also supports a configurable half-duplex operation in 10/100 Mbps.
  Each one is dedicated in the silicon, so it is proven technology. These were built from experience with popular IP from Xilinx: the EMAC from the CORE Generator software. The Tri-Mode EMAC is IEEE 802.3 compliant.
  Originally in the Virtex-4 FPGA, the EMAC block was a part of the PPC block. In the Virtex-5 family, it is its own independent resource, located as part of a block RAM column. There are two EMAC blocks; each EMAC block has two independent EMACs with a shared host interface.
  As always, remember that this is a dedicated resource that, if not used, will be wasted.
  The CORE Generator software provides an example design that shows a programmable PHY interface and a client side that connects to the FPGA resources via a FIFO. For more information, see the Virtex-5 data sheet and the sample design referenced in the EMAC User Guide.
Show Slide 65:
Full-Featured Ethernet
Functionality
IEEE 802.3 compliant

Programmable PHY interface support
Supports VLAN and jumbo frames

Receive address filter
Network traffic monitoring and filtering
Use RocketIO transceiver or

SelectIO technology
MII, GMII, RGMII, SGMII
PCS/PMA for 1000BASE-X
Real-time statistics for TX/RX
Fewer clocks needed than

previous generation
Key Points
!
The TEMAC is fully featured:

The client side of the embedded Ethernet MAC can be
connected to a Direct Memory Access Controller (DMA
engine). The DMA engine is then connected to the processor
bus, which allows an embedded processor to access the
Ethernet port. The TEMAC also supports a hardwareselectable Device Control Register (DCR) bus or generic host
bus interface.
The client side of the embedded Ethernet MAC is connected
to a FIFO to complete a single Ethernet port. This port is
connected to a switch or routing matrix, which can contain
several ports and be directly connected to the FPGA logic
resources.
www.xilinx.com
1-877-XLX-CLAS
Page 55
Facilitator Guide
Other Features
Key Points
The CORE Generator software provides an example design
for the embedded Tri-Mode Ethernet MAC in the Virtex-5
FPGA for any of the supported physical interfaces. The
supported PHY interfaces include GMII, MII, RGMII, and
SGMII. These interfaces are implemented inside the FPGA
by using programmable logic; they are not dedicated.
However, the CORE Generator software makes creating
these interfaces relatively easy.
The TEMAC resides in the same column as the dedicated
PCI Express core.
Show Slide 66:
Virtex-5 FPGA PCI Express Integrated Endpoint Block
Full featured and compliant to base specification 1.1
Saves FPGA resources
Integrated in all T devices
Adjacent to high-speed serial transceivers
Electrical signaling
Protocol (CRC, automatic retry)
Quality of Service (QoS)
Hot pluggable
[Figure: Virtex-5 FPGA with the embedded PCI core (transaction layer, data layer, and PHY layer) connected to GTP transceivers over 1, 2, 4, or 8 lanes]
Supports 1-, 2-, 4-, or 8-lane implementations
Uses transceiver blocks to provide fully integrated PCIe core endpoint
Meets all key requirements
Highly configurable endpoint solution
Key Points
The PCIe integrated Endpoint block is highly complex and customizable. The PCIe Wizard is provided to customize and generate a PCIe standard subsystem via a simple set of menu options. The PCIe standard subsystem contains the PCIe integrated Endpoint block, GTP transceiver tiles, block RAMs, a clock module, and a reset module, which are all automatically configured and connected. The options available in the wizard determine the correct attribute settings and tie off any unneeded ports. Selecting the desired options in the wizard generates a completely customized wrapper.
The PCIe cores are placed in a column of block RAM on the right side of the die.
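To put the 1-, 2-, 4-, and 8-lane options in perspective, the sketch below computes the raw per-direction bandwidth of a PCI Express base specification 1.1 link. It assumes a 2.5-Gbps line rate per lane and 8B/10B line coding (8 data bits per 10 line bits). It ignores packet and protocol overhead, so real throughput is lower. The function name is illustrative only.

```python
# Raw PCIe 1.1 bandwidth per direction, before packet/protocol overhead.
LANE_RATE_BPS = 2.5e9          # assumed line rate per lane (PCIe 1.1)
ENCODING_EFFICIENCY = 8 / 10   # 8B/10B: 8 payload bits per 10 line bits


def pcie_gen1_bandwidth_mbytes(lanes: int) -> float:
    """Return raw per-direction bandwidth in MB/s (10**6 bytes/s)."""
    if lanes not in (1, 2, 4, 8):
        raise ValueError("the endpoint supports x1, x2, x4, or x8 links")
    payload_bps = lanes * LANE_RATE_BPS * ENCODING_EFFICIENCY
    return payload_bps / 8 / 1e6   # bits -> bytes -> MB


for lanes in (1, 2, 4, 8):
    print(f"x{lanes}: {pcie_gen1_bandwidth_mbytes(lanes):.0f} MB/s per direction")
```

Each doubling of the lane count doubles the raw bandwidth: 250 MB/s per lane, up to 2,000 MB/s for a x8 link.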
Show Slide 67:
GTP Transceiver
Industry's lowest power MGTs
Advanced features and capabilities
Flexible TX and RX equalization
Ease-of-design with new design and debug tools
Now available in all LXT and SXT devices
Shortening design cycles and reducing time to market
Enhanced standards support
Covering serial standards between 100 Mbps and 3.2 Gbps
Embedded hard cores: PCI Express core and Ethernet
Virtex-5 LXT FPGA die has a column of GTP transceivers
Key Points
GTP transceivers are placed as dual-transceiver GTP_DUAL tiles in the Virtex-5 LXT devices. This configuration allows the two transceivers to share a single PLL that serves the TX and RX functions of both, reducing size and power consumption. The GTP transceivers are placed on the right edge of the die.
Show Slide 68:
GTP Transceiver Standards Coverage
Market | Standard | Speed (bits per second per channel)
Datacom | 1G Ethernet | 1.25 G
Datacom | XAUI | 3.125 G
Datacom | 10G Base CX-4 | 3.125 G (x4)
Telecom | OC-3, OC-12, OC-48 / SDH STM-1, STM-4, STM-16 | 155 M, 622 M, 2.488 G
Telecom | OBSAI | 768 M, 1.536 G, 3.072 G
Telecom | CPRI | 614 M, 1.228 G, 2.457 G
Telecom | SFI-5 | 2.448 to 3.125 G
Computing/Communication | PCI Express Standard | 2.5 G
Computing/Communication | Serial RapidIO | 3.125 G
Computing/Communication | InfiniBand | 2.5 G
Storage | Fibre Channel | 1.0625 G, 2.125 G
Storage | SATA | 1.5 G, 3.0 G
Storage | SAS | 1.5 G, 3.0 G
Video | SDI | 270 M
Video | DVB-ASI | 270 M
Video | HD-SDI | 1.485 G, 1.4835 G, 2.97 G
Key Points
The RocketIO Wizard automatically configures GTP and GTX transceivers to support one of these protocols or to perform custom configurations.
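The line rates in the standards coverage table can be checked against the GTP range quoted on Slide 67 (100 Mbps to 3.2 Gbps). The sketch below transcribes the table into a dictionary and reports any rate outside that range. The dictionary and function names are illustrative. The check is a deliberate simplification that ignores the reference-clock and PLL-divider constraints the RocketIO Wizard resolves.

```python
GTP_MIN_BPS = 100e6   # lower bound quoted on Slide 67
GTP_MAX_BPS = 3.2e9   # upper bound quoted on Slide 67

# Per-channel line rates transcribed from the standards coverage table.
STANDARD_RATES_BPS = {
    "1G Ethernet": [1.25e9],
    "XAUI": [3.125e9],
    "10G Base CX-4": [3.125e9],
    "OC-3/OC-12/OC-48": [155e6, 622e6, 2.488e9],
    "OBSAI": [768e6, 1.536e9, 3.072e9],
    "CPRI": [614e6, 1.228e9, 2.457e9],
    "SFI-5": [2.448e9, 3.125e9],
    "PCI Express": [2.5e9],
    "Serial RapidIO": [3.125e9],
    "InfiniBand": [2.5e9],
    "Fibre Channel": [1.0625e9, 2.125e9],
    "SATA": [1.5e9, 3.0e9],
    "SAS": [1.5e9, 3.0e9],
    "SDI": [270e6],
    "DVB-ASI": [270e6],
    "HD-SDI": [1.485e9, 1.4835e9, 2.97e9],
}


def outside_gtp_range(rates):
    """Return (standard, rate) pairs that fall outside the GTP line-rate range."""
    return [(name, r) for name, rs in rates.items() for r in rs
            if not GTP_MIN_BPS <= r <= GTP_MAX_BPS]


print(outside_gtp_range(STANDARD_RATES_BPS))   # prints [] (every rate fits)
```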
Show Slide 69:
GTX Transceiver
High-performance MGTs
Available in all FXT devices
Additional serial standards up to 6.5 Gbps
PCI Express standard Gen2 (5.0 Gbps)
Interlaken (3.125, 6.25 Gbps)
OIF-CEI-6G (SR) and (LR) (6.25 Gbps)
FC-4 (4.25 Gbps)
SATA Gen 3 (6.0 Gbps)
SAS Rev 5 (6.0 Gbps)
Serial RapidIO standard (6.25 Gbps)
Advanced features and capabilities
Flexible gearbox to support 64B/66B and 64B/67B encoding
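Line rate and payload rate differ by the coding overhead, which is why the gearbox matters. The sketch below compares the following codes:
* 8B/10B, used for example by 1G Ethernet at 1.25 Gbps.
* 64B/66B and 64B/67B, the codes the GTX gearbox supports. 64B/67B is the code Interlaken uses.

The code ratios are standard. The function name is illustrative.

```python
from fractions import Fraction

# Payload bits per line bits for each block code.
CODE_EFFICIENCY = {
    "8B/10B": Fraction(8, 10),
    "64B/66B": Fraction(64, 66),
    "64B/67B": Fraction(64, 67),
}


def payload_rate_gbps(line_rate_gbps: float, code: str) -> float:
    """Return the data rate left after removing line-coding overhead."""
    return line_rate_gbps * float(CODE_EFFICIENCY[code])


print(payload_rate_gbps(1.25, "8B/10B"))              # 1G Ethernet payload
print(round(payload_rate_gbps(6.25, "64B/67B"), 3))   # Interlaken lane payload
```

8B/10B spends 20% of the line rate on coding. 64B/66B and 64B/67B spend roughly 3% and 4.5%.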
Show Slide 70:
GTP and GTX Transceiver Tool Support
Xilinx standard tools and design flow
ISE software 10.1
RocketIO Wizard in the CORE Generator tool
IBERT in the ChipScope Pro tools
IBERT: Integrated Bit Error Rate Tester
SmartModel simulations on industry-leading platforms
Cadence NC Verilog
Mentor ModelSim
Synopsys VCS
HSPICE models for signal integrity simulation and analysis
Show Slide 71:
Integrated PowerPC 440 Processor Core
High performance
>1100 DMIPS @ 550 MHz
7-stage execution pipeline
Third-generation FPGA with the PowerPC processor
Enhanced CoreConnect bus architecture
Processor Local Bus (PLB v4.6) interface
Key Points
The PowerPC processor is available in FXT platform devices only.
PowerPC processor development is covered in the Embedded Systems Development course.
Show Slide 72:
Enhanced PowerPC 440 Processor Block
Enhanced CoreConnect bus architecture
128-bit PLB v4.6
Non-blocking crossbar for higher bandwidth and low latency
Dedicated interface for connection to block RAM and external memory
Auto-synchronized for non-integer PLB-to-CPU clock ratios
All IP cores have been updated to support PLB v4.6
MicroBlaze processor v7 also supports PLB v4.6
32-KB level 1 instruction and data caches
Summary
Show Slide 73:
Lessons
Overview
I/O
Block RAMs and FIFO
Other Features
Summary
Show Slide 74:
4) What is the easiest method for building resources such as I/O, memory, and DSP48 functions?
Show Slide 75:
Summary
All I/Os are lower-capacitance I/Os
Center column I/O banks have 20 pins
Each region has one I/O bank with 40 pins
The I/O blocks contain I/O register resources as well as I/OSERDES
The I/OSERDES block provides source-synchronous capabilities utilizing dedicated resources
The XtremeDSP solution block provides maximum performance and low power for DSP applications
Block RAMs are now configurable (for smaller memory applications) and cascadable (for larger memory applications)
The FIFO16 resources are implemented with dedicated FIFO logic and are also cascadable
Where Can I Learn More?
Virtex-5 FPGA data sheets
Virtex-5 FPGA user guides
Virtex-5 FPGA User Guide
Virtex-5 FPGA XtremeDSP Design Considerations User Guide
DSP, I/O, block RAM, and FIFO primitives: Software Documentation Libraries Guide
Virtex-5 FPGA home page
www.xilinx.com/virtex5
Links to everything related to the Virtex-5 FPGA: white papers, boards, training, data sheets, and user guides
Virtex-5 FPGA memory application notes
Memory interface data capture, DDR2 controllers, QDR II SRAM, and DDR SDRAM controller
Application Note XAPP802: Memory Interface Application Notes Overview
Memory Corner: www.xilinx.com > Technology Solutions > Memory
Includes the Memory Interface Generator
Software manuals
Xilinx Education Services courses
Answers
1) Describe the I/O features of the Virtex-4 FPGA.
ILOGIC includes SDR and DDR input register resources
DDR: Added same edge and same edge pipelined
OLOGIC includes SDR and DDR output register resources
DDR: Added same edge
ISERDES includes 1-to-10 serial-to-parallel converter, BITSLIP, IDELAY
(Up to) 1-to-10 serial-to-parallel converter utilizing a master/slave ISERDES pair
OSERDES includes 10-to-1 parallel-to-serial converter
(Up to) 10-to-1 parallel-to-serial converter utilizing a master/slave OSERDES pair
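As a behavioral illustration of what OSERDES, ISERDES, and BITSLIP do, the sketch below does three things:
* It serializes parallel words into a bit stream.
* It deserializes the stream back into words.
* It increments a bitslip count until the word boundary lines up with a known training pattern.

This is not a cycle-accurate model of the Virtex-5 primitives. The MSB-first bit order and the function names are assumptions for illustration. The actual Q output ordering and BITSLIP timing are defined in the Virtex-5 FPGA User Guide.

```python
def oserdes(words, width):
    """Parallel-to-serial: emit each word MSB first (assumed ordering)."""
    bits = []
    for w in words:
        bits.extend((w >> (width - 1 - i)) & 1 for i in range(width))
    return bits


def iserdes(bits, width, bitslip=0):
    """Serial-to-parallel: group bits into width-bit words after skipping
    `bitslip` leading bits, which moves the word boundary by that many bits."""
    if not 2 <= width <= 10:
        raise ValueError("a master/slave pair supports widths up to 10")
    stream = bits[bitslip:]
    words = []
    for start in range(0, len(stream) - width + 1, width):
        word = 0
        for b in stream[start:start + width]:
            word = (word << 1) | b
        words.append(word)
    return words


def align(bits, width, training_word):
    """Try each bitslip value until the first word matches the training
    pattern, much as BITSLIP is pulsed during link training."""
    for slip in range(width):
        words = iserdes(bits, width, slip)
        if words and words[0] == training_word:
            return slip
    return None


stream = [1, 0, 1] + oserdes([0x2C5, 0x2C5, 0x1F], 10)  # 3 stray leading bits
slip = align(stream, 10, 0x2C5)
print(slip, [hex(w) for w in iserdes(stream, 10, slip)])  # 3 ['0x2c5', '0x2c5', '0x1f']
```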
2) Compare the following I/O resources in the Virtex-5 FPGA to the Virtex-4 FPGA.
Banking
ChipSync technology resources
Electrical standards
Feature | Virtex-4 FPGA | Virtex-5 FPGA
Electrical standards | >30 | >40
Banking architecture | 64 I/Os per bank (with clock-capable I/O); 9 to 17 banks | 40 I/Os per bank (all I/Os same, homogeneous); 13 to 35 banks
ChipSync technology | First generation | Added output delay (ODELAY)
3) Given this OPMODE table, what is the OPMODE for the following functions?
C + A:B
OPMODE = 011 00 11 or 000 11 11
(A x B) + C
OPMODE = 011 01 01
P + C + PCIN
OPMODE = 001 11 10
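Each OPMODE answer above is three multiplexer selects concatenated as Z (bits 6:4), Y (bits 3:2), and X (bits 1:0). The sketch below builds those strings from named operand choices. The select codes are our reading of the DSP48E OPMODE table in the Virtex-5 FPGA XtremeDSP Design Considerations User Guide and should be verified against that guide. Only the entries needed for these answers are included, and the names are illustrative.

```python
# Multiplexer select codes (assumed from the DSP48E OPMODE table).
X_SEL = {"0": 0b00, "M": 0b01, "P": 0b10, "A:B": 0b11}
Y_SEL = {"0": 0b00, "M": 0b01, "C": 0b11}
Z_SEL = {"0": 0b000, "PCIN": 0b001, "P": 0b010, "C": 0b011}


def opmode(z: str, y: str, x: str) -> str:
    """Return OPMODE[6:0] formatted as 'ZZZ YY XX' for the chosen operands."""
    # The multiplier output M is routed through both X and Y at once.
    if (x == "M") != (y == "M"):
        raise ValueError("select M on both X and Y, or on neither")
    return f"{Z_SEL[z]:03b} {Y_SEL[y]:02b} {X_SEL[x]:02b}"


print(opmode("C", "0", "A:B"))    # C + A:B          -> 011 00 11
print(opmode("0", "C", "A:B"))    # C + A:B (alt.)   -> 000 11 11
print(opmode("C", "M", "M"))      # (A x B) + C      -> 011 01 01
print(opmode("PCIN", "C", "P"))   # P + C + PCIN     -> 001 11 10
```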
4) What is the easiest method for building resources such as I/O, memory, and DSP48 functions?
Inference
Basic I/O (single-ended)
Single Block RAMs
Multipliers
Use of CORE Generator and Architecture Wizard software
Larger Block RAM memories
FIFOs
DSP functions, arithmetic functions, MACCs, FIR filters, etc. (see the High-Precision, High-Bandwidth table on page 50)
ChipSync Wizard
DDR
SERDES
Memory Interface Generator
Memory Controllers
Configure I/O for Memory Interface
Transition to CORE Generator Software System
Purpose
Describe the differences between LogiCORE and AllianceCORE solutions
Identify two benefits of using cores in your designs
Create customized cores by using the CORE Generator software system GUI
Instantiate cores into your schematic or HDL design
Run behavioral simulation on a design that contains cores
Time
20 minutes
Process
This module describes the basics of designing with the CORE Generator software.
Lessons
Introduction
Overview
Using the CORE Generator Software System
Summary
Introduction
Show Slide 76:
CORE Generator Software System
Show Slide 77:
Objectives
Describe the differences between LogiCORE and AllianceCORE solutions
Identify two benefits of using cores in your designs
Create customized cores by using the CORE Generator software system GUI
Instantiate cores into your schematic or HDL design
Run behavioral simulation on a design that contains cores
Overview
Show Slide 78:
Lessons
Overview
CORE Generator Software System
CORE Generator Software Design Flows
Summary
Show Slide 79:
What Are Cores?
A core is a ready-made function that you can instantiate into your design as a black box
Cores can range in complexity
Simple arithmetic operators, such as adders, accumulators, and multipliers
System-level building blocks, such as filters, transforms, and memories
Specialized functions, such as bus interfaces, controllers, and microprocessors
Some cores can be customized
Key Points
Note: The terms "function" and "core" are sometimes used interchangeably in this module to indicate a design entity, such as a multiplier or Finite Impulse Response (FIR) filter, that the CORE Generator software is able to create.
Intellectual Property (IP) is another term that is often used in association with cores. Cores are one type of IP.
Show Slide 80:
Benefits of Using Cores
Save design time
Cores are created by expert designers who have in-depth knowledge of Xilinx FPGA architecture
Guaranteed functionality saves time during simulation
Increase design performance
Cores that contain mapping and placement information have predictable performance that is constant over device size and utilization
The data sheet for each core provides performance expectations
Use timing constraints to achieve maximum performance
Show Slide 81:
Types of Cores
LogiCORE solutions
Show Slide 82:
LogiCORE Solutions
Typically customizable
Fully tested, documented, and supported by Xilinx
Many are pre-placed for predictable timing
Many are unlicensed and provided for free with Xilinx software
More complex LogiCORE solution products are licensed
VHDL and Verilog flow support for several EDA tools
Schematic flow support for most cores
Show Slide 83:
AllianceCORE Solutions
Point-solution cores
Sold and supported by Xilinx AllianceCORE solution partners
Typically not customizable (some HDL versions are customizable)
Partners can be contacted directly to provide customized cores
A free evaluation version of the module is available
You will need to contact the IP Center for licensing and ordering information
All cores are optimized for Xilinx; some are pre-placed
Typically supplied as an Electronic Design Interchange Format (EDIF) netlist
VHDL and Verilog flow support; some schematic support
Show Slide 84:
Sample Functions
LogiCORE solutions
DSP functions: time skew buffers, Finite Impulse Response (FIR) filters, and correlators
Math functions: accumulators, adders, multipliers, integrators, and square root
Memories: pipelined delay elements, single- and dual-port RAM
Synchronous FIFOs
PCI master and slave interfaces, PCI bridge
Peripherals: DMA controllers, programmable interrupt controllers, UARTs
Communications and networking: ATM, Reed-Solomon encoders and decoders, T1 framers
Standard bus interfaces: PCMCIA, USB
Show Slide 85:
Lessons
Overview
CORE Generator Software System
CORE Generator Software Design Flows
Summary
Show Slide 86:
CORE Generator Software System
A Graphical User Interface (GUI) allows central access to LogiCORE IP products, as well as:
Interfaces with design entry tools
Data sheets
Customizable parameters (available for some cores)
Creates graphical symbols for schematic-based designs
Creates instantiation templates for HDL-based designs
Web Links tab provides access to the Xilinx website and the IP Center
The IP Center contains new cores to download and install
You always have access to the latest cores
Key Points
The CORE Generator software is a software application that manages information about each available core, organizes cores for easy browsing, and (for unlicensed LogiCORE solution products) creates the actual files needed to integrate a core into your design.
To view information about AllianceCORE products, visit the IP Center on the Web at www.xilinx.com/ipcenter.
Show Slide 87:
Invoking the CORE Generator System
From the Project Navigator, select Project > New Source
Select IP (CORE Generator & Architecture Wizard) and enter a filename
Click Next and then select the type of core
Key Points
To learn more about the Architecture Wizard, refer to the Architecture Wizard and the Floorplan Editor REL module in the Fundamentals of FPGA Design course.
If you are not using the Project Navigator, enter coregen at a command prompt (UNIX shell or DOS box).
TRAINER NOTE
Demo Instructions:
1. To open an existing project: Select File > Open Project.
2. Browse to one of the lab project directories.
3. Select an ISE software file and click Open.
4. Follow the instructions in the slide above to open the CORE
Generator software.
5. Enter file name: test_core.
Show Slide 88:
Core Customize Window
[Figure: core customize window, with callouts for version information, the schematic symbol (unused ports grayed out), customizable parameters spread over several pages, and data sheet access]
TRAINER NOTE
Demo Instructions:
1. Enter parameters for the core you selected.
2. Click Next to show additional pages of parameters.
Show Slide 89:
Core Data Sheets
[Figure: sample core data sheet, with callouts for features, resource utilization, and performance expectations (not shown); functionality and pinout are also covered (next page)]
TRAINER NOTE
Demo Instructions:
In the customize GUI, click the View Data Sheet button.
Show Slide 90:
Lessons
Overview
CORE Generator Software System
CORE Generator Software Design Flows
Summary
Show Slide 91:
Schematic Design Flow
[Figure: flow diagram: Generate Core (.xco), producing an .NGC file and symbol; Instantiate; Implement; Simulate]
Generate a core
Creates an NGC file and schematic symbol
Instantiate the symbol onto your schematic
When a schematic is added to your design, a symbol is automatically created
Treated as a black box; no underlying schematic
Proceed with normal schematic flow
Key Points
The XCO file is a log of the options used to create the core. You can use this file to confirm that the correct options were used during core generation. You can also use this file to create another core with the same options. This file can also be used in batch mode.
An NGC file is a Xilinx intermediate file for a core. It is merged with the other netlists in your design during the Translate phase. It is a Xilinx proprietary netlist format used to maintain Xilinx IP.
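Because the XCO file is a plain-text log of the options, you can confirm two cores' settings by parsing and comparing their parameter lines. This sketch assumes that parameters appear as CSET name=value lines, which matches XCO files we have seen; confirm the format for your ISE version. The sample file contents and the function names are hypothetical.

```python
def parse_xco(text: str) -> dict:
    """Collect CSET parameter settings from XCO text into a dict."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if line.upper().startswith("CSET "):          # assumed line format
            name, _, value = line[5:].partition("=")
            params[name.strip().lower()] = value.strip()
    return params


def diff_cores(a: str, b: str) -> dict:
    """Return {parameter: (value_in_a, value_in_b)} for settings that differ."""
    pa, pb = parse_xco(a), parse_xco(b)
    return {k: (pa.get(k), pb.get(k))
            for k in sorted(pa.keys() | pb.keys()) if pa.get(k) != pb.get(k)}


# Hypothetical, abbreviated XCO contents for two block RAM cores.
core_a = """# comment line
SET devicefamily = virtex5
CSET component_name=test_core
CSET write_width_a=16
CSET write_depth_a=1024
GENERATE"""
core_b = core_a.replace("write_depth_a=1024", "write_depth_a=2048")

print(diff_cores(core_a, core_b))   # {'write_depth_a': ('1024', '2048')}
```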
Show Slide 92:
HDL Design Flow
[Figure: HDL design flow. Compile library for behavioral simulation (one time only): compxlib.exe builds XilinxCoreLib. Core generation and integration: Generate Core (.xco) produces .NGC, .VHO/.VEO, and .VHD/.V files; Instantiate; then Implement and Simulate]
Key Points
The next few slides describe each step in the HDL flow in more detail.
In Project Navigator, the XCO file is automatically added to the project.
The instantiation templates are also added to the Language Templates. To see the templates, select Edit > Language Templates or click the Language Templates icon in the horizontal toolbar.
Show Slide 93:
HDL Design Flow
Compile Simulation Library
Before your first behavioral simulation, you must run compxlib.exe to compile the XilinxCoreLib simulation library
Located in the $XILINX\bin\<platform> directory
Supports Mentor Graphics ModelSim and SpeedWave, Cadence NC-Verilog, and Synopsys VCS and Scirocco simulation tools
If you download new or updated cores, additional simulation models will be automatically extracted during installation
Key Points
If you are using a simulator that is not supported by the compxlib script, refer to the CORE Generator Guide and your simulator documentation for information on how to compile the XilinxCoreLib library.
Show Slide 94:
HDL Design Flow
Core Generation and Integration
Generate or purchase a core
Netlist file (NGC)
Instantiation template files (VHO or VEO)
Behavioral simulation wrapper files (VHD or V)
Instantiate the core into your HDL source
Cut and paste from the templates provided in the VEO or VHO file
The design is ready for synthesis and implementation
Use the wrapper files for behavioral simulation
The ISE software automatically uses wrapper files when cores are present in the design
VHDL: Analyze the wrapper file for each core before analyzing the file that instantiates the core
Key Points
Instantiation template files provide a template with all of the correct port declarations for the core.
Simply cut and paste the template into your source file, change the instance name if desired, and replace the dummy signal names with your own signal names.
During synthesis, the core is treated as a black box. During the first stage of implementation (Translate), the Xilinx tools read in the NGC netlist file that was created by the CORE Generator software system.
Many VHDL simulators require lower-level files to be analyzed before the file that references them. Remember to analyze the wrapper files for your cores before you analyze the file that references them.
Most Verilog simulators do not have this order dependency.
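The VHDL analysis-order rule amounts to a topological sort: every wrapper must be analyzed before any file that instantiates it. This sketch uses Python's standard graphlib module (Python 3.9 or later). The file names and the dependency map are hypothetical. If the files reference each other in a loop, graphlib raises CycleError instead of returning an order.

```python
from graphlib import TopologicalSorter

# Each file maps to the files it depends on (those must be analyzed first).
# All file names here are hypothetical examples.
dependencies = {
    "top.vhd": {"test_core.vhd", "fifo_core.vhd", "control.vhd"},
    "control.vhd": {"fifo_core.vhd"},
    "test_core.vhd": set(),   # CORE Generator behavioral wrapper
    "fifo_core.vhd": set(),   # CORE Generator behavioral wrapper
}

# static_order() yields every file after all of its dependencies.
compile_order = list(TopologicalSorter(dependencies).static_order())
print(compile_order)
```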
Summary
Show Slide 95:
Lessons
Overview
CORE Generator Software System
CORE Generator Software Design Flows
Summary
Show Slide 96:
1) What is the main difference between LogiCORE and AllianceCORE solution products?
2) What is the purpose of compxlib.exe?
3) What is the difference between the VHO/VEO files and the VHD/V files that are created by the CORE Generator software?
Show Slide 97:
Summary
A core is a ready-made function that you can insert into your design
LogiCORE solution products are sold and supported by Xilinx
AllianceCORE solution products are sold and supported by AllianceCORE solution partners
Using cores can save design time and provide increased performance
Cores can be used in schematic or HDL design flows
Xilinx IP Center: www.xilinx.com/ipcenter
Software updates
Download new cores as they are released
Get core licensing help
IP evaluation
TRAINER NOTE
Demo Instructions:
1. Open a browser and go to www.xilinx.com/ipcenter.
2. Explore a few of the links on this page to see what is
available.
Answers
1) What is the main difference between LogiCORE and AllianceCORE solution products?
LogiCORE solution products are sold and supported by Xilinx.
AllianceCORE solution products are sold and supported by AllianceCORE solution partners.
2) What is the purpose of compxlib.exe?
compxlib.exe makes it easy to compile the XilinxCoreLib library before your first behavioral simulation.
3) What is the difference between the VHO/VEO files and the VHD/V files that are created by the CORE Generator software?
VHO/VEO files contain instantiation templates.
VHD/V files are wrappers for behavioral simulation that reference the XilinxCoreLib library.
Transition to Lab 1: CORE Generator Software System
Lab 1: CORE Generator Software System
Purpose
After completing this lab, you will be able to:
Create a custom memory component made of block RAM by using the CORE Generator tool
Create a custom asynchronous FIFO by using the CORE Generator tool
Time
30 minutes
Process
This lab illustrates how to build a block RAM memory and an asynchronous FIFO with the CORE Generator software.
General Flow
Step 1: Build the block RAM memory
Step 2: Build the asynchronous FIFO
Step 3: Implement the design
Lab
Designing for Performance Lab Workbook
Refer to the separate lab workbook for the CORE Generator Software System lab.
TRAINER NOTE
Remind students that the lab workbook contains two versions of each lab: a version with only general instructions and a version that includes detailed steps following a general instruction. The labs with only general instructions comprise the first section of the lab workbook, and the detailed versions comprise the second section.
Transition to Designing Clock Resources
Purpose
Specify the resources available in the Clock Management Tile (CMT)
Describe the basics of the PLL capabilities
Detail the clocking resources available in the Virtex-5 FPGA
Time
45 minutes
Process
This module describes how to design a complete FPGA clocking scheme.
Lessons
Introduction
Overview
Clock Networks
Summary
Introduction
Show Slide 98:
Designing Clock Resources
Show Slide 99:
Objectives
Specify the resources available in the Clock Management Tile (CMT)
Describe the basics of the PLL capabilities
Detail the clocking resources available in the Virtex-5 FPGA
Overview
Show Slide 100:
Lessons
Overview
Clock Networks
Summary
Show Slide 101:
Virtex-5 FPGA Delivers Powerful Clock Management
Combination of digital and analog technology
Optimized clocking resource mix
Highest performance
Both DCMs and PLLs
PLL can accept an input clock up to 710 MHz
More than 2x jitter filtering
Simple design creation through cores
[Figure: PLL, DCM, and clock buffers, up to 550 MHz; select by function, component, automatic, or HDL code]
Show Slide 102:
Three Types of Clock Resources
[Figure: die diagram showing the I/O column, global clocks, I/O clocks, regional clocks, and global muxes]
Clock region height: 20 CLBs, 40 I/Os (1 bank)
Clock region width: one half the chip
8 to 24 clock regions per device
Performance matched to application needs
710-MHz I/O clocks
550-MHz global clocks
300-MHz regional clocks
Show Slide 103:
Lessons
Overview
Clock Networks
Summary
Show Slide 104:
Virtex-5 FPGA Clock Management Tile
Up to six CMTs per device
Each with two DCMs and one PLL
DCM
Fifth-generation, all-digital technology
Provides the most clocking functions
Same functionality as in the Virtex-4 FPGA
PLL
Reduces internal clock jitter
Supports higher jitter on reference clock inputs
Replaces discrete PLLs and Voltage Controlled Oscillators (VCOs)
PMCD removed
Functionality ported to PLL
No external PWR/GND pins
Powerful combination of flexibility and precision
Key Points
Using the PLL only to reproduce a Virtex-4 FPGA PMCD is not a wise use of the PLL capabilities.
Show Slide 105:
Standard CMT Configurations
[Figure: three standard CMT configurations. (1) Use each DCM and PLL individually, each with its own input clock (InClk 1, 2, and 3). (2) PLL filters high clock jitter before reaching the DCM, which drives the global clocks. (3) PLL filters DCM output clock jitter before driving the global clocks.]
Key Points
Dedicated connections exist between the DCM outputs and PLL inputs, as well as from the PLL outputs to the DCM inputs.
Note that the grouping of the three elements in each Clock Management Tile (CMT) is meaningful. Within each CMT, DCMs and PLLs can be cascaded together through direct local connections.
There are three options:
Option 1: All three clocking elements (two DCMs and one PLL) can be used independently.
Option 2: The PLL can be used to filter high input clock jitter before passing the clock to one or both DCMs for clock generation functions.
Option 3: The PLL can take a single DCM output clock and create an ultra-low jitter version for global clock distribution.
The second option is expected to be especially useful. In the past, external PLLs sometimes had to be used on the PCB to filter the jitter from a noisy clock source before sending the clock into the FPGA. Now that function can be pulled inside the FPGA, saving PCB space and cost.
Show Slide 106:
CMT General Use Model
Get the Best of Both Worlds
In Order To | Use
Remove clock insertion delay | DCM
Phase shift clocks | DCM
Correct clock duty cycles | DCM
Synthesize Fout = Fin * M/D | DCM or PLL*
Filter clock jitter | PLL
Switch between input clock sources dynamically | PLL
Implement the Virtex-4 FPGA PMCD function | PLL
* See the Virtex-5 FPGA data sheet to evaluate performance trade-offs between DCM and PLL usage
The Virtex-5 FPGA delivers advanced DCM and PLL technology for superior clocking capability
Key Points
The DCM provides finer resolution for phase-shifting functions.
Show Slide 107:
DCM Features
Functionally equivalent to the Virtex-4 FPGA DCM
Operate from 19 MHz(1) to 550 MHz
Zero delay clock buffer
Synthesize FOUT = FIN * M/D
M, D values up to 32
DRP address space is different
Each DCM can be invoked with either the DCM_BASE or DCM_ADV primitive
Additional DCM_ADV features
Dynamically phase shift clocks in increments of period/256 or with direct delay line control
Use the Dynamic Reconfiguration Port (DRP) to adjust parameters without reconfiguring
[Figure: DCM_BASE and DCM_ADV primitive symbols. Inputs: CLKIN, CLKFB, RST; DCM_ADV adds phase shift and DRP ports. Outputs: CLK0, CLK90, CLK180, CLK270, CLK2X, CLK2X180, CLKDV, CLKFX, CLKFX180, LOCKED]
Note (1): As low as 1 MHz for some frequency synthesis
Key Points
The DCM DRP address mapping in the Virtex-5 FPGA has changed from the Virtex-4 FPGA. Otherwise, functionality is the same as in the Virtex-4 FPGA.
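Two DCM numbers follow directly from the slide. The CLKFX output frequency is FOUT = FIN * M/D, and the fine phase-shift step is one input clock period divided by 256. The sketch below computes both. The slide only says "up to 32", so the sketch assumes M from 2 to 32 and D from 1 to 32. Check the Virtex-5 FPGA User Guide for the exact CLKFX_MULTIPLY and CLKFX_DIVIDE ranges. The sketch does not check the CLKFX output frequency limits from the data sheet. The function names are illustrative.

```python
def dcm_clkfx_mhz(fin_mhz: float, m: int, d: int) -> float:
    """CLKFX frequency for a DCM: FOUT = FIN * M / D."""
    if not (2 <= m <= 32 and 1 <= d <= 32):
        raise ValueError("M must be 2..32 and D 1..32 (assumed ranges)")
    return fin_mhz * m / d


def dcm_phase_step_ps(fin_mhz: float) -> float:
    """Fine phase-shift increment: one input clock period divided by 256."""
    period_ps = 1e6 / fin_mhz   # 1 / MHz expressed in picoseconds
    return period_ps / 256


print(dcm_clkfx_mhz(100.0, 5, 4))   # 125.0 MHz from a 100-MHz input
print(dcm_phase_step_ps(100.0))     # 10,000-ps period / 256 = 39.0625 ps
```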
Show Slide 108:
PLL Features
Used as a frequency synthesizer and jitter filter for either external or internal clocks in conjunction with the DCMs of the CMT
Operate from 19 MHz(1) to 550 MHz
Inputs up to 710 MHz
VCO up to 1.1 GHz for more flexible frequency synthesis
Filter clock jitter
2x reduction in input clock jitter
Synthesize Fout = Fin * M/(D*O)
M: 1 to 64, D: 1 to 52, O: 1 to 128
Port existing PLL designs into the FPGA
Each PLL can be invoked with either the PLL_BASE or PLL_ADV primitive
Additional PLL_ADV features
Dynamically switch between clock sources without global clock buffers
Cascade clocks to and from DCMs
Use the DRP to adjust parameters without reconfiguring
[Figure: PLL_BASE (inputs CLKIN1, CLKFBIN, RST; outputs CLKOUT<5:0>, CLKFBOUT, LOCKED) and PLL_ADV (adds CLKIN2, CLKINSEL, REL, DRP, CLKOUTDCM<5:0>, and CLKFBDCM); example measurement with a 400-MHz clock in a quiet XC5VLX30 device: PLL input with >400-ps jitter, PLL output with <100-ps jitter]
Key Points
Note (1): As low as 1 MHz for some frequency synthesis.
Clock switching: Assert reset, switch clocks, deassert reset.
Show Slide 109:
PLL Primitives
[Figure: PLL_BASE and PLL_ADV primitive symbols. PLL_BASE: inputs CLKIN1, CLKFBIN, RST; outputs CLKOUT<5:0>, CLKFBOUT, LOCKED. PLL_ADV adds inputs CLKIN2, CLKINSEL, REL, DADDR[4:0], DI[15:0], DWE, DEN, DCLK and outputs CLKOUTDCM<5:0>, CLKFBDCM, DO[15:0], DRDY]
Key Points
CLKIN1/CLKIN2: Clock inputs to the PLL.
CLKFBIN: Feedback clock input to the PLL. The PLL aligns the CLKIN1/2 signal to the CLKFBIN signal.
CLKINSEL: Controls whether CLKIN1 or CLKIN2 is routed to the PLL. Switching is asynchronous; the PLL must be held in reset during switching.
CLKINSEL = 1: CLKIN1 selected
CLKINSEL = 0: CLKIN2 selected
RST: Asynchronous reset; must be released to re-enable the PLL.
DADDR[4:0]: Address select signals for the Dynamic Reconfiguration Port (DRP); allows dynamic reprogramming of the PLL.
DI[15:0]: Data input to the DRP.
DWE: Write enable to the DRP.
DEN: Enable signal for the DRP.
DCLK: Clock signal for the DRP.
CLKOUT[5:0]: Clock outputs from the PLL. Each one is individually controllable, but all are derived from the same VCO.
CLKFBOUT: PLL feedback output. This signal is used for configuring how the PLL de-skews.
CLKOUTDCM[5:0]: Specially buffered version of the CLKOUT[5:0] signals that can be used to connect to the DCM. Otherwise, identical to CLKOUT[5:0].
CLKFBDCM: Feedback output used for de-skew when the PLL and DCM are cascaded (PLL2DCM or DCM2PLL). CLKFBDCM is the same as CLKFBOUT.
LOCKED: Indicates that the PLL has locked onto the reference clock and is tracking the phase.
DO[15:0]: DRP output signals. These allow the stored PLL configuration values to be read by an application.
DRDY: Ready output indicating the DRP interface is ready for the next sequence of reads or writes.
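The CLKINSEL rule is a short protocol: switch inputs only while the PLL is held in reset, then release RST and wait for LOCKED. The sketch below expresses that rule as a behavioral check. It is a toy software model, not the PLL_ADV primitive. The class, its methods, and the instant relock on reset release are simplifying assumptions; real hardware asserts LOCKED only after the lock time.

```python
class PllSwitchModel:
    """Toy model of PLL_ADV input switching that enforces the reset protocol."""

    def __init__(self):
        self.rst = False
        self.clkinsel = 1      # 1 selects CLKIN1, 0 selects CLKIN2
        self.locked = True

    def set_rst(self, value: bool):
        self.rst = value
        # LOCKED drops while in reset; relock on release is modeled as
        # instant here (an assumption; hardware needs the lock time).
        self.locked = not value

    def set_clkinsel(self, value: int):
        if not self.rst:
            raise RuntimeError("hold the PLL in reset while switching CLKINSEL")
        self.clkinsel = value

    @property
    def selected_input(self) -> str:
        return "CLKIN1" if self.clkinsel == 1 else "CLKIN2"


pll = PllSwitchModel()
pll.set_rst(True)      # 1. Assert reset
pll.set_clkinsel(0)    # 2. Switch to CLKIN2
pll.set_rst(False)     # 3. Deassert reset, then wait for LOCKED
print(pll.selected_input, pll.locked)   # CLKIN2 True
```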
Show Slide 110:
PLL Basics
[Figure: PLL block diagram. CLKIN1 and CLKIN2, selected by CLKINSEL, feed the phase frequency detector (PFD), charge pump (CP), loop filter (LF), and VCO. The VCO's 8 phase taps feed output counters O0 to O5, which drive CLKOUT0 to CLKOUT5. The M counter sits in the feedback path from CLKFBIN. Lock detect and lock monitor logic drive LOCKED.]
FVCO = FIN * M / D
FOUT = FVCO / O = FIN * M / D / O
Key Points
The PLL will multiplex two clock input signals. The clock then
goes into the D counter which is used to divide down the input
clock. At the output of the VCO are eight clocks with differing
phases. All eight of these phase-shifted clocks can feed any of
the six outputs (O0O5).
Each output O can be further used to divide the output clock.

One of the output clocks is used as a feedback clock. One of the
phase-shifted clocks is used to detect alignment of rising edges
of the input clock with the VCO rising edges. This feedback
clock goes through an M counter, which can be used to
multiply the clock frequency.
The final output clock frequencies are determined by the

following calculation: FOUT = FIN * M / D / O where O = the
divide by value at the O0O5 output stage.
Eight phases: 0, 45, 90, 135, 180, 225, 270, and 315 degrees
D: Programmable counter that divides the input clock
PFD: The Phase Frequency Detector compares both the phase and the frequency of the input (reference) clock (from the D counter) and the feedback clock (from the M counter). Only the rising edges are considered because, as long as a minimum High/Low pulse width is maintained, the duty cycle is not important. The PFD generates a signal proportional to the phase and frequency difference between the two clocks. This signal drives the Charge Pump (CP) and Loop Filter (LF), which generate the control voltage for the VCO. The PFD produces an "up" or "down" signal to the CP and LF to determine whether the VCO should operate at a higher or lower frequency. When the VCO operates at too high a frequency, the PFD activates a "down" signal, which reduces the control voltage and lowers the VCO operating frequency. When the VCO operates at too low a frequency, an "up" signal increases the control voltage and raises the VCO frequency.
Loop filter: The loop filter determines the dynamic characteristics of the PLL; the loop-filtered signal controls the VCO. The loop filter is designed to match the characteristics required by the application of the PLL in an FPGA: primarily a large input clock bandwidth and the ability to track and maintain lock.
Key Points
VCO: Voltage Controlled Oscillator. The VCO generates
eight output phases. Each output phase can be selected as
the reference clock to the output counters.
O: Output counter. Each of the six counters can be
independently programmed for generating up to six output
clocks, each using a different phase.
M: M counter, which controls the feedback clock of the PLL,
allowing a wide range of frequency synthesis.
Show Slide 111:
PLL Equations
FVCO: Calculating the VCO frequency
FVCO = FIN * M / D
For example:
FIN = 250 MHz, M = 4, D = 1: FVCO = 250 * 4 / 1 = 1000 MHz
FIN = 87 MHz, M = 20, D = 3: FVCO = 87 * 20 / 3 = 580 MHz
FOUT: Calculating the output frequency
FOUT = FVCO / O = FIN * M / D / O
For example:
FIN = 250 MHz, M = 4, D = 1, O = 2: FOUT = 250 * 4 / 1 / 2 = 500 MHz
FIN = 87 MHz, M = 20, D = 3, O = 3: FOUT = 87 * 20 / 3 / 3 = 193.33 MHz
As a general rule:
High FVCO equates to lower jitter and more power
Low FVCO equates to higher jitter but lower power
Key Points
At the Phase Frequency Detector (PFD), FIN / D = FVCO / M.
All outputs operate off a common VCO frequency, which constrains the possible output frequencies.
Key Points
For example:
FIN = 100 MHz, M = 5, D = 1
FVCO = 100 * 5 / 1 = 500 MHz
Possible output clocks are 500 MHz (O = 1), 250 MHz (O = 2), 166.67 MHz (O = 3), and so on.
For example:
FIN = 100 MHz, M = 10, D = 1
FVCO = 100 * 10 / 1 = 1000 MHz
Possible output clocks are 1000 MHz (O = 1; too fast for the clock networks), 500 MHz (O = 2), 333.33 MHz (O = 3), and so on.
As a general rule, run the VCO frequency as high as possible. Running the VCO at a higher frequency allows more frequencies to be synthesized and results in lower jitter. The trade-off is increased power consumption.
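The equations above can be checked with a minimal Python sketch. It assumes nothing beyond FVCO = FIN * M / D and FOUT = FVCO / O, keeps exact fractions so that values such as 166.67 MHz are not rounded early, and does not check any device limits; the function names are illustrative only.

```python
from fractions import Fraction


def vco_freq(f_in_mhz, m, d):
    """FVCO = FIN * M / D in MHz, kept as an exact fraction."""
    return Fraction(f_in_mhz) * m / d


def output_freqs(f_in_mhz, m, d, max_o=4):
    """Map each output divide value O = 1..max_o to FOUT = FVCO / O (MHz)."""
    f_vco = vco_freq(f_in_mhz, m, d)
    return {o: f_vco / o for o in range(1, max_o + 1)}


if __name__ == "__main__":
    for f_in, m, d in [(100, 5, 1), (100, 10, 1)]:
        print(f"FIN = {f_in} MHz, M = {m}, D = {d}: "
              f"FVCO = {float(vco_freq(f_in, m, d)):.2f} MHz")
        for o, f_out in output_freqs(f_in, m, d).items():
            print(f"  O = {o}: {float(f_out):.2f} MHz")
```

Because every output shares one FVCO, the list of reachable output frequencies is simply FVCO divided by each integer O, which is why a higher FVCO offers more choices.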
Show Slide 112:
PLL Counter Attributes
Want FPFD as high as possible
Make D as small as possible
Want FVCO as high as possible
Better jitter performance
Higher power
Larger range of output frequencies available
Caveat
Making M too large can increase jitter
More characterization is required
Determining M & D values*
DMIN = FIN/FPFDMAX
DMAX = FIN/FPFDMIN
MMIN = FVCOMIN/FIN
MMAX = (DMAX * FVCOMAX)/FIN
MIDEAL = (DMIN * FVCOMAX)/FIN
Counter attributes
O Divide: CLKOUT[0:5]_DIVIDE = {1 to 128}
D Divide: DIVCLK_DIVIDE = {1 to 52}
M Multiply: CLKFBOUT_MULT = {1 to 64}
*Relevant minimum and maximum numbers are shown in the Key Points section
Key Points
More information on attributes can be found in the Appendix.
For the most up-to-date information, go to www.xilinx.com and
refer to the Virtex-5 FPGA data sheet or the Virtex-5 FPGA User
Guide.
FINMIN = 19 MHz
FINMAX = 710 MHz
FPFDMIN = 19 MHz
FPFDMAX = 550 MHz in the -3 speed grade (500 MHz in -2, 450 MHz in -1)
FVCOMIN = 400 MHz
FVCOMAX = 1.1 GHz
O, D, and M attributes are all integer values.
Smallest D counter value: DMIN = int(roundup(FIN/FPFDMAX))
Largest D counter value: DMAX = int(rounddown(FIN/FPFDMIN))
Smallest M counter value: MMIN = int(roundup(FVCOMIN/(FIN/DMIN)))
Largest M counter value: MMAX = int(rounddown(FVCOMAX/(FIN/DMAX)))
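A minimal Python sketch of these counter-bound formulas follows, assuming the -3 speed grade limits listed above and the counter ranges shown on Slide 112 (D from 1 to 52, M from 1 to 64). The constant and function names are hypothetical, and the limits should be confirmed against the current data sheet. Applying the roundup rule literally, MMIN for a 133-MHz input comes out as 4, because M = 3 would put FVCO at 399 MHz, just below FVCOMIN.

```python
import math
from fractions import Fraction

# Limits quoted in this section for the -3 speed grade (MHz). These are
# assumptions for the sketch; confirm them in the Virtex-5 FPGA data sheet.
F_IN_MIN, F_IN_MAX = 19, 710
F_PFD_MIN, F_PFD_MAX = 19, 550
F_VCO_MIN, F_VCO_MAX = 400, 1100
D_RANGE_MAX = 52   # DIVCLK_DIVIDE = {1 to 52}
M_RANGE_MAX = 64   # CLKFBOUT_MULT = {1 to 64}


def md_bounds(f_in_mhz):
    """Return (DMIN, DMAX, MMIN, MMAX, MIDEAL) for an input clock in MHz."""
    if not F_IN_MIN <= f_in_mhz <= F_IN_MAX:
        raise ValueError("input clock outside FINMIN..FINMAX")
    f_in = Fraction(f_in_mhz)
    d_min = max(1, math.ceil(f_in / F_PFD_MAX))             # roundup
    d_max = min(D_RANGE_MAX, math.floor(f_in / F_PFD_MIN))  # rounddown
    m_min = math.ceil(F_VCO_MIN / (f_in / d_min))           # roundup
    m_max = min(M_RANGE_MAX, math.floor(F_VCO_MAX / (f_in / d_max)))
    m_ideal = math.floor(d_min * F_VCO_MAX / f_in)          # highest FVCO at DMIN
    return d_min, d_max, m_min, m_max, m_ideal


if __name__ == "__main__":
    print(md_bounds(133))  # prints (1, 7, 4, 57, 8)
```

Exact fractions are used so that boundary cases such as 133 / 19 = 7 are not disturbed by floating-point rounding before the roundup or rounddown step.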
Show Slide 113:
PLL Attributes
Phase shift
CLKOUT[0:5]_PHASE = {0.0 to 360.0}
Not all values are available; see the Key Points section for more information
The higher the VCO frequency, the more options that are available
Directly related to having a higher CLKOUT[0:5]_DIVIDE number
Phase shifts that are always possible: 0, 45, 90, 135, 180, 225, 270, 315
Duty cycle
CLKOUT[0:5]_DUTY_CYCLE = {0.01 to 0.99}
Default = 0.5
The higher the VCO frequency, the more options that are available
Directly related to having a higher CLKOUT[0:5]_DIVIDE number
*Relevant minimum and maximum numbers are shown in the Key Points section
Key Points
The phase shift value is specified as a real value representing the degrees of phase shift.
The duty cycle value is specified as a real value representing the duty cycle as a fraction of the period (for example, 0.5 for 50 percent).
Phase shift calculation:
Phase shift step size:
phase_step = 360 / (8 * CLKOUTn_DIVIDE)
Maximum phase shift:
If CLKOUTn_DIVIDE <= 64, phase_shift_max = 360
If CLKOUTn_DIVIDE > 64, phase_shift_max = (64 / CLKOUTn_DIVIDE) * 360 + 7 * phase_step
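The step-size and maximum-phase rules can be expressed as a short Python sketch. The helper that tests whether a requested phase is reachable is an assumption of this sketch: it treats a phase as available when it is a whole number of steps and does not exceed phase_shift_max, which may not match exactly how the tools round or reject values.

```python
def phase_step(divide):
    """Phase-shift resolution in degrees: 360 / (8 * CLKOUTn_DIVIDE)."""
    return 360 / (8 * divide)


def phase_shift_max(divide):
    """Largest usable CLKOUTn_PHASE in degrees for a given CLKOUTn_DIVIDE."""
    if divide <= 64:
        return 360.0
    return (64 / divide) * 360 + 7 * phase_step(divide)


def phase_available(phase_deg, divide):
    """True if phase_deg is a whole number of steps and within the maximum."""
    if not 1 <= divide <= 128:
        raise ValueError("CLKOUTn_DIVIDE must be 1 to 128")
    steps = phase_deg / phase_step(divide)
    return abs(steps - round(steps)) < 1e-9 and phase_deg <= phase_shift_max(divide)


if __name__ == "__main__":
    for divide in (4, 16, 128):
        print(divide, phase_step(divide), phase_shift_max(divide))
```

Because 45 degrees is always exactly CLKOUTn_DIVIDE steps, the eight phases 0 through 315 pass this check for any divide value up to 64, consistent with the slide.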
Key Points
Duty cycle calculation:
Minimum duty cycle:
If CLKOUTn_DIVIDE > 64, min_duty_cycle = (CLKOUTn_DIVIDE - 64) / CLKOUTn_DIVIDE
If CLKOUTn_DIVIDE <= 64, min_duty_cycle = 1 / CLKOUTn_DIVIDE
Duty cycle step = 0.5 / CLKOUTn_DIVIDE
Maximum duty cycle:
If CLKOUTn_DIVIDE > 64, max_duty_cycle = 64.5 / CLKOUTn_DIVIDE
If CLKOUTn_DIVIDE <= 64, max_duty_cycle = (CLKOUTn_DIVIDE - 0.5) / CLKOUTn_DIVIDE
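A minimal Python sketch of these duty-cycle rules follows. It only accepts divide values from 2 to 128, because the formulas as written give a minimum (1.0) larger than the maximum (0.5) for a divide of 1; how the tools handle that case is not covered here. The function name is illustrative.

```python
def duty_cycle_range(divide):
    """Return (min, max, step) for CLKOUTn_DUTY_CYCLE given CLKOUTn_DIVIDE."""
    if not 2 <= divide <= 128:
        raise ValueError("sketch only covers CLKOUTn_DIVIDE from 2 to 128")
    if divide > 64:
        d_min = (divide - 64) / divide
        d_max = 64.5 / divide
    else:
        d_min = 1 / divide
        d_max = (divide - 0.5) / divide
    return d_min, d_max, 0.5 / divide


if __name__ == "__main__":
    for divide in (2, 4, 100):
        lo, hi, step = duty_cycle_range(divide)
        print(f"DIVIDE={divide}: {lo:.3f} to {hi:.3f} in steps of {step:.4f}")
```

The step of 0.5 / CLKOUTn_DIVIDE reflects half a VCO period, so larger divide values give finer duty-cycle resolution, as the slide notes.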
Show Slide 114:
1) Given:
Input clock frequency = 133 MHz, targeting a Virtex-5 LX50 -3 FPGA
You need the PLL to do the following:
Output clocks
266 MHz, 0 degrees phase shift, 50 percent duty cycle
Specify the optimal settings for the PLL:
DIVCLK_DIVIDE =
CLKFBOUT_MULT =
CLKOUT1_PHASE =
CLKOUT1_DUTY_CYCLE =
CLKOUT1_DIVIDE =
CLKOUT2_PHASE =
CLKOUT2_DIVIDE =
CLKOUT3_PHASE =
CLKOUT3_DIVIDE =
FINMIN = 19 MHz
FINMAX = 710 MHz
FPFDMIN = 19 MHz
FPFDMAX = 550 MHz in the -3 speed grade (500 MHz in -2, 450 MHz in -1)
FVCOMIN = 400 MHz
FVCOMAX = 1.1 GHz
*********** Workspace ***********
DMIN = FIN/FPFDMAX =
DMAX = FIN/FPFDMIN =
MMIN = FVCOMIN/FIN =
MMAX = (DMAX * FVCOMAX)/FIN =
MIDEAL = (DMIN * FVCOMAX)/FIN =
Show Slide 115:
PLL Use Example: Frequency Synthesizer and Jitter Filter
[Figure: IBUFG drives CLKIN1; CLKOUT0–CLKOUT5 drive BUFGs; CLKFBOUT connects back to CLKFBIN; RST input and LOCKED output. Callouts: "This path could come from BUFG" and "Nothing in this feedback path keys the software that INTERNAL feedback is desired."]
Use: Used when maintaining the phase relationship between the input and output clocks is not required
PLL attribute: Compensation = Internal
Key Points
Used when the input clock and output clock do not need to
have any phase relationship; that is, if the PLL is used strictly as
a frequency synthesizer or jitter filter.
The COMPENSATION attribute specifies the PLL phase compensation for the incoming clock. For example, the SYSTEM_SYNCHRONOUS setting attempts to compensate all clock delay to achieve zero hold time. SOURCE_SYNCHRONOUS is used when the clock is provided aligned with the data.
Show Slide 116:
PLL Use Example: Clock Network De-Skew
[Figure: IBUFG drives CLKIN1; CLKOUT0 drives a BUFG to logic; CLKFBOUT drives a second BUFG back to CLKFBIN; RST input and LOCKED output. Callout on the feedback BUFG output: "This line could be used to clock logic."]
Use: Used when maintaining the phase relationship between the input and output clocks is desired
PLL attribute: Compensation = Source Synchronous or System Synchronous
Key Points
There are two reasons that clock de-skew from a PLL requires two global clock buffers:
1. The CLKFBOUT feedback path should match the delay on the CLKOUT path. The global clock networks are balanced clock networks; each inserts an equal amount of delay.
2. The CLKFBOUT output should provide the feedback clock signal because of this restriction: both input frequencies to the PFD block must be identical. That is, the CLKIN1 input frequency divided by D must equal the CLKFBIN frequency: FIN/D = FFB = FVCO/M. Therefore, CLKFBOUT provides a frequency equal to FIN/D. In most cases, the CLKOUTn signals will not match the FIN/D frequency going into the PFD block of the PLL.
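The frequency argument in reason 2 can be made concrete with a small Python sketch. It assumes CLKFBOUT runs at FVCO / M, which is what FIN/D = FFB = FVCO/M implies, and uses FIN = 133 MHz, D = 1, M = 8 purely as sample values; the function name is illustrative.

```python
from fractions import Fraction


def pfd_and_output_freqs(f_in_mhz, d, m, clkout_divide):
    """Return (FIN / D, CLKFBOUT, CLKOUTn) in MHz.

    Assumes CLKFBOUT runs at FVCO / M, as implied by FIN/D = FFB = FVCO/M.
    """
    f_vco = Fraction(f_in_mhz) * m / d
    return Fraction(f_in_mhz) / d, f_vco / m, f_vco / clkout_divide


if __name__ == "__main__":
    ref, fb, out = pfd_and_output_freqs(133, d=1, m=8, clkout_divide=4)
    print(f"PFD reference {float(ref)} MHz, "
          f"CLKFBOUT {float(fb)} MHz, CLKOUT {float(out)} MHz")
```

CLKFBOUT always lands back on FIN/D, whereas a CLKOUTn output only matches it when its divide value happens to equal M, which is why CLKFBOUT is the natural feedback source.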
Show Slide 117:
PLL Use Example: Zero Delay Buffer
[Figure: Inside the FPGA, IBUFG drives CLKIN1; CLKOUT0 and CLKFBOUT each drive a BUFG and an OBUF; the CLKFBOUT output is routed outside the part and back in through a second IBUFG to CLKFBIN; RST input and LOCKED output. Callout on the external path: "Route outside the part; there will be a maximum delay that can be introduced."]
Use: Used to create an external clock buffer (clock mirror) when maintaining the phase relationship between the input and external output clock is desired
PLL attribute: Compensation = External
Key Points
The delay on the CLKFBOUT trace should match the delay on the CLKOUT0 trace so that the edges are aligned.
Show Slide 118:
PLL Use Example: DCM2PLL
[Figure: IBUFG drives the DCM CLKIN; BUFGs are used on the DCM feedback and on the PLL outputs to logic; a DCM output drives PLL CLKIN1. The figure shows the full DCM and PLL port lists, including the PLL DRP ports.]
Feedback path CANNOT include both the DCM and PLL
DCM LOCKs first, then the PLL LOCKs
PLL attribute: Compensation = DCM2PLL
Key Points
You must use (burn) a BUFG to eliminate the delay in the DCM; the PLL does not automatically compensate for this delay. The feedback path cannot include both the DCM and the PLL.
Use: This example combines DCM frequency synthesis (CLK90 or CLKDV, for example, could also be used) and possibly dynamic phase shifting (DPS) capabilities (for greater phase shift resolution than the PLL provides), and then uses the PLL for jitter filtering. CLK0 can be used in the design with the delay compensation. There is one dedicated connection from the DCM to the PLL within the same CMT tile. If more than one connection is used, you should use a BUFG to route them from the DCM to the PLL.
This is just one possible example; there are many configurations that use a DCM to drive the PLL.
Show Slide 119:
PLL Use Example: PLL2DCM
[Figure: IBUFG drives PLL CLKIN1; a PLL output drives the DCM CLKIN; a BUFG drives the clock to logic. The figure shows the full PLL and DCM port lists.]
Use the PLL to filter reference clock jitter before going to the DCM
PLL attribute: Compensation = PLL2DCM
Key Points
In this example, the PLL is used to filter clock jitter before forwarding the clock to the DCM.
Clock Networks
Show Slide 120:
Lessons
Overview
Clock Networks
Summary
Show Slide 121:
Virtex-5 FPGA Clock Regions and I/O Banks
All clock regions are 20 CLBs tall versus 16 in the Virtex-4 FPGA
Clock regions match I/O banks
40 I/Os per bank and clock region
Clock regions span one half the die
Four differential or single-ended BUFIOs
4 RCLKs per region
2 BUFRs per region
10 GCLKs per region
Key Points
There are four BUFIOs per clock region. BUFIOs can no longer
span regions, which is the reason for the increase in the number
per region. They are still implemented differentially.
There are two BUFRs per clock region. However, there are now
four regional clock tracks, allowing the BUFRs in vertically
adjacent regions to drive the other two or all four.
Clock regions are slightly larger, but now also match the I/O
banks. The I/O banks in the Virtex-4 FPGA crossed two clock
regions.
You now have access to 10 global clocks per region (versus eight in the Virtex-4 FPGA).
Show Slide 122:
Virtex-5 FPGA Global Clocking
[Figure: die floorplan with CMTs and IBUFGs above and below the central global clock muxes]
10 global clocks per region (full crossbar)
Global resources for all devices
20 global clock inputs
32 global clock multiplexers
2 or 6 CMTs
Key Points
BUFGCTRL is the same as in the Virtex-4 FPGA.
Show Slide 123:
Global Clocking Features
Global Clock Inputs (IBUFG or IBUFGDS)
Flexibility: 20 total; 20 differential (40 pins) or 20 single-ended (20 pins); span two I/O banks
Performance: differential for maximum performance
Global Clock Multiplexers (BUFGCTRL)
Flexibility: 32 total; optional clock enable; guaranteed glitch-less switching; can also be used as an asynchronous multiplexer
Performance: up to 550 MHz; high fanout (access to all clock loads in the FPGA); low skew; short clock insertion delay
Key Points
In the Virtex-4 FPGA, GCLK pins spanned four I/O banks. In the Virtex-5 FPGA, they only span two banks: a lower and an upper bank.
Show Slide 124:
Virtex-5 FPGA I/O Clocking
[Figure: I/O column showing the Clock-Capable I/O, I/O Clock Buffer (BUFIO), and I/O Clock Net (IOCLK)]
Per region:
Four clock-capable I/Os
Four I/O clock buffers
Four I/O clock nets
BUFIOs cannot drive IOCLK track in adjacent region
Ideal for source-synchronous interfaces
Key Points
The four BUFIOs in each clock region can no longer drive an IOCLK net in vertically adjacent regions. This was done to allow the IOCLK track to run up to 710 MHz internally (crossing into another region caused too much delay).
Show Slide 125:
Virtex-5 FPGA Regional Clocking
[Figure: clock-capable I/Os driving Regional Clock Buffers (BUFR), which drive Regional Clock Nets (RCLK) in their own and adjacent regions]
Per region:
Four clock-capable I/Os
Two regional clock buffers
Four regional clock nets
Easily create many clock domains per FPGA
Key Points
The two regional clock buffers can be used to drive any of the four regional clock nets in an adjacent region. This approach allows more flexibility for regional clocks than in the Virtex-4 FPGA, which had only two regional clock buffers (BUFR) and two regional clock tracks (RCLK) per clock region.
300-MHz performance is achieved in the highest speed grade.
BUFRs (regional clocking) support divide-by 1, 2, 3, 4, 5, 6, 7, or 8 to support ISERDES.
Show Slide 126:
I/O and Regional Clocking Features
Clock-Capable I/Os
Flexibility: exist in all I/O columns; four CCIOs per region, either four differential (8 pins) or four single-ended (4 pins); adjacent to the HCLK row, with two CCIOs above and two CCIOs below
Performance: 710-MHz differential
I/O Clocks (BUFIO -> IOCLK)
Flexibility: exist in all I/O columns; four BUFIO drivers per region; four IOCLKs per region; span a single region
Performance: 710-MHz differential
Regional Clocks (BUFR -> RCLK)
Flexibility: exist in non-center I/O columns; two BUFR drivers per region; four RCLKs per region; span up to three regions (one above and one below); clock divider range from 1 to 8
Performance: 300 MHz
Show Slide 127:
Use
PLL and DCMs must be instantiated
Direct primitive instantiation: place attributes in the UCF or HDL
IP (CORE Generator Tool & Architecture Wizard): attributes placed in the netlist; can include global clock buffers
BUFGs can be inferred
Xilinx suggests that you instantiate all clock resources
BUFIOs and BUFRs must be instantiated
Direct primitive instantiation: place attributes in the UCF or HDL
IP (CORE Generator Tool & Architecture Wizard): attributes placed in the netlist
Currently, there is no support for regional clocking resources in the Architecture Wizard
Key Points
In the future, creating and customizing regional clocking resources will be possible.
Show Slide 128:
Clock Wizard
Choose function: optimal DCM/PLL flow automatically selected
- or -
Choose component: program as desired
Hidden Slide 129:
Select Your Options: Xilinx Clocking Wizard
Use the GUI to instantiate and program your clocking components
Wizard generates ready-to-use VHDL or Verilog
Key Points
The Xilinx Clocking Wizard automates setting of attributes for the primitive.
TRAINER NOTE
This slide (hidden in the PowerPoint presentation) is a screenshot of the Clocking Wizard.
Hidden Slide 130:
BUFR and BUFIO Instantiation
VHDL
library IEEE;
use IEEE.std_logic_1164.all;
library UNISIM;
use UNISIM.vcomponents.all;
-- ...
component BUFIO
  port (I : in  std_logic;
        O : out std_logic);
end component;

component BUFR
  generic (BUFR_DIVIDE : string);
  port (I   : in  std_logic;
        CE  : in  std_logic;
        CLR : in  std_logic;
        O   : out std_logic);
end component;
-- ...
BUFIO_inst : BUFIO
  port map (I => input_clk,
            O => clk_bufio);

BUFR_inst : BUFR
  generic map (BUFR_DIVIDE => "BYPASS")
  port map (I   => clk_bufio,
            CE  => clk_enable,
            CLR => async_rst,
            O   => clk_bufr);
Verilog
BUFIO bufio_inst (.I(input_clk),
                  .O(clk_bufio));

BUFR bufr_inst (.I(clk_bufio),
                .CE(clock_enable),
                .CLR(async_rst),
                .O(clk_bufr));
// BUFR_DIVIDE values: "BYPASS", "1", "2", "3", "4", "5", "6", "7", "8"
defparam bufr_inst.BUFR_DIVIDE = "BYPASS";
TRAINER NOTE
This slide (hidden in the PowerPoint presentation) shows HDL examples of clock buffer instantiation.
Summary
Show Slide 131:
Lessons
Overview
Clock Networks
Summary
Show Slide 132:
2) Compare the following resources in the Virtex-5 FPGA to the Virtex-4 FPGA:
Clock region size
Global clock inputs
Number of global clock buffers
Number of global clock buffers per region
Clock-capable inputs per clock region
BUFIO buffers per region
I/O clock nets per region
I/O clock region span
BUFR buffers per region
Regional clock (RCLK) nets per region
Regional clock span
Show Slide 133:
3) To perform the following, which should you use: the DCM or PLL?
Phase shift clocks
Synthesize frequency
Filter clock jitter
Switch between input clocks dynamically
Implement Virtex-4 FPGA PMCD functionality
Show Slide 134:
Summary
The new Clock Management Tile (CMT) includes two DCMs and one PLL
The new PLL includes jitter filtering and frequency synthesis capabilities
Clock region = 20 CLBs, 40 IOBs, and 1 I/O bank
Twenty global input clock buffers (differential)
Thirty-two global clock buffers (differential)
Ten global clocks per region
Four BUFIOs per region (differential); BUFIO cannot drive into adjacent regions
Two BUFRs per region; can drive into adjacent regions
Four regional clock tracks per region
Virtex-5 FPGA data sheets
Virtex-5 FPGA user guides
Virtex-5 FPGA User Guide
Virtex-5 FPGA XtremeDSP Design Considerations User Guide
Virtex-5 FPGA Configuration User Guide
Virtex-5 FPGA Packaging and Pinout Specification
Virtex-5 FPGA home page
www.xilinx.com/virtex5
Links to everything related to the Virtex-5 FPGA: white papers, boards, training, data sheets, and user guides
Answers
1) Given:
Input clock frequency = 133 MHz, targeting a Virtex-5 LX50 -3 FPGA
You need the PLL to do the following:
Output clocks
Specify the optimal settings for the PLL:
DIVCLK_DIVIDE = 1
CLKFBOUT_MULT = 8
CLKOUT1_PHASE = 0.0
CLKOUT1_DUTY_CYCLE = 0.5
CLKOUT1_DIVIDE = 4
CLKOUT2_PHASE = 45.0
CLKOUT2_DIVIDE = 4
CLKOUT3_PHASE = 90.0
CLKOUT3_DIVIDE = 16
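A quick Python cross-check of these settings follows. It only recomputes FVCO, each output frequency, and whether each phase is a whole number of phase steps, using the formulas and -3 limits quoted earlier in this module; it is a sanity check under those assumptions, not a replacement for the Clocking Wizard or the data sheet. With these settings, FVCO is 1064 MHz, CLKOUT1 and CLKOUT2 run at 266 MHz, and CLKOUT3 runs at 66.5 MHz.

```python
from fractions import Fraction

# Settings from the answer above; limits are the -3 values quoted in this module.
F_IN = Fraction(133)  # MHz
DIVCLK_DIVIDE = 1
CLKFBOUT_MULT = 8
OUTPUTS = {  # name: (CLKOUTn_DIVIDE, CLKOUTn_PHASE in degrees)
    "CLKOUT1": (4, 0.0),
    "CLKOUT2": (4, 45.0),
    "CLKOUT3": (16, 90.0),
}

f_pfd = F_IN / DIVCLK_DIVIDE
f_vco = F_IN * CLKFBOUT_MULT / DIVCLK_DIVIDE
assert 19 <= f_pfd <= 550, "PFD frequency outside FPFDMIN..FPFDMAX"
assert 400 <= f_vco <= 1100, "VCO frequency outside FVCOMIN..FVCOMAX"

for name, (divide, phase) in OUTPUTS.items():
    step = 360 / (8 * divide)              # phase_step for this output
    reachable = (phase / step).is_integer()
    print(f"{name}: {float(f_vco / divide):.1f} MHz, {phase} deg "
          f"(step {step} deg, {'reachable' if reachable else 'not reachable'})")
```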
FINMIN = 19 MHz
FINMAX = 710 MHz
FPFDMIN = 19 MHz
FPFDMAX = 550 MHz in the -3 speed grade (500 MHz in -2, 450 MHz in -1)
FVCOMIN = 400 MHz
FVCOMAX = 1.1 GHz
*********** Workspace ***********
DMIN = FIN/FPFDMAX = 133/550 = 0.242, rounded up to 1 (the minimum D value is 1)
DMAX = FIN/FPFDMIN = 133/19 = 7
MMIN = FVCOMIN/FIN = 400/133 = 3.008, rounded up to 4 (M = 3 would give FVCO = 399 MHz, below FVCOMIN)
MMAX = (DMAX * FVCOMAX)/FIN = (7 * 1100)/133 = 57.9, truncated to 57
MIDEAL = (DMIN * FVCOMAX)/FIN = (1 * 1100)/133 = 8.27, truncated to 8
Answers
2) Compare the following resources in the Virtex-5 FPGA to the Virtex-4 FPGA.
Clock region size: Virtex-4 FPGA, 16 CLBs and 32 I/Os (half an I/O bank); Virtex-5 FPGA, 20 CLBs and 40 I/Os (full I/O bank)
Global clock inputs: Virtex-4 FPGA, up to 32 differential (64 pins) or 32 single-ended (32 pins); Virtex-5 FPGA, up to 20 differential (40 pins) or 20 single-ended (20 pins)
Global clock buffers: Virtex-4 FPGA, 32; Virtex-5 FPGA, 32
Global clock buffers per region: Virtex-4 FPGA, 8; Virtex-5 FPGA, 10
Clock-capable inputs per region: Virtex-5 FPGA, 4
BUFIO buffers per region: Virtex-5 FPGA, 4
I/O clock nets per region: Virtex-5 FPGA, 4
BUFR buffers per region: Virtex-4 FPGA, 2; Virtex-5 FPGA, 2
Regional clock nets per region: Virtex-4 FPGA, 2; Virtex-5 FPGA, 4
BUFIO clock region span: Virtex-4 FPGA, 3 regions (1 above and 1 below); Virtex-5 FPGA, 1 region
Regional clock span: Virtex-4 FPGA, 3 regions (1 above and 1 below); Virtex-5 FPGA, 3 regions (1 above and 1 below)
3) To perform the following, which should you use: the DCM or PLL?
Phase shift clocks: DCM
Synthesize FOUT = FIN * M/D: DCM or PLL*
Filter clock jitter: PLL
Switch between input clock sources dynamically: PLL
Implement Virtex-4 FPGA PMCD function: PLL
The DCM provides finer resolution for phase-shifting functions.
* See the Virtex-5 FPGA data sheet to evaluate performance trade-offs between DCM and PLL usage.
Transition to Lab 2: Designing Clock Resources
Purpose
Customize the DCM components by using the Clocking Wizard
Connect the global clock buffers to the DCM outputs by using the Clocking Wizard
Time
40 minutes
Process
This lab illustrates how to build a multiple clock system with the
ISE Architecture Wizard tool.
General Flow
Step 1: Create the DCM_divider_V5 core
Step 2: Create the DCM_divide_and_phase_shift_V5 core
Step 3: Implement the design
Lab
Refer to the separate lab workbook for the Designing Clock Resources lab.
Transition to FPGA Design Techniques
Purpose
Increase design performance by duplicating flip-flops
Increase design performance by adding pipeline stages
Increase board performance by using I/O flip-flops
Build reliable synchronization circuits
Time
40 minutes
Process
This module describes how to build a reliable and fast FPGA design.
Lessons
Introduction
Pipelining
I/O Flip-Flops
Summary
Introduction
Show Slide 135:
Show Slide 136:
Objectives
Increase design performance by duplicating flip-flops
Increase design performance by adding pipeline stages
Increase board performance by using I/O flip-flops
Build reliable synchronization circuits
Show Slide 137:
Lessons
Pipelining
I/O Flip-Flops
Summary
Show Slide 138:
High-fanout nets can be slow and hard to route
Duplicating flip-flops can fix both problems
Reduced fanout shortens net delays
Each flip-flop can fan out to a different physical region of the chip to reduce routing congestion
Design trade-offs
Gain routability and performance
Increase design area
Increase fanout of other nets
[Figure: flip-flop fn1 duplicated into three copies, each driving part of the original loads]
Show Slide 139:
Example
The source flip-flop drives two register banks that are constrained to different regions of the chip
The source flip-flop and pad are not constrained
PERIOD = 5 ns timing constraint
Implemented with default options
Longest path = 6.806 ns
Fails to meet timing constraint
Key Points
In this simple design, the source flip-flop is trapped between the two sets of loads. Moving the source flip-flop closer to one register moves it farther away from the other register. The overall result is that timing cannot be met.
Show Slide 140:
Example
The source flip-flop has been duplicated
Each flip-flop drives a region of the chip
Each flip-flop can be placed closer to the register that it is driving
Shorter routing delays
Longest path = 4.666 ns
Meets timing constraint
Key Points
By duplicating the source flip-flop, the tools are able to move each flip-flop closer to its set of loads.
The trade-off is that the paths from the input pad to the duplicated flip-flops become longer. This design does not contain an OFFSET IN constraint. If you have an OFFSET IN requirement, you must consider how much slack you have on the OFFSET before deciding to duplicate the flip-flop.
You can also consider duplicating the input pad that is connected to the flip-flops. This method allows the implementation tools to keep the input setup time short while improving the internal clock frequency. The trade-off is that an additional I/O pin is used, and you must route the external signal to two I/O pins.
Show Slide 141:
Tips on Duplicating Flip-Flops
Name duplicated flip-flops _a, _b; NOT _1, _2
Numbered flip-flops are mapped into the same slice by default
Duplicated flip-flops should be separated
Most synthesis tools have automatic fanout-control features
However, they do not always pick the best division of loads
Also, duplicated flip-flops will be named _1, _2
Many synthesis tools will optimize out duplicated flip-flops
Especially if the loads are spread across the chip
Explicitly create duplicate flip-flops in your HDL code
Set your synthesis tool to keep redundant logic
Do not duplicate flip-flops that are sourced by asynchronous signals
Synchronize the signal first
Feed the synchronized signal to multiple flip-flops
Key Points
If duplicated flip-flops are named numerically (for example, signal_rep0 and signal_rep1), the implementation tools see them as a bus, and the flip-flops are mapped into the same slice. If this happens, routing congestion will still be a problem.
You can work around this problem by using the timing-driven packing option, which is covered in the Advanced Implementation Options module.
Synchronize the signal first: Synchronization circuits are discussed later in this module.
Pipelining
Show Slide 142:
Lessons
Pipelining
I/O Flip-Flops
Summary
Show Slide 143:
Pipelining Concept
[Figure: with two logic levels between flip-flops, fMAX = n MHz; after a pipeline flip-flop is inserted so that each path has one logic level, fMAX is approximately 2n MHz]
Key Points
Inserting flip-flops into a datapath is called pipelining.
Pipelining increases performance by reducing the number of logic levels (LUTs) between flip-flops.
All Xilinx FPGA device families support pipelining. The basic slice structure is a logic level (a LUT; six-input in the Virtex-5 FPGA) followed by a flip-flop.
Adding a pipeline stage, as shown in this example, will not exactly double fMAX. The added flip-flop has an input setup time and a clock-to-Q time that make the pipelined circuit run at less than double the original frequency.
You will see a more detailed example of increasing performance by pipelining later in this lesson.
Show Slide 144:
Pipelining Considerations
Are enough flip-flops available?
Refer to the Synthesis or MAP Report
In general, you will not run out of flip-flops (except for the Virtex-5 FPGA)
Are there multiple logic levels between flip-flops?
If there is only one logic level between flip-flops, pipelining will not improve performance
Refer to the Post-Map Static Timing Report or Post-Place & Route Static Timing Report
Can the system tolerate latency?
Key Points
Available flip-flops: The Design Summary section of the MAP Report contains resource utilization information. In most cases, you will have enough flip-flops available to add pipeline stages.
Key Points
Logic levels: Timing reports show which paths are the longest
and how many logic levels are in each path. Look at the
detailed path analysis section of the report and count the
number of look-up table delays (Tilo) to determine the number
of logic levels in the path.
Show Slide 145:
Latency in Pipelines
Each pipeline stage adds one clock cycle of delay before the first output will be available
Also called filling the pipeline
After the pipeline is filled, a new output is available every clock cycle
Key Points
This is an example of a pipelined circuit. Follow the data as it goes through the multiplication and then the addition.
Latency in pipelines can be visualized as a factory assembly line. Raw materials enter the assembly line and then go through several stations. At each station, one production step is performed. When you start the assembly line, you have to wait before the first finished product is produced. After that waiting period, a constant stream of products comes off the assembly line.
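The fill-then-stream behavior can be modeled cycle by cycle in Python. This sketch assumes a two-stage pipeline computing a * b + c (multiply, then add), in the spirit of the slide; the function name and stimulus values are hypothetical.

```python
def run_pipeline(samples, flush_cycles=1):
    """Cycle model of a two-stage pipeline computing a * b + c.

    Stage 1 registers a * b and carries c forward; stage 2 registers the sum.
    Returns the stage-2 (output) register value after each clock edge, with
    None while the pipeline is still filling.
    """
    stage1 = None  # (a * b, c), captured on the previous edge
    stage2 = None  # a * b + c, the pipeline output
    outputs = []
    for sample in list(samples) + [None] * flush_cycles:
        # On a clock edge every register loads from the OLD value of the
        # register before it, so compute both next states first.
        next_stage2 = None if stage1 is None else stage1[0] + stage1[1]
        next_stage1 = None if sample is None else (sample[0] * sample[1], sample[2])
        stage1, stage2 = next_stage1, next_stage2
        outputs.append(stage2)
    return outputs


if __name__ == "__main__":
    print(run_pipeline([(1, 2, 3), (4, 5, 6), (7, 8, 9)]))  # [None, 5, 26, 65]
```

The first result appears only after the second clock edge (the leading None is the fill time); after that, one result arrives per clock, just like the assembly line.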
Show Slide 146:
Pipelining Example
Original circuit
Two logic levels between SOURCE_FFS and DEST_FF
fMAX = ~233 MHz
[Figure: SOURCE_FFS drive three LUTs in parallel, which feed a single LUT that drives the D input of DEST_FF]
Key Points
This path has two logic levels between SOURCE_FFS and DEST_FF. The first logic level is the column of three LUTs in parallel. The second logic level is the single LUT on the right.
The delays in this path are:
SOURCE_FFS clock-to-Q
Net delay
LUT delay
Net delay
LUT delay
DEST_FF setup time
Assuming no clock skew and routing delays of approximately 1.5 ns each, this circuit could run at about 233 MHz in a Virtex-5 device (slowest speed grade).
Show Slide 147:
Pipelining Example
Pipelined circuit
One logic level between each set of flip-flops
fMAX = ~385 MHz
[Figure: SOURCE_FFS drive three LUTs that feed PIPE_FFS; PIPE_FFS drive a single LUT that feeds DEST_FF]
Key Points
After adding a pipeline stage, the circuit has been split into two paths. The first path is from SOURCE_FFS through one logic level to PIPE_FFS. The second path is from PIPE_FFS through one logic level to DEST_FF.
The delays in either path are:
Starting FF clock-to-Q
Net delay
LUT delay
Ending FF setup time
Estimating each routing delay to be approximately 1.5 ns, as in the original circuit, this circuit could run at about 385 MHz (a 65 percent increase).
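The fMAX arithmetic for both circuits can be sketched in Python. Only the 1.5-ns routing estimate comes from the text; the clock-to-Q, LUT, and setup values below are hypothetical numbers chosen so that the totals land near the ~233 MHz and ~385 MHz quoted above, not data-sheet figures.

```python
# Hypothetical cell delays in ns, chosen only to reproduce the approximate
# figures above; real values come from the Virtex-5 data sheet.
T_CKO = 0.40   # flip-flop clock-to-Q
T_ILO = 0.20   # LUT delay
T_SU = 0.50    # flip-flop setup time
T_NET = 1.5    # routing delay per net, as estimated in the text


def fmax_mhz(logic_levels, t_net=T_NET):
    """fMAX = 1 / (clock-to-Q + levels * (net + LUT) + setup), ignoring skew."""
    period_ns = T_CKO + logic_levels * (t_net + T_ILO) + T_SU
    return 1000.0 / period_ns


if __name__ == "__main__":
    original = fmax_mhz(2)   # SOURCE_FFS -> LUT -> LUT -> DEST_FF
    pipelined = fmax_mhz(1)  # one LUT between each pair of flip-flops
    print(f"{original:.0f} MHz -> {pipelined:.0f} MHz "
          f"({(pipelined / original - 1) * 100:.0f}% faster)")
```

Because the clock-to-Q and setup overhead is paid on every path, splitting two logic levels into two one-level paths improves fMAX by about 65 percent here rather than doubling it.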
Show Slide 148:
1) Given the original circuit, what is wrong with the pipelined circuit?
2) How can the problem be corrected?
Key Points
This simple circuit is a multiplexed adder where the SELECT signal determines whether the output is (A + B) or (C + D).
The designer decides to add a pipeline stage to the circuit to improve performance, but something is not quite right. What is the correct way to pipeline this circuit?
[Figure: Original Circuit]
[Figure: Pipelined Circuit]
Show Slide 149:
Answers
1) What is wrong with the pipelined circuit?
Latency mismatch
Older data is mixed with newer data
Circuit output is incorrect
2) How can the problem be corrected?
Add a flip-flop on SELECT
All data inputs now experience the same amount of latency
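The following Python sketch models the multiplexed adder one clock cycle at a time, which makes the latency mismatch concrete. It is illustrative only. It assumes SELECT = 1 chooses (A + B) and SELECT = 0 chooses (C + D), and it places the pipeline register directly after the two adders. Neither detail is taken from the slide's netlist.

```python
from typing import List, Optional, Tuple

# One input sample per clock cycle: (A, B, C, D, SELECT).
Sample = Tuple[int, int, int, int, int]


def reference(samples: List[Sample]) -> List[int]:
    """Expected output for each sample: (A + B) when SELECT = 1, else (C + D)."""
    return [a + b if sel else c + d for a, b, c, d, sel in samples]


def pipelined(samples: List[Sample], register_select: bool) -> List[int]:
    """Cycle-by-cycle model with a pipeline register after the two adders.

    If register_select is False, SELECT bypasses the pipeline, so the mux
    pairs this cycle's SELECT with the previous cycle's sums.
    The first output appears one cycle after the first sample.
    """
    outputs = []
    sum_ab_q: Optional[int] = None  # pipeline registers, empty at start
    sum_cd_q: Optional[int] = None
    sel_q: Optional[int] = None
    for a, b, c, d, sel in samples:
        if sum_ab_q is not None:
            mux_sel = sel_q if register_select else sel
            outputs.append(sum_ab_q if mux_sel else sum_cd_q)
        # Clock edge: load the pipeline registers with this cycle's values.
        sum_ab_q, sum_cd_q, sel_q = a + b, c + d, sel
    return outputs


samples = [(1, 2, 10, 20, 1), (3, 4, 30, 40, 0), (5, 6, 50, 60, 1), (7, 8, 70, 80, 0)]
print(reference(samples))                         # [3, 70, 11, 150]
print(pipelined(samples, register_select=False))  # [30, 7, 110]  (old and new data mixed)
print(pipelined(samples, register_select=True))   # [3, 70, 11]   (one cycle late, correct)
```

Registering SELECT gives it the same one-cycle latency as the sums. The pipelined output is then the reference sequence delayed by one clock.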
[Figure: Answer, the pipelined circuit with a flip-flop added on SELECT]
I/O Flip-Flops
Show Slide 150:
Lessons
Pipelining
I/O Flip-Flops
Summary
Show Slide 151:
I/O Flip-Flop Overview
Each IOB tile in the Virtex-5 FPGA contains flip-flops
Located in the ILOGIC and OLOGIC blocks
Single data rate or double data rate support
SERDES support
I/O flip-flops provide guaranteed setup, hold, and clock-to-out times when the clock signal comes from a BUFG
Key Points
Spartan-3 FPGA I/O blocks contain two registers on the input, output, and output 3-state enable to support single and double data rate.
Show Slide 152:
Accessing I/O Flip-Flops
During synthesis
Timing-driven synthesis can force flip-flops into Input/Output Blocks (IOBs)
Some tools support attributes or synthesis directives to mark flip-flops for placement in an IOB
Xilinx Constraints Editor
Select the Misc tab and specify the registers that should be placed into IOBs
You need to know the instance name of each register
During the MAP phase of implementation
In the Map Properties dialog box, the Pack I/O Registers/Latches into IOBs option is selected by default
Timing-driven packing will also move registers into IOBs for critical paths
Check the IOB Properties section of the MAP Report to confirm that IOB flip-flops have been used
Key Points
Refer to your synthesis tool documentation for details on how to access IOB flip-flops.
The Constraints Editor Misc tab is covered in the Path-Specific Timing Constraints module.
Timing-driven packing is covered in the Advanced Implementation Options module.
The ChipSync Wizard also accesses I/O flip-flops.
The following I/O flip-flop resources must be instantiated:
IDDR flip-flops using SAME_EDGE or SAME_EDGE_PIPELINED mode
ODDR flip-flops
ISERDES and OSERDES components
Show Slide 153:
Lessons
Pipelining
I/O Flip-Flops
Summary
Show Slide 154:
What is a synchronization circuit?
Captures an asynchronous input signal and outputs it on a clock edge
Why do you need synchronization circuits?
To prevent setup and hold time violations
To ensure a more reliable design
When do you need synchronization circuits?
When signals cross between unrelated clock domains
On chip inputs that are asynchronous
Between related clock domains, related PERIOD constraints are sufficient
Key Points
For clock domains that have a clearly defined and constant phase relationship, proper timing constraints ensure that there are no setup or hold time violations. For more information on constraining paths between clock domains, refer to the Path-Specific Timing Constraints module.
Show Slide 155:
Setup and Hold Time Violations
Violations occur when the flip-flop input changes too close to a clock edge
Three possible results
Flip-flop clocks in an old data value
Flip-flop clocks in a new data value
Flip-flop output becomes metastable
Show Slide 156:
Metastability
Flip-flop output enters a transitory state
Neither a valid 0 nor a valid 1
Can be interpreted as 0 by some loads and as 1 by others
Remains in this state for an unpredictable length of time before settling to a valid 0 or 1
Because metastability is statistical in nature, metastable events can only be reduced, not eliminated
Mean Time Between Failure (MTBF) is exponentially related to the length of time the flip-flop is given to recover
A few extra ns of recovery time can dramatically reduce the chances of a metastable event
The circuits shown in this section allow maximum time for metastable recovery
Key Points
When a signal is at a metastable value, it can be interpreted as a logic 0 by some parts of the circuit and as a logic 1 by other parts. This inconsistency never shows up in simulation and can be very hard to track down during board-level testing.
When the flip-flop leaves the metastable state, it can go to either a logic 0 or 1. There is no known correct state when the flip-flop input changes so close to the clock edge. Still, an incorrect value propagating through your circuit is better than a metastable value.
<recovery time> = <time before the data is used> - <datapath delay>
Example: If CLK1 has a period of 50 ns and the datapath delay is 45 ns, then the recovery time of FF1 is 50 - 45 = 5 ns.
[Figure: Metastability, two flip-flops clocked by CLK1]
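The Python sketch below expresses the recovery-time formula and the exponential link between recovery time and MTBF. For MTBF it uses the commonly cited model MTBF = exp(t_r / tau) / (T0 * f_clk * f_data). The tau and T0 values are hypothetical placeholders, because the real constants are device specific. The sketch shows only the trend.

```python
import math


def recovery_time_ns(time_before_use_ns, datapath_delay_ns):
    """<recovery time> = <time before the data is used> - <datapath delay>."""
    return time_before_use_ns - datapath_delay_ns


def mtbf_seconds(recovery_ns, tau_ns, t0_s, f_clk_hz, f_data_hz):
    """Widely used metastability model: exp(t_r / tau) / (T0 * f_clk * f_data).

    tau_ns and t0_s are device-dependent constants; the values used below
    are hypothetical and only illustrate the exponential trend.
    """
    return math.exp(recovery_ns / tau_ns) / (t0_s * f_clk_hz * f_data_hz)


# Worked example from the text: 50 ns CLK1 period, 45 ns datapath delay.
t_r = recovery_time_ns(50.0, 45.0)
print(f"recovery time: {t_r} ns")  # recovery time: 5.0 ns

TAU_NS, T0_S = 0.2, 1e-9  # hypothetical constants, not data sheet values
for extra_ns in (0.0, 1.0, 2.0):
    mtbf = mtbf_seconds(t_r + extra_ns, TAU_NS, T0_S, f_clk_hz=20e6, f_data_hz=1e6)
    print(f"recovery {t_r + extra_ns:.1f} ns -> MTBF {mtbf:.3g} s")
```

With these placeholder constants, each extra nanosecond of recovery time multiplies the MTBF by e^(1/tau) = e^5, roughly 148. This is the "few extra ns" effect described on the slide.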
Show Slide 157:
Synchronization Circuit 1
Use when input pulses will always be at least one clock period wide
The extra flip-flop guards against metastability
[Figure: Asynchronous input -> FF1 -> FF2 -> Synchronized signal, with both flip-flops clocked by CLK]
Key Points
This circuit is a simple 2-bit shift register.
The recovery time for FF1 is: <CLK period> - <datapath delay>
<datapath delay> = <FF1 CLK-to-Q> + net delay + <FF2 setup>
If the flip-flops are placed in the same slice, the net will use a fast-feedback routing connection to give FF1 the maximum possible recovery time.
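A minimal cycle-level Python sketch of Synchronization Circuit 1 follows. It models only the logical behavior, one input sample per CLK edge, and does not model analog metastability. The class name is hypothetical.

```python
class TwoFlopSynchronizer:
    """Cycle-level model of Synchronization Circuit 1 (a 2-bit shift register).

    FF1 samples the asynchronous input at each CLK edge; FF2 samples FF1.
    Analog metastability is not modeled; only the logical behavior is shown.
    """

    def __init__(self):
        self.ff1 = 0
        self.ff2 = 0

    def clock(self, async_in):
        """Apply one rising CLK edge and return the synchronized signal (FF2)."""
        # Both flip-flops update on the same edge, so FF2 takes FF1's old value.
        self.ff1, self.ff2 = async_in, self.ff1
        return self.ff2


sync = TwoFlopSynchronizer()
# Input value seen at successive CLK edges: a pulse two clock periods wide.
samples = [0, 1, 1, 0, 0, 0]
outputs = [sync.clock(x) for x in samples]
print(outputs)  # [0, 0, 1, 1, 0, 0]
```

The added cycle of delay buys FF1 a clock period, less <FF1 CLK-to-Q> + net delay + <FF2 setup>, in which to recover. FF1 looks at the input only at CLK edges, so a pulse that starts and ends between two edges is never captured. This is why the circuit requires pulses at least one clock period wide.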
Show Slide 158:
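Synchronization Circuit 2
Use when input pulses may be less than one clock period wide
FF1 captures short pulses
[Figure: The asynchronous input clocks FF1 (D tied to VCC, asynchronous CLR); FF1 -> FF2 -> FF3 -> Synchronized signal, with FF2 and FF3 clocked by CLK; an extra flip-flop guards against metastability]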
Key Points
To obtain this circuit, add a flip-flop that is clocked by the asynchronous input, plus an AND gate, to the front of Synchronization Circuit 1.
FF1 is a flip-flop with asynchronous clear.
The AND gate prevents FF1 from being reset while the input to the circuit is still HIGH. This allows for long input pulses as well as short ones. If multiple short pulses occur on the input within a space of three clock cycles, this circuit sees only the first pulse. This is always a danger when passing data from a fast clock domain into a slower clock domain.
FF2 and FF3 act in the same way as FF1 and FF2 in Synchronization Circuit 1.
This circuit uses the input as a clock signal, which is probably not on a global buffer. To avoid it, you can synchronize inputs with a faster clock signal, which lets you use Synchronization Circuit 1. You can then use a divided version of the same clock signal in the rest of the design.
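A small tick-level Python model can check the pulse-capturing behavior. It is a sketch under stated assumptions. Time is divided into ticks and CLK rises every `period` ticks. Metastability is not modeled. The AND gate is assumed to combine FF3's output with the inverted asynchronous input. The function names are hypothetical.

```python
def sync_circuit_1(async_in, period):
    """Tick-level model of Synchronization Circuit 1, for comparison."""
    ff1 = ff2 = 0
    out = []
    for t, x in enumerate(async_in):
        if t > 0 and t % period == 0:  # rising CLK edge
            ff1, ff2 = x, ff1
        out.append(ff2)
    return out


def sync_circuit_2(async_in, period):
    """Tick-level model of Synchronization Circuit 2.

    async_in: 0/1 input value at each time tick.
    period:   CLK period in ticks; CLK rises at ticks period, 2*period, ...
    Assumption: FF1's asynchronous clear = FF3 AND (NOT input).
    Returns the synchronized signal (FF3) at every tick.
    """
    ff1 = ff2 = ff3 = 0
    prev_in = 0
    out = []
    for t, x in enumerate(async_in):
        if t > 0 and t % period == 0:  # rising CLK edge
            ff2, ff3 = ff1, ff2
        if prev_in == 0 and x == 1:
            ff1 = 1  # FF1 is clocked by the input; its D input is tied to VCC
        if ff3 and not x:
            ff1 = 0  # asynchronous clear, blocked while the input is still HIGH
        prev_in = x
        out.append(ff3)
    return out


short_pulse = [0, 1] + [0] * 22         # 1 tick wide; CLK period is 4 ticks
long_pulse = [0] + [1] * 13 + [0] * 10  # longer than three CLK periods
print(sum(sync_circuit_1(short_pulse, 4)))  # 0: the pulse falls between CLK edges
print(sum(sync_circuit_2(short_pulse, 4)))  # 8: two CLK periods of output
print(sum(sync_circuit_2(long_pulse, 4)))   # 12: FF1 is not cleared until the input falls
```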
Key Points
Because the clocks have a fixed-phase relationship, you will not need to resynchronize the signals as they cross between the clock domains; however, you will need to use timing constraints to prevent setup or hold violations.
Use the CLK2X output of a DLL or the CLKFX output of a DCM to get a faster clock signal for synchronizing inputs.
Show Slide 159:
Capturing a Bus
Leading edge detector
Input pulses must be at least one CLK period wide
[Figure: The n-bit bus is registered on the asynchronous input clock; FF1 and FF2, clocked by CLK, generate a one-shot enable that drives the CE of Sync_Reg, which outputs the synchronized bus inputs]
Key Points
First, the data bus is registered by the asynchronous clock.
Second, the one-shot enable (synchronized to CLK) signals to the internal circuit that data is captured (via CE).
Note: Now there is a level of logic between the one-shot enable generator and the synchronization register. If FF1 becomes metastable, it has less time to recover before its output is used to enable Sync_Reg.
Recovery time = <CLK period> - <datapath delay>
<datapath delay> = <FF1 CLK-to-Q> + net delay + <LUT delay> + <Sync_Reg setup>
Key Points
A falling-edge detector can be designed in a couple of different ways:
1. Invert the asynchronous input clock.
2. Change the AND gate to have the bubble on the top input instead of on the bottom input.
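The one-shot (leading-edge) detector and its falling-edge variant can be sketched at the cycle level in Python. The sketch is illustrative only. It treats the asynchronous clock as already sampled at each CLK edge, does not model metastability, and assumes FF1 drives the top AND-gate input. All names are hypothetical.

```python
def one_shot_enables(strobe_samples, falling=False):
    """Edge detector built from FF1 -> FF2 and an AND gate, one value per CLK cycle.

    strobe_samples: the asynchronous input clock sampled at each CLK edge.
    Assumption: the AND gate's top input is FF1 and its bottom input is FF2.
    """
    ff1 = ff2 = 0
    enables = []
    for x in strobe_samples:
        ff1, ff2 = x, ff1  # FF1 and FF2 form a 2-bit shift register on CLK
        if falling:
            enables.append(int((not ff1) and ff2))  # bubble on the top input
        else:
            enables.append(int(ff1 and not ff2))    # bubble on the bottom input
    return enables


def capture_bus(strobe_samples, bus_samples):
    """Sync_Reg loads the bus on the CLK edge after the one-shot enable (CE) is high."""
    enables = one_shot_enables(strobe_samples)
    sync_reg = None
    history = []
    for k, bus in enumerate(bus_samples):
        if k > 0 and enables[k - 1]:
            sync_reg = bus
        history.append(sync_reg)
    return history


strobe = [0, 1, 1, 1, 0, 0, 1, 1]
# Output of the bus register clocked by the asynchronous input clock.
bus = [0x00, 0xA5, 0xA5, 0xA5, 0xA5, 0xA5, 0x3C, 0x3C]
print(one_shot_enables(strobe))                # [0, 1, 0, 0, 0, 0, 1, 0]
print(one_shot_enables(strobe, falling=True))  # [0, 0, 0, 0, 1, 0, 0, 0]
print(capture_bus(strobe, bus))                # [None, None, 165, 165, 165, 165, 165, 60]
```

Each strobe produces exactly one enable cycle, however long the strobe stays high. This is what lets Sync_Reg load the bus only once per transfer.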
Show Slide 160:
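Capturing a Bus
Leading edge detector
Input pulses may be less than one CLK period wide
[Figure: FF1 (D tied to VCC, asynchronous CLR, clocked by the asynchronous input clock), FF2, and FF3 (clocked by CLK) generate a one-shot enable that drives the CE of Sync_Reg, which captures the n-bit bus as the synchronized bus inputs]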
Key Points
First, the data bus is registered by the asynchronous clock.
Second, the one-shot enable (synchronized to CLK) signals to the internal circuit that data is captured (via CE).
Note: Now there is a level of logic between the one-shot enable generator and the synchronization register. If FF2 becomes metastable, it has less time to recover before its output is used to enable Sync_Reg.
Recovery time = <CLK period> - <datapath delay>
<datapath delay> = <FF2 CLK-to-Q> + net delay + <LUT delay> + <Sync_Reg setup>
Key Points
A falling-edge detector can be designed in a couple of different ways:
1. Invert the asynchronous input clock.
2. Change the one-shot enable AND gate to have the bubble on the top input instead of on the bottom, remove the bubble from the reset AND gate, and, finally, change VCC to GND.
Show Slide 161:
Use a FIFO to cross domains
Key Points
You must still synchronize the FIFO status flags (FULL, ALMOST_FULL, EMPTY, ALMOST_EMPTY). These flags are based on read and write operations, which are synchronized to the read and write clocks, respectively. In general, this can be done by synchronizing the slower enable (read/write) to the faster clock with Synchronization Circuit 1. When synchronizing to the slower clock, use Synchronization Circuit 2.
Summary
Show Slide 162:
Lessons
Pipelining
I/O Flip-Flops
Summary
Show Slide 163:
3) High fanout is one reason to duplicate a flip-flop. What is another reason?
4) Provide an example of when you do not need to resynchronize a signal that crosses between clock domains.
5) What is the purpose of the extra flip-flop in the synchronization circuits shown in this module?
Show Slide 164:
Summary
You can increase circuit performance by
Duplicating flip-flops
Adding pipeline stages
Using I/O flip-flops
Some trade-offs
Duplicating flip-flops increases circuit area
Pipelining introduces latency and increases circuit area
Synchronization circuits increase reliability
User Guides: www.xilinx.com > Documentation > Doc Type > User Guides
Switching Characteristics
Detailed Functional Description > Input/Output Blocks (IOBs)
Application notes: www.xilinx.com > Documentation > Doc Type > Application Notes
Application Note XAPP094: Metastability Recovery
Application Note XAPP225: Data-to-Clock Phase Alignment

Answers
1) What is wrong with the pipelined circuit?
Latency mismatch
Older data is mixed with newer data
Circuit output is incorrect
2) How can the problem be corrected?
Add a flip-flop on SELECT
All data inputs now experience the same amount of latency
3) High fanout is one reason to duplicate a flip-flop. What is another reason?
Loads are divided among multiple locations on the chip
4) Provide an example of when you do not need to resynchronize a signal that crosses between clock domains.
Well-defined phase relationship between the clocks
Example: Clocks are the same frequency, 180 degrees out of phase
Use related PERIOD constraints to ensure that datapaths will meet timing
5) What is the purpose of the extra flip-flop in the synchronization circuits shown in this module?
To allow the first flip-flop time to recover from metastability
Transition to Synthesis Techniques
Purpose
Specify Xilinx resources that need to be instantiated for various FPGA synthesis tools
Identify synthesis tool options that can be used to increase performance
Describe an approach to using your synthesis tool to obtain higher performance
Time
40 minutes
Process
This module describes how to synthesize a fast and efficient FPGA design by using the advanced capabilities of the synthesis tools.
Lessons
Introduction
Synthesis Options
Summary
Introduction
Show Slide 165:
Show Slide 166:
Objectives
Specify Xilinx resources that need to be instantiated for various FPGA synthesis tools
Identify synthesis tool options that can be used to increase performance
Describe an approach to using your synthesis tool to obtain higher performance
Show Slide 167:
Recommended REL Modules
Three recorded e-Learning modules are available for you to improve your HDL coding style
Design guidelines (good design practices)
Best ways to pipeline your design
Finite State Machine design
Coding for hardware resources
SRL, multiplexers, carry logic
Coding to reduce your design size
Managing your control signals (sets, resets, clocks, clock enables)
Block RAM
Show Slide 168:
Recommended REL Modules
Three recorded e-Learning modules are available for you to improve your HDL coding style
Managing your control sets
Control signal recommendations
How to build a fast and efficient Virtex-5 FPGA design
How to migrate an older design to a Virtex-5 FPGA
All of these RELs are available at no charge at www.xilinx.com/support/training/free-courses.htm
Show Slide 169:
Timing Closure
[Figure: Timing closure flow chart]
Show Slide 170:
Lessons

Synthesis Options
Summary
Show Slide 171:
Breakthrough Performance
Three steps to achieve breakthrough performance
1. Utilize embedded (dedicated) resources
Performance by construction
DSP48, FIFO, block RAM, ISERDES, OSERDES, PowerPC processor, EMAC, and MGT, for example
2. Write code for performance
Use synchronous design methodology
Ensure the code is written optimally for critical paths
Pipeline
Xilinx FPGAs have abundant registers: one register per LUT
3. Drive your synthesis and Place & Route tools
Try different optimization techniques
Add critical timing constraints in synthesis
Preserve hierarchy
Apply full and correct constraints
Use High effort
[Figure: Virtex-4 FPGA performance meter]
Key Points
Applying full and correct constraints refers to applying constraints for all clocks in the design. Additionally, false paths and multicycle paths should be correctly constrained, as should the I/O.
The timing closure flow chart was created to help achieve breakthrough performance.
Show Slide 172:
500-MHz Fabric Guidelines
[Figure: One Level of Logic Only, showing LUTs (inputs I0 to I3) placed directly between registers (D, CE, SET, RST, Q)]
For the fabric to achieve maximum performance, note the following important considerations:
1. Do not exceed one level of logic. That is why the registers are there.
2. Carry chains should not exceed 14* before being registered
3. You may need placement constraints to keep functions together
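Guideline 1 can be checked mechanically by counting LUTs between registers, the same count the synthesis schematic viewer lets you check by eye. The Python sketch below does this on a toy netlist. The dictionary-based netlist format and the cell names are hypothetical. They are not a Xilinx tool format.

```python
from functools import lru_cache


def max_logic_levels(cells, fanin):
    """Largest number of LUTs on any path that ends at a flip-flop input.

    cells: dict of cell name -> "FF" or "LUT"
    fanin: dict of cell name -> list of driving cell names (no combinational loops)
    """

    @lru_cache(maxsize=None)
    def levels_at_output(name):
        # A flip-flop output starts a new path; each LUT adds one level.
        if cells[name] == "FF":
            return 0
        return 1 + max((levels_at_output(d) for d in fanin.get(name, [])), default=0)

    return max(
        (levels_at_output(d)
         for name, kind in cells.items() if kind == "FF"
         for d in fanin.get(name, [])),
        default=0,
    )


# Hypothetical netlists for the pipelining example earlier in this module.
original_cells = {"src0": "FF", "src1": "FF", "src2": "FF",
                  "lut0": "LUT", "lut1": "LUT", "lut2": "LUT", "lut3": "LUT",
                  "dest": "FF"}
original_fanin = {"lut0": ["src0"], "lut1": ["src1"], "lut2": ["src2"],
                  "lut3": ["lut0", "lut1", "lut2"], "dest": ["lut3"]}

pipelined_cells = dict(original_cells, pipe0="FF", pipe1="FF", pipe2="FF")
pipelined_fanin = dict(original_fanin, pipe0=["lut0"], pipe1=["lut1"],
                       pipe2=["lut2"], lut3=["pipe0", "pipe1", "pipe2"])

print(max_logic_levels(original_cells, original_fanin))    # 2
print(max_logic_levels(pipelined_cells, pipelined_fanin))  # 1
```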
Show Slide 173:
Use Embedded Blocks
Embedded block timing is correct by construction
Not dependent on programmable routing
Offers as much as 3x the performance of soft implementations
Examples
FIFO at 500 MHz
DSP slices at 500 MHz
PowerPC processor at up to 550 MHz
[Figure: XtremeDSP Solution slice, Smart RAM FIFO, and PowerPC processor]
Show Slide 174:
Simple Coding Steps Yield 3x Performance
Use pipeline stages: more bandwidth
Use synchronous reset: better system control
Use Finite State Machine (FSM) optimizations
Use inferable resources
Multiplexer
Shift Register LUT (SRL)
Block RAM, LUT RAM
Cascade DSP
Avoid high-level constructs (loops, for example) in code
Many synthesis tools produce slow implementations
See the Synthesis and Simulation Design Guide:
Help > Software Manuals > Synthesis and Simulation Design Guide
Key Points
These are just the most obvious suggestions. For every design, there may be more tricks or other clever things that can improve performance.
Pipelining helps the most. For most systems today it is always an option, because bandwidth, not latency, is what defines the system. Latency can be important, but when it is, the latency that matters is usually of a different order of magnitude than the latency that pipelining adds.
FPGAs have lots of registers, so retiming and clever use of arithmetic functions can yield tremendous performance. If designers need to balance the latency among different paths in the system, SRLs can efficiently compensate for the delay differences.
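Balancing latency with SRLs means padding every shorter path up to the length of the longest one. The following Python sketch computes that padding and models a fixed-depth shift register behaviorally. The path names and helper names are hypothetical. The sketch ignores the maximum depth of a real SRL primitive.

```python
from collections import deque


def balance_latencies(latencies):
    """Extra delay stages each path needs to match the longest-latency path."""
    target = max(latencies.values())
    return {path: target - cycles for path, cycles in latencies.items()}


class DelayLine:
    """Behavioral model of a fixed-depth shift register, as an SRL would implement."""

    def __init__(self, depth, init=0):
        self.depth = depth
        self.stages = deque([init] * depth, maxlen=depth)

    def clock(self, value):
        """Shift in `value` and return the value from `depth` cycles ago."""
        if self.depth == 0:
            return value  # zero-depth line: no delay needed
        oldest = self.stages[0]
        self.stages.append(value)  # maxlen discards the oldest entry
        return oldest


# Hypothetical latencies (in clock cycles) of three paths that must line up.
extra = balance_latencies({"data": 5, "valid": 1, "select": 3})
print(extra)  # {'data': 0, 'valid': 4, 'select': 2}

valid_delay = DelayLine(extra["valid"])
print([valid_delay.clock(v) for v in (1, 2, 3, 4, 5, 6)])  # [0, 0, 0, 0, 1, 2]
```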
Show Slide 175:
Synthesis Guidelines
Use timing constraints
Define tight but realistic individual clock constraints
Put unrelated clocks into different clock groups
Use these synthesis options to start (they don't always work best on every design)
Turn off resource sharing
Move flip-flops from IOBs closer to logic
Turn on FSM optimization
Use the retiming option
Key Points
Resource sharing is a technique used by synthesis tools to decrease circuit area, usually resulting in lower performance.
Key Points
If timing-driven packing is used, the MAP process can also decide during implementation to move flip-flops into and out of IOBs. This option is discussed in the Advanced Implementation Options module at the end of this course.
Show Slide 176:
Synplicity Example
Use constraints
Synplify and Synplify Pro software stop optimizing when the constraints are met
Use SCOPE to enter all timing constraints
Define real, individual clock constraints
If the clocks are unrelated, always put them into different clock groups
Using the global frequency field can deteriorate results
(*) Synplicity's data
Key Points
As shown on the graph on the right, the green line (estimated performance) and the red line (actual performance) are very close to each other, which means that the synthesis tools do a fairly good job estimating the performance. The important thing to note is that the performance increases significantly when the right set of constraints is used (in this example, ~55 percent of the maximum circuit performance). Keep in mind that these are only the synthesis constraints. There is no change in the code.
Key Points
The XST and Precision software tools have similar results when clock constraints are applied. For XST software, period and input/output delays can be specified via the XST Constraints File (XCF). For Precision software, this information can be specified individually on each clock via the Design Hierarchy window, by right-clicking the Clocks folder and selecting Set Clock Constraints.
For example, in a design that has two clocks (one a low-frequency clock and the other a high-frequency clock), the logic for each domain will be optimized to meet its own constraint. The logic in the low-frequency domain can be optimized for area, saving resources, while the logic in the high-frequency domain can grow in area to meet its constraint. It is very important to provide this information to the XST, Synplify, or Precision software, because all are constraint-driven tools.
Show Slide 177:
Impact of Constraints
Non-timing-constrained designs can be optimized for area rather than performance
[Figure: LUT-level schematics of the non-timing-driven and timing-driven implementations]
Non-Timing Driven: Total LUTs: 5; Clock Freq: 423.7 MHz
Timing Driven (bigger but faster): Total LUTs: 6; Clock Freq: 591.7 MHz (+40%)
Key Points
This example shows what happens when constraints are used properly. If there is no performance requirement, the tools generate a design that is as small as possible. The same is true when several solutions all meet the requirements: the tools use the smallest implementation.
[Figure: Non-Timing Driven and Timing Driven implementations]
Show Slide 178:
Place & Route Guidelines
Timing constraints
Recommended options
Use tight, realistic constraints
By default, effort is set to Standard
Timing-driven MAP
Xplorer
Tools to help meet timing
Using the correct Place & Route options can have a dramatic impact on design performance
High-effort Place & Route
Floorplanning
(Use the PACE and PlanAhead software tools)
Physical synthesis tools
Other available options
Incremental design
Modular design flows
Show Slide 179:
Impact of Constraints in Tools
[Figure: Relative performance of a Reed-Solomon design from www.opencores.org; the chart shows values of 1.0, 1.4, 1.6, and 2.1 for the following four flows]
No constraints; Standard effort
No constraints in synthesis; Place & Route with High effort and constraints
Constraints in synthesis and Place & Route (High effort)
Constraints in synthesis and Place & Route; retiming in synthesis; High effort in PAR
Synthesis Options
Show Slide 180:
Lessons

Synthesis Options
Summary
Show Slide 181:
Synthesis Options
There are many synthesis options that can help you obtain your performance and area objectives
FSM extraction
Retiming
Register duplication
Hierarchy management
Resource sharing
Physical optimization
Show Slide 182:
Timing-Driven Synthesis
Synplify, Precision, and XST software
Timing-driven synthesis uses performance objectives to drive the optimization of the design
Based on your performance objectives, the tools try several algorithms to meet performance while keeping resource usage in mind
Performance objectives are provided to the synthesis tool via timing constraints
Key Points
Synplify software: Communicate constraints via SCOPE.
Precision software: Communicate constraints by entering them in a constraint file (SDC file) or by entering them individually for each clock from the hierarchy window.
XST software: Communicate constraints via the XCF. For more information, see the XST User Guide in the online software documents (Help > Software Manuals > XST User Guide).
Show Slide 183:
Timing Constraints Editor
Synplify and Precision software
The timing constraints editor allows you to apply timing constraints for your tool
These constraints will be used to drive synthesis optimization (for those tools that use constraint-driven synthesis)
These constraints will also be passed (by default) on to the Xilinx implementation tools via a Netlist Constraints File (NCF)
XST constraints
Communicated via the XCF
See the XST User Guide: Help > Software Manuals > XST User Guide > XST Design Constraints
Key Points
XST constraints are entered into a text file. For more information on timing and non-timing XST constraints, see the XST User Guide.
Show Slide 184:
FSM Extraction

Finite State Machine (FSM) extraction optimizes your state machine by re-encoding and optimizing your design based on the number of states and inputs
By default, the tools will use FSM extraction
Safe state machines
By default, the synthesis tools will remove all decoding for illegal states, even if you include VHDL "when others" or Verilog "default" cases
Must be turned on to use safe FSM implementation
See Notes for more information
Key Points
For more information on the specifics of how your synthesis tool will re-encode your FSM, see the user guide provided by each vendor.
To change FSM settings when the tool is run standalone:
Precision: Tools > Set Options, check Use Safe FSM
Synplify: In SCOPE, specify syn_encoding <state_registers> = safe
To change FSM settings from the ISE software:
XST: Synthesize XST > HDL Options: Safe Implementation = Yes
Synplify: syn_encoding = safe in the SCOPE constraint file. Include the file from the Synthesis Properties menu
Precision: Synthesis Properties > Input Options: Use Safe FSM = checked
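Python cannot show what a synthesis tool keeps or removes. A behavioral model can still make the difference in recovery concrete. The sketch below is conceptual. The three-state machine and its 2-bit encoding are hypothetical. The "unsafe" case is modeled as lock-up in the illegal state, which is only one possible hardware outcome.

```python
from enum import IntEnum


class State(IntEnum):
    IDLE = 0
    RUN = 1
    DONE = 2
    # Code 3 is unused: an illegal state for this 2-bit binary encoding.


def next_state(code, start, safe):
    """Next-state logic for a hypothetical three-state FSM in a 2-bit register.

    safe=True:  any illegal code returns to IDLE (decoding kept, as in a safe FSM).
    safe=False: illegal codes are left undecoded; here that is modeled as
                lock-up, which is only one possible outcome in hardware.
    """
    if code == State.IDLE:
        return State.RUN if start else State.IDLE
    if code == State.RUN:
        return State.DONE
    if code == State.DONE:
        return State.IDLE
    return State.IDLE if safe else code


def run(code, cycles, safe):
    """State codes over `cycles` clock edges with start held low."""
    trace = []
    for _ in range(cycles):
        code = next_state(code, start=False, safe=safe)
        trace.append(int(code))
    return trace


# Suppose the state register somehow reaches the illegal code 3.
print(run(3, 3, safe=True))   # [0, 0, 0]
print(run(3, 3, safe=False))  # [3, 3, 3]
```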
Show Slide 185:
Retiming

Retiming: The synthesis tool automatically tries to move register stages to balance combinatorial delay on each side of the registers
[Figure: Before Retiming and After Retiming]
Key Points
To access retiming:
Synplify software: Enable under Implementation Options, or use the Retiming option in the Run window in the Synplify Pro software (Synplify > Options > Configure VHDL or Verilog Compiler).
Precision software: Check the box in the Setup Design dialog box.
XST: Enable under the Properties dialog box for Synthesize XST > Xilinx Specific Options > Register balancing.
Retiming results are design dependent. In some situations (for example, highly pipelined designs), retiming may not provide any benefit; in other designs, it may improve performance.
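For a single chain of logic, retiming can be sketched as a search for the register position that minimizes the slower of the two stages. This Python example simplifies under stated assumptions: one chain, one movable register, and hypothetical delays. Real retiming must also preserve the circuit's function and initial register states, which this sketch ignores.

```python
def best_register_position(delays_ps):
    """Place one intermediate register in a chain of logic delays.

    delays_ps: delay of each logic element, in order, between the input and
    output registers (integers in picoseconds avoid float rounding).
    Cutting at index k puts the register after element k - 1.
    Returns (k, worst stage delay) for the most balanced placement.
    """
    best = None
    for k in range(1, len(delays_ps)):
        worst = max(sum(delays_ps[:k]), sum(delays_ps[k:]))
        if best is None or worst < best[1]:
            best = (k, worst)
    return best


chain = [400, 300, 500, 200, 600]  # hypothetical delays, in ps
before = max(sum(chain[:1]), sum(chain[1:]))  # register after the first element
print(before)                         # 1600
print(best_register_position(chain))  # (3, 1200)
```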
Show Slide 186:
Register Duplication

Register duplication is used to reduce fanout on registers (to improve delays)
Xilinx recommends manual register duplication
Most synthesis vendors create signals <signal_name>_rep0, _rep1, etc.
Implementation tools pack logic with related names into the same slice, which can prohibit a register from being moved closer to its destination
When manually duplicating registers, do not use a number at the end
Example: <signal_name>_0dup, <signal_name>_1dup
Use synthesis options to prevent duplicate registers from being re-merged
Key Points
Register duplication of the output 3-state register is used so that the IOB 3-state register can be moved inside the IOB (to reduce clk-to-output delays). Note that for the 3-state register to be placed in the IOB, its fanout must be one.
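The Python sketch below illustrates the naming convention and the idea of dividing loads among copies. The register and load names are hypothetical. In a real flow, synthesis options would still be needed to keep the copies from being re-merged.

```python
def duplicate_register(signal, loads, copies):
    """Split one register's loads across manually duplicated copies.

    Names follow the <signal_name>_0dup, <signal_name>_1dup convention, so no
    duplicate ends in a number the way tool-generated _rep0/_rep1 names do.
    Loads are split into contiguous groups, assuming the list is ordered by
    location on the chip so each copy can be placed near its own loads.
    """
    names = [f"{signal}_{i}dup" for i in range(copies)]
    groups = {name: [] for name in names}
    for i, load in enumerate(loads):
        groups[names[i * copies // len(loads)]].append(load)
    return groups


groups = duplicate_register("ce_reg", ["dsp_a", "dsp_b", "ram_a", "ram_b"], 2)
print(groups)  # {'ce_reg_0dup': ['dsp_a', 'dsp_b'], 'ce_reg_1dup': ['ram_a', 'ram_b']}
```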
Show Slide 187:
Hierarchy Management

The basic settings are
Flatten the design: Allows total combinatorial optimization across all boundaries (XST default)
Maintain hierarchy: Preserves hierarchy without allowing optimization of combinatorial logic across boundaries (Xilinx recommended)
If you have followed the synchronous design guidelines, use the setting Maintain hierarchy
If you have not followed the synchronous design guidelines, use the setting Flatten the design
Your synthesis tool may have additional settings
Refer to your synthesis documentation for details on these settings
Key Points
To access hierarchy control:
Synplify software: SCOPE Constraints Editor
Synplify also has an additional setting: Maintain hierarchy but allow optimization. This setting allows combinatorial logic to be optimized while maintaining hierarchy in the netlist (the setting in Synplify is firm).
Precision software: After compiling the design, right-click Modules in the Design Hierarchy window and select Preserve Hierarchy or Flatten Hierarchy.
XST: Synthesize XST > Synthesis Options > Keep Hierarchy. Note that the default is NO.
Show Slide 188:
Hierarchy Preservation
Benefits
Easily locate problems in the code based on the hierarchical instance names contained within static timing analysis reports
Enables floorplanning and incremental design flow
The primary advantage of flattening is to optimize combinatorial logic across hierarchical boundaries
If the outputs of leaf-level blocks are registered, there is generally no need to flatten
However, preserving hierarchy can limit register retiming (balancing) and register duplication
Key Points
Registering the outputs of each leaf-level block is part of the synchronous design techniques methodology. Registering the output boundaries helps because you know the delays from one block to the next; that is, the delays do not vary with combinatorial outputs. Logic cannot be optimized across a registered boundary. If you do register outputs, you therefore know that the delay from one hierarchical or functional block to the next is minimized. You also know that no logic optimization can occur across hierarchical domains.
Preserving hierarchy also limits name changes to registers, so the element names used in a UCF will generally not change. If you flatten the design, register and element names, hierarchical paths, and references can change from one iteration to the next. In that case, maintaining the UCF can be quite a burden.
However, preserving hierarchy can prevent register balancing (retiming) and register duplication. Nevertheless, the benefits of preserving hierarchy generally outweigh the benefits of flattening, except when you have combinatorial outputs.
Key Points
In general, preserve hierarchy for large designs. For smaller designs, preserve the hierarchy if you registered leaf-level outputs; otherwise, you might consider flattening the design. If you flatten the design, remember the extra burden of name changes (UCF and static timing analysis) from one iteration to the next and the limits on floorplanning.
Show Slide 189:
Schematic Viewers

Allows you to view synthesis results graphically
Check the number of logic levels between flip-flops
Locate net and instance names quickly
View the design as generic RTL or technology-specific components
Works best when hierarchy has been preserved during synthesis
Show Slide 190:
Cross-Probing
Cross-probing: Synplify and Precision software
From the Timing Analyzer, click a reported worst-case path, and that path will be highlighted in the synthesis schematic viewer
Cross-probe to the code
Review the code to determine whether or not it can be rewritten to improve performance
Apply timing constraints in your synthesis tool to optimize this path better
You may need to set some environment variables for this to work
For more information, see Application Note XAPP406: Cross-Probing to Synplicity and Exemplar
Key Points
To find a particular application note, it is easier to just search for the application note number. For a list of all application notes, go to www.xilinx.com > Documentation > Doc Type > Application Notes > See all Application Notes.
Show Slide 191:
Physical Optimization
Synplicity Amplify FPGA Physical Optimizer or Mentor Precision Physical software (add-on tools)
Based on the critical paths in the design, the tools will attempt to optimize and physically locate the associated logic closely together to minimize the routing delays
Essentially, this is a way to provide critical path information to the synthesis tool so that it can attempt to optimize those paths further
Show Slide 192:
Lessons

Synthesis Options
Summary
Show Slide 193:
New XST Switches
LUT combining (Virtex-5 FPGA only)
Recall that the 6-input LUT is actually two 5-input LUTs
This allows XST to map to this configuration
Area: can save LUTs (can be significant)
Auto: tries to balance size with speed
Reduce control sets (Virtex-5 FPGA only)
XST will assign synchronous set/reset and CEs to LUT inputs
Low-fanout control signals assigned this way can reduce the number of control sets, which improves device utilization
Can be controlled with HDL coding style
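Both switches can be illustrated with small Python models. These are sketches, not XST's algorithms. The LUT check assumes that dual-LUT5 mode requires the two functions to share at most five inputs. The control-set model assumes all set/reset signals are synchronous. The fanout threshold and signal names are hypothetical.

```python
from collections import Counter


def can_share_lut6(inputs_a, inputs_b):
    """True if two functions can share one Virtex-5 LUT6 split into two LUT5s.

    Assumes the dual-output mode, where both functions use the same five
    shared LUT inputs, so together they may depend on at most five signals.
    """
    return len(set(inputs_a) | set(inputs_b)) <= 5


def count_control_sets(flip_flops):
    """Number of distinct (clock, clock enable, set/reset) combinations."""
    return len({(ff["clk"], ff.get("ce"), ff.get("sr")) for ff in flip_flops})


def fold_low_fanout_controls(flip_flops, max_fanout):
    """Model moving low-fanout CE and set/reset signals onto LUT inputs.

    Assumes every "sr" entry is synchronous; asynchronous controls could not move.
    """
    fanout = Counter(ff[key] for ff in flip_flops for key in ("ce", "sr") if ff.get(key))
    folded = []
    for ff in flip_flops:
        new_ff = dict(ff)
        for key in ("ce", "sr"):
            if new_ff.get(key) and fanout[new_ff[key]] <= max_fanout:
                new_ff[key] = None  # now implemented in the LUT driving D
        folded.append(new_ff)
    return folded


ffs = [{"clk": "clk", "ce": "en_a"}, {"clk": "clk", "ce": "en_b"},
       {"clk": "clk", "sr": "rst_c"}, {"clk": "clk"},
       {"clk": "clk", "sr": "rst"}, {"clk": "clk", "sr": "rst"}, {"clk": "clk", "sr": "rst"}]
print(count_control_sets(ffs))                               # 5
print(count_control_sets(fold_low_fanout_controls(ffs, 2)))  # 2
print(can_share_lut6("abcd", "cde"), can_share_lut6("abcd", "efg"))  # True False
```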
Show Slide 194:
New XST Features
Inference of SRL for shift registers with set/reset
XST uses SRL resources if the HDL description contains a single asynchronous, synchronous set, or synchronous reset signal
This will require extra logic, because the SRL does not support set or reset functionality
Inference is done if the shift register has at least four stages
Summary
Show Slide 195:
Lessons

Synthesis Options
Summary
Show Slide 196:
1) List a few of the options in the synthesis tools that help you increase
performance
2) What is the approach presented here for obtaining breakthrough performance?
Show Slide 197:
Summary
Your HDL coding style can affect synthesis results
Infer resources whenever possible
Most resources are inferable, either directly or with an attribute
If you cannot infer the resource you need, instantiate it
Take advantage of the synthesis options provided to help you meet your timing objectives
Use synchronous design techniques and timing-driven synthesis to achieve higher performance

Synthesis & Simulation Design Guide, XST User Guide, and Constraints Guide
Help > Software Manuals
User guides
www.xilinx.com > Documentation > Doc Type > User Guides
Virtex-5 FPGA data sheets and user guides
www.xilinx.com > Documentation > Devices > FPGA Device Family > Virtex-5
Answers
1) List a few of the options in the synthesis tools that help you increase performance.
FSM extraction
Retiming
Register duplication
Physical optimization
2) What is the approach presented here for obtaining breakthrough performance?
Three steps to achieve breakthrough performance:
1. Utilize embedded (dedicated) resources.
Performance by construction
DSP48, FIFO, block RAM, ISERDES, OSERDES, PowerPC processor, EMAC, and MGT, for example
2. Write code for performance.
Pipeline
Xilinx FPGAs have abundant registers: one register per LUT
3. Drive your synthesis and Place & Route tools.
Apply full and correct timing constraints
Utilize optional settings
Use High effort
Transition to Lab 3: Synthesis Techniques
Purpose
Access synthesis options for the targeted software
Read the Synthesis Report in the targeted software to find performance estimates
Modify the synthesis timing constraints and, in the case of XST, access and modify the contents of the Xilinx Constraints File (XCF) for synthesis
Time
30 minutes
Process
This lab illustrates how to synthesize a design by taking advantage of some of the advanced synthesis options available in the newest synthesis tools.
General Flow
Step 1: Review the existing design
Step 2: Apply a PERIOD constraint
Step 3: Apply a tighter PERIOD constraint
Step 4: Apply an even tighter PERIOD constraint
Step 5: Apply a still tighter PERIOD constraint
Lab
Refer to the separate lab workbook for the Synthesis Techniques lab.
Transition to Day One Summary
Day One Summary
Purpose

Time
10 minutes
Process

Lessons
Show Slide 198:
Day One Summary
Show Slide 199:
Day One Review
Describe the flow for obtaining your performance objectives
Describe the available clocking resources in the Virtex-5 FPGA
Explain the architectural features of the Virtex-5 FPGA
Name one technique for building synchronization circuits that can provide maximum recovery time
Describe a few of the synthesis options that can boost performance
Describe the synthesis approach that helps obtain higher performance
How can CORE Generator software system cores improve performance?
Show Slide 200:
Timing Closure
Show Slide 201:
Day One Review Answers
Describe the available clocking resources in the Virtex-5 FPGA
CMT
2 to 6 Clock Management Tiles
Each CMT has 2 DCMs and 1 PLL
DCM
DLL: Eliminates clock skew
DFS: Generates new clock frequencies
DPS: Phase shifts a clock signal
PLL
Provides a large range of multiply and divide-by values
Filters clock jitter
BUFGCTRL
32 global clock buffers
Provides glitch-free switching among clocks
Drives differential global clock trees
Any 10 of the global clocks can access any clock region
BUFIO
4 BUFIOs per region
Drives I/O and BUFRs
BUFR
2 BUFRs per region
4 regional clock tracks
Show Slide 202:
IOSERDES: Input/output 10-bit parallel/serial converters in the I/O tile
Can be used in parallel for up to 10 bits
DSP48
18x18 two's complement, optionally pipelined multiplier
Dynamic user-controlled operating modes (OPMODEs)
48-bit adder, subtractor, and accumulator options
Symmetric rounding support
Optionally registered inputs and outputs
Expandable for 25x18 and 35x25 (using 2 DSP48s) multiplier applications
C input is completely independent
A:B is expandable to 48 bits (suitable for SIMD applications)
A input cascade (efficient filter implementation)
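To make the multiplier and 48-bit accumulator options concrete, the following Python sketch models a DSP48-style multiply-accumulate (P = P + A * B) in which the accumulator wraps as a 48-bit two's complement value. It assumes 18-bit signed operands, as listed above, and deliberately ignores pipeline registers, OPMODE selection, rounding, and cascade paths; `mac` and `to_signed` are hypothetical helper names, not a Xilinx API.

```python
ACC_BITS = 48  # accumulator width listed above
IN_BITS = 18   # operand width assumed from the slide's 18x18 multiplier


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mac(samples, coeffs):
    """Functional model of P = P + A*B per cycle, wrapping at 48 bits (no pipelining)."""
    lo, hi = -(1 << (IN_BITS - 1)), (1 << (IN_BITS - 1)) - 1
    p = 0
    for a, b in zip(samples, coeffs):
        if not (lo <= a <= hi and lo <= b <= hi):
            raise ValueError("operand does not fit in 18-bit two's complement")
        p = to_signed(p + a * b, ACC_BITS)
    return p


print(mac([1, -2, 3], [4, 5, -6]))     # 4 - 10 - 18 = -24
print(to_signed(1 << 47, ACC_BITS))    # 2**47 wraps to -140737488355328
```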
Show Slide 203:
Block RAM
36-Kb block memory that can be segmented into two 18-Kb block memories or one 18-Kb block memory and one 18-Kb FIFO
Optional output register for performance up to 550 MHz
Cascade mode for 64K x 1
FIFO16
Uses block RAM for storage
Dedicated flag and status logic
Two modes (multirate and synchronous)
LXT
MGTs, PCI Express integrated Endpoint block, and tri-mode EMAC block features
SXT
Same as LXT, but with more block RAM and DSP resources
FXT
Same as LXT, plus GTX transceivers (instead of MGTs) and the PowerPC 440 processor
Show Slide 204:
Name one technique for building synchronization circuits that can provide maximum recovery time
Use an extra flip-flop (two-bit shift register) to provide maximum recovery time to the first flip-flop
Describe a few of the synthesis options that can boost performance
Timing-driven optimization, retiming, register replication, Finite State Machine (FSM) extraction, timing constraints entry, hierarchy management, and physical optimization
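The extra-flip-flop synchronizer from the first answer above can be sketched as a cycle-level Python model. Python cannot model metastability, so the sketch shows only the structure: the first flip-flop samples the asynchronous input, and the second flip-flop gives it a full clock period to settle before the value is used, at the cost of one extra clock edge of latency. The class and method names are hypothetical.

```python
class TwoFlopSynchronizer:
    """Cycle-level model of two flip-flops in series on the destination clock."""

    def __init__(self, reset_value: int = 0):
        self.ff1 = reset_value  # first stage: the one that may go metastable in hardware
        self.ff2 = reset_value  # second stage: samples ff1 one full period later

    def clock(self, async_in: int) -> int:
        # On each destination clock edge both registers update at once:
        # ff2 takes ff1's previous value and ff1 samples the asynchronous input.
        self.ff2, self.ff1 = self.ff1, async_in
        return self.ff2


sync = TwoFlopSynchronizer()
inputs = [0, 1, 1, 1, 0, 0, 0]
outputs = [sync.clock(x) for x in inputs]
print(outputs)  # [0, 0, 1, 1, 1, 0, 0]
```

Each output value is the input sampled on the previous edge, which is the latency the extra flip-flop trades for recovery time.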
Show Slide 205:
Describe the synthesis approach that helps obtain higher performance
1. Utilize embedded (dedicated) resources
Performance by construction (use all the dedicated hardware you can)
DSP48, PowerPC processor, EMAC, MGT, FIFO, block RAM, ISERDES, and OSERDES, for example
2. Write code for performance
Use synchronous design methodology
Ensure the code is written optimally for critical paths
Pipeline (not as necessary with the Virtex-5 FPGA)
3. Drive your synthesis and Place & Route tools
Try different optimization techniques
Add critical timing constraints in synthesis
Preserve hierarchy
Apply full and correct constraints
Utilize optional settings
Use High effort
Show Slide 206:
How can CORE Generator software system cores improve performance?
These cores are pre-optimized for the Xilinx architecture
Show Slide 207:
Day One Summary
A flow for achieving timing closure was presented
The Virtex-5 FPGA architecture has many dedicated resources that can improve performance and lower power
The DCM and PLL have many features that can increase design performance
There are many clock features available for high-speed design
You can increase design performance by duplicating flip-flops, pipelining, and using I/O flip-flops
Synthesis tools have many different options to improve synthesis results
CORE Generator software system cores can be used to take full advantage of the Xilinx FPGA architecture
Transition to Course Agenda Day Two
Purpose
This module covers the day two agenda for the course.
Time
5 minutes
Process
This module covers the day two agenda for the course.
Lessons
Show Slide 208:

Show Slide 209:
Day One Objectives

Yesterday you learned how to:

Describe the features of the Digital Clock Manager (DCM) and Phase-Locked Loop (PLL) and how they can be used to improve performance
Create and integrate cores into your design flow by using the CORE Generator software system
Run behavioral simulation on an FPGA design that contains cores
Show Slide 210:
Day Two Objectives

After completing this course, you will be able to:

Apply advanced timing constraints to meet your performance goals
Use advanced implementation options to increase design performance
Show Slide 211:
Day One Agenda

Show Slide 212:
Day Two Agenda

Power Estimation (Optional)
Lab 7: FPGA Editor Demo (Optional)
ChipScope Pro Software (includes lab) (Optional)
Course Summary
Key Points
The ChipScope Pro Software lab is not available for classes that use Toolwire for labs.
Transition to Achieving Timing Closure
Purpose
Interpret a timing report and determine the cause of timing errors
Apply Timing Analyzer report options to create customized timing reports
Time
45 minutes
Process
This module describes how to read the Timing Analyzer reports and use the information to gain timing closure.
Lessons
Introduction
Timing Reports
Report Options
Summary
Introduction
Show Slide 213:
Show Slide 214:
Objectives
Interpret a timing report and determine the cause of timing errors

Apply Timing Analyzer report options to create customized timing reports
Show Slide 215:
Timing Closure
Timing Reports
Show Slide 216:
Lessons
Timing Reports
Report Options
Summary
Show Slide 217:
Timing Reports
Timing reports enable you to determine how and why constraints were not met
The Project Navigator can create timing reports at two points in the design flow
Post-Map Static Timing Report
Post-Place & Route Static Timing Report
Reports contain detailed descriptions of paths that fail their constraints
The Timing Analyzer is a utility for creating and reading timing reports
Key Points
After implementing a design, use timing reports to determine overall design performance.
You should review the details for each failed constraint to determine why the design does not meet performance objectives.
Show Slide 218:
Using the Timing Analyzer
Double-click Analyze Post-Place & Route Static Timing
Opens the Post-Place & Route Static Timing Report
Allows you to create custom reports
Open a plain text version by clicking Static Timing Report in the Design Summary screen
Key Points
Although the plain text timing report contains the same information as the Timing Analyzer version, it does not contain hyperlinks to other tools.
TRAINER NOTE
Demo Instructions:
1. Launch the Project Navigator and open the Timing Closure lab project.
2. Expand the Implement, Place & Route, and Generate Post-Place & Route Static Timing processes.
3. Double-click Analyze Post-Place & Route Static Timing.
Show Slide 219:
Timing Analyzer GUI
Hierarchical browser
Timing objects window
Timing tab
Quickly navigate to specific report sections
Summarizes the path displayed in the path detail window
Report text
Links to the Timing Improvement Wizard and interactive data sheet
Logic highlighted in blue can be cross-probed
Key Points
Clicking a link in the Delay Type column opens the interactive data sheet on the Web. Customized for your target device and speed grade, this data sheet includes timing model drawings that clearly define each incremental delay in the timing path.
TRAINER NOTE
Demo Instructions:
1. Click the Timing Improvement Wizard link to show the pop-up dialog box.
2. Click a Tilo delay to open the interactive data sheet.
Show Slide 220:
Cross-Probing
Shows the placement of logic in a delay path
Floorplan-implemented view for seeing the actual placement and routing used
Technology view shows logical path through components
Key Points
This enables you to quickly view the placement of logic in critical paths.
Show Slide 221:
Timing Report Structure
Timing constraints
Number of paths covered and number of paths that failed for each constraint
Detailed descriptions of the longest paths
Data sheet report
Setup, hold, and clock-to-out times for each I/O pin
Timing summary
Number of errors (number of failing paths)
Timing score (total number of picoseconds by which all constraints were missed)
Timing report description
Allows you to easily duplicate the report
Key Points
Timing reports also contain headers with information such as design name, device targeted, and software version.
The timing score is a key indicator of overall design performance. The timing score represents the total number of picoseconds by which the design fails to meet constraints. A design that meets all constraints has a timing score of 0.
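A timing score in the sense described above can be sketched in Python as the sum, in picoseconds, of the amount by which each failing path misses its requirement. The input format (requirement and delay pairs in nanoseconds) and the function name are assumptions, and the tools' exact bookkeeping may differ. The example reuses the three path delays from Cases 1 to 3 later in this module, as if they came from one report, plus a hypothetical passing path.

```python
def timing_score_ps(paths):
    """Sum, in picoseconds, the amount by which each failing path misses its requirement.

    paths: iterable of (requirement_ns, delay_ns) tuples (assumed input format).
    Passing paths (zero or positive slack) contribute nothing.
    """
    score = 0
    for requirement_ns, delay_ns in paths:
        slack_ps = round((requirement_ns - delay_ns) * 1000)  # negative slack = failure
        if slack_ps < 0:
            score += -slack_ps
    return score


# Cases 1 to 3 (each constrained to 3 ns) plus a hypothetical 5 ns path that passes
paths = [(3.0, 3.072), (3.0, 3.872), (3.0, 3.205), (5.0, 4.2)]
print(timing_score_ps(paths))  # 72 + 872 + 205 = 1149
```

A design in which every path meets its requirement returns 0, matching the statement above.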
Show Slide 222:
Report Example
Constraint summary
Total delay
Number of paths covered
Number of timing errors
Length of critical path
Clock and data breakdown
Detailed path description
Delay types are described in the data sheet
Worst-case conditions are assumed, unless prorated
Key Points
The Timing Constraint Report lists each constraint as well as the longest delay paths for each constraint. The report also breaks down the delay paths into incremental delays.
Use the detailed path description to locate the logic in the design that is causing the path to fail. If you do not label nets or choose descriptive instance names, or if your synthesis tool has created default net names, analyzing this report may be difficult.
The tools account for clock distribution delay on input and output paths as well as clock skew on internal flip-flop-to-flip-flop paths.
All delays reported are for worst-case temperature and voltage. You can prorate delays by specifying the worst-case temperature and voltage that you expect your device to encounter. Prorating will be discussed later in this module and also in the Path-Specific Timing Constraints module.
Key Points
In the far right column, the instance name in black text is the physical resource associated with each delay. Use this instance name to locate the logic in the FPGA Editor. The blue names are logical resources, which can be used to locate the logic in the floorplanner or in the RTL viewer of your synthesis tool.
Logic and routing breakdown can be useful during post-MAP timing analysis to determine whether the constraints are reasonable. This will be discussed further in the next lesson.
Show Slide 223:
Timing Improvement Wizard
Makes intelligent design suggestions
When a path fails to meet a timing constraint, the Timing Analyzer shows the Wizard icon
The Wizard asks questions and provides useful suggestions
Answers range from design change guidance to implementation tool options
Show Slide 224:
Lessons
Timing Reports
Report Options
Summary
Show Slide 225:
Estimating Design Performance
Performance estimates are available before implementation is complete
Synthesis Report
Logic delays are accurate
Routing delays are estimated based on fanout
Reported performance is generally accurate to within 20 percent
Post-Map Static Timing Report
Logic delays are accurate
Routing delays are listed as 0 ns*
Use the 60/40 rule to obtain a more realistic performance estimate
Key Points
The Synthesis Report is the first place where performance estimates are given. The estimate is not very accurate this early in the implementation process, but it can be an indicator of whether synthesis results are good enough to proceed to the next step.
The Post-Map Static Timing Report is useful because it is based on the Xilinx timing constraints, and this report shows detailed descriptions of the longest paths covered by each constraint. The routing delays are not accurate, but performance can be estimated by using the logic delays and the 60/40 rule (covered next).
* If MAP is run with the timing-driven packing option, routing delays will be estimated based on logic placement and fanout.
Show Slide 226:
60/40 Rule
A rule of thumb to determine whether timing constraints are reasonable
Open the Post-Map Static Timing Report
Look at the percentage of the timing constraint that is used up by logic delays
Under 60 percent: Good chance that the design will meet timing
60 to 80 percent: Design may meet timing if advanced options are used
Over 80 percent: Design will probably not meet timing (go back to improve synthesis results)
Key Points
The 60/40 rule is a long-standing rule of thumb used by Xilinx designers. It states that logic delays should not exceed 60 percent of the timing budget.
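The 60/40 thresholds above can be written as a small Python helper that classifies a post-MAP path by the share of its constraint consumed by logic delay. A second helper turns the rule around: if logic should be about 60 percent of the path, the achievable period is roughly the logic delay divided by 0.6. How the boundary values (exactly 60 or 80 percent) are assigned is an assumption, and the function names are hypothetical; like the rule itself, this is only a rule of thumb.

```python
def sixty_forty(logic_delay_ns: float, constraint_ns: float):
    """Classify a path with the 60/40 rule of thumb.

    Returns (percent of the constraint used by logic, verdict).
    """
    if constraint_ns <= 0:
        raise ValueError("constraint must be positive")
    pct = 100.0 * logic_delay_ns / constraint_ns
    if pct < 60:
        verdict = "good chance of meeting timing"
    elif pct <= 80:  # boundary handling is an assumption
        verdict = "may meet timing with advanced options"
    else:
        verdict = "probably will not meet timing; improve synthesis results"
    return round(pct, 1), verdict


def period_estimate_ns(logic_delay_ns: float) -> float:
    """Rough achievable period, assuming routing adds about 40 percent (logic = 60 percent)."""
    return logic_delay_ns / 0.6


print(sixty_forty(0.869, 3.0))               # (29.0, 'good chance of meeting timing')
print(sixty_forty(2.1, 3.0))                 # (70.0, 'may meet timing with advanced options')
print(round(period_estimate_ns(0.869), 3))   # 1.448
```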
Show Slide 227:
Analyzing Post-Place & Route Timing
There are many factors that contribute to timing errors, including:
Neglecting synchronous design rules or using incorrect HDL coding style
Poor synthesis results (too many logic levels in the path)
Inaccurate or incomplete timing constraints
Poor logic mapping or placement
Each root cause has a different solution:
Rewrite HDL code
Add path-specific timing constraints
Resynthesize or reimplement with different software options
Correct interpretation of timing reports can reveal the most likely cause and, therefore, the most likely solution
Key Points
The next several slides show examples of timing errors and how to identify the root cause of the failure.
Show Slide 228:
Case 1
Data Path: source to dest
Delay type          Delay(ns)  Logical Resource(s)
------------------  ---------  -------------------
Tcko                    0.272  source
net (fanout=7)          0.325  net_1
Tilo                    0.146  lut_1
net (fanout=1)          1.500  net_2
Tilo                    0.146  lut_2
net (fanout=1)          0.174  net_3
Tilo                    0.146  lut_3
net (fanout=1)          0.204  net_4
Tas                     0.159  dest
------------------  ---------  -------------------
Total                 3.072ns  (0.869ns logic, 2.203ns route)
                               (28.3% logic, 71.7% route)
This path is constrained to 3 ns
What is the primary cause of the timing failure?
Key Points
Tas is the setup time of a slice flip-flop relative to the LUT inputs to the slice (that is, this delay includes a Tilo delay).
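The logic/route summary at the bottom of the report can be recomputed from the incremental delays. This Python sketch assumes that every row whose delay type starts with "net" is routing and everything else (Tcko, Tilo, Tas) is logic, which matches the split shown for this path; the function name is hypothetical.

```python
def breakdown(rows):
    """Split a detailed path into logic and routing delay, like the report's summary line.

    rows: list of (delay_type, delay_ns). Rows whose type starts with "net" are
    treated as routing; everything else (Tcko, Tilo, Tas, ...) is treated as logic.
    Returns (total_ns, logic_ns, route_ns, logic_pct, route_pct).
    """
    logic = sum(d for t, d in rows if not t.startswith("net"))
    route = sum(d for t, d in rows if t.startswith("net"))
    total = logic + route
    return (round(total, 3), round(logic, 3), round(route, 3),
            round(100 * logic / total, 1), round(100 * route / total, 1))


# The Case 1 path from the table above
case1 = [("Tcko", 0.272), ("net (fanout=7)", 0.325), ("Tilo", 0.146),
         ("net (fanout=1)", 1.500), ("Tilo", 0.146), ("net (fanout=1)", 0.174),
         ("Tilo", 0.146), ("net (fanout=1)", 0.204), ("Tas", 0.159)]
print(breakdown(case1))  # (3.072, 0.869, 2.203, 28.3, 71.7)
```

Comparing the total with the 3 ns constraint gives the slack: 3.000 - 3.072 = -0.072 ns, so this path fails by 72 ps.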
Show Slide 229:
Case 1 Answer
Delay type          Delay(ns)  Logical Resource(s)
------------------  ---------  -------------------
Tcko                    0.272  source
net (fanout=7)          0.325  net_1
Tilo                    0.146  lut_1
net (fanout=1)          1.500  net_2
Tilo                    0.146  lut_2
net (fanout=1)          0.174  net_3
Tilo                    0.146  lut_3
net (fanout=1)          0.204  net_4
Tas                     0.159  dest
------------------  ---------  -------------------
Total                 3.072ns  (0.869ns logic, 2.203ns route)
                               (28.3% logic, 71.7% route)
The net_2 signal has a long delay and low fanout
Most likely cause is poor placement
Show Slide 230:
Poor Placement: Solutions
Increase placement effort level (or overall effort level)
Timing-driven packing, if the poor placement is caused by packing unrelated logic together
Cross-probe to the floorplanner to see what has been packed together
This option is covered in the Advanced Implementation Options module
PAR extra effort or Xplorer
Covered in the Advanced Implementation Options module
Area constraints with the PlanAhead tool or PACE
Covered in the Designing with the PlanAhead Analysis and Design Tool course
Show Slide 231:
Case 2
Delay type          Delay(ns)  Logical Resource(s)
------------------  ---------  -------------------
Tcko                    0.272  source
net (fanout=7)          0.125  net_1
Tilo                    0.146  lut_1
net (fanout=187)        2.500  net_2
Tilo                    0.146  lut_2
net (fanout=1)          0.174  net_3
Tilo                    0.146  lut_3
net (fanout=1)          0.204  net_4
Tas                     0.159  dest
------------------  ---------  -------------------
Total                 3.872ns  (0.869ns logic, 3.003ns route)
                               (22.4% logic, 77.6% route)
This path is also constrained to 3 ns
Show Slide 232:
Case 2 Answer
Delay type          Delay(ns)  Logical Resource(s)
------------------  ---------  -------------------
Tcko                    0.272  source
net (fanout=7)          0.125  net_1
Tilo                    0.146  lut_1
net (fanout=187)        2.500  net_2
Tilo                    0.146  lut_2
net (fanout=1)          0.174  net_3
Tilo                    0.146  lut_3
net (fanout=1)          0.204  net_4
Tas                     0.159  dest
------------------  ---------  -------------------
Total                 3.872ns  (0.869ns logic, 3.003ns route)
                               (22.4% logic, 77.6% route)
The signal net_2 has a long delay, but the fanout is not low
Most likely cause is high fanout
Show Slide 233:
High Fanout: Solutions
Most likely solution is to duplicate the source of the high-fanout net
If the net is the output of a flip-flop, the solution is to duplicate the flip-flop
Use manual duplication (recommended) or synthesis options
If the net is driven by combinatorial logic, locating the source of the net in the HDL code may be more difficult
Use synthesis options to duplicate the source
Key Points
For more information about duplicating flip-flops, see the FPGA Design Techniques module.
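One way to reason about how many copies of a high-fanout flip-flop to create is to divide its loads so that no copy exceeds a chosen maximum fanout. The Python sketch below does this for the 187 loads on net_2 in Case 2. The limit of 50 loads per copy and the load names are hypothetical, not Xilinx recommendations, and the round-robin assignment ignores physical location, which a real duplication scheme should take into account.

```python
import math


def split_loads(loads, max_fanout):
    """Distribute loads across the fewest register copies with fanout <= max_fanout.

    Loads are dealt out round-robin, so the copies' fanouts differ by at most one.
    (A real design would rather group loads that sit physically close together.)
    """
    if max_fanout < 1:
        raise ValueError("max_fanout must be at least 1")
    copies = max(1, math.ceil(len(loads) / max_fanout))
    groups = [[] for _ in range(copies)]
    for i, load in enumerate(loads):
        groups[i % copies].append(load)
    return groups


loads = [f"load_{i}" for i in range(187)]     # hypothetical load names for net_2
groups = split_loads(loads, max_fanout=50)    # hypothetical fanout limit
print(len(groups), [len(g) for g in groups])  # 4 [47, 47, 47, 46]
```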
Show Slide 234:
Case 3
Delay type          Delay(ns)  Logical Resource(s)
------------------  ---------  -------------------
Tcko                    0.272  source
net (fanout=7)          0.521  net_1
Tilo                    0.146  lut_1
net (fanout=1)          0.180  net_2
Tilo                    0.146  lut_2
net (fanout=1)          0.223  net_3
Tilo                    0.146  lut_3
net (fanout=1)          0.123  net_4
Tilo                    0.146  lut_4
net (fanout=1)          0.310  net_5
Tilo                    0.146  lut_5
net (fanout=1)          0.233  net_6
Tilo                    0.146  lut_6
net (fanout=1)          0.308  net_7
Tas                     0.159  dest
------------------  ---------  -------------------
Total                 3.205ns  (1.307ns logic, 1.898ns route)
                               (40.8% logic, 59.2% route)
This path is also constrained to 3 ns
Show Slide 235:
Case 3 Answer
Delay type          Delay(ns)  Logical Resource(s)
------------------  ---------  -------------------
Tcko                    0.272  source
net (fanout=7)          0.521  net_1
Tilo                    0.146  lut_1
net (fanout=1)          0.180  net_2
Tilo                    0.146  lut_2
net (fanout=1)          0.223  net_3
Tilo                    0.146  lut_3
net (fanout=1)          0.123  net_4
Tilo                    0.146  lut_4
net (fanout=1)          0.310  net_5
Tilo                    0.146  lut_5
net (fanout=1)          0.233  net_6
Tilo                    0.146  lut_6
net (fanout=1)          0.308  net_7
Tas                     0.159  dest
------------------  ---------  -------------------
Total                 3.205ns  (1.307ns logic, 1.898ns route)
                               (40.8% logic, 59.2% route)
No single delay is particularly long, but there are many logic levels (7)
Key Points
The seven logic levels in this path include the six Tilo delays plus the Tas delay (setup time going through a LUT).
Show Slide 236:
Too Many Logic Levels: Solutions
The implementation tools cannot do much to improve performance
The netlist must be altered to reduce the amount of logic between flip-flops
Possible solutions
Check whether the path is a multicycle path
If yes, add a multicycle path constraint
Use the retiming option during synthesis to distribute logic more evenly among flip-flops
Confirm that good coding techniques were used to build this logic (no nested if or case statements)
Add a pipeline stage
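The reasoning in Cases 1 to 3 can be condensed into a rough triage helper. The Python sketch below looks for a single net that dominates the path delay and checks its fanout; otherwise it counts logic levels (Tilo plus Tas, as in Case 3). The thresholds (a net taking more than 30 percent of the path, a fanout of 20 or more, more than 5 logic levels) are hypothetical values chosen so that the three examples classify as they do on the slides; they are not tool defaults or Xilinx guidance.

```python
import re

FANOUT_RE = re.compile(r"net \(fanout=(\d+)\)")


def triage(rows, long_net_fraction=0.3, high_fanout=20, max_levels=5):
    """Guess the most likely cause of a failing path from its detailed delays.

    rows: list of (delay_type, delay_ns) pairs in report order.
    All three thresholds are illustrative assumptions, not tool defaults.
    """
    total = sum(delay for _, delay in rows)
    nets = []
    for delay_type, delay in rows:
        match = FANOUT_RE.fullmatch(delay_type)
        if match:
            nets.append((int(match.group(1)), delay))
    # Logic levels: LUT delays (Tilo) plus the setup through a LUT (Tas), as in Case 3.
    levels = sum(1 for delay_type, _ in rows if delay_type in ("Tilo", "Tas"))

    if nets:
        fanout, worst_net = max(nets, key=lambda net: net[1])
        if worst_net > long_net_fraction * total:
            # One net dominates: Case 1 (low fanout) or Case 2 (high fanout).
            if fanout >= high_fanout:
                return "high fanout: duplicate the source of the net"
            return "poor placement: raise effort, timing-driven packing, or area constraints"
    if levels > max_levels:
        # No dominant net, but many LUTs in series: Case 3.
        return "too many logic levels: multicycle constraint, retiming, recoding, or pipelining"
    return "no single obvious cause"


case1 = [("Tcko", 0.272), ("net (fanout=7)", 0.325), ("Tilo", 0.146),
         ("net (fanout=1)", 1.500), ("Tilo", 0.146), ("net (fanout=1)", 0.174),
         ("Tilo", 0.146), ("net (fanout=1)", 0.204), ("Tas", 0.159)]
case2 = [("Tcko", 0.272), ("net (fanout=7)", 0.125), ("Tilo", 0.146),
         ("net (fanout=187)", 2.500), ("Tilo", 0.146), ("net (fanout=1)", 0.174),
         ("Tilo", 0.146), ("net (fanout=1)", 0.204), ("Tas", 0.159)]
case3 = [("Tcko", 0.272), ("net (fanout=7)", 0.521)]
for net_delay in (0.180, 0.223, 0.123, 0.310, 0.233, 0.308):
    case3 += [("Tilo", 0.146), ("net (fanout=1)", net_delay)]
case3.append(("Tas", 0.159))

for name, rows in (("Case 1", case1), ("Case 2", case2), ("Case 3", case3)):
    print(name, "->", triage(rows))
```

As the slides stress, a report only suggests the most likely cause; confirm it by cross-probing before changing the design.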
Report Options
Show Slide 237:
Lessons
Timing Reports
Report Options
Summary
Show Slide 238:
Types of Timing Reports
Analyze Against Timing Constraints
Compares design performance with timing constraints
Most commonly used report format
Used for Post-Map and Post-Place & Route Static Timing Reports if the design contains constraints
Analyze Against Auto-Generated Design Constraints
Determines the longest paths in each clock domain
Use with designs that have no constraints defined
Used for Post-Map and Post-Place & Route Static Timing Reports if the design contains no constraints
Key Points
Choosing which report to create depends on whether you used timing constraints.
The Analyze Against Timing Constraints Report is the most useful report if your design contains timing constraints. This report provides you with information on each of your constraints.
The Analyze Against Auto-Generated Design Constraints Report is used only with designs that do not contain timing constraints.
Show Slide 239:
Types of Timing Reports
Analyze Against User-Specified Paths by Defining Endpoints
Custom report for selecting sources and destinations
Analyze Against User-Specified Paths by Defining Clock and I/O Timing
Allows you to define PERIOD and OFFSET constraints on the fly
Use with designs that have no constraints defined
Key Points
The Analyze Against User-Specified Paths by Defining Endpoints Report allows you to create custom reports that focus on specific paths in the design.
The Analyze Against User-Specified Paths by Defining Clock and I/O Timing Report is used only with designs that do not contain timing constraints.
Key Points
Clicking the icons in the toolbar will create a report using the currently defined options. To access the report options shown next, you must select a report type from the Analyze menu.
Show Slide 240:
Timing Constraints Tab
After selecting a Timing Analyzer report, you can select from various report options
Report failing paths: Lists only the paths that fail to meet your specified timing constraints
Report unconstrained paths: Allows you to list some or all of the unconstrained paths in your design
You can also select which constraints you want reported
Key Points
Selecting a report from the Analyze menu displays an options dialog box.
Select the Report paths option to create reports after MAP but before Place & Route. This format has detailed path information on the longest paths for each constraint, even if they are not timing errors (default format for the Post-Map Static Timing Report).
Select the Report failing paths option to create reports after Place & Route. This format has detailed path information on just the paths that fail to meet timing (default format for the Post-Place & Route Static Timing Report).
Reporting unconstrained paths is useful when you are not certain which paths were covered by your timing constraints.
Key Points
You can select which constraints you want to apply to the design during report creation. If a constraint is not selected, the tools will act as if the constraint did not exist. For example, if you disable a multicycle path constraint, those paths will be analyzed and reported under the global PERIOD constraint (probably as timing errors).
TRAINER NOTE
Demo Instructions:
Viewing the report options:
Select Analyze Against Timing Constraints.
Note: There may not be timing constraints for this design. If there are timing constraints, they are listed in the Timing Constraints tab, as in the figure above.
Show Slide 241:
Options Tab
Speed grade
Constraint details
Specify the number of detailed paths reported per constraint
Report details hold violations
Timing report contents
Generate new timing information without reimplementing
Include or exclude report sections
Prorating
Specify your own worst-case environment
Key Points
The Speed Grade option lets you easily determine whether moving to a faster or slower speed-grade device will meet your timing needs.
Select the Report fastest paths/verbose hold paths option if you have clock signals on non-global routing resources. (Skew analysis is automatically performed on all global clocks.)
Prorating values are best entered in the Constraints Editor because these prorated delays will be used during Place & Route. Prorating in the Timing Analyzer only updates the generated timing report for the new environmental conditions that are specified.
If prorating gets you close to timing closure, try entering the prorated values in the Constraints Editor and reimplementing.
Prorating is not always available for the newest device families.
Show Slide 242:
Filter Paths by Net Tab
Restrict which paths are reported by selecting specific nets
Each net is assigned to be included by default
Net filter values
Exclude paths containing this net
Include Only paths containing this net
Default
Key Points
Filtering is a method for reducing the number of reported paths by including or excluding paths that contain a particular net.
Key Points
If you specify some nets as Exclude, then all nets marked Default will be included. If you specify some nets as Include Only, then all nets marked Default will be excluded.
If you have disabled some constraints on the Timing Constraints tab, this tab can be used to filter out false paths from the report. For example, if you disabled a multicycle path constraint, you can Exclude the associated clock enable net to prevent those paths from being analyzed against the global PERIOD constraint.
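The Exclude and Include Only rules above can be modeled in a few lines of Python. This sketch assumes each path is a list of net names and each net's filter value is "Exclude", "Include Only", or Default (any net not listed). How the two values combine when both are used is an assumption, and the net names, including the clock enable `cnt_en`, are hypothetical. It illustrates the stated rules, not the Timing Analyzer's implementation.

```python
def filter_paths(paths, net_filter):
    """Return the paths that would be reported under the given net filter settings.

    paths: list of paths, each a list of net names.
    net_filter: dict mapping net name -> "Exclude" or "Include Only";
    nets not in the dict are treated as Default.
    """
    excluded = {net for net, value in net_filter.items() if value == "Exclude"}
    include_only = {net for net, value in net_filter.items() if value == "Include Only"}
    reported = []
    for path in paths:
        nets = set(path)
        if nets & excluded:
            continue  # a path through an Exclude net is not reported
        if include_only and not (nets & include_only):
            continue  # once any net is Include Only, paths with only Default nets are dropped
        reported.append(path)
    return reported


# Hypothetical paths; cnt_en stands in for the clock enable of a multicycle path
paths = [["cnt_en", "n1"], ["n2", "n3"], ["data_q", "n4"]]
print(filter_paths(paths, {"cnt_en": "Exclude"}))        # [['n2', 'n3'], ['data_q', 'n4']]
print(filter_paths(paths, {"data_q": "Include Only"}))   # [['data_q', 'n4']]
```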
Show Slide 243:
Path Tracing Tab
Restrict which paths are reported by selecting path endpoints or path types
Key Points
This allows you to reduce the size of a report by specifying the path endpoints to be reported.
You can also enable or disable analysis on specific types of paths.
The Asynchronous Set/Reset to Output and Recovery option enables path tracing through CLB flip-flop asynchronous set or reset inputs to the Q output.
Summary
Show Slide 244:
Lessons
Timing Reports
Report Options
Summary
Show Slide 245:
1) To which resources is the timing report linked?
2) List the possible causes of timing errors
Show Slide 246:
Summary
Timing reports enable you to determine how and why constraints were not met
Use the Synthesis Report and Post-Map Static Timing Report to estimate performance before running Place & Route
The detailed path description offers clues to the cause of timing failures
Cross-probe to see the placement and a technology view of a timing path
The Timing Analyzer can generate various types of reports for specific circumstances

Timing Analyzer Overview/Online Help
Help > Help Topics
Answers
1) To which resources is the timing report linked?
Timing Improvement Wizard
Interactive data sheet on the Web
Floorplanner-implemented view for cross-probing
Technology view for cross-probing
2) List the possible causes of timing errors.
Neglecting synchronous design rules or using incorrect HDL coding style
Poor synthesis results (too many levels of logic)
Inaccurate or incomplete path-specific timing constraints
Poor logic mapping or placement
Transition to Lab 4: Review of Global Timing Constraints
Lab 4: Review of Global Timing Constraints
Purpose
Enter global timing constraints in the Constraints Editor
Read reports to determine whether constraints were met
Analyze the failing paths in the timing report to determine the cause
Describe possible solutions to the failing paths
Time
45 minutes
Process
This lab illustrates how to use global timing constraints and the
Timing Analyzer to find the timing-critical paths of a design and
develop a strategy for gaining timing closure.
General Flow
Step 1: Enter global timing constraints
Step 2: Implement the design and analyze the timing
Step 3: Implement the design and analyze the timing with Offset In and Offset Out constraints
Lab
Refer to the separate lab workbook for the Review of Global Timing Constraints lab.
Transition to Timing Groups and OFFSET Constraints
Timing Groups and OFFSET Constraints
Purpose
Use the Constraints Editor to create groups of path endpoints
Use the Constraints Editor to create path-specific OFFSET constraints
Time
45 minutes
Process
This module describes the best ways to group path endpoints to make the most efficient path-specific timing constraints.
Lessons
Introduction
Overview
Creating Groups
OFFSET Constraints
Summary
Introduction
Show Slide 247:
Timing Groups and OFFSET Constraints
Show Slide 248:
Objectives
Use the Constraints Editor to create groups of path endpoints
Use the Constraints Editor to create path-specific OFFSET constraints
Overview
Show Slide 249:
Lessons
Overview
Creating Groups
OFFSET Constraints
Summary
Show Slide 250:
Path-Specific Timing Constraints
Using global timing constraints (PERIOD, OFFSET, and PAD-TO-PAD) will constrain your entire design
Using only global constraints often leads to over-constrained designs
Constraints are too tight
Increases compile time and can prevent timing objectives from being met
Review performance estimates provided by your synthesis tool or the Post-Map Static Timing Report
Path-specific constraints override the global constraints on specified paths
This allows you to loosen the timing requirements on specific paths
Key Points
The key to effective constraining is applying only the constraints that are required to communicate your performance objectives. If you specify requirements that are tighter than your design actually needs, your compile time will increase, and you may have difficulty getting your design to complete the Place & Route phase of implementation.
Path-specific constraints provide an accurate method of communicating design performance objectives. Global constraints are very powerful and can constrain every delay path in your design. Path-specific constraints allow you to define critical timing paths that require further optimization, multicycle paths that do not need to be constrained as tightly, and false paths that do not need to be constrained at all. Path-specific timing constraints give the implementation tools the greatest flexibility to meet your system timing objectives and are a critical part of high-performance design.
Show Slide 251:
More About Path-Specific Timing Constraints
Areas of your design that can benefit from path-specific constraints
Multicycle paths
Paths that cross between clock domains
Bidirectional buses
I/O timing
Path-specific timing constraints should be used to define your performance objectives and should not be placed indiscriminately
Key Points
Implementing path-specific constraints on designs that contain multicycle paths or bidirectional buses is very important. Constraints placed on these designs often loosen or remove a large number of constrained paths, which gives the implementation tools a great deal of flexibility in meeting your system timing objectives.
Show Slide 252:
Global Constraints Review
Using the global PERIOD, OFFSET IN, and OFFSET OUT constraints constrains all of these paths
This makes it easy to control the overall performance of your design
[Figure: inputs ADATA, CDATA, and BUS [7..0]; flip-flops FLOP1 through FLOP5 clocked by CLK through a BUFG; outputs OUT1 and OUT2]
Key Points
In this example, three global constraints cover most of the paths in the design. Because most of the delay paths are covered, controlling design performance by adjusting the constraints is easy.
The caveat to using global constraints is that they constrain many delay paths to the same timing requirement and do not allow you to constrain specific paths to a separate delay.
Show Slide 253:
Path-Specific Constraint Example
A path-specific constraint can optimize as little as one path
This provides you greater control over the performance of your design and allows the implementation tools the greatest flexibility in meeting your performance and utilization needs
[Figure: the same circuit as the previous slide (ADATA, CDATA, BUS [7..0], FLOP1 through FLOP5, CLK, BUFG, OUT1, OUT2)]
Key Points
While global constraints are powerful because of their wide scope, path-specific constraints are also powerful because of their precision. By loosening or tightening the constraints on specific paths, you provide the implementation tools more flexibility and a greater chance of meeting all of your timing goals.
Show Slide 254:
The Constraints Editor
Creating path-specific constraints requires two steps
Step 1: Create groups of path endpoints
Step 2: Communicate the timing objective between the groups
Key Points
Creating path-specific timing constraints is a two-step process: grouping path endpoints and defining the constraint length. Groups of path endpoints can contain flip-flops, RAMs, latches, or pads. The most commonly used path-specific timing constraints are Slow/Fast Path Exceptions and Multicycle Paths.
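The two steps can be pictured with a small Python model. This is only an illustration of the idea, not how the Xilinx tools work internally: the endpoint names, the `SLOW_REGS` group, and the 5 ns and 20 ns values are hypothetical, and in this sketch a matching FROM/TO requirement simply replaces the global PERIOD for that path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    source: str       # launching endpoint (flip-flop, RAM, latch, or pad)
    destination: str  # capturing endpoint


# Hypothetical global PERIOD requirement that covers every register-to-register path.
GLOBAL_PERIOD_NS = 5.0

# Step 1: create groups of path endpoints (hypothetical names).
groups = {
    "SLOW_REGS": {"cnt_q2", "cnt_q3"},
}

# Step 2: communicate the timing objective between the groups:
# (from_group, to_group, requirement in ns).
path_specific = [("SLOW_REGS", "SLOW_REGS", 20.0)]


def requirement_ns(path: Path) -> float:
    """Return the requirement that applies to one path in this model."""
    for from_group, to_group, value_ns in path_specific:
        if path.source in groups[from_group] and path.destination in groups[to_group]:
            return value_ns  # path-specific constraint overrides the global one
    return GLOBAL_PERIOD_NS  # every other path keeps the global requirement


print(requirement_ns(Path("cnt_q2", "cnt_q3")))  # 20.0
print(requirement_ns(Path("cnt_q0", "cnt_q1")))  # 5.0
```

Only the paths whose both endpoints fall in the selected groups are loosened; everything else stays under the global requirement.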
TRAINER NOTE
Demo Instructions:
Opening a project and launching the Constraints Editor:
1. Open the ISE software.
2. Select File > Open Project.
3. Browse to the Review lab.
4. Select tc_review_lab.npl and click Open.
5. In the Source window, select the correlate_and_accumulate.ucf file.
6. In the Process window, double-click Create Timing Constraints.
Creating Groups
Show Slide 255:
Lessons
Overview
Creating Groups
OFFSET Constraints
Summary
Show Slide 256:
Creating Groups of Endpoints
Path-specific timing constraints will only be effective if path endpoints can be easily grouped together
Otherwise, constraining a large design would be time consuming and painstaking
The Constraints Editor makes this easy by allowing you to define groups of path endpoints (pads, flip-flops, latches, and RAMs)
Specific delay paths can then be constrained with advanced timing constraints
Key Points
Global constraints use predefined groups of path endpoints. Before you can create path-specific constraints, you must create your own groups of path endpoints.
Creating path-specific constraints requires grouping path endpoints and creating constraints between those groups. The main advantage of the Constraints Editor is that it allows large numbers of paths to be constrained by grouping only a few components. It also allows you to constrain one or several paths with a single constraint.
The challenge when creating path-specific timing constraints is grouping path endpoints. This is sometimes difficult because synthesis tools do not always preserve instance or net names.
Show Slide 257:
Creating Groups of Endpoints
With the Constraints Editor, grouping path endpoints is made easy with the following options
By Nets
By Instance Name
By Hierarchy
By Element Type
By Clock Edge
Through Points
By DCM Output
Key Points
Group elements associated by nets: Group all path endpoints that are driven by a specific net (such as a clock enable).
Group elements by instance name: Group path endpoints by name (wildcards are allowed).
Group elements by hierarchy: Group all path endpoints in a specific level of hierarchy.
Group elements by element type: Group synchronous endpoints (not pads) by output net name (wildcards are allowed). This method is mainly used by schematic designers, but it can also be used with HDL designs if you know the net names of interest.
Timing THRU Points: Group nets or 3-state buffers to be used as THRU points in path-specific constraints. Remember that 3-state buffers are not path endpoints.
Group elements by clock edge: Group synchronous elements that are clocked by the same edge of the same clock signal. This option is useful for designs that use DDR output flip-flops.
Group elements by DCM Output pins: Group together the output clock signals from one DCM component.
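The grouping options amount to different ways of selecting a set of endpoints. The Python sketch below models three of them on a hypothetical netlist; the instance names, the `ce_pre` enable net, and the use of `fnmatchcase` to stand in for the tools' wildcard matching are all assumptions, and the Xilinx wildcard rules may differ in detail.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class Element:
    edge: str                  # clock edge that triggers the element: "rising" or "falling"
    enable_net: Optional[str]  # control net driving the element, if any


# Hypothetical synthesized netlist: instance name -> element properties.
netlist = {
    "ctr/q0": Element("rising", None),
    "ctr/q1": Element("rising", None),
    "ctr/q2": Element("rising", "ce_pre"),
    "ctr/q3": Element("rising", "ce_pre"),
    "ddr/q_fall": Element("falling", None),
}


def group_by_instance_name(pattern: str) -> set:
    """By instance name: wildcard match on instance names."""
    return {name for name in netlist if fnmatchcase(name, pattern)}


def group_by_net(net: str) -> set:
    """By nets: every element driven by one control net (e.g. a clock enable)."""
    return {name for name, elem in netlist.items() if elem.enable_net == net}


def group_by_clock_edge(existing: set, edge: str) -> set:
    """By clock edge: subgroup of an existing group triggered on one edge."""
    return {name for name in existing if netlist[name].edge == edge}


print(sorted(group_by_net("ce_pre")))  # ['ctr/q2', 'ctr/q3']
```

Note that the clock-edge option starts from a group that already exists, which mirrors the Constraints Editor dialog asking for a previously defined group.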
Show Slide 258:
Grouping by Nets or Output Net Name
Step 1: Enter a group name
Step 2: Select the type of net to search for
Clock or enable net
Optional filter string
Matching nets appear in the Available list
Step 3: Select nets and click Add
Nets appear in the Time Name Targets window
Key Points
The Group elements associated by Nets option, shown in the figure above, is used to select a control signal that connects to a group of registers, latches, or RAMs. All components driven by the control signal become a group of path endpoints and can be given a reference name, such as Control_Registers.
Most synthesis tools have schematic viewers that allow you to easily determine the clock enable signal name generated by the synthesis tool. The Xilinx Floorplanner or the FPGA Editor can also be used. Most designers find that the net names chosen by their synthesis tool are recognizable.
The Group elements by output net name option (not shown, but the dialog box is similar) is commonly used with designs that use schematic design flows; however, it can also be useful if your synthesis tool maintains the names of nets connected to the outputs of your synchronous elements.
TRAINER NOTE
Demo Instructions:
Creating a group of path endpoints:
1. In the Constraints Editor, click Create next to Group elements associated by Nets (TNM_NET).
2. Enter MY_CLKEN_GRP in the Time Name field.
3. Select Enable Nets from the drop-down list.
4. Click Add All to move the nets into the Time Name Targets window.
5. Click OK to create the group.
Show Slide 259:
Grouping by Instance Name or Hierarchy
Steps are the same
Design element types are different
Instance name: FFs, pads, latches, RAMs, CPUs, HSIOs
Hierarchy: User levels, levels created by Xilinx
Key Points
The Grouping by instance name option requires you to know the name of each resource that is to be a part of a group. If you preserve hierarchy during synthesis, finding the logic that you want to group can be easier. Instance names can be found in your synthesis tool's Schematic Viewer or the Xilinx Floorplanner.
The Grouping by hierarchy option (not shown) is not commonly used. This option is best used in a schematic flow or when your design contains cores. Both of these types of designs will contain both user and Xilinx levels of hierarchy. You can apply constraints to all of the cores in your design by searching for Xilinx levels of hierarchy.
Show Slide 260:
Grouping by Clock Edge
Step 1: Enter a group name
Step 2: Select a previously defined group
Optional filter to help find the group
Step 3: Select a clock edge
Key Points
The Grouping by Clock Edge option is used to take an existing group of flip-flops and create a subgroup that is clocked on a specific clock edge.
Show Slide 261:
Grouping by DCM Outputs
Step 1: Enter a group name
Step 2: Select a DCM instance
Optional filter to help find the group
Step 3: Select outputs and click Add
Show Slide 262:
Timing THRU Points
Allows you to optimize paths through specific nets and 3-state buffers
In this example, a group of nets was named TEOUTS. A constraint can now reference this group so that only the delay paths through the TEOUTS nets will be optimized
[Figure: registers and the MYCTR block connected through the TEOUTS nets, labeled TPTHRU = TEOUTS]
Key Points
THRU points allow you to select particular paths through nets and 3-state buffers.
If you use HDL, you might have difficulty determining the net names in your design. You can use your synthesis tool's Schematic Viewer or the Xilinx Floorplanner to find net names.
Show Slide 263:
Timing THRU Points
Step 1: Enter a TPTHRU name
Step 2: Select nets or 3-state buffers
Optional filter string
Step 3: Select items and click Add
Key Points
Remember that THRU points allow you to identify nets and 3-state buffers so that particular paths that use those resources can be specifically constrained.
This option is most often used for identifying false paths, which can occur when bidirectional buses are a part of your design.
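A THRU point narrows a constraint to the paths that pass through particular nets. As a rough sketch (the path list and net names are hypothetical; only the TEOUTS name comes from the slide), selecting paths through a TPTHRU group is a set-intersection test on the nets each path uses:

```python
# Hypothetical delay paths, each listed with the nets it passes through.
paths = {
    "reg_a -> reg_c": ["a_q", "teout0", "bus0", "c_d"],
    "reg_b -> reg_c": ["b_q", "teout1", "bus0", "c_d"],
    "myctr -> reg_c": ["ctr_q", "c_d"],
}

# Hypothetical TPTHRU group of nets, like TEOUTS on the slide.
TEOUTS = {"teout0", "teout1"}


def paths_through(thru_nets: set) -> list:
    """Return the paths that use at least one net in the THRU group."""
    return sorted(name for name, nets in paths.items() if not thru_nets.isdisjoint(nets))


print(paths_through(TEOUTS))  # ['reg_a -> reg_c', 'reg_b -> reg_c']
```

The path from MYCTR shares its destination with the other two but does not use a TEOUTS net, so a THRU-based constraint leaves it alone.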
TRAINER NOTE
Demo Instructions:
Creating a group of nets:
1. In the Advanced tab, click the Create button next to Timing THRU Points (TPTHRU).
2. Enter MY_MID_PT in the TPTHRU Name field.
3. Select tensout<0> through tensout<6> to be in this group. You can use the Filter field to help find the nets. Enter tens* in the Filter field and click Find to help narrow the search.
4. Click OK to create the group.
Show Slide 264:
Managing Groups
Groups that you have defined are written into the UCF
INST <element_name> TNM = <group_name>; OR
NET <net_name> TNM_NET = <group_name>; OR
TIMEGRP <group_name> = <elements>;
To add items to an existing group, click one of the grouping buttons and use the same time name
To delete a group, delete it with a text editor
You cannot remove items from a group with the Constraints Editor
Edit the UCF with a text editor
Key Points
The INST constraint is used when you create groups by instance name or level of hierarchy.
The NET constraint is used when you create groups by net.
The TIMEGRP constraint is used when you create groups by output net name.
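The three UCF forms can be produced mechanically from the templates on the slide. The Python helpers below are hypothetical and only fill in those templates (the net names `ce_a` and `ce_b` are made up; MY_CLKEN_GRP is the demo group). They do not add the double quotes that names containing special characters may require, so check generated lines against the Constraints Guide before using them.

```python
def inst_tnm(element_name: str, group_name: str) -> str:
    """Group by instance name or hierarchy: INST <element_name> TNM = <group_name>;"""
    return f"INST {element_name} TNM = {group_name};"


def net_tnm_net(net_name: str, group_name: str) -> str:
    """Group by net: NET <net_name> TNM_NET = <group_name>;"""
    return f"NET {net_name} TNM_NET = {group_name};"


def timegrp(group_name: str, elements: str) -> str:
    """Group by output net name: TIMEGRP <group_name> = <elements>;"""
    return f"TIMEGRP {group_name} = {elements};"


# Adding items to an existing group reuses the same time name.
ucf_lines = [net_tnm_net(net, "MY_CLKEN_GRP") for net in ("ce_a", "ce_b")]
print("\n".join(ucf_lines))
```

Writing a second line with the same group name is the text-file equivalent of clicking a grouping button and reusing the time name.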
OFFSET Constraints
Show Slide 265:
Lessons
Overview
Creating Groups
OFFSET constraints
Summary
Show Slide 266:
Review of Global OFFSET Constraints
Use the Pad to Setup and Clock to Pad columns to specify OFFSETs for all I/O paths on each clock domain
Easiest way to constrain most I/O paths
However, this can lead to an over-constrained design
Key Points
If you have large numbers of I/O pins with similar timing requirements, set the global OFFSET constraints to that requirement.
Then use path-specific OFFSET constraints to override the global constraints for I/O pins that have different requirements.
Show Slide 267:
Pin-Specific OFFSET Constraints
Use the Pad to Setup and Clock to Pad columns to specify OFFSET constraints for each I/O pin
Use this type of constraint when only a few I/O pins need different timing
Key Points
Pin-specific OFFSET In/Out constraints can be entered in the Ports tab of the Constraints Editor.
You can select a large number of I/O paths by holding down the Shift or Ctrl key and clicking each I/O pin under the appropriate column heading. After selecting the pads, right-click and select Clock to Pad or Pad to Setup.
Creating a large number of pin-specific constraints usually requires the implementation tools to take more time during Place & Route. To reduce compile time, creating group OFFSET In/Out constraints (described on the next few pages) is recommended.
TRAINER NOTE
Demo Instructions:
Click the Ports tab to view where you can enter OFFSET constraints for specific inputs and outputs.
Show Slide 268:
Creating Groups of Pads
Groups of I/O pads can be made in the Ports section
Use Shift-click or Ctrl-click to select multiple pads
Right-click and select Pad to Setup or Clock to Pad and enter the length of the constraint
Key Points
This option constrains multiple paths to I/O pads at once.
For example, a global constraint of 20 ns on inputs is sufficient for most paths, but a single bus may need to be constrained to 10 ns. The global OFFSET can be 20 ns, and the path-specific OFFSET for the group of bus pins can be 10 ns. Group OFFSETs are easier to enter than pin-specific OFFSETs and are faster to compile.
You can also create groups of I/O pads by using the buttons in the Advanced tab; however, I/O pads do not always have common names for easy grouping. The Ports tab allows you to easily create groups of pads with arbitrary names.
This method also makes it easy to create a group OFFSET constraint if your design is a double-data-rate (DDR) application.
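The 20 ns / 10 ns example can be modeled as a lookup in which a group OFFSET takes precedence over the global OFFSET for its pads. This Python sketch assumes hypothetical pad names (`BUS<0>` to `BUS<7>`, `ADATA`) and is meant only to show the override, not the tools' internal data structures:

```python
# Values from the example above; the pad names are hypothetical.
GLOBAL_OFFSET_IN_NS = 20.0
bus_pads = {f"BUS<{i}>" for i in range(8)}

# One group OFFSET covers the whole bus; pin-specific OFFSETs would need one entry per pad.
group_offsets = [(bus_pads, 10.0)]


def offset_in_ns(pad: str) -> float:
    """OFFSET IN requirement for a pad: a group OFFSET overrides the global one."""
    for pads, value_ns in group_offsets:
        if pad in pads:
            return value_ns
    return GLOBAL_OFFSET_IN_NS


print(offset_in_ns("BUS<3>"), offset_in_ns("ADATA"))  # 10.0 20.0
```

One group entry replaces eight pin-specific entries here, which is the practical reason group OFFSETs are easier to enter.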
Show Slide 269:
Creating Group OFFSET Constraints
OFFSET IN/OUT constraints can also be entered in the Advanced tab
The Pad to Setup and Clock to Pad options allow you to enter OFFSET IN/OUT constraints on specific groups of pads
Just group your pads by element type first
Key Points
Instead of using the Ports tab, you may find it easier to use the Advanced tab in some cases.
For example, if you want to constrain input paths that end at a specific group of registers called critical_inputs, specifying an OFFSET IN from All Pads to critical_inputs may be easier than grouping all the input pads that feed those registers.
Show Slide 270:
Group OFFSET Constraints
Select a group of pads
Enter a timing requirement
Optional: Change clock domain
Optional: Select a group of synchronous elements
Key Points
This dialog box appears when you select the Pad to Setup or Clock to Pad buttons in the Ports tab or the Advanced tab.
Show Slide 271:
Source-Synchronous OFFSET Constraints
For source-synchronous inputs, you can specify the width of the valid data window by specifying a rising edge constraint and a falling edge constraint
Show Slide 272:
OFFSET Constraints with Two-Phase Clocks
OFFSET constraints define the relationship between the data and the reference clock edge at the pins of the FPGA
Defined in the global PERIOD constraint with the HIGH or LOW keyword
If all I/Os are clocked on a single edge, use the HIGH or LOW keyword in the PERIOD constraint to define which edge is used
If both clock edges are used, use the opposite keyword in the OFFSET constraint
Relative to Clock Edge option
Key Points
If the HIGH keyword is used in the PERIOD constraint, then the reference clock edge is a rising edge. This is the default.
If the LOW keyword is used, then the reference clock edge is a falling edge.
If you need the OFFSET constraint to be able to reference both clock edges, use one keyword in the PERIOD constraint and the opposite keyword when defining the OFFSET constraint.
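The keyword rules reduce to a small lookup. In this hypothetical Python helper, an OFFSET without its own keyword inherits the PERIOD keyword, and an OFFSET with the opposite keyword references the other edge; it encodes only the rules stated above and ignores all other OFFSET options.

```python
from typing import Optional

# Edge referenced by each keyword, as described in the key points above.
EDGE_FOR_KEYWORD = {"HIGH": "rising", "LOW": "falling"}


def reference_edge(period_keyword: str = "HIGH", offset_keyword: Optional[str] = None) -> str:
    """Clock edge an OFFSET is measured from in this model.

    Without its own keyword the OFFSET uses the PERIOD keyword (HIGH is the
    default); giving the OFFSET the opposite keyword selects the other edge.
    """
    return EDGE_FOR_KEYWORD[offset_keyword or period_keyword]


print(reference_edge())               # rising
print(reference_edge("HIGH", "LOW"))  # falling
```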
Summary
Show Slide 273:
Lessons
Overview
Creating Groups
OFFSET Constraints
Summary
Show Slide 274:
1) How do path-specific timing constraints improve the performance of your design?
2) How would you constrain this design to obtain a maximum internal clock frequency of 100 MHz?
The input will be valid at least 3 ns before the rising edge of CLK. The output must be valid 4 ns after the falling edge of CLK.
3) Write the appropriate OFFSET constraints
[Figure: circuit with input IN, output OUT, clock CLK, and resets RESET_A and RESET_B]
Show Slide 275:
Summary
Path-specific constraints are used to override global constraints
Keeps your design from becoming over-constrained
Allows the software to make intelligent trade-offs to meet all of your performance goals
Creating path-specific constraints is a two-step process
Create groups of path endpoints
Communicate the timing objective between the groups
Path-specific OFFSET constraints can be entered on either the Ports tab or the Advanced tab
When using both clock edges for I/O, write separate OFFSET constraints for each clock edge

Constraints Guide
Answers
1) How do path-specific timing constraints improve the performance of your design?
Path-specific timing constraints provide more flexibility to the implementation tools for meeting all of your timing objectives.
2) How would you constrain this design to obtain a maximum internal clock frequency of 100 MHz?
Enter a global PERIOD constraint of 10 ns on the CLK signal.
3) Write the appropriate OFFSET constraints.
Assuming that the PERIOD constraint uses the HIGH keyword and a 50-percent duty cycle:
OFFSET = IN 3 ns BEFORE CLK;
OFFSET = OUT 4 ns AFTER CLK FALLING;
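The arithmetic behind answers 2 and 3 can be checked with a few lines of Python. Under the same assumptions (HIGH keyword, 50-percent duty cycle, rising edge at t = 0), an output due 4 ns after the falling edge is due 9 ns after the rising edge, 1 ns before the next rising edge. The function names are hypothetical.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in ns for a frequency in MHz."""
    return 1000.0 / freq_mhz


def falling_edge_ns(period: float, high_percent: float = 50.0) -> float:
    """Time of the falling edge after the rising edge (HIGH keyword assumed)."""
    return period * high_percent / 100.0


clk_period = period_ns(100.0)                  # 10.0 ns PERIOD on CLK
out_valid = falling_edge_ns(clk_period) + 4.0  # 4 ns after the falling edge
margin = clk_period - out_valid                # time left before the next rising edge
print(clk_period, out_valid, margin)           # 10.0 9.0 1.0
```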
Transition to Path-Specific Timing Constraints
Path-Specific Timing Constraints
Purpose
Constrain paths that cross between clock domains by using the Constraints Editor
Constrain multicycle paths by using the Constraints Editor
Define false paths by using the Constraints Editor
Describe how constraints are prioritized
Time
45 minutes
Process
This module describes some of the most common applications for path-specific timing constraints and how to make them with the Xilinx Constraints Editor.
Lessons
Introduction
Multicycle Paths
False Paths
Summary
Introduction
Show Slide 276:
Path-Specific Timing Constraints
Show Slide 277:
Objectives
Constrain paths that cross between clock domains by using the Constraints Editor
Constrain multicycle paths by using the Constraints Editor
Define false paths by using the Constraints Editor
Describe how constraints are prioritized
Show Slide 278:
Timing Closure
Show Slide 279:
Outline
Multicycle Paths
False Paths
Summary
Show Slide 280:
Constraining Between Rising and Falling Clock Edges
The PERIOD constraint automatically accounts for two-phase clocks
Includes adjustments for non-50-percent duty cycle clocks
Example: A PERIOD constraint of 10 ns on CLK will apply a 5-ns constraint between these two flip-flops
No path-specific constraints are required for this case
[Figure: two flip-flops clocked by opposite edges of CLK, with output OUT]
Key Points
Recall that the PERIOD constraint allows you to specify the clock duty cycle. The implementation tools automatically reduce the length of the constraint when some flip-flops are triggered off the negative edge of the same clock.
If your HDL code contains some processes that are triggered on a rising edge and other processes that are triggered on a falling edge, your synthesis tool will create a circuit like the one shown.
If you manually create an inverted clock and use that clock in your HDL code, your synthesis tool can create logic different from what is shown. This can prevent the Xilinx software from correctly constraining these paths.
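The half-cycle arithmetic, including the duty-cycle adjustment, can be sketched as follows. This is a simplified Python model that assumes the HIGH keyword and ignores clock skew and jitter; the function name is hypothetical.

```python
def opposite_edge_requirements_ns(period_ns: float, high_percent: float = 50.0) -> dict:
    """Requirements implied by one PERIOD between flip-flops on opposite clock edges.

    Assumes the HIGH keyword: the rising edge is at t = 0 and the falling edge
    follows after the high time.
    """
    high_time = period_ns * high_percent / 100.0
    return {
        "rising_to_falling": high_time,              # launch on rising, capture on falling
        "falling_to_rising": period_ns - high_time,  # launch on falling, capture on next rising
    }


print(opposite_edge_requirements_ns(10.0))        # {'rising_to_falling': 5.0, 'falling_to_rising': 5.0}
print(opposite_edge_requirements_ns(10.0, 40.0))  # {'rising_to_falling': 4.0, 'falling_to_rising': 6.0}
```

With a 40-percent duty cycle the two half-cycle paths no longer get equal budgets, which is the adjustment the slide refers to.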
Show Slide 281:
Constraining Between Related Clock Domains
Create a PERIOD constraint for one clock
Define all related clocks in terms of this PERIOD constraint
The implementation tools will use the relationships to determine how to cross between clock domains
DCM with multiple outputs
Define a PERIOD constraint on the input to the DCM
The implementation tools will push the constraint onto each output
All constraints will be defined relative to the original PERIOD constraint
Key Points
If your design contains clocks that have a fixed relationship in frequency and/or phase, you should define your global PERIOD constraints relative to each other.
To learn more about creating related PERIOD constraints, refer to the Global Timing Constraints module in the Fundamentals of FPGA Design course.
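To see why related clocks let the tools derive crossing requirements, here is a simplified Python sketch. It assumes edge-aligned clocks with zero phase shift, rising-edge-to-rising-edge paths, and no jitter or skew; it shows the general static-timing idea rather than the Xilinx algorithm. The DCM helper relies on the CLKFX relationship (output frequency = input frequency × M / D), the M = 2, D = 3 values are an arbitrary example, and `math.lcm` requires Python 3.9 or later.

```python
from math import lcm


def clkfx_period_ps(clkin_period_ps: int, multiply: int, divide: int) -> int:
    """DCM CLKFX period: frequency is CLKIN x M / D, so the period is CLKIN_period x D / M."""
    numerator = clkin_period_ps * divide
    if numerator % multiply:
        raise ValueError("pick values that give a whole number of picoseconds")
    return numerator // multiply


def crossing_requirement_ps(launch_period_ps: int, capture_period_ps: int) -> int:
    """Tightest launch-to-capture separation between two related, edge-aligned clocks.

    Walks every launch edge over one common period (the LCM) and finds the
    first capture edge strictly after it.
    """
    common = lcm(launch_period_ps, capture_period_ps)
    tightest = common
    for launch in range(0, common, launch_period_ps):
        capture = (launch // capture_period_ps + 1) * capture_period_ps
        tightest = min(tightest, capture - launch)
    return tightest


clkin = 10_000                        # 10 ns PERIOD on the DCM input
clkfx = clkfx_period_ps(clkin, 2, 3)  # 15 ns
print(clkfx, crossing_requirement_ps(clkin, clkfx))  # 15000 5000
```

Even though neither clock has a 5 ns period, the worst-case alignment of their edges leaves only 5 ns for a path from the 10 ns domain to the 15 ns domain. This is why the PERIOD constraints must be related to each other rather than entered independently.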
Show Slide 282:
Unrelated Clock Domains
In this example, the delay path between the two clock domains is not covered by either of the PERIOD constraints
You must add a synchronization circuit when crossing between unrelated clock domains
This is the default behavior
A constraint is not technically needed, but you may want to constrain the path for completeness
[Figure: registers in the CLK_A domain (PERIOD CLK_A) and the CLK_B domain (PERIOD CLK_B), with output OUT1]
Constraining Between Unrelated Clock Domains
Key Points
If the two clocks are asynchronous (no known phase relationship), then you should also insert a synchronization circuit between the clock domains.
Show Slide 283:
To constrain the path between the two clock domains (highlighted in gray)
Define groups of registers CLK_A and CLK_B with the Group by Nets option
Automatically done if you have specified a PERIOD constraint for both clock domains
Place a Slow/Fast Exception between the two groups of registers
[Figure: a 5 ns constraint on the path between the CLK_A and CLK_B register groups, with output OUT1]
Key Points
When crossing between unrelated clock domains, there will be a synchronization circuit to handle setup or hold violations; however, a timing constraint is useful to ensure that signals cross between the clock domains in a reasonable amount of time (for example, one clock period).
www.xilinx.com
1-877-XLX-CLAS
Page 259
Facilitator Guide

Show Slide 284:
Step 1: Create the groups by using the Group by Nets option
Group by clock net
Skip this step if PERIOD constraints are defined
Step 2: Create the constraint by clicking Slow/Fast Exceptions
Show Slide 285:
Enter a name for this constraint
Must begin with TS
Select the groups that define the constraint
Specify the value of the constraint
Key Points
Creating the groups is not shown.
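The effect of the Slow/Fast Exception can be modeled in a few lines of Python. In this sketch the register names, the 10 ns and 8 ns periods, and the name `TS_A_TO_B` are hypothetical; only the 5 ns value and the rule that the name must begin with TS come from the slides. The model shows that, before the exception is added, the crossing path is covered by neither PERIOD constraint.

```python
from typing import Optional

# Hypothetical register groups (created automatically by the PERIOD constraints).
groups = {"CLK_A": {"a_reg"}, "CLK_B": {"b_sync1", "b_sync2"}}
period_ns = {"CLK_A": 10.0, "CLK_B": 8.0}  # unrelated clocks, hypothetical values

# Slow/Fast exceptions: (from_group, to_group) -> (constraint name, requirement in ns)
exceptions = {}


def add_exception(name: str, from_group: str, to_group: str, value_ns: float) -> None:
    if not name.startswith("TS"):
        raise ValueError("constraint names must begin with TS")
    exceptions[(from_group, to_group)] = (name, value_ns)


def domain_of(register: str) -> str:
    return next(group for group, regs in groups.items() if register in regs)


def requirement_ns(source: str, destination: str) -> Optional[float]:
    """Requirement on a path, or None if no constraint covers it."""
    key = (domain_of(source), domain_of(destination))
    if key in exceptions:
        return exceptions[key][1]
    if key[0] == key[1]:
        return period_ns[key[0]]  # inside one domain: that domain's PERIOD applies
    return None                   # between unrelated domains: not covered by either PERIOD


print(requirement_ns("a_reg", "b_sync1"))  # None
add_exception("TS_A_TO_B", "CLK_A", "CLK_B", 5.0)
print(requirement_ns("a_reg", "b_sync1"))  # 5.0
```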
Multicycle Paths
Show Slide 286:
Outline
Multicycle Paths
False Paths
Summary
Show Slide 287:
Multicycle Path Constraints
Multicycle paths occur when registers are not updated on consecutive clock cycles
Always at least one clock cycle between updates
Typically, the registers are controlled by a clock enable
A prescaled counter is one example
Registers in COUT14 are updated every four clock cycles
Paths between these registers are multicycle paths
[Figure: prescaled counter; block PRE2 (Q0, Q1) is clocked by CLK at 200 MHz, and its TC output drives the CE input of block COUT14 (Q2 through Q15), which updates at 50 MHz]
Key Points
Another common place to find multicycle paths is in state machines. A request for data may be sent in one state, but the state machine may need to go through multiple states before the data is used. The path from the data request back into the state machine would be a multicycle path, even though there is no clock-enable signal involved.
Show Slide 288:
Creating Multicycle Path Constraints
Step 1: Create a global PERIOD constraint (not shown)
Step 2: Create groups by using the Group by Nets option
Group by enable net
Step 3: Click Multicycle Paths
Key Points
Before you can create a multicycle path constraint, you must first create a PERIOD constraint on the clock net. You then group the synchronous elements that contain the multicycle paths. Finally, you create the multicycle path constraint.
Show Slide 289:
Creating Multicycle Path Constraints
Enter a TIMESPEC name
Select the groups that were previously defined
Define the constraint relative to the PERIOD constraint
Key Points
When defining new constraints relative to existing constraints, the Xilinx software simply multiplies or divides the reference constraint value by the specified factor, keeping the units the same.
For example, if the PERIOD constraint ts_clk has been defined as a period length of 5 ns, you should multiply the constraint by four to define a multicycle path constraint (5 ns x 4 = 20 ns).
If the PERIOD constraint had been defined as a frequency of 200 MHz, you would divide the constraint by four to define the multicycle path constraint (200 MHz / 4 = 50 MHz).
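A small Python helper makes the units rule concrete. It is a hypothetical sketch, not part of any Xilinx tool; it just picks the operation that keeps the result consistent with the reference constraint's units, using the ts_clk values from the example.

```python
def multicycle_requirement(value: float, units: str, cycles: int):
    """Relative constraint a designer would enter for an N-cycle path.

    The software keeps the reference units, so the operation has to match them:
    a period in ns is multiplied by N, a frequency in MHz is divided by N.
    """
    if units == "ns":
        return value * cycles, "ns"
    if units == "MHz":
        return value / cycles, "MHz"
    raise ValueError(f"unsupported units: {units}")


print(multicycle_requirement(5.0, "ns", 4))     # (20.0, 'ns')
print(multicycle_requirement(200.0, "MHz", 4))  # (50.0, 'MHz')
```

Both results describe the same 20 ns budget, since a 50 MHz clock has a 20 ns period.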
Show Slide 290:
Background Information
Prescaled 16-bit counter is created in two blocks
Q0 and Q1 in block PRE2 toggle at 200 MHz
Q[15:2] toggle every fourth clock edge (50 MHz)
The design is fully synchronous because all registers share the same clock
However, COUT14 registers are disabled 3/4 of the time so they do not have to meet a 200-MHz PERIOD constraint
[Figure: prescaled counter; block PRE2 (Q0, Q1) is clocked by CLK at 200 MHz, and its TC output drives the CE input of block COUT14 (Q2 through Q15), which updates at 50 MHz]
Key Points
A prescaled counter uses a 2-bit prescaler that generates a clock-enable signal, which allows the most significant bits to toggle at less than the full clock rate. Designers sometimes use this type of counter because of its extremely high performance (often faster than traditional carry logic implementations).
Prescaled counters are faster because only the Least Significant Bits (LSBs) must toggle at the full clock rate. The Most Significant Bits (MSBs) are disabled three out of every four clock cycles because the prescaler counts to four and enables the MSBs only once every four clock cycles. Because the LSBs have the only critical paths, the implementation tools have the greatest placement flexibility for the MSBs, and the counter can easily be placed to obtain peak performance.
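A cycle-by-cycle Python model (a behavioral sketch, not HDL, with hypothetical function and variable names) shows both properties described above: the two blocks together count like an ordinary 16-bit counter, and COUT14 is updated on only one clock edge in four, which is why paths between its registers can be constrained as 4-cycle paths.

```python
def simulate_prescaled_counter(cycles: int):
    """Cycle-by-cycle model of a 16-bit prescaled counter.

    PRE2 holds Q1..Q0 and counts on every clock; its terminal count (TC) is the
    clock enable for COUT14, which holds Q15..Q2. Returns the 16-bit value and
    how many clock edges actually updated COUT14.
    """
    pre2, cout14, cout14_updates = 0, 0, 0
    for _ in range(cycles):
        tc = pre2 == 3                     # decoded from the current PRE2 state
        if tc:                             # CE asserted: COUT14 advances on this edge
            cout14 = (cout14 + 1) % (1 << 14)
            cout14_updates += 1
        pre2 = (pre2 + 1) % 4              # PRE2 advances on every edge
    return (cout14 << 2) | pre2, cout14_updates


value, updates = simulate_prescaled_counter(1000)
print(value, updates)  # 1000 250
```

After 1000 clock edges the counter reads 1000, yet COUT14 changed on only 250 of those edges.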
Show Slide 291:
1) What constraints need to be placed on this design to ensure it will meet the performance objectives?
2) How would you enter these constraints through the Constraints Editor?
3) How do multicycle path constraints improve the performance of your design?
[Figure: the prescaled counter from the previous slide: PRE2 (Q0, Q1) at 200 MHz on CLK, with TC driving the CE input of COUT14 (Q2 through Q15) at 50 MHz]
False Paths
Show Slide 292:
Outline
Multicycle Paths
False Paths
Summary
Show Slide 293:
False Paths
The False Paths option prevents constraints from being applied to specific paths
Use the False Paths option to reduce the number of constrained paths in your design
Key Points
False paths are useful when your design has paths that are not
required to be constrained. Most commonly, these paths are
bidirectional paths that are not exercised during normal
operation; however, any path that you know will meet your
timing objectives can be defined as a false path.
Show Slide 294:
Defining False Paths
Use the False Paths (FROM:TO:TIG) option to define false paths between groups of path endpoints
TIG = Timing IGnore
Prevents any constraints from being applied to the paths
Paths through specific nets or 3-state buffers can be defined with the THRU Points option
What is wrong with this example?
Key Points
Use the False Paths option to identify paths between groups of path endpoints that should not be covered by any constraint. This does not remove the constraint itself; it removes these paths from the scope of the constraint.
In this example, all paths between flip-flops will be marked as false paths. This effectively negates the PERIOD constraint. To fix this problem, either change the groups of selected endpoints or define and select a THRU point.
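The scoping problem in this example can be illustrated with a toy model of how a TIG removes paths from a PERIOD constraint. This Python sketch is an assumption-laden illustration, not how the Xilinx timing tools are implemented; the register names, the `TimingPath` type, and `paths_under_period` are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class TimingPath:
    src: str                              # launching register
    dst: str                              # capturing register
    thru: FrozenSet[str] = frozenset()    # nets the path passes through


# A TIG is (FROM group, TO group, optional THRU net).
Tig = Tuple[Set[str], Set[str], Optional[str]]


def paths_under_period(paths: Iterable[TimingPath], tigs: List[Tig]) -> List[TimingPath]:
    """Return the paths that stay in the scope of the PERIOD constraint."""
    kept = []
    for p in paths:
        ignored = any(
            p.src in frm and p.dst in to and (thru is None or thru in p.thru)
            for frm, to, thru in tigs
        )
        if not ignored:
            kept.append(p)
    return kept


# Hypothetical design: two registers share BIDIR_BUS, two counter bits do not.
ALL_FFS = {"ctrl_reg", "status_reg", "cnt0", "cnt1"}
PATHS = [
    TimingPath("ctrl_reg", "status_reg", frozenset({"BIDIR_BUS"})),
    TimingPath("status_reg", "ctrl_reg", frozenset({"BIDIR_BUS"})),
    TimingPath("cnt0", "cnt1"),
]

# Mistake from the slide: FROM all flip-flops TO all flip-flops, no THRU point.
print(paths_under_period(PATHS, [(ALL_FFS, ALL_FFS, None)]))
# Fix: add a THRU point so that only the bus paths are ignored.
print(paths_under_period(PATHS, [(ALL_FFS, ALL_FFS, "BIDIR_BUS")]))
```

With no THRU point, every register-to-register path matches the TIG and nothing is left for the PERIOD constraint; adding the THRU net keeps the counter path constrained.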
Show Slide 295:
Defining False Paths by Nets
The False Paths by Nets option allows you to ignore timing constraints on a specific net
Any delay path containing the RESET net will not be constrained
The Ignored TIMESPECs option allows specific constraints to be ignored
Key Points
This option prevents any constraint from being applied to paths that contain a specific net.
The Ignored TIMESPECs option prevents only the selected timing constraints from being applied to paths that contain the selected nets. By default, all constraints are ignored for the selected nets. Use this option to ignore specific constraints while still allowing the tools to apply other constraints to paths containing the selected nets.
Show Slide 296:
4) If a PERIOD constraint were placed on this design, what delay paths would be constrained?
5) If the goal is to optimize the input and output times without constraining the paths between registers, what constraints are needed?
Assume that a global PERIOD constraint is already defined
[Figure: Control Register and Status Register, enabled by Control_Enable and Status_Enable, sharing BIDIR_BUS(7:0), which connects to BIDIR_PAD(7:0)]

Show Slide 297:
Answer
4) If a PERIOD constraint were placed on this design, what delay paths would be constrained?
Paths between the control registers and the status registers would be constrained
Paths from each register feeding back to itself are also constrained
Key Points
Because 3-state buffers are not path endpoints, all delay paths
through 3-state buffers can be unnecessarily constrained when
you use only global constraints. In this case, removing
constraints between the registers that drive the bus can be
useful.
Show Slide 298:
Answer
5) If the goal is to optimize the input and output times without constraining the paths between registers, what constraints are needed?
Enter OFFSET constraints in the Global tab
Define False Paths by Nets
Select the BIDIR_BUS[7:0] nets
Select the global PERIOD constraint to be ignored
Key Points
There are other ways to define the false paths:
Create a THRU point (BIDIR_BUS) that includes the BIDIR_BUS[7:0] nets and define a false path from All Flip-Flops through BIDIR_BUS to All Flip-Flops.
Create a group (CONTROL) for the Control Register and a group (STATUS) for the Status Register and define false paths from CONTROL to STATUS and from STATUS to CONTROL. This option only works if there are no other paths between the registers that need to be covered by the PERIOD constraint.
Show Slide 299:
Outline

Multicycle Paths
False Paths
Summary
Show Slide 300:
Miscellaneous Tab
Create area groups from TGs
Good way to group logic without area constraints
Select nets to be routed on low-skew resources
Use for high-fanout control signals
Mark asynchronous registers
Prevents X propagation during simulation
Assign individual registers to IOBs
Define initial values for storage elements
Key Points
The Map Process Properties option allows you to globally merge registers into the IOBs. This option also allows you to identify specific registers to be merged.
Marking a register as asynchronous prevents an X value from propagating during simulation when a setup/hold violation occurs. This constraint has no effect on implementation.
You do not need to mark clock nets with the USELOWSKEWLINES constraint. Clock signals that do not use global buffers automatically use the low-skew resources.
For more information on the constraints shown here, see the Constraints Guide in the online documentation (Help > Software Manuals).
Show Slide 301:
Prorating Constraints
Prorating allows the tools to use the most accurate information
The implementation tools use the worst-case operating temperature and voltage for your chosen device package (85°C for Commercial, 100°C for Industrial)
Specify your own worst-case conditions
This will prorate the device delay characteristics to accurately reflect your worst-case system conditions
Key Points
Prorating constraints gives the implementation tools greater timing accuracy. The new worst-case timing delays are then applied when timing constraints are used.
If you prorate your constraints, make sure that you enter the worst-case temperature and VCC that your device might ever encounter.
Timing reports contain only the worst-case operating condition delays. The Timing Analyzer can also create customized timing reports for different worst-case operating conditions.
Show Slide 302:
Timing Constraint Priority (highest to lowest)
False paths
Must be allowed to override any timing constraint
FROM THRU TO
FROM TO
Pin-specific OFFSETs
Group OFFSETs
Global PERIOD and OFFSETs
Groups of pads or registers
Lowest priority constraints
Key Points
This list shows how constraints are prioritized. Priority also explains why the same path can be covered by multiple constraints. The more specific the constraint, the higher its priority. Also note that the value of a constraint has no effect on its priority.
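The priority rule can be expressed as a simple ranking by constraint type. This Python sketch is only an illustration of the ordering on the slide: the kind labels (such as `FROM_THRU_TO`) and the constraint names are hypothetical, and ties between constraints of the same type are deliberately not resolved.

```python
from typing import List, Optional, Tuple

# Constraint kinds from the priority slide, highest priority first.
PRIORITY_ORDER = [
    "TIG",            # false paths
    "FROM_THRU_TO",
    "FROM_TO",
    "PIN_OFFSET",     # pin-specific OFFSET
    "GROUP_OFFSET",
    "GLOBAL",         # global PERIOD and OFFSETs
]
RANK = {kind: rank for rank, kind in enumerate(PRIORITY_ORDER)}

# (name, kind, value_ns); value_ns is None for a TIG.
Constraint = Tuple[str, str, Optional[float]]


def governing_constraint(covering: List[Constraint]) -> Constraint:
    """Pick the constraint that applies to a path covered by several.

    Only the kind is compared; the numeric value is ignored, because a
    constraint's value has no effect on its priority. If two constraints
    share the best kind, min() simply returns the first one listed.
    """
    return min(covering, key=lambda c: RANK[c[1]])


covering = [
    ("TS_clk", "GLOBAL", 5.0),
    ("TS_MSB", "FROM_TO", 20.0),
]
print(governing_constraint(covering)[0])   # the FROM:TO constraint wins
```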
Show Slide 303:
Timing Constraint Interaction
Whenever a path is covered by more than one constraint, the tools must choose which constraint to use for timing analysis
If the constraints are of different types, the highest priority constraint is applied
If the constraints are of the same type (example: FROM TO), the decision is more complex
Priority can be dictated with the PRIORITY keyword in the UCF
Values from -1000 to 1000
Lower number is higher priority
Example: TIMESPEC TS_01 = FROM src TO dest 7 ns PRIORITY 1;
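The PRIORITY tie-break between same-type constraints can be sketched as "lowest number wins". This Python example is a simplified illustration, not a UCF parser: the regular expression only recognizes the `PRIORITY <integer>` form shown in the example above, `TS_02` through `TS_04` are hypothetical, and the tools' own fallback rules for unresolved ties are not modeled.

```python
import re
from typing import List, Optional

_PRIORITY_RE = re.compile(r"\bPRIORITY\s+(-?\d+)")


def parse_priority(timespec: str) -> Optional[int]:
    """Return the PRIORITY value of a TIMESPEC line, or None if it has none."""
    match = _PRIORITY_RE.search(timespec)
    return int(match.group(1)) if match else None


def pick_by_priority(timespecs: List[str]) -> Optional[str]:
    """Choose among same-type TIMESPECs covering one path (lower number wins).

    Returns None when PRIORITY alone cannot decide (a value is missing or the
    lowest value is shared); the tools' own tie-break rules are not modeled.
    """
    values = [parse_priority(t) for t in timespecs]
    if not values or any(v is None for v in values):
        return None
    best = min(values)
    if values.count(best) > 1:
        return None
    return timespecs[values.index(best)]


ts_01 = "TIMESPEC TS_01 = FROM src TO dest 7 ns PRIORITY 1;"
ts_02 = "TIMESPEC TS_02 = FROM src TO dest 5 ns PRIORITY 3;"   # hypothetical
print(pick_by_priority([ts_01, ts_02]))   # TS_01: PRIORITY 1 beats 3
```

Note that TS_01 wins even though TS_02 has the tighter 5 ns value, consistent with the rule that a constraint's value does not set its priority.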
Key Points
If two constraints cover the same paths and have the same priority level, the software follows a set of rules to determine which constraint will be applied.
Summary
Show Slide 304:
Outline

Multicycle Paths
False Paths
Summary
Show Slide 305:
Summary
Use a Slow/Fast Exception to constrain paths that cross between clock domains
Identifying multicycle and false paths allows the implementation tools to
make appropriate trade-offs
These paths will use slower routing resources, which frees up fast routing
for critical signals
Prorating your operating conditions gives the tools the most accurate
picture of your design environment
In general, more specific constraints have a higher priority than less
specific constraints
Constraints Guide

Answers
1) What constraints need to be placed on this design to ensure it will meet the performance objectives?
Global PERIOD constraint of 5 ns (or 200 MHz)
Multicycle path constraint of 5 ns × 4 = 20 ns (or 200 MHz / 4 = 50 MHz)
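The arithmetic behind this answer can be written out as a small helper. This Python sketch is illustrative: the function name and the `TS_MSB` TIMESPEC name are hypothetical, `MSB` is the clock-enable group named in answer 2, and the generated line only mirrors the format of the UCF example used earlier in this module.

```python
def multicycle_values(clock_mhz: float, enable_ratio: int):
    """Return (PERIOD ns, multicycle ns, effective MHz, example TIMESPEC line)."""
    period_ns = 1000.0 / clock_mhz            # 200 MHz -> 5 ns
    multicycle_ns = period_ns * enable_ratio  # MSBs get enable_ratio clocks per update
    effective_mhz = clock_mhz / enable_ratio  # 200 MHz / 4 -> 50 MHz
    # TS_MSB is a hypothetical name; MSB is the clock-enable group of flip-flops.
    timespec = f"TIMESPEC TS_MSB = FROM MSB TO MSB {multicycle_ns:g} ns;"
    return period_ns, multicycle_ns, effective_mhz, timespec


print(multicycle_values(200, 4))
```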
2) How would you enter these constraints through the Constraints Editor?
PERIOD constraint: Use the Global tab
Multicycle path constraint
Group the flip-flops in COUT14 by clock enable net (group name: MSB)
Constrain from MSB to MSB
3) How do multicycle path constraints improve the performance of your design?
They allow the implementation tools to place some logic further apart and use slower routing resources.
4) If a PERIOD constraint were placed on this design, what delay paths would be constrained?
Paths between the control registers and the status registers would be constrained.
Paths from each register feeding back to itself are also constrained.
5) If the goal is to optimize the input and output times without constraining the paths between registers, what constraints are needed?
Enter OFFSET constraints in the Global tab.
Define False Paths by Nets:
Select the BIDIR_BUS[7:0] nets
Select the global PERIOD constraint to be ignored
Transition to Lab 5: Achieving Timing Closure
Purpose

Use timing reports more effectively
Enter false path timing constraints (TIGs) by using the Constraints Editor
Time
45 minutes
Process
This lab illustrates how to apply path-specific timing constraints to a design and use some of the advanced implementation options in the ISE tools.
General Flow
Step 1: Evaluate the design performance with TIGs
Step 2: Remove the TIGs and implement the design
Step 3: Analyze the timing
Lab
Refer to the separate lab workbook for the Achieving Timing Closure lab.
Transition to Advanced Implementation Options
Purpose

Time
30 minutes
Process
This module describes the advanced implementation options available in the ISE tools.
Lessons
Introduction
Overview
Xplorer
Power Optimization
Summary
Introduction
Show Slide 306:
Advanced Implementation Options
Show Slide 307:
Objectives
Increase design performance by using advanced MAP and Place & Route options
Increase design performance by using the Xplorer tool
Save implementation time by using SmartGuide and partitions
Show Slide 308:
Timing Closure
Overview
Show Slide 309:
Lessons
Overview
Advanced MAP and Place & Route Options
Xplorer
Power Optimization
Summary
Show Slide 310:
Introduction
Xilinx recommends using the default options and global timing constraints the first time you implement a design
If your design does not meet timing goals, follow the recommended flow presented earlier
Early in the design cycle, examine ways of changing your HDL code
Confirm that good coding styles were used
Try synthesis options, such as retiming or adding pipeline stages, to reduce logic levels
If you are early in the design cycle, you do not want to run a full implementation every time a change is made, which would be time consuming and frustrating
Increase the Place & Route effort level
Apply path-specific timing constraints for synthesis and implementation
Key Points
To check whether your timing constraints are reasonable before running the Place & Route process, examine the Post-Map Static Timing Report. The logic portion of your delay paths should consume no more than 60 to 70 percent of the timing budget.
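The 60 to 70 percent guideline amounts to a simple ratio check per path. This Python sketch assumes you have already read the logic delay and the constraint value for a path from the Post-Map Static Timing Report; the function name and the example delays are hypothetical.

```python
def logic_budget(logic_ns: float, requirement_ns: float, limit: float = 0.70):
    """Check one path from the Post-Map Static Timing Report.

    Returns (fraction of the budget used by logic, True if within the limit).
    The default limit is the top of the 60 to 70 percent guideline, leaving
    the remainder of the budget for routing delay.
    """
    fraction = logic_ns / requirement_ns
    return fraction, fraction <= limit


# Hypothetical paths against a 5 ns PERIOD: 3.2 ns and 4.1 ns of logic delay.
for logic in (3.2, 4.1):
    used, ok = logic_budget(logic, 5.0)
    print(f"{logic} ns logic uses {used:.0%} of 5 ns -> {'ok' if ok else 'too high'}")
```

A path that fails this check before Place & Route is unlikely to be rescued by routing effort alone, which is why the HDL and synthesis options earlier in the flow are tried first.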
Show Slide 311:
When to Use Advanced Options
If timing is still not met, consider using advanced MAP or Place & Route (PAR) options
MAP: Perform timing-driven packing
Uses timing constraints to pack critical paths
PAR: Extra Effort
Xplorer is an automated method for trying different combinations of implementation options
These options will increase the software runtime
This module discusses the expected trade-offs and benefits of each option
Show Slide 312:
Lessons
Overview
Advanced MAP and Place & Route Options
Xplorer
Power Optimization
Summary
Show Slide 313:
Timing-Driven Packing
Timing constraints are used to optimize which pieces of logic are packed into each slice
Normal (standard) packing is performed
PAR is run through the placement phase
Timing analysis measures the amount of slack in constrained paths
If necessary, packing changes are made to allow better placement
The output of MAP contains both mapping and placement information
The Post-Map Static Timing Report contains more realistic net delays
Place & Route runtime is reduced because some placement is already performed
Show Slide 314:
Example
Originally, the flip-flops were packed together into a slice
After placement and timing analysis, the flip-flops are packed into different slices to allow independent movement
[Figure: with Standard Pack, FF1 and FF2 share one slice; with Timing-Driven Pack, FF1 and FF2 are in separate slices]
Key Points
In this simple example, two flip-flops were originally packed into one slice. They may share common inputs, or the packing may have been necessary to fit the design into the target device.
During placement, it becomes clear that FF1 should move to the top of the die and FF2 should move to the bottom (in order to meet timing constraints).
If timing-driven packing is enabled, the design goes back into the MAP process with this knowledge. The flip-flops will be packed into two separate slices to allow independent movement.
Show Slide 315:
Turning on Timing-Driven Packing
Set the Property Display Level to Advanced
Check Perform Timing-Driven Packing and Placement
Set other options if needed
Key Points
To set the Property Display Level, open the Map Properties dialog box and select Advanced from the Property Display Level drop-down list at the bottom.
After you select Perform Timing-Driven Packing and Placement, other options become available. You can set the Place & Route effort level that will be used (Map Effort Level), whether to use Extra Effort (for placement, covered in the next lesson), which Placer Cost Table to use (covered later), whether to use register duplication to improve timing, and whether the global optimization routine should be run.
Register Duplication: Duplicates registers to improve timing when running timing-driven packing.
Global Optimization: This option directs MAP to perform global optimization routines on the fully assembled netlist before mapping the design. Global optimization includes logic remapping and trimming, logic and register replication and optimization, and logic replacement of 3-states.
Show Slide 316:
Trade-Offs
Typical performance improvement: five to eight percent
Has the greatest effect on high-density designs when unrelated packing has occurred
Density improvements are also seen
Look in the Map Report, Design Summary section
If no unrelated packing has occurred, performance improvement will be minimal
Number of slices containing unrelated logic
Runtime for the MAP process always increases
Up to 200 percent
But you recover some of this increased runtime by saving runtime during Place & Route
Key Points
Unrelated packing occurs when the software puts unrelated logic into the same slice to fit the design into the target device. This packing can affect performance because the pieces of logic in the slice may need to be placed in different locations to meet timing.
Timing-driven packing can fix this situation. Timing analysis will show that the unrelated logic needs to be separated to meet timing.
If no unrelated packing has occurred, the only change that timing-driven packing can make to the design is to merge flip-flops into IOBs to meet OFFSET constraints.
Show Slide 317:
PAR Extra Effort
Only available when the Place & Route effort level is set to High
Two settings: Normal and Continue on Impossible
Use the Normal setting only
Continue on Impossible will run until user break (Ctrl-C)
Typical performance improvement: four percent
Runtime for the Place & Route process always increases
Potential 200-percent increase or more
Show Slide 318:
Setting Extra Effort
Set the Place & Route Property Display Level to Advanced
Set Place & Route Effort Level (Overall) to High
Set Extra Effort
Key Points
You can also set the Placer Effort and Router Effort separately. If you set the Placer Effort to High but leave the Router Effort at Standard, the Extra Effort option will only be used during placement. This trick can increase your productivity by decreasing software runtime.
Xplorer
Show Slide 319:
Lessons
Overview
Options
Xplorer
Power Optimization
Summary
Show Slide 320:
Xplorer
Iterates through the implementation process, trying different combinations of properties
Automatically stops when all timing constraints are met
Options used include
Overall Effort Level (MAP and PAR)
Timing-Driven Map (MAP)
Extra Effort Level (MAP and PAR)
Multi-Pass Place and Route (PAR)
Global Optimization (MAP)
Retiming (MAP)
Register Duplication (MAP)
Logic Optimization (MAP)
Optimization Strategy/Cover Mode (MAP)
Allow Logic Optimization Across Hierarchy (MAP)
Key Points
The combination of synthesis, MAP, and PAR options will vary by device family. For the Virtex-4 FPGA, Xplorer uses all of the options listed. For the Virtex-5 FPGA, Xplorer uses all but Multi-Pass Place and Route.
Because your design can effectively be resynthesized when using Xplorer, some nodes may be eliminated or renamed in the process, which makes debugging and timing simulation more difficult. You can still use the KEEP attribute in your HDL to maintain nodes for testing or timing simulation; KEEP prevents these options from changing the originally synthesized node.
Xplorer has 20 pre-assigned combinations of options for each device family. In testing, these combinations have been shown to be among the best. Xilinx recommends that you try the default number of iterations for your device family to determine which combination of options is best for your design. Your results will vary by design, so there is no way to determine in advance which set of options will work best. If you have enough time, you may wish to run all 20 combinations of options.
Show Slide 321:
Xplorer Options
Overall Effort Level (MAP and PAR): Enables PAR to work longer and harder
Timing-Driven Map (MAP): Enables MAP to group timing-critical logic in the same slice or CLB
Extra Effort Level (MAP and PAR): Even longer and harder
Multi-Pass Place and Route (PAR): Enables you to generate different results with cost tables (not recommended for the Virtex-5 FPGA)
Global Optimization (MAP): Enables re-mapping, logic trimming, logic and register duplication, and logic optimization
Retiming (MAP): Enables register migration
Register Duplication (MAP): Duplicates registers to reduce fanout
Logic Optimization (MAP): Duplicates logic to reduce logic levels
Optimization Strategy/Cover Mode (MAP): Controls how MAP assigns logic to LUTs
Allow Logic Optimization Across Hierarchy (MAP): Last effort to reduce logic levels
Key Points
Timing-Driven MAP: Allows the router to use the fastest routing resource available, which surrounds each CLB.
Extra Effort Level: Uses the normal setting; runs continuously and requires Ctrl + C to stop.
Multi-Pass Place and Route (MPPAR): Uses a different algorithm to place and route the design. The caveat is that there is no way to determine which of the 100 cost tables will work best for your design. So when you try MPPAR, Xilinx recommends saving all or most of your iterations so that you can compare them. Be aware that this basically runs additional PAR iterations. Therefore, if your original PAR run was four hours and you try 10 cost tables, you are setting your computer to work for approximately 40 hours.
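The 40-hour estimate above is a linear scaling of one PAR run by the number of cost tables. This Python sketch assumes the runs execute one after another on a single machine and that cost tables are numbered from 1; the function name is hypothetical.

```python
def mppar_plan(par_hours: float, cost_tables: int, first_table: int = 1):
    """Estimate cumulative runtime when MPPAR runs one cost table after another.

    Each cost table is essentially one more full PAR run, so total time grows
    linearly. Returns a list of (cost_table, cumulative_hours).
    """
    plan = []
    elapsed = 0.0
    for table in range(first_table, first_table + cost_tables):
        elapsed += par_hours
        plan.append((table, elapsed))
    return plan


# The example from the text: a 4-hour PAR run tried with 10 cost tables.
print(mppar_plan(4, 10)[-1])
```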
Global Optimization: Enables logic remapping (grouping of nodes into LUTs), logic trimming (removal), logic and register replication (high-fanout nets), and logic optimization (besides Boolean optimization, XST can duplicate logic to reduce logic levels).
Retiming: Enables register migration forward or backward on a timing-critical path with the intention of balancing that path.
Register Duplication: Duplicates registers in an effort to reduce the fanout of nets that are on a timing-critical path.
Logic Optimization: Besides the standard Boolean optimization techniques that are common in synthesis, XST can duplicate logic to reduce logic levels.
Optimization Strategy/Cover Mode: Controls how MAP assigns logic to LUTs. The Area option reduces the overall number of LUTs in the design. The Speed option reduces the number of logic levels. The Balanced option blends the two modes.
Logic Optimization Across Hierarchy/Ignore Keep Hierarchy: Although Xilinx recommends that you maintain design hierarchy (so that you can maintain more node names for debugging and timing simulation), Xplorer can, as a last resort, selectively remove a design's hierarchy.
Show Slide 322:
Running Xplorer
Right-click Implement Design and select Properties
Select Xplorer Properties
Select Timing Closure from the Xplorer Mode drop-down list
Set other options and click OK
Double-click Implement Design
Key Points
Xplorer properties:
Xplorer Mode: Select Timing Closure to enable Xplorer.
Turn Off Xplorer After Run Completes: By default, after
Xplorer completes, the mode is set back to Off so that the
next implementation will not use Xplorer. Select No to
ensure that Xplorer is used every time the implementation
process is run.
Maximum Number of Iterations: Up to 20 iterations can be
run. Xplorer will stop when timing closure is achieved or
after the maximum number of iterations.
Enable Retiming: Available for Virtex-4 and Virtex-5 FPGA
designs only. This option allows Xplorer to use the retiming
option during the MAP process to move registers forward
or backward to balance the delays between timing paths.
Macro Search Path: This is the same as the Translate option.
Other Xplorer Command Line Options: Xplorer can also be
run from the command line. Xplorer also has an additional
mode called Best Performance Mode where you are able to
specify the name of a clock signal. This option does not
allow you to specify more than one clock and it does not
allow you to optimize the entire design, just the logic on one
clock domain. Use this option at your own discretion.
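The stopping behavior described in these properties (try combinations in order, keep the best result, stop at timing closure or at the maximum number of iterations) can be sketched as a small search loop. This Python example is a behavioral illustration only, not Xplorer's implementation: the strategy names and timing scores are hypothetical, and `run` stands in for one full MAP and PAR pass.

```python
from typing import Callable, Optional, Sequence, Tuple


def xplorer_like_search(strategies: Sequence[str],
                        run: Callable[[str], int],
                        max_iterations: int = 20) -> Tuple[Optional[str], Optional[int], int]:
    """Try option combinations in order; stop early on a timing score of 0.

    `run` stands in for one MAP + PAR pass and returns its timing score.
    Returns (best_strategy, best_score, iterations_run).
    """
    best_strategy, best_score = None, None
    iterations = 0
    for strategy in strategies[:max_iterations]:
        score = run(strategy)
        iterations += 1
        if best_score is None or score < best_score:
            best_strategy, best_score = strategy, score
        if score == 0:          # all constraints met: timing closure
            break
    return best_strategy, best_score, iterations


# Hypothetical timing scores for four option combinations.
scores = {"strategy_1": 900, "strategy_2": 120, "strategy_3": 0, "strategy_4": 50}
print(xplorer_like_search(list(scores), lambda s: scores[s]))
```

With the default limit the search stops at the third combination because its score is 0; with `max_iterations=2` it would keep the better of the first two results.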
Show Slide 323:
Xplorer Results
Xplorer compares the results of all iterations
Best result is saved to the project directory
All other results are deleted (unless you run from the command line)
Information on all iterations is available in the Design Summary screen
Xplorer also allows you to set the best options (found after running Xplorer) and then run Multi-Pass Place and Route (MPPR)
Key Points
Xplorer uses the timing score to compare results. The timing score is the total amount, in picoseconds, by which all constraints are missed. A timing score of 0 indicates that all timing constraints were met.
Running MPPAR is not recommended for the Virtex-5 FPGA.
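The timing score can be illustrated by summing how far each constraint misses its target. This Python sketch takes the description above literally, with one (required, achieved) result per constraint; that is a simplification, since it does not model how the tools accumulate violations across individual failing paths. The function name and example values are hypothetical.

```python
from typing import Iterable, Tuple


def timing_score_ps(results: Iterable[Tuple[float, float]]) -> int:
    """Total amount, in ps, by which constraints are missed.

    Each item is (required_ns, achieved_ns) for one constraint. Met
    constraints contribute 0, so a score of 0 means every constraint passed.
    """
    score = 0
    for required_ns, achieved_ns in results:
        miss_ps = round((achieved_ns - required_ns) * 1000)   # ns -> ps
        score += max(miss_ps, 0)
    return score


# Hypothetical results: one miss of 200 ps, one met, one miss of 50 ps.
print(timing_score_ps([(5.0, 5.2), (20.0, 18.0), (3.0, 3.05)]))
```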
Show Slide 324:
Lessons
Overview
Options
Xplorer
Power Optimization
Summary
Show Slide 325:
SmartCompile
Two strategies for maintaining some PAR results while still making some changes to a design
Partitions are used to maintain implementation results while still making design changes
This is an instance-based method for preserving hierarchical blocks in a design
You do not have to re-verify a preserved partition
SmartGuide is used to maintain timing results while still making design changes
This is a timing-based method for preserving parts of a design that have not changed
Implementation tools have the flexibility to not maintain logic if it helps other paths to meet timing constraints
Why do you care?
Saves verification time
Faster implementation time
Key Points
One of the biggest challenges is preserving timing when a design is modified. For example, part of your design contains a critical timing path that has been met with a great deal of effort (multiple implementations with different options and/or detailed timing constraints). Another part of the design is then modified, and the critical timing path fails to meet timing. This would normally require you to modify your constraints and/or reimplement because of the new design changes.
Verification is reduced with both of these flows because, if a block of the design is exactly preserved, it does not need to be re-verified. Both of these flows allow you to maintain parts of your design.
Preserving a block of the design rather than reimplementing it is generally faster. There are edge cases where this will not be true due to the interaction between the preserved portion of the design and the new or modified portion of the design being implemented.
These two design preservation techniques are not compatible with each other. A design can use one or the other, but not both at the same time.
Show Slide 326:
SmartGuide
Timing Preservation in the Midst of Changes
Use SmartGuide when you want to minimize the impact of a small change
Turn on SmartGuide by right-clicking the top level of your design hierarchy and selecting Use SmartGuide
Also supported with TCL and command line scripts
[Figure: physical layout of the original design, and the physical layout after a small design change implemented with SmartGuide]
Key Points
This flow requires you to have implemented a successful design, which means that your original design should meet your timing constraints. That is, your good timing results will be maintained and paths that fail to meet your timing constraints will not be maintained.
This flow allows the implementation tools to preserve as much as possible of the logic that meets your timing constraints, but it does not guarantee that all of the paths that meet your timing constraints will be preserved. Some paths may be changed to help other failing paths meet their timing constraints.
This flow will save a significant amount of implementation time.
SmartGuide information will be included in the MAP and PAR reports generated by the implementation tools.
Note that for a LUT to be guided, its equation can vary between iterations. After the new logic is added, the tools complete a clean-up phase where critical paths from the new and the old logic may be re-placed and rerouted to help meet timing constraints. This phase greatly improves the chances that the tools will meet all of your timing objectives.
SmartGuide can be used after a first implementation has been completed.
Show Slide 327:
Partitions
Implementation Preservation
Partitions guarantee exact preservation of implementation results
Provides control over what is preserved
Assert partitions by right-clicking each level of hierarchy to be maintained and selecting New Partition
[Figure: (1) Set Partitions on the logical design (HDL) hierarchy, Top with blocks A1 and A2; (2) Implement Design, giving the original physical layout; (3) Make Changes, for example C modified while A and B are preserved, giving the physical layout after the change]
Key Points
This flow requires you to have implemented a successful design, which means that your original design should meet your timing constraints. Any block that you want preserved (by defining a partition) will be preserved exactly.
The remaining logic that is not preserved will then be reimplemented, thus saving implementation time.
Good hierarchical design practices must still be used. This flow is also supported with Tcl and command line scripts. You do not have to re-verify a preserved partition. Partitions are set on hierarchical blocks.
Partitions can also be set when using Synplify Pro software. Just set a compile point for each level of hierarchy that will be a partition.
Partition information is included with the XST, MAP, and PAR reports generated by the implementation tools.
Partitions must be set before the first synthesis of your design to ensure that the original synthesis output can be matched.
By placing a partition on a hierarchical boundary, you can guarantee that the interface on a partition boundary will not change. A timing-critical delay path can still cross a partition boundary; when this occurs, registering the output at the partition boundary is recommended.
Power Optimization
Show Slide 328:
Lessons
Overview
Options
Xplorer
Power Optimization
Summary
Show Slide 329:
Power Optimization
PAR has a power reduction option
Optimizes routing to reduce power consumption
Tries to reduce the overall routing used at the expense of timing and implementation time
XST has a power optimization switch
Helps map logic to block RAMs and DSP slices, which use less power (Virtex-4 and Virtex-5 FPGAs)
Key Points
The PAR power reduction switch will try to route a high-fanout net so that as much of the common wire is shared as possible.
Summary
Show Slide 330:
Lessons
Overview
Options
Xplorer
Power Optimization
Summary
Show Slide 331:
1) Under what conditions will timing-driven packing have the most impact on
design performance?
2) What is the trade-off when using PAR with the Extra Effort option?
3) How does Xplorer help to improve design performance?
Show Slide 332:
Summary
The timing closure flow still applies
Make certain that you have tried the options included in the timing closure flow diagram if you have timing problems
Xplorer has a number of MAP and PAR options it can run for you
SmartGuide and partitions enable you to save successful results and reduce your implementation time
Online help
Click the Help button in the Process Properties window
Development System Reference Guide: MAP and PAR chapters
Documentation can also be installed on your local machine
Application Notes
Help > Xilinx on the Web > Application Notes
Application Note XAPP918: Incremental Design Reuse and Partitions
Answers
1) Under what conditions will timing-driven packing have the most impact on design performance?
When unrelated logic is packed together into the same slice, which usually occurs with high device utilization (usually over 70%).
2) What is the trade-off when using PAR with the Extra Effort option?
PAR runtime can increase by a factor of two or more.
3) How does Xplorer help to improve design performance?
By automatically iterating through different implementation options.
Transition to Lab 6: Designing for Performance
Purpose

Utilize the Overall Effort Level, Timing-Driven Packing, and

Extra Effort Level implementation options to improve design
performance
Utilize Multi-Pass Place & Route (MPPR) to try and achieve

timing closure
Time
30 minutes
Process
This lab illustrates how to improve design performance and maximize results solely with advanced implementation options.
General Flow
Step 1: Implement with higher effort levels
Step 2: Implement with MPPR
Step 3: Analyze the MPPR timing
Lab
Refer to the separate lab workbook for the Designing for Performance lab.
Transition to Power Estimation
Power Estimation
Purpose

List the three phases of the design cycle where power calculations can be performed
Estimate power consumption by using the XPower Estimator spreadsheet
Estimate power consumption by using the XPower Analyzer software
Time
30 minutes
Process
This optional module describes the power estimation capabilities included with the ISE tools.
Lessons
Introduction
Overview
XPower Estimator
Summary
Introduction
Show Slide 333:
Power Estimation
Show Slide 334:
Objectives
List the three phases of the design cycle where power calculations can be
performed
Estimate power consumption by using the XPower Estimator spreadsheet
Estimate power consumption by using the XPower Analyzer software
Overview
Show Slide 335:
Lessons
Overview
XPower Estimator
Using the XPower Analyzer Software
Summary
Show Slide 336:
Power Consumption Overview
As devices become larger and faster, power consumption goes up
First-generation FPGAs had
Lower performance
Lower power requirements
No package power concerns
Today's FPGAs have
Much higher performance
Higher power requirements
Package power limit concerns
[Figure: real-world design power consumption vs. performance (MHz) for low-density and high-density devices, compared against the package power limit (PMAX)]
Key Points
The first generation of FPGAs was relatively small in size and slow in performance. Power consumption rarely exceeded the operating envelope of commonly available packages; however, given the density and performance levels of the new generation of FPGA devices, power consumption issues can no longer be ignored.
Selecting the correct package (in particular, one that can dissipate heat efficiently away from the silicon) is now also an important design issue.
The Virtex family of FPGAs has gone one step further by incorporating thermal management on chip, allowing active monitoring of the silicon temperature via dedicated pins.
Show Slide 337:
Power Consumption Concerns
High-speed and high-density designs require more power, leading to higher junction temperatures
Package thermal limits exist
125°C for plastic
150°C for ceramic
Power directly limits
System performance
Design density
Package options
Device reliability
Key Points
Junction temperature within an FPGA device is a function of the power consumption and the thermal resistance of the selected package.
Several factors determine power consumption within an FPGA, including supply voltage, system speed, and device utilization.
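The relationship between junction temperature, power, and package thermal resistance is usually written as Tj = Ta + P × θJA, where θJA (theta J-A) is the junction-to-ambient thermal resistance in °C/W. Rearranging it gives the power a package can dissipate before reaching a temperature limit. The Python sketch below applies both forms. It is a steady-state, first-order model that assumes θJA already reflects the airflow and heat sinking in use, and the 25°C ambient and 21°C/W values are illustrative assumptions.

```python
def junction_temp_c(power_w: float, ambient_c: float, theta_ja_c_per_w: float) -> float:
    """Steady-state junction temperature: Tj = Ta + P * theta_JA."""
    return ambient_c + power_w * theta_ja_c_per_w


def max_power_w(tj_max_c: float, ambient_c: float, theta_ja_c_per_w: float) -> float:
    """Power budget that keeps Tj at or below tj_max_c: P = (Tj_max - Ta) / theta_JA."""
    if theta_ja_c_per_w <= 0:
        raise ValueError("theta_JA must be positive")
    return max(0.0, (tj_max_c - ambient_c) / theta_ja_c_per_w)


# Illustrative numbers: 125 C plastic package limit, 25 C ambient, 21 C/W
print(f"{max_power_w(125.0, 25.0, 21.0):.2f} W")    # about 4.76 W
print(f"{junction_temp_c(0.206, 25.0, 21.0):.1f} C")  # about 29.3 C for 206 mW
```

In practice, designs are usually budgeted against the device's specified operating junction temperature rather than the package limit, so check the device data sheet for the value to use.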
As devices get bigger and faster, power consumption can become a limiting factor in determining device utilization and performance. For example, you may not be able to use all of the resources of a device or run the FPGA as fast as possible without risking reliability problems because of overheating.
Show Slide 338:
Estimating Power Consumption
Estimating power consumption is a complex calculation
Power consumption of an FPGA is almost exclusively dynamic
Power consumption depends on the design and is affected by
Output loading
System performance (switching frequency)
Design density (number of interconnects)
Design activity (percent of interconnects switching)
Logic block and interconnect structure
Supply voltage
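The factors in this list map onto the textbook first-order CMOS dynamic power model, P = ½ · α · C · V² · f summed over all switching nodes. Here α is transitions per clock cycle (design activity), C is switched capacitance (output loading and interconnect), V is the supply voltage, and f is the clock frequency; more nets means more terms in the sum (design density). The Python sketch below is an approximation under those assumptions, not the XPower algorithm, and the capacitance and activity values are made up.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Net:
    name: str
    capacitance_f: float  # switched capacitance in farads (interconnect + load)
    activity: float       # transitions per clock cycle; 1.0 means 100%


def dynamic_power_w(nets: Iterable[Net], vdd_v: float, f_clk_hz: float) -> float:
    """Sum 0.5 * a * C * V^2 * f over all nets.

    A full charge/discharge cycle of a node dissipates C * V^2, so a single
    transition costs 0.5 * C * V^2 on average.
    """
    return sum(0.5 * n.activity * n.capacitance_f * vdd_v ** 2 * f_clk_hz for n in nets)


# Made-up example: 1,000 nets of 1 pF each at 12.5% activity, 1.2 V supply
nets = [Net(f"n{i}", 1e-12, 0.125) for i in range(1000)]
p_100 = dynamic_power_w(nets, 1.2, 100e6)
p_200 = dynamic_power_w(nets, 1.2, 200e6)
print(f"{p_100 * 1e3:.1f} mW at 100 MHz, {p_200 * 1e3:.1f} mW at 200 MHz")
```

Doubling the switching frequency doubles the estimate, while doubling the supply voltage would quadruple it, because voltage enters the model squared.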
Show Slide 339:
Estimating Power Consumption
Power calculations can be performed at three distinct phases of the design cycle
Concept phase: A rough estimate of power can be calculated based on estimates of logic capacity and activity rates
Design phase: Power can be calculated more accurately based on detailed information about how the design is implemented in the FPGA
System integration phase: Power is measured in a lab environment
Use the XPower Estimator spreadsheet
Use the XPower Analyzer software
Use actual instrumentation
Accurate power calculation at an early stage in the design cycle will result
in fewer problems later
Key Points
Estimating power consumption usually has one of two goals: thermal reliability evaluation or power-supply sizing.
The XPower Analyzer software bridges the gap between the XPower Estimator and lab measurements by using the implemented design files to estimate power consumption more accurately.
Show Slide 340:
Activity Rates
Accurate activity rates (also known as toggle rates) are required for
meaningful power calculations
Clocks and input signals have an absolute frequency
Synchronous logic nets use a percentage activity rate
One hundred percent indicates that a net is expected to change state on every clock cycle
Allows you to adjust the primary clock frequency and see the effect on power consumption
Can be set globally to an average activity rate, or set on groups or individual nets
Logic elements also use a percentage activity rate
Based on the activity rate of output signals of the logic element
Logic elements have capacitance
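A percentage activity rate becomes a transition count only once it is tied to a clock: a synchronous net at A percent of a clock at f changes state about (A/100) × f times per second. The Python sketch below shows that conversion, together with a global default rate that individual nets can override. The net names, the 12.5% default, and the clock frequencies are hypothetical examples, not XPower settings.

```python
from typing import Dict, Iterable


def toggles_per_second(activity_pct: float, clock_hz: float) -> float:
    """Convert a percentage activity rate into transitions per second.

    100% means the net changes state on every clock cycle.
    """
    if activity_pct < 0:
        raise ValueError("activity rate cannot be negative")
    return activity_pct / 100.0 * clock_hz


def resolve_rates(nets: Iterable[str], default_pct: float,
                  overrides: Dict[str, float]) -> Dict[str, float]:
    """Apply a global default activity rate, then per-net overrides."""
    return {net: overrides.get(net, default_pct) for net in nets}


# Hypothetical nets: most use the default, two are set individually
rates = resolve_rates(["count_q0", "count_q7", "fifo_wr_en", "state_q"],
                      default_pct=12.5,
                      overrides={"count_q0": 100.0, "fifo_wr_en": 50.0})

for clock_hz in (100e6, 150e6):  # change the primary clock, keep the rates
    toggles = {net: toggles_per_second(pct, clock_hz) for net, pct in rates.items()}
    print(f"{clock_hz / 1e6:.0f} MHz:", toggles)
```

Because synchronous nets are specified relative to the clock, changing the primary clock frequency rescales every derived transition rate at once, which is what makes the what-if adjustment on the slide inexpensive.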
XPower Estimator
Show Slide 341:
Lessons
Overview
XPower Estimator
Using the XPower Analyzer Software
Summary
Show Slide 342:
XPower Estimator
www.xilinx.com/power
Excel spreadsheets with power estimation formulas built in
Enter design data in white boxes
Power estimates are shown in gray boxes
Sheets
Summary (device totals)
Logic and I/O
Block RAMs and FIFOs
DCMs and PLLs
DSP48
PPC and MGT
Key Points
XPower Estimators are a set of Excel spreadsheets that can be found on the Web at www.xilinx.com/power. The spreadsheets currently support Virtex-4, Virtex-5, and Spartan-3E FPGAs.
This Web page also provides access to power estimation tools for older Xilinx device families. Some tools are Web-based, and some are Excel spreadsheets using a different format than the XPower Estimators.
Show Slide 343:
Web Power Tool: Summary and Quiescent
Show Slide 344:
Web Power Tool: Logic, Memory, and DSP48
Show Slide 345:
Web Power Tool: DCM and I/O
Show Slide 346:
Lessons
Overview
XPower Estimator
Using the XPower Analyzer Software
Summary
Show Slide 347:
What Is the XPower Software?
A utility for estimating the power consumption and junction temperature of FPGA and CPLD devices
Reads an implemented design (NCD file) and timing constraint data
You supply activity rates
Clock frequencies
Activity rates for nets, logic elements, and output pins
Capacitive loading on output pins
Power supply data and ambient temperature
Detailed design activity data from simulation (VCD file)
The XPower tool calculates the total average power consumption and
generates a report
Key Points
The XPower tool is accurate to within +/-10 percent, given accurate activity rates. The XPower software can only calculate average power; it cannot predict transient power spikes.
Supported device families:
FPGAs: Virtex-5, Virtex-4, Spartan-3E, Virtex, Virtex-E, Virtex-II, Virtex-II Pro, Spartan-II, Spartan-IIE, and Spartan-3 devices
CPLDs: CoolRunner XPLA3, CoolRunner-II, and CoolRunner-IIS devices
A Value Change Dump (VCD) file is created by a simulation tool (Mentor Graphics ModelSim, for example). Defined by the Verilog standard (IEEE Std 1364), the VCD file contains information about signal or variable value changes. The XPower tool can read the VCD file to provide more accurate power estimation.
For the XPower tool to match instance and net names from the VCD file to items in the NCD file, the VCD file must come from a post-Place & Route simulation.
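To show how simulation data turns into activity rates, the Python sketch below counts 0-to-1 and 1-to-0 transitions per 1-bit signal in a small VCD dump and converts one count into a percentage of clock cycles. It handles only the subset of the VCD format shown (scalar signals; vector and real value changes are skipped), the sample dump is invented, and it does not describe how XPower itself reads VCD files.

```python
from collections import Counter
from typing import Dict

# Header sections whose contents are skipped up to their closing $end
SKIP_SECTIONS = {"$date", "$version", "$comment", "$timescale",
                 "$scope", "$upscope", "$enddefinitions"}


def count_toggles(vcd_text: str) -> Dict[str, int]:
    """Count 0<->1 transitions for each 1-bit signal in a simplified VCD dump."""
    tokens = vcd_text.split()
    names: Dict[str, str] = {}  # VCD identifier code -> signal name (1-bit only)
    last: Dict[str, str] = {}   # identifier code -> last value seen
    toggles: Counter = Counter()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "$var":
            # Layout: $var <type> <width> <id_code> <reference> ... $end
            width, code, name = tokens[i + 2], tokens[i + 3], tokens[i + 4]
            if width == "1":
                names[code] = name
            i = tokens.index("$end", i) + 1
        elif tok in SKIP_SECTIONS:
            i = tokens.index("$end", i) + 1
        elif tok.startswith("$") or tok.startswith("#"):
            i += 1  # $dumpvars / $end markers and #<time> stamps
        elif tok[0] in "bBrR":
            i += 2  # vector or real change "<value> <id_code>": ignored here
        else:
            value, code = tok[0], tok[1:]  # scalar change such as "1!"
            i += 1
            if code in names:
                prev = last.get(code)
                if prev in ("0", "1") and value in ("0", "1") and prev != value:
                    toggles[names[code]] += 1
                last[code] = value
    return dict(toggles)


def activity_pct(toggle_count: int, clock_cycles: int) -> float:
    """Percent activity: 100% means one transition per clock cycle."""
    return 100.0 * toggle_count / clock_cycles


SAMPLE_VCD = '''
$timescale 1ns $end
$scope module top $end
$var wire 1 ! clk $end
$var wire 1 " en $end
$upscope $end
$enddefinitions $end
#0
$dumpvars
0!
0"
$end
#5
1!
#10
0!
1"
#15
1!
#20
0!
'''

counts = count_toggles(SAMPLE_VCD)
print(counts)                         # {'clk': 4, 'en': 1}
print(activity_pct(counts["en"], 2))  # 50.0 (20 ns at a 10 ns clock period = 2 cycles)
```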
Show Slide 348:
Running XPower Software
Expand Implement Design > Place & Route
Double-click XPower Analyzer
to launch the XPower
tool in interactive mode
Use the Generate Power Data
process to create reports using
VCD files or TCL scripts
Show Slide 349:
XPower Software GUI: Summary
Key Points
To launch the XPower tool from the Project Navigator in the ISE software, expand the Implement Design process, expand Place & Route, and double-click XPower Analyzer.
The upper-left portion of the GUI contains the Summary Bar, which displays the estimated junction temperature, quiescent power, and dynamic power.
On the left is the View window, which allows you to browse the thermal information, power supply information, and settings for your design.
The Thermal Information window allows you to modify the airflow, the ambient temperature, and the theta-J-A of your device package. This is useful for what-if analysis. The Voltage Source Information window allows you to customize the internal voltage for your component. The Settings window allows you to modify your average toggle rates.
The By Type option allows you to categorize your design's power consumption by the device resources your design uses. This allows you to segment your power consumption by signals, IO, or hierarchy.
The Main View window on the right displays power calculation data for the currently selected data view. Reports are also displayed in this window.
At the bottom is the History window, which displays text messages.
Show Slide 350:
XPower Software Options
Settings
Report type
Default activity rate
Show Slide 351:
XPower Software Summary Report
To obtain this summary report, select Tools > Generate Summary Report or click the icon in the horizontal toolbar
Power summary                          | I(mA) | P(mW)
---------------------------------------------------------------
Total estimated power consumption      |       |   206
---------------------------------------------------------------
Total Vccint 1.20V                     |    69 |    83
Total Vccaux 2.50V                     |    45 |   113
Total Vcco33 3.30V                     |     3 |    10
---------------------------------------------------------------
Inputs                                 |     0 |     0
Outputs                                |       |
  Vcco33                               |     0 |     0
Signals                                |     0 |     0
---------------------------------------------------------------
Quiescent Vccint 1.20V                 |    69 |    83
Quiescent Vccaux 2.50V                 |    45 |   113
Quiescent Vcco33 3.30V                 |     3 |    10

Thermal summary
---------------------------------------------------------------
Estimated junction temperature         | 29°C
Ambient temp                           | 25°C
Case temp                              | 28°C
Theta J-A                              | 21°C/W
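The report's numbers can be cross-checked by hand: each rail's power is its voltage times its current, the total is the sum of the rails, and the junction temperature follows from Tj = Ta + P × θJA. The Python sketch below redoes that arithmetic with values copied from the report. The +/-10 percent band simply applies the accuracy figure quoted for the XPower tool in this module and is only a rough illustration of it.

```python
# Supply rails from the summary report: (name, volts, current in mA)
RAILS = [("Vccint", 1.20, 69), ("Vccaux", 2.50, 45), ("Vcco33", 3.30, 3)]
REPORTED_TOTAL_MW = 206   # "Total estimated power consumption"
AMBIENT_C = 25.0          # "Ambient temp"
THETA_JA_C_PER_W = 21.0   # "Theta J-A"

# P = V * I; volts times milliamps gives milliwatts
per_rail_mw = {name: volts * milliamps for name, volts, milliamps in RAILS}
recomputed_total_mw = sum(per_rail_mw.values())  # about 205.2 mW

tj_c = AMBIENT_C + REPORTED_TOTAL_MW / 1000.0 * THETA_JA_C_PER_W  # about 29.3 C
low_mw, high_mw = 0.9 * REPORTED_TOTAL_MW, 1.1 * REPORTED_TOTAL_MW

for name, p in per_rail_mw.items():
    print(f"{name}: {p:.1f} mW")
print(f"recomputed total {recomputed_total_mw:.1f} mW vs. reported {REPORTED_TOTAL_MW} mW")
print(f"Tj about {tj_c:.1f} C; +/-10% band {low_mw:.1f} to {high_mw:.1f} mW")
```

The reported 206 mW equals the sum of the rounded rail values (83 + 113 + 10), while the unrounded products sum to 205.2 mW, presumably because the listed currents are themselves rounded. The 29°C junction estimate agrees with 25°C + 0.206 W × 21°C/W.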
Summary
Show Slide 352:
Lessons
Overview
XPower Estimator
Using the XPower Analyzer Software
Summary
Show Slide 353:
1) Compare the total estimated power reported by the XPower Analyzer software and the XPower Estimator tool. Are they close to one another?
2) Power estimations are typically made during which three phases of the
design cycle?
3) What methods can be used to enter activity rates into the XPower
Analyzer software?
Show Slide 354:
Summary
Power calculations can be performed at three distinct phases of the design cycle
Concept phase (XPower Estimator spreadsheet)
Design phase (XPower Analyzer software)
System integration phase (lab measurements)
Accurate power calculation at an early stage in the design cycle will result in fewer problems later
The XPower Analyzer software is a utility for estimating the power consumption and the junction temperature of FPGA and CPLD devices
The XPower Analyzer software uses activity rates to calculate total average power consumption
XPower Analyzer help
Help > Help Topics
Documentation can also be installed on your local machine
XPower Estimator spreadsheets, Application Notes, XPower FAQ
Help > Xilinx On the Web > Xilinx Power Tools Web Page
IC Packaging recorded e-learning module
www.xilinx.com/support/training/rel/packaging.htm
Answers
1) Compare the total estimated power reported by the XPower Analyzer software and the XPower Estimator tool. Are they close to one another?
Yes
2) Power estimations are typically made during which three phases of the design cycle?
Concept phase: A rough estimate based on estimated logic capacity and activity rates
Design phase: A more accurate estimate based on information about how the design is implemented in the FPGA
System integration phase: Actual power usage is measured in a lab environment
3) What methods can be used to enter activity rates into the XPower Analyzer software?
Load a VCD file
Manually enter activity rates
Specify default activity rates
Transition to Lab 7: FPGA Editor Demo
Purpose
After participating in this demonstration, you will be able to:
Locate logic and nets in the FPGA Editor
View the contents of slices and Input/Output Blocks (IOBs)
Add a probe
Time
30 minutes
Process
This optional demonstration illustrates how to locate logic, view the contents of an FPGA design, and insert a probe with the FPGA Editor.
General Flow
Step 1: Open the design in the FPGA Editor
Step 2: View the slice and IOB contents
Step 3: Add a probe
Lab
Refer to the separate lab workbook for the FPGA Editor demo/lab.
Transition to ChipScope Pro Software
Purpose
Describe the value of the ChipScope Pro software
Describe how the ChipScope Pro software works
List what cores are available
Use the Core Generator and Core Inserter tools
Plan for and perform debugging with the ChipScope Pro software
Time
30 minutes
Process
This optional module describes how to use the Core Inserter and Core Generator tool flows and plan for debugging with the ChipScope Pro software.
Lessons
Introduction
Importance of Debug
Design Flows
Summary
Introduction
Show Slide 355:
Show Slide 356:
ChipScope Pro Software Lab Logistics
To participate in the lab you must have the following
Spartan-3E FPGA starter kit
ChipScope Pro software 10.1 installed
ISE 10.1 software installed
Spartan-3E FPGA starter board, power supply, and configuration cable
Take out and set up your Spartan-3E FPGA board
Verify that you have power
SPARTAN-3E STARTER KIT and www.xilinx.com/s3estarter should scroll across the LCD
Show Slide 357:
Objectives
Describe the value of the ChipScope Pro software
Describe how the ChipScope Pro software works
List what cores are available
Use the Core Generator and Core Inserter tools
Plan for and perform debugging with the ChipScope Pro software
Importance of Debug
Show Slide 358:
Lessons
Importance of Debug
Design Flows
Summary
Show Slide 359:
What Engineers Are Saying
FPGA designs are getting more complex
Designs are getting faster
Design times are getting shorter
Debug and verification are more challenging
Debug and verification consume a significant portion* of FPGA design time
Debug and verification need to be easier and integrated into the FPGA design flow
*An FPGA design survey conducted by Xilinx indicates that FPGA debug and verification account for nearly half of FPGA design time
Show Slide 360:
Logic of Debug
Engineers are trained to solve problems
Debug is problem solving
Break a problem into basic parts
Remove or reduce variables and variation
Predict and verify
Debug is an iterative process
Verification is a component of debug
Confirming no problems remain
Reconfigurable nature of FPGAs enables an iterative debug process
[Figure: iterative debug cycle: Create Design, Probe Design, Analyze Debug Data, Identify Fix, Modify Design, Verify Design]
Show Slide 361:
Xilinx ChipScope Pro Software Dramatically Shortens Debug and Verification
Works the way you solve problems
Breaks a problem into basic parts
Removes variation introduced by external debug solutions
Enables a very fast, iterative process of prediction and verification
Provides what you have requested
Reduction of debug and verification time
A powerful tool that is easy to use
Focus on solving the problem, not on learning the tool
Integrated part of the Xilinx FPGA design flow
[Figure: chart of design time (Design Specification, Design Implementation, On-Chip Verification and Debug Tool Time at 40% of design time, ChipScope Pro at 20%, Final Device), with the callout "Shrink overall design time by 25%"]
Show Slide 362:
Lessons
Importance of Debug
Design Flows
Summary
Show Slide 363:
What Is the ChipScope Pro Software?
Tailored debug and verification cores
Efficient core generation and insertion tools
Total control via JTAG
Show Slide 364:
Multiple Debug Cores to Address Different Debug Challenges
Integrated Logic Analyzer (ILA) Core
Access internal nodes and signals
Debug and verify signal behavior
Define detailed trigger conditions
Virtual Input/Output (VIO) Core
Virtual inputs and outputs
Stimulate logic with pulse trains
Integrated Bus Analyzer (IBA) Core
IBA/PLBv46-specific bus analysis core integrated with EDK
IBA/OPB and IBA/PLB still supported
Protocol detection
Debug and verify control, address, and data buses
Agilent Trace Core 2 (ATC2)
Agilent-created core enabling on-chip debug of Xilinx FPGAs via Agilent FPGA Dynamic Probing
View cores as virtual test headers placed anywhere in the design
[Figure: example embedded system (PLB bus, OPB bus, bridge, arbiter, OPB GPIO, OPB SDRAM, Aurora, and user logic) with debug cores attached]
Show Slide 365:
Core Resources
ChipScope Pro software cores use FPGA resources
For what?
Block RAM: trigger and data storage
Slice logic: trigger comparisons
You must leave room for the ChipScope Pro software cores in the FPGA
This may require using a larger part in the same package as you will use in production
ChipScope Pro software 10.1 includes a built-in resource estimator
Core Resources
Block RAMs | Depth 256 | Depth 512 | Depth 1024 | Depth 2048 | Depth 4096
-----------+-----------+-----------+------------+------------+-----------
1          |    15     |           |            |            |
2          |    31     |    15     |            |            |
4          |    63     |    31     |     15     |            |
8          |   127     |    63     |     31     |     15     |
16         |   255     |   127     |     63     |     31     |     15
32         |           |   255     |    127     |     63     |     31
64         |           |           |    255     |    127     |     63
128        |           |           |            |    255     |    127
256        |           |           |            |            |    255
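The cells in this table follow a regular pattern: each value equals 4096 × (number of block RAMs) ÷ depth − 1, and only values from 15 to 255 are shown. The Python sketch below reproduces the table from that pattern and looks up the smallest block RAM count whose value at a given depth meets a requirement. The formula is inferred from the table itself, not taken from ChipScope Pro documentation, so treat it as a planning aid and confirm the result with the tool's built-in resource estimator.

```python
from typing import Optional

DEPTHS = (256, 512, 1024, 2048, 4096)
BLOCK_RAM_COUNTS = (1, 2, 4, 8, 16, 32, 64, 128, 256)


def table_value(block_rams: int, depth: int) -> Optional[int]:
    """One cell: 4096 * block_rams / depth - 1, or None if the table leaves it blank."""
    value = 4096 * block_rams // depth - 1
    return value if 15 <= value <= 255 else None


def min_block_rams(value_needed: int, depth: int) -> int:
    """Smallest block RAM count whose cell at this depth is >= value_needed."""
    for n in BLOCK_RAM_COUNTS:
        v = table_value(n, depth)
        if v is not None and v >= value_needed:
            return n
    raise ValueError("requirement is outside the range covered by the table")


# Print the table in the same layout (blank cells shown as '-')
print("BRAMs " + " ".join(f"{d:>6}" for d in DEPTHS))
for n in BLOCK_RAM_COUNTS:
    cells = (table_value(n, d) for d in DEPTHS)
    print(f"{n:>5} " + " ".join(f"{'-' if c is None else c:>6}" for c in cells))

print(min_block_rams(100, 1024))  # 32
```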
Show Slide 366:
Using ChipScope Pro Software
Generate the ChipScope Pro software cores by using the ChipScope Pro Core Generator or Core Inserter tools
Place ChipScope Pro software cores into the design
Attach internal nodes for viewing to the ChipScope Pro software core
Place and route the design with the Xilinx ISE implementation tools
Download the bitstream to the device under test and analyze the design with the ChipScope Pro software
[Figure: two flows. Core Generator flow: ChipScope Pro Core Generator > Instantiate Cores into Source HDL > Connect Internal Signals to Core (in Source HDL) > Synthesize > Implement > Download and Debug Using ChipScope Pro Software. Core Inserter flow: Synthesize > ChipScope Pro Core Inserter (into netlist) > Implement > Download and Debug Using ChipScope Pro Software.]
Show Slide 367:

ICON Core
ICON (Integrated Control) core: This core controls up to 15 capture cores
The ICON core interfaces between the JTAG interface and the capture
cores
Capture cores: customizable cores for creating triggers and data storage
Customizable number, width, and storage of trigger ports
ILA (Integrated Logic Analyzer) core: capture core for HDL designs
ILA/ATC (Integrated Logic Analyzer with Agilent Trace) core: similar to the ILA
core, except data is captured off-chip by the Agilent Trace Port Analyzer
IBA/OPB (Integrated Bus Analyzer for CoreConnect On-Chip Peripheral Bus)
core: capture core for debugging CoreConnect OPB buses
IBA/PLB (Integrated Bus Analyzer for CoreConnect Processor Local Bus)
core: similar to the IBA/OPB core, except for the PLB bus
IBA/PLBv46 supported through EDK
VIO (Virtual Input/Output) core: define and generate virtual I/O ports
Show Slide 368:
ChipScope Pro Software ILA Core
User-selectable, one to four trigger ports
Up to 256 channels per trigger port
Multiple match units on the same trigger port
Up to 16 match units
Trigger condition sequencer
For example, 4 trigger ports with 4 match units each = 16 match conditions
Defines complex trigger sequences that include up to 16 states or levels
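A trigger condition sequencer can be pictured as a small state machine: each level waits for its own match condition, and the trigger fires only after every level has matched in order. The Python sketch below models that behavior in software over a list of captured samples. The signal names, the conditions, and the rule of advancing at most one level per sample are assumptions for illustration; this is not how the ILA core is implemented in hardware.

```python
from typing import Callable, Dict, Iterable, List, Optional

Sample = Dict[str, int]               # one clock cycle of probed signals: name -> value
Condition = Callable[[Sample], bool]  # one match-unit-style comparison

MAX_LEVELS = 16  # the slide quotes up to 16 sequencer states or levels


def trigger_index(samples: Iterable[Sample], levels: List[Condition]) -> Optional[int]:
    """Return the index of the sample that satisfies the last level, or None.

    The sequencer starts at level 0 and advances one level each time the current
    level's condition matches; the trigger fires when the final level matches.
    """
    if not 1 <= len(levels) <= MAX_LEVELS:
        raise ValueError("the sequencer supports 1 to 16 levels")
    level = 0
    for idx, sample in enumerate(samples):
        if levels[level](sample):
            level += 1
            if level == len(levels):
                return idx
    return None


# Hypothetical probes: address 0x10 appears, followed later by data 0xFF
samples = [
    {"addr": 0x00, "data": 0x00},
    {"addr": 0x10, "data": 0x00},  # level 0 matches here
    {"addr": 0x11, "data": 0x12},
    {"addr": 0x12, "data": 0xFF},  # level 1 matches here: trigger
]
levels = [lambda s: s["addr"] == 0x10, lambda s: s["data"] == 0xFF]
print(trigger_index(samples, levels))  # 3
```

Ordering is what separates a sequence from a simple combination of match units: if data 0xFF appeared before address 0x10, this model would not trigger.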
Show Slide 369:
Things to Know About ILA Cores
Integrated Logic Analyzer (ILA) cores can be added with either the Core Generator or Core Inserter tools
A design can contain up to 16 ILA cores
Maximum speed of the ILA core (Slowest, Middle, and Fastest are device speed grades)
Device     | 7.1.01i (H.39)              | 6.3.03i (G.38)
           | Slowest | Middle  | Fastest | Slowest | Middle  | Fastest
-----------+---------+---------+---------+---------+---------+--------
2v1000 (a) | 176 MHz | 202 MHz | 240 MHz | 232 MHz | 267 MHz | 310 MHz
2vp7 (a)   | 247 MHz | 276 MHz | 311 MHz | 267 MHz | 307 MHz | 343 MHz
3s400 (b)  | 155 MHz | 177 MHz | N/A     | 163 MHz | 187 MHz | N/A
3s500e     | 152 MHz | 177 MHz | N/A     | 154 MHz | 177 MHz | N/A
4vlx25 (c) | 275 MHz | 322 MHz | 374 MHz | 246 MHz | 289 MHz | N/A
a) Performance degradation due to non-optimal path chosen by ISE software tools (Map CR205561)
b) Performance degradation due to new Spartan-3 FPGA speed files and minor path routing differences
c) Performance improvement due to new Virtex-4 FPGA speed files (including new -12 speed grade)
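The footnotes can be checked against the table by computing the percent change from 6.3.03i to 7.1.01i for each device and speed grade. The Python sketch below does that with values copied from the table; N/A cells are skipped.

```python
from typing import Dict, Optional, Tuple

Grades = Tuple[Optional[int], Optional[int], Optional[int]]  # slowest, middle, fastest (MHz)

# Maximum ILA core speed from the table; None marks an N/A cell
FMAX_7_1: Dict[str, Grades] = {
    "2v1000": (176, 202, 240), "2vp7": (247, 276, 311), "3s400": (155, 177, None),
    "3s500e": (152, 177, None), "4vlx25": (275, 322, 374),
}
FMAX_6_3: Dict[str, Grades] = {
    "2v1000": (232, 267, 310), "2vp7": (267, 307, 343), "3s400": (163, 187, None),
    "3s500e": (154, 177, None), "4vlx25": (246, 289, None),
}


def pct_change(old: Optional[int], new: Optional[int]) -> Optional[float]:
    """Percent change from old to new, or None if either cell is N/A."""
    if old is None or new is None:
        return None
    return 100.0 * (new - old) / old


for device, new_grades in FMAX_7_1.items():
    changes = [pct_change(old, new) for old, new in zip(FMAX_6_3[device], new_grades)]
    print(device, ["N/A" if c is None else f"{c:+.1f}%" for c in changes])
```

The 2v1000 and 2vp7 rows drop by roughly 7 to 24 percent (footnote a), the 3s400 row drops by about 5 percent (footnote b), and the 4vlx25 row rises by about 12 percent (footnote c).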
Show Slide 370:
ChipScope Pro Software VIO Core
Insert virtual pins into your design
Input or output
Synchronous or asynchronous
Up to 256 bits each
System clock or JTAG clock
Inputs are virtual LEDs
Outputs are virtual DIP switches
Different refresh rates are available
Force value or pulse train into the FPGA
Show Slide 371:
Things to Know About VIO Cores
Can only be added with the ChipScope Pro Core Generator tool
Uses no block RAM, only logic
Inputs are like LEDs, for examining signals
Outputs are switches or pushbuttons, for driving signals
Design Flows
Show Slide 372:
Lessons
Importance of Debug
Design Flows
Summary
Show Slide 373:
Core Inserter Flow
Core Inserter inserts cores directly into the netlist
HDL code is untouched
Only post-synthesis nodes are available
Bypass this tool to remove cores
Inserter must perform the first portion of translate
Core generation and insertion are done together
ChipScope Pro Core Inserter tool is run from within Project Navigator
[Figure: Core Generator and Core Inserter flow diagram; Core Inserter path: Synthesize > ChipScope Pro Core Inserter (into netlist) > Implement > Download and Debug Using ChipScope Pro Software]
Show Slide 374:
Core Generator Flow
Generate cores that are instantiated directly into the HDL
Allows access to all HDL nodes
Requires changes to the code
Must comment out cores to remove them
Uses standard implementation flow
Core generation and insertion done separately
[Figure: Core Generator and Core Inserter flow diagram; Core Generator path: ChipScope Pro Core Generator > Instantiate Cores into Source HDL > Connect Internal Signals to Core (in Source HDL) > Synthesize > Implement > Download and Debug Using ChipScope Pro Software]
Summary
Show Slide 375:
Lessons
Importance of Debug
Design Flows
Summary
Show Slide 376:
Summary
Shorten debug time by up to 50 percent
Break the problem into manageable parts
ChipScope Pro software enables rapid iteration
Add ChipScope Pro software cores at any time
Specialized cores allow you to focus on solving problems
Debug in three simple steps
ILA for viewing results
VIO for driving changes
Minimal impact to FPGA design
Design at system speed
Optimized cores consume minimal FPGA resources
www.xilinx.com/chipscopepro
View recorded ChipScope Pro software
product demos
Access a 60-day free evaluation version
of the ChipScope Pro tools
Access ChipScope Pro software
documentation (user guide, at-a-glance summary of
features)
Obtain information on Agilent FPGA Dynamic Probe
technology (combine on-chip debug with the power of a
logic analyzer)
Transition to Lab 8: ChipScope Pro Software
Purpose
Use the Core Inserter tool to add ChipScope Pro software cores to an existing design
Use the ChipScope Pro Analyzer tool to configure an FPGA, set trigger conditions, and analyze and debug a design
Time
60 minutes
Process
This optional lab illustrates how to use the ChipScope Pro software to add an ILA core and prepare for debugging.
General Flow
Step 1: Download the non-working design
Step 2: Create ChipScope Pro software cores
Step 3: Debug the design
Step 4: Examine resource utilization
Lab
Refer to the separate lab workbook for the ChipScope Pro Software lab.
Transition to Course Summary
Course Summary
Purpose
This module reviews day two of the course and provides a summary of the course.
Time
10 minutes
Process
This module reviews day two of the course and provides a summary of the course.
Lessons
Course Summary
Show Slide 377:

Course Summary
Show Slide 378:
Day Two Review
How can you use the Timing Analyzer to improve design performance?
How do path-specific timing constraints help you to meet your performance objectives?
What advanced software settings can you use to increase performance?
Show Slide 379:
Day Two Review Answers
How can you use the Timing Analyzer to improve design performance?
Use the detailed path descriptions to find the root cause of timing errors
Cross-probe to the Floorplan Editor to view the placement of logic
How do path-specific timing constraints help you to meet your performance objectives?
Multicycle and false paths provide the Xilinx implementation tools greater
flexibility in meeting your timing objectives
Path-specific (critical paths) constraints have a higher priority in the
implementation tools
Show Slide 380:
Day Two Review Answers
What advanced software settings can you use to increase performance?
MAP: Timing-driven packing
PAR: Extra effort level
Xplorer
Show Slide 381:
Day One Summary
A flow for achieving timing closure was presented
The Virtex-5 FPGA architecture has many dedicated resources that can improve performance and lower power
The DCM and PLL have many features that can increase design performance
There are many clock features available for high-speed design
You can increase design performance by duplicating flip-flops, pipelining,
and using I/O flip-flops
Synthesis tools have many different options to improve synthesis results
CORE Generator software system cores can be used to take full
advantage of the Xilinx FPGA architecture
Show Slide 382:
Day Two Summary
Timing reports are used to identify critical paths and analyze the cause of
timing failures
Multicycle, false path, and critical path timing constraints can be easily
specified via the Advanced tab in the Xilinx Constraints Editor
Advanced implementation options, such as timing-driven packing, extra
effort level, and Xplorer can help increase performance