Digital Design

Modern Digital Design Flow
• Agenda
1. History of Digital Design Approach

2. HDLs
3. Design Abstraction
4. Modern Design Steps
5. Implementation Options (FPGAs)
History
• In the beginning…
1970's
- designers used Paper/Pencil & Boolean Equations to create schematics
- the drawback :
- each flop required a Boolean equation
- impractical in large designs
1980's
- schematic based designs using electronic editors
- this enabled Copy/Paste & Hierarchy
- Design-reuse was enabled which increased design sizes
mid 80's
- HDL's became more common (created mid 80's)
- Text-based Compilers (C, PASCAL) could be adapted to perform digital simulation
- Larger Designs could be described using text
(Figure: Design and Simulation are now text-based, but Physical Implementation is still a separate step)
• More recently
1990's
- Synthesis became practical due to increase in computational power of computers
Synthesis - the creation of circuitry from a functional description
ex) "Functional Description of MUX"
if (Sel = 0)
Out = A
else
Out = B
(Figure: Synthesis converts this description into a 2-to-1 MUX with inputs A and B, select Sel, and output Out)
HDL
• Real Power
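1990's - Now engineers had a powerful combination
"HDL"
if (Sel = 0)
Out = A
else
Out = B
(Figure: the same HDL description feeds both "Simulation" and "Synthesis"; synthesis produces the MUX with inputs A and B, select Sel, and output Out)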
• Abstraction
Engineers could now stay at a higher level of abstraction and rely on the tools to
1) Simulate the design
2) Synthesize the circuitry
- This allows larger systems to be described/designed in the same time
- Since HW is expensive to build, using the tools to reduce prototyping was the next step
• Timing Verification
- Let the tool "Verify" timing
- Less time spent prepping design for a prototyping run
(Figure: flow from HDL to Functional Simulation and Synthesis, then Technology Mapping, Place/Route (extract RC's), and Post Implementation Simulation; if the results match, go to Fab)
Hardware Description Languages vs. Programming Languages
• Program structure
– instantiation of multiple components of the same type
– specify interconnections between modules via schematic
– hierarchy of modules (only leaves can be HDL in Xilinx Foundation)
• Assignment
– continuous assignment (logic always computes)
– propagation delay (computation takes time)
– timing of signals is important (when does computation have its effect)
• Data structures
– size explicitly spelled out - no dynamic structures
– no pointers
• Parallelism
– hardware is naturally parallel (must support multiple threads)
– assignments can occur in parallel (not just sequentially)
Hardware Description Languages and Combinational Logic
• Modules - specification of inputs, outputs, bidirectional, and internal signals
• Continuous assignment - a gate's output is a function of its inputs at all times (doesn't need to wait to be "called")
• Propagation delay - concept of time and delay in input affecting gate output
• Composition - connecting modules together with wires
• Hierarchy - modules encapsulate functional blocks
• Specification of don't care conditions (accomplished by setting output to "x")
Hardware Description Languages and Sequential Logic
• Flip-flops
– representation of clocks - timing of state changes
– asynchronous vs. synchronous
• FSMs
– structural view (FFs separate from combinational logic)
– behavioral view (synthesis of sequencers)
• Data-paths = ALUs + registers (e.g. Combination Lock)
– use of arithmetic/logical operators
– control of storage elements
• Parallelism
– multiple state machines running in parallel
• Sequential don't cares
Design Abstraction
• At What level can we design?
• What does abstraction give us?
- The higher in abstraction we go, the more complex & larger the system becomes
- But, we let go of the details of how it performs (speed, fine tuning)
- There are engineering jobs at each level
- Gurus can span multiple levels
• What does VHDL model?
- System : Chip : Register : Gate
- VHDL lets us describe systems in two ways:
1) Structural (text netlist)
2) Behavioral (requires synthesis)
VHDL/Verilog: Structure/Behavior
• Supports structural and behavioral descriptions
• Structural
– explicit structure of the circuit
– e.g., each logic gate instantiated and connected to others
• Behavioral
– program describes input/output behavior of circuit
– many structural implementations could have same behavior
– e.g., different implementation of one Boolean function
• We’ll only be using behavioral VHDL/Verilog in our design work
– rely on schematic when we want structural descriptions
Modern Digital Design Flow
• Designing Large Digital Circuits
- this is the ideal process
Digital Design Flow
• Designing Large Digital Circuits
- this is reality
• A More Detailed Breakdown, and Its Relation to Our Class
- HW or Lab Assignment
- Write VHDL, Simulate with ModelSim
- Synthesize in Quartus, Run Timing Simulation
- Place/Route on FPGA, Download, Test
- Take idea, create custom HW to reduce cost, start your own company, sell and become rich
Hardware Design Flow
Digital Implementation
• What options do we have for hardware implementation?
- Discrete Devices (i.e., go to the stock room and buy NAND gates & Flip-flops)
- ASICs (Application Specific Integrated Circuits, i.e., custom silicon)
- Programmable Logic (CPLDs, FPGAs)
• FPGAs have become one of the most popular technologies recently
- We’ll use an FPGA in this class to test our designs
- We’ll use the ModelSim simulator for functional simulation
- We’ll use the Altera Quartus II design software for synthesis, place/route, and post-synthesis verification.
- We’ll use an Altera Cyclone II FPGA on a DE2 evaluation board to test our designs in hardware.
FPGA's
• What is an FPGA
Field Programmable Gate Array
• An FPGA uses Re-configurable Logic Blocks
- we set the config bits of this block to set its Boolean logic function
- the configuration is a Truth Table (or Look Up Table) of functionality
(Figure: a logic block with inputs In1 and In2, a "config" input, and output Out)
config   Out
000      NOT(In1)
001      NOT(In2)
010      OR
011      NOR
100      AND
101      NAND
110      XOR
111      XNOR
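The configuration table can be sketched as a behavioral VHDL model. This is only an illustration of the idea, not how a vendor builds a LUT: the entity name config_block and the port name Out1 are assumptions (Out is a reserved word in VHDL), and the STD_LOGIC_1164 package is assumed for the signal types.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical model of the re-configurable logic block:
-- the 3-bit "config" input selects which Boolean function is applied.
entity config_block is
  port (In1, In2 : in  STD_LOGIC;
        config   : in  STD_LOGIC_VECTOR (2 downto 0);
        Out1     : out STD_LOGIC);
end entity config_block;

architecture config_block_arch of config_block is
begin
  with config select
    Out1 <= not In1      when "000",
            not In2      when "001",
            In1 or In2   when "010",
            In1 nor In2  when "011",
            In1 and In2  when "100",
            In1 nand In2 when "101",
            In1 xor In2  when "110",
            In1 xnor In2 when "111",
            'X'          when others;  -- config contains a non-0/1 value
end architecture config_block_arch;
```

In a real FPGA the config bits are fixed when the device is programmed; here config is left as an ordinary input so that each row of the table can be tried in simulation.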
• LUTs = Look Up Tables
- we can program the LUTs to be whatever type of gate is needed by the design
- there are a finite number of LUTs within a given FPGA (also called "resources")
• The LUTs are configured into an ARRAY on the silicon
- Array of LUT's = Array of Gates = Gate Array

(Figure: a 3 x 3 array of LUT blocks, each with inputs In1 and In2, a config input, and output Out)
• Programmable Interconnect
- there are programmable interconnect switches that connect the LUTs
LUT X LUT X LUT
X X X X X
LUT X LUT X LUT
X X X X X
LUT X LUT X LUT
• Configuration
- We start with a Gate Level Schematic of our design (from synthesis)

- The FPGA LUTs are configured to implement Gates
LUT X LUT X LUT
X X X X X
LUT X LUT X LUT
X X X X X
LUT X LUT X LUT
• Configuration
- The interconnect switches are then programmed to implement the net connections
A INV X AND X LUT
B X X X X X Out
C INV X OR X LUT
X X X X X
LUT X LUT X LUT
• Configuration
- The LUT and Interconnect configuration is volatile (i.e., it goes away when power is removed)
- Since the programming is done by the user after fabrication, we call it "Field Programmable"
A INV X AND X LUT
B X X X X X Out
C INV X OR X LUT
X X X X X
LUT X LUT X LUT
- We now understand where the name “Field Programmable Gate Array” comes from.
• Adding More Functionality
- FPGA manufacturers quickly learned that Flip-Flops would be useful
- They put a DFF next to a 4-Input LUT to form a "Configurable Logic Block" (CLB)
CLB X CLB
X X X
CLB X CLB
• Adding Even More Functionality
- To improve performance, common logic functions were "hard coded" on the silicon
- Block RAM
- Adders / Multipliers
- Global Clock Buffers
- even Microprocessors!
• What else can we program?
- Which Pins to use on the package
- What logic levels
- CMOS_33, CMOS25
- SSTL, SSTL2, etc…
VHDL
• Agenda
1. Hardware Description Languages

2. VHDL History
3. VHDL Systems and Signals
4. VHDL Entities, Architectures, and Packages
5. VHDL Data Types
6. VHDL Operators
7. VHDL Structural Design
8. VHDL Behavioral Design
9. VHDL Test Benches
VHDL History
• VHDL
V = Very High Speed Integrated Circuit

H = Hardware
D = Description
L = Language
- Originally a Department of Defense sponsored project in the 80's
- Original Intent was to Document Behavior (instead of writing system manuals)
- Original Intent was NOT synthesis, that came later
- Simulation was a given, since the designs were already in text and we had text compilers (C, ….)
- Designed by IBM, TI, Intermetrics (all sponsored by DoD)
• VHDL & IEEE
- In 1987, IEEE published the "VHDL Standard"
- IEEE 1076-1987 = First formal version of VHDL

- Strong "Data Typing"
- each signal/variable is typed (bit, bit_vector, real, integer)
- assignments between different types NOT allowed
- Did not handle multi-valued logic
• VHDL & IEEE
- What is multi-valued logic?
- when there are more possible values than 0 and 1
- we need this for real world systems such as buses
- a bus is where multiple circuits drive and receive information

- only one agent drives the bus (low impedance)
- all other agents listen (high impedance)
- how can something drive AND receive?
- a "transceiver" has both a transmit (i.e., a gate facing out) and receive (i.e., a gate facing in)
- we can draw it as follows:

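(Figure: a transceiver drawn as a transmit gate facing out and a receive gate facing in, with a Tx/Rx' control)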
• VHDL & IEEE
- What is multi-valued logic?
- but that circuit doesn't actually work, because the driving gate will always be driving
- in reality it looks like this:
(Figure: the driving gate is a tri-state buffer enabled by Tx/Rx')
- what does this look like when it is "OFF"? High Impedance
• VHDL & IEEE
- High Impedance
(Figure: several transceivers, each with its own Tx/Rx control, sharing one bus)
- this is how real circuits behave: a strong driver controls the bus when everyone else is High-Z
- When nobody is driving the bus, the bus is High-Z
- So for true behavior, VHDL has to model High-Z
- VHDL's built in types (bit and bit_vector) can only be 0 or 1, these don't cut it.
- Weak/Strong
- Some busses have multiple drivers, but some are weaker than others (e.g., MCAN)
- We should model these too
• VHDL & IEEE
- VHDL allows users to come up with their own data types. Since the world needed multi-valued logic,
everyone started creating their own add-on packages.
- this created a lot of confusion when multiple vendors worked together (i.e., Fab Shop and Designer)
- In 1993, IEEE published an Upgrade
- IEEE 1164 - added support for Multi-Valued Logic through the "STD_LOGIC" package
- better syntax consistency
- Every time there is a need for a data type, industry will start to create add-ons. Then IEEE
will create a standard to reduce confusion
- Other package standards that were added to VHDL
- 1076.2 = "Real and Complex Data Types"

- 1076.3 = "Signed and Unsigned Data Types"
- The 1993 revision (IEEE 1076-1993) is considered by most to be the last major release of that era; IEEE 1076-2002 was a minor revision
- The effort discussed as "VHDL 2006" (later "VHDL 200x") was eventually published as IEEE 1076-2008
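To make the High-Z discussion concrete, here is a minimal sketch of two drivers sharing one bus line, using the STD_LOGIC type from the IEEE 1164 package mentioned above. The entity and signal names (bus_demo, DataA, EnA, Shared_Bus, ...) are made up for illustration.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical example: two agents share one bus line.
entity bus_demo is
  port (DataA, DataB : in  STD_LOGIC;
        EnA, EnB     : in  STD_LOGIC;
        Shared_Bus   : out STD_LOGIC);
end entity bus_demo;

architecture bus_demo_arch of bus_demo is
begin
  -- Each agent drives the bus only when enabled; otherwise it is High-Z.
  -- STD_LOGIC is a resolved type, so two drivers on one signal are legal.
  Shared_Bus <= DataA when EnA = '1' else 'Z';
  Shared_Bus <= DataB when EnB = '1' else 'Z';
end architecture bus_demo_arch;
```

With this description the simulator resolves the two drivers: if only one agent is enabled its value wins, if neither is enabled the bus reads 'Z', and if both drive different values the bus reads 'X'. The same two assignments written with type bit would be rejected, which is the limitation described above.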
VHDL Systems and Signals
• Systems
- The world is made up of systems communicating with each other
- Systems are made up of other Systems
- A System has a particular "Behavior" and "Structure"

ex) Adder System
Behavior: OUT = In1 + In2
Structure: (figure: schematic of the adder)
- We can describe an "Adder" system in multiple ways and at multiple levels of abstraction
• System Interface
- We must first describe the system's Interface to connect it to other systems
(Figure: Adder block with inputs In1 and In2 and output Out)
- An "Interface" is a description of the Inputs and Outputs
- We also call these "Ports"
• System Behavior
- We then must describe the system's behavior (or functionality)
(Figure: Adder block with inputs In1 and In2 and output Out)
- There are many ways to describe the behavior in VHDL
- When describing a system, we must always describe its:
1) Interface
2) Behavior
• Signals
- Multiple Systems communicate with each other using signals
(Figure: three Adder blocks, each with In1, In2, and Out, wired together; Internal Signals connect the adders to each other, External Signals connect the group to the outside)
VHDL Entity
• VHDL
Entity - used to describe a system's interface

- we call the Inputs and Outputs "Ports"
- creating this in VHDL is called an "Entity Declaration"
Architecture - used to describe a system's behavior (or structure)

- separate from an entity
- an architecture must be tied to an entity
- creating this in VHDL is called an "Architecture Definition"
Syntax Details we'll follow:
- we put the entity and architecture together in one text file

- we name the text file with the system name used in the entity
- the file extension for VHDL is *.vhd
ex) adder.vhd contains the entity declaration followed by the architecture definition
• More Syntax Notes
- VHDL is NOT case sensitive

- Comment text is preceded by "--"
- Names must start with an alphabetic letter (not a number)
- Names can include underscore, but not two in a row (i.e., __) or as the last character.
- Names cannot be keywords (in, out, bit, ….)
• Entity Details
- an entity declaration must possess the following:
1) entity-name - user selected, same as text file
2) signal-names - user selected
- mode - direction of signal (in, out, buffer, inout)
3) signal-type - what type of data is it?

(bit, STD_LOGIC, real, integer, signed,…)
- this is where VHDL is strict!
- we say it is a "strongly typed" language
- there are built in (or pre-defined) types
(bit, bit_vector, boolean, character, integer, real, string, time)
- we can add more types for realistic behavior (i.e., buses)
• Entity Syntax
entity entity-name is
port (signal-name : mode signal-type;

signal-name : mode signal-type;
signal-name : mode signal-type);
end entity entity-name;
NOTES: - the keywords are entity, is, port, end
- multiple signal-names with the same type can be comma delimited on the same line
- the port definition is contained within parentheses
- each signal-name line ends with a ";" except the last line (watch the ");" at the end, this will get you every time!)
• Entity Example
entity adder is
port (In1, In2 : in bit;
Out1 : out bit);
end entity adder;
(Figure: Adder block with inputs In1 and In2 and output Out)
NOTES: - we can also put "Generics" within an entity, which are parameters passed into the design
ex) generic (BusWidth : Integer := 8);
more on generics later….
• Systems in VHDL
- Systems need to have two things described
1) Interface (I/O, Ports…)
2) Behavior (Functionality, Structure)
- In VHDL, we do this using entity and architecture
(Figure: adder.vhd contains an entity declaration and an architecture definition)
VHDL Architecture
• Architecture Details
- an architecture is always associated with an entity (in the same file too)
- an architecture definition must possess the following:
1) architecture-name - user selected, different from entity

- we usually give something descriptive (adder_arch, and2_arch)
- some companies like to use "behavior", "structural" as the names
2) entity-name - the name of the entity that this architecture is associated with
- must already be declared before compile
3) optional items… - types

- signals : internal connections within the architecture
- constants
- functions : calling predefined blocks
- procedures : calling predefined blocks
- components : calling predefined blocks
4) end architecture - keywords to signify the end of the definition

- we follow this by the architecture name and ";"
• Architecture Syntax
architecture architecture-name of entity-name is
type…
signal…
constant…
function…
procedure…
component…
begin
…behavior or structure
end architecture architecture-name;
NOTE: - the keywords are architecture, of, is, type…component, begin, end
- there is a ";" at the end of the last line
• Architecture definition of an AND gate
architecture and2_arch of and2 is
begin
Out1 <= In1 and In2;
end architecture and2_arch;
• Architecture definition of an ADDER
architecture adder_arch of adder is
begin
Out1 <= In1 + In2;
end architecture adder_arch;
(Figure: Adder block with inputs In1 and In2 and output Out)
VHDL Packages
• VHDL is a "Strongly Typed" language…
- this means that assignments between different data types are not allowed.
- this means that operators must be defined for a given data type.
- this becomes important when we think about synthesis
ex) string + real = ???
- can we add a string to a real?

- what is a "string" in HW?
- what is a "real" in HW?
- VHDL has built-in features:
1) Data Types
2) Operators
- built-in is also called "pre-defined"
• Pre-defined Functionality
ex) there is a built in addition operator for integers
integer + integer = integer
- the built-in operator "+" works for "integers" only

- it doesn't work for "bits" as is
• Adding on Functionality
- VHDL allows us to define our own data types and operators

- a set of types, operators, functions, procedures… is called a "Package"
- A set of packages are kept in a "Library"
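As a rough sketch of "adding on functionality", the package below defines a "+" operator for the built-in type bit. The package name bit_add_pkg and the choice to return a 2-bit result (carry and sum) are assumptions made for this example; they are not part of any standard package.

```vhdl
-- Hypothetical user-defined package: adds a "+" operator for type bit.
package bit_add_pkg is
  -- Returns a 2-bit vector: bit 1 is the carry, bit 0 is the sum.
  function "+" (a, b : bit) return bit_vector;
end package bit_add_pkg;

package body bit_add_pkg is
  function "+" (a, b : bit) return bit_vector is
    variable result : bit_vector (1 downto 0);
  begin
    result(1) := a and b;  -- carry
    result(0) := a xor b;  -- sum
    return result;
  end function "+";
end package body bit_add_pkg;
```

Once this is compiled into the working library, a design could pick it up with "use work.bit_add_pkg.all;" and then write an expression such as In1 + In2 for two signals of type bit.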
• IEEE Packages
- when functionality is needed in VHDL, engineers start creating add-ons using Packages
- when many packages exist to perform the same function (or are supposed to), keeping consistency becomes a problem
- IEEE publishes "Standards" that give a consistent technique for engineers to use in VHDL
- we include the IEEE Library at the beginning of our VHDL code
syntax: library library-name;
- we include the Package within the library that we want to use
syntax: use library-name.package.function;
- we can substitute "ALL" for "function" if we want to include everything
• Common IEEE Packages
- in the IEEE library, there are common Packages that we use:
STD_LOGIC_1164
STD_LOGIC_ARITH
STD_LOGIC_SIGNED
Ex) library IEEE;

use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_SIGNED.ALL;
- libraries are defined before the entity declaration
VHDL Design
• Let's Put it all together now…
library IEEE; -- library
use IEEE.STD_LOGIC_1164.ALL; -- packages
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_SIGNED.ALL;
entity and2 is -- entity declaration
port (In1, In2 : in STD_LOGIC;
Out1 : out STD_LOGIC);
end entity and2;
architecture and2_arch of and2 is -- architecture definition
begin
Out1 <= In1 and In2;
end architecture and2_arch;
• Another Example…
library IEEE; -- library
use IEEE.STD_LOGIC_1164.ALL; -- package
entity inv1 is -- entity declaration
port (In1 : in STD_LOGIC;
Out1 : out STD_LOGIC);
end entity inv1;
architecture inv1_arch of inv1 is -- architecture definition
begin
Out1 <= not In1;
end architecture inv1_arch;
• The Pre-defined features of VHDL are kept in the STANDARD package
- but we don't need to explicitly use the STANDARD package, it is automatic
VHDL Data Types
• Signals
- a single bit is considered a Scalar quantity
- a bus (or multiple bits represented with one name) is called a Vector
- in VHDL, we can define a signal bus as:
data_bus : in bit_vector (7 downto 0); -- we will use "downto"
or
data_bus : in bit_vector (0 to 7);
- the Most Significant Bit (MSB) is ALWAYS on the left of the range description:
ex) data_bus : in bit_vector (7 downto 0);
data_bus(7) = MSB
ex) data_bus : in bit_vector (0 to 7);
data_bus(0) = MSB
• Signals
- there are "Internal" and "External" signals
Internal - are within the Entity's Interface
External - are outside the Entity's Interface and connect it to other systems
• Scalar Data Types (Built into VHDL)
- scalar means that the type only has one value at any given time
Boolean - values {TRUE, FALSE}

- not the same as '0' or '1'
Character - values are all symbols in the 8-bit ISO8859-1 set (i.e., Latin-1)
- examples are '0', '+', 'A', 'a', '\'
Integer - values are whole numbers from -2,147,483,647 to +2,147,483,647
- the range comes from a 32-bit signed representation, +/- (2^31 - 1)
- examples are -12, 0, 1002
Real - values are fractional numbers from -1.0E308 to +1.0E308

- examples are 0.0, 1.134, 1.0E5
Bit - values {'0', '1'}

- different from Boolean
- this type can be used for logic gates
- single bits are always represented with single quotes (i.e., '0', '1')
• Array Data Types (Built into VHDL)
- array is a name that represents multiple signals
Bit_Vector - vector of bits, values {'0', '1'}

- array values are represented with double quotes (i.e., "0010")
- this type can be used for logic gates
ex) Addr_bus : in BIT_VECTOR (7 downto 0);
- unlimited range
- first element of array has index=0 (i.e., Addr_bus(0)…)
String - vector of characters, values{Latin-1}

- again use double quotes
- define using "to" or "downto" ("to" is easier for strings)
ex) Message : string (1 to 10) := "message here…"
- first element in array has index=1, this is different from BIT_VECTOR
• Physical Data Types (Built into VHDL)
- these types contain an object value and units

- NOT synthesizable
Time - range from -2,147,483,647 to +2,147,483,647

- units: fs, ps, ns, us, ms, sec, min, hr
• User-Defined Enumerated Types
- we can create our own descriptive types, useful for State Machine
- no quotes needed
ex) type States is (Red, Yellow, Green);
VHDL Operators
• VHDL Operators
- Data types define both "values" and "operators"
- There are "Pre-Defined" data types
Pre-defined = Built-In = STANDARD Package
- We can add additional types/operators by including other Packages
- We'll first start with the STANDARD Package that comes with VHDL
• Logical Operators
- works on types BIT, BIT_VECTOR, BOOLEAN
- vectors must be same length
- the result is always the same type as the input
not
and
nand
or
nor
xor
xnor
• Numerical Operators
- works on types INTEGER, REAL
- the types of the input operands must be the same
+ "addition"
- "subtraction"
* "multiplication"
/ "division"
mod "modulus"
rem "remainder"
abs "absolute value"
** "exponential"
ex) Can we make an adder circuit yet?
A,B : in BIT_VECTOR (7 downto 0)

Z : out BIT_VECTOR (7 downto 0)
Z <= A + B;
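The answer to the question above is "not yet": the built-in "+" is defined for INTEGER and REAL, not for BIT_VECTOR. One common way around this, shown here only as a sketch, is the IEEE NUMERIC_STD package (the 1076.3 signed/unsigned types). Note that this is a different package from the STD_LOGIC_ARITH package used elsewhere in these notes, and the entity name adder8 is made up for the example.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical 8-bit adder; the carry out of bit 7 is discarded.
entity adder8 is
  port (A, B : in  STD_LOGIC_VECTOR (7 downto 0);
        Z    : out STD_LOGIC_VECTOR (7 downto 0));
end entity adder8;

architecture adder8_arch of adder8 is
begin
  -- Convert to unsigned so "+" is defined, then convert the result back.
  Z <= STD_LOGIC_VECTOR(unsigned(A) + unsigned(B));
end architecture adder8_arch;
```

The type conversions make the strong typing visible: the vectors have to be interpreted as numbers (unsigned) before an arithmetic operator applies to them.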
• Relational Operators
- used to compare objects
- objects must be of same type
- Output is always BOOLEAN (TRUE, FALSE)
- works on types: BOOLEAN, BIT, BIT_VECTOR, CHARACTER, INTEGER, REAL, TIME, STRING
= "equal"
/= "not equal"
< "less than"
<= "less than or equal"
> "greater than"
>= "greater than or equal"
• Shift Operators
- works on one-dimensional arrays
- works on arrays that contain types BIT, BOOLEAN
- the operator requires

1) An Operand (what is to be shifted)
2) Number of Shifts (specified as an INTEGER)
- a negative Number of Shifts (i.e., "-") is valid and reverses the direction of the shift
sll "shift left logical"      0011 sll 1 = 0110
srl "shift right logical"     1100 srl 2 = 0011
sla "shift left arithmetic"   1100 sla 1 = 1000 (rightmost = 0, insert 0)
sra "shift right arithmetic"  1100 sra 2 = 1111 (leftmost = 1, insert 1)
rol "rotate left"             1001 rol 1 = 0011
ror "rotate right"            0110 ror 1 = 0011
- if a negative integer is given, the operator behaves like the opposite operator:
ex) 1100 ror -1 = 1100 rol 1 = 1001
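The examples in the table can be checked in a simulator with a small self-checking sketch like the one below. The entity name shift_demo and the constant names are assumptions for illustration; only the built-in BIT_VECTOR type and shift operators are used.

```vhdl
-- Hypothetical simulation-only check of the shift operator examples.
entity shift_demo is
end entity shift_demo;

architecture sim of shift_demo is
begin
  check : process
    constant V1 : bit_vector (3 downto 0) := "0011";
    constant V2 : bit_vector (3 downto 0) := "1100";
    constant V3 : bit_vector (3 downto 0) := "1001";
    constant V4 : bit_vector (3 downto 0) := "0110";
  begin
    assert (V1 sll 1) = "0110" report "sll check failed" severity error;
    assert (V2 srl 2) = "0011" report "srl check failed" severity error;
    assert (V2 sla 1) = "1000" report "sla check failed" severity error;
    assert (V2 sra 2) = "1111" report "sra check failed" severity error;
    assert (V3 rol 1) = "0011" report "rol check failed" severity error;
    assert (V4 ror 1) = "0011" report "ror check failed" severity error;
    -- a negative shift count reverses the direction
    assert (V2 ror (-1)) = (V2 rol 1) report "negative count check failed" severity error;
    wait;  -- stop the process after one pass
  end process check;
end architecture sim;
```

If every assertion holds, the simulation finishes without printing any of the report messages.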
• Concatenation Operator
- combines objects of same type into an array
- the order is preserved
& "concatenate"
ex) New_Bus <= Bus1(7 downto 4) & Bus2(3 downto 0);
• Assignment Operators
- The assignment operator is <=
- The result is always on the Left, Operands on the Right
- All operands and the result need to be of the same type
- need to watch the length of arrays!
Ex) x <=y;
a <= b or c;
sum <= x + y;
NewBus <= m & k;
• Delay Modeling
- VHDL allows us to include timing information into assignment statements
- this gives us the ability to model real world gate delay
- we use the keyword "after" in our assignment followed by a time operand.
Ex) B <= not A after 2 ns;
- VHDL has two types of timing models that allow more accurate representation of real gates
1) Inertial Delay (default)
2) Transport Delay
• Inertial Delay
- if the input has two edge transitions in less time than the inertial delay, the pulse is ignored
said another way…
- if the input pulse width is smaller than the delay, it is ignored
- this models the behavior of trying to charge up the gate capacitance of a MOSFET
ex) B <= A after 5 ns;
any pulses less than 5 ns in width are ignored.
• Transport Delay
- transport delay will always pass the pulse, no matter how small it is.
- this models the behavior of transmission lines
- we have to explicitly call out this type of delay using the "transport" keyword
ex) B <= transport A after 5 ns;
B <= transport not A after t_delay; -- here we used a constant
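The difference between the two delay models is easiest to see side by side. The following simulation-only sketch drives the same input into an inertial and a transport assignment; the entity name delay_demo, the signal names, and the pulse widths are all chosen just for this illustration.

```vhdl
-- Hypothetical simulation-only comparison of inertial and transport delay.
entity delay_demo is
end entity delay_demo;

architecture sim of delay_demo is
  signal A           : bit := '0';
  signal B_inertial  : bit;
  signal B_transport : bit;
begin
  B_inertial  <= A after 5 ns;            -- inertial delay (the default)
  B_transport <= transport A after 5 ns;  -- transport delay

  stimulus : process
  begin
    wait for 10 ns;
    A <= '1';        -- start of a 2 ns pulse (narrower than the 5 ns delay)
    wait for 2 ns;
    A <= '0';
    wait for 20 ns;
    A <= '1';        -- start of an 8 ns pulse (wider than the 5 ns delay)
    wait for 8 ns;
    A <= '0';
    wait;            -- end of stimulus
  end process stimulus;
end architecture sim;
```

In the waveform viewer, B_transport should show both pulses shifted by 5 ns, while B_inertial should show only the 8 ns pulse, since the 2 ns pulse is narrower than the delay.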
Generics vs. Constants
• Generics vs. Constants
- it is very useful to be able to design using variables/parameters instead of hard coded values
ex) width of bus, delay, loop counters
- VHDL Provides two methods for this functionality
1) Generics
2) Constants
- These are similar but have subtle differences
• Generics
- declared in Entity
- design can be compiled without initialization
- a parameter whose value can be overridden each time the entity is instantiated (it is fixed before simulation or synthesis runs, not altered at run-time)
- is visible to all architectures below that entity
syntax:
generic (gen-name : gen-type := init-val);
NOTE: init-val is optional
ex) entity inv_n is
generic (WIDTH : integer := 7);
port (In1 : in STD_LOGIC_VECTOR (WIDTH downto 0);
Out1 : out STD_LOGIC_VECTOR (WIDTH downto 0) );
end entity inv_n;
• Constants
- declared in Architecture
- needs to be initialized
- only visible to the architecture it is defined in
syntax:
constant const-name : const-type := init-val;
NOTE: init-val is NOT optional
ex) architecture inv_n_arch of inv_n is
constant t_dly : time := 1 ns;
begin
Out1 <= not In1 after t_dly;
end architecture inv_n_arch;
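To show where a generic gets its value, here is a sketch that repeats the inv_n design and then instantiates it at a different width using "generic map". The wrapper entity inv_top and its ports D and Q are made up for this example, and the component syntax used here is the one covered in the Structural Design part of these notes.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity inv_n is
  generic (WIDTH : integer := 7);  -- default: bits 7 downto 0
  port (In1  : in  STD_LOGIC_VECTOR (WIDTH downto 0);
        Out1 : out STD_LOGIC_VECTOR (WIDTH downto 0));
end entity inv_n;

architecture inv_n_arch of inv_n is
  constant t_dly : time := 1 ns;   -- visible only inside this architecture
begin
  Out1 <= not In1 after t_dly;
end architecture inv_n_arch;

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical wrapper that uses inv_n as a 16-bit inverter.
entity inv_top is
  port (D : in  STD_LOGIC_VECTOR (15 downto 0);
        Q : out STD_LOGIC_VECTOR (15 downto 0));
end entity inv_top;

architecture inv_top_arch of inv_top is
  component inv_n
    generic (WIDTH : integer := 7);
    port (In1  : in  STD_LOGIC_VECTOR (WIDTH downto 0);
          Out1 : out STD_LOGIC_VECTOR (WIDTH downto 0));
  end component;
begin
  -- The generic is overridden here, at instantiation: WIDTH = 15
  U1 : inv_n generic map (WIDTH => 15)
             port map (In1 => D, Out1 => Q);
end architecture inv_top_arch;
```

The generic WIDTH is set from outside the entity, while the constant t_dly stays private to inv_n_arch, which is the main practical difference between the two.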
VHDL Concurrent Signal Assignments
• Concurrency
- the way that our designs are simulated is important in modeling real HW behavior
- components are executed concurrently (i.e., at the same time)
- VHDL gives us another method to describe concurrent logic behavior called
"Concurrent Signal Assignments"
- we simply list our signal assignments (<=) after the "begin" statement in the architecture
- each time any signal on the Right Hand Side (RHS) of the expression changes,
the Left Hand Side (LHS) of the assignment is updated.
- operators can be included (and, or, +, …)
• Concurrent Signal Assignment Example
entity TOP is
port (A,B,C : in STD_LOGIC;
X : out STD_LOGIC);
end entity TOP;
architecture TOP_arch of TOP is
signal node1 : STD_LOGIC;
begin
node1 <= A xor B;

X <= node1 or C;
end architecture TOP_arch;
• Concurrent Signal Assignment Example
node1 <= A xor B;
X <= node1 or C;
- if these are executed concurrently, does it model the real behavior of this circuit?
Yes, that is how these gates operate. We can see that there may be timing that needs to be considered….
- When does C get to the OR gate relative to (A xor B)?
- Could this cause a glitch on X? What about a delay in the actual value?
• Conditional Signal Assignments
- we can also include conditional situations in a concurrent assignment
- the keywords for these are:
"when" = if the condition is TRUE, make this assignment
"else" = if the condition is FALSE, make this assignment
- conditions are checked in the order written, so this describes priority logic
ex) X <= '1' when A='0' else '0';
Y <= '0' when A='0' and C='0' else '1';
- X and Y are evaluated concurrently !!!
- notice that we are assigning static values (0 and 1), this is essentially a "Truth Table"
- if using this notation, make sure to include every possible input condition, or else you haven't
described the full operation of the circuit.
• Conditional Signal Assignments
- We can also assign signals to other signals using conditions
- this is similar to a MUX
ex) X <= A when Sel='0' else B;
- Again, make sure to include every possible input condition, or else you haven't
described the full operation of the circuit.
- If you try to synthesize an incomplete description, the tool will start making stuff up!
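For reference, the MUX assignment can be wrapped in a complete design unit as sketched below. The entity name mux2 is an assumption; the ports follow the names used in the example (A, B, Sel, X), with STD_LOGIC assumed as the type.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical 2-to-1 MUX built from one conditional signal assignment.
entity mux2 is
  port (A, B, Sel : in  STD_LOGIC;
        X         : out STD_LOGIC);
end entity mux2;

architecture mux2_arch of mux2 is
begin
  -- The final "else" covers every value of Sel other than '0'
  X <= A when Sel = '0' else B;
end architecture mux2_arch;
```

Because the assignment ends in an unconditional "else", every input condition is covered, including the non-0/1 values that STD_LOGIC allows for Sel.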
• Selected Signal Assignment
- We can also use a technique that allows the listing of "choices" and "assignments" in a comma
delimited fashion.
- this is called "Selected Signal Assignment" but it is still CONCURRENTLY assigned
syntax:
with expression select
signal-name <= signal-value when choices,
signal-value when choices,
:
signal-value when others;
- there is no priority among the choices, and the choices cannot overlap
- we use the term "others" to describe any input condition that isn't explicitly described
• Selected Signal Assignment Example
Describe the following Truth Table using Selected Signal Assignments:
Input   X
000     0
001     1
010     1
011     0
100     1
101     1
110     0
111     0
begin
with Input select
X <= '0' when "000",
'1' when "001",
'1' when "010",
'0' when "011",
'1' when "100",
'1' when "101",
'0' when "110",
'0' when "111";
• Selected Signal Assignment Example
- we can shorten the description by using "others" for the 0's
- we can also use "|" delimited choices
Input   X
000     0
001     1
010     1
011     0
100     1
101     1
110     0
111     0
begin
with Input select
X <= '1' when "001" | "010" | "100" | "101",
'0' when others;
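Put into a complete design unit, the shortened description might look like the sketch below. The entity name truth_table and the choice of STD_LOGIC_VECTOR for Input are assumptions made for this example.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical entity wrapping the truth table from the example.
entity truth_table is
  port (Input : in  STD_LOGIC_VECTOR (2 downto 0);
        X     : out STD_LOGIC);
end entity truth_table;

architecture truth_table_arch of truth_table is
begin
  with Input select
    X <= '1' when "001" | "010" | "100" | "101",
         '0' when others;  -- all remaining input codes
end architecture truth_table_arch;
```

With a STD_LOGIC_VECTOR input, "when others" is needed in any case, because the choices must also cover codes containing values such as 'Z' or 'X'.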
VHDL Structural Design
• Structural Design
- we can specify functionality in an architecture in two ways
1) Structurally : text based schematic, manual instantiation of another system

- best when the internal connections are clear and straightforward (i.e., a small design)
2) Behaviorally : abstract description of functionality
- we will start with learning Structural VHDL design
• Components
- blocks that already exist and are included into a higher level design
- we need to know the entity declaration of the system we are calling
- we "declare" a component using the keyword "component"
- we declare the component in the architecture which indicates we wish to use it
• Component Syntax
component component-name
port (signal-name : mode signal-type;

signal-name : mode signal-type); -- exactly the same as the Entity declaration
end component;
• Let's build this…
• Component Example
- let's use these pre-existing entities "xor2" & "or2"
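entity xor2 is
port (In1, In2 : in STD_LOGIC;
Out1 : out STD_LOGIC);
end entity xor2;
entity or2 is
port (In1, In2 : in STD_LOGIC;
Out1 : out STD_LOGIC);
end entity or2;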
• Component Example
- now let's include the pre-existing entities "xor2" & "or2" into our "TOP" design
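entity TOP is
port (A,B,C : in STD_LOGIC;
X : out STD_LOGIC);
end entity TOP;
architecture TOP_arch of TOP is
component xor2 -- declaration of xor2 component
port (In1, In2 : in STD_LOGIC;
Out1 : out STD_LOGIC);
end component;
component or2 -- declaration of or2 component
port (In1, In2 : in STD_LOGIC;
Out1 : out STD_LOGIC);
end component;
begin
…..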
• Signals
- now we want to connect items within an architecture, we need "signals" to do this
- we define signals within an architecture
(Figure: the TOP block containing the internal "Components" connected by an internal "Signal")
• Signal Syntax
signal signal-name : signal-type;

signal signal-name : signal-type;
• Let's put the signal declaration into our Architecture
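architecture TOP_arch of TOP is
component xor2 -- declaration of xor2 component
port (In1, In2 : in STD_LOGIC;
Out1 : out STD_LOGIC);
end component;
component or2 -- declaration of or2 component
port (In1, In2 : in STD_LOGIC;
Out1 : out STD_LOGIC);
end component;
signal node1 : STD_LOGIC; -- declaration of internal signal
begin
…..
end architecture TOP_arch;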
• Component Instantiation
- after the "begin" keyword, we can start adding components and connecting signals
- we add components with a "Component Instantiation"
syntax:
label : component-name port map (port => signal, ……) ;
NOTE: - "label" is a unique reference designator for that component (U1, INV1, UUT1)
- "component-name" is the exact name as declared prior to the "begin" keyword
- "port map" is a keyword
- the signals within the ( ) of the port map define how signals are connected
to the ports of the instantiated component
• Port Maps
- There are two ways to describe the "port map" of a component
1) Positional
2) Explicit
• Positional Port Map
- signals to be connected to the component are listed in exactly the same order as the component's ports
ex) U1 : xor2 port map (A, B, node1);
• Explicit Port Map
- signals to be connected to the component are explicitly linked to the port names of the
component using the "=>" notation (Port => Signal, Port => Signal, ….)
ex) U1 : xor2 port map (In1 => A, In2 => B, Out1 => node1);
• Execution
- All components are executed CONCURRENTLY
- this mimics real hardware
- this is different from traditional program execution (i.e., C/C++) which is executed sequentially
because
We are NOT writing code, we are describing hardware!!!
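• Let's put everything together
architecture TOP_arch of TOP is
  component xor2                  -- declaration of xor2 component
    port (In1, In2 : in  STD_LOGIC;
          Out1     : out STD_LOGIC);
  end component;
  component or2                   -- declaration of or2 component
    port (In1, In2 : in  STD_LOGIC;
          Out1     : out STD_LOGIC);
  end component;
  signal node1 : STD_LOGIC;
begin
  U1 : xor2 port map (In1=>A, In2=>B, Out1=>node1);
  U2 : or2 port map (In1=>C, In2=>node1, Out1=>X);
end architecture TOP_arch;
[Figure: U1 (xor2) drives node1, which feeds U2 (or2)]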
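The structural design above can only be compiled once "xor2" and "or2" have architectures of their own. The following is a minimal, self-contained sketch that supplies them; the one-line architectures "xor2_arch" and "or2_arch" are assumptions added for illustration and are not part of the original slides.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity xor2 is
  port (In1, In2 : in  STD_LOGIC;
        Out1     : out STD_LOGIC);
end entity xor2;

architecture xor2_arch of xor2 is   -- assumed implementation
begin
  Out1 <= In1 xor In2;
end architecture xor2_arch;

library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity or2 is
  port (In1, In2 : in  STD_LOGIC;
        Out1     : out STD_LOGIC);
end entity or2;

architecture or2_arch of or2 is     -- assumed implementation
begin
  Out1 <= In1 or In2;
end architecture or2_arch;

library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity TOP is
  port (A, B, C : in  STD_LOGIC;
        X       : out STD_LOGIC);
end entity TOP;

architecture TOP_arch of TOP is
  component xor2
    port (In1, In2 : in  STD_LOGIC;
          Out1     : out STD_LOGIC);
  end component;
  component or2
    port (In1, In2 : in  STD_LOGIC;
          Out1     : out STD_LOGIC);
  end component;
  signal node1 : STD_LOGIC;         -- connects U1 to U2
begin
  U1 : xor2 port map (In1 => A, In2 => B,     Out1 => node1);
  U2 : or2  port map (In1 => C, In2 => node1, Out1 => X);
end architecture TOP_arch;
```

With these architectures, TOP implements X = C or (A xor B).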
VHDL Behavioral Design
• Behavioral Design
- we've learned the basic constructs of VHDL (entity, architecture, packages)
- we've learned how to use structural VHDL to instantiate lower-level systems and to create text-based schematics
- now we want to go one level higher in abstraction and design using "Behavioral Descriptions" of HW
- when we design at the Behavioral level, we rely on Synthesis tools to create the ultimate gate level schematic
- we need to be aware of what we CAN and CAN'T synthesize
- Remember, VHDL was invented to model systems, not for synthesis
- This means we can simulate a lot more functionality than could ever be synthesized
• Processes
- a way to describe interaction between signals
- a process executes a SEQUENCE of operations
- the new values in a process (i.e., the LHS) depend on the current and past values
of the other signals
- the new values in a process (i.e., the LHS) do not get their value until the process
terminates
- a process goes in the architecture after the "begin" keyword
syntax:
name : process (sensitivity list)
  declarations
begin
  sequential statements
end process name;
• Process Execution
- Real systems start on certain conditions
- they then perform an operation
- they then wait for the next start condition
ex) Button pushed?

Clock edge present?
Reset?
Change on Inputs?
- to mimic real HW, we want to be able to START and STOP processes
- otherwise, the simulation would get stuck in an infinite loop or "hang"
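- The statements inside a process execute in sequence (i.e., one after another, in order)
- within the process, these statements are NOT concurrent; the process as a whole still runs concurrently with the rest of the architecture
- this is a difficult concept to grasp and leads to difficulty in describing HW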
ex) name : process (sensitivity list)
begin
sequential statement;
end process name;
- these signal assignments are called "Sequential Signal Assignments"
(as opposed to "Concurrent Signal Assignments")
• Starting and Stopping a Process
- There are two ways to start and stop a process:
1) Sensitivity List
2) Wait Statement
• Sensitivity List
- a list of signal names
- the process will begin executing if there is a change on any of the signals in the list
ex) FLOP : process (clock)
begin
Q <= D;
end process FLOP;
- each time there is a change on "clock", the process will execute ONCE
- the process ends after the last statement
• Wait Statements
- the keyword "wait" can be used inside of a process to start/stop it
- the process executes the sequences 1-by-1 until hitting the wait statement
- we don't use "waits" and "sensitivity lists" together
ex) No Start/Stop Control (loops forever)
DOIT : process
begin
  statement 1;
  statement 2;
  statement 3;
end process DOIT;
ex) w/ Start/Stop Control (executes until "wait", then stops)
DOIT : process
begin
  statement 1;
  statement 2;
  wait;
end process DOIT;
- we need to have a condition associated with the wait statement, otherwise it just stops the process and it will never start again.
- the wait statement can be followed by the keywords "for" or "until" to describe the wait condition
- the wait statement can wait for:
1) a time expression
ex) wait for 10 ns;
    wait for period/2;
2) a condition
ex) wait until Clock='1';
    wait until Data>16;
• Signals and Processes
- Rules of a Process
1) Signals cannot be declared inside of a process
2) Assignment to a Signal takes effect only after the process suspends.
Until it suspends, signals keep their previous value
3) Only the last assignment to a given signal within the process has an effect.
So there's no use making multiple assignments to the same signal.
ex) DOIT : process (A,B)   -- initially A=2, B=2… then A changes to 7
begin                      -- Y = 7 + 2, NOT Y = 0 + 0
  A <= 0;
  B <= 0;
  Y <= A+B;
end process DOIT;
• Signals and Processes
- But what if we want this behavior?
ex) DOIT : process (A,B)   -- initially A=2, B=2… then A changes to 7
begin
  A <= 0;                  -- we WANT A to be assigned 0
  B <= 0;                  -- we WANT B to be assigned 0
  Y <= A+B;                -- we WANT Y to be assigned A + B = 0
end process DOIT;
- we need something besides a Signal to hold the interim value
- we need a "Variable"
Variables
• Variables
- Signals in processes are only assigned their value when the process suspends
- this makes multiple assignments to a signal meaningless
ex) DOIT : process (A,B) -- a change on A or B will trigger this process
begin
A <= 2; -- B gets its value from the previous value of A,
B <= A + 1; -- not from the A <= 2 assignment
end process DOIT;
- Variables allow us to assign values during the sequence of statements
- Variables are defined within a process
syntax:
variable var-name : var-type := init-value;
- assignments to variables are made using ":=" instead of "<="
- assignments take place immediately
ex) DOIT : process (A,B) -- a change on A or B will trigger this process
variable temp : integer := 0;
begin
temp := 2;
B <= temp + 1;
end process DOIT;
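The difference between the two kinds of assignment can be checked in simulation. The sketch below is a hypothetical, port-less demo entity ("sig_vs_var", with made-up signal names) that computes the same expression once through a signal and once through a variable, then uses assertions to confirm which value each one picked up.

```vhdl
entity sig_vs_var is
end entity sig_vs_var;

architecture demo of sig_vs_var is
  signal trigger : bit     := '0';   -- never changes; DOIT runs once at start-up
  signal A_sig   : integer := 0;
  signal B_sig   : integer := 0;     -- computed from a signal
  signal B_var   : integer := 0;     -- computed from a variable
begin
  DOIT : process (trigger)
    variable temp : integer := 0;
  begin
    A_sig <= 2;            -- scheduled; A_sig still reads 0 in this pass
    B_sig <= A_sig + 1;    -- uses the OLD value of A_sig (0 + 1)
    temp  := 2;            -- takes effect immediately
    B_var <= temp + 1;     -- uses the NEW value of temp (2 + 1)
  end process DOIT;

  CHECK : process
  begin
    wait for 1 ns;         -- let the start-up pass of DOIT settle
    assert (B_sig = 1) report "B_sig should come from the old A_sig" severity ERROR;
    assert (B_var = 3) report "B_var should come from the updated temp" severity ERROR;
    wait;                  -- stop forever
  end process CHECK;
end architecture demo;
```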
• Signal vs. Variable
Signal                                          Variable
has (type, value, time)                         has (type, value)
assignment with <=                              assignment with :=
declared outside of the process                 declared inside of the process
assignment takes place when process suspends    assignment is immediate
always exists                                   visible only inside its process (value is retained between executions)
If-Then Statements
• If / Then Statements
- Used ONLY within a process. VHDL has the following:
- if, then
- if, then, else
- if, then, elsif, then
- if, then, elsif, then, else
syntax:
if boolean-exp then seq-statement;
elsif boolean-exp then seq-statement;
else seq-statement;
end if;
- parentheses around the Boolean expression are allowed, but not required
- multiple sequential statements are allowed; each ends with a ";" and they can be on different lines
- logical operators are allowed in the Boolean expression
ex) Design a 2-to-1 MUX
architecture mux_2to1_arch of mux_2to1 is
begin
MUX : process (A,B,Sel)
begin
if (Sel = '0') then
Out1 <= A;
elsif (Sel = '1') then
Out1 <= B;
else
Out1 <=A; -- this isn't necessary, just for illustration
end if;
end process MUX;
end architecture mux_2to1_arch;
Case Statements
• Case Statements
- used ONLY within a process
- better for larger input combinations, If/Then's can get too long
syntax:
case expression is
when choices => seq-statement;
when choices => seq-statement;
:
end case;
- the keyword "others" is available for input combinations not explicitly called out
ex) Design a 2-to-1 MUX
begin
MUX : process (A,B,Sel)
begin
case (Sel) is
when '0' => Out1 <= A;
when '1' => Out1 <= B;
when others => Out1 <= A; -- this isn't necessary, just for illustration
end case;
end process MUX;
- the case statement works nicely on vectors
- if you want to combine individual signals to form a vector, you can use a variable and the concatenation operator (&)
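As a sketch of that last point, the hypothetical entity below ("mux_4to1_case", with made-up port names) concatenates two separate select lines into a variable and uses the variable as the case expression.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity mux_4to1_case is
  port (D0, D1, D2, D3 : in  STD_LOGIC;
        S1, S0         : in  STD_LOGIC;   -- two individual select lines
        Out1           : out STD_LOGIC);
end entity mux_4to1_case;

architecture mux_4to1_case_arch of mux_4to1_case is
begin
  MUX : process (D0, D1, D2, D3, S1, S0)
    variable sel : STD_LOGIC_VECTOR (1 downto 0);
  begin
    sel := S1 & S0;                 -- concatenate into a 2-bit vector (immediate)
    case (sel) is
      when "00"   => Out1 <= D0;
      when "01"   => Out1 <= D1;
      when "10"   => Out1 <= D2;
      when "11"   => Out1 <= D3;
      when others => Out1 <= 'X';   -- covers 'U', 'X', 'Z', … on the select lines
    end case;
  end process MUX;
end architecture mux_4to1_case_arch;
```

A variable is used rather than a signal so that the concatenated value is available to the case statement in the same pass through the process.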
Conditional Loops
• Conditional Loops
- There are multiple loop structures we can use within VHDL
1) Loop
2) While
3) For
• Loops
- "loop" is a keyword that starts a loop
- by itself, it creates an infinite loop
- useful for modeling processes that go on forever (e.g., clocks, time)
ex) CLOCK_GEN : process
begin
  clock <= '0';
  loop
    wait for 1 ns;        -- the wait lets simulation time advance
    clock <= '1';
    wait for 1 ns;
    clock <= '0';
  end loop;
end process CLOCK_GEN;
- the loop is ended using the keywords "end loop;"
• While Loops
- a Boolean condition is tested at the beginning of the loop
- the loop only executes if the condition is true
ex) CLOCK_GEN : process
begin
  clock <= '0';
  while (EN = '1') loop
    wait for 1 ns;        -- the wait lets simulation time advance
    clock <= not clock;
  end loop;
  wait until (EN = '1');  -- suspend while disabled instead of spinning
end process CLOCK_GEN;
• For Loops
- a loop with a counter
- the loop executes the # of times in the range that is specified
syntax:
for identifier in range loop
seq-statement
seq-statement
end loop;
- the "identifier" is the loop variable.
- It is implicitly declared when included in the "for" statement.
- It is automatically the same type as the "range"
- it will step through ALL values in the range
- the "range" needs to be previously defined. All discrete types (integers, enumerations) are allowed
- Supporting enumerated types is powerful for state machines
(i.e., state_list = idle, go, stop, ….)
ex) for state in state_list loop
  if (current_state = state) then
    valid_state <= TRUE;
  end if;
end loop;
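For an integer range, a for loop is a compact way to repeat the same operation over every bit of a vector. The sketch below is a hypothetical parity generator ("parity8", names made up for illustration) that XORs all eight bits together using a variable as the running result.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity parity8 is
  port (Data   : in  STD_LOGIC_VECTOR (7 downto 0);
        Parity : out STD_LOGIC);          -- '1' when Data holds an odd number of 1's
end entity parity8;

architecture parity8_arch of parity8 is
begin
  PAR : process (Data)
    variable p : STD_LOGIC;
  begin
    p := '0';
    for i in 0 to 7 loop                  -- i is implicitly declared as an integer
      p := p xor Data(i);                 -- variable updates immediately each pass
    end loop;
    Parity <= p;                          -- signal gets the final result
  end process PAR;
end architecture parity8_arch;
```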
Attributes
• Attributes
- ability to get more information about a signal other than its current value
- attributes allow access to the signal's history
- previous value
- time since last change
- this is how we can specify "edge triggered" events in sequential logic
- we put the attribute keyword after the signal name using the apostrophe (')
- there are many attributes, the most commonly used are:
1) event
2) transaction
3) last_value
4) last_event
• "event" Attribute
- tells us when there was a change on the signal
- useful for edge detection
ex) "rising edge"
if (Clock'event and Clock='1') then
• "transaction" Attribute
- tells us when an assignment is made to a signal
- the signal value does not need to change (i.e., 0 to 0)
ex) process (A'transaction)
    -- statements here run whenever anybody assigns to A
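The "event" attribute is what turns a process into edge-triggered logic. As a minimal sketch, the hypothetical entity below ("dflop"; the asynchronous Reset input is an assumption added for illustration) describes a D flip-flop that samples D only on the rising edge of Clock.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity dflop is
  port (Clock, Reset, D : in  STD_LOGIC;
        Q               : out STD_LOGIC);
end entity dflop;

architecture dflop_arch of dflop is
begin
  FLOP : process (Clock, Reset)
  begin
    if (Reset = '1') then                        -- asynchronous reset (assumed)
      Q <= '0';
    elsif (Clock'event and Clock = '1') then     -- rising edge of Clock
      Q <= D;
    end if;
  end process FLOP;
end architecture dflop_arch;
```

Because D is not in the sensitivity list and is only read inside the edge test, changes on D between clock edges have no effect on Q.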
• "last_value" Attribute
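- tells us the previous value of a signal (before its most recent event)
• "last_event" Attribute
- gives the TIME since the last event
- good for tracking timing violations (Setup/Hold, signals changing too fast)
ex) process (Data)
begin
  -- inside a process triggered by Data, Data'last_event is always 0 ns,
  -- so we use Data'delayed'last_event to get the time since the PREVIOUS event
  if (Data'delayed'last_event < 0.5 ns) then
    too_fast <= TRUE;
  else
    too_fast <= FALSE;
  end if;
end process;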
VHDL : Test Benches
• Test Benches
- We need to stimulate our designs in order to test their functionality
- Stimulus in a real system is from an external source, not from our design
- We need a method to test our designs that is not part of the design itself
- This is called a "Test Bench"
- Test Benches are VHDL entity/architectures with the following:
- We instantiate the design to be tested using components
- We call these instantiations "Unit Under Test" (UUT) or "Device Under Test".
- The entity has no ports
- We create a stimulus generator within the architecture
- We can use reporting features to monitor the expected outputs
- Test Benches are for Verification, not for Synthesis!!!
- this allows us to use constructs that we ordinarily wouldn't put in a design because they are not synthesizable
• Let's test this MUX
entity Mux_2to1 is
  port (A, B, Sel : in  STD_LOGIC;
        Y         : out STD_LOGIC);
end entity Mux_2to1;
entity Test_Mux is
end entity Test_Mux; -- the test bench entity has no ports
architecture Test_Mux_arch of Test_Mux is
  signal In1_TB, In2_TB : STD_LOGIC;   -- setup internal Test Signals
  signal Sel_TB         : STD_LOGIC;   -- give descriptive names to make
  signal Out_TB         : STD_LOGIC;   -- apparent they are test signals
  component Mux_2to1                   -- declare any used components
    port (A, B, Sel : in  STD_LOGIC;
          Y         : out STD_LOGIC);
  end component;
begin
  UUT : Mux_2to1                       -- instantiate the design to test
    port map ( A   => In1_TB,
               B   => In2_TB,
               Sel => Sel_TB,
               Y   => Out_TB);
  STIM : process                       -- create process to generate stimulus
  begin
    In1_TB <= '0'; In2_TB <= '0'; Sel_TB <= '0'; wait for 10 ns;   -- we can use wait
    In1_TB <= '0'; In2_TB <= '1'; Sel_TB <= '0'; wait for 10 ns;   -- statements to control
    In1_TB <= '1'; In2_TB <= '0'; Sel_TB <= '0'; wait for 10 ns;   -- the speed of the stim
    :
    In1_TB <= '1'; In2_TB <= '1'; Sel_TB <= '1'; wait for 10 ns;   -- end with a wait…
  end process STIM;
end architecture Test_Mux_arch;
• Test Bench Reporting
- There are reporting features that allow us to monitor the output of a design
- We can compare the output against "Golden" data and report if there are differences
- This is powerful when we evaluate our designs across power, temp, process…..
• Assert
- the keyword "assert" will check a Boolean expression
- if the Boolean expression is FALSE, it will print a string following the "report" keyword
- Severity levels are also reported with possible values {ERROR, WARNING, NOTE, FAILURE}
ex) A<='0'; B<='0'; wait for 10 ns;
assert (Z='1') report "Failed test 00" severity ERROR;
- The message comes out at the simulator console.
• Report
- the keyword "report" will always print a string
- this is good for reporting the progress of a test
- Severity levels are also reported
ex) report "Beginning the MUX test" severity NOTE;
A<='0'; B<='0'; wait for 10 ns;
assert (Z='1') report "Failed test 00" severity ERROR;
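Putting the pieces together, the following is a sketch of a complete self-checking test bench for the 2-to-1 MUX. The architecture "Mux_2to1_arch" (Sel = '0' selects A, otherwise B) is an assumption added so that the example can be simulated on its own; only three of the input combinations are exercised.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity Mux_2to1 is
  port (A, B, Sel : in  STD_LOGIC;
        Y         : out STD_LOGIC);
end entity Mux_2to1;

architecture Mux_2to1_arch of Mux_2to1 is   -- assumed behavior of the design under test
begin
  Y <= A when (Sel = '0') else B;
end architecture Mux_2to1_arch;

library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity Test_Mux is
end entity Test_Mux;                        -- the test bench entity has no ports

architecture Test_Mux_arch of Test_Mux is
  signal In1_TB, In2_TB, Sel_TB, Out_TB : STD_LOGIC;
  component Mux_2to1
    port (A, B, Sel : in  STD_LOGIC;
          Y         : out STD_LOGIC);
  end component;
begin
  UUT : Mux_2to1
    port map (A => In1_TB, B => In2_TB, Sel => Sel_TB, Y => Out_TB);

  STIM : process
  begin
    report "Beginning the MUX test" severity NOTE;

    In1_TB <= '0'; In2_TB <= '1'; Sel_TB <= '0'; wait for 10 ns;
    assert (Out_TB = '0') report "Failed: Sel=0 should pass A=0" severity ERROR;

    In1_TB <= '0'; In2_TB <= '1'; Sel_TB <= '1'; wait for 10 ns;
    assert (Out_TB = '1') report "Failed: Sel=1 should pass B=1" severity ERROR;

    In1_TB <= '1'; In2_TB <= '0'; Sel_TB <= '0'; wait for 10 ns;
    assert (Out_TB = '1') report "Failed: Sel=0 should pass A=1" severity ERROR;

    report "MUX test complete" severity NOTE;
    wait;                                   -- stop the stimulus process
  end process STIM;
end architecture Test_Mux_arch;
```

Each assert is placed after the "wait for" so that the UUT output has settled before it is compared against the expected value.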
Logic Synthesis with VHDL
What is logic synthesis
• Logic synthesis is the process of converting a high-level description of a design into an optimized gate-level representation
• Logic synthesis uses a standard cell library, which has simple cells, such as basic logic gates like AND, OR, and NOR, or macro cells, such as adders, muxes, memories, and special flip-flops
• The designer first needs to understand the architectural description, and then considers design constraints such as timing, area, testability, and power
• Synthesis = translation + optimization + mapping
HDL Source -> (Translate) -> Generic Boolean (GTECH) -> (Optimize + Map) -> Target Technology
ex) HDL source (Verilog)
residue = 16'h0000;
if (high_bits == 2'b10)
  residue = state_table[index];
else
  state_table[index] = 16'h0000;
Synthesis is Constraint Driven
[Figure: area vs. speed plot with "Translation" and "optimization" steps]
ex) HDL source (Verilog)
always @(reset or set)
begin : direct_set_reset
  if (reset)
    y=1'b0;
  else if (set)
    y=1'b1;
end
always @(gate or reset)
  if (reset)
    t=1'b0;
  else if (gate)
    t=d;
Technology Independent
• Design can be transferred to any technology
[Figure: area vs. speed curves for Technology A and Technology B]
What is logic synthesis (cont.)
• Basic Computer-Aided Logic Synthesis Process
1. Architectural Description
2. High-Level Description
3. Computer-Aided Logic Synthesis (inputs: Design Constraints and the Standard Cell Library, which is technology dependent)
4. Optimized Gate-Level Netlist
5. Meets Constraints? (no: iterate; yes: continue)
6. Place and Route
Impact of Logic Synthesis
• Limitations of manual design
– For large designs, manual conversion was prone to human error, such as a small gate missed somewhere
– The designer could never be sure that the design constraints were going to be met until the gate-level implementation was complete and tested
– A significant portion of the design cycle was dominated by the time taken to convert a high-level design into gates
– Design reuse was not possible
– Each designer would implement design blocks differently. For large designs, this could mean that smaller blocks were optimized but the overall design was not optimal
Impact of Logic Synthesis (cont.)
• Automated logic synthesis tools addressed these problems as follows
– High-level design is less prone to human error because designs are described at a higher level of abstraction
– High-level design is done without significant concern about design constraints
– Conversion from high-level design to gates is fast
– Logic synthesis tools optimize the design as a whole. This removes the problem with varied designer styles for the different blocks in the design and suboptimal designs
– Logic synthesis tools allow technology-independent design
– Design reuse is possible for technology-independent descriptions.
Logic Synthesis
• Takes place in two stages:
– Translation of Verilog (or VHDL) source to a netlist
   - Register inference
– Optimization of the resulting netlist to improve speed and area
   - Most critical part of the process
   - Algorithms very complicated and beyond the scope of this class
Logic Optimization
• Netlist optimization is the critical enabling technology
• Takes a slow or large netlist and transforms it into one that implements the same function more cheaply
• Typical operations
– Constant propagation
– Common subexpression elimination
– Function factoring
• Time-consuming operation
– Can take hours for large chips
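To make two of these operations concrete, the sketch below writes the same pair of functions twice: once as a designer might first describe them, and once after constant propagation and common subexpression elimination have been applied by hand. The entity "opt_demo" and its port names are hypothetical, and real synthesis tools work on an internal netlist rather than on the source text, so this only illustrates the idea.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity opt_demo is
  port (A, B, C            : in  STD_LOGIC;
        F_before, G_before : out STD_LOGIC;
        F_after,  G_after  : out STD_LOGIC);
end entity opt_demo;

architecture opt_demo_arch of opt_demo is
  signal AB : STD_LOGIC;                    -- shared subexpression
begin
  -- As first written:
  F_before <= ((A and B) or C) and '1';     -- "and '1'" is a constant with no effect
  G_before <= (A and B) xor C;              -- (A and B) is computed a second time

  -- After optimization (same logic functions for 0/1 inputs):
  AB      <= A and B;                       -- common subexpression built once
  F_after <= AB or C;                       -- constant '1' propagated away
  G_after <= AB xor C;
end architecture opt_demo_arch;
```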
Translating VHDL into Gates
• Parts of the language are easy to translate
– Structural descriptions with primitives
   - Already a netlist
– Continuous assignment (concurrent signal assignment in VHDL)
   - Expressions turn into little datapaths
• Behavioral statements are the bigger challenge
What Can Be Translated
• Structural definitions
– Everything
• Behavioral blocks
– Depends on sensitivity list
– Only when they have a reasonable interpretation as combinational logic, edge-triggered flip-flops, or level-sensitive latches
– Blocks sensitive to both edges of the clock, changes on unrelated signals, changing sensitivity lists, etc. cannot be synthesized
• User-defined primitives
– Primitives defined with truth tables
– Some sequential UDPs can't be translated (not latches or flip-flops)
What Isn't Translated
• Initial blocks
– Used to set up initial state or describe finite testbench stimuli
– Don't have an obvious hardware component
• Delays
– May be in the Verilog source, but are simply ignored
• A variety of other obscure language features
– In general, things heavily dependent on discrete-event simulation semantics
– Certain "disable" statements
– Pure events
Compile: the "Art" of Synthesis
• The compile command performs design optimization
• Logic level optimization
– flatten (off by default): removes structure
– structure: minimizes generic logic
• Gate level optimization
– map: makes the design technology dependent
Logic Level Optimization
• Operates on the Boolean representation of a circuit
• Has a global effect on the overall area/speed characteristics of a design
• Strategy
– Structure
– Flatten
– If both are true, the design is first flattened and then structured
Gate Level Optimization
• Selects components to meet the timing, design rule & area goals specified for the circuit
• Has a local effect on the area/speed characteristics of a design
• Strategy
– Mapping
   - Combinational mapping
   - Sequential mapping
Combinational vs. Sequential Mapping
• Combinational Mapping
– Mapping rearranges components, combining and re-combining logic into different components
– May use different algorithms such as cloning, resizing or buffering
– Tries to meet the design rule constraints and timing/area goals
• Sequential Mapping
– Optimizes the mapping to sequential cells from the technology library
– Analyzes the combinational logic surrounding a sequential cell to see if it can absorb the logic attribute with HDL
– Tries to save speed and area by using a more complex sequential cell
[Figure: combinational mapping vs. sequential mapping]
Design Methodology
[Figure: Design Methodology]
Design Flow
1. Write a design description in the Verilog language. This description can be a combination of structural and functional elements. This description is used with both the Synopsys HDL Compiler and the Verilog simulator.
2. Provide Verilog-language test drivers for the Verilog HDL simulator. The drivers supply test vectors for simulation and gather output data.
3. Simulate the design by using a Verilog HDL simulator. Verify that the description is correct.
4. Synthesize the HDL description with HDL Compiler. HDL Compiler performs architectural optimizations, then creates an internal representation of the design.
5. Use Synopsys Design Compiler to produce an optimized gate-level description in the target ASIC library. You can optimize the generated circuits to meet the required timing & area constraints.
6. Use Synopsys Design Compiler to output a gate-level Verilog description. This netlist-style description uses ASIC components as the leaf-level cells of the design. The gate-level description has the same port and module definitions as the original high-level Verilog description.
7. Simulate the gate-level description. You can use the original Verilog simulation drivers from Step 2 because module and port definitions are preserved.
8. Compare the output of the gate-level simulation with the output of the original Verilog description simulation to verify that the implementation is correct.
Basic Logic Design with VHDL
• Agenda
Combinational Logic Review
• Combinational logic circuits are memoryless
• No feedback path
• Output can have multiple logical transitions before settling to
correct value
Boolean Equations in VHDL
• Boolean equations and truth tables are both valid ways to
define a function (f = ???)
• Use logical operators in signal assignment statements
Boolean Equation Example
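As a minimal sketch of a Boolean equation written with logical operators in a signal assignment, the hypothetical entity below ("bool_eq"; the function itself is made up for illustration) implements F = A∙B + A'∙C.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity bool_eq is
  port (A, B, C : in  STD_LOGIC;
        F       : out STD_LOGIC);
end entity bool_eq;

architecture bool_eq_arch of bool_eq is
begin
  -- VHDL gives "and" and "or" the same precedence, so the
  -- parentheses are required when the two are mixed
  F <= (A and B) or ((not A) and C);
end architecture bool_eq_arch;
```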
Binary Coding
• How do we represent information with more than two possible
values?
– eg, numbers
– N voltage levels? — No.
• Multiple binary signals (multiple bits)
• (a1, a0): (0, 0), (0, 1), (1, 0), (1, 1)
– This is a binary code
– Each pair of values is a code word
– Uses two signal wires for a1, a0
• Code Word Size
– An n-bit code has 2^n code words
– To represent N possible values
• Need at least ⌈log2 N⌉ code word bits
• More bits can be useful in some cases
• Example: code for inkjet printer
– black, cyan, magenta, yellow, red, blue
– six values, ⌈log2 6⌉ = 3
– black: (0, 0, 1), cyan: (0, 1, 0), magenta: (0, 1, 1), yellow: (1, 0, 0), red: (1, 0, 1), blue: (1, 1, 0)
One-Hot Codes
• Each code word has exactly one 1 bit
• Traffic light:
– red: (1,0,0), yellow: (0,1,0), green: (0,0,1)
– Three signal wires: red, yellow, green (r, y, g)
• Each bit of a one-hot code corresponds to an encoded value
– No hardware needed to decode values
Binary Codes in VHDL
• Multiple bits represented by a vector
• signal s: std_logic_vector(4 downto 0);
– This is a five-element signal
– s(4), s(3), s(2), s(1), s(0)
• signal a: std_logic_vector(1 to 3);
– This is a three-element signal
– a(1), a(2), a(3)
Binary Coding Example
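As a sketch that ties the two kinds of code together, the hypothetical entity below ("light_code") converts a 2-bit binary state code into the one-hot traffic light code (red, yellow, green). The assignment of binary code words to colors is an assumption chosen for illustration.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity light_code is
  port (state  : in  STD_LOGIC_VECTOR (1 downto 0);    -- binary code (assumed mapping)
        lights : out STD_LOGIC_VECTOR (2 downto 0));   -- one-hot: (red, yellow, green)
end entity light_code;

architecture light_code_arch of light_code is
begin
  ENCODE : process (state)
  begin
    case (state) is
      when "00"   => lights <= "100";   -- red
      when "01"   => lights <= "010";   -- yellow
      when "10"   => lights <= "001";   -- green
      when others => lights <= "000";   -- unused code word: all lights off
    end case;
  end process ENCODE;
end architecture light_code_arch;
```

Here lights(2), lights(1), and lights(0) can drive the red, yellow, and green lamps directly, with no further decoding.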
Combinational Logic Design with VHDL
• Agenda
1. Decoders/Encoders
2. Multiplexers/Demultiplexers
3. Tri-State Buffers
4. Comparators
5. Adders (Ripple Carry, Carry-Look-Ahead)
6. Subtraction
7. Multiplication
8. Division (brief overview)
Integrated Circuit Scaling
• Integrated Circuit Scales
                                                Example                          # of Transistors
SSI  - Small Scale Integrated Circuits          Individual Gates                 10's
MSI  - Medium Scale Integrated Circuits         Mux, Decoder                     100's
LSI  - Large Scale Integrated Circuits          RAM, ALU's                       1k - 10k
VLSI - Very Large Scale Integrated Circuits     uP, uCNT                         100k - 1M
ULSI - Ultra Large Scale Integrated Circuits    Modern uP's                      > 1M
SoC  - System on Chip                           Microcomputers
SoP  - System on Package                        Different technology blending
- we use the terms SSI and MSI. Everything larger is typically just called "VLSI"
- VLSI covers designs that can't be done using schematics or by hand.
Decoders
• Decoders
- a decoder has n inputs and 2^n outputs
- one and only one output is asserted for a given input combination
ex) truth table of decoder
Input   Output
00      0001
01      0010
10      0100
11      1000
- these are key circuits for address decoders
• Decoder Structure
- The output stage of a decoder can be constructed using AND gates
- Inverters are needed to give the appropriate code to each AND gate
- Using AND/INV structure, we need:
2^n AND gates
n Inverters
[Figure: decoder schematic, showing more inverters than necessary to illustrate the concept]
• Decoders with ENABLES
- An Enable line can be fed into the AND gate
- The AND gate now needs (n+1) inputs
- Using positive logic:
EN = 0, Output = 0
EN =1, Output depends on input code
• Decoder Example
- Let's design a 2-to-4 Decoder using Structural VHDL
- We know we need to describe the following structure:
[Figure: 2-to-4 decoder schematic, showing more inverters than necessary to illustrate the concept]
- We know what we'll need:
2^n AND gates = 4 AND gates
n Inverters = 2 Inverters
- Let's design the inverter using concurrent signal assignments….
entity inv is
  port (In1  : in  STD_LOGIC;
        Out1 : out STD_LOGIC);
end entity inv;
architecture inv_arch of inv is
begin
  Out1 <= not In1;
end architecture inv_arch;
- Let's design the AND gate using concurrent signal assignments….
entity and2 is
  port (In1,In2 : in  STD_LOGIC;
        Out1    : out STD_LOGIC);
end entity and2;
architecture and2_arch of and2 is
begin
  Out1 <= In1 and In2;
end architecture and2_arch;
- Now let's work on the top level design entity called "decoder_2to4"
entity decoder_2to4 is
port (A,B : in STD_LOGIC;
Y0,Y1,Y2,Y3 : out STD_LOGIC);
end entity decoder_2to4;
- Now let's work on the top level design architecture called "decoder_2to4_arch"
architecture decoder_2to4_arch of decoder_2to4 is
  signal A_n, B_n : STD_LOGIC;
  component inv
    port (In1  : in  STD_LOGIC;
          Out1 : out STD_LOGIC);
  end component;
  component and2
    port (In1,In2 : in  STD_LOGIC;
          Out1    : out STD_LOGIC);
  end component;
begin
  U1 : inv port map (A, A_n);
  U2 : inv port map (B, B_n);
  U3 : and2 port map (A_n, B_n, Y0);
  U4 : and2 port map (A, B_n, Y1);
  U5 : and2 port map (A_n, B, Y2);
  U6 : and2 port map (A, B, Y3);
end architecture decoder_2to4_arch;
Decoder Example
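For comparison with the structural model, the same 2-to-4 decoder can be sketched behaviorally with a case statement. The entity name "decoder_2to4_beh" and the internal variables are hypothetical; the bit ordering (B is the MSB, A is the LSB) is chosen to match the structural port maps, where Y1 = A∙B' and Y2 = A'∙B.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity decoder_2to4_beh is
  port (A, B           : in  STD_LOGIC;
        Y0, Y1, Y2, Y3 : out STD_LOGIC);
end entity decoder_2to4_beh;

architecture decoder_2to4_beh_arch of decoder_2to4_beh is
begin
  DECODE : process (A, B)
    variable code : STD_LOGIC_VECTOR (1 downto 0);
    variable Y    : STD_LOGIC_VECTOR (3 downto 0);
  begin
    code := B & A;                    -- B is the MSB, A is the LSB
    case (code) is
      when "00"   => Y := "0001";
      when "01"   => Y := "0010";
      when "10"   => Y := "0100";
      when "11"   => Y := "1000";
      when others => Y := "XXXX";     -- unknown inputs give unknown outputs
    end case;
    Y0 <= Y(0);
    Y1 <= Y(1);
    Y2 <= Y(2);
    Y3 <= Y(3);
  end process DECODE;
end architecture decoder_2to4_beh_arch;
```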
Encoders
• Encoder
- an encoder has 2^n inputs and n outputs
- it assumes that one and only one input will be asserted
- depending on which input is asserted, an output code will be generated
- this is the exact opposite of a decoder
ex) truth table of binary encoder
Input Output
0001 00
0010 01
0100 10
1000 11
- an encoder output is a simple OR structure that looks at the incoming signals
ex) 4-to-2 encoder
I3 I2 I1 I0 Y1 Y0
0 0 0 1 0 0
0 0 1 0 0 1
0 1 0 0 1 0
1 0 0 0 1 1
Y1 = I3 + I2
Y0 = I3 + I1
• Encoders in VHDL
- 8-to-3 binary encoder modeled with Structural VHDL
entity encoder_8to3_binary is
  generic (t_delay : time := 1.0 ns);
  port (I : in  STD_LOGIC_VECTOR (7 downto 0);
        Y : out STD_LOGIC_VECTOR (2 downto 0) );
end entity encoder_8to3_binary;
architecture encoder_8to3_binary_arch of encoder_8to3_binary is
  component or4 port (In1,In2,In3,In4: in STD_LOGIC; Out1: out STD_LOGIC); end component;
begin
  U1 : or4 port map (In1 => I(1), In2 => I(3), In3 => I(5), In4 => I(7), Out1 => Y(0) );
  U2 : or4 port map (In1 => I(2), In2 => I(3), In3 => I(6), In4 => I(7), Out1 => Y(1) );
  U3 : or4 port map (In1 => I(4), In2 => I(5), In3 => I(6), In4 => I(7), Out1 => Y(2) );
end architecture encoder_8to3_binary_arch;
• Encoders in VHDL
- 8-to-3 binary encoder modeled with Behavioral VHDL
entity encoder_8to3_binary is
  generic (t_delay : time := 1.0 ns);
  port (I : in  STD_LOGIC_VECTOR (7 downto 0);
        Y : out STD_LOGIC_VECTOR (2 downto 0) );
end entity encoder_8to3_binary;
architecture encoder_8to3_binary_arch of encoder_8to3_binary is
begin
  ENCODE : process (I)
  begin
    case (I) is
      when "00000001" => Y <= "000";
      when "00000010" => Y <= "001";
      when "00000100" => Y <= "010";
      when "00001000" => Y <= "011";
      when "00010000" => Y <= "100";
      when "00100000" => Y <= "101";
      when "01000000" => Y <= "110";
      when "10000000" => Y <= "111";
      when others => Y <= "ZZZ";
    end case;
  end process ENCODE;
end architecture encoder_8to3_binary_arch;
Encoder Example
Priority Encoders
• Priority Encoder
- a generic encoder does not know what to do when multiple input bits are asserted
- to handle this case, we need to include prioritization
- we decide the list of priority (usually MSB to LSB) where the truth table can be written as follows:
ex) 4-to-2 encoder
I3 I2 I1 I0   Y1 Y0
1  x  x  x    1  1
0  1  x  x    1  0
0  0  1  x    0  1
0  0  0  1    0  0
- we can then write expressions for an intermediate stage of priority bits “H” (i.e., Highest Priority):
H3 = I3
H2 = I2∙I3’
H1 = I1∙I2’∙I3’
H0 = I0∙I1’∙I2’∙I3’
- the final output stage then becomes:
Y1 = H3 + H2
Y0 = H3 + H1
• Priority Encoders in VHDL
- 8-to-3 binary priority encoder modeled with Behavioral VHDL
- If/Then/Else statements give priority
- Concurrent Conditional Signal Assignments give priority
entity encoder_8to3_priority is
  generic (t_delay : time := 1.0 ns);
  port (I : in  STD_LOGIC_VECTOR (7 downto 0);
        Y : out STD_LOGIC_VECTOR (2 downto 0) );
end entity encoder_8to3_priority;
architecture encoder_8to3_priority_arch of encoder_8to3_priority is
begin
  Y <= "111" when I(7) = '1' else   -- highest priority code
       "110" when I(6) = '1' else
       "101" when I(5) = '1' else
       "100" when I(4) = '1' else
       "011" when I(3) = '1' else
       "010" when I(2) = '1' else
       "001" when I(1) = '1' else
       "000" when I(0) = '1' else   -- lowest priority code
       "ZZZ";
end architecture encoder_8to3_priority_arch;
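The same priority behavior can be written with If/Then/Else inside a process. The sketch below is a hypothetical 4-to-2 version ("encoder_4to2_priority") that follows the truth table given earlier, with I(3) as the highest priority input.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity encoder_4to2_priority is
  port (I : in  STD_LOGIC_VECTOR (3 downto 0);
        Y : out STD_LOGIC_VECTOR (1 downto 0));
end entity encoder_4to2_priority;

architecture encoder_4to2_priority_arch of encoder_4to2_priority is
begin
  ENCODE : process (I)
  begin
    if    (I(3) = '1') then Y <= "11";   -- highest priority: lower bits are don't-cares
    elsif (I(2) = '1') then Y <= "10";
    elsif (I(1) = '1') then Y <= "01";
    elsif (I(0) = '1') then Y <= "00";   -- lowest priority
    else                    Y <= "ZZ";   -- no input asserted
    end if;
  end process ENCODE;
end architecture encoder_4to2_priority_arch;
```

The first condition that is true wins, which is exactly the ordering the intermediate "H" terms express in the Boolean equations.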
Priority Encoder Example
Seven-Segment Decoder
Multiplexer
• Multiplexer
- gates are combinational logic which generate an output depending on the current inputs
- what if we wanted to create a “Digital Switch” to pass along the input signal?
- this type of circuit is called a “Multiplexer”
ex) truth table of Multiplexer
Sel Out
0 A
1 B
- we can use the behavior of an AND gate to build this circuit:
X∙0 = 0 “Block Signal”
X∙1 = X “Pass Signal”
- we can then use the behavior of an OR gate at the output stage (since a 0 input has no effect)
to combine the signals into one output
- the outputs will track the selected input
- this is in effect, a “Switch”
ex) truth table of Multiplexer
Sel   A B   Out
0     0 x   0
0     1 x   1
1     x 0   0
1     x 1   1
- an ENABLE line can also be fed into each AND gate
• Multiplexers in VHDL
- Structural Model
entity mux_4to1 is
  port (D   : in  STD_LOGIC_VECTOR (3 downto 0);
        Sel : in  STD_LOGIC_VECTOR (1 downto 0);
        Y   : out STD_LOGIC);
end entity mux_4to1;
architecture mux_4to1_arch of mux_4to1 is
  signal Sel_n : STD_LOGIC_VECTOR (1 downto 0);
  signal U3_out, U4_out, U5_out, U6_out : STD_LOGIC;
  component inv1 port (In1: in STD_LOGIC; Out1: out STD_LOGIC); end component;
  component and3 port (In1,In2,In3 : in STD_LOGIC; Out1: out STD_LOGIC); end component;
  component or4 port (In1,In2,In3,In4: in STD_LOGIC; Out1: out STD_LOGIC); end component;
begin
  U1 : inv1 port map (In1 => Sel(0), Out1 => Sel_n(0));
  U2 : inv1 port map (In1 => Sel(1), Out1 => Sel_n(1));
  U3 : and3 port map (In1 => D(0), In2 => Sel_n(1), In3 => Sel_n(0), Out1 => U3_out);
  U4 : and3 port map (In1 => D(1), In2 => Sel_n(1), In3 => Sel(0), Out1 => U4_out);
  U5 : and3 port map (In1 => D(2), In2 => Sel(1), In3 => Sel_n(0), Out1 => U5_out);
  U6 : and3 port map (In1 => D(3), In2 => Sel(1), In3 => Sel(0), Out1 => U6_out);
  U7 : or4 port map (In1 => U3_out, In2 => U4_out, In3 => U5_out, In4 => U6_out, Out1 => Y);
end architecture mux_4to1_arch;
• Multiplexers in VHDL
- Structural Model w/ EN
entity mux_4to1 is
  port (D   : in  STD_LOGIC_VECTOR (3 downto 0);
        Sel : in  STD_LOGIC_VECTOR (1 downto 0);
        EN  : in  STD_LOGIC;
        Y   : out STD_LOGIC);
end entity mux_4to1;
architecture mux_4to1_arch of mux_4to1 is
  signal Sel_n : STD_LOGIC_VECTOR (1 downto 0);
  signal U3_out, U4_out, U5_out, U6_out : STD_LOGIC;
  component inv1 port (In1: in STD_LOGIC; Out1: out STD_LOGIC); end component;
  component and4 port (In1,In2,In3,In4: in STD_LOGIC; Out1: out STD_LOGIC); end component;
  component or4 port (In1,In2,In3,In4: in STD_LOGIC; Out1: out STD_LOGIC); end component;
begin
  U1 : inv1 port map (In1 => Sel(0), Out1 => Sel_n(0));
  U2 : inv1 port map (In1 => Sel(1), Out1 => Sel_n(1));
  U3 : and4 port map (In1 => D(0), In2 => Sel_n(1), In3 => Sel_n(0), In4 => EN, Out1 => U3_out);
  U4 : and4 port map (In1 => D(1), In2 => Sel_n(1), In3 => Sel(0), In4 => EN, Out1 => U4_out);
  U5 : and4 port map (In1 => D(2), In2 => Sel(1), In3 => Sel_n(0), In4 => EN, Out1 => U5_out);
  U6 : and4 port map (In1 => D(3), In2 => Sel(1), In3 => Sel(0), In4 => EN, Out1 => U6_out);
  U7 : or4 port map (In1 => U3_out, In2 => U4_out, In3 => U5_out, In4 => U6_out, Out1 => Y);
end architecture mux_4to1_arch;
• Multiplexers in VHDL
- Behavioral Model w/ EN
entity mux_4to1 is
  port (D   : in  STD_LOGIC_VECTOR (3 downto 0);
        Sel : in  STD_LOGIC_VECTOR (1 downto 0);
        EN  : in  STD_LOGIC;
        Y   : out STD_LOGIC);
end entity mux_4to1;
architecture mux_4to1_arch of mux_4to1 is
begin
  MUX : process (D, Sel, EN)
  begin
    if (EN = '1') then
      case (Sel) is
        when "00" => Y <= D(0);
        when "01" => Y <= D(1);
        when "10" => Y <= D(2);
        when "11" => Y <= D(3);
        when others => Y <= 'Z';
      end case;
    else
      Y <= 'Z';
    end if;
  end process MUX;
end architecture mux_4to1_arch;
Multiplexer Example
Multi-bit Mux Example
Demultiplexer
• Demultiplexer
- this is the exact opposite of a Mux
- a single input will be routed to a particular output pin depending on the Select setting
ex) truth table of Demultiplexer
Sel Y0 Y1
0 In 0
1 0 In
- we can again use the behavior of an AND gate to “pass” or “block” the input signal
- an AND gate is used for each Demux output
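• Demultiplexers in VHDL
- Structural Model
entity demux_1to4 is
  port (D   : in  STD_LOGIC;
        Sel : in  STD_LOGIC_VECTOR (1 downto 0);
        EN  : in  STD_LOGIC;
        Y   : out STD_LOGIC_VECTOR (3 downto 0));
end entity demux_1to4;
architecture demux_1to4_arch of demux_1to4 is
  signal Sel_n : STD_LOGIC_VECTOR (1 downto 0);
  component inv1 port (In1: in STD_LOGIC; Out1: out STD_LOGIC); end component;
  component and4 port (In1,In2,In3,In4: in STD_LOGIC; Out1: out STD_LOGIC); end component;
begin
  U1 : inv1 port map (In1 => Sel(0), Out1 => Sel_n(0));
  U2 : inv1 port map (In1 => Sel(1), Out1 => Sel_n(1));
  U3 : and4 port map (In1 => D, In2 => Sel_n(1), In3 => Sel_n(0), In4 => EN, Out1 => Y(0));
  U4 : and4 port map (In1 => D, In2 => Sel_n(1), In3 => Sel(0), In4 => EN, Out1 => Y(1));
  U5 : and4 port map (In1 => D, In2 => Sel(1), In3 => Sel_n(0), In4 => EN, Out1 => Y(2));
  U6 : and4 port map (In1 => D, In2 => Sel(1), In3 => Sel(0), In4 => EN, Out1 => Y(3));
end architecture demux_1to4_arch;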
• Demultiplexers in VHDL
- Behavioral Model with High Z Outputs
entity demux_1to4 is
port (D : in STD_LOGIC;
Sel : in STD_LOGIC_VECTOR (1 downto 0);
EN : in STD_LOGIC;
Y : out STD_LOGIC_VECTOR (3 downto 0));
end entity demux_1to4;
architecture demux_1to4_arch of demux_1to4 is

begin
DEMUX : process (D, Sel, EN)
begin
if (EN = '1') then

case (Sel) is
when "00" => Y <= 'Z' & 'Z' & 'Z' & D;
when "01" => Y <= 'Z' & 'Z' & D & 'Z';
when "10" => Y <= 'Z' & D & 'Z' & 'Z';
when "11" => Y <= D & 'Z' & 'Z' & 'Z';
when others => Y <= "ZZZZ";
end case;
else
Y <= "ZZZZ";
end if;
end process DEMUX;
end architecture demux_1to4_arch;
185
Tri-State Buffers
• Tri-State Buffers
- Provides either a Pass-Through or High Impedance Output depending on Enable Line
- High Impedance (Z) allows the circuit to be connected to a line with multiple circuits driving/receiving
- Using two Tri-State Buffers creates a "Bus Transceiver"
- This is used for "Multi-Drop" Buses (i.e., many Drivers/Receivers on the same bus)
ex) truth table of Tri-State Buffer
ENB Out
0 Z
1 In
ex) truth table of Bus Transceiver
Tx/Rx Mode
0 Receive from Bus (Rx)
1 Drive Bus (Tx)
186
Tri-State Buffers
• Tri-State Buffers in VHDL
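- 'Z' is one of the values of the resolved STD_LOGIC data type defined in Package STD_LOGIC_1164
- when 'Z' and another value drive the same line, the other value wins the resolution:
- Z & 0 = 0
- Z & 1 = 1
- Z & L = L
- Z & H = H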
TRISTATE: process (In1, ENB)
begin
if (ENB = '1') then
Out1 <= In1; -- enabled: pass the input through (matches the truth table)
else
Out1 <= 'Z'; -- disabled: high impedance
end if;
end process TRISTATE;
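The bus transceiver built from two tri-state buffers could be sketched as below. This is only an illustration: the entity and port names (bus_transceiver, TxRx, Data_Tx, Data_Rx, Bus_Pin) are hypothetical, and the polarity assumes the truth table given earlier (TxRx = '1' drives the bus, TxRx = '0' receives from it).

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical names; polarity assumed from the Tx/Rx truth table.
entity bus_transceiver is
  port (TxRx    : in    STD_LOGIC;   -- '1' = drive bus (Tx), '0' = receive (Rx)
        Data_Tx : in    STD_LOGIC;   -- value to place on the bus
        Data_Rx : out   STD_LOGIC;   -- value received from the bus
        Bus_Pin : inout STD_LOGIC);  -- shared multi-drop bus line
end entity bus_transceiver;

architecture bus_transceiver_arch of bus_transceiver is
begin
  -- Tri-state buffer 1: drives the bus only in Tx mode
  Bus_Pin <= Data_Tx when (TxRx = '1') else 'Z';

  -- Tri-state buffer 2: passes the bus value inward only in Rx mode
  Data_Rx <= Bus_Pin when (TxRx = '0') else 'Z';
end architecture bus_transceiver_arch;
```

Because Bus_Pin is a resolved STD_LOGIC signal, several such transceivers can share the line as long as only one of them is in Tx mode at a time.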
187
Comparators
• Comparators
- a circuit that compares digital values (i.e., Equal, Greater Than, Less Than)
- we are considering Digital Comparators (Analog comparators also exist)
- typically there will be 3-outputs, of which only one is asserted
- whether a bit is EQ, GT, or LT is a Boolean expression
- a comparator for two 1-bit inputs (A, B) would look like:
A B EQ (A=B) GT (A>B) LT (A<B)
0 0 1 0 0
0 1 0 0 1
1 0 0 1 0
1 1 1 0 0
EQ = (A ⊕ B)'
GT = A·B'
LT = A'·B
188
Comparators
• Non-Iterative Comparators
- "Iterative" refers to a circuit made up of identical blocks. The first block performs its operation, which
produces a result used in the 2nd block, and so on.
- this can be thought of as a "Ripple" effect
- Iterative circuits tend to be slower due to the ripple, but take less area
- Non-Iterative circuits consist of combinational logic executing at the same time
"Equality"
- since each bit in a vector must be equal, the outputs of each bit's compare can be AND'd
- for a 4-bit comparator:
EQ = (A3 ⊕ B3)' · (A2 ⊕ B2)' · (A1 ⊕ B1)' · (A0 ⊕ B0)'
189
Comparators
"Greater Than"
- we can start at the MSB (n) and check whether An>Bn.
- If it is, we are done and can ignore the rest of the LSB's.
- If it is NOT, but they are equal, we need to check the next MSB bit (n-1)
- to ensure the previous bit was equal, we include it in the next LSB's logic expression:
Steps
- GT = An·Bn' (this is ONLY true if An>Bn)
- if it is NOT GT, we go to the n-1 bit, assuming that An=Bn, i.e., (An ⊕ Bn)'
- we consider An-1>Bn-1 only when An=Bn [i.e., (An ⊕ Bn)' · (An-1·Bn-1') ]
- we continue this process through all of the bits
- 4-bit comparator
GT = (A3·B3') +
(A3 ⊕ B3)' · (A2·B2') +
(A3 ⊕ B3)' · (A2 ⊕ B2)' · (A1·B1') +
(A3 ⊕ B3)' · (A2 ⊕ B2)' · (A1 ⊕ B1)' · (A0·B0')
190
Comparators
"Less Than"
- since the vectors must be exactly one of EQ, GT, or LT, we can create LT using:
LT = EQ' · GT'
• Iterative Comparators
- we can build an iterative comparator by passing signals between identical modules from MSB to LSB
ex) module for 1-bit comparator
EQout = (A ⊕ B)' · EQin
- EQout is fed into the EQin port of the next LSB module
- the first iterative module has EQin set to '1'
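The iterative equality chain can be sketched with a generate loop, one stage per bit from MSB to LSB. The entity name (eq_iterative), the generic N, and the internal signal EQ_chain are hypothetical names chosen for this illustration.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical iterative equality comparator; N is an assumed generic.
entity eq_iterative is
  generic (N : positive := 4);
  port (A, B : in  STD_LOGIC_VECTOR (N-1 downto 0);
        EQ   : out STD_LOGIC);
end entity eq_iterative;

architecture eq_iterative_arch of eq_iterative is
  -- EQ_chain(i+1) is the EQin of stage i, EQ_chain(i) is its EQout
  signal EQ_chain : STD_LOGIC_VECTOR (N downto 0);
begin
  EQ_chain(N) <= '1';                      -- first (MSB) module has EQin = '1'

  STAGE : for i in N-1 downto 0 generate
    -- EQout = (A xor B)' and EQin, written with the xnor operator
    EQ_chain(i) <= (A(i) xnor B(i)) and EQ_chain(i+1);
  end generate STAGE;

  EQ <= EQ_chain(0);                       -- EQout of the LSB module
end architecture eq_iterative_arch;
```

The result ripples from the MSB stage to the LSB stage, which is the area-versus-delay trade-off described for iterative circuits.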
191
Comparators
• Comparators in VHDL
- Structural Model
entity comparator_4bit is
port (In1, In2 : in STD_LOGIC_VECTOR (3 downto 0);

EQ, LT, GT : out STD_LOGIC);
end entity comparator_4bit;
architecture comparator_4bit_arch of comparator_4bit is
signal Bit_Equal : STD_LOGIC_VECTOR (3 downto 0);

signal Bit_GT : STD_LOGIC_VECTOR (3 downto 0);
signal In2_n : STD_LOGIC_VECTOR (3 downto 0);
signal In1_and_In2_n : STD_LOGIC_VECTOR (3 downto 0);
signal EQ_temp, GT_temp : STD_LOGIC;
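component xnor2 port (In1,In2: in STD_LOGIC; Out1: out STD_LOGIC); end component;
component nor2 port (In1,In2: in STD_LOGIC; Out1: out STD_LOGIC); end component;
component inv1 port (In1: in STD_LOGIC; Out1: out STD_LOGIC); end component;
component and2 port (In1,In2: in STD_LOGIC; Out1: out STD_LOGIC); end component;
component and3 port (In1,In2,In3: in STD_LOGIC; Out1: out STD_LOGIC); end component;
component and4 port (In1,In2,In3,In4: in STD_LOGIC; Out1: out STD_LOGIC); end component;
component or4 port (In1,In2,In3,In4: in STD_LOGIC; Out1: out STD_LOGIC); end component;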
192
Comparators
Cont…
begin
-- "Equal" Circuitry
XN0 : xnor2 port map (In1(0), In2(0), Bit_Equal(0)); -- 1st level of XNOR tree
XN1 : xnor2 port map (In1(1), In2(1), Bit_Equal(1));
XN2 : xnor2 port map (In1(2), In2(2), Bit_Equal(2));
XN3 : xnor2 port map (In1(3), In2(3), Bit_Equal(3));
AN0 : and4 port map (Bit_Equal(0), Bit_Equal(1), Bit_Equal(2), Bit_Equal(3), EQ); -- 2nd level of "Equal" Tree
AN1 : and4 port map (Bit_Equal(0), Bit_Equal(1), Bit_Equal(2), Bit_Equal(3), EQ_temp); -- internal copy (an out port cannot be read)
-- "Greater Than" Circuitry
IV0 : inv1 port map (In2(0), In2_n(0)); -- creating In2'
IV1 : inv1 port map (In2(1), In2_n(1));
IV2 : inv1 port map (In2(2), In2_n(2));
IV3 : inv1 port map (In2(3), In2_n(3));
AN2 : and2 port map (In1(3), In2_n(3), In1_and_In2_n(3)); -- creating In1 & In2'
AN3 : and2 port map (In1(2), In2_n(2), In1_and_In2_n(2));
AN4 : and2 port map (In1(1), In2_n(1), In1_and_In2_n(1));
AN5 : and2 port map (In1(0), In2_n(0), In1_and_In2_n(0));
AN6 : and2 port map (Bit_Equal(3), In1_and_In2_n(2), Bit_GT(2));
AN7 : and3 port map (Bit_Equal(3), Bit_Equal(2), In1_and_In2_n(1), Bit_GT(1));
AN8 : and4 port map (Bit_Equal(3), Bit_Equal(2), Bit_Equal(1), In1_and_In2_n(0), Bit_GT(0));
OR0 : or4 port map (In1_and_In2_n(3), Bit_GT(2), Bit_GT(1), Bit_GT(0), GT);
OR1 : or4 port map (In1_and_In2_n(3), Bit_GT(2), Bit_GT(1), Bit_GT(0), GT_temp); -- internal copy
-- "Less Than" Circuitry
ND0 : nor2 port map (EQ_temp, GT_temp, LT); -- LT = EQ' · GT'
end architecture comparator_4bit_arch;
193
Comparators
- Behavioral Model
entity comparator_4bit is
port (In1, In2 : in STD_LOGIC_VECTOR (3 downto 0);

EQ, LT, GT : out STD_LOGIC);
end entity comparator_4bit;
architecture comparator_4bit_arch of comparator_4bit is

begin
COMPARE : process (In1, In2)

begin
EQ <= '0'; LT <= '0'; GT <= '0'; -- initialize outputs to '0'
if (In1 = In2) then EQ <= '1'; end if; -- Equal

if (In1 < In2) then LT <= '1'; end if; -- Less Than
if (In1 > In2) then GT <= '1'; end if; -- Greater Than
end process COMPARE;
end architecture comparator_4bit_arch;
194
Numeric Basics
• Representing and processing numeric data is a common
requirement
– unsigned integers
– signed integers
– fixed-point real numbers
– floating-point real numbers
– complex numbers
195
Unsigned Integers in VHDL
196
Extending/Truncating Unsigned Numbers
197
Increment/Decrement in VHDL
198
Scaling in VHDL
199
Signed Integers in VHDL
200
Resizing Signed Integers
201
Ripple Carry Adder
• Addition – Half Adder
- one bit addition can be accomplished with an XOR gate (sum modulo 2)
0 1 0 1
+0 +0 +1 +1
0 1 1 10
- notice that we need to also generate a “Carry Out” bit
- the “Carry Out” bit can be generated using an AND gate
- this type of circuit is called a “Half Adder”
- it is only “Half” because it doesn’t consider a “Carry In” bit
202
Ripple Carry Adder
• Addition – Full Adder
- to create a full adder, we need to include the “Carry In” in the Sum
Cin A B Cout Sum
0 0 0 0 0
0 0 1 0 1
0 1 0 0 1
0 1 1 1 0
1 0 0 0 1
1 0 1 1 0
1 1 0 1 0
1 1 1 1 1
Sum = A ⊕ B ⊕ Cin
Cout = Cin∙A + A∙B + Cin∙B
- you could also use two "Half Adders" to accomplish the same thing
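A sketch of the full adder built from two half adders is shown below. The entity name and the internal signal names (HA1_sum, HA1_carry, HA2_carry) are hypothetical; the logic is equivalent to the Sum and Cout expressions from the truth table.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical full adder composed of two half adders plus an OR gate.
entity full_adder is
  port (A, B, Cin : in  STD_LOGIC;
        Sum, Cout : out STD_LOGIC);
end entity full_adder;

architecture full_adder_arch of full_adder is
  signal HA1_sum, HA1_carry, HA2_carry : STD_LOGIC;
begin
  -- first half adder: A + B
  HA1_sum   <= A xor B;
  HA1_carry <= A and B;

  -- second half adder: (A xor B) + Cin
  Sum       <= HA1_sum xor Cin;
  HA2_carry <= HA1_sum and Cin;

  -- the two carries are never '1' at the same time, so an OR combines them
  Cout      <= HA1_carry or HA2_carry;
end architecture full_adder_arch;
```

If HA1_carry is '1' then A = B = '1', so HA1_sum is '0' and HA2_carry must be '0'; that is why a simple OR is enough for Cout.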
203
Ripple Carry Adder
• Addition – Ripple Carry Adder
- cascading Full Adders together will allow the Cout’s to propagate (or Ripple) through the circuit
- this configuration is called a Ripple Carry Adder
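As a rough sketch, the cascade can be described with a generate loop in which each iteration is one full adder and the internal vector C carries the ripple. The entity name (rca), the generic N, and the signal C are hypothetical names for this illustration.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical N-bit ripple carry adder; one full adder per generate iteration.
entity rca is
  generic (N : positive := 4);
  port (A, B : in  STD_LOGIC_VECTOR (N-1 downto 0);
        Cin  : in  STD_LOGIC;
        Sum  : out STD_LOGIC_VECTOR (N-1 downto 0);
        Cout : out STD_LOGIC);
end entity rca;

architecture rca_arch of rca is
  signal C : STD_LOGIC_VECTOR (N downto 0);  -- C(i) is the carry into bit i
begin
  C(0) <= Cin;

  FA : for i in 0 to N-1 generate
    Sum(i) <= A(i) xor B(i) xor C(i);
    -- Cout of stage i feeds Cin of stage i+1 (the "ripple")
    C(i+1) <= (C(i) and A(i)) or (A(i) and B(i)) or (C(i) and B(i));
  end generate FA;

  Cout <= C(N);
end architecture rca_arch;
```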
204
Ripple Carry Adder
- What is the delay through the Full Adder?
- Each Full Adder has the following logic:
Sum = A ⊕ B ⊕ Cin
Cout = Cin∙A + A∙B + Cin∙B
- tFull-Adder will be the longest combinational logic delay path in the adder
205
Ripple Carry Adder
- What is the delay through the entire iterative circuit?
- In the worst case the carry ripples through all n Full Adders:
tRCA = n·tFull-Adder
- the delay increases linearly with the number of bits
- different topologies within the full-adder to reduce delay (Δt) will have an n·Δt effect
206
Carry Look Ahead Adders
• Addition – Carry Look Ahead Adder
- We've seen a Ripple Carry Adder topology (RCA)
- this is good for simplicity and design-reuse
- however, the delay increases linearly with the number of bits
tRCA = n·tFull-Adder
- different topologies within the full-adder to reduce delay (Δt) will have a n·Δt effect
- the linear increase in delay comes from waiting for the Carry to Ripple through
207
- to avoid the ripple, we can build a Carry Look-Ahead Adder (CLA)
- this circuit calculates the carry for all Full-Adders at the same time
- we define the following intermediate stages of a CLA:
Generate "g", an adder (i) generates a carry out (Ci+1) under input conditions Ai and Bi
independent of Ai-1, Bi-1, or Carry In (Ci)
Ai Bi Ci+1
0 0 0
0 1 0
1 0 0
1 1 1
we can say that: gi = Ai·Bi
remember, g does NOT consider carry in (Ci)
208
Propagate "p", an adder (i) will propagate (or pass through) a carry in (Ci) depending on input
conditions Ai and Bi:
Ci Ai Bi Ci+1
0 0 0 0
0 0 1 0
0 1 0 0
0 1 1 1
1 0 0 0
1 0 1 1
1 1 0 1
1 1 1 1
pi is defined when there is a carry in, so we ignore the row entries where Ci=0
if we only look at the Ci=1 rows, we can say that: pi = Ai+Bi
(the carry in itself is accounted for by the pi·Ci term of the carry equation)
209
- said another way, Adder(i) will "Generate" a Carry Out (Ci+1) if:
gi = Ai·Bi
and it will "Propagate" a Carry In (Ci) when
pi = Ai+Bi
- a full expression for the Carry Out (Ci+1) in terms of p and g is given by:
Ci+1 = gi+pi·Ci
- this is good, but we still generate Carries dependent on previous stages (i-1) of the iterative circuit
210
- We can eliminate this dependence by recursively expanding each Carry Equation
ex) 4 bit Carry Look Ahead Logic
C1 = g0+p0·C0 (2-Level Sum-of-Products)
C2 = g1+p1·C1
C2 = g1+p1·(g0+p0·C0)
C2 = g1+p1·g0+p1·p0·C0 (2-Level Sum-of-Products)
C3 = g2+p2·C2
C3 = g2+p2·(g1+p1·g0+p1·p0·C0)
C3 = g2+p2·g1+p2·p1·g0+p2·p1·p0·C0 (2-Level Sum-of-Products)
C4 = g3+p3·C3
C4 = g3+p3·(g2+p2·g1+p2·p1·g0+p2·p1·p0·C0)
C4 = g3+p3·g2+p3·p2·g1+p3·p2·p1·g0+p3·p2·p1·p0·C0 (2-Level Sum-of-Products)
- this gives us logic expressions that can generate a next stage carry based upon ONLY
the inputs to the adder and the original carry in (C0)
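The expanded carry equations can be written directly as concurrent assignments. The sketch below is one possible 4-bit version: the entity name (cla_4bit) and the signal names are hypothetical, and it takes p(i) = A(i) or B(i), with the carry in supplied by the C0 factor in each product term.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical 4-bit carry look-ahead adder; assumes p = A or B, g = A and B.
entity cla_4bit is
  port (A, B : in  STD_LOGIC_VECTOR (3 downto 0);
        C0   : in  STD_LOGIC;
        Sum  : out STD_LOGIC_VECTOR (3 downto 0);
        C4   : out STD_LOGIC);
end entity cla_4bit;

architecture cla_4bit_arch of cla_4bit is
  signal g, p       : STD_LOGIC_VECTOR (3 downto 0);
  signal C1, C2, C3 : STD_LOGIC;
begin
  -- level 1: generate and propagate terms
  g <= A and B;
  p <= A or B;

  -- levels 2 and 3: every carry depends only on g, p, and C0
  C1 <= g(0) or (p(0) and C0);
  C2 <= g(1) or (p(1) and g(0)) or (p(1) and p(0) and C0);
  C3 <= g(2) or (p(2) and g(1)) or (p(2) and p(1) and g(0))
             or (p(2) and p(1) and p(0) and C0);
  C4 <= g(3) or (p(3) and g(2)) or (p(3) and p(2) and g(1))
             or (p(3) and p(2) and p(1) and g(0))
             or (p(3) and p(2) and p(1) and p(0) and C0);

  -- sum bits use the look-ahead carries instead of rippled ones
  Sum(0) <= A(0) xor B(0) xor C0;
  Sum(1) <= A(1) xor B(1) xor C1;
  Sum(2) <= A(2) xor B(2) xor C2;
  Sum(3) <= A(3) xor B(3) xor C3;
end architecture cla_4bit_arch;
```

Note how the widest product term grows with the bit position, which is where the fan-in limit comes from.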
211
- the Carry Look Ahead logic has 3 levels
1) g and p logic
2) product terms in the Ci equations
3) sum terms in the Ci equations
- the Sum bits require 2 levels of Logic
1) Ai ⊕ Bi ⊕ Ci
NOTE: A Full Adder made up of 2 Half Adders has 3 levels. But the 3rd level is used in the
creation of the Carry Out bit. Since we do not use it in a CLA, we can ignore that level.
- So a CLA will have a total of 5 levels of Logic
212
- the 5 levels of logic are fixed no matter how many bits the adder is (really?)
- In reality, the most significant Carry equation will have i+1 inputs into its largest sum/product term
- this means that Fan-In becomes a problem since real gates tend to not have more than 4-6 inputs
- When the number of inputs gets larger than the Fan-In, the logic needs to be broken into another level
ex) A+B+C+D+E = (A+B+C+D)+E
- In the worst case, the logic Fan-In would be 2. Even in this case, the delay associated with the
Carry Look Ahead logic would be proportional to log2(n)
- Area and Power are also concerns with CLA's. Typically CLA's are used in computationally intense
applications where performance outweighs Power and Area.
213
• Adders in VHDL
- (+) and (-) are not defined for STD_LOGIC_VECTOR
- The Package STD_LOGIC_ARITH gives two data types:
UNSIGNED (3 downto 0) := "1111"; -- +15

SIGNED (3 downto 0) := "1111"; -- -1
- these are still resolved types (STD_LOGIC), but the equality and arithmetic operations are slightly
different depending on whether you are using Signed vs. Unsigned
• Considerations
- when adding signed and unsigned numbers, the type of the result will dictate how the operands are
handled/converted
- if assigning to an n-bit SIGNED result, an (n-1)-bit UNSIGNED operand will automatically be converted
to signed by extending its vector length by 1 and filling the new MSB with a sign bit (0)
214
• Adders in VHDL
ex) A,B : in UNSIGNED (7 downto 0);
C : in SIGNED (7 downto 0);
D : in STD_LOGIC_VECTOR (7 downto 0);
S : out UNSIGNED (8 downto 0);
T : out SIGNED (8 downto 0);
U : out SIGNED (7 downto 0);
S(7 downto 0) <= A + B; -- 8-bit UNSIGNED addition, not considering Carry
S <= ('0' & A) + ('0' & B); -- manually increasing size of A and B to include Carry.
-- Carry will be kept in S(8)
T <= A + C; -- T is SIGNED, so A's UNSIGNED vector size is increased
-- by 1 and filled with '0' as a sign bit
U <= C + SIGNED(D); -- D is converted (considered) to SIGNED, not increased in size
T <= C + UNSIGNED(D); -- D is converted (considered) to UNSIGNED, so it is increased by 1 bit
-- ('0' sign bit); the 9-bit result fits T, not U
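For comparison, here is a sketch of similar additions using the IEEE standard package NUMERIC_STD instead of STD_LOGIC_ARITH. This is a swapped-in package, not the one used on the slides. NUMERIC_STD does not mix SIGNED and UNSIGNED operands automatically, so the widening is done explicitly with resize; the entity name is hypothetical.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;   -- assumption: NUMERIC_STD replaces STD_LOGIC_ARITH here

-- Hypothetical entity reusing the port names of the example above.
entity add_numeric_std is
  port (A, B : in  UNSIGNED (7 downto 0);
        C    : in  SIGNED (7 downto 0);
        D    : in  STD_LOGIC_VECTOR (7 downto 0);
        S    : out UNSIGNED (8 downto 0);
        T    : out SIGNED (8 downto 0);
        U    : out SIGNED (7 downto 0));
end entity add_numeric_std;

architecture add_numeric_std_arch of add_numeric_std is
begin
  -- zero-extend both operands to 9 bits so the carry lands in S(8)
  S <= resize(A, 9) + resize(B, 9);

  -- zero-extend A, reinterpret it as SIGNED, and sign-extend C to 9 bits
  T <= signed(resize(A, 9)) + resize(C, 9);

  -- D is reinterpreted as SIGNED; the 8-bit result wraps on overflow
  U <= C + signed(D);
end architecture add_numeric_std_arch;
```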
215
Subtraction
• Half Subtractor
- one bit subtraction can be accomplished using combinational logic
(A-B)
A B Bout D
0 0 0 0
0 1 1 1
1 0 0 1
1 1 0 0
D = A ⊕ B
Bout = A'·B
216
Subtraction
• Full Subtractor
- to create a full Subtractor, we need to include the “Borrow In” in the Difference
(A-B-Bin)
A B Bin Bout D
0 0 0 0 0
0 0 1 1 1
0 1 0 1 1
0 1 1 1 0
1 0 0 0 1
1 0 1 0 0
1 1 0 0 0
1 1 1 1 1
D = A ⊕ B ⊕ Bin
Bout = A'∙B + A'∙Bin + B∙Bin
- notice this is very similar to addition.
- The Sum and Difference Logic are identical
- The Carry and Borrow Logic are close
217
Subtraction
• Subtraction
- Can we manipulate the subtraction logic so that Full Adders can be used as Full Subtractors?
Addition: S = A ⊕ B ⊕ Cin, Cout = A∙B + A∙Cin + B∙Cin
Subtraction: D = A ⊕ B ⊕ Bin, Bout = A'∙B + A'∙Bin + B∙Bin
- Let's manipulate Bout to try to get it into a form similar to Cout
Bout = A'∙B + A'∙Bin + B∙Bin
Bout' = (A+B') ∙ (A+Bin') ∙ (B'+Bin') Generalized DeMorgan's Theorem
Now Multiply Out the Terms
Bout' = (A∙A∙B')+(A∙B'∙Bin')+(A∙B'∙B')+(B'∙B'∙Bin')+(A∙A∙Bin')+(A∙Bin'∙Bin')+(A∙B'∙Bin')+(B'∙Bin'∙Bin')
Now Remove Redundant Terms

Bout' = (A∙B')+(A∙B'∙Bin')+(A∙Bin')+(B'∙Bin')
Bout' = (A∙B')+(A∙Bin')+(B'∙Bin')
218
Subtraction
• Subtraction
- Now we have similar expressions for Cout and Bout where
Cout = A∙B + A∙Cin + B∙Cin Bout' = A∙B' + A∙Bin' + B'∙Bin'
- But this requires the Subtrahend and Bin be inverted, how does this affect the Sum/Difference Logic?
S = A ⊕ B ⊕ Cin D = A ⊕ B ⊕ Bin
- remember that both inputs of a 2-input XOR can be inverted without changing the logic
function, which gives us:
S = A ⊕ B ⊕ Cin D = A ⊕ B' ⊕ Bin'
219
Subtraction
• Subtraction
- After all of this manipulation, we are left with
S = A ⊕ B ⊕ Cin D = A ⊕ B' ⊕ Bin'
Cout = A∙B + A∙Cin + B∙Cin Bout' = A∙B' + A∙Bin' + B'∙Bin'
- This means we can use "Full Adders" for subtraction as long as:
1) The Subtrahend is inverted

2) Bin is inverted
3) Bout is inverted
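- In a ripple carry subtractor, intermediate Bout's are fed into Bin's, so the two inversions cancel (a double inversion)
- That leaves only the first Bin and the last Bout: the first Bin (normally '0') is inverted by inserting a '1' into the first carry input of the chain, and the last carry out is read as Bout'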
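The three conditions can be sketched as a ripple subtractor built from full-adder logic. The entity name (ripple_sub), the generic N, and the signals Y_n and C are hypothetical names; X is the minuend and Y the subtrahend.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical N-bit ripple subtractor (X - Y) made from full-adder equations.
entity ripple_sub is
  generic (N : positive := 4);
  port (X, Y : in  STD_LOGIC_VECTOR (N-1 downto 0);
        Diff : out STD_LOGIC_VECTOR (N-1 downto 0);
        Bout : out STD_LOGIC);
end entity ripple_sub;

architecture ripple_sub_arch of ripple_sub is
  signal Y_n : STD_LOGIC_VECTOR (N-1 downto 0);
  signal C   : STD_LOGIC_VECTOR (N downto 0);  -- C(i) plays the role of Bin'
begin
  Y_n  <= not Y;   -- 1) the subtrahend is inverted
  C(0) <= '1';     -- 2) first Bin ('0') inverted: a '1' enters the chain

  FA : for i in 0 to N-1 generate
    Diff(i) <= X(i) xor Y_n(i) xor C(i);
    -- unmodified full-adder carry logic; intermediate inversions cancel
    C(i+1)  <= (X(i) and Y_n(i)) or (X(i) and C(i)) or (Y_n(i) and C(i));
  end generate FA;

  Bout <= not C(N);  -- 3) the final carry out is the inverted borrow
end architecture ripple_sub_arch;
```

For example, with N = 4, X = "0011" and Y = "0101", the chain adds 0011 + 1010 + 1, giving Diff = "1110" with no carry out, so Bout = '1' signals the borrow.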
220
Subtraction
• Subtraction
- this gives us the minimal logic for a "Ripple Carry Subtractor" using "Full Adders"
X-Y
221
Adders/Subtractors in VHDL
222
Signed Addition in VHDL
223
Multipliers
• Multipliers
- binary multiplication of an individual bit can be performed using combinational logic:
A*B P
0 0 0
0 1 0 we can say that: P = A·B
1 0 0
1 1 1
- for multi-bit multiplication, we can mimic the algorithm that we use when doing multiplication by hand
ex) 12 this number is the "Multiplicand"

x 34 this number is the "Multiplier"
48 1) multiplicand for digit (0)
+36 2) multiplicand for digit (1)
408 3) Sum of all multiplicands
- this is called the "Shift and Add" algorithm
224
Multipliers
• "Shift and Add" Multipliers
- example of Binary Multiplication using our "by hand" method
decimal: 11 x 13 = 33 + 110 = 143
binary:
    1011   - multiplicand (11)
  x 1101   - multiplier (13)
    1011
   0000    - these are the individual multiplicands
  1011
+1011
10001111   - the final product (143) is the sum of all multiplicands
- this is simple and straight forward. BUT, the addition of the individual multiplicand products requires
as many as n-inputs.
- we would really like to re-use our Full Adder circuits, which only have 3 inputs.
225
Multipliers
- we can perform the additions of each multiplicand after it is created
- this is called a "Partial Product"
- to keep the algorithm consistent, we use "0000" as the first Partial Product
1011 - Original multiplicand

x 1101 - Original multiplier
0000 - Partial Product for 1st multiply
1011 - Shifted Multiplicand for 1st multiply
1011 - Partial Product for 2nd multiply
0000 - Shifted Multiplicand for 2nd multiply
01011 - Partial Product for 3rd multiply
1011 - Shifted Multiplicand for 3rd multiply
110111 - Partial Product for 4th multiply
1011 - Shifted Multiplicand for 4th multiply
10001111 - the final product is the sum of all multiplicands
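The running partial product can be modeled behaviorally with a loop, as in the sketch below. It uses the NUMERIC_STD package (an assumption, chosen for its shift_left and resize functions); the entity name and the variable name partial are hypothetical.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;   -- assumption: NUMERIC_STD for UNSIGNED arithmetic

-- Hypothetical combinational "Shift and Add" multiplier model.
entity shift_add_mult is
  generic (N : positive := 4);
  port (Multiplicand : in  UNSIGNED (N-1 downto 0);
        Multiplier   : in  UNSIGNED (N-1 downto 0);
        Product      : out UNSIGNED (2*N-1 downto 0));
end entity shift_add_mult;

architecture shift_add_mult_arch of shift_add_mult is
begin
  MULT : process (Multiplicand, Multiplier)
    variable partial : UNSIGNED (2*N-1 downto 0);
  begin
    partial := (others => '0');           -- first Partial Product is all zeros
    for i in 0 to N-1 loop
      if (Multiplier(i) = '1') then
        -- add the multiplicand shifted by the weight of multiplier bit i
        partial := partial + shift_left(resize(Multiplicand, 2*N), i);
      end if;                             -- a '0' bit adds a shifted 0000
    end loop;
    Product <= partial;
  end process MULT;
end architecture shift_add_mult_arch;
```

With Multiplicand = "1011" and Multiplier = "1101" the variable partial steps through 1011, 1011, 110111, and 10001111, matching the partial products listed above.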
226
Multipliers
- Graphical view of product terms and summation
227
Multipliers
- Graphical View of interconnect for an 8x8 multiplier. Note the Full Adders
228
Multipliers
• "Sequential" Multipliers
- the main speed limitation of the Combinational "Shift and Add" multiplier is the delay through the
adder chain.
- in the worst case, the number of delay paths through the adders would be [n + 2(n-2)]
ex) 4-bit = 8 Full Adders

8-bit = 20 Full Adders
- we can decrease this delay by using a register to accumulate the incremental additions as they
take place.
- this would reduce the number of operation states to [n-1]
• "Carry Save" Multipliers
- another trick to speed up the multiplication is to break the carry chain
- we can run the 0th carry from the first row of adders into the adder for the 2nd row
- a final stage of adders is needed to recombine the carries. But this reduces the delay to [n+(n-2)]
229
Multipliers
• "Carry Save" Multipliers
230
Unsigned Multiplication in VHDL
231
Signed Multipliers
• Multipliers
- we learned the "Shift and Add" algorithm for constructing a combinational multiplier
- but this only worked for unsigned numbers
- we can create a signed multiplier using a similar algorithm
• Convert to Positive
- one of the simplest ways is to first convert any negative numbers to positive, then use the unsigned
multiplier
- the sign bit is added after the multiplication following:
pos x pos = pos
pos x neg = neg
neg x pos = neg
neg x neg = pos
- remember that 0=pos and 1=neg in 2's comp, so the sign of the product is the XOR of the operand sign bits
232
Signed Multipliers
• 2's Comp Multiplier
- remember that in a "Shift and Add", we created a shifted multiplicand
- the shifted multiplicand corresponded to the weight of the multiplier bit
- we can use this same technique for 2's comp remembering that
- the MSB of a 2's comp # has a weight of -2^(n-1)
- we also must remember that 2's comp addition must
- be on same-sized vectors
- the carry is ignored
- we can make partial products the same size as shifted multiplicands by doing a "2's comp sign extend"
ex) 1011 = 11011 = 1111011
- since the MSB has a negative weight, we NEGATE the shifted multiplicand for that bit prior to the
last addition.
233
Signed Multipliers
• 2's Comp Shift and Add Multipliers
- we can perform the additions of each multiplicand after it is created
- this is called a "Partial Product"
- to keep the algorithm consistent, we use "0000" as the first Partial Product
1011 - Original multiplicand

x 1101 - Original multiplier
00000 - Partial Product for 1st multiply w/ Sign Extension
11011 - Shifted Multiplicand for 1st multiply w/ Sign Extension
111011 - Partial Product for 2nd multiply w/ Sign Extension
00000 - Shifted Multiplicand for 2nd multiply w/ Sign Extension
1111011 - Partial Product for 3rd multiply w/ Sign Extension
11011 - Shifted Multiplicand for 3rd multiply w/ Sign Extension
11100111 - Partial Product for 4th multiply w/ Sign Extension
00101 - NEGATED Shifted Multiplicand for 4th multiply w/ Sign Extension
1 00001111 - the final product is the sum of all multiplicands ignore Carry_Out
234
Division
• Division - "Repeated Subtraction"
- a simple algorithm to divide is to count the number of times you can subtract the divisor from the
dividend
- this is slow, but simple
- the number of times it can be subtracted without going negative is the "Quotient"
- once the next subtraction would give a negative number, whatever is left prior to that
subtraction is the "Remainder" (zero if the divisor divides evenly)
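A simulation-only sketch of repeated subtraction is shown below. The while loop is not meant for synthesis, the entity and variable names are hypothetical, and returning a Quotient of 0 when the Divisor is 0 is an assumption made only to keep the loop from running forever.

```vhdl
-- Hypothetical behavioral model; natural comes from package STANDARD.
entity div_repeated_sub is
  port (Dividend  : in  natural;
        Divisor   : in  natural;
        Quotient  : out natural;
        Remainder : out natural);
end entity div_repeated_sub;

architecture div_repeated_sub_arch of div_repeated_sub is
begin
  DIVIDE : process (Dividend, Divisor)
    variable left_over : natural;  -- what remains of the dividend
    variable count     : natural;  -- number of successful subtractions
  begin
    left_over := Dividend;
    count     := 0;
    if (Divisor > 0) then          -- assumption: skip the loop for divide-by-zero
      -- subtract only while the result would not go negative
      while (left_over >= Divisor) loop
        left_over := left_over - Divisor;
        count     := count + 1;
      end loop;
    end if;
    Quotient  <= count;
    Remainder <= left_over;
  end process DIVIDE;
end architecture div_repeated_sub_arch;
```

For Dividend = 14 and Divisor = 4 the loop subtracts three times, leaving Quotient = 3 and Remainder = 2.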
235
Division
• Division - "Shift and Subtract"
- Division is similar to multiplication, but instead of "Shift and Add", we "Shift and Subtract"
236
Fixed-Point in VHDL
• Many applications use non-integers
– especially signal-processing apps
– Fixed-point numbers allow for fractional parts
– represented as integers that are implicitly scaled by a power of 2
• Choosing Range and Precision
– Choice depends on application
– Need to understand the numerical behavior of computations performed
• some operations can magnify quantization
– In DSP: fixed-point range affects dynamic range
– In DSP: precision affects signal-to-noise ratio
• Use numeric_bit with implied scaling
• Use proposed fixed_pkg package
– Currently being standardized by IEEE
– Types ufixed and sfixed
– Arithmetic operations, resizing, conversion
237
Floating-Point in VHDL
• Similar to scientific notation for decimal
– e.g., 6.02214199×10^23, 1.60217653×10^–19
• Allow for larger range, with same relative precision throughout the range
• Use proposed float_pkg package

– Currently being standardized by IEEE
– Types float, float32, float64, float128
– Arithmetic operations, resizing, conversion
• Not likely to be synthesizable
– Rather, use to verify results of hand-optimized circuits
238
Sequential Logic Design with VHDL
• Agenda
1. Flip-Flops & Latches

2. Counters
3. Finite State Machines
4. State Variable Encoding
Model of Sequential Circuits
240
Example
241
Types of Memory Elements
• Flip-Flop
– Latch
– Registers
• Others
– Register Files
– Cache
– Flash memory
– ROM
– RAM
242
D-FF vs. D-Latch
• FF is edge sensitive (can be either positive or negative edge)
– At trigger edge of clock, input transferred to output
• Latch is level sensitive (can be either active-high or active-low)
– When clock is active, input passes to output (transparent)
– When clock is not active, output stays unchanged
243
Important Timing Parameters (1)
244
Important Timing Parameters (2)
245
System Timing: Minimum Period
246
System Timing: Minimum Delay
247
FF Based, Edge Trigger Clocking
• Td = delay of combinational logic
• Tcycle = cycle time of clock
– Duty cycle does not matter
• Timing requirements for Td

– Tdmax < Tcycle –Tsetup – Tcq -> no setup time violation
– Tdmin > Thold – Tcq -> no hold time violation
248
Latch Based, Single Phase Clocking
• Aka. Pulse Mode clocking
• Tcycle = cycle time of clock; Tw = pulse width of clock
• Timing requirements for Td

– Tdmax < Tcycle –Tdq -> data latched correctly
– Tdmin > Tw – Tdq -> no racing through next stage
249
Comparison
• Flip-Flop Based
− Larger in area
− Larger clocking overhead (Tsetup, Tcq)
+ Design more robust
• Only have to worry about Tdmax
• Tdmin usually small, can be easily fixed by buffer
+ Pulse width does not matter
• Latch Based Single Phase
+ Smaller area
+ Smaller clocking overhead ( only Tdq)
− Worry about both Tdmax and Tdmin
– Pulse width DOES matter
(unfortunately, pulse width can vary on chip)
250
Latches
• Latches
– we’ve learned all of the VHDL syntax necessary to describe sequential
storage elements
– Let’s review where sequential devices come from
• SR Latch
- To understand the SR Latch, we must remember the truth table for a NOR Gate
AB F
00 1
01 0
10 0
11 0
251
Latches
• SR Latch
- when S=0 & R=0, it puts this circuit into a Bi-stable feedback mode where the output is either:
Q=0, Qn=1 or Q=1, Qn=0
(figure: the two NOR gates U1 and U2 of the latch shown in each stable state, with the NOR truth table rows that apply to U1 and U2 marked)
252
Latches
• SR Latch
- we can force a known state using S & R:
Set (S=1, R=0) forces Q=1, Qn=0
Reset (S=0, R=1) forces Q=0, Qn=1
(figure: the two NOR gates U1 and U2 of the latch shown for Set and for Reset, with the NOR truth table rows that apply to U1 and U2 marked)
253
Latches
• SR Latch
- we can write a Truth Table for an SR Latch as follows
SR Q Qn .
0 0 Last Q Last Qn - Hold
0 1 0 1 - Reset
1 0 1 0 - Set
1 1 0 0 - Don’t Use
- S=1 & R=1 forces a 0 on both outputs. However, if S and R are then released at the same time, the latch
can become metastable. This means the final state is unknown.
254
Latches
• S’R’ Latch
- we can also use NAND gates to form an inverted SR Latch
S’ R’ Q Qn .
0 0 1 1 - Don’t Use
0 1 1 0 - Set
1 0 0 1 - Reset
1 1 Last Q Last Qn - Hold
255
Latches
• SR Latch w/ Enable
- we then can add an enable line using NAND gates
- remember the Truth Table for a NAND gate
AB F
00 1
01 1
10 1
11 0
- a 0 on any input forces a 1 on the output
- when C=0, the two EN NAND Gate outputs are 1, which forces "Last Q/Qn"
- when C=1, S & R are passed through INVERTED
256
Latches
• SR Latch w/ Enable
- the truth table then becomes
C SR Q Qn .
1 0 0 Last Q Last Qn - Hold
1 0 1 0 1 - Reset
1 1 0 1 0 - Set
1 1 1 1 1 - Don’t Use
0 x x Last Q Last Qn - Hold
257
Latches
• D Latch
- a modification to the SR Latch where R = S’ creates a D-latch
- when C=1, Q <= D

- when C=0, Q <= Last Value
CD Q Qn .
1 0 0 1 - track
1 1 1 0 - track
0 x Last Q Last Qn - Hold
258
Latches
• VHDL of a D Latch
architecture Dlatch_arch of Dlatch is
begin
LATCH : process (D,C)
begin
if (C='1') then -- transparent: outputs track D
Q<=D; Qn<=not D;
end if; -- no else branch: Q and Qn hold their last value when C='0'
end process;
end architecture;
259
Flip Flops
• D-Flip-Flops
- we can combine D-latches to get an edge triggered storage device (or flop)
- the first D-latch is called the “Master”, the second D-latch the “Slave”
Master: CLK=0, Q<=D "Open"; CLK=1, Q<=Q "Closed"
Slave: CLK=0, Q<=Q "Closed"; CLK=1, Q<=D "Open"
- on a rising edge of clock, D is “latched” and held on Q until the next rising edge
260
Flip Flops
• VHDL of a D-Flip-Flop
architecture DFF_arch of DFF is
begin
FLOP : process (CLK)
begin
if (CLK'event and CLK='1') then -- recognized by all synthesizers as DFF
Q<=D; Qn<=not D;
end if; -- no else branch: Q and Qn hold between rising edges
end process;
end architecture;
261
Registers
• Store a multi-bit encoded value
– One D-flipflop per bit
– Stores a new value on each clock cycle
262
Register with Enable
• Storage controlled by a clock-enable
– stores only when CE = 1 on a rising edge of the clock
– CE is a synchronous control input
• One flipflop per bit
– clk and CE wired in common
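A register with a synchronous clock-enable might be modeled as below. The entity name (reg_ce), the generic N, and the port names D and Q are hypothetical; clk and CE follow the slide.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical N-bit register with clock-enable.
entity reg_ce is
  generic (N : positive := 8);
  port (clk : in  STD_LOGIC;
        CE  : in  STD_LOGIC;
        D   : in  STD_LOGIC_VECTOR (N-1 downto 0);
        Q   : out STD_LOGIC_VECTOR (N-1 downto 0));
end entity reg_ce;

architecture reg_ce_arch of reg_ce is
begin
  REG : process (clk)
  begin
    if (clk'event and clk = '1') then
      if (CE = '1') then   -- CE is sampled only on the rising edge (synchronous)
        Q <= D;
      end if;              -- CE = '0': every bit keeps its stored value
    end if;
  end process REG;
end architecture reg_ce_arch;
```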
263
Example: Accumulator
• Sum a sequence of signed numbers
– A new number arrives when data_en = 1
– Clear sum to 0 on synch reset
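One way to sketch the accumulator is shown below, using NUMERIC_STD (an assumption) for the signed addition. The 8-bit input width, the 16-bit sum width, and all names other than data_en are hypothetical; the sum simply wraps if it overflows 16 bits.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;   -- assumption: NUMERIC_STD for SIGNED arithmetic

-- Hypothetical accumulator; widths are illustrative only.
entity accumulator is
  port (clk     : in  STD_LOGIC;
        reset   : in  STD_LOGIC;              -- synchronous, active high
        data_en : in  STD_LOGIC;
        data_in : in  SIGNED (7 downto 0);
        sum     : out SIGNED (15 downto 0));
end entity accumulator;

architecture accumulator_arch of accumulator is
  signal sum_reg : SIGNED (15 downto 0);      -- internal copy so it can be read
begin
  ACC : process (clk)
  begin
    if (clk'event and clk = '1') then
      if (reset = '1') then
        sum_reg <= (others => '0');           -- clear sum on synchronous reset
      elsif (data_en = '1') then
        -- sign-extend the new number to the width of the running sum
        sum_reg <= sum_reg + resize(data_in, 16);
      end if;
    end if;
  end process ACC;

  sum <= sum_reg;
end architecture accumulator_arch;
```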
264
Flipflop and Register Variations
265
Shift Registers
• Performs shift operation on stored data
– Arithmetic scaling
– Serial transfer of data
• Example: Sequential Multiplier
– 16×16 multiply over 16 clock cycles, using one adder
– Shift register for multiplier bits
– Shift register for lsb’s of accumulated product
266
Counters
• Counters
- the name given to any clocked sequential circuit whose state diagram is a single cycle (a circle)
- there are many types of counters, each suited for particular applications
267
Counters
• Binary Counter
- state machine that produces a straight binary count
- for n-flip-flops, 2^n counts can be produced
- the Next State Logic "F" is a combinational SOP/POS circuit
- the speed will be limited by the Setup/Hold and Combinational Delay of "F"
- this gives the maximum number of counts for n-flip flops
268
Counters
• Toggle Flop
- a D-Flip-Flop can produce a "Divide-by-2" effect by feeding back Qn to D
- this topology is also called a "Toggle Flop"
269
Counters
• Ripple Counter
- Cascaded Toggle Flops can be used to form a ripple counter
- there is no Next State Logic
- this is slower than a straight binary counter due to waiting for the "ripple"
- this is good for low power, low speed applications
270
Counters
• Synchronous Counter with ENABLE
- an enable can be included in a "Synchronous" binary counter using Toggle Flops
- the enable is implemented by AND'ing it with the Q output prior to the next toggle flop
- this gives us the "ripple" effect, but also gives the ability to run synchronously
- a little faster, but still fewer gates than a straight binary circuit
271
Counters
• Shift Register
- a chain of D-Flip-Flops that pass data to one another
- this is good for "pipelining"
- also good for Serial-to-Parallel conversion
- for n-flip-flops, the data is present at the final stage after n clocks
272
Counters
• Ring Counter
- feeding the output of a shift register back to the input creates a "ring counter"
- also called a "One Hot"
- The first flip-flop needs to reset to 1, while the others reset to 0
- for n flip-flops, there will be n counts
273
Counters
• Johnson Counter
- feeding the inverted output of a shift register back to the input creates a "Johnson Counter"
- this gives more states with the same reduced gate count
- all flip-flops can reset to 0
- for n flip-flops, there will be 2·n counts
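A Johnson counter could be sketched as a shift register whose input is the inverted last stage. The entity name, the generic N, the active-low asynchronous Reset, and the Q(0 to N-1) numbering (data shifts from Q0 toward QN-1) are assumptions for this illustration.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical N-stage Johnson counter.
entity johnson_counter is
  generic (N : positive := 4);
  port (Clock : in  STD_LOGIC;
        Reset : in  STD_LOGIC;                      -- assumed active low
        Q     : out STD_LOGIC_VECTOR (0 to N-1));
end entity johnson_counter;

architecture johnson_counter_arch of johnson_counter is
  signal Q_temp : STD_LOGIC_VECTOR (0 to N-1);      -- internal copy for feedback
begin
  process (Clock, Reset)
  begin
    if (Reset = '0') then
      Q_temp <= (others => '0');                    -- all flip-flops reset to 0
    elsif (Clock'event and Clock = '1') then
      -- inverted last stage enters Q0; everything else shifts by one
      Q_temp <= (not Q_temp(N-1)) & Q_temp(0 to N-2);
    end if;
  end process;

  Q <= Q_temp;
end architecture johnson_counter_arch;
```

For N = 4 the states are 0000, 1000, 1100, 1110, 1111, 0111, 0011, 0001, which is 8 counts for 4 flip-flops.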
274
Counters
• Linear Feedback Shift Register (LFSR) Counter
- all of the counters based off of shift registers give far fewer states than the 2^n counts that are possible
- a LFSR counter is based off of the theory of finite fields
- created by French Mathematician Evariste Galois (1811-1832)
- for each size of shift register, a feedback equation is given which is the sum modulo 2 of a certain
set of output bits
- this equation produces the input to the shift register
- this type of counter can produce 2^n - 1 counts, nearly the maximum possible
275
Counters
- the feedback equations are listed in Table 8.26 of the textbook
- It is defined that bits always shift from Xn-1 to X0 (or Q0 to Qn-1) as we defined the shift
register previously
- they each use XOR gates (sum modulo 2) of particular bits in the register chain
ex)
n Feedback Equation
2 X2 = X1 ⊕ X0
3 X3 = X1 ⊕ X0
4 X4 = X1 ⊕ X0
5 X5 = X2 ⊕ X0
6 X6 = X1 ⊕ X0
7 X7 = X3 ⊕ X0
8 X8 = X4 ⊕ X3 ⊕ X2 ⊕ X0
: :
: :
276
Counters
ex) 4-flip-flop LFSR Counter
Feedback Equation = X1 ⊕ X0 (or Q2 ⊕ Q3 as we defined it)
# Q(0:3) Sin
0 1000 0
1 0100 0
2 0010 1
3 1001 1
4 1100 0
5 0110 1
6 1011 0
7 0101 1
8 1010 1
9 1101 1
10 1110 1
11 1111 0
12 0111 0
13 0011 0
14 0001 1 - this is 2^n - 1 = 15 unique counts
repeat 1000
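The 4-flip-flop LFSR in the table can be sketched as follows. The entity name, the active-low asynchronous Reset, and the choice to reset to "1000" (count 0 in the table) are assumptions; the feedback is Q2 xor Q3 as defined above.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical 4-bit LFSR counter following the table above.
entity lfsr_4bit is
  port (Clock : in  STD_LOGIC;
        Reset : in  STD_LOGIC;                    -- assumed active low
        Q     : out STD_LOGIC_VECTOR (0 to 3));
end entity lfsr_4bit;

architecture lfsr_4bit_arch of lfsr_4bit is
  signal Q_temp : STD_LOGIC_VECTOR (0 to 3);
  signal Sin    : STD_LOGIC;
begin
  Sin <= Q_temp(2) xor Q_temp(3);                 -- feedback equation

  process (Clock, Reset)
  begin
    if (Reset = '0') then
      Q_temp <= "1000";                           -- any non-zero state works
    elsif (Clock'event and Clock = '1') then
      Q_temp <= Sin & Q_temp(0 to 2);             -- Sin enters Q0, bits shift toward Q3
    end if;
  end process;

  Q <= Q_temp;
end architecture lfsr_4bit_arch;
```

The all-zeros state is the one missing count: with Q = "0000" the feedback is '0' forever, so the reset value must be non-zero.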
277
Counters
• Counters in VHDL
- strong typing in VHDL can make modeling counters difficult (at first glance)
- the reason for this is that the STANDARD and STD_LOGIC_1164 Packages do not define
"+", "-", or numeric inequality operators for BIT_VECTOR or STD_LOGIC_VECTOR types
278
Counters
- there are a couple ways that we get around this
1) Use the STD_LOGIC_UNSIGNED Package
- this package defines "+" and "-" functions for STD_LOGIC_VECTOR
- we can use +1 just like normal
- the vector will wrap as expected (1111 -> 0000)
- one catch is that we can't read back from an output Port
- we need to create an internal signal of STD_LOGIC_VECTOR for counting
- we then assign to the Port at the end
279
Counters
• Counters in VHDL using STD_LOGIC_UNSIGNED
library IEEE;
use IEEE.STD_LOGIC_1164.ALL; -- STD_LOGIC types
use IEEE.STD_LOGIC_UNSIGNED.ALL; -- call the package
entity counter is
Port ( Clock : in STD_LOGIC;
Reset : in STD_LOGIC;
Direction : in STD_LOGIC;
Count_Out : out STD_LOGIC_VECTOR (3 downto 0));
end counter;
280
Counters
• Counters in VHDL using STD_LOGIC_UNSIGNED
architecture counter_arch of counter is
signal count_temp : std_logic_vector(3 downto 0); -- Notice internal signal
begin
process (Clock, Reset)
begin
if (Reset = '0') then
count_temp <= "0000";
elsif (Clock='1' and Clock'event) then
if (Direction='0') then
count_temp <= count_temp + '1'; -- count_temp can be used on both LHS and RHS
else
count_temp <= count_temp - '1';
end if;
end if;
end process;
Count_Out <= count_temp; -- assign to Port after the process
end counter_arch;
281
Counters
2) Use integers for the counter and then convert back to STD_LOGIC_VECTOR
- STD_LOGIC_ARITH is a Package that defines a conversion function
- the function is: conv_std_logic_vector (ARG, SIZE)
- functions are defined for ARG = integer, unsigned, signed, STD_ULOGIC
- SIZE is the number of bits in the vector to convert to, given as an integer
- we need to keep track of the RANGE and Counter Overflow
• Counters in VHDL using STD_LOGIC_ARITH
library IEEE;
use IEEE.STD_LOGIC_1164.ALL; -- defines STD_LOGIC and STD_LOGIC_VECTOR
use IEEE.STD_LOGIC_ARITH.ALL; -- call the package
entity counter is
Port ( Clock : in STD_LOGIC;
Reset : in STD_LOGIC;
Direction : in STD_LOGIC;
Count_Out : out STD_LOGIC_VECTOR (3 downto 0));
end counter;
architecture counter_arch of counter is
signal count_temp : integer range 0 to 15; -- Notice internal integer specified with Range
begin
process (Clock, Reset)
begin
if (Reset = '0') then
count_temp <= 0; -- integer assignment doesn't require quotes
elsif (Clock='1' and Clock'event) then
if (count_temp = 15) then
count_temp <= 0; -- we manually check for overflow
else
count_temp <= count_temp + 1;
end if;
end if;
end process;
Count_Out <= conv_std_logic_vector (count_temp, 4); -- convert integer into a 4-bit STD_LOGIC_VECTOR
end counter_arch;
3) Use UNSIGNED data types
- STD_LOGIC_ARITH also defines "+", "-", and comparison operators for UNSIGNED types
- UNSIGNED is a Data type defined in STD_LOGIC_ARITH
- UNSIGNED is an array of STD_LOGIC
- An UNSIGNED has the same structure as a STD_LOGIC_VECTOR, but it is a distinct type (a type conversion is needed between them)
- the comparison operators treat the value as unsigned (as opposed to 2's comp SIGNED)
• Pros and Cons

- using integers allows a higher level of abstraction and more functionality can be included
- easier to write unsynthesizable code or code that produces unwanted logic
- both are synthesizable when written correctly
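Option 3 is not shown as code in the slides. The following is a minimal sketch of the same up/down counter written with the UNSIGNED type from STD_LOGIC_ARITH. The entity name counter_unsigned is hypothetical, and the active-low Reset is carried over from the earlier examples. IEEE.NUMERIC_STD provides an equivalent, IEEE-standardized UNSIGNED type and could be substituted.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;   -- defines UNSIGNED and its "+" / "-"

entity counter_unsigned is       -- hypothetical entity name
  Port ( Clock     : in  STD_LOGIC;
         Reset     : in  STD_LOGIC;   -- active low, as in the earlier examples
         Direction : in  STD_LOGIC;
         Count_Out : out STD_LOGIC_VECTOR (3 downto 0));
end counter_unsigned;

architecture counter_unsigned_arch of counter_unsigned is
  signal count_temp : UNSIGNED (3 downto 0);   -- internal signal, readable on the RHS
begin
  process (Clock, Reset)
  begin
    if (Reset = '0') then
      count_temp <= "0000";
    elsif (Clock'event and Clock = '1') then
      if (Direction = '0') then
        count_temp <= count_temp + 1;   -- wraps 1111 -> 0000
      else
        count_temp <= count_temp - 1;   -- wraps 0000 -> 1111
      end if;
    end if;
  end process;

  -- UNSIGNED and STD_LOGIC_VECTOR are distinct types, so convert explicitly
  Count_Out <= STD_LOGIC_VECTOR (count_temp);
end counter_unsigned_arch;
```

Compared with the integer version, no manual overflow check is needed because the 4-bit vector wraps on its own.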
• Ring Counters in VHDL
- to mimic the shift register behavior, we need access to the signal value before and after clock'event
- consider the following concurrent signal assignments:
architecture ….
begin
Q0 <= Q3;
Q1 <= Q0;
Q2 <= Q1;
Q3 <= Q2;
end architecture…
- since they are executed concurrently, it is equivalent to Q0=Q1=Q2=Q3, or a simple wire
- since a process doesn't assign the signal values until it suspends, we can use this to model the
"before and after" behavior of a clock event.

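process (Clock, Reset)
begin
if (Reset = '0') then -- active-low Reset assumed, as in the earlier counters
Q0<='1'; Q1<='0'; Q2<='0'; Q3<='0';
elsif (Clock'event and Clock='1') then
Q0<=Q3; Q1<=Q0; Q2<=Q1; Q3<=Q2;
end if;
end process;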
- notice that the signals DO NOT appear in the sensitivity list. If they did the process would
continually execute and not be synthesized as a flip-flop structure
• Johnson Counters in VHDL

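process (Clock, Reset)
begin
if (Reset = '0') then -- active-low Reset assumed, as in the earlier counters
Q0<='0'; Q1<='0'; Q2<='0'; Q3<='0';
elsif (Clock'event and Clock='1') then
Q0<=not Q3; Q1<=Q0; Q2<=Q1; Q3<=Q2;
end if;
end process;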
• Linear Feedback Shift Register Counters in VHDL

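process (Clock, Reset)
begin
if (Reset = '0') then -- active-low Reset assumed, as in the earlier counters
Q0<='1'; Q1<='0'; Q2<='0'; Q3<='0'; -- seed must be non-zero: with XOR feedback, all zeros never changes
elsif (Clock'event and Clock='1') then
Q0<=Q3 xor Q2; Q1<=Q0; Q2<=Q1; Q3<=Q2;
end if;
end process;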
Terminal Count and Divide by k
• TC is '1' for one cycle in every 2^n cycles (n-bit counter)
– frequency = clock frequency / 2^n
– Called a clock divider
• Decode k–1 as terminal count and reset counter register
– Counter increments modulo k
• Example: decade counter
– Terminal count (TC) = 9
• Decade Counter in VHDL
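The code for this slide did not survive extraction. As a stand-in, here is a minimal sketch of a decade counter that decodes k-1 = 9 as the terminal count. The entity name decade_counter, the port names, and the active-low asynchronous Reset are assumptions, and IEEE.NUMERIC_STD is used for the arithmetic rather than the packages shown earlier.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;        -- UNSIGNED type with "+" and "="

entity decade_counter is          -- hypothetical entity name
  Port ( Clock     : in  STD_LOGIC;
         Reset     : in  STD_LOGIC;                      -- active low (assumed)
         Count_Out : out STD_LOGIC_VECTOR (3 downto 0);
         TC        : out STD_LOGIC);                     -- '1' while the count is 9
end decade_counter;

architecture decade_counter_arch of decade_counter is
  signal count_temp : UNSIGNED (3 downto 0);
begin
  process (Clock, Reset)
  begin
    if (Reset = '0') then
      count_temp <= (others => '0');
    elsif (Clock'event and Clock = '1') then
      if (count_temp = 9) then            -- decode k-1 as the terminal count
        count_temp <= (others => '0');    -- reset the counter register
      else
        count_temp <= count_temp + 1;
      end if;
    end if;
  end process;

  Count_Out <= STD_LOGIC_VECTOR (count_temp);
  TC        <= '1' when (count_temp = 9) else '0';   -- one cycle in every 10
end decade_counter_arch;
```

TC is high for one clock period out of every ten, so it can serve as a divide-by-10 enable.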
Loadable Counter in VHDL
• Load a starting value, then decrement
– Terminal count = 0
– Useful for interval timer
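The code for this slide is missing from the extracted text. A minimal sketch of a loadable down counter follows. The entity name interval_timer, the Load and Load_Value ports, and the choice to hold at zero once the terminal count is reached are all assumptions made for illustration.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity interval_timer is          -- hypothetical entity name
  Port ( Clock      : in  STD_LOGIC;
         Reset      : in  STD_LOGIC;                      -- active low (assumed)
         Load       : in  STD_LOGIC;                      -- '1' loads the starting value
         Load_Value : in  STD_LOGIC_VECTOR (3 downto 0);
         TC         : out STD_LOGIC);                     -- '1' when the count is 0
end interval_timer;

architecture interval_timer_arch of interval_timer is
  signal count_temp : UNSIGNED (3 downto 0);
begin
  process (Clock, Reset)
  begin
    if (Reset = '0') then
      count_temp <= (others => '0');
    elsif (Clock'event and Clock = '1') then
      if (Load = '1') then
        count_temp <= UNSIGNED (Load_Value);   -- load a starting value
      elsif (count_temp /= 0) then
        count_temp <= count_temp - 1;          -- then decrement
      end if;                                  -- at 0: hold until the next Load
    end if;
  end process;

  TC <= '1' when (count_temp = 0) else '0';    -- terminal count = 0
end interval_timer_arch;
```

A reloading counter would differ only in the last branch: instead of holding at zero it would load Load_Value again.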
Reloading Counter in VHDL
State Machines
What is an FSM?
• A model of computation consisting of
– a set of states, (limited number)
– a start state,
– input symbols,
– a transition function that maps input symbols and current states to a next
state.
Counters
• Multiple Processes
- we can now use State Machines to control the start/stop/load/reset of counters
- each is an independent process; they interact with each other through signals
- a common task for a state machine is:
1) at a certain state, load and enable a counter
2) go to a state and wait until the counter reaches a certain value
3) when it reaches the certain value, disable the counter and continue to the next state
- since the counter runs off of a clock, we know how long it will count between the start and stop
State Machines
• State Machines
- there is a basic structure for a Clocked, Synchronous State Machine
1) State Memory (i.e., flip-flops)

2) Next State Logic “F” (combinational logic)
3) Output Logic “G” (combinational logic) we’ll revisit G later…
- if we keep this structure in mind while designing digital machines in VHDL, then it is a very
straight forward task
- Each part of the State Machine is modeled with its own process
- let’s start by reviewing the design of a state machine using a manual method
Elements of FSM
• Memory Elements (ME)
– Memorize Current States (CS)
– Usually consist of FF or latch
– N flip-flops have 2^N possible states
• Next-state Logic (NL)
– Combinational Logic
– Produce next state
• Based on current state (CS) and input (X)
• Output Logic (OL)
– Combinational Logic
– Produce outputs (Z)
• Based on current state, or
• Based on current state and input
Finite State Machine
• Used to control the circuit core
• Partition FSM and non-FSM part
Finite State Machines
• Synchronous (i.e. clocked) finite state machines (FSMs) have
widespread application in digital systems, e.g. as datapath
controllers in computational units and processors.
Synchronous FSMs are characterized by a finite number of
states and by clock-driven state transitions.
• Mealy Machine: The next state and the outputs depend on the
present state and the inputs.
• Moore Machine: The next state depends on the present state
and the inputs, but the output depends on only the present
state.
State Machines
• State Machines
“Mealy Outputs” – outputs depend on the Current_State and the Inputs
“Moore Outputs” – outputs depend on the Current_State only
- the steps in a state machine design are:
1) Word Description of the Problem

2) State Diagram
3) State/Output Table
4) State Variable Assignment
5) Choose Flip-Flop type
6) Construct F
7) Construct G
8) Logic Diagram
• State Machine Example “Sequence Detector”
1) Design a machine by hand that takes in a serial bit stream and looks for the pattern “1011”.
When the pattern is found, a signal called “Found” is asserted
2) State Diagram
3) State/Output Table
Current_State  In  Next_State  Out (Found)
S0             0   S0          0
               1   S1          0
S1             0   S2          0
               1   S0          0
S2             0   S0          0
               1   S3          0
S3             0   S0          0
               1   S0          1
4) State Variable Assignment – let’s use binary

Q1 Q0  In  Q1* Q0*  Found
0  0   0   0   0    0
       1   0   1    0
0  1   0   1   0    0
       1   0   0    0
1  0   0   0   0    0
       1   1   1    0
1  1   0   0   0    0
       1   0   0    1
5) Choose Flip-Flop Type
- 99% of the time we use D-Flip-Flops
6) Construct Next State Logic “F”
K-map for Q1*:
      Q1 Q0
In    00  01  11  10
0     0   1   0   0
1     0   0   0   1
Q1* = Q1’∙Q0∙In’ + Q1∙Q0’∙In
K-map for Q0*:
      Q1 Q0
In    00  01  11  10
0     0   0   0   0
1     1   0   0   1
Q0* = Q0’∙In
7) Construct Output Logic “G”
K-map for Found:
      Q1 Q0
In    00  01  11  10
0     0   0   0   0
1     0   0   1   0
Found = Q1∙Q0∙In
8) Logic Diagram
- for large designs, this becomes impractical
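The equations from steps 6 and 7 can also be typed in directly instead of drawing a logic diagram. The sketch below does that with two D flip-flops and three concurrent assignments. The entity name seq_detector, the port name Din (standing in for "In", which is a reserved word in VHDL), and the asynchronous active-high Reset are assumptions. The hand design itself has no reset.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity seq_detector is            -- hypothetical entity name
  Port ( CLK   : in  STD_LOGIC;
         Reset : in  STD_LOGIC;   -- asynchronous, active high (assumed)
         Din   : in  STD_LOGIC;   -- serial input ("in" is a VHDL reserved word)
         Found : out STD_LOGIC);
end seq_detector;

architecture equations of seq_detector is
  signal Q1, Q0           : STD_LOGIC;   -- state memory (S0=00, S1=01, S2=10, S3=11)
  signal Q1_next, Q0_next : STD_LOGIC;   -- Q1* and Q0* from the K-maps
begin
  -- Next State Logic "F"
  Q1_next <= ((not Q1) and Q0 and (not Din)) or (Q1 and (not Q0) and Din);
  Q0_next <= (not Q0) and Din;

  -- Output Logic "G" (Mealy: depends on the state and the input)
  Found <= Q1 and Q0 and Din;

  -- State Memory: two D flip-flops
  process (CLK, Reset)
  begin
    if (Reset = '1') then
      Q1 <= '0';
      Q0 <= '0';
    elsif (CLK'event and CLK = '1') then
      Q1 <= Q1_next;
      Q0 <= Q0_next;
    end if;
  end process;
end equations;
```

This style mirrors the manual method one-to-one, which is also why it scales poorly: every change to the state table means redoing the K-maps.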
State Machines in VHDL
• State Memory
- we use a process that updates the “Current_State” with the “Next_State”
- we describe DFF's using (CLK'event and CLK='1')
- this will make the assignment on the rising edge of CLK
STATE_MEMORY : process (CLK)
begin
if (CLK'event and CLK='1') then
Current_State <= Next_State;
end if;
end process;
- at this point, we need to discuss State Names
• State Memory using “User-Enumerated Data Types"
- we always want to use descriptive names for our states
- we can use a user-enumerated type for this
type State_Type is (S0, S1, S2, S3);

signal Current_State : State_Type;
signal Next_State : State_Type;
- this makes our simulations very readable.
• State Memory using “Pre-Defined Data Types"

- we haven't encoded the state variables yet; we can either leave that to the synthesizer or do it manually
subtype State_Type is BIT_VECTOR (1 downto 0);
constant S0 : State_Type := "00";
constant S1 : State_Type := "01";
constant S2 : State_Type := "10";
constant S3 : State_Type := "11";
signal Current_State : State_Type;
signal Next_State : State_Type;
• State Memory with “Synchronous RESET”

STATE_MEMORY : process (CLK)
begin
if (CLK'event and CLK='1') then
if (Reset = '1') then
Current_State <= S0; -- name of "reset" state to go to
else
Current_State <= Next_State;
end if;
end if;
end process;
- this design will only observe RESET on the positive edge of clock (i.e., synchronous)
• State Memory with “Asynchronous RESET”
STATE_MEMORY : process (CLK, Reset)
begin
if (Reset = '1') then
Current_State <= S0; -- name of "reset" state to go to
elsif (CLK'event and CLK='1') then
Current_State <= Next_State;
end if;
end process;
- this design is sensitive to both RESET and the positive edge of clock (i.e., asynchronous)
• Next State Logic “F”
- we use another process to construct “F”
- the process will be combinational logic
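NEXT_STATE_LOGIC : process (Din, Current_State) -- Din is the "In" input ("in" is a reserved word in VHDL)
begin
case (Current_State) is
when S0 => if (Din='0') then Next_State <= S0;
else Next_State <= S1; end if;
when S1 => if (Din='0') then Next_State <= S2;
else Next_State <= S0; end if;
when S2 => if (Din='0') then Next_State <= S0;
else Next_State <= S3; end if;
when S3 => Next_State <= S0;
end case;
end process;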
• Output Logic “G”
- we use another process to construct “G”
- the expressions in the sensitivity list dictate Mealy/Moore type outputs
- for now, let’s use combinational logic for G (we’ll go sequential later)
- Mealy type outputs
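OUTPUT_LOGIC : process (Din, Current_State) -- Din is the "In" input ("in" is a reserved word in VHDL)
begin
case (Current_State) is
when S0 => if (Din='0') then Found <= '0';
else Found <= '0'; end if;
when S3 => if (Din='0') then Found <= '0';
else Found <= '1'; end if;
when others => Found <= '0';
end case;
end process;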
- Moore type outputs
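OUTPUT_LOGIC : process (Current_State)
begin
case (Current_State) is
when S0 => Found <= '0';
when others => Found <= '0'; -- placeholder so that every state is covered
end case;
end process;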
- this is just an example, it doesn’t really work for this machine
• Example
- Let’s design a 2-bit Up/Down Gray Code Counter using User-Enumerated State Encoding
- In=0, Count Up
- In=1, Count Down
- this will be a Moore Type Machine
- no Reset
- let’s collect our thoughts using a State/Output Table
Current_State  In  Next_State  Out
CNT0           0   CNT1        00
               1   CNT3
CNT1           0   CNT2        01
               1   CNT0
CNT2           0   CNT3        11
               1   CNT1
CNT3           0   CNT0        10
               1   CNT2
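architecture CNT_arch of CNT is
type State_Type is (CNT0, CNT1, CNT2, CNT3);
signal Current_State, Next_State : State_Type;
begin
-- Din and Dout stand for the "In" and "Out" ports; "in" and "out" are reserved words in VHDL
STATE_MEMORY : process (CLK)
begin
if (CLK'event and CLK='1') then
Current_State <= Next_State;
end if;
end process;
NEXT_STATE_LOGIC : process (Din, Current_State)
begin
case (Current_State) is
when CNT0 => if (Din='0') then Next_State <= CNT1; else Next_State <= CNT3; end if;
when CNT1 => if (Din='0') then Next_State <= CNT2; else Next_State <= CNT0; end if;
when CNT2 => if (Din='0') then Next_State <= CNT3; else Next_State <= CNT1; end if;
when CNT3 => if (Din='0') then Next_State <= CNT0; else Next_State <= CNT2; end if;
end case;
end process;
OUTPUT_LOGIC : process (Current_State)
begin
case (Current_State) is
when CNT0 => Dout <= "00";
when CNT1 => Dout <= "01";
when CNT2 => Dout <= "11";
when CNT3 => Dout <= "10";
end case;
end process;
end architecture;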
- in the lab, we may want to observe the states on the LEDs
- in this case we want to explicitly encode the STATE variables
architecture CNT_arch of CNT is
subtype State_Type is BIT_VECTOR (1 downto 0);
constant CNT0 : State_Type := "00";
signal Current_State, Next_State : State_Type;
State Encoding
• State Variable Encoding
- we can decide how we encode our state variables
- there are advantages/disadvantages to different techniques
• Binary Encoding
- straight encoding of states
S0 = “00”
S1 = “01”
S2 = “10”
S3 = “11”
- for n states, ceil(log2(n)) flip-flops are needed
- this gives the Least # of Flip-Flops
- Good for “Area” constrained designs
- Drawbacks: - multiple bits switch at the same time = Increased Noise & Power
- the Next State Logic “F” is multi-level = Increased Power and Reduced Speed
• Gray-Code Encoding
- encoding using a gray code where only one bit switches at a time
S0 = “00”
S1 = “01”
S2 = “11”
S3 = “10”
- for n states, ceil(log2(n)) flip-flops are needed
- this gives low Power and Noise due to only one bit switching
- Good for “Power/Noise” constrained designs
- Drawbacks: - the Next State Logic “F” is multi-level = Increased Power and Reduced Speed
• One-Hot Encoding
- encoding one flip-flop for each state
S0 = “0001”
S1 = “0010”
S2 = “0100”
S3 = “1000”
- for n states, there are n flip-flops needed
- the combinational logic for F is one level (i.e., a Decoder)
- Good for Speed
- Especially good for FPGA due to “Programmable Logic Block”
- Drawbacks: - takes more area
• State Encoding Trade-Offs
- We typically trade off Speed, Area, and Power
[Figure: trade-off triangle with corners speed (One-Hot), area (Binary), and power (Gray)]
Mealy Finite State Machine
• A serially-transmitted BCD (8421 code) word is to be
converted into an Excess-3 code. An Excess-3 code word
is obtained by adding 3 to the decimal value and taking
the binary equivalent. Excess-3 code is self-complementing
[Wakerly, p. 80], i.e. the 9's complement of a code word is
obtained by complementing the bits of the word.
• The serial code converter is described by the state transition
graph of a Mealy FSM below
• The vertices of the state transition graph of a Mealy machine
are labeled with the states.
• The branches are labeled with (1) the input that causes a
transition to the indicated next state, and (2) with the output
that is asserted in the present state for that input.
• The state transition is synchronized to a clock.
• The state table summarizes the machine's behavior in tabular
format.
Design of Mealy Finite State Machine
Example: Design of A Serial Line Code Converter
Pipelined Outputs
• Pipelined Outputs
- Having combinational logic drive outputs can lead to:
- multiple delay paths through the logic

- potential for glitches
- Both reduce the speed at which the system clock can be run
- A good design practice is to pipeline the outputs (i.e., use DFF’s as the output driver)
- This gives a smaller Data Uncertainty window on the output
- The only consideration is that the output is not present until one clock cycle later
- we use a 4th process for this stage of the State Machine
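PIPELINED_OUTPUTS : process (CLK)
begin
if (CLK'event and CLK='1') then
Dout <= Next_Out; -- Dout is the registered "Out" output ("out" is a reserved word in VHDL)
end if;
end process;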
Asynchronous Inputs
• Asynchronous Inputs
- Real world inputs are not phase-locked to the clock
- this means an input can change within the Setup/Hold window of the clock
- this can send the Machine into an incorrect state
- we always want to “synchronize” inputs so that this doesn’t happen
- We use D-Flip-Flops to take in the input
- with one D-Flip-Flop, the input can still occur within the Setup/Hold window
- the output of the first DFF may be metastable for a moment of time (t_recovery)
- a second DFF is used to latch in the metastable input after it has had time to settle
- the output of the second flip-flop is now stable and synchronized as long as:
T_clk > t_recovery + t_comb + t_setup
- where t_comb is the delay of any combinational logic in the input path
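The two-flip-flop arrangement described above can be sketched as a single clocked process. The entity name synchronizer and the names Async_In, Sync_Out, and meta are hypothetical. The sketch assumes there is no combinational logic between the two flip-flops, so t_comb is zero.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity synchronizer is            -- hypothetical entity name
  Port ( CLK      : in  STD_LOGIC;
         Async_In : in  STD_LOGIC;    -- real-world input, not phase-locked to CLK
         Sync_Out : out STD_LOGIC);   -- synchronized copy for the state machine
end synchronizer;

architecture synchronizer_arch of synchronizer is
  signal meta : STD_LOGIC;            -- output of the first DFF, may be metastable
begin
  process (CLK)
  begin
    if (CLK'event and CLK = '1') then
      meta     <= Async_In;   -- first DFF samples the asynchronous input
      Sync_Out <= meta;       -- second DFF samples the first one a full period later
    end if;
  end process;
end synchronizer_arch;
```

Because signal assignments take effect when the process suspends, Sync_Out receives the previous value of meta, which is what makes these two lines a two-stage shift register rather than a wire.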
Comparison of Binary and Onehot Style
• Binary-encoded FSM
– fewer flip-flops for state register
– = ceil(log2(number of states))
• Onehot-encoded FSM
– more flip-flops for state register
– = state number
• FPGA vendors frequently recommend the onehot encoding style because
flip-flops are plentiful in FPGAs and the onehot style needs fewer
combinational logic cells.
• i.e. a onehot style FSM usually runs faster than a binary style
FSM on an FPGA
A Simple Design Example:
Level-to-Pulse Converter
• A level-to-pulse converter produces a single-cycle pulse each
time its input goes high
– In other words, it’s a synchronous rising edge detector
• Sample application
– Button and switches (may need de-bounce processing)
– Single-cycle enable signal for counters
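The slides that followed (figures) are not in the extracted text. As one possible realization, the sketch below remembers the previous input sample and raises the output for one clock period when the input is high now but was low on the previous edge. The entity and signal names are hypothetical, and the input is assumed to be already synchronized to CLK.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity level_to_pulse is          -- hypothetical entity name
  Port ( CLK   : in  STD_LOGIC;
         Level : in  STD_LOGIC;   -- assumed already synchronized to CLK
         Pulse : out STD_LOGIC);  -- high for exactly one clock period per rising edge of Level
end level_to_pulse;

architecture level_to_pulse_arch of level_to_pulse is
  signal level_d : STD_LOGIC := '0';   -- Level as sampled on the previous clock edge
begin
  process (CLK)
  begin
    if (CLK'event and CLK = '1') then
      level_d <= Level;
      -- registered output: high only when Level is '1' now and was '0' one edge ago
      Pulse   <= Level and (not level_d);
    end if;
  end process;
end level_to_pulse_arch;
```

Registering Pulse delays it by one clock but guarantees a full-period, glitch-free pulse, in line with the pipelined-outputs practice described earlier.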
Datapaths and Control
• Digital systems perform sequences of operations on encoded
data
• Digital hardware systems = data-path + control
• Datapath: registers, counters, combinational functional units
(e.g., ALU), communication (e.g., busses)
– Combinational circuits for operations
– Registers for storing intermediate results
• Control section: control sequencing (FSM generating
sequences of control signals that instruct the datapath what to do
next)
– Generates control signals
• Selecting operations to perform
• Enabling registers at the right times
– Uses status signals from datapath
Review of FSM Design
• FSM Design
– Partition FSM and non-FSM logic
– Partition combinational part and sequential part
– Use parameter to define names of the state vector
– Assign a default (reset) state
Homework
• Design a traffic signal controller at crossroads
• One pair traffic signal controller

– State Diagram
– State Coding
– Performance
– [Optional] With interrupt/extra setting
• Other example:
– Automatic Vending Machine
– Automatic Teller Machine
Project Example:
DataPath - Digital combination lock
(Verilog)
Digital combination lock
• Door combination lock:
– punch in 3 values in sequence and the door opens; if there is an error
the lock must be reset; once the door opens the lock must be reset
– inputs: sequence of input values, reset
– outputs: door open/close
– memory: must remember combination or always have it available
– open questions: how do you set the internal combination?
• stored in registers (how loaded?)
• hardwired via switches set by user
Digital combination lock: implementation in software
Determining details of the specification
• How many bits per input value?
• How many values in sequence?
• How do we know a new input value is entered?
• What are the states and state transitions of the system?
Digital combination lock state diagram
• States: 5 states
– represent point in execution of machine
– each state has outputs
• Transitions: 6 from state to state, 5 self transitions, 1 global
– changes of state occur when the clock says it's OK
– based on value of inputs
• Inputs: reset, new, results of comparisons
• Output: open/closed
Digital combination lock
(state encoding)
• Verilog description including state encoding
module string (clk, value, new, rst, open);
input clk, new, rst;
input [3:0] value;
output open;
reg [2:0] state;

// state encoding
`define S1 3'b000
`define S2 3'b001
`define S3 3'b010
`define OPEN 3'b011
`define ERR 3'b100
// combination values
`define C1 4'b1101
`define C2 4'b0111
`define C3 4'b0100

assign open = (state == `OPEN);

always @(posedge clk) begin
if (rst) state <= `S1;
else
case (state)
// stay in place until new is asserted, as in the state table
`S1: if (new) state <= (value == `C1) ? `S2 : `ERR;
`S2: if (new) state <= (value == `C2) ? `S3 : `ERR;
`S3: if (new) state <= (value == `C3) ? `OPEN : `ERR;
`OPEN: state <= `OPEN;
`ERR: state <= `ERR;
default: begin
$display ("invalid state reached");
state <= 3'bxxx;
end
endcase
end
endmodule
Data-path and control structure
State table for combination lock
• Finite-state machine
– refine state diagram to take internal structure into account
– state table ready for encoding
reset  new  equal  state  next state  mux  open/closed
1      –    –      –      S1          C1   closed
0      0    –      S1     S1          C1   closed
0      1    0      S1     ERR         C1   closed
0      1    1      S1     S2          C1   closed
0      1    1      S2     S3          C2   closed
0      1    1      S3     OPEN        C3   closed
0      –    –      OPEN   OPEN        –    open
Encodings for combination lock
• Encode state table
– state can be: S1, S2, S3, OPEN, or ERR
• needs at least 3 bits to encode: 000, 001, 010, 011, 100
• and as many as 5: 00001, 00010, 00100, 01000, 10000
• choose 4 bits: 0001, 0010, 0100, 1000, 0000
– output mux can be: C1, C2, or C3
• needs 2 to 3 bits to encode
• choose 3 bits: 001, 010, 100
– output open/closed can be: open or closed
• needs 1 or 2 bits to encode
• choose 1 bit: 1, 0
Data-path implementation for combination lock
• Multiplexer
– easy to implement as combinational logic when few inputs
– logic can easily get too big for most PLDs
[Figure: per-bit multiplexer logic for Value[i], C1[i], C2[i], C3[i] (0 ≤ i ≤ 3). The output mux selects C1, C2, or C3 using 3 mux control bits (001, 010, 100); a 4-bit comparator checks the selected code against value and produces equal.]
Data-path implementation (cont’d)
• Tri-state logic
– utilize a third output state: “no connection” or “float”
– connect outputs together as long as only one is “enabled”
– open-collector gates can only output 0, not 1
• can be used to implement logical AND with only wires
[Figure: C1, C2, and C3 (4 bits each) drive a shared 4-bit bus through tri-state drivers enabled by mux control (a tri-state driver can disconnect from its output); the comparator compares the bus with value and produces equal. The open-collector connection is zero whenever one connection is zero, one otherwise (wired AND).]
Tri-state gates
• The third value
– logic values: “0”, “1”
– don't care: “X” (must be 0 or 1 in real circuit!)
– third value or state: “Z” — high impedance, infinite R, no connection
• Tri-state gates
– additional input – output enable (OE)
– output values are 0, 1, and Z
– when OE is high, the gate functions normally
– when OE is low, the gate is disconnected from wire at output
– allows more than one gate to be connected to the same output wire
• as long as only one has its output enabled at any one time (otherwise, sparks
could fly)
Tri-state and multiplexing
• When using tri-state logic
– (1) make sure never more than one "driver" for a wire at any one time
(pulling high and low at the same time can severely damage circuits)
– (2) make sure to only use the value on a wire when it's being driven (using a
floating value may cause failures)
• Using tri-state gates to implement an economical multiplexer
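The figure for this slide is missing. In VHDL, a tri-state multiplexer can be sketched as several drivers on one resolved STD_LOGIC_VECTOR signal, each of which outputs 'Z' when it is not enabled. The entity name tri_state_mux and the port names are hypothetical. The one-hot Mux_Control matches the 001, 010, 100 encoding chosen for the combination lock.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity tri_state_mux is           -- hypothetical entity name
  Port ( C1, C2, C3  : in  STD_LOGIC_VECTOR (3 downto 0);
         Mux_Control : in  STD_LOGIC_VECTOR (2 downto 0);   -- one-hot: 001, 010, 100
         Y           : out STD_LOGIC_VECTOR (3 downto 0));
end tri_state_mux;

architecture tri_state_mux_arch of tri_state_mux is
begin
  -- three drivers share Y; STD_LOGIC resolves 'Z' against a driven value
  Y <= C1 when (Mux_Control(0) = '1') else (others => 'Z');
  Y <= C2 when (Mux_Control(1) = '1') else (others => 'Z');
  Y <= C3 when (Mux_Control(2) = '1') else (others => 'Z');
end tri_state_mux_arch;
```

If two enables are high at once and the data differ, simulation shows 'X' on Y, which is the simulator's version of the "never more than one driver" rule. With no enable high, Y floats at 'Z'.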
Open-collector gates and wired-AND
• Open collector: another way to connect gate outputs to the same wire
– gate only has the ability to pull its output low
– it cannot actively drive the wire high (default – pulled high through resistor)
• Wired-AND can be implemented with open collector logic
– if A and B are "1", output is actively pulled low
– if C and D are "1", output is actively pulled low
– if one gate output is low and the other high, then low wins
– if both gate outputs are "1", the wire value "floats", pulled high by resistor
• low to high transition usually slower than it would have been with a gate pulling
high
– hence, the two NAND functions are ANDed together
[Figure: equivalent circuits. Two open-collector NAND gates with their outputs wired together ("wired-AND") form (AB)'(CD)'.]
Digital combination lock (new data-path)
• Decrease number of inputs
• Remove 3 code digits as inputs
– use code registers
– make them loadable from value
– need 3 load signal inputs (net savings of (4*3)–3 = 9 inputs)
• could be done with 2 signals and decoder
(ld1, ld2, ld3, load none)
Complex Datapath
Complex Multiplier Datapath
Complex Multiplier in VHDL
Multiplier Control Sequence
• Avoid resource conflict
• First attempt
– 1. a_r * b_r → pp1_reg
– 2. a_i * b_i → pp2_reg
– 3. pp1 – pp2 → p_r_reg
– 4. a_r * b_i → pp1_reg
– 5. a_i * b_r → pp2_reg
– 6. pp1 + pp2 → p_i_reg
• Takes 6 clock cycles
• Merge steps where no resource conflict
• Revised attempt
– 1. a_r * b_r → pp1_reg
– 2. a_i * b_i → pp2_reg
– 3. pp1 – pp2 → p_r_reg; a_r * b_i → pp1_reg
– 4. a_i * b_r → pp2_reg
– 5. pp1 + pp2 → p_i_reg
• Takes 5 clock cycles
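For reference, the control sequence follows from expanding the complex product. The symbols below match the register names used in the steps (a_r, a_i, b_r, b_i, p_r, p_i); j denotes the imaginary unit.

```latex
\begin{aligned}
p &= (a_r + j\,a_i)(b_r + j\,b_i) \\
  &= (a_r b_r - a_i b_i) + j\,(a_r b_i + a_i b_r) \\[4pt]
p_r &= \underbrace{a_r b_r}_{pp1} - \underbrace{a_i b_i}_{pp2},
\qquad
p_i = \underbrace{a_r b_i}_{pp1} + \underbrace{a_i b_r}_{pp2}
\end{aligned}
```

Four multiplications share one multiplier, which is why they occupy separate steps. The subtraction in step 3 uses the adder/subtractor rather than the multiplier, so it can overlap with the third multiplication.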
Finite-State Machines
• Used to implement control sequencing
– Based on mathematical automaton theory
• A FSM is defined by
– set of inputs: Σ
– set of outputs: Γ
– set of states: S
– initial state: s0 ∈ S
– transition function: δ: S × Σ → S
– output function: ω: S × Σ → Γ or ω: S → Γ
• FSM in Hardware
FSM Example: Multiplier Control
• One state per step
– Separate idle state?
– Wait for input_rdy = '1'
– Then proceed to steps 1, 2, ...
– But this wastes a cycle!
• Use step 1 as idle state
– Repeat step 1 if input_rdy ≠ '1'
– Proceed to step 2 otherwise
• Output function
– Defined by table on slide 43
– Moore or Mealy?
FSMs in VHDL
• Use an enumeration type for state values
– abstract, avoids specifying encoding
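A minimal sketch of an enumeration-typed FSM for the five-step multiplier control is shown below. The entity name, the state names step1 to step5, and the synchronous active-high reset are assumptions. Only p_r_ce and p_i_ce are decoded, inferred from the steps that write p_r_reg and p_i_reg. The remaining outputs (a_sel, b_sel, pp1_ce, pp2_ce, sub) depend on the output table, which is not reproduced in this text.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity multiplier_control is              -- hypothetical entity name
  port ( clk       : in  std_logic;
         reset     : in  std_logic;       -- synchronous, active high (assumed)
         input_rdy : in  std_logic;
         p_r_ce    : out std_logic;       -- enable for p_r_reg, asserted in step 3
         p_i_ce    : out std_logic );     -- enable for p_i_reg, asserted in step 5
end entity multiplier_control;

architecture fsm of multiplier_control is
  -- enumeration type: no encoding is specified, the synthesizer picks one
  type multiplier_state is (step1, step2, step3, step4, step5);
  signal current_state, next_state : multiplier_state;
begin
  state_reg : process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        current_state <= step1;
      else
        current_state <= next_state;
      end if;
    end if;
  end process state_reg;

  next_state_logic : process (current_state, input_rdy)
  begin
    case current_state is
      when step1 =>                        -- step 1 doubles as the idle state
        if input_rdy = '1' then
          next_state <= step2;
        else
          next_state <= step1;
        end if;
      when step2 => next_state <= step3;
      when step3 => next_state <= step4;
      when step4 => next_state <= step5;
      when step5 => next_state <= step1;
    end case;
  end process next_state_logic;

  -- Moore-style outputs decoded from the state only
  p_r_ce <= '1' when current_state = step3 else '0';
  p_i_ce <= '1' when current_state = step5 else '0';
end architecture fsm;
```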
Multiplier Control in VHDL
Multiplier Control Diagram
• Input: input_rdy
• Outputs
– a_sel, b_sel, pp1_ce, pp2_ce, sub, p_r_ce, p_i_ce
Bubble Diagrams or VHDL?
• Many CAD tools provide editors for bubble diagrams
– Automatically generate VHDL for simulation and synthesis
• Diagrams are visually appealing
– but can become unwieldy for complex FSMs
• Your choice...
– or your manager's!
Verifying Sequential Circuits
• DUV may take multiple and varying number of cycles to
produce output
• Checker needs to
– synchronize with test generator
– ensure DUV outputs occur when expected
– ensure DUV outputs are correct
– ensure no spurious outputs occur
Computer Systems
• Agenda
1. Memory
2. Von Neumann Architecture
3. Sequence Controllers
4. Processing Units & Register Modeling
Memory
• Memory Types
Notes on definitions:
1) The word "RAM" is now used interchangeably with R/W memory.

Formally, most types of ROM are also Random Access
2) ROM memory typically refers to storage that can't be written during program execution.
It can hold program and data information, but under normal operation a CPU doesn't
use it for variable storage.
As Flash EEprom gets faster and more reliable, Flash may become used as RAM
Memory - SRAM
• Static Random Access Memory (SRAM)
- SRAM is volatile memory (i.e., if the power is removed, the information is lost)
- SRAM uses an inverter loop to store the digital information
- two NMOS transistors acting as switches are used to Read and Write the stored data
- we call the circuitry to store 1-bit a "cell"
• SRAM Addressing
- we configure the cells into an array
- we address each cell using:
Row Address
- a row decoder produces a "Word Line"
- this gives a "Row Select" (RS) signal
Column Address
- a column decoder produces a "Bit Line"
- this gives a "Column Select" (CS)
- The Word Lines are used to address a row of cells
- The Bit Lines are used to address a column in addition to reading and writing
- There are two bit lines per cell, BL and BL'
- This allows a difference amplifier to be used to distinguish between a 1 and a 0
• SRAM Reading
- The capacitance of the Bit Lines can be very large due to multiple cells being attached
- This creates a problem during a READ because the small cell will need to drive this large capacitance
- To reduce the amount of charge that the cell has to drive during a READ, pull-up
transistors are used to "pre-charge" the lines to VDD
- In order to design a usable SRAM cell, we must meet the condition that:
"Reading the value does NOT destroy the contents of the cell"
- Let's look at what happens during a read to see how to meet this condition
Reading a '0'
- Initially V1=0v, V2=VDD
- M3 and M4 are turned ON
- this allows the Cell to drive BL and BL'
- The voltage V2 will be the same as the pre-charged BL' line, so no current will
flow through M4
• SRAM Writing
- when writing to the SRAM cell, we inject full swing digital signals onto BL and BL'.
- when we assert the Word Lines, M3 and M4 will turn on and attempt to change the state of
the cell.
Memory - DRAM
• Dynamic Random Access Memory (DRAM)

- A volatile memory storage device even smaller than SRAM
- DRAM uses a capacitor to store the value of the digital information (instead of an inverter loop)
- one NMOS transistor is used to address the storage element
- the one-transistor configuration is known as a “1T” DRAM
• DRAM Operation
- When the cell is addressed, the charge on the storage capacitor (CS) is dumped onto the bit line (BL)
- To reduce the amount of charge the cell has to provide, the bit line capacitance (CBL) is
pre-charged to VDD/2
- When the NMOS switch closes, the two capacitances will share their charge and settle to a level
that sense amplifiers can read
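The charge sharing can be written out explicitly. This is a simplified sketch that assumes ideal capacitors, a stored voltage V_S on the cell capacitor C_S, and a bit line capacitance C_BL pre-charged to VDD/2. The numeric values in the example are illustrative assumptions, not figures from the slides.

```latex
% Charge is conserved when the access transistor connects C_S to C_BL
V_{BL,\text{final}} = \frac{C_S V_S + C_{BL}\,\tfrac{V_{DD}}{2}}{C_S + C_{BL}}

% Swing seen by the sense amplifier, relative to the pre-charge level
\Delta V = V_{BL,\text{final}} - \frac{V_{DD}}{2}
         = \left(V_S - \frac{V_{DD}}{2}\right)\frac{C_S}{C_S + C_{BL}}

% Example (assumed values): C_S = 30 fF, C_BL = 300 fF, V_DD = 1.8 V, stored '1' with V_S = V_DD
\Delta V = (1.8 - 0.9)\,\text{V} \times \frac{30}{330} \approx 82\ \text{mV}
```

A stored '0' (V_S = 0) gives the same magnitude with the opposite sign, which is why pre-charging to VDD/2 lets the amplifier distinguish the two cases symmetrically.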
Memory – ROM
• Nonvolatile Memory
- SRAM and DRAM are attractive due to their speed
- however, they are volatile which means when the power is removed, the data is lost
- for a microcomputer, we need a nonvolatile storage device so that upon power-up, the
computer knows what to do.
- currently, the most popular semiconductor ROM is Flash (or EEprom)
- before looking at the details of a Flash transistor, let’s first look at the different types
of ROM arrays and addressing modes
• ROM Arrays
- There are two basic types of ROM arrays
1) NOR-based ROM
2) NAND-based ROM
• NOR-based ROM
- All Column Lines are pulled-up using a PMOS transistor (or resistor)
- The Row Lines are connected to the gates of NMOS transistors at the intersection of
Row and Column Lines
- The presence or absence of the NMOS transistors dictates whether a 1 or a 0 is stored
- If the NMOS transistor is present, it will pull down the Column Line when its gate is
driven high by the Row Line
- if the NMOS transistor is absent, the Column Line will not be pulled down, so it will remain
pulled up by the PMOS’s
- In order to Read from the array, the Row line is asserted and the desired Column line is observed
- a NOR-based ROM is similar to a Hex Keypad
• NAND-based ROM
- NAND-based ROM is a different array architecture
- it uses a depletion-load NMOS as the pull-up transistor
- the Column NMOS’s are connected in series with the column lines (i.e. a NAND configuration)
- If an NMOS exists in the Column line and the Row line is asserted, the NMOS will pull the
Column Line down and represent a stored ‘0’
- If an NMOS is absent on the Column line and the Row line is asserted, the Column Line will
remain pulled high by the depletion NMOS and represent a stored ‘1’
- since all of the NMOS’s are in series, in order to Read from a Row, all other Rows must be turned ON
- this means in order to distinguish the Row we are asserting, we write a ‘0’ to it
- In this configuration, if an NMOS is present, it will represent a “stored 1” since in order to
address its location, the Row line is driven to a ‘0’ and the NMOS is not turned on. This leaves
the Column line pulled HIGH
- if an NMOS is absent, it will represent a “stored 0” since all of the other Row NMOS’s are
turned on and will pull the Column Line LOW
- this gives the opposite behavior as in a NOR-based ROM
                NOR  NAND
NMOS present    0    1
NMOS absent     1    0
- it also gives a complementary addressing scheme
                                NOR  NAND
Address Row Line by driving:    1    0
All other Row Lines driven to:  0    1
Memory – Flash
• Flash Memory Cells

- a novel breakthrough in ROM memory was the invention of Flash memory, built on the floating gate
transistor, presented by Toshiba in 1984
- this transistor is constructed such that the threshold of the device can be changed in-system
- if the threshold can be raised and lowered, this allows the transistor within the ROM array
to either be:
“present” i.e., Normal Row addressing will turn the device ON (VRow-HIGH>VT,n)
or
“absent” i.e., Normal Row addressing is not high enough to turn the device on (VRow-HIGH<VT,n)
- the threshold change is accomplished by applying an E-field to specifically induce
“hot electron injection” to change the characteristics of the Gate structure
- if this threshold change can be accomplished after fabrication, this allows a reconfigurable
ROM device that is nonvolatile, reusable, and programmable with electricity (i.e., EEprom)

- a floating gate transistor has a Control Gate and a Floating Gate
- the Floating Gate is separated from the semiconductor substrate using a “Thin Tunneling Oxide”
- On top of the Floating Gate, a thick Dielectric is grown and another Control Gate is patterned

Raising VT,n
- if charge accumulates at the Floating Gate, this in effect makes the thin dielectric a better conductor
- If the thin dielectric becomes a conductor, this is the same as moving the functional Gate further
away from the substrate
- this makes it more difficult to create a channel in the substrate (i.e., VT,n gets higher)
- we use hot electron injection to accomplish this
- if we apply a high voltage across the Source and Drain (VD=6v), electrons near the Drain
region will receive enough energy to form electron/hole pairs
- if we apply a high voltage at the Gate (VG=12v), the hot electrons in the substrate
will be attracted to the gate
- since the electron/holes have enough energy to move freely, electrons will tunnel into the thin oxide
and holes will tunnel into the substrate
- when the high voltages are removed, the electron/holes will remain in their new
locations and effectively increase VT,n
- Raising VT,n is called Programming

Lowering VT,n
- we use the Fowler-Nordheim Tunneling Mechanism (FN tunneling) to return the
Thin Floating Gate oxide to a conductor
- if the Gate is grounded and a high voltage (12v) is applied to the Source, the electrons in the
Floating Gate will be ejected out of the dielectric and into the Source
- this has the effect of restoring the insulating ability of the Thin Dielectric and effectively moves
the functional gate of the transistor closer to the substrate
- this makes it easier to create a channel in the substrate (i.e., VT,n gets lower)
- Lowering VT,n is called Erasing
Memory – Flash

- If we position the threshold voltage at a normal CMOS level (~1v), then the transistor
can be turned on using a standard signal level at the gate (i.e., Vgate=5v)
- If we position the threshold voltage at a raised level (>VDD), then a standard signal level
at the gate will NOT be able to turn on the transistor
Memory – Flash
• NAND/NOR Flash
- we can use Flash Cells in a NOR or NAND Array to implement an EEprom
- the Flash Cell requires one additional line on the Source of each transistor in order to accomplish
the programming and erasing.
Memory – Flash
• NAND vs. NOR Flash

- “Flash” implies that blocks of memory are erased at a time
- this is a specific type of EEprom and is cheaper to fabricate due to less programming circuitry
NOR Flash
- slower erase and write times
- allows access to any address which makes it truly Random Access
- this is suitable for uP ROM applications such as BIOS or Firmware in which the uP needs to access
memory locations individually
NAND Flash
- faster erase and write times
- smaller chip area which creates higher density and lower cost
- more erase cycles than NOR-Flash
- not Random Access, data must be read/written in large blocks, not suitable for uP ROM
- it is well suited for thumb drives, iPods, and secondary storage in microcomputers
(i.e., as a replacement for hard drives and CDROMs)
Memory in VHDL
• Memory in VHDL
– Memory is described in VHDL using the keyword array
– The array keyword defines a 2D vector of information.
type memory_type is array (0 to 255) of std_logic_vector(7 downto 0);
– This defines a data type which is a 2D array that is m x n (256 x 8)
– This data type can then be used to define either a signal (for RAM) or
constant (for ROM)
– Arrays in VHDL require integers as their indices. This means a type
conversion must be used when accessing the 2D array, since the address
lines will come in as STD_LOGIC_VECTOR (i.e., conv_integer(address),
which is provided by the ieee.std_logic_unsigned package)
Memory in VHDL
• RAM in VHDL
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all; -- provides conv_integer

entity ram_256x8_sync is
  port (clock    : in  std_logic;
        data_in  : in  std_logic_vector(7 downto 0);
        write    : in  std_logic;
        address  : in  std_logic_vector(7 downto 0);
        data_out : out std_logic_vector(7 downto 0));
end entity;

architecture rtl of ram_256x8_sync is
  -- This line defines a new data type called "ram_type" which is a 2D array
  -- that is 256x8 of STD_LOGIC_VECTOR
  type ram_type is array (0 to 255) of std_logic_vector(7 downto 0);

  -- This line creates a signal called RAM which uses "ram_type".
  -- This signal can be read or written to.
  signal RAM : ram_type;
begin
  memory : process (clock)
  begin
    if (clock'event and clock='1') then
      -- Since "address" is STD_LOGIC_VECTOR but the array can only be indexed
      -- with integers, we do a type conversion when accessing the 2D array.
      if (write = '1') then
        RAM(conv_integer(address)) <= data_in;   -- synchronous write (write = 1)
      else
        data_out <= RAM(conv_integer(address));  -- synchronous read (write = 0)
      end if;
    end if;
  end process;
end architecture;
Memory in VHDL
• ROM in VHDL (synchronous)

entity rom_128x8_sync is
  port (clock    : in  std_logic;
        address  : in  std_logic_vector(7 downto 0);
        data_out : out std_logic_vector(7 downto 0));
end entity;

architecture rtl of rom_128x8_sync is
  -- This line defines a new data type called "rom_type" which is a 2D array
  -- that is 128x8 of STD_LOGIC_VECTOR
  type rom_type is array (0 to 127) of std_logic_vector(7 downto 0);

  -- Instead of creating a signal as in RAM, we create a constant of type
  -- "rom_type". This constant is 128x8 and can be initialized. It can only
  -- be read from by external systems.
  constant ROM : rom_type := (0 => x"12",
                              1 => x"AA",
                              2 => x"CD",
                              3 => x"80",
                              :
                              :
begin
  memory : process (clock)
  begin
    if (clock'event and clock='1') then
      -- Again, a type conversion is needed to access the 2D array.
      -- Only read capability needs to be modeled.
      data_out <= ROM(conv_integer(address));
    end if;
  end process;
end architecture;
Memory in VHDL
• ROM in VHDL (asynchronous)

entity rom_128x8_async is
  port (address  : in  std_logic_vector(7 downto 0);
        data_out : out std_logic_vector(7 downto 0));
end entity;

architecture rtl of rom_128x8_async is
  type rom_type is array (0 to 127) of std_logic_vector(7 downto 0);

  constant ROM : rom_type := (0 => x"12",
                              1 => x"AA",
                              2 => x"CD",
                              3 => x"80",
                              :
                              :
begin
  -- data_out is always being driven with this concurrent signal assignment.
  data_out <= ROM(conv_integer(address));
end architecture;
Memory Mapping
• Memory Mapping
- Mapping different types of memory to certain address ranges creates a
“Memory Mapped” system.
- This makes addressing from the CPU simpler
Memory Mapping
• Address Decoding
- Address decoding can be accomplished within the model for the RAM/ROM/IO

ROM mapped to addresses 0-127

memory : process (clock)
begin
  if (clock'event and clock='1') then
    if (address >= 0 and address <= 127) then
      data_out <= ROM(conv_integer(address));
    end if;
  end if;
end process;

RAM mapped to addresses 128-191

memory : process (clock)
begin
  if (clock'event and clock='1') then
    if ((address >= 128 and address <= 191) and (write = '1')) then
      RAM(conv_integer(address)) <= data_in;
    elsif (address >= 128 and address <= 191) then
      data_out <= RAM(conv_integer(address));
    end if;
  end if;
end process;

An output port mapped to 192

U3 : process (clock, reset)
begin
  if (reset = '0') then
    port_out_00 <= x"00";
  elsif (clock'event and clock='1') then
    if (address = x"C0" and write = '1') then
      port_out_00 <= data_in;
    end if;
  end if;
end process;
More Details of Using VHDL
for Memories
Portions of this work are from the book, Digital Design: An Embedded
Systems Approach Using VHDL, by Peter J. Ashenden, published by Morgan
Kaufmann Publishers, Copyright 2007 Elsevier Inc. All rights reserved.
VHDL
General Concepts
• A memory is an array of storage locations
- Each with a unique address
- Like a collection of registers, but with optimized implementation
• Address is unsigned-binary encoded
- n address bits ⇒ 2^n locations
• All locations the same size
- 2^n × m bit memory
[Figure: memory drawn as locations 0 to 2^n–1, each m bits wide]
Memory Sizes
• Use power-of-2 multipliers
- Kilo (K): 2^10 = 1,024 ≈ 10^3
- Mega (M): 2^20 = 1,048,576 ≈ 10^6
- Giga (G): 2^30 = 1,073,741,824 ≈ 10^9
• Example
- 32K × 32-bit memory
- Capacity = 32K × 32 = 1,024 Kbit = 1 Mbit
- Requires 15 address bits
• Size is determined by application requirements
Basic Memory Operations

• a inputs: unsigned address
• d_in and d_out
- Type depends on application
• Write operation
- en = 1, wr = 1
- d_in value stored in location given by address inputs
• Read operation
- en = 1, wr = 0
- d_out driven with value of location given by address inputs
• Idle: en = 0
[Figure: memory symbol with inputs a(0) to a(n-1), d_in(0) to d_in(m-1), en and wr, and outputs d_out(0) to d_out(m-1)]
Example: Audio Delay Unit

• System clock: 1MHz
• Audio samples: 8-bit signed, at 50kHz (50 samples/msec)
- New sample arrives when audio_in_en = 1
• Delay control: 8-bit unsigned ⇒ ms to delay
• Output: audio_out_en = 1 when output ready
[Timing diagram: clk, audio_in, audio_in_en, audio_out and audio_out_en; a new sample s(t) arrives every 20µs, and audio_out carries the delayed sample s(t−d)]
Audio Delay Datapath

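[Datapath diagram: a 14-bit counter (enabled by count_en) supplies the write address; a subtractor forms counter − (delay × 50) as the read address; a multiplexer controlled by addr_sel selects between them for the memory address input a; audio_in drives d_in and d_out drives audio_out; mem_en and mem_wr drive the memory en and wr inputs]
• Max delay = 255ms
- Need to store 255 × 50 = 12,750 samples
- Use a 16K × 16-bit memory (14 address bits): 2^14 = 16,384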
Audio Delay Control Section

• Step 1: (idle state)
- audio_in_en = 0 ⇒ do nothing
- audio_in_en = 1 ⇒ write memory using counter value as address
• Step 2:
- Read memory using subtractor output as address, increment counter

State    audio_in_en   Next state   addr_sel   mem_en   mem_wr   count_en   audio_out_en
Step 1   0             Step 1       0          0        0        0          0
Step 1   1             Step 2       0          1        1        0          0
Step 2   –             Step 1       1          1        0        1          1
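The transition table above can be written as a two-state machine. The following is a minimal sketch only: the entity name audio_delay_control, the architecture name, and the active-high asynchronous reset are assumptions made for illustration and do not come from the slides.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: name and reset polarity are assumptions.
entity audio_delay_control is
  port ( clk, reset   : in  std_logic;
         audio_in_en  : in  std_logic;
         addr_sel, mem_en, mem_wr,
         count_en, audio_out_en : out std_logic );
end entity audio_delay_control;

architecture fsm of audio_delay_control is
  type state is (step1, step2);
  signal current_state, next_state : state;
begin

  state_reg : process (clk, reset) is
  begin
    if reset = '1' then
      current_state <= step1;          -- start in the idle state
    elsif rising_edge(clk) then
      current_state <= next_state;
    end if;
  end process state_reg;

  next_state_and_outputs : process (current_state, audio_in_en) is
  begin
    -- Defaults correspond to the first table row (Step 1, audio_in_en = 0)
    next_state   <= step1;
    addr_sel     <= '0';
    mem_en       <= '0';
    mem_wr       <= '0';
    count_en     <= '0';
    audio_out_en <= '0';
    case current_state is
      when step1 =>
        if audio_in_en = '1' then
          mem_en     <= '1';           -- write the new sample at the counter address
          mem_wr     <= '1';
          next_state <= step2;
        end if;
      when step2 =>
        addr_sel     <= '1';           -- read at the subtractor address
        mem_en       <= '1';
        count_en     <= '1';           -- increment the counter
        audio_out_en <= '1';
        next_state   <= step1;
    end case;
  end process next_state_and_outputs;

end architecture fsm;
```

Putting the default assignments at the top of the combinational process keeps every output driven on every path, so no latches are inferred.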
Wider Memories
• Memory components have a fixed width
- E.g., ×1, ×4, ×8, ×16, ...
• Use memory components in parallel to make a wider memory
- E.g., three 16K×16 components for a 16K×48 memory
[Figure: three 16K×16 components sharing en, wr and a(13…0); their data ports carry d_in/d_out(15…0), d_in/d_out(31…16) and d_in/d_out(47…32) respectively]
More Locations

• To provide 2^n locations with 2^k-location components
- Use 2^n/2^k components
• Address A
- at offset A mod 2^k: the least-significant k bits of A
- in component ⌊A/2^k⌋: the most-significant n−k bits of A, decoded to select the component
• Address bits
- n−k bits: to decoder, which drives the chip enables of the memory chips
- k bits: to address bus of all memory chips
[Figure: address map from 0 to 2^n−1, divided into blocks of 2^k locations, one block per component]
More Locations
• Example: 64K×8 memory composed of 16K×8 components
[Figure: four 16K×8 components share wr, a(13…0) and d_in(7…0); a decoder on a(15…14), gated by en, drives the four component enables; a 4-input multiplexer selected by a(15…14) drives d_out(7…0)]
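The enable decoding for the 64K×8 example can be sketched as follows. This is an illustrative sketch rather than the slide's own code: the entity name bank_select and the port names bank_en and bank_a are hypothetical, and the output multiplexer on d_out is not modeled.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: splits a 16-bit address into a component select
-- (most-significant 2 bits) and an offset (least-significant 14 bits).
entity bank_select is
  port ( a       : in  std_logic_vector(15 downto 0);
         en      : in  std_logic;
         bank_en : out std_logic_vector(3 downto 0);    -- one enable per 16K×8 component
         bank_a  : out std_logic_vector(13 downto 0) ); -- offset within a component
end entity bank_select;

architecture rtl of bank_select is
begin

  -- Least-significant k = 14 bits go to every component
  bank_a <= a(13 downto 0);

  -- Most-significant n-k = 2 bits select which component is enabled
  decode : process (a, en) is
  begin
    bank_en <= (others => '0');
    if en = '1' then
      bank_en(to_integer(unsigned(a(15 downto 14)))) <= '1';
    end if;
  end process decode;

end architecture rtl;
```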
Tristate Drivers
• Allow multiple outputs to be connected together
- Only one active at a time
- Remaining outputs are high-impedance
- Both output transistors turned off
• Allow bidirectional input/output ports
[Figure: tristate output stages, each connected to +V, with their outputs tied together]
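In VHDL, a tristate driver is usually modeled by assigning 'Z' when the driver is disabled. The sketch below assumes an 8-bit bidirectional port; the entity name tristate_port and all port names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity illustrating a bidirectional port with a tristate driver.
entity tristate_port is
  port ( drive_en  : in    std_logic;                     -- '1' = drive the shared bus
         d_out_int : in    std_logic_vector(7 downto 0);  -- value to drive onto the bus
         d_in_int  : out   std_logic_vector(7 downto 0);  -- value sensed from the bus
         d         : inout std_logic_vector(7 downto 0) );
end entity tristate_port;

architecture rtl of tristate_port is
begin

  -- Drive the bus only when enabled; otherwise present high impedance
  d <= d_out_int when drive_en = '1' else (others => 'Z');

  -- The bus can always be sensed, whoever is driving it
  d_in_int <= d;

end architecture rtl;
```

Because std_logic is a resolved type, several such drivers can be connected to the same signal in simulation, provided only one of them has drive_en = '1' at a time.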
Memories with Tristate Ports
• During write
- memory d drivers hi-Z
- memory senses d
• During read
- selected memory drives d
• Fewer pins and wires
- Reduced cost of PCB
• Usually not available within ASICs or FPGAs
[Figure: four 16K×8 components with bidirectional d(7…0) ports on a shared bus, sharing wr and a(13…0); a decoder on a(15…14), gated by en, drives the component enables]
Memory Types
• Random-Access Memory (RAM)
- Can read and write
- Static RAM (SRAM)
  - Stores data so long as power is supplied
  - Asynchronous SRAM: not clocked
  - Synchronous SRAM (SSRAM): clocked
- Dynamic RAM (DRAM)
  - Needs to be periodically refreshed
• Read-Only Memory (ROM)
- Combinational
- Programmable and Flash rewritable
• Volatile and non-volatile
Asynchronous SRAM
• Data stored in 1-bit latch cells
- Address decoded to enable a given cell
• Usually use active-low control inputs
• Not available as components in ASICs or FPGAs
[Timing diagrams: write cycle showing A, CE, WE and D with data setup (tsu) and hold (th) times around the stored data; read cycle showing A, CE, OE and D with the access time before read data is valid]

Asynch SRAM Timing

• Timing parameters published in data sheets
• Access time
- From address/enable valid to data-out valid
• Cycle time
- From start to end of access
• Data setup and hold
- Before/after end of WE pulse
• Makes asynch SRAMs hard to use in clocked synchronous designs
Example Data Sheet
Synchronous SRAM (SSRAM)

• Clocked storage registers for inputs
- address, data and control inputs
- stored on a clock edge
- held for read/write cycle
• Flow-through SSRAM
- no register on data output
- Flow-through: on write, the input shows up at the output after a propagation delay
[Timing diagram: clk, A (a1, then a2), en, wr, D_in (xx) and D_out (xx, then M(a2))]
Example: Coefficient Multiplier

• Compute function y = c_i × x^2
- Coefficient stored in flow-through SSRAM
- 12-bit unsigned integer index for i
- x, y, c_i 20-bit signed fixed-point
- 8 pre- and 12 post-binary point bits
• Use a single multiplier
- Multiply c_i × x × x
Multiplier Datapath
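[Datapath diagram: the SSRAM (address i, D_in from c_in, controls c_ram_en and c_ram_wr) supplies c_i; x is held in a register enabled by x_ce; two multiplexers controlled by mult_sel choose the multiplier operands; the product is stored in the y register enabled by y_ce]
1. (mult_sel = 0): c_i × x
2. (mult_sel = 1): x × (c_i × x)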
Multiplier Timing and Control
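[State diagram: step1 loops while start = 0 and moves to step2 when start = 1; step2 moves to step3; step3 returns to step1. The output values shown, apparently in the order c_ram_en, x_ce, mult_sel, y_ce, are 1, 1, 0, 0 then 0, 0, 0, 1 then 0, 0, 1, 1]
[Timing diagram: clk, start, c_ram_en, x_ce, mult_sel and y_ce over the state sequence step1, step1, step2, step3, step1]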
Pipelined SSRAM
• Data output also has a register
- More suitable for high-speed systems
- Access RAM in one cycle, use the data in the next cycle
[Timing diagram: clk, A (a1, then a2), en, wr, D_in (xx) and D_out (xx, then M(a2))]
Memories in VHDL
• RAM storage represented by an array signal

type RAM_4Kx16 is array (0 to 4095) of std_logic_vector(15 downto 0);
signal data_RAM : RAM_4Kx16;
...
data_RAM_flow_through : process (clk) is
begin
  if rising_edge(clk) then
    if en = '1' then
      if wr = '1' then
        -- Flow-through: on write, the input shows up at the output
        -- after a propagation delay.
        data_RAM(to_integer(a)) <= d_in; d_out <= d_in;
      else
        d_out <= data_RAM(to_integer(a));
      end if;
    end if;
  end if;
end process data_RAM_flow_through;

library ieee; use ieee.std_logic_1164.all,
ieee.numeric_std.all, ieee.fixed_pkg.all;
entity scaled_square is
port ( clk, reset : in std_logic;
start : in std_logic;
i : in unsigned(11 downto 0);
c_in, x : in sfixed(7 downto -12);
y : out sfixed(7 downto -12) );
end entity scaled_square;
architecture rtl of scaled_square is

signal c_ram_en ,c_ram_wr ,x_ce ,mult_sel ,y_ce : std_logic;
signal c_out, x_out : sfixed(7 downto -12);
signal y_out : sfixed(7 downto -12);
type c_array is array (0 to 4095) of sfixed(7 downto -12);
signal c_RAM : c_array;
type state is (step1, step2, step3);
signal current_state, next_state : state;

begin

  c_ram_wr <= '0';

  c_RAM_flow_through : process (clk) is
  begin
    if rising_edge(clk) then
      if c_ram_en = '1' then
        if c_ram_wr = '1' then
          c_RAM(to_integer(i)) <= c_in;   -- Store (and use) the scaling values
          c_out <= c_in;
        else
          c_out <= c_RAM(to_integer(i));  -- Use the previously stored values.
        end if;
      end if;
    end if;
  end process c_RAM_flow_through;

  y_reg : process (clk) is
    variable operand1, operand2 : sfixed(7 downto -12);
  begin
    if rising_edge(clk) then
      if y_ce = '1' then
        if mult_sel = '0' then
          operand1 := c_out; operand2 := x_out;   -- c_i × x
        else
          operand1 := x_out; operand2 := y_out;   -- x × (c_i × x)
        end if;
        -- The fixed-point product is wider than y_out, so resize it to fit
        y_out <= resize(operand1 * operand2, y_out'left, y_out'right);
      end if;
    end if;
  end process y_reg;

  y <= y_out;

  state_reg : process ...
  next_state_logic : process ...
  output_logic : process ...

end architecture rtl;
Pipelined SSRAM in VHDL

data_RAM_pipelined : process (clk) is
  variable pipelined_en : std_logic;
  variable pipelined_d_out : std_logic_vector(15 downto 0);
begin
  if rising_edge(clk) then
    -- output register
    if pipelined_en = '1' then
      d_out <= pipelined_d_out;
    end if;
    pipelined_en := en;   -- Need the enable for one more clock.
    -- SSRAM
    if en = '1' then
      if wr = '1' then
        data_RAM(to_integer(a)) <= d_in; pipelined_d_out := d_in;
      else
        pipelined_d_out := data_RAM(to_integer(a));
      end if;
    end if;
  end if;
end process data_RAM_pipelined;
Generating SSRAM Components

• Variations on SSRAM behavior
- E.g., write-first, read-first or no-change on write cycle
- Burst accesses to successive locations
• Not all synthesis tools recognize the same templates
• Use a RAM core generator tool
Example: RAM Core Generator
Multiport Memories
• Multiple address, data and control connections to the storage locations
• Allows concurrent accesses
- Avoids multiplexing and sequencing
• Scenario
- Data producer and data consumer
• What if two writes to a location occur concurrently?
- Result may be unpredictable
- Some multi-port memories include an arbiter
FIFO Memories
• First-In/First-Out buffer
- Connecting producer and consumer
- Decouples rates of production/consumption
[Figure: producer subsystem, FIFO, consumer subsystem connected in sequence]
• Implementation using dual-port RAM
- Circular buffer
- Full: write_addr = read_addr
- Empty: write_addr = read_addr
[Figure: circular buffer with read and write positions]
Example: FIFO Datapath
[Datapath diagram: two 8-bit counters with reset, one enabled by wr_en and one by rd_en, generate A_wr and A_rd for a dual-port SSRAM (write port A_wr, D_wr, wr_en; read port A_rd, D_rd, rd_en); a comparator on the two counter outputs produces equal]
• Equal = full or empty
- Need to distinguish between these states. How?
Example: FIFO Control

• Control FSM
- Go to filling when write without concurrent read
- Go to emptying when read without concurrent write
- Unchanged when concurrent write and read
• full = filling and equal
• empty = emptying and equal
[State diagram: states emptying and filling; wr_en, rd_en = 1, 0 moves to filling and wr_en, rd_en = 0, 1 moves to emptying]
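The two-state control FSM described above can be sketched as follows. The entity name fifo_control, the active-high asynchronous reset, and the choice to start in the emptying state (so that a reset FIFO reports empty) are assumptions for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: distinguishes full from empty when the
-- read and write addresses are equal.
entity fifo_control is
  port ( clk, reset   : in  std_logic;
         wr_en, rd_en : in  std_logic;
         equal        : in  std_logic;   -- from the address comparator
         full, empty  : out std_logic );
end entity fifo_control;

architecture fsm of fifo_control is
  type state is (emptying, filling);
  signal current_state : state;
begin

  state_reg : process (clk, reset) is
  begin
    if reset = '1' then
      current_state <= emptying;         -- assumed: FIFO is empty after reset
    elsif rising_edge(clk) then
      if wr_en = '1' and rd_en = '0' then
        current_state <= filling;        -- write without concurrent read
      elsif wr_en = '0' and rd_en = '1' then
        current_state <= emptying;       -- read without concurrent write
      end if;                            -- otherwise unchanged
    end if;
  end process state_reg;

  full  <= '1' when current_state = filling  and equal = '1' else '0';
  empty <= '1' when current_state = emptying and equal = '1' else '0';

end architecture fsm;
```

The idea is that the last non-simultaneous operation tells which way the pointers were moving when they met: if the last operation was a write, the writer caught up with the reader, so the buffer is full.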
Multiple Clock Domains

• Need to resynchronize data that traverses clock domains
- Use resynchronizing registers
• May overrun if sender's clock is faster than receiver's clock
• FIFO smooths out differences in data flow rates
- Latch cells inside FIFO RAM written with sender's clock, read with receiver's clock
Dynamic RAM (DRAM)

• Data stored in a 1-transistor/1-capacitor cell
- Smaller cell than SRAM, so more per chip
- But longer access time
• Write operation
- pull bit-line high or low (1 or 0)
- activate word line
• Read operation
- precharge bit-line to intermediate voltage
- activate word line, and sense charge equalization
- rewrite to restore charge
[Figure: DRAM cell connected to a bit line and a word line]
DRAM Refresh
• Charge on capacitor decays over time
- Need to sense and rewrite periodically
- Typically every cell every 64ms
• Refresh each location
• DRAMs organized into banks of rows
- Refresh whole row at a time
• Can't access while refreshing
- Interleave refresh among accesses
- Or burst refresh every 64ms
DDR DRAM
Feature                                                        DDR     DDR2    DDR3
Voltage                                                        2.5V    1.8V    1.5V
Max data rate per I/O pin (Mbits/sec)                          800     1066    1600
Peak Bandwidth (Gbytes/sec for a 32 bit data bus)              3.2     4.2     6.4
Sustained Bandwidth (Gbytes/sec for a 32 bit data bus) (60%)   1.9     2.5     3.8
Max Density (Gbits per device)                                 1       4       4
Read-Only Memory (ROM)

• For constant data, or CPU programs
• Masked ROM
- Data manufactured into the ROM
• Programmable ROM (PROM)
- Use a PROM programmer
• Erasable PROM (EPROM)
- UV erasable
- Electrically erasable (EEPROM)
- Flash RAM
Combinational ROM
• A ROM maps address input to data output
- This is a combinational function!
- Specify using a table
• Example: 7-segment decoder
[Figure: ROM with address inputs A0 to A3 driven by BCD0 to BCD3 and A4 driven by blank; data outputs D0 to D6 drive segments a to g]

Address   Content     Address   Content
0         0111111     6         1111101
1         0000110     7         0000111
2         1011011     8         1111111
3         1001111     9         1101111
4         1100110     10-15     1000000
5         1101101     16-31     0000000
Example: ROM in VHDL

library ieee; use ieee.numeric_std.all;
architecture ROM_based of seven_seg_decoder is
  type ROM_array is
    array (0 to 31) of std_logic_vector ( 7 downto 1 );
  constant ROM_content : ROM_array
    := ( 0 => "0111111", 1 => "0000110",
         2 => "1011011", 3 => "1001111",
         4 => "1100110", 5 => "1101101",
         6 => "1111101", 7 => "0000111",
         8 => "1111111", 9 => "1101111",
         10 to 15 => "1000000",
         16 to 31 => "0000000" );
begin
  seg <= ROM_content(to_integer(unsigned(blank & bcd)));
end architecture ROM_based;
Flash RAM
• Non-volatile, readable (relatively fast), writable (relatively slow)
• Storage partitioned into blocks
- Erase a whole block at a time, then write/read
- Once a location is written, can't rewrite until erased
• NOR Flash
- Can write and read individual locations
- Used for program storage, random-access data
• NAND Flash
- Denser, but can only write and read block at a time
- Used for bulk data, e.g., cameras, memory sticks
Memory Errors
• Bits in memory can be flipped
• Hard error
- The chip is broken
- E.g., manufacturing defect, wear (in Flash)
• Soft error
- Stored data corrupted, but cell still works
- E.g., from atmospheric neutrons
• Soft-error rate
- frequency of occurrence
Error Detection using Parity

• Add a parity bit to each location
• On write access
- compute data parity and store with data
• On read access
- check parity, take exception on error
• If we could tell which bit flipped
- correct by flipping it back, then write back to memory location
- Can't do this with parity
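A parity bit is simply the XOR of all the data bits. The sketch below shows one way to compute it; the package name parity_pkg and the function name even_parity are hypothetical, and even parity is an assumed convention since the slide does not say which is used.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package: even-parity helper for any vector width.
package parity_pkg is
  function even_parity (data : std_logic_vector) return std_logic;
end package parity_pkg;

package body parity_pkg is

  -- Returns '1' when data contains an odd number of '1' bits, so that
  -- data together with the returned bit has an even number of '1' bits.
  function even_parity (data : std_logic_vector) return std_logic is
    variable p : std_logic := '0';
  begin
    for i in data'range loop
      p := p xor data(i);
    end loop;
    return p;
  end function even_parity;

end package body parity_pkg;
```

On a write, the stored word would be the data concatenated with even_parity(data). On a read, applying even_parity to the whole stored word (data plus parity bit) gives '0' when no single-bit error occurred and '1' when one bit flipped, but it gives no indication of which bit.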
Error-Correcting Codes (ECC)

• Allow identification of the flipped bit
• Hamming Codes
- E.g., for single-bit-error correction of an N-bit word, need log2(N) + 1 extra bits
• Example: 8-bit word, d1 … d8
- 12-bit ECC code, e1 ... e12
- e1, e2, e4, e8 are check bits, the rest data
- Check bits are in bit positions whose indices are a power of 2

Code bit   e12   e11   e10   e9   e8      e7   e6   e5   e4      e3   e2      e1
Holds      d8    d7    d6    d5   check   d4   d3   d2   check   d1   check   check
Hamming Code Example

Code bit   Binary index   Holds
e1         0001           check bit
e2         0010           check bit
e3         0011           d1
e4         0100           check bit
e5         0101           d2
e6         0110           d3
e7         0111           d4
e8         1000           check bit
e9         1001           d5
e10        1010           d6
e11        1011           d7
e12        1100           d8

Check-bit equations (⊕ is XOR):
e1 = e3 ⊕ e5 ⊕ e7 ⊕ e9 ⊕ e11
e2 = e3 ⊕ e6 ⊕ e7 ⊕ e10 ⊕ e11
e4 = e5 ⊕ e6 ⊕ e7 ⊕ e12
e8 = e9 ⊕ e10 ⊕ e11 ⊕ e12

• Every data bit covered by two or more check bits
• On write: Compute check bits and store with data
Hamming Code Example

• On read: Recompute check bits and XOR with read check bits
- result called the syndrome
- 0000 => no error
• If data bit flipped
- covering bits of syndrome are 1
- = binary code of flipped ECC bit
• If stored check bit flipped
- that bit of syndrome is 1
• On error, unflip bit and rewrite memory location
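The 12-bit code for an 8-bit word can be sketched as a pair of VHDL functions, one computing the check bits on a write and one computing the syndrome on a read. The package name hamming_12_8_pkg, the subtype names, and the function names are hypothetical; the bit assignments follow the check-bit groupings given for e1, e2, e4 and e8.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package for the 8-bit data / 12-bit code example.
package hamming_12_8_pkg is
  subtype data_word is std_logic_vector(8 downto 1);    -- d8 ... d1
  subtype code_word is std_logic_vector(12 downto 1);   -- e12 ... e1
  function encode   (d : data_word) return code_word;
  function syndrome (e : code_word) return std_logic_vector;
end package hamming_12_8_pkg;

package body hamming_12_8_pkg is

  -- On write: place the data bits and compute the four check bits
  function encode (d : data_word) return code_word is
    variable e : code_word;
  begin
    e(3)  := d(1);  e(5)  := d(2);  e(6)  := d(3);  e(7)  := d(4);
    e(9)  := d(5);  e(10) := d(6);  e(11) := d(7);  e(12) := d(8);
    e(1) := e(3) xor e(5) xor e(7)  xor e(9)  xor e(11);
    e(2) := e(3) xor e(6) xor e(7)  xor e(10) xor e(11);
    e(4) := e(5) xor e(6) xor e(7)  xor e(12);
    e(8) := e(9) xor e(10) xor e(11) xor e(12);
    return e;
  end function encode;

  -- On read: XOR each stored check bit with its recomputed value.
  -- "0000" means no error; otherwise the value is the index of the flipped bit.
  function syndrome (e : code_word) return std_logic_vector is
    variable s : std_logic_vector(3 downto 0);
  begin
    s(0) := e(1) xor e(3) xor e(5) xor e(7)  xor e(9)  xor e(11);
    s(1) := e(2) xor e(3) xor e(6) xor e(7)  xor e(10) xor e(11);
    s(2) := e(4) xor e(5) xor e(6) xor e(7)  xor e(12);
    s(3) := e(8) xor e(9) xor e(10) xor e(11) xor e(12);
    return s;
  end function syndrome;

end package body hamming_12_8_pkg;
```

For example, if e6 (index 0110) flips after a write, s(1) and s(2) become '1' and the syndrome reads "0110", which points back at bit 6. This handles a single flipped bit only; two flipped bits give a misleading syndrome.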
Multiple-Error Detection
• What if two bits flip
- syndrome identifies wrong bit, or is invalid
• One extra check bit allows
- single-error correction, double-error detection

        Single-bit correction      Double-bit detection
N       Check bits   Overhead      Check bits   Overhead
8       4            50%           5            63%
16      5            31%           6            38%
32      6            19%           7            22%
64      7            11%           8            13%
128     8            6.3%          9            7.0%
256     9            3.5%          10           3.9%
Summary
• Memory: addressable storage locations
- Read and Write operations
• Asynchronous RAM
• Synchronous RAM (SSRAM)
• Dynamic RAM (DRAM)
• Read-Only Memory (ROM) and Flash
• Multiport RAM and FIFOs
• Error Detection and Correction
- Hamming Codes
Embedded Computers
• A computer as part of a digital system
- Performs processing to implement or control the system's function
• Components
- Processor core
- Instruction and data memory
- Input, output, and input/output controllers
  - For interacting with the physical world
- Accelerators
  - High-performance circuit for specialized functions
- Interconnecting buses
Memory Organization
• Von Neumann architecture
- Single memory for instructions and data
• Harvard architecture
- Separate instruction and data memories
- Most common in embedded systems
[Figure: CPU, instruction memory, data memory and accelerator, together with input, output and I/O controllers, connected by a bus]
Bus Organization
• Single bus for low-cost low-performance systems
• Multiple buses for higher performance
[Figure: CPU with separate buses to instruction memory, data memory and accelerator, and to the input, output and I/O controllers]
Bus Organization
Traditional Bus Topology
Bus Organization
Typical Switch Fabric Topology
Bus Organization
Altera’s System Interconnect Fabric Example
Bus Organization
Altera’s Memory-Mapped and Streaming System Interconnect Fabrics
SRIO: Serial RapidIO is a high-performance, point-to-point, packet-switched interconnect
technology defined by the RapidIO Trade Association. Full-duplex point-to-point links are
established with single or multiple high-speed serial lanes (1x and 4x are currently defined),
and industry-standard 8B/10B-encoded data transmission at signaling rates of 1.25, 2.50, or
3.125 Gbaud for peak bandwidth of up to 20 Gbps.
Microprocessors
• Single-chip processor in a package
• External connections to memory and I/O buses
• Most commonly seen in general purpose computers
- E.g., Intel Pentium family, PowerPC, …
Microcontrollers
• Single chip combining
- Processor
- A small amount of instruction/data memory
- I/O controllers
• Microcontroller families
- Same processor, varying memory and I/O
• 8-bit microcontrollers
- Operate on 8-bit data
- Low cost, low performance
• 16-bit and 32-bit microcontrollers
- Higher performance

Side note: NXP’s 50-MHz ARM Cortex-M0-based LPC1100 microcontroller family represents the latest
32-bit challenge to 8- and 16-bit processors. The parts are available now with prices starting at
65 to 95 cents (10,000). CoreMark Benchmark measures 40 to 50% better code density for the
LPC1100 than that of 8- and 16-bit microcontrollers.
Processor Cores
• Processor as a component in an FPGA or ASIC
• In FPGA, can be a fixed-function block
- E.g., PowerPC cores in some Xilinx FPGAs
• Or can be a soft core
- Implemented using programmable resources
- E.g., Xilinx MicroBlaze, Altera Nios-II
• In ASIC, provided as an IP block
- E.g., ARM, PowerPC, MIPS, Tensilica cores
- Can be customized for an application
Digital Signal Processors

• DSPs are processors optimized for signal processing operations
- E.g., audio, video, sensor data; wireless communication
• Often combined with a conventional core for processing other data
- Heterogeneous multiprocessor
Instruction Sets
• A processor executes a program
- A sequence of instructions, each performing a small step of a computation
• Instruction set: the repertoire of available instructions
- Different processor types have different instruction sets
- How are new instructions chosen to be added to the Instruction Set?
• High-level languages: more abstract
- E.g., C, C++, Ada, Java
- Translated to processor instructions by a compiler

CPU_time = IC × (CPI_execution + Memory_stall_cycles / Instruction) × Clock_period
Instruction Execution
• Instructions are encoded in binary
- Stored in the instruction memory
• A processor executes a program by repeatedly
- Fetching the next instruction
- Decoding it to work out what to do
- Executing the operation
• Program counter (PC)
- Register in the processor holding the address of the next instruction
Data and Endian-ness

• Instructions operate on data from the data memory
• Byte: 8-bit data
- Data memory is usually byte addressed
• 16-bit, 32-bit, 64-bit words of data
- Little endian: LSB = lowest address (e.g., Intel x86)
- Big endian: MSB = lowest address (e.g., PowerPC)

Data          Address   Little endian      Big endian
8-bit data    0         8-bit data         8-bit data
16-bit data   m         least sig. byte    most sig. byte
              m+1       most sig. byte     least sig. byte
32-bit data   n         least sig. byte    most sig. byte
              n+1
              n+2
              n+3       most sig. byte     least sig. byte
von Neumann Computer
• von Neumann Stored Program Computer
- "Stored Program" means the HW is designed to execute a set of pre-defined instructions
- the program and data reside in a storage unit (i.e., memory)
- to change the functionality of the computer, the program is changed (instead of the HW)
- John von Neumann was a mathematician who described a computer architecture where the
instructions and data reside in the same memory
- this implies sequential execution
- it is simple from the standpoint of state machine timing
- the drawback is the "von Neumann bottleneck" in getting data into and out of memory in order for
the computer to run
- this architecture is what we are using in the labs on the Freescale microcontrollers
• Block Diagram of von Neumann Computer
- Notice that information going into/out-of the computer is on ports.
• Bus Management
- There are a great many signals in a microcomputer. Sharing lines reduces the amount
of wiring needed on the chip.
- This creates a situation where bus contention needs to be avoided.
- There are three common techniques for bus management:
1) Verbose or explicit routing - every device has a dedicated input / output bus that connects
to any/all devices that it needs to communicate with.
2) High Impedance - devices share a single output bus but each device has a high
impedance state. Only one device is allowed to drive the bus at any
given time.
3) Multiplexed - devices share a single output bus, but each device routes its output
to a multiplexer which then in turn drives the bus.
• Block Diagram of the Central Processing Unit (CPU)
• Central Processing Unit (CPU)
- the CPU consists of:
1) Control Unit - the state machine that directs the execution of instructions.
- for a given Opcode, the state machine traverses a specific
path within its state diagram
- also called the "Sequence Controller" or "Sequencer"
2) Processing Unit - contains all of the registers and ALU that hold and manipulate data
- memory signals (data/address) coming into/out-of this unit
3) Control Signals - signals sent to processing unit from the control unit
- direct data flow
- load data into registers
- select ALU operation
- manage memory access signals
4) Test Signals - signals sent to control unit from the processing unit
- results of operations that affect state machine flow
von Neumann Computer (Processing Unit)
• Processing Unit
- let's start with the registers within the processing unit
Instruction Register (IR) - holds the Opcode that is read from memory
- passes the Opcode to the Control Unit as a test signal
Memory Address Reg (MAR) - holds the current address being sent to memory
Program Counter (PC) - tracks the address of which instruction is being executed
- PC is sequential (0,1,2…)
- PC is loaded during a branch, incremented otherwise
- MAR tracks PC when executing instruction
User-Controlled Reg (X, Y,..) - these are operated on directly by the program
- can be loaded and stored
ALU Operand Register (Z) - holds one of the inputs to the ALU
- the other input comes from one of the user-controlled registers
• Processing Unit
Arithmetic / Logic Unit (ALU)

- performs data math and manipulation
- we first load Z with the first input
- we then select which user-controlled register is the other input
- the control unit sends select lines to indicate which operation to perform
Condition Code Register (CCR)

- tracks the status of ALU operations (i.e., NZVC)
- these signals are sent to the control unit in order to alter sequence flow
• Buses
- for this example, let’s use a multiplexed bus system
- we route data in the processing unit between registers/memory using shared lines called buses
- for this architecture, we need two buses
Bus1 - can take either PC or the User-Controlled Registers
- will drive to Memory_In or Bus2
Bus2 - can take either ALU, Bus1, or Memory_Out
- will drive to IR, MAR, PC, User-Controlled Registers, or ALU Operand Reg
- Information from Bus1 can be routed to Bus2 for feedback operations (PC = PC + 1)
- Bus select lines come from the Control Unit to select which information is on which bus at any
given time.
• Control Signals
- the Bus1 and Bus2 control lines come from the control unit and drive the multiplexers
- the WRITE line is a synchronous load to memory from Memory_Out
- CCR_Load will load the status bits (NZVC), whose values depend on the previous ALU operation
- the ALU_Sel line tells the ALU which function to perform (AND, ADD, …)
• Test Signals
- the Instruction Register (IR) holds the Opcode for the Control Unit to base state decisions on
- the CCR_Result is the NZVC status bits from an ALU operation and influence state decisions
• Register Modeling
- each register in the processing unit can be loaded by the control unit.
- the input to most registers is Bus2
- the loads are synchronous to clock and occur on the following state
Instruction Register (IR)

IR_Register : process (Clock, Reset)
begin
  if (Reset = '0') then
    IR <= x"00";
  elsif (Clock'event and Clock='1') then
    if (IR_Load = '1') then
      IR <= Bus2;
    end if;
  end if;
end process;

Memory Address Register (MAR)

MAR_Register : process (Clock, Reset)
begin
  if (Reset = '0') then
    MAR <= x"00";
  elsif (Clock'event and Clock='1') then
    if (MAR_Load = '1') then
      MAR <= Bus2;
    end if;
  end if;
end process;
• Register Modeling Cont…

- The Program Counter needs a “load” and an “increment”
Program Counter (PC)

PC_Register : process (Clock, Reset)
begin
  if (Reset = '0') then
    PC <= x"00";
  elsif (Clock'event and Clock='1') then
    if (PC_Load = '1') then
      PC <= Bus2;
    elsif (PC_Inc = '1') then
      PC <= PC + 1;
    end if;
  end if;
end process;
X Register

X_Register : process (Clock, Reset)
begin
  if (Reset = '0') then
    X <= x"00";
  elsif (Clock'event and Clock='1') then
    if (X_Load = '1') then
      X <= Bus2;
    end if;
  end if;
end process;

Y Register

Y_Register : process (Clock, Reset)
begin
  if (Reset = '0') then
    Y <= x"00";
  elsif (Clock'event and Clock='1') then
    if (Y_Load = '1') then
      Y <= Bus2;
    end if;
  end if;
end process;

Z Register

Z_Register : process (Clock, Reset)
begin
  if (Reset = '0') then
    Z <= x"00";
  elsif (Clock'event and Clock='1') then
    if (Z_Load = '1') then
      Z <= Bus2;
    end if;
  end if;
end process;
• MUX Modeling
- The bus select signals come from the control unit. The Multiplexers are “combinational logic”
Bus 1

BUS1_CONTROL : process (Bus1_Sel, PC, X, Y)
begin
  case (Bus1_Sel) is
    when "00" => Bus1 <= PC;
    when "01" => Bus1 <= X;
    when "10" => Bus1 <= Y;
    when others => Bus1 <= "XXXXXXXX";
  end case;
end process;

Bus 2

BUS2_CONTROL : process (Bus2_Sel, ALU, Bus1, Memory_Out)
begin
  case (Bus2_Sel) is
    when "00" => Bus2 <= ALU;
    when "01" => Bus2 <= Bus1;
    when "10" => Bus2 <= Memory_Out;
    when others => Bus2 <= "XXXXXXXX";
  end case;
end process;
von Neumann Computer (ALU)
• ALU Modeling
- The ALU is combinational logic. It contains as many operations as desired. The operation being
performed is dictated by the control unit.
ALU
ALU_Functions : process (ALU_Sel, Z, Bus1)
begin
case (ALU_sel) is
when '0' => ALU <= Z and Bus1; -- AND
when '1' => ALU <= Z + Bus1; -- ADD
when others => ALU <= x"00";
end case;
end process;
von Neumann Computer (ALU)
• CCR Modeling
- The CCR is a register because we want it to hold the status flags across multiple instructions.
- Typical flags are: Negative (N), Zero (Z), 2’s Comp Overflow (V), and Carry (C)
- These flags are fed back to the control unit for state transition decisions during branch instructions
(i.e., Branch if Zero, Branch if Carry, etc.)
CCR example for Zero Flag
CCR_Register : process (Clock, Reset)
begin
  if (Reset = '0') then
    CCR_Result <= x"00";
  elsif (Clock'event and Clock='1') then
    if (CCR_Load = '1') then
      if (ALU = x"00") then
        CCR_Result <= "00000100"; -- Zero flag set
      else
        CCR_Result <= "00000000";
      end if;
    end if;
  end if;
end process;
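The Zero-flag example above can be extended to all four flags. The following is a minimal behavioral sketch in Python rather than VHDL, assuming the flags sit in the low four bits of the CCR in the order N, Z, V, C (consistent with the Zero flag at "00000100"). The function name alu_add_flags is hypothetical and only models an 8-bit ADD.

```python
def alu_add_flags(a, b):
    """Add two 8-bit values; return (result, ccr) with ccr laid out as 0b0000NZVC.

    Assumption: N, Z, V, C occupy bits 3..0, matching the Zero flag position
    in the CCR example. This is an illustrative model, not the slide's design.
    """
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    n = (result >> 7) & 1                      # sign bit of the result
    z = 1 if result == 0 else 0                # result is all zeros
    # 2's complement overflow: operands share a sign that differs from the result's sign
    v = ((~(a ^ b) & (a ^ result)) >> 7) & 1
    c = (total >> 8) & 1                       # carry out of bit 7
    ccr = (n << 3) | (z << 2) | (v << 1) | c
    return result, ccr
```

For example, adding 0x7F and 0x01 sets N and V but not C, while adding 0xFF and 0x01 sets Z and C.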
von Neumann Computer (Control Unit)
• Sequence Control Modeling

- The control unit is the finite state machine that handles the computer operations of:
Fetch, Decode, & Execute
- It consists of a single state transition path for Fetch & Decode followed by a set of parallel paths
which handle the execution of each instruction in the instruction set of the microcomputer.
- The Sequence Controller creates all of the control signals which drive the processing unit & ALU.
- Its inputs include:
- The Instruction Register (for decoding the Opcode)

- The Condition Code Register (for branching)
• Sequence Control State Diagram

- Example State Paths for:
1) Load X with Immediate Addressing

2) Store X with Direct Addressing
3) Branch Always
Fetch States handle reading the OpCode from memory and placing it in the Instruction Register.
Decode State(s) handle giving time for the state machine to decide which instruction was read.
Execute State(s) perform the specific operation for each of the instructions in the microcomputer’s instruction set.
• Sequence Controller Modeling

- Instruction mnemonics can be symbolized using “generics”
Mnemonics for 3 instructions
generic (LDX_IMM : STD_LOGIC_VECTOR (7 downto 0) := x"86"; -- Load Register X with Immediate Addressing
STX_DIR : STD_LOGIC_VECTOR (7 downto 0) := x"96"; -- Store Register X to memory (RAM or IO)
BRA : STD_LOGIC_VECTOR (7 downto 0) := x"20"); -- Branch Always
- States are included as instructions are added to the instruction set

State Names for executing 3 instructions
type State_Type is (S_FETCH_0, S_FETCH_1, S_FETCH_2, -- States to Fetch Opcode
S_DECODE_3, -- State to Decode Opcode
S_LXIMM_4, S_LXIMM_5, S_LXIMM_6, -- States for LDX_IMM Instruction
S_STXDIR_4, S_STXDIR_5, S_STXDIR_6, S_STXDIR_7, -- States for STX_DIR Instruction
S_BRA_4, S_BRA_5, S_BRA_6); -- States for BRA Instruction

- The FSM is then modeled using the traditional 3-process technique in VHDL
Next State Memory
STATE_MEMORY : process (Clock, Reset)
begin
  if (Reset = '0') then
    Current_State <= S_FETCH_0; -- State upon reset
  elsif (Clock'event and Clock='1') then
    Current_State <= Next_State; -- Normal Operation
  end if;
end process STATE_MEMORY;

Next State Logic
NEXT_STATE_LOGIC : process (Current_State, IR)
begin
case (Current_State) is
when S_FETCH_0 => Next_State <= S_FETCH_1; -- Fetch First Opcode

when S_FETCH_1 => Next_State <= S_FETCH_2;
when S_FETCH_2 => Next_State <= S_DECODE_3;
when S_DECODE_3 => if (IR = LDX_IMM) then
Next_State <= S_LXIMM_4; -- LDX_IMM Instruction
elsif (IR = STX_DIR) then
Next_State <= S_STXDIR_4; -- STX_DIR Instruction
elsif (IR = BRA) then
Next_State <= S_BRA_4; -- BRA Instruction
end if;
when S_LXIMM_4 => Next_State <= S_LXIMM_5; -- States when the instruction is Load X Immediate
when S_LXIMM_5 => Next_State <= S_LXIMM_6;
when S_LXIMM_6 => Next_State <= S_FETCH_0;
when S_STXDIR_4 => Next_State <= S_STXDIR_5; -- States when the instruction is Store X Direct
when S_STXDIR_5 => Next_State <= S_STXDIR_6;
when S_STXDIR_6 => Next_State <= S_STXDIR_7;
when S_STXDIR_7 => Next_State <= S_FETCH_0;
when S_BRA_4 => Next_State <= S_BRA_5; -- States when the instruction is a Branch Always
when S_BRA_5 => Next_State <= S_BRA_6;
when S_BRA_6 => Next_State <= S_FETCH_0;
when others => Next_State <= S_FETCH_0;

end case;
end process NEXT_STATE_LOGIC;
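To check the state paths by hand, the next-state logic can be mirrored in a small table-driven model. This is a rough Python sketch, not part of the VHDL design: the state names and opcodes are taken from the type declaration and generics above, while next_state and trace are hypothetical helper names. One assumption is labeled in the code: an unrecognized opcode in S_DECODE_3 returns to S_FETCH_0, a case the VHDL leaves unassigned.

```python
LDX_IMM, STX_DIR, BRA = 0x86, 0x96, 0x20   # opcodes from the generics

# Unconditional transitions, copied from the case statement.
NEXT = {
    "S_FETCH_0": "S_FETCH_1",
    "S_FETCH_1": "S_FETCH_2",
    "S_FETCH_2": "S_DECODE_3",
    "S_LXIMM_4": "S_LXIMM_5",
    "S_LXIMM_5": "S_LXIMM_6",
    "S_LXIMM_6": "S_FETCH_0",
    "S_STXDIR_4": "S_STXDIR_5",
    "S_STXDIR_5": "S_STXDIR_6",
    "S_STXDIR_6": "S_STXDIR_7",
    "S_STXDIR_7": "S_FETCH_0",
    "S_BRA_4": "S_BRA_5",
    "S_BRA_5": "S_BRA_6",
    "S_BRA_6": "S_FETCH_0",
}

# The decode state picks the execute path from the opcode in IR.
DECODE = {LDX_IMM: "S_LXIMM_4", STX_DIR: "S_STXDIR_4", BRA: "S_BRA_4"}


def next_state(current, ir):
    if current == "S_DECODE_3":
        # Assumption: unknown opcodes go back to fetch.
        return DECODE.get(ir, "S_FETCH_0")
    return NEXT.get(current, "S_FETCH_0")   # mirrors "when others"


def trace(ir):
    """List the states visited for one instruction, from S_FETCH_0 back to S_FETCH_0."""
    states = ["S_FETCH_0"]
    while True:
        states.append(next_state(states[-1], ir))
        if states[-1] == "S_FETCH_0":
            return states
```

Calling trace(LDX_IMM) walks the three fetch states, the decode state, and the three S_LXIMM states before returning to S_FETCH_0.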

Output Logic
OUTPUT_LOGIC : process (Current_State) -- Moore Type
begin
case (Current_State) is
when S_FETCH_0 => Bus1_Sel <= "00"; -- Bus1_Sel = PC
Bus2_Sel <= "01"; -- Bus2_Sel = Bus1
IR_Load <= '0';
MAR_Load <= '1'; -- Mar Load
PC_Load <= '0';
PC_Inc <= '0';
X_Load <= '0';
Y_Load <= '0';
Z_Load <= '0';
Write <= '0';
ALU_Sel <= '0';
CCR_Load <= '0';
when S_FETCH_1 => Bus1_Sel <= "00";

Bus2_Sel <= "10"; -- Bus2_Sel = Memory_Out
IR_Load <= '0';
MAR_Load <= '0';
PC_Load <= '0';
PC_Inc <= '1'; -- PC Inc
X_Load <= '0';
Y_Load <= '0';
Z_Load <= '0';
Write <= '0';
ALU_Sel <= '0';
CCR_Load <= '0';
Gumnut Core in VHDL
VHDL
The Gumnut Core

 A small 8-bit soft core
 Can be used in FPGA designs
 Instruction set illustrates features typical of 8-bit cores and processors in general
 Programs written in assembly language
 Each processor instruction written explicitly
 Translated to binary representation by an assembler
 Resources available on companion web site
Gumnut Storage
 General-Purpose Registers: r0 (always 0), r1 to r7
 Condition Code Registers: C (Carry), Z (Zero)
 Program Counter: PC
 Data Memory: 256 × 8-bit, 8-bit addresses (0 to 255)
 Instruction Memory: 4K × 18-bit, 12-bit addresses (0 to 4095)
 Design questions
 How many registers should there be?
 How many registers should you encode for in the instruction? Two? Three?
Arithmetic Instructions
 Operate on register data and put result
in a register
 add, addc, sub, subc
 Can have immediate value operand
 Condition codes
 Z: 1 if result is zero, 0 if result is non-zero
 C: carry out of add/addc, borrow out of
sub/subc
 addc and subc include C bit in
operation
Arithmetic Instructions
 Examples
 add r3, r4, r1
 add r5, r1, 2
 sub r4, r4, 1
 Evaluate 2x + 1; x in r3, result in r4
 add r4, r3, r3 ; double x
add r4, r4, 1 ; then add 1
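The effect of these instructions on the result and on the C and Z condition codes can be modeled in a few lines. This is a minimal Python sketch, assuming 8-bit wraparound and the flag rules listed on the previous slide. The helper names add8, sub8 and two_x_plus_one are hypothetical and are not Gumnut tools.

```python
def add8(a, b, carry_in=0):
    """Model add/addc on 8-bit operands; returns (result, C, Z)."""
    total = (a & 0xFF) + (b & 0xFF) + carry_in
    result = total & 0xFF
    return result, total >> 8, int(result == 0)


def sub8(a, b, borrow_in=0):
    """Model sub/subc on 8-bit operands; C holds the borrow out."""
    diff = (a & 0xFF) - (b & 0xFF) - borrow_in
    result = diff & 0xFF
    return result, int(diff < 0), int(result == 0)


def two_x_plus_one(x):
    """Evaluate 2x + 1 as x + x followed by + 1, ignoring overflow."""
    r4, _, _ = add8(x, x)      # double x
    r4, _, _ = add8(r4, 1)     # then add 1
    return r4
```

With x = 20 the sketch yields 41. With x = 128 the doubling wraps to 0 with C set, so the 8-bit result is 1.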
Logical Instructions
 Operate on register data and put result
in a register
 and, or, xor, mask (and not)
 Operate bitwise on 8-bit operands
 Can have immediate value operand
 Condition codes
 Z: 1 if result is zero, 0 if result is non-zero
 C: always 0
Logical Instructions
 Examples
 and r3, r4, r5
 or r1, r1, 0x80 ; set r1(7)
 xor r5, r5, 0xFF ; invert r5
 Set Z if least-significant 4 bits of r2 are 0101
 and r1, r2, 0x0F ; clear high bits
sub r0, r1, 0x05 ; compare with 0101
Shift Instructions
 Logical shift/rotate register data and
put result in a register
 shl, shr, rol, ror
 Count specified as a literal operand
 Condition codes
 Z: 1 if result is zero, 0 if result is non-zero
 C: the value of the last bit shifted/rotated
past the end of the byte
Shift Instructions
 Examples
 shl r4, r1, 3
 ror r2, r2, 4
 Multiply r4 by 8, ignoring overflow
 shl r4, r4, 3
 Multiply r4 by 10, ignoring overflow
 shl r1, r4, 1 ; multiply by 2
shl r4, r4, 3 ; multiply by 8
add r4, r4, r1
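The multiply-by-10 sequence works because 10x = 2x + 8x. The following Python sketch models it under the assumption of 8-bit registers with overflow ignored, as stated in the example. shl8 and times_ten are hypothetical names, and returning C = 0 for a shift count of 0 is an assumption of this sketch.

```python
def shl8(value, count):
    """Logical shift left of an 8-bit value.

    Returns (result, C), where C is the last bit shifted past the end of the byte.
    Assumption: C is 0 when count is 0.
    """
    result, carry = value & 0xFF, 0
    for _ in range(count):
        carry = (result >> 7) & 1
        result = (result << 1) & 0xFF
    return result, carry


def times_ten(r4):
    """Mirror the three-instruction sequence: shl by 1, shl by 3, add."""
    r1, _ = shl8(r4, 1)        # r1 = 2 * r4
    r4, _ = shl8(r4, 3)        # r4 = 8 * r4
    return (r4 + r1) & 0xFF    # r4 = 8x + 2x, truncated to 8 bits
```

For inputs up to 25 the result is exact; larger inputs wrap modulo 256.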
Memory Instructions
 Transfer data between registers and data
memory
 Compute address by adding an offset to a base
register value
 Load register from memory
 ldm r1, (r2)+5
 Store from register to memory
 stm r1, (r4)-2
 Use r0 if base address is 0
 ldm r3, 23 ≡ ldm r3, (r0)+23
 Condition codes not affected
Memory Instructions
 Increment a 16-bit integer in memory
 Little-endian: address of lsb in r2, msb in next
location
 ldm r1, (r2) ; increment lsb
add r1, r1, 1
stm r1, (r2)
ldm r1, (r2)+1 ; increment msb
addc r1, r1, 0 ; with carry
stm r1, (r2)+1
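The same carry propagation can be written out in a higher-level form. This is a small Python sketch, assuming data memory is modeled as a list of byte values. increment16 is a hypothetical name that follows the six instructions above step by step.

```python
def increment16(memory, addr):
    """Increment a little-endian 16-bit integer stored at memory[addr], memory[addr + 1]."""
    low = memory[addr] + 1                 # ldm / add: increment the least significant byte
    memory[addr] = low & 0xFF              # stm
    carry = low >> 8                       # C flag produced by the add
    high = memory[addr + 1] + carry        # ldm / addc r1, r1, 0: add only the carry
    memory[addr + 1] = high & 0xFF         # stm
```

Starting from bytes [0xFF, 0x00] (the value 0x00FF), one call leaves [0x00, 0x01] (the value 0x0100).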
Input/Output Instructions
 I/O controllers have registers that govern
their operation
 Each has an address, like data memory
 Gumnut has separate data and I/O address spaces
 Input from I/O register
 inp r3, 157 ≡ inp r3, (r0)+157
 Output to I/O register
 out r3, (r7) ≡ out r3, (r7)+0
 Condition codes not affected
 Further examples in Chapter 8
Branch Instructions
 Programs can evaluate conditions and take
alternate courses of action
 Condition codes (Z,C) represent outcomes of
arithmetic/logical/shift instructions
 Branch instructions examine Z or C
 bz, bnz, bc, bnc
 Add a displacement to PC if condition is true
 Specifies how many instructions forward or
backward to skip
 Counting from instruction after branch
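Because the displacement counts from the instruction after the branch, the target is the branch address plus one plus the displacement. A minimal Python sketch of that rule follows. branch_target is a hypothetical name, and wrapping to 12 bits is an assumption based on the 4K instruction memory.

```python
def branch_target(branch_addr, disp, taken):
    """Return the next PC for a branch located at branch_addr.

    disp is the signed displacement, counted from the instruction after the branch.
    Assumption: the PC wraps within the 12-bit instruction address space.
    """
    next_pc = branch_addr + 1
    if taken:
        next_pc += disp
    return next_pc & 0xFFF
```

For a branch at address 11 with displacement +2, a taken branch continues at address 14 and a branch not taken continues at address 12.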
Branch Example
 Elapsed seconds in location 100
 Increment, wrapping to 0 after 59
 ldm r1, 100
add r1, r1, 1
sub r0, r1, 60 ; Z set if r1 = 60
bnz +1 ; skip to store if Z is 0
add r1, r0, 0 ; otherwise wrap r1 to 0
stm r1, 100
Jump Instruction
 Unconditionally skips forward or backward to
specified address
 Changes the PC to the address
 Example: if r1 = 0, clear data location 100 to
0; otherwise clear location 200 to 0
 Assume instructions start at address 10
 10: sub r0, r1, 0
11: bnz +2
12: stm r0, 100
13: jmp 15
14: stm r0, 200
15: …
Subroutines
 A sequence of instructions that perform some operation
 Can call them from different parts of a
program using a jsb instruction
 Subroutine returns with a ret instruction
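[Figure: the main program executes "jsb m" at two different points; each call transfers control to the subroutine instructions at label m, which end with "ret".]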
Subroutine Example
 Subroutine to increment second count
 Address of count in r2
 ldm r1, (r2)
add r1, r1, 1
sub r0, r1, 60
bnz +1
add r1, r0, 0
stm r1, (r2)
ret
 Call to increment locations 100 and 102
 add r2, r0, 100
jsb 20
add r2, r0, 102
jsb 20
Return Address Stack

 The jsb saves the return address for
use by the ret
 But what if the subroutine includes a jsb?
 Gumnut core includes an 8-entry push-down stack of return addresses
[Figure: the return address stack during nested calls. The return address for the first call is at the bottom, the return address for the second call is pushed above it, and the return address for the third call is on top.]
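The push-down behavior can be sketched as a small stack model. This Python sketch assumes that jsb pushes the address of the instruction following the call and that ret pops the most recent entry. ReturnStack is a hypothetical class, and raising an error when more than 8 calls are nested is an assumption, since the slide does not describe overflow behavior.

```python
class ReturnStack:
    """Illustrative model of an 8-entry return address stack."""

    DEPTH = 8

    def __init__(self):
        self._entries = []

    def jsb(self, pc, target):
        """Call from address pc: save the return address and jump to target."""
        if len(self._entries) == self.DEPTH:
            # Assumption: overflow is treated as an error in this sketch.
            raise OverflowError("return address stack is full")
        self._entries.append(pc + 1)
        return target

    def ret(self):
        """Return: pop the most recently saved return address."""
        return self._entries.pop()
```

A subroutine called from address 5 that itself calls another subroutine from address 22 returns first to 23 and then to 6.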
Miscellaneous Instructions
 Instructions supporting interrupts
 See Chapter 8 (more later)
 reti Return from interrupt
 enai Enable interrupts
 disi Disable interrupts
 wait Wait for an interrupt
 stby Stand by in low power mode until
an interrupt occurs
The Gumnut Assembler

 Gasm: translates assembly programs
 Generates memory images for program
text (binary-coded instructions) and data
 See documentation on web site
 Write a program as a text file

 Instructions
 Directives
 Comments
 Use symbolic labels
Example Program
; Program to determine greater of value_1 and value_2
text
org 0x000 ; start here on reset
jmp main
; Data memory layout
data
value_1: byte 10
value_2: byte 20
result: bss 1
; Main program
text
org 0x010
main: ldm r1, value_1 ; load values
ldm r2, value_2
sub r0, r1, r2 ; compare values
bc value_2_greater
stm r1, result ; value_1 is greater
jmp finish
value_2_greater: stm r2, result ; value_2 is greater
finish: jmp finish ; idle loop
Gumnut Instruction Encoding

 Instructions are a form of information
 Can be encoded in binary
 Gumnut encoding
 18 bits per instruction
 Divided into fields representing different
aspects of the instruction

 Opcodes and function codes
 Register numbers
 Addresses
The VAX has a computer architecture with easily the most complex instruction set. The instruction set has a highly variable format where the minimal instruction length is 1 byte and the longest instruction is 37 bytes (296 bits).
Gumnut Instruction Encoding

Format                     Field widths (bits)   Fields (most significant bit first; "-" is unused)
Arith/Logical Register     4 3 3 3 2 3           1110 | rd | rs | rs2 | - | fn
Arith/Logical Immediate    1 3 3 3 8             0 | fn | rd | rs | immed
Shift                      3 1 3 3 3 3 2         110 | - | rd | rs | count | - | fn
Memory, I/O                2 2 3 3 8             10 | fn | rd | rs | offset
Branch                     6 2 2 8               111110 | fn | - | disp
Jump                       5 1 12                11110 | fn | addr
Miscellaneous              7 3 8                 1111110 | fn | -
Encoding Examples
 Encoding for addc r3, r5, 24
 Arithmetic immediate, fn = 001
 Field widths: 1 3 3 3 8
 Fields: 0 | fn | rd | rs | immed
 Bits: 0 | 001 | 011 | 101 | 00011000 → 05D18
 Instruction encoded by 3ECFC
 Bits: 111110 | 11 | 00 | 11111100
 Branch format (field widths 6 2 2 8): 111110 | fn | - | disp → bnc -4
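Both examples can be reproduced with a few shifts and masks. This is a minimal Python sketch, assuming the field positions implied by the field widths shown (most significant field first). encode_arith_immediate and decode_branch are hypothetical helper names, not part of the Gasm assembler. Only the two function codes given on the slide are relied on: 001 for addc and 11 for bnc.

```python
def encode_arith_immediate(fn, rd, rs, immed):
    """Pack an arithmetic/logical immediate instruction: 0 | fn(3) | rd(3) | rs(3) | immed(8)."""
    return (fn << 14) | (rd << 11) | (rs << 8) | (immed & 0xFF)


def decode_branch(word):
    """Unpack a branch instruction: 111110 | fn(2) | unused(2) | disp(8).

    Returns (fn, disp) with disp as a signed 8-bit displacement.
    """
    if (word >> 12) != 0b111110:
        raise ValueError("not a branch instruction")
    fn = (word >> 10) & 0b11
    disp = word & 0xFF
    if disp & 0x80:            # sign-extend the 8-bit displacement
        disp -= 0x100
    return fn, disp
```

Encoding fn = 001, rd = 3, rs = 5, immed = 24 gives 0x05D18, and the 18-bit branch pattern in the second example (0x3ECFC when read as hex) decodes to fn = 11 with a displacement of -4.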
Other Instruction Sets

 8-bit cores and microcontrollers
 Xilinx PicoBlaze: like Gumnut
 8051, and numerous like it
 Originated as 8-bit microprocessors
 Instructions encoded as one or more bytes
 Instruction set is more complex and irregular
 Complex instruction set computer (CISC)
 C.f. Reduced instruction set computer (RISC)
 16-, 32- and 64-bit cores
 Mostly RISC
 E.g., PowerPC, ARM, MIPS, Tensilica, …
Instruction and Data Memory

 In embedded systems
 Instruction memory is usually ROM, flash, SRAM, or combination
 Data memory is usually SRAM
 DRAM if large capacity needed

 Processor/memory interfacing
 Gluing the signals together
Example: Gumnut Memory
[Figure: the gumnut core connected to an instruction ROM and a data SRAM, all clocked by clk_i; the core also has rst_i. Instruction side signals: inst_cyc_o, inst_stb_o, inst_ack_i, inst_adr_o, inst_dat_i. Data side signals: data_cyc_o, data_stb_o, data_we_o, data_ack_i, data_adr_o, data_dat_o, data_dat_i.]
IMem : process (clk) is
begin
  if rising_edge(clk) then
    if inst_cyc_o = '1' and inst_stb_o = '1' then
      inst_dat_i <=
        instr_ROM(to_integer(inst_adr_o(10 downto 0)));
      inst_ack_i <= '1';
    else
      inst_ack_i <= '0';
    end if;
  end if;
end process IMem;
DMem : process (clk) is
begin
  if rising_edge(clk) then
    if data_cyc_o = '1' and data_stb_o = '1' then
      if data_we_o = '1' then
        data_RAM(to_integer(data_adr_o)) <= data_dat_o;
        data_dat_i <= data_dat_o;
        data_ack_i <= '1';
      else
        data_dat_i <= data_RAM(to_integer(data_adr_o));
        data_ack_i <= '1';
      end if;
    else
      data_ack_i <= '0';
    end if;
  end if;
end process DMem;
Example: Microcontroller Memory
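[Figure: an 8051 connected to an SRAM. P2 drives A(15..8). P0 connects to the SRAM data pins D and to a latch (D, Q, LE) whose output drives A(7..0), with ALE driving LE. PSEN drives A(16), WR drives WE, and RD drives OE. The SRAM also has a CE input.]
PSEN (program store enable)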
32-bit Memory
 Four bytes per memory word
 Little-endian: least significant byte at lowest address
 Big-endian: most significant byte at lowest address
0 1 2 3
4 5 6 7
8 9 10 11
 Partial-word read
 Read all bytes, processor selects those needed
 Partial-word write
 Use byte-enable signals
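Byte ordering and byte-enable writes can be illustrated with a byte-addressed memory model. This is a minimal Python sketch, assuming memory is a bytearray and that one enable flag controls each of the four byte lanes. store_word, load_word and write_with_byte_enables are hypothetical names.

```python
def store_word(memory, addr, value, byteorder):
    """Store a 32-bit value at addr; byteorder is "little" or "big"."""
    memory[addr:addr + 4] = value.to_bytes(4, byteorder)


def load_word(memory, addr, byteorder):
    """Read all four bytes of the word at addr and assemble them."""
    return int.from_bytes(memory[addr:addr + 4], byteorder)


def write_with_byte_enables(memory, word_addr, data, byte_enable):
    """Partial-word write: only lanes whose enable is set are updated."""
    for lane in range(4):
        if byte_enable[lane]:
            memory[word_addr + lane] = data[lane]
```

Storing 0x11223344 at address 4 places 0x44 at address 4 in little-endian order and 0x11 at address 4 in big-endian order.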
Example: MicroBlaze Memory

[Figure: a MicroBlaze processor connected to four byte-wide SSRAMs, one per byte lane (bits 0:7, 8:15, 16:23 and 24:31 of Data_Write and Data_Read). Addr bits 2:16 drive the A input of every SSRAM. Processor signals shown: Addr, Data_Write, Data_Read, AS, Write_Strobe, Read_Strobe, Byte_Enable(0) to Byte_Enable(3), Ready (drawn next to +V) and Clk. Each SSRAM has A, D_in, D_out, en, wr and clk.]
Cache Memory
 For high-performance processors
 Memory access time is several clock cycles
 Performance bottleneck
 Cache memory
 Small fast memory attached to a processor
 Stores most frequently accessed items,
plus adjacent items

 Locality: those items are most likely to be
accessed again soon
Cache Memory
 Memory contents divided into fixed-sized blocks (lines)
 Cache copies whole lines from memory
 When processor accesses an item
 If item is in cache: hit - fast access
 Occurs most of the time
 If item is not in cache: miss
 Line containing item is copied from memory
 Slower, but less frequent
 May need to replace a line already in cache
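Hit and miss behavior can be seen in a toy model. This Python sketch assumes a direct-mapped organization with 4 lines of 4 bytes each. The slides do not specify an organization or sizes, so these are illustrative choices. DirectMappedCache is a hypothetical class that tracks only tags, not data.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that records which memory line occupies each slot."""

    def __init__(self, num_lines=4, line_size=4):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a hit; on a miss, copy the line in and return False."""
        block = address // self.line_size      # which memory line holds the item
        index = block % self.num_lines         # the only slot that line may use
        tag = block // self.num_lines          # identifies the line within that slot
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.misses += 1
        self.tags[index] = tag                 # replaces any line already in the slot
        return False
```

Accessing addresses 0, 1, 2, 3 in turn misses once and then hits three times, because the first miss copies the whole line. A later access to address 64 maps to the same slot and replaces that line.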
Fast Main Memory Access

 Optimize memory for line access by cache
 Wide memory
 Read a line in one access
 Burst transfers
 Send starting address, then read successive locations
 Pipelining
 Overlapping stages of memory access
 E.g., address transfer, memory operation, data transfer
 Double data rate (DDR), Quad data rate (QDR)
 Transfer on both rising and falling clock edges
Summary
 Embedded computer
 Processor, memory, I/O controllers, buses
 Microprocessors, microcontrollers, and processor cores
 Soft-core processors for ASIC/FPGA
 Processor instruction sets

 Binary encoding for instructions
 Assembly language programs
 Memory interfacing