• Agenda
1970's
- designers used Paper/Pencil & Boolean Equations to create schematics
- the drawback :
- each flop required a Boolean equation
- impractical in large designs
1980's
- schematic based designs using electronic editors
- this enabled Copy/Paste & Hierarchy
- Design-reuse was enabled which increased design sizes
mid 80's
- HDL's became more common (created mid 80's)
- Text-based Compilers (C, PASCAL) could be adapted to perform digital simulation
- Larger Designs could be described using text
[Figure: Design and Simulation are done together; Physical Implementation is still separate]
2
History
• More recently
1990's
- Synthesis became practical as the computational power of computers increased
3
HDL
• Real Power
"HDL"
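   if (Sel = 0)
      Out = A
   else
      Out = B
[Figure: the same HDL text feeds both "Simulation" and "Synthesis"; the circuit shown is a multiplexer with inputs A and B, select Sel, and output Out]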
4
HDL
• Abstraction
Engineers could now stay at a higher level of abstraction and rely on the tools to
1) Simulate the design
2) Synthesize the circuitry
- Since HW is expensive to build, using the tools to reduce prototyping was the next step
5
HDL
• Timing Verification
[Flow: HDL -> Place/Route (extract RC's) -> Post Implementation Simulation -> Match? -> Fab]
6
Hardware Description Languages vs.
Programming Languages
• Program structure
– instantiation of multiple components of the same type
– specify interconnections between modules via schematic
– hierarchy of modules (only leaves can be HDL in Xilinx Foundation)
• Assignment
– continuous assignment (logic always computes)
– propagation delay (computation takes time)
– timing of signals is important (when does computation have its effect)
• Data structures
– size explicitly spelled out - no dynamic structures
– no pointers
• Parallelism
– hardware is naturally parallel (must support multiple threads)
– assignments can occur in parallel (not just sequentially)
7
Hardware Description Languages and
Combinational Logic
• Modules - specification of inputs, outputs, bidirectional, and
internal signals
• Continuous assignment - a gate's output is a function of its
inputs at all times (doesn't need to wait to be "called")
• Propagation delay- concept of time and delay in input affecting
gate output
• Composition - connecting modules together with wires
• Hierarchy - modules encapsulate functional blocks
• Specification of don't care conditions (accomplished by setting
output to “x”)
8
Hardware Description Languages and
Sequential Logic
• Flip-flops
– representation of clocks - timing of state changes
– asynchronous vs. synchronous
• FSMs
– structural view (FFs separate from combinational logic)
– behavioral view (synthesis of sequencers)
• Data-paths = ALUs + registers (e.g. Combination Lock)
– use of arithmetic/logical operators
– control of storage elements
• Parallelism
– multiple state machines running in parallel
• Sequential don't cares
9
Design Abstraction
• At What level can we design?
10
Design Abstraction
• What does abstraction give us?
11
VHDL/Verilog: Structure/Behavior
• Supports structural and behavioral descriptions
• Structural
– explicit structure of the circuit
– e.g., each logic gate instantiated and connected to others
• Behavioral
– program describes input/output behavior of circuit
– many structural implementations could have same behavior
– e.g., different implementation of one Boolean function
• We’ll only be using behavioral VHDL/Verilog in our design work
– rely on schematic when we want structural descriptions
12
Modern Digital Design Flow
• Designing Large Digital Circuits
13
Digital Design Flow
• Designing Large Digital Circuits
- this is reality
14
Digital Design Flow
• A More Detailed Breakdown
[Figure: design flow steps and their relation to our class (HW or Lab Assignment)]
15
Hardware Design Flow
16
Digital Implementation
• What options do we have for hardware implementation?
- Discrete Devices (i.e., go to the stock room and buy NAND gates & Flip-flops)
17
FPGA's
• What is an FPGA
- we set the config bits of this block to set its Boolean logic function
18
FPGA's
• LUTs = Look Up Tables
- we can program the LUTs to be whatever type of gate is needed by the design
- there are a finite number of LUTs within a given FPGA (also called "resources")
19
FPGA's
• Programmable Interconnect
20
FPGA's
• Configuration
21
FPGA's
• Configuration
- The interconnect switches are then programmed to implement the net connections
[Figure: inputs B and C routed through the interconnect switches (X) to LUTs configured as INV and OR, driving Out]
22
FPGA's
• Configuration
- We now understand where the name “Field Programmable Gate Array” comes from.
23
FPGA's
• Adding More Functionality
- They put a DFF next to a 4-Input LUT to form a "Configurable Logic Block" (CLB)
[Figure: array of CLBs joined by programmable interconnect switches (X)]
24
FPGA's
• Adding Even More Functionality
- Block RAM
- Adders / Multipliers
- Global Clock Buffers
- even Microprocessors!
25
FPGA's
• What else can we program?
- CMOS_33, CMOS25
- SSTL, SSTL2, etc…
26
VHDL
• Agenda
- Simulation was a given, since the designs were already in text and we had text compilers (C, ….)
28
VHDL History
• VHDL & IEEE
29
VHDL History
• VHDL & IEEE
- a "transceiver" has both a transmit (i.e., a gate facing out) and receive (i.e., a gate facing in)
30
VHDL History
• VHDL & IEEE
- but that circuit doesn't actually work, because the driving gate will always be driving the bus
Tx/Rx
Tx/Rx'
31
VHDL History
• VHDL & IEEE
- High Impedance
Tx/Rx
Tx/Rx
Tx/Rx
- this is how real circuits behave: a strong driver controls the bus when every other driver is High-Z
- VHDL's built-in types (bit and bit_vector) can only be 0 or 1, so they cannot model this
- Weak/Strong
- Some busses have multiple drivers, but some are weaker than others (e.g., MCAN)
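- A minimal sketch of one side of such a transceiver, assuming the STD_LOGIC type so that 'Z' is available. The entity, architecture, and port names (transceiver, tx_en, data_out, data_in, bus_io) are hypothetical and do not come from the slides.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical transceiver; all names are illustrative.
entity transceiver is
  port (tx_en    : in    std_logic;   -- '1' = drive the bus, '0' = release it
        data_out : in    std_logic;   -- value to transmit
        data_in  : out   std_logic;   -- value received from the bus
        bus_io   : inout std_logic);  -- shared bus wire
end entity transceiver;

architecture transceiver_arch of transceiver is
begin
  -- Drive the bus only while transmitting; otherwise go High-Z
  bus_io  <= data_out when tx_en = '1' else 'Z';
  -- The receiver always listens to whatever is on the bus
  data_in <= bus_io;
end architecture transceiver_arch;
```

- With type bit this could not be written, since 'Z' is not one of its values.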
32
VHDL History
• VHDL & IEEE
- VHDL allows users to come up with their own data types. Since the world needed multi-valued logic,
everyone started creating their own add-on packages.
- this created a lot of confusion when multiple vendors worked together (i.e., Fab Shop and Designer)
- IEEE 1164 - added support for Multi-Valued Logic (the STD_LOGIC type) through the "STD_LOGIC_1164" package
- better syntax consistency
- Every time there is a need for a data type, industry will start to create add-ons. Then IEEE
will create a standard to reduce confusion
- The last revision of VHDL, in 2003 (1076.3), is considered by most to be the most recent major release
- Although people are talking about VHDL 2006 (which has now turned into VHDL 200x)
33
VHDL History
• At What level can we design?
34
VHDL History
• What does abstraction give us?
35
VHDL Systems and Signals
• Systems
Behavior Structure
- We can describe an "Adder" system in multiple ways and at multiple levels of abstraction
36
VHDL Systems and Signals
• System Interface
Adder
In1
Out
In2
37
VHDL Systems and Signals
• System Behavior
Adder
In1
Out
In2
1) Interface
2) Behavior
38
VHDL Systems and Signals
• Signals
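[Figure: three Adder blocks, each with inputs In1 and In2 and output Out, wired together; connections between the blocks are Internal Signals, connections that leave the system are External Signals]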
39
VHDL Entity
• VHDL
entity declaration
architecture definition
40
VHDL Entity
• More Syntax Notes
41
VHDL Entity
• Entity Details
42
VHDL Entity
• Entity Syntax
entity entity-name is
43
VHDL Entity
• Entity Example
NOTES: - we can also put "Generics" within an entity, which are constants whose values can be set for each instance of the entity
44
VHDL Entity
• Systems in VHDL adder.vhd
entity declaration
- Systems need to have two things described
45
VHDL Architecture
• Architecture Details
- an architecture is always associated with an entity (usually in the same file)
2) entity-name - the name of the entity that this architecture is associated with
- must already be declared before compile
46
VHDL Architecture
• Architecture Syntax
type…
signal…
constant…
function…
procedure…
component…
begin
…behavior or structure
NOTE: - the keywords are architecture, of, is, type…component, begin, end
- there is a ";" at the end of the last line
47
VHDL Architecture
• Architecture definition of an AND gate
begin
Out1 <= In1 and In2;
48
VHDL Packages
• VHDL is a "Strongly Typed" language…
- this means that assignments between different data types are not allowed.
- this means that operators must be defined for each data type.
1) Data Types
2) Operators
49
VHDL Packages
• Pre-defined Functionality
• Adding on Functionality
50
VHDL Packages
• IEEE Packages
- when functionality is needed in VHDL, engineers start creating add-ons using Packages
- when many packages exist to perform the same function (or are supposed to)
keeping consistency becomes a problem
- IEEE publishes "Standards" that give a consistent technique for engineers to use in VHDL
51
VHDL Packages
• Common IEEE Packages
STD_LOGIC_1164
STD_LOGIC_ARITH
STD_LOGIC_SIGNED
52
VHDL Design
• Let's Put it all together now…
begin
Out1 <= In1 and In2;
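- For reference, a sketch of what the complete design file could look like once the package, entity, and architecture are combined. The port names follow the and2 entity used elsewhere in these notes; the architecture name and2_arch is an assumption.

```vhdl
-- Package: brings in STD_LOGIC and its operators
library ieee;
use ieee.std_logic_1164.all;

-- Entity: the interface of the system
entity and2 is
  port (In1, In2 : in  std_logic;
        Out1     : out std_logic);
end entity and2;

-- Architecture: the behavior of the system
architecture and2_arch of and2 is
begin
  Out1 <= In1 and In2;
end architecture and2_arch;
```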
53
VHDL Design
• Another Example…
begin
Out1 <= not In1;
54
VHDL Data Types
• Signals
- a bus (or multiple bits represented with one name) is called a Vector
or
- the Most Significant Bit (MSB) is ALWAYS on the left of the range description:
data_bus(7 downto 0) : data_bus(7) = MSB
data_bus(0 to 7) : data_bus(0) = MSB
55
VHDL Data Types
• Signals
External - are outside the Entity's Interface and connect it to other systems
56
VHDL Data Types
• Scalar Data Types (Built into VHDL)
- scalar means that the type only has one value at any given time
Character - values are all symbols in the 8-bit ISO8859-1 set (i.e., Latin-1)
- examples are '0', '+', 'A', 'a', '\'
57
VHDL Data Types
• Array Data Types (Built into VHDL)
- unlimited range
- first element of array has index=0 (i.e., Addr_bus(0)…)
58
VHDL Data Types
• Physical Data Types (Built into VHDL)
- we can create our own descriptive types, useful for State Machine
- no quotes needed
59
VHDL Operators
• VHDL Operators
- We'll first start with the STANDARD Package that comes with VHDL
60
VHDL Operators
• Logical Operators
not
and
nand
or
nor
xor
xnor
61
VHDL Operators
• Numerical Operators
+ "addition"
- "subtraction"
* "multiplication"
/ "division"
mod "modulus"
rem "remainder"
abs "absolute value"
** "exponentiation"
Z <= A + B;
62
VHDL Operators
• Relational Operators
- works on types: BOOLEAN, BIT, BIT_VECTOR, CHARACTER, INTEGER, REAL, TIME, STRING
= "equal"
/= "not equal"
< "less than"
<= "less than or equal"
> "greater than"
>= "greater than or equal"
63
VHDL Operators
• Shift Operators
- a negative Number of Shifts (i.e., "-") is valid and reverses the direction of the shift
64
VHDL Operators
• Concatenation Operator
& "concatenate"
65
VHDL Operators
• Assignment Operators
Ex) x <= y;
a <= b or c;
sum <= x + y;
66
VHDL Operators
• Delay Modeling
- VHDL has two types of timing models that allow more accurate representation of real gates
1) Inertial Delay
2) Transport Delay
67
VHDL Operators
• Inertial Delay
- if the input has two edge transitions in less time than the inertial delay, the pulse is ignored
- this models the behavior of trying to charge up the gate capacitance of a MOSFET
68
VHDL Operators
• Transport Delay
- transport delay will always pass the pulse, no matter how small it is.
- we have to explicitly call out this type of delay using the "transport" keyword
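- A small sketch contrasting the two delay models on the same input. The entity and signal names are hypothetical. With a 2 ns delay, a 1 ns pulse on A should be swallowed on Y_inertial but should still appear, 2 ns late, on Y_transport.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical demo entity; names are illustrative.
entity delay_demo is
  port (A           : in  std_logic;
        Y_inertial  : out std_logic;
        Y_transport : out std_logic);
end entity delay_demo;

architecture delay_demo_arch of delay_demo is
begin
  -- Inertial delay is the default: pulses shorter than 2 ns are rejected
  Y_inertial  <= A after 2 ns;
  -- Transport delay must be requested explicitly: every pulse is passed
  Y_transport <= transport A after 2 ns;
end architecture delay_demo_arch;
```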
69
Generics vs. Constants
• Generics vs. Constants
1) Generics
2) Constants
70
Generics vs. Constants
• Generics
- declared in Entity
syntax:
generic (gen-name : gen-type := init-val);
71
Generics vs. Constants
• Constants
- declared in Architecture
- needs to be initialized
syntax:
constant const-name : const-type := init-val;
begin
Out1 <= not In1 after t_dly;
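- A sketch of the same inverter with the delay passed in as a generic instead of a constant. The entity name inv_dly and the default value are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical inverter whose delay is a generic.
entity inv_dly is
  generic (t_dly : time := 1 ns);   -- default; can be overridden per instance
  port (In1  : in  std_logic;
        Out1 : out std_logic);
end entity inv_dly;

architecture inv_dly_arch of inv_dly is
  -- A constant would instead be declared here and be fixed for every instance:
  --   constant t_dly : time := 1 ns;
begin
  Out1 <= not In1 after t_dly;
end architecture inv_dly_arch;

-- Overriding the generic when instantiating (inside a higher-level architecture):
--   U1 : inv_dly generic map (t_dly => 2 ns) port map (In1 => A, Out1 => A_n);
```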
72
VHDL Concurrent Signal Assignments
• Concurrency
- the way that our designs are simulated is important in modeling real HW behavior
- we simply list our signal assignments (<=) after the "begin" statement in the architecture
- each time any signal on the Right Hand Side (RHS) of the expression changes,
the Left Hand Side (LHS) of the assignment is updated.
73
VHDL Concurrent Signal Assignments
begin
74
VHDL Concurrent Signal Assignments
- if these are executed concurrently, does it model the real behavior of this circuit?
Yes, that is how these gates operate. We can see that there may be timing that
needs to be considered….
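- A sketch of two concurrent signal assignments, written in the "wrong" order on purpose. The entity and signal names are hypothetical. Because each assignment re-evaluates whenever a signal on its RHS changes, the order of the lines has no effect on the result.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical demo entity; names are illustrative.
entity concurrent_demo is
  port (A, B, C : in  std_logic;
        X       : out std_logic);
end entity concurrent_demo;

architecture concurrent_demo_arch of concurrent_demo is
  signal node1 : std_logic;
begin
  -- Listed first, but it still sees every update of node1
  X     <= node1 or C;
  -- Re-evaluated whenever A or B changes
  node1 <= A xor B;
end architecture concurrent_demo_arch;
```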
75
VHDL Concurrent Signal Assignments
- notice that we are assigning static values (0 and 1), this is essentially a "Truth Table"
- if using this notation, make sure to include every possible input condition, or else you haven't
described the full operation of the circuit.
76
VHDL Concurrent Signal Assignments
- Again, make sure to include every possible input condition, or else you haven't
described the full operation of the circuit.
- If you try to synthesize an incomplete description, the tool will start making stuff up!
77
VHDL Concurrent Signal Assignments
- We can also use a technique that allows the listing of "choices" and "assignments" in a comma
delimited fashion.
syntax:
- we use the term "others" to describe any input condition that isn't explicitly described
78
VHDL Concurrent Signal Assignments
Input  X
000    0
001    1
010    1
011    0
100    1
101    1
110    0
111    0

begin
   with Input select
      X <= '0' when "000",
           '1' when "001",
           '1' when "010",
           '0' when "011",
           '1' when "100",
           '1' when "101",
           '0' when "110",
           '0' when "111";
79
VHDL Concurrent Signal Assignments
Input  X
000    0
001    1
010    1
011    0
100    1
101    1
110    0
111    0

begin
   with Input select
      X <= '1' when "001" | "010" | "100" | "101",
           '0' when others;
80
VHDL Structural Design
• Structural Design
• Components
- blocks that already exist and are included into a higher level design
81
VHDL Structural Design
• Component Syntax
component component-name
end component;
82
VHDL Structural Design
• Component Example
entity xor2 is
entity or2 is
83
VHDL Structural Design
• Component Example
- now let's include the pre-existing entities "xor2" & "or2" into our "TOP" design
entity TOP is
port (A,B,C : in STD_LOGIC;
X : out STD_LOGIC);
end entity TOP;
begin
…..
84
VHDL Structural Design
• Signals
Internal "Signal"
Internal "Components"
85
VHDL Structural Design
• Signal Syntax
86
VHDL Structural Design
• Let's put the signal declaration into our Architecture
- now let's include the pre-existing entities "xor2" & "or2" into our "TOP" design
begin
…..
node1
87
VHDL Structural Design
• Component Instantiation
- after the "begin" keyword, we can start adding components and connecting signals
syntax:
NOTE: - "label" is a unique reference designator for that component (U1, INV1, UUT1)
- the signals with in the ( ) of the port map define how signals are connected
to the ports of the instantiated component
88
VHDL Structural Design
• Port Maps
1) Positional
- signals to be connected to the component are listed in the same order as the component's ports
2) Explicit
- signals to be connected to the component are explicitly linked to the port names of the
component using the "=>" notation (Port => Signal, Port => Signal, ….)
ex) U1 : xor2 port map (In1 => A, In2 => B, Out1 => node1);
89
VHDL Structural Design
• Execution
- this is different from traditional program execution (i.e., C/C++) which is executed sequentially
because
90
VHDL Structural Design
• Let's put everything together
begin
U1 : xor2 port map (In1=>A, In2=>B, Out1=>node1);
U2 : or2 port map (In1=>C, In2=>node1, Out1=>X);
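- A sketch of the whole structural design in one file, so it can be compiled as-is. The port names of xor2 and or2 follow the port maps above; their architectures and all architecture names are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
entity xor2 is
  port (In1, In2 : in std_logic; Out1 : out std_logic);
end entity xor2;
architecture xor2_arch of xor2 is
begin
  Out1 <= In1 xor In2;
end architecture xor2_arch;

library ieee;
use ieee.std_logic_1164.all;
entity or2 is
  port (In1, In2 : in std_logic; Out1 : out std_logic);
end entity or2;
architecture or2_arch of or2 is
begin
  Out1 <= In1 or In2;
end architecture or2_arch;

library ieee;
use ieee.std_logic_1164.all;
entity TOP is
  port (A, B, C : in  STD_LOGIC;
        X       : out STD_LOGIC);
end entity TOP;

architecture TOP_arch of TOP is
  -- Component declarations mirror the entities above
  component xor2
    port (In1, In2 : in std_logic; Out1 : out std_logic);
  end component;
  component or2
    port (In1, In2 : in std_logic; Out1 : out std_logic);
  end component;
  -- Internal signal joining the two components
  signal node1 : std_logic;
begin
  U1 : xor2 port map (In1 => A, In2 => B,     Out1 => node1);
  U2 : or2  port map (In1 => C, In2 => node1, Out1 => X);
end architecture TOP_arch;
```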
91
VHDL Behavioral Design
• Behavioral Design
- when we design at the Behavioral level, we now rely on Synthesis tools to create
the ultimate gate level schematic
- This means we can simulate a lot more functionality than could ever be synthesized
92
VHDL Behavioral Design
• Processes
- the new values in a process (i.e., the LHS) depend on the current and past values
of the other signals
- the new values in a process (i.e., the LHS) do not get their value until the process
terminates
declarations
begin
sequential statements
end process name;
93
VHDL Behavioral Design
• Process Execution
94
VHDL Behavioral Design
• Process Execution
begin
sequential statement;
sequential statement;
sequential statement;
end process name;
95
VHDL Behavioral Design
• Starting and Stopping a Process
- There are two ways to start and stop a process:
1) Sensitivity List
2) Wait Statement
• Sensitivity List
- the process will begin executing if there is a change on any of the signals in the list
FLOP : process (clock)
begin
Q <= D;
end process FLOP;
- each time there is a change on "clock", the process will execute ONCE
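- Because the process runs on every change of "clock", a flip-flop model usually also tests for the edge it cares about. A minimal sketch, assuming STD_LOGIC signals and the rising_edge function from STD_LOGIC_1164; the entity name dff is hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical D flip-flop; names are illustrative.
entity dff is
  port (clock, D : in  std_logic;
        Q        : out std_logic);
end entity dff;

architecture dff_arch of dff is
begin
  FLOP : process (clock)        -- runs once per change on clock
  begin
    if rising_edge(clock) then  -- but only acts on the 0 -> 1 transition
      Q <= D;
    end if;
  end process FLOP;
end architecture dff_arch;
```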
96
VHDL Behavioral Design
• Wait Statements
- the process executes the sequences 1-by-1 until hitting the wait statement
- No Start/Stop Control: the process loops forever
- w/ Start/Stop Control: the process executes until "wait", then stops
97
VHDL Behavioral Design
• Wait Statements
- the wait statements can be followed by keywords "for" or "until" to describe the
wait condition
98
VHDL Behavioral Design
• Signals and Processes
- Rules of a Process
3) Only the last signal assignment to a signal in the list has an effect.
So there's no use making multiple assignments to the same signal.
99
VHDL Behavioral Design
• Signals and Processes
begin
A <= '0'; -- we WANT A to be assigned '0'
B <= '0'; -- we WANT B to be assigned '0'
Y <= A+B; -- we WANT Y to be assigned A + B = 0
- we need a "Variable"
100
Variables
• Variables
- Signals in processes are only assigned their value when the process suspends
begin
A <= 2; -- B gets its value from the previous value of A,
B <= A + 1; -- not from the A <= 2 assignment
101
Variables
• Variables
syntax:
variable var-name : var-type := init-value;
begin
temp := 2;
B <= temp + 1;
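- A self-checking sketch that puts the signal version and the variable version side by side. All names are hypothetical, and the signals start at 0 by assumption. The assert encodes the expected result: B picks up the old value of A, while B_var sees the new value of temp immediately.

```vhdl
-- Hypothetical simulation-only demo (no ports).
entity var_demo is
end entity var_demo;

architecture var_demo_arch of var_demo is
  signal A, B  : integer := 0;
  signal B_var : integer := 0;
begin
  DEMO : process
    variable temp : integer := 0;
  begin
    A     <= 2;          -- scheduled; A is still 0 inside this pass
    B     <= A + 1;      -- uses the previous A (0), so B becomes 1
    temp  := 2;          -- variables update immediately
    B_var <= temp + 1;   -- uses the new temp (2), so B_var becomes 3
    wait for 1 ns;       -- process suspends; the signals now get their values
    assert (B = 1 and B_var = 3)
      report "unexpected signal/variable behavior" severity error;
    wait;                -- stop forever
  end process DEMO;
end architecture var_demo_arch;
```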
102
Variables
• Signal vs. Variable
Signal Variable
103
If-Then Statements
• If / Then Statements
- if, then
- if, then, else
- if, then, elsif, then
- if, then, elsif, then, else
syntax:
104
If-Then Statements
• If / Then Statements
begin
MUX : process (A,B,Sel)
begin
if (Sel = '0') then
Out1 <= A;
elsif (Sel = '1') then
Out1 <= B;
else
Out1 <=A; -- this isn't necessary, just for illustration
end if;
end process MUX;
105
Case Statements
• Case Statements
- better for larger input combinations, If/Then's can get too long
syntax:
case expression is
when choices => seq-statement;
when choices => seq-statement;
:
end case;
- the keyword "others" is available for input combinations not explicitly called out
106
Case Statements
• Case Statements
begin
MUX : process (A,B,Sel)
begin
case (Sel) is
when '0' => Out1 <= A;
when '1' => Out1 <= B;
when others => Out1 <= A; -- this isn't necessary, just for illustration
end case;
end process MUX;
- if you want to combine individual signals to form a vector, you can use
variables and the concatenation operator
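- A sketch of that technique for a 4-to-1 mux with two separate select bits. The entity and port names are hypothetical. The variable is used because many tools do not accept a concatenation directly as the case expression.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 4-to-1 mux; names are illustrative.
entity mux4_case is
  port (D0, D1, D2, D3 : in  std_logic;
        S1, S0         : in  std_logic;
        Y              : out std_logic);
end entity mux4_case;

architecture mux4_case_arch of mux4_case is
begin
  MUX : process (D0, D1, D2, D3, S1, S0)
    variable sel : std_logic_vector(1 downto 0);
  begin
    sel := S1 & S0;                -- combine the individual bits into a vector
    case sel is
      when "00"   => Y <= D0;
      when "01"   => Y <= D1;
      when "10"   => Y <= D2;
      when "11"   => Y <= D3;
      when others => Y <= 'X';     -- covers 'Z', 'U', etc. on the select bits
    end case;
  end process MUX;
end architecture mux4_case_arch;
```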
107
Conditional Loops
• Conditional Loops
1) Loop
2) While
3) For
• Loops
108
Conditional Loops
• Loops
loop
clock <= '1'; wait for 1 ns;
clock <= '0'; wait for 1 ns;
end loop;
109
Conditional Loops
• While Loops
110
Conditional Loops
• For Loops
syntax:
for index in range loop
seq-statement
seq-statement
end loop;
111
Conditional Loops
• For Loops
valid_state = TRUE;
end if;
end loop;
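- A separate, complete for-loop sketch (not the example from the slide): a parity generator that XORs every bit of a vector. The entity and signal names are hypothetical. The loop index i is declared implicitly by the for statement.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical parity generator; names are illustrative.
entity parity_gen is
  port (data   : in  std_logic_vector(7 downto 0);
        parity : out std_logic);
end entity parity_gen;

architecture parity_gen_arch of parity_gen is
begin
  PAR : process (data)
    variable p : std_logic;
  begin
    p := '0';
    for i in data'range loop   -- i takes the values 7 downto 0
      p := p xor data(i);
    end loop;
    parity <= p;               -- '1' when data holds an odd number of 1's
  end process PAR;
end architecture parity_gen_arch;
```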
112
Attributes
• Attributes
- ability to get more information about a signal other than its current value
- previous value
- time since last change
- we put the attribute keyword after the signal name using the apostrophe (')
1) event
2) transaction
3) last_value
4) last_event
113
Attributes
• "event" Attribute
• "transaction" Attribute
114
Attributes
• "last_value" Attribute
• "last_event" Attribute
- good for tracking timing violations (Setup/Hold, signals changing too fast)
begin
if (Data'last_event < 0.5ns) then
too_fast <= TRUE;
else
too_fast <= FALSE;
end if;
115
VHDL : Test Benches
• Test Benches
- Stimulus in a real system is from an external source, not from our design
- We need a method to test our designs that is not part of the design itself
- We call these instantiations "Unit Under Test" (UUT) or "Device Under Test".
116
VHDL : Test Benches
• Test Benches
entity Mux_2to1 is
end entity Mux_2to1;
117
VHDL : Test Benches
entity Test_Mux is
end entity Test_Mux; -- the test bench entity has no ports
begin
118
VHDL : Test Benches
:
:
:
In1_TB <= '1'; In2_TB <= '1'; Sel_TB <= '1'; wait for 10 ns; -- end with a wait…
119
VHDL : Test Benches
• Test Bench Reporting
- There are reporting features that allow us to monitor the output of a design
- We can compare the output against "Golden" data and report if there are differences
- This is powerful when we evaluate our designs across power, temp, process…..
• Assert
- if the Boolean expression is FALSE, it will print a string following the "report" keyword
- Severity levels are also reported with possible values {ERROR, WARNING, NOTE, FAILURE}
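- A sketch of a self-checking test bench built around assert/report. The Mux_2to1 ports (In1, In2, Sel, Out1) and its behavior (Sel = '0' selects In1) are assumptions, since the full entity is not shown in these notes; the architecture names and Out_TB are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
entity Mux_2to1 is
  port (In1, In2, Sel : in std_logic; Out1 : out std_logic);
end entity Mux_2to1;
architecture Mux_2to1_arch of Mux_2to1 is
begin
  Out1 <= In1 when Sel = '0' else In2;   -- assumed behavior
end architecture Mux_2to1_arch;

library ieee;
use ieee.std_logic_1164.all;
entity Test_Mux is
end entity Test_Mux;                     -- the test bench entity has no ports

architecture Test_Mux_arch of Test_Mux is
  component Mux_2to1
    port (In1, In2, Sel : in std_logic; Out1 : out std_logic);
  end component;
  signal In1_TB, In2_TB, Sel_TB, Out_TB : std_logic;
begin
  UUT : Mux_2to1 port map (In1 => In1_TB, In2 => In2_TB,
                           Sel => Sel_TB, Out1 => Out_TB);

  STIM : process
  begin
    In1_TB <= '0'; In2_TB <= '1'; Sel_TB <= '0'; wait for 10 ns;
    assert (Out_TB = '0') report "Sel=0 should pass In1" severity ERROR;

    Sel_TB <= '1'; wait for 10 ns;
    assert (Out_TB = '1') report "Sel=1 should pass In2" severity ERROR;

    report "Test bench finished" severity NOTE;
    wait;                                -- end with a wait
  end process STIM;
end architecture Test_Mux_arch;
```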
120
VHDL : Test Benches
• Report
121
Logic Synthesis with VHDL
What is logic synthesis
• Logic synthesis is the process of converting a high-level description of a design into an optimized gate-level representation
• Logic synthesis uses a standard cell library, which contains simple cells (basic logic gates such as and, or, and nor) and macro cells (such as adders, muxes, memory, and special flip-flops)
• The designer would first understand the architectural description. Then he/she would consider design constraints such as timing, area, testability, and power
pp. 2
What is logic synthesis
• Synthesis = translation + optimization + mapping
[Figure: HDL Source -> Translate -> Generic Boolean (GTECH) -> Optimize + Map -> Target Technology]
HDL Source:
residue = 16’h0000;
if ( high_bits == 2’b10)
   residue = state_table[index];
else
   state_table[index] = 16’h0000;
pp. 3
Synthesis is Constraint Driven
optimization
speed
pp. 4
Technology Independent
• Design can be transferred to any technology
[Figure: area vs. speed curves for Technology A and Technology B]
pp. 5
What is logic synthesis(cont.)
[Flowchart: Basic Computer-Aided Logic Synthesis Process. Architectural Description -> High-Level Description -> Computer-Aided Logic Synthesis (with Design Constraints and the technology-dependent Standard Cell Library as inputs) -> Optimized Gate-Level Netlist -> Meets Constraints? (no: loop back; yes: Place and Route)]
pp. 6
Impact of Logic Synthesis
• Limitations of manual design
– For large designs, manual conversion was prone to human error, such as a small gate missed somewhere
– The designer could never be sure that the design constraints were going to be met until the gate-level implementation was complete and tested
– A significant portion of the design cycle was dominated by the time taken to convert a high-level design into gates
– Design reuse was not possible
– Each designer would implement design blocks differently. For large designs, this could mean that smaller blocks were optimized but the overall design was not optimal
pp. 7
Impact of Logic Synthesis(cont.)
• Automated logic synthesis tools addressed these problems as follows
– High-level design is less prone to human error because designs are described at a higher level of abstraction
– High-level design is done without significant concern about design constraints
– Conversion from high-level design to gates is fast
– Logic synthesis tools optimize the design as a whole. This removes the problem with varied designer styles for the different blocks in the design and suboptimal designs
– Logic synthesis tools allow technology-independent design
– Design reuse is possible for technology-independent descriptions.
pp. 8
Logic Synthesis
• Takes place in two stages:
pp. 9
Logic Optimization
• Netlist optimization is the critical enabling technology
• Takes a slow or large netlist and transforms it into one that implements the same function more cheaply
• Typical operations
– Constant propagation
– Common subexpression elimination
– Function factoring
• Time-consuming operation
– Can take hours for large chips
pp. 10
Translating VHDL into Gates
• Parts of the language are easy to translate
– Structural descriptions with primitives: already a netlist
– Continuous assignment: expressions turn into little datapaths
pp. 11
What Can Be Translated
• Structural definitions
– Everything
• Behavioral blocks
– Depends on sensitivity list
– Only when they have a reasonable interpretation as combinational logic, edge, or level-sensitive latches
– Blocks sensitive to both edges of the clock, changes on unrelated signals, changing sensitivity lists, etc. cannot be synthesized
• User-defined primitives
– Primitives defined with truth tables
– Some sequential UDPs can’t be translated (not latches or flip-flops)
pp. 12
What Isn’t Translated
• Initial blocks
– Used to set up initial state or describe finite testbench stimuli
– Don’t have an obvious hardware component
• Delays
– May be in the Verilog source, but are simply ignored
• A variety of other obscure language features
– In general, things heavily dependent on discrete-event simulation semantics
– Certain “disable” statements
– Pure events
pp. 13
Compile: the “Art” of Synthesis
• The compile command performs design optimization
• Logic level Optimization
– flatten (off by default): removes structure
– structure: minimizes generic logic
• Gate level Optimization
– map: makes design technology dependent
pp. 14
Compile
pp. 15
Compile
pp. 16
Logic Level Optimization
• Operates on the Boolean representation of a circuit
• Has a global effect on the overall area/speed characteristics of a design
• Strategy
– Structure
– Flatten
– If both are true, the design is first flattened and then structured
pp. 17
Gate Level Optimization
• Selects components to meet the timing, design rule & area goals specified for the circuit
• Has a local effect on the area/speed characteristics of a design
• Strategy
– Mapping
   Combinational Mapping
   Sequential Mapping
pp. 20
Combinational vs. Sequential Mapping
Combinational Mapping
• Mapping rearranges components, combining and re-combining logic into different components
• May use different algorithms such as cloning, resizing or buffering
• Try to meet the design rule constraints and timing/area goals
Sequential Mapping
• Optimize the mapping to sequential cells from technology library
• Analyze combinational surrounding a sequential cell to see if it can absorb the logic attribute with HDL
• Try to save speed and area by using a more complex sequential cell
pp. 21
Mapping
Combinational mapping Sequential mapping
pp. 22
Design Methodology
pp. 23
Design Flow
1. Write a design description in the Verilog language. This description can be a combination of structural and functional elements. This description is used with both the Synopsys HDL Compiler and the Verilog simulator.
2. Provide Verilog-language test drivers for the Verilog HDL simulator. The drivers supply test vectors for simulation and gather output data.
3. Simulate the design by using a Verilog HDL simulator. Verify that the description is correct.
4. Synthesize the HDL description with HDL Compiler. HDL Compiler performs architectural optimizations, then creates an internal representation of the design.
pp. 24
Design Flow
5. Use Synopsys Design Compiler to produce an optimized gate-level description in the target ASIC library. You can optimize the generated circuits to meet the timing & area constraints wanted.
6. Use Synopsys Design Compiler to output a gate-level Verilog description. This netlist-style description uses ASIC components as the leaf-level cells of the design. The gate-level description has the same port and module definitions as the original high-level Verilog description.
7. Use the original Verilog simulation drivers from Step 2 because module and port definitions are preserved.
8. Compare the output of the gate-level simulation with the output of the original Verilog description simulation to verify that the implementation is correct.
pp. 25
Basic Logic Design with VHDL
• Agenda
Combinational Logic Review
• Combinational logic circuits are memoryless
• No feedback path
• Output can have multiple logical transitions before settling to
correct value
146
Boolean Equations in VHDL
• Boolean equations and truth tables are both valid ways to
define a function (f = ???)
• Use logical operators in signal assignment statements
147
Boolean Equation Example
148
Binary Coding
• How do we represent information with more than two possible
values?
– eg, numbers
– N voltage levels? — No.
• Multiple binary signals (multiple bits)
• (a1, a0): (0, 0), (0, 1), (1, 0), (1, 1)
– This is a binary code
– Each pair of values is a code word
– Uses two signal wires for a1, a0
• Code Word Size
– An n-bit code has 2^n code words
– To represent N possible values
• Need at least ⌈log2(N)⌉ code word bits
• More bits can be useful in some cases
• Example: code for inkjet printer
– black, cyan, magenta, yellow, red, blue
– six values, ⌈log2(6)⌉ = 3
– black: (0, 0, 1), cyan: (0, 1, 0), magenta: (0, 1, 1), yellow: (1, 0, 0), red: (1, 0, 1), blue: (1, 1, 0)
149
One-Hot Codes
• Each code word has exactly one 1 bit
• Traffic light:
– red: (1,0,0), yellow: (0,1,0), green: (0,0,1)
– Three signal wires: red, yellow, green (r, y, g)
• Each bit of a one-hot code corresponds to an encoded value
– No hardware needed to decode values
150
Binary Codes in VHDL
• Multiple bits represented by a vector
• signal s: std_logic_vector(4 downto 0);
– This is a five-element signal
– s(4), s(3), s(2), s(1), s(0)
• signal a: std_logic_vector(1 to 3);
– This is a three-element signal
– a(1), a(2), a(3)
151
Binary Coding Example
152
Combinational Logic Design with VHDL
• Agenda
1. Decoders/Encoders
2. Multiplexers/Demultiplexers
3. Tri-State Buffers
4. Comparators
5. Adders (Ripple Carry, Carry-Look-Ahead)
6. Subtraction
7. Multiplication
8. Division (brief overview)
Integrated Circuit Scaling
• Integrated Circuit Scales
Example # of Transistors
- we use the terms SSI and MSI. Everything larger is typically just called "VLSI"
154
Decoders
• Decoders
- one and only one output is asserted for a given input combination
Input Output
00 0001
01 0010
10 0100
11 1000
155
Decoder
• Decoder Structure
2^n AND gates
n Inverters
156
Decoders
• Decoders with ENABLES
EN = 0, Output = 0
EN =1, Output depends on input code
157
Decoders
• Decoder Example
158
Decoder
• Decoder Example
entity inv is
port (In1 : in STD_LOGIC;
Out1 : out STD_LOGIC);
end entity inv;
159
Decoders
• Decoder Example
entity and2 is
port (In1,In2 : in STD_LOGIC;
Out1 : out STD_LOGIC);
end entity and2;
160
Decoders
• Decoder Example
- Now let's work on the top level design entity called "decoder_2to4"
entity decoder_2to4 is
port (A,B : in STD_LOGIC;
Y0,Y1,Y2,Y3 : out STD_LOGIC);
end entity decoder_2to4;
161
Decoders
• Decoder Example
- Now let's work on the top level design architecture called "decoder_2to4_arch"
component inv
port (In1 : in STD_LOGIC;
Out1 : out STD_LOGIC);
end component;
component and2
port (In1,In2 : in STD_LOGIC;
Out1 : out STD_LOGIC);
end component;
begin
………
162
Decoders
• Decoder Example
- cont….
begin
U1 : inv port map (A, A_n);
U2 : inv port map (B, B_n);
163
Decoder Example
164
Encoders
• Encoder
Input Output
0001 00
0010 01
0100 10
1000 11
165
Encoders
• Encoder
I3 I2 I1 I0 Y1 Y0
0 0 0 1 0 0
0 0 1 0 0 1
0 1 0 0 1 0
1 0 0 0 1 1
Y1 = I3 + I2
Y0 = I3 + I1
166
Encoders
• Encoders in VHDL
entity encoder_8to3_binary is
generic (t_delay : time := 1.0 ns);
port (I : in STD_LOGIC_VECTOR (7 downto 0);
Y : out STD_LOGIC_VECTOR (2 downto 0) );
component or4 port (In1,In2,In3,In4: in STD_LOGIC; Out1: out STD_LOGIC); end component;
begin
U1 : or4 port map (In1 => I(1), In2 => I(3), In3 => I(5), In4 => I(7), Out1 => Y(0) );
U2 : or4 port map (In1 => I(2), In2 => I(3), In3 => I(6), In4 => I(7), Out1 => Y(1) );
U3 : or4 port map (In1 => I(4), In2 => I(5), In3 => I(6), In4 => I(7), Out1 => Y(2) );
167
Encoders
• Encoders in VHDL
- 8-to-3 binary encoder modeled
entity encoder_8to3_binary is
generic (t_delay : time := 1.0 ns);
port (I : in STD_LOGIC_VECTOR (7 downto 0);
Y : out STD_LOGIC_VECTOR (2 downto 0) );
168
Encoder Example
169
Priority Encoders
• Priority Encoder
- a generic encoder does not know what to do when multiple input bits are asserted
- we decide the list of priority (usually MSB to LSB) where the truth table can be written as follows:
- we can then write expressions for an intermediate stage of priority bits “H” (i.e., Highest Priority):
H3 = I3
H2 = I2∙I3’
H1 = I1∙I2’∙I3’
H0 = I0∙I1’∙I2’∙I3’
Y1 = H3 + H2
Y0 = H3 + H1
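- The H and Y equations above translate almost one-to-one into concurrent signal assignments. A minimal sketch for the 4-input case; the entity and architecture names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 4-to-2 priority encoder; names are illustrative.
entity priority_encoder_4to2 is
  port (I : in  std_logic_vector(3 downto 0);
        Y : out std_logic_vector(1 downto 0));
end entity priority_encoder_4to2;

architecture priority_encoder_4to2_arch of priority_encoder_4to2 is
  signal H : std_logic_vector(3 downto 0);   -- intermediate priority bits
begin
  -- Only the highest-priority asserted input produces an H bit
  H(3) <= I(3);
  H(2) <= I(2) and not I(3);
  H(1) <= I(1) and not I(2) and not I(3);
  H(0) <= I(0) and not I(1) and not I(2) and not I(3);

  -- Same encoding as the plain encoder, but driven by H instead of I
  Y(1) <= H(3) or H(2);
  Y(0) <= H(3) or H(1);
end architecture priority_encoder_4to2_arch;
```

- H(0) does not appear in the Y equations, because input 0 encodes to "00".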
170
Priority Encoders
• Priority Encoders in VHDL
171
Priority Encoder Example
172
Seven-Segment Decoder
173
Multiplexer
• Multiplexer
- gates are combinational logic which generate an output depending on the current inputs
- what if we wanted to create a “Digital Switch” to pass along the input signal?
Sel Out
0 A
1 B
174
Multiplexer
• Multiplexer
- we can then use the behavior of an OR gate at the output stage (since a 0 input has no effect)
to combine the signals into one output
175
Multiplexer
• Multiplexer
Sel  A B  Out
0    0 x  0
0    1 x  1
1    x 0  0
1    x 1  1
176
Multiplexer
• Multiplexers in VHDL
begin
U1 : inv1 port map (In1 => Sel(0), Out1 => Sel_n(0));
U2 : inv1 port map (In1 => Sel(1), Out1 => Sel_n(1));
U3 : and3 port map (In1 => D(0), In2 => Sel_n(1), In3 => Sel_n(0), Out1 => U3_out);
U4 : and3 port map (In1 => D(1), In2 => Sel_n(1), In3 => Sel(0), Out1 => U4_out);
U5 : and3 port map (In1 => D(2), In2 => Sel(1), In3 => Sel_n(0), Out1 => U5_out);
U6 : and3 port map (In1 => D(3), In2 => Sel(1), In3 => Sel(0), Out1 => U6_out);
U7 : or4 port map (In1 => U3_out, In2 => U4_out, In3 => U5_out, In4 => U6_out, Out1 => Y);
177
Multiplexer
• Multiplexers in VHDL
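- Structural Model w/ EN
entity mux_4to1 is
port (D : in STD_LOGIC_VECTOR (3 downto 0);
Sel : in STD_LOGIC_VECTOR (1 downto 0);
EN : in STD_LOGIC;
Y : out STD_LOGIC);
end entity mux_4to1;
architecture mux_4to1_arch of mux_4to1 is
signal Sel_n : STD_LOGIC_VECTOR (1 downto 0);
signal U3_out, U4_out, U5_out, U6_out : STD_LOGIC;
component inv1 port (In1: in STD_LOGIC; Out1: out STD_LOGIC); end component;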
component and4 port (In1,In2,In3,In4: in STD_LOGIC; Out1: out STD_LOGIC); end component;
component or4 port (In1,In2,In3,In4: in STD_LOGIC; Out1: out STD_LOGIC); end component;
begin
U1 : inv1 port map (In1 => Sel(0), Out1 => Sel_n(0));
U2 : inv1 port map (In1 => Sel(1), Out1 => Sel_n(1));
U3 : and4 port map (In1 => D(0), In2 => Sel_n(1), In3 => Sel_n(0), In4 => EN, Out1 => U3_out);
U4 : and4 port map (In1 => D(1), In2 => Sel_n(1), In3 => Sel(0), In4 => EN, Out1 => U4_out);
U5 : and4 port map (In1 => D(2), In2 => Sel(1), In3 => Sel_n(0), In4 => EN, Out1 => U5_out);
U6 : and4 port map (In1 => D(3), In2 => Sel(1), In3 => Sel(0), In4 => EN, Out1 => U6_out);
U7 : or4 port map (In1 => U3_out, In2 => U4_out, In3 => U5_out, In4 => U6_out, Out1 => Y);
end architecture mux_4to1_arch;
178
Multiplexer
• Multiplexers in VHDL
- Behavioral Model w/ EN
entity mux_4to1 is
port (D : in STD_LOGIC_VECTOR (3 downto 0);
Sel : in STD_LOGIC_VECTOR (1 downto 0);
EN : in STD_LOGIC;
Y : out STD_LOGIC);
end entity mux_4to1;
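- one possible behavioral architecture for this entity, given as a sketch
- assumption: with EN = '0' the output is driven to '0', which is what the structural AND/OR model produces; the architecture name is hypothetical

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity mux_4to1 is
  port (D   : in  STD_LOGIC_VECTOR (3 downto 0);
        Sel : in  STD_LOGIC_VECTOR (1 downto 0);
        EN  : in  STD_LOGIC;
        Y   : out STD_LOGIC);
end entity mux_4to1;

architecture behavioral_sketch of mux_4to1 is   -- hypothetical name
begin
  MUX : process (D, Sel, EN)
  begin
    if (EN = '1') then
      case (Sel) is
        when "00"   => Y <= D(0);
        when "01"   => Y <= D(1);
        when "10"   => Y <= D(2);
        when "11"   => Y <= D(3);
        when others => Y <= 'X';   -- Sel contains a non-0/1 value
      end case;
    else
      Y <= '0';                    -- assumption: disabled output is 0
    end if;
  end process MUX;
end architecture behavioral_sketch;
```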
179
Multiplexer Example
180
Multi-bit Mux Example
181
Demultiplexer
• Demultiplexer
- a single input will be routed to a particular output pin depending on the Select setting
Sel Y0 Y1
0 In 0
1 0 In
182
Demultiplexer
• Demultiplexer
- we can again use the behavior of an AND gate to “pass” or “block” the input signal
183
Demultiplexer
• Demultiplexers in VHDL
component inv1 port (In1: in STD_LOGIC; Out1: out STD_LOGIC); end component;
component and4 port (In1,In2,In3,In4: in STD_LOGIC; Out1: out STD_LOGIC); end component;
begin
U1 : inv1 port map (In1 => Sel(0), Out1 => Sel_n(0));
U2 : inv1 port map (In1 => Sel(1), Out1 => Sel_n(1));
U3 : and4 port map (In1 => D, In2 => Sel_n(1), In3 => Sel_n(0), In4 => EN, Out1 => Y(0));
U4 : and4 port map (In1 => D, In2 => Sel_n(1), In3 => Sel(0), In4 => EN, Out1 => Y(1));
U5 : and4 port map (In1 => D, In2 => Sel(1), In3 => Sel_n(0), In4 => EN, Out1 => Y(2));
U6 : and4 port map (In1 => D, In2 => Sel(1), In3 => Sel(0), In4 => EN, Out1 => Y(3));
184
Demultiplexer
• Demultiplexers in VHDL
- Behavioral Model with High Z Outputs
entity demux_1to4 is
port (D : in STD_LOGIC;
Sel : in STD_LOGIC_VECTOR (1 downto 0);
EN : in STD_LOGIC;
Y : out STD_LOGIC_VECTOR (3 downto 0));
end entity demux_1to4;
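- a behavioral sketch for this entity; the slides do not show the architecture, so the exact High Z policy here is an assumption
- assumed behavior: EN = '0' puts all outputs at 'Z'; EN = '1' routes D to the selected output and drives the others to '0'

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity demux_1to4 is
  port (D   : in  STD_LOGIC;
        Sel : in  STD_LOGIC_VECTOR (1 downto 0);
        EN  : in  STD_LOGIC;
        Y   : out STD_LOGIC_VECTOR (3 downto 0));
end entity demux_1to4;

architecture behavioral_sketch of demux_1to4 is   -- hypothetical name
begin
  DEMUX : process (D, Sel, EN)
  begin
    if (EN = '1') then
      Y <= "0000";                  -- default for unselected outputs (assumption)
      case (Sel) is
        when "00"   => Y(0) <= D;   -- later assignment overrides the default for this bit
        when "01"   => Y(1) <= D;
        when "10"   => Y(2) <= D;
        when "11"   => Y(3) <= D;
        when others => Y <= "XXXX";
      end case;
    else
      Y <= "ZZZZ";                  -- disabled: release all outputs
    end if;
  end process DEMUX;
end architecture behavioral_sketch;
```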
185
Tri-State Buffers
• Tri-State Buffers
- High Impedance (Z) allows the circuit to be connected to a line with multiple circuits driving/receiving
- This is used for "Multi-Drop" Buses (i.e., many Drivers/Receivers on the same bus)
ex) truth table of Tri-State Buffer ex) truth table of Bus Transceiver
186
Tri-State Buffers
• Tri-State Buffers in VHDL
- 'Z' is a value of the resolved STD_LOGIC data type defined in package STD_LOGIC_1164
- Z & 0 = 0
- Z & 1 = 1
- Z & L = L
- Z & H = H
TRISTATE: process (In1, ENB)
begin
if (ENB = '1') then
Out1 <= 'Z';
else
Out1 <= In1;
end if;
end process TRISTATE;
187
Comparators
• Comparators
- a circuit that compares digital values (i.e., Equal, Greater Than, Less Than)
A B  EQ GT LT
0 0  1  0  0     EQ = (A⊕B)'
0 1  0  0  1     GT = A·B'
1 0  0  1  0     LT = A'·B
1 1  1  0  0
188
Comparators
• Non-Iterative Comparators
- "Iterative" refers to a circuit made up of identical blocks. The first block performs its operation, which
produces a result used in the 2nd block, and so on.
- Iterative circuits tend to be slower due to the ripple, but take less area
"Equality"
- since each bit in a vector must be equal, the outputs of each bit's compare can be AND'd
189
Comparators
• Non-Iterative Comparators
"Greater Than"
- we start by checking whether the MSB of A is greater than the MSB of B
- If it is, we are done and can ignore the rest of the LSB's.
- If it is NOT, but they are equal, we need to check the next MSB bit (n-1)
- to ensure the previous bit was equal, we include it in the next LSB's logic expression:
- 4-bit comparator
GT = (A3·B3') +
(A3⊕B3)' · (A2·B2') +
(A3⊕B3)' · (A2⊕B2)' · (A1·B1') +
(A3⊕B3)' · (A2⊕B2)' · (A1⊕B1)' · (A0·B0')
190
Comparators
• Non-Iterative Comparators
"Less Than"
- since we assume that if the vectors are either EQ, GT, or LT, we can create LT using:
LT = EQ' · GT'
• Iterative Comparators
- we can build an iterative comparator by passing signals between identical modules from MSB to LSB
- EQout is fed into the EQin port of the next LSB module
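- the non-iterative EQ, GT, and LT expressions can be written as a dataflow model
- a sketch only; the entity name, the port names A, B, EQ, GT, LT, and the internal signals are assumptions chosen to match the equations rather than the structural model

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity comparator_4bit_sketch is            -- hypothetical name
  port (A, B       : in  STD_LOGIC_VECTOR (3 downto 0);
        EQ, GT, LT : out STD_LOGIC);
end entity comparator_4bit_sketch;

architecture dataflow_sketch of comparator_4bit_sketch is
  signal E          : STD_LOGIC_VECTOR (3 downto 0);  -- per-bit equality, (Ai xor Bi)'
  signal EQ_i, GT_i : STD_LOGIC;                      -- internal copies, since out ports cannot be read
begin
  E    <= A xnor B;
  EQ_i <= E(3) and E(2) and E(1) and E(0);

  -- each term requires all more-significant bits to be equal
  GT_i <= (A(3) and not B(3)) or
          (E(3) and A(2) and not B(2)) or
          (E(3) and E(2) and A(1) and not B(1)) or
          (E(3) and E(2) and E(1) and A(0) and not B(0));

  EQ <= EQ_i;
  GT <= GT_i;
  LT <= not EQ_i and not GT_i;   -- LT = EQ' · GT'
end architecture dataflow_sketch;
```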
191
Comparators
• Comparators in VHDL
- Structural Model
entity comparator_4bit is
component xnor2 port (In1,In2: in STD_LOGIC; Out1: out STD_LOGIC); end component;
component or4 port (In1,In2,In3,In4: in STD_LOGIC; Out1: out STD_LOGIC); end component;
component nor2 port (In1,In2: in STD_LOGIC; Out1: out STD_LOGIC); end component;
component and2 port (In1,In2: in STD_LOGIC; Out1: out STD_LOGIC); end component;
component and3 port (In1,In2,In3: in STD_LOGIC; Out1: out STD_LOGIC); end component;
component and4 port (In1,In2,In3,In4: in STD_LOGIC; Out1: out STD_LOGIC); end component;
component inv1 port (In1: in STD_LOGIC; Out1: out STD_LOGIC); end component;
192
Comparators
• Comparators in VHDL
begin
-- "Equal" Circuitry
Cont…
XN0 : xnor2 port map (In1(0), In2(0), Bit_Equal(0)); -- 1st level of XNOR tree
XN1 : xnor2 port map (In1(1), In2(1), Bit_Equal(1));
XN2 : xnor2 port map (In1(2), In2(2), Bit_Equal(2));
XN3 : xnor2 port map (In1(3), In2(3), Bit_Equal(3));
AN0 : and4 port map (Bit_Equal(0), Bit_Equal(1), Bit_Equal(2), Bit_Equal(3), Eq); -- 2nd level of "Equal" Tree
AN1 : and4 port map (Bit_Equal(0), Bit_Equal(1), Bit_Equal(2), Bit_Equal(3), Eq_temp);
193
Comparators
• Comparators in VHDL
- Behavioral Model
entity comparator_4bit is
194
Numeric Basics
• Representing and processing numeric data is a common
requirement
– unsigned integers
– signed integers
– fixed-point real numbers
– floating-point real numbers
– complex numbers
195
Unsigned Integers in VHDL
196
Extending/Truncating Unsigned Numbers
197
Increment/Decrement in VHDL
198
Scaling in VHDL
199
Signed Integers in VHDL
200
Resizing Signed Integers
201
Ripple Carry Adder
• Addition – Half Adder
- one bit addition can be accomplished with an XOR gate (sum modulo 2)
0 1 0 1
+0 +0 +1 +1
0 1 1 10
202
Ripple Carry Adder
• Addition – Full Adder
- to create a full adder, we need to include the “Carry In” in the Sum
- you could also use two "Half Adders" to accomplish the same thing
203
Ripple Carry Adder
• Addition – Ripple Carry Adder
- cascading Full Adders together will allow the Cout’s to propagate (or Ripple) through the circuit
204
Ripple Carry Adder
• Addition – Ripple Carry Adder
Sum = A ⊕ B ⊕ Cin
Cout = Cin∙A + A∙B + Cin∙B
- tFull-Adder will be the longest combinational logic delay path in the adder
205
Ripple Carry Adder
• Addition – Ripple Carry Adder
tRCA = n·tFull-Adder
- different topologies within the full-adder to reduce delay (Δt) will have a n·Δt effect
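- the Sum/Cout equations and the cascading of Full Adders can be sketched with a generate statement
- the entity names "full_adder" and "rca_nbit", the generic "n", and the internal carry vector "C" are hypothetical

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity full_adder is                       -- hypothetical name
  port (A, B, Cin : in  STD_LOGIC;
        Sum, Cout : out STD_LOGIC);
end entity full_adder;

architecture dataflow_sketch of full_adder is
begin
  Sum  <= A xor B xor Cin;
  Cout <= (Cin and A) or (A and B) or (Cin and B);
end architecture dataflow_sketch;

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity rca_nbit is                         -- hypothetical name
  generic (n : positive := 4);
  port (A, B : in  STD_LOGIC_VECTOR (n-1 downto 0);
        Cin  : in  STD_LOGIC;
        Sum  : out STD_LOGIC_VECTOR (n-1 downto 0);
        Cout : out STD_LOGIC);
end entity rca_nbit;

architecture structural_sketch of rca_nbit is
  component full_adder
    port (A, B, Cin : in STD_LOGIC; Sum, Cout : out STD_LOGIC);
  end component;
  signal C : STD_LOGIC_VECTOR (n downto 0);  -- C(i) is the carry into stage i
begin
  C(0) <= Cin;
  STAGES : for i in 0 to n-1 generate
    FA : full_adder port map (A => A(i), B => B(i), Cin => C(i),
                              Sum => Sum(i), Cout => C(i+1));
  end generate STAGES;
  Cout <= C(n);                              -- the carry has rippled through n stages
end architecture structural_sketch;
```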
206
Carry Look Ahead Adders
• Addition – Carry Look Ahead Adder
tRCA = n·tFull-Adder
- different topologies within the full-adder to reduce delay (Δt) will have a n·Δt effect
- the linear increase in delay comes from waiting for the Carry to Ripple through
207
Carry Look Ahead Adders
• Addition – Carry Look Ahead Adder
- this circuit calculates the carry for all Full-Adders at the same time
Generate "g": an adder (i) generates a carry out (Ci+1) under input conditions Ai and Bi,
independent of Ai-1, Bi-1, or Carry In (Ci)
Ai Bi Ci+1
0  0  0
0  1  0
1  0  0
1  1  1
- we can say that: gi = Ai·Bi
- remember, g does NOT consider carry in (Ci)
208
Carry Look Ahead Adders
• Addition – Carry Look Ahead Adder
Propagate "p": an adder (i) will propagate (or pass through) a carry in (Ci) depending on input
conditions Ai and Bi:
Ci Ai Bi Ci+1
0  0  0  0
0  0  1  0
0  1  0  0
0  1  1  1
1  0  0  0
1  0  1  1
1  1  0  1
1  1  1  1
- pi is defined when there is a carry in, so we ignore the row entries where Ci=0
- if we only look at the Ci=1 rows, the carry is passed through whenever: pi = Ai+Bi
209
Carry Look Ahead Adders
• Addition – Carry Look Ahead Adder
- said another way, Adder(i) will "Generate" a Carry Out (Ci+1) when gi is true, and will "Propagate"
a Carry In (Ci) to its Carry Out when pi is true:
gi = Ai·Bi
pi = Ai+Bi
- a full expression for the Carry Out (Ci+1) in terms of p and g is given by:
Ci+1 = gi+pi·Ci
- this is good, but we still generate carries dependent on previous stages (i-1) of the iterative circuit
210
Carry Look Ahead Adders
• Addition – Carry Look Ahead Adder
C2 = g1+p1·C1
C2 = g1+p1·(g0+p0·C0)
C2 = g1+p1·g0+p1·p0·C0 (2-Level Sum-of-Products)
C3 = g2+p2·C2
C3 = g2+p2·(g1+p1·g0+p1·p0·C0)
C3 = g2+p2·g1+p2·p1·g0+p2·p1·p0·C0 (2-Level Sum-of-Products)
C4 = g3+p3·C3
C4 = g3+p3·(g2+p2·g1+p2·p1·g0+p2·p1·p0·C0)
C4 = g3+p3·g2+p3·p2·g1+p3·p2·p1·g0+p3·p2·p1·p0·C0 (2-Level Sum-of-Products)
- this gives us logic expressions that can generate a next stage carry based upon ONLY
the inputs to the adder and the original carry in (C0)
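- the expanded carry equations can be sketched as a 4-bit dataflow model
- assumptions: the usual definitions gi = Ai·Bi and pi = Ai+Bi are used; the entity name "cla_4bit" and the port names are hypothetical

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity cla_4bit is                          -- hypothetical name
  port (A, B : in  STD_LOGIC_VECTOR (3 downto 0);
        C0   : in  STD_LOGIC;
        Sum  : out STD_LOGIC_VECTOR (3 downto 0);
        C4   : out STD_LOGIC);
end entity cla_4bit;

architecture dataflow_sketch of cla_4bit is
  signal g, p : STD_LOGIC_VECTOR (3 downto 0);
  signal C    : STD_LOGIC_VECTOR (3 downto 1);  -- internal carries C1..C3
begin
  -- level 1: generate and propagate
  g <= A and B;
  p <= A or B;

  -- levels 2 and 3: every carry depends only on g, p and C0
  C(1) <= g(0) or (p(0) and C0);
  C(2) <= g(1) or (p(1) and g(0)) or (p(1) and p(0) and C0);
  C(3) <= g(2) or (p(2) and g(1)) or (p(2) and p(1) and g(0)) or
          (p(2) and p(1) and p(0) and C0);
  C4   <= g(3) or (p(3) and g(2)) or (p(3) and p(2) and g(1)) or
          (p(3) and p(2) and p(1) and g(0)) or
          (p(3) and p(2) and p(1) and p(0) and C0);

  -- sum bits use the look-ahead carries instead of rippled ones
  Sum(0)          <= A(0) xor B(0) xor C0;
  Sum(3 downto 1) <= A(3 downto 1) xor B(3 downto 1) xor C;
end architecture dataflow_sketch;
```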
211
Carry Look Ahead Adders
• Addition – Carry Look Ahead Adder
1) g and p logic
2) product terms in the Ci equations
3) sum terms in the Ci equations
212
Carry Look Ahead Adders
• Addition – Carry Look Ahead Adder
- the 5 levels of logic are fixed no matter how many bits the adder is (really?)
- In reality, the most significant Carry equation will have i+1 inputs into its largest sum/product term
- this means that Fan-In becomes a problem since real gates tend to not have more than 4-6 inputs
- When the number of inputs gets larger than the Fan-In, the logic needs to be broken into another level
- In the worst case, the logic Fan-In would be 2. Even in this case, the delay associated with the
Carry Look Ahead logic would be proportional to log2(n)
- Area and Power are also concerns with CLA's. Typically CLA's are used in computationally intense
applications where performance outweighs Power and Area.
213
Carry Look Ahead Adders
• Adders in VHDL
- these are still resolved types (STD_LOGIC), but the equality and arithmetic operations are slightly
different depending on whether you are using Signed vs. Unsigned
• Considerations
- when adding signed and unsigned numbers, the type of the result will dictate how the operands are
handled/converted
- if assigning to an n-bit SIGNED result, an (n-1)-bit UNSIGNED operand will automatically be converted
to signed by extending its vector length by 1 and filling the new MSB with a sign bit of 0
214
Carry Look Ahead Adders
• Adders in VHDL
S <= ('0' & A) + ('0' & B); -- manually increasing size of A and B to include Carry.
-- Carry will be kept in S(9)
215
Subtraction
• Half Subtractor
(A-B)
A B  Bout D
0 0  0    0
0 1  1    1     D = A ⊕ B
1 0  0    1     Bout = A'·B
1 1  0    0
216
Subtraction
• Full Subtractor
- to create a full Subtractor, we need to include the “Borrow In” in the Difference
217
Subtraction
• Subtraction
- Can we manipulate the subtraction logic so that Full Adders can be used as Full Subtractors?
Addition Subtraction
S = A ⊕ B ⊕ Cin                     D = A ⊕ B ⊕ Bin
Cout = A∙B + A∙Cin + B∙Cin Bout = A'∙B + A'∙Bin + B∙Bin
Bout' = (A∙A∙B')+(A∙B'∙Bin')+(A∙B'∙B')+(B'∙B'∙Bin')+(A∙A∙Bin')+(A∙Bin'∙Bin')+(A∙B'∙Bin')+(B'∙Bin'∙Bin')
218
Subtraction
• Subtraction
Addition Subtraction
- But this requires the Subtrahend and Bin be inverted; how does this affect the Sum/Difference Logic?
Addition                            Subtraction
S = A ⊕ B ⊕ Cin                     D = A ⊕ B ⊕ Bin
- remember that both inputs of a 2-input XOR can be inverted without changing the logic
function which gives us:
219
Subtraction
• Subtraction
Addition Subtraction
- This means we can use "Full Adders" for subtraction as long as:
- In a ripple carry subtractor, intermediate Bout's are fed into Bin's, which is a double inversion
- That leaves only the first Bin and the last Bout to invert: we insert a '1' into the first Bin of the chain, and the final carry out is the inverted Bout
220
Subtraction
• Subtraction
- this gives us the minimal logic for a "Ripple Carry Subtractor" using "Full Adders"
X-Y
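- a sketch of the ripple carry subtractor built from the Full Adder equations: invert the subtrahend and feed a '1' into the first carry input
- the entity name "rcs_4bit", the 4-bit width, and the internal signals are hypothetical

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity rcs_4bit is                          -- hypothetical name
  port (X, Y : in  STD_LOGIC_VECTOR (3 downto 0);
        D    : out STD_LOGIC_VECTOR (3 downto 0);   -- D = X - Y
        Bout : out STD_LOGIC);                      -- '1' when a borrow occurred (X < Y, unsigned)
end entity rcs_4bit;

architecture dataflow_sketch of rcs_4bit is
  signal Yn : STD_LOGIC_VECTOR (3 downto 0);  -- inverted subtrahend
  signal C  : STD_LOGIC_VECTOR (4 downto 0);  -- full-adder carry chain
begin
  Yn   <= not Y;
  C(0) <= '1';                                -- inverted "no borrow" into the first stage

  STAGES : for i in 0 to 3 generate
    -- plain full-adder equations, fed with X and the inverted Y
    D(i)   <= X(i) xor Yn(i) xor C(i);
    C(i+1) <= (X(i) and Yn(i)) or (X(i) and C(i)) or (Yn(i) and C(i));
  end generate STAGES;

  Bout <= not C(4);                           -- last carry out is the inverted borrow
end architecture dataflow_sketch;
```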
221
Adders/Subtractors in VHDL
222
Signed Addition in VHDL
223
Multipliers
• Multipliers
A*B P
0 0 0
0 1 0 we can say that: P = A·B
1 0 0
1 1 1
- for multi-bit multiplication, we can mimic the algorithm that we use when doing multiplication by hand
224
Multipliers
• "Shift and Add" Multipliers
    11          1011   - multiplicand
  x 13        x 1101   - multiplier
    33          1011
   11          0000    - these are the individual partial products (shifted multiplicands)
              1011
  +         +1011
   143      10001111   - the final product is the sum of all partial products
- this is simple and straightforward. BUT, the addition of the individual partial products requires
an adder with as many as n inputs.
- we would really like to re-use our Full Adder circuits, which only have 3 inputs.
225
Multipliers
• "Shift and Add" Multipliers
- to keep the algorithm consistent, we use "0000" as the first Partial Product
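- a behavioral sketch of the "Shift and Add" algorithm, accumulating one shifted multiplicand per multiplier bit
- assumptions: the IEEE NUMERIC_STD package is used for the UNSIGNED type; the entity name "mult_shift_add" and the generic "n" are hypothetical

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity mult_shift_add is                    -- hypothetical name
  generic (n : positive := 4);
  port (A : in  UNSIGNED (n-1 downto 0);    -- multiplicand
        B : in  UNSIGNED (n-1 downto 0);    -- multiplier
        P : out UNSIGNED (2*n-1 downto 0)); -- product
end entity mult_shift_add;

architecture behavioral_sketch of mult_shift_add is
begin
  process (A, B)
    variable acc : UNSIGNED (2*n-1 downto 0);
  begin
    acc := (others => '0');                 -- all zeros as the first Partial Product
    for i in 0 to n-1 loop
      if (B(i) = '1') then
        -- add the multiplicand shifted left by the weight of multiplier bit i
        acc := acc + shift_left(resize(A, 2*n), i);
      end if;
    end loop;
    P <= acc;
  end process;
end architecture behavioral_sketch;
```

- with A = "1011" (11) and B = "1101" (13) the loop adds 11, 44, and 88, matching the hand example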
226
Multipliers
• "Shift and Add" Multipliers
227
Multipliers
• "Shift and Add" Multipliers
- Graphical View of interconnect for an 8x8 multiplier. Note the Full Adders
228
Multipliers
• "Sequential" Multipliers
- the main speed limitation of the Combinational "Shift and Add" multiplier is the delay through the
adder chain.
- in the worst case, the number of delay paths through the adders would be [n + 2(n-2)]
- we can decrease this delay by using a register to accumulate the incremental additions as they
take place.
- we can run the 0th carry from the first row of adders into adder for the 2nd row
- a final stage of adders is needed to recombine the carries. But this reduces the delay to [n+(n-2)]
229
Multipliers
• "Carry Save" Multipliers
230
Unsigned Multiplication in VHDL
231
Signed Multipliers
• Multipliers
- we learned the "Shift and Add" algorithm for constructing a combinational multiplier
• Convert to Positive
- one of the simplest ways is to first convert any negative numbers to positive, then use the unsigned
multiplier
pos x pos = pos
pos x neg = neg
neg x pos = neg
neg x neg = pos
- remember that 0=pos and 1=neg in 2's comp, so the sign of the product is the XOR of the two sign bits
232
Signed Multipliers
• 2's Comp Multiplier
- we can use this same technique for 2's comp, remembering that
- the additions must be on same-sized vectors
- the carry is ignored
- we can make partial products the same size as shifted multiplicands by doing a "2's comp sign extend"
- since the MSB has a negative weight, we NEGATE the shifted multiplicand for that bit prior to the
last addition.
233
Signed Multipliers
• 2's Comp Shift and Add Multipliers
- to keep the algorithm consistent, we use "0000" as the first Partial Product
234
Division
• Division - "Repeated Subtraction"
- a simple algorithm to divide is to count the number of times you can subtract the divisor from the
dividend
- the number of times it can be subtracted without going negative is the "Quotient"
- if the next subtraction would result in a negative number, whatever was left prior to that
subtraction is the "Remainder"
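- the counting procedure can be sketched as a simulation-only model
- assumptions: integer ports and an unbounded while loop are used for clarity, so this is not meant for synthesis; the entity name and port names are hypothetical; a zero divisor simply returns quotient 0

```vhdl
entity div_repeated_sub is                  -- hypothetical name
  port (Dividend  : in  natural;
        Divisor   : in  natural;
        Quotient  : out natural;
        Remainder : out natural);
end entity div_repeated_sub;

architecture sim_sketch of div_repeated_sub is
begin
  process (Dividend, Divisor)
    variable r : natural;   -- what is left of the dividend
    variable q : natural;   -- number of successful subtractions
  begin
    r := Dividend;
    q := 0;
    if (Divisor /= 0) then
      -- keep subtracting while the result would not go negative
      while (r >= Divisor) loop
        r := r - Divisor;
        q := q + 1;
      end loop;
    end if;
    Quotient  <= q;
    Remainder <= r;
  end process;
end architecture sim_sketch;
```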
235
Division
• Division - "Shift and Subtract"
- Division is similar to multiplication, but instead of "Shift and Add", we "Shift and Subtract"
236
Fixed-Point in VHDL
• Many applications use non-integers
– especially signal-processing apps
– Fixed-point numbers allow for fractional parts
– represented as integers that are implicitly scaled by a power of 2
• Choosing Range and Precision
– Choice depends on application
– Need to understand the numerical behavior of computations performed
• some operations can magnify quantization
– In DSP: fixed-point range affects dynamic range
– In DSP: precision affects signal-to-noise ratio
• Use numeric_bit with implied scaling
• Use proposed fixed_pkg package
– Currently being standardized by IEEE
– Types ufixed and sfixed
– Arithmetic operations, resizing, conversion
237
Floating-Point in VHDL
• Similar to scientific notation for decimal
– e.g., 6.02214199×10^23, 1.60217653×10^–19
• Allow for larger range, with same relative precision throughout the range
238
Sequential Logic Design with VHDL
• Agenda
240
Example
241
Types of Memory Elements
• Flip-Flop
– Latch
– Registers
• Others
– Register Files
– Cache
– Flash memory
– ROM
– RAM
242
D-FF vs. D-Latch
• FF is edge sensitive (can be either positive or negative edge)
– At trigger edge of clock, input transferred to output
• Latch is level sensitive (can be either active-high or active-low)
– When clock is active, input passes to output (transparent)
– When clock is not active, output stays unchanged
243
Important Timing Parameters (1)
244
Important Timing Parameters (2)
245
System Timing: Minimum Period
246
System Timing: Minimum Delay
247
FF Based, Edge Trigger Clocking
• Td = delay of combinational logic
• Tcycle = cycle time of clock
– Duty cycle does not matter
248
Latch Based, Single Phase Clocking
• Aka. Pulse Mode clocking
• Tcycle = cycle time of clock; Tw = pulse width of clock
249
Comparison
• Flip-Flop Based
− Larger in area
− Larger clocking overhead (Tsetup, Tcq)
+ Design more robust
• Only have to worry about Tdmax
• Tdmin usually small, can be easily fixed by buffer
+ Pulse width does not matter
• Latch Based Single Phase
+ Smaller area
+ Smaller clocking overhead ( only Tdq)
− Worry about both Tdmax and Tdmin
– Pulse width DOES matter
(unfortunately, pulse width can vary on chip)
250
Latches
• Latches
– we’ve learned all of the VHDL syntax necessary to describe sequential
storage elements
• SR Latch
- To understand the SR Latch, we must remember the truth table for a NOR Gate
A B  F
0 0  1
0 1  0
1 0  0
1 1  0
251
Latches
• SR Latch
- when S=0 & R=0, it puts this circuit into a Bi-stable feedback mode where the output is either
Q=1, Qn=0 or Q=0, Qn=1
AB F AB F
00 1 (U2) 00 1 (U1)
01 0 01 0 (U2)
10 0 (U1) 10 0
11 0 11 0
252
Latches
• SR Latch
- we can force a known state using S & R: S=1, R=0 sets Q=1 and S=0, R=1 resets Q=0
AB F AB F
00 1 (U1) 00 1 (U2)
01 0 01 0 (U1)
10 0 (U2) 10 0
11 0 (U2) 11 0 (U1)
253
Latches
• SR Latch
- we can write a Truth Table for an SR Latch as follows
S R  Q       Qn
0 0  Last Q  Last Qn  - Hold
0 1  0       1        - Reset
1 0  1       0        - Set
1 1  0       0        - Don’t Use
- S=1 & R=1 forces a 0 on both outputs. However, when the latch comes out of this state it is
metastable. This means the final state is unknown.
254
Latches
• S’R’ Latch
- we can also use NAND gates to form an inverted SR Latch
S’ R’  Q       Qn
0  0   1       1        - Don’t Use
0  1   1       0        - Set
1  0   0       1        - Reset
1  1   Last Q  Last Qn  - Hold
255
Latches
• SR Latch w/ Enable
- we then can add an enable line using NAND gates
A B  F
0 0  1
0 1  1
1 0  1
1 1  0
- a 0 on any input forces a 1 on the output
- when C=0, the two EN NAND Gate outputs are 1, which forces “Last Q/Qn”
- when C=1, S & R are passed through INVERTED
256
Latches
• SR Latch w/ Enable
- the truth table then becomes
C  S R  Q       Qn
1  0 0  Last Q  Last Qn  - Hold
1  0 1  0       1        - Reset
1  1 0  1       0        - Set
1  1 1  1       1        - Don’t Use
0  x x  Last Q  Last Qn  - Hold
257
Latches
• D Latch
- a modification to the SR Latch where R = S’ creates a D-latch
C D  Q       Qn
1 0  0       1        - track
1 1  1       0        - track
0 x  Last Q  Last Qn  - Hold
258
Latches
• VHDL of a D Latch
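- a minimal sketch of a level-sensitive D latch, following the C/D truth table
- the entity and architecture names are hypothetical; the missing "else" branch is what makes the process hold its last value

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity d_latch is                           -- hypothetical name
  port (C, D  : in  STD_LOGIC;
        Q, Qn : out STD_LOGIC);
end entity d_latch;

architecture behavioral_sketch of d_latch is
begin
  LATCH : process (C, D)        -- D is in the list so Q tracks D while C = '1'
  begin
    if (C = '1') then
      Q  <= D;                  -- track
      Qn <= not D;
    end if;                     -- no else: Q and Qn hold when C = '0'
  end process LATCH;
end architecture behavioral_sketch;
```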
259
Flip Flops
• D-Flip-Flops
- we can combine D-latches to get an edge triggered storage device (or flop)
- the first D-latch is called the “Master”, the second D-latch the “Slave”
Master                     Slave
CLK=0, Q<=D “Open”         CLK=0, Q<=Q “Closed”
CLK=1, Q<=Q “Closed”       CLK=1, Q<=D “Open”
- on a rising edge of clock, D is “latched” and held on Q until the next rising edge
260
Flip Flops
• VHDL of a D-Flip-Flop
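- a minimal sketch of a rising-edge D-Flip-Flop
- the entity and architecture names are hypothetical; only the clock appears in the sensitivity list because D is sampled only at the edge

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity d_flip_flop is                       -- hypothetical name
  port (CLK, D : in  STD_LOGIC;
        Q, Qn  : out STD_LOGIC);
end entity d_flip_flop;

architecture behavioral_sketch of d_flip_flop is
begin
  FLOP : process (CLK)
  begin
    if (CLK'event and CLK = '1') then   -- rising edge of clock
      Q  <= D;
      Qn <= not D;
    end if;                             -- otherwise hold until the next rising edge
  end process FLOP;
end architecture behavioral_sketch;
```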
261
Registers
• Store a multi-bit encoded value
– One D-flipflop per bit
– Stores a new value on each clock cycle
262
Register with Enable
• Storage controlled by a clock-enable
– stores only when CE = 1 on a rising edge of the clock
– CE is a synchronous control input
• One flipflop per bit
– clk and CE wired in common
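– A minimal sketch of such a register; the entity name "reg_ce", the generic "n", and the 8-bit default width are hypothetical

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity reg_ce is                            -- hypothetical name
  generic (n : positive := 8);
  port (clk : in  STD_LOGIC;
        CE  : in  STD_LOGIC;
        D   : in  STD_LOGIC_VECTOR (n-1 downto 0);
        Q   : out STD_LOGIC_VECTOR (n-1 downto 0));
end entity reg_ce;

architecture behavioral_sketch of reg_ce is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if (CE = '1') then      -- synchronous clock-enable
        Q <= D;               -- all n flipflops load together
      end if;                 -- CE = '0': keep the stored value
    end if;
  end process;
end architecture behavioral_sketch;
```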
263
Example: Accumulator
• Sum a sequence of signed numbers
– A new number arrives when data_en = 1
– Clear sum to 0 on synch reset
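– A sketch of the accumulator, assuming the IEEE NUMERIC_STD package for the SIGNED type
– The 16-bit input width, the 20-bit sum width, and all port names except data_en are assumptions

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity accumulator is                       -- hypothetical name
  port (clk     : in  STD_LOGIC;
        reset   : in  STD_LOGIC;            -- synchronous, active high (assumption)
        data_en : in  STD_LOGIC;
        data_in : in  SIGNED (15 downto 0);
        sum     : out SIGNED (19 downto 0));
end entity accumulator;

architecture behavioral_sketch of accumulator is
  signal sum_reg : SIGNED (19 downto 0);    -- extra bits leave headroom before overflow
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if (reset = '1') then
        sum_reg <= (others => '0');
      elsif (data_en = '1') then
        -- sign-extend the new number to the width of the sum before adding
        sum_reg <= sum_reg + resize(data_in, sum_reg'length);
      end if;
    end if;
  end process;

  sum <= sum_reg;
end architecture behavioral_sketch;
```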
264
Flipflop and Register Variations
265
Shift Registers
• Performs shift operation on stored data
– Arithmetic scaling
– Serial transfer of data
• Example: Sequential Multiplier
– 16×16 multiply over 16 clock cycles, using one adder
– Shift register for multiplier bits
– Shift register for lsb’s of accumulated product
266
Counters
• Counters
- special name of any clocked sequential circuit whose state diagram is a circle
- there are many types of counters, each suited for particular applications
267
Counters
• Binary Counter
- state machine that produces a straight binary count
- the speed will be limited by the Setup/Hold and Combinational Delay of "F"
268
Counters
• Toggle Flop
- a D-Flip-Flop can produce a "Divide-by-2" effect by feeding back Qn to D
269
Counters
• Ripple Counter
- Cascaded Toggle Flops can
be used to form rippled counter
270
Counters
• Synchronous Counter with ENABLE
- an enable can be included in a "Synchronous" binary counter using Toggle Flops
- the enable is implemented by AND'ing the Q output prior to the next toggle flop
- this gives us the "ripple" effect, but also gives the ability to run synchronously
- a little faster, but still fewer gates than a straight binary circuit
271
Counters
• Shift Register
- a chain of D-Flip-Flops that
pass data to one another
272
Counters
• Ring Counter
- feeding the output of a
shift register back to the
input creates a "ring counter"
273
Counters
• Johnson Counter
- feeding the inverted output of a
shift register back to the
input creates a "Johnson Counter"
274
Counters
• Linear Feedback Shift Register (LFSR) Counter
- all of the counters based on shift registers give far fewer states than the 2^n counts that are possible
- for each size of shift register, a feedback equation is given which is the sum modulo 2 of a certain
set of output bits
- this type of counter can produce 2^n-1 counts, nearly the maximum possible
275
Counters
• Linear Feedback Shift Register (LFSR) Counter
- the feedback equations are listed in Table 8.26 of the textbook
- It is defined that bits always shift from Xn-1 to X0 (or Q0 to Qn-1) as we defined the shift
register previously
- they each use XOR gates (sum modulo 2) of particular bits in the register chain
ex)
n   Feedback Equation
2   X2 = X1 ⊕ X0
3   X3 = X1 ⊕ X0
4   X4 = X1 ⊕ X0
5   X5 = X2 ⊕ X0
6   X6 = X1 ⊕ X0
7   X7 = X3 ⊕ X0
8   X8 = X4 ⊕ X3 ⊕ X2 ⊕ X0
: :
: :
276
Counters
• Linear Feedback Shift Register (LFSR) Counter
ex) 4-flip-flop LFSR Counter
Feedback Equation = X1 ⊕ X0 (or Q2 ⊕ Q3 as we defined it)
# Q(0:3) Sin
0 1000 0
1 0100 0
2 0010 1
3 1001 1
4 1100 0
5 0110 1
6 1011 0
7 0101 1
8 1010 1
9 1101 1
10 1110 1
11 1111 0
12 0111 0
13 0011 0
14 0001 1 - this is 2^n-1 unique counts
repeat 1000
277
Counters
• Counters in VHDL
- strong type casting in VHDL can make modeling counters difficult (at first glance)
- the reason for this is that the STANDARD and STD_LOGIC_1164 Packages do not define
"+", "-", or inequality operators for BIT_VECTOR or STD_LOGIC_VECTOR types
278
Counters
• Counters in VHDL
- there are a couple ways that we get around this
279
Counters
• Counters in VHDL using STD_LOGIC_UNSIGNED
entity counter is
Port ( Clock : in STD_LOGIC;
Reset : in STD_LOGIC;
Direction : in STD_LOGIC;
Count_Out : out STD_LOGIC_VECTOR (3 downto 0));
end counter;
280
Counters
• Counters in VHDL using STD_LOGIC_UNSIGNED
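architecture counter_arch of counter is
signal count_temp : STD_LOGIC_VECTOR (3 downto 0); -- internal copy, since an "out" port cannot be read
begin
process (Clock, Reset)
begin
if (Reset = '0') then
count_temp <= "0000";
elsif (Clock='1' and Clock'event) then
if (Direction='0') then
count_temp <= count_temp + '1'; -- count_temp can be used on both LHS and RHS
else
count_temp <= count_temp - '1';
end if;
end if;
end process;
Count_Out <= count_temp; -- drive the output port from the internal signal
end counter_arch;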
281
Counters
• Counters in VHDL
2) Use integers for the counter and then convert back to STD_LOGIC_VECTOR
- SIZE is the number of bits in the vector to convert to, given as an integer
282
Counters
• Counters in VHDL using STD_LOGIC_ARITH
entity counter is
Port ( Clock : in STD_LOGIC;
Reset : in STD_LOGIC;
Direction : in STD_LOGIC;
Count_Out : out STD_LOGIC_VECTOR (3 downto 0));
end counter;
283
Counters
• Counters in VHDL using STD_LOGIC_ARITH
architecture counter_arch of counter is
signal count_temp : integer range 0 to 15; -- Notice internal integer specified with Range
begin
process (Clock, Reset)
begin
if (Reset = '0') then
count_temp <= 0; -- integer assignment doesn't require quotes
elsif (Clock='1' and Clock'event) then
if (count_temp = 15) then
count_temp <= 0; -- we manually check for overflow
else
count_temp <= count_temp + 1;
end if;
end if;
end process;
Count_Out <= conv_std_logic_vector (count_temp, 4); -- convert integer into a 4-bit STD_LOGIC_VECTOR
end counter_arch;
284
Counters
• Counters in VHDL
3) Use UNSIGNED data types for numbers
- STD_LOGIC_ARITH also defines "+", "-", and equality for UNSIGNED types
- the equality operators assume it is unsigned (as opposed to 2's comp SIGNED)
285
Counters
• Ring Counters in VHDL
- to mimic the shift register behavior, we need access to the signal value before and after clock'event
architecture ….
begin
Q0 <= Q3;
Q1 <= Q0;
Q2 <= Q1;
Q3 <= Q2;
end architecture…
286
Counters
• Ring Counters in VHDL
- since a process doesn't assign the signal values until it suspends, we can use this to model the
"before and after" behavior of a clock event.
- notice that the signals DO NOT appear in the sensitivity list. If they did the process would
continually execute and not be synthesized as a flip-flop structure
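- a sketch of the ring counter written as a clocked process, so that every Q is read before any of them is updated
- assumptions: an active-low asynchronous Reset that loads "1000", as in the counter examples; the entity name and the Count_Out port are hypothetical

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity ring_counter_4bit is                 -- hypothetical name
  port (Clock     : in  STD_LOGIC;
        Reset     : in  STD_LOGIC;
        Count_Out : out STD_LOGIC_VECTOR (0 to 3));
end entity ring_counter_4bit;

architecture behavioral_sketch of ring_counter_4bit is
  signal Q0, Q1, Q2, Q3 : STD_LOGIC;
begin
  process (Clock, Reset)        -- Q0..Q3 are deliberately NOT in the sensitivity list
  begin
    if (Reset = '0') then
      Q0 <= '1'; Q1 <= '0'; Q2 <= '0'; Q3 <= '0';   -- seed a single '1'
    elsif (Clock'event and Clock = '1') then
      -- right-hand sides use the values from before the clock edge;
      -- the new values are assigned when the process suspends
      Q0 <= Q3;
      Q1 <= Q0;
      Q2 <= Q1;
      Q3 <= Q2;
    end if;
  end process;

  Count_Out <= Q0 & Q1 & Q2 & Q3;
end architecture behavioral_sketch;
```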
287
Counters
• Johnson Counters in VHDL
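- a sketch of a 4-bit Johnson counter: the same shift process as a ring counter, but with the inverted last stage fed back
- assumptions: active-low asynchronous Reset to "0000"; the entity name and port names are hypothetical

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity johnson_counter_4bit is              -- hypothetical name
  port (Clock     : in  STD_LOGIC;
        Reset     : in  STD_LOGIC;
        Count_Out : out STD_LOGIC_VECTOR (0 to 3));
end entity johnson_counter_4bit;

architecture behavioral_sketch of johnson_counter_4bit is
  signal Q : STD_LOGIC_VECTOR (0 to 3);
begin
  process (Clock, Reset)
  begin
    if (Reset = '0') then
      Q <= "0000";
    elsif (Clock'event and Clock = '1') then
      Q <= (not Q(3)) & Q(0 to 2);   -- inverted last stage shifts into Q(0)
    end if;
  end process;

  Count_Out <= Q;
end architecture behavioral_sketch;
```

- starting from "0000" this steps through 1000, 1100, 1110, 1111, 0111, 0011, 0001 and back to 0000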
288
Counters
• Linear Feedback Shift Register Counters in VHDL
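- a sketch of the 4-flip-flop LFSR counter from the example, with Sin = Q2 xor Q3 shifted into Q0
- assumptions: active-low asynchronous Reset that seeds "1000" (the all-zero state would lock up); the entity name and port names are hypothetical

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity lfsr_counter_4bit is                 -- hypothetical name
  port (Clock     : in  STD_LOGIC;
        Reset     : in  STD_LOGIC;
        Count_Out : out STD_LOGIC_VECTOR (0 to 3));
end entity lfsr_counter_4bit;

architecture behavioral_sketch of lfsr_counter_4bit is
  signal Q : STD_LOGIC_VECTOR (0 to 3);     -- Q(0:3), shifting from Q0 toward Q3
begin
  process (Clock, Reset)
  begin
    if (Reset = '0') then
      Q <= "1000";                          -- any non-zero seed works
    elsif (Clock'event and Clock = '1') then
      Q <= (Q(2) xor Q(3)) & Q(0 to 2);     -- feedback bit enters at Q0
    end if;
  end process;

  Count_Out <= Q;
end architecture behavioral_sketch;
```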
289
Terminal Count and Divide by k
• TC is '1' for one cycle in every 2^n cycles
– frequency = clock frequency / 2^n
– Called a clock divider
• Decode k–1 as terminal count and reset counter register
– Counter increments modulo k
• Example: decade counter
– Terminal count (TC) = 9
• Decade Counter in VHDL
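– A sketch of a decade counter that decodes 9 as the terminal count and wraps to 0
– Assumptions: the IEEE NUMERIC_STD package, a synchronous active-high reset, and hypothetical entity and port names

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity decade_counter is                    -- hypothetical name
  port (clk   : in  STD_LOGIC;
        reset : in  STD_LOGIC;
        count : out UNSIGNED (3 downto 0);
        TC    : out STD_LOGIC);
end entity decade_counter;

architecture behavioral_sketch of decade_counter is
  signal cnt : UNSIGNED (3 downto 0);
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if (reset = '1' or cnt = 9) then
        cnt <= (others => '0');             -- wrap after k-1 = 9, so the count is modulo 10
      else
        cnt <= cnt + 1;
      end if;
    end if;
  end process;

  count <= cnt;
  TC    <= '1' when cnt = 9 else '0';       -- high for one cycle in every 10
end architecture behavioral_sketch;
```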
290
Loadable Counter in VHDL
• Load a starting value, then decrement
– Terminal count = 0
– Useful for interval timer
291
Reloading Counter in VHDL
292
State Machines
What is FSM?
• A model of computation consisting of
– a set of states, (limited number)
– a start state,
– input symbols,
– a transition function that maps input symbols and current states to a next
state.
294
Counters
• Multiple Processes
- we can now use State Machines to control the start/stop/load/reset of counters
- each are independent processes that interact with each other through signals
3) when it reaches the certain value, disable the counter and continue to the next state
- since the counter runs off of a clock, we know how long it will count between the start and stop
295
State Machines
• State Machines
- there is a basic structure for a Clocked, Synchronous State Machine
- if we keep this structure in mind while designing digital machines in VHDL, then it is a very
straightforward task
- Each of the parts of the State Machine are modeled with individual processes
- let’s start by reviewing the design of a state machine using a manual method
296
Elements of FSM
• Memory Elements (ME)
– Memorize Current States (CS)
– Usually consist of FF or latch
– N-bit FF have 2^N possible states
• Next-state Logic (NL)
– Combinational Logic
– Produce next state
• Based on current state (CS) and input (X)
• Output Logic (OL)
– Combinational Logic
– Produce outputs (Z)
• Based on current state, or
• Based on current state and input
297
Finite State Machine
• Used to control the circuit core
• Partition FSM and non-FSM part
298
Finite State Machines
• Synchronous (i.e. clocked) finite state machines (FSMs) have
widespread application in digital systems, e.g. as datapath
controllers in computational units and processors.
Synchronous FSMs are characterized by a finite number of
states and by clock-driven state transitions.
• Mealy Machine: The next state and the outputs depend on the
present state and the inputs.
• Moore Machine: The next state depends on the present state
and the inputs, but the output depends on only the present
state.
299
State Machines
• State Machines
“Mealy Outputs” – outputs depend on the Current_State and the Inputs
300
State Machines
• State Machines
“Moore Outputs” – outputs depend on the Current_State only
301
State Machines
• State Machines
- the steps in a state machine design are:
302
State Machines
• State Machine Example “Sequence Detector”
1) Design a machine by hand that takes in a serial bit stream and looks for the pattern “1011”.
When the pattern is found, a signal called “Found” is asserted
2) State Diagram
303
State Machines
• State Machine Example “Sequence Detector”
3) State/Output Table
Current_State  In  Next_State  Found
S0             0   S0          0
               1   S1          0
S1             0   S2          0
               1   S0          0
S2             0   S0          0
               1   S3          0
S3             0   S0          0
               1   S0          1
304
State Machines
• State Machine Example “Sequence Detector”
4) State Variable Assignment – let’s use binary
Q1 Q0  In  Q1* Q0*  Found
0  0   0   0   0    0
       1   0   1    0
0  1   0   1   0    0
       1   0   0    0
1  0   0   0   0    0
       1   1   1    0
1  1   0   0   0    0
       1   0   0    1
305
State Machines
• State Machine Example “Sequence Detector”
6) Construct Next State Logic “F”
K-map for Q1* (columns Q1 Q0, rows In)
In \ Q1Q0   00  01  11  10
0            0   1   0   0
1            0   0   0   1
Q1* = Q1’∙Q0∙In’ + Q1∙Q0’∙In
K-map for Q0* (columns Q1 Q0, rows In)
In \ Q1Q0   00  01  11  10
0            0   0   0   0
1            1   0   0   1
Q0* = Q0’∙In
306
State Machines
• State Machine Example “Sequence Detector”
7) Construct Output Logic “G”
K-map for Found (columns Q1 Q0, rows In)
In \ Q1Q0   00  01  11  10
0            0   0   0   0
1            0   0   1   0
Found = Q1∙Q0∙In
8) Logic Diagram
307
State Machines in VHDL
• State Memory
- we use a process that updates the “Current_State” with the “Next_State”
308
State Machines in VHDL
• State Memory using “User-Enumerated Data Types"
- we always want to use descriptive names for our states
309
State Machines in VHDL
• State Memory with “Synchronous RESET”
end if;
end process;
- this design will only observe RESET on the positive edge of clock (i.e., synchronous)
310
State Machines in VHDL
• State Memory with “Asynchronous RESET”
end if;
end process;
- this design is sensitive to both RESET and the positive edge of clock (i.e., asynchronous)
311
State Machines in VHDL
• Next State Logic “F”
- we use another process to construct “F”
312
State Machines in VHDL
• Next State Logic “F”
- the process will be combinational logic
end case;
end process;
313
State Machines in VHDL
• Output Logic “G”
- we use another process to construct “G”
- the expressions in the sensitivity list dictate Mealy/Moore type outputs
- for now, let’s use combinational logic for G (we’ll go sequential later)
314
State Machines in VHDL
• Output Logic “G”
- Mealy type outputs
end case;
end process;
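- putting the three processes together for the “1011” sequence detector designed by hand earlier, as a sketch
- assumptions: the serial input is named "In1" because "in" is a reserved word; an active-low asynchronous RESET is added; the entity name is hypothetical
- the transitions follow the State/Output Table as written (including S1 returning to S0 on a 1)

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity seq_detector is                      -- hypothetical name
  port (CLK, RESET : in  STD_LOGIC;
        In1        : in  STD_LOGIC;
        Found      : out STD_LOGIC);
end entity seq_detector;

architecture seq_detector_arch of seq_detector is
  type State_Type is (S0, S1, S2, S3);      -- user-enumerated states
  signal Current_State, Next_State : State_Type;
begin
  STATE_MEMORY : process (CLK, RESET)
  begin
    if (RESET = '0') then
      Current_State <= S0;
    elsif (CLK'event and CLK = '1') then
      Current_State <= Next_State;
    end if;
  end process;

  NEXT_STATE_LOGIC : process (Current_State, In1)   -- "F", combinational
  begin
    case (Current_State) is
      when S0 => if (In1 = '1') then Next_State <= S1; else Next_State <= S0; end if;
      when S1 => if (In1 = '0') then Next_State <= S2; else Next_State <= S0; end if;
      when S2 => if (In1 = '1') then Next_State <= S3; else Next_State <= S0; end if;
      when S3 => Next_State <= S0;
    end case;
  end process;

  OUTPUT_LOGIC : process (Current_State, In1)       -- "G", Mealy: depends on state AND input
  begin
    if (Current_State = S3 and In1 = '1') then
      Found <= '1';
    else
      Found <= '0';
    end if;
  end process;
end architecture seq_detector_arch;
```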
315
State Machines in VHDL
• Output Logic “G”
- Moore type outputs
end case;
end process;
316
State Machines in VHDL
• Example
- Let’s design a 2-bit Up/Down Gray Code Counter using User-Enumerated State Encoding
- In=0, Count Up
- In=1, Count Down
- this will be a Moore Type Machine
- no Reset
317
State Machines in VHDL
• Example
- let’s collect our thoughts using a State/Output Table
Current_State  In  Next_State  Out
CNT0           0   CNT1        00
               1   CNT3
CNT1           0   CNT2        01
               1   CNT0
CNT2           0   CNT3        11
               1   CNT1
CNT3           0   CNT0        10
               1   CNT2
318
State Machines in VHDL
• Example
architecture CNT_arch of CNT is
begin
STATE_MEMORY : process (CLK)
begin
if (CLK'event and CLK='1') then
Current_State <= Next_State;
end if;
end process;
end architecture;
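- a fuller sketch of the Gray Code Counter, adding the state type, Next State Logic, and Moore Output Logic around the State Memory process
- assumptions: the input is named "In1" because "in" is a reserved word; the output port "Count" and the architecture name "CNT_arch_sketch" are hypothetical
- with no Reset, simulation starts in CNT0 because it is the leftmost value of the enumerated type

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity CNT is
  port (CLK   : in  STD_LOGIC;
        In1   : in  STD_LOGIC;                      -- 0 = count up, 1 = count down
        Count : out STD_LOGIC_VECTOR (1 downto 0));
end entity CNT;

architecture CNT_arch_sketch of CNT is
  type State_Type is (CNT0, CNT1, CNT2, CNT3);
  signal Current_State, Next_State : State_Type;
begin
  STATE_MEMORY : process (CLK)
  begin
    if (CLK'event and CLK = '1') then
      Current_State <= Next_State;
    end if;
  end process;

  NEXT_STATE_LOGIC : process (Current_State, In1)
  begin
    case (Current_State) is
      when CNT0 => if (In1 = '0') then Next_State <= CNT1; else Next_State <= CNT3; end if;
      when CNT1 => if (In1 = '0') then Next_State <= CNT2; else Next_State <= CNT0; end if;
      when CNT2 => if (In1 = '0') then Next_State <= CNT3; else Next_State <= CNT1; end if;
      when CNT3 => if (In1 = '0') then Next_State <= CNT0; else Next_State <= CNT2; end if;
    end case;
  end process;

  OUTPUT_LOGIC : process (Current_State)            -- Moore: state only
  begin
    case (Current_State) is
      when CNT0 => Count <= "00";
      when CNT1 => Count <= "01";
      when CNT2 => Count <= "11";
      when CNT3 => Count <= "10";
    end case;
  end process;
end architecture CNT_arch_sketch;
```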
319
State Machines in VHDL
• Example
- in the lab, we may want to observe the states on the LEDs
- in this case we want to explicitly encode the STATE variables
320
State Encoding
• State Variable Encoding
- we can decide how we encode our state variables
- there are advantages/disadvantages to different techniques
• Binary Encoding
- straight encoding of states
S0 = “00”
S1 = “01”
S2 = “10”
S3 = “11”
- Drawbacks: - multiple bits switch at the same time = Increased Noise & Power
- the Next State Logic “F” is multi-level = Increased Power and Reduced Speed
• Gray-Code Encoding
- encoding using a Gray code where only one bit switches at a time
S0 = “00”
S1 = “01”
S2 = “11”
S3 = “10”
- this gives low Power and Noise due to only one bit switching
- Drawbacks: - the Next State Logic “F” is multi-level = Increased Power and Reduced Speed
• One-Hot Encoding
- encoding one flip-flop for each state
S0 = “0001”
S1 = “0010”
S2 = “0100”
S3 = “1000”
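One way to encode the state variables explicitly is to replace the enumerated type with a subtype and named constants. The sketch below uses the Gray-code values listed earlier; the package name and the subtype name are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package: explicit Gray-code state encoding.
package state_encoding_sketch is
  subtype State_Type is std_logic_vector(1 downto 0);
  constant S0 : State_Type := "00";
  constant S1 : State_Type := "01";
  constant S2 : State_Type := "11";
  constant S3 : State_Type := "10";
end package;
```

Switching to one-hot would mean widening the subtype to 4 bits and changing the constants to "0001", "0010", "0100", and "1000". Case statements over such a subtype need a `when others` branch, because a std_logic_vector can take values beyond the named constants.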
• State Encoding Trade-Offs
- We typically trade off Speed, Area, and Power
[Figure: trade-off chart placing One-Hot, Binary, and Gray encodings against speed, area, and power]
Mealy Finite State Machine
• A serially-transmitted BCD (8421 code) word is to be
converted into an Excess-3 code. An Excess-3 code word
is obtained by adding 3 to the decimal value and taking
the binary equivalent. Excess-3 code is self-complementing
[Wakerly, p. 80], i.e. the 9's complement of a code word is
obtained by complementing the bits of the word.
• The serial code converter is described by the state transition
graph of a Mealy FSM below
• The vertices of the state transition graph of a Mealy machine
are labeled with the states.
• The branches are labeled with (1) the input that causes a
transition to the indicated next state, and (2) with the output
that is asserted in the present state for that input.
• The state transition is synchronized to a clock.
• The state table summarizes the machine's behavior in tabular
format.
Design of Mealy Finite State Machine
Example: Design of A Serial Line Code Converter
Pipelined Outputs
• Pipelined Outputs
- Having combinational logic drive outputs can lead to:
- Both reduce the speed at which the system clock can be run
- A good design practice is to pipeline the outputs (i.e., use DFF’s as the output driver)
- This gives a smaller Data Uncertainty window on the output
- The only consideration is that the output is not present until one clock cycle later
- we use a 4th process for this stage of the State Machine
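A sketch of that fourth process follows. The entity and the signal names G_out and Dout are hypothetical; G_out stands for whatever the combinational output logic "G" produces.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity output_pipeline_sketch is
  port (CLK   : in  std_logic;
        G_out : in  std_logic;    -- hypothetical: result of combinational logic "G"
        Dout  : out std_logic);   -- registered (pipelined) output
end entity;

architecture rtl of output_pipeline_sketch is
begin
  -- A D flip-flop drives the output pin, so the output changes
  -- only just after the clock edge.
  PIPELINED_OUTPUT : process (CLK)
  begin
    if (CLK'event and CLK = '1') then
      Dout <= G_out;              -- appears one clock cycle after G_out is valid
    end if;
  end process;
end architecture;
```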
Asynchronous Inputs
• Asynchronous Inputs
- Real world inputs are not phase-locked to the clock
- this means an input can change within the Setup/Hold window of the clock
- We use D-Flip-Flops to take in the input
- with one D-Flip-Flop, the input can still occur within the Setup/Hold window
- the output of the first DFF may be metastable for a moment of time (trecovery)
- a second DFF is used to latch in the metastable input after it has had time to settle
- the output of the second flip-flop is now stable and synchronized as long as:
- where tcomb is the delay of any combinational logic in the input path
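The two-flip-flop arrangement described above can be sketched as follows. The entity and signal names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity synchronizer_sketch is
  port (CLK      : in  std_logic;
        Async_In : in  std_logic;    -- not phase-locked to CLK
        Sync_Out : out std_logic);   -- safe to use in the clocked design
end entity;

architecture rtl of synchronizer_sketch is
  signal Meta : std_logic;           -- output of the first DFF, may be metastable
begin
  SYNC : process (CLK)
  begin
    if (CLK'event and CLK = '1') then
      Meta     <= Async_In;          -- first DFF samples the raw input
      Sync_Out <= Meta;              -- second DFF samples it one cycle later
    end if;
  end process;
end architecture;
```

Both assignments read the values present before the clock edge, so Sync_Out always receives the previous value of Meta. That gives the first flip-flop a full clock period to settle.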
Comparison of Binary and Onehot Style
• Binary-encoded FSM
– fewer flip-flops for state register
– = ⌈log2(number of states)⌉
• Onehot-encoded FSM
– more flip-flops for state register
– = number of states
A Simple Design Example:
Level-to-Pulse Converter
Datapaths and Control
• Digital systems perform sequences of operations on encoded
data
• Digital hardware systems = data-path + control
• Datapath: registers, counters, combinational functional units
(e.g., ALU), communication (e.g., busses)
– Combinational circuits for operations
– Registers for storing intermediate results
• Control section: control sequencing (FSM generating
sequences of control signals that instructs datapath what to do
next)
– Generates control signals
• Selecting operations to perform
• Enabling registers at the right times
– Uses status signals from datapath
Review of FSM Design
• FSM Design
– Partition FSM and non-FSM logic
– Partition combinational part and sequential part
– Use parameter to define names of the state vector
– Assign a default (reset) state
Homework
• Design a traffic signal controller at crossroads
• Other example:
– Automatic Vending Machine
– Automatic Teller Machine
Project Example:
DataPath - Digital combination lock
(Verilog)
Digital combination lock
• Door combination lock:
– punch in 3 values in sequence and the door opens; if there is an error
the lock must be reset; once the door opens the lock must be reset
– inputs: sequence of input values, reset
– outputs: door open/close
– memory: must remember combination or always have it available
– open questions: how do you set the internal combination?
• stored in registers (how loaded?)
• hardwired via switches set by user
Digital combination lock
Implementation in software
Determining details of the specification
• How many bits per input value?
• How many values in sequence?
• How do we know a new input value is entered?
• What are the states and state transitions of the system?
Digital combination lock state diagram
• States: 5 states
– represent point in execution of machine
– each state has outputs
• Transitions: 6 from state to state, 5 self transitions, 1 global
– changes of state occur when the clock says it's OK
– based on value of inputs
• Inputs: reset, new, results of comparisons
• Output: open/closed
Digital combination lock
(state encoding)
• Verilog description including state encoding
module string (clk, value, new, rst, open);
  // "string" and "new" are reserved words in SystemVerilog; compile as plain Verilog
  input        clk, rst, new;
  input  [3:0] value;
  output       open;

  reg [2:0] state;

  // state encoding
  `define S1   3'b000
  `define S2   3'b001
  `define S3   3'b010
  `define OPEN 3'b011
  `define ERR  3'b100

  // combination values
  `define C1 4'b1101
  `define C2 4'b0111
  `define C3 4'b0100

  assign open = (state == `OPEN);

  always @(posedge clk) begin
    if (rst) state <= `S1;
    else
      case (state)
        // hold until a new value is entered, then advance on a match or go to ERR
        `S1:   if (new) state <= (value == `C1) ? `S2   : `ERR;
        `S2:   if (new) state <= (value == `C2) ? `S3   : `ERR;
        `S3:   if (new) state <= (value == `C3) ? `OPEN : `ERR;
        `OPEN: state <= `OPEN;
        `ERR:  state <= `ERR;
        default: begin
          $display("invalid state reached");
          state <= 3'bxxx;
        end
      endcase
  end
endmodule
Data-path and control structure
State table for combination lock
• Finite-state machine
– refine state diagram to take internal structure into account
– state table ready for encoding
reset  new  equal  state  next state  mux  open/closed
1      –    –      –      S1          C1   closed
0      0    –      S1     S1          C1   closed
0      1    0      S1     ERR         C1   closed
0      1    1      S1     S2          C1   closed
0      0    –      S2     S2          C2   closed
0      1    0      S2     ERR         C2   closed
0      1    1      S2     S3          C2   closed
0      0    –      S3     S3          C3   closed
0      1    0      S3     ERR         C3   closed
0      1    1      S3     OPEN        C3   closed
0      –    –      OPEN   OPEN        –    open
Encodings for combination lock
• Encode state table
– state can be: S1, S2, S3, OPEN, or ERR
• needs at least 3 bits to encode: 000, 001, 010, 011, 100
• and as many as 5: 00001, 00010, 00100, 01000, 10000
• choose 4 bits: 0001, 0010, 0100, 1000, 0000
– output mux can be: C1, C2, or C3
• needs 2 to 3 bits to encode
• choose 3 bits: 001, 010, 100
– output open/closed can be: open or closed
• needs 1 or 2 bits to encode
• choose 1 bit: 1, 0
Data-path implementation for combination lock
• Multiplexer
– easy to implement as combinational logic when few inputs
– logic can easily get too big for most PLDs
– output mux can be: C1, C2, or C3
– 3 mux control bits: 001, 010, 100
[Figure: 4-bit multiplexer selecting C1, C2, or C3 under mux control, feeding a comparator with the 4-bit value to produce equal; bit-slice detail for 0 ≤ i ≤ 3 shows Value[i], C1[i], C2[i], C3[i]]
Data-path implementation (cont’d)
• Tri-state logic
– utilize a third output state: “no connection” or “float”
– connect outputs together as long as only one is “enabled”
– open-collector gates can only output 0, not 1
• can be used to implement logical AND with only wires
[Figure: tri-state drivers (can disconnect from output) place C1, C2, or C3 (4 bits each) under mux control onto a shared 4-bit line feeding the value comparator, which produces equal; bit-slice detail for 0 ≤ i ≤ 3 shows Value[i], C1[i], C2[i], C3[i] on an open-collector connection (zero whenever one connection is zero, one otherwise – wired AND)]
Tri-state gates
• The third value
– logic values: “0”, “1”
– don't care: “X” (must be 0 or 1 in real circuit!)
– third value or state: “Z” — high impedance, infinite R, no connection
• Tri-state gates
– additional input – output enable (OE)
– output values are 0, 1, and Z
– when OE is high, the gate functions normally
– when OE is low, the gate is disconnected from wire at output
– allows more than one gate to be connected to the same output wire
• as long as only one has its output enabled at any one time (otherwise, sparks
could fly)
Tri-state and multiplexing
• When using tri-state logic
– (1) make sure never more than one "driver" for a wire at any one time
(pulling high and low at the same time can severely damage circuits)
– (2) make sure to only use the value on a wire when it's being driven (using a
floating value may cause failures)
• Using tri-state gates to implement an economical multiplexer
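A tri-state multiplexer for the combination lock can be sketched as three drivers sharing one resolved signal. The entity name and the mapping of control bits to C1, C2, and C3 are assumptions; the one-hot control values 001, 010, 100 come from the encoding slide.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity tristate_mux_sketch is
  port (C1, C2, C3  : in  std_logic_vector(3 downto 0);
        Mux_Control : in  std_logic_vector(2 downto 0);   -- one-hot: 001, 010, 100
        Mux_Out     : out std_logic_vector(3 downto 0));
end entity;

architecture rtl of tristate_mux_sketch is
begin
  -- Each concurrent assignment is a separate driver on Mux_Out.
  -- A disabled driver outputs 'Z' (high impedance), so at most one
  -- driver is active when Mux_Control is one-hot.
  Mux_Out <= C1 when Mux_Control(0) = '1' else (others => 'Z');
  Mux_Out <= C2 when Mux_Control(1) = '1' else (others => 'Z');
  Mux_Out <= C3 when Mux_Control(2) = '1' else (others => 'Z');
end architecture;
```

If two control bits were ever high together with conflicting data, the std_logic resolution function would yield 'X' in simulation, which is the model's version of the driver conflict warned about above.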
Open-collector gates and wired-AND
• Open collector: another way to connect gate outputs to the same wire
– gate only has the ability to pull its output low
– it cannot actively drive the wire high (default – pulled high through resistor)
• Wired-AND can be implemented with open collector logic
– if A and B are "1", output is actively pulled low
– if C and D are "1", output is actively pulled low
– if one gate output is low and the other high, then low wins
– if both gate outputs are "1", the wire value "floats", pulled high by resistor
• low to high transition usually slower than it would have been with a gate pulling
high
– hence, the two NAND functions are ANDed together
Equivalent circuits
Digital combination lock (new data-path)
• Decrease number of inputs
• Remove 3 code digits as inputs
– use code registers
– make them loadable from value
– need 3 load signal inputs (net gain in input (4*3)–3=9)
• could be done with 2 signals and decoder
(ld1, ld2, ld3, load none)
Complex Datapath
Complex Multiplier Datapath
Complex Multiplier in VHDL
Multiplier Control Sequence
• Avoid resource conflict
• First attempt
– 1. a_r * b_r → pp1_reg
– 2. a_i * b_i → pp2_reg
– 3. pp1 – pp2 → p_r_reg
– 4. a_r * b_i → pp1_reg
– 5. a_i * b_r → pp2_reg
– 6. pp1 + pp2 → p_i_reg
• Takes 6 clock cycles
• Merge steps where no resource conflict
• Revised attempt
– 1. a_r * b_r → pp1_reg
– 2. a_i * b_i → pp2_reg
– 3. pp1 – pp2 → p_r_reg
     a_r * b_i → pp1_reg
– 4. a_i * b_r → pp2_reg
– 5. pp1 + pp2 → p_i_reg
• Takes 5 clock cycles
Finite-State Machines
• Used to implement control sequencing
– Based on mathematical automaton theory
• A FSM is defined by
– set of inputs: Σ
– set of outputs: Γ
– set of states: S
– initial state: s0 ∈ S
– transition function: δ: S × Σ → S
– output function: ω: S × Σ → Γ or ω: S → Γ
• FSM in Hardware
FSM Example: Multiplier Control
• One state per step
– Separate idle state?
– Wait for input_rdy = '1'
– Then proceed to steps 1, 2, ...
– But this wastes a cycle!
• Use step 1 as idle state
– Repeat step 1 if input_rdy ≠ '1'
– Proceed to step 2 otherwise
• Output function
– Defined by table on slide 43
– Moore or Mealy?
FSMs in VHDL
• Use an enumeration type for state values
– abstract, avoids specifying encoding
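As an illustration, the five-step multiplier control sequence could use an enumeration type like this. The sketch covers the state register and transition function only. The entity name, the state names, and the synchronous reset are assumptions, and the output function is left out because its control-signal table is not reproduced in this section.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity multiplier_control_sketch is
  port (clk, reset : in std_logic;    -- reset is an assumption
        input_rdy  : in std_logic);
end entity;

architecture fsm of multiplier_control_sketch is
  -- abstract state values; the synthesis tool chooses the encoding
  type multiplier_state is (step1, step2, step3, step4, step5);
  signal current_state, next_state : multiplier_state;
begin
  state_reg : process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        current_state <= step1;
      else
        current_state <= next_state;
      end if;
    end if;
  end process;

  next_state_logic : process (current_state, input_rdy)
  begin
    case current_state is
      when step1 =>                    -- step 1 doubles as the idle state
        if input_rdy = '1' then
          next_state <= step2;
        else
          next_state <= step1;
        end if;
      when step2 => next_state <= step3;
      when step3 => next_state <= step4;
      when step4 => next_state <= step5;
      when step5 => next_state <= step1;
    end case;
  end process;
end architecture;
```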
Multiplier Control in VHDL
Multiplier Control Diagram
• Input: input_rdy
• Outputs
– a_sel, b_sel, pp1_ce, pp2_ce, sub, p_r_ce, p_i_ce
Bubble Diagrams or VHDL?
• Many CAD tools provide editors for bubble diagrams
– Automatically generate VHDL for simulation and synthesis
• Diagrams are visually appealing
– but can become unwieldy for complex FSMs
• Your choice...
– or your manager's!
Verifying Sequential Circuits
• DUV may take multiple and varying number of cycles to
produce output
• Checker needs to
– synchronize with test generator
– ensure DUV outputs occur when expected
– ensure DUV outputs are correct
– ensure no spurious outputs occur
Computer Systems
• Agenda
1. Memory
2. Von Neumann Architecture
3. Sequence Controllers
4. Processing Units & Register Modeling
Memory
• Memory Types
Notes on definitions:
2) ROM memory typically refers to storage that can't be written during program execution.
It can hold program and data information, but under normal operation a CPU doesn't
use it for variable storage.
As Flash EEPROM gets faster and more reliable, it may come to be used as RAM
Memory - SRAM
- SRAM is volatile memory (i.e., if the power is removed, the information is lost)
- two NMOS transistors acting as switches are used to Read and Write the stored data
• SRAM Addressing
- we configure the cells into an array
Row Address
Column Address
- The Word Lines are used to address a row of cells
- The Bit Lines are used to address a column in addition to reading and writing
• SRAM Reading
- The capacitance of the Bit Lines can be very large due to multiple cells being attached
- This creates a problem during a READ because the small cell will need to drive this large capacitance
- In order to design a usable SRAM cell, we must meet the condition that:
"Reading the value does NOT destroy the contents of the cell"
- Let's look at what happens during a read to see how to meet this condition
Reading a '0'
• SRAM Writing
- when writing to the SRAM cell, we inject full swing digital signals onto BL and BL'.
- when we assert the Word Lines, M3 and M4 will open and attempt to change the state of
the cell.
Memory - DRAM
- DRAM uses a capacitor to store the value of the digital information (instead of an inverter loop)
• DRAM Operation
- When the cell is addressed, the charge on the storage capacitor (CS) is dumped onto the bit line (BL)
- To reduce the amount of charge the cell has to provide, the bit line capacitance (CBL) is
pre-charged to VDD/2
- When the NMOS switch closes, the two capacitances will share their charge and settle to a readable
level by amplifiers
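The charge sharing can be written out with conservation of charge. Here V_S, the voltage stored on C_S before the read, is a symbol introduced for this sketch, and the numbers in the example are illustrative only.

```latex
V_{BL,\mathrm{final}} = \frac{C_S V_S + C_{BL}\,\frac{V_{DD}}{2}}{C_S + C_{BL}},
\qquad
\Delta V = V_{BL,\mathrm{final}} - \frac{V_{DD}}{2}
         = \frac{C_S}{C_S + C_{BL}}\left(V_S - \frac{V_{DD}}{2}\right)
```

For example, with C_S = 30 fF, C_BL = 300 fF, V_DD = 1.8 V, and a stored '1' at V_S = V_DD, the bit line moves by ΔV = (30/330)(0.9 V) ≈ 82 mV. The signal is this small because C_BL is much larger than C_S, which is why amplifiers are needed to read it.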
Memory – ROM
• Nonvolatile Memory
- SRAM and DRAM are attractive due to their speed
- however, they are volatile which means when the power is removed, the data is lost
- for a microcomputer, we need a nonvolatile storage device so that upon power-up, the
computer knows what to do.
- before looking at the details of a Flash transistor, let’s first look at the different types
of ROM arrays and addressing modes
• ROM Arrays
- There are two basic types of ROM arrays
1) NOR-based ROM
2) NAND-based ROM
• NOR-based ROM
- All Column Lines are pulled-up using a PMOS transistor (or resistor)
- The Row Lines are connected to the gates of NMOS transistors at the intersection of
Row and Column Lines
- If the NMOS transistor is present, it will pull down the Column Line when its gate is
driven high by the Row Line
- if the NMOS transistor is absent, the Column Line will not be pulled down, so it will remain
pulled up by the PMOS’s
- In order to Read from the array, the Row line is asserted and the desired Column line is observed
• NAND-based ROM
- NAND-based ROM is a different array architecture
- In this configuration, if an NMOS is present, it will represent a “stored 1”, since in order
to address its location, the Row line is driven to a ‘0’ and the NMOS is not turned on.
This leaves the Column line pulled HIGH

Stored value                      NOR   NAND
NMOS present                      0     1
NMOS absent                       1     0

Row addressing                    NOR   NAND
Address Row Line by driving:      1     0
All other Row Lines driven to:    0     1
Memory – Flash
- this transistor is constructed such that the threshold of the device can be changed in-system
- if the threshold can be raised and lowered, this allows the transistor within the ROM array
to either be:
“present” i.e., Normal Row addressing will turn the device ON (VRow-HIGH>VT,n)
or
“absent” i.e., Normal Row addressing is not high enough to turn the device on (VRow-HIGH<VT,n)
- if this threshold change can be accomplished after fabrication, this allows a reconfigurable
ROM device that is nonvolatile, reusable, and programmable with electricity (i.e., EEprom)
- the Floating Gate is separated from the semiconductor substrate using a “Thin Tunneling Oxide”
- On top of the Floating Gate, a thick Dielectric is grown and another Control Gate is patterned
- if charge accumulates at the Floating Gate, this in effect makes the thin dielectric a better conductor
- If the thin dielectric becomes a conductor, this is the same as moving the functional Gate further
away from the substrate
- this makes it more difficult to create a channel in the substrate (i.e., VT,n gets higher)
- if we apply a high voltage across the Source and Drain (VD=6v), electrons near the Drain
region will receive enough energy to form electron/hole pairs
- if we apply a high voltage at the Gate (VG=12v), the hot electrons in the substrate
will be attracted to the gate
- since the electron/holes have enough energy to move freely, electrons will tunnel into the thin oxide
and holes will tunnel into the substrate
- if the Gate is grounded and a high voltage (12v) is applied to the Source, the electrons in the
Floating Gate will be ejected out of the dielectric and into the Source
- this has the effect of restoring the insulating ability of the Thin Dielectric and effectively moves
the functional gate of the transistor closer to the substrate
- this makes it easier to create a channel in the substrate (i.e., VT,n gets lower)
- If we position the threshold voltage at a raised level (>VDD), then a standard signal level
at the gate will NOT be able to turn on the transistor
• NAND/NOR Flash
- we can use Flash Cells in a NOR or NAND Array to implement an EEPROM
- the Flash Cell requires one additional line on the Source of each transistor in order to accomplish
the programming and erasing.
- this is a specific type of EEPROM and is cheaper to fabricate due to less programming circuitry
[Figure: NOR Flash and NAND Flash arrays]
Memory in VHDL
• Memory in VHDL
– This data type can then be used to define either a signal (for RAM) or
constant (for ROM)
• RAM in VHDL
entity ram_256x8_sync is
  port (clock    : in  std_logic;
        data_in  : in  std_logic_vector(7 downto 0);
        write    : in  std_logic;
        address  : in  std_logic_vector(7 downto 0);
        data_out : out std_logic_vector(7 downto 0));
end entity;

architecture rtl of ram_256x8_sync is
-- The type declaration in this architecture defines a new data type called “ram_type”,
-- which is a 2D array that is 256x8 of STD_LOGIC_VECTOR
begin
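The fragments above can be assembled into one complete sketch of the synchronous 256x8 RAM. The entity, the port names, and the ram_type description come from the slides; the signal name RAM matches the later memory-mapping fragment. The process label is an assumption, and ieee.numeric_std conversion is used here in place of the conv_integer call seen later.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity ram_256x8_sync is
  port (clock    : in  std_logic;
        data_in  : in  std_logic_vector(7 downto 0);
        write    : in  std_logic;
        address  : in  std_logic_vector(7 downto 0);
        data_out : out std_logic_vector(7 downto 0));
end entity;

architecture rtl of ram_256x8_sync is
  -- 256 locations, each an 8-bit std_logic_vector
  type ram_type is array (0 to 255) of std_logic_vector(7 downto 0);
  signal RAM : ram_type;       -- a signal for RAM; a constant would model ROM
begin
  MEMORY : process (clock)
  begin
    if (clock'event and clock = '1') then
      if (write = '1') then
        RAM(to_integer(unsigned(address))) <= data_in;    -- synchronous write
      else
        data_out <= RAM(to_integer(unsigned(address)));   -- synchronous read
      end if;
    end if;
  end process;
end architecture;
```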
Memory Mapping
• Memory Mapping
if ((address >= 128 and address <= 191) and (write = '1')) then
RAM(conv_integer(address)) <= data_in;
elsif (address >= 128 and address <= 191) then
data_out <= RAM(conv_integer(address));
end if;
end if;
end process;
More Details of Using VHDL
for Memories
Portions of this work are from the book, Digital Design: An Embedded
Systems Approach Using VHDL, by Peter J. Ashenden, published by Morgan
Kaufmann Publishers, Copyright 2007 Elsevier Inc. All rights reserved.
VHDL
General Concepts
A memory is an array of storage locations
Each with a unique address
Like a collection of registers, but with optimized implementation
Address is unsigned-binary encoded
n address bits ⇒ 2^n locations
All locations the same size
2^n × m bit memory
[Figure: memory array of m-bit locations with addresses 0 through 2^n–1]
Memory Sizes
Use power-of-2 multipliers
Kilo (K): 2^10 = 1,024 ≈ 10^3
Mega (M): 2^20 = 1,048,576 ≈ 10^6
Example
32K × 32-bit memory
Capacity = 1,024K = 1Mbit
Requires 15 address bits
Size is determined by application
requirements
Wider Memories
Memory components have a fixed width
E.g., ×1, ×4, ×8, ×16, ...
Use memory components in parallel to make a wider memory
E.g., three 16K×16 components for a 16K×48 memory
[Figure: three 16K×16 components share en, wr, and a(13…0); each takes one 16-bit slice of d_in(47…0) and drives the matching slice of d_out(47…0)]
More Locations
To provide 2^n locations with 2^k-location components
Use 2^n/2^k components
Address A
at offset A mod 2^k
least-significant k bits of A
in component ⌊A/2^k⌋
most-significant n–k bits of A
decode to select component
[Figure: address map from 0 to 2^n–1 divided into 2^k-location components; the most-significant n–k address bits go to a decoder that drives the chip enables of all memory chips, and the least-significant k bits go to the address bus]
Example: 64K×8 memory composed of 16K×8 components
[Figure: four 16K×8 components share wr, a(13…0), and d_in(7…0); a(15…14) drives a decoder (outputs 0–3, gated by en) for the component en inputs and a 4-way selector for d_out(7…0)]
Tristate Drivers
Allow multiple outputs to be connected together
Only one active at a time
Remaining outputs are high-impedance
Both output transistors turned off
Allow bidirectional input/output ports
During write
memory senses d
During read
selected memory drives d
Fewer pins and wires
[Figure: four memory components with shared wr, a(13…0), and bidirectional d(7…0); a decoder on a(15…14), gated by en, selects the component]
Memory Types
Random-Access Memory (RAM)
Can read and write
Static RAM (SRAM)
Stores data so long as power is supplied
Asynchronous SRAM: not clocked
Asynchronous SRAM
Data stored in 1-bit latch cells
Address decoded to enable a given cell
Usually use active-low control inputs
Not available as components in ASICs or FPGAs
[Figure: asynchronous SRAM symbol with A, D, CE, WE, and OE, and write/read timing showing t_su and t_h around the stored data and the read data]
Multiplier Datapath
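[Figure: datapath with an SSRAM holding c_i (address i, data c_in, controls c_ram_en and c_ram_wr), a register for x (x_ce), a multiplier with input multiplexers controlled by mult_sel, and a register for y (y_ce); step 1 (mult_sel = 0) computes c_i × x, step 2 (mult_sel = 1) computes x × (c_i × x)]
[Figure: control state diagram and timing with input start and outputs c_ram_en, x_ce, mult_sel, y_ce: step1 = 1, 1, 0, 0; step2 = 0, 0, 0, 1; step3 = 0, 0, 1, 1]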
Pipelined SSRAM
Data output also has a register
More suitable for high-speed systems
Access RAM in one cycle, use the data in the next
[Figure: timing diagram of en, wr, D_in, and D_out]
Memories in VHDL
RAM storage represented by an array signal
    if c_ram_en = '1' then
      if c_ram_wr = '1' then
        c_RAM(to_integer(i)) <= c_in;     -- Store (and use) the scaling values
        c_out <= c_in;
      else
        c_out <= c_RAM(to_integer(i));    -- Use the previously stored values.
      end if;
    end if;
  end if;
end process c_RAM_flow_through;
Multiport Memories
Multiple address, data and control
connections to the storage locations
Allows concurrent accesses
Avoids multiplexing and sequencing
Scenario
Data producer and data consumer
What if two writes to a location occur
concurrently?
Result may be unpredictable
Some multi-port memories include an arbiter
FIFO Memories
First-In/First-Out buffer
Connecting producer and consumer
Decouples rates of production/consumption
Producer Consumer
FIFO
subsystem subsystem
Implementation using dual-port RAM
Circular buffer
Full: write_addr = read_addr (reached while filling)
Empty: write_addr = read_addr (reached while emptying)
[Figure: two 8-bit counters generate A_wr (enabled by wr_en) and A_rd (enabled by rd_en) for a dual-port SSRAM; a comparator produces equal when A_wr = A_rd]
[Figure: two-state control: wr_en, rd_en = 1, 0 moves to filling; wr_en, rd_en = 0, 1 moves to emptying]
full = filling and equal
empty = emptying and equal
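The full and empty flags can be told apart with a small two-state machine that remembers whether the last operation was a write or a read. The sketch below follows the filling/emptying relations; the entity name, the reset behavior, and the choice to take equal as an input (rather than comparing the addresses internally) are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity fifo_status_sketch is
  port (clk, reset   : in  std_logic;
        wr_en, rd_en : in  std_logic;
        equal        : in  std_logic;    -- '1' when A_wr = A_rd
        full, empty  : out std_logic);
end entity;

architecture rtl of fifo_status_sketch is
  type status_state is (emptying, filling);
  signal current_state : status_state;
begin
  status_reg : process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        current_state <= emptying;       -- assumed: a reset FIFO is empty
      elsif wr_en = '1' and rd_en = '0' then
        current_state <= filling;        -- last operation was a write
      elsif wr_en = '0' and rd_en = '1' then
        current_state <= emptying;       -- last operation was a read
      end if;                            -- otherwise hold the state
    end if;
  end process;

  full  <= '1' when current_state = filling  and equal = '1' else '0';
  empty <= '1' when current_state = emptying and equal = '1' else '0';
end architecture;
```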
DRAM Refresh
Charge on capacitor decays over time
Need to sense and rewrite periodically
Typically every cell every 64ms
Refresh each location
DRAMs organized into banks of rows
Refresh whole row at a time
Can’t access while refreshing
Interleave refresh among accesses
Or burst refresh every 64ms
DDR DRAM
Feature                                                          DDR    DDR2   DDR3
Voltage                                                          2.5V   1.8V   1.5V
Max data rate per I/O pin (Mbits/sec)                            800    1066   1600
Peak Bandwidth (Gbytes/sec for a 32 bit data bus)                3.2    4.2    6.4
Sustained Bandwidth (Gbytes/sec for a 32 bit data bus) - (60%)   1.9    2.5    3.8
Max Density (Gbits per device)                                   1      4      4
Combinational ROM
A ROM maps address input to data output
This is a combinational function!
Specify using a table
Example: 7-segment decoder
Inputs: BCD0–BCD3 → A0–A3, blank → A4
Outputs: D0–D6 → segments a–g
Address  Content    Address  Content
0        0111111    6        1111101
1        0000110    7        0000111
2        1011011    8        1111111
3        1001111    9        1101111
4        1100110    10-15    1000000
5        1101101    16-31    0000000
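The table can be written directly as a constant array, which is how a combinational ROM might be sketched in VHDL. The entity, port, and type names are hypothetical; the contents are copied from the table, with bit 6 down to bit 0 corresponding to D6 (segment g) down to D0 (segment a).

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity seven_seg_rom_sketch is
  port (address : in  std_logic_vector(4 downto 0);    -- A4 = blank, A3..A0 = BCD
        data    : out std_logic_vector(6 downto 0));   -- D6..D0 = segments g..a
end entity;

architecture rtl of seven_seg_rom_sketch is
  type rom_type is array (0 to 31) of std_logic_vector(6 downto 0);
  constant ROM : rom_type := (
    0  => "0111111",  1 => "0000110",  2 => "1011011",  3 => "1001111",
    4  => "1100110",  5 => "1101101",  6 => "1111101",  7 => "0000111",
    8  => "1111111",  9 => "1101111",
    10 to 15 => "1000000",     -- invalid BCD codes: segment g only
    16 to 31 => "0000000");    -- blank input set: all segments off
begin
  -- no clock: the output is a pure function of the address
  data <= ROM(to_integer(unsigned(address)));
end architecture;
```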
Flash RAM
Non-volatile, readable (relatively fast), writable
(relatively slow)
Storage partitioned into blocks
Erase a whole block at a time, then write/read
Once a location is written, can't rewrite until erased
NOR Flash
Can write and read individual locations
Used for program storage, random-access data
NAND Flash
Denser, but can only write and read block at a time
Used for bulk data, e.g., cameras, memory sticks
Memory Errors
Bits in memory can be flipped
Hard error
The chip is broken
E.g., manufacturing defect, wear (in Flash)
Soft error
Stored data corrupted, but cell still works
E.g., from atmospheric neutrons
Soft-error rate
frequency of occurrence
Check bits are in bit positions whose indices are a power of 2
[Figure: 12-bit code word e12 … e1 formed from data bits d8 … d1 and the check bits]
Multiple-Error Detection
What if two bits flip
syndrome identifies wrong bit, or is invalid
One extra check bit allows
single-error correction, double-error detection
       Single-bit correction     Double-bit detection
N      Check bits   Overhead     Check bits   Overhead
8      4            50%          5            63%
16     5            31%          6            38%
32     6            19%          7            22%
64     7            11%          8            13%
128    8            6.3%         9            7.0%
256    9            3.5%         10           3.9%
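The check-bit counts in the table are consistent with the standard Hamming condition: the syndrome needs enough values to name every bit position plus the no-error case. The symbol c for the number of check bits is introduced here for the sketch.

```latex
2^{c} \;\ge\; N + c + 1,
\qquad
\text{overhead} = \frac{c}{N}
```

For N = 8, c = 3 fails (8 < 12) and c = 4 works (16 ≥ 13), giving an overhead of 4/8 = 50%. Adding one extra check bit for double-error detection gives 5/8 = 62.5%, which the table rounds to 63%.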
Summary
Memory: addressable storage locations
Read and Write operations
Asynchronous RAM
Synchronous RAM (SSRAM)
Dynamic RAM (DRAM)
Read-Only Memory (ROM) and Flash
Multiport RAM and FIFOs
Error Detection and Correction
Hamming Codes
Embedded Computers
A computer as part of a digital system
Performs processing to implement or control the
system’s function
Components
Processor core
Instruction and data memory
Input, output, and input/output controllers
For interacting with the physical world
Accelerators
High-performance circuit for specialized functions
Interconnecting buses
Memory Organization
Von Neumann architecture
Single memory for instructions and data
Harvard architecture
Separate instruction and data memories
Most common in embedded systems
[Figure: CPU and accelerator with separate instruction memory and data memory]
Bus Organization
Single bus for low-cost low-performance
systems
Multiple buses for higher performance
[Figure: CPU, instruction memory, data memory, and accelerator connected by buses]
Altera’s System Interconnect Fabric Example
Altera’s Memory-Mapped and Streaming System Interconnect Fabrics
SRIO: Serial RapidIO is a high-performance, point-to-point, packet-switched interconnect
technology defined by the RapidIO Trade Association. Full-duplex point-to-point links are
established with single or multiple high-speed serial lanes (1x and 4x are currently defined),
and industry-standard 8B/10B-encoded data transmission at signaling rates of 1.25, 2.50, or
3.125 Gbaud for peak bandwidth of up to 20 Gbps.
Microprocessors
Single-chip processor in a package
External connections to memory and
I/O buses
Most commonly seen in general-purpose computers
E.g., Intel Pentium family, PowerPC, …
Microcontrollers
Single chip combining
Processor
A small amount of instruction/data memory
I/O controllers
Microcontroller families
Same processor, varying memory and I/O
8-bit microcontrollers
Operate on 8-bit data
Low cost, low performance
NXP’s 50-MHz ARM Cortex-M0-based LPC1100 microcontroller family represents the latest
32-bit challenge to 8- and 16-bit processors. The parts are
Processor Cores
Processor as a component in an FPGA or
ASIC
In FPGA, can be a fixed-function block
E.g., PowerPC cores in some Xilinx FPGAs
Or can be a soft core
Implemented using programmable resources
E.g., Xilinx MicroBlaze, Altera Nios-II
In ASIC, provided as an IP block
E.g., ARM, PowerPC, MIPS, Tensilica cores
Can be customized for an application
Instruction Sets
A processor executes a program
A sequence of instructions, each performing a
small step of a computation
Instruction set: the repertoire of available
instructions
Different processor types have different instruction sets
How are new instructions chosen to be added to the instruction set?
Instruction Execution
Instructions are encoded in binary
Stored in the instruction memory
A processor executes a program by
repeatedly
Fetching the next instruction
Decoding it to work out what to do
von Neumann Computer
- to change the functionality of the computer, the program is changed (instead of the HW)
- John von Neumann was a mathematician who described a computer architecture where the
instructions and data reside in the same memory
- the drawback is the "von Neumann bottleneck" in getting data into and out of memory in order for
the computer to run
- this architecture is what we are using in the labs on the Freescale microcontrollers
• Bus Management
- There are a great many signals in a microcomputer. Sharing lines reduces the amount
of wiring needed on the chip.
1) Verbose (or explicit) routing - every device has a dedicated input/output bus that
connects to any/all devices that it needs to communicate with.
2) High Impedance - devices share a single output bus, but each device has a high
impedance state. Only one device is allowed to drive the bus at any
given time.
3) Multiplexed - devices share a single output bus, but each device routes its output
to a multiplexer, which in turn drives the bus.
1) Control Unit - the state machine that directs the execution of instructions.
- for a given Opcode, the state machine traverses a specific
path within its state diagram
- also called the "Sequence Controller" or "Sequencer"
2) Processing Unit - contains all of the registers and ALU that hold and manipulate data
- memory signals (data/address) coming into/out-of this unit
3) Control Signals - signals sent to processing unit from the control unit
- direct data flow
- load data into registers
- select ALU operation
- manage memory access signals
4) Test Signals - signals sent to control unit from the processing unit
- results of operations that affect state machine flow
von Neumann Computer (Processing Unit)
• Processing Unit
Instruction Register (IR) - holds the Opcode that is read from memory
- passes the Opcode to the Control Unit as a test signal
Memory Address Reg (MAR) - holds the current address being sent to memory
Program Counter (PC) - tracks the address of which instruction is being executed
- PC is sequential (0,1,2…)
- PC is loaded during a branch, incremented otherwise
- MAR tracks PC when executing instruction
User-Controlled Reg (X, Y,..) - these are operated on directly by the program
- can be loaded and stored
ALU Operand Register (Z) - holds one of the inputs to the ALU
- the other input comes from one of the user-controlled registers
• Buses
- we route data in the processing unit between registers/memory using shared lines called buses
- will drive to IR, MAR, PC, User-Controlled Registers, or ALU Operand Reg
- Information from Bus1 can be routed to Bus2 for feedback operations (PC = PC + 1)
- Bus select lines come from the Control Unit to select which information is on which bus at any
given time.
• Control Signals
- the Bus1 and Bus2 control lines come from the control unit and drive the multiplexers
- CCR_Load will load the status bits (NZVC), whose values depend on the previous ALU operation
- the ALU_Sel line tells the ALU which function to perform (AND, ADD, …)
• Test Signals
- the Instruction Register (IR) holds the Opcode for the Control Unit to base state decisions on
- CCR_Result carries the NZVC status bits from an ALU operation, which influence state decisions
• Register Modeling
- each register in the processing unit can be loaded by the control unit.
- the loads are synchronous to the clock and take effect in the following state
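A minimal sketch of one such register, assuming an 8-bit data path and an active-low asynchronous reset in the style of the CCR example in these slides. The entity name reg8_load and the port names Load, D, and Q are hypothetical; in the processing unit, D would be driven by a bus and Load by the control unit.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 8-bit register with a synchronous load enable.
entity reg8_load is
  port (Clock : in  std_logic;
        Reset : in  std_logic;                      -- active low, asynchronous
        Load  : in  std_logic;                      -- asserted by the control unit
        D     : in  std_logic_vector(7 downto 0);   -- e.g. driven by a bus
        Q     : out std_logic_vector(7 downto 0));
end entity;

architecture rtl of reg8_load is
begin
  REG : process (Clock, Reset)
  begin
    if (Reset = '0') then
      Q <= x"00";
    elsif (Clock'event and Clock = '1') then
      -- Load is sampled on the rising edge, so the new value is
      -- visible in the state that follows the one asserting Load.
      if (Load = '1') then
        Q <= D;
      end if;
    end if;
  end process;
end architecture;
```

Because Load is sampled on the clock edge, a state that asserts it sees the old register contents; the updated value appears one state later.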
• MUX Modeling
- The bus select signals come from the control unit. The Multiplexers are “combinational logic”
Bus 1
BUS1_CONTROL : process (Bus1_Sel, PC, X, Y)
begin
  case (Bus1_Sel) is
    when "00" => Bus1 <= PC;
    when "01" => Bus1 <= X;
    when "10" => Bus1 <= Y;
    when others => Bus1 <= "XXXXXXXX";
  end case;
end process;
Bus 2
BUS2_CONTROL : process (Bus2_Sel, ALU, Bus1, Memory_Out)
begin
  case (Bus2_Sel) is
    when "00" => Bus2 <= ALU;
    when "01" => Bus2 <= Bus1;
    when "10" => Bus2 <= Memory_Out;
    when others => Bus2 <= "XXXXXXXX";
  end case;
end process;
von Neumann Computer (ALU)
• ALU Modeling
- The ALU is combinational logic. It contains as many operations as desired. The operation being
performed is dictated by the control unit.
ALU
ALU_Functions : process (ALU_Sel, Z, Bus1)
begin
  case (ALU_Sel) is
    when '0' => ALU <= Z and Bus1; -- AND
    when '1' => ALU <= Z + Bus1; -- ADD
    when others => ALU <= x"00";
  end case;
end process;
• CCR Modeling
- The CCR is a register because we want it to hold the status flags across multiple instructions.
- Typical flags are: Negative (N), Zero (Z), 2’s Comp Overflow (V), and Carry (C)
- These flags are fed back to the control unit for state transition decisions during branch instructions
(e.g., Branch if Zero, Branch if Carry, etc.)
CCR example for Zero Flag
CCR_Register : process (Clock, Reset)
begin
  if (Reset = '0') then
    CCR_Result <= x"00";
  elsif (Clock'event and Clock='1') then
    if (CCR_Load = '1') then
      if (ALU = x"00") then
        CCR_Result <= "00000100";
      else
        CCR_Result <= "00000000";
      end if;
    end if;
  end if;
end process;
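The Zero-flag example can be extended to all four flags. The sketch below is one possible way to do it, not the course's implementation: it computes N, Z, V, and C combinationally next to the two-function ALU, so a CCR register could latch them when CCR_Load = '1'. The entity name alu_flags_sketch, the NZVC output, and the variable names are hypothetical. The bit order (N, Z, V, C from bit 3 down to bit 0) is an assumption chosen to agree with the Zero flag sitting at bit 2 in the example above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical ALU with flag outputs (AND when ALU_Sel = '0', ADD otherwise).
entity alu_flags_sketch is
  port (ALU_Sel : in  std_logic;
        Z, Bus1 : in  std_logic_vector(7 downto 0);
        ALU     : out std_logic_vector(7 downto 0);
        NZVC    : out std_logic_vector(3 downto 0));  -- assumed order: N Z V C
end entity;

architecture rtl of alu_flags_sketch is
begin
  process (ALU_Sel, Z, Bus1)
    variable Sum9   : unsigned(8 downto 0);
    variable Result : std_logic_vector(7 downto 0);
    variable N, Zf, V, C : std_logic;
  begin
    V := '0';
    C := '0';
    if (ALU_Sel = '1') then
      -- 9-bit addition keeps the carry out in bit 8
      Sum9   := resize(unsigned(Z), 9) + resize(unsigned(Bus1), 9);
      Result := std_logic_vector(Sum9(7 downto 0));
      C      := Sum9(8);
      -- 2's complement overflow: operands share a sign, result sign differs
      V      := (Z(7) xnor Bus1(7)) and (Z(7) xor Result(7));
    else
      Result := Z and Bus1;
    end if;

    N := Result(7);
    if (Result = x"00") then
      Zf := '1';
    else
      Zf := '0';
    end if;

    ALU  <= Result;
    NZVC <= N & Zf & V & C;
  end process;
end architecture;
```

Here V and C are only meaningful for ADD and are forced to '0' for AND; that is a design choice of this sketch.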
von Neumann Computer (Control Unit)
- It consists of a single state transition path for Fetch & Decode followed by a set of parallel paths
which handle the execution of each instruction in the instruction set of the microcomputer.
- The Sequence Controller creates all of the control signals which drive the processing unit & ALU.
when S_LXIMM_4 => Next_State <= S_LXIMM_5; -- States when the instruction is Load X Immediate
when S_LXIMM_5 => Next_State <= S_LXIMM_6;
when S_LXIMM_6 => Next_State <= S_FETCH_0;
when S_STXDIR_4 => Next_State <= S_STXDIR_5; -- States when the instruction is Store X Direct
when S_STXDIR_5 => Next_State <= S_STXDIR_6;
when S_STXDIR_6 => Next_State <= S_STXDIR_7;
when S_STXDIR_7 => Next_State <= S_FETCH_0;
when S_BRA_4 => Next_State <= S_BRA_5; -- States when the instruction is a Branch Always
when S_BRA_5 => Next_State <= S_BRA_6;
when S_BRA_6 => Next_State <= S_FETCH_0;
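The fragment above is only the per-instruction part of the next-state logic. A minimal sketch of how it might sit inside a complete state machine follows. The states S_FETCH_0 and S_LXIMM_4 through S_BRA_6 come from the fragment; S_FETCH_1, S_FETCH_2, S_DECODE_3, the opcode constants, the entity name, and the choice of which state asserts IR_Load are all assumptions made for illustration. The Store X Direct path is left out for brevity.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity sequencer_sketch is
  port (Clock, Reset : in  std_logic;
        IR           : in  std_logic_vector(7 downto 0);  -- test signal
        IR_Load      : out std_logic);                    -- control signal
end entity;

architecture rtl of sequencer_sketch is
  type State_Type is (S_FETCH_0, S_FETCH_1, S_FETCH_2, S_DECODE_3,
                      S_LXIMM_4, S_LXIMM_5, S_LXIMM_6,
                      S_BRA_4, S_BRA_5, S_BRA_6);
  signal Current_State, Next_State : State_Type;

  -- Hypothetical opcode values; the real encoding is not given here.
  constant LX_IMM : std_logic_vector(7 downto 0) := x"86";
  constant BRA    : std_logic_vector(7 downto 0) := x"20";
begin
  STATE_MEMORY : process (Clock, Reset)
  begin
    if (Reset = '0') then
      Current_State <= S_FETCH_0;
    elsif (Clock'event and Clock = '1') then
      Current_State <= Next_State;
    end if;
  end process;

  NEXT_STATE_LOGIC : process (Current_State, IR)
  begin
    case (Current_State) is
      -- single path for Fetch & Decode
      when S_FETCH_0  => Next_State <= S_FETCH_1;
      when S_FETCH_1  => Next_State <= S_FETCH_2;
      when S_FETCH_2  => Next_State <= S_DECODE_3;
      -- the Opcode selects one of the parallel execution paths
      when S_DECODE_3 =>
        if (IR = LX_IMM) then
          Next_State <= S_LXIMM_4;
        elsif (IR = BRA) then
          Next_State <= S_BRA_4;
        else
          Next_State <= S_FETCH_0;
        end if;
      when S_LXIMM_4  => Next_State <= S_LXIMM_5;
      when S_LXIMM_5  => Next_State <= S_LXIMM_6;
      when S_LXIMM_6  => Next_State <= S_FETCH_0;
      when S_BRA_4    => Next_State <= S_BRA_5;
      when S_BRA_5    => Next_State <= S_BRA_6;
      when S_BRA_6    => Next_State <= S_FETCH_0;
    end case;
  end process;

  -- Example control signal: assumed to be asserted in the last fetch state
  OUTPUT_LOGIC : IR_Load <= '1' when (Current_State = S_FETCH_2) else '0';
end architecture;
```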
Gumnut Core in VHDL
VHDL
assembler
Resources available on the companion web site
Gumnut Storage
General-Purpose Registers: r0 (shown holding 0), r1, r2, r3, r4, r5, r6, r7
Condition Code Registers: C (Carry), Z (Zero)
Program Counter: PC
Memories (address labels recovered from figure): one ending at 254, 255; one ending at 4094, 4095
How many registers should you encode for in the instruction? Two? Three?
How many registers should there be?
Arithmetic Instructions
Operate on register data and put result
in a register
add,addc,sub,subc
Can have immediate value operand
Condition codes
Z: 1 if result is zero, 0 if result is non-zero
C: carry out of add/addc, borrow out of
sub/subc
addc and subc include C bit in
operation
Examples
add r3, r4, r1
Logical Instructions
Operate on register data and put result
in a register
and, or, xor, mask (and not)
Operate bitwise on 8-bit operands
Condition codes
Z: 1 if result is zero, 0 if result is non-zero
C: always 0
Examples
and r3, r4, r5
or r1, r1, 0x80 ; set r1(7)
xor r5, r5, 0xFF ; invert r5
Set Z if least-significant 4 bits of r2 are 0101
and r1, r2, 0x0F ; clear high bits
sub r0, r1, 0x05 ; compare with 0101
Shift Instructions
Logical shift/rotate register data and
put result in a register
shl, shr, rol, ror
Count specified as a literal operand
Condition codes
Z: 1 if result is zero, 0 if result is non-zero
C: the value of the last bit shifted/rotated
Examples
shl r4, r1, 3
ror r2, r2, 4
Multiply r4 by 8, ignoring overflow
shl r4, r4, 3
Multiply r4 by 10, ignoring overflow
shl r1, r4, 1 ; multiply by 2
shl r4, r4, 3 ; multiply by 8
add r4, r4, r1
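The same shift-and-add idea can be written directly in hardware. The sketch below is only an illustration, assuming an 8-bit unsigned operand and the numeric_std package; the entity name times10_sketch and the ports x and y are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical combinational multiply-by-10, ignoring overflow.
entity times10_sketch is
  port (x : in  unsigned(7 downto 0);
        y : out unsigned(7 downto 0));
end entity;

architecture rtl of times10_sketch is
begin
  -- 10*x = 8*x + 2*x; bits shifted or carried past bit 7 are dropped,
  -- which matches "ignoring overflow" in the assembly version
  y <= shift_left(x, 3) + shift_left(x, 1);
end architecture;
```

For x = 7 the shifts give 56 and 14, so y = 70. For larger x the result wraps modulo 256, just as the 8-bit register does.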
Memory Instructions
Transfer data between registers and data
memory
Compute address by adding an offset to a base
register value
Load register from memory
ldm r1, (r2)+5
Store from register to memory
stm r1, (r4)-2
Use r0 if base address is 0
ldm r3, 23 ≡ ldm r3, (r0)+23
Condition codes not affected
Increment a 16-bit integer in memory
Little-endian: address of lsb in r2, msb in next
location
ldm r1, (r2) ; increment lsb
add r1, r1, 1
stm r1, (r2)
ldm r1, (r2)+1 ; increment msb
addc r1, r1, 0 ; with carry
stm r1, (r2)+1
Input/Output Instructions
I/O controllers have registers that govern
their operation
Each has an address, like data memory
Gumnut has separate data and I/O address spaces
Input from I/O register
inp r3, 157 ≡ inp r3, (r0)+157
Output to I/O register
out r3, (r7) ≡ out r3, (r7)+0
Condition codes not affected
Further examples in Chapter 8
Branch Instructions
Programs can evaluate conditions and take
alternate courses of action
Condition codes (Z,C) represent outcomes of
arithmetic/logical/shift instructions
Branch instructions examine Z or C
bz, bnz, bc, bnc
Add a displacement to PC if condition is true
Specifies how many instructions forward or
backward to skip
Counting from instruction after branch
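The displacement rule can be sketched as a small VHDL function. This is an illustration under stated assumptions, not the Gumnut source: the 12-bit PC width is inferred from the 0 to 4095 address range in the storage figure, the 8-bit signed displacement width is assumed, and the names branch_sketch, pc_t, and next_pc are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package branch_sketch is
  subtype pc_t is unsigned(11 downto 0);   -- assumed PC width
  function next_pc (pc    : pc_t;
                    disp  : signed(7 downto 0);   -- assumed displacement width
                    taken : boolean) return pc_t;
end package;

package body branch_sketch is
  function next_pc (pc    : pc_t;
                    disp  : signed(7 downto 0);
                    taken : boolean) return pc_t is
    variable seq : pc_t;
  begin
    -- address of the instruction after the branch
    seq := pc + 1;
    if taken then
      -- the displacement counts from the instruction after the branch;
      -- sign extension lets it move backward as well as forward
      return unsigned(signed(seq) + resize(disp, 12));
    else
      return seq;
    end if;
  end function;
end package body;
```

With this rule, a taken branch at address 11 with displacement +2 continues at address 14, and a branch that is not taken continues at address 12.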
Branch Example
Elapsed seconds in location 100
Increment, wrapping to 0 after 59
ldm r1, 100
add r1, r1, 1
sub r0, r1, 60 ; Z set if r1 = 60
bnz +1 ; Skip to store if Z is 0
add r1, r0, 0
stm r1, 100
Jump Instruction
Unconditionally skips forward or backward to
specified address
Changes the PC to the address
Example: if r1 = 0, clear data location 100 to
0; otherwise clear location 200 to 0
Assume instructions start at address 10
10: sub r0, r1, 0
11: bnz +2
12: stm r0, 100
13: jmp 15
14: stm r0, 200
15: …
Subroutines
A sequence of instructions that perform
some operation
Can call them from different parts of a
program using a jsb instruction
Subroutine returns with a ret instruction
Calling program:
jsb m
…
jsb m
…
Subroutine:
m: subroutine instructions
…
ret
Subroutine Example
Subroutine to increment second count
Address of count in r2
ldm r1, (r2)
add r1, r1, 1
sub r0, r1, 60
bnz +1
add r1, r0, 0
stm r1, (r2)
ret
Call to increment locations 100 and 102
add r2, r0, 100
jsb 20
add r2, r0, 102
jsb 20
Miscellaneous Instructions
Instructions supporting interrupts
See Chapter 8 (more later)
reti Return from interrupt
enai Enable interrupts
disi Disable interrupts
wait Wait for an interrupt
stby Stand by in low power mode until
an interrupt occurs
Example Program
; Program to determine greater of value_1 and value_2
                 text
                 org 0x000 ; start here on reset
                 jmp main
; Data memory layout
                 data
value_1:         byte 10
value_2:         byte 20
result:          bss 1
; Main program
                 text
                 org 0x010
main:            ldm r1, value_1 ; load values
                 ldm r2, value_2
                 sub r0, r1, r2 ; compare values
                 bc value_2_greater
                 stm r1, result ; value_1 is greater
                 jmp finish
value_2_greater: stm r2, result ; value_2 is greater
finish:          jmp finish ; idle loop
Encoding Examples
Encoding for addc r3, r5, 24
Arithmetic immediate, fn = 001
Field:  0 | fn  | rd  | rs  | immed
Width:  1 | 3   | 3   | 3   | 8
Bits:   0 | 001 | 011 | 101 | 00011000
Hex:    05D18
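The field layout above can be checked by packing the fields in VHDL. This is a sketch for the arithmetic-immediate format only; the package name encode_sketch, the function encode_arith_imm, and the subtype instr_t are hypothetical names, not part of the Gumnut sources.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package encode_sketch is
  subtype instr_t is std_logic_vector(17 downto 0);  -- 1 + 3 + 3 + 3 + 8 bits
  function encode_arith_imm (fn     : std_logic_vector(2 downto 0);
                             rd, rs : natural range 0 to 7;
                             immed  : natural range 0 to 255) return instr_t;
end package;

package body encode_sketch is
  function encode_arith_imm (fn     : std_logic_vector(2 downto 0);
                             rd, rs : natural range 0 to 7;
                             immed  : natural range 0 to 255) return instr_t is
  begin
    -- leading '0', then fn, rd, rs, and the 8-bit immediate, left to right
    return '0' & fn
         & std_logic_vector(to_unsigned(rd, 3))
         & std_logic_vector(to_unsigned(rs, 3))
         & std_logic_vector(to_unsigned(immed, 8));
  end function;
end package body;

library ieee;
use ieee.std_logic_1164.all;
use work.encode_sketch.all;

-- Simulation-only check against the worked example: addc r3, r5, 24
entity encode_check is
end entity;

architecture sim of encode_check is
begin
  assert encode_arith_imm("001", 3, 5, 24) = "000101110100011000"
    report "encoding of addc r3, r5, 24 does not match 05D18"
    severity failure;
end architecture;
```

The 18-bit string in the assertion is the Bits row written without separators; grouped from the right into hex digits it reads 05D18.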
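[Figure: Gumnut core connected to an instruction memory and a data memory. Instruction-side signals: inst_cyc_o, inst_stb_o, inst_ack_i, inst_adr_o, inst_dat_i. Data-side signals: data_cyc_o, data_stb_o, data_ack_i, data_we_o, data_adr_o, data_dat_o, data_dat_i. Memory ports: en, we, adr, dat_i, dat_o, clk. A D flip-flop (D, Q) appears on each side.]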
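[Figure: 8051 connected to an external SRAM through an address latch. Labels recovered: 8051 pins P2, P0, ALE, PSEN, WR, RD; latch pins D, Q, LE; SRAM pins A(16), A(15..8), A(7..0), D, WE, OE, CE.]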
32-bit Memory
Four bytes per memory word
Little-endian: least significant byte at the lowest address
Big-endian: most significant byte at the lowest address
Byte addresses, four per word:
0 1 2 3
4 5 6 7
8 9 10 11
Partial-word read
Read all bytes, processor selects those needed
Partial-word write
Use byte-enable signals
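A minimal sketch of a partial-word write with byte enables, assuming a single synchronous 32-bit memory of 256 words. The port names en, wr, clk, A, D_in, D_out, and Byte_Enable follow the labels used in these slides; the entity name, the depth, and the read-when-not-writing behavior are assumptions for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 256 x 32 synchronous RAM with one enable per byte lane.
entity byte_enable_ram_sketch is
  port (clk         : in  std_logic;
        en          : in  std_logic;
        wr          : in  std_logic;
        Byte_Enable : in  std_logic_vector(3 downto 0);
        A           : in  unsigned(7 downto 0);    -- word address (assumed width)
        D_in        : in  std_logic_vector(31 downto 0);
        D_out       : out std_logic_vector(31 downto 0));
end entity;

architecture rtl of byte_enable_ram_sketch is
  type ram_t is array (0 to 255) of std_logic_vector(31 downto 0);
  signal ram : ram_t;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if (en = '1') then
        if (wr = '1') then
          -- Partial-word write: only the enabled byte lanes are updated
          for i in 0 to 3 loop
            if (Byte_Enable(i) = '1') then
              ram(to_integer(A))(8*i + 7 downto 8*i) <= D_in(8*i + 7 downto 8*i);
            end if;
          end loop;
        else
          -- Partial-word read: the whole word is read and the
          -- processor selects the bytes it needs
          D_out <= ram(to_integer(A));
        end if;
      end if;
    end if;
  end process;
end architecture;
```

Writing with Byte_Enable = "0001" changes only bits 7 downto 0 of the addressed word; the other three bytes keep their previous contents.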
[Figure: 32-bit memory built from byte-wide SSRAMs, one per byte lane (lanes 8:15, 16:23, and 24:31 are legible). Each SSRAM has ports en, wr, clk, A, D_in, D_out. Other labels recovered: Write_Strobe, Read_Strobe, Byte_Enable(0), Byte_Enable(2), Byte_Enable(3), Clk, Data Read, Ready, +V.]
Cache Memory
For high-performance processors
Memory access time is several clock cycles
Performance bottleneck
Cache memory
Small fast memory attached to a processor
Stores most frequently accessed items,
Memory contents divided into fixed-sized blocks (lines)
Cache copies whole lines from memory
When processor accesses an item
If item is in cache: hit - fast access
Occurs most of the time
If item is not in cache: miss
Line containing item is copied from memory
Slower, but less frequent
Summary
Embedded computer
Processor, memory, I/O controllers, buses
Microprocessors, microcontrollers, and
processor cores
Soft-core processors for ASIC/FPGA