
Control Unit: Hardwired vs. Microprogrammed Approach

Dr Shankar Balachandran
Indian Institute of Technology Madras
shankar@cse.iitm.ernet.in
14 October 2006

Two Major Blocks in a CPU

Datapath
Adders, multipliers, dividers
Shifters, registers
Anything that changes or stores data

Control Unit
Controls the data
How is data stored?
Where is it stored?
When should data be available?

Control Unit
Correct sequencing of control signals
Much like the human brain controlling various parts of the body
Sequence and timing are the key
Any aberration will result in wrong operation

A Simplified Control Unit


[Figure: the Control Unit and the units it signals]
Fetch: Fetch Unit
Decode: Decode Unit
Execute: Execution Unit
Write Back: Write Back Unit

A Possible Implementation

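Mod-4 counter (clocked by CLK) feeding a 2 to 4 decoder
Four phases need four counter states; the four decoder outputs are the Fetch, Decode, Execute and Write Back signals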

Timing Diagram
[Timing diagram: CLK, Fetch, Decode, Execute, Write Back]

Let's Sample the Signals

1 0 0 0

0 1 0 0

0 0 1 0

0 0 0 1

Another Way to Generate Signals


1000
0100
0010
0001
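
The two ways of generating the four phase signals can be compared in a short simulation. This is only a sketch under assumptions: the function names are made up for illustration, the first generator models a 2-bit counter feeding a 2-to-4 decoder, and the second assumes that the "other way" is a ring counter in which a single 1 circulates.

```python
def counter_decoder_phases(n_cycles):
    """2-bit counter followed by a 2-to-4 decoder: one output is high per clock."""
    phases = []
    count = 0
    for _ in range(n_cycles):
        # Decoder: output i is 1 only when the counter holds i.
        phases.append(tuple(1 if count == i else 0 for i in range(4)))
        count = (count + 1) % 4
    return phases


def ring_counter_phases(n_cycles):
    """4-bit ring counter: a single 1 shifts through four flip-flops."""
    state = [1, 0, 0, 0]
    phases = []
    for _ in range(n_cycles):
        phases.append(tuple(state))
        state = [state[-1]] + state[:-1]  # shift by one position, wrapping around
    return phases


names = ("Fetch", "Decode", "Execute", "Write Back")
for word in ring_counter_phases(4):
    print("".join(str(bit) for bit in word), names[word.index(1)])
```

Both generators produce the same one-hot sequence; the ring counter simply stores the decoded form directly instead of decoding a binary count.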

Hardwired vs Microprogrammed

Hardwired
Use gates to generate signals
Squeeze out the juice for performance
Different logic styles possible

Microprogrammed
Store the control signals in the sequence
Just read from the memory every clock cycle

A Model Computer
(Richard Eckert, SIGCSE Bulletin, Vol. 20, No. 3, September 1988)
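[Figure: block diagram of the model computer. Blocks: PC, MAR, RAM, MDR, IR, Accumulator, ALU, Register B, Control, all attached to a common Bus. Widths marked in the figure: 12, 8 and 4 bits. Control signals: IP, LP, EP, LM, R, W, LD, ED, LI, EI, LA, EA, A, S, EU, LB.]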

More Details
L = Load
E = Copy to bus
A,S = Add and Subtract
Sign bit to control unit
IP = Increment PC


Mnemonic  Opcode  Action                         Register Transfers         Active Controls
LDA               Load Accumulator               1. MAR <- IR               EI, LM
                  A <- (Mem)                     2. MDR <- M(MAR)           R
                                                 3. A <- MDR                ED, LA
STA               Store Accumulator              1. MAR <- IR               EI, LM
                  (Mem) <- A                     2. MDR <- A                EA, LD
                                                 3. M(MAR) <- MDR           W
ADD               A <- A+B                       1. A <- ALU(Add)           A, EU, LA
SUB               A <- A-B                       1. A <- ALU(Sub)           S, EU, LA
MBA               B <- A                         1. B <- A                  EA, LB
JMP               PC <- Mem                      1. PC <- IR                EI, LP
JN                PC <- Mem                      1. PC <- IR if NF is set   NF : EI, LP
                  if -ve flag is set
HLT       8-15    Stop Clock
Fetch             IR <- Next Instruction         1. MAR <- PC               EP, LM
                                                 2. MDR <- M(MAR)           R
                                                 3. IR <- MDR               ED, LI, IP

Hardwired Unit
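[Figure: hardwired control unit. Blocks: Ring Counter (clocked by CLK, timing outputs T), Decoder (takes the opcode from IR; outputs LDA, STA, ADD, SUB, MBA, JMP, JN, Halt), Control Matrix (also takes NF; produces the Control Signals).]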

Table with Sequencing

       IP    LP    EP    LM    R     W     LD    ED    LI    EI    LA    EA    A     S     EU    LB
Fetch  T2          T0    T0    T1                T2    T2
LDA                      T3    T4                T5          T3    T5
STA                      T3          T5    T4                T3          T4
MBA                                                                      T3                      T3
ADD                                                                T3          T3          T3
SUB                                                                T3                T3    T3
JMP          T3                                              T3
JN           T3*NF                                           T3*NF

IP = T2;
LP = T3*JMP + T3*JN*NF;
EP = T0;
LM = T0 + T3*LDA + T3*STA;
R = T1 + T4*LDA;
W = T5*STA;
LD = T4*STA;
ED = T2 + T5*LDA;
LI = T2;
A = T3*ADD;
S = T3*SUB;
..
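
The equations above can be evaluated directly in software to see which signals fire in a given ring-counter state. The sketch below is illustrative only: the function name and the use of Python booleans are assumptions, and it covers just the signals whose equations are listed on the slide (the ones elided by ".." are left out).

```python
OPCODES = ("LDA", "STA", "ADD", "SUB", "MBA", "JMP", "JN")


def control_signals(t, opcode, nf=False):
    """Return the listed control signals active in timing state t (0..5)."""
    if t not in range(6) or opcode not in OPCODES:
        raise ValueError("unknown timing state or opcode")
    T = [t == i for i in range(6)]  # ring counter outputs T0..T5
    LDA, STA, ADD, SUB, MBA, JMP, JN = (opcode == name for name in OPCODES)
    equations = {
        "IP": T[2],
        "LP": (T[3] and JMP) or (T[3] and JN and nf),
        "EP": T[0],
        "LM": T[0] or (T[3] and LDA) or (T[3] and STA),
        "R": T[1] or (T[4] and LDA),
        "W": T[5] and STA,
        "LD": T[4] and STA,
        "ED": T[2] or (T[5] and LDA),
        "LI": T[2],
        "A": T[3] and ADD,
        "S": T[3] and SUB,
        # Equations for the remaining signals are not listed on the slide.
    }
    return {name for name, active in equations.items() if active}


for t in range(3):
    print("T%d" % t, sorted(control_signals(t, "LDA")))
```

Each dictionary entry corresponds to one column of the sequencing table: an OR of AND terms, which is the shape a PLA implements.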

Control Matrix
Implement using discrete gates
Usually done using PLAs
Large control matrices are implemented hierarchically
For speed
A well-known process; design flows are widespread

An Alternate Implementation
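[Figure: microprogrammed control unit. Blocks: IR (4-bit opcode), MAP Starting Address Generator, uPC with +1 incrementer (clocked by CLK), 32 x 24 Control ROM (Control Store), Microinstruction Register with Jump Address, HLT and Control outputs. The CD bit and NF together select the next uPC value.]

MAP  CD  Meaning
1    *   Starting address from IR
0    0   Unconditional branch within microprogram
0    1   NF=0 => increment; NF=1 => conditional branch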

Control Store

Each word holds the control signals plus CD, MAP, HLT and the address of the next microinstruction.

Instruction  uInstruction Address  Control Signals    Addr. of Next
Fetch        00                    0011000000000000   01
             01                    0000100000000000   02
             02                    1000000110000000   XX
LDA          03                    0001000001000000   04
             04                    0000100000000000   05
             05                    0000000100100000   00
STA          06                    0001000001000000   07
             07                    0000001000010000   08
             08                    0000010000000000   00
ADD          09                    0000000000101010   00
SUB          0A                    0000000000100110   00
MBA          0B                    0000000000010001   00
JMP          0C                    0100000001000000   00
JN           0D                    0000000000000000   0F
             0E                    0000000000000000   00
             0F                    0100000001000000   00
Expansion    10-1E (op-codes 8-E)

Example 1: MBA followed by ADD

Control word bit order: IP LP EP LM R W LD ED LI EI LA EA A S EU LB
The control store contents are the same as in the Control Store table.
At address 02 the next address (XX) comes from MAP: 0B for MBA, then 09 for ADD.
Sequence for MBA, ADD

Instruction     Register Transfer    Control Word
MBA (MOV B,A)   1. MAR <- PC         0011000000000000
                2. MDR <- M(MAR)     0000100000000000
                3. IR <- MDR         1000000110000000
                B <- A               0000000000010001
ADD             1. MAR <- PC         0011000000000000
                2. MDR <- M(MAR)     0000100000000000
                3. IR <- MDR         1000000110000000
                A <- ALU(Add)        0000000000101010
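
The same sequence can be reproduced by stepping a small software model of the control store. This is a sketch, not the actual hardware: the names `CONTROL_STORE`, `MAP`, `decode` and `run` are made up for illustration, only the entries needed for MBA and ADD are included, and the instruction stream is modelled as a Python list instead of being fetched from RAM.

```python
SIGNALS = "IP LP EP LM R W LD ED LI EI LA EA A S EU LB".split()

# address -> (control word, next address); None means "start address comes from MAP"
CONTROL_STORE = {
    0x00: ("0011000000000000", 0x01),
    0x01: ("0000100000000000", 0x02),
    0x02: ("1000000110000000", None),
    0x09: ("0000000000101010", 0x00),  # ADD
    0x0B: ("0000000000010001", 0x00),  # MBA
}
MAP = {"ADD": 0x09, "MBA": 0x0B}


def decode(word):
    """List the control signals whose bit is 1 in a 16-bit control word."""
    return [name for name, bit in zip(SIGNALS, word) if bit == "1"]


def run(program):
    """Return the control words issued while executing a list of mnemonics."""
    if not program:
        return []
    pending = list(program)
    trace = []
    upc = 0x00
    while True:
        word, nxt = CONTROL_STORE[upc]
        trace.append(word)
        if nxt is None:
            upc = MAP[pending.pop(0)]  # end of fetch: dispatch on the opcode
        else:
            upc = nxt
            if upc == 0x00 and not pending:
                return trace  # back at fetch with nothing left to run


for word in run(["MBA", "ADD"]):
    print(word, decode(word))
```

Every instruction costs the three fetch microinstructions plus its own routine, which is why the trace has eight control words for two single-step instructions.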

Example 2: JN with Flag Set

The control store contents are the same as in the Control Store table.
The JN microinstruction is at 0D, with 0F as its next address.
If the negative flag is set, jump to a new location by skipping to the uInstruction at 0F


Example 3: JN with Flag Not Set

The control store contents are the same as in the Control Store table.
The JN microinstruction is at 0D. With the negative flag not set, uPC increments to 0E, whose control word is all zeros and whose next address is 00, so control returns to fetch without loading the PC.
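
The next-address selection used by both JN examples can be sketched as a small function. This is an assumption-laden illustration: `next_upc` and `jn_path` are hypothetical names, and the CD bit of the microinstruction at 0D is taken to be 1, following the MAP/CD meanings given with the microprogrammed control unit.

```python
def next_upc(upc, map_bit, cd, nf, jump_addr, start_addr=None):
    """Pick the next microprogram counter value."""
    if map_bit:
        return start_addr      # starting address generated from the IR opcode
    if cd and not nf:
        return upc + 1         # conditional branch not taken: just increment
    return jump_addr           # unconditional branch, or condition satisfied


def jn_path(nf):
    """Addresses visited by the JN microroutine before returning to fetch."""
    # address -> (CD, jump address); CD = 1 at 0D is an assumption
    routine = {0x0D: (1, 0x0F), 0x0E: (0, 0x00), 0x0F: (0, 0x00)}
    upc, path = 0x0D, []
    while upc != 0x00:
        path.append(upc)
        cd, jump_addr = routine[upc]
        upc = next_upc(upc, 0, cd, nf, jump_addr)
    return path


print([hex(a) for a in jn_path(True)])   # flag set
print([hex(a) for a in jn_path(False)])  # flag not set
```

With the flag set the routine visits 0D then 0F (which loads the PC); with the flag clear it visits 0D then the empty word at 0E.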

Let's Review the Microprogramming Model
Store the microprogram in control store
Fetch the instruction
Get the set of control signals from the control word
Move the microinstruction address
Lather, Rinse, Repeat

What is Microcode?

Michael Slater's "Microprocessor Based Design" (pg. 42):

"Microcode tells the processor every detailed step
required to execute each machine language instruction.
Microcode is thus at an even more detailed level than
machine language, and in fact defines the machine
language. In a standard microprocessor, the microcode
is stored in a ROM or a programmable logic array (PLA)
that is part of the microprocessor chip and cannot be
modified by the user."

Thought Experiment
Why is the design a little clumsy?
What can we do about it?

Reason for Clumsiness
JN: conditional flag check
Without any condition check, the whole process is very smooth
Solution: avoid all conditional checks

Real Life
A little American Football Story
Theory vs. Practice
  In theory, there is no difference between theory and practice
  In practice, theory and practice are two different things altogether
Live with condition checks
  Keep designs as clean as possible

A General Approach
[Figure: general microprogrammed control unit. Blocks: IR, Starting and Branch Address Generator (also fed by External Inputs and Condition Codes), uPC, Control Store, Control Word.]

Format of Microinstructions
Pick yours
  Your choice is as good as your neighbor's
What we did:
  One bit position per control signal
  Order of the bits? Doesn't matter
  Can result in long microinstructions
  Not the number of microinstructions, but the width

A Note About Density
Observe that only a few bits are set to 1
Poor usage of bit space
This scheme is called a Horizontal Microprogram
Alternate version: encode the bits
  Vertical Microprogram
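
The density remark can be checked against the sixteen control words at addresses 00 to 0F of the example control store. The snippet below is just a counting sketch; the variable names are arbitrary.

```python
# Control words at addresses 00..0F of the example control store.
CONTROL_WORDS = [
    "0011000000000000", "0000100000000000", "1000000110000000",
    "0001000001000000", "0000100000000000", "0000000100100000",
    "0001000001000000", "0000001000010000", "0000010000000000",
    "0000000000101010", "0000000000100110", "0000000000010001",
    "0100000001000000", "0000000000000000", "0000000000000000",
    "0100000001000000",
]

total_bits = sum(len(word) for word in CONTROL_WORDS)
ones = sum(word.count("1") for word in CONTROL_WORDS)
widest = max(word.count("1") for word in CONTROL_WORDS)
density = ones / total_bits

print("bits stored:", total_bits)
print("bits set to 1:", ones)
print("most signals active in one word:", widest)
print("density: {:.1%}".format(density))
```

Only 28 of the 256 stored bits are 1 (about 11%), and no word activates more than three signals at once, which is the slack that encoding tries to recover.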

Vertical Microprogram
Encode the bits by grouping similar elements together
General idea:
  Group similar resources together
    There can be only one source or destination register
  Some operations are mutually exclusive
    Read vs Write of memory
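
As a rough illustration of field encoding, the 16 one-hot control signals of the model computer can be packed into a few encoded fields. The grouping below is a hypothetical choice made for this sketch, not one given in the slides: bus sources, bus destinations, memory operation, ALU operation and the PC increment each get their own field, with code 0 meaning "none active".

```python
SIGNALS = "IP LP EP LM R W LD ED LI EI LA EA A S EU LB".split()

# Hypothetical grouping of mutually exclusive signals; code 0 means "none".
FIELDS = [
    ("bus_src", ["EP", "ED", "EI", "EA", "EU"]),
    ("bus_dst", ["LP", "LM", "LD", "LI", "LA", "LB"]),
    ("mem", ["R", "W"]),
    ("alu", ["A", "S"]),
    ("inc", ["IP"]),
]


def field_width(members):
    """Bits needed for codes 0..len(members)."""
    return len(members).bit_length()


def encode(word):
    """Horizontal 16-bit control word -> encoded (vertical) word."""
    active = {name for name, bit in zip(SIGNALS, word) if bit == "1"}
    out = ""
    for field, members in FIELDS:
        hits = [m for m in members if m in active]
        if len(hits) > 1:
            raise ValueError("signals in field %s are not exclusive" % field)
        code = members.index(hits[0]) + 1 if hits else 0
        out += format(code, "0{}b".format(field_width(members)))
    return out


def decode(vword):
    """Encoded word -> horizontal 16-bit control word (the decoder's job)."""
    active, pos = set(), 0
    for _, members in FIELDS:
        width = field_width(members)
        code = int(vword[pos:pos + width], 2)
        pos += width
        if code:
            active.add(members[code - 1])
    return "".join("1" if name in active else "0" for name in SIGNALS)


fetch3 = "1000000110000000"  # IP, ED, LI
print(encode(fetch3), len(encode(fetch3)))
```

Under this grouping each microinstruction shrinks from 16 to 11 control bits, at the price of one small decoder per field.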

Design Issues
Encoding reduces the bit-space
  But requires decoders
Cost of decoder vs bit-space
  Usually decoder cost is very low

Another Idea
Group concurrently active signals
Every meaningful combination gets a code
Complex decoder to interpret every code

Vertical vs Horizontal

Horizontal
Faster
More area
More common currently
  Cheap transistors

Vertical
Slower
More microinstructions

Microsequencing
Other ways to save on hardware
Every instruction had its own microprogram sequence
Also, instructions have several addressing modes
  Only the first few microinstructions differ
Can we share microcode?

A Powerful Technique in Sharing: Bit-ORing

Example
Two instructions share some microcode
Eventually, they must branch
The default branch (one instruction) is at X0
The other branch is stored at X1
Change the least significant bit(s) to get a new address

Compare that with:
Having two conditional branches
Storing two fields, one for each branch
Both very unclean
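
A minimal sketch of bit-ORing, assuming a single condition bit and a hypothetical helper name `bit_or_target`: the shared microcode ends with one branch address whose least significant bit is 0, and the condition is ORed into that bit to pick between the X0 and X1 routines.

```python
def bit_or_target(base_addr, take_alternate):
    """OR a one-bit condition into the low bit of the stored branch address."""
    if base_addr & 1:
        raise ValueError("base address must end in 0 (the X0 slot)")
    return base_addr | int(bool(take_alternate))


# Example with an arbitrary base address 0x14: the two routines sit at 0x14 and 0x15.
print(hex(bit_or_target(0x14, False)))
print(hex(bit_or_target(0x14, True)))
```

Only one address field is stored, and no separate conditional-branch microinstruction is needed; the cost is that the two target routines must be placed at adjacent, aligned addresses.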

Thought Experiment:
What if we provided an explicit branch instead of storing the next-address field in our microprogram?
A typical instruction set will need a lot of branches
A lot of time will be wasted on branching

A Pat on Our Back
We provided an explicit field for the address
  Branch location is now data
  It is already saved
Caution: microinstructions can get very wide
Solution: there is no free lunch.

Can we pipeline microfetch?
A neat idea: why wait till the current micro-op is over?
  Branch field gives next operation
  Get the next op
Caveat: external inputs and status flags may change the order
What about interrupts?
  They are going to follow you everywhere
Should have a mechanism that can invalidate microcode prefetch
  Similar to pipeline flush for instructions
Commonly used

Historical Perspectives
Hardwired Logic
  Popular before the 60s
  Popular now
    Speed benefits
Microprogram
  Popular in the 70s
    Only way people did it
  Memory was slower than CPU
    No on-chip cache
    Best way is to store the microcode
Now: depends on who you ask
Shades of gray: extremes of the spectrum are harder to find nowadays

Tools for Design
Hardwired
  Any state machine optimizer
  Assigning states, minimizing transitions, races, hazards, ...
Microcoding
  Small ones can be in binary
  Large ones: use a microassembler
    Very useful debug tool
    Can use the microassembler simultaneously with actual hardware development

Hardwired vs Microcoding
Hardwired units are faster and smaller
Emulation is easy with microcoding
Hardwired design is complex if large
Bugs in hardwired design cannot be fixed in the field
Hardwired control is not suited for loops
  Looping with microcode can be made as fast

Hardwired vs Microcode vs RISC
RISC
  Simpler instruction set
  Hardwired implementation
RISC instructions are like microcode
  Instructions come from I-Cache instead of Control Store
Difference: contents are not fixed
Advantage: only load what you want on the I-Cache
  Keeps size smaller as compared to Control Stores

Microprogram vs Software
Imagine floating point division
Solution 1: write in software
  Long process
  Error prone
  Many fetches repeatedly from memory for the given sequence of operations
Solution 2: microcode
  Long process too, but for designers, not programmers
  Relatively error free, more thorough design
  Requires many cycles, but fetched and used locally

Emulation
A very common use of microcoding
IBM System/360
  32-bit architecture
  16 general-purpose registers
  Secret: low-end implementations were 8-bit internally
    Keep cost low
    Heavy microcoding
    Programmers oblivious
In 1992, International Meta Systems (IMS) announced the 3250
  Designed to emulate the x86, 68K, and 6502 architectures
  Uses customizable microcode, among other techniques
  Went bust, never released

Another Interesting Note
Writable Control Store
  What if you, a programmer, can write your own control store?
  Not a mad scientist thought
Implemented in
  VAX 8800
  PDP-11/60
  IBM System/370

Current Trends
Microcode Update
Linux utility: microcode_ctl
  Companion to the IA32 microcode driver
  It decodes and sends new microcode to the kernel driver to be uploaded to Intel IA32 processors
  Update is volatile: lost on reboot
Microcode updates are also typically rolled into BIOS updates
  Ready even before an OS is loaded

Intel Said..
The Pentium(R) Pro processor and Pentium(R) II processor may
contain design defects or errors known as errata that may cause the
product to deviate from published specifications. Many times, the
effects of the errata can be avoided by implementing hardware or
software work-arounds, which are documented in the Pentium Pro
Processor Specification Update and the Pentium II Processor
Specification Update. Pentium Pro and Pentium II processors include a
feature called "reprogrammable microcode", which allows certain types
of errata to be worked around via microcode updates. The microcode
updates reside in the system BIOS and are loaded into the processor
by the system BIOS during the Power-On Self Test, or POST.

Current Trends
Hyperthreading in P4
  A second logical CPU
  Complete state of the system in both CPUs
Microcoding in P4
  Two pointers control flow independently
  Both processors share the ROM entries
  Access is alternated between the CPUs

Thank You