CONTENTS

I. Review of digital design
   1. Signals, logic operators and gates
   2. Gates as control elements
   3. Combinational circuits
   4. Programmable combinational parts
   5. Sequential circuits
II. Main memory concepts
   1. Memory definition
   2. Memory hierarchy
   3. Memory performance parameters
   4. Memory structure and memory cycle, memory chip organization
   5. Hitting the memory wall, pipelined memory and interleaved memory
III. Types of memory
   1. Types
   2. Static RAM
   3. Dynamic RAM
   4. Other types
IV. Cache memory organization
V. Virtual memory
VI. Secondary storage (mass memory concepts)
CMageshKumar_AP_AIHT
CS2071_Computer Architecture
Logic operators:

Operator                  NOT           AND                  OR                       XOR
Sign and alternate(s)     x' or x̄       x ∧ y or xy          x ∨ y or x + y           x ⊕ y
Output is 1 iff:          input is 0    both inputs are 1    at least one input is 1  inputs are not equal
Arithmetic expression     1 − x         xy                   x + y − xy               x + y − 2xy

[Figure: graphical symbols for the NOT, AND, OR and XOR gates, along with the derived AND, OR, NAND, NOR and XNOR gate symbols.]
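The arithmetic expressions in the table can be checked exhaustively with a short script (a sketch, using 0/1 integers for logic values):

```python
# Verify that each arithmetic expression reproduces the corresponding
# logic operator over all binary input combinations.
for x in (0, 1):
    assert (1 - x) == (not x)                 # NOT: 1 - x
    for y in (0, 1):
        assert x * y == (x and y)             # AND: xy
        assert x + y - x * y == (x or y)      # OR:  x + y - xy
        assert x + y - 2 * x * y == (x != y)  # XOR: x + y - 2xy
```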
[Figure: gates as control elements. (a) An AND gate with enable/pass signal e passes data in x to the data out when e = 1 and outputs 0 when e = 0. (b) A tristate buffer passes x when e = 1 and presents high impedance when e = 0. Tristate buffers permit a wired OR of product terms ex, ey, ez on a shared line, so data out is x, y, z, or high impedance; the AND-OR alternative yields x, y, z, or 0. A companion diagram applies enable/complement control to 32-bit signals.]
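The pass/enable behavior can be modeled in a few lines (a sketch; `None` stands in for the tristate high-impedance state):

```python
def and_gate_pass(e, x):
    """AND gate as a control element: output is x when enabled, else 0."""
    return x if e else 0

def tristate(e, x):
    """Tristate buffer: output is x when enabled, else high impedance
    (modeled here as None)."""
    return x if e else None

def wired_or(*outputs):
    """Several tristate outputs tied to one line: at most one driver may
    be enabled; the line carries that driver's value."""
    driven = [v for v in outputs if v is not None]
    assert len(driven) <= 1, "bus contention"
    return driven[0] if driven else None
```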
BCD-to-Seven-Segment Decoder:
The logic circuit below generates the enable signal for the lowermost segment (number 3) in a seven-segment display unit.
[Figure: seven-segment display driven by a 4-bit BCD input x3 x2 x1 x0 in [0, 9]; signals e0 through e6 enable, or turn on, the individual segments.]
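A behavioral sketch of the segment-3 enable (which digits light the lowermost segment is display-dependent; the set below is an assumption, and some displays also light it for 9):

```python
# Digits assumed to light the bottom segment of the display.
BOTTOM_LIT = {0, 2, 3, 5, 6, 8}

def e3(x3, x2, x1, x0):
    """Enable for the lowermost segment, from a 4-bit BCD input."""
    digit = 8 * x3 + 4 * x2 + 2 * x1 + x0
    return 1 if digit in BOTTOM_LIT else 0
```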
[Figure: multiplexers and demultiplexers. A multiplexer (mux) selects one of its data inputs under control of select signal(s); a 4-to-1 mux selects among x0, x1, x2, x3 using the two select bits y1 y0, and can be designed either from three 2-to-1 muxes or from a 2-to-4 decoder driving AND-OR logic, optionally gated by an enable e. A demultiplexer, or decoder with enable, routes its input to one of four outputs chosen by y1 y0. Mux selection applies bitwise to wide (e.g., 32-bit) buses as well.]
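The mux and demux behavior can be sketched as follows (the function names are our own):

```python
def mux4(x, y1, y0):
    """4-to-1 multiplexer: route one of the four inputs x[0..3] to the
    output, as selected by the two select bits y1 y0."""
    return x[2 * y1 + y0]

def demux4(value, y1, y0, default=0):
    """Demultiplexer (decoder with enable): route `value` to the one
    output selected by y1 y0; the other outputs stay at `default`."""
    out = [default] * 4
    out[2 * y1 + y0] = value
    return out
```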
A programmable combinational part can do the job of many gates or gate networks.
This avoids having to use a large number of small-scale integrated circuits for implementing a Boolean function of several variables.
Programmed by cutting existing connections (fuses) or establishing new connections (antifuses)
Programmable ROM (PROM)
Programmable connections and their use in a PROM are shown below
[Figure: (a) programmable OR gates; (b) programmable connections in a PROM: w inputs feed a decoder whose output lines connect through programmable connections to OR gates that form the outputs.]
Programmable array logic (PAL): when OR array has fixed connections but the inputs to
AND gates can be programmed.
Programmable logic array (PLA): when both AND and OR arrays are programmed.
Programmable combinational logic: the general structure and the two classes known as PAL and PLA devices are shown below. (Not shown is the PROM, which corresponds to a fixed AND array, i.e., a decoder, with a programmable OR array.)
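How a programmed PLA forms its outputs can be sketched behaviorally (the encoding of the AND and OR planes below is our own illustrative representation):

```python
def pla(inputs, and_plane, or_plane):
    """Behavioral sketch of a programmed PLA. Each product term in
    and_plane is a list of (input_index, required_value) literals; each
    output in or_plane lists the product-term indices it ORs together."""
    products = [all(inputs[i] == v for (i, v) in term) for term in and_plane]
    return [int(any(products[t] for t in terms)) for terms in or_plane]

# Hypothetical programming: one output computing the XOR of inputs 0 and 1.
xor_and_plane = [[(0, 1), (1, 0)],   # product term x0 . x1'
                 [(0, 0), (1, 1)]]   # product term x0' . x1
xor_or_plane = [[0, 1]]              # output = term 0 OR term 1
```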
[Figure: (a) general programmable combinational logic: inputs x0 through x3 feed an AND array (AND plane) whose product terms feed an OR array (OR plane) to form the outputs; (b) a PAL device with 8-input AND gates; (c) a PLA device with 6-input AND gates and 4-input OR gates.]
[Figure: latches and flip-flops. (a) An SR latch; (b) a D latch with data input D, clock C, and outputs Q and Q'; D flip-flops (FF) clocked by C are built from latches. A companion diagram shows a CMOS mux built from transmission gates (TG).]
Sequential Machine Implementation (Hardware realization of Moore and Mealy sequential machines)
[Figure: m inputs and the n-bit present state feed the next-state logic, which produces l next-state excitation signals for the state register (flip-flops FF0 through FF2 clocked by C); the output logic forms the outputs from the present state and, only for a Mealy machine, also directly from the inputs.]
[Figure: register file with 2^h k-bit registers. The h-bit write address is decoded, and the decoder outputs, gated by write enable, clock the k-bit write data into the selected register; muxes controlled by read address 0 and read address 1 deliver the two k-bit outputs, read data 0 and read data 1, under control of read enable. A companion diagram shows a FIFO with push and pop controls, k-bit input and output, and empty and full status flags.]
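A behavioral sketch of such a register file (the class name and method signatures are our own; a single read method stands in for the two read ports):

```python
class RegisterFile:
    """Sketch of a 2**h x k-bit register file with one write port; either
    read port is served by calling read() with its read address."""
    def __init__(self, h, k):
        self.k = k
        self.regs = [0] * (2 ** h)

    def write(self, addr, data, enable=True):
        # Write enable gates the update; data is masked to k bits.
        if enable:
            self.regs[addr] = data & ((1 << self.k) - 1)

    def read(self, addr):
        return self.regs[addr]
```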
MEMORY HIERARCHY
The data held in the registers is under the direct control of the compiler or of the assembly-language programmer.
3. Memory performance parameters
4. Memory structure and memory cycle, Memory chip organization
(In addition to the notes and pictures below, also refer to pages 175 to 191 in the Xerox copy.)
SRAM:
Basically a large array of storage cells that are accessed like registers.
An SRAM memory cell requires 4-6 transistors per bit.
SRAM holds the stored data as long as it is powered on.
The storage cells are edge-triggered D flip-flops.
Limitations of flip-flops:
o They add complexity to the cells.
o Fewer cells can be mounted on a chip.
So latches are used instead of flip-flops, but they take more time to write/read.
Memory Structure and SRAM (page no. 317 in B. Parhami)
The conceptual inner structure of a 2^h × g SRAM chip and its shorthand representation are shown below.
[Figure: conceptual inner structure of a 2^h × g SRAM chip and its shorthand representation. The address is decoded by the address decoder to select one of the 2^h rows of storage cells (D flip-flops); write enable gates data in, while chip select and output enable gate data out. The shorthand symbol shows the WE, D in, D out, Addr, CS, and OE pins.]
Multiple-Chip SRAM
Eight 128K × 8 SRAM chips forming a 256K × 32 memory unit are shown below.
[Figure: eight 128K × 8 SRAM chips organized as two banks of four. Of the 18-bit address, 17 bits go to the Addr pins of every chip and the remaining bit selects one bank via the CS inputs; the four chips of the selected bank supply data out bytes 3 (MSB) down to 0 of the 32-bit word, and the 32-bit data in is likewise distributed one byte per chip.]
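The address split can be sketched as follows (the function name is ours; selecting the bank on the most significant bit follows the arrangement described above):

```python
def decode_256k_x32(addr):
    """Address split for a 256K x 32 memory built from eight 128K x 8
    chips arranged as two banks of four chips."""
    assert 0 <= addr < 2 ** 18
    bank = addr >> 17                 # MSB drives the chip selects (CS)
    chip_addr = addr & (2 ** 17 - 1)  # 17 bits go to every chip's Addr pins
    return bank, chip_addr
```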
DRAM:
Stores data as electric charge on a tiny capacitor that is accessed through a MOS pass transistor.
When the word line is asserted:
o to write:
a low voltage on the bit line causes the capacitor to discharge, i.e., bit = 0
a high voltage on the bit line causes the capacitor to charge, i.e., bit = 1
o to read:
the read operation takes 2 steps:
step 1: the row is accessed
step 2: the column is selected
the bit line is precharged first to the halfway voltage and then sensed by a sense amplifier.
The read operation destroys the content, so a write operation is performed after each read to restore it. This is also called destructive readout.
The single-transistor DRAM cell, considerably simpler than the SRAM cell, leads to dense, high-capacity DRAM memory chips.
[Figure: a single-transistor DRAM cell (word line, pass transistor, capacitor, bit line) compared with an SRAM cell (word line, Vcc, bit line and complemented bit line).]
[Figure: variation of the stored voltage in a DRAM cell with time. After a 1 is written, the cell voltage leaks from the voltage for 1 toward the threshold voltage, so the cell must be refreshed every few tens of ms; a stored 0 remains at the voltage for 0.]
DRAM Packaging:
24-pin dual in-line package (DIP): a typical DRAM package housing a 16M × 4 memory chip is shown below.
[Figure: pinout of the 24-pin DIP, with pins such as Vss, D3, D4, CAS, OE, and address bits A4 through A9 labeled.]
Legend:
Ai   Address bit i
CAS  Column address strobe
Dj   Data bit j
NC   No connection
OE   Output enable
RAS  Row address strobe
WE   Write enable
5. Hitting the memory wall
[Figure: relative performance (log scale) of processor versus memory over the calendar years 1980 to 2010. Processor performance climbs toward 10^6 while memory performance reaches only about 10^3, opening an ever-wider speed gap.]
Bridging the CPU-Memory Speed Gap
Two ways of using a wide-access memory to bridge the speed gap between the processor and memory.
Idea: Retrieve more data from memory with each access
[Figure: two ways of using a wide-access memory: (a) the wide memory output is funneled through a mux onto a narrow bus to the processor; (b) a wide bus carries the full-width data to the processor, where a mux selects the needed part.]
Memory latency may involve other supporting operations besides the physical access itself
o Virtual-to-physical address translation
o Tag comparison to determine cache hit/miss
Pipelined cache memory is shown below
[Figure: pipelined cache memory with four stages: address translation, row decoding & readout, column decoding & selection, and tag comparison & validation.]
Memory Interleaving:
o Interleaved memory is more flexible than wide-access memory in that it can handle multiple
independent accesses at once.
[Figure: four-way interleaved memory. A dispatch unit, based on the 2 LSBs of the address, sends each access to module 0, 1, 2, or 3, which hold the addresses that are 0, 1, 2, or 3 mod 4 respectively; returned data is merged onto the data-out path. Because a memory cycle spans several bus cycles, accesses to different modules overlap in time.]
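The dispatch rule can be sketched as:

```python
def dispatch(addr, modules=4):
    """Four-way interleaving: the 2 LSBs of the address pick the module
    (addresses equal mod 4 live in the same module), and the remaining
    bits give the word's position within that module."""
    return addr % modules, addr // modules
```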
III. TYPES OF MEMORY (Refer page no. 171 to 175, 193 to 196, 200 to 209 in Xerox)
1. TYPES
2. Static RAM
3. Dynamic RAM
4. Other types
IV. CACHE MEMORY ORGANIZATION
1. CACHE MEMORY & NEED FOR CACHE:
A CPU cache is a cache used by the central processing unit of a computer to reduce the average time to access
memory. The cache is a smaller, faster memory which stores copies of the data from frequently used main
memory locations.
As long as most memory accesses are to cached memory locations, the average latency of memory accesses will be closer to the cache latency than to the latency of main memory.
A cache memory is a small, very fast memory that retains copies of recently used information from main memory. It operates transparently to the programmer, automatically deciding which values to keep and which to overwrite.
The processor operates at its high clock rate only when the memory items it requires are held in the cache. The overall system performance depends strongly on the proportion of memory accesses that can be satisfied by the cache.
Cache space (~KBytes) is much smaller than main memory (~MBytes), so items have to be placed in the cache so that they are available there when (and possibly only when) they are needed.
As memory size increases, cost per bit decreases, but so does speed, i.e., memory access time increases.
Processor speed >> memory speed: there is a huge gap between processor and memory speeds, since memory performance has improved far more slowly than processor performance.
Cache memories act as intermediaries between the superfast processor and the much slower main memory.
Multiple levels of cache may be used between them.
An access to an item which is in the cache (finding the required data in the cache): cache hit.
An access to an item which is not in the cache (not finding the required data in the cache): cache miss.
The proportion of all memory accesses that are satisfied by the cache, i.e., the fraction of data accesses that can be served from the cache as opposed to the slower main memory: hit rate h.
The proportion of all memory accesses that are not satisfied by the cache: miss rate (1 − h).
The miss rate of a well-designed cache: a few percent.
Cfast: cache (fast memory) access cycle time
Cslow: slower memory (main memory) access cycle time
Ceff: effective memory cycle time
With one level of cache having hit rate h:
Ceff = h Cfast + (1 − h)(Cslow + Cfast) = Cfast + (1 − h) Cslow
When h is close to 1, Ceff ≈ Cfast, which creates the illusion that the entire memory space consists of fast memory (cache memory).
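The effective cycle time Ceff = Cfast + (1 − h) Cslow translates directly into a one-line helper (a sketch; the cycle times may be in any consistent unit):

```python
def effective_cycle_time(h, c_fast, c_slow):
    """Effective memory cycle time with one cache level of hit rate h:
    every access pays c_fast, and misses additionally pay c_slow."""
    return c_fast + (1 - h) * c_slow
```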
Compulsory misses: also called cold-start misses; they occur on the first access to any cache line. With on-demand fetching, the first access to any item is a miss. Some compulsory misses can be avoided by prefetching.
Capacity misses: since cache capacity is limited, some items must be ousted (thrown out) to make room for others. This leads to misses that would not occur with an infinitely large cache.
Conflict misses: also called collision misses; occasionally there is free room, or space occupied by useless data, but the mapping/placement scheme forces us to displace useful items to bring in other items. This may lead to misses in the future.
DESIGN PARAMETERS:
Cache size: in bytes or words; a larger cache can hold more of the program's useful data but is more costly and likely to be slower.
Block size or cache line width: the unit of data transfer between cache and main memory. With a larger cache line, more data is brought into the cache with each miss. This can improve the hit rate but may also bring low-utility data in.
Placement policy: determines where an incoming cache line can be stored (where to store data coming from main memory). More flexible policies imply higher hardware cost and may or may not have performance benefits (due to more complex data location).
Replacement policy: determines which of several existing cache blocks (into which a new cache line can be mapped) should be overwritten. Typical policies: choosing a random block or the least recently used block.
Write policy: determines whether updates to cache words are immediately forwarded to main memory (write-through) or modified blocks are copied back to main memory only if and when they must be replaced (write-back or copy-back).
Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon; an instruction or data item, once accessed, is likely to be accessed again in the near future.
Spatial locality (locality in space): if an item is referenced, items whose addresses are close by will tend to be referenced soon; nearby memory locations are frequently accessed consecutively.
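A small generic illustration of both kinds of locality (a sketch, not from the notes): summing a 2-D array row by row touches consecutive addresses (spatial locality) while reusing the accumulator on every iteration (temporal locality).

```python
def row_major_sum(matrix):
    """Row-major traversal: elements within a row sit at consecutive
    addresses (spatial locality); `total` is re-referenced on every
    iteration (temporal locality)."""
    total = 0
    for row in matrix:
        for value in row:
            total += value
    return total
```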
Associative Mapping
Advantages:
Associative mapping provides the highest flexibility concerning the line to be replaced when a new block is read into the cache.
Disadvantages:
Complex
The tag field is long
Fast access can be achieved only using high-performance associative memories for the cache, which is difficult and expensive.
7. CACHE COHERENCY
(Refer page numbers 228 to 229 in Xerox)
(Refer page nos. 512 to 514 in text book B. Parhami)
VI. SECONDARY STORAGE (MASS MEMORY CONCEPTS)
(Refer page numbers 200 to 218 in Xerox)
(Refer page nos. 353 to 365 in text book B. Parhami)
1. Disk Memory Basics
2. Organizing Data on Disk
3. Disk Performance
4. Disk Caching
5. Disk Arrays and RAID (Refer page number 209 in Xerox)
6. Other Types of Mass Memory
Page Table
The page table has one entry for each page of the virtual memory space.
Each entry of the page table holds the address of the memory frame which stores the respective page, if that page is in main memory.
Each entry of the page table also includes some control bits which describe the status of the page:
- whether the page is actually loaded into main memory or not;
- whether the page has been modified since it was last loaded;
- information concerning the frequency of access, etc.
Problems:
- The page table is very large (the number of pages in the virtual memory space is very large).
- Access to the page table has to be very fast, so the page table has to be stored in very fast memory, on chip.
A special cache is used for page table entries, called the translation lookaside buffer (TLB); it works in the same way as an ordinary memory cache and contains those page table entries which have been most recently used.
The page table is often too large to be stored in main memory. Virtual memory techniques are used to store the page table itself: only part of the page table is stored in main memory at a given moment.
The page table itself is distributed along the memory hierarchy:
- TLB (cache)
- main memory
- disk
Memory Reference with Virtual Memory and TLB
Memory access is handled by hardware, except the page-fault sequence, which is executed by the OS software.
The hardware unit which is responsible for translation of a virtual address into a physical one is the Memory Management Unit (MMU).
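The translation flow described above can be sketched behaviorally (the page size, table representation, and miss handling here are illustrative assumptions, not the notes' specification):

```python
PAGE_SIZE = 4096  # assumed page size for illustration

def translate(vaddr, tlb, page_table):
    """Translate a virtual address to a physical one: consult the TLB
    first; on a TLB miss, walk the page table (a missing entry there
    would be a page fault, handled by the OS)."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn in tlb:                # TLB hit
        frame = tlb[vpn]
    else:                         # TLB miss: consult the page table
        frame = page_table[vpn]   # KeyError here models a page fault
        tlb[vpn] = frame          # cache the translation in the TLB
    return frame * PAGE_SIZE + offset
```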