
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/282845738

Computer System Architecture Lecturer Notes
Research · October 2015
DOI: 10.13140/RG.2.1.2592.8407
Author: Budditha Hettige, General Sir John Kotelawala Defence University
CSC 203 1.5
Computer System Architecture

By
Budditha Hettige
Department of Statistics and Computer Science
University of Sri Jayewardenepura

(2011) Computer System architectures 1


Course Outline
Course Type Core
Credit Value 1.5
Duration 22 lecture hours
Pre-requisites CSC 106 2.0
Course contents
• Introduction and Historical Developments
– About Historical System development
– Processor families
• Computer Architecture and Organization
– Instruction Set Architecture (ISA)
– Microarchitecture
– System architecture
– Processor architecture
– Processor structures
• Interfacing and I/O Strategies
– I/O fundamentals, Interrupt mechanisms, Buses
Course contents
• Memory Architecture
– Primary memory, Cache memory, Secondary memory
• Functional Organization
– Instruction pipelining
– Instruction level parallelism (ILP),
– Superscalar architectures
– Processor and system performance
• Multiprocessing
– Amdahl’s law
– Short vector processing
– Multi-core
– multithreaded processors
Introduction



What is a Computer?
• A machine that can solve problems
for people by carrying out instructions
given to it
• A sequence of instructions is called a
program
• The language a machine can understand
is called machine language



What is Machine Language?
• Machine language(ML) is a system of instructions and data
executed directly by a computer's Central Processing Unit
• The codes are strings of 0s and 1s, or binary digits (“bits”)
• Instructions typically use some bits to represent
– The operation (e.g., addition)
– The operands, or
– The location of the next instruction
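This bit-field encoding can be sketched in code. The 8-bit width and the 4/4 field split below are illustrative assumptions, not the format of any specific real machine:

```python
# A hypothetical 8-bit machine instruction (assumed for illustration):
# the top 4 bits encode the operation, the bottom 4 bits the operand.
def decode(word):
    opcode = (word >> 4) & 0b1111   # operation bits
    operand = word & 0b1111         # operand bits
    return opcode, operand

print(decode(0b10000011))  # (8, 3): opcode 0b1000, operand 0b0011
```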



Machine Language contd..
• Advantages
– Directly executable by the machine
(electronic circuits)
– High speed
• Disadvantages
– Hard for humans to read and write
– Machine dependent
(hardware dependent)



More on Machines
• A machine defines a language
– The set of instructions the machine can carry out
• A language defines a machine
– A machine that can execute every program written in
that language



Two-Layer (Level) Machine
• This machine contains a new language (L1)
in addition to the machine language (L0)
• Programs in L1 are translated or interpreted
into L0
[Figure: the Virtual Machine (L1) sits above the real Machine (L0),
connected by a translator/interpreter.]



Translation (L1 → L0)
1. Each instruction written in L1 is replaced by an
equivalent sequence of L0 instructions
2. The machine then executes the resulting L0 program
3. The program that performs this translation is called a
translator (compiler)



Interpretation
• Each instruction in L1 is executed directly
by carrying out the equivalent L0
instructions
• The program that does this is called an interpreter
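The difference can be sketched in code. Below is a minimal interpreter for a made-up two-instruction language: each instruction is carried out immediately, and no translated program is ever produced (instruction names are invented for illustration):

```python
# A minimal sketch of interpretation: each "L1" instruction is carried
# out at once by "L0" operations (here, plain Python statements).
def interpret(program):
    acc = 0                       # a single accumulator register
    for op, arg in program:
        if op == "LOAD":          # acc <- arg
            acc = arg
        elif op == "ADD":         # acc <- acc + arg
            acc += arg
        elif op == "PRINT":
            print(acc)
    return acc

interpret([("LOAD", 2), ("ADD", 3), ("PRINT", None)])  # prints 5
```

A translator, by contrast, would emit a complete new program first and only then run it.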



Multi Level Machine
High-level Language Program (C, C++)

Assembly Language Program

Machine Language



Multilevel Machine
Virtual Machine Ln

Virtual Machine Ln-1

.
.
.

Machine Language L0



Six-Level Machine
• A computer designed as a hierarchy of six levels
of abstraction



Digital Logic Level
• The interesting objects at this level are gates
• Each gate has one or more digital inputs (0 or 1)
• Each gate is built of at most a handful of
transistors
• A small number of gates can be combined to
form a 1-bit memory, which can store a 0 or 1
• The 1-bit memories can be combined in
groups of, for example, 16, 32 or 64 to form
registers
• Each register can hold a single binary number
up to some maximum
• Gates can also be combined to form the main
computing engine itself
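The 1-bit memory mentioned above can be sketched as a cross-coupled NAND (SR) latch. The simulation below is a minimal illustration under the usual active-low convention, not a timing-accurate gate model:

```python
# A 1-bit memory built from gates: a cross-coupled NAND (SR) latch.
# Inputs are active-low: s_n=0 sets the bit, r_n=0 clears it,
# and s_n = r_n = 1 holds the stored value.
def nand(a, b):
    return 0 if (a and b) else 1

def sr_latch(s_n, r_n, q, q_n):
    # Iterate a few times until the cross-coupled outputs settle.
    for _ in range(4):
        q, q_n = nand(s_n, q_n), nand(r_n, q)
    return q, q_n

q, q_n = sr_latch(0, 1, 0, 1)    # set   -> q becomes 1
q, q_n = sr_latch(1, 1, q, q_n)  # hold  -> q stays 1
q, q_n = sr_latch(1, 0, q, q_n)  # reset -> q becomes 0
```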



Microarchitecture Level
• A collection of 8–32 registers that form a
local memory, and a circuit called an ALU
(Arithmetic Logic Unit) that can perform
simple arithmetic operations
• The registers are connected to the ALU to
form a data path over which the data
flow
• The basic operation of the data path
consists of selecting one or two registers
and having the ALU operate on them
• On some machines the operation of the
data path is controlled by a program called
a microprogram; on other machines it is
controlled by hardware
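One data-path step can be sketched as follows. Register names and the operation set are illustrative assumptions, not those of a particular machine:

```python
# Sketch of one data-path step: select source registers, have the
# ALU operate on them, then write the result back into a register.
def alu(op, a, b):
    return {"ADD": a + b, "SUB": a - b, "AND": a & b}[op]

def datapath_step(regs, op, src1, src2, dst):
    regs[dst] = alu(op, regs[src1], regs[src2])  # write-back stage
    return regs

regs = {"R1": 6, "R2": 4, "R3": 0}
datapath_step(regs, "ADD", "R1", "R2", "R3")  # R3 <- R1 + R2 = 10
```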



Data Path



Instruction Set Architecture Level
• The ISA level is defined by the
machine’s instruction set
• This is the set of instructions carried
out interpretively by the
microprogram or by hardware
execution circuits



Operating System Level
• Uses different memory organization, a new
set of instructions, the ability to run one or
more programs concurrently
• Those level 3 instructions identical to level
2’s are carried out directly by the
microprogram (or hardwired control), not by
the OS;
• In other words, some of the level 3
instructions are interpreted by the OS and
some of the level 3 instructions are
interpreted directly by the microprogram;
• This level is hybrid



Assembly Language Level
• This level is really a symbolic form for one
of the underlying languages;
• This level provides a method for people to write
programs for levels 1, 2 and 3 in a form that is
not as unpleasant as the virtual machine
languages themselves;
• Programs in assembly language are first
translated to level 1, 2 or 3 language and then
interpreted by the appropriate virtual or actual
machine;
• The program that performs the translation is
called an assembler.



Between Levels 3 and 4
• The lower 3 levels are not for the
average programmer – Instead
they are primarily for running the
interpreters and translators
needed to support the higher
levels;
• These are written by system
programmers who specialise in
developing new virtual machines;
• Levels 4 and above are intended
for the applications programmer
• Levels 2 and 3 are always
interpreted, Levels 4 and above
are usually, but not always,
supported by translation;
Problem-oriented Language Level
• This level usually consists of
languages designed to be used by
applications programmers;
• These languages are generally
called higher level languages
• Some examples: Java, C, BASIC,
LISP, Prolog;
• Programs written in these
languages are generally translated
to Level 3 or 4 by translators
known as compilers, although
occasionally they are interpreted
instead;



Multilevel Machines: Hardware
• Programs written in a computer’s true machine language (level
1) can be directly executed by the computer’s electronic circuits
(level 0), without any intervening interpreters or translators.
• These electronic circuits, along with the memory and
input/output devices, form the computer’s hardware.
• Hardware consists of tangible objects:
– integrated circuits
– printed circuit boards
– Cables
– power supplies
– Memories
– Printers
• Hardware is not abstract ideas, algorithms, or instructions.



Multi level machine Software
• Software consists of algorithms (detailed instructions
telling how to do something) and their computer
representations, namely programs
• Programs can be stored on hard disk, floppy disk, CD-
ROM, or other media but the essence of software is the
set of instructions that makes up the programs, not the
physical media on which they are recorded.
• In the very first computers, the boundary between
hardware and software was crystal clear.
• Over time, however, it has blurred considerably, primarily
due to the addition, removal, and merging of levels as
computers have evolved.
• Hardware and software are logically equivalent



The Hardware/Software Boundary

• Any operation performed by software can also
be built directly into the hardware;
• Also, any instruction executed by the hardware
can also be simulated in software;
• The decision to put certain functions in
hardware and others in software is based on
such factors as:
– Cost
– Speed
– Reliability and
– Frequency of expected changes



Exercises
1. Explain each of the following terms in your own
words
– Machine Language
– Instruction
2. What are the differences between Interpretation
and translation?
3. What are Multilevel Machines?
4. What are the differences between the two-level
machine and the six-level machine?



Historical Developments



Computer Generation
1. Zeroth generation- Mechanical Computers (1642-1940)
2. First generation - Vacuum Tubes (1940-1955)
3. Second Generation -Transistors (1956-1963)
4. Third Generation - Integrated Circuits (1964-1971)
5. Fourth Generation – VLSI (Very Large Scale Integration) (1971-present)
6. Fifth Generation – Artificial Intelligence (Present and
Beyond)



The Zero Generation (1)
Year  Name               Made by          Comments
1834  Analytical Engine  Babbage          First attempt to build a digital computer
1936  Z1                 Zuse             First working relay calculating machine
1943  COLOSSUS           British gov't    First electronic computer
1944  Mark I             Aiken            First American general-purpose computer
1946  ENIAC I            Eckert/Mauchley  Modern computer history starts here
1949  EDSAC              Wilkes           First stored-program computer
1951  Whirlwind I        M.I.T.           First real-time computer
1952  IAS                Von Neumann      Most current machines use this design
1960  PDP-1              DEC              First minicomputer (50 sold)
1961  1401               IBM              Enormously popular small business machine
1962  7094               IBM              Dominated scientific computing in the early 1960s
The Zero Generation (2)
Year  Name    Made by    Comments
1963  B5000   Burroughs  First machine designed for a high-level language
1964  360     IBM        First product line designed as a family
1964  6600    CDC        First scientific supercomputer
1965  PDP-8   DEC        First mass-market minicomputer (50,000 sold)
1970  PDP-11  DEC        Dominated minicomputers in the 1970s
1974  8080    Intel      First general-purpose 8-bit computer on a chip
1974  CRAY-1  Cray       First vector supercomputer
1978  VAX     DEC        First 32-bit superminicomputer
1981  IBM PC  IBM        Started the modern personal computer era
1985  MIPS    MIPS       First commercial RISC machine
1987  SPARC   Sun        First SPARC-based RISC workstation
1990  RS6000  IBM        First superscalar machine

The Zero Generation (3)
• Pascal’s machine
– Addition and Subtraction
• Analytical engine
– Four components (Store, mill, input,
output)



Charles Babbage

• Difference Engine 1823

• Analytic Engine 1833


– The forerunner of modern digital computer
– The first conception of a general purpose
computer



Von-Neumann machine



First Generation-Vacuum Tubes
(1945-1955)
• First generation computers are
characterized by the use of vacuum
tube logic
• Developments
– ABC
– ENIAC
– UNIVAC I



First Generation – Timeline

Date  Event     Description                                      Arithmetic  Logic         Memory
1942  ABC       Atanasoff-Berry Computer                         binary      vacuum tubes  capacitors
1946  ENIAC     Electronic Numerical Integrator And Computer     decimal     vacuum tubes  vacuum tubes
1947  EDVAC     Electronic Discrete Variable Automatic Computer  binary      vacuum tubes  mercury delay lines
1948  The Baby  Manchester Small-Scale Experimental Machine      binary      vacuum tubes  CRT
1949  UNIVAC I  Universal Automatic Computer                     decimal     vacuum tubes  mercury delay lines
1949  EDSAC     Electronic Delay Storage Automatic Computer      binary      vacuum tubes  mercury delay lines
1952  IAS       Institute for Advanced Study                     binary      vacuum tubes  cathode ray tubes
1953  IBM 701                                                    binary      vacuum tubes  mercury delay lines



ABC - Atanasoff-Berry Computer

• The world's first electronic digital computer
• The ABC used binary arithmetic



ENIAC – First general purpose
computer
• Electronic Numerical Integrator And Computer
• Designed and built by Eckert and Mauchly at the University of
Pennsylvania during 1943-45
• capable of being reprogrammed to solve a full range of computing
problems
• The first, completely electronic, operational, general-purpose analytical
calculator!
– 30 tons, 72 square meters, 200KW
• Performance
– Read in 120 cards per minute
– Addition took 200 µs, Division 6 ms



UNIVAC - UNIVersal Automatic
Computer
• The first commercial computer
• UNIVAC was delivered in 1951
• designed at the outset for business and
administrative use
• The UNIVAC I had 5200 vacuum tubes, weighed
29,000 pounds, and consumed 125 kilowatts of
electrical power
• Originally priced at US$159,000



The Second Generation-
Transistors (1955-1965)
• Second generation computers are
characterized by the use of discrete
transistor logic
• Use of magnetic core for primary storage
• Developments
– IBM 1620 System
– IBM 7030 System
– IBM 7090 System
– IBM 7094 System



IBM 7090
• The IBM 7090 system was announced in 1958.
• The 7090 included a multiplexor which supported up to 8
I/O channels.
• The 7090 supported both fixed point and floating point
arithmetic.
• Two fixed point numbers could be added in 4.8
microseconds, and two floating point numbers could be
added in 16.8 microseconds.
• The 7090 had 32,768 thirty-six bit words of core storage.
• In 1960, the American Airlines SABRE system
used two 7090 systems.
• Cost of a 7090 system was in the
$3,000,000 range.



IBM 1620
• The IBM 1620 system was announced in 1959.
• The IBM 1620 system had up to 60,000 digits of core
storage (6 bits each.)
• Floating point hardware was optional.
• The IBM 1620 system performed decimal arithmetic.
• The system was digit oriented, not word oriented.



IBM 7030
• The IBM 7030 system was
announced in 1960.
• The IBM 7030 system used
magnetic core for main memory,
and magnetic disks for
secondary storage.
• The ALU could perform
1,000,000 operations per
second.
• Up to 32 I/O channels were
supported.
• The 7030 was also referred to
as "Stretch."
• Cost of a 7030 system was in
the $10,000,000 range.



IBM 7094
• The IBM 7094 system was announced in
1962.
• The 7094 was an improved 7090.
• The 7094 introduced double precision
floating point arithmetic.



Third Generation
• Third generation computers are
characterized by the use of integrated
circuit logic.
• Development
– IBM System/360



IBM S 360
• The IBM S/360 family was announced in 1964.
• Included both multiplexor and selector I/O
channels.
• Supported both fixed point and floating point
arithmetic.
• Had a microprogrammed instruction set.
• Cost between $133,000 and $12,500,000.



Fourth Generation
• Very Large Scale Integration (VLSI) and Ultra
Large Scale Integration (ULSI)
• Fourth generation computers are
characterized by the use of
microprocessors.
• Semiconductor memory was commonly
used
• Development
– Intel
– AMD etc



Intel 4004
• The Intel 4004 microprocessor was announced in
1971.
• The Intel 4004 microprocessor had
– 2,300 transistors.
– A clock speed of 108 KHz.
– A die size of 12 sq mm.
– 4 bit memory access.
– 4 bit registers.
• The Intel 4004 microprocessor supported
– Up to 32,768 bits of program storage.
– Up to 5,120 bits of data storage.
• The 4004 was used mainly in calculators.



Intel 4004 - 1971



MOS 6502
• The MOS 6502 microprocessor was announced in 1975.
• The MOS 6502 microprocessor had
– A clock speed of 1 MHz.
– 8 bit memory access.
– 8 bit registers.
• The MOS 6502 microprocessor supported
– Up to 65,536 bytes (8 bit) of main memory.
• The MOS 6502 was used in
– The Apple II personal computer.
– The Commodore PET personal computer.
– The KIM-1 computer kit.
– The Atari 2600 game system.
– The Nintendo Famicom game system.
• Initial price of the 6502 was $25.00.



Intel Pentium IV - 2001
• “State of the art”

• 42 million
transistors
• 2GHz
• 0.13µm process

• Could fit ~15,000 4004s on this chip!



Now
- zEnterprise196 Microprocessor
• 1.4 billion transistors, Quad core design
• Up to 96 cores (80 visible to OS) in one multichip module
• 5.2 GHz, IBM 45nm SOI CMOS technology
• 64-bit virtual addressing
– original 360 was 24-bit; 370 was a 31-bit extension
• Superscalar, out-of-order
– Up to 72 instructions in flight
• Variable length instruction pipeline: 15-17 stages
• Each core has 2 integer units, 2 load-store units and 2 floating point
units
• 8K-entry Branch Target Buffer
– Very large buffer to support commercial workload
• Four Levels of caches:
– 64KB L1 I-cache, 128KB L1 D-cache
– 1.5MB L2 cache per core
– 24MB shared on-chip L3 cache
– 192MB shared off-chip L4 cache



Fifth Generation
• Computing devices, based on artificial
intelligence
• Features
– Voice recognition,
– Parallel processing
– Quantum computation and molecular and
nanotechnology will radically change the face
of computers in years to come.
– The goal of fifth-generation computing is to
develop devices that respond to natural
language input and are capable of learning
and self-organization



Computer Architecture



What is Computer Architecture?

• A level's set of data types, operations, and
features is called its architecture
• The architecture deals with those aspects
that are visible to the user of that level
• The study of how to design these parts of a
computer is called Computer
Architecture



Why Computer Architecture
• Maximise the overall performance of the system
while keeping within cost constraints
• Bridge the performance gap between the slowest
and fastest components in a computer
• Architecture design
– Search the space of possible designs
– Evaluate the performance of each candidate design
– Identify bottlenecks, redesign, and repeat the
process



Computer Organization
• A simple computer consists of
– CPU
– I/O Devices
– Memory
– BUS (connection method)



Simple Computer



CPU – Central Processing Unit
• The “brain” of the computer
• It executes programs stored in
the main memory
• It is composed of several parts
– Control Unit
– Arithmetic and Logic Unit
– Registers



Registers
• High-speed memory
• Top of the memory hierarchy, and
provide the fastest way to access data
• Store temporary results
• Some useful registers
– PC – Program Counter
• Points to the next instruction
– IR – Instruction Register
• Holds the instruction currently being executed



Registers more…
• Types
– User-accessible Registers
– Data registers
– Address registers
– General purpose registers
– Special purpose registers
– Etc.



Instruction
• Types
– Data handling and Memory operations
• Set, Move, Read, Write
– Arithmetic and Logic
• Add, subtract, multiply, or divide
• Compare
– Control flow
• Complex instructions
– Take the place of many instructions on other computers
• Saving many registers on the stack at once
• Moving large blocks of memory



Parts of an instruction
• Opcode
– Specifies the operation to be performed
• Operands
– Register values,
– Values in the stack,
– Other memory values,
– I/O ports



Type of the operation
• Register-Register Operation
– Add, subtract, compare, and logical
operations
• Memory Reference
– All loads from memory
• Multi Cycle Instructions
– Integer multiply and divide and all floating-
point operations



Fetch-Decode-Execute Cycle
• Instruction fetch
– A 32-bit instruction is fetched from the cache
• Decode
• Execute
• Memory access
• Write back



Fetch-Decode-Execute Cycle



Microprocessors
• Processors can be identified by two main
parameters
– Speed (MHz / GHz)
– Width
• Data bus
• Address bus
• Internal registers



Data bus
• Known as the front-side bus, CPU bus, or
processor-side bus
• Used between the CPU and the main chipset
• Its width defines the size of each memory transfer
– 32 bit
– 64 bit etc.



Data bus



I/O Ports with data transfer rates

The division of I/O buses is according to data transfer rate. Specifically:

Controller   Port / Device            Typical Data Transfer Rate
Super I/O    PS/2 (keyboard / mouse)  2 KB/s
             Serial Port              25 KB/s
             Floppy Disk              125 KB/s
             Parallel Port            200 KB/s
Southbridge  Integrated Audio         1 MB/s
             Integrated LAN           12 MB/s
             USB                      60 MB/s
             Integrated Video         133 MB/s
             IDE (HDD, DVD)           133 MB/s
             SATA (HDD, DVD)          300 MB/s



Address Bus
• Carries addressing information
• Each wire carries a single bit
• Width indicates maximum amount of RAM the
processor can handle
• Data bus and address bus are independent
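The relationship between address-bus width and maximum addressable memory can be checked directly (assuming one byte per address, i.e. byte-addressable memory):

```python
# Sketch: the address-bus width sets the maximum amount of memory
# the processor can address (each wire carries one address bit).
def max_addressable_bytes(address_bus_width):
    return 2 ** address_bus_width   # one byte per address assumed

print(max_addressable_bytes(20))  # 1,048,576 bytes = 1 MB (e.g. 8086)
print(max_addressable_bytes(32))  # 4,294,967,296 bytes = 4 GB
```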



How CPU works?
• A Simple CPU
– 4-bit address bus
– Registers A, B and C (4-bit)
– 8-bit program words (4-bit instruction,
4-bit data)



How CPU works? Instruction SET

0000  SLEEP
0001  LOAD M → A
0010  LOAD M → B
0101  SET A → M
0110  SET B → M
0111  SET C → M
1000  ADD A + B → C
1001  RESET
1111  MOVE

[Figure series: step-by-step execution trace of an example program.
The instruction counter (IC) steps through memory one 8-bit word at a
time (4-bit opcode, 4-bit operand): a value (0010) is loaded into
register A and another (0101) into register B, the ALU adds them into
register C (0111), SET C → M writes the result back to memory, and
RESET/SLEEP end the run.]

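The fetch-decode-execute steps above can be sketched as a small simulator. The opcodes follow the slide's instruction set; treating LOAD's operand as an immediate value and SET's operand as a destination address are assumptions made for illustration:

```python
# A minimal simulator for the toy 4-bit CPU, written as a sketch.
# Each memory word is 8 bits: a 4-bit opcode and a 4-bit operand.
SLEEP, LOAD_A, LOAD_B, SET_C, ADD = 0b0000, 0b0001, 0b0010, 0b0111, 0b1000

def run(mem):
    a = b = c = 0
    ic = 0                                   # instruction counter
    while ic < len(mem):
        word = mem[ic]; ic += 1              # fetch
        op, arg = word >> 4, word & 0b1111   # decode
        if op == LOAD_A:   a = arg           # execute
        elif op == LOAD_B: b = arg
        elif op == ADD:    c = (a + b) & 0b1111
        elif op == SET_C:  mem[arg] = c      # write result to memory
        elif op == SLEEP:  break
    return mem, c

# Program: A <- 2, B <- 5, C <- A + B, store C at address 0, sleep.
mem = [0b0001_0010, 0b0010_0101, 0b1000_0000, 0b0111_0000, 0b0000_0000]
mem, c = run(mem)   # c == 7 and mem[0] == 7
```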


How BUS System works
• The CPU and Devices A, B and C share three buses:
– DATA BUS (4 bit)
– ADDRESS BUS (4 bit)
– CONTROL BUS (2 bit: 01 – READ, 10 – Write)
• Each device responds only to its own address
– Device A: 0100, Device B: 0010, Device C: 0001

[Figure series: bus transaction trace. The CPU places a device address
on the address bus and a READ/WRITE code on the control bus; the
selected device then transfers a value over the data bus.]
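The bus protocol above can be sketched in code. The device addresses and control codes follow the slides; the class and method names are illustrative assumptions:

```python
# Sketch of the shared-bus protocol: the CPU drives the address and
# control buses, and only the addressed device reads or writes the
# data bus. Control codes follow the slide: 01 = READ, 10 = WRITE.
READ, WRITE = 0b01, 0b10

class Device:
    def __init__(self, address):
        self.address = address
        self.value = 0
    def tick(self, addr_bus, ctrl_bus, data_bus):
        if addr_bus != self.address:   # not selected: ignore the cycle
            return data_bus
        if ctrl_bus == WRITE:          # CPU -> device
            self.value = data_bus
        elif ctrl_bus == READ:         # device -> CPU
            data_bus = self.value
        return data_bus

devices = [Device(0b0100), Device(0b0010), Device(0b0001)]  # A, B, C

def bus_cycle(addr, ctrl, data):
    for d in devices:                  # every device sees every cycle
        data = d.tick(addr, ctrl, data)
    return data

bus_cycle(0b0100, WRITE, 0b1010)  # write 1010 to Device A
bus_cycle(0b0100, READ, 0)        # read it back -> 0b1010
```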
Intel
Microprocessor History



Microprocessor History
• Intel 4004 (1971)
– 0.1 MHz
– 4 bit
– World's first single-chip microprocessor
– Instruction set contained 46 instructions
– Register set contained 16 registers of 4 bits each



Microprocessor History
• Intel 8008 (1972)
– Max. CPU clock rate 0.5 MHz to 0.8 MHz
– 8-bit CPU with an external 14-bit address bus
– could address 16KB of memory
– had 3,500 transistors



Microprocessor History
• Intel 8080 (1974)
– second 8-bit microprocessor
– Max. CPU clock rate 2 MHz
– Large 40-pin DIP packaging
– 16-bit address bus and an 8-bit data bus
– Easy access to 64 kilobytes of memory
– Processor had seven 8-bit registers, (A, B,
C, D, E, H, and L)



Microprocessor History
• Intel 8086 (1978)
– 16-bit microprocessor
– Max. CPU clock rate 5 MHz to 10 MHz
– 20-bit external address bus gave a 1 MB
physical address
– 16-bit registers including the stack pointer,



Microprocessor History
• Intel 80286 (1982)
– 16-bit x86 microprocessor
– 134,000 transistors
– Max. CPU clock rate 6 MHz to 25 MHz
– Run in two modes
• Protected mode
• Real mode



Microprocessor History
• Intel 80386 (1985)
– 32-bit Microprocessor
– 275,000 transistors
– 32-bit data bus (16-bit on the later 386SX variant)
– Max. CPU clock rate 12 MHz to 40 MHz
– Instruction set
• x86 (IA-32)



Microprocessor History
• Intel 80486 (1989)
– Max. CPU clock rate 16 MHz to 100 MHz
– FSB speeds 16 MHz to 50 MHz
– Instruction set x86 (IA-32)
– An 8 KB on-chip SRAM cache
– 486 has a 32-bit data bus and a 32-bit address bus.
– Power Management Features and System Management
Mode (SMM) became a standard feature



Microprocessor History
• Intel Pentium I (1993)
– Intel's 5th-generation microarchitecture
– Operated at 60 MHz
– powered at 5V and generated enough heat to
require a CPU cooling fan
– Level 1 CPU cache from 16 KB to 32 KB
– Contained 4.5 million transistors
– compatible with the common Socket 7
motherboard configuration



Microprocessor History
• Intel Pentium II (1997)
– Intel's sixth-generation microarchitecture
– 242-contact Single Edge Contact Cartridge (Slot 1)
– speeds from 233 MHz to 450 MHz
– Instruction set IA-32, MMX
– cache size was increased to 512 KB
– better choice for consumer-level operating systems, such as
Windows 9x, and multimedia applications



Microprocessor History
• Intel Pentium III (1999)
– 400 MHz to 1.4 GHz
– Instruction set IA-32, MMX, SSE
– L1-Cache: 16 + 16 KB (Data + Instructions)
– L2-Cache: 512 KB, external chips on CPU
module at 50% of CPU-speed
– the first x86 CPU to include a unique, retrievable,
identification number



Microprocessor History
• Intel Pentium IV (2000)
– Max. CPU clock rate 1.3 GHz to 3.8 GHz
– Instruction set x86 (i386), x86-64, MMX, SSE,
SSE2, SSE3
– featured Hyper-Threading Technology (HTT)
– The 64-bit external data bus
– More than 42 million transistors
– Processor (front-side) bus runs at 400MHz,
533MHz, 800MHz, or 1066MHz
– Can address up to 4GB RAM
– 2MB of full-speed L3 cache



Microprocessor History
• Intel Core Duo
– Processing Die Transistors 151 million
– Consists of two cores
– 2 MB L2 cache
– All models support: MMX, SSE, SSE2,
SSE3, EIST, XD bit
– FSB Speed 533 MHz
– Intel® Virtualization Technology (VT-x)
– Execute Disable Bit



Microprocessor History
• Pentium Dual-Core
– Max. CPU clock rate 1.3 GHz to 2.6 GHz
– based on either the 32-bit Yonah or (with quite
different microarchitectures) 64-bit Merom-2M
– Instruction set MMX, SSE, SSE2, SSE3, SSSE3,
x86-64
– FSB speeds 533 MHz to 800 MHz
– Cores 2



Microprocessor History
• Intel Core Duo
– Clock Speed 1.2 GHz
– L2 Cache 2 MB
– FSB Speed 533 MHz
– Instruction Set 32-bit
– Processing Die Transistors 151 million
– Advanced Technologies
• Intel® Virtualization Technology (VT-x)
• Enhanced Intel SpeedStep® Technology
• Execute Disable Bit



Microprocessor History
• Core 2 Duo
– Cores 2 , Threads 2
– Clock Speed 3.33 GHz
– L2 Cache 6 MB
– FSB Speed 1333 MHz
– Processing Die Transistors 410 million
– Advanced Technologies
• Intel® Virtualization Technology (VT-x)
• Intel® Virtualization Technology for Directed IO (VT-d)
• Intel® Trusted Execution Technology
• Intel® 64
• Idle States
• Enhanced Intel SpeedStep® Technology
• Thermal Monitoring Technologies
• Execute Disable Bit



Microprocessor History
• Intel Core 2 Quad
– Cores 4 , Threads 4
– Clock Speed 3.0 GHz
– L2 Cache 12 MB
– FSB Speed 1333 MHz
– Processing Die Transistors 410 million
– Advanced Technologies
• Intel® Virtualization Technology (VT-x)
• Intel® Virtualization Technology for Directed IO (VT-d)
• Intel® Trusted Execution Technology
• Intel® 64
• Idle States
• Enhanced Intel SpeedStep® Technology
• Thermal Monitoring Technologies
• Execute Disable Bit



Microprocessor History
• Core i3
– Cores 2
– Threads 4
– Clock Speed 2.13 GHz
– Intel® Smart Cache 3 MB
– Instruction Set 64-bit Instruction Set Extensions
SSE4.1,SSE4.2
– Max Memory Size 8 GB
– Processing Die Transistors 382 million
– Technologies
• Intel® Trusted Execution Technology
• Intel® Fast Memory Access
• Intel® Flex Memory Access



Microprocessor History
• Core i5
– Cores 2
– Threads 4
– Clock Speed 1.7 - 3.0 GHz
– Max Memory Size 8 GB
– Processing Die Transistors 382 million
– Technologies
• Intel® Trusted Execution Technology
• Intel® Fast Memory Access
• Intel® Flex Memory Access
• Intel® Anti-Theft Technology
• Intel® My WiFi Technology
• 4G WiMAX Wireless Technology
• Idle States



Microprocessor History
• Core i7
– Cores 4
– Threads 8
– Clock Speed 3.4 GHz
– Max Turbo Frequency 3.8 GHz
– Intel® Smart Cache 8 MB
– Technologies
• Intel® Turbo Boost Technology 2.0
• Intel® vPro Technology
• Intel® Hyper-Threading Technology
• Intel® Virtualization Technology (VT-x)
• Intel® Virtualization Technology for Directed I/O (VT-d)
• Intel® Trusted Execution Technology
• AES New Instructions
• Intel® 64
• Idle States
• Enhanced Intel SpeedStep® Technology
• Thermal Monitoring Technologies
• Intel® Fast Memory Access
• Intel® Flex Memory Access
• Execute Disable Bit



Summary –
Processor Family Vs Buses



Summary - Intel processors (1)



AMD processors (1)



AMD processors (2)



Microprocessors



Processor Instructions
• Intel 80386 (1985)
– x86 (IA-32)
• Intel 80486 (1989)
– x86 (IA-32)
• Intel Pentium I (1993)
– x86 (IA-32)
• Intel Pentium II (1997)
– IA-32, MMX



Processor Instructions(2)
• Intel Pentium III (1999)
– IA-32, MMX, SSE
• Intel Pentium IV (2000)
– x86 (i386), x86-64, MMX, SSE, SSE2,
SSE3
• Intel Core Duo
– MMX, SSE, SSE2, SSE3, EIST, XD bit
• Pentium Dual-Core
– MMX, SSE, SSE2, SSE3, SSSE3, x86-64



Processor Modes



Processor modes
• Intel and compatible processors run in several modes
– Real Mode
– IA 32 Mode
• Protected Mode
• Virtual Real Mode
– IA 32e 64 bit mode
• 64-bit mode
• Compatibility mode



8086 Real Mode (x86)
• 80286 and later x86-compatible CPUs
• Execute 16 bit instructions
• Address only 1MB Memory
• Single task
• MS-DOS programs run in this mode
– Windows 1.x, 3.x
– 16-bit instructions
• No built-in protection to keep one program from overwriting another in memory



IA-32 - Protected Mode
• First implemented in the Intel 80386 as
a 32-bit extension of x86 architecture
• Can run 32-bit instructions
• A 32-bit OS and 32-bit applications are required
• Programs are protected to keep one program from overwriting another in memory



Virtual Real mode (IA- 32 Mode)

• Backward compatibility (can run 16-bit apps)
– used to execute DOS programs in Windows/386, Windows 3.x, Windows 9x/Me
• 16-bit programs run on top of the 32-bit protected mode
• Address only up to 1 MB
• All Intel and Intel-compatible processors power up in real mode
IA-32e 64-bit Execution Mode
• Originally designed by AMD, later adopted by Intel
• Processor can run
– Real mode
– IA 32 mode
– IA 32e mode
• IA-32e 64-bit mode runs a 64-bit OS and 64-bit apps
• Needs a 64-bit OS and 64-bit support throughout the hardware
64-Bit Operating Systems
• Windows XP – 64 bit Edition for Itanium (IA-
64 bit processors)
• Windows XP Professional x64 (IA-32e, Athlon 64)
• 32-bit applications can run without any problem
• 16-bit and DOS applications do not run
• Problem?
– 64-bit drivers are required for all hardware



Physical memory limit



Processors Features



Processors Features
• System Management Mode (SMM)
• MMX Technology
• SSE, SSE2, SSE3, SSE4, etc.
• 3DNow! Technology
• Math core processor
• Hyper Threading
• Dual core technology
• Quad core technology
• Intel Virtualization
• Execute Disable bit
• Intel® Turbo Boost Technology



System Management Mode(SMM)
• is an operating mode in which normal execution, including the operating
system, is suspended, and special separate software is executed in a
high-privilege mode
• It is available in all later microprocessors in the x86
architecture
• Some uses of SMM are
– Handle system events like memory or chipset errors.
– Manage system safety functions, such as shutdown
on high CPU temperature and turning the fans on and
off.
– Control power management operations, such as
managing the voltage regulator modules.



MMX Technology
• Multimedia extension / Matrix math
extension
• Improves audio/video compression
• MMX defined eight registers, known as
MM0 through MM7
• Each of the MMn registers holds 64 bits
• MMX provides only integer operations
• Used for both 2D and 3D calculations
• 57 new instructions + (SIMD- Single
instruction multiple data)



SSE -Streaming SIMD Extensions

• Used to accelerate floating point and parallel calculations


• is a SIMD instruction set extension to the x86 architecture
• subsequently expanded by Intel to SSE2, SSE3, SSSE3,
and SSE4
• it supports floating point math
• SSE originally added eight new 128-bit registers known as
XMM0 through XMM7
• SSE Instructions
– Floating point instructions
– Integer instructions
– Other instructions



SSE2- Streaming SIMD
Extensions 2
• Introduced in the Pentium IV
• Adds 114 additional instructions
• Also includes MMX and SSE instructions
• SSE2 is an extension of the IA-32
architecture



SSE3- Streaming SIMD
Extensions 3
• Introduced in the Pentium IV Prescott processor
• Code name: Prescott New Instructions (PNI)
• Contains 13 new instructions
• Also includes MMX, SSE, SSE2



SSSE3 - Supplemental SSE3
• Introduced in Xeon and Core 2 processors
• Adds 32 new SIMD instructions to SSE3



SSE4 (HD Boost)
• Introduced by Intel in 2008
• Adds 54 new instructions
• 47 of SSE4 instructions are referred to as
SSE4.1
• 7 other instruction as SSE4.2
• SSE4.1 – is targeted to improve
performance of media, imaging and 3D
• SSE4.2 improves string and text
processing



SSE - Advantages
• Higher image quality and resolution
• High-quality audio and MPEG-2 video in multimedia applications
• Reduced CPU utilization for speech recognition software
• SSEx instructions are useful with MPEG-2 decoding



3DNow! Technology
• AMD’s alternative to SSE
• Adds 21 instructions using SIMD technology
• Enhanced 3DNow! adds 24 more instructions
• Professional 3DNow! adds 51 SSE commands to Enhanced 3DNow!



Math coprocessor
• Provides hardware for floating-point math
• Speeds up mathematical operations
• All Intel processors since the 486DX include a built-in floating point unit (FPU)
• Can perform high-level mathematical operations
• Its instruction set differs from the main CPU’s
Hyper-Threading Technology
• Is an Intel-proprietary technology used to
improve parallelization of computations
doing multiple tasks at once
• The operating system addresses two virtual
processors, and shares the workload
between them when possible
• Allowing multiple threads to run
simultaneously



Hyper-Threading Technology
• Originally introduced in the Xeon processor for servers (2002)
• Available in all Pentium IV processors with 800 MHz bus speed
• HT-enabled processors have 2 sets of general purpose registers and
control registers
• Only a single cache memory and a single set of buses
HT - Requirements
• Processor with HT Technology
• Compatible MB (Chipset)
• BIOS support
• Compatible OS
• Software written to Support HT



Dual Core Technology
• Introduced in 2005
• Consists of 2 CPU cores (enables a single processor to work as 2 processors)
• Multitasking performance is improved



Quad-Core Technology
• Consists of 4 CPU cores (enables a single processor to work as 4 processors)
• Less power consumption
• Designed for multimedia and multitasking workloads



Intel Virtualization
• Allows a hardware platform to run multiple operating systems
• Available in Core 2 Quad processors



Execute Disable Bit
• Is a hardware-based security feature
• Can reduce exposure to viruses and malicious-code attacks, and prevent
harmful software from executing and propagating on the server or network
• Helps protect business assets and reduces the need for costly
virus-related repairs
Intel® Turbo Boost Technology
• Provides more performance when needed
• Automatically allows processor cores to run
faster than the base operating frequency
• Depends on the workload and operating
environment
• Processor frequency will dynamically increase
until the upper limit of frequency is reached
• Has multiple algorithms operating in parallel to
manage current, power, and temperature to
maximize performance and energy efficiency



Bugs



Bugs
• Processors can contain defects or errors
• Previously, the only ways to fix a bug were to
– work around it, or replace the processor with a bug-free revision
• Now…
– many bugs can be fixed by altering the microcode
– microcode defines how the processor carries out its instructions
– processors incorporate reprogrammable microcode



Fixing the Bugs
• Microcode updates reside in ROM
BIOS
• Each time the system is rebooted, the fixed code is loaded
• The microcode updates are provided by Intel to motherboard
manufacturers, who incorporate them into the ROM BIOS
• The most recent BIOS should therefore always be installed
CPU Design Strategy

CISC & RISC



What is CISC?
• CISC is an acronym for Complex
Instruction Set Computer
• Most common microprocessor designs such
as the Intel 80x86 and Motorola 68K series
followed the CISC philosophy.
• But recent changes in software and hardware
technology have forced a re-examination of
CISC and many modern CISC processors
are hybrids, implementing many RISC
principles.
• CISC was developed to make compiler
development simpler.
CISC Characteristics
• 2-operand format,
• Variable length instructions where the length
often varies according to the addressing mode
• Instructions which require multiple clock cycles
to execute.
• E.g. Pentium is considered a modern CISC
processor
• Complex instruction-decoding logic, driven by
the need for a single instruction to support
multiple addressing modes.
• A small number of general purpose registers
• Several special purpose registers.
• A 'condition code' register which is set as a side-effect of most
instructions.
CISC Advantages
• Microprogramming is as easy to implement as assembly language
• The ease of microcoding new instructions
allowed designers to make CISC machines
upwardly compatible: a new computer could
run the same programs as earlier computers
because the new computer would contain a
superset of the instructions of the earlier
computers.
• As each instruction became more capable,
fewer instructions could be used to
implement a given task. This made more
efficient use of the relatively slow main
memory.
CISC Disadvantages
• Instruction set & chip hardware become
more complex with each generation of
computers.
• Many specialized instructions aren't used frequently enough to justify
their existence
• CISC instructions typically set the
condition codes as a side effect of the
instruction.
What is RISC?
• RISC - Reduced Instruction Set Computer.
– is a type of microprocessor architecture
– utilizes a small, highly-optimized set of
instructions, rather than a more specialized set of
instructions often found in other types of
architectures.
• History
– The first RISC projects came from IBM,
Stanford, and UC-Berkeley in the late 70s
and early 80s.
– The IBM 801, Stanford MIPS, and Berkeley RISC
1 and 2 were all designed with a similar
philosophy which has become known as RISC.
RISC - Characteristic
• one cycle execution time: RISC processors
have a CPI (clock per instruction) of one
cycle. This is due to the optimization of each
instruction on the CPU and a technique
called PIPELINING
• pipelining: a technique that allows for simultaneous execution of parts,
or stages, of instructions to process instructions more efficiently;
• large number of registers: the RISC design philosophy generally
incorporates a larger number of registers to reduce the amount of
interaction with memory



RISC Attributes
The main characteristics of CISC microprocessors are:
• Extensive instructions.
• Complex and efficient machine instructions.
• Microencoding of the machine instructions.
• Extensive addressing capabilities for memory
operations.
• Relatively few registers.
In comparison, RISC processors are more or less the
opposite of the above:
• Reduced instruction set.
• Less complex, simple instructions.
• Hardwired control unit and machine instructions.
• Few addressing schemes for memory operands with
only two basic instructions, LOAD and STORE
CISC Vs RISC
CISC | RISC
Emphasis on hardware | Emphasis on software
Includes multi-clock complex instructions | Single-clock, reduced instructions only
Memory-to-memory: "LOAD" and "STORE" incorporated in instructions | Register-to-register: "LOAD" and "STORE" are independent instructions
Small code sizes, high cycles per second | Low cycles per second, large code sizes
Transistors used for storing complex instructions | Spends more transistors on memory registers
Performance of
Computers
Improving Performance of
Computers
• Increasing clock speed
– Physical limitation (Need new hardware)
• Parallelism (Doing more things at once)
– Instruction-level parallelism
• Getting more instruction per second
– Processor-level parallelism
• Having multiple CPUs working on the same
problem
Instruction-level parallelism
• Pipelining
– Instruction execution speed is affected by the time taken to fetch
instructions from memory
– Early computers fetched instructions in advance and stored them in
registers (prefetch buffer)
• Prefetching divides instruction execution into two
parts
– Fetching
– Actual execution
– Pipelining divides instruction in to many parts;
each handled by different hardware and can
run in parallel
Pipelining example
• Packaging cakes
– W1: Place an empty box on the belt every 10 second
– W2: Place the cake in the empty box
– W3: Close and seal the box
– W4: Label the box
– W5: Remove the box and place it in the large
container

Computer Pipelines

• S1: Fetch instruction from memory and place it in a


buffer until it is needed
• S2: Decode the instruction; determine it type and
operands it needs
• S3: locate the fetch operands from memory (or registers)
• S4: Execute instruction
• S5: Write back result in a register

Example
T - Cycle time
N - Number of stages in the pipeline

Latency:
Time taken to execute an instruction = N x T

Processor Bandwidth:
No. of MIPS the CPU has = 1000 / T   (T in ns)

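The two formulas above can be checked with a small calculation. A minimal sketch (the 5-stage, 2 ns pipeline here is an assumed example, not from the slides):

```python
# Ideal pipeline metrics from the slide's formulas:
#   latency   = N x T          (time for one instruction to pass all stages)
#   bandwidth = 1000 / T MIPS  (one instruction completes per cycle, T in ns)

def pipeline_metrics(n_stages, cycle_time_ns):
    latency_ns = n_stages * cycle_time_ns
    bandwidth_mips = 1000.0 / cycle_time_ns
    return latency_ns, bandwidth_mips

# Hypothetical 5-stage pipeline with a 2 ns cycle time:
latency, mips = pipeline_metrics(5, 2.0)
print(latency, mips)  # 10.0 500.0
```

Note that the bandwidth figure assumes the pipeline is kept full, with no stalls.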
Processor - pipeline depth

Dual pipelines

• Instruction fetch unit fetches a pair of instructions and puts


each one into own pipeline
• Pentium has two five-stage pipelines
– U pipeline (main) executes an arbitrary Pentium instruction
– V pipeline (second) executes integer instructions, plus one simple
floating-point instruction
• If instructions in a pair conflict, instruction in u pipeline is
executed. Other instruction is held and is paired with next
instruction

Superscalar architecture
• Single pipeline with multiple functional
units
Processor level parallelism
• High bus traffic

• Low bus traffic


Measuring Performance
Moore’s law
• Describes a long-term trend in the
history of computing hardware
• Defined by Dr. Gordon Moore during
the sixties.
• Predicts an exponential increase in
component density over time, with a
doubling time of 18 months.
• Applicable to microprocessors, DRAMs
, DSPs and other microelectronics.
Moore's Law and Performance
• The performance of computers is
determined by architecture and clock
speed.
• Clock speed doubles over a 3 year period
due to the scaling laws on chip.
• Processors using identical or similar
architectures gain performance directly as
a function of Moore's Law.
• Improvements in internal architecture can
yield better gains than predicted by
Moore's Law.
Measuring Performance

• Execution time:
– Time between start and completion of a task
(including disk accesses, memory accesses )
• Throughput:
– Total amount of work done in a given time
Performance of a Computer

Two computers X and Y;


Performance of (X) > Performance of (Y)

Execution Time (Y) > Execution Time (X)


Performance difference of two computers
X is n times faster than Y
CPU Time
• Time CPU spends on a task
• User CPU time
– CPU time spent in the program
• System CPU time
– CPU time spent in OS performing tasks on
behalf of the program
CPU Time (Example)
• User CPU time = 90.7s
• System CPU time 12.9s
• Execution time = 2 min 39 s = 159 s

• % of CPU time = (User CPU time + System CPU time) / Execution time x 100%
CPU Time
% CPU time = (90.7 + 12.9) x 100 / 159
           = 65 %
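The same calculation can be checked in Python (a minimal sketch; the function name is ours):

```python
def cpu_time_percent(user_s, system_s, elapsed_s):
    # % of CPU time = (user + system CPU time) / elapsed execution time x 100
    return (user_s + system_s) / elapsed_s * 100

pct = cpu_time_percent(90.7, 12.9, 159)
print(round(pct))  # 65
```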
Clock Rate
• Computer clock runs at the constant
rate and determines when events take
place in the hardware

Clock Rate = 1 / Clock Cycle Time
Amdahl’s law
• Performance improvement that can be
gained from some faster mode of
execution is limited by fraction of the
time the faster mode can be used
Amdahl’s law
• Speedup depends on
– Fraction of computation time in original
machine that can be converted to take
advantage of the enhancement
(Fraction Enhanced)
– Improvement gains by enhanced
execution mode
(Speedup Enhanced)
Example
Total execution time of a program = 50 s
Execution time that can be enhanced = 30 s

Fraction Enhanced = 30 / 50 = 0.6
Speedup
Example
Normal mode execution time for some portion of a program = 6 s
Enhanced mode execution time for the same portion = 2 s

Speedup Enhanced = 6 / 2 = 3
Execution Time
Example
• Suppose we consider an enhancement to the
processor of a server system used for Web serving.
New CPU is 10 times faster on computation in Web
application than original CPU. Assume original CPU
is busy with computation 40% of the time and is
waiting for I/O 60% of time.

What is the overall speedup gained


from enhancement?
Answer

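The answer on the slide survives only as an image; the result can be reproduced directly from Amdahl's law. A minimal sketch (the function name is ours):

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    # Overall speedup = 1 / ((1 - f) + f / s)
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# 40% of the time is computation, and the new CPU is 10x faster there:
print(amdahl_speedup(0.4, 10))  # ≈ 1.5625
```

Even a 10x faster CPU gives only about a 1.56x overall speedup, because the 60% spent waiting for I/O is untouched.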
Remark
• If an enhancement is only usable for a fraction of a task, we cannot
speed up the task by more than 1 / (1 - Fraction Enhanced)
Example
• A common transformation required in graphics engines is square root.
Implementations of floating-point (FP) square root vary significantly
in performance, especially among processors designed for graphics
• Suppose FP square root (FPSQR) is responsible for 20% of the execution
time of a critical graphics program
• Design alternatives
1. Enhance the FPSQR hardware and speed up this operation by a factor of 10
2. Make all FP instructions run faster by a factor of 1.6

Example
• FP instructions are responsible for a total of 50% of execution time.
The design team believes they can make all FP instructions run 1.6
times faster with the same effort as required for the fast square root.

Compare these two design alternatives

CPU performance equation
CPU time = CPU clock cycles for a program x Clock cycle time
         = CPU clock cycles for a program / Clock rate
Example
A program runs in 10s on computer A
having 400 MHz clock. A new machine
B, which could run the same program in
6s, has to be designed. Further, B
should have 1.2 times as many clock
cycles as A.

What should be the clock rate of B?


Answer
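The answer slide exists only as an image in these notes; the calculation it contains can be reconstructed as follows (a sketch):

```python
# Machine A: 10 s at 400 MHz.
cycles_a = 10 * 400e6            # clock cycles used by A = 4e9
cycles_b = 1.2 * cycles_a        # B needs 1.2x as many cycles
clock_rate_b = cycles_b / 6      # B must finish in 6 s
print(clock_rate_b / 1e6)        # 800.0 (MHz)
```

So machine B needs an 800 MHz clock: twice A's clock rate, because it must cover 20% more cycles in 40% less time.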
CPU Clock Cycles
CPI (clock cycles per instruction)
average no. of clock cycles each instruction
takes to execute
IC (instruction count)
no. of instructions executed in the program

CPU clock cycles = CPI x IC

Note: CPI can be used to compare two different


implementations of the same instruction set
architecture (as IC required for a program is
same)
Example
• Consider two implementations of same
instruction set architecture. For a certain
program, details of time measurements of
two machines are given below

• Which machine is faster for this program and


by how much?
Answer
Measuring components
of CPU performance equation
• CPU Time: by running the program
• Clock Cycle Time: published in
documentation
• IC: by a software tool / simulator of the architecture (more difficult
to obtain)
• CPI: by simulation of an implementation
(more difficult to obtain)
CPU clock cycles
Suppose there are n different types of instructions.
Let
ICi – no. of times instruction type i is executed in the program
CPIi – avg. no. of clock cycles for instruction type i

CPU clock cycles = Σ (CPIi x ICi), for i = 1 to n
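The total can be sketched as a weighted sum over the instruction types; the instruction mix below is an invented example, not from the slides:

```python
def cpu_clock_cycles(instr_mix):
    # instr_mix: iterable of (IC_i, CPI_i) pairs, one per instruction type
    return sum(ic * cpi for ic, cpi in instr_mix)

# Invented mix: 5e9 ALU ops (CPI 1), 1e9 loads (CPI 2), 5e8 branches (CPI 3)
print(cpu_clock_cycles([(5e9, 1), (1e9, 2), (5e8, 3)]))  # 8500000000.0
```

Dividing by the total instruction count gives the program's average CPI.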
Example
Suppose we have made the following measurements:
– Frequency of FP operations (other than FPSQR) = 25%
– Average CPI of FP operations = 4.0
– Average CPI of other instructions = 1.33
– Frequency of FPSQR= 2%
– CPI of FPSQR = 20

Design alternatives:
1. decrease CPI of FPSQR to 2
2. decrease average CPI of all FP operation to 2.5

Compare these two design alternatives using CPU


performance equation
Answers
• Note that only the CPI changes; the clock rate and IC remain identical
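The comparison can be checked numerically. A sketch, assuming (as in the common textbook form of this example) that the 2% FPSQR instructions are included in the 25% FP operations:

```python
other_frac, other_cpi = 0.75, 1.33
fp_frac, fp_cpi = 0.25, 4.0          # all FP ops, FPSQR included
fpsqr_frac, fpsqr_cpi = 0.02, 20.0

cpi_orig = fp_frac * fp_cpi + other_frac * other_cpi   # ≈ 2.00
cpi_alt1 = cpi_orig - fpsqr_frac * (fpsqr_cpi - 2.0)   # FPSQR CPI drops to 2 -> ≈ 1.64
cpi_alt2 = other_frac * other_cpi + fp_frac * 2.5      # all FP CPI -> 2.5   -> ≈ 1.62

# Alternative 2 wins slightly; its speedup over the original is cpi_orig / cpi_alt2
print(cpi_orig, cpi_alt1, cpi_alt2)
```

Because clock rate and IC are unchanged, comparing CPIs is enough: the design with the lower CPI is the faster one.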
MIPS as a performance measure
Problems
MIPS as a performance measure
• MIPS is dependent on the instruction set
– difficult to compare MIPS of computers
with different instruction sets
• MIPS can vary inversely to
performance
MFLOPS as a performance
measure
Problems
MFLOPS as a performance measure
• MFLOPS is not dependable
– Cray C90 has no divide instructions while
Pentium has
• MFLOPS depends on the mixture of
fast and slow floating point operations
– add (fast) and divide (slow) operations
Instruction Set Architecture
(ISA) Level

Introduction

Instruction Set Architecture
• Positioned between microarchitecture
level and operating system level
• Important to system architects
– interface between software and hardware

Instruction Set Architecture

ISA contd..
• General approach of system designers:
– Build programs in high-level languages
– Translate to ISA level
– Build hardware that executes ISA level
programs directly

• Key challenge:
– Build better machines subject to backward
compatibility constraint

Features of a good ISA
• Define a set of instructions that can be
implemented efficiently in current and
future technologies resulting in cost
effective designs over several
generations
• Provide a clean target for compiled
code

Properties of the ISA level
• ISA level code is what a compiler
outputs
• To produce ISA code, compiler writer
has to know
– What the memory model is
– What registers are there
– What data types and instructions are
available

ISA level memory models
• Computers divide memory into cells (8
bits) that have consecutive addresses
• Bytes are grouped into words (4-, 8-
byte) with instructions available for
manipulating entire words
• Many architectures require words to be
aligned on their natural boundaries
– Memories operate more efficiently that
way

ISA level Memory Models

• On Pentium II (fetches 8 bytes at a time from


memory), ISA programs can make memory
references to words starting at any address
– Requires extra logic circuits on the chip
– Intel allows it because of the backward compatibility constraint
(8088 programs made non-aligned memory references)

ISA level registers
• Main function of ISA level registers:
– provide rapid access to heavily used data
• Registers are divided into 2 categories
– special purpose registers (program
counter, stack pointer)
– General purpose registers (hold key local
variables, intermediate results of
calculations).
• These are interchangeable

Instructions
• Main feature of ISA level is its set of
machine instructions
• They control what the machine can do
• Ex:
– LOAD and STORE instructions move data
between memory and registers
– MOVE instruction copies data among
registers

Pentium II ISA level (Intel’s IA-32)
• Maintains full support for execution of programs
written for 8086, 8088 processors (16-bit)
• Pentium II has 3 operating modes (Real mode,
Virtual 8086 mode, Protected mode)
• Address space: memory is divided into 16,384 segments, each going from
address 0 to address 2^32 - 1 (Windows supports only one segment)
• Every byte has its own address, with words being
32 bits long
• Words are stored in Little endian format (low-
order byte has lowest address)

Little endian and Big endian
format

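The two byte orders can be demonstrated with Python's struct module (a sketch, independent of the slides):

```python
import struct

word = 0x01020304
little = struct.pack('<I', word)  # little endian: low-order byte first
big = struct.pack('>I', word)     # big endian: high-order byte first
print(little.hex())  # 04030201
print(big.hex())     # 01020304
```

In little endian format (used by the Pentium II) the low-order byte 0x04 sits at the lowest address.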
Pentium II’s primary registers

Pentium II’s primary registers
• EAX: Main arithmetic registers, 32-bit
– 16-bit register in low-order 16 bits
– 8-bit register in low-order 8 bits
– easy to manipulate 16-bit (in 80286) and 8-bit
(in 8088) quantities
• EBX: holds pointers
• ECX: used in looping
• EDX: used for multiplication and division,
where together with EAX, it holds 64-bit
products and dividends
Pentium II’s primary registers
• ESI, EDI: hold pointers into memory
– Especially for hardware string manipulation
instructions (ESI points to source string, EDI
points to destination string)
• EBP: pointer register
• ESP: stack pointer
• CS through GS: segment registers
• EIP: program counter
• EFLAGS: flag register (holds various
miscellaneous bits such as conditional
codes)
Pentium II data Types

Instruction Formats
• An instruction consists of an opcode,
plus additional information such as
where operands come from, where
results go to
• Opcode tells what instruction does
• On some machines, all instructions
have same length
– Advantages: simple, easy to decode
– Disadvantages: waste space

Common Instruction Formats

(a) Zero address instruction


(b) One address instruction
(c) Two address instruction
(d) Three address instruction

Instruction and Word length
Relationships

Example
• An instruction with a 4-bit opcode and three 4-bit addresses

Design of Instruction Formats
• Factors:
– Length of instruction
• short instructions are better than long
instructions (modern processors can execute
multiple instructions per clock cycle)
– Sufficient room in the instruction format to
express all operations required
– No. of bits in an address field

Intel® 64 and IA-32 Architectures
• Intel 64 and IA-32 instructions
– General purpose
– x87 FPU
– x87 FPU and SIMD state management
– Intel MMX technology
– SSE extensions
– SSE2 extensions
– SSE3 extensions
– SSSE3 extensions
– SSE4 extensions
– AESNI and PCLMULQDQ
– Intel AVX extensions
– F16C, RDRAND, FS/GS base access
– System instructions
– IA-32e mode: 64-bit mode instructions
– VMX instructions
– SMX instructions

Addressing

Addressing
• Subject of specifying where the operands
(addresses) are
– ADD instruction requires 2 or 3 operands, and
instruction must tell where to find operands and
where to put result
• Addressing Modes
– Methods of interpreting the bits of an address field
to find operand
• Immediate Addressing
• Direct Addressing
• Register Addressing
• Register Indirect Addressing
• Indexed Addressing

Immediate Addressing
• Simplest way to specify where the operand is
• Address part of instruction contains operand
itself (immediate operand)
• Operand is automatically fetched from memory
at the same time the instruction itself is fetched
– Immediately available for use
• No additional memory references are required
• Disadvantages
– only a constant can be supplied
– value of the constant is limited by size of address field
• Good for specifying small integers

Example
Immediate Addressing
MOV R1, #8 ; Reg[R1] ← 8
ADD R2, #3 ; Reg[R2] ← Reg[R2] + 3

Direct Addressing
• Operand is in memory, and is specified by giving
its full address (memory address is hardwired
into instruction)
• Instruction will always access exactly same
memory location, which cannot change
• Can only be used for global variables whose address is known at
compile time

• Example Instruction:
– ADD R1, (1001) ; Reg[R1] ← Reg[R1] + Mem[1001]

Direct Addressing Example

Register Addressing
• Same as direct addressing with the exception that it
specifies a register instead of memory location
• Most common addressing mode on most computers
since register accesses are very fast
• Compilers try to put most commonly accessed
variables in registers
• Cannot be used alone in LOAD and STORE instructions (one operand is
always a memory address)
• Example instruction:
– ADD R3, R4 ; Reg[R3] ← Reg[R3] + Reg[R4]

Register Indirect Addressing
• Operand being specified comes from memory or
goes to memory
• Its address is not hardwired into instruction, but is
contained in a register (pointer)
• Can reference memory without having full memory
address in the instruction
• Different memory words can be used on different
executions of the instruction

• Example instruction:
– ADD R1, (R2) ; Reg[R1] ← Reg[R1] + Mem[Reg[R2]]

Example
• Following generic assembly program calculates the
sum of elements (1024) of an array A of integers of 4
bytes each, and stores result in register R1

– MOV R1, #0 ; sum in R1 (0 initially)


– MOV R2, #A ; Reg[R2] = address of array A
– MOV R3, #A+4096 ; Reg[R3] = address of first
word beyond A
– LOOP: ADD R1, (R2) ; register indirect via R2 to get
operand
– ADD R2, #4 ; increment R2 by one word
– CMP R2, R3 ; is R2 < R3?
– BLT LOOP ; loop if R2 < R3

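The same loop can be written in Python, with the registers as plain variables (a sketch; memory is modeled as a Python list, so the 4-byte word step becomes an index step of 1):

```python
memory = list(range(1024))   # array A: 1024 sample integers

r1 = 0                       # MOV R1, #0       ; sum
r2 = 0                       # MOV R2, #A       ; "address" of first element
r3 = 1024                    # MOV R3, #A+4096  ; first element beyond A
while r2 < r3:               # CMP R2, R3 / BLT LOOP
    r1 += memory[r2]         # ADD R1, (R2)     ; register indirect via R2
    r2 += 1                  # ADD R2, #4       ; next word (index step of 1 here)

print(r1)  # 523776
```

The key step is `memory[r2]`: the register supplies the address, exactly as in the register indirect `ADD R1, (R2)`.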
Indexed Addressing
• Memory is addressed by giving a register
plus a constant offset
• Used to access local variables

• Example instruction:
– ADD R3, 100(R2)
; Reg[R3] ← Reg[R3] + Mem[100+Reg[R2]]

Based-Indexed Addressing
• Memory address is computed by
adding up two registers plus an optional
offset

• Example instruction:
ADD R3, (R1+R2)
;Reg[R3] ← Reg[R3] + Mem[Reg[R1] +
Reg[R2]]

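The addressing modes above can be modeled in a few lines of Python (a sketch; the register and memory contents are invented):

```python
reg = {'R1': 7, 'R2': 100}              # invented register contents
mem = {100: 55, 104: 66, 200: 99}       # invented memory contents

immediate      = lambda const: const               # operand is in the instruction
direct         = lambda addr: mem[addr]            # full memory address hardwired
register       = lambda r: reg[r]                  # operand is a register
register_indir = lambda r: mem[reg[r]]             # register holds a pointer
indexed        = lambda off, r: mem[off + reg[r]]  # register + constant offset

print(immediate(8), direct(200), register('R1'),
      register_indir('R2'), indexed(4, 'R2'))  # 8 99 7 55 66
```

Each helper returns the operand value an instruction would fetch under that mode; based-indexed addressing simply adds a second register into the `indexed` sum.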
Instruction Types
• ISA level instructions are divided into few
categories
– Data Movement Instructions
• Copy data from one location to another
– Examples (Pentium II integer instructions):
• MOV DST, SRC – copies SRC (source) to DST
(destination)
• PUSH SRC – push SRC into the stack
• XCHG DS1, DS2 – exchanges DS1 and DS2
• CMOV DST, SRC – conditional move

Instruction Types contd..
– Dyadic Operations
• Combine two operands to produce a result
(arithmetic instructions, Boolean instructions)
– Examples (Pentium II integer instructions):
• ADD DST, SRC – adds SRC to DST, puts result in
DST
• SUB DST, SRC – subtracts DST from SRC
• AND DST, SRC – Boolean AND SRC into DST
• OR DST, SRC - Boolean OR SRC into DST
• XOR DST, SRC – Boolean Exclusive OR SRC into
DST

242
Instruction Types contd..
• Monadic Operations
– Have one operand and produce one result
– Shorter than dyadic instructions
• Examples (Pentium II integer
instructions):
– INC DST – adds 1 to DST
– DEC DST – subtracts 1 from DST
– NOT DST – replace DST with 1’s
complement

243
Instruction Types contd..
• Comparison and Conditional Branch
Instructions

• Examples (Pentium II integer


instructions):
– TST SRC1, SRC2 – Boolean AND operands, set flags
(EFLAGS)
– CMP SRC1, SRC2 – sets flags based on SRC1-SRC2

244
Instruction Types contd..
• Procedure (Subroutine) call
Instructions
– When the procedure has finished its task,
control is returned to the statement after the call

• Examples (Pentium II integer


instructions):
– CALL ADDR -Calls procedure at ADDR
– RET - Returns from procedure

245
Instruction Types contd..
• Loop Control Instructions
– LOOPxx – loops until condition is met
• Input / Output Instructions
There are several input/output schemes
currently used in personal computers
– Programmed I/O with busy waiting
– Interrupt-driven I/O
– DMA (Direct Memory Access) I/O

246
Programmed I/O with busy waiting

• Simplest I/O method


• Commonly used in low-end processors
• Processors have a single input instruction and a
single output instruction, and each of them
selects one of the I/O devices
• A single character is transferred between a fixed
register in the processor and selected I/O device
• Processor must execute an explicit sequence of
instructions for each and every character read or
written

247
DMA I/O
• DMA controller is a chip that has a direct
access to the bus
• It consists of at least four registers, each of
which can be loaded by software:
– Register 1 contains memory address to be
read/written
– Register 2 contains the count of how many
bytes / words to be transferred
– Register 3 specifies the device number or I/O
space address to use
– Register 4 indicates whether data are to be
read from or written to I/O device
248
Structure of a DMA

249
Registers in the DMA
• Status register: readable by the CPU to determine the status
of the DMA device (idle, busy, etc)
• Command register: writable by the CPU to issue a command
to the DMA
• Data register: readable and writable. It is the buffering place
for data that is being transferred between the memory and the
IO device.
• Address register: contains the starting location of memory
where from or where to the data will be transferred. The
Address register must be programmed by the CPU before
issuing a "start" command to the DMA.
• Count register: contains the number of bytes that need to be
transferred. The information in the address and the count
register combined will specify exactly what information need to
be transferred.
250
Example
• Writing a block of 32 bytes from memory
address 100 to a terminal device (4)

251
Example contd..
• CPU writes numbers 32, 100, and 4 into first three
DMA registers, and writes the code for WRITE (1, for
example) in the fourth register
• DMA controller makes a bus request to read byte
100 from memory
• DMA controller makes an I/O request to device 4 to
write the byte to it
• DMA controller increments its address register by 1
and decrements its count register by 1
• If the count register is > 0, another byte is read from
memory and then written to device
• DMA controller stops transferring data when count =
0

252
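The transfer sequence above can be sketched as a simple model (illustrative only; real DMA hardware works on bus signals, not Python objects):

```python
# Model of the DMA register-driven copy loop described above.
def dma_write(memory, device, address, count):
    """Copy `count` bytes starting at `address` from memory to a device."""
    # The CPU has already loaded the DMA registers: address, count,
    # device number, and the WRITE command.
    while count > 0:
        byte = memory[address]   # bus request: read one byte from memory
        device.append(byte)      # I/O request: write the byte to the device
        address += 1             # increment the address register
        count -= 1               # decrement the count register
    # the controller stops transferring data when count reaches 0

memory = {100 + i: i for i in range(32)}  # 32 bytes at address 100
terminal = []                             # stands in for device 4
dma_write(memory, terminal, 100, 32)
print(len(terminal))                      # 32 bytes transferred
```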
Sample Questions
Q1.
1. Explain the processor architecture of 8086.
2. What are the differences between the Intel
Pentium processor and a dual-core processor?
3. What are the advantages and disadvantages
of multi-core processors?

253
Sample Questions
Q2.
1. What is addressing?
2. Briefly explain each addressing mode,
comparing their advantages,
disadvantages and features.
3. What is DMA and why is it useful for
programming? Explain your answer.

254
Computer Memory
• Primary Memory
• Secondary Memory
• Virtual Memory

255
Levels in Memory Hierarchy
CPU ↔ registers ↔ cache ↔ main memory ↔ disk (virtual memory),
with 8 B, 32 B and 4 KB transfer units between adjacent levels

            Register   Cache        Memory      Disk Memory
size:       32 B       32 KB-4 MB   4096 MB     1 TB
speed:      0.3 ns     2 ns         7.5 ns      8 ms
$/Mbyte:    -          $75/MB       $0.014/MB   $0.00012/MB
line size:  4 B        32 B         4 KB        -

larger, slower, cheaper →


Primary Memory

257
Primary memory
• Memory is the workspace for the CPU
• When a file is loaded into memory, it is a copy of the
file that is actually loaded
• Consists of a no. of cells, each having a number
(address)
• n cells → addresses: 0 to n−1
• Same no. of bits in each cell
• Adjacent cells have consecutive addresses
• m-bit address → 2^m addressable cells
• A portion of RAM address space is mapped into one
or more ROM chips

258
Ways of organizing a 96-bit
memory

259
SRAM (Static RAM)
• Constructed using flip flops
• 6 transistors for each bit of storage
• Very fast
• Contents are retained as long as power is
kept on
• Expensive
• Used in level 2 cache

260
DRAM (Dynamic RAM)
• No flip‐flops
• Array of cells, each consisting a transistor and a capacitor
• Capacitors can be charged or discharged, allowing 0s
and 1s to be Stored
• Electric charge tends to leak out ⇒ each bit in a DRAM
must be reloaded (refreshed) every few milliseconds (15
ms) to prevent data from leaking away
• Refreshing takes several CPU cycles to complete (less
than 1% of overall bandwidth)
• High density (30 times smaller than SRAM)
• Used in main memories
• Slower than SRAM
• Inexpensive (30 times lower than SRAM)

261
SDRAM (Synchronous DRAM)
• Hybrid of SRAM and DRAM
• Runs in synchronization with the system bus
• Driven by a single synchronous clock
• Used in large caches, main memories

262
DDR (Double Data Rate) SDRAM

• An upgrade to standard SDRAM


• Performs 2 transfers per clock cycle (one at falling
edge, one at rising edge) without doubling actual
clock rate

263
Dual channel DDR
• Technique in which 2 DDR DIMMs are installed at one time and
function as a single bank doubling the bandwidth of a single module

• DDR2 SDRAM
– A faster version of DDR SDRAM (doubles the data rate of DDR)
– Less power consumption than DDR
– Achieves higher throughput by using differential pairs of signal wires
– Additional signal add to the pin count

• DDR3 SDRAM
– An improved version of DDR2 SDRAM
– Same no. of pins as in DDR2,
– Not compatible with DDR2
– Can transfer twice the data rate of DDR2
– DDR3 standard allows chip sizes of 512 Megabits to
8 Gigabits (max module size – 16GB)

264
DRAM Memory module

265
DRAM Memory module

266
SDRAM and DDR DIMM versions

• Buffered
• Unbuffered
• Registered

267
SDRAM and DDR DIMM
• Buffered Module
– Has additional buffer circuits between memory
chips and the connector to buffer signals
– New motherboards are not designed to use
buffered modules

• Unbuffered Module
– Allows memory controller signals to pass directly
to memory chips with no interference
– Fast and most efficient design
– Most motherboards are designed to use
unbuffered modules

268
SDRAM and DDR DIMM
• Registered Module
– Uses register chips on the module that act
as an interface between RAM chip and
chipset
– Used in systems designed to accept
extremely large amounts of RAM (server
motherboards)

269
Memory Errors

270
Memory errors
• Hard errors
– Permanent failure
– How to fix? (replace the chip)
• Soft errors
– Non permanent failure
– Occurs at infrequent intervals
– How to fix? (restart the system)
• Best way to deal with soft errors is to
increase system’s fault tolerance
(implement ways of detecting and
correcting errors)
271
Techniques used for fault
tolerance
• Parity
• ECC (Error Correcting Code)

272
Parity Checking
• 9 bits are used in the memory chip to
store 1 byte of information
• Extra bit (parity bit) keeps tabs on other
8 bits
• Parity can only detect errors, but
cannot correct them

273
Odd parity standard for error
checking
• Parity generator/checker is a part of CPU
or located in a special chip on
motherboard
• Parity checker evaluates the 8 data bits
by adding the no. of 1s in the byte
• If an even no. of 1s is found, parity
generator creates a 1 and stores it as the
parity bit in memory chip

274
Odd parity standard for error
checking (contd.)
• If the sum is odd, the parity bit would be 0
• If a (9-bit) byte has an even no. of 1s, that
byte must have an error
• System cannot tell which bit or bits have changed
• If 2 bits changed, the bad byte could pass
unnoticed
• Multiple-bit errors in a single byte are very
rare
• System halts when a parity check error is
detected

275
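A minimal sketch of the odd-parity scheme just described (pure Python, no hardware specifics):

```python
# Odd parity: the stored 9th bit makes the total number of 1s odd;
# an even count on read signals a detectable error.
def odd_parity_bit(byte):
    ones = bin(byte & 0xFF).count("1")
    return 1 if ones % 2 == 0 else 0   # force an odd total across 9 bits

def check(byte, parity_bit):
    total = bin(byte & 0xFF).count("1") + parity_bit
    return total % 2 == 1              # True = no detectable error

b = 0b10110100                 # four 1s -> parity bit must be 1
p = odd_parity_bit(b)
print(p, check(b, p))          # 1 True
print(check(b ^ 0b1, p))       # single bit flipped -> False (detected)
print(check(b ^ 0b11, p))      # two bits flipped -> True (passes unnoticed)
```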
ECC- Error Correcting Code
• Successor to parity checking
• Can detect and correct memory errors
• Only a single-bit error can be corrected,
though it can detect double-bit errors
• This type of ECC is known as single bit
error correction double bit error detection
(SEC DED)
• SEC DED requires an additional 7 check
bits over 32 bits in a 4 byte system, or 8
check bits over 64 bits in an 8 byte system
276
ECC- Error Correcting Code
• ECC entails memory controller
calculating check bits on a
memory write operation, performing a
compare between read and calculated
check bits on a read operation
• Cost of additional ECC logic in memory
controller is not significant
• It affects memory performance on a
write

277
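The check-bit counts quoted above follow from the Hamming bound (2^r ≥ m + r + 1), plus one extra parity bit for double-error detection; a small sketch:

```python
# Smallest r with 2**r >= m + r + 1 (single-error correction),
# plus one extra bit to upgrade SEC to SEC-DED.
def sec_ded_check_bits(data_bits):
    r = 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1   # +1 parity bit for double-bit error detection

print(sec_ded_check_bits(32))  # 7, as stated for a 4-byte system
print(sec_ded_check_bits(64))  # 8, as stated for an 8-byte system
```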
Cache memory

278
Cache Memory
• A high-speed, small memory
• Most frequently used memory words are kept in it
• When the CPU needs a word, it first checks the
cache. If not found, it checks main memory

279
Cache and Main Memory

280
Cache memory Vs Main Memory

281
Cache Hit and Miss
• Cache Hit: a request to
read from memory,
which can satisfy from
the cache without using
the main memory.
• Cache Miss: A request
to read from memory,
which cannot be
satisfied from the cache,
for which the main
memory has to be
consulted.

282
Locality Principle
• PRINCIPLE OF LOCALITY is the tendency to
reference data items that are near other
recently referenced data items, or that were
recently referenced themselves.
• TEMPORAL LOCALITY : memory location that
is referenced once is likely to be referenced
multiple times in near future.
• SPATIAL LOCALITY : if a memory location is
referenced once, the program is likely to
reference a nearby memory location in the
near future.
283
Locality Principle
Let
c – cache access time
m – main memory access time
h – hit ratio (fraction of all references that can
be satisfied out of cache)
miss ratio = 1 − h
Average memory access time = c + (1 − h) m
h = 1: no main memory references
h = 0: all references go to main memory

284
Example:
Suppose that a word is read k times in a
short interval
First reference: memory; other k − 1
references: cache
h = (k − 1) / k
Average memory access time = c + m / k
285
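A quick sketch of the formulas above (the access times c = 2 ns and m = 70 ns are assumed values for illustration):

```python
# Average access time from the formula above: t = c + (1 - h) * m.
def avg_access_time(c, m, h):
    return c + (1 - h) * m

# A word read k times: the first reference misses, the other k-1 hit,
# so h = (k - 1) / k and the average collapses to c + m / k.
def avg_access_time_k(c, m, k):
    h = (k - 1) / k
    return avg_access_time(c, m, h)

print(avg_access_time(2, 70, 0.95))     # assumed c=2 ns, m=70 ns -> 5.5 ns
print(avg_access_time_k(2, 70, 10))     # c + m/k = 2 + 7 -> 9.0 ns
```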
Cache Memory
• Main memories and caches are divided into fixed sized
blocks
• Cache lines – blocks inside the cache
• On a cache miss, entire cache line is loaded into cache
from memory
• Example:
– A 64 KB cache can be divided into 1K lines of 64 bytes, 2K lines of
32 bytes, etc.
• Unified cache
– instruction and data use the same cache
• Split cache
– Instructions in one cache and data in another

286
A system with three levels of
cache

287
Pentium 4 Block Diagram

288
Replacement Algorithm
• Optimal Replacement: replace the
block which is no longer needed in the
future. If all blocks currently in Cache
Memory will be used again, replace the
one which will not be used in the future
for the longest time.
• Random selection: replace a randomly
selected block among all blocks
currently in Cache Memory.

289
Replacement Algorithm
• FIFO (first-in first-out): replace the block
that has been in Cache Memory for the
longest time.
• LRU (Least recently used): replace the
block in Cache Memory that has not
been used for the longest time.
• LFU (Least frequently used): replace
the block in Cache Memory that has
been used for the least number of times

290
Cache Memory Placement Policy
• Three commonly used methods to
translate main memory addresses to
cache memory addresses.
– Associative Mapped Cache
– Direct-Mapped Cache
– Set-Associative Mapped Cache
• The choice of cache mapping scheme
affects cost and performance, and there
is no single best method that is
appropriate for all situations
291
Associative Mapping

292
Associative Mapping
• A block in the Main Memory can
be mapped to any block in the
Cache Memory available (not
already occupied)
• Advantage: Flexibility. A Main
Memory block can be mapped
anywhere in Cache Memory.
• Disadvantage: Slow or
expensive. A search through all
the Cache Memory blocks is
needed to check whether the
address can be matched to any
of the tags.

293
Direct Mapping

294
Direct Mapping
• To avoid the search through all
CM blocks needed by
associative mapping, this
method only allows
(# blocks in main memory) /
(# blocks in cache memory)
blocks to be mapped to each
Cache Memory block
• Each entry (row) in cache can
hold exactly one cache line
from main memory
• e.g., with 2048 entries and a
32-byte cache line size, the
cache can hold 64 KB
295
Direct Mapping
• Advantage: Direct mapping is faster than
the associative mapping as it avoids
searching through all the CM tags for a
match.
• Disadvantage: But it lacks mapping
flexibility. For example, if two MM blocks
mapped to same CM block are needed
repeatedly (e.g., in a loop), they will keep
replacing each other, even though all
other CM blocks may be available.

296
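A sketch of the address split in a direct-mapped cache, using the assumed parameters from the example (64 KB cache, 32-byte lines, hence 2048 entries):

```python
# Address -> (tag, line, offset) split for a direct-mapped cache.
LINE_SIZE = 32
NUM_LINES = 2048          # 64 KB / 32 B

def split_address(addr):
    offset = addr % LINE_SIZE                  # byte within the line
    line = (addr // LINE_SIZE) % NUM_LINES     # which cache entry
    tag = addr // (LINE_SIZE * NUM_LINES)      # identifies the MM block
    return tag, line, offset

# Two addresses exactly 64 KB apart map to the same line with different
# tags, so they keep evicting each other -- the conflict noted above.
print(split_address(0x12345))
print(split_address(0x12345 + 64 * 1024))
```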
Set-Associative Mapping

297
Set-Associative Mapping
• This is a trade-off between
associative and direct mappings
where each address is mapped
to a certain set of cache
locations.
• The cache is broken into sets
where each set contains "N"
cache lines, let's say 4. Then,
each memory address is
assigned a set, and can be
cached in any one of those 4
locations within the set that it is
assigned to. In other words,
within each set the cache is
associative, and thus the name.
298
Set Associative cache
• LRU (Least Recently Used) algorithm
is used
– keep an ordering of each set of locations
that could be accessed from a given
memory location
– whenever any of present lines are
accessed, it updates list, making that entry
the most recently accessed
– when it comes to replace an entry, one at
the end of list is discarded

299
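The LRU bookkeeping described above can be sketched for a single set (a list-based model for illustration; real hardware uses counters or pseudo-LRU bits):

```python
# One set of a set-associative cache with LRU replacement:
# accessed lines move to the front; the victim comes from the back.
class LRUSet:
    def __init__(self, ways):
        self.ways = ways
        self.lines = []            # front = most recently used

    def access(self, tag):
        if tag in self.lines:      # hit: promote to most recently used
            self.lines.remove(tag)
            self.lines.insert(0, tag)
            return True
        if len(self.lines) == self.ways:
            self.lines.pop()       # miss on a full set: evict LRU entry
        self.lines.insert(0, tag)
        return False

s = LRUSet(ways=4)
for t in ["A", "B", "C", "D"]:
    s.access(t)                    # four misses; the set is now full
s.access("A")                      # hit: A becomes most recently used
s.access("E")                      # miss: evicts B (least recently used)
print(s.lines)                     # ['E', 'A', 'D', 'C']
```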
Load-Through and Store-Through
• Load-Through : When the CPU
needs to read a word from the
memory, the block containing the
word is brought from MM to CM,
while at the same time the word is
forwarded to the CPU.

• Store-Through : If store-through is
used, a word to be stored from
CPU to memory is written to both
CM (if the word is in there) and
MM. By doing so, a CM block to be
replaced can be overwritten by an
in-coming block without being
saved to MM.

300
Cache Write Methods
• Words in a cache have been viewed simply
as copies of words from main memory that
are read from the cache to provide faster
access. However, this viewpoint changes
when writes are considered.
• There are 3 possible write actions:
– Write the result into the main memory
– Write the result into the cache
– Write the result into both main memory and cache
memory

301
Cache Write Methods
• Write Through: A cache architecture in which
data is written to main memory at the same
time as it is cached.
• Write Back / Copy Back: CPU performs write
only to the cache in case of a cache hit. If there
is a cache miss, CPU performs a write to main
memory.
• When the cache is missed :
– Write Allocate: loads the memory block into cache
and updates the cache block
– No-Write allocation: this bypasses the cache and
writes the word directly into the memory.

302
Cache Evaluation
Problem: External memory is slower than the system
bus.
Solution: Add external cache using faster memory
technology.
Processor on which feature first appears: 386

Problem: Increased processor speed results in the
external bus becoming a bottleneck for cache access.
Solution: Move external cache on-chip, operating at
the same speed as the processor.
Processor on which feature first appears: 486

Problem: Internal cache is rather small, due to limited
space on chip.
Solution: Add external L2 cache using faster
technology than main memory.
Processor on which feature first appears: 486

303
Cache Evaluation
Problem: Increased processor speed results in the
external bus becoming a bottleneck for L2 cache
access.
Solution: Move L2 cache on to the processor chip
(Pentium II).
Solution: Create a separate back-side bus that runs at
higher speed than the main (front-side) external bus;
the BSB is dedicated to the L2 cache (Pentium Pro).

Problem: Some applications deal with massive
databases and must have rapid access to large
amounts of data. The on-chip caches are too small.
Solution: Add external L3 cache (Pentium III).
Solution: Move L3 cache on-chip (Pentium IV).
304
Comparison of Cache Sizes
Processor Type Year of Introduction L1 cache L2 cache L3 cache
IBM 360/85 Mainframe 1968 16 to 32 KB — —
PDP-11/70 Minicomputer 1975 1 KB — —
VAX 11/780 Minicomputer 1978 16 KB — —
IBM 3033 Mainframe 1978 64 KB — —
IBM 3090 Mainframe 1985 128 to 256 KB — —
Intel 80486 PC 1989 8 KB — —
Pentium PC 1993 8 KB/8 KB 256 to 512 KB —
PowerPC 601 PC 1993 32 KB — —
PowerPC 620 PC 1996 32 KB/32 KB — —
PowerPC G4 PC/server 1999 32 KB/32 KB 256 KB to 1 MB 2 MB
IBM S/390 G4 Mainframe 1997 32 KB 256 KB 2 MB
IBM S/390 G6 Mainframe 1999 256 KB 8 MB —
Pentium 4 PC/server 2000 8 KB/8 KB 256 KB —
IBM SP High-end server 2000 64 KB/32 KB 8 MB —
CRAY MTAb Supercomputer 2000 8 KB 2 MB —
Itanium PC/server 2001 16 KB/16 KB 96 KB 4 MB
SGI Origin 2001 High-end server 2001 32 KB/32 KB 4 MB —
Itanium 2 PC/server 2002 32 KB 256 KB 6 MB
IBM POWER5 High-end server 2003 64 KB 1.9 MB 36 MB
CRAY XD-1 Supercomputer 2004 64 KB/64 KB 1MB —
Memory stall cycles
No. of clock cycles during which CPU is
stalled waiting for a memory access
CPU time =
(CPU clock cycles + Memory stall cycles)
x Clock cycle time
Memory stall cycles = No. of misses x Miss
penalty
= IC x Misses per instruction x Miss penalty
= IC x Memory accesses per instruction x
Miss ratio x Miss penalty

306
Example
Assume we have a machine where CPI is 2.0
when all memory accesses hit in the cache.
Only data accesses are loads and stores,
and these total 40% of instructions. If the
miss penalty is 25 clock cycles and miss ratio
is 2%, how much faster would the machine
be if all instructions were cache hits?

307
Answer

308
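A worked version of this answer, following the stall-cycle formula from the previous slides:

```python
# Memory accesses per instruction = 1 fetch + 0.4 data accesses.
cpi_ideal = 2.0
accesses_per_instr = 1 + 0.4
miss_rate = 0.02
miss_penalty = 25

stalls_per_instr = accesses_per_instr * miss_rate * miss_penalty  # 0.7
cpi_real = cpi_ideal + stalls_per_instr                           # 2.7
speedup = cpi_real / cpi_ideal
print(cpi_real, round(speedup, 2))   # 2.7 CPI; 1.35x faster with all hits
```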
Secondary Memory

309
Technologies
• Magnetic storage
– Floppy, Zip disk, Hard drives, Tapes
• Optical storage
– CD, DVD, Blue-Ray, HD-DVD
• Solid state memory
– USB flash drive, Memory cards for mobile
phones/digital cameras/MP3 players, Solid
State Drives

310
Magnetic Disk
• Purpose:
– Long term, nonvolatile storage
– Large, inexpensive, and slow
– Lowest level in the memory hierarchy
• Two major types:
– Floppy disk
– Hard disk
• Both types of disks:
– Rely on a rotating platter coated with a magnetic surface
– Use a moveable read/write head to access the disk
• Advantages of hard disks over floppy disks:
– Platters are more rigid ( metal or glass) so they can be larger
– Higher density because it can be controlled more precisely
– Higher data rate because it spins faster
– Can incorporate more than one platter
Components of a Disk
• The arm assembly is moved in or out to
position a head on a desired track. Tracks
under the heads make a cylinder (imaginary!).
• Only one head reads/writes at any one
time.
• Block size is a multiple of sector size
(which is often fixed).
(Figure: platters on a spindle, with tracks, sectors,
disk heads and the arm assembly)

313
Internal Hard-Disk

Magnetic Disk
• A stack of platters, a surface with a magnetic
coating
• Typical numbers (depending on the disk size):
– 500 to 2,000 tracks per surface
– 32 to 128 sectors per track
• A sector is the smallest unit that can be read or
written
• Traditionally all tracks have the same number
of sectors:
• Constant bit density: record more sectors on
the outer tracks
Magnetic Disk Characteristic
• Disk head: each side of a platter has separate disk head
• Cylinder: all the tracks under the head at a given point on all
surface
• Read/write data is a three-stage process:
– Seek time: position the arm over the proper track
– Rotational latency: wait for the desired sector to rotate under the
read/write head
– Transfer time: transfer a block of bits (sector) under the read-write
head
• Average seek time as reported by the industry:
– Typically in the range of 8 ms to 15 ms
– (Sum of the time for all possible seek) / (total # of possible seeks)
• Due to locality of disk reference, actual average seek time may:
– Only be 25% to 33% of the advertised number
Typical Numbers of a Magnetic
Disk
• Rotational Latency:
– Most disks rotate at 3,600/5400/7200 RPM
– Approximately 16 ms per revolution
– An average latency to the desired information is
halfway around the disk: 8 ms
• Transfer Time is a function of :
– Transfer size (usually a sector): 1 KB / sector
– Rotation speed: 3600 RPM to 5400 RPM to 7200
– Recording density: typical diameter ranges from 2
to 14 in
– Typical values: 2 to 4 MB per second
Disk I/O Performance

Disk Access Time =


Seek time + Rotational Latency
+ Transfer time + Controller Time
+ Queueing Delay
Disk I/O Performance
• Disk Access Time = Seek time +
Rotational Latency + Transfer time +
Controller Time + Queueing Delay
• Estimating Queue Length:
– Utilization = U = Request Rate / Service
Rate
– Mean Queue Length = U / (1 - U)
– As Request Rate → Service Rate, Mean
Queue Length → Infinity
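The queue estimate above in code (this is the standard single-queue approximation; the 100 requests/s service rate is an assumed figure):

```python
# Mean queue length U / (1 - U), which blows up as utilization nears 1.
def mean_queue_length(request_rate, service_rate):
    u = request_rate / service_rate
    return u / (1 - u)

for r in (50, 80, 95, 99):                # requests/s against 100/s service
    print(r, round(mean_queue_length(r, 100), 1))
# length grows from 1.0 at 50% utilization to 99.0 at 99%
```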
Example
• Setup parameters:
– 16383 cylinders, 63 sectors per track, 3 platters,
6 heads
• Bytes per sector: 512
• RPM: 7200
• Transfer mode: 66.6MB/s
• Average seek time: 9.0 ms (read), 9.5 ms
(write)
• Average latency: 4.17ms
• Physical dimension: 1’’ x 4’’ x 5.75’’
• Interleave: 1:1
Disk performance
• Preamble: allows head to be synchronized before read/write
• ECC (Error Correction Code): corrects errors
• Unformatted capacity: preambles, ECCs and inter sector gaps are
counted as data
• Disk performance depends on
– seek time ‐ time to move arm to desired track
– rotational latency – time needed for requested sector to
rotate under head
• Rotational speed: 5400, 7200, 10000, 15000 rpm

– Transfer time – time needed to transfer a block of
bits under head (e.g., 40 MB/s)
321
Disk performance
Disk controller
– chip that controls the drive. Its tasks include accepting
– commands (READ, WRITE, FORMAT) from software,
controlling arm motion, detecting and correcting errors
Controller time
– overhead the disk controller imposes in performing an
I/O access

Avg. disk access time = avg. seek time +
avg. rotational delay +
transfer time +
controller overhead

322
Example
• Advertised average seek time of a disk is 5
ms, transfer rate is 40 MB per second, and it
rotates at 10,000 rpm Controller overhead is
0.1 ms. Calculate the average time to read a
512 byte sector.

323
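A worked version of this example (the average rotational delay is half a revolution at 10,000 rpm):

```python
# Avg. access time = seek + avg. rotational delay + transfer + controller.
seek_ms = 5.0
rpm = 10_000
transfer_rate = 40e6          # bytes/s
sector = 512
controller_ms = 0.1

rotation_ms = 60_000 / rpm            # 6 ms per revolution
avg_rotational_ms = rotation_ms / 2   # 3 ms on average
transfer_ms = sector / transfer_rate * 1000   # 0.0128 ms for 512 bytes

total = seek_ms + avg_rotational_ms + transfer_ms + controller_ms
print(round(total, 4))        # 8.1128 ms
```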
RAID-
(Redundant Array of Inexpensive
Disks)
• A disk organization used to improve
performance of storage systems
• An array of disks controlled by a
controller (RAID Controller)
• Data are distributed over disks
(striping) to allow parallel operation

324
RAID 0- No redundancy
• No redundancy to tolerate disk failure
• Each strip has k sectors (say)
– Strip 0: sectors 0 to k−1
– Strip 1: sectors k to 2k−1 ...etc
• Works well with large accesses
• Less reliable than having a single large
disk

325
Example (RAID 0)
• Suppose that RAID consists of 4 disks
with MTTF (mean time to failure) of
20,000 hours.
– A drive will fail once every 5,000 hours on
average
– A single large drive with MTTF of 20,000
hours is 4 times more reliable

326
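A sketch of the reliability arithmetic above: with independent failures, an n-drive array fails n times as often, so its MTTF is the drive MTTF divided by n:

```python
# Array MTTF under the independent-failure assumption used in the example.
def array_mttf(drive_mttf_hours, n_drives):
    return drive_mttf_hours / n_drives

print(array_mttf(20_000, 4))   # 5000.0 hours: a failure every 5,000 hours
```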
RAID 1 (Mirroring)
• Uses twice as many disk as does RAID 0
(first half: primary, next half: backup)
• Duplicates all disks

• On a write, every strip is written twice


• Excellent fault tolerance (if a disk fails, backup
copy is used)
• Requires more disks
327
RAID 3 (Bit Interleaved Parity)
• Reads/writes go to all disks in the group,
with one extra disk (parity disk) to hold
check information in case of a failure

• Parity contains sum of all data in other


disks
• If a disk fails, subtract all data in good
disks from parity disk

328
RAID 4 (Block Interleaved Parity)

• RAID 4 is much like RAID 3, with a
strip-for-strip parity written onto an extra
disk
– A write involves accessing 2 disks instead
of all
– Parity disk must be updated on every write

329
RAID 5- Block Interleaved
Distributed Parity
• In RAID 5, parity information is spread
throughout all disks
• In RAID 5, multiple writes can occur
simultaneously as long as stripe units are not
located in same disks, but it is not possible in
RAID 4

330
Secondary Storage Devices:
CD-ROM

331
Physical Organization of CD-ROM
• Compact Disk – read only memory (write once)
• Data is encoded and read optically with a laser
• Can store around 600MB data
• Digital data is represented as a series of Pits and
Lands:
– Pit = a little depression, forming a lower level in the track
– Land = the flat part between pits, or the upper levels in the
track
• Reading a CD is done by shining a laser at the disc and detecting
changing reflections patterns.
– 1 = change in height (land to pit or pit to land)
– 0 = a “fixed” amount of time between 1’s

332
Organization of data
LAND PIT LAND PIT LAND
...------+ +-------------+ +---...
|_____| |_______|
..0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 ..

• Cannot have two 1’s in a row!


=> uses Eight to Fourteen Modulation (EFM) encoding table.
• 0's are represented by the length of time between transitions, so we
must travel at constant linear velocity (CLV) on the tracks.
• Sectors are organized along a spiral
• Sectors have same linear length
• Advantage: takes advantage of all storage space available.
• Disadvantage: has to change rotational speed when seeking
(slower towards the outside)

333
CD-ROM
• Addressing
– 1 second of play time is divided up into 75 sectors.
– Each sector holds 2KB
– 60 min CD:
60min * 60 sec/min * 75 sectors/sec = 270,000 sectors = 540,000 KB ~ 540
MB
– A sector is addressed by: Minute:Second:Sector e.g. 16:22:34
• Type of laser
– CD: 780nm (infrared)
– DVD: 635nm or 650nm (visible red)
– HD-DVD/Blu-ray Disc: 405nm (visible blue)
• Capacity
– CD: 650 MB, 700 MB
– DVD: 4.7 GB per layer, up to 2 layers
– HD-DVD: 15 GB per layer, up to 3 layers
– BD: 25 GB per layer, up to 2 layers

334
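The Minute:Second:Sector addressing can be sketched as follows (75 sectors per second, 2 KB per sector, as above):

```python
# Convert an M:S:S address to a byte offset on the disc.
SECTORS_PER_SECOND = 75
BYTES_PER_SECTOR = 2048       # 2 KB

def msf_to_bytes(minute, second, sector):
    sectors = (minute * 60 + second) * SECTORS_PER_SECOND + sector
    return sectors * BYTES_PER_SECTOR

print(msf_to_bytes(16, 22, 34))          # byte offset of sector 16:22:34
print(msf_to_bytes(60, 0, 0) // 1024)    # 60 min -> 540,000 KB, as above
```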
Solid state storage

335
Solid state storage
• Memory cards
– For Digital cameras, mobile phones, MP3 players...
– Many types: Compact flash, Smart Media, Memory Stick,
Secure Digital card...
• USB flash drives
– Replace floppies/CD-RW
• Solid State Drives
– Replace traditional hard disks
• Uses flash memory
– Type of EEPROM
• Electrically erasable programmable read only memory
– Grid of cells (1 cell = 1 bit)
– Write/erase cells by blocks

336
Solid state storage
• Cell=two transistors
– Bit 1: no electrons in between
– Bit 0: many electrons in between
• Performance
– Access time: 10x faster than a hard drive
– Transfer rate
• 1x=150 kb/sec, up to 100X for memory cards
• Similar to normal hard drive for SSD ( 100-150
MB/sec)
– Limited write: 100k to 1,000k cycles

337
Solid state storage
• Size
– Very small: 1cm² for some memory cards
• Capacity
– Memory cards: up to 32 GB
– USB flash drives: up to 32 GB
– Solid State Drives: up to 256 GB

338
Solid state storage
• Reliability
– Resists to shocks
– Silent!
– Avoid extreme heat/cold
– Limited number of erase/write
• Challenges
– Increasing size
– Improving writing limits

339
Virtual Memory

340
Virtual Memory
• Virtual memory is a memory management
technique developed for multitasking kernels
• Separation of user logical memory from
physical memory.
• Logical address space can therefore be
much larger than physical address space

341
A System with
Physical Memory Only
• Examples:
– Most Cray machines, early PCs, nearly all embedded systems, etc.
(Figure: CPU issues physical addresses 0 to N−1 directly into memory)
 Addresses generated by the CPU correspond directly to bytes in physical
memory
A System with Virtual Memory
• Examples:
– Workstations, servers, modern PCs, etc.
(Figure: CPU issues virtual addresses 0 to N−1; a page table translates
them to physical addresses 0 to P−1 in memory, or to disk)
 Address Translation: Hardware converts virtual addresses to physical ones
via OS-managed lookup table (page table)
Page Tables
(Figure: a memory-resident page table; each entry holds a valid bit and a
physical page or disk address. Valid entries point into physical memory;
invalid entries point to disk storage, i.e. a swap file or regular file
system file.)
VM – Windows
• Can change the
paging file size
• Can set paging files on
multiple different drives

345
Windows Memory management

346
IO Fundamentals
I/O Fundamentals
• Computer System has three major
functions
– CPU
– Memory
– I/O
PC with PCI and ISA bus
Types and Characteristics of I/O
Devices
• Behavior: how does an I/O device behave?
– Input – Read only
– Output - write only, cannot read
– Storage - can be reread and usually rewritten
• Partner:
– Either a human or a machine is at the other end of
the I/O device
– Either feeding data on input or reading data on
output
• Data rate:
– The peak rate at which data can be transferred
• between the I/O device and the main memory
• Or between the I/O device and the CPU
Data Rate
Buses
• A bus is a shared communication link
• Multiple sources and multiple destinations
• It uses one set of wires to connect multiple
subsystems
• Different uses:
– Data
– Address
– Control
Motherboard
Advantages
• Versatility:
– New devices can be added easily
– Peripherals can be moved between
computer
– systems that use the same bus standard
• Low Cost:
– A single set of wires is shared in multiple
ways
Disadvantages
• It creates a communication bottleneck
– The bandwidth of that bus can limit the
maximum I/O throughput
• The maximum bus speed is largely limited
by:
– The length of the bus
– The number of devices on the bus
– The need to support a range of devices with:
• Widely varying latencies
• Widely varying data transfer rates
The General Organization of a Bus
• Control lines:
– Signal requests and acknowledgments
– Indicate what type of information is on the
data lines
• Data lines carry information between the
source and the destination:
– Data and Addresses
– Complex commands
• A bus transaction includes two parts:
– Sending the address
– Receiving or sending the data
Master Vs Slave
• A bus transaction includes two parts:
– Sending the address
– Receiving or sending the data
• Master is the one who starts the bus
transaction by:
– Sending the address
• Slave is the one who responds to the
address by:
– Sending data to the master if the master asks
for data
– Receiving data from the master if the master
wants to send data
Output Operation
Input Operation
• Input is defined as the Processor
receiving data from the I/O device
Type of Buses
• Processor-Memory Bus (design specific or proprietary)
– Short and high speed
– Only need to match the memory system
– Maximize memory-to-processor bandwidth
– Connects directly to the processor
• I/O Bus (industry standard)
– Usually is lengthy and slower
– Need to match a wide range of I/O devices
– Connects to the processor-memory bus or backplane bus
• Backplane Bus (industry standard)
– Backplane: an interconnection structure within the chassis
– Allow processors, memory, and I/O devices to coexist

• Cost advantage: one single bus for all components


Increasing the Bus Bandwidth
• Separate versus multiplexed address and data lines:
– Address and data can be transmitted in one bus cycle if
separate address and data lines are available
– Cost: (a) more bus lines, (b) increased complexity
• Data bus width:
– By increasing the width of the data bus, transfers of multiple
words require fewer bus cycles
– Example: SPARCstation 20’s memory bus is 128 bit wide
– cost: more bus lines
• Block transfers:
– Allow the bus to transfer multiple words in back-to-back bus
cycles
– Only one address needs to be sent at the beginning
– The bus is not released until the last word is transferred
– Cost: (a) increased complexity (b) decreased response time
for request
Operating System Requirements
• Provides protection for shared I/O resources
– Guarantees that a user’s program can only access the
portions of an I/O device to which the user has rights
• Provides abstractions for accessing devices:
– Supplies routines that handle low-level device operation
• Handles the interrupts generated by I/O devices
• Provides equitable access to the shared I/O
resources
– All user programs must have equal access to the I/O
resources
• Schedules accesses in order to enhance system
throughput
OS and I/O Systems
Communication Requirements
• The Operating System must be able to prevent:
– The user program from communicating with the I/O
device directly
• If user programs could perform I/O directly:
– Protection to the shared I/O resources could not be
provided
• Three types of communication are required:
– The OS must be able to give commands to the I/O
devices
– The I/O device must be able to notify the OS when the
I/O device has completed an operation or has
encountered an error
– Data must be transferred between memory and an I/O
device
Commands to I/O Devices
• Two methods are used to address the device:
– Special I/O instructions
– Memory-mapped I/O
• Special I/O instructions specify:
– Both the device number and the command word
– Device number: the processor communicates this via a set of
wires normally included as part of the I/O bus
– Command word: this is usually sent on the bus’s data lines
• Memory-mapped I/O:
– Portions of the address space are assigned to I/O device
– Read and writes to those addresses are interpreted as
commands to the I/O devices
– User programs are prevented from issuing I/O operations
directly:
• The I/O address space is protected by the address translation
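The idea behind memory-mapped I/O can be illustrated with a toy simulation: writes that land in a reserved address range are routed to the device instead of memory (the `IO_BASE` address and the `Bus` class are invented for this sketch, not part of any real system):

```python
IO_BASE = 0xFF00  # hypothetical start of the I/O region

class Bus:
    def __init__(self, mem_size=0x10000):
        self.memory = bytearray(mem_size)
        self.device_commands = []  # what the I/O device received

    def write(self, addr, value):
        if addr >= IO_BASE:
            # Address falls in the I/O region: the write is interpreted
            # as a command to the device, not as a memory store.
            self.device_commands.append((addr - IO_BASE, value))
        else:
            self.memory[addr] = value

bus = Bus()
bus.write(0x0010, 42)       # ordinary memory write
bus.write(IO_BASE + 4, 1)   # lands on a device register instead
```

In a real machine the routing is done by the address decoding hardware, and the address translation mechanism keeps user programs from reaching the I/O region directly.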
I/O Device Notifying the OS
• The OS needs to know when:
– The I/O device has completed an operation
– The I/O operation has encountered an error
• This can be accomplished in two different
ways:
– Polling:
• The I/O device puts information in a status register
• The OS periodically checks the status register
– I/O Interrupt:
• Whenever an I/O device needs attention from the
processor, it interrupts the processor from what it is
currently doing.
Polling
• Advantage:
– Simple: the processor is
totally in control and does all
the work
• Disadvantage:
– Polling overhead can
consume a lot of CPU time
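The polling overhead can be seen in a small simulation: the CPU repeatedly reads a status register and burns cycles until the device flags ready (the device model and its latency are made up for illustration):

```python
READY = 0x1

class SlowDevice:
    """Toy device that becomes ready after a fixed number of status reads."""
    def __init__(self, latency, value):
        self.latency = latency
        self.value = value

    def read_status(self):
        self.latency -= 1
        return READY if self.latency <= 0 else 0

def polled_read(dev):
    wasted_polls = 0
    while not (dev.read_status() & READY):
        wasted_polls += 1   # CPU time spent doing nothing useful
    return dev.value, wasted_polls

value, wasted = polled_read(SlowDevice(latency=5, value=0xAB))
print(hex(value), wasted)  # 0xab 4
```

Every wasted poll is a cycle the processor could have spent on useful work, which is the motivation for interrupts below.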
Interrupts
• An interrupt is an asynchronous signal
indicating the need for attention, or a
synchronous event in software indicating the
need for a change in execution
• Advantage:
– User program progress is only halted during actual
transfer
• Disadvantage, special hardware is needed to:
– Cause an interrupt (I/O device)
– Detect an interrupt (processor)
– Save the proper states to resume after the interrupt
(processor)
Interrupt Driven Data Transfer
• An I/O interrupt is just like the
exceptions except:
– An I/O interrupt is asynchronous
– Further information needs to be
conveyed
• An I/O interrupt is
asynchronous with respect to
instruction execution:
– I/O interrupt is not associated
with any instruction
– I/O interrupt does not prevent
any instruction from completion
– You can pick your own
convenient point to take an
interrupt
I/O Interrupt
• An I/O interrupt is more complicated than an
exception:
– Needs to convey the identity of the device
generating the interrupt
– Interrupt requests can have different
urgencies and need to be prioritized
• Interrupt Logic
– Detect and synchronize interrupt requests
• Ignore interrupts that are disabled (masked off)
• Rank the pending interrupt requests
• Create interrupt microsequence address
• Provide select signals for interrupt microsequence
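The masking and ranking steps above can be sketched as a small function that picks the next interrupt to service (the IRQ numbering and the lower-number-is-more-urgent convention are assumptions for this example, not a property of any particular processor):

```python
def select_interrupt(pending, enabled_mask):
    """Return the most urgent pending IRQ that is not masked off,
    or None if nothing serviceable is pending.
    Convention here: lower IRQ number = higher urgency."""
    serviceable = [irq for irq in pending if enabled_mask & (1 << irq)]
    return min(serviceable) if serviceable else None

# IRQs 2, 5 and 7 pending; IRQ 2 is disabled (masked off).
mask = 0b1111_1111 & ~(1 << 2)
print(select_interrupt({2, 5, 7}, mask))  # 5
```

Real interrupt logic does the same filtering and ranking in hardware, then generates the microsequence address for the winner.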
Multi-core architectures
Single Computer
Single Core CPU
Multi core architecture
• Replicate multiple processor cores on a
single die
Multi-core CPU chip
• The cores fit on a single processor
socket
• Also called CMP (Chip Multi-Processor)
Why Multi-core
• Difficult to make single-core clock
frequencies even higher
• Deeply pipelined circuits:
– heat problems
– speed of light problems
– difficult design and verification
– large design teams necessary
– server farms need expensive air-conditioning
• Many new applications are multithreaded
• General trend in computer architecture (shift
towards more parallelism)
Instruction-level parallelism
• Parallelism at the machine-instruction
level
• The processor can re-order, pipeline
instructions, split them into
microinstructions, do aggressive branch
prediction, etc.
• Instruction-level parallelism enabled
rapid increases in processor speeds
over the last 15 years
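Whether two instructions can be issued in parallel comes down to register dependences; a minimal sketch (the tuple encoding of an instruction as a destination register plus a set of source registers is invented for illustration):

```python
def can_issue_together(i1, i2):
    """Two instructions, each (dest, set_of_sources), are independent
    if neither reads or writes what the other writes."""
    d1, s1 = i1
    d2, s2 = i2
    return d1 != d2 and d1 not in s2 and d2 not in s1

# r3 = r1 + r2  and  r6 = r4 * r5: no shared registers -> parallel
print(can_issue_together(("r3", {"r1", "r2"}), ("r6", {"r4", "r5"})))  # True
# r3 = r1 + r2  and  r4 = r3 * r5: r4 depends on r3 -> not parallel
print(can_issue_together(("r3", {"r1", "r2"}), ("r4", {"r3", "r5"})))  # False
```

A superscalar processor performs this kind of dependence check in hardware, every cycle, across a window of instructions.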
Thread-level parallelism (TLP)
• This is parallelism on a coarser scale
• Server can serve each client in a separate
thread (Web server, database server)
• A computer game can do AI, graphics, and
physics in three separate threads
• Single-core superscalar processors cannot
fully exploit TLP
• Multi-core architectures are the next step in
processor evolution: explicitly exploiting TLP
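A server handling each client in its own thread, as described above, can be sketched in a few lines (the request handler here is a stand-in for real work such as a database query):

```python
import threading

results = {}
results_lock = threading.Lock()

def handle_client(client_id):
    # Stand-in for real per-client work (e.g. serving a web request).
    reply = sum(range(client_id + 1))
    with results_lock:
        results[client_id] = reply

# One thread per client; on a multi-core chip the OS can
# schedule these threads onto different cores.
threads = [threading.Thread(target=handle_client, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

The lock around the shared dictionary is the price of the shared-memory model discussed next.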
Multiprocessor memory types
• Shared memory:
In this model, there is one (large)
common shared memory for all
processors
• Distributed memory:
In this model, each processor has its
own (small) local memory, and its
content is not replicated anywhere else
Multi-core processor is a special
kind of a multiprocessor:
All processors are on the same chip
• Multi-core processors are MIMD:
Different cores execute different threads
(Multiple Instructions), operating on different
parts of memory (Multiple Data).
• Multi-core is a shared memory multiprocessor:
All cores share the same memory
What applications benefit
from multi-core?
• Database servers
• Web servers (Web commerce)
• Compilers
• Multimedia applications
• Scientific applications, CAD/CAM
(Each can run on its own core)
• In general, applications with
thread-level parallelism
(as opposed to instruction-level
parallelism)
More examples
• Editing a photo while recording a TV
show through a digital video recorder
• Downloading software while running an
anti-virus program
• “Anything that can be threaded today
will map efficiently to multi-core”
• BUT: some applications difficult to
parallelize
A technique complementary to multi-core:
Simultaneous multithreading
• Problem addressed:
the processor pipeline can get stalled:
– Waiting for the result
of a long floating point
(or integer) operation
– Waiting for data to
arrive from memory
• Other execution units wait unused
[Figure: Pentium 4 pipeline — L1 D-Cache/D-TLB, integer and floating-point units, L2 cache and control, schedulers, uop queues, rename/alloc, BTB, trace cache, uCode ROM, decoder, bus. Source: Intel]
Simultaneous multithreading (SMT)
• Permits multiple independent threads to execute
SIMULTANEOUSLY on the SAME core
• Weaving together multiple “threads”
on the same core
• Example: if one thread is waiting for a floating
point operation to complete, another thread can
use the integer units
Without SMT, only a single thread
can run at any given time
[Figure: pipeline diagram — only Thread 1 (floating point) occupies the execution units]
Without SMT, only a single thread
can run at any given time
[Figure: pipeline diagram — only Thread 2 (integer operation) occupies the execution units]
SMT processor: both threads can
run concurrently
[Figure: pipeline diagram — Thread 1 (floating point) and Thread 2 (integer operation) share the core’s execution units]
But: Can’t simultaneously use the
same functional unit
[Figure: pipeline diagram — Thread 1 and Thread 2 both needing the integer unit; this scenario is impossible with SMT on a single core (assuming a single integer unit)]
SMT not a “true” parallel
processor
• Enables better threading (e.g. up to 30%)
• OS and applications perceive each
simultaneous thread as a separate
“virtual processor”
• The chip has only a single copy
of each resource
• Compare to multi-core:
each core has its own copy of resources
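The “virtual processor” view can be observed from user code: `os.cpu_count()` reports logical processors, so on an SMT machine it counts hardware threads rather than physical cores (how many of each depends entirely on the machine it runs on):

```python
import os

# os.cpu_count() reports logical processors: with SMT enabled this is
# typically (physical cores) x (hardware threads per core).
logical = os.cpu_count()
print(logical, "logical processors visible to the OS")
```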
Multi-core:
threads can run on separate cores
[Figure: two complete cores side by side, each with its own pipeline resources — Thread 1 runs on one core, Thread 2 on the other]
Multi-core:
threads can run on separate cores
[Figure: the same two cores — Thread 3 runs on one core, Thread 4 on the other]
Combining Multi-core and SMT
• Cores can be SMT-enabled (or not)
• The different combinations:
– Single-core, non-SMT: standard uniprocessor
– Single-core, with SMT
– Multi-core, non-SMT
– Multi-core, with SMT: our fish machines
• The number of SMT threads:
2, 4, or sometimes 8 simultaneous threads
• Intel calls them “hyper-threads”
SMT Dual-core: all four threads
can run concurrently
[Figure: two SMT-enabled cores — Threads 1 and 3 on one core, Threads 2 and 4 on the other]
Comparison: multi-core vs SMT

• Advantages/disadvantages?
Comparison: multi-core vs SMT

• Multi-core:
– Since there are several cores,
each is smaller and not as powerful
(but also easier to design and manufacture)
– However, great with thread-level parallelism
• SMT
– Can have one large and fast superscalar core
– Great performance on a single thread
– Mostly still only exploits instruction-level
parallelism
The memory hierarchy
• If simultaneous multithreading only:
– all caches shared
• Multi-core chips:
– L1 caches private
– L2 caches private in some architectures
and shared in others
• Memory is always shared
“Fish” machines
• Dual-core Intel Xeon processors
• Each core is hyper-threaded
• Private L1 caches
• Shared L2 cache
[Figure: two hyper-threaded cores (CORE0, CORE1), each with a private L1 cache, sharing one L2 cache and main memory]
Designs with private L2 caches
• Both L1 and L2 are private
– Examples: AMD Opteron, AMD Athlon, Intel Pentium D
• A design with L3 caches
– Example: Intel Itanium 2
[Figure: two layouts — per-core private L1 and L2 over shared memory; and per-core private L1 and L2 with an added L3 level over shared memory]
Private vs shared caches?
• Advantages/disadvantages?
Private vs shared caches
• Advantages of private:
– They are closer to core, so faster access
– Reduces contention
• Advantages of shared:
– Threads on different cores can share the
same cache data
– More cache space available if a single (or
a few) high-performance thread runs on
the system
Windows Task Manager
[Figure: Windows Task Manager CPU usage history showing one graph per core (core 1, core 2)]