CIS 501
Intro to Computer Architecture
Fall 2001
University of Pennsylvania
addresses/meeting times/recitation
list of topics
expected background
homeworks
exams
projects
grading + cheating
tentative syllabus
People + Places
  people
  textbooks
  other materials
Topic List
Expected Background
  hardwired/microprogrammed control
  simple pipelining
  basic caches
  C/UNIX programming
Homework
Exams
  in-class mid-term: October 25 (tentatively)
  final: cumulative
Class Project
  groups of 2-3
  proposal + progress report + final report + presentation
Grading
  grade breakdown
    homework: 10%
    project: 30%
  cheating
Approximate Schedule
week 1: intro
week 5: ILP
week 6: ILP
Levels of Architecture
SOFTWARE
1 System application
2 Language processors
3 Logical resource management
4 Physical resource mgmt.
5 Program execution
6 Input/Output processors
Controllers
Controllers
9 Communication paths and devices
Storage
HARDWARE
Levels of Architecture
ARCHITECTURE
(ISA)
programmer/compiler view
- Functional appearance to its immediate user/system programmer
REALIZATION
(Chip)
decisions based on
applications
performance
cost
reliability
power . . .
CIS 501 Lecture Notes: Chapter 1
Classes of Computers
workstations - SPARCstations
servers - SGI Origin, UltraSPARC, AS/400
low cost/power
low-end PCs, laptops, PDAs - mobile Pentiums, TM5400?
© 2000 by Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti & Roth
are they?
Moore's Law
example I: caches
corollaries
cost / transistor halves annually
power decreases with scaling
speed increases with scaling
reliability increases with scaling
Moore's Law
                   1971-1980   1981-1990   1991-2000   2010
Transistor Count   10K-100K    100K-1M     1M-100M     1B
Clock Frequency    0.2-2MHz    2-20MHz     20M-1GHz    10GHz
IPC                < 0.1       0.1-0.9     0.9-2.0     10 (?)
MIPS/MFLOPS        < 0.2       0.2-20      20-2,000    100,000
performance metrics
2001 - 2003
Amdahl's law
Performance Metrics
[Figure: datapath stages fetch, decode, reg read, ALU, mem, reg write, separated by latches and driven by a clock]
performance
in real processors there is always overlap (pipelining)
MIPS
Relative MIPS
relative MIPS = (time_reference / time_new) x MIPS_reference
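A minimal sketch of the relative MIPS calculation, assuming both times are measured on the same program. The helper name `relative_mips` and the numbers below are hypothetical, chosen only to illustrate the formula.

```python
def relative_mips(time_reference, time_new, mips_reference):
    """Relative MIPS = (time_reference / time_new) * MIPS_reference."""
    if time_new <= 0:
        raise ValueError("time_new must be positive")
    return (time_reference / time_new) * mips_reference


# Hypothetical numbers: the reference machine is rated 1 MIPS and takes 10 s;
# the new machine runs the same program in 2.5 s.
rating = relative_mips(time_reference=10.0, time_new=2.5, mips_reference=1.0)
print(f"relative MIPS: {rating:.1f}")
```

The rating depends only on the time ratio, so it says nothing about how many instructions the new machine actually executed.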
MFLOPS
Normalized MFLOPS
argument #1: FP operations are same across machines?
Iron Law
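\[
\frac{\text{seconds}}{\text{program}} = \frac{\text{instructions}}{\text{program}} \times \frac{\text{cycles}}{\text{instruction}} \times \frac{\text{seconds}}{\text{cycle}}
\]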
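The Iron Law (seconds/program = instructions/program x cycles/instruction x seconds/cycle) can be sketched as a three-factor product. The function name `execution_time` and the machine parameters are illustrative assumptions, not figures from the course.

```python
def execution_time(instructions, cycles_per_instruction, seconds_per_cycle):
    """Iron Law: seconds/program = instr/program * cycles/instr * seconds/cycle."""
    return instructions * cycles_per_instruction * seconds_per_cycle


# Hypothetical machine: 2 billion instructions, CPI of 1.5, 1 ns cycle (1 GHz).
baseline = execution_time(2e9, 1.5, 1e-9)

# Illustrative trade-off: cycle time is halved, but CPI rises to 2.0.
faster_clock = execution_time(2e9, 2.0, 0.5e-9)

print(f"baseline:     {baseline:.2f} s")
print(f"faster clock: {faster_clock:.2f} s")
```

Because the three factors multiply, improving one term only helps if the other two do not get worse by a larger factor.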
architecture (ISA): compiler-designer
implementation (micro-architecture): processor-designer
realization (physical layout): circuit-designer
Iron Law
repeatable!!
Benchmarking Process
steps
define workload
Define Workload: w1 ... w5 with times t1 ... t5
Extract Benchmarks: w1 ... w5 with times t1 ... t5
Run Benchmarks: ^t1 ... ^t5
Project Performance: ^w1 ... ^w5 with ^t1 ... ^t5
ignore dependences
Benchmarks: Kernels
SPEC95
real programs
8 integer programs
SPEC2000 Benchmarks
Benchmarking Pitfalls
12 integer programs
Benchmarking Pitfalls
weighted means
geometric mean
AM
\[ \frac{1}{n} \sum_{i=1}^{n} \mathrm{time}(i) \]
weighted AM
\[ \frac{1}{n} \sum_{i=1}^{n} \mathrm{weight}(i) \times \mathrm{time}(i) \]
HM
\[ \frac{n}{\sum_{i=1}^{n} \frac{1}{\mathrm{rate}(i)}} \]
weighted HM is similar
\[ \frac{n}{\sum_{i=1}^{n} \frac{\mathrm{weight}(i)}{\mathrm{rate}(i)}} \]
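The arithmetic mean (for times) and harmonic mean (for rates), with their weighted variants, can be sketched as below. The function names and sample values are hypothetical. The weighted versions divide by the sum of the weights, which is an assumption of this sketch so that the result does not depend on how the weights are scaled.

```python
def arithmetic_mean(times):
    """AM of execution times: (1/n) * sum of time(i)."""
    return sum(times) / len(times)


def harmonic_mean(rates):
    """HM of rates: n / sum of 1/rate(i)."""
    return len(rates) / sum(1.0 / r for r in rates)


def weighted_arithmetic_mean(times, weights):
    # Assumption: normalize by the total weight.
    return sum(w * t for w, t in zip(weights, times)) / sum(weights)


def weighted_harmonic_mean(rates, weights):
    # Assumption: normalize by the total weight.
    return sum(weights) / sum(w / r for w, r in zip(weights, rates))


times = [2.0, 4.0, 6.0]   # hypothetical execution times in seconds
rates = [2.0, 6.0]        # hypothetical rates, e.g. MFLOPS
print("AM of times:", arithmetic_mean(times))
print("HM of rates:", round(harmonic_mean(rates), 3))
print("weighted AM:", weighted_arithmetic_mean(times, [1.0, 1.0, 2.0]))
```

With all weights equal, both weighted forms reduce to the unweighted means.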
Geometric Mean
        mach A   mach B   B/A    A/B
P1      1        10       10     0.1
P2      1000     100      0.1    10
AM                        5.05   5.05
SPEC uses GM
\[ \sqrt[n]{\prod_{i=1}^{n} \frac{T_{\mathrm{base},i}}{T_{\mathrm{new},i}}} \]
AM/HM
\[ \mathrm{GM} = \sqrt[n]{\prod_{i=1}^{n} \mathrm{ratio}(i)} \]
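The two-program example can be checked numerically. This sketch assumes the mach A time for P1 is 1, the value implied by B/A = 10 with a mach B time of 10. The helper names are hypothetical.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    """n-th root of the product of the ratios (math.prod needs Python 3.8+)."""
    return math.prod(values) ** (1.0 / len(values))


# Execution times from the example; P1 on mach A is assumed to be 1.
mach_a = {"P1": 1.0, "P2": 1000.0}
mach_b = {"P1": 10.0, "P2": 100.0}

b_over_a = [mach_b[p] / mach_a[p] for p in mach_a]
a_over_b = [mach_a[p] / mach_b[p] for p in mach_a]

for name, ratios in (("B/A", b_over_a), ("A/B", a_over_b)):
    print(name,
          "AM =", round(arithmetic_mean(ratios), 2),
          "GM =", round(geometric_mean(ratios), 2))
```

The arithmetic mean of the ratios comes out the same in both directions, so each machine appears slower than the other, while the geometric mean gives one consistent answer regardless of which machine is the reference.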
Qualitative Performance
Amdahl's Law
balance
bursty behavior
Amdahl's Law
f = 95% and s = 1.10 -> speedup common case a little bit (10%)
\[ \lim_{s \to \infty} \frac{1}{(1 - f) + f/s} = \frac{1}{1 - f} \]
=> make common case fast
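A minimal sketch of Amdahl's Law, speedup = 1 / ((1 - f) + f/s), where f is the fraction of time that is sped up and s is the speedup of that fraction. The function names are hypothetical. The first call uses the f = 95%, s = 1.10 case; the second uses f = 0.9 with a very large s to show the limit.

```python
def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of execution time is sped up by factor s."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - f) + f / s)


def amdahl_limit(f):
    """Upper bound on speedup as s grows without bound: 1 / (1 - f)."""
    return float("inf") if f >= 1.0 else 1.0 / (1.0 - f)


# Common case (95% of the time) sped up by only 10%.
print(f"f=0.95, s=1.10:  {amdahl_speedup(0.95, 1.10):.3f}")

# 90% of the time sped up enormously: the remaining 10% caps the gain near 10x.
print(f"f=0.90, s=10000: {amdahl_speedup(0.90, 10000):.3f}")
print(f"limit for f=0.90: {amdahl_limit(0.90):.3f}")
```

A small improvement to the common case already gives roughly a 9% overall gain, while an unbounded improvement to 90% of the program can never exceed 10x.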
[Figure: Speedup (y-axis, 0 to 10) plotted against an x-axis marked 0 to 0.8]
Amdahl's Law
exploit locality
[Figure: Speedup = 1/(0.1 + 0.9/x) for f = 0.9; y-axis 0 to 10, x-axis 0 to 10000]
implementation facts
Memory Hierarchy
[Figures: hierarchy levels reg, L1, L2, memory, disk (swap); a second version adds L3]
type        size       speed     bandwidth
register    < 1 KB     1-5 ns    9600 MB/s
L1 cache    < 256 KB   10 ns     3200 MB/s
L2 cache    < 8 MB     30 ns     800 MB/s
memory      < 4 GB     100 ns    133 MB/s
disk        > 1 GB     20 ms     4 MB/s
[Figure labels: memory, disk (swap): slow, large, cheap]
Balance
[Figure: hierarchy levels reg, L1, L2, memory, disk (swap)]
Balance Example
Balancing a System
Bound
e.g., larger memory => less paging => lower I/O demand
Amdahl's rule:
1 MIPS <=> 1 MB memory <=> 1 Mbits/s I/O
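One way to apply the 1 MIPS <=> 1 MB <=> 1 Mbit/s rule of thumb is to express every resource in "MIPS supported" and look for the smallest. This is only a sketch: `limiting_resource` and the sample configuration are hypothetical, and the 1:1:1 ratio is taken directly from the rule as stated.

```python
def limiting_resource(mips, memory_mb, io_mbits_per_s):
    """Return the resource that supports the fewest MIPS under the 1:1:1 rule."""
    supported_mips = {
        "cpu": mips,              # 1 MIPS of compute
        "memory": memory_mb,      # 1 MB supports 1 MIPS
        "io": io_mbits_per_s,     # 1 Mbit/s supports 1 MIPS
    }
    return min(supported_mips, key=supported_mips.get)


# Hypothetical system: 100 MIPS processor, 64 MB of memory, 200 Mbit/s of I/O.
print(limiting_resource(mips=100, memory_mb=64, io_mbits_per_s=200))
```

In this made-up configuration memory is the bound, so adding memory would be the first step toward balance.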
System          copy    scale   sum     saxpy
Cray C90        7000    7000    9400    9500
Cray T932       10800   10200   13000   13700
Alpha 150 MHz   98      90      68      90
Cray T3D        380     330     190     180
[Figure: axis label: problem size]
Bursty Behavior
Cost
[Figure: cost ($10 to $60) by year, 1976 to 1990, with labels for 16 KB, 64 KB, 256 KB, and 1 MB]
IC Cost
cost (IC) =
Total Cost
costs
component: processor, DRAM, disk, power, packaging
direct: manufacturing (labor, scrap), warranty
\[ \mathrm{cost(die)} = \frac{\mathrm{cost(wafer)}}{(\mathrm{dies/wafer}) \times \mathrm{yield(die)}} \]
\[ \mathrm{yield(die)} = \mathrm{yield(wafer)} \times \left( 1 + \frac{\mathrm{defects/cm^2} \times \mathrm{area}}{\alpha} \right)^{-\alpha} \]
often is 0.30
cost (die) = f (die area^4)
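A rough sketch of how die cost grows with die area, using cost(die) = cost(wafer) / (dies per wafer x yield(die)) and the common yield model yield(die) = yield(wafer) x (1 + defects/cm^2 x area / alpha)^(-alpha). Every number below is hypothetical, including alpha = 3.0, and dies per wafer is approximated as wafer area divided by die area, ignoring edge loss.

```python
def die_yield(wafer_yield, defects_per_cm2, die_area_cm2, alpha):
    """yield(die) = yield(wafer) * (1 + defects/cm^2 * area / alpha) ** (-alpha)."""
    return wafer_yield * (1.0 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)


def die_cost(wafer_cost, dies_per_wafer, yield_die):
    """cost(die) = cost(wafer) / (dies per wafer * yield(die))."""
    return wafer_cost / (dies_per_wafer * yield_die)


# Hypothetical process parameters (not from the course notes).
WAFER_COST = 1000.0       # dollars per wafer
WAFER_AREA_CM2 = 300.0    # usable wafer area
DEFECTS_PER_CM2 = 1.0
ALPHA = 3.0               # assumed model parameter

for area in (0.5, 1.0, 2.0):
    y = die_yield(1.0, DEFECTS_PER_CM2, area, ALPHA)
    dies = WAFER_AREA_CM2 / area   # crude estimate, ignores wafer edge loss
    print(f"area {area:.1f} cm^2: yield {y:.2f}, cost ${die_cost(WAFER_COST, dies, y):.2f}")
```

Doubling the die area both halves the number of dies per wafer and lowers the yield, so cost rises much faster than linearly with area.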
Manufacturing Cost
Price
per unit
~$5B
chip testers/debuggers
~$5M a piece, typically several hundreds of them
H+P
chapter 1
HJ+S
Moore, Cramming...
Amdahl, Validity...
e.g., can't spend too much on R&D for the price/volume point
Emer+Clark, A Characterization...