
A Conceptual Framework for Computer Architecture*

S.S. REDDI
W. W. Gaertner Research, Inc., 1492 High Ridge Road, Stamford, Connecticut 06908
E.A. FEUSTEL
Laboratory for Computer Science and Engineering,
Department of Electrical Engineering, Rice University, Houston, Texas 77005

The purpose of this paper is to describe the concepts, definitions, and ideas of
computer architecture and to suggest that architecture can be viewed as composed
of three components: physical organization; control and flow of information; and
representation, interpretation and transformation of information. This framework
can accommodate diverse architectural concepts such as array processing,
microprogramming, stack processing and tagged architecture. Architectures of
some existing machines are considered and methods of associating architectural
concepts with the components are established. Architecture design problems and
trade-offs are discussed in terms of the proposed framework.
Keywords and Phrases: computer architecture, framework, composition of
architecture, information flow, physical organization, unification of diverse
architectural concepts.
CR Categories: 6.0, 6.20, 6.22, 6.29.

* This work was supported by NSF Grant GJ 36471, and was performed while the first author
was at Rice University.

Copyright © 1976, Association for Computing Machinery, Inc. General permission to republish,
but not for profit, all or part of this material is granted provided that ACM's copyright notice is
given and that reference is made to the publication, to its date of issue, and to the fact that reprinting
privileges were granted by permission of the Association for Computing Machinery.

Computing Surveys, Vol. 8, No. 2, June 1976

CONTENTS

INTRODUCTION
EXISTING DEFINITIONS AND INTERPRETATIONS OF COMPUTER ARCHITECTURE
FRAMEWORK FOR COMPUTER ARCHITECTURE
SOME EXISTING COMPUTER ARCHITECTURES
Physical Organization
CONTROL AND FLOW OF INFORMATION
Processor Organization
Stack Mechanism
Representation, Interpretation and Transformation of Information
Representation and Interpretation of Instructions
Representation and Interpretation of Data
Transformation (or Dynamic Representation) of Information
Physical Organization and Control of Information Flow
Basic CPU Organization
I/O Handling
Memory Organization
Physical Organization, Control of Information Flow and Representation and Interpretation of Information
ARCHITECTURAL CONCEPTS AND CONSIDERATIONS
Array Organization
Pipeline Organization
Modular Organization
Stack Processing
Virtual Memory
Virtual Machines
Parallel Processing
Tagging of Information
Emulation
Developing an Architecture
Dynamic Architectures
REFERENCES

INTRODUCTION

Computer architecture is receiving, and will continue to receive, special attention as novel
architectures differing from the classic von Neumann organization emerge as viable approaches
to the problem of increasing the computational speeds and cost-effectiveness of computer
systems. Computers such as the CDC 6600, CDC STAR-100, TI ASC, Burroughs B6700, Goodyear
STARAN and CRAY-1 are convincing arguments that architecture plays a prominent role in
deciding computer system performance and in achieving faster computational speeds than has
been previously possible. In the literature there is a multitude of proposals as to how computer
architecture can be defined and how an architect's job can be described. Unfortunately, most of
these proposed concepts touch only different facets of computer architecture and do not
encompass the complete spectrum of architectures. In this paper we present a conceptual
viewpoint that allows a coherent and unified treatment of computer architecture. We believe
that computer architecture can be viewed as composed of 1) physical organization; 2) control
and flow of information; and 3) representation, interpretation and transformation of
information, and we develop a framework for architecture based on this viewpoint. We consider
some existing computer architectures and show how to associate architectural concepts and
innovations with these three components. We then develop architectural concepts that a system
architect can use and verify that these concepts can be accommodated within our framework.
Finally, we indicate how our hypothesis can lead to the concept of dynamic system architecture.

EXISTING DEFINITIONS AND INTERPRETATIONS OF COMPUTER ARCHITECTURE

We first consider the various definitions and interpretations of a computer architect's job and
of computer architecture as presented in the literature. According to Brooks [8]: "The computer
architect designs the external specifications, gross data flow, and gross sequencing of a system.
He is, like the building architect, the user's advocate. He must balance the conflicting demands
of engineer (cost, speed), programmer (function, ease of use) and marketing (function, speed,
cost) to yield the machine of greatest true value to the user ..." Foster in Computer
architecture [17] introduces the architect as follows: "The computer architect in turn is
unconcerned with the insides of an adder or a shift register. His job is to assemble the units
turned out by the logical designer into a useful, flexible tool that is called a computer."
Beizer [6] describes the architect's job as "... the design of a hardware/software complex,
subject to realistic technical, economic, operational and social constraints such that it
1) works, 2) is optimum and 3) survives." He summarizes the architect's role by stating that
"it is synthetical, catalytic and translative. His design is a synthesis of the substances of
subordinate disciplines." Abrams and Stein [1] explain the architect's duties: "The job of the
computer system architect is to develop an overall concept of a machine--what it can do and how
that solves the problem for which the machine is intended. Just as an architect who designs
houses must consider utility, appearance, and compatibility with the neighborhood, so must the
computer designer balance requirements, user interface, and costs to make a viable design."

Several explicit definitions of architecture also can be found. The term "Architecture" is used
by Amdahl, et al., [2] in introducing the IBM System/360 "to describe the attributes of a system
as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct
from the organization of the data flow and controls, the logical design, and the physical
implementation." Foster [17] explains: "The field
of computer architecture, or 'the art of designing a machine that will be a pleasure to work
with' is only gradually receiving the recognition it deserves. This art, one cannot call it a
science, is one step more abstract than that of a logical designer, which in turn is abstracted
from the study of electronic circuits." Finally Foster [16] also suggests: "Computer
Architecture is the profession of adopting present day technology to the solution of current
computing problems and of dreaming about the future of the field in such a way as to influence
it for the better."

A FRAMEWORK FOR COMPUTER ARCHITECTURE

The disparities in existing definitions and interpretations of computer architecture are
directly traceable to its multifaceted nature. Since computer architecture can be viewed from
different perspectives, each individual forms his own notion and interpretation. A FORTRAN user
may not perceive significant architectural differences between IBM and CDC computers other than
word and core size. On the other hand a system programmer can easily distinguish the
architectures of these computers in terms of their operation codes, address formation, and
input/output. However, he will find it difficult to distinguish an IBM 370/125 from an IBM
370/155, or a CDC 6600 from a CDC 7600, and will be unable to perceive hardware parallelism that
may exist in the central processing unit. Thus architectural details can be transparent or
visible depending on the viewer and his level of perception.

Our hypothesis is conditioned by the above observations, as we compare and contrast
architectures and architectural features in terms of the following three components. For a valid
comparison we should choose commensurate levels, i.e., user level, system programmer level.
1) Physical organization;
2) control and flow of information; and
3) representation, interpretation and transformation of information.
Let us consider some examples. The ILLIAC IV and a conventional von Neumann organization have
different architectures; they differ in their physical organization and the way they handle
control and flow of information. Tagged architecture [12] differs from von Neumann architecture
in representation and interpretation of information. The term "transformation" in the third
component refers to the situation where representation and interpretation of information are
changed dynamically by the system for user convenience, system efficiency, or for implementing a
software system to support privacy and protection. This kind of transformation is to be
distinguished from the algorithmic transformation on data undertaken in solving specific
problems.

SOME EXISTING COMPUTER ARCHITECTURES

In this section we consider some existing computer systems and explain their architectural
features using our three components. These explanations are sufficiently detailed so that the
reader may become familiar with the process of associating architectural concepts with the
components. In addition we consider microprogramming, an established architectural feature, to
reinforce our arguments regarding the validity of our hypothesis.

Physical Organization

Technological advances achieved in the past decade enabled architects to propose many
innovative physical organizations for computer systems, some of which have already been realized
in practice. Three computer systems, the CDC 6600, ILLIAC IV and TI ASC, are studied to show the
different organizations that can be conceived. The CDC 6600 is an example of distributed
computing and employs an organization which exploits functional parallelism. The ILLIAC IV and
TI ASC use array and pipeline organizations respectively for enhanced performance. All three
systems depart from the conventional von Neumann machine in their physical system organization.

CDC 6600 Central Processor Organization: The CDC 6600 [32] organization shown in Figure 1 has
ten independent functional

[Figure 1 diagram. Labels: external peripheral equipment channels; peripheral processors PP0 to
PP9; 18-bit address registers; 18-bit index registers; P register; ten functional units (add,
multiply, multiply, divide, fixed add, increment, increment, Boolean, shift, branch);
scoreboard; 60-bit instruction stack; instruction registers; input register.]

Figure 1. Block diagram of the CDC 6600 computer system, an example of distributed
computing architecture.
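
The scoreboard in Figure 1 initiates an instruction only when the functional unit and the registers it needs are available, and holds it otherwise. A minimal sketch of that issue rule follows, in Python. The class names, the unit names such as "multiply-1", and the bookkeeping (one set of busy units, one set of registers awaiting a result) are our assumptions for illustration; the actual CDC 6600 scoreboard tracks more state than this.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass(frozen=True)
class Instruction:
    unit: str                  # functional unit required, e.g. "multiply-1" (hypothetical name)
    sources: Tuple[str, ...]   # registers read
    dest: str                  # register written


@dataclass
class Scoreboard:
    """Simplified status table: which units are busy, which registers await a result."""
    busy_units: Set[str] = field(default_factory=set)
    busy_regs: Set[str] = field(default_factory=set)

    def can_issue(self, ins: Instruction) -> bool:
        # The unit must be free and no needed register may be awaiting a result.
        needed = set(ins.sources) | {ins.dest}
        return ins.unit not in self.busy_units and not (needed & self.busy_regs)

    def issue(self, ins: Instruction) -> bool:
        """Initiate the instruction if possible; return False if it is held."""
        if not self.can_issue(ins):
            return False
        self.busy_units.add(ins.unit)
        self.busy_regs.add(ins.dest)
        return True

    def complete(self, ins: Instruction) -> None:
        """Release the unit and the result register when execution finishes."""
        self.busy_units.discard(ins.unit)
        self.busy_regs.discard(ins.dest)
```

With this sketch, an add that reads a register still awaiting a multiply result is held, while an unrelated shift can be issued in the meantime. This is the sense in which parallelism is exploited while the precedences specified by the programmer are preserved.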

units, twenty-four data and address registers, an instruction stack and a scoreboard in its
central processor. The ten functional units, which are independent of each other, are capable of
simultaneous operation. The functional units are specialized to perform operations such as
floating add, floating multiply, floating divide, shift, fixed add, etc. The twenty-four
registers are divided into data (Xi), address (Ai) and index (Bi) groups; each group consists of
eight registers. Except for data registers which are sixty bits long, all registers are eighteen
bits long. Data registers X1 through X5 are used as "read" registers and X6 and X7 as "store"
registers. There is a "partner" relationship between Xi and Ai registers. An operand is fetched
into Xi, i = 1 to 5, from memory by loading its address into Ai; similarly an operand is stored
from Xi, i = 6, 7, into the memory location specified by Ai. The index registers are used to
modify the addresses in the address registers. (For instance, there is an instruction which adds
Aj and Bk and transfers the result to Ai.) The index registers are also used to store
fixed-point integers and manipulate floating-point exponents. Registers A0 and X0 are used to
reference the extended core storage. The partner relationship permits concurrent calculation of
arithmetic and address information. All arithmetic computations are performed on operands from
the twenty-four registers with results returned to these registers.

Instructions are loaded into the processing unit under the control of the P register (see
Figure 1). The instruction stack contains eight 60-bit registers capable of holding up to 32
instructions. Instructions are accessed from memory and stored in a last in, first out (LIFO)
manner in the instruction stack. When the program executes a branch to one of the instructions
stored in the stack, that instruction is directly fetched from the stack rather than from
memory. In this way it is possible to cut down the number of memory accesses in the case of
loops that can be accommodated in the stack. Instructions pass from the instruction stack to
three instruction registers U0, U1 and U2 whose contents are interpreted and decoded by the
scoreboard. The scoreboard examines the instruction to be executed and determines whether the
functional unit and registers needed for the execution of the instruction are available; if they
are available the instruction is initiated. Otherwise, the instruction is held until they are
available. The scoreboard maintains the status of each central register and functional unit so
that it can decide when an instruction can be issued. The scoreboard provides interlocks between
instructions so that though parallelism in instructions is exploited, precedences among
instructions as specified by the programmer are still preserved.

The ILLIAC IV System: This system [5] employs an array structure for organizing its processors.
It consists of 256 processing elements arranged into four quadrants. Only one quadrant has been
constructed because of economic considerations.

A quadrant consists of a control unit and sixty-four processing elements. Processing element i
is connected to processing elements i + 1 (mod 64), i - 1 (mod 64), i + 8 (mod 64) and
i - 8 (mod 64). The control unit fetches instructions from a processing element memory, decodes
them, and issues control pulses to the processing elements for execution. It broadcasts memory
addresses and data words when they are common to all processors. An instruction can be either a
control or processing unit instruction. The former directs operations local to the control unit
whereas the latter controls the execution of the processing units. The control unit is designed
to overlap the executions of the two different instruction types.

The processing element consists of the processing element memory (2048 64-bit words), arithmetic
and logic circuitry, and registers. Each element executes the common instruction issued by the
control unit on data stored in its memory. An element will not execute the common instruction if
its enable flip-flop is not set. In other words all the enabled elements execute the same
instruction issued by the control unit. Digital Equipment PDP-10 computers perform the executive
control of system operation, I/O processing and supervision, and compilation of the ILLIAC IV
programs.

The TI ASC System: A principal feature of the Texas Instruments Advanced Scientific Computer
(ASC) [31] is the central processor's four pipelines (Figure 2). If computational requirements
do not warrant the capacity of four pipelined arithmetic units, one or two pipeline CPs are
available. The CP pipeline consists of the instruction processing unit (IPU), the memory buffer
unit (MBU), and the arithmetic unit (AU). The IPU supplies a continuous stream of instructions
for execution by the other units of the pipeline. It fetches and decodes instructions, accesses
and stores operands, and handles branch conditions. The MBU provides an interface between the
central memory and AU by supplying a continuous stream of operands to the AU and storing results
in the memory. The AU performs arithmetic operations and is pipelined as shown in Figure 2.
Depending on the mode of arithmetic operation, the pipeline is structured to divert operand flow
along the dotted or solid lines. The central processor is a vector as well as scalar processor
and has vector instructions at machine level.

[Figure 2 diagram. Labels: four-pipeline CP (ASC 4X); two-pipeline CP (ASC 2X); primary memory
ports; receiver register; exponent subtract; align; accumulate; floating add and fixed multiply
paths; result.]

Figure 2. Processing and arithmetic pipelines of the TI/ASC system.
(Reprinted, by permission of Texas Instruments Inc., from A description of the advanced scientific computer system).

CONTROL AND FLOW OF INFORMATION

Information flow and its control in a computer play a prominent role in deciding the computer
architecture. This component encompasses the aspects of control mechanisms and schemes used for
controlling and directing the system's information flow. Note the interdependence between the
control and flow elements. Sometimes control initiates information flow (as in conventional
synchronous computers where control pulses are issued periodically to direct information flow)
and sometimes information flow initiates control (as in interrupt and data driven schemes); thus
the control element may be isolated from or incorporated in information flow. Control may be
centralized (as in most computer systems) or decentralized (as in a network of microprocessors).

Alteration of information flow in a traditional computer leads to novel and interesting
architectures. A familiar example is the Burroughs B6700, a stack processor. In the following we
describe briefly the B6700 processor and stack organization and indicate how information flow is
affected by the stack.

The Burroughs B6700: An integrated hardware-software approach is taken in designing the B6700.
The system is programmed using high level languages (e.g., ALGOL, COBOL and FORTRAN) and does
not offer the user a traditional machine level language. The system hardware is designed to
handle block structured languages and procedures efficiently by means of a stack mechanism and
display registers. One learns to appreciate the system architecture more when one becomes
familiar with the aspects of system implementation associated with block structured languages,
such as accessing global and local variables, address formation for accessing procedure
segments, and nesting of blocks.

Processor Organization

A block diagram of the processor [10] is shown in Figure 3 [11]. A program word is

[Figure 3 diagram. Labels: arithmetic controller; program controller (syllable decode); string
operator controller; operator families (arithmetic ops, logical ops, subroutine ops, word
oriented ops, scaling ops, value call op, name call op); operator dependent interrupts; stack
adjust controller; interrupt controller; transfer controller (internal transfers, input/output);
memory controller; program segment with program words 0 to n; program index register; program
syllable register (PSR); adder; "P" register; operator family "T" registers.]

Figure 3. A functional diagram of the Burroughs B6700, a stack processor.
(Reprinted, by permission of Burroughs Corporation, from The B6700 information reference manual 1058633).
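
The automatic stack adjustment performed by the stack controller of Figure 3 can be illustrated with a small model. When registers A and B both hold operands and a further operand is entered, the operand in B is moved into the stack's memory area, and register S records the address of the last word placed there. The sketch below, in Python, models only the push-down case. The class and field names are ours. That A holds the top operand, that the memory area grows upward from BOS toward SL, and that S equals BOS when the area is empty are assumptions made for illustration, not statements about the hardware.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class B6700Stack:
    """Toy model: two top-of-stack registers backed by a bounded memory area."""
    bos: int                          # bottom of stack (BOS) register
    sl: int                           # stack limit (SL) register
    a: Optional[int] = None           # register A (assumed to hold the top operand)
    b: Optional[int] = None           # register B (assumed to hold the second operand)
    memory: Dict[int, int] = field(default_factory=dict)
    s: int = field(init=False, default=0)  # S: address of the last word placed in memory

    def __post_init__(self) -> None:
        self.s = self.bos             # assumption: an empty memory area means S == BOS

    def push(self, operand: int) -> None:
        if self.a is None:            # A is empty: simply load it
            self.a = operand
            return
        if self.b is not None:        # A and B both full: move B into the memory area
            if self.s + 1 > self.sl:
                raise OverflowError("stack limit (SL) exceeded")
            self.s += 1
            self.memory[self.s] = self.b
        self.b = self.a               # previous top becomes the second operand
        self.a = operand
```

Pushing three operands into an empty stack leaves the two most recent ones in A and B and the first one in memory at the address held in S.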

transferred to the P register through the memory controller under the control of the program
controller. The program controller examines the instruction to be executed and initiates the
proper operator family. Related operators are grouped into a single operator family; thus family
A performs all arithmetic operations, family B performs all logic operations, and so on. The
arithmetic and string controllers are enabled by their respective family operators; they control
and supervise the execution of arithmetic and string operations.

The transfer controller contains registers A, X, B and Y necessary for setting up the stack. The
interrupt controller provides a method of interrupting the program flow by setting up necessary
control words in the stack. The processor takes the appropriate action by examining the control
words. The memory controller handles storage requests and contains an adder which is used to
generate addresses. The stack controller is responsible for automatic stack adjustment; it
manipulates data between main memory and the A and B registers during the pop-up and push-down
operations of the stack.

The machine language operators are composed of eight-bit syllables and each memory word consists
of six syllables. The operator syllable pointed to by the program syllable register is decoded,
and the operator family specified by the operator syllable is selected to receive the execute
signal. The T register in each family specifies the operation to be performed in that particular
family.

Stack Mechanism

Registers A and B are used to set up the stack [13] (Figure 4); registers X and Y are used as
extensions to A and B when double precision operands are used. The stack is formed by linking an
assigned area of the memory to A and B. When A and B are filled up with operands and a third
operand is entered into the stack the operand in B is pushed into the stack's memory area.
Register S contains the address of the last word placed into the memory area. The stack memory
area of a job is bounded by the bottom of stack (BOS) and stack limit (SL) registers.

[Figure 4 diagram. Labels: top of stack registers; stack area; stack memory.]

Figure 4. The stack mechanism of the B6700.
(Reprinted, by permission of John Wiley and Sons, Inc., from Multiprocessors and parallel processing,
P. H. Enslow (Ed.).)

Word formats used in the system are shown in Figure 5 and are determined by the hardware using
the leading three tag bits. Data is addressed by means of data descriptors (DD), indirect
reference words (IRW) and stuffed indirect reference words (SIRW). Program code is accessed
through segment descriptors. In descriptor words the address field contains the absolute address
of an array in either main or disc memory depending on whether the presence bit P is one or
zero. The length field defines the length of the array. IRW and SIRW are used to address data
located in the stack memory area of the job.

[Figure 5 diagram. Data words: single precision operand; double precision operand (1st word,
2nd word). Descriptor words: data descriptor (DD); segment descriptor (SD). Special control
words: mark stack control word (MSCW); program control word (PCW); return control word (RCW);
indirect reference word (IRW); stuffed indirect reference word (SIRW); top of stack control word
(TOSCW).]

Figure 5. Control word and data formats of the B6700.
(Reprinted, by permission of John Wiley and Sons, Inc., from Multiprocessors and parallel processing,
P. H. Enslow (Ed.).)

The stack keeps track of the stack history list and the addressing environment list. The stack
history list consists of blocks and procedures contained in the stack and hence is dynamic in
nature. The addressing environment list enables the system to access local and global variables
relative to the block or procedure presently under execution.

Mark stack control words (MSCW) are used to maintain both the stack history and addressing
environment lists. Figure 6 shows an example of how an MSCW is entered when a procedure is
entered. The parameters and local variables of the procedure are entered following the MSCW and
hence are referenced by addressing that is relative to the location of the MSCW. The DF fields
of the MSCWs keep track of the stack history whereas the DISP fields maintain the addressing
environment list.

The variables are accessed by means of address couples. An address couple consists of the
lexicographic level (LL) of the variable and an index value (σ) (Figure 6). When a variable is
specified by an address couple, one can deduce from the execution environment and the
lexicographic level which block or procedure specifies the variable. The display registers (D0
to D31) contain the addresses of the MSCWs of the procedures that are linked by the address
environment list to the procedure under execution. By adding the index value to the contents of
the appropriate display register the absolute address of the variable is generated. As an
example let V1 be referenced in Procedure B. Since V1 is represented by the address couple (2,2)
the system obtains the address of the appropriate MSCW from the display register D2 and adds 2
to it to obtain the address of V1. Note that the display registers must always


[Figure 7 diagram. The ALGOL program below is shown together with the segment trunk, the segment
dictionary, the stack area, and the IP and EP pointers.]

begin
  integer a, b, c;
  procedure neg(x, y);
    integer x, y;
    x ← x-y;
  Input values for a and b;
  neg(a, b);
  begin
    integer x;
    x ← a+b;
    begin
      integer y;
      y ← x**2;
      x ← x+y;
    end;
    sqrt(x);
  end;
  print(a+b+c);
end

Figure 7. Program representation in the B6700.
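
The address-couple mechanism described for the B6700 stack can be sketched in a few lines of Python. Under the assumptions stated here, each MSCW records its lexicographic level and a link to the MSCW of the enclosing environment. The display registers are refilled by following those links, and a variable's absolute address is the display register for its level plus its index value. The names MSCW, rebuild_display and resolve, the memory addresses, and the treatment of the DISP field as a direct address are ours, chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

NUM_DISPLAY_REGISTERS = 32  # D0 to D31


@dataclass
class MSCW:
    ll: int               # lexicographic level of the block or procedure
    disp: Optional[int]   # assumed: address of the enclosing MSCW (None at the outermost level)


def rebuild_display(memory: Dict[int, MSCW], current: int) -> List[Optional[int]]:
    """Follow the addressing environment list from the current MSCW and
    load each display register with the address of the MSCW at its level."""
    display: List[Optional[int]] = [None] * NUM_DISPLAY_REGISTERS
    addr: Optional[int] = current
    while addr is not None:
        mscw = memory[addr]
        display[mscw.ll] = addr
        addr = mscw.disp
    return display


def resolve(couple: Tuple[int, int], display: List[Optional[int]]) -> int:
    """Absolute address = contents of display register D[LL] + index value."""
    ll, index = couple
    base = display[ll]
    if base is None:
        raise LookupError(f"no MSCW linked at lexicographic level {ll}")
    return base + index
```

For the address couple (2,2) of the example V1, with a hypothetical MSCW for level 2 at address 1040, resolve returns 1042. Calling rebuild_display on procedure entry or exit corresponds to the requirement that the display registers always point to the current addressing environment list.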

point to the address environment list and have to be changed upon procedure exit or entry to the
correct MSCWs.

The stack mechanism also provides the capability of handling several active stacks by organizing
them as a tree. It is beyond the scope of the present paper to go into these details. Interested
readers should refer to Organick [27].

Representation, Interpretation and Transformation of Information

The information processing capabilities of a computer system depend to a large extent on how the
system interprets and represents the information. In this section we consider different types of
representation, interpretation, and transformation of information that a computer may use.

Program Representation in the Burroughs B6700: Consider the ALGOL program shown in Figure 7. The
system keeps track of the program segments by means of the segment dictionary. The instruction
pointer (IP) and Environment Pointer (EP) provide the physical location of the instruction in
the program as well as the environment under which the instruction will be executed. (We
discussed in the preceding section how the display registers which constitute the EP provide
environmental information and pointers are set up in the stack area for the execution record of
the program.) The IP is a three-tuple of the form either (1, i, j) or (0, i, j). (1, i, j)
indicates that the instruction
resides in the jth location of the ith segment in the segment dictionary. (0, i, j) means that a
"system intrinsic" (i.e., system routine) is addressed. The jth location of the ith segment in
the stack trunk which contains supervisory code segments and system tables is accessed. In the
example print and sqrt are system intrinsics. The segment descriptors indicate whether the
segments are in the main memory and how to locate the segments on the disc if they are not in
the main memory.

This kind of program representation leads to efficient memory utilization. Since the stack
trunk, stack segment and stack dictionary compactly represent the program status and its code
requirements, the working sets tend to be small. Also the representation leads to sharing of
programs and data. Though two programs may share a common procedure segment and hence have the
same instruction pointers, their EPs are different. For further discussion of the B6700 and its
system description see [27].

Representation and Interpretation of Instructions

Instructions determine the information flow and reflect the structure and capabilities of the
system. One of the principal duties of the computer architect is to develop a comprehensive
instruction set which is simple to use but exploits the system resources to their fullest
extent.

An instruction can be of zero-address, one-address or multi-address format. Zero-address
instructions are used in computers with stack processing where the use of operands on the top of
the stack is implied by the instruction. Single- and multi-address formats are used in computer
systems modeled after von Neumann systems. Richards [29] gives an excellent discussion of the
various address formats.

Addressing of operands in an instruction can be indexed or indirect. When the addressing is
indexed, the addresses of the operands are formed by adding or subtracting the contents of
"index" registers to the address parts of the instruction. When the addressing mode is indirect,
then the address of the operand can be found in the location specified by the address part of
the instruction. The instruction repertory of a computer system includes instructions which
allow jump operations and which handle subroutine operations.

The instruction set can be significantly affected by the system architecture. For instance,
consider tagged architecture [12] which differentiates integer operands from floating point
operands at machine level. In this case it is not necessary to have separate instructions for
floating point add and integer add. When an add instruction is issued, the system, by examining
the operands involved, decides whether it is a floating point or integer add. Another example of
the effect of architecture on the instruction set is the stack processor (e.g., the B6700).

Representation and Interpretation of Data

Knuth [25] explains data as: "representation in a precise, formalized language of some facts or
concepts, often numeric or alphabetic values, in a manner which can be manipulated by a
computational method." Inevitably, when data represents mathematical concepts or real life
situations, relationships between the data elements are bound to exist. Since data is a
representation, the precision to which the representation should be expressed becomes a factor.
A properly conceived computer system should cope with the problems of relational and precision
representations of data.

Floating point numbers can be represented in the IBM 370 system series in short (32 bits), long
(64 bits) or extended precision (128 bits) form. The system uses the byte as its basic storage
unit which consists of 8 bits. When a floating point instruction like ADR (add long floating
point) is to be executed, the system fetches 8 consecutive bytes into the central processing
unit (CPU) where the left-most (or the first) byte is the one addressed in the instruction. The
system recognizes the operands as floating point, integer, etc., by examining the instruction
rather than the operands.

In the Burroughs B6500/7500 organization, data words are distinguished as single
Computing Surveys, Vol. 8, No. 2, June 1976
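The indexed addressing rule described in this section (the operand address is formed by adding or subtracting the contents of an index register to the address part of the instruction) can be illustrated with a short sketch. The Python below is an illustration under stated assumptions, not any particular machine's address logic: the function name `effective_address`, the dictionary of index registers and the 24-bit address width are all hypothetical.

```python
def effective_address(address_part, index_registers, index_reg=None,
                      subtract=False, address_bits=24):
    """Form an operand address from the address part of an instruction.

    The register numbering and the 24-bit width are assumptions made
    for this sketch only.
    """
    if index_reg is None:
        # No index register named: the address part is used directly.
        return address_part
    offset = index_registers[index_reg]
    raw = address_part - offset if subtract else address_part + offset
    # Wrap the result to the assumed address width.
    return raw % (1 << address_bits)


index_registers = {1: 8, 2: 100}
print(effective_address(1000, index_registers, index_reg=1))                 # 1008
print(effective_address(1000, index_registers, index_reg=2, subtract=True))  # 900
```

Stepping the index register between uses lets the same instruction address successive elements of a table, which is the usual reason for providing the mode.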



precision (48 bits) or double precision (96 bits) operands by attaching 3 tag bits to every word (51 bits). Data may be referenced as an operand (without any qualifications), and the processor knows by examining the tag bits whether the operand is single or double precision. For example, when a command is issued to store an operand on the top of the stack, the word specified by the operand's address is fetched and examined. If the operand is double precision, then the next word is also fetched and stored in the stack. The system recognizes the type of operand, e.g., integer, or floating point integer, or extended precision by examining the instruction. Tag bits also distinguish data words from the program code. Hence, when a job attempts either to execute data as part of the program or to modify the program, an interrupt is issued.

Transformation (or Dynamic Representation) of Information

The representation of information can be static or dynamic. However, a computer may be used to determine dynamically the changes in the representation of information that are needed for user convenience, system efficiency, and privacy. Programs are usually represented at the user level in high level languages. The representation of programs is then changed to machine level languages for execution. Changes of representation are performed for the convenience of the user. (Note that this concept makes it possible to distinguish between systems which use compilers and interpreters for program execution.) Further examples of this type of representational changes include automatic transformation of ASCII characters to EBCDIC by the system.

System efficiency may dictate that different representations of information be used in different situations. Sparse matrices (matrices whose elements are mostly zero) can be economically stored by means of binary patterns and lists of nonzero values. The use of ones in binary patterns indicates that their corresponding matrix elements are nonzero. The values of these elements are obtained by choosing the appropriate values from the list. An economy in storage results because binary patterns can usually be compacted and stored as binary words. When the matrices are not sparse it may be efficient to store them in major order fashion in rows or columns. Another example of where representations may be changed is in the storing and accessing of data elements. The elements may be stored and accessed by a table look-up or by hash coding techniques depending on the application.

Privacy considerations may also warrant making changes in the representations of information. Consider the system where a common data bank has to be shared by different users and each user is authorized to access only some portions of the data. The system may encode the data supplied by the user and store the encoded data in the bank. The conversions of encoding or privacy transformation are performed to ensure that only authorized users can gain access to a data set. When the user supplies the proper identification the system decodes and presents the requested data to him. Privacy transformations are discussed in [20].

Physical Organization and Control and Information Flow

It is sometimes necessary to create new control paths among the physical resources of a computer system to exploit the parallelism that is present in hardware and programs and to increase the system's performance. In such cases both the physical and control elements contribute to the desired objective. A typical application of this approach may be seen in the overlapped operation of I/O and processor computation found in most contemporary computer systems. Sometimes information flow is controlled to exhibit to the user a machine architecture that is not real. An example of this is the compatibility feature found in the IBM 370 series. We examine selected architectural features of the IBM 370 series and the CDC 6600 computers and indicate how these features can be explained as a combination of physical organization and control of information flow.

IBM System/370 Series. Let us now
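The tag-directed operand fetch described for the Burroughs B6500/7500 (examine the tag of the addressed word, fetch the next word as well if the operand is double precision, and interrupt if a program word is used as data) can be sketched as follows. This is a minimal illustration and not the Burroughs implementation: the numeric tag values, the `Word` class, the `TagInterrupt` exception and the `push_operand` function are hypothetical names chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical tag encodings, chosen for this sketch only.
TAG_SINGLE = 0
TAG_DOUBLE = 2
TAG_CODE = 3


@dataclass
class Word:
    tag: int    # the 3 tag bits
    value: int  # the 48 data bits


class TagInterrupt(Exception):
    """Stands in for the interrupt raised when program code is used as data."""


def push_operand(memory, address, stack):
    """Fetch the operand at `address` and place it on top of `stack`."""
    word = memory[address]
    if word.tag == TAG_CODE:
        raise TagInterrupt(f"word at address {address} is program code")
    stack.append(word)
    if word.tag == TAG_DOUBLE:
        # A double precision operand occupies two words: fetch the next one too.
        stack.append(memory[address + 1])
    return stack


memory = [Word(TAG_SINGLE, 5), Word(TAG_DOUBLE, 1), Word(TAG_DOUBLE, 2), Word(TAG_CODE, 0)]
print(len(push_operand(memory, 0, [])))  # 1
print(len(push_operand(memory, 1, [])))  # 2
```

The instruction names only an address; the decision to fetch one word or two is taken from the data itself, which is the point the paragraph makes about tagged words.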

[Figure 8 diagram: main storage, multiplexer, channels, I/O interface, control units and input/output (I/O) devices, connected by CPU-channel control lines and data transfer lines.]

Figure 8. Architecture of the IBM System/370. (Reprinted, by permission of IBM Corporation, from Harry Katzan, Jr., Computer organization and the System/370.)
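The sparse matrix representation mentioned in the discussion of dynamic representation (binary patterns marking the nonzero positions, plus a list of the nonzero values) can be sketched in Python. The function names `compress` and `lookup`, and the choice of one integer bit pattern per row with values kept in row-major order, are assumptions made for this illustration; the paper does not prescribe a layout.

```python
def compress(matrix):
    """Return (patterns, values) for a matrix given as a list of rows.

    patterns[r] has bit c set when matrix[r][c] is nonzero;
    values lists the nonzero elements in row-major order.
    """
    patterns, values = [], []
    for row in matrix:
        bits = 0
        for col, element in enumerate(row):
            if element != 0:
                bits |= 1 << col
                values.append(element)
        patterns.append(bits)
    return patterns, values


def lookup(patterns, values, row, col):
    """Recover matrix[row][col] from the compressed form."""
    if not (patterns[row] >> col) & 1:
        return 0
    # Count the nonzero elements stored before position (row, col).
    before = sum(bin(p).count("1") for p in patterns[:row])
    before += bin(patterns[row] & ((1 << col) - 1)).count("1")
    return values[before]


matrix = [[0, 0, 3],
          [0, 0, 0],
          [7, 0, 9]]
patterns, values = compress(matrix)
print(patterns, values)                  # [4, 0, 5] [3, 7, 9]
print(lookup(patterns, values, 2, 2))    # 9
```

For this 3 by 3 example the saving is negligible, but for a large matrix each zero element costs one bit instead of a full word, which is the economy the text refers to.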

consider some salient architectural features of the IBM 370 series [24], the main feature of which is the compatibility between the various models in the series. Though the models vary in their designs, performance indices, and prices, they exhibit the same architecture to the user. This is accomplished by providing the same instruction set and employing microprogramming in all but the largest models. An advantage of this feature is that the user is offered models of varying performance indices and storage capacities. If the user finds that a more (less) powerful system is needed than the one currently being used, he can switch to a higher (lower) numbered model without reprogramming. A disadvantage is that the models lose some of their performance in maintaining compatibility. Small models use much of their memory in providing compatible software and are burdened by sophisticated I/O features suitable for larger models. Large models can be operated more efficiently if the downward compatibility feature is not required [13].

Basic CPU Organization

The user sees the architecture of the system as in Figure 8. The central processor consists of sixteen general registers and four floating point registers. The general registers can be used to hold operands or as index and base address registers. The floating point


registers hold floating point quantities. The registers reduce the number of memory accesses for data by storing temporary operands; this reduction in memory accesses in turn reduces conflicts for memory by the I/O and CPU units. The system has a program status word (PSW) register whose contents indicate the status of the program under execution; this enables the system to handle interrupts and multiprogramming.

Each model has a different engineering design. For instance, the simplest model 125 does not provide any hardware adders whereas model 165 has an address adder, a parallel adder, and a serial adder. The instruction formats for the system are shown in Figure 9. They specify the contents of registers and/or memory locations as operands. The instruction execution appears to the user as sequential; however, high performance models employ overlapping of instruction and operand fetching with instruction execution, and prefetching of instructions along both paths of a branch. The system hardware cannot recognize structured operands (e.g., vectors and matrices) and it is up to the programmer to make the system recognize such operands by programming.

[Figure 9 diagram: the RR, RX, RS, SI and SS instruction formats, showing the op code, register, index, base, displacement, immediate and length fields of each.]

Figure 9. Instruction formats of the IBM/370. (Reprinted, by permission of IBM Corporation, from Harry Katzan, Jr., Computer organization and the System/370.)

I/O Handling

The responsibility of I/O handling is shared by the control unit of the CPU and I/O channels. The channels have their own registers and are capable of performing data transfers between memory and I/O devices. The CPU specifies in its I/O commands where the channels can find their commands in the main storage. The I/O requests for memory are given top priority (i.e., the cycle stealing technique is used). A significant feature of the I/O organization is that the user can configure the system by attaching or removing I/O devices.

Memory Organization

There is no one fixed memory organization for all the models. Models 155 and 165 provide 4K buffer storage systems in addition to their main storage units. The buffer and main storage are organized into rows and columns. At the intersection of each column and row there is a block of 32 bytes, i.e., the storages are partitioned into blocks of 32 bytes and each block can be specified by a row and column. (A byte consists of 8 bits.) An address array maintains the addresses of the elements in the buffer. When the CPU makes a storage reference, the address array is consulted to determine whether the referenced element is in the buffer. If the element is present it is sent to the CPU; otherwise the element is fetched from the main storage and dispatched to the CPU. Then the block containing the element is stored in the buffer in the following manner: Blocks are transferred from the main storage to the buffer columnwise, i.e., a block in column i of the main memory is transferred to column i of the buffer. The block that is to be stored in the buffer replaces the least recently used block in its column. Model 165 also uses interleaving for its memory organization.

The storage system provides a protection feature which can be used in multiprogramming. The feature is implemented by dividing the main store into blocks (of 2048 bytes) and assigning storage keys to the blocks. Each active program has a protection key associated with it. Usually
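The buffer storage procedure described above for Models 155 and 165 (consult the address array, fetch from main storage on a miss, then place the block in the same column of the buffer, displacing the least recently used block of that column) can be sketched as a small simulation. The class name `BufferStorage`, the tiny sizes used in the example, and in particular the rule that a block's column is its block number modulo the number of columns are assumptions made for this sketch; the text does not say how main storage blocks are assigned to columns.

```python
from collections import OrderedDict

BLOCK_BYTES = 32  # block size given in the text


class BufferStorage:
    def __init__(self, columns, rows):
        self.columns = columns
        self.rows = rows
        # Address array: for each column, the resident block numbers,
        # ordered from least to most recently used.
        self.address_array = [OrderedDict() for _ in range(columns)]

    def reference(self, byte_address):
        """Return True on a buffer hit, False if main storage was used."""
        block = byte_address // BLOCK_BYTES
        column = block % self.columns  # assumed block-to-column mapping
        resident = self.address_array[column]
        if block in resident:
            resident.move_to_end(block)      # now the most recently used
            return True
        if len(resident) >= self.rows:
            resident.popitem(last=False)     # displace least recently used
        resident[block] = None
        return False


buffer = BufferStorage(columns=2, rows=2)
print([buffer.reference(a) for a in (0, 16, 64, 0, 128, 64)])
# [False, True, False, True, False, False]
```

In the run shown, the reference to address 128 displaces the block holding address 64 rather than the one holding address 0, because the latter was used more recently.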


the operating system assigns the protection keys to the programs. A program can store in a block only when the protection key of the program matches the storage key of the block or the protection key is zero. The storage operation is inhibited and an alarm signal is given if the keys do not match and the protection key is not zero. The storage key has an extra bit which protects the fetch operation. If the bit is zero, only the store operation is protected. Otherwise both store and fetch are protected.

CDC 6600 Memory Organization. The CDC 6600 memory hierarchy consists of a fast central storage and slow extended core storage (ECS) [32]. The central storage of 131,072 60-bit words is composed of 32 independent banks. The banks are interleaved to provide high block transfer rates. The computer has two cycles, major (1000 nanoseconds (nsec)) and minor (100 nsec). The storage read and store cycle takes one major cycle, whereas transferring a data word through the storage distribution system takes one minor cycle. There is a mechanism called the stunt box which examines the requests and directs information flow in and out of the central storage. When the stunt box accepts a new access request it decides whether the bank requested is busy or free; if the bank is free the read and store cycle is initiated. If the bank is busy, the address requested is circulated within the stunt box. The stunt box can hold three circulating addresses and each circulation takes 300 nsec. Top priority is given to the addresses in circulation for access to the storage. Because of the circulation time (300 nsec) and the major cycle time (1000 nsec) the mechanism prevents permanent recirculation of any request. In case of consecutive requests to the same bank the requests are satisfied after at most two major cycles.

The stunt box is also responsible for attaching priorities to requests coming from the central processor unit and peripheral processing units. It prevents the situation where, among the recirculating addresses, read and write requests are made to the same storage location. This is because of the stunt box's out-of-order recirculation properties. It also checks whether the requested address violates bounds. The storage distribution system is responsible for transferring requests and data to and from the central storage.

The secondary storage consists of a core memory of 15,744 488-bit words. The 488-bit words (8 bits of which are parity bits) are disassembled to 60-bit words used in the central storage. The CPU can transfer any number of 60-bit words between the central store and ECS by simple commands. The major advantage of the ECS is that it can transfer blocks of information at a rate of 60 million bits/second. It may be directly addressed but at a considerably slower rate. Thus, its principal use is as a high speed buffer.

CDC 6600 I/O Handling. Ten peripheral processing units (PPU) handle I/O activities. Each PPU consists of four registers and a storage unit of 4096 12-bit words. The processors are arranged in the form of a "barrel". The barrel has ten positions and each position is occupied by a PPU. There is one position called the "slot" which is capable of accessing and utilizing arithmetic and logic hardware; the ten PPUs share the slot by circulating the contents of their four registers. When a PPU is in the slot it stays there for 100 nsec and uses the arithmetic and logic hardware to execute the program stored in its storage unit. A PPU instruction requires one or more steps of execution with each step taking 1000 nsec. It can be noted that the central storage read and store cycle takes 1000 nsec, which is the time interval between consecutive uses of the slot by any one PPU. The PPU can transfer data between peripheral devices and main memory and supervise the operation of the devices. The PPUs have the capability of establishing paths to I/O devices through twelve peripheral channels. A PPU can interrupt the operation of the central processor by means of an exchange jump. When an exchange jump is issued, the CPU makes an exchange between the contents of its 24 registers and the contents of the "exchange package" which starts at a location in the central storage specified by the PPU. The exchange
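The System/370 storage protection rule described earlier (a store succeeds only if the program's protection key matches the block's storage key or the protection key is zero, and an extra bit in the storage key extends the same check to fetches) reduces to a small predicate. The sketch below is illustrative only: the names `StorageKey`, `block_index` and `access_allowed` are hypothetical, and the alarm signal is modelled simply as a `False` result.

```python
from dataclasses import dataclass

BLOCK_BYTES = 2048  # protection block size given in the text


@dataclass
class StorageKey:
    key: int
    fetch_protect: bool = False  # the extra bit that also protects fetches


def block_index(byte_address):
    """Index of the 2048-byte block containing a byte address."""
    return byte_address // BLOCK_BYTES


def access_allowed(storage_key, protection_key, is_store):
    """Apply the key-matching rule to one storage access."""
    if protection_key == 0 or protection_key == storage_key.key:
        return True
    if is_store:
        return False                      # mismatched store is inhibited
    return not storage_key.fetch_protect  # fetch checked only if bit is set


keys = [StorageKey(5), StorageKey(7, fetch_protect=True)]
print(access_allowed(keys[block_index(100)], 5, is_store=True))    # True
print(access_allowed(keys[block_index(3000)], 5, is_store=False))  # False
```

A program running with protection key 0 passes every check, which matches the text's special case for the zero key.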


package consists of 16 words and specifies the new contents of the 24 central registers. Once the exchange is made the CPU starts on the new program specified by the program address register (note this register is one of the central registers). A PPU is also capable of monitoring the CPU by transferring the contents of the CPU program address register to one of its registers. The CPU can also initiate the exchange jump.

Physical Organization, Control of Information Flow and Representation and Interpretation of Information

Now we consider the architectural feature of microprogramming which can only be explained when all three components of architecture are used. In the literature this feature is usually associated with system architecture. The reason for this association is that microprogramming is able to present to the user an architecture that is not a real machine architecture.

Microprogramming: Husson [21] proposes the following definition: "Microprogramming is a technique for designing and implementing the control function of a data processing system as a sequence of control signals, to interpret fixed or dynamically changeable data processing functions. These control signals, organized on a word basis and stored in a fixed or dynamically changeable control memory, represent the states of the signals which control the flow of information between the executing functions and the orderly transition between these signal states."

A basic microprogramming scheme [33] is shown in Figure 10. Register I contains an address which is decoded by the decoder (D). The horizontal line in the read only memory (ROM) that corresponds to the address is activated and issues signals. The signals under Matrix A control the data paths of the arithmetic units, registers, etc., of the computer system. The signals of Matrix B specify the next address to be decoded and are forwarded to Register II. Conditional jumps can be handled as shown at X. A flip-flop whose state can be controlled

[Figure 10 diagram: Register I, Register II, Matrix A (control signals to arithmetic unit, etc.) and Matrix B, with an input from the condition flip-flop.]

Figure 10. A simple elementary microprogramming scheme.
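The scheme of Figure 10 can be mimicked in a few lines of Python: each ROM line yields a set of control signals (Matrix A) and a next address (Matrix B), and a condition flip-flop set by earlier orders selects between two possible next addresses. This is a sketch under assumptions, not a model of any real control store: the `MicroOrder` fields, the `run` function, the `datapath` callback, the halt address and the copying of Register II back into Register I at the end of each step are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MicroOrder:
    control_signals: Tuple[str, ...]   # Matrix A outputs
    next_address: int                  # Matrix B output, flip-flop clear
    next_if_set: Optional[int] = None  # alternative Matrix B line, flip-flop set


def run(rom, start, datapath, halt_address, max_steps=100):
    """Step through the ROM and return the sequence of issued signal sets."""
    register_i = start
    flip_flop = False
    trace = []
    for _ in range(max_steps):
        if register_i == halt_address:
            break
        order = rom[register_i]              # decoder activates one ROM line
        trace.append(order.control_signals)  # Matrix A drives the data paths
        # The flip-flop state left by previous orders picks the Matrix B line.
        if order.next_if_set is not None and flip_flop:
            register_ii = order.next_if_set
        else:
            register_ii = order.next_address
        flip_flop = datapath(order.control_signals, flip_flop)
        register_i = register_ii             # assumed transfer back to Register I
    return trace


counter = {"value": 0}

def datapath(signals, flip_flop):
    """Toy data path: a counter whose zero test sets the condition flip-flop."""
    if "load_counter" in signals:
        counter["value"] = 2
    if "decrement" in signals:
        counter["value"] -= 1
    if "test_zero" in signals:
        return counter["value"] == 0
    return flip_flop


rom = {
    0: MicroOrder(("load_counter",), next_address=1),
    1: MicroOrder(("decrement", "test_zero"), next_address=2),
    2: MicroOrder(("add",), next_address=1, next_if_set=3),
}
print(len(run(rom, start=0, datapath=datapath, halt_address=3)))  # 5
```

Here every signal name stands for one gated data path, which corresponds to the horizontal format discussed in the text; a vertical format would pack the same information into encoded fields.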


by the previous orders issued by Matrix A decides which of the two lines in Matrix B is to be energized.

The signals issued when any line of the ROM is activated form a microorder. The format used for microorders can be either horizontal or vertical depending on how the orders are interpreted. In a horizontal format, each signal under Matrix A directly controls a gated data path. In a vertical format, the signals are organized into fields and each field controls the operations of a particular section (like an adder) of the computer system. In this format, encoding of signals is performed and hence horizontally formatted microorders are usually longer than vertically formatted ones. Vertical format microorders sometimes resemble machine language instructions in that they have operand and address fields. Maximal parallelism at hardware level can be exploited by using horizontal format microorders, but generating these orders can be cumbersome and time consuming.

Microprogramming has been used in widely differing contexts. For its applications the interested reader should refer to Flynn and Rosin [15]. Present day large systems like the CDC 6600/7600 and IBM 360/195 do not use microprogramming for their control units. It appears that microprogramming is used in practice, not for its systematic implementation of the control section, but for its ability to offer emulation capabilities. It is interesting, however, to note that microprogramming is used to implement the control of the streaming unit of the CDC STAR 100 [23].

ARCHITECTURAL CONCEPTS AND CONSIDERATIONS

In this section we discuss the advantages and disadvantages of some architectural concepts. At first view they may appear to be totally unrelated to each other; however a little thought will reveal that each of these concepts can be categorized under one or a combination of the three components of architecture. Thus, a framework based on our proposal that architecture is composed of these three components can accommodate these seemingly unrelated and diverse concepts. We conclude by considering some of the problems and trade-offs an architect faces in implementing these concepts and in evolving an architecture.

Array Organization

In this organization identical processors are connected in an array fashion. The ILLIAC IV is a familiar example of this type of organization. The ILLIAC IV operates in a single instruction stream-multiple data stream mode (SIMD) [14], i.e., at any time all the enabled processors execute a single instruction (issued by a single control unit) on different data; the processors that are not enabled do not execute the instruction. However, with suitable operating systems, it should be possible for array processors to handle the multiple instruction stream-multiple data stream mode of operation. The array organization is very effective in exploiting parallelism when the characteristics of the problem to be solved match the physical structure. Matrix operations provide an example of this kind of problem. When all the processors are identical, manufacturing and maintenance are greatly simplified. A disadvantage of the array organization is the poor utilization of resources that may result when the problem structure does not match the physical structure. The failure of a single processing element can hamper the operation of the entire system; a sophisticated system could, however, create new and alternate data paths for continued operation of the system.

Pipeline Organization

This organization consists of functional units arranged in a pipeline where each functional unit handles a particular task. It is often used in commercial computer systems to improve system performance. Examples of this organization include instruction handling in the IBM 360/91 and the arithmetic pipeline units in the TI ASC [31] and CDC STAR [19] systems. It is well suited to handling job streams where all the jobs go through the same processing
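The SIMD mode described under Array Organization (one instruction issued by a single control unit, executed by every enabled processor on its own data and skipped by the rest) can be illustrated with a short sketch. The function names `simd_step` and `utilization` and the representation of the enable state as a list of booleans are assumptions made for this example; they do not describe the ILLIAC IV hardware.

```python
def simd_step(operation, local_data, enabled):
    """Apply one broadcast instruction in every enabled processing element.

    Disabled elements keep their data unchanged.
    """
    return [operation(x) if on else x for x, on in zip(local_data, enabled)]


def utilization(enabled):
    """Fraction of processing elements doing useful work in this step."""
    return sum(enabled) / len(enabled)


data = [1, 2, 3, 4]
mask = [True, False, True, True]
print(simd_step(lambda x: x * 10, data, mask))  # [10, 2, 30, 40]
print(utilization(mask))                        # 0.75
```

The utilization figure makes the stated disadvantage concrete: when the problem does not fill the array, the disabled elements are simply idle for that instruction.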


stages. Most vector operations can, for example, be operated in this manner. Pipeline organization loses its efficiency when some jobs require a processing sequence different from that of the pipeline. Job dependencies adversely affect the job flow and hence the efficiency of this organization. Since the processing of jobs becomes "diffused" (at any instant the pipeline contains jobs at different levels of completion), interrupts and machine malfunctions cannot be handled satisfactorily. For instance, the architecture of the IBM 360/91 has to settle for what is referred to as an "imprecise interrupt" [3].

Modular Organization

This organization consists of independent functional units (capable of performing specialized tasks) and/or processors (capable of performing any task). Tasks, when they are ready, are dispatched to the appropriate functional units or processors (usually by the supervisor of the organization). The central processing unit of a CDC 6600 employs a modular organization in which there are ten independent modules. In the SYMBOL system, function modules are dedicated to perform portions of the computing process such as translation, memory control, garbage collection, central processor, or other processes. In contrast to array and pipeline organizations the modular organization usually has a variable structure. The supervisor of the modular organization can, by establishing appropriate data flow paths, simulate any particular structure (e.g., a pipeline or an array).

An advantage of this type of organization is the enhanced performance obtainable by using overlap and distributed function computation. The organization can ensure graceful degradation of performance in case of system component failures. Graceful degradation is achieved by having multiple function modules of the same type; when a module fails, its task can be assigned to another module. On the other hand, the supervisory system for such an organization tends to be complex because it has the additional responsibility of properly dispatching tasks and ensuring correct execution of the program (by preserving task precedences). The lack of structure in this organization can increase the overhead of dispatching tasks. The processing of jobs is "diffused" as in the pipeline organization.

Stack Processing

In this type of processing, information flow between central registers is controlled in such a way that a pushdown store (or a stack) is realized [7]. New operands, which are entered into the top register of the stack, cause a "pushdown" action to occur, i.e., the contents of each register move down by one register level. Binary operations can be performed on the top two registers with the result being returned to the top register. The contents of the top register can be stored in main memory. The Burroughs B6500 and English Electric KDF-9 employ stack processing.

The following discussion of advantages and disadvantages of stack processing is based on Brooks [7]. Stack processing minimizes main memory data references when evaluating algebraic expressions. With stack processing, shorter program representation is possible as most operand addresses can be eliminated. It simplifies subroutine management and compilation of source programs, especially those programs with recursive definitions. Stack processing makes it easier to handle block structured languages like ALGOL. However, this type of processing is helpful only if the items that are to be processed can be made to "surface" to the top of the stack. A further disadvantage is that many stacks, such as a stack for control and a stack for data, are often needed for satisfactory operation. When variable length fields are used, stack registers must be of variable length to accommodate the values selected from these fields. This often proves to be difficult to implement.

Virtual Memory

By automatic control of information flow between the main and secondary memories, a system with virtual memory [11] gives the
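The pushdown behaviour described under Stack Processing (new operands enter at the top, binary operations consume the top two entries and return the result to the top, and the top can be stored in main memory) can be sketched as a tiny interpreter. The opcode names `PUSH`, `STORE`, `ADD`, `SUB` and `MUL`, the `execute` function and the decision that `STORE` removes the top entry are assumptions made for this illustration; they are not the B6500 or KDF-9 instruction sets.

```python
import operator

# Hypothetical zero-address operations for this sketch.
BINARY_OPS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def execute(program, memory):
    """Run a list of instructions against a dictionary acting as main memory."""
    stack = []  # stack[-1] models the top register
    for opcode, *operand in program:
        if opcode == "PUSH":
            # Entering a new operand pushes the earlier contents down.
            stack.append(memory[operand[0]])
        elif opcode == "STORE":
            # Assumed to remove the top entry while storing it.
            memory[operand[0]] = stack.pop()
        else:
            # Zero-address binary operation on the top two entries.
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY_OPS[opcode](left, right))
    return stack


memory = {"a": 2, "b": 3, "c": 4, "r": None}
program = [("PUSH", "a"), ("PUSH", "b"), ("ADD",),
           ("PUSH", "c"), ("MUL",), ("STORE", "r")]
execute(program, memory)
print(memory["r"])  # 20, i.e. (a + b) * c
```

Only the `PUSH` and `STORE` instructions carry an address; the arithmetic instructions name no operands at all, which is why the text notes that program representation becomes shorter and that the intermediate sum never has to be written to main memory.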


programmer an illusion of operating with a main memory that is larger in capacity than the actual memory. This is accomplished by dividing the address space into blocks of contiguous addresses and storing them in both the main and secondary memories. When the programmer makes a reference to an item not present in the main memory, the computer system automatically transfers the block containing the referred item from the secondary to the primary memory. The new incoming block will displace a resident block according to some fixed rule if the main memory cannot accommodate the new block. When the blocks are of variable size, one has "segmentation;" when they are of fixed size, the situation is referred to as "paging."

The principal advantage of virtual memory is that the user can be indifferent to main memory limitations in his programming. He need not concern himself with the problems of overlays and memory management. The large address space provided by virtual memory also simplifies multiprogramming. On the other hand, efficient utilization of the main memory is not always possible. Paged systems round up storage requests to the nearest integral number of pages and this sometimes causes appreciable loss of the main memory ("fragmentation"). Multiprogrammed systems sometimes exhibit performance degradation which is due to a phenomenon known as "thrashing" [11].

Virtual Machines

By means of hardware and software control of information flow a single computer system presents to the users multiple exact copies of the system. Each user is given the illusion that he has the complete computer system at his disposal. As an example, IBM's VM/370 offers the user a virtual IBM 370 system on which he can run any System/370 or System/360 operating system. The virtual machine, of course, runs several times slower than the real machine. The appearance of multiple copies of the basic machine is handled by the virtual machine monitor which interfaces the user's operating system and the real machine. For details concerning the implementation of the monitor, refer to Madnick and Donovan [26]. An advantage of virtual machines is that the users can run different operating systems on the same real machine at the same time. On the negative side, the virtual machine is several times slower because there is overhead associated with the monitor.

Parallel Processing

In this type of processing, the performance of a computer system is increased by introducing control and data paths among its hardware resources. For our purposes we consider parallel processing at bit and task levels. We follow the model of Shore [30] for bit level processing. Figure 11 shows a system which consists of a data memory (DM), an instruction memory (IM) and a control unit (CU). In the DM, words are stored horizontally. A bit (word) slice is any set of bits exposed by a single vertical (horizontal) cut through the DM. The word slice processing unit (WSPU) can operate on word slices whereas the bit slice processing unit (BSPU) operates on the bit slices. In Shore's terminology, Machine I refers to the system with only word slice processing capabilities, Machine II refers to the system with only bit slice processing capabilities and Machine III has both of the processing capabilities. (There are also Machines IV, V and VI which are best considered at task level.) It is interesting to note that Machine I is a conventional sequential processor and Machine II is a bit serial associative processor. Shore's scheme does not fit Flynn's classification [14]. Shore states: "In terms of a taxonomy introduced by Flynn, it is often stated that Machine II is a single-instruction-stream, multiple-data-stream processor whereas Machine I is not. In fact, they both are. Machine I processes multiple-bit-streams a word slice at a time, whereas Machine II processes multiple-word-streams a bit slice at a time. The myopic association of multiple-data-streams with multiple-word streams is a conceptual error having nothing to do with computing power."

Shore considers the ratio of processing
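Two points from the Virtual Memory discussion lend themselves to a small worked example: the storage lost when a paged system rounds a request up to a whole number of pages, and the displacement of a resident block "according to some fixed rule" when main memory is full. The sketch below is illustrative only. The function names are hypothetical, the page size of 1024 words is an arbitrary choice, and first-in first-out is used as the fixed rule purely as an assumption, since the text does not name one.

```python
from collections import deque


def pages_needed(request_words, page_words):
    """Return (pages allocated, words wasted) for one storage request."""
    pages = -(-request_words // page_words)  # ceiling division
    wasted = pages * page_words - request_words
    return pages, wasted


def count_page_faults(reference_string, frames):
    """Count transfers from secondary memory under FIFO replacement."""
    resident = deque()  # pages in main memory, oldest first
    faults = 0
    for page in reference_string:
        if page in resident:
            continue                 # item already in main memory
        faults += 1                  # block must be brought in
        if len(resident) == frames:
            resident.popleft()       # displace the oldest resident page
        resident.append(page)
    return faults


print(pages_needed(2500, 1024))                       # (3, 572)
print(count_page_faults([1, 2, 3, 1, 4, 1], frames=3))  # 5
print(count_page_faults([1, 2, 3, 1, 4, 1], frames=4))  # 4
```

Comparing the last two lines shows how sensitive the fault count is to the amount of main memory available to a program, which is the mechanism behind the thrashing the text mentions for multiprogrammed systems.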

[Figure 11 diagram: data memory with a word slice and a bit slice marked, the bit slice processing unit, the word slice processing unit and the instruction memory.]

Figure 11. A bit-level processing computer system.
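The distinction between word slices and bit slices in Shore's model can be made concrete with a small sketch in which the data memory is a list of integers, one per stored word. The functions below are hypothetical illustrations: `match_word_serial` visits one word slice at a time in the manner attributed to Machine I, while `match_bit_serial` visits one bit slice at a time across all words in the manner of the bit serial associative processor, Machine II. Neither is taken from Shore's paper.

```python
def word_slice(dm, row):
    """One horizontal cut through the data memory: a whole word."""
    return dm[row]


def bit_slice(dm, bit):
    """One vertical cut: the same bit position taken from every word."""
    return [(word >> bit) & 1 for word in dm]


def match_word_serial(dm, key):
    """Compare the key with each word in turn (word slice processing)."""
    return [word_slice(dm, row) == key for row in range(len(dm))]


def match_bit_serial(dm, key, word_bits):
    """Compare the key with all words at once, one bit slice per step."""
    match = [True] * len(dm)
    for bit in range(word_bits):
        key_bit = (key >> bit) & 1
        match = [m and b == key_bit for m, b in zip(match, bit_slice(dm, bit))]
    return match


dm = [0b1010, 0b0110, 0b1010]
print(bit_slice(dm, 3))                   # [1, 0, 1]
print(match_bit_serial(dm, 0b1010, 4))    # [True, False, True]
```

Both search functions return the same answer; the word-serial version takes one step per word and the bit-serial version one step per bit, so which is faster depends on the shape of the data memory rather than on any difference in what can be computed.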

hardware to memory hardware for evaluating the effectiveness of his
machines. As can be noted, the ratio also reflects the effect of creating
information flow paths by means of physical organization.

At the task level there are many approaches to parallel processing. One
approach consists of the functional decomposition of tasks and their
dispatch to independent functional units that are specialized to execute
them (e.g., the CPUs of the CDC 6600 and IBM 360/91). In another approach,
the functional or processing units have a fixed operand and control
routing structure (like a pipeline or an array) imposed on them. When the
system consists of equally capable processing systems, its mode of
operation can be characterized by single instruction-multiple data streams
(e.g., the ILLIAC IV and PEPE) or multiple instruction-multiple data
streams [14].

Central to parallel processing is the problem of recognition of
parallelism in programs and task scheduling to achieve maximal
concurrency. Extensive work has been done and is continuing in these
problem areas. Baer [4] gives a good survey of the work done.

Tagging of Information

Iliffe [22], in his proposal of a basic language machine (BLM), suggests
tagging of data and address descriptions for identification at machine
level. From the tags associated with data, it is possible to recognize
their precision and type (floating or fixed point, etc.). Addresses can be
specified by "code words". A code word can specify a block of contiguous
words which starts at the location given by the address part of the code
word. The length of the block is also specified by the code word. A code
word can also specify a set which consists of code words and data. It can
be noted that structural data representations can be easily handled by the
code word scheme.

In the BLM, tags are also used to implement "escape actions." Whenever the
machine encounters an operand whose tag specifies an escape action, the
machine interrupts and follows the appropriate action. Numerical
overflows, invalid addresses and unauthorized storage accesses can be
handled by escape actions.

Iliffe claims the following advantages for his BLM. The machine can
recognize the information structure of a program at machine level; this
increases the versatility of the machine. Because of the use of code
words, the linear store structure of a conventional computer system is
avoided. This is helpful in multiprogramming storage allocation schemes.
Since data are identified by tags, instructions need not specify the data
types; a single add instruction is sufficient to specify every add type.
This results in a smaller instruction set. Also one can detect "mixed
arithmetic errors" (such as addition of a floating point number to an
integer) simply by examining the tags.

The BLM also has certain disadvantages. In a linear store every item can
be addressed directly, whereas extra store accesses may have to be made in
data structures composed of code words. Overhead is associated with memory
allocation because of the data structure involved. For further discussions
of tagged architecture, see Feustel [12].

Emulation

Emulation is a combined hardware-software approach to the process of
modeling the physical behavior of one machine on another [21]. A host
machine A can be made to emulate a target machine B with the aid of
microprogramming. This means that A can interpret and execute the machine
program written for B by means of microprogramming. As an example, an
emulator is available which makes it possible to emulate the IBM 7080 on
the IBM 360/65. The emulator takes a machine instruction of the IBM 7080
and performs the necessary storage mapping conversions; it interprets and
translates the instruction into a 360/65 machine instruction. Then the
host machine executes the instruction.

There are many advantages to emulation. When the user changes computer
systems he does not have to reprogram if he can emulate his old system on
the new one. Emulation leads to compatibility, a principal feature of the
IBM 370. A disadvantage of emulation is that it is inherently slow and
does not fully utilize the resources of the host machine.

Developing an Architecture

Now we briefly consider some of the problems and trade-offs an architect
faces in evolving an architecture. These considerations are discussed
within the framework of the three components of architecture. Assume that
the architect decides to make the computer system provide the capacity of
ten processing units. This decision can be implemented either by
replicating processing units (physical organization) or by time
multiplexing a single processing unit (control and flow of information).
The first approach, which is expensive, provides higher performance and
graceful performance degradation in case of processing unit failures. The
second is more economical but might require a sophisticated control.

Consider a system in a list processing environment. The architect wishes
to provide the capabilities of linked data structures. He may design a
conventional system without any list processing capabilities and leave the
task of handling data structures to the system programmer (control and
flow of information). Alternatively, the architect can provide data
structure capabilities at machine level itself by making appropriate
provisions at the hardware level (representation and interpretation of
information). To improve the reliability of data transmission links the
architect may either resort to replication and majority voting or better
technology (physical organization) or add parity bits to the transmitted
words (representation and interpretation of information). Similarly, the
reliability of adders can be improved by replication or by coding the
operands (e.g., AN coding [9]). Thus the architectural problems and
decisions involved in implementing an architecture can be viewed in terms
of the three components of architecture.

Dynamic Architectures

The computer user is becoming increasingly aware of the effect of
architecture on system performance. He realizes that the array
organization is ideal for solving relaxation problems, that the pipeline
organization is




effective in handling matrix and vector operations, and that stack
processing makes it easier to compile and execute ALGOL programs. Since no
single architecture can satisfy the needs of all users, it has become
desirable to have a computer system whose architecture can be defined and
varied dynamically.

At present, emulation is the main principle used to offer variable
architectures to the user. But emulation is inherently slow and
inefficient and would defeat our purpose, which is to speed up computation
with dynamic architecture. Using our three-component approach to
architecture, it is possible to conceive a system with dynamic
organization. The user can specify the architecture he needs in terms of
the three components, and the system will exhibit this architecture by
introducing appropriate changes in its control and data paths and by
altering its representation and interpretation of information. The speed
requirements dictate that these changes be executed at hardware level. The
authors [28] propose a system where it is possible to structure system
resources as a pipeline, an array, or in any configuration the user may
want. Structuring is accomplished by dynamically establishing bus paths
between the resources. Thus the physical element of architecture is
'altered' by suitable control of information flow. Similarly, the other
components of architecture can be altered. For instance, information flow
can be controlled to exhibit a stack or nonstack structure depending on
the program environment. By attaching tags to operands and interpreting
them dynamically, we can obtain an architecture in which the third
component is a variable.

REFERENCES

[1] ABRAMS, M. D.; AND STEIN, P. G. Computer hardware and software, an
     interdisciplinary introduction, Addison-Wesley, Reading, Mass., 1973.
[2] AMDAHL, G. M.; BLAAUW, G. A.; AND BROOKS, F. P., JR. "Architecture of
     the IBM System/360," IBM J. R. & D. (April 1964), 87-101.
[3] ANDERSON, D. W.; SPARACIO, F. J.; AND TOMASULO, R. M. "The IBM
     System/360 Model 91: machine philosophy and instruction handling,"
     IBM J. R. & D. 11, 1 (Jan. 1967), 8-24.
[4] BAER, J. L. "A survey of some theoretical aspects of multiprocessing,"
     Computing Surveys 5, 1 (March 1973), 31-80.
[5] BARNES, G. H. et al. "The ILLIAC IV computer," IEEE Trans. Computers
     (August 1968), 746-757.
[6] BEIZER, B. The architecture and engineering of digital computer
     complexes, Vols. 1 and 2, Plenum Press, New York, 1971.
[7] BROOKS, F. P., JR. "Recent developments in computer organization," in
     Advances in electronic and electron physics, Vol. 18, Academic Press,
     New York, 1963, pp. 45-65.
[8] BROOKS, F. P., JR. "The future of computer architecture," in Proc.
     IFIP Congress 65, Vol. 1, Spartan Book Co., Washington, D.C., 1965,
     pp. 87-91.
[9] BROWN, D. T. "Error detecting and correcting binary codes for
     arithmetic operations," IEEE Trans. Electronic Computers (Sept.
     1960), 333-337.
[10] BURROUGHS CORPORATION. Burroughs B 6700 information processing
     systems reference manual, Burroughs Corp., Detroit, Michigan, 1972.
[11] DENNING, P. J. "Virtual memory," Computing Surveys 2, 3 (Sept. 1970),
     153-189.
[12] FEUSTEL, E. A. "On the advantages of tagged architecture," IEEE
     Trans. Computers (July 1973), 644-656.
[13] FLORES, I. Computer organization, Prentice-Hall, Englewood Cliffs,
     N.J., 1969.
[14] FLYNN, M. J. "Very high-speed computing systems," in Proc. of IEEE,
     1966, IEEE, New York, 1966, pp. 1901-1909.
[15] FLYNN, M. J.; AND ROSIN, R. F. "Microprogramming: an introduction and
     a viewpoint," IEEE Trans. Computers (July 1971), 727-731.
[16] FOSTER, C. C. "Computer architecture," IEEE Trans. Computers (March
     1972), 19.
[17] FOSTER, C. C. Computer architecture, Van Nostrand Reinhold Company,
     New York, 1970.
[18] HAUCK, E. A.; AND DENT, B. A. "Burroughs' B6500/B7500 stack
     mechanism," in AFIPS Spring Jt. Computer Conf., 1968, Thompson Book
     Co., Washington, D.C., pp. 245-251.
[19] HINTZ, R. G.; AND TATE, D. P. "Control Data STAR-100 processor
     design," in COMPCON 72 Sixth Annual IEEE Comp. Soc. Internatl. Conf.,
     IEEE, New York, 1972, pp. 1-4.
[20] HOFFMAN, L. (Ed.) Security and privacy in computer systems, Melville
     Publ. Co., Los Angeles, Calif., 1973.
[21] HUSSON, S. S. Microprogramming: principles and practice,
     Prentice-Hall, Englewood Cliffs, N.J., 1970.
[22] ILIFFE, J. K. Basic machine principles, (2d Ed.), American Elsevier,
     New York, 1972.
[23] JONES, L. H.; AND MERWIN, R. E. "Trends in microprogramming: a second
     reading," IEEE Trans. Computers (August 1974), 754-759.




[24] KATZAN, H., JR. Computer organization and the System/370, Van
     Nostrand Reinhold Co., New York, 1971.
[25] KNUTH, D. E. The art of computer programming, Vol. 1, Addison-Wesley,
     Reading, Mass., 1968.
[26] MADNICK, S. E.; AND DONOVAN, J. J. Operating systems, McGraw-Hill,
     New York, 1974.
[27] ORGANICK, E. I. Computer system organization, the B5700/B6700 series,
     Academic Press, New York, 1974.
[28] REDDI, S. S.; AND FEUSTEL, E. A. "An approach to restructurable
     computer systems," in Proc. Sagamore Computer Conf., 1974, Lecture
     notes in Computer science, Vol. 24, Springer Verlag, New York, 1975,
     319-337.
[29] RICHARDS, R. K. Electronic digital systems, John Wiley & Sons, New
     York, 1966.
[30] SHORE, J. E. "Second thoughts on parallel processing," Computers and
     Electrical Engineering (June 1973), 95-109.
[31] TEXAS INSTRUMENTS INC. A description of the advanced scientific
     computer system, Equipment Group, Texas Instruments, Inc., Austin,
     Texas, 1973.
[32] THORNTON, J. E. Design of a computer: the CDC 6600, Scott, Foresman &
     Co., Glenview, Ill., 1970.
[33] WILKES, M. V.; AND STRINGER, J. B. "Microprogramming and the design
     of the control circuits in an electronic digital computer," in Proc.
     Cambridge Phil. Soc., Part 2, 1953, Cambridge Univ. Press, New York,
     1953, pp. 230-238.
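
As an illustration of the tagging scheme discussed under Tagging of
Information, the following is a minimal Python sketch, not Iliffe's BLM
design. The names Tag, Word, add, EscapeAction and MixedArithmeticError
are hypothetical; the text does not give the BLM's tag encoding or
instruction format. Under those assumptions, it shows a single add
instruction that takes its behavior from the operand tags, reports a
mixed arithmetic error when the tags differ, and takes an escape action
when a tag requests one.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Tag(Enum):
    """Hypothetical tag values; the real BLM encoding is not given in the text."""
    INTEGER = auto()
    FLOAT = auto()
    ESCAPE = auto()  # an operand whose use must trigger an escape action


@dataclass(frozen=True)
class Word:
    """A storage word that carries a type tag alongside its value."""
    tag: Tag
    value: object


class EscapeAction(Exception):
    """Models the interrupt taken when an operand's tag requests an escape."""


class MixedArithmeticError(Exception):
    """Raised when the operand tags of an arithmetic instruction disagree."""


def add(a: Word, b: Word) -> Word:
    """One ADD instruction for every data type: the tags select the operation."""
    # Escape tags are checked first, before any arithmetic is attempted.
    for operand in (a, b):
        if operand.tag is Tag.ESCAPE:
            raise EscapeAction(f"escape requested by operand {operand.value!r}")
    # Mixed arithmetic is detected simply by comparing the two tags.
    if a.tag is not b.tag:
        raise MixedArithmeticError(f"cannot add {a.tag.name} to {b.tag.name}")
    # The result inherits the common tag of its operands.
    return Word(a.tag, a.value + b.value)


# The same instruction serves integer and floating point operands.
print(add(Word(Tag.INTEGER, 2), Word(Tag.INTEGER, 3)).value)   # 5
print(add(Word(Tag.FLOAT, 1.5), Word(Tag.FLOAT, 2.25)).value)  # 3.75
```

Because the type travels with the data, the instruction stream needs no
separate integer-add and floating-add opcodes, which is the reduction in
instruction set size claimed for the BLM. The cost in this sketch is a tag
check on every operand access.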

