S.S. REDDI
W. W. Gaertner Research, Inc., 1492 High Ridge Road, Stamford, Connecticut 06908
E.A. FEUSTEL
Laboratory for Computer Science and Engineering,
Department of Electrical Engineering, Rice University, Houston, Texas 77005
The purpose of this paper is to describe the concepts, definitions, and ideas of
computer architecture and to suggest that architecture can be viewed as composed
of three components: physical organization; control and flow of information; and
representation, interpretation and transformation of information. This framework
can accommodate diverse architectural concepts such as array processing,
microprogramming, stack processing and tagged architecture. Architectures of
some existing machines are considered and methods of associating architectural
concepts with the components are established. Architecture design problems and
trade-offs are discussed in terms of the proposed framework.
Keywords and Phrases: computer architecture, framework, composition of
architecture, information flow, physical organization, unification of diverse
architectural concepts.
CR Categories: 6.0, 6.20, 6.22, 6.29.
Copyright © 1976, Association for Computing Machinery, Inc. General permission to republish,
but not for profit, all or part of this material is granted provided that ACM's copyright notice is
given and that reference is made to the publication, to its date of issue, and to the fact that reprinting
privileges were granted by permission of the Association for Computing Machinery.
Figure 1. Block diagram of the CDC 6600 computer system, an example of distributed
computing architecture.
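The scoreboard shown in Figure 1 and described in the accompanying text gates instruction issue on the availability of a functional unit and of the registers the instruction needs. A minimal sketch of that issue test follows; it is a toy model, not the actual CDC 6600 interlock logic, and the unit and register names are invented for illustration:

```python
# Toy scoreboard: an instruction issues only when its functional unit is
# free and none of its registers is awaiting a pending result.
class Scoreboard:
    def __init__(self, units):
        self.units = set(units)
        self.busy_units = set()      # functional units currently executing
        self.pending_regs = set()    # registers with results outstanding

    def can_issue(self, unit, regs):
        return unit not in self.busy_units and not (set(regs) & self.pending_regs)

    def issue(self, unit, dest, srcs):
        if not self.can_issue(unit, [dest, *srcs]):
            return False             # instruction is held
        self.busy_units.add(unit)
        self.pending_regs.add(dest)  # interlock: result not yet written back
        return True

    def complete(self, unit, dest):
        self.busy_units.discard(unit)
        self.pending_regs.discard(dest)

sb = Scoreboard(["FADD", "FMUL"])
assert sb.issue("FADD", "X6", ["X1", "X2"])      # unit and registers free: issue
assert not sb.issue("FADD", "X7", ["X3", "X4"])  # FADD busy: instruction held
assert not sb.issue("FMUL", "X7", ["X6", "X5"])  # X6 result pending: held
sb.complete("FADD", "X6")
assert sb.issue("FMUL", "X7", ["X6", "X5"])      # interlocks cleared: issue
```

Holding an instruction until its unit and registers are free is what preserves the programmer-specified precedences while still allowing independent instructions to proceed in parallel.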
units, twenty-four data and address registers, an instruction stack, and a scoreboard in its central processor. The ten functional units, which are independent of each other, are capable of simultaneous operation. The functional units are specialized to perform operations such as floating add, floating multiply, floating divide, shift, fixed add, etc. The twenty-four registers are divided into data (Xi), address (Ai), and index (Bi) groups; each group consists of eight registers. Except for the data registers, which are sixty bits long, all registers are eighteen bits long. Data registers X1 through X5 are used as "read" registers and X6 and X7 as "store" registers. There is a "partner" relationship between the Xi and Ai registers. An operand is fetched into Xi, i = 1 to 5, from memory by loading its address into Ai; similarly, an operand is stored from Xi, i = 6, 7, into the memory location specified by Ai. The index registers are used to modify the addresses in the address registers. (For instance, there is an instruction which adds Aj and Bk and transfers the result to Ai.) The index registers are also used to store fixed-point integers and manipulate floating-point exponents. Registers A0 and X0 are used to reference the extended core storage. The partner relationship permits concurrent calculation of arithmetic and address information. All arithmetic computations are performed on operands from the twenty-four registers, with results returned to these registers.

Instructions are loaded into the processing unit under the control of the P register (see Figure 1). The instruction stack contains eight 60-bit registers capable of holding up to 32 instructions. Instructions are accessed from memory and stored in a last in, first out (LIFO) manner in the instruction stack. When the program executes a branch to one of the instructions stored in the stack, that instruction is fetched directly from the stack rather than from memory. In this way it is possible to cut down the number of memory accesses in the case of loops that can be accommodated in the stack. Instructions pass from the instruction stack to three instruction registers, U0, U1, and U2, whose contents are interpreted and decoded by the scoreboard. The scoreboard examines the instruction to be executed and determines whether the functional unit and registers needed for the execution of the instruction are available; if they are available, the instruction is initiated. Otherwise, the instruction is held until they are available. The scoreboard maintains the status of each central register and functional unit so that it can decide when an instruction can be issued. The scoreboard provides interlocks between instructions so that, though parallelism among instructions is exploited, precedences among instructions as specified by the programmer are still preserved.

The ILLIAC IV System: This system [5] employs an array structure for organizing its processors. It consists of 256 processing elements arranged into four quadrants. Only one quadrant has been constructed, because of economic considerations.

A quadrant consists of a control unit and sixty-four processing elements. Processing element i is connected to processing elements i + 1 (mod 64), i - 1 (mod 64), i + 8 (mod 64), and i - 8 (mod 64). The control unit fetches instructions from a processing element memory, decodes them, and issues control pulses to the processing elements for execution. It broadcasts memory addresses and data words when they are common to all processors. An instruction can be either a control or a processing unit instruction. The former directs operations local to the control unit, whereas the latter controls the execution of the processing units. The control unit is designed to overlap the executions of the two different instruction types.

The processing element consists of the processing element memory (2048 64-bit words), arithmetic and logic circuitry, and registers. Each element executes the common instruction issued by the control unit on data stored in its memory. An element will not execute the common instruction if its enable flip-flop is not set; in other words, all the enabled elements execute the same instruction issued by the control unit. Digital Equipment PDP-10 computers perform the executive control of system operation, I/O processing and supervision, and compilation of the ILLIAC IV programs.

The TI ASC System: A principal feature of the Texas Instruments Advanced Scientific Computer (ASC) [31] is the central processor's four pipelines (Figure 2). If computational requirements do not warrant the capacity of four pipelined arithmetic units, one- and two-pipeline CPs are available. The CP pipeline consists of the instruction processing unit (IPU), the memory buffer unit (MBU), and the arithmetic unit (AU). The IPU supplies a continuous stream of instructions for execution by the other units of the pipeline. It fetches and decodes instructions, accesses and stores operands, and handles branch conditions. The MBU provides an interface between the central memory and the AU by supplying a continuous stream of operands to the AU and storing results in the memory. The AU performs arithmetic operations and is itself pipelined, as shown in Figure 2. Depending on the mode of arithmetic operation, the pipeline is structured to divert operand flow along the dotted or solid lines. The central processor is a vector as well as a scalar processor and has vector instructions at machine level.

Figure 2. Processing and arithmetic pipelines of the TI ASC system.
(Reprinted, by permission of Texas Instruments Inc., from A description of the advanced scientific computer system.)

CONTROL AND FLOW OF INFORMATION

Information flow and its control in a computer play a prominent role in deciding the computer architecture. This component encompasses the control mechanisms and schemes used for controlling and directing the system's information flow. Note the interdependence between the control and flow elements. Sometimes control initiates information flow (as in conventional synchronous computers, where control pulses are issued periodically to direct information flow) and sometimes information flow initiates control (as in interrupt and data driven schemes); thus the control element may be isolated from or incorporated in information flow. Control may be centralized (as in most computer systems) or decentralized (as in a network of microprocessors).

Alteration of information flow in a traditional computer leads to novel and interesting architectures. A familiar example is the Burroughs B6700, a stack processor. In the following we describe briefly the B6700 processor and stack organization and indicate how information flow is affected by the stack.

The Burroughs B6700: An integrated hardware-software approach is taken in designing the B6700. The system is programmed using high level languages (e.g., ALGOL, COBOL, and FORTRAN) and does not offer the user a traditional machine level language. The system hardware is designed to handle block structured languages and procedures efficiently by means of a stack mechanism and display registers. One learns to appreciate the system architecture more when one becomes familiar with the aspects of system implementation associated with block structured languages, such as accessing global and local variables, address formation for accessing procedure segments, and nesting of blocks.

Processor Organization

A block diagram of the processor [10] is shown in Figure 3 [11].
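The stack-and-display mechanism just introduced resolves a variable reference by adding an index to the contents of the display register for the variable's lexicographic level (the text returns to this under mark stack control words). A rough sketch of that address formation, with an invented memory layout; the addresses here are arbitrary:

```python
# Toy model of B6700-style addressing: an address couple (ll, index) is
# resolved by adding index to the MSCW address held in display register D[ll].
def resolve(display, couple):
    ll, index = couple
    return display[ll] + index

# Hypothetical stack: D2 points at the MSCW of the procedure at
# lexicographic level 2; its second local variable sits two words above it.
display = {0: 100, 1: 140, 2: 180}
assert resolve(display, (2, 2)) == 182   # like V1 = (2,2) in the text
assert resolve(display, (0, 5)) == 105   # a global at level 0
```

Because only the display registers change on procedure entry and exit, the same address couple in the code correctly names a different instance of a local variable on each activation.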
[Figure 3, a block diagram of the B6700 processor (operator families for arithmetic, logical, subroutine, word-oriented, scaling, value call, and name call operators; stack adjust controller; transfer controller; memory and program segment interfaces), and Figure 5, the B6700 control word and data formats (return control word (RCW), indirect reference word (IRW), stuffed indirect reference word (SIRW), and top-of-stack control word (TOSCW)), appear here.]
Figure 5. Control word and data formats of the B6700.
(Reprinted, by permission of John Wiley and Sons, Inc., from Multiprocessors and parallel processing, P. H. Enslow (ed.).)
The addressing environment list enables the system to access local and global variables relative to the block or procedure presently under execution.

Mark stack control words (MSCW) are used to maintain both the stack history and addressing environment lists. Figure 6 shows an example of how an MSCW is entered when a procedure is entered. The parameters and local variables of the procedure are entered following the MSCW and hence are referenced by addressing that is relative to the location of the MSCW. The DF fields of the MSCWs keep track of the stack history, whereas the DISP fields maintain the addressing environment list.

The variables are accessed by means of address couples. An address couple consists of the lexicographic level (LL) of the variable and an index value (Figure 6). When a variable is specified by an address couple, one can deduce from the execution environment and the lexicographic level which block or procedure specifies the variable. The display registers (D0 to D31) contain the addresses of the MSCWs of the procedures that are linked by the address environment list to the procedure under execution. By adding the index value to the contents of the appropriate display register, the absolute address of the variable is generated. As an example, let V1 be referenced in Procedure B. Since V1 is represented by the address couple (2,2), the system obtains the address of the appropriate MSCW from the display register D2 and adds 2 to it to obtain the address of V1. Note that the display registers must always point to the address environment list and have to be changed upon procedure exit or entry to the correct MSCWs.

The stack mechanism also provides the capability of handling several active stacks by organizing them as a tree. It is beyond the scope of the present paper to go into these details; interested readers should refer to Organick [27].

[Figure 6, showing an example stack with its MSCW linkage, appears here.]

Computing Surveys, Vol. 8, No. 2, June 1976
A Conceptual Framework for Computer Architecture • 287

Representation, Interpretation and Transformation of Information

The information processing capabilities of a computer system depend to a large extent on how the system interprets and represents the information. In this section we consider different types of representation, interpretation, and transformation of information that a computer may use.

[Figure 7, an ALGOL program with nested blocks and procedures together with its segment trunk, segment dictionary, and stack area, appears here.]

Program Representation in the Burroughs B6700: Consider the ALGOL program shown in Figure 7. The system keeps track of the program segments by means of the segment dictionary. The instruction pointer (IP) and environment pointer (EP) provide the physical location of the instruction in the program as well as the environment under which the instruction will be executed. (We discussed in the preceding section how the display registers, which constitute the EP, provide environmental information, and how pointers are set up in the stack area for the execution record of the program.) The IP is a three-tuple of the form either (1, i, j) or (0, i, j). (1, i, j) indicates that the instruction resides in the jth location of the ith segment in the segment dictionary. (0, i, j) means that a "system intrinsic" (i.e., system routine) is addressed; the jth location of the ith segment in the stack trunk, which contains supervisory code segments and system tables, is accessed. In the example, print and sqrt are system intrinsics. The segment descriptors indicate whether the segments are in the main memory and how to locate the segments on the disc if they are not in the main memory.

This kind of program representation leads to efficient memory utilization. Since the stack trunk, stack segment, and stack dictionary compactly represent the program status and its code requirements, the working sets tend to be small. The representation also leads to sharing of programs and data. Though two programs may share a common procedure segment and hence have the same instruction pointers, their EPs are different. For further discussion of the B6700 and its system description see [27].

Representation and Interpretation of Instructions

Instructions determine the information flow and reflect the structure and capabilities of the system. One of the principal duties of the computer architect is to develop a comprehensive instruction set which is simple to use but exploits the system resources to their fullest extent.

An instruction can be of zero-address, one-address, or multi-address format. Zero-address instructions are used in computers with stack processing, where the use of operands on the top of the stack is implied by the instruction. Single- and multi-address formats are used in computer systems modeled after von Neumann systems. Richards [29] gives an excellent discussion of the various address formats.

Addressing of operands in an instruction can be indexed or indirect. When the addressing is indexed, the addresses of the operands are formed by adding or subtracting the contents of "index" registers to the address parts of the instruction. When the addressing mode is indirect, the address of the operand can be found in the location specified by the address part of the instruction. The instruction repertory of a computer system also includes instructions which allow jump operations and which handle subroutine operations.

The instruction set can be significantly affected by the system architecture. For instance, consider tagged architecture [12], which differentiates integer operands from floating point operands at machine level. In this case it is not necessary to have separate instructions for floating point add and integer add. When an add instruction is issued, the system, by examining the operands involved, decides whether it is a floating point or an integer add. Another example of the effect of architecture on the instruction set is the stack processor (e.g., the B6700).

Representation and Interpretation of Data

Knuth [25] explains data as: "representation in a precise, formalized language of some facts or concepts, often numeric or alphabetic values, in a manner which can be manipulated by a computational method." Inevitably, when data represents mathematical concepts or real life situations, relationships between the data elements are bound to exist. Since data is a representation, the precision to which the representation should be expressed becomes a factor. A properly conceived computer system should cope with the problems of relational and precision representations of data.

Floating point numbers can be represented in the IBM 370 system series in short (32 bits), long (64 bits), or extended precision (128 bits) form. The system uses the byte, which consists of 8 bits, as its basic storage unit. When a floating point instruction like ADR (add long floating point) is to be executed, the system fetches 8 consecutive bytes into the central processing unit (CPU), where the left-most (or first) byte is the one addressed in the instruction. The system recognizes the operands as floating point, integer, etc., by examining the instruction rather than the operands.

In the Burroughs B6500/7500 organization, data words are distinguished as single precision (48 bits) or double precision (96 bits) operands by attaching 3 tag bits to every word (51 bits). Data may be referenced as an operand (without any qualifications), and the processor knows by examining the tag bits whether the operand is single or double precision. For example, when a command is issued to store an operand on the top of the stack, the word specified by the operand's address is fetched and examined. If the operand is double precision, then the next word is also fetched and stored in the stack. The system recognizes the type of operand, e.g., integer, floating point integer, or extended precision, by examining the instruction. Tag bits also distinguish data words from the program code. Hence, when a job attempts either to execute data as part of the program or to modify the program, an interrupt is issued.

Transformation (or Dynamic Representation) of Information

The representation of information can be static or dynamic. However, a computer may be used to determine dynamically the changes in the representation of information that are needed for user convenience, system efficiency, and privacy. Programs are usually represented at the user level in high level languages. The representation of programs is then changed to machine level languages for execution. Such changes of representation are performed for the convenience of the user. (Note that this concept makes it possible to distinguish between systems which use compilers and interpreters for program execution.) Further examples of this type of representational change include automatic transformation of ASCII characters to EBCDIC by the system.

System efficiency may dictate that different representations of information be used in different situations. Sparse matrices (matrices whose elements are mostly zero) can be economically stored by means of binary patterns and lists of nonzero values. Ones in the binary pattern indicate that the corresponding matrix elements are nonzero; the values of these elements are obtained by choosing the appropriate values from the list. An economy in storage results because binary patterns can usually be compacted and stored as binary words. When the matrices are not sparse, it may be more efficient to store them in major order fashion, by rows or columns. Another example of where representations may be changed is in the storing and accessing of data elements. The elements may be stored and accessed by table look-up or by hash coding techniques, depending on the application.

Privacy considerations may also warrant making changes in the representations of information. Consider a system where a common data bank has to be shared by different users and each user is authorized to access only some portions of the data. The system may encode the data supplied by the user and store the encoded data in the bank. These conversions of encoding, or privacy transformations, are performed to ensure that only authorized users can gain access to a data set. When the user supplies the proper identification, the system decodes and presents the requested data to him. Privacy transformations are discussed in [20].

Physical Organization and Control of Information Flow

It is sometimes necessary to create new control paths among the physical resources of a computer system to exploit the parallelism that is present in hardware and programs and to increase the system's performance. In such cases both the physical and control elements contribute to the desired objective. A typical application of this approach may be seen in the overlapped operation of I/O and processor computation found in most contemporary computer systems. Sometimes information flow is controlled to exhibit to the user a machine architecture that is not real. An example of this is the compatibility feature found in the IBM 370 series. We examine selected architectural features of the IBM 370 series and the CDC 6600 computers and indicate how these features can be explained as a combination of physical organization and control of information flow.

IBM System/370 Series: Let us now
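The sparse-matrix scheme described above, a bit pattern marking the nonzero positions plus a list of the nonzero values in scan order, can be sketched as follows (a minimal model; real systems would pack the bit rows into machine words):

```python
def pack(matrix):
    # Bit pattern (row-major) plus a list of the nonzero values.
    bits = [[1 if v else 0 for v in row] for row in matrix]
    vals = [v for row in matrix for v in row if v]
    return bits, vals

def unpack(bits, vals):
    # Walk the pattern, pulling the next stored value wherever a one appears.
    it = iter(vals)
    return [[next(it) if b else 0 for b in row] for row in bits]

m = [[0, 3, 0],
     [0, 0, 0],
     [7, 0, 1]]
bits, vals = pack(m)
assert vals == [3, 7, 1]          # only the nonzero values are stored
assert unpack(bits, vals) == m    # the original matrix is recovered
```

The storage economy comes from the pattern costing one bit per element, while full values are paid for only at the nonzero positions.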
[A figure showing the IBM/370 main storage, the multiplexer channel, CPU-channel control lines, data transfer lines, storage addressing, and the floating-point registers appears here.]
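The System/370 storage-protection rule described in the text (a store is permitted only when the program's protection key matches the block's storage key or the protection key is zero, with an extra storage-key bit extending protection to fetches) can be sketched as follows; the key widths and bit layout are abstracted away here:

```python
# Simplified System/370-style key check, one storage key per 2048-byte block.
def store_allowed(protection_key, storage_key):
    # Key zero may store anywhere; otherwise the keys must match.
    return protection_key == 0 or protection_key == storage_key

def fetch_allowed(protection_key, storage_key, fetch_protected):
    # The fetch-protect bit applies the same check to fetches.
    return (not fetch_protected) or store_allowed(protection_key, storage_key)

assert store_allowed(3, 3)            # keys match: store proceeds
assert store_allowed(0, 7)            # protection key zero: store proceeds
assert not store_allowed(3, 4)        # mismatch: store inhibited, alarm given
assert fetch_allowed(3, 4, False)     # fetch-protect bit zero: fetch allowed
assert not fetch_allowed(3, 4, True)  # fetch-protect bit set: fetch blocked
```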
Figure 9. Instruction formats of the IBM/370. There is no one fixed memory organization
(Reprinted, by permissionof IBM Corporation, from for all the models. Models 155 and 165
Harry Katzan, Jr., Computerorganizationand the provide 4K buffer storage systems in ad-
dition to their main storage units. The
buffer and main storage are organized into
registers hold floating point quantities. rows and columns. At the intersection of
The registers reduce the number of memory each column and row there is a block of 32
accesses for data by storing temporary bytes, i.e., the storages are partitioned into
operands; this reduction in memory ac- blocks of 32 bytes and each block can be
cesses in turn reduces conflicts for memory specified by a row and column. (A byte
by the I/O and CPU units. The system has consists of 8 bits.) An address array main-
a program status word (PSW) register tains the addresses of the elements in the
whose contents indicate the status of the buffer. When the CPU makes a storage
program under execution; this enables the reference, the address array is consulted to
system to handle interrupts and multi- determine whether the referenced element
programming. is in the buffer. If the element is present it
Each model has a different engineering is sent to the CPU; otherwise the element is
design. For instance, the simplest model fetched from the main storage and dis-
125 does not provide any hardware adders patched to the CPU. Then the block con-
whereas model 165 has an address adder, taining the element is stored in the buffer
a parallel adder, and serial adder. The in- in the following manner: Blocks are trans-
struction formats for the system are shown ferred from the main storage to the buffer
in Figure 9. They specify the contents of columnwise, i.e., a block in column i of the
registers and/or memory locations as main memory is transferred to column i of
operands. The instruction execution ap- the buffer. The block that is to be stored in
pears to the user as sequential; however the buffer replaces the least recently used
high performance models employ over- block in its column. Model 165 also uses
lapping of instruction and operand fetching interleaving for its memory organization.
with instruction execution, and prefetching The storage system provides a protection
instructions along both paths of a branch. feature which can be used in multipro-
The system hardware cannot recognize gramming. The feature is implemented by
structured operands (e.g., vectors and dividing the main store into blocks (of
matrices) and it is up "to the programmer to 2048 bytes) and assigning storage keys to
make the system recognize such operands the blocks. Each active program has a
by programming. protection key associated with it. Usually
the operating system assigns the protection also checks whether the requested address
keys to the programs. A program can store violates bounds. The storage distribution
in a block only when the protection key of system is responsible for transferring re-
the program matches the storage key of quests and data to and from the central
the block or the protection key is zero. storage.
The storage operation is inhibited and an The secondary storage consists of 15,744
alarm signal is given if the keys do not 488-bit word core memory. The 488-bit
match and the protection key is not zero. words (8 of which are parity bits) are dis-
The storage key has an extra bit which pro- assembled to 60-bit words used in the central
tects fetch operation. If the bit is zero, only storage. The CPU can transfer any number
store operation is protected. Otherwise of 60-bit words between the central store
both store and fetch are protected. and ECS by simple commands. The major
CDC 6600 Memory Organization. The advantage of the ECS is that it can trans-
CDC 6600 memory hierarchy consists of a fer blocks of information at a rate of 60
fast central storage and slow extended core million bits/second. It may be directly ad-
storage (ECS) [32]. The central storage of dressed but at a considerably slower rate.
131,072 60-bit words is composed of 32 Thus, its principal use is as a high speed
independent banks. The banks are inter- buffer.
leaved to provide high block transfer rates. CDC 6600 I/O Handling. Ten peripheral
The computer has two cycles, major (1000 processing units (PPU) handle I/O ac-
nanoseconds (nsec) and minor (100 nsec). tivities. Each PPU consists of four registers
The storage read and store cycle take one and a storage unit of 4096 12-bit words.
major cycle, whereas transferring a data The processors are arranged in the form of
word through the storage distribution a "barrel". The barrel has ten positions
system takes one minor cycle. There is a and each position is occupied by a PPU.
mechanism called the stunt box which exam- There is one position called "slot" which
ines the requests and directs information flow is capable of accessing and utilizing arith-
in and out of the central storage. When the metic and logic hardware; the ten PPU's
stunt box accepts a new access request it share the slot by circulating the contents of
decides whether the bank requested is busy their four registers. When a PPU is in the
or free; if the bank is free the read and store slot it stays there for 100 nsec and uses the
cycle is initiated. If the bank is busy, the arithmetic and logic hardware to execute
address requested is circulated within the the program stored in its storage unit. A
stunt box. The stunt box can hold three PPU instruction requires one or more steps
circulating addresses and each circulation of execution with each step taking 1000
takes 300 nsec. Top priority is given to the nsec. It can be noted that the central storage
addresses in circulation for access to the read and store cycle takes 1000 nsec which
storage. Because of the circulation time is the time interval between consecutive
(300 nsec) and the major cycle time (1000 sharing of the slot by any PPU. The PPU
nsec) the mechanism prevents permanent can transfer data between peripheral de-
recirculation of any request. In case of vices and main memory and supervise
consecutive requests to the same bank the the operation of the devices. The PPU
requests are satisfied after at most two major have the capability of establishing paths
cycles. to I/O devices through twelve peripheral
The stunt box is also responsible for at- channels. A PPU can interrupt the opera-
taching priorities to requests coming from tion of the central processor by means of
the central processor unit and peripheral an exchange jump. When an exchange
processing units. It prevents the situation jump is issued, the CPU makes an exchange
where in the recirculating addresses read and between the contents of its 24 registers and
write requests are made to the same storage the contents of the "exchange package"
location. This is because of the stunt box's which starts at a location in the central
out-of-order recirculation properties. It storage specified by the PPU. The exchange
package consists of 16 words and specifies poses the following definition: "Micro-
the new contents of the 24 central registers. programming is a technique for designing
Once the exchange is made the CPU starts and implementing the control function of a
on the new program specified by the pro- data processing system as a sequence of
gram address register (note this register is control signals, to interpret fixed or dy-
one of the central registers). A P P U is also namically changeable data processing func-
capable of monitoring the CPU by trans- tions. These control signals, organized on a
ferring the contents of the CPU program word basis and stored in a fixed or dy-
address register to one of its registers. namically changeable control memory, repre-
The CPU can also initiate the exchange sent the states of the signals which control
jump. the flow of information between the exe-
cuting functions and the orderly transition
Physical Organization, Control of Information between these signal states."
Flow and Representation and Interpretation of A basic microprogramming scheme [33]
Information is shown in Figure 10. Register I contains
an address which is decoded by the decoder
Now we consider the architectural feature (D). The horizontal line in the read only
of microprogramming which can only be memory (ROM) that corresponds to the
explained when all three components of address is activated and issues signals. The
architecture are used. In the literature this signals under Matrix A control the data
feature is usually associated with system paths of the arithmetic units, registers, etc.,
architecture. The reason for this association of the computer system. The signals of
is that microprogramming is able to present Matrix B specify the next address to be
to the user an architecture that is not a decoded and are forwarded to Register II.
real machine architecture. Conditional jumps can be handled as shown
Microprogramming: Husson [21] pro- at X. A flip-flop whose state can be controled
Figure 10. A basic microprogramming scheme: Registers I and II, a decoder, and a ROM whose Matrix A issues control signals to the arithmetic unit, etc., and whose Matrix B selects the next address under control of a condition flip-flop.
by the previous orders issued by Matrix A decides which of the two lines in Matrix B is to be energized.

The signals issued when any line of the ROM is activated form a microorder. The format used for microorders can be either horizontal or vertical, depending on how the orders are interpreted. In a horizontal format, each signal under Matrix A directly controls a gated data path. In a vertical format, the signals are organized into fields and each field controls the operations of a particular section (like an adder) of the computer system. In this format, encoding of signals is performed, and hence horizontally formatted microorders are usually longer than vertically formatted ones. Vertical format microorders sometimes resemble machine language instructions in that they have operand and address fields. Maximal parallelism at the hardware level can be exploited by using horizontal format microorders, but generating these orders can be cumbersome and time consuming.

Microprogramming has been used in widely differing contexts. For its applications the interested reader should refer to Flynn and Rosin [15]. Present-day large systems like the CDC 6600/7600 and IBM 360/195 do not use microprogramming for their control units. It appears that microprogramming is used in practice, not for its systematic implementation of the control section, but for its ability to offer emulation capabilities. It is interesting, however, to note that microprogramming is used to implement the control of the streaming unit of the CDC STAR 100 [23].

… these seemingly unrelated and diverse concepts. We conclude by considering some of the problems and trade-offs an architect faces in implementing these concepts and in evolving an architecture.

Array Organization

In this organization identical processors are connected in an array fashion. The ILLIAC IV is a familiar example of this type of organization. The ILLIAC IV operates in a single instruction stream--multiple data stream (SIMD) mode [14], i.e., at any time all the enabled processors execute a single instruction (issued by a single control unit) on different data; the processors that are not enabled do not execute the instruction. However, with suitable operating systems, it should be possible for array processors to handle the multiple instruction stream--multiple data stream mode of operation.

The array organization is very effective in exploiting parallelism when the characteristics of the problem to be solved match the physical structure. Matrix operations provide an example of this kind of problem. When all the processors are identical, manufacturing and maintenance are greatly simplified. A disadvantage of the array organization is the poor utilization of resources that may result when the problem structure does not match the physical structure. The failure of a single processing element can hamper the operation of the entire system; a sophisticated system could, however, create new and alternate data paths for continued operation of the system.
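The SIMD discipline described above can be sketched in modern terms as a control unit broadcasting one operation to every enabled processing element. The following Python fragment is our own minimal illustration, not a model of the ILLIAC IV; the names `ProcessingElement` and `broadcast` are assumptions of the sketch.

```python
# Minimal sketch of SIMD array processing: one control unit broadcasts a
# single instruction; only the enabled processing elements execute it.
# (Illustrative only -- the names and structure are ours.)

class ProcessingElement:
    def __init__(self, value, enabled=True):
        self.acc = value        # each PE holds its own local data
        self.enabled = enabled  # mode bit: a disabled PE skips the instruction

def broadcast(pes, op):
    """One SIMD step: apply a single operation to every enabled PE."""
    for pe in pes:
        if pe.enabled:
            pe.acc = op(pe.acc)

pes = [ProcessingElement(v) for v in (1, 2, 3, 4)]
pes[2].enabled = False              # mask out the third element
broadcast(pes, lambda x: x * 10)    # single instruction, multiple data
print([pe.acc for pe in pes])       # [10, 20, 3, 40]
```

Disabling a processing element here simply freezes its local state, which is how the masked elements of an array machine sit out an instruction.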
stages. Most vector operations can, for example, be operated in this manner. Pipeline organization loses its efficiency when some jobs require a processing sequence different from that of the pipeline. Job dependencies adversely affect the job flow and hence the efficiency of this organization. Since the processing of jobs becomes "diffused"--at any instant the pipeline contains jobs at different levels of completion--interrupts and machine malfunctions cannot be handled satisfactorily. For instance, the architecture of the IBM 360/91 has to settle for what is referred to as an "imprecise interrupt" [3].

Modular Organization

This organization consists of independent functional units (capable of performing specialized tasks) and/or processors (capable of performing any task). Tasks, when they are ready, are dispatched to the appropriate functional units or processors (usually by the supervisor of the organization). The central processing unit of a CDC 6600 employs a modular organization in which there are ten independent modules. In the SYMBOL system, function modules are dedicated to perform portions of the computing process such as translation, memory control, garbage collection, central processing, or other processes. In contrast to array and pipeline organizations, the modular organization usually has a variable structure. The supervisor of the modular organization can, by establishing appropriate data flow paths, simulate any particular structure (e.g., a pipeline or an array).

An advantage of this type of organization is the enhanced performance obtainable by using overlap and distributed function computation. The organization can ensure graceful degradation of performance in case of system component failures. Graceful degradation is achieved by having multiple function modules of the same type; when a module fails, its task can be assigned to another module. On the other hand, the supervisory system for such an organization tends to be complex because it has the additional responsibility of properly dispatching tasks and ensuring correct execution of the program (by preserving task precedences). The lack of structure in this organization can increase the overhead of dispatching tasks. The processing of jobs is "diffused" as in the pipeline organization.

Stack Processing

In this type of processing, information flow between central registers is controlled in such a way that a pushdown store (or a stack) is realized [7]. New operands, which are entered into the top register of the stack, cause a "pushdown" action to occur, i.e., the contents of each register move down by one register level. Binary operations can be performed on the top two registers, with the result being returned to the top register. The contents of the top register can be stored in main memory. The Burroughs B6500 and English Electric KDF-9 employ stack processing.

The following discussion of advantages and disadvantages of stack processing is based on Brooks [7]. Stack processing minimizes main memory data references when evaluating algebraic expressions. With stack processing, shorter program representation is possible, as most operand addresses can be eliminated. It simplifies subroutine management and compilation of source programs, especially those programs with recursive definitions. Stack processing makes it easier to handle block structured languages like ALGOL. However, this type of processing is helpful only if the items that are to be processed can be made to "surface" to the top of the stack. A further disadvantage is that many stacks, such as a stack for control and a stack for data, are often needed for satisfactory operation. When variable length fields are used, stack registers must be of variable length to accommodate the values selected from these fields. This often proves to be difficult to implement.

Virtual Memory

By automatic control of information flow between the main and secondary memories, a system with virtual memory [11] gives the
programmer an illusion of operating with a main memory that is larger in capacity than the actual memory. This is accomplished by dividing the address space into blocks of contiguous addresses and storing them in both the main and secondary memories. When the programmer makes a reference to an item not present in the main memory, the computer system automatically transfers the block containing the referred item from the secondary to the primary memory. The new incoming block will displace a resident block according to some fixed rule if the main memory cannot accommodate the new block. When the blocks are of variable size, one has "segmentation"; when they are of fixed size, the situation is referred to as "paging."

The principal advantage of virtual memory is that the user can be indifferent to main memory limitations in his programming. He need not concern himself with the problems of overlays and memory management. The large address space provided by virtual memory also simplifies multiprogramming. On the other hand, efficient utilization of the main memory is not always possible. Paged systems round up storage requests to the nearest integral number of pages, and this sometimes causes appreciable loss of the main memory ("fragmentation"). Multiprogrammed systems sometimes exhibit performance degradation due to a phenomenon known as "thrashing" [11].

Virtual Machines

By means of hardware and software control of information flow, a single computer system presents to the users multiple exact copies of the system. Each user is given the illusion that he has the complete computer system at his disposal. As an example, IBM's VM/370 offers the user a virtual IBM 370 system on which he can run any System/370 or System/360 operating system. The virtual machine, of course, runs several times slower than the real machine. The appearance of multiple copies of the basic machine is handled by the virtual machine monitor, which interfaces the user's operating system and the real machine. For details concerning the implementation of the monitor, refer to Madnick and Donovan [26]. An advantage of virtual machines is that the users can run different operating systems on the same real machine at the same time. On the negative side, the virtual machine is several times slower because there is overhead associated with the monitor.

Parallel Processing

In this type of processing, the performance of a computer system is increased by introducing control and data paths among its hardware resources. For our purposes we consider parallel processing at the bit and task levels. We follow the model of Shore [30] for bit level processing. Figure 11 shows a system which consists of a data memory (DM), an instruction memory (IM) and a control unit (CU). In the DM, words are stored horizontally. A bit (word) slice is any set of bits exposed by a single vertical (horizontal) cut through the DM. The word slice processing unit (WSPU) can operate on word slices, whereas the bit slice processing unit (BSPU) operates on the bit slices. In Shore's terminology, Machine I refers to the system with only word slice processing capabilities, Machine II refers to the system with only bit slice processing capabilities, and Machine III has both of the processing capabilities. (There are also Machines IV, V and VI, which are best considered at the task level.) It is interesting to note that Machine I is a conventional sequential processor and Machine II is a bit serial associative processor. Shore's scheme does not fit Flynn's classification [14]. Shore states: "In terms of a taxonomy introduced by Flynn, it is often stated that Machine II is a single-instruction-stream, multiple-data-stream processor whereas Machine I is not. In fact, they both are. Machine I processes multiple-bit-streams a word slice at a time, whereas Machine II processes multiple-word-streams a bit slice at a time. The myopic association of multiple-data-streams with multiple-word streams is a conceptual error having nothing to do with computing power."

Shore considers the ratio of processing
Figure 11. Shore's model of bit level parallel processing: a data memory (with word slices and bit slices), a word slice processing unit, a bit slice processing unit, and an instruction memory.
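Shore's word-slice/bit-slice distinction can be made concrete with a small sketch. The toy data memory below is our own illustration (three 4-bit words), not taken from Shore [30]; the function names are assumptions of the sketch.

```python
# Toy data memory stored as rows of bits, illustrating Shore's model:
# a word slice is one horizontal row; a bit slice is one vertical column.
DM = [
    [0, 1, 1, 0],   # word 0
    [1, 1, 0, 0],   # word 1
    [0, 0, 1, 1],   # word 2
]

def word_slice(dm, i):
    """Machine I style access: every bit of one word (horizontal cut)."""
    return dm[i]

def bit_slice(dm, j):
    """Machine II style access: bit j of every word (vertical cut)."""
    return [word[j] for word in dm]

# Both machines traverse the same multiple bit streams, merely cut
# differently -- which is Shore's point against the "myopic association"
# of multiple data streams with multiple word streams:
total_by_words = sum(sum(word_slice(DM, i)) for i in range(len(DM)))
total_by_bits = sum(sum(bit_slice(DM, j)) for j in range(len(DM[0])))
print(total_by_words, total_by_bits)  # 6 6
```

Machine III, in this picture, would simply possess both access paths into the same data memory.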
effective in handling matrix and vector operations, and that stack processing makes it easier to compile and execute ALGOL programs. Since no single architecture can satisfy the needs of all users, it has become desirable to have a computer system whose architecture can be defined and varied dynamically.

At present, emulation is the main principle used to offer variable architectures to the user. But emulation is inherently slow and inefficient and would defeat our purpose, which is to speed up computation with dynamic architecture. Using our three component approach to architecture, it is possible to conceive a system with dynamic organization. The user can specify the architecture he needs in terms of the three components, and the system will exhibit this architecture by introducing appropriate changes in its control and data paths and by altering its representation and interpretation of information. The speed requirements dictate that these changes be executed at the hardware level. The authors [28] propose a system where it is possible to structure system resources as a pipeline, an array, or in any configuration the user may want. Structuring is accomplished by dynamically establishing bus paths between the resources. Thus the physical element of architecture is 'altered' by suitable control of information flow. Similarly, the other components of architecture can be altered. For instance, information flow can be controlled to exhibit a stack or nonstack structure depending on the program environment. By attaching tags to operands and interpreting them dynamically, we can obtain an architecture in which the third component is a variable.

REFERENCES

[1] ABRAMS, M. D.; AND STEIN, P. G. Computer hardware and software, an interdisciplinary introduction, Addison-Wesley, Reading, Mass., 1973.
[2] AMDAHL, G. M.; BLAAUW, G. A.; AND BROOKS, F. P., JR. "Architecture of the IBM System/360," IBM J. R. & D. (April 1964), 87-101.
[3] ANDERSON, D. W.; SPARACIO, F. J.; AND TOMASULO, R. M. "The IBM System/360 Model 91: machine philosophy and instruction handling," IBM J. R. & D. 11, 1 (Jan. 1967), 8-24.
[4] BAER, J. L. "A survey of some theoretical aspects of multiprocessing," Computing Surveys 5, 1 (March 1973), 31-80.
[5] BARNES, G. H. et al. "The ILLIAC IV computer," IEEE Trans. Computers (August 1968), 746-757.
[6] BEIZER, B. The architecture and engineering of digital computer complexes, Vols. 1 and 2, Plenum Press, New York, 1971.
[7] BROOKS, F. P., JR. "Recent developments in computer organization," in Advances in electronics and electron physics, Vol. 18, Academic Press, New York, 1963, pp. 45-65.
[8] BROOKS, F. P., JR. "The future of computer architecture," in Proc. IFIP Congress 65, Vol. 1, Spartan Book Co., Washington, D.C., 1965, pp. 87-91.
[9] BROWN, D. T. "Error detecting and correcting binary codes for arithmetic operations," IEEE Trans. Electronic Computers (Sept. 1960), 333-337.
[10] BURROUGHS CORPORATION, Burroughs B6700 information processing systems reference manual, Burroughs Corp., Detroit, Michigan, 1972.
[11] DENNING, P. J. "Virtual memory," Computing Surveys 2, 3 (Sept. 1970), 153-189.
[12] FEUSTEL, E. A. "On the advantages of tagged architecture," IEEE Trans. Computers (July 1973), 644-656.
[13] FLORES, I. Computer organization, Prentice-Hall, Englewood Cliffs, N.J., 1969.
[14] FLYNN, M. J. "Very high-speed computing systems," in Proc. of the IEEE, 1966, IEEE, New York, 1966, pp. 1901-1909.
[15] FLYNN, M. J.; AND ROSIN, R. F. "Microprogramming: an introduction and a viewpoint," IEEE Trans. Computers (July 1971), 727-731.
[16] FOSTER, C. C. "Computer architecture," IEEE Trans. Computers (March 1972), 19.
[17] FOSTER, C. C. Computer architecture, Van Nostrand Reinhold Co., New York, 1970.
[18] HAUCK, E. A.; AND DENT, B. A. "Burroughs' B6500/B7500 stack mechanism," in AFIPS Spring Joint Computer Conf., 1968, Thompson Book Co., Washington, D.C., pp. 245-251.
[19] HINTZ, R. G.; AND TATE, D. P. "Control Data STAR-100 processor design," in COMPCON 72, Sixth Annual IEEE Comp. Soc. Internatl. Conf., IEEE, New York, 1972, pp. 1-4.
[20] HOFFMAN, L. (Ed.) Security and privacy in computer systems, Melville Publ. Co., Los Angeles, Calif., 1973.
[21] HUSSON, S. S. Microprogramming: principles and practice, Prentice-Hall, Englewood Cliffs, N.J., 1970.
[22] ILIFFE, J. K. Basic machine principles (2d Ed.), American Elsevier, New York, 1972.
[23] JONES, L. H.; AND MERWIN, R. E. "Trends in microprogramming: a second reading," IEEE Trans. Computers (August 1974), 754-759.
[24] KATZAN, H., JR. Computer organization and the System/370, Van Nostrand Reinhold Co., New York, 1971.
[25] KNUTH, D. E. The art of computer programming, Vol. 1, Addison-Wesley, Reading, Mass., 1968.
[26] MADNICK, S. E.; AND DONOVAN, J. J. Operating systems, McGraw-Hill, New York, 1974.
[27] ORGANICK, E. I. Computer system organization, the B5700/B6700 series, Academic Press, New York, 1974.
[28] REDDI, S. S.; AND FEUSTEL, E. A. "An approach to restructurable computer systems," in Proc. Sagamore Computer Conf., 1974, Lecture Notes in Computer Science, Vol. 24, Springer-Verlag, New York, 1975, pp. 319-337.
[29] RICHARDS, R. K. Electronic digital systems, John Wiley & Sons, New York, 1966.
[30] SHORE, J. E. "Second thoughts on parallel processing," Computers and Electrical Engineering (June 1973), 95-109.
[31] TEXAS INSTRUMENTS INC. A description of the advanced scientific computer system, Equipment Group, Texas Instruments Inc., Austin, Texas, 1973.
[32] THORNTON, J. E. Design of a computer: the CDC 6600, Scott, Foresman & Co., Glenview, Ill., 1970.
[33] WILKES, M. V.; AND STRINGER, J. B. "Microprogramming and the design of the control circuits in an electronic digital computer," in Proc. Cambridge Phil. Soc., Part 2, 1953, Cambridge Univ. Press, New York, 1953, pp. 230-238.