354 33 Powerpoint-Slides CH17

N.
Senthil Kumar,
M. Saravanan &
S. Jeevananthan
Chapter 17

MULTIPROCESSOR CONFIGURATION

Oxford University Press 2013. All rights reserved.

Introduction
The speed of any microprocessor based system
depends upon the clock frequency at which connected
processors and peripherals works.
When bulk I/O data transfer is done under the control
of microprocessor then the processor has to spend
most of its time idle
Enhancement of the speed, on appropriate system
involving several connected processors, requires a
certain topology - Such a system is known as
multiprocessor system.


Introduction
The simplest type of multiprocessor system is one
containing a CPU (such as 8086) and a numeric data
processor (NDP) and/or input/output processor (IOP).
The NDP (such as 8087)and IOP work in synchronism
with the main processor to complete the specific tasks
and are known as coprocessors.
Additional hardware circuits such as bus arbiter, bus
controller may be needed to co-ordinate the activities
of the number of processors working at a time in the
system.


Need and Advantages of
Multiprocessor System
By using a DMA controller with a CPU (such as 8086),
the system throughput could be improved by
concurrently performing I/O data transfer as the CPU
continued its processing.
In general, if a system includes two or more processors
that can execute instructions simultaneously, it is called
a multiprocessing system.

Need and Advantages of
The added processors could be special purpose
processors which are specifically designed to perform
certain tasks efficiently or other general purpose
processors.

Multiprocessor Configuration
Vs. Single-chip Microprocessor
Several processors may be combined to fit the needs of
an application while avoiding the expense of the
unneeded capabilities of a single complex multiple-chip
processor.
The modularity of a multiprocessor system provides
means for expansion because it is easy to add more
processors as the need arises.


Vs. Single-chip Microprocessor
In a multiprocessor system, tasks are divided among
the modules. If the failure occurs, it is easier and
cheaper to find and replace the malfunctioning
processor than it is to find and replace the failing part
in a complex processor.

Different Configurations Of

The maximum mode operation of the 8086 is
specifically designed to implement multiprocessor
systems.
Three basic configurations - They are the coprocessor,
closely coupled and loosely coupled configurations.
The first two of the configurations are very similar in
that both the CPU (i.e 8086) and the external or
supporting processor share not only the entire memory
and I/O subsystem, but they also share the same bus
control logic and clock generator as shown in figure.

Closely Coupled


Closely Coupled


In a closely coupled configuration the supporting
processor may act independently from the CPU, but
in a coprocessor design it is dependent on the CPU
and must interact directly with the CPU.
Loosely Coupled

Loosely Coupled


In Closely Coupled Configuration, several modules
may share the system resources, and system bus
control logic must resolve the bus contention
problem.
Each potential bus master runs independently and
there are no direct connections between them as
shown in fig.
In addition to the shared resources, each module
may include its own memory and I/O devices.
Loosely Coupled

The processors in the separate modules can
simultaneously access their private subsystems through
the local buses and perform their local data references
and instruction fetches independently, thus improving
the degree of concurrent processing.
In a loosely coupled multiprocessor system, two 8086
processors can not be tied directly together.
Each CPU has its own bus control logic, and bus
arbitration is resolved by extending this logic and
adding external logic that is common to all master
modules.

Loosely Coupled


Therefore, several CPUs can form a very large system
and each CPU may have independent processors
and/or a coprocessor attached to it
Advantages of Loosely
Coupled Configuration

The system can be expanded in a modular form. Each
bus master module is an independent unit and
normally resides on a separate PC board and hence, a
bus master module can be added or removed without
affecting the other modules in the system.
High system throughput can be achieved by having
more than one CPU
A failure in one module does not cause a breakdown
of the entire system and faulty module can be easily
detected and replaced.

Advantages of Loosely
Coupled Configuration

Each bus master may have a local bus to access
dedicated memory or I/O devices so that a greater
degree of parallel processing can be achieved.
Bus Arbitration


In a loosely coupled multiprocessor system, more
than one bus master module may have access to the
shared system bus.
Since each master is running independently, extra bus
control logic must be provided to resolve the bus
arbitration (i.e. allotment of system bus to a
particular requesting master) problem.
This extra logic is called bus access logic and its
responsibility is to make sure that only one bus
master at a time has control of the bus. Simultaneous
bus requests are resolved on a priority basis.
Daisy Chaining


There are three schemes for establishing priority
namely daisy chaining, polling and independent
requesting.
Daisy Chaining


In this simple and low cost methods, all the masters
use the same line for making bus requests.
To respond to a bus request (BR) signal, the controller
sends bus grant (BG) signal if the bus busy signal is
inactive.
The grant signal serially propagates through each
master until it encounters the first one that is
requesting access to the bus.
Who Gets Priority in Daisy
Chaining?


The first module blocks the propagation of the bus
grant signal, activates the bus busy line and gains
control of the bus.
Any other requesting module which is present after
the master that has now gained the control of the
bus, will not receive the grant signal and therefore,
the priority is determined by the physical location of
the modules.
The requesting module located closest to the
controller has the highest priority.

Merits and Demerits of
Daisy Chaining

Compared to the other two methods, the daisy chain
scheme requires least number of control lines and
this number is independent of the number of
modules in the system.
However, the arbitration time is slow due to the
propagation delay of the bus grant signal through the
different masters.
Merits and Demerits of
Daisy Chaining

This delay is proportional to the number of modules
and therefore, a daisy-chain based system is limited
to the multiprocessor system having only a few
modules.
Further, the priority of each module is fixed by its
physical location and failure of a module in the
system causes the whole system to fail.

Polling


It uses a set of lines sufficient to address each
module.
In response to a bus request (BR), the controller
generates and sends out a sequence of module
addresses to the requesting modules.

Polling


When a requesting module recognizes its address, it
activates the busy line and begins to use the bus.
The major advantage of polling is that the priority
can be dynamically changed by altering the polling
sequence (i.e. the order in which the module
addresses are sent) stored in the controller.

Independent Requesting


This fastest and high BR & BG lines (2m lines are
needed for m modules)scheme resolves the priority
in a parallel fashion.
Independent Requesting


Each module has a separate pair of bus request (BR)
and bus grant lines (BG) and each pair has a priority
assigned to it.
The controller includes a priority decoder, which
selects the request with the highest priority and
returns the corresponding bus grant signal.
Interconnection Topologies in
a Multiprocessor System


A microprocessor with its external bus connections
needs memory to form a minimum workable
processing system.
In a multiprocessor system, a number of
microprocessors are connected with each other using
a single bus.
The bus is also used to address a multi port memory
or a shared single I/O port.
Interconnection Topologies in
a Multiprocessor System


A microprocessor with its external bus connections
needs memory to form a minimum workable
processing system.
In a multiprocessor system, a number of
microprocessors are connected with each other using
a single bus.
The bus is also used to address a multi port memory
or a shared single I/O port.
The method of communication among the
microprocessors in a multiprocessor system

Shared Bus Architecture


Shared Bus Architecture


The shared bus architecture uses a common memory
which may be partitioned into local memory banks for
different processors.
At a time, only one processor performs a bus cycle
to fetch instructions or data from the memory.
Multi-port Memory


Multi-port Memory


The processors P1 and P2 address a multi-port memory
which can be accessed at a time by both the
processors.
Both the processors also have local memories which
are used by them to store individual instructions, data
and the execution of its individual task.
The multiport memory may be used for storing the
instructions, data and the results to be shared by more
than one processor.
Linked Input/Output


Linked Input/Output


Linked Input/Output interconnection utilizes
input/output capabilities of a microprocessor based
system to communicate with other systems
The direct access of common instructions and data
which are available in a local system memory is not
possible in this method.
Crossbar Switching


Crossbar Switching


It uses an extension of the concept of shared memory
for a number of processors.
In this method, more than one processor can have
simultaneous accesses to the different memory
modules to be shared individually as long as there is
no conflict.
The total memory is divided into modules. While one
processor is accessing a memory module, the other
processor will be denied an access of the same
module till it is relinquished by the former processor.
Crossbar Switching


The crossbar switch provides the interconnection
paths between the memory modules and the
processors.
In crossbar switch interconnection, several parallel
data paths are possible. Each node of the crossbar
represents a bus switch.
All these nodes may be controlled by one of these
processors or by a separate one.
Physical interconnections
between processors in a
multiprocessor system


Star configuration

Loop configuration

Complete interconnection

Regular topologies and

Irregular topologies
Star Configuration


Star Configuration


All the processors are connected to a central switching
element via dedicated paths.
The central switching element may be an independent
processor.
The switching element controls the interconnections
between the processing elements.

Ring or loop Configuration


Ring or loop Configuration


The processors are arranged in a loop and each
processor can communicate with the other one
through intermediate processors in the path.
The number of intermediate processors depends upon
the position of sender and receiver in the loop.
The direction of data transfer along the loop may be
unidirectional or bidirectional.
Completely Connected
Configuration


Completely Connected
Configuration


Every processing element can directly communicate
with another processor at a time.
the required number of dedicated interconnection
paths which is given by equation
Interconnection path.

Regular Topology


Regular Topology


The processors can be arranged in any of the regular
structures such as linear array, square, hexagonal or
cubical configurations. Each of the processors (nodes)
has a local memory to be accessed only by that
processor.
Each of the processors can communicate with a fixed
number of neighbours in the specific regular structure.
Irregular Topology


The processors in this scheme do not follow any
uniform or regular connection pattern.

The number of neighboring processors, with which a
processor can communicate is not fixed and may even
be programmable.
Operating System Used in a

Once the microprocessors are arranged in a particular
topology, an appropriate operating system and system
software are required which will be able to work in
coordination with new system resources.
An operating system is a program that resides in the
computer memory and acts as an interface between
the user or application program and the computer
resources.
Operating System Used in a

The success of a multiprocessor system relies on a
suitable operating system.
The operating system used for single processor can
not be used for multiprocessor system.
Distributed Operating System

Distributed operating systems are designed to run
parallel processes.
Hence it is essential that a proper environment exists
for concurrent processes to communicate and
cooperate in order to complete the allotted task.
The features expected from a distributed operating
system used in a multiprocessor system are listed.

Features of Distributed OS


A distributed operating system should provide a
mechanism for inter-process and inter-processor
communication.
A distributed operating system must be capable of
handling the structural or architectural changes in the
system due to expected or unexpected reasons like
faults or modifications in the configuration.
The distributed operating system should also take care
of the unauthorized data access and data protection,
as the data sets in these systems are referred by more
than one processor.
Features of Distributed OS


The distributed operating system must have a
mechanism to split the given tasks into concurrent
subtasks which can be executed in parallel on different
processors and to collect the results of the subtasks
and further process these to obtain the final result.
Multiprocessor system having
8086 and 8087

Typical multiprocessor system consisting of 8086 and
8087 (Numeric coprocessor)
The 8087 is a coprocessor which has been designed to
work under the control of 8086 and gives additional
numeric processing capabilities to 8086.
8087 is a 40 pin IC and is available in 5,8 and 10 MHZ
versions compatible with different versions o
Multiprocessor system having
8086 and 8087

When 8086 is interfaced with 8087, the instructions of
the 8087 can be included in the program which has to
be executed by 8086. The 8086 performs the opcode
fetch cycles and identifies the instructions for 8087. f
8086.
Architecture of 8087




8087 has two internal sections namely Control Unit
(CU) and Numeric Extension Unit (NEU).
The NEU executes all the numeric processor
instructions while the CU receives and decodes
instructions, and reads or writes memory operands.
The control unit is also responsible for establishing
communication between the CPU (8086) and memory,
and also for coordinating the internal coprocessor
execution.


The internal data bus in 8087 is 84 bits wide including
68-bit fraction, 15-bit exponent and a sign bit.
The microcode control unit in 8087 generates the
control signals required for execution of the 8087
instructions.
8087 contains a programmable shifter which is
responsible for shifting the operands during the
execution of instructions like FMUL and FDIV.
The data bus interface in 8087 connects the internal
data bus of 8087 with the 8086s system data bus.
Pin Details of 8087


Interconnection of 8087 with
8086

Interconnection of 8087 with
8086

8087 can be connected with 8086 only when 8086 is
operating in maximum mode.
In maximum mode, all control signals are derived using a
separate chip known as Bus Controller (8288).
The BUSY pin of 8087 is connected with the pin of 8086.
The QS0 and QS1 lines in 8087 are directly connected to
he corresponding pins in 8086 based system.
The pins AD15-AD0, A19/S6-A16/S3, /S7, Reset and
Ready of 8087 are connected to the corresponding pins
of 8086.
Multiprocessor System Having
8086 and 8089

While accessing the I/O devices by non-DMA data
transfer, such as serial port and parallel port in the
personal computer, the CPU (such as 8086) is required
to set up the interfacing chips used to access the I/O
devices and perform the actual data transfer.
For high speed devices, data are transferred using DMA,
but the CPU has to set up the device controller, initiate
the DMA operation, and check the post-transfer status
after the completion of each DMA operation.

Multiprocessor System Having
8086 and 8089

The 8089 I/O processor (IOP) is designed to handle the
tasks involved in I/O processing. An IOP can fetch and
executes its own instructions, unlike a DMA controller.
8089


The instruction set of 8089 is specifically designed for
I/O operations, but in addition to data transfer, they can
perform arithmetic and logic operations, branches,
searching and translation.

The CPU communicates with the 8089 through
memory-based control blocks. The CPU prepares
control blocks that describe the task to be performed,
and then sends the task to the 8089 through an
interrupt like signal. The 8089 reads the control blocks
to locate a program called a channel program, which is
written using the 8089s instruction set.

8089


Then the 8089 performs the assigned task by fetching
and executing instructions from the channel program.
When 8089 has finished the task, it informs that to the
CPU either through an interrupt or by updating a status
location in memory.
Pin details of 8089


Local and Remote operation of 8089


The 8089 assumes all of the work involved in an I/O transfer
including device set up, DMA operation and programmed I/O,
thereby relieving the CPU from the burden of the I/O
processing.
This allows the CPU to concentrate on higher-level tasks while
the 8089 takes care of the I/O processing.
This greatly simplifies system software and hardware efforts,
and improves system performance and flexibility, by distributed
processing approach.
The 8089 may be operated in a local (closely coupled)
configuration or a remote (loosely coupled) configuration.
Local Configuration


Local Configuration


In a local configuration, the 8089 shares the bus
interface with the host (8086) by using its pins. All
resources are accessed through the system bus.
Remote Configuration


Remote Configuration


In a remote configuration, the 8089 may have its own
local I/O bus and requires a bus arbiter and controller,
address latches and data transceivers for accessing the
shared system bus.
8089 (IOP) Architecture




Each of the two channels can be programmed and
operated independently while sharing the common
control logic and ALU.
The channel control pointer (CCP) can not be
manipulated by the user.
It stores the address of the control block (CB) for
channel 1 during the initialization sequence.
For channel 2, its control block (CB) starts at the address
that is indicated by adding 8 to the contents of the CCP.



To dispatch a task to either channel, the CPU (8086)
sends out a channel attention (CA) signal along with
the select (SEL) signal which selects channel 1 (if SEL=0)
or channel 2 (if SEL=1).
Since the channels occupy two consecutive I/O port
addresses, the A0 address line of 8086 is connected to
the SEL pin so that when A0=0, one channel is selected
and when A0=1, another channel is selected.
Each channel has an identical set of registers, each set
being divided into two groups according to size.



The pointer group consists of those registers having 20
bits, and the register group consists of those registers
having 16 bits.
Each pointer, with the exception of PP, has an associated
tag bit. When used to access a memory operand, the tag
bit indicates whether the contents of that pointer
represents a 20-bit system (i.e. memory) space address
(if tag=0) or a 16-bit local (i.e. I/O) space address (if
tag=1).
In accessing the local space, only the low-order 16-bits
of the pointer are used as the address. Register PP always
points to an address in the system space.

Registers in 8089 IOP




The registers GA, GB, GC, IX, BC, and MC can be used as
general purpose registers for arithmetic and logic
operations in a channel program. In addition, they
perform special functions when addressing memory
operands and executing DMA operations.
A memory operand can only be addressed by using
one of the pointers GA, GB, GC, or PP as a base register.
During a DMA operation, GA and GB are used for the
source and destination pointers. If GA points to the
source, then GB points to the destination, and vice-
versa.


When a translation operation is performed along with
the DMA transfer, the contents of GC are used as the
base address of a 256-byte translation table.
Register BC is used as the byte counter during a DMA
transfer, and is decremented by 1 after each byte
transfer and by 2 after each word transfer.

354 33 Powerpoint-Slides CH17

Diunggah oleh

Informasi Dokumen

Judul Asli

Hak Cipta

Format Tersedia

Bagikan dokumen Ini

Bagikan atau Tanam Dokumen

Opsi Berbagi

Apakah menurut Anda dokumen ini bermanfaat?

Apakah konten ini tidak pantas?

Hak Cipta:

Format Tersedia

354 33 Powerpoint-Slides CH17

Diunggah oleh

Hak Cipta:

Format Tersedia

N.

Anda mungkin juga menyukai