
Figure 1.1 Astrophysical N-body simulation by Scott Linssen (undergraduate University of North Carolina at Charlotte [UNCC] student).

Parallel Programming: Techniques and Applications using Networked Workstations and Parallel Computers
Barry Wilkinson and Michael Allen, Prentice Hall, 1998

Figure 1.2 Conventional computer having a single processor and memory. [Figure: a processor connected to main memory; instructions flow to the processor, data flows to and from it.]


Figure 1.3 Traditional shared memory multiprocessor model. [Figure: processors connected through an interconnection network to memory modules forming one address space.]


Figure 1.4 Message-passing multiprocessor model (multicomputer). [Figure: computers, each with a processor and local memory, exchanging messages over an interconnection network.]


Figure 1.5 Shared memory multiprocessor implementation. [Figure: computers, each with a processor and a portion of the shared memory, connected by an interconnection network carrying messages.]


Figure 1.6 MPMD structure. [Figure: each processor executes its own program instructions on its own data.]


Figure 1.7 Static link multicomputer. [Figure: computers, each with processor (P), memory (M), and communication interface (C), joined by a network with direct links between computers.]


Figure 1.8 Node with a switch for internode message transfers. [Figure: a computer (node) containing a processor, memory, and a switch; the switch provides links to other nodes.]


Figure 1.9 A link between two nodes with separate wires in each direction.


Figure 1.10 Ring.


Figure 1.11 Two-dimensional array (mesh). [Figure: computers/processors connected by links in a two-dimensional grid.]


Figure 1.12 Tree structure. [Figure: processing elements connected by links, with a single root node.]


Figure 1.13 Three-dimensional hypercube. [Figure: eight nodes labeled 000 through 111; two nodes are linked when their addresses differ in exactly one bit.]


Figure 1.14 Four-dimensional hypercube. [Figure: sixteen nodes labeled 0000 through 1111, formed from two three-dimensional hypercubes with corresponding nodes linked.]


Figure 1.15 Embedding a ring onto a torus. [Figure: the ring traced through the links of the torus.]


Figure 1.16 Embedding a mesh into a hypercube. [Figure: a 4 × 4 mesh with nodal address 1011 highlighted; the row and column indices take the values 00, 01, 11, 10 (Gray-code order), and x marks the column index.]


Figure 1.17 Embedding a tree into a mesh. [Figure: a tree with root A laid out on mesh nodes.]


Figure 1.18 Distribution of flits. [Figure: a packet's flits move head-first through flit buffers, controlled by request/acknowledge signal(s).]


Figure 1.19 A signaling method between processors for wormhole routing (Ni and McKinley, 1993). [Figure: source and destination processors connected by data lines and an R/A (request/acknowledge) line.]


Figure 1.20 Network delay characteristics. [Figure: network latency against distance (number of nodes between source and destination); packet switching latency grows with distance, while wormhole routing and circuit switching are nearly distance-independent.]


Figure 1.21 Deadlock in store-and-forward networks. [Figure: messages between nodes 1 through 4 each wait for the next node's buffer, forming a cycle.]


Figure 1.22 Multiple virtual channels mapped onto a single physical channel. [Figure: virtual channel buffers at each node sharing one physical link along the route between the nodes.]


Figure 1.23 Ethernet-type single wire network. [Figure: workstations and a workstation/file server attached to a shared Ethernet cable.]


Figure 1.24 Ethernet frame format. [Frame fields, in transmission order: preamble (64 bits), destination address (48 bits), source address (48 bits), type (16 bits), data (variable), frame check sequence (32 bits).]


Figure 1.25 Network of workstations connected via a ring. [Figure: workstations and a workstation/file server attached to a ring network.]


Figure 1.26 Star connected network. [Figure: workstations connected to a central workstation/file server.]


Figure 1.27 Overlapping connectivity Ethernets. [Figure: a parallel programming cluster formed (a) using specially designed adaptors and (b) using separate Ethernet interfaces.]


Figure 1.28 Space-time diagram of a message-passing program. [Figure: processes 1 through 4 against time; segments show computing and waiting to send a message, and arrows between processes show messages, the slope indicating the time to send a message.]


Figure 1.29 Parallelizing a sequential problem (Amdahl's law). [Figure: (a) one processor executes the whole problem in time ts; the serial section takes f·ts and the parallelizable sections take (1 - f)ts. (b) With n processors, the parallelizable part takes (1 - f)ts/n, giving parallel time tp = f·ts + (1 - f)ts/n.]


Figure 1.30 (a) Speedup factor S(n) against number of processors n (curves for f = 0%, 5%, 10%, and 20%). (b) Speedup factor S(n) against serial fraction f (curves for n = 16 and n = 256).


Figure 2.1 Single program, multiple data operation. [Figure: one source file compiled to suit each processor, producing executables for processor 0 through processor n - 1.]


Figure 2.2 Spawning a process. [Figure: process 1 calls spawn(), which starts execution of process 2; time runs downward.]


Figure 2.3 Passing a message between processes using send() and recv() library calls. [Figure: process 1 executes send(&x, 2), process 2 executes recv(&y, 1); the data moves from x to y.]


Figure 2.4 Synchronous send() and recv() library calls using a three-way protocol. [Figure: (a) when send() occurs before recv(), process 1 issues a request to send and suspends until process 2 reaches recv() and returns an acknowledgment; the message is then transferred and both processes continue. (b) When recv() occurs before send(), process 2 suspends until the request to send arrives; the message is transferred, the acknowledgment returned, and both processes continue.]


Figure 2.5 Using a message buffer. [Figure: process 1's send() deposits the message in a message buffer and the process continues; process 2's recv() later reads the message buffer.]


Figure 2.6 Broadcast operation. [Figure: processes 0 through n - 1 each call bcast(); the data in the root's buffer (buf) is delivered to every process.]


Figure 2.7 Scatter operation. [Figure: processes 0 through n - 1 each call scatter(); the ith element of the root's data array is delivered to the buffer (buf) of process i.]


Figure 2.8 Gather operation. [Figure: processes 0 through n - 1 each call gather(); the data item from each process is collected into the root's array.]


Figure 2.9 Reduce operation (addition). [Figure: processes 0 through n - 1 each call reduce(); the data items are combined with + and the result is placed in the root's buffer (buf).]


Figure 2.10 Message passing between workstations using PVM. [Figure: each workstation runs a PVM daemon and an application program (executable); messages between application programs are sent through the network via the daemons.]


Figure 2.11 Multiple processes allocated to each processor (workstation). [Figure: each workstation runs a PVM daemon and one or more application programs (executables); messages are sent through the network.]


Figure 2.12 pvm_psend() and pvm_precv() system calls. [Figure: process 1 packs an array holding data into a send buffer, calls pvm_psend(), and continues; process 2 calls pvm_precv(), waits for the message, and receives the data into an array.]


Process_1:
    pvm_initsend();
    pvm_pkint( &x );
    pvm_pkstr( s );
    pvm_pkfloat( &y );
    pvm_send( process_2 );

Process_2:
    pvm_recv( process_1 );
    pvm_upkint( &x );
    pvm_upkstr( s );
    pvm_upkfloat( &y );

Figure 2.13 PVM packing messages, sending, and unpacking. [Figure: x, s, and y are packed into Process_1's send buffer, sent as a message, and unpacked from Process_2's receive buffer.]


Master

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "pvm3.h"
#define SLAVE "spsum"
#define PROC 10
#define NELEM 1000

main() {
    int mytid, tids[PROC];
    int n = NELEM, nproc = PROC;
    int no, i, who, msgtype;
    int data[NELEM], result[PROC], tot = 0;
    char fn[255];
    FILE *fp;

    mytid = pvm_mytid();                  /* Enroll in PVM */

    /* Start slave tasks */
    no = pvm_spawn(SLAVE, (char **)0, 0, "", nproc, tids);
    if (no < nproc) {
        printf("Trouble spawning slaves\n");
        for (i = 0; i < no; i++) pvm_kill(tids[i]);
        pvm_exit(); exit(1);
    }

    /* Open input file and initialize data */
    strcpy(fn, getenv("HOME"));
    strcat(fn, "/pvm3/src/rand_data.txt");
    if ((fp = fopen(fn, "r")) == NULL) {
        printf("Can't open input file %s\n", fn);
        exit(1);
    }
    for (i = 0; i < n; i++) fscanf(fp, "%d", &data[i]);

    /* Broadcast data to slaves */
    pvm_initsend(PvmDataDefault);
    msgtype = 0;
    pvm_pkint(&nproc, 1, 1);
    pvm_pkint(tids, nproc, 1);
    pvm_pkint(&n, 1, 1);
    pvm_pkint(data, n, 1);
    pvm_mcast(tids, nproc, msgtype);

    /* Get results from slaves */
    msgtype = 5;
    for (i = 0; i < nproc; i++) {
        pvm_recv(-1, msgtype);
        pvm_upkint(&who, 1, 1);
        pvm_upkint(&result[who], 1, 1);
        printf("%d from %d\n", result[who], who);
    }

    /* Compute global sum */
    for (i = 0; i < nproc; i++) tot += result[i];
    printf("The total is %d.\n\n", tot);

    pvm_exit();                           /* Program finished. Exit PVM */
    return 0;
}

Slave

#include <stdio.h>
#include "pvm3.h"
#define PROC 10
#define NELEM 1000

main() {
    int mytid;
    int tids[PROC];
    int n, me, i, msgtype;
    int x, low, high, nproc, master;
    int data[NELEM], sum = 0;

    mytid = pvm_mytid();                  /* Determine my tid */

    /* Receive data from master */
    msgtype = 0;
    pvm_recv(-1, msgtype);
    pvm_upkint(&nproc, 1, 1);
    pvm_upkint(tids, nproc, 1);
    pvm_upkint(&n, 1, 1);
    pvm_upkint(data, n, 1);

    /* Determine my position in the tid array */
    for (i = 0; i < nproc; i++)
        if (mytid == tids[i]) { me = i; break; }

    /* Add my portion of data */
    x = n/nproc;
    low = me * x;
    high = low + x;
    for (i = low; i < high; i++)
        sum += data[i];

    /* Send result to master */
    pvm_initsend(PvmDataDefault);
    pvm_pkint(&me, 1, 1);
    pvm_pkint(&sum, 1, 1);
    msgtype = 5;
    master = pvm_parent();
    pvm_send(master, msgtype);

    pvm_exit();                           /* Exit PVM */
    return 0;
}

Figure 2.14 Sample PVM program.



Figure 2.15 Unsafe message passing with libraries. [Figure: (a) intended behavior: process 0's first send(…,1,…) matches process 1's first recv(…,0,…), and the send/recv pair inside lib() match each other. (b) Possible behavior: the recv(…,0,…) inside process 1's lib() matches process 0's first send(…,1,…), so messages reach the wrong destinations.]


#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#define MAXSIZE 1000

int main(int argc, char *argv[])
{
    int myid, numprocs;
    int data[MAXSIZE], i, x, low, high, myresult = 0, result;
    char fn[255];
    FILE *fp;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    if (myid == 0) {                      /* Open input file and initialize data */
        strcpy(fn, getenv("HOME"));
        strcat(fn, "/MPI/rand_data.txt");
        if ((fp = fopen(fn, "r")) == NULL) {
            printf("Can't open the input file: %s\n\n", fn);
            exit(1);
        }
        for (i = 0; i < MAXSIZE; i++) fscanf(fp, "%d", &data[i]);
    }

    /* Broadcast data */
    MPI_Bcast(data, MAXSIZE, MPI_INT, 0, MPI_COMM_WORLD);

    /* Add my portion of data */
    x = MAXSIZE/numprocs;
    low = myid * x;
    high = low + x;
    for (i = low; i < high; i++)
        myresult += data[i];
    printf("I got %d from %d\n", myresult, myid);

    /* Compute global sum */
    MPI_Reduce(&myresult, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (myid == 0) printf("The sum is %d.\n", result);

    MPI_Finalize();
    return 0;
}

Figure 2.16 Sample MPI program.


Figure 2.17 Theoretical communication time. [Figure: communication time against number of data items (n); a straight line whose intercept is the startup time.]


Figure 2.18 Growth of the function f(x) = 4x^2 + 2x + 12. [Figure: f(x) plotted against x (vertical axis 0 to 160) together with c1·g(x) = 2x^2 and c2·g(x) = 6x^2; for x beyond x0 = 3, f(x) lies between the two bounds.]


Figure 2.19 Broadcast in a three-dimensional hypercube. [Figure: nodes 000 through 111; the message spreads from node 000 along one new dimension per step, labeled 1st step, 2nd step, and 3rd step.]


Figure 2.20 Broadcast as a tree construction. [Figure: P000 holds the message; step 1: P000 sends to P001; step 2: P000 sends to P010 and P001 sends to P011; step 3: P000, P010, P001, and P011 send to P100, P110, P101, and P111.]


Figure 2.21 Broadcast in a mesh. [Figure: the message spreads from one node across the mesh in steps 1 through 6.]


Figure 2.22 Broadcast on an Ethernet network. [Figure: a single message from the source reaches all destinations on the shared medium.]


Figure 2.23 1-to-N fan-out broadcast. [Figure: the source sends sequentially to N destinations.]


Figure 2.24 1-to-N fan-out broadcast on a tree structure. [Figure: each node issues its messages sequentially to its children.]


Figure 2.25 Space-time diagram of a parallel program. [Figure: processes 1 through 3 against time; shading distinguishes computing, waiting, and message-passing system routines, with arrows showing messages.]


Figure 2.26 Program profile. [Figure: histogram of number of repetitions or time against statement number or regions of program (1 through 10).]


Figure 3.1 Disconnected computational graph (embarrassingly parallel problem). [Figure: input data divided among independent processes, each producing part of the results with no communication between processes.]


Figure 3.2 Practical embarrassingly parallel computational graph with dynamic process creation and the master-slave approach. [Figure: the master spawn()s the slaves and send()s initial data; each slave recv()s the data, computes, and send()s its result back; the master recv()s to collect results.]


Figure 3.3 Partitioning into regions for individual processes. [Figure: a 640 × 480 map; (a) a square region (80 × 80) for each process; (b) a row region (640 × 10) for each process.]


Figure 3.4 Mandelbrot set. [Figure: the set displayed in the complex plane, with the real and imaginary axes each running from -2 to +2.]


Figure 3.5 Work pool approach. [Figure: a work pool holding tasks (xa, ya), (xb, yb), (xc, yc), (xd, yd), (xe, ye); each process takes a task and returns results/requests a new task.]


Figure 3.6 Counter termination. [Figure: a counter of rows outstanding in slaves (count), incremented as each row is sent and decremented as each row is returned; rows run from 0 to disp_height, and the computation terminates when the counter returns to zero after all rows have been sent.]


Figure 3.7 Computing π by a Monte Carlo method. [Figure: a circle of area π inscribed in a square of total area 4; the fraction of random points falling inside the circle estimates π/4.]


Figure 3.8 Function being integrated in computing π by a Monte Carlo method. [Figure: y = sqrt(1 - x^2) for 0 ≤ x ≤ 1.]


Figure 3.9 Parallel Monte Carlo integration. [Figure: a separate random-number process supplies random numbers to the slaves on request; the slaves return partial sums to the master.]


Figure 3.10 Parallel computation of a sequence. [Figure: the sequence x1, x2, …, xk-1, xk, xk+1, xk+2, …, x2k-1, x2k computed in blocks of k elements.]


Figure 4.1 Partitioning a sequence of numbers into parts and adding the parts. [Figure: the numbers x0 … xn-1 divided into m parts (x0 … x(n/m)-1, xn/m … x(2n/m)-1, …, x(m-1)n/m … xn-1); each part is added with + to form a partial sum, and the partial sums are added to form the sum.]


Figure 4.2 Tree construction. [Figure: an initial problem divided repeatedly into parts, forming a tree whose leaves are the final tasks.]


Figure 4.3 Dividing a list into parts. [Figure: the original list x0 … xn-1 held by P0; P0 passes half to P4, then P0 and P4 pass quarters to P2 and P6, then eighths go to P1, P3, P5, and P7.]


Figure 4.4 Partial summation. [Figure: x0 … xn-1 summed in parts by P0 through P7; P1, P3, P5, and P7 pass their partial sums to P0, P2, P4, and P6, then P2 and P6 pass theirs to P0 and P4, and finally P4 passes to P0, which holds the final sum.]


Figure 4.5 Part of a search tree. [Figure: found/not-found results combined with OR operations up the tree.]


Figure 4.6 Quadtree.


Figure 4.7 Dividing an image. [Figure: the image area divided first into four parts, with each part divided again in a second division.]


Figure 4.8 Bucket sort. [Figure: unsorted numbers distributed into buckets; the contents of each bucket are sorted and the lists merged to give the sorted numbers.]


Figure 4.9 One parallel version of bucket sort. [Figure: p processors, one per bucket; unsorted numbers are distributed into the buckets, each processor sorts the contents of its bucket, and the lists are merged to give the sorted numbers.]


Figure 4.10 Parallel version of bucket sort. [Figure: each of the p processors takes n/m numbers and partitions them into small buckets; the small buckets are emptied into the large buckets (one per processor), each processor sorts the contents of its large bucket, and the lists are merged to give the sorted numbers.]


Figure 4.11 All-to-all broadcast. [Figure: processes 0 through n - 1, each with a send buffer of n - 1 items and a receive buffer; every process sends to and receives from every other process.]


Figure 4.12 Effect of all-to-all on an array. [Figure: before the all-to-all, process Pi holds row Ai,0 Ai,1 Ai,2 Ai,3; afterward Pi holds column A0,i A1,i A2,i A3,i, i.e., the array is transposed.]


Figure 4.13 Numerical integration using rectangles. [Figure: the area under f(x) between p and q approximated by rectangles, with f(p) and f(q) marked.]


Figure 4.14 More accurate numerical integration using rectangles. [Figure: the area under f(x) between p and q, with rectangle heights taken at the interval midpoints.]


Figure 4.15 Numerical integration using the trapezoidal method. [Figure: the area under f(x) between p and q approximated by trapezoids, with f(p) and f(q) marked.]


Figure 4.16 Adaptive quadrature construction. [Figure: f(x) with successively smaller intervals; subdivision stops when the smallest region C is sufficiently small.]


Figure 4.17 Adaptive quadrature with false termination. [Figure: a region where C = 0 even though the interval has not converged, causing premature termination.]


Figure 4.18 Clustering distant bodies. [Figure: a distant cluster of bodies replaced by a single body at its center of mass.]


Figure 4.19 Recursive division of two-dimensional space. [Figure: particles in a square region; the space is subdivided (the subdivision direction alternating), giving a partial quadtree.]


Figure 4.20 Orthogonal recursive bisection method.


Figure 4.21 Process diagram for Problem 4-12(b). [Figure: groups of log n numbers are added with +, and the results are combined through a binary tree of + operations to produce the result.]


Figure 4.22 Bisection method for finding the zero crossing location of a function. [Figure: y = f(x) crossing zero between x = a and x = b, with f(a) and f(b) of opposite sign.]


Figure 4.23 Convex hull (Problem 4-22).


Figure 5.1 Pipelined processes. [Figure: processes P0 through P5 connected in a line, each passing its output to the next.]


Figure 5.2 Pipeline for an unfolded loop. [Figure: five stages, each holding one element a[0] through a[4]; the running sum enters each stage on sin and leaves on sout.]


Figure 5.3 Pipeline for a frequency filter. [Figure: the input signal f(t) passes through stages, each taking fin and producing fout; successive stages remove frequencies f0, f1, f2, f3, …, yielding the signal without those frequencies, the filtered signal.]


Figure 5.4 Space-time diagram of a pipeline. [Figure: stages P0 through P5 against time; instance 1 enters P0 first and instances 2 through 7 follow one stage behind, so the pipeline takes p - 1 cycles to fill and then completes one instance per cycle.]


Figure 5.5 Alternative space-time diagram. [Figure: instances 0 through 4 against time, each passing through stages P0 through P5 in turn.]


Figure 5.6 Pipeline processing 10 data elements. [Figure: (a) pipeline structure: input sequence d9 d8 d7 d6 d5 d4 d3 d2 d1 d0 feeding stages P0 through P9. (b) Timing diagram: stages P0 through P9 against time; each element advances one stage per cycle, so after p - 1 cycles the pipeline is full and the n elements complete after n + p - 1 cycles.]


Figure 5.7 Pipeline processing where information passes to next stage before end of process. [Figure: stages P0 through P5 against time; (a) processes with the same execution time, each passing information sufficient to start the next process partway through; (b) processes not with the same execution time, with information passed to the next stage as it becomes available.]


Figure 5.8 Partitioning processes onto processors. [Figure: twelve pipeline processes mapped onto three processors: P0 through P2 on processor 0, P3 through P6 on processor 1, and P7 through P11 on processor 2.]


Figure 5.9 Multiprocessor system with a line configuration (host computer feeding a chain of processors).

Figure 5.10 Pipelined addition: each process P0 to P4 adds its number to the accumulating partial sum and passes the sum on.
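The Figure 5.10 pipeline can be sketched sequentially; the function name and the simulation of the stages as a loop are illustrative assumptions (the book's version distributes the stages across processes with send/recv):

```python
def pipelined_add(numbers):
    """Simulate the Figure 5.10 pipeline: stage i holds numbers[i] and
    adds it to the partial sum passed along the line of processes."""
    partial = 0
    for stage_value in numbers:      # one "process" per number
        partial += stage_value       # recv partial sum, add, send on
    return partial

print(pipelined_add([1, 2, 3, 4, 5]))  # -> 15
```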

Figure 5.11 Pipelined addition of numbers with a master process and ring configuration: the master feeds d0 to dn-1 into slave P0, and the sum returns from Pn-1 to the master.

Figure 5.12 Pipelined addition of numbers with direct access to slave processes: the master sends d0, d1, ..., dn-1 directly to slaves P0 to Pn-1, and the sum emerges from Pn-1.

Figure 5.13 Steps in insertion sort with five numbers: the sequence 4, 3, 1, 2, 5 enters the pipeline of processes P0 to P4, one number per time cycle.

Figure 5.14 Pipeline for sorting using insertion sort: the series of numbers xn-1 ... x1 x0 enters P0; each process compares, keeps the largest number received so far (xmax), and passes smaller numbers onward, so P0 ends with the largest number and later processes with the next largest.
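A sequential Python sketch of the Figure 5.14 behaviour (an assumption for illustration; the real pipeline runs the stages concurrently with send/recv):

```python
def pipeline_insertion_sort(seq):
    """Each stage keeps the largest number it has received and passes
    the smaller one to the next stage, simulated sequentially."""
    stages = []                  # stages[i] holds the number kept by Pi
    for x in seq:
        for i in range(len(stages)):
            if x > stages[i]:                 # incoming number larger:
                stages[i], x = x, stages[i]   # keep it, pass old one on
        stages.append(x)         # number reaches a new, empty stage
    return stages                # descending: P0 holds the largest

print(pipeline_insertion_sort([4, 3, 1, 2, 5]))  # -> [5, 4, 3, 2, 1]
```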

Figure 5.15 Insertion sort with results returned to the master process using a bidirectional line configuration: the master sends dn-1 ... d2 d1 d0 into P0 ... Pn-1 and receives back the sorted sequence.

Figure 5.16 Insertion sort with results returned: a sorting phase of n steps is followed by returning the sorted numbers, 2n - 1 steps in total (shown for n = 5, processes P0 to P4, against time).

Figure 5.17 Pipeline for sieve of Eratosthenes: the series of numbers xn-1 ... x1 x0 enters P0; each process holds one prime (P0 the 1st, P1 the 2nd, P2 the 3rd, ...), compares incoming numbers against multiples of it, and passes on only numbers that are not multiples.
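The Figure 5.17 pipeline maps naturally onto chained generators; this Python sketch is an assumed sequential stand-in for the message-passing stages:

```python
def sieve_stage(source, prime):
    """One pipeline process: hold a prime, forward non-multiples."""
    for n in source:
        if n % prime != 0:
            yield n

def pipelined_sieve(limit):
    """Append a new stage each time a number survives all stages;
    the first survivor of the existing pipeline is the next prime."""
    primes = []
    stream = iter(range(2, limit + 1))
    while True:
        try:
            p = next(stream)
        except StopIteration:
            return primes
        primes.append(p)
        stream = sieve_stage(stream, p)   # add a stage holding p

print(pipelined_sieve(30))  # -> [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```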

Figure 5.18 Solving an upper triangular set of linear equations using a pipeline: P0 computes x0; P1 receives x0 and computes x1; P2 receives x0, x1 and computes x2; P3 receives x0, x1, x2 and computes x3; each process forwards all earlier values along with its own.

Figure 5.19 Pipeline processing using back substitution: the first computed value is passed onward through processes P0 to P5, and the final computed value emerges from the last process (processes against time).

Figure 5.20 Operations in back substitution pipeline (against time). P0: divide; send(x0); end. P1: recv(x0); send(x0); multiply/add; divide/subtract; send(x1); end. Each later process Pi repeats recv(xj); send(xj); multiply/add for j = 0 to i - 1, then performs divide/subtract; send(xi); end (shown for P0 to P4).
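The multiply/add and divide/subtract steps of Figure 5.20 can be sketched sequentially. This assumes a triangular system in which x0 is computed first and row i involves only x0..xi (matching the pipeline's order); the function name and example system are illustrative:

```python
def substitution_solve(a, b):
    """Mirror the Figure 5.20 pipeline: "process" i receives
    x0..x(i-1), does one multiply/add per received value, then one
    divide/subtract to produce xi."""
    n = len(b)
    x = []
    for i in range(n):
        s = b[i]
        for j in range(i):          # multiply/add for each received xj
            s -= a[i][j] * x[j]
        x.append(s / a[i][i])       # divide/subtract step -> send(xi)
    return x

# 2*x0 = 4;  x0 + 2*x1 = 6;  x0 + x1 + 2*x2 = 10
print(substitution_solve([[2, 0, 0], [1, 2, 0], [1, 1, 2]], [4, 6, 10]))
# -> [2.0, 1.0, 3.5]... computed: x0=2, x1=(6-2)/2=2, x2=(10-2-2)/2=3
```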

Figure 5.21 Pipeline for Problem 5-9: four stages holding coefficients a1 to a4 and inputs x1 to x4; each stage transforms yin into yout, producing the output sequence y4 y3 y2 y1.

Figure 5.22 Audio histogram display: (a) pipeline solution; (b) direct decomposition (digitized audio input processed for display in both cases).

Figure 6.1 Processes reaching the barrier at different times: processes P0, P1, P2, ..., Pn-1 are active until they reach the barrier, then wait until all have arrived (shown against time).

Figure 6.2 Library call barriers: processes P0, P1, ..., Pn-1 each call Barrier(); processes wait until all reach their barrier call.

Figure 6.3 Barrier using a centralized counter: each process P0, P1, ..., Pn-1 calling Barrier() increments counter C and checks for n.
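A minimal sketch of the Figure 6.3 counter barrier using Python threads (an assumption for illustration; this version is single-use, and `threading.Barrier` is the reusable library equivalent):

```python
import threading

class CounterBarrier:
    """Centralized counter barrier: each arrival increments a counter;
    the arrival that makes count == n releases everyone."""
    def __init__(self, n):
        self.n = n
        self.count = 0
        self.cond = threading.Condition()

    def wait(self):
        with self.cond:
            self.count += 1              # increment ...
            if self.count == self.n:     # ... and check for n
                self.cond.notify_all()
            else:
                while self.count < self.n:
                    self.cond.wait()

results = []
barrier = CounterBarrier(4)

def worker(i):
    barrier.wait()          # no thread proceeds until all four arrive
    results.append(i)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(sorted(results))  # -> [0, 1, 2, 3]
```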

Figure 6.4 Barrier implementation in a message-passing system. The master runs an arrival phase, for(i=0;i<n;i++) recv(Pany);, followed by a departure phase, for(i=0;i<n;i++) send(Pi);. Each slave's barrier is send(Pmaster); recv(Pmaster);.

Figure 6.5 Tree barrier: processes P0 to P7; synchronizing messages combine pairwise toward one process on arrival at the barrier, then fan back out for departure from the barrier.

Figure 6.6 Butterfly construction: processes P0 to P7 synchronize pairwise in a 1st, 2nd, and 3rd stage over time, pairing with partners at increasing distances.

Figure 6.7 Data parallel computation: the single instruction a[] = a[] + k; causes each processor to execute a[i] = a[i] + k; on its own element, a[0], a[1], ..., a[n-1].

Figure 6.8 Data parallel prefix sum operation on 16 numbers x0 to x15: in add step 1 (j = 0), step 2 (j = 1), step 3 (j = 2), and the final step (j = 3), every element with index i >= 2^j adds the element 2^j positions to its left.
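The Figure 6.8 steps can be sketched as a loop; each step's updates are independent and would run simultaneously on a data parallel machine (the function name and the sequential simulation are assumptions):

```python
def data_parallel_prefix_sum(x):
    """In step j, every element i >= 2**j adds the value 2**j places
    to its left; `old` captures the values all "processors" would
    read before writing."""
    x = list(x)
    j = 0
    while (1 << j) < len(x):
        shift = 1 << j
        old = list(x)                    # values before this step
        for i in range(shift, len(x)):   # all i >= 2**j "in parallel"
            x[i] = old[i] + old[i - shift]
        j += 1
    return x

print(data_parallel_prefix_sum([1] * 8))  # -> [1, 2, 3, 4, 5, 6, 7, 8]
```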

Figure 6.9 Convergence rate: the error between the computed value and the exact value decreases with each iteration (t, t + 1, ...).

Figure 6.10 Allgather operation: each of process 0, process 1, ..., process n - 1 calls Allgather(), contributing its data xi from its send buffer; every process's receive buffer ends up with x0, x1, ..., xn-1.

Figure 6.11 Effects of computation and communication in Jacobi iteration: overall, communication, and computation execution times plotted against the number of processors p (execution times on the order of 1 x 10^6 to 2 x 10^6 time units, p from 0 to 32).

Figure 6.12 Heat distribution problem: an enlarged view of a metal plate shows point hi,j updated from its four neighbors hi-1,j, hi+1,j, hi,j-1, and hi,j+1.
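The Figure 6.12 update rule, averaged over the four neighbours, can be sketched as one Jacobi iteration (the grid, function names, and fixed-boundary assumption are illustrative):

```python
def jacobi_step(h):
    """One Jacobi iteration: every interior point becomes the average
    of its four neighbours; boundary values are held fixed."""
    n, m = len(h), len(h[0])
    new = [row[:] for row in h]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = 0.25 * (h[i-1][j] + h[i+1][j]
                                + h[i][j-1] + h[i][j+1])
    return new

# 3x3 plate: one hot edge at 100, the rest at 0; the single interior
# point relaxes toward the average of its fixed neighbours (25).
grid = [[100, 100, 100], [0, 0, 0], [0, 0, 0]]
for _ in range(50):
    grid = jacobi_step(grid)
print(round(grid[1][1], 2))  # -> 25.0
```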

Figure 6.13 Natural ordering of heat distribution problem: the k x k array of points is numbered x1, x2, ..., xk, xk+1, xk+2, ..., x2k, ... row by row up to xk*k; point xi has neighbors xi-1, xi+1, xi-k, and xi+k.

Figure 6.14 Message passing for heat distribution problem. The process at row i, column j (and likewise each of its neighbors) executes:
send(g, Pi-1,j); send(g, Pi+1,j); send(g, Pi,j-1); send(g, Pi,j+1);
recv(w, Pi-1,j); recv(x, Pi+1,j); recv(y, Pi,j-1); recv(z, Pi,j+1);

Figure 6.15 Partitioning heat distribution problem: the grid divided into blocks or into strips (columns), assigned to processes P0, P1, ..., Pp-1.

Figure 6.16 Communication consequences of partitioning: square blocks versus strips for an n x n grid.

Figure 6.17 Startup times for block and strip partitions: regions of tstartup (0 to 2000) and processor count p (1 to 1000) in which the strip partition or the block partition is best.

Figure 6.18 Configuring the array into contiguous rows for each process, with ghost points: one row of points is copied between the array held by process i and the array held by process i + 1.

Figure 6.19 Room for Problem 6-14: a 10 ft x 10 ft room; a 4 ft section of one wall is held at 100°C and the remaining walls at 20°C.

Figure 6.20 Road junction for Problem 6-16 (vehicles approaching a junction).

Figure 6.21 Figure for Problem 6-23: airflow over an object, actual dimensions selected at will.

Figure 7.1 Load balancing (processors P0 to P5 against time): (a) imperfect load balancing leading to increased execution time; (b) perfect load balancing.

Figure 7.2 Centralized work pool: the master process holds a queue of tasks; slave worker processes request tasks (and possibly submit new tasks), and the master sends tasks in response.
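A sketch of the Figure 7.2 structure with a shared queue standing in for the master's task queue (the worker function, the stand-in computation, and the thread count are assumptions):

```python
import queue
import threading

def work_pool(tasks, nworkers):
    """Workers repeatedly request a task from the pool, process it,
    and record a result; the pool plays the master's queue."""
    pool = queue.Queue()
    for t in tasks:
        pool.put(t)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = pool.get_nowait()     # request a task
            except queue.Empty:
                return                       # pool drained
            with lock:
                results.append(task * task)  # stand-in computation

    workers = [threading.Thread(target=worker) for _ in range(nworkers)]
    for w in workers: w.start()
    for w in workers: w.join()
    return sorted(results)

print(work_pool(range(5), 3))  # -> [0, 1, 4, 9, 16]
```

All tasks are enqueued before the workers start, so a worker seeing an empty queue can safely exit; a pool that accepts newly submitted tasks would need a termination test like those of Figures 7.9 to 7.11.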

Figure 7.3 A distributed work pool: the master Pmaster sends initial tasks to processes M0 to Mn-1, each of which holds its own pool and serves its own slaves.

Figure 7.4 Decentralized work pool: processes exchange requests and tasks directly with one another.

Figure 7.5 Decentralized selection algorithm requesting tasks between slaves: slaves Pi and Pj each run a local selection algorithm to decide where to direct requests.

Figure 7.6 Load balancing using a pipeline structure: the master process feeds tasks to P0, which passes them along the line P1, P2, P3, ..., Pn-1.

Figure 7.7 Using a communication process in line load balancing: each stage has a communication process Pcomm and a task process Ptask. Pcomm makes a request for a task if its buffer is empty, receives tasks from its requests, and sends a task on if its buffer is full; Ptask requests a task from Pcomm when free and receives it.

Figure 7.8 Load balancing using a tree: tasks are passed down from P0 through P1 to P6 when requested by child processes.

Figure 7.9 Termination using message acknowledgments: a process becomes active on receiving its first task from its parent and returns to inactive only after sending the final acknowledgment for it; tasks received from other processes are acknowledged immediately.

Figure 7.10 Ring termination detection algorithm: a token is passed from P0 through P1, P2, ..., Pn-1, each process passing it to the next processor when it has reached its local termination condition.

Figure 7.11 Process algorithm for local termination: the token is forwarded when the process has terminated locally AND holds the token.

Figure 7.12 Passing a task to previous processes: a task is sent from process Pi back to an earlier process Pj in the ring P0, ..., Pj, ..., Pi, ..., Pn-1.

Figure 7.13 Tree termination: terminated signals are combined pairwise through AND gates up a tree, the root indicating global termination.

Figure 7.14 Climbing a mountain: base camp A, possible intermediate camps (C, D, E, ...), and summit F.

Figure 7.15 Graph of mountain climb: camps A to F as vertices with weighted edges (weights including 8, 9, 10, 13, 14, 17, 24, and 51) giving the times between them.

Figure 7.16 Representing a graph: (a) adjacency matrix, with rows as source vertices and columns as destinations, entries holding the edge weights (10, 13, 14, 17, 24, 51, ...); (b) adjacency list, each source vertex A to F holding a linked list of destination/weight entries (e.g. A: B 10; B: C 8, D 13; ...), terminated by NULL.

Figure 7.17 Moore's shortest-path algorithm: considering the edge from vertex i to vertex j with weight wi,j, the distance to j is updated by dj = min(dj, di + wi,j).
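The Figure 7.17 update rule drives a queue-based search; this Python sketch keeps a queue of vertices whose distance changed. The example graph's edge set is an assumption loosely modeled on the mountain-climb figures:

```python
from collections import deque

def moore_shortest_paths(graph, source):
    """Moore's algorithm: when vertex i's distance improves, re-examine
    its edges and apply dj = min(dj, di + w).  `graph` maps each vertex
    to a list of (neighbour, weight) pairs."""
    dist = {v: float('inf') for v in graph}
    dist[source] = 0
    q = deque([source])          # vertices whose distance changed
    while q:
        i = q.popleft()
        for j, w in graph[i]:
            if dist[i] + w < dist[j]:
                dist[j] = dist[i] + w
                q.append(j)      # j's distance changed: revisit it
    return dist

g = {'A': [('B', 10)], 'B': [('C', 8), ('D', 13)],
     'C': [('D', 14), ('E', 24)], 'D': [('E', 9)],
     'E': [('F', 17)], 'F': []}
print(moore_shortest_paths(g, 'A'))
```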

Figure 7.18 Distributed graph search: the master process starts at the source vertex; each of processes A, B, C, ... holds a vertex with its edge weights w[] and current distance dist, and sends new distances to the processes of neighboring vertices.

Figure 7.19 Sample maze for Problem 7-9: entrance, search path, and exit.

Figure 7.20 Plan of rooms for Problem 7-10: an entrance and a room containing gold.

Figure 7.21 Graph representation for Problem 7-10: rooms A and B connected by a door.

Figure 8.1 Shared memory multiprocessor using a single bus: processors with caches and memory modules all attached to one bus.

TABLE 8.1 SOME EARLY PARALLEL PROGRAMMING LANGUAGES

Language           Originator/date                Comments
Concurrent Pascal  Brinch Hansen, 1975a           Extension to Pascal
Ada                U.S. Dept. of Defense, 1979b   Completely new language
Modula-P           Bräunl, 1986c                  Extension to Modula 2
C*                 Thinking Machines, 1987d       Extension to C for SIMD systems
Concurrent C       Gehani and Roome, 1989e        Extension to C
Fortran D          Fox et al., 1990f              Extension to Fortran for data parallel programming

a. Brinch Hansen, P. (1975), "The Programming Language Concurrent Pascal," IEEE Trans. Software Eng., Vol. 1, No. 2 (June), pp. 199-207.
b. U.S. Department of Defense (1981), The Programming Language Ada Reference Manual, Lecture Notes in Computer Science, No. 106, Springer-Verlag, Berlin.
c. Bräunl, T., R. Norz (1992), Modula-P User Manual, Computer Science Report, No. 5/92 (August), Univ. Stuttgart, Germany.
d. Thinking Machines Corp. (1990), C* Programming Guide, Version 6, Thinking Machines System Documentation.
e. Gehani, N., and W. D. Roome (1989), The Concurrent C Programming Language, Silicon Press, New Jersey.
f. Fox, G., S. Hiranandani, K. Kennedy, C. Koelbel, U. Kremer, C. Tseng, and M. Wu (1990), Fortran D Language Specification, Technical Report TR90-141, Dept. of Computer Science, Rice University.

Figure 8.2 FORK-JOIN construct: the main program spawns processes with FORK statements; the spawned processes and the main program meet at JOIN statements.

Figure 8.3 Differences between a process and threads: (a) a process has code, heap, interrupt routines, files, and a single instruction pointer (IP) and stack; (b) threads share the code, heap, interrupt routines, and files, but each thread has its own IP and stack.

Figure 8.4 pthread_create() and pthread_join(): the main program executes pthread_create(&thread1, NULL, proc1, &arg); the new thread runs proc1(&arg) { ... return(*status); } while the main program continues; pthread_join(thread1, *status); waits for thread1 to terminate.
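The same create/join pattern can be expressed with Python's threading module (an analogy, not the book's pthreads code; Python threads return results through shared state rather than a *status pointer):

```python
import threading

result = {}

def proc1(arg):
    """Thread body; storing into `result` stands in for return(*status)."""
    result['status'] = arg * 2

# analogous to pthread_create(&thread1, NULL, proc1, &arg);
thread1 = threading.Thread(target=proc1, args=(21,))
thread1.start()
# analogous to pthread_join(thread1, &status);
thread1.join()
print(result['status'])  # -> 42
```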

Figure 8.5 Detached threads: the main program creates threads with pthread_create(); each thread runs to termination without being joined.

Figure 8.6 Conflict in accessing shared variable x: process 1 and process 2 each read x, add 1, and write the result back, so one update can be lost.

Figure 8.7 Control of critical sections through busy waiting: each process spins with while (lock == 1) do_nothing; then sets lock = 1;, executes its critical section, and releases with lock = 0;. Process 2 cannot enter its critical section until process 1 resets the lock.
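Figure 8.7's busy-wait lock is unsafe as plain code because the test and set of `lock` are not atomic. A mutex gives the same critical-section structure safely; this sketch (the thread function and counts are illustrative) protects the Figure 8.6 style of shared update:

```python
import threading

counter = 0
lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        with lock:           # entry protocol (replaces busy waiting)
            counter += 1     # critical section
        # exit protocol: lock released on leaving the `with` block

threads = [threading.Thread(target=increment, args=(10000,))
           for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # -> 20000 (no lost updates)
```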

Figure 8.8 Deadlock (deadly embrace): (a) two-process deadlock, processes P1 and P2 each holding one of resources R1 and R2 while requesting the other; (b) n-process deadlock with P1, P2, ..., Pn-1, Pn and R1, R2, ..., Rn-1, Rn in a cycle.

Figure 8.9 False sharing in caches: a main-memory block (blocks 0 to 7, identified by an address tag) holds several variables; processors 1 and 2 each cache the same block while accessing different variables within it.

Figure 8.10 Shared memory locations for Section 8.4.1 program example: sum and the array a[] at address addr.

Figure 8.11 Shared memory locations for Section 8.4.2 program example: global_index, sum, and the array a[] at address addr.

TABLE 8.2 LOGIC CIRCUIT DESCRIPTION FOR FIGURE 8.12

Gate     Function  Input 1  Input 2  Output
Gate1    AND       Test1    Test2    Gate1
Output1  NOT       Gate1             Output1
Output2  OR        Test3    Output1  Output2

Figure 8.12 Sample logic circuit (inputs Test1, Test2, Test3; outputs Output1, Output2).

Figure 8.13 River and frog for Problem 8-23: a frog crossing a river on moving logs.

Figure 8.14 Thread pool for Problem 8-24: a master signals a pool of slave threads, which service incoming requests.

Figure 9.1 Finding the rank in parallel: a[i] is compared with each of a[0] to a[n-1], each comparison incrementing counter x; finally b[x] = a[i].
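Rank sort as in Figure 9.1 can be sketched directly; each iteration of the outer loop is independent, which is what makes the comparisons parallelizable (this sequential version and its name are illustrative, and it assumes distinct values):

```python
def rank_sort(a):
    """The rank of a[i] is the number of elements smaller than it;
    a[i] is then placed at b[rank]."""
    n = len(a)
    b = [None] * n
    for i in range(n):          # each i could be a separate process
        x = 0                   # the counter of Figure 9.1
        for j in range(n):
            if a[j] < a[i]:
                x += 1
        b[x] = a[i]             # b[x] = a[i]
    return b

print(rank_sort([50, 10, 40, 20, 30]))  # -> [10, 20, 30, 40, 50]
```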

Figure 9.2 Parallelizing the rank computation: comparisons of a[i] with a[0], a[1], a[2], a[3] each yield 0/1; the results are summed in a tree of adders (0/1 pairs to 0/1/2, then to 0/1/2/3/4).

Figure 9.3 Rank sort using a master and slaves: the master reads the numbers into a[]; slaves compute ranks and place the selected numbers into b[].

Figure 9.4 Compare and exchange on a message-passing system, Version 1. Step 1: P1 executes send(A) to P2. Step 2: P2 compares A with its own B, returning the smaller number (if A > B send(B) else send(A)) and keeping the larger (if A > B load A else load B).

Figure 9.5 Compare and exchange on a message-passing system, Version 2. Step 1: P1 executes send(A); step 2: P2 executes send(B); step 3: both compare, P1 keeping the smaller number (if A > B load B) and P2 the larger (if A > B load A).

Figure 9.6 Merging two sublists, Version 1: P1 sends its sorted sublist 25, 28, 50, 88 to P2; P2 merges it with its own 42, 43, 80, 98 into 25, 28, 42, 43, 50, 80, 88, 98, keeps the higher numbers 50, 80, 88, 98, and returns the lower numbers 25, 28, 42, 43 to P1 as its final numbers.

Figure 9.7 Merging two sublists, Version 2: P1 and P2 exchange their original sublists (25, 28, 50, 88 and 42, 43, 80, 98); each merges the full set 25, 28, 42, 43, 50, 80, 88, 98, with P1 keeping the lower numbers and P2 the higher numbers as final numbers.

Figure 9.8 Steps in bubble sort: starting from the original sequence, phase 1 places the largest number, phase 2 the next largest number, phase 3 the next, and so on over time.

Figure 9.9 Overlapping bubble sort actions in a pipeline: phases 1 to 4 proceed over time, each phase beginning before the preceding phase has completed.

Figure 9.10 Odd-even transposition sort sorting eight numbers: processes P0 to P7 perform compare-and-exchange steps over time, alternating between even pairs (P0,P1), (P2,P3), ... and odd pairs (P1,P2), (P3,P4), ....
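The Figure 9.10 steps can be sketched as a sequential simulation; within a step every compare-and-exchange is independent and would run in parallel (the function name is an assumption):

```python
def odd_even_transposition_sort(a):
    """n steps alternating compare-and-exchange between even pairs
    (0,1),(2,3),... and odd pairs (1,2),(3,4),..."""
    a = list(a)
    n = len(a)
    for step in range(n):
        start = 0 if step % 2 == 0 else 1    # even or odd pairing
        for i in range(start, n - 1, 2):     # all pairs "in parallel"
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

print(odd_even_transposition_sort([4, 2, 7, 8, 5, 1, 3, 6]))
# -> [1, 2, 3, 4, 5, 6, 7, 8]
```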

Figure 9.11 Snakelike sorted list: the smallest number at one corner and the largest at the other, rows alternating in direction.

Figure 9.12 Shearsort on a 4 x 4 mesh of the numbers 1 to 16: (a) original placement of numbers; (b) phase 1, row sort (alternating directions); (c) phase 2, column sort; (d) phase 3, row sort; (e) phase 4, column sort; (f) final phase, row sort.

Figure 9.13 Using the transpose operation to maintain operations in rows: (a) operations between elements in rows; (b) transpose operation; (c) operations between elements in rows (originally columns).

Figure 9.14 Mergesort using tree allocation of processes: the unsorted list is divided in half at each level, sorted sublists are merged in pairs back up the tree, and processes (P0, P4, P2, P6, ...) are allocated to the tree nodes, producing the sorted list at P0.

Figure 9.15 Quicksort using tree allocation of processes: the unsorted list is partitioned about a pivot at each level, sublists being allocated to processes (P0, P4, P2, P6, ...) until the sorted list results.

Figure 9.16 Quicksort showing pivot withheld in processes: each pivot remains with its process rather than being passed into a sublist, the pivots together forming the sorted list.

Figure 9.17 Work pool implementation of quicksort: slave processes request sublists from the work pool and return partitioned sublists to it.

Figure 9.18 Hypercube quicksort algorithm when the numbers are originally in node 000: (a) phase 1 splits the numbers about pivot p1 between node pairs across the highest dimension; (b) phase 2 uses pivots p2 and p3 (numbers <= p2 and > p2, <= p3 and > p3); (c) phase 3 uses pivots p4 to p7, leaving ordered partitions across nodes 000 to 111.

Figure 9.19 Hypercube quicksort algorithm when numbers are distributed among nodes: in phase 1 pivot p1 is broadcast, in phase 2 pivots p2 and p3, and in phase 3 pivots p4 to p7; in each phase every node pair splits its numbers into those <= pivot and those > pivot across nodes 000 to 111.

Figure 9.20 Hypercube quicksort communication: (a) phase 1, (b) phase 2, and (c) phase 3 exchanges between nodes 000 to 111, each phase crossing one dimension of the hypercube.

Figure 9.21 Quicksort hypercube algorithm with Gray code ordering: pivots p1, then p2 and p3, then p4 to p7 are broadcast as in Figure 9.19, but with the nodes ordered 000, 001, 011, 010, 110, 111, 101, 100 so that the final partitions lie in Gray code sequence.

Figure 9.22 Odd-even merging of two sorted lists: a[] = 2 4 5 8 and b[] = 1 3 6 7; merging the even indices gives c[] = 1 2 5 6 and the odd indices d[] = 3 4 7 8; a final set of compare-and-exchange operations produces the sorted list e[].
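The Figure 9.22 construction can be sketched directly. The even/odd sublist merges use an ordinary merge here (Batcher's construction applies the scheme recursively); the function names and equal-even-length assumption are illustrative:

```python
def merge(a, b):
    """Ordinary two-way merge of sorted lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

def odd_even_merge(a, b):
    """Merge even-indexed elements into c[], odd-indexed into d[],
    interleave, then one sweep of compare-and-exchange operations
    on the interior pairs finishes the merge."""
    c = merge(a[0::2], b[0::2])
    d = merge(a[1::2], b[1::2])
    e = []
    for x, y in zip(c, d):                # interleave c and d
        e += [x, y]
    for i in range(1, len(e) - 1, 2):     # compare-and-exchange pass
        if e[i] > e[i + 1]:
            e[i], e[i + 1] = e[i + 1], e[i]
    return e

print(odd_even_merge([2, 4, 5, 8], [1, 3, 6, 7]))
# -> [1, 2, 3, 4, 5, 6, 7, 8]
```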

Figure 9.23 Odd-even mergesort: an odd mergesort of a1 ... an and an even mergesort of b1 ... bn feed a final set of compare-and-exchange operations producing c1 ... c2n.

Figure 9.24 Bitonic sequences a0, a1, a2, a3, ..., an-2, an-1 (value against index): (a) single maximum; (b) single maximum and single minimum.

Figure 9.25 Creating two bitonic sequences from one bitonic sequence by compare-and-exchange operations.

Figure 9.26 Sorting a bitonic sequence with compare-and-exchange operations to produce a sorted list.

Figure 9.27 Bitonic mergesort: successive bitonic sorting operations, with alternating directions of increasing numbers, turn the unsorted numbers into a sorted list.

Figure 9.28 Bitonic mergesort on eight numbers: compare-and-exchange of ai with ai+n/2 first forms bitonic lists of four numbers (sort steps with n = 2), then a bitonic list of eight numbers, which the final phase sorts with splits of n = 8, n = 4, and n = 2 into higher and lower halves.
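The scheme of Figures 9.24 to 9.28 can be sketched recursively in Python. This is a minimal sequential sketch of the compare-and-exchange pattern; a parallel version would execute each rank of comparisons simultaneously:

```python
def bitonic_sort(seq, ascending=True):
    """Bitonic mergesort (sketch); len(seq) must be a power of two.
    Sort one half ascending and the other descending to form a bitonic
    sequence, then merge it."""
    if len(seq) <= 1:
        return list(seq)
    half = len(seq) // 2
    first = bitonic_sort(seq[:half], True)
    second = bitonic_sort(seq[half:], False)
    return bitonic_merge(first + second, ascending)

def bitonic_merge(seq, ascending):
    """Sort a bitonic sequence: compare-and-exchange a_i with a_{i+n/2}
    to split it into two bitonic halves, then recurse on each half."""
    if len(seq) <= 1:
        return list(seq)
    half = len(seq) // 2
    lo, hi = list(seq[:half]), list(seq[half:])
    for i in range(half):
        if (lo[i] > hi[i]) == ascending:
            lo[i], hi[i] = hi[i], lo[i]
    return bitonic_merge(lo, ascending) + bitonic_merge(hi, ascending)
```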

Figure 9.29 Compare-and-exchange algorithm for Problem 9-5 (three steps shown; terminates when insertions reach the top/bottom of the lists).

Figure 10.1 An n × m matrix: elements a0,0 to an-1,m-1 organized in rows and columns.

Figure 10.2 Matrix multiplication, C = A × B: element ci,j is formed by multiplying row i of A by column j of B and summing the results.

Figure 10.3 Matrix-vector multiplication, c = A × b: element ci is the row sum of the products of row i of A with the vector b.

Figure 10.4 Block matrix multiplication: submatrix products are multiplied and the results summed.

Figure 10.5 Submatrix multiplication: (a) the 4 × 4 matrices A and B partitioned into 2 × 2 submatrices; (b) multiplying A0,0 × B0,0 and adding A0,1 × B1,0 to obtain C0,0.
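The block scheme of Figures 10.4 and 10.5 computes Cp,q as the sum over r of the submatrix products Ap,r × Br,q. A minimal Python sketch using plain lists (the function name and the block-size parameter s are illustrative):

```python
def block_matmul(A, B, s):
    """Block (submatrix) matrix multiplication of n x n matrices with
    s x s blocks, n divisible by s: C_{p,q} = sum over r of A_{p,r} B_{r,q}."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for p in range(0, n, s):            # block row of C
        for q in range(0, n, s):        # block column of C
            for r in range(0, n, s):    # accumulate one submatrix product
                for i in range(p, p + s):
                    for j in range(q, q + s):
                        C[i][j] += sum(A[i][k] * B[k][j]
                                       for k in range(r, r + s))
    return C
```

In a parallel implementation each (p, q) block would be assigned to a different processor; here the loops simply reproduce the arithmetic.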

Figure 10.6 Direct implementation of matrix multiplication: processor Pi,j holds row a[i][] and column b[][j] and computes c[i][j].

Figure 10.7 Accumulation using a tree construction: the products a0,0b0,0, a0,1b1,0, a0,2b2,0, and a0,3b3,0 formed by P0 to P3 are summed pairwise to produce c0,0.

Figure 10.8 Submatrix multiplication and summation: processor pairs P0 + P1, P2 + P3, P4 + P5, and P6 + P7 form Cpp, Cpq, Cqp, and Cqq respectively from the submatrices of A and B.

Figure 10.9 Movement of A and B elements through processor Pi,j.

Figure 10.10 Step 2: alignment of elements of A and B. Row i of A is shifted i places and column j of B is shifted j places, so that Pi,j holds ai,j+i and bi+j,j.

Figure 10.11 Step 4: one-place shift of elements of A and B through processor Pi,j.

Figure 10.12 Matrix multiplication using a systolic array: rows a0 to a3 of A are pumped in from one side, each row delayed one cycle from the previous, against columns b0 to b3 of B, accumulating the results c0,0 to c3,3.

Figure 10.13 Matrix-vector multiplication using a systolic array: the rows of A are pumped in against the vector elements b0 to b3, producing c0 to c3.

Figure 10.14 Gaussian elimination: stepping through the rows j below row i, element aji is used to clear column i to zero; columns to the left have already been cleared.

Figure 10.15 Broadcast in parallel implementation of Gaussian elimination: the n - i + 1 elements of row i (including b[i]) are broadcast to the rows below; columns to the left are already cleared to zero.

Figure 10.16 Pipeline implementation of Gaussian elimination: rows are broadcast along the processors P0, P1, P2, …, Pn-1.
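The computation being parallelized in Figures 10.14 to 10.16 is forward elimination. A minimal sequential Python sketch, without the partial pivoting a robust version would need:

```python
def gaussian_elimination(A, b):
    """Forward elimination (sketch, no pivoting): after step i, column i
    is cleared to zero below the diagonal, as in Figure 10.14. Modifies
    A and b in place and returns them as an upper-triangular system."""
    n = len(A)
    for i in range(n):                 # row i is the row broadcast in parallel versions
        for j in range(i + 1, n):
            m = A[j][i] / A[i][i]      # multiplier for row j
            for k in range(i, n):
                A[j][k] -= m * A[i][k]
            b[j] -= m * b[i]
    return A, b
```

Back substitution on the resulting triangular system then yields the solution.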

Figure 10.17 Strip partitioning: rows 0 to n/p - 1 are allocated to P0, rows n/p to 2n/p - 1 to P1, rows 2n/p to 3n/p - 1 to P2, and so on.

Figure 10.18 Cyclic partitioning to equalize workload: successive rows are allocated to P0, P1, … in round-robin fashion.

Figure 10.19 Finite difference method: the solution space of f(x, y) over the x-y region.

Figure 10.20 Mesh of points x1 to x100, numbered in natural order; boundary points are discussed in the text.

Figure 10.21 Sparse matrix for Laplace's equation: the ith equation has nonzero coefficients only at ai,i-n, ai,i-1, ai,i, ai,i+1, and ai,i+n. Equations with a boundary point on the diagonal are unnecessary for the solution, and the right-hand side is extended to include boundary values and some zero entries (see text).

Figure 10.22 Gauss-Seidel relaxation with natural order, computed sequentially: points already computed are used in computing the next point.

Figure 10.23 Red-black ordering: mesh points are colored red and black in checkerboard fashion.
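The point of the ordering in Figure 10.23 is that all red points can be updated simultaneously (each depends only on black neighbors), then all black points. A minimal sequential sketch for Laplace's equation, where each interior point is replaced by the mean of its four neighbors:

```python
def red_black_iteration(u):
    """One red-black Gauss-Seidel sweep on grid u (list of lists):
    update all 'red' interior points (i + j even), then all 'black'
    points (i + j odd), each as the mean of its four neighbors.
    Boundary values are left fixed. Modifies u in place."""
    n, m = len(u), len(u[0])
    for colour in (0, 1):              # red sweep, then black sweep
        for i in range(1, n - 1):
            for j in range(1, m - 1):
                if (i + j) % 2 == colour:
                    u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                      + u[i][j - 1] + u[i][j + 1])
    return u
```

Within one sweep all points of one color are independent, so in a parallel version they are distributed among processors with no ordering constraints.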

Figure 10.24 Nine-point stencil.


Figure 10.25 Multigrid processor allocation: coarsest grid points and finer grid points mapped onto processors.

Figure 10.26 Printed circuit board for Problem 10-18, with regions at 40°C, 50°C, and 60°C and an ambient temperature at the edges of the board of 20°C.

Figure 11.1 Pixmap: picture elements (pixels) p(i, j) with origin (0, 0).

Figure 11.2 Image histogram: number of pixels at each gray level, 0 to 255.

Figure 11.3 Pixel values x0 to x8 for a 3 × 3 group.

Figure 11.4 Four-step data transfer for the computation of the mean: each pixel adds the pixel from the left (step 1), from the right (step 2), from above (step 3), and from below (step 4).

Figure 11.5 Parallel mean data accumulation: over steps (a) to (d), the partial sums x0 + x1 + x2, x3 + x4 + x5, and x6 + x7 + x8 are built up and combined at every pixel.

Figure 11.6 Approximate median algorithm requiring six steps: largest in row, next largest in row, then next largest in column.

Figure 11.7 Using a 3 × 3 weighted mask: weights w0 to w8 are applied to pixels x0 to x8 to produce the result x4'.

Figure 11.8 Mask to compute the mean (scale factor k = 1/9).
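Figures 11.7 and 11.8 together describe a weighted-mask (convolution) step: each new interior pixel is k times the weighted sum of its 3 × 3 neighborhood. A minimal Python sketch (the function name and parameters are illustrative):

```python
def apply_mask(image, w, k):
    """Apply a 3 x 3 weighted mask w with scale factor k to every
    interior pixel of image: new pixel = k * sum(w_i * x_i), as in
    Figures 11.7 and 11.8. Border pixels are left unchanged."""
    n, m = len(image), len(image[0])
    out = [row[:] for row in image]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            s = sum(w[a][b] * image[i - 1 + a][j - 1 + b]
                    for a in range(3) for b in range(3))
            out[i][j] = k * s
    return out
```

With all weights 1 and k = 1/9 this computes the mean of Figure 11.8; other masks (Figures 11.9 and 11.10) use the same loop with different weights and scale factors.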

Figure 11.9 A noise reduction mask (scale factor k = 1/16).

Figure 11.10 High-pass sharpening filter mask (scale factor k = 1/9).

Figure 11.11 Edge detection using differentiation: an intensity transition and its first and second derivatives.

Figure 11.12 Gray level gradient and direction: the gradient of the image intensity f(x, y) is perpendicular to lines of constant intensity.

Figure 11.13 Prewitt operator.


Figure 11.14 Sobel operator.

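The Sobel operator of Figure 11.14 combines two 3 × 3 masks, one for the x-gradient and one for the y-gradient; the gradient magnitude is commonly approximated by |Gx| + |Gy| rather than the square root of the sum of squares. A sketch using the pixel numbering of Figure 11.3:

```python
def sobel_magnitude(x):
    """Sobel operator on one 3 x 3 neighborhood x[0..8] (row-major,
    as in Figure 11.3). Returns the approximate gradient magnitude
    |Gx| + |Gy|."""
    # horizontal gradient: right column minus left column, center weighted 2
    gx = (x[2] + 2 * x[5] + x[8]) - (x[0] + 2 * x[3] + x[6])
    # vertical gradient: bottom row minus top row, center weighted 2
    gy = (x[6] + 2 * x[7] + x[8]) - (x[0] + 2 * x[1] + x[2])
    return abs(gx) + abs(gy)
```

A uniform neighborhood gives 0, while a sharp vertical or horizontal transition gives a large response, which is what Figure 11.15(b) shows over a whole image.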

Figure 11.15 Edge detection with the Sobel operator: (a) original image (Annabel); (b) effect of Sobel operator.

Figure 11.16 Laplace operator.


Figure 11.17 Pixels used in the Laplace operator: the center pixel x4 with the upper pixel x1, left pixel x3, right pixel x5, and lower pixel x7.

Figure 11.18 Effect of Laplace operator.


Figure 11.19 Mapping a line into (a, b) space: a line y = ax + b through a pixel (x1, y1) in (a) the (x, y) plane maps to the line b = -x1a + y1 in (b) the parameter space, so lines through the pixel intersect at the point (a, b).

Figure 11.20 Mapping a line into (r, θ) space: a line y = ax + b in (a) the (x, y) plane maps to the point (r, θ) in (b) the (r, θ) plane, where r = x cos θ + y sin θ.

Figure 11.21 Normal representation using the image coordinate system.

Figure 11.22 Accumulators, acc[r][θ], for the Hough transform (axes show r = 0, 5, 10, 15 and θ = 0°, 10°, 20°, 30°).
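The voting step behind Figures 11.20 to 11.22 can be sketched as follows: each edge pixel votes, for every quantized angle θ, in the accumulator cell for r = x cos θ + y sin θ. A minimal Python sketch (the function name, the angle resolution, and the rounding scheme are illustrative choices):

```python
import math

def hough_accumulate(edge_points, r_max, n_theta=32):
    """Hough transform voting (sketch): for each edge pixel (x, y) and each
    of n_theta quantized angles in [0, pi), increment acc[r][t] where
    r = round(x cos(theta) + y sin(theta)), as in Figure 11.22."""
    acc = [[0] * n_theta for _ in range(r_max + 1)]
    for x, y in edge_points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            r = round(x * math.cos(theta) + y * math.sin(theta))
            if 0 <= r <= r_max:
                acc[r][t] += 1
    return acc
```

Collinear pixels all vote for the same (r, θ) cell, so peaks in the accumulator identify lines in the image.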

Figure 11.23 Two-dimensional DFT: transform the rows of xjk to give Xjm, then transform the columns to give Xlm.

Figure 11.24 Convolution using Fourier transforms: (a) direct convolution of the image fj,k with the filter hj,k gives gj,k; (b) using the Fourier transform, f(j, k) and h(j, k) are transformed to F(j, k) and H(j, k), multiplied to give G(j, k), and inverse-transformed to give g(j, k).

Figure 11.25 Master-slave approach for implementing the DFT directly: slave processes compute X[0] to X[n-1] using the powers w0, w1, …, wn-1.

Figure 11.26 One stage of a pipeline implementation of the DFT algorithm: process j holds x[j], adds a × x[j] into X[k], and multiplies a by wk; X[k] and a are the values for the next iteration.

Figure 11.27 Discrete Fourier transform with a pipeline: (a) pipeline structure, processes P0 to PN-1 holding x[0] to x[N-1] and producing the output sequence X[0], X[1], X[2], X[3], …; (b) timing diagram of the pipeline stages P0 to PN-1 producing X[0] to X[6] over time.

Figure 11.28 Decomposition of an N-point DFT into two N/2-point DFTs: the input sequence x0, x1, x2, x3, …, xN-2, xN-1 is split into even- and odd-indexed halves, transformed by N/2-point DFTs into Xeven and Xodd, and combined as Xk = Xeven + wkXodd and Xk+N/2 = Xeven - wkXodd for k = 0, 1, …, N/2 - 1.
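Applying the decomposition of Figure 11.28 recursively gives the radix-2 decimation-in-time FFT. A minimal Python sketch:

```python
import cmath

def fft(x):
    """Radix-2 decimation-in-time FFT (sketch): split x into even- and
    odd-indexed halves and combine with twiddle factors
    w^k = e^{-2*pi*i*k/N}, as in Figure 11.28. len(x) must be a
    power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0] * N
    for k in range(N // 2):
        w = cmath.exp(-2j * cmath.pi * k / N)
        out[k] = even[k] + w * odd[k]            # X_k
        out[k + N // 2] = even[k] - w * odd[k]   # X_{k+N/2}
    return out
```

The recursive even/odd splitting is what produces the bit-reversed input ordering shown in Figures 11.30 and 11.31.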

Figure 11.29 Four-point discrete Fourier transform: inputs x0 to x3, outputs X0 to X3.

Figure 11.30 Sixteen-point DFT decomposition:
Xk = (0,2,4,6,8,10,12,14) + wk(1,3,5,7,9,11,13,15)
= {(0,4,8,12) + wk(2,6,10,14)} + wk{(1,5,9,13) + wk(3,7,11,15)}
= {[(0,8) + wk(4,12)] + wk[(2,10) + wk(6,14)]} + wk{[(1,9) + wk(5,13)] + wk[(3,11) + wk(7,15)]}
giving the bit-reversed input ordering x0, x8, x4, x12, x2, x10, x6, x14, x1, x9, x5, x13, x3, x11, x7, x15 (binary 0000, 1000, 0100, 1100, 0010, 1010, 0110, 1110, 0001, 1001, 0101, 1101, 0011, 1011, 0111, 1111).

Figure 11.31 Sixteen-point FFT computational flow: inputs x0 to x15, outputs X0 to X15.

Figure 11.32 Mapping processors onto the 16-point FFT computation: processes P0 to P3 each handle P/r rows; rows 0000 to 1111 take inputs x0 to x15 and produce outputs X0 to X15.

Figure 11.33 FFT using the transpose algorithm, first two steps: x0 to x15 distributed among processes P0 to P3.

Figure 11.34 Transposing the array for the transpose algorithm.

Figure 11.35 FFT using the transpose algorithm, last two steps: after transposition, P0 holds x0, x4, x8, x12; P1 holds x1, x5, x9, x13; P2 holds x2, x6, x10, x14; P3 holds x3, x7, x11, x15.

Figure 11.36 Image and mask for Problem 11-3.

Figure 12.1 State space tree: the first choice selects C0 or not including C0, the second choice C1 or not including C1, and so on through Cn-1 or not including Cn-1, followed by a third choice.

Figure 12.2 Single-point crossover: parents A = A1A2 and B = B1B2 (positions 1 to m) are cut between positions p and p + 1; child 1 is A1 followed by B2, and child 2 is B1 followed by A2.
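The recombination of Figure 12.2 can be sketched directly: both chromosomes are cut after the same position p and the tails are swapped. A minimal Python sketch (the function name and the optional random cut point are illustrative):

```python
import random

def single_point_crossover(parent_a, parent_b, p=None):
    """Single-point crossover, as in Figure 12.2: cut both parents after
    position p and swap the tails, producing two children. If p is not
    given, a random cut point is chosen."""
    if p is None:
        p = random.randrange(1, len(parent_a))
    child1 = parent_a[:p] + parent_b[p:]   # A1 + B2
    child2 = parent_b[:p] + parent_a[p:]   # B1 + A2
    return child1, child2
```

For example, crossing "AAAA" and "BBBB" at p = 2 gives the children "AABB" and "BBAA".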

Figure 12.3 Island model: subpopulations with a migration path from every island to every other island.

Figure 12.4 Stepping stone model: island subpopulations with limited migration paths.

Figure D.1 PRAM model: processors with local memory, driven by a common clock and program instructions, accessing data in a shared memory.

Figure D.2 List ranking by pointer jumping: list nodes with distance fields d[0] to d[7] and successor pointers s[0] to s[7]; distances are initialized to 1, except 0 at the final node, whose pointer is null.
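The pointer-jumping technique of Figure D.2 can be sketched in Python; the parallel update of all nodes in a round is simulated by computing the new arrays before replacing the old ones:

```python
def list_rank(next_node):
    """List ranking by pointer jumping: d[i] becomes the number of links
    from node i to the end of the list. next_node[i] is the index of the
    successor, or None at the final node. In each round every node adds
    its successor's distance and jumps its pointer, so chains halve and
    only O(log n) rounds are needed."""
    n = len(next_node)
    d = [0 if next_node[i] is None else 1 for i in range(n)]
    nxt = list(next_node)
    while any(p is not None for p in nxt):
        d2, nxt2 = d[:], nxt[:]
        for i in range(n):           # conceptually all nodes in parallel
            if nxt[i] is not None:
                d2[i] = d[i] + d[nxt[i]]
                nxt2[i] = nxt[nxt[i]]
        d, nxt = d2, nxt2
    return d
```

On the eight-node list of Figure D.2 the ranks converge after three rounds, illustrating the logarithmic step count of the PRAM algorithm.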

Figure D.3 A view of the bulk synchronous parallel model: threads or processes perform local computation (maximum time w), then communication (a maximum of h sends or receives), then barrier synchronization.

Figure D.4 LogP parameters: a message sent from processor Pi to processor Pk, and the time before Pi can send the next message.
