May 4, 2005

Programming Multicores with Pthreads and OpenMP
Nikos P. Pitsianis
nikos@cs.duke.edu
Xiaobai Sun
Bo Zhang

Duke University

Outline
Programming with Threads
Embarrassingly Parallel (Pleasantly Parallel)
Critical Sections (Mutual Exclusion)
Data Dependent Task Parallelism (Condition Variables & Signals)

Quick Introduction to OpenMP Programming

Multicore Programming Workshop

Sep 29, 2010

Duke University

What is a thread?
Process:
a program that is running
an address space with 1 or more threads executing within the same
address space, and the required system resources for those threads

Thread:
a sequence of control within a process
shares the resources in that process

We cover POSIX Threads (Pthreads) here
a widely supported threads programming API
Compile with gcc -pthread
The -pthread flag also sets the required preprocessor flags and links in the thread-safe libraries

Process
Process: a running program
a single address space
one or more threads executing within that address space
the system resources required by those threads
Each process can have multiple threads, even on a single-core processor

Threads
Thread:
a sequence of control within a process
All threads in a process share:
memory (program code and global data)
open file/socket descriptors
signal handlers and signal dispositions
working environment
Threads communicate using shared memory

Advantages and Disadvantages


Advantages:
creating a thread is significantly faster than creating a process
switching between threads is faster than switching between processes
sharing data between threads is easier, since they share memory

Disadvantages:
writing correct multithreaded programs is harder (races, deadlocks)
more difficult to debug than single-threaded programs

Outline
Programming with Threads
Embarrassingly Parallel (Pleasantly Parallel)
Critical Sections (Mutual Exclusion)
Data Dependent Task Parallelism (Condition Variables & Signals)

Quick Introduction to OpenMP Programming

Example 0: sequential code, run by the default single thread

#include <stdlib.h>
#include <stdio.h>

void getvec (double *a);   /* fills a vector; defined in dp0.c, not shown */

double dotprod (double *a, double *b, int n) {
    int i;
    double s = 0.0;
    for ( i = 0; i < n; i++ )
        s += a[i]*b[i];
    return s;
}

int main () {
    double *a, *b;

    a = (double *) malloc(sizeof(double)*N);
    b = (double *) malloc(sizeof(double)*N);

    getvec(a); getvec(b);

    double dp = dotprod(a,b,N);
    printf("%f\n", dp);

    free(a); free(b);
    return 0;
}

Source:
www.cs.duke.edu/~nikos/mpw/dp0.c
Compile:
gcc -D N=1024 -O4 dp0.c -o dp0
Run:
./dp0

Example 1: sequential code as a separate thread

#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

void getvec (double *a);
double dotprod (double *a, double *b, int n);

/* argument block handed to the thread */
typedef struct {
    double *a, *b;
    int n;
} dparg;

void *wrapper (void *arg) {
    double *ap, *bp, s;
    int nn;
    ap = ((dparg *) arg)->a;
    bp = ((dparg *) arg)->b;
    nn = ((dparg *) arg)->n;

    s = dotprod(ap, bp, nn);
    printf("%f\n", s);
    return NULL;   /* the thread terminates here */
}

Source:
www.cs.duke.edu/~nikos/mpw/dp1.c
Compile:
gcc -pthread -D N=1024 -O4 dp1.c -o dp1
Run:
./dp1

Example 1: sequential code as a separate thread

#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

void getvec (double *a);
double dotprod (double *a, double *b, int n);
void *wrapper (void *arg);
/* dparg is the typedef from the previous slide */

int main () {
    double *a, *b;
    pthread_t thread;
    dparg arg;

    a = (double *) malloc(sizeof(double)*N);
    b = (double *) malloc(sizeof(double)*N);

    getvec(a); getvec(b);

    arg.a = a;
    arg.b = b;
    arg.n = N;

    pthread_create (&thread, NULL, wrapper, (void *) &arg);
    pthread_join (thread, NULL);   /* wait for the thread to finish */

    free(a); free(b);
    return 0;
}

Source:
www.cs.duke.edu/~nikos/mpw/dp1.c
Compile:
gcc -pthread -D N=1024 -O4 dp1.c -o dp1
Run:
./dp1

Thread Creation & Termination

int pthread_create(pthread_t *tid,
                   const pthread_attr_t *attr,
                   void *(*func)(void *),
                   void *arg);
func is the function to be called; arg is passed to it.
When func() returns, the thread terminates.

Thread Creation Arguments


Arguments are passed to the thread function by creating a structure
and passing the address of the structure (arg)
Thread attributes can be set using attr:
joinable or detached state
scheduling policy
NULL for system defaults

Thread Lifespan
Once a thread is created, it starts executing the function func()
func() is an argument passed to pthread_create()

The thread is terminated
when func() returns, or
when it calls pthread_exit()

All threads are terminated
when main() returns, or
when any thread calls exit()

Joinable and Detached State


Each thread can be either joinable or detached.
Joinable: on its termination, the thread ID and exit status are saved until another thread joins it
Detached: on its termination, all resources used by the thread are released
A detached thread cannot be joined

A thread can "join" another by calling pthread_join()
The caller blocks until the specified thread exits.

int pthread_join(pthread_t tid, void **status);

Example 2: with multiple threads

#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

typedef struct { double *a, *b, s; int n, tid;
} dparg;

double dotprod (double *a, double *b, int n, int tid) {
    int i;
    double s = 0.0;
    int block = n/NTHREADS;
    int start = tid*block;
    /* the last thread also takes the leftover n % NTHREADS elements */
    int end = (tid == NTHREADS-1) ? n : start + block;

    for ( i = start; i < end; i++)
        s += a[i]*b[i];
    return s;
}

void * wrapper (void *arg) {
    double *ap, *bp;
    int nn, tid;
    ap = ((dparg *) arg)->a;
    bp = ((dparg *) arg)->b;
    nn = ((dparg *) arg)->n;
    tid = ((dparg *) arg)->tid;

    /* each thread stores its partial sum in its own argument struct */
    ((dparg *) arg)->s = dotprod(ap, bp, nn, tid);
    return NULL;
}

Source:
www.cs.duke.edu/~nikos/mpw/dp2.c
Compile:
gcc -pthread -D NTHREADS=8 -D N=1024 -O4 dp2.c -o dp2
Run:
./dp2

Example 2: with multiple threads

#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

void getvec (double *a);
/* dparg and wrapper() as on the previous slide */

int main () {
    double *a, *b, dp;
    pthread_t thread[NTHREADS];
    dparg arg[NTHREADS];
    int i;
    a = (double *) malloc(sizeof(double)*N);
    b = (double *) malloc(sizeof(double)*N);
    getvec(a); getvec(b);

    for (i=0; i<NTHREADS; i++) {
        arg[i].a = a; arg[i].b = b;
        arg[i].n = N; arg[i].tid = i;

        pthread_create (&thread[i], NULL, wrapper,
                        (void *)&arg[i]);
    }
    dp = 0.0;
    for (i=0; i<NTHREADS; i++) {
        pthread_join (thread[i], NULL);
        dp += arg[i].s;   /* thread i has finished, so its partial sum is ready */
    }
    printf("%f\n", dp);

    free(a); free(b);
    return 0;
}

Source:
www.cs.duke.edu/~nikos/mpw/dp2.c
Compile:
gcc -D NTHREADS=8 -D N=1024 \
    -pthread -O4 dp2.c -o dp2
Run:
./dp2

Outline
Programming with Threads
Embarrassingly Parallel (Pleasantly Parallel)
Critical Sections (Mutual Exclusion)
Data Dependent Task Parallelism (Condition Variables & Signals)

Quick Introduction to OpenMP Programming

Mutual Exclusion
Mutual exclusion primitives protect against races in read-update-write sequences
Get the single key (the mutex) and
lock the critical section of a program before accessing global variables
unlock as soon as you are done

pthread_mutex_t mux;
pthread_mutex_init (&mux, NULL);
pthread_mutex_lock (&mux);
/* critical section: read-update-write of shared data */
pthread_mutex_unlock (&mux);

Locking and Unlocking

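To lock:
int pthread_mutex_lock(pthread_mutex_t *mutex);

To unlock:
int pthread_mutex_unlock(pthread_mutex_t *mutex);
pthread_mutex_lock() blocks until the mutex is available; pthread_mutex_unlock() releases it and does not block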

Example 3: with Critical Section

#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

/* dparg and dotprod() as in Example 2 */

pthread_mutex_t dp_mtx;
double dp;   /* shared result, protected by dp_mtx */

void * wrapper (void *arg) {
    double *ap, *bp, s;
    int nn, tid;
    ap = ((dparg *) arg)->a;
    bp = ((dparg *) arg)->b;
    nn = ((dparg *) arg)->n;
    tid = ((dparg *) arg)->tid;

    s = dotprod(ap, bp, nn, tid);

    pthread_mutex_lock(&dp_mtx);     /* critical section: update shared dp */
    dp += s;
    pthread_mutex_unlock(&dp_mtx);

    return NULL;
}

Source:
www.cs.duke.edu/~nikos/mpw/dp3.c
Compile:
gcc -D NTHREADS=8 -D N=1024 \
    -pthread -O4 dp3.c -o dp3
Run:
./dp3

Example 3: with Critical Section

#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

void getvec (double *a);

int main () {
    double *a, *b;
    pthread_t thread[NTHREADS];
    dparg arg[NTHREADS];
    int i;

    a = (double *) malloc(sizeof(double)*N);
    b = (double *) malloc(sizeof(double)*N);
    getvec(a); getvec(b);

    dp = 0.0;
    pthread_mutex_init (&dp_mtx, NULL);

    for (i=0; i<NTHREADS; i++) {
        arg[i].a = a; arg[i].b = b;
        arg[i].n = N; arg[i].tid = i;

        pthread_create (&thread[i], NULL, wrapper,
                        (void *)&arg[i]);
    }
    for (i=0; i<NTHREADS; i++) {
        pthread_join (thread[i], NULL);
    }
    printf("%f\n", dp);

    pthread_mutex_destroy (&dp_mtx);
    free(a); free(b);
    return 0;
}

Source:
www.cs.duke.edu/~nikos/mpw/dp3.c
Compile:
gcc -D NTHREADS=8 -D N=1024 \
    -pthread -O4 dp3.c -o dp3
Run:
./dp3

Outline
Programming with Threads
Embarrassingly Parallel (Pleasantly Parallel)
Critical Sections (Mutual Exclusion)
Data Dependent Task Parallelism (Condition Variables & Signals)

Quick Introduction to OpenMP Programming

Condition Variables
Condition variables allow one thread to
wait for (sleep until) an event generated by any other thread
This lets us avoid busy waiting

Producer side of a bounded FIFO queue:

pthread_cond_t *notFull, *notEmpty;   /* members of the queue struct */
pthread_cond_init (fifo->notFull, NULL);
pthread_cond_init (fifo->notEmpty, NULL);

/* producer, for each item i: */
pthread_mutex_lock (fifo->mut);
while (fifo->full) {
    printf ("producer: queue FULL.\n");
    pthread_cond_wait (fifo->notFull, fifo->mut);
}
queueAdd (fifo, i);
pthread_mutex_unlock (fifo->mut);
pthread_cond_signal (fifo->notEmpty);

Condition Variables
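Condition variables are always used together with a mutex
int pthread_cond_wait(pthread_cond_t *cptr, pthread_mutex_t *mptr);
int pthread_cond_signal(pthread_cond_t *cptr);
pthread_cond_wait() atomically releases the mutex and sleeps; it re-acquires the mutex before returning, and because wakeups can be spurious, the condition is re-checked in a while loop
pthread_cond_signal() wakes at least one waiting thread; pthread_cond_broadcast() wakes all of them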

Example 4: with Condition Variable

#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

pthread_cond_t notEmptyVecSignal;
pthread_mutex_t vec_mtx;
pthread_mutex_t dp_mtx;
double dp;
int emptyVec;   /* 1 until main() has filled the vectors */

void * wrapper (void *arg) {
    double *ap, *bp, s;
    int nn, tid;
    /* [...] */

    pthread_mutex_lock(&vec_mtx);
    while (emptyVec) {   /* sleep until main() broadcasts */
        pthread_cond_wait(&notEmptyVecSignal, &vec_mtx);
    }
    pthread_mutex_unlock(&vec_mtx);

    s = dotprod(ap, bp, nn, tid);

    pthread_mutex_lock(&dp_mtx);
    dp += s;
    pthread_mutex_unlock(&dp_mtx);
    return NULL;
}

Source:
www.cs.duke.edu/~nikos/mpw/dp4.c
Compile:
gcc -D NTHREADS=8 -D N=1024 \
    -pthread -O4 dp4.c -o dp4
Run:
./dp4

Example 4: with Condition Variable

int main () {
    /* [...] */

    emptyVec = 1;
    pthread_mutex_init (&vec_mtx, NULL);
    pthread_cond_init (&notEmptyVecSignal, NULL);

    for (i=0; i<NTHREADS; i++) {
        arg[i].a = a; arg[i].b = b;
        arg[i].n = N; arg[i].tid = i;

        pthread_create (&thread[i], NULL, wrapper, (void *) &arg[i]);
    }

    /* the vectors are filled only after the threads have started */
    getvec(a); getvec(b);

    pthread_mutex_lock(&vec_mtx);
    emptyVec = 0;
    pthread_mutex_unlock(&vec_mtx);
    pthread_cond_broadcast (&notEmptyVecSignal);   /* wake all waiting threads */

    for (i=0; i<NTHREADS; i++) {
        pthread_join (thread[i], NULL);
    }

    printf("%f\n", dp);
    return 0;
}

Source:
www.cs.duke.edu/~nikos/mpw/dp4.c
Compile:
gcc -D NTHREADS=8 -D N=1024 \
    -pthread -O4 dp4.c -o dp4
Run:
./dp4

Outline
Programming with Threads
Embarrassingly Parallel (Pleasantly Parallel)
Critical Sections (Mutual Exclusion)
Data Dependent Task Parallelism (Condition Variables & Signals)

Quick Introduction to Programming with OpenMP

OpenMP
A set of compiler directives and library routines for parallel application programmers
OpenMP simplifies writing multithreaded programs in Fortran, C and C++
Most of the constructs in OpenMP are compiler directives:
#pragma omp construct [clause [clause]...]
#pragma omp parallel num_threads(4)

Function prototypes and types are in the file:
#include <omp.h>

Most OpenMP constructs apply to a structured block
Structured block: a block of one or more statements with one point of entry at the top and one point of exit at the bottom

Example in OpenMP
#include <omp.h>
#include <stdlib.h>
#include <stdio.h>

void getvec (double *a);

double dotprod (double *a, double *b, int n) {
    int i;
    double s = 0.0;

    /* split the iterations among threads; sum the private partial results into s */
    #pragma omp parallel for reduction(+:s)
    for ( i = 0; i < n; i++ )
        s += a[i]*b[i];
    return s;
}

int main () {
    double *a, *b;

    a = (double *) malloc(sizeof(double)*N);
    b = (double *) malloc(sizeof(double)*N);

    getvec(a); getvec(b);

    omp_set_num_threads(NTHREADS);
    double dp = dotprod(a,b,N);
    printf("%f\n", dp);

    free(a); free(b);
    return 0;
}

Source:
www.cs.duke.edu/~nikos/mpw/dp0-omp.c
Compile:
gcc -D NTHREADS=8 -D N=1024 \
    -fopenmp -O4 dp0-omp.c -o dp0-omp
Run:
./dp0-omp

OpenMP Parallel Region


#pragma omp parallel [clause ...]
        if (scalar_expression)
        private (list)
        shared (list)
        default (shared | none)
        firstprivate (list)
        reduction (operator: list)
        copyin (list)
        num_threads (n)
    structured_block

When a thread reaches a PARALLEL directive, it creates a team of threads and becomes the master of the team. The master is thread number 0 within that team.

If any thread terminates within a parallel region, all threads in the team will terminate, and the work done up until that point is undefined.

The parallel region code is executed by all threads
A barrier is implied at the end of the parallel region
Only the master thread continues execution after that barrier

OpenMP Work Sharing DO/for


#pragma omp for [clause ...]
        schedule (type [,chunk])
        ordered
        private (list)
        firstprivate (list)
        lastprivate (list)
        shared (list)
        reduction (operator: list)
        collapse (n)
        nowait
    for_loop

#pragma omp parallel for \
        shared(a,b,c) \
        private(i)
for (i=0; i < n; i++) {
    c[i] = a[i] + b[i];
}

Directive Responsibility

Work-sharing
Data scoping
Synchronization
Scheduling

Parallel region: replicated work
Each thread executes the same code
Parallel for loop: partitioned iterations
Threads share the iterations of the loop
Parallel sections: functional parallelism
Threads perform different tasks

Directive Responsibility

Work-sharing
Data scoping
Synchronization
Scheduling

Shared: threads access a single copy of the data object
Private: each thread gets its own uninitialized copy
Firstprivate: private copy initialized from the master's value
Lastprivate: the master's copy is updated with the value from the sequentially last iteration

Directive Responsibility

Work-sharing
Data scoping
Synchronization
Scheduling

#pragma omp master
    {}
#pragma omp critical
    {}
#pragma omp atomic
    count++;
#pragma omp barrier
reduction (+: sum)

Shared data with concurrent access can lead to corrupted data
Synchronization
Mutex ensures exclusive access to a critical section of code
Barrier causes a group of threads to pause until all have reached a defined point
Signaling
Conditional wait waits for some event; another thread signals when it occurs
Broadcasting signals a whole group of waiting threads

Directive Responsibility

Work-sharing
Data scoping
Synchronization
Scheduling

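Static: splits the iteration space into blocks of size chunk, assigned to the threads in round-robin order
Dynamic: assigns blocks to threads as they become idle (suits uneven workloads)
Guided: like dynamic, but the chunk size starts large and shrinks (roughly exponentially) until all iterations are assigned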
