May 17, 2015
Natal-RN
Problems solved: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 15, 16, 17, 19, 20, 21, 22, 24, 25, 26
Problem 1
What happens in the greetings program if, instead of strlen(greeting) + 1, we use strlen(greeting) for the length of the message being sent by processes 1, 2, . . . , comm_sz − 1? What happens if we use MAX_STRING instead of strlen(greeting) + 1? Can you explain these results?
Solution:
Problem 3
Determine which of the variables in the trapezoidal rule program are local and which are global.
Solution:
Problem 4
Modify the program that just prints a line of output from each process (mpi_output.c) so that the output is printed in process rank order: process 0's output first, then process 1's, and so on.
Solution:
The idea for printing in order is to exploit the fact that MPI_Recv is blocking: every rank other than zero sends its rank to process zero, which is responsible for all the printing.
int main(void) {
    int my_rank, comm_sz, i, aux;
    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
    if (my_rank == 0) {
        printf("Proc %d of %d > Does anyone have a toothpick?\n",
               my_rank, comm_sz);
        for (i = 1; i < comm_sz; i++) {
            /* blocking receive from rank i forces rank order */
            MPI_Recv(&aux, 1, MPI_INT, i, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("Proc %d of %d > Does anyone have a toothpick?\n",
                   aux, comm_sz);
        }
    } else {
        MPI_Send(&my_rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
Input: n = 4
Output:
Proc 0 of 4 > Does anyone have a toothpick?
Proc 1 of 4 > Does anyone have a toothpick?
Proc 2 of 4 > Does anyone have a toothpick?
Proc 3 of 4 > Does anyone have a toothpick?
Problem 5
In a binary tree, there is a unique shortest path from each node to the root. The length of this path is often called the depth of the node. A binary tree in which every nonleaf has two children is called a full binary tree, and a full binary tree in which every leaf has the same depth is sometimes called a complete binary tree. See Figure 3.14. Use the principle of mathematical induction to prove that if T is a complete binary tree with n leaves, then the depth of the leaves is log2(n).
Solution:
Following these steps:
1. Base case: show that the statement holds for n = 1.
2. Inductive step: show that if the statement holds for a complete binary tree with k leaves, then it also holds for one with 2k leaves.
For the base case, when the tree has only one leaf, that is, the root itself is the leaf, the depth is 0, which matches log2(1) = 0. Now note that the depth of a node is equal to the depth of its parent plus 1, and that each subtree of the root is a complete binary tree holding half of the leaves. For a tree with n leaves we can therefore deduce that:
depth(n) = depth(n/2) + 1 = log2(n/2) + 1 = log2(n) - log2(2) + 1 = log2(n) - 1 + 1 = log2(n).
This proves by induction that the leaf depth of a complete binary tree with n leaves equals log2(n).
Problem 6
Suppose comm_sz = 4 and suppose that x is a vector with n = 14 components.
a. How would the components of x be distributed among the processes in
a program that used a block distribution?
b. How would the components of x be distributed among the processes in
a program that used a cyclic distribution?
c. How would the components of x be distributed among the processes in
a program that used a block-cyclic distribution with blocksize b = 2?
You should try to make your distributions general so that they could be used regardless of what comm_sz and n are. You should also try to make your distributions fair so that if q and r are any two processes, the difference between the number of components assigned to q and the number of components assigned to r is as small as possible.
Solution:
a) Block distribution
Process 0: x0, x1, x2, x3
Process 1: x4, x5, x6, x7
Process 2: x8, x9, x10
Process 3: x11, x12, x13
b) Cyclic distribution
Process 0: x0, x4, x8, x12
Process 1: x1, x5, x9, x13
Process 2: x2, x6, x10
Process 3: x3, x7, x11
c) Block-cyclic distribution (b = 2)
Process 0: x0, x1, x8, x9
Process 1: x2, x3, x10, x11
Process 2: x4, x5, x12, x13
Process 3: x6, x7
Problem 7
What do the various MPI collective functions do if the communicator contains a single process?
Solution:
If the communicator contains only one process, the collective behaves like a serial program: the single process is both the source and the destination of the data. If there is some blocking situation involving other ranks, the program may end up blocked.
The functions tested that behaved normally with a single process were MPI_Bcast, MPI_Gather, MPI_Scatter, MPI_Allgather, MPI_Reduce and MPI_Allreduce.
Problem 8
Suppose comm_sz = 8 and n = 16.
a. Draw a diagram that shows how MPI_Scatter can be implemented using tree-structured communication with comm_sz processes when process 0 needs to distribute an array containing n elements.
b. Draw a diagram that shows how MPI_Gather can be implemented using tree-structured communication when an n-element array that has been distributed among comm_sz processes needs to be gathered onto process 0.
a)
b)
Problem 9
Write an MPI program that implements multiplication of a vector by a scalar and dot product. The user should enter two vectors and a scalar, all of which are read in by process 0 and distributed among the processes. The results are calculated and collected onto process 0, which prints them. You can assume that n, the order of the vectors, is evenly divisible by comm_sz.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

void ler_tam(int *local_n_p, int *n_p, int my_rank, int comm_sz,
             MPI_Comm comm);
void Ler_val(double *local_vetor1, double *local_vetor2,
             double *escalar_p, int local_n, int my_rank, int comm_sz,
             MPI_Comm comm);
void Imprime_vetor(double local_vec[], int local_n, int n,
                   int my_rank, MPI_Comm comm);

/* ... */
    Imprime_vetor(local_escalar_mult1, local_n, n, my_rank, comm);
    printf("\nProduct of the second vector by the scalar:\n");
    Imprime_vetor(local_escalar_mult2, local_n, n, my_rank, comm);
    free(local_escalar_mult2);
    free(local_escalar_mult1);
    free(local_vetor2);
    free(local_vetor1);
    MPI_Finalize();
    return 0;
}

void ler_tam(int *local_n_p, int *n_p, int my_rank, int comm_sz,
             MPI_Comm comm) {
    if (my_rank == 0) {
        printf("Enter the size of the vector\n");
        scanf("%d", n_p);
    }
    /* n is an int, so it must be broadcast as MPI_INT, not MPI_DOUBLE */
    MPI_Bcast(n_p, 1, MPI_INT, 0, comm);
    *local_n_p = *n_p / comm_sz;
}
Problem 10
Solution:
It is correct, because the aliasing error occurs only with output or input/output parameters. Since the program uses the two arguments only as input parameters, there is no aliasing problem, which arises when two parameters access the same block of memory.
Problem 12
An alternative to a butterfly-structured allreduce is a ring-pass structure. In a ring-pass, if there are p processes, each process q sends data to process q + 1, except that process p − 1 sends data to process 0. This is repeated until each process has the desired result. Thus, we can implement allreduce with the following code:
sum = temp_val = my_val;
for (i = 1; i < p; i++) {
    MPI_Sendrecv_replace(&temp_val, 1, MPI_INT, dest,
        sendtag, source, recvtag, comm, &status);
    sum += temp_val;
}
a. Write an MPI program that implements this algorithm for allreduce. How
does its performance compare to the butterfly-structured allreduce?
b. Modify the MPI program you wrote in the first part so that it implements
prefix sums.
Solution:
a) With the input values
Proc 0 - x
Proc 1 - y
Proc 2 - z
Proc 3 - w
the result is
Proc 0 - x+y+z+w
Proc 1 - x+y+z+w
Proc 2 - x+y+z+w
Proc 3 - x+y+z+w
b)
int i, sum, temp_val;
sum = temp_val = my_val;
for (i = 1; i < comm_sz; i++) {
    /* after this call, temp_val holds the value that originated on
       process (my_rank - i + comm_sz) % comm_sz */
    MPI_Sendrecv_replace(&temp_val, 1, MPI_INT,
        (my_rank + 1) % comm_sz, 0,
        (my_rank - 1 + comm_sz) % comm_sz, 0,
        comm, MPI_STATUS_IGNORE);
    /* for a prefix sum, only add values that came from lower ranks */
    if (i <= my_rank) sum += temp_val;
}
Output:
Input values:
Proc 0 - x
Proc 1 - y
Proc 2 - z
Proc 3 - w
Prefix sums:
Proc 0 - x
Proc 1 - x+y
Proc 2 - x+y+z
Proc 3 - x+y+z+w
Problem 14
a. Write a serial C program that defines a two-dimensional array in the main function. Just use numeric constants for the dimensions:
int two_d[3][4];
Initialize the array in the main function. After the array is initialized, call a function that attempts to print the array. The prototype for the function should look something like this:
void Print_two_d(int two_d[][], int rows, int cols);
After writing the function, try to compile the program. Can you explain why it won't compile?
b. After consulting a C reference (e.g., Kernighan and Ritchie [29]), modify the program so that it will compile and run, but so that it still uses a two-dimensional C array.
Solution:
a)
int main(void) {
    int two_d[3][4];
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 4; j++) {
            two_d[i][j] = i + j*2;
        }
    Print_two_d(two_d, 3, 4);
    return 0;
}
/* Does not compile: int two[][] gives the compiler no column
   dimension, so it cannot compute the offset of two[i][j]. */
void Print_two_d(int two[][], int rows, int cols) {
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            printf("%d ", two[i][j]);
        }
        printf("\n");
    }
}
b)
Two approaches work. The first passes the column size in the parameter declaration, so only the function's parameter list changes:
void Print_two_d(int two[][4], int rows, int cols);
int main(void) {
    int two_d[3][4];
    /* ... initialize as in part (a) ... */
    Print_two_d(two_d, 3, 4);
    return 0;
}
The second allocates the array dynamically:
int main(void) {
    int i, j;
    /* We have a vector that references each row, and each row in
       turn is a vector that references the columns of that row.
       In short, one vector per row; we saw that we can reference
       vectors through pointers. */
    int **two = (int**)malloc(3 * sizeof(int*));
    for (i = 0; i < 3; i++) {
        two[i] = (int*)malloc(4 * sizeof(int));
    }
    for (i = 0; i < 3; i++)
        for (j = 0; j < 4; j++) {
            two[i][j] = i * 2;
        }
    Print_two_d(two, 3, 4);
    for (i = 0; i < 3; i++)
        free(two[i]);   /* free each row before the row vector */
    free(two);
    return 0;
}
void Print_two_d(int **two, int rows, int cols) {
    int i, j;
    for (i = 0; i < rows; i++) {
        for (j = 0; j < cols; j++)
            printf("%d ", two[i][j]);
        printf("\n");
    }
}
Problem 15
What is the relationship between the row-major storage for two-dimensional arrays that we discussed in Section 2.2.3 and the one-dimensional storage we use in Section 3.4.9?
Solution:
The two storage schemes are the same. Both take a two-dimensional matrix and store it as a one-dimensional array: the first row is stored, then the second row, and so on.
Problem 16
Suppose comm_sz = 8 and the vector x = (0, 1, 2, . . . , 15) has been distributed among the processes using a block distribution. Draw a diagram illustrating the steps in a butterfly implementation of allgather of x.
Solution:
Problem 17
MPI_Type_contiguous can be used to build a derived datatype from a collection of contiguous elements in an array. Its syntax is
int MPI_Type_contiguous(
      int           count        /* in  */,
      MPI_Datatype  old_mpi_t    /* in  */,
      MPI_Datatype* new_mpi_t_p  /* out */);
Modify the Read_vector and Print_vector functions so that they use an MPI datatype created by a call to MPI_Type_contiguous and a count argument of 1 in the calls to MPI_Scatter and MPI_Gather.
Solution:
void Read_vector(double local_vec[], int local_n, int n,
                 char vec_name[], MPI_Datatype tipo) {
    /* ... */
    if (my_rank == 0) {
        vetor = malloc(n * sizeof(double));
        printf("Enter the vector %s\n", vec_name);
        for (i = 0; i < n; i++)
            scanf("%lf", &vetor[i]);
        /* count of 1: tipo already describes local_n doubles */
        MPI_Scatter(vetor, 1, tipo, local_vec, 1, tipo, 0, comm);
        free(vetor);
    } else {
        MPI_Scatter(NULL, 1, tipo, local_vec, 1, tipo, 0, comm);
    }
}
void Print_vector(double local_vec[], int local_n, int n,
                  char title[], MPI_Datatype tipo) {
    /* ... */
    if (my_rank == 0) {
        vec = malloc(n * sizeof(double));
        MPI_Gather(local_vec, 1, tipo, vec, 1, tipo, 0, comm);
        for (i = 0; i < n; i++)
            printf("%f ", vec[i]);
        free(vec);
    } else {
        MPI_Gather(local_vec, 1, tipo, NULL, 1, tipo, 0, comm);
    }
}
Result:
Vector size: 4
Vector 1 = 4, 4, 4, 4
Vector 1 is
4.000000 4.000000 4.000000 4.000000
Vector 2 = 4 4 4 4
Vector 2 is
4.000000 4.000000 4.000000 4.000000
The sum is
8.000000 8.000000 8.000000 8.000000
Problem 19
MPI_Type_indexed can be used to build a derived datatype from arbitrary array elements. Its syntax is
int MPI_Type_indexed(
      int           count                     /* in  */,
      int           array_of_blocklengths[]   /* in  */,
      int           array_of_displacements[]  /* in  */,
      MPI_Datatype  old_mpi_t                 /* in  */,
      MPI_Datatype* new_mpi_t_p               /* out */);
Unlike MPI_Type_create_struct, the displacements are measured in units of old_mpi_t, not bytes. Use MPI_Type_indexed to create a derived datatype that corresponds to the upper triangular part of a square matrix. For example, in the 4 × 4 matrix, the upper triangular part is the elements 0, 1, 2, 3, 5, 6, 7, 10, 11, 15. Process 0 should read in an n × n matrix as a one-dimensional array, create the derived datatype, and send the upper triangular part with a single call to MPI_Send. Process 1 should receive the upper triangular part with a single call to MPI_Recv and then print the data it received.
Solution:
//mpicc -g -Wall -std=c99 -o problema3_19 problema3_19.c
//mpiexec -n 2 ./problema3_19
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

void ler_tam(int *n_p, int my_rank, MPI_Comm comm);

int main(void) {
    int my_rank, comm_sz, n;
    double *matriz;
    int *bloco_disp;
    int *bloco_tam;
    MPI_Comm comm;
    MPI_Datatype tipo;

    MPI_Init(NULL, NULL);
    comm = MPI_COMM_WORLD;
    MPI_Comm_size(comm, &comm_sz);
    MPI_Comm_rank(comm, &my_rank);
    ler_tam(&n, my_rank, comm);
    matriz = malloc(n*n*sizeof(double));
    bloco_disp = malloc(n*sizeof(int));
    bloco_tam = malloc(n*sizeof(int));
    /* row i of the upper triangle starts at element i*n + i and has
       n - i elements */
    int disp = 0;
    int tam = n;
    for (int i = 0; i < n; i++) {
        bloco_tam[i] = tam;
        bloco_disp[i] = disp + i*n;
        disp++; tam--;
    }
    MPI_Type_indexed(n, bloco_tam, bloco_disp, MPI_DOUBLE, &tipo);
    MPI_Type_commit(&tipo);
    free(bloco_disp);
    free(bloco_tam);
    if (my_rank == 0) {
        printf("Enter the matrix values\n");
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                scanf("%lf", &matriz[j + i*n]);
        MPI_Send(matriz, 1, tipo, 1, 0, comm);
    } else if (my_rank == 1) {
        /* zero the matrix first (ran with this line, but there was
           no error without it) */
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                matriz[j + i*n] = 0;
        MPI_Recv(matriz, 1, tipo, 0, 0, comm, MPI_STATUS_IGNORE);
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++)
                printf("%.0f ", matriz[j + i*n]);
            printf("\n");
        }
    }
    free(matriz);
    MPI_Type_free(&tipo);
    MPI_Finalize();
    return 0;
}

void ler_tam(int *n_p, int my_rank, MPI_Comm comm) {
    if (my_rank == 0) {
        printf("Enter the size of n:\n");
        scanf("%d", n_p);
    }
    MPI_Bcast(n_p, 1, MPI_INT, 0, comm);
}
Problem 20
The functions MPI_Pack and MPI_Unpack provide an alternative to derived datatypes for grouping data. MPI_Pack copies the data to be sent, one block at a time, into a user-provided buffer. The buffer can then be sent and received. After the data is received, MPI_Unpack can be used to unpack it from the receive buffer. The syntax of MPI_Pack is
int MPI_Pack(
      void*         in_buf        /* in     */,
      int           in_buf_count  /* in     */,
      MPI_Datatype  datatype      /* in     */,
      void*         pack_buf      /* out    */,
      int           pack_buf_sz   /* in     */,
      int*          position_p    /* in/out */,
      MPI_Comm      comm          /* in     */);
We could therefore pack the input data to the trapezoidal rule program with the following code:
char pack_buf[100];
int position = 0;
MPI_Pack(&a, 1, MPI_DOUBLE, pack_buf, 100, &position, comm);
MPI_Pack(&b, 1, MPI_DOUBLE, pack_buf, 100, &position, comm);
MPI_Pack(&n, 1, MPI_INT, pack_buf, 100, &position, comm);
The key is the position argument. When MPI_Pack is called, position should refer to the first available slot in pack_buf. When MPI_Pack returns, it refers to the first available slot after the data that was just packed, so after process 0 executes this code, all the processes can call MPI_Bcast.
The syntax of MPI_Unpack is
int MPI_Unpack(
      void*         pack_buf       /* in     */,
      int           pack_buf_sz    /* in     */,
      int*          position_p     /* in/out */,
      void*         out_buf        /* out    */,
      int           out_buf_count  /* in     */,
      MPI_Datatype  datatype       /* in     */,
      MPI_Comm      comm           /* in     */);
This can be used by reversing the steps in MPI_Pack; that is, the data is unpacked one block at a time, starting with position = 0.
Write another Get_input function for the trapezoidal rule program. This one should use MPI_Pack on process 0 and MPI_Unpack on the other processes.
Solution:
/* ... in main, after Get_input: each process's subinterval of
   integration starts at: */
local_a = a + my_rank*local_n*h;
local_b = local_a + local_n*h;
local_int = Trap(local_a, local_b, local_n, h);
/* Get_input */
/*------------------------------------------------------------------
 * Function:    Trap
 * Purpose:     Serial function for estimating a definite integral
 *              using the trapezoidal rule
 * Input args:  left_endpt, right_endpt, trap_count, base_len
 * Return val:  Trapezoidal rule estimate of integral from
 *              left_endpt to right_endpt using trap_count trapezoids
 */
double Trap(
      double left_endpt  /* in */,
      double right_endpt /* in */,
      int    trap_count  /* in */,
      double base_len    /* in */) {
double estimate, x;
int i;
estimate = (f(left_endpt) + f(right_endpt))/2.0;
for (i = 1; i <= trap_count-1; i++) {
x = left_endpt + i*base_len;
estimate += f(x);
}
estimate = estimate*base_len;
return estimate;
} /* Trap */
/*------------------------------------------------------------------
 * Function:    f
 * Purpose:     Compute value of function to be integrated
 * Input args:  x
 */
double f(double x) {
return x*x;
} /* f */
Result:
Enter the values of a, b, n respectively
2
10
2
With n = 2 trapezoids, our estimate
of the integral from 2.000000 to 10.000000 = 6.400000000000000e+01
Problem 21
How does your system compare to ours? What run-times does your system get for matrix-vector multiplication? What kind of variability do you see in the times for a given value of comm_sz and n? Do the results tend to cluster around the minimum, the mean, or the median?
Problem 22
Time our implementation of the trapezoidal rule that uses MPI_Reduce. How will you choose n, the number of trapezoids? How do the minimum times compare to the mean and median times? What are the speedups? What are the efficiencies? On the basis of the data you collected, would you say that the trapezoidal rule is scalable?
Solution:
n was chosen so that the run-times are on the order of milliseconds, because if the measurements are not of comparable magnitude, information may be lost.
The mean is 4.13 and the median is 2.3 (2.26–2.43).
From the data we can deduce that the problem is weakly scalable, since the efficiency decreases when we increase the number of processes.
Problem 24
Take a look at Programming Assignment 3.7. The code that we outlined for
timing the cost of sending messages should work even if the count argument
is zero. What happens on your system when the count argument is 0? Can
you explain why you get a nonzero elapsed time when you send a zero-byte
message?
Solution:
The elapsed time will be nonzero because, even though the message carries no data, the envelope still contains the tag and the communicator, so an MPI connection is still maintained even when there is no data to send.
Problem 25
If comm_sz = p, we mentioned that the ideal speedup is p. Is it possible to do better?
a. Consider a parallel program that computes a vector sum. If we only time the vector sum, that is, we ignore input and output of the vectors, how might this program achieve speedup greater than p?
b. A program that achieves speedup greater than p is said to have superlinear speedup. Our vector sum example only achieved superlinear speedup by overcoming certain resource limitations. What were these resource limitations? Is it possible for a program to obtain superlinear speedup without overcoming resource limitations?
Solution:
Problem 26
Serial odd-even transposition sort of an n-element list can sort the list in
considerably fewer than n phases. As an extreme example, if the input list
is already sorted, the algorithm requires 0 phases.
a. Write a serial Is_sorted function that determines whether a list is sorted.
b. Modify the serial odd-even transposition sort program so that it checks
whether the list is sorted after each phase.
c. If this program is tested on a random collection of n-element lists,
roughly what fraction get improved performance by checking whether the
list is sorted?
Solution:
a)
int Is_sorted(int vetor[], int tam)
{
    for (int i = 1; i < tam; i++) {
        /* ascending order: a decrease means the list is not sorted */
        if (vetor[i-1] > vetor[i]) {
            return 0;
        }
    }
    return 1;
}
Result: the list is sorted
b)
int a[];
int n;
int phase, i, temp;
for (phase = 0; phase < n; phase++) {
    if (Is_sorted(a, n)) {
        break;
    }
    if (phase % 2 == 0) {
        for (i = 1; i < n; i += 2) {
            if (a[i-1] > a[i]) {
                temp = a[i];
                a[i] = a[i-1];
                a[i-1] = temp;
            }
        }
    } else {
        for (i = 1; i < n-1; i += 2) {
            if (a[i] > a[i+1]) {
                temp = a[i];
                a[i] = a[i+1];
                a[i+1] = temp;
            }
        }
    }
}