Pi-Calculation by Parallel Programming: Mr. Paopat Ratpunpairoj

PI-CALCULATION BY PARALLEL PROGRAMMING
Mr. Paopat Ratpunpairoj
I. Introduction
Compared with conventional sequential programming, parallel programming can be much faster. The speed of a parallel program depends on the number of processors, the system connecting them and the structure of the algorithm. Pi calculation is used here as an example of parallel programming on three different systems: a shared-memory system, a distributed-memory system and a GPU system.
II. Algorithms
There are many methods for calculating Pi. The Mesh, Segmentation and Monte Carlo algorithms were selected because they are easy to program on a parallel system.
A. Monte Carlo
This algorithm generates random points in the unit square [0, 1] x [0, 1] and determines whether each point lies inside or outside the unit circle. The value of Pi is 4 times the number of points inside the circle divided by the total number of points.
B. Mesh
This algorithm is similar to the first one; the difference is the method used to place the points. The points are placed on a regular grid covering the unit square, and the value of Pi is again 4 times the number of points inside the circle divided by the total number of points. A higher resolution gives a longer execution time.
C. Segmentation
This algorithm finds the area inside the circle by summing small rectangles that fill the quarter of the unit circle lying in the unit square. The value of Pi is the area of the unit circle, that is, 4 times this sum.
[Figure captions from the MPI results: B. Segment Algorithm (24 cores). This algorithm is very fast, so the execution time changes very little with the resolution of the computation. C. Monte Algorithm (24 cores).]
III. Message Passing Interface (MPI)

MPI is a method for connecting many computers into one parallel computer: each node has its own memory, and the nodes are connected by a network. The MPI programs were run on the Chemnitz High Performance Linux Cluster (CHIC).
A. Mesh Algorithm
The graphs show the relation between the number of nodes (cores), the resolution of the computation and the execution time. Increasing the number of nodes makes the program run faster, and a higher resolution makes the computation slower.
IV. GPU Programming (CUDA)

This parallel computing platform has a large number of nodes (we used a GeForce GTX 580, programmed here with up to 32 blocks); each node, or block, has a small memory of its own and can run a large number of threads. For all of the algorithms, at constant resolution, increasing the number of blocks makes the program run faster.
[Figure: A. Mesh algorithm (resolution = 32768 with 64 threads per block)]
V. OpenMP
OpenMP is a programming interface that supports shared-memory multiprocessing. Its advantage is fast access to the shared memory. The programs were run on four 6-core Istanbul processors in the Fiona cluster, with resolution = 1048576. For the Mesh and Monte algorithms, the execution time decreases as the number of cores increases.
[Figures: A. Mesh algorithm; B. Segment algorithm; C. Monte Algorithm]
[Figures for Section IV (CUDA): B. Segment algorithm (resolution = 32768 with 64 threads per block); C. Monte Algorithm (resolution = 32768 with 64 threads per block)]
VI. Conclusion
The fastest algorithm is the Mesh method. The slowest is Monte, because it calls a random-number function, and Segmentation has to evaluate a square root. At the same resolution, the most accurate algorithm is Segmentation. MPI is the most powerful method: it is fast and can be programmed for a large number of CPUs. For a small number of cores, the suitable method is OpenMP. CUDA GPU programming is fast and is the cheapest method.
Appendix
mpi-mesh.c
#include <stdio.h>
#include <math.h>
#include "mpi.h"
#include <stdlib.h>
#include <time.h>

int main(int argc, char *argv[]){
    int i,j,resolution;
    int core,np;
    int *sendbuf,*recvbuf;
    MPI_Init(&argc,&argv);
    MPI_Comm_rank(MPI_COMM_WORLD,&core);
    MPI_Comm_size(MPI_COMM_WORLD,&np);
    MPI_Status status;
    for(j=1;j<=32;j=j+2){
        sendbuf=(int *)malloc(sizeof(int));
        recvbuf=(int *)malloc(sizeof(int));
        *sendbuf=0;
        *recvbuf=0;
        double x,y,pi,time1,time2;
        long int in=0,sumin=0;
        resolution=j*j*40;
        /* ranks 0..np-2 each take `interval` rows; rank np-1 only collects */
        int interval=resolution/(np-1);
        time1=MPI_Wtime();
        for(i=0;i<np-1;i++){
            if(core==i){
                for(y=1.0*core*interval/(resolution*1.0);
                    y<(1.0*core*interval+1.0*interval)/(resolution*1.0);
                    y=y+1.0/resolution){
                    for(x=0.0;x<1.0;x=x+1.0/resolution){
                        if(x*x+y*y<=1.0) in++;
                    }
                }
                *sendbuf=in;
                /* sum the counts of all ranks on rank np-1 */
                MPI_Reduce(sendbuf,recvbuf,1,MPI_INT,MPI_SUM,np-1,MPI_COMM_WORLD);
            }
        }
        if(core==np-1){
            *sendbuf=0;
            MPI_Reduce(sendbuf,recvbuf,1,MPI_INT,MPI_SUM,np-1,MPI_COMM_WORLD);
            pi=((long double)4.0*(*recvbuf))/((long double)1.0*resolution*resolution);
            time2=MPI_Wtime();
            printf("resolution is %d,numberof core is %d The approximatio if pi is %f, time :%f\n",resolution,np ,pi,time2-time1);
        }
        free(sendbuf);
        free(recvbuf);
    }
    MPI_Finalize();
    return 0;
}

mpi-seg.c
#include <stdio.h>
#include <math.h>
#include "mpi.h"
#include <stdlib.h>
#include <time.h>

int main(int argc, char *argv[]){
    int i,j,resolution;
    int core,np;
    double *sendbuf,*recvbuf;
    MPI_Init(&argc,&argv);
    MPI_Comm_rank(MPI_COMM_WORLD,&core);
    MPI_Comm_size(MPI_COMM_WORLD,&np);
    MPI_Status status;
    for(j=1;j<=32;j=j+2){
        sendbuf=(double *)malloc(sizeof(double));
        recvbuf=(double *)malloc(sizeof(double));
        *sendbuf=0.0;
        *recvbuf=0.0;
        double x,y,a,b,pi,time1,time2,area=0;
        long int in=0,sumin=0;
        resolution=j*j*40;
        int interval=resolution/(np-1);
        time1=MPI_Wtime();
        for(i=0;i<np-1;i++){
            if(core==i){
                for(y=1.0*core*interval/(resolution*1.0);
                    y<(1.0*core*interval+1.0*interval)/(resolution*1.0);
                    y=y+1.0/resolution){
                    /* rectangle of height 1/resolution and width sqrt(1-y*y) */
                    b=sqrt(1-y*y);
                    area+=1.0/resolution*b;
                }
                *sendbuf=area;
                MPI_Reduce(sendbuf,recvbuf,1,MPI_DOUBLE,MPI_SUM,np-1,MPI_COMM_WORLD);
            }
        }
        if(core==np-1){
            *sendbuf=0;
            MPI_Reduce(sendbuf,recvbuf,1,MPI_DOUBLE,MPI_SUM,np-1,MPI_COMM_WORLD);
            pi=(4.0*(*recvbuf));
            time2=MPI_Wtime();
            printf("resolution is %d,numberof core is %d The approximatio if pi is %f, time :%f\n",resolution,np ,pi,time2-time1);
        }
        free(sendbuf);
        free(recvbuf);
    }
    MPI_Finalize();
    return 0;
}

mpi-monte.c
#include <stdio.h>
#include <math.h>
#include "mpi.h"
#include <stdlib.h>
#include <time.h>

int main(int argc, char *argv[]){
    int i,j,resolution;
    int core,np;
    int *sendbuf,*recvbuf;
    MPI_Init(&argc,&argv);
    MPI_Comm_rank(MPI_COMM_WORLD,&core);
    MPI_Comm_size(MPI_COMM_WORLD,&np);
    MPI_Status status;
    for(j=1;j<=32;j=j+2){
        sendbuf=(int *)malloc(sizeof(int));
        recvbuf=(int *)malloc(sizeof(int));
        *sendbuf=0;
        *recvbuf=0;
        double x,y,a,b,pi,time1,time2;
        long int in=0,sumin=0;
        resolution=j*j*40;
        int interval=resolution/(np-1);
        time1=MPI_Wtime();
        for(i=0;i<np-1;i++){
            if(core==i){
                /* same loop bounds as the mesh version; each step draws one random point */
                for(y=1.0*core*interval/(resolution*1.0);
                    y<(1.0*core*interval+1.0*interval)/(resolution*1.0);
                    y=y+1.0/resolution){
                    for(x=0.0;x<1.0;x=x+1.0/resolution){
                        a=(random()%10000)/(double)10000;
                        b=(random()%10000)/(double)10000;
                        if(a*a+b*b<=1.0) in=in+1;
                    }
                }
                *sendbuf=in;
                MPI_Reduce(sendbuf,recvbuf,1,MPI_INT,MPI_SUM,np-1,MPI_COMM_WORLD);
            }
        }
        if(core==np-1){
            *sendbuf=0;
            MPI_Reduce(sendbuf,recvbuf,1,MPI_INT,MPI_SUM,np-1,MPI_COMM_WORLD);
            pi=((long double)4.0*(*recvbuf))/((long double)1.0*resolution*resolution);
            time2=MPI_Wtime();
            printf("resolution is %d,numberof core is %d The approximatio if pi is %f, time :%f\n",resolution,np ,pi,time2-time1);
        }
        free(sendbuf);
        free(recvbuf);
    }
    MPI_Finalize();
    return 0;
}

omp-mesh.c
#include <stdio.h>
#include <math.h>
#include <omp.h>
#include <time.h>
#include <sys/time.h>
#include <stdlib.h>

/* wall-clock time in seconds */
double wtime() {
    struct timeval tv1;
    gettimeofday(&tv1,NULL);
    return (double)tv1.tv_sec+(double)tv1.tv_usec/1e6;
}

int main(void) {
    double x,y;
    int out=32*32*32;
    int i,j,k,iam,sumin=0;
    int *in;
    in=(int *)malloc(omp_get_max_threads()*sizeof(int));
    double rtime,rt2, rt1=wtime();
    #pragma omp parallel private(x,y,i,j,iam)
    {
        iam=omp_get_thread_num();
        in[iam]=0;   /* per-thread counter */
        #pragma omp for schedule(static) nowait
        for(i=0;i<out;i++){
            for(j=0;j<out;j++){
                if(i*i+j*j<=out*out){
                    in[iam]+=1;
                }
            }
        }
    }
    for(k=0;k<omp_get_max_threads();k++)
        sumin+=in[k];
    double pi=4.0*sumin/(1.0*out*out);
    rt2=wtime();
    rtime=rt2-rt1;
    printf("from mesh approach %f in %fs \n" ,pi,rtime);
    return 0;
}

omp-segment.c
#include <stdio.h>
#include <math.h>
#include <omp.h>
#include <time.h>
#include <sys/time.h>
#include <stdlib.h>

double wtime() {
    struct timeval tv1;
    gettimeofday(&tv1,NULL);
    return (double)tv1.tv_sec+(double)tv1.tv_usec/1e6;
}

int main(void) {
    double x,y;
    int out=32*32*32;
    int i,j,k,iam;
    double sumarea=0;
    double *area;
    area=(double *)malloc(omp_get_max_threads()*sizeof(double));
    double rtime,rt2, rt1=wtime();
    #pragma omp parallel private(x,y,i,j,iam)
    {
        iam=omp_get_thread_num();
        area[iam]=0;
        #pragma omp for schedule(static) nowait
        for(i=0;i<out;i++){
            x=i/(double)out;
            y=sqrt(1-x*x);
            area[iam]+=1.0/out*y;
        }
    }
    for(k=0;k<omp_get_max_threads();k++)
        sumarea+=area[k];
    double pi=4.0*sumarea;
    rt2=wtime();
    rtime=rt2-rt1;
    printf("from seg approach %f in %fs \n" ,pi,rtime);
    return 0;
}

omp-monte.c
#include <stdio.h>
#include <math.h>
#include <omp.h>
#include <time.h>
#include <sys/time.h>
#include <stdlib.h>

double wtime() {
    struct timeval tv1;
    gettimeofday(&tv1,NULL);
    return (double)tv1.tv_sec+(double)tv1.tv_usec/1e6;
}

int main(void) {
    double x,y;
    int all=32*32*32*32*32*32;
    int i,j,iam,sumin=0;
    int *in,num=omp_get_num_threads();
    in=(int *)malloc(omp_get_max_threads()*sizeof(int));
    double rtime, rt1, rt2;
    rt1=wtime();
    #pragma omp parallel private(x,y,i,iam)
    {
        iam=omp_get_thread_num();
        in[iam]=0;   /* per-thread counter */
        #pragma omp for schedule(static) nowait
        for(i=0;i<all;i++){
            x=(random()%10000)/(double)10000;
            y=(random()%10000)/(double)10000;
            if(x*x+y*y<=1.0) in[iam]+=1;
        }
    }
    for(int k=0;k<omp_get_max_threads();k++)
        sumin+=in[k];
    double pi1=4.0*sumin/all;
    rt2=wtime();
    rtime=rt2-rt1;
    printf("from monte carlo approach =%f in%f s\n",pi1,rtime);
    return 0;
}

Mesh.cu
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <sys/time.h>
#include <time.h>
#include "cuda.h"
#include "common/book.h"

const int res=32*32*32;
const int tpb=64;//threads per block
const int bpg=32;//blocks per grid

double wtime() {
    struct timeval tv1;
    gettimeofday(&tv1,NULL);
    return (double)tv1.tv_sec+(double)tv1.tv_usec/1e6;
}

__global__ void mesh(int *in){
    __shared__ int cache[tpb];
    int tid=threadIdx.x+blockIdx.x*blockDim.x;
    int cacheindex=threadIdx.x;
    int count=0;
    // each thread handles rows tid, tid+(number of threads), ...
    while(tid<res){
        for(int i=0;i<res;i++){
            if(tid*tid+i*i<res*res) count++;
        }
        tid+=blockDim.x*gridDim.x;
    }
    cache[cacheindex]=count;
    __syncthreads();
    in[blockIdx.x]=0;
    // pairwise reduction of the per-thread counts within the block
    int x=blockDim.x/2;
    while( x!=0){
        if(cacheindex<x) cache[cacheindex]+=cache[cacheindex+x];
        __syncthreads();
        x/=2;
    }
    if (cacheindex==0) in[blockIdx.x]=cache[0];
}

int main(void){
    double rtime, rt2, rt1=wtime();
    int *in,*dev_in;
    long int sum_in=0;
    cudaMalloc((void**)&dev_in,bpg*sizeof(int));
    in=(int*)malloc(res*sizeof(int));
    mesh<<<bpg,tpb>>>(dev_in);
    cudaMemcpy(in,dev_in,bpg*sizeof(int),cudaMemcpyDeviceToHost);
    for(int k=0;k<bpg;k++) sum_in+=in[k];
    double pi=4.0*sum_in/(1.0*res*res);
    rt2=wtime();
    rtime=rt2-rt1;
    printf("from mesh approach %f in %fs \n" ,pi,rtime);
    cudaFree(dev_in);
    free(in);
}

Seg.cu
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <sys/time.h>
#include <time.h>
#include "cuda.h"
#include "common/book.h"

const int res=32*32*32;
const int tpb=64;//threads per block
const int bpg=32;//blocks per grid

double wtime() {
    struct timeval tv1;
    gettimeofday(&tv1,NULL);
    return (double)tv1.tv_sec+(double)tv1.tv_usec/1e6;
}

__global__ void seg(double *in){
    __shared__ double cache[tpb];
    int tid=threadIdx.x+blockIdx.x*blockDim.x;
    int cacheindex=threadIdx.x;
    double area=0;
    double yy,xx;
    // the circle has radius 1000; each thread integrates a strip of width 1000/res
    double inc=1000.0/res/res;
    while(tid<res){
        yy=res*tid*inc;
        for(int i=0;i<res;i++){
            xx=sqrt(1000*1000-yy*yy);
            area+=(double)(inc*xx);
            yy+=inc;
        }
        printf("%f\n",area);
        tid+=blockDim.x*gridDim.x;
    }
    cache[cacheindex]=area;
    __syncthreads();
    in[blockIdx.x]=0;
    // pairwise reduction of the per-thread areas within the block
    int x=blockDim.x/2;
    while( x!=0){
        if(cacheindex<x) cache[cacheindex]+=cache[cacheindex+x];
        __syncthreads();
        x/=2;
    }
    if (cacheindex==0) in[blockIdx.x]=cache[0];
}

int main(void){
    double rtime, rt2, rt1=wtime();
    double *in,*dev_in;
    double sum_in=0;
    cudaMalloc((void**)&dev_in,bpg*sizeof(double));
    in=(double*)malloc(res*sizeof(double));
    seg<<<bpg,tpb>>>(dev_in);
    cudaMemcpy(in,dev_in,bpg*sizeof(double),cudaMemcpyDeviceToHost);
    for(int k=0;k<bpg;k++) sum_in+=in[k];
    double pi=4.0*sum_in/1000.0/1000.0;
    rt2=wtime();
    rtime=rt2-rt1;
    printf("from seg approach pi= %f in %fs \n",pi,rtime);
    cudaFree(dev_in);
    free(in);
}

Monte.cu
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <sys/time.h>
#include <time.h>
#include "cuda.h"
#include "common/book.h"

const int res=32*32*32;
const int tpb=64;//threads per block
const int bpg=32;//blocks per grid

double wtime() {
    struct timeval tv1;
    gettimeofday(&tv1,NULL);
    return (double)tv1.tv_sec+(double)tv1.tv_usec/1e6;
}

__global__ void monte(int *in){
    __shared__ int cache[tpb];
    int tid=threadIdx.x+blockIdx.x*blockDim.x;
    int cacheindex=threadIdx.x;
    int count=0;
    float xx,yy;
    while(tid<res){
        for(int i=0;i<res;i++){
            // pseudo-random coordinates from two linear congruential formulas
            xx=(((1664525*(tid*res+i)+1013904223)%1677216)%100000)/100000.0;
            yy=(((214013*(tid*res+i)+2531011)%1677216)%100000)/100000.0;
            if((xx*xx+yy*yy)<1.0) count++;
        }
        tid+=blockDim.x*gridDim.x;
    }
    cache[cacheindex]=count;
    __syncthreads();
    in[blockIdx.x]=0;
    // pairwise reduction of the per-thread counts within the block
    int x=blockDim.x/2;
    while( x!=0){
        if(cacheindex<x) cache[cacheindex]+=cache[cacheindex+x];
        __syncthreads();
        x/=2;
    }
    if (cacheindex==0) in[blockIdx.x]=cache[0];
}

int main(void){
    double rtime, rt2, rt1=wtime();
    int *in,*dev_in;
    long int sum_in=0;
    cudaMalloc((void**)&dev_in,bpg*sizeof(int));
    in=(int*)malloc(res*sizeof(int));
    monte<<<bpg,tpb>>>(dev_in);
    cudaMemcpy(in,dev_in,bpg*sizeof(int),cudaMemcpyDeviceToHost);
    for(int k=0;k<bpg;k++) sum_in+=in[k];
    double pi=4.0*sum_in/(1.0*res*res);
    rt2=wtime();
    rtime=rt2-rt1;
    printf("from monte approach %f in %fs \n" ,pi,rtime);
    cudaFree(dev_in);
    free(in);
}

Result

MPI mesh method: approximation of pi, by resolution (rows) and number of cores (columns)
resolution  2          4          16         24         32         64         128
40          3.232500   3.210000   2.837500   2.300000   2.905000   0.000000   0.000000
360         3.152130   3.170926   3.265833   3.300031   3.341759   3.363704   3.016296
1000        3.145520   3.145340   3.141484   3.140896   3.142592   3.096548   3.007300
1960        3.143588   3.145109   3.153682   3.159580   3.166406   3.190735   3.220640
3240        3.142798   3.142798   3.145231   3.145781   3.148754   3.156914   3.164906
4840        3.142411   3.143807   3.150536   3.155750   3.160370   3.175781   3.218295
6760        3.142178   3.142725   3.145310   3.147128   3.150042   3.157363   3.173211
9000        3.142031   3.142362   3.144487   3.145519   3.147006   3.150763   3.157444
11560       3.141935   3.142520   3.144700   3.146136   3.147351   3.153609   3.166344
14440       3.141868   3.142125   3.143105   3.143780   3.144475   3.147523   3.151513
17640       3.141817   3.142031   3.142884   3.143132   3.143720   3.145624   3.147443
21160       3.141780   3.141957   3.142813   3.143766   3.144203   3.146311   3.151188
25000       3.141752   3.141750   3.142318   3.142704   3.143208   3.144718   3.147625
29160       3.141729   3.141831   3.142757   3.143172   3.143777   3.145849   3.149724
33640       3.141711   3.141800   3.142232   3.142514   3.142832   3.143847   3.145777
38440       3.141696   3.141794   3.142068   3.142371   3.142677   3.143636   3.145412

MPI mesh method: execution time in seconds, by resolution (rows) and number of cores (columns)
resolution  2          4          16         24         32         64         128
40          0.000099   0.000155   1.036048   1.128531   1.068779   0.986564   0.430998
360         0.000009   0.000009   0.007592   0.008715   0.004666   0.039591   0.003569
1000        0.000008   0.000007   0.000683   0.000462   0.000339   0.000153   0.000080
1960        0.000008   0.011534   0.002563   0.001695   0.001265   0.000589   0.000296
3240        0.034344   0.035357   0.007074   0.004603   0.003442   0.001689   0.000788
4840        0.234251   0.078854   0.015528   0.010306   0.007719   0.003721   0.001866
6760        0.455051   0.153838   0.030396   0.020072   0.014976   0.007239   0.003603
9000        0.808640   0.272934   0.053624   0.035648   0.026532   0.012690   0.006183
11560       1.344910   0.449725   0.088626   0.058706   0.043592   0.020891   0.010580
14440       2.128927   0.700413   0.138462   0.091646   0.067970   0.032907   0.016566
17640       3.219702   1.046778   0.206482   0.136423   0.101501   0.048804   0.024494
21160       4.416676   1.505166   0.299828   0.196748   0.145999   0.070662   0.035030
25000       6.120500   2.100546   0.425451   0.273813   0.203233   0.098156   0.049006
29160       8.499504   2.857659   0.577369   0.373682   0.277005   0.133596   0.066873
33640       14.732842  3.804365   0.768937   0.495099   0.370355   0.178015   0.088930
38440       16.682726  4.967042   1.003796   0.648665   0.487263   0.231746   0.115862

MPI segment and Monte methods, 24 cores
resolution  seg pi     seg time (s)  monte pi   monte time (s)
40          2.273010   0.059442      1.965000   1.133449
360         3.294800   0.003104      3.225833   0.005211
1000        3.138951   0.000027      3.108440   0.004843
1960        3.158588   0.000003      3.160585   0.020110
3240        3.145189   0.000031      3.123668   0.054112
4840        3.155340   0.000004      3.148384   0.121841
6760        3.146837   0.000004      3.137545   0.236213
9000        3.145301   0.000003      3.145452   0.420925
11560       3.145965   0.000008      3.142452   0.693571
14440       3.143643   0.000004      3.140254   1.079689
17640       3.143021   0.000062      3.138997   1.611420
21160       3.143672   0.000032      3.143652   2.298095
25000       3.142624   0.000025      3.140300   3.323252
29160       3.143104   0.000041      3.141801   4.520323
33640       3.142455   0.000041      3.141529   6.017545
38440       3.142319   0.000053      3.141802   7.863453

OpenMP, resolution = 32^3
program    cores  pi        time (s)
omp-mesh   24     3.142554  0.013411
omp-mesh   16     3.142554  0.013412
omp-mesh   8      3.142554  0.013424
omp-mesh   4      3.142554  0.014045
omp_seg    24     3.141593  1.958670
omp_seg    16     3.141593  1.958679
omp_seg    8      3.141593  1.958552
omp_seg    4      3.141593  1.958163
omp_monte  24     3.108652  4.978707
omp_monte  16     3.112519  5.994074
omp_monte  8      3.113294  10.990329
omp_monte  4      3.082253  23.002907

CUDA, res = 32768, 64 threads per block (meshcu without arch=sm_20; segcu and montecu with arch=sm_20 (double))
program  blocks  pi        time (s)
meshcu   32      3.145107  4.015820
meshcu   16      3.145107  4.935294
meshcu   8       3.145107  4.944518
meshcu   2       3.145107  4.961996
segcu    32      3.144916  4.033780
segcu    16      3.144952  4.957799
segcu    8       3.144976  5.002646
segcu    2       3.144984  8.001502
montecu  32      3.179746  4.031441
montecu  16      3.179746  5.015675
montecu  8       3.179746  6.026360
montecu  2       3.179746  12.036450
