ME759 High Performance Computing for Engineering Applications
Parallel Computing with the Message Passing Interface (MPI)
November 6, 2013

© Dan Negrut, 2013 ME759 UW-Madison

“Now this is not the end. It is not even the beginning of the end. But it is, perhaps, the end of the beginning.” -- Winston Churchill

Before We Get Started…

Last time:
- Wrap up point-to-point communication in MPI: non-blocking flavors
- Collective action: barriers, communication, operations

Today:
- Collective action: operations
- User-defined types in MPI
- Departing thoughts: CUDA, OpenMP, MPI

Miscellaneous:
- No class on Friday; the time slot is set aside for the midterm exam
- Midterm exam is Nov. 25 at 7:15 PM in room 1163ME
- I will travel and miss four office hours, during next week and the subsequent week
- Review session on Monday, Nov. 25, during the regular class; attend if you have questions
- I am checking my email on a daily basis
- Final Project Proposal due at 11:59 PM on Nov. 15

Midterm & Final Project Partitioning

- If you are happy with your Midterm Project, it can become your Final Project
  - No midterm project report is due then
- If you are not happy with your Midterm Project selection, November 15 provides the opportunity to wrap it up and choose a different Final Project
  - The report should be detailed and follow the rules spelled out in the forum posting
- Nov. 15: the Final Project proposal should be uploaded
  - Do so even if you choose to continue the Midterm Project; in this case simply upload a one-liner stating this
  - If changing to a new project, submit a proposal that details the work to be done
- For the SPH default project: the student[s] with the fastest implementation will write a paper with Arman, Dan, and another lab member

MPI_Reduce

[Figure: before/after view of MPI_REDUCE over five processes, each holding an input buffer (A B C, D E F, G H I, J K L, M N O). The input buffers are unchanged by the call; after it completes, only the root (root = 1) holds the result of combining the first element of every input buffer with the reduction operator o: A o D o G o J o M.]

[Credit: ICHEC]

Reduce Operation

[Figure: the reduce operation over three processes, each holding an input buffer (Ai, Bi, Ci). After the call, the output buffer of the root holds the element-wise results A0+A1+A2, B0+B1+B2, C0+C1+C2. Assumption: rank 0 is the root.]

[Credit: A. Siegel]

MPI_Reduce

int MPI_Reduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm);

  IN   sendbuf    (address of send buffer)
  OUT  recvbuf    (address of receive buffer)
  IN   count      (number of elements in send buffer)
  IN   datatype   (data type of elements in send buffer)
  IN   op         (reduce operation)
  IN   root       (rank of root process)
  IN   comm       (communicator)

[Credit: A. Siegel]

MPI_Reduce example

MPI_Reduce(sbuf, rbuf, 6, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD)

sbuf:
  P0:  3   4   2   8  12   1
  P1:  5   2   5   1   7  11
  P2:  2   4   4  10   4   5
  P3:  1   6   9   3   1   1

rbuf (element-wise sum, collected on the root P0):
  P0: 11  16  20  22  24  18
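The call above can be reproduced with a minimal sketch (the per-rank values are hard-coded here only to mirror the table; any local data works the same way):

#include "mpi.h"
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    /* the four send buffers from the table above; run with exactly 4 ranks */
    int sbuf[4][6] = { { 3, 4, 2,  8, 12,  1 },
                       { 5, 2, 5,  1,  7, 11 },
                       { 2, 4, 4, 10,  4,  5 },
                       { 1, 6, 9,  3,  1,  1 } };
    int rbuf[6];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* element-wise sum of the four send buffers, collected on rank 0 */
    MPI_Reduce(sbuf[rank], rbuf, 6, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("rbuf: %d %d %d %d %d %d\n",
               rbuf[0], rbuf[1], rbuf[2], rbuf[3], rbuf[4], rbuf[5]);

    MPI_Finalize();
    return 0;
}

Launched as "mpiexec -np 4 ./a.out", rank 0 prints the row shown above: 11 16 20 22 24 18.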

MPI_Reduce, MPI_Allreduce

- MPI_Reduce: the result is collected by the root only
  - The operation is applied element-wise to each element of the input arrays on each processor
- MPI_Allreduce: the result is sent out to everyone

...
MPI_Reduce(x, r, 10, MPI_INT, MPI_MAX, 0, MPI_COMM_WORLD);     /* x: input array, r: output array, 10: array size, 0: root */
...

...
MPI_Allreduce(x, r, 10, MPI_INT, MPI_MAX, MPI_COMM_WORLD);
...

[Credit: Allan Snavely]
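As a quick contrast, here is a minimal sketch (the names x and r are just placeholders) showing where the result ends up for each call:

#include "mpi.h"
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, i, x[10], r[10];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (i = 0; i < 10; ++i) x[i] = rank + i;   /* some local data */

    /* element-wise max across ranks; only rank 0 (the root) ends up with r filled in */
    MPI_Reduce(x, r, 10, MPI_INT, MPI_MAX, 0, MPI_COMM_WORLD);

    /* the same reduction, but every rank ends up with the identical result in r */
    MPI_Allreduce(x, r, 10, MPI_INT, MPI_MAX, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}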

MPI_Allreduce

[Figure: the allreduce operation over three processes, each holding an input buffer (Ai, Bi, Ci). After the call, the buffer of every process holds the same element-wise results A0+A1+A2, B0+B1+B2, C0+C1+C2.]

[Credit: A. Siegel]

MPI_Allreduce

int MPI_Allreduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm);

  IN   sendbuf    (address of send buffer)
  OUT  recvbuf    (address of receive buffer)
  IN   count      (number of elements in send buffer)
  IN   datatype   (data type of elements in send buffer)
  IN   op         (reduce operation)
  IN   comm       (communicator)

Example: MPI_Allreduce

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    int my_rank, nprocs, gsum, gmax, gmin, data_l;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    data_l = my_rank;   /* each rank contributes its own rank id */

    MPI_Allreduce(&data_l, &gsum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    MPI_Allreduce(&data_l, &gmax, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);
    MPI_Allreduce(&data_l, &gmin, 1, MPI_INT, MPI_MIN, MPI_COMM_WORLD);

    printf("gsum: %d, gmax: %d gmin:%d\n", gsum, gmax, gmin);

    MPI_Finalize();
    return 0;
}

Example: MPI_Allreduce [Output]

[negrut@euler24 CodeBits]$ mpiexec -np 10 me759.exe
gsum: 45, gmax: 9 gmin:0
gsum: 45, gmax: 9 gmin:0
gsum: 45, gmax: 9 gmin:0
gsum: 45, gmax: 9 gmin:0
gsum: 45, gmax: 9 gmin:0
gsum: 45, gmax: 9 gmin:0
gsum: 45, gmax: 9 gmin:0
gsum: 45, gmax: 9 gmin:0
gsum: 45, gmax: 9 gmin:0
gsum: 45, gmax: 9 gmin:0
[negrut@euler24 CodeBits]$

With 10 ranks each contributing its rank id, the sum is 0 + 1 + ... + 9 = 45, the max is 9, and the min is 0, so all ten ranks print the same line.

MPI_SCAN

- Performs a prefix reduction on data distributed across a communicator
- The operation returns, in the receive buffer of the process with rank i, the reduction of the values in the send buffers of the processes with ranks 0,...,i (inclusive)
- The types of operations supported, their semantics, and the constraints on send and receive buffers are as for MPI_REDUCE

MPI_SCAN

[Figure: before/after view of MPI_SCAN over five processes with input buffers A B C, D E F, G H I, J K L, M N O. The input buffers are unchanged; after the call, the process with rank i holds the prefix reduction of the first elements of ranks 0 through i: A, A o D, A o D o G, A o D o G o J, A o D o G o J o M. The reduction is done in parallel.]

[Credit: ICHEC]

Scan Operation

[Figure: the scan operation over three processes with input buffers (Ai, Bi, Ci). After the call, the output buffers hold the running element-wise sums: rank 0 gets A0, B0, C0; rank 1 gets A0+A1, B0+B1, C0+C1; rank 2 gets A0+A1+A2, B0+B1+B2, C0+C1+C2.]

[Credit: A. Siegel]

MPI_Scan: Prefix reduction

- Process i receives the data reduced on processes 0 through i

MPI_Scan(sbuf, rbuf, 6, MPI_INT, MPI_SUM, MPI_COMM_WORLD)   (6 entries per process)

sbuf:
  P0:  3   4   2   8  12   1
  P1:  5   2   5   1   7  11
  P2:  2   4   4  10   4   5
  P3:  1   6   9   3   1   1

rbuf (running element-wise sums):
  P0:  3   4   2   8  12   1
  P1:  8   6   7   9  19  12
  P2: 10  10  11  19  23  17
  P3: 11  16  20  22  24  18

[Credit: A. Snavely]
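The table can be reproduced with a short sketch (the per-rank data is hard-coded to match the rows above; run with exactly 4 ranks):

#include "mpi.h"
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, i;
    int sbuf[4][6] = { { 3, 4, 2,  8, 12,  1 },
                       { 5, 2, 5,  1,  7, 11 },
                       { 2, 4, 4, 10,  4,  5 },
                       { 1, 6, 9,  3,  1,  1 } };
    int rbuf[6];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* rank i receives the element-wise sum of the send buffers of ranks 0..i */
    MPI_Scan(sbuf[rank], rbuf, 6, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    printf("P%d:", rank);
    for (i = 0; i < 6; ++i) printf(" %3d", rbuf[i]);
    printf("\n");

    MPI_Finalize();
    return 0;
}

The four printed rows (in whatever order the ranks flush their output) match the rbuf rows above.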

MPI_Scan

int MPI_Scan(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm);

  IN   sendbuf    (address of send buffer)
  OUT  recvbuf    (address of receive buffer)
  IN   count      (number of elements in send buffer)
  IN   datatype   (data type of elements in send buffer)
  IN   op         (reduce operation)
  IN   comm       (communicator)

Note: count refers to the total number of elements that will be received into the receive buffer after the operation is complete.

[Credit: A. Siegel]

#include "mpi.h" #include #include int main(int argc, char **argv){ int myRank, nprocs, i, n; int *result, *data_l; const int dimArray = 2; MPI_Init(&argc, &argv); MPI_Comm_size(MPI_COMM_WORLD, &nprocs); MPI_Comm_rank(MPI_COMM_WORLD, &myRank); data_l = (int *) malloc(dimArray*sizeof(int)); for (i = 0; i < dimArray; ++i) data_l[i] = (i+1)*myRank; for (n = 0; n < nprocs; ++n) { if( myRank == n ) { for(i=0; i 1, the operation returns, in the receive buffer of the process with rank i, the reduction of the values in the send buffers of processes with ranks 0,...,i-1 (inclusive)



The type of operations supported, their semantics, and the constraints on send and receive buffers, are as for MPI_REDUCE 20

MPI_Exscan

int MPI_Exscan(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm);

  IN   sendbuf    (address of send buffer)
  OUT  recvbuf    (address of receive buffer)
  IN   count      (number of elements in send buffer)
  IN   datatype   (data type of elements in send buffer)
  IN   op         (reduce operation)
  IN   comm       (communicator)

[Credit: A. Siegel]

#include "mpi.h" #include #include int main(int argc, char **argv){ int myRank, nprocs,i, n; int *result, *data_l; const int dimArray = 2; MPI_Init(&argc, &argv); MPI_Comm_size(MPI_COMM_WORLD, &nprocs); MPI_Comm_rank(MPI_COMM_WORLD, &myRank); data_l = (int *) malloc(dimArray*sizeof(int)); for (i = 0; i < dimArray; ++i) data_l[i] = (i+1)*myRank; for (n = 0; n < nprocs; ++n){ if( myRank == n ) { for(i=0; i
