Practical Introduction to Message-Passing Interface (MPI)

Practical Introduction to Message-Passing Interface (MPI)
February 26, 2015
Pier-Luc St-Onge, Calcul Québec – McGill HPC
[email protected]

2

Outline

• Introduction to MPI
• Getting Started with MPI
• Simple Communications
• Collective Communications
• Conclusion

3

Outline
• Introduction to MPI
  • Parallelizing your Serial Code
  • Memory Models
  • Distributed Memory Approach
  • MPI - Message Passing Interface
  • What is MPI for a User
  • Which MPI Library do I Link to
• Getting Started with MPI
• Simple Communications
• Collective Communications
• Conclusion

4

Parallelizing your Serial Code
Models for parallel computing (as an ordinary user sees them ...)
• Implicit parallelization — minimum work for you
  • Threaded libraries (MKL, ACML, GOTO, etc.)
  • Compiler directives (OpenMP)
  • Good for desktops and shared-memory machines
• Explicit parallelization — work is required!
  • You specify what should be done on which CPU
  • The solution for distributed clusters (shared nothing!)
• Hybrid parallelization — work is required!
  • A mix of implicit and explicit parallelization
  • Good for accelerators (CUDA, OpenCL, etc.)

5

Memory Models
[Figure: Shared memory: one process, many threads, within a single node. Distributed memory: multiple processes across multiple nodes, connected by a network.]

6

Distributed Memory Approach
[Figure: Process 0 and Process 1, each with its own address space; an array A(10) exists in both, but these are different variables]
• Separate processes with their own address spaces
• You need to make them interact and exchange information
• You tell what operation to do on what process

7

MPI — Message Passing Interface
• A programming model for the distributed memory approach
• The programmer manages memory by placing data in a particular process
• The programmer sends data between processes
• The programmer performs collective operations on sets of processes

8

What is MPI for a User?
• MPI is NOT a language!
• MPI is NOT a compiler or a specific product
• MPI is a specification for a standardized library:
  • You use its subroutines
  • You link it with your code
• History: MPI-1 (1994), MPI-2 (1997), MPI-3 (2012)
• Different implementations: MPICH(2), MVAPICH(2), Open MPI, HP-MPI, ...

9

Which MPI Library do I Link to?
[Figure: Your MPI code links against an MPI library, which in turn uses the interconnect libraries: InfiniBand (40 Gbit/s) or Gigabit Ethernet (1 Gbit/s)]
• When working on the Guillimin cluster, please use our version of the MVAPICH2 or Open MPI library!
• The MPI library must match the compiler used (Intel, PGI, or GCC), both at compile time and at run time.

10

Outline
• Introduction to MPI
• Getting Started with MPI
  • Basic Features of an MPI Program
  • MPI is Simple! (and Complex ...)
  • Example: “Hello from N Cores”
  • General Program Structure
  • Compiling your MPI Code
  • Running your MPI Code
  • Exercises 1, 2 and 3
• Simple Communications
• Collective Communications
• Conclusion

11

Basic Features of an MPI Program
• Include the basic definitions (#include <mpi.h>, INCLUDE 'mpif.h', or USE mpi)
• Initialize and terminate the MPI environment
• Get information about processes
• Send information between two specific processes (point-to-point communications)
• Send information between groups of processes (collective communications)

12

MPI is Simple! (and Complex ...)
You need to know these 6 functions:
• MPI_Init : Initialize the MPI environment
• MPI_Comm_size : How many CPUs do I have?
• MPI_Comm_rank : Identify each CPU
• MPI_Send : Send data to another CPU
• MPI_Recv : Receive data from another CPU
• MPI_Finalize : Close the MPI environment

13

Example: “Hello from N Cores”

C:

    #include <stdio.h>
    #include <mpi.h>

    int main (int argc, char * argv[]) {
        int rank, size;
        MPI_Init( &argc, &argv );
        MPI_Comm_rank( MPI_COMM_WORLD, &rank );
        MPI_Comm_size( MPI_COMM_WORLD, &size );
        printf( "Hello from processor %d of %d\n", rank, size );
        MPI_Finalize();
        return 0;
    }

Fortran:

    PROGRAM hello
        USE mpi
        INTEGER ierr, rank, size
        CALL MPI_Init(ierr)
        CALL MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
        CALL MPI_Comm_size(MPI_COMM_WORLD, size, ierr)
        WRITE(*,*) 'Hello from processor ', rank, ' of ', size
        CALL MPI_Finalize(ierr)
    END PROGRAM hello

14

General Program Structure
• The header file (mpi.h or mpif.h) or the module mpi must be included; it contains the definitions of MPI constants, types and functions
• All MPI programs start with MPI_Init and end with MPI_Finalize
• MPI_COMM_WORLD is the default communicator (defined in mpi.h); it refers to the group of all processes in the job
• Each statement executes independently in each process

15

Compiling your MPI Code
• NOT defined by the standard
• More or less similar for all implementations:
  • You need to specify the include directory and the MPI library
  • But usually a compiler wrapper (mpicc, mpif90) does it for you automatically
• On the Guillimin cluster:
  user@guillimin> module add ifort icc openmpi
  user@guillimin> mpicc hello.c -o hello
  user@guillimin> mpif90 hello.f90 -o hello

16

Running your MPI Code
• NOT defined by the standard
• A launcher program (mpirun, mpiexec, mpirun_rsh, ...) is used to start your MPI code
• The particular choice of launcher depends on the MPI implementation and on the machine used
• A hosts file is used to specify on which nodes to run the MPI processes (it will run on localhost by default); an example is sketched below
• On the Guillimin cluster: mpiexec -n 4 ./hello
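As a rough illustration only (not from the slides; the node names are made up and the syntax shown is Open MPI's hostfile format), a hosts file and the matching launch command could look like this:

    # hosts file: one line per node; 'slots' is how many processes the node may run
    node001 slots=8
    node002 slots=8

    # launch 16 processes across the two nodes
    mpiexec --hostfile hosts -n 16 ./hello

Under a batch scheduler, as on Guillimin, the list of allocated nodes is normally generated for you, so you rarely write this file by hand.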

17

Exercise 1: Log in to Guillimin, setting up the environment
1) Log in to Guillimin: ssh [email protected]
2) Check for loaded software modules: guillimin> module list
3) See all available modules: guillimin> module av
4) Load the necessary modules: guillimin> module add ifort icc openmpi
5) Check the loaded modules again
6) Verify that you have access to the correct MPI package:
   guillimin> which mpicc
   /software/CentOS-6/tools/openmpi-1.6.3-intel/bin/mpicc

18

Exercise 2: “Hello” program, job submission
1) Copy “hello.c” or “hello.f90” and “hello.pbs” to your home directory:
   guillimin> cp /software/workshop/intro-mpi/hello.c ./
   guillimin> cp /software/workshop/intro-mpi/hello.f90 ./
   guillimin> cp /software/workshop/intro-mpi/hello.pbs ./
2) Compile your code:
   guillimin> mpicc hello.c -o hello
   guillimin> mpif90 hello.f90 -o hello

19

Exercise 2: “Hello” program, job submission
3) Edit the file “hello.pbs”:
   #!/bin/bash
   #PBS -l nodes=1:ppn=2
   #PBS -l walltime=00:05:00
   #PBS -V
   #PBS -N hello
   cd $PBS_O_WORKDIR
   mpiexec -n 2 ./hello > hello.out
4) Submit your job: guillimin> qsub hello.pbs
5) Check the job status: guillimin> qstat -u <username>
6) Check the output (hello.out)

20

Exercise 3: Modifying “Hello”
Let’s ask each CPU to do its own computation ...

    #include <math.h>
    // [...]
    float a, b;
    if (rank == 0) { a = sqrt(2.0); b = 0.0; }
    if (rank == 1) { a = 0.0; b = sqrt(3.0); }
    printf("On proc %d: a, b = \t%f\t%f\n", rank, a, b);

Recompile as before and submit to the queue ...

21

Outline
• Introduction to MPI
• Getting Started with MPI
• Simple Communications
  • MPI is simple : MPI_Send, MPI_Recv
  • MPI basic datatypes (C and Fortran)
  • MPI: Sending a message
  • MPI: Receiving a message
  • MPI_Send / MPI_Recv : Examples 1 and 2
  • Non-blocking Send: MPI_Isend
  • Exercise 4: MPI_Send/Recv
• Collective Communications
• Conclusion

22

MPI is simple! (and complex ...)
You need to know these 6 functions:
• MPI_Init : Initialize the MPI environment
• MPI_Comm_size : How many CPUs do I have?
• MPI_Comm_rank : Identify each CPU
• MPI_Send : Send data to another CPU
• MPI_Recv : Receive data from another CPU
• MPI_Finalize : Close the MPI environment

23

MPI_Send / MPI_Recv
• Pass messages between two different MPI processes (point-to-point communication)
• If one process sends, another one must initiate the matching receive
• The exchanged data types are predefined for portability (MPI has its own data types)
• MPI_Send / MPI_Recv are blocking! (There are also non-blocking versions)

24

MPI basic datatypes (C)

MPI data type          C data type
MPI_CHAR               signed char
MPI_SHORT              signed short int
MPI_INT                signed int
MPI_LONG               signed long int
MPI_UNSIGNED_CHAR      unsigned char
MPI_UNSIGNED_SHORT     unsigned short
MPI_UNSIGNED           unsigned int
MPI_UNSIGNED_LONG      unsigned long int
MPI_FLOAT              float
MPI_DOUBLE             double
MPI_LONG_DOUBLE        long double
MPI_PACKED             data packed with MPI_Pack
MPI_BYTE               8 binary digits

25

MPI basic datatypes (Fortran)

MPI data type            Fortran data type
MPI_INTEGER              INTEGER
MPI_REAL                 REAL
MPI_DOUBLE_PRECISION     DOUBLE PRECISION
MPI_COMPLEX              COMPLEX
MPI_DOUBLE_COMPLEX       DOUBLE COMPLEX
MPI_LOGICAL              LOGICAL
MPI_CHARACTER            CHARACTER(1)
MPI_PACKED               data packed with MPI_Pack
MPI_BYTE                 8 binary digits

26

MPI: Sending a message
C:       MPI_Send(&data, count, data_type, dest, tag, comm)
Fortran: MPI_Send(data, count, data_type, dest, tag, comm, ierr)
• data : the variable to send
• count : the number of data elements to send
• data_type : the type of the data to send
• dest : the rank of the receiving process
• tag : the label of the message
• comm : the communicator (the set of involved processes)
• ierr : the error code (the return value in C)

27

MPI: Receiving a message
C:       MPI_Recv(&data, count, data_type, source, tag, comm, &status)
Fortran: MPI_Recv(data, count, data_type, source, tag, comm, status, ierr)
• source : the rank of the sending process (or it can be set to MPI_ANY_SOURCE)
• tag : must match the label used by the sender (or it can be set to MPI_ANY_TAG)
• status : a C structure (MPI_Status) or an integer array with information about the received message (source, tag, actual amount of data received); see the example below
• MPI_Send and MPI_Recv are blocking!
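As a small illustration (not from the slides) of how the status argument is typically used together with MPI_ANY_SOURCE and MPI_ANY_TAG, the receiver below asks MPI_Get_count for the actual number of elements and reads the real source and tag from the status fields:

    /* Minimal sketch: rank 0 sends a short message, rank 1 receives it
       with wildcards and inspects the status.  Values are illustrative. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int rank, size, data[3] = {7, 8, 9}, recvbuf[10], count;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (size >= 2 && rank == 0) {
            MPI_Send(data, 3, MPI_INT, 1, 42, MPI_COMM_WORLD);
        }
        if (size >= 2 && rank == 1) {
            /* Receive up to 10 ints from any sender, with any tag */
            MPI_Recv(recvbuf, 10, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &status);
            MPI_Get_count(&status, MPI_INT, &count);   /* actual element count */
            printf("Got %d ints from rank %d, tag %d\n",
                   count, status.MPI_SOURCE, status.MPI_TAG);
        }
        MPI_Finalize();
        return 0;
    }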

28

MPI_Send / MPI_Recv : Example 1

    int main (int argc, char * argv[]) {
        int rank, size, buffer = -1, tag = 10;
        MPI_Status status;

        MPI_Init( &argc, &argv );
        MPI_Comm_rank( MPI_COMM_WORLD, &rank );
        MPI_Comm_size( MPI_COMM_WORLD, &size );

        if (size >= 2 && rank == 0) {
            buffer = 33;
            MPI_Send( &buffer, 1, MPI_INT, 1, tag, MPI_COMM_WORLD );
        }
        if (size >= 2 && rank == 1) {
            MPI_Recv( &buffer, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status );
            printf("Rank %d\tbuffer= %d\n", rank, buffer);
            if (buffer != 33) printf("fail\n");
        }
        MPI_Finalize();
        return 0;
    }

29

MPI_Send / MPI_Recv : Example 2

    if (size >= 2 && rank == 0) {
        MPI_Send( &buffer1, 1, MPI_INT, 1, 10, MPI_COMM_WORLD );
        MPI_Recv( &buffer2, 1, MPI_INT, 1, 20, MPI_COMM_WORLD, &status );
    }
    if (size >= 2 && rank == 1) {
        MPI_Send( &buffer2, 1, MPI_INT, 0, 20, MPI_COMM_WORLD );
        MPI_Recv( &buffer1, 1, MPI_INT, 0, 10, MPI_COMM_WORLD, &status );
    }

May NOT work! Both ranks call the blocking MPI_Send first; if the messages are not buffered, each process waits for the other's receive and the code deadlocks.

30

Example 2 : Solution A

    if (size >= 2 && rank == 0) {
        MPI_Send( &buffer1, 1, MPI_INT, 1, 10, MPI_COMM_WORLD );
        MPI_Recv( &buffer2, 1, MPI_INT, 1, 20, MPI_COMM_WORLD, &status );
    }
    if (size >= 2 && rank == 1) {
        MPI_Recv( &buffer1, 1, MPI_INT, 0, 10, MPI_COMM_WORLD, &status );
        MPI_Send( &buffer2, 1, MPI_INT, 0, 20, MPI_COMM_WORLD );
    }

Exchange the Send/Recv order on one of the processes.

31

Example 2 : Solution B

    MPI_Request request = MPI_REQUEST_NULL;  /* so MPI_Wait is a no-op on any other rank */
    if (size >= 2 && rank == 0) {
        MPI_Isend( &buffer1, 1, MPI_INT, 1, 10, MPI_COMM_WORLD, &request );
        MPI_Recv( &buffer2, 1, MPI_INT, 1, 20, MPI_COMM_WORLD, &status );
    }
    if (size >= 2 && rank == 1) {
        MPI_Isend( &buffer2, 1, MPI_INT, 0, 20, MPI_COMM_WORLD, &request );
        MPI_Recv( &buffer1, 1, MPI_INT, 0, 10, MPI_COMM_WORLD, &status );
    }
    MPI_Wait( &request, &status );  // Wait until the send is complete

Use the non-blocking send: MPI_Isend

32

Non-blocking Send: MPI_Isend
Differences from the blocking Send:
• It starts the transfer and returns control immediately
• Between the start and the end of the transfer, the code can do other things
• You need to check for completion!
• request : an additional identifier for the communication
• MPI_Wait(&request, &status) : the code waits for the completion of the transfer; when it returns, the buffer can be reused or deallocated
A short sketch of this pattern is shown below.
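A minimal sketch of the overlap pattern (assumptions: MPI is already initialized, this is rank 0 sending to rank 1, and the local loop is only a stand-in for real computation):

    /* Sketch only: overlap a non-blocking send with local work. */
    MPI_Request request;
    MPI_Status status;
    int sendbuf = 33, other[1000], local_sum = 0, i;

    MPI_Isend(&sendbuf, 1, MPI_INT, 1, 10, MPI_COMM_WORLD, &request);

    /* Useful work that never touches sendbuf can overlap the transfer */
    for (i = 0; i < 1000; i++) { other[i] = i; local_sum += other[i]; }

    MPI_Wait(&request, &status);   /* only after this may sendbuf be reused */
    sendbuf = 0;

A matching MPI_Recv is still needed on the destination rank, exactly as in Solution B.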

33

Exercise 4: MPI_Send/Recv
Copy send.c (or send.f90) to your home directory:
guillimin> cp /software/workshop/intro-mpi/send.c ./

Let's modify it:
• Send an array or a matrix (int buffer[4][4])
• Ask ranks 0 and 1 to exchange information (float A[2], B[2])
Hints in /software/workshop/intro-mpi/solutions/ex4/:
• send_matrix_hint.c
• exchange_hint.c
One possible sketch of the exchange is shown below.
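One possible sketch for the exchange part only (assumptions: exactly two ranks take part, rank and status are already declared and set as in the earlier examples, and the tags 10 and 20 are arbitrary; the full solutions are in the hint files above):

    /* Sketch: rank 0 sends A and receives B; rank 1 does the reverse.
       The order is flipped on rank 1 to avoid the deadlock of Example 2. */
    float A[2] = {1.0f, 2.0f}, B[2] = {3.0f, 4.0f};

    if (rank == 0) {
        MPI_Send(A, 2, MPI_FLOAT, 1, 10, MPI_COMM_WORLD);
        MPI_Recv(B, 2, MPI_FLOAT, 1, 20, MPI_COMM_WORLD, &status);
    }
    if (rank == 1) {
        MPI_Recv(A, 2, MPI_FLOAT, 0, 10, MPI_COMM_WORLD, &status);
        MPI_Send(B, 2, MPI_FLOAT, 0, 20, MPI_COMM_WORLD);
    }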

34

Outline
• Introduction to MPI
• Getting Started with MPI
• Simple Communications
• Collective Communications
  • 1 step beyond : Collective Communications
  • MPI_Bcast
  • MPI_Reduce
  • MPI_Scatter / MPI_Gather
  • Exercise 5 : MPI_Reduce & MPI_Wtime
  • Exercise 6
• Conclusion

35

1 step beyond : Collective Communications
• They involve ALL processes in the communicator
• MPI_Bcast : the same data is sent from the “root” process to all the others
• MPI_Reduce : the “root” process collects data from the others and performs an operation (min, max, add, multiply, etc ...)
• MPI_Scatter : distributes variables from the “root” process to each of the others
• MPI_Gather : collects variables from all processes to the “root” one

36

MPI_Bcast: Example

    int main (int argc, char * argv[]) {
        int rank, size, root = 0;
        float a[2];

        MPI_Init( &argc, &argv );
        MPI_Comm_rank( MPI_COMM_WORLD, &rank );
        MPI_Comm_size( MPI_COMM_WORLD, &size );

        if (rank == root) { a[0] = 2.0f; a[1] = 4.0f; }

        MPI_Bcast( a, 2, MPI_FLOAT, root, MPI_COMM_WORLD );

        printf( "%d: a[0]=%f\ta[1]=%f\n", rank, a[0], a[1] );

        MPI_Finalize();
        return 0;
    }

37

MPI_Reduce : partial sum

Other operations: product, min, max, etc.

38

MPI_Reduce : Example

    int main (int argc, char * argv[]) {
        int rank, size, root = 0;
        float a[2], res[2];

        MPI_Init( &argc, &argv );
        MPI_Comm_rank( MPI_COMM_WORLD, &rank );
        MPI_Comm_size( MPI_COMM_WORLD, &size );

        a[0] = 2.0f; a[1] = 4.0f;

        MPI_Reduce( a, res, 2, MPI_FLOAT, MPI_SUM, root, MPI_COMM_WORLD );

        if (rank == root) {
            printf( "%d: res[0]=%f\tres[1]=%f\n", rank, res[0], res[1] );
        }

        MPI_Finalize();
        return 0;
    }

39

MPI_Scatter / MPI_Gather
[Figure: MPI_Scatter splits the root's array into equal pieces, one per process; MPI_Gather assembles one piece from each process into the root's array]
• MPI_Scatter: one-to-all communication; different data is sent from the root process to all the others in the communicator, following the rank order
• MPI_Gather: data is collected by the root process; it is the opposite of Scatter

40

MPI_Scatter / MPI_Gather : Examples

Scatter:

    int main (int argc, char * argv[]) {
        int rank, size, root = 0, i, sendcount;
        float a[16], b[2];

        MPI_Init( &argc, &argv );
        MPI_Comm_rank( MPI_COMM_WORLD, &rank );
        MPI_Comm_size( MPI_COMM_WORLD, &size );

        if (rank == root) {
            for (i = 0; i < 16; i++) { a[i] = i; }
        }

        sendcount = 2;
        MPI_Scatter( a, sendcount, MPI_FLOAT, b, sendcount, MPI_FLOAT,
                     root, MPI_COMM_WORLD );

        printf( "%d: b[0]=%f\tb[1]=%f\n", rank, b[0], b[1] );

        MPI_Finalize();
        return 0;
    }

Gather:

    int main (int argc, char * argv[]) {
        int rank, size, root = 0, i, sendcount;
        float a[16], b[2];

        MPI_Init( &argc, &argv );
        MPI_Comm_rank( MPI_COMM_WORLD, &rank );
        MPI_Comm_size( MPI_COMM_WORLD, &size );

        b[0] = rank; b[1] = rank;
        sendcount = 2;
        MPI_Gather( b, sendcount, MPI_FLOAT, a, sendcount, MPI_FLOAT,
                    root, MPI_COMM_WORLD );

        if (rank == root) {
            for (i = 0; i < sendcount * size; i++)
                printf( "%d: a[%d]=%f\n", rank, i, a[i] );
        }

        MPI_Finalize();
        return 0;
    }

41

Exercise 5: MPI_Reduce & MPI_Wtime
Copy pi_collect.c (or pi_collect.f90) and its input file to your home directory:
guillimin> cp /software/workshop/intro-mpi/pi_collect.c ./
guillimin> cp /software/workshop/intro-mpi/pi_collect.in ./

The code approximates pi from the series
    pi = 4 * arctan(1) ~ 4 * (1/1 - 1/3 + 1/5 - 1/7 + ...)

Let's add timings:

    double t1, t2;

    if (rank == 0) { t1 = MPI_Wtime(); }
    //...
    if (rank == 0) {
        t2 = MPI_Wtime();
        printf("Time = %.16f sec\n", t2 - t1);
    }
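For orientation only, here is a rough, self-contained sketch of how the pieces of this exercise fit together: each rank sums every size-th term of the series, MPI_Reduce adds the partial sums on rank 0, and MPI_Wtime brackets the computation. The real exercise should be done in the provided pi_collect.c; the variable names and the term count below are made up.

    /* Sketch only: distributed Leibniz series for pi, timed on rank 0. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int rank, size;
        long i, nterms = 100000000L;
        double local = 0.0, pi = 0.0, t1 = 0.0, t2 = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) t1 = MPI_Wtime();

        /* Each rank handles the terms i = rank, rank+size, rank+2*size, ... */
        for (i = rank; i < nterms; i += size)
            local += (i % 2 == 0 ? 4.0 : -4.0) / (2.0 * i + 1.0);

        MPI_Reduce(&local, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0) {
            t2 = MPI_Wtime();
            printf("pi ~ %.12f  (time = %.6f sec)\n", pi, t2 - t1);
        }
        MPI_Finalize();
        return 0;
    }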

42

Exercise 6
Implement a dot product routine in MPI:
• Two arrays of 8400 integers
• a[i] = 2; b[i] = 3;
And/or replace the collective calls in “pi_collect” by point-to-point communication subroutines (MPI_Send / MPI_Recv).
One possible sketch of the dot product is shown below.
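A possible sketch of the dot product (assumptions: the arrays are filled with the same values on every rank so no Scatter is needed, and 8400 is divisible by the number of processes; none of this comes from the workshop solutions):

    /* Sketch: each rank multiplies its share of the 8400 elements and
       MPI_Reduce adds the partial sums on rank 0. */
    #include <stdio.h>
    #include <mpi.h>

    #define N 8400

    int main(int argc, char *argv[]) {
        int rank, size, i, chunk, start, local = 0, dot = 0;
        static int a[N], b[N];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        for (i = 0; i < N; i++) { a[i] = 2; b[i] = 3; }

        chunk = N / size;              /* assumes N % size == 0 */
        start = rank * chunk;
        for (i = start; i < start + chunk; i++)
            local += a[i] * b[i];

        MPI_Reduce(&local, &dot, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("dot = %d (expected %d)\n", dot, 6 * N);

        MPI_Finalize();
        return 0;
    }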

43

Outline
• Introduction to MPI
• Getting Started with MPI
• Simple Communications
• Collective Communications
• Conclusion
  • MPI routines we know ...
  • Further readings
  • Questions?

44

MPI routines we know ...
• Startup: MPI_Init, MPI_Finalize
• Information on the processes: MPI_Comm_rank, MPI_Comm_size
• Point-to-point communications: MPI_Send, MPI_Recv, MPI_Isend, MPI_Wait
• Collective communications: MPI_Bcast, MPI_Reduce, MPI_Scatter, MPI_Gather

45

Further readings:
• The standard itself, news, development: http://www.mpi-forum.org
• Online reference book: http://www.netlib.org/utk/papers/mpi-book/mpi-book.html
• Calcul Québec's wiki: https://wiki.calculquebec.ca/w/MPI/en
• Detailed MPI tutorials:
  http://people.ds.cam.ac.uk/nmm1/MPI/
  http://www.mcs.anl.gov/research/projects/mpi/tutorial/

46

Questions?

• Questions?
• Guillimin support team: [email protected]