MPI: Beyond the Basics
David McCaughan, HPC Analyst, SHARCNET
[email protected]
Review: "The Basics"

- MPI_Init()
- MPI_Finalize()
- MPI_Comm_rank()
- MPI_Comm_size()
- MPI_Send()
- MPI_Recv()
MPI: Beyond the Basic
D. McCaughan
Review: sending/receiving

int main(int argc, char *argv[])
{
    int rank;
    double pi = 3.14, val = 0.0;
    MPI_Status status;

    /* executed by all processes */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        /* executed by process 0 */
        MPI_Send(&pi, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    else {
        /* executed by all processes except 0 */
        MPI_Recv(&val, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
        printf("Received: %f\n", val);
    }

    MPI_Finalize();
}
Understanding parallelism: Euclidean Inner Product

- Compute a weighted sum

    s = sum over i of (a_i * b_i)

- Sequential algorithm, given arrays a, b of size N:

    s := 0
    do i := 1, N
        s := s + (a[i] * b[i])

- run-time proportional to N
Thinking in Parallel

- Assume N = 2^x processors (x an integer); each processor P_k (here P0 .. P7) holds one pair a_k, b_k

[Figure: binary-tree reduction for N = 8. At step t0 each processor computes a_k * b_k in parallel; at steps t1, t2, t3 partial sums are combined pairwise (+), halving the number of active processors at each step until P0 holds the result.]
Parallel Inner Product Algorithm

- All processors execute the same algorithm (k = rank, N processors):

    x := a[k] * b[k]
    do t := (log2 N)-1, 0, -1
        if 2^t <= k < 2^(t+1)
            send x to processor k - 2^t
        else if k < 2^t
            receive y from processor k + 2^t
            x := x + y

- In MPI, with v2_t holding 2^t:

    op1 = a[k] * b[k];
    v2_t = N >> 1;
    while (v2_t > 0) {
        if ((k >= v2_t) && (k < (v2_t << 1)))
            MPI_Send(&op1, 1, MPI_DOUBLE, k - v2_t, 0, MPI_COMM_WORLD);
        else if (k < v2_t) {
            MPI_Recv(&op2, 1, MPI_DOUBLE, k + v2_t, 0, MPI_COMM_WORLD, &status);
            op1 = op1 + op2;
        }
        v2_t = v2_t >> 1;
    }

- After the loop, process 0 holds the result; run-time proportional to log2 N