COMP 322: Principles of Parallel Programming Lecture 15: OpenMP Fall 2009 http://www.cs.rice.edu/~vsarkar/comp322

Vivek Sarkar
Department of Computer Science, Rice University
[email protected]

Lecture 15, 20 October 2009

Acknowledgments for todayʼs lecture

•  Course text: “Principles of Parallel Programming”, Calvin Lin & Lawrence Snyder
   —  Includes resources available at http://www.pearsonhighered.com/educator/academic/product/0,3110,0321487907,00.html

•  Slides from COMP 422 course at Rice University
   —  Spring 2009: John Mellor-Crummey
   —  Spring 2008: Vivek Sarkar

•  Slides from OpenMP tutorial given by Ruud van der Pas at HPCC 2007
   —  http://www.tlc2.uh.edu/hpcc07/Schedule/OpenMP

•  “Towards OpenMP 3.0”, Larry Meadows, HPCC 2007 presentation
   —  http://www.tlc2.uh.edu/hpcc07/Schedule/speakers/hpcc07_Larry.ppt

•  OpenMP 2.5 and 3.0 specifications
   —  http://www.openmp.org/mp-documents/spec25.pdf
   —  http://www.openmp.org/mp-documents/spec30.pdf

What is OpenMP?

Latest specification: Version 3.0 (May 2008)
Previous specification: Version 2.5 (May 2005)

A first OpenMP example
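
The slide's example program is not preserved in this text version. As a hedged stand-in, a minimal first OpenMP program of the usual "hello world" form might look like the sketch below (the message and layout are illustrative, not taken from the slide):

#include <stdio.h>
#include <omp.h>                       /* OpenMP runtime library routines */

int main(void)
{
    /* The parallel directive forks a team of threads; each thread
       executes the block that follows. */
    #pragma omp parallel
    {
        printf("Hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }   /* implicit barrier and join back to the master thread */
    return 0;
}

With GCC this would typically be built with the -fopenmp flag.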

The OpenMP Execution Model
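
The execution-model figure from the slide is not reproduced here. OpenMP follows a fork-join model: execution starts on a single master thread, a parallel region forks a team of threads, and the team joins back at the implicit barrier at the end of the region. A hedged sketch of that behavior (thread count and messages are illustrative):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    printf("sequential part: master thread only\n");

    #pragma omp parallel num_threads(4)   /* fork a team of 4 threads */
    {
        printf("parallel part: thread %d\n", omp_get_thread_num());
    }                                     /* join: implicit barrier here */

    printf("sequential part again: back on the master thread\n");
    return 0;
}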

Terminology

Components of OpenMP
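
The component list on the slide is not preserved here. OpenMP is commonly described as three components, namely compiler directives, runtime library routines, and environment variables; the hedged sketch below touches each one (the printed text is illustrative):

#include <stdio.h>
#include <omp.h>                 /* header for the runtime library routines */

int main(void)
{
    /* Environment variable: e.g. OMP_NUM_THREADS=8 sets the default team size. */
    #pragma omp parallel         /* compiler directive */
    {
        #pragma omp master       /* directive: only the master thread runs this */
        printf("team size = %d\n", omp_get_num_threads());   /* runtime routine */
    }
    return 0;
}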

OpenMP directives and clauses

Parallel Region

A parallel region is a block of code executed by multiple threads simultaneously, and supports the following clauses: if, num_threads, default, private, firstprivate, shared, copyin, and reduction.

OpenMP Programming Model

•  The clause list is used to specify conditional parallelization, the number of threads, and data handling.
   —  Conditional Parallelization: the clause if (scalar expression) determines whether the parallel construct results in the creation of threads.
   —  Degree of Concurrency: the clause num_threads (integer expression) specifies the number of threads that are created.
   —  Data Handling: the clause private (variable list) indicates variables that are local to each thread. The clause firstprivate (variable list) is similar to private, except that each thread's copy is initialized to the value the variable had before the parallel directive. The clause shared (variable list) indicates variables that are shared across all the threads.
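
As a hedged illustration of these clauses, the sketch below combines them in one parallel directive (the variable names and values are made up for this example, not taken from the slide):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int n = 1000;        /* shared: all threads see the same n               */
    int base = 10;       /* firstprivate: each thread's copy starts at 10    */
    int local;           /* private: each thread gets an uninitialized copy  */

    /* Threads are created only if n > 100; the team then has 4 threads. */
    #pragma omp parallel if (n > 100) num_threads(4) \
            shared(n) firstprivate(base) private(local)
    {
        local = base + omp_get_thread_num();
        printf("thread %d: local = %d, n = %d\n",
               omp_get_thread_num(), local, n);
    }
    return 0;
}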

Work-sharing constructs in a Parallel Region

•  The work is distributed across all (master + worker) threads
•  Must be enclosed in a parallel region
•  Must be encountered by all threads in the team, or by none at all
•  No implied barrier on entry; implied barrier on exit (unless nowait is specified)
•  A work-sharing construct does not launch any new threads
•  Shorthand syntax is supported for a parallel region with a single work-sharing construct, e.g., as sketched below
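
The example that followed "e.g." on the slide is not preserved; a hedged sketch of the longhand and shorthand forms is below (the vector-add loop and names are illustrative):

#include <omp.h>

void vector_add(int n, double *a, const double *b, const double *c)
{
    /* Longhand: a work-sharing "for" inside an explicit parallel region */
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < n; i++)
            a[i] = b[i] + c[i];
    }

    /* Shorthand: combined construct with the same effect */
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}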

Code Spec 6.21 OpenMP parallel for statement.
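
The code spec itself (from the Lin & Snyder text) is not reproduced in this version. As a hedged sketch of the construct it documents, a parallel for statement with a couple of common clauses might look like this (the loop and the schedule choice are illustrative):

#include <omp.h>

void scale(int n, double *a, double s)
{
    /* Forks a team and divides the loop iterations among the threads;
       schedule(static) hands each thread a contiguous block of iterations.
       The loop variable i is automatically private. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++)
        a[i] = s * a[i];
    /* implicit barrier at the end of the construct */
}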

Example of work-sharing “omp for” loop
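
The loop example from the slide is not preserved here. Below is a hedged sketch of two "omp for" loops inside one parallel region, using nowait on the first to drop its implied exit barrier (function and array names are made up):

#include <omp.h>

void two_loops(int n, double *a, double *b)
{
    #pragma omp parallel
    {
        /* First work-shared loop: nowait removes the implied barrier on
           exit, so threads may proceed to the next loop without waiting. */
        #pragma omp for nowait
        for (int i = 0; i < n; i++)
            a[i] = 2.0 * a[i];

        /* Second work-shared loop: safe without the barrier here only
           because it touches a different array than the first loop. */
        #pragma omp for
        for (int i = 0; i < n; i++)
            b[i] = b[i] + 1.0;
    }
}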

Another Example (Figure 6.28 from textbook)

Reduction Clause in OpenMP

•  The reduction clause specifies how multiple local copies of a variable at different threads are combined into a single copy at the master when threads exit.
•  The usage of the reduction clause is reduction (operator: variable list).
•  The variables in the list are implicitly specified as being private to threads.
•  The operator can be one of +, *, -, &, |, ^, &&, and ||.

#pragma omp parallel reduction(+: sum) num_threads(8)
{
    /* compute local sums here */
}
/* sum here contains the sum of all local instances of sum */

Code Spec 6.22 OpenMP reduce operation.
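
The code spec figure is not reproduced in this text version. As a hedged sketch of a typical reduce operation, a sum over an array with the reduction clause might look like this (the function and array names are illustrative):

#include <omp.h>

double array_sum(int n, const double *a)
{
    double sum = 0.0;

    /* Each thread accumulates into its own private copy of sum (initialized
       to 0 for the + operator); the copies are combined into the original
       sum when the threads finish the loop. */
    #pragma omp parallel for reduction(+: sum)
    for (int i = 0; i < n; i++)
        sum += a[i];

    return sum;
}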

OpenMP Programming: Example

/* ******************************************************
   An OpenMP version of a threaded program to compute PI.
   ****************************************************** */
#pragma omp parallel default(private) shared (npoints) \
        reduction(+: sum) num_threads(8)
{
    num_threads = omp_get_num_threads();
    sample_points_per_thread = npoints / num_threads;
    sum = 0;
    for (i = 0; i < sample_points_per_thread; i++) {
        rand_no_x = (double)(rand_r(&seed))/(double)((2