Beginner's Guide to EOS HPC Cluster
Texas A&M University
A Very Brief Introduction, Fall 2012
Michael E. Thomadakis, Ph.D.
Texas A&M University Supercomputer Facility
Short-course Objectives and Contents
Beginner's Guide to the EOS HPC Cluster at Texas A&M University
● brief intro to the EOS HPC cluster H/W and S/W environment
● pointers to documentation and references
● interactive and batch execution of applications
● LS-DYNA example of running an existing HPC application
● compile and run a "new" example code (source provided)
● demonstrations with active participation by the audience
Intro intel64 iDP HPC Cluster
Texas A&M University (C)2012 Michael E. Thomadakis
Starting Point for SC Users
● SC web site: http://sc.tamu.edu/
● Apply for an account with SC: https://sc.tamu.edu/forms/
● Activate access to the SC systems: https://sc.tamu.edu/ams/
● User info: http://sc.tamu.edu/services.php (a good overview for beginners)
User Guides for EOS
● The EOS User's Guide: http://sc.tamu.edu/help/eos
  – the main reference for building and running all types of code on EOS
● Look carefully at http://sc.tamu.edu/help/eos/policies.php
  – caveats, restrictions, usage and storage policies
● Contacting SC for assistance
  – [email protected]
  – Helpdesk: 9:00–17:00, Mon–Fri
Accessing the EOS Cluster
● On-campus access: use a PC, laptop or workstation running Linux, UNIX, Windows 7, or Mac OS
  – http://sc.tamu.edu/help/access/
● Off-campus access: use the TAMU VPN
  – http://hdc.tamu.edu/Connecting/VPN/index.php
[Diagram: on-campus workstations reach the interactive nodes directly over the campus Internet; off-campus workstations and laptops connect through the TAMU VPN. The interactive nodes, compute nodes and other servers share a high-performance interconnect and high-speed storage.]
EOS: a Powerful "HPC Commodity Cluster"
● EOS is a powerful "HPC commodity cluster": a state-of-the-art H/W and S/W environment for very large scale science and engineering computation
  – computation H/W (processing power)
  – storage H/W (large and fast memories)
  – communications (high-speed data exchange)
  – file storage (fast access to large file space)
  – S/W environment (compilers, libraries, etc.)
● exploit parallelism in the problem solution to reduce time to completion of the computation
● What is the system organization? How to best leverage its capabilities?
EOS Cluster Organization Overview
● Interactive nodes and compute nodes
● 372 EOS nodes: 3168 cores, 8928+ GiBs DRAM
  – 2592 Nehalem + 576 Westmere cores
EOS Cluster Configuration Summary
● EOS is an IBM iDataPlex commodity cluster based on "Intel64" microprocessors (Intel "Xeons")
● 5 interactive-access nodes (x3650-M2)
  – eos1.tamu.edu, …, eos5.tamu.edu
  – eos.tamu.edu maps to any one of them in round-robin fashion, to avoid over-subscribing a particular node
  – 48 GiBs DRAM / node (one node, with more memory, is exclusive to a user group)
● 362 compute nodes (dx360 "Xeon-EP"), used in batch (NOT directly accessible)
  – 24 GiBs DRAM / node
  – 314 Nehalem, 2.8 GHz, 8 cores / node (M2, "Xeon 5560")
  – 48 Westmere, 2.8 GHz, 12 cores / node (M3, "Xeon 5660")
EOS Node H/W and Programming Styles
● Each EOS node is a Shared-Memory Multiprocessor (SMP)
  – Nehalem node: 8 "cores" (CPUs 0–7), 24 GiBs of shared DRAM
  – Westmere node: 12 "cores" (CPUs 0–11), 24 GiBs of shared DRAM
● Style: Shared-Memory Programming (SMP)
  – SMP parallel code consists of multiple threads which access data in common "shared" memory
  – each thread operates on a separate piece of data
  – single thread: serial code; multiple threads: shared-memory code (OpenMP, Pthreads, etc.)
● Each CPU can run ONE piece of code at a time: 1 thread or 1 task ("process")
● How many cores? How much total memory?
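The two questions above can be answered from any login shell. This is a generic Linux sketch (not part of the course material); it uses only standard tools available on Linux systems such as the EOS nodes:

```shell
# Count the cores and total DRAM visible on the current node.
cores=$(nproc)
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
echo "cores: $cores"
echo "DRAM:  $((mem_kb / 1024 / 1024)) GiB (approx.)"
```

Run on a Nehalem compute node this would report 8 cores and about 24 GiB; on a Westmere node, 12 cores.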
EOS Cluster H/W and Programming Style
● EOS as a collection of SMP nodes
  – large applications use the memory and cores available on multiple nodes and exchange data by messages
● multiple tasks (processes), one task per core; tasks cooperate by sending messages to other tasks: use MPI
● "Message Passing": use the MPI S/W stack
● nodes are connected by a high-performance QDR InfiniBand switch (648 ports, 4 GB/s + 4 GB/s per link)
● Resources: number of nodes? cores / node? total memory?
[Diagram: Nodes 1 through 315, each with its own cores and shared memory, exchange messages over the high-performance QDR IB switch.]
EOS Cluster Capabilities
● Computation, storage and communication (ideal peak figures)
● computation: double-precision arithmetic in flops/s
  – Westmere / Nehalem core: 11.2 Gflops/s = 4 flops/cycle × 2.8 GHz
  – Westmere node: 134.4 Gflops/s = 12 × 11.2 Gflops/s
  – Nehalem node: 89.6 Gflops/s = 8 × 11.2 Gflops/s
  – Hydra node (for comparison): 121.6 Gflops/s = 16 × 4 × 1.9 GHz
● communication: 4 GB/s + 4 GB/s per InfiniBand link (QDR)
● aggregate computation power: 35481 Gflops/s (= 3168 × 11.2 Gflops/s); 32288 Gflops/s at 91% efficiency (~32 Teraflops)
● DRAM: 8928 gigabytes (~8.9 terabytes)
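The peak figures above are straightforward products (cores × flops/cycle × clock rate). As a sanity check, the arithmetic can be reproduced with awk used purely as a calculator:

```shell
# Peak Gflops/s per node = cores/node * flops/cycle * GHz
awk 'BEGIN {
  printf "Nehalem  node: %.1f Gflops/s\n",  8 * 4 * 2.8   # 89.6
  printf "Westmere node: %.1f Gflops/s\n", 12 * 4 * 2.8   # 134.4
  printf "Cluster peak:  %.1f Gflops/s\n", 3168 * 11.2    # 35481.6
}'
```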
Typical HPC Cluster Batch Processing
0. A user prepares job J interactively for some large-scale computation.
1. Users submit jobs J (job arrivals); this is a multi-user environment.
2. The Batch Job Scheduler classifies J according to its cluster resource requirements, places it in some "class" ("queue"), and orders it by certain criteria (FIFO, "fairness", shortest-job-next, etc.).
3. The Scheduler launches job J as soon as the resources it requires become available and no higher-priority job is present. Resources remain allocated to this job for the duration of its execution.
4. After job J terminates (or gets canceled), the Scheduler releases its resources back to the cluster resource pool for other jobs.
[Diagram: interactive node, batch scheduler, and compute nodes on a high-performance interconnect with high-speed storage.]
User Interactive Workflow [ia]
● Use an editor of your choice (vim, Emacs) to
  – type input parameter files for an application
  – edit source code
  – create/edit "batch script" files for PBS
● Run small cases interactively, directly on the interactive nodes
● Run large problems on the compute nodes through the batch system
User Interactive Workflow [ia]
● Login to an interactive node using an SSH ("secure shell") client
● SSH clients support authentication and communications encryption for your security
  – you use a local machine, but processing takes place on the EOS server
● PuTTY ← login from a Windows PC
● WinSCP ← transfer data between a local PC and the EOS server
● ssh ← login from a Linux/UNIX/MacOS workstation
● sftp ← transfer data between a local workstation and the EOS server
User Interactive Workflow [ia]
Two basic SSH client types:
● text or "command line" terminal
  – use "putty" to type text commands in and view text output; NO graphics
● graphical user interface (GUI) client
  – launch an X11 window "server" (or X11 emulator) on the client system
  – "Xming": PC freeware X11 server
  – view graphical output; it can leverage advanced graphics H/W available on your workstation
  – run "xterm" on EOS and let X11 display the window on your local PC
EOS: GPFS File Systems and Disk Storage
● EOS's file systems: IBM GPFS cluster file system
  – large capacity and high throughput
  – supports parallel I/O by multiple users
  – visible from interactive and compute nodes
  – 438 TBs (1 terabyte = 10^12 bytes) of raw GPFS storage
● /g/home, /g/software, /scratch/... are mounted on interactive and compute nodes
● 438 TBytes of formatted SATA disks: DDN 9900 HPC storage, RAID6, attached to the high-speed storage fabric
/g: Cluster Global File Systems for EOS
● /g GPFS file system
● /g/home: home directories for users
  – backed up outside the cluster, every night
  – for large numbers of small to medium size files; not designed for very large files
  – stores source code files, text files, parameter files, etc.
  – quota: 1 GiB / user (cannot change)
● /g/software: application packages
  – stores the distribution binaries of installed applications
/scratch: Global File Systems for EOS
● /scratch: "high performance" file system for long-term "temporary" storage and as a scratchpad for running jobs
  – users store results from job runs, or input data for future job runs
  – scratch-pad storage for executing batch jobs; designed for large(r) data files
  – quota: 50 GiB / user; can be increased upon request; manage it efficiently
  – it is NOT backed up
EOS: GPFS File Systems [ia]
Upon login, try "df -h":

  $ df -h
  Filesystem              Size  Used  Avail  Use%  Mounted on
  ...
  /dev/mapper/VG01-lvtmp   20G  192M    19G    2%  /tmp
  /dev/scratch            379T  213T   167T   57%  /scratch
  /dev/hul                 59T  8.3T    50T   15%  /g

Also try "du -sh" and "mmlsquota":

  AUser@eos[1001]: mmlsquota
                       Block Limits                                   | File Limits
  Filesystem type  KB         quota       limit       in_doubt grace | files quota limit in_doubt grace
  hul        USR   1136       1048576     1048576     71600    none  | 79    10000 10000 173      none
  scratch    USR   200890816  1073741824  1073741824  40960    none  | 1067  30000 30000 20       none
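To see how much of a quota a directory tree actually consumes, "du" can be pointed at any directory. A runnable sketch on a throwaway directory (the names below are illustrative, not EOS paths):

```shell
# Create a small tree and measure its disk usage with du.
mkdir -p demo_usage/sub
printf 'x%.0s' $(seq 1 4096) > demo_usage/sub/data.bin   # ~4 KB file
du -sk demo_usage     # total size of the tree, in KB
du -sh demo_usage     # same, human-readable
rm -r demo_usage
```

On EOS, "du -sh $HOME" or "du -sh $SCRATCH" would show usage against the 1 GiB and 50 GiB quotas, respectively.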
User Environment on EOS [ia]
● EOS uses the Linux operating system (UNIX-like)
● Linux uses the "bash" shell to interact with users
● During login, the system initializes the user's environment
  – creates a "process" from the bash "interpreter"
  – processes the user "init scripts" ~/.bashrc and ~/.bash_profile
  – reads user input from the keyboard (STDIN), typically commands; try "tty"
  – writes output to the monitor (STDOUT and STDERR)
  – each process has a unique identifier, its "pid"; try "echo $$"
User Environment on EOS [ia]
● Each Linux process has a unique identifier, its "pid" (process-id)
● Each process uses memory space (DRAM) for
  – data, stack, "text" (binary code) and other structures
● Usage limits are set on resources to protect the system from abuse
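The pid concept can be sketched from any bash prompt (generic bash, nothing EOS-specific): "$$" expands to the pid of the current shell, and a child shell is a separate process with its own pid.

```shell
# $$ expands to the pid of the current shell process.
echo "this shell's pid: $$"
# A child shell is a new process, so it reports a different pid.
child=$(bash -c 'echo $$')
echo "child shell pid:  $child"
```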
User Environment on EOS [ia]
● See http://sc.tamu.edu/help/eos/login.php
● Try the "ulimit -a" command: it shows the "soft resource limits" per OS process for you

  $ ulimit -a
  core file size          (blocks, -c) 0
  data seg size           (kbytes, -d) 2097152
  scheduling priority             (-e) 0
  file size               (blocks, -f) unlimited
  pending signals                 (-i) 409600
  max locked memory       (kbytes, -l) unlimited
  max memory size         (kbytes, -m) 2097152
  open files                      (-n) 1024
  pipe size            (512 bytes, -p) 8
  POSIX message queues     (bytes, -q) 819200
  real-time priority              (-r) 0
  stack size              (kbytes, -s) 20480
  cpu time               (seconds, -t) 3600
  max user processes              (-u) 409600
  virtual memory          (kbytes, -v) 8192000
  file locks                      (-x) unlimited
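Soft limits can be lowered by any process for itself; a change made in a subshell does not affect the parent. A minimal sketch (generic bash; assumes the current open-files hard limit is at least 256):

```shell
# Lower the open-files soft limit in a subshell, then check it.
( ulimit -S -n 256; echo "inside subshell: $(ulimit -S -n)" )
# The parent shell's limit is untouched:
echo "parent shell:    $(ulimit -S -n)"
```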
User Environment on EOS [ia]
● During login, the system stores information it needs to execute code in "environment variables"
  – the info is text and can be read/modified by the user
● Try the "env" and "echo" commands
  – USER ← your login name
  – TMPDIR ← set to point to /scratch/$USER/tmp
  – TERM ← terminal type ("xterm", "vt100", etc.)
  – SCRATCH ← set to point to /scratch/$USER
  – SHELL ← login shell (bash)
  – HOST ← the hostname for this host
  – MANPATH ← directory list with man files
User Environment on EOS [ia]
● Important EnvVars
  – PATH ← a ':'-separated list of directories the system searches for binary executables
    ● guides the system to locate executable code ("binaries")
    ● the first matching directory is used to obtain the binary
  – LD_LIBRARY_PATH ← a list of directories with run-time libraries
    ● guides the system to locate libraries with compiled code
    ● the first matching directory is used to obtain the library
● Typically, installed applications have their own individual paths with binaries and run-time libraries
● It is cumbersome to manually prefix PATH and LD_LIBRARY_PATH to make applications visible, or to reverse the effect
User Environment on EOS [ia]
● Large installed S/W base, with multiple versions of the same S/W
● We need an easy way to
  – enable the version of the S/W we want to use next
  – also enable any prerequisite S/W
  – without manually setting/resetting environment variables
User Environment on EOS [ia]
● Most installed software is invoked ("enabled") using the "module" command
  – prepares the user's environment to easily access particular s/w
  – works in interactive and in batch mode
● Initializes for you the "environment variables" the system relies on:
  – PATH: prepends the directory where the binary is installed
  – LD_LIBRARY_PATH: prepends the directory list with the run-time libraries this particular s/w requires
  – MANPATH: list of directories containing the man pages for this S/W
  – etc.
● Documentation: http://sc.tamu.edu/help/modules.php
Running Existing Software on EOS [ia]
Upon login, use:
● module avail ← list the available S/W
● module load modulename ← make S/W modulename visible
● module unload modulename ← make S/W modulename not visible
● module list ← currently enabled S/W in the user's environment
● module help modulename, man softwarename
  – sometimes a "man" page provides a first round of information
  – not all S/W comes with man pages
Development Software on EOS [ia]
● module avail (excerpt)

  ----------- /g/software/Modules/modulefiles/Development -----------
  HDF5/1.8.4/OpenMPI    HDF5/1.8.4/serial    Java/6/1.6.0_21
  dfftw/2.1.5/OpenMPI   dfftw/2.1.5/threads  dfftw/2.1.5/threadsOMP
  expat/2.0.1           fftw/2.1.5           fftw/3.2.2(default)
  gcc/4.5.1             hdf5/1.8.9           szip/2.1
  intel/compilers/11.1.059          intel/compilers/11.1.069
  intel/compilers/11.1.072          intel/compilers/11.1.073(default)
  intel/ipp/6.1.4.059               intel/ipp/6.1.6.063(default)
  intel/itac/8.0.0.011-mpich        intel/itac/8.0.0.011-mpich2
  intel/tcheck/3.1.012              intel/tprofile/3.1.012
  intel/vtune/9.1.8
  intelXE/compilers/12.1.4.319      intelXE/compilers/12.1.5.339(default)
  intelXE/mkl/10.3.10.319           intelXE/mkl/10.3.11.339(default)
  mpich2/1.3.1          mvapich2/1.5_shared  mvapich2/1.5_static
  mvapich2/1.6          mvapich2/1.6-latest  mvapich2/1.6-latest-limic2
  mvapich2/1.6-r4751+limic2-intel   mvapich2/1.6-r4751-intel
  mvapich2/1.7
  openmpi/1.4.2         openmpi/1.4.3(default)  openmpi/1.6.0/intelXE
  papi/4.2.0            perfctr/2.6.42       pgi/compilers/11.0
  pgi/compilers/11.5    python/2.6.5         python/2.7.3(default)
  tau/2.19.2
  ...
Application Software on EOS [ia]
● module avail (excerpt)

  ----------- /g/software/Modules/modulefiles/Applications -----------
  EMBOSS/6.2.0               Mathematica/7.0
  Mathematica/7.0.1(default) R
  gnuplot/4.4.3              gnuplot/4.6.0/gcc_4.5.1
  grace/5.1.22               gromacs/4.5.1
  gromacs/4.5.5(default)     hfss/140
  icemcfd/12.1               icemcfd/13.0(default)
  p4est/0.3.4/intel_11.1/openmpi_1.4.3/opt
  p4est/0.3.4/intel_11.1/openmpi_1.4.3/p4est
  p4est/0.3.4/intel_12.1/intelmpi_4.0.3/p4est
  p4est/0.3.4/intel_12.1/mvapich2_1.8/p4est
  p4est/0.3.4/intel_12.1/openmpi_1.6.0/p4est
  paraview/3.12              petsc/3.1-p8-debug
  petsc/3.1-p8-opt           trilinos/10.4.2/intel_12.1/openmpi_1.6.0/opt/shared
  udunits/2.1.14             udunits/2.1.24
  visit/2.3.2
  ...
Examples of Module Commands [ia]

  $ module avail gnup
  -------- /g/software/Modules/modulefiles/Applications --------
  gnuplot/4.4.3  gnuplot/4.6.0/gcc_4.5.1

  $ module avail starccm+
  -------- /g/software/Modules/modulefiles/Applications --------
  starccm+/5.02  starccm+/6.02  starccm+/6.04  starccm+/6.06
  starccm+/7.02  starccm+/7.04(default)
How to Utilize EOS's Power
● Run existing applications specialized to a domain of interest
  – http://sc.tamu.edu/software/
● Build and run free open-source S/W
● Build your own application from scratch
  – use existing scientific libraries to build your own
cp and tar Commands: a Review
● cp [options] Source Destination (many options)
● tar [options] tarfile: pack several files into one for easy transfer
  – tar -cvf TarFile.tar files_to_include ← create a tarfile with these contents
  – tar -xvf TarFile.tar ← extract the contents of this tarfile
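The pack/unpack round trip above can be tried anywhere. A runnable sketch with illustrative file names (the -C flag, supported by GNU and BSD tar, changes directory before packing/extracting):

```shell
# Pack two files into a tarfile, then extract them elsewhere.
mkdir -p tardemo/src tardemo/dst
echo "input parameters" > tardemo/src/params.txt
echo "some results"     > tardemo/src/results.txt
tar -cvf tardemo/files.tar -C tardemo/src params.txt results.txt
tar -xvf tardemo/files.tar -C tardemo/dst
ls tardemo/dst        # params.txt  results.txt
rm -r tardemo
```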
Running LS-DYNA Software on EOS
● Introduces interactive and batch mode execution for an existing application (no compilation required)
● LS-DYNA: http://sc.tamu.edu/help/faq/lsdyna.php
  – to prepare the user environment use: module load ls-dyna
● NOTE: LS-DYNA can be run as
  – serial: 1 "core" (CPU) + specified memory (DRAM)
  – parallel OMP: 1 node with 1–12 cores + specified memory shared by all OMP threads
  – parallel MPI: specify the number of nodes + total memory
Reference to Training Material
● Training material at /g/public/training/EOSHPC/2012Fall/EOSHPC-2012Fall.tar
● Create a directory in your $SCRATCH directory
● Copy the above .tar file to that directory
● cd there and extract the contents: tar -xvf EOSHPC-2012Fall.tar
● cd to the new directory it has just generated: cd 2012Fall ; ls -l
Running LS-DYNA Software on EOS [ia]
● Interactive serial LS-DYNA: 1 core + specified memory (DRAM) on one EOS node

  module load ls-dyna
  module list
  ls-dyna i=input.k o=outputfile memory=100m

● i=input.k : name of the input file
● o=outputfile : name of the output file
● memory=100m : requests 100 mega-words for LS-DYNA; 1 memory word = 4 bytes, so this is 100 × 4 Mbytes
Running LS-DYNA Software on EOS [ia]
● serial: 1 core + specified memory (DRAM); try at the command line:

  cd $SCRATCH
  pwd
  mkdir lsdyna_iserial
  cd lsdyna_iserial
  module load ls-dyna
  ls-dyna \
    i=/g/software/lstc/ls-dyna/examples/SPH_Lacome/bar1.k \
    o=bar1.out \
    memory=500m

● (the ls-dyna command is logically a single line)
● see what was generated in the output directory
Running LS-DYNA Software on EOS [ia]
● Batch LS-DYNA (and any batch application)
● use a text editor to write a "job script" file, say "ls.pbs"
● job script files are text files which prescribe to PBS the details relevant to our computation on the cluster, with at least 2 parts:
  1) resource specifications: number of nodes, cores/node, total memory, wall-clock time (total expected duration of the job)
  2) statements in shell language issuing specific commands with the relevant parameters
● submit the LS-DYNA "batch job" to the PBS/Maui batch scheduler, which arranges when this code will run on the cluster: qsub ls.pbs
Running LS-DYNA Software on EOS [ia]
● Batch serial LS-DYNA: edit a text file "ls.pbs"
● #PBS ← indicates that PBS examines this line up to the first '#' (comment)
  – -l : resource specification
  – -S : use this shell for the executable statements in the script
  – -j oe : combine STDOUT and STDERR into one file
  – -N : job name
● qsub ls.pbs ; qstat JobId

  # Resource specification part of the job script
  #PBS -N ls-dyna
  #PBS -l nodes=1:ppn=1       # 1 EOS node, 1 core
  #PBS -l mem=4000mb          # 4000 MBs memory
  #PBS -l walltime=4:00:00    # wall-clock time
  #PBS -S /bin/bash
  #PBS -j oe

  # This is where the executable statements start
  cd $PBS_O_WORKDIR
  module load ls-dyna
  mkdir lsdyna; cd lsdyna
  ls-dyna \
    i=/g/software/lstc/ls-dyna/examples/SPH_Lacome/bird1.k \
    o=bird1 memory=1000m
Running LS-DYNA Software on EOS [ia]
● Interactive LS-DYNA, small cases
● parallel OMP: 1–12 cores + specified memory shared by all OMP threads ("Shared-Memory Programming Paradigm")

  module load ls-dyna
  export OMP_NUM_THREADS=4
  ls-dyna i=/g/software/lstc/ls-dyna/examples/SPH_Lacome/bar1.k \
    o=outputfile \
    ncpu=4 memory=500m

● OMP_NUM_THREADS specifies the number of OpenMP threads to use (here, 4)
● 4 cores and memory from the same node will be used
● you cannot have OMP_NUM_THREADS > the number of cores on a node
Running LS-DYNA Software on EOS [ia]
● Interactive LS-DYNA, small cases, parallel OMP: try this by hand

  mkdir lsdyna_iOMP
  cd lsdyna_iOMP
  module load ls-dyna
  export OMP_NUM_THREADS=4
  ls-dyna i=/g/software/lstc/ls-dyna/examples/SPH_Lacome/bar1.k \
    o=outputfile \
    ncpu=4 memory=500m
Running LS-DYNA Software on EOS [ia]
● Batch parallel OMP LS-DYNA: write in a file "ls-dyna_OMP.pbs"
● OMP_NUM_THREADS specifies the number of OpenMP threads to use (here, 4)
● qsub ls-dyna_OMP.pbs ; qstat JobId

  # Resource specification part of the job script
  #PBS -N ls-dyna-OMP
  #PBS -l nodes=1:ppn=4       # 1 EOS node, 4 cores
  #PBS -l mem=2000mb          # 2000 MBs memory
  #PBS -l walltime=3:00:00    # wall-clock time
  #PBS -S /bin/bash
  #PBS -j oe

  # This is where the executable statements start
  cd $PBS_O_WORKDIR
  module load intel/compilers
  module load ls-dyna
  export OMP_NUM_THREADS=4
  ls-dyna ncpu=4 \
    i=/g/software/lstc/ls-dyna/examples/SPH_Lacome/bird1.k \
    o=outputfile memory=200m
Running LS-DYNA Software on EOS [ia]
● Interactive parallel MPI LS-DYNA: Message-Passing Programming Paradigm
● uses a number of nodes + specified total memory

  module load openmpi
  module load ls-dyna
  mpirun -np 4 ls-dyna-mpi \
    i=/g/software/lstc/ls-dyna/examples/SPH_Lacome/bar1.k \
    o=outputfile \
    memory=500m

● Explanations
  – mpirun is the command to launch OpenMPI (v1.4.3) code on EOS
  – -np 4 asks OpenMPI to create 4 MPI tasks
  – all 4 tasks will run on the same node (interactive case)
  – data is exchanged by explicit messages the tasks send to one another; NO shared memory
Running LS-DYNA Software on EOS [ia]
● Batch parallel MPI LS-DYNA: write in a file "ls-dyna_MPI.pbs", then qsub ls-dyna_MPI.pbs

  # Resource specification part
  #PBS -N ls-dyna-MPI
  #PBS -l nodes=2:ppn=4       # 2 EOS nodes, 4 cores/node
  #PBS -l mem=4000mb          # 4000 MBs total memory on all nodes
  #PBS -l walltime=3:00:00    # wall-clock time
  #PBS -S /bin/bash
  #PBS -j oe

  # This is where the executable statements start
  cd $PBS_O_WORKDIR
  module load openmpi
  module load ls-dyna
  # OpenMPI will start 8 = 2 x 4 MPI tasks spread across the 2 nodes
  mpirun ls-dyna-mpi \
    i=/g/software/lstc/ls-dyna/examples/SPH_Lacome/bird1.k \
    o=bird1 \
    memory=2100m
Running LS-DYNA Software on EOS [ia]
● Batch parallel MPI LS-DYNA
● After qsub, PBS allocates 2 free nodes to this job
● OpenMPI launches 8 (2 × 4) MPI tasks across the 2 nodes, using 4 cores on each node; 1 MPI task uses 1 core
● 4000 MiBs of memory in aggregate, allocated across all the nodes
[Diagram: the job script from the previous slide mapped onto two of the cluster's nodes; Nodes 1 through 315 each have their own cores and shared memory and are connected by the QDR IB switch.]
Reference for PBS Batch Jobs on EOS
● Refer to http://sc.tamu.edu/help/eos/batch/ for examples of running different types of batch jobs on EOS
Useful PBS Commands
● PBS can report useful job information:
  – qstat -a ← view information about queued jobs
  – qstat -f jobid ← view detailed information (such as assigned nodes and resource usage) about a queued job
  – showq ← view information about queued jobs; useful options are -r (more info for running jobs) and -i (more info for queued jobs)
  – checkjob ← view the status of your job; use the -v option to see additional information on why a queued job is waiting
  – qdel jobid / canceljob jobid (Maui) ← terminate the job with id = jobid
  – qlimit ← view the status of the queues
  – jrpt jobid ← reports per-process, per-node and aggregate memory and CPU usage and other process information
Batch Processing on EOS "Queues"
● PBS "queues" and their resource limits; per-job limits per queue:

  Queue        Min Node  Max Node  Max Cpus  Max Walltime
  short            1       128       1024     01:00:00
  medium           1       128       1024     24:00:00
  long             1         8         64     96:00:00
  xlong            1         1          8    500:00:00
  low              1         1          8     96:00:00
  special          1       362       2896     12:00:00
  abaqus           1         1          8     96:00:00
  atmo             1        16        192     None
  chang            1         8         64     None
  helmut           1        22        176     None
  lucchese         1         8         64     None
  quiring          1         4         32     None
  science_lms      1        16        160     None
  staff            1         0          0     None
  wheeler          1        24        256     None
  yang             1         8         80     None
Batch Processing on EOS "Queues"
● PBS "queues" and a snapshot of current usage (all queues currently accept and run jobs):

  Queue        Curr Max  Curr Max  | Curr Curr Max       | Max UserR  Max
               RJob RJob QJob QJob | CPUs PEs  PEs       | Soft/Hard  UserQ
  short           0    1    0    4 |    0    0 1024      | 0          1
  medium         60  150   55  350 | 1110 1160 1600/2048 | 8/50       80
  long           29  150   91  150 |  589  592 800       | 8/40       40
  low             0   96    1  400 |    0    0 944       | 50         200
  special         0    1    0   10 |    0    0 3088      | 1/2        4
  abaqus          0    0    0    0 |    0    0 24        | 0          0
  atmo            1    0    0    0 |   48   53 192       | 0          0
  chang           1    0    0    0 |   16   17 64        | 0          0
  helmut          3    0    0    0 |   24   24 96/384    | 0          0
  lucchese        2    0    0    0 |   32   32 64        | 0          0
  quiring         0    0    0    0 |    0    0 32        | 0          0
  regular         0    0    0    0 |    0    0 0         | 0          0
  science_lms    18    0    9    0 |  132  132 160       | 0          0
  staff           1    0    1   32 |  704  704 0         | 32         32
  wheeler        31    0    5    0 |  248  250 256       | 0          0
  xlong           0    1    0   10 |    0    0 12        | 1          10
  yang            0    0    0    0 |    0    0 80        | 16         0

  RJob = running jobs; QJob = queued jobs; UserR = running jobs per user;
  UserQ = queued jobs per user; PE = processor equivalent based on requested
  resources (i.e., memory). Any queued jobs exceeding a queued-job limit
  (queue-wide or per-user) are ineligible for scheduling consideration.
Multi-Serial PBS Batch Job
● We may launch several independent serial jobs on one EOS node: up to 8 on Nehalem, 12 on Westmere
● Example script:

  #PBS -l nodes=1:ppn=8,mem=20gb,walltime=04:00:00
  #PBS . . .
  #
  # ... assumes all needed files are in $PBS_O_WORKDIR
  #
  module load intel/compilers
  #
  cd $PBS_O_WORKDIR
  ( serial.exe < indata1 > outdata1 ) &
  ( serial.exe < indata2 > outdata2 ) &
  . . .
  ( serial.exe < indata8 > outdata8 )
  #
  wait
  # ------- End of multi-execution serial job -------
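The fan-out-and-wait pattern in the script can be tried locally with a stand-in for serial.exe. The "worker" function below is a hypothetical placeholder, not an EOS binary:

```shell
# Launch several independent "serial" workers in the background,
# then block until every one of them has finished.
worker() { echo "processed: $1"; }     # stand-in for serial.exe
for i in 1 2 3 4; do
    ( worker "indata$i" > "outdata$i" ) &
done
wait                                   # returns only when all 4 are done
cat outdata1 outdata2 outdata3 outdata4
rm -f outdata1 outdata2 outdata3 outdata4
```

Without the final "wait", the batch script would exit while the background workers were still running, and PBS would kill them.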
Developing Application Code on EOS [ia]
● Interactive
  – write code; correct errors
  – debug; optimize; tune up
  – execute small test cases on the interactive nodes
● Batch
  – put together and solve large cases
  – submit through the PBS/Torque batch scheduler
  – the cycle to debug and tune up code is longer
Code Development for EOS
● Some general references
  – Building code for EOS: http://sc.tamu.edu/help/eos/compiling
  – Debugging (intro): http://sc.tamu.edu/help/eos/debugging.php
● Examples for serial, OMP and MPI code
  – some in http://sc.tamu.edu/help/eos/compiling
  – see the EOS Lab
EOS Code Development Lab [ia]
● Training material at /g/public/training/EOSHPC/2012Fall/EOSHPC-2012Fall.tar
  – copy it to your local EOS $SCRATCH directory
  – C and Fortran examples; serial, OMP and MPI examples
● Build the code
  – modify the makefiles available in each sub-directory: xxx_makefile_L, where L is "C" or "F"
● Run the code interactively
● Submit the code to the batch system
  – xxxxxx.pbs contains the PBS job scripts available in each sub-directory
Programming Paradigms EOS Supports
● Scalar or serial
● Shared-Memory Programming for SMPs
● Message Passing Paradigm (MPI)
Application Types for EOS: Serial
● Serial (or "scalar") code
  – logically one sequence of instructions; one logical "execution thread"
  – occupies 1 core on a node and memory on that node
● typically, serial computation is "parallelized" (i.e., dynamically scheduled) at the machine-instruction level, within the pipeline of the processor
● Instruction-Level Parallelism (ILP): instruction-level concurrency is possible within single-threaded serial code
  – a Xeon core can dispatch up to 4 instructions at a time to its 6 execution units
  – completion is in logical order
Intel S/W Tools for Serial Code
● Intel C, C++ and Fortran compilers v12.1
  – extensive support for serial code and ILP optimization
  – module load intelXE/compilers
● Intel C, C++ and Fortran compilers v11.1
  – older, deprecated environment
  – module load intel/compilers
● Linux/Intel libraries and tools available
  – MKL, PETSc, Trilinos, etc.
  – performance profiling and tuning
Parallel Applications: SMP
● Shared-Memory Programming
  – multiple logical "execution threads", accessing a common memory space
  – logically concurrent sequences of instructions which need coordination for controlled interaction
  – a serial computation uses 1 thread; a parallel computation uses nc threads
● typically, SMP computation is "parallelized" by the user or the compiler at the basic-block, loop or program-function level
● OpenMP (OMP): a programming standard for loop- and thread-level parallelism
  – Intel compilers support OMP v3.0
● Xeon Nehalem and Westmere nodes can execute up to 8 and 12 threads, respectively (using more compute threads than 8 or 12 is NOT wise)
Intel S/W Tools for SMP Code
● Intel C/C++ and Fortran compilers v12.1
  – OpenMP v3.0; see the docs for compiler options and OMP
● various performance profiling and tuning tools
Intel S/W Tools for SMP/OMP Code
● Intel OMP debugging
  – idb: attach to running OMP code, or launch OMP code; see man idb
  – gdb
Parallel Application – Message Passing
● Message Passing Programming (MPP)
● multiple logical execution tasks, cooperating by exchanging messages; NO common memory
● even if tasks are collocated, they cannot see one another's memory
● logically concurrent sequences of instructions, which coordinate by explicit or implicit messages
● typically, MPP computation is “parallelized” by the user with calls to special message-passing functions/APIs
● MPI: a programming standard for message-passing parallel computation
● the EOS cluster can execute MPI code with as many message-passing processes as there are available cores
[Figure: a serial computation (1 task) becomes a parallel computation of N tasks spread across nodes 2, …, k, k+1, …, 315; each node runs tasks on its cores 0..7 (or 0..11) with its own Shared Memory, and nodes communicate over the Interconnect]
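A hedged sketch of compiling and launching an MPI code with the Intel MPI stack mentioned in this guide; `mpi_prog.c` and the task count are hypothetical examples:

```shell
module load intelXE/compilers intelXE/mpi

# Compile with the Intel MPI wrapper for the Intel C compiler
mpiicc -O2 -o mpi_prog mpi_prog.c

# Launch one MPI task per available core, e.g. 24 tasks across two
# 12-core Westmere nodes; on EOS, production runs are submitted via PBS
# rather than started with an interactive mpirun
mpirun -np 24 ./mpi_prog
```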
Intel S/W Tools for MPP Code
● Linux MPI stacks based on the Intel 12.1 “XE” suite
  – Intel MPI: 4.0.3
  – OpenMPI: 1.6.x
  – mvapich2: 1.8-rxxx OSU release
  – see documentation, attend the short-course on MPI
● Linux/Intel performance profiling and tuning
  – VTune Amplifier / Trace Collector and Analyzer
    ● GUI and TUI, leveraging Linux kernel tracing capability, for MPI code
  – TAU: an IDE for profiling, tracing and performance tuning of MPI and hybrid MPI/OMP code
  – other tools available upon request
Batch Processing for EOS
● use Linux PBS/Torque to submit large(r)-scale computation to the compute nodes of EOS
● building code for EOS batch computation: http://sc.tamu.edu/help/eos/batch
● batch is the only way to access HPC cluster resources
● the user needs to have a sense of what types of resources his/her code requires
Batch Processing for EOS PBS-1
● User
  – puts together a PBS “job script”, which explains what types of resources the “job” requires in order to proceed effectively
  – “submits” the job script to PBS at the command line with the qsub Linux command: “qsub PBS_Job_script” sends the job to PBS
  – uses the “qstat PBS_Job_Id” command to see the status of this job in the system
● PBS scheduler
  – selects one of the available “PBS batch classes” to send your job to, based on the resource requirements of the job
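The submit-and-monitor cycle above might look like the following on the login node; the script name `my_job.pbs` and the job id are hypothetical placeholders:

```shell
# Submit the job script; PBS prints the assigned job id on success
qsub my_job.pbs

# Check the status of one specific job (substitute the id qsub printed)
qstat 123456

# Or list all of your own queued and running jobs
qstat -u $USER
```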
Batch Processing for EOS PBS-2
● PBS cluster resources
  – use the “diagnose n” command at the eos2 prompt
[Screenshot: “diagnose n” output at the eos2 prompt]
Batch Processing for EOS PBS-3
● PBS schedules the execution of this particular job for a future time, based on
  – when the required resources become available for this job
  – the recent-past resource consumption by this user
Batch Processing for EOS PBS-4
● a PBS job script consists of
  – a resource-requirements section
  – sequences of command lines in the bash shell language, which prescribe the actions the system takes to pre-process, execute and post-process the job
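A minimal sketch of such a job script, with both sections marked; the job name, resource values and the executable `omp_prog` are hypothetical — consult http://sc.tamu.edu/help/eos/batch for the batch classes and limits that actually apply on EOS:

```shell
#!/bin/bash
# --- resource-requirements section (PBS directives) ---
#PBS -N example_job            # job name (hypothetical)
#PBS -l nodes=1:ppn=8          # 1 node, 8 cores per node (hypothetical values)
#PBS -l walltime=01:00:00      # 1 hour wall-clock limit
#PBS -j oe                     # merge stdout and stderr into one file

# --- bash command section: pre-process, execute, post-process ---
cd $PBS_O_WORKDIR              # start in the directory qsub was run from
module load intelXE/compilers  # environment the job needs

export OMP_NUM_THREADS=8       # match the cores requested above
./omp_prog > omp_prog.out      # run the (hypothetical) executable
```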
Program Development on EOS Intel 12.1
● Intel S/W Development stack (recommended): Intel 12.1 Cluster Studio “XE”
  – C/C++ and Fortran (compilers, debugger, OMP, Pthreads, etc.): module load intelXE/compilers
  – Intel Math-Kernel Library “MKL”, highly tuned math routines: module load intelXE/mkl
  – Intel MPI: module load intelXE/mpi
  – VTune Amplifier (for performance profiling and tuning)
  – Trace Analyzer and Collector (MPI tracing)
  – Inspector (for multi-threaded correctness)
  – Threading Building Blocks (thread primitives)
  – Integrated Performance Primitives IPP (signal, image processing)
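As a hedged example of using the stack above — `solve.f90` is a hypothetical source file — linking against MKL with the Intel 12.1 compilers can be as simple as:

```shell
module load intelXE/compilers intelXE/mkl

# The Intel 12.1 compilers accept -mkl to compile and link against
# the Math-Kernel Library without spelling out the library names
ifort -O2 -mkl -o solve solve.f90
```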
Program Development on EOS Intel 11.1
● Intel S/W Development stack (deprecated): Intel 11.1 stack
  – C/C++ and Fortran compilers, debugger, OMP: module load intel/compilers
  – Intel Math-Kernel Library “MKL”, efficient math routines: module load intel/mkl
  – Intel MPI: module load intel/mpi
  – VTune Analyzer (for performance profiling and tuning)
  – Trace Analyzer and Collector (MPI tracing)
  – Threading Building Blocks (thread primitives)
  – Integrated Performance Primitives IPP (signal, image processing)
● most installed S/W is based on Intel 11.1; porting to Intel 12.1 is underway
Program Development on EOS Intel 13.1
● Intel's latest Cluster Parallel Studio XE
  – improved compilers and tools vs. 12.1
  – not installed yet, but we will eventually deploy it as the code matures
Program Development on EOS PGI
● PGI Compilers
  – currently usable on eos2 only
  – v12.3 and some older versions; a newer version will be installed soon and made available across the cluster
  – module load pgi/compilers
  – full set of tools, but few users or S/W packages are based on it
Program Development on EOS GNU
● GNU Compilers
  – we strongly recommend the Intel S/W tools for all native code development or porting
  – Intel-compiler-generated code is far superior to that of GNU, especially for Fortran
  – GNU OpenMP code is NOT compatible with Intel
● v4.1.2
  – the Linux system is built with it
  – use it ONLY if you need to develop kernel or related code
● v4.5.1: module load gcc
  – relatively more recent
  – use it ONLY if you have to compile code which ONLY builds with the GNU compilers
Program Development on EOS Tuning
● software available for performance profiling, tracing and tuning
  – Intel VTune Amplifier (12.1)
  – TAU
  – PAPI
Beginner's Guide to “EOS” User Background Survey
● Department and degree objective (MSc, PhD, etc.)
● Intended use: existing s/w (which?), build/port an existing FOSS package (language compilers; environment: scalar, OMP, MPI, hybrid, other?)
● Prior HPC system experience (languages, packages, systems, etc.)
● Is this only for this semester (due to course requirements), or a long-term tool for your research?
● If HPC is a long-term tool, what is your research area?