- What is a job?
- Main components (CE, SE, MON, WN, ...)
- Job workflow
Grid Computing
- A collection of heterogeneous resources (computers, storage, devices, services, ...) that are geographically distributed and whose content changes dynamically
- Grid computing -> GRID SERVICES
  - OGSA (Open Grid Services Architecture): http://www.globus.org/ogsa/
Virtual Organization
- People who collaborate by sharing their resources in an organized environment (based on groups, roles, ...)
- A group of individuals or institutions who share their computing resources to serve a common goal
- An infrastructure service that authorizes access to a resource based on membership credentials
VO :: Who can join?
- A VO defines three types of users:
  - Resource providers:
    - Computational clusters, storage devices and other types of devices
  - Application providers:
    - Different fields: medicine, mathematics, physics, aeronautics, ...
  - Consumers:
    - Persons involved in VO-registered application fields
VO :: How does it work?
- VOMS: Virtual Organization Membership Service
- Each VO must have one or several VOMS servers
- UI tools for authorization
- Web interface for administration
VO :: UI tools for authorization
- voms-proxy

    voms-proxy-init --voms gridmosi.ici.ro
    Enter GRID pass phrase:
    Your identity: /DC=RO/DC=RomanianGRID/O=UVT/CN=Silviu Panica
    Creating temporary proxy ....................................................................... Done
    Your proxy is valid until Fri Apr 4 06:53:10 2008
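Since the proxy has a limited lifetime, it can be useful to check how much validity is left before submitting long jobs. The sketch below only parses the "valid until" line shown in the session above. The function names (`proxy_expiry`, `seconds_left`) are hypothetical, and the timestamp format is assumed to match the sample output (English day and month names, no timezone).

```python
import re
from datetime import datetime
from typing import Optional

# Matches the last line printed in the voms-proxy-init session above.
_VALID_UNTIL = re.compile(r"Your proxy is valid until (.+)$", re.MULTILINE)


def proxy_expiry(output: str) -> Optional[datetime]:
    """Return the expiry timestamp found in the captured output, or None."""
    match = _VALID_UNTIL.search(output)
    if match is None:
        return None
    # Assumed format: weekday, month, day, time, year (as in the sample).
    return datetime.strptime(match.group(1).strip(), "%a %b %d %H:%M:%S %Y")


def seconds_left(output: str, now: datetime) -> float:
    """Seconds of validity remaining at `now`; negative if already expired."""
    expiry = proxy_expiry(output)
    if expiry is None:
        raise ValueError("no 'valid until' line found")
    return (expiry - now).total_seconds()


sample = (
    "Creating temporary proxy ... Done\n"
    "Your proxy is valid until Fri Apr 4 06:53:10 2008\n"
)
print(proxy_expiry(sample))
```

The parsing is deliberately separate from running the command, so the same functions work on output captured by any means.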
- A Certification Authority (CA) issues digital certificates for both users and resources (computers, programs, ...)
- CAs periodically publish a list of revoked certificates
VO :: Web interface for administration
- Approve/deny user applications
- Define groups, roles, attributes
- Assign roles to users
- Assign users to groups
https://voms.grid.info.uvt.ro:8443/voms/gridmosi.ici.ro/Login.do
Grid jobs
- Jobs represent user applications executed over the grid
- Job information that must be specified:
  - Job characteristics
  - Job requirements
    - Computing resources
    - Software dependencies
- JDL: Job Description Language
  - Based on ClassAd (Condor CLASSified ADvertisement language)
Job management :: WMS
- Users use the Workload Management System (WMS) to interact with the grid environment
- WMS: distributed scheduling and resource management
- WMS allows:
  - Job submission (it tries to schedule the job on the optimal resources for its needs)
  - Job status
  - Job output retrieval
JS :: VO :: Services
- Resource Broker (RB) / WMS
- BDII (Berkeley DB Information Index)
  - Based on Monitoring and Discovery Service (MDS)
- Relational Grid Monitoring Architecture (R-GMA)
JS :: Cluster :: Services
- Computing Element (CE)
- Storage Element (SE)
- Monitor Node (MON)
- User Interface (UI)
- Worker Node (WN)
JS :: Job description
- The user describes a job in JDL through:
  - Attributes: describe the job
  - Resources: used by the RB's matchmaking algorithm to select "the best" resources for scheduling
- The JDL structure must include:
  - JobType: Normal (simple, sequential job), Interactive, MPICH, Checkpointable
JS :: Job description (2)
  - Executable (mandatory): command name
  - Arguments (optional): command arguments
  - StdInput, StdOutput, StdError (optional): standard input/output/error
  - InputSandbox: list of files from the UI that need to be registered for the job
  - OutputSandbox: list of files that need to be retrieved after the job has executed
  - VirtualOrganisation (optional)
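The matchmaking step mentioned in the job description can be pictured as "filter, then rank": drop the resources that do not satisfy the job's requirements, then pick the best of those that remain. The sketch below is only a toy illustration of that idea, not the RB's actual algorithm. The `Resource` fields, the resource names and the ranking by free CPUs are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Resource:
    """Hypothetical, simplified view of one computing element."""
    name: str
    free_cpus: int
    software: FrozenSet[str] = frozenset()


def matchmake(
    resources: Iterable[Resource],
    requirement: Callable[[Resource], bool],
    rank: Callable[[Resource], float],
) -> Optional[Resource]:
    """Return the highest-ranked resource that meets the requirement."""
    candidates = [r for r in resources if requirement(r)]
    return max(candidates, key=rank) if candidates else None


# Made-up resources, for illustration only.
resources = [
    Resource("ce-a", free_cpus=2, software=frozenset({"mpich"})),
    Resource("ce-b", free_cpus=16, software=frozenset({"mpich"})),
    Resource("ce-c", free_cpus=64),
]

best = matchmake(
    resources,
    requirement=lambda r: "mpich" in r.software and r.free_cpus >= 4,
    rank=lambda r: r.free_cpus,
)
print(best.name if best else "no match")
```

Here `ce-c` has the most free CPUs but lacks the required software, so it is filtered out before ranking.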
JS :: JDL Example
- At least this information must be submitted:
  - The name of the executable
  - The files for standard output and standard error
  - The arguments to the executable, if needed
  - The files that need to be transferred from the UI to the WN and vice versa
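As a rough illustration of the minimum information listed above, the sketch below renders a job description in the ClassAd-style `Name = value;` form used by JDL. The helper `render_jdl` is hypothetical, and the executable and file names are made-up example values, not taken from the original slides. String escaping is not handled.

```python
from typing import Dict, List, Union

JdlValue = Union[str, List[str]]


def _format(value: JdlValue) -> str:
    """Quote a string, or render a list of strings as {"a", "b"}."""
    if isinstance(value, list):
        return "{" + ", ".join(f'"{item}"' for item in value) + "}"
    return f'"{value}"'


def render_jdl(attributes: Dict[str, JdlValue]) -> str:
    """Render attributes as 'Name = value;' lines, in insertion order."""
    return "\n".join(
        f"{name} = {_format(value)};" for name, value in attributes.items()
    )


# Illustrative values only: a trivial job that prints the worker node's name.
job = {
    "Executable": "/bin/hostname",
    "StdOutput": "std.out",
    "StdError": "std.err",
    "OutputSandbox": ["std.out", "std.err"],
}
print(render_jdl(job))
```

Arguments and an InputSandbox entry would be added to the dictionary in the same way when the job needs them.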
- Job submission command options:
  - -r  the job is submitted directly to the given computing element
  - --vo  the Virtual Organisation (if the user is not happy with the one specified in the UI configuration file)
  - -o  the generated jobId is written to the given output file
    - Useful for other commands, e.g.: glite-job-status -i (or jobId)
    - -i  the status information about the jobIds contained in the given input file is displayed
JS :: Job workflow
JS :: Job states
SUBMITTED   submission logged in the LB
WAIT        job matchmaking for resources
READY       job being sent to the executing CE
SCHEDULED   job scheduled in the CE queue manager
RUNNING     job executing on a WN of the selected CE queue
DONE        job terminated without grid errors
CLEARED     job output retrieved
ABORT       job aborted by the middleware, check the reason
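The states above can be read as a simple state machine. The sketch below encodes them as an enumeration with a transition check. The allowed transitions are an assumption inferred from the order in which the states are listed: a job moves one step at a time along the normal path, and may be aborted at any point before it is DONE. The middleware's real rules may differ.

```python
from enum import Enum


class JobState(Enum):
    SUBMITTED = "submission logged in the LB"
    WAIT = "job matchmaking for resources"
    READY = "job being sent to the executing CE"
    SCHEDULED = "job scheduled in the CE queue manager"
    RUNNING = "job executing on a WN of the selected CE queue"
    DONE = "job terminated without grid errors"
    CLEARED = "job output retrieved"
    ABORT = "job aborted by the middleware"


# Assumed normal path, taken from the order of the list above.
NORMAL_PATH = [
    JobState.SUBMITTED, JobState.WAIT, JobState.READY, JobState.SCHEDULED,
    JobState.RUNNING, JobState.DONE, JobState.CLEARED,
]


def can_transition(current: JobState, target: JobState) -> bool:
    """Check a transition under the assumptions stated above."""
    if target is JobState.ABORT:
        # Assumption: abort is possible from SUBMITTED through RUNNING only.
        return current in NORMAL_PATH[:-2]
    if current not in NORMAL_PATH[:-1]:
        # CLEARED and ABORT are treated as terminal states.
        return False
    return NORMAL_PATH[NORMAL_PATH.index(current) + 1] is target


print(can_transition(JobState.RUNNING, JobState.DONE))
```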
JS :: Output retrieval
- glite-wms-job-output -o --vo
  - -o  defines the output directory where files from the RB will be saved
  - --vo  the VirtualOrganisation
  - jobid  generated by the RB after job registration