security middleware virtualization

September 2008 Prague Czech Republic

Gabriela Krčmařová, Petr Sojka

CESNET Conference 2008
Security, Middleware, and Virtualization – Glue of Future Networks
Prague, Czech Republic, September 25–26, 2008
Proceedings

CESNET, z. s. p. o.

Editors:

Gabriela Krčmařová
CESNET, z. s. p. o., Zikova 4, 160 00 Praha 6, Czech Republic
Email: [email protected]

Petr Sojka
Faculty of Informatics, Masaryk University, Department of Computer Graphics and Design
Botanická 68a, 602 00 Brno, Czech Republic
Email: [email protected]

CATALOGUING IN PUBLICATION – NATIONAL LIBRARY OF THE CZECH REPUBLIC
KATALOGIZACE V KNIZE – NÁRODNÍ KNIHOVNA ČR

CESNET Conference 2008 (Praha, Česko)
CESNET Conference 2008 : security, middleware, and virtualization – glue of future networks : Prague, Czech Republic, September 25–26, 2008 : proceedings / Gabriela Krčmařová, Petr Sojka. – Praha : CESNET, 2008. – x, 142 s.
ISBN 978-80-904173-0-4 (brož.)
004.4 * 004.7:004.451.2 * 004.7.056
– software – computer network administration – computer network security – proceedings of conferences
– software – správa počítačových sítí – zabezpečení počítačových sítí – sborníky konferencí
004.6 – Interfacing and communications [23]
004.7 – Počítačové sítě [23]
ISBN 978-80-904173-0-4

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the Czech Copyright Law, in its current version, and permission for use must always be obtained from CESNET. Violations are liable for prosecution under the Czech Copyright Law.

© CESNET, z. s. p. o., 2008

Printed by Tribun EU in the Czech Republic. Number of copies: 150. Cover design: Pavel Satrapa. Typeset by Petr Sojka using pdfTeX from data provided by the authors. Not for sale.

Introduction

Motto: Network for Research and Research for Network

CESNET Conference 2008 is organised by CESNET, two years after the 10th CESNET anniversary conference in 2006. This event also fulfils the promised beginning of a tradition of regular bi-annual conferences focusing on advances in high-speed communication technologies, Grids, related middleware and their sophisticated applications.

Conference topics

This year's conference focuses on areas that stand behind not only the network itself, but form the basic building bricks of general e-Infrastructures. The three major topics are:

Security – conceptual, technical, and organizational aspects of broadly understood security in networks, Grids, and generic e-Infrastructures; classical (like Kerberos and PKI) and novel (federations) approaches to authentication; new authorization mechanisms, tools, and infrastructures that could serve networks and Grids together; security in sensor networks and peer-to-peer systems; security in healthcare and other sensitive application areas; the concept of trust and its implementation in distributed systems.

Middleware – software components that lie between individual components and realize the vision of hiding the complexity of large-scale distributed systems like networks and Grids; planning and scheduling algorithms and systems; multimedia and other capacity-demanding content distribution; overlay and user-empowered networks, ROI; security-related middleware.

Virtualization – virtual networks, computing nodes and data depots; virtual clusters and Grids; novel techniques in network virtualization; virtual organizations, user and resource management; new addressing and identification paradigms.

About CESNET

CESNET was established in 1996 as an association of major Czech universities and the Czech Academy of Sciences with the aim of building and developing the national broadband computer network for science, research and education. From the very beginning of its existence, CESNET has assumed a leading position in the research and development of high-speed networks and communication technologies in the Czech Republic. Shortly after its foundation, CESNET joined the international networking community, and its specialists have actively participated in a number of international research projects since 1997. Nowadays, CESNET operates and develops the CESNET2 national research network and performs applied research in a number of areas such as optical transmission technologies, programmable hardware, Grids, network monitoring, authentication/authorisation infrastructure and mobility. CESNET participates in the pan-European projects GN2 and EGEE II/III and other international activities. More information about CESNET is available from http://www.ces.net.

Organization

Programme Committee

Luděk Matyska, CESNET, Czech Republic (chair)
Jan Gruntorád, CESNET, Czech Republic (vice chair)
Jim Basney, NCSA, USA
Mauro Campanella, GARR, Italy
Bob Cowles, SLAC, USA
Mario Freire, University of Coimbra, Portugal
Eva Hladká, CESNET, Czech Republic
Kate Keahey, ANL, USA
Daniel Kouřil, CESNET, Czech Republic
Erwin Laure, CERN, Switzerland
Ladislav Lhotka, CESNET, Czech Republic
Miron Livny, University of Wisconsin, USA
Vasilis Maglaris, GRNET, Greece
Johan Montagnat, CNRS, France
Syed Naqvi, CETIC, Belgium
Milan Šárek, CESNET, Czech Republic
Shuji Shimizu, Kyushu University, Japan
Milan Sova, CESNET, Czech Republic
Christoph Witzig, SWITCH, Switzerland
Hans Döbbeling, DANTE, UK

Sponsors

The platinum sponsor is Cisco Systems, Inc., Czech Republic; the gold sponsor is INTERCOM SYSTEMS, Inc.

Table of Contents

I Middleware

Job Centric Monitoring on the Grid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Miroslav Ruda, Jiří Sitera, Aleš Křenek, Luděk Matyska, Zdeněk Šustr, Michal Voců (CESNET, Prague, Czech Republic)

Quantification of Traffic Burstiness with MAPI Middleware . . . . . . . . . . . . . . 13
Sven Ubik, Aleš Friedl, Stanislav Hotmar (CESNET, Prague, Czech Republic)

SAML Metadata Management for eduID.cz . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Milan Sova, Jan Tomášek (CESNET, Prague, Czech Republic)

II Security

Security Risks in IP Telephony . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Filip Řezáč, Miroslav Vozňák, Jan Růžička (CESNET, Prague, Czech Republic)

Survey of Authentication Mechanisms for Grids . . . . . . . . . . . . . . . . . . . . . . . 39
Daniel Kouřil, Luděk Matyska, Michal Procházka (CESNET, Prague, Czech Republic)

Flow Based Network Intrusion Detection System using Hardware-Accelerated NetFlow Probes . . . . 49
Karel Bartoš (CESNET, Prague, Czech Republic), Martin Grill (Czech Technical University, Prague, Czech Republic), Vojtěch Krmíček (Masaryk University, Brno, Czech Republic), Martin Rehák (Czech Technical University, Prague, Czech Republic), Pavel Čeleda (Masaryk University, Brno, Czech Republic)

Challenges of Deploying Scalable Virtual Infrastructures – A Security Perspective . . . . 57
Syed Naqvi, Philippe Massonet (Centre of Excellence in Information and Communication Technologies, Belgium), Joseph Latanicki (Thales Theresis, France)

III Network Management

A Case for Application-Level Control of Network Resources . . . . . . . . . . . . . . 69
Andrei Hutanu, Gabrielle Allen (Louisiana State University, Baton Rouge, United States)

Secure Remote Configuration of Network Devices – a Case Study . . . . . . . . . . 77
Radek Krejčí, Ladislav Lhotka, Pavel Čeleda, Petr Špringl (CESNET, Prague, Czech Republic)

Fault Management in Hybrid Environment with IP and Optical Networks . . . . . 85
Te-Lung Liu, Hui-Min Tseng, Chu-Sing Yang and C. Eugene Yeh (National Center for High-Performance Computing, Taiwan)

Universal Virtual Node as One of the Building Bricks for FEDERICA Network . . 95
Jiří Navrátil, Jan Fürman (CESNET, Prague, Czech Republic)

IV Applications

DVTS Videoconferencing with Quatre . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
Shuji Shimizu (Kyushu University, Fukuoka, Japan), Koji Okamura, Jiří Navrátil (CESNET, Prague, Czech Republic)

Encapsulation of a Communication Reflector into a Virtual Machine . . . . . . . . 123
Aleš Červenka (Masaryk University, Brno, Czech Republic)

Applied Information Technologies for Development of Continuous Shared Health Care . . . . 131
Miroslav Nagy (EuroMISE, Prague, Czech Republic), Petr Hanzlíček, Matej Dioszegi, Jana Zvárová, Petra Preckova, Libor Seidl, Karel Zvára, Vít Bureš (Medicalc software, Pilsen, Czech Republic), Daniel Šubrt

Subject Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142

Part I

Middleware

Job Centric Monitoring on the Grid∗
7 years of experience with L&B and JP services

Miroslav Ruda, Jiří Sitera, Aleš Křenek, Luděk Matyska, Zdeněk Šustr, and Michal Voců

CESNET z.s.p.o., Zikova 4, 160 00 Praha 6, Czech Republic
email: [email protected]

∗ This work has been supported by the Czech Research Intent MSM6383917201. The Logging and Bookkeeping service is currently developed and maintained in the EU EGEE-III project, INFSO-RI-222667.

Abstract. Users approach Grids expecting a transparent and easy-to-use environment. They look for a simple interface to submit a job, retrieve the results of its run, and possibly follow changes of its state. The Logging and Bookkeeping service (L&B), introduced in the EU DataGrid project 7 years ago, replaces the tedious inspection of log files at individual Grid components and provides job state information through a single interface. Based on a push model, L&B is an infrastructure that collects events from all Grid components a job passes through and processes them in a fault-tolerant way. Users can also add annotations to further classify their jobs. As L&B is focused on running jobs, it is complemented by the Job Provenance service (JP), which provides long-term storage of and access to job information. While L&B has been in production use for several years on the largest Grid infrastructure, operated by the EGEE projects, and has already been deployed in other environments, the capabilities of JP to support scientific workflows and enhance the ways scientists can work with a Grid have been demonstrated recently on several occasions.

1 Job tracking service in a Grid

Nowadays Grids are complex heterogeneous systems composed of a very large number of hardware and software components. However, users expect transparent access to the Grid infrastructure, with a simple interface to submit jobs and retrieve results. Such an interface is not sufficient when users are interested in more detailed information about their job state, or when some error occurs and they want to understand its cause. Also, system administrators need information about jobs passing through the infrastructure, as its complex nature makes errors inevitable. Jobs move freely through the Grid, being processed by schedulers and passing through the internal administrative domains of individual resource providers. It is possible to dig the job state and associated information out of the Grid infrastructure monitoring tools, but it is a very difficult and error-prone task.

The job-centric Grid monitoring concept [1,2] introduces a specific service tracking Grid jobs. This service collects information about all jobs, stores it in one place and processes it to deliver a coherent and concise job status view to users. Users deal with this service only, with guarantees that only those authorized have access to job-related data. The Logging and Bookkeeping service (L&B) is an implementation of this concept. L&B guarantees non-blocking delivery of job-related events to a central database, where the events are processed and the actual job state is computed. With redundant events, the job state automaton is highly resilient and able to provide the actual job state even when some events are delayed and delivered out of order. All data transfers are secured, with strong authorization checks for data access. An optimized query interface can be used to ask complex questions, and user annotations can help with even more detailed job classification.

When a job finishes, all its data are removed from the L&B database to keep it at a reasonable size. However, users are interested in their jobs for a much longer time period – they may need to inspect the exact parameters used to run a job, to check which Computing Element was used, etc. The Job Provenance service (JP) [12] is a conceptual extension of L&B that deals with indefinite storage of job-related information. When a job finishes on a Grid, its data are moved from L&B to the JP and kept there without restriction. To make the "provenance" more complete, additional data – sandboxes, runtime-environment-related information – are also stored in JP. To deal with a potentially tremendous amount of data – the current target is to serve one million jobs per day per single L&B service instantiation – and simultaneously to provide extensive querying capabilities, the JP Primary Server (JPPS) is optimized to store the data while serving arbitrary JP Index Servers (JPIS) that are optimized to process even complex queries.

Other systems also provide means to track jobs over the Grid infrastructure. However, they are usually coupled with resource management systems (RMS), focusing on running jobs only and without support for fault tolerance. A single RMS instance is often used as the authoritative source of job-related information. The Globus Toolkit 4 includes the information service MDS3 [17], which also publishes information about running jobs, but no care is taken to provide aggregate information coming from different Grid components. A more scalable and robust solution is provided within the Condor system [18], capable of submitting to and monitoring jobs in different Grid environments (including Globus and UNICORE). Condor provides short-term history information (equivalent to the L&B history); no permanent information on the timescale of the Job Provenance is provided. Higher-level tools, e. g. Experiment Dashboard [7], GridView, or MonALISA [4], leverage data received from various monitoring systems and provide customized views for the users.

2 L&B architecture and implementation

The Logging and Bookkeeping service was initially developed in the EU DataGrid project¹ as a part of the Workload Management System (WMS). The development continues in the series of EGEE projects², where L&B became an independent part of the gLite middleware [5]. While L&B originally supported only gLite jobs, it was recently extended [6] to support native PBS [3] and Condor [18] jobs; CREAM [11] support is expected soon.

¹ http://eu-datagrid.web.cern.ch/eu-datagrid/
² http://www.eu-egee.org

2.1 Concepts

Global job identifier. Each job on a Grid is assigned a unique identifier (the Grid jobid, or just jobid) upon its entry to the Grid. The jobid includes the name of the L&B server that will keep all job-related events. As the jobid is passed through all Grid components together with the job description, each component "knows" where to send events related to each particular job. The jobid also serves as a URL pointing to the L&B server where users can get information about their job (even using an ordinary web browser). This "source routing" approach allows each Grid component to identify where to send events without relying on any external service – the most robust and reliable setup. We have never found any serious impact of the resulting restrictions, especially the immutability of the L&B server assignment for any single job.
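As an illustration of this "source routing" idea, the following sketch extracts the L&B server address from a jobid. It assumes, purely for illustration, that the jobid is an https URL whose host and port identify the L&B server; the concrete jobid string below is invented.

```python
from urllib.parse import urlparse

def lb_server_of(jobid: str) -> str:
    """Derive the L&B server endpoint from a jobid, assuming the jobid is a
    URL whose authority part names the server that keeps the job's events."""
    url = urlparse(jobid)
    return f"{url.hostname}:{url.port}"

# Hypothetical jobid, for illustration only.
jobid = "https://lb.example.org:9000/OirOl2mxcbaUf2r2CJD3wB"
print(lb_server_of(jobid))  # -> lb.example.org:9000
```

Because the destination is recoverable from the identifier itself, no lookup service has to be contacted before an event can be sent.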

L&B events. Every Grid component dealing with the job during its lifetime may be a source of information about the job. The L&B gathers information from all the relevant components in the form of L&B events. Each event marks an important point in the job lifetime, e. g. transferring the job between components, finding a matching resource, starting execution, etc. The L&B intentionally collects duplicate information – the event scheme has been designed to be as redundant as possible – and this redundancy is used to improve resiliency in the presence of component or network failures, which are omnipresent on any Grid. For instance, when the job is transferred between two components, both report the operation. The gathering is based on the push model, where the components actively produce and send events. The push model offers higher performance and scalability than the pull model, where the components have to be queried by the server. No event source needs to be known in advance; the L&B just listens for and accepts events on a defined interface. The event delivery is asynchronous and based on the store-and-forward model. This minimizes the performance impact and achieves the highest reliability. Only the first step in the chain is synchronous – the primary event source, usually the user interface, receives a confirmation from the nearest L&B component that the event was registered and accepted for delivery.
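A toy sketch of the store-and-forward behaviour described above (not the real L&B logger; class and function names are made up): the caller gets an immediate acknowledgement once the event is queued locally, and forwarding to the destination server is retried until it succeeds.

```python
from collections import defaultdict, deque

class LocalLoggerSketch:
    """Events are acknowledged as soon as they sit in a local per-destination
    queue; delivery to the destination L&B server is retried until it works."""

    def __init__(self, send):
        self.queues = defaultdict(deque)   # one delivery queue per destination
        self.send = send                   # send(server, event) -> bool

    def log_event(self, server, event):
        self.queues[server].append(event)  # "store": persist locally first
        return True                        # synchronous ack to the event source

    def flush(self):
        """Attempt asynchronous delivery; failed events stay queued in order."""
        for server, q in self.queues.items():
            while q and self.send(server, q[0]):
                q.popleft()

# A flaky "network" that fails every other delivery attempt.
calls = {"n": 0}
def flaky_send(server, event):
    calls["n"] += 1
    return calls["n"] % 2 == 0

logger = LocalLoggerSketch(flaky_send)
logger.log_event("lb1", {"job": "j1", "event": "Transfer"})
logger.log_event("lb1", {"job": "j1", "event": "Accepted"})
logger.flush(); logger.flush(); logger.flush()
print(sum(len(q) for q in logger.queues.values()))  # 0: everything delivered
```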

Job status. The events contain low-level, detailed information about job processing. While valuable for problem tracking, a more abstract description is better suited for routine inspection. Moreover, the events can arrive in the wrong order, making the interpretation of the raw information difficult. Therefore, incoming raw events undergo complex processing, yielding a high-level view, the job state, which is the primary type of data presented to the user. Fig. 1 shows the job states in gLite, as well as the transitions among them (for details see [10]).

[Figure: the L&B job state diagram with states SUBMITTED, WAITING, READY, SCHEDULED, RUNNING, DONE(ok), DONE(failed), CLEARED, CANCELLED, and ABORTED, and the transitions among them.]

Fig. 1. L&B job state diagram
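A toy illustration of why redundant, order-independent event processing helps. This is not the actual gLite automaton; it simply assumes states can be ranked by how far the job has progressed, so a delayed event never moves the job backwards.

```python
STATE_ORDER = ["SUBMITTED", "WAITING", "READY", "SCHEDULED",
               "RUNNING", "DONE", "CLEARED"]
RANK = {state: i for i, state in enumerate(STATE_ORDER)}

def job_state(events):
    """Return the most advanced state implied by the events seen so far."""
    best = "SUBMITTED"
    for state in events:            # events may arrive in any order
        if RANK.get(state, -1) > RANK[best]:
            best = state
    return best

# A delayed SCHEDULED event arrives after RUNNING was already reported:
print(job_state(["SUBMITTED", "WAITING", "RUNNING", "SCHEDULED"]))  # RUNNING
```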

2.2 Components and their interactions

The principal components of the L&B service are the logger and the server. The task of the logger component is to take over events from their primary sources, store them reliably, and forward them to the destination server (see the Computing element cloud in Fig. 2). The logger also manages the delivery queues (one per destination server). It reports success immediately to the logging component and takes care of reliable delivery itself. All events are eventually delivered, stored, and processed at the L&B server. Incoming events are parsed, checked for correctness, authorized (only the job owner or a Grid component acting on her behalf can store events belonging to a particular job), and stored into a database. When an event is accepted, the job state is updated by a job-state machine (shown schematically in Fig. 1). Users can query the server actively to retrieve either job states or raw events. The querying capabilities are fairly complex; besides single-job queries, the user may specify conditions similar to a restricted SQL WHERE clause, e. g. "what are my jobs currently running at Computing element X?". Alternatively, users can subscribe to receive notifications when their jobs enter a specified state (with the same richness of conditions as in the query case). With the subscription for a notification, the L&B library also spawns a callback listener. When a job enters a matching state, a special L&B event is generated and delivered to the listener using the same mechanism (i. e. the reliable logger component) as the primary event delivery. The L&B server can also work in a proxy mode. It takes over the role of the logger daemon – it accepts incoming events, stores them in a local database, and forwards them to the full server. However, the querying capabilities of the L&B server over the locally stored events are also offered. The proxy mode is exploited by WMS in gLite in order to query job information back and to avoid the need of independently keeping per-job state information.
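A conceptual sketch of the two read paths just described – WHERE-like queries and state-change notifications. The data, condition format and function names are illustrative only; the real service exposes these capabilities through its own API.

```python
jobs = [  # what the server might hold after event processing (made-up data)
    {"jobid": "j1", "owner": "alice", "state": "RUNNING",   "ce": "ce1.example.org"},
    {"jobid": "j2", "owner": "alice", "state": "SCHEDULED", "ce": "ce2.example.org"},
    {"jobid": "j3", "owner": "bob",   "state": "RUNNING",   "ce": "ce1.example.org"},
]

def query(conditions):
    """Restricted WHERE-like query: all attribute/value pairs must match."""
    return [j for j in jobs
            if all(j.get(attr) == value for attr, value in conditions.items())]

# "What are my jobs currently running at Computing element ce1?"
print(query({"owner": "alice", "state": "RUNNING", "ce": "ce1.example.org"}))

# Notifications: register a condition once and get called back on matches,
# instead of polling the server with the same query over and over.
subscriptions = []

def subscribe(conditions, callback):
    subscriptions.append((conditions, callback))

def on_state_change(job):  # invoked whenever a job changes state
    for conditions, callback in subscriptions:
        if all(job.get(attr) == value for attr, value in conditions.items()):
            callback(job)

subscribe({"owner": "alice", "state": "DONE"},
          lambda job: print("notification:", job["jobid"], "is DONE"))
on_state_change({"jobid": "j2", "owner": "alice", "state": "DONE",
                 "ce": "ce2.example.org"})
```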

Fig. 2. L&B components and their interaction

2.3 Performance analysis and tuning

With its synchronous first step and the speed with which it can provide the actual job state, the performance of the L&B service has an important influence on the users' perception of the Grid "quality". An internal goal of achieving a throughput of one million jobs per day (aka megajob) has been set, and the actual performance is tested in a real environment. The results in [19] indicate that the megajob goal is quite realistic – the throughput of the logger component is 100,000 jobs/day (this limit does not apply if the L&B proxy is used), the L&B server can accept more than 500,000 jobs/day, and the components providing transfer of events between components are already able to transfer more than a million jobs per day.

Roughly the same throughput is achieved with queries. However, one million jobs per day means approx. 11 jobs per second, therefore the query interface can be rather easily saturated with frequent queries. While such queries mostly return the same information (the job state does not change too frequently), the L&B notifications carry virtually only new information and are much better suited for these high throughput rates. With a careful setup we are also able to deliver notifications for around 500,000 jobs/day.

2.4 Production deployment

The L&B is deployed as one of the mandatory services on the EGEE III Grid, which consists of approx. 250 sites in 51 countries worldwide, with 68,000 CPUs currently available to some 8,000 users coming from more than 15 application domains. L&B logger components are installed at each site to serve all Grid components. Moreover, several dozen WMS installations are used, each having its L&B proxy attached, and roughly the same number of L&B servers. The whole infrastructure handles a sustained load of 150,000 jobs per day, every job generating about a dozen L&B events. These numbers illustrate the scale of the L&B deployment; up-to-date quantitative information can be found at http://www.eu-egee.org. Besides direct exploitation by the end users, the data gathered by L&B are of particular interest for Grid monitoring systems like Experiment Dashboard [7] or GridView. Due to the direct interaction of L&B with other Grid components the data are very accurate, they are already aggregated from different sources, and the whole L&B infrastructure is particularly reliable. Such systems rely heavily on L&B notifications [8], as massive job queries would seriously degrade the L&B server performance. The use of notifications in this context led to their further extension, and they were streamlined in order to allow wider subscriptions, e. g. all jobs of a specific VO. We also demonstrated the use of L&B data in the early detection of abrupt changes in the behavior of Grid components (e. g., a computing element failing to process jobs) [9].
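For a rough sense of the event rates behind these figures, a back-of-the-envelope calculation using only the numbers quoted above:

```python
seconds_per_day = 24 * 3600

# Sustained EGEE load: 150,000 jobs/day, about a dozen L&B events per job.
print(150_000 / seconds_per_day)        # ~1.7 jobs per second
print(150_000 * 12 / seconds_per_day)   # ~21 events per second, sustained

# The internal "megajob" target of one million jobs per day:
print(1_000_000 / seconds_per_day)      # ~11.6 jobs per second
```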

3 Beyond L&B — the Job Provenance service

3.1 Main concepts

The architecture of L&B is unsuitable for storing huge numbers of job records, and consequently it is not able to keep job information for a longer period of time. This is the purpose of the Job Provenance. Such a service must fulfill various rather contradictory requirements: keep detailed information for a long term (a size-efficient storage method), allow data-mining queries (query-efficient storage), and be able to store and homogeneously interpret various data (coming from different services or user tools) while coping with changes of the primary data formats during the storage period.

Data representation. The JP uses two views on the data. The data are stored in a raw representation, while the manipulation is done using a logical view. The raw representation is formed by arbitrary tags (name/value pairs) and files. Arbitrary structure or type of tags and files is allowed. It is expected that the files are structured (e. g. a complete L&B dump or an application-specific log), but they are stored "as is".
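A sketch of how the raw representation and the logical view could sit side by side for one job. The namespace identifiers, attribute names and values here are invented for illustration; they are not the real JP namespaces.

```python
NS_SYSTEM = "urn:example:jp:system"   # illustrative namespaces only
NS_LB     = "urn:example:jp:lb"
NS_USER   = ""                        # unqualified user tags

job_record = {
    "raw": {                                   # stored "as is", WORM semantics
        "files": ["lb-dump.xml", "app.log"],
        "tags":  {"campaign": "run-42"},
    },
    "attributes": {                            # logical view, filled by plugins
        (NS_SYSTEM, "owner"):      "alice",
        (NS_SYSTEM, "regtime"):    "2008-06-01T12:00:00Z",
        (NS_LB,     "finalState"): "DONE",
        (NS_USER,   "campaign"):   "run-42",
    },
}

def attribute(job, namespace, name):
    """Look up one logical-view attribute by (namespace, name)."""
    return job["attributes"].get((namespace, name))

print(attribute(job_record, NS_LB, "finalState"))  # DONE
```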

[Figure: JP architecture – the user and the Grid components (User Interface, Workload Manager, Computing Element, Logging & Bookkeeping) interact with the JP Primary Storage, composed of a front-end and a back-end; the Primary Storage feeds JP Index Servers. Users register and annotate jobs, upload sandboxes and L&B dumps, look up jobs and retrieve raw files through a JP client.]

Fig. 3. JP architecture and interactions

The logical view consists of attributes (name/value pairs) attached to jobs. An uploaded file is usually a source of multiple attributes, which are automatically extracted via plugins. To avoid naming conflicts even with future attributes, an attribute name always falls into a namespace. Currently we declare three different namespaces: for JP system attributes (e. g. job owner or registration time), for attributes inherited from L&B, and for unqualified user tags. As the JP also acts as a primary storage of historical records about Grid operation, the data are stored under WORM (write once, read many times) semantics. The attributes, representing the logical view, are the only way to specify queries on JP. However, once the user knows an actual jobid, bulk files can be retrieved in the raw form, too.

Layered architecture. JP consists of two services: a permanent Primary Storage (JPPS) and possibly volatile and configurable Index Servers (JPIS). JPPS stores all the data in a compact form suitable for long-term archival and serves as the logical-view data source for a set of Index Servers. JPPS is a permanent, well-known service. On the contrary, a JPIS instance is optimized (via its configuration) to provide effective queries on a particular subset of data (e. g. jobs of a particular VO, submitted in a certain period, or a subset of available attributes). The relationship of JPPS and JPIS is many-to-many – a single JPIS can query multiple JPPSs and, vice versa, a single JPPS is ready to stream data to multiple JPISs.

3.2 Implementation

The overall architecture of the JP service is depicted in Fig. 3. A Primary Storage is formed by a front-end, exposing its operations via a web-service interface, and a back-end, responsible for the actual data storage and providing the bulk file transfer interface. The front-end metadata (e. g. authorization information, list of files, etc.) are stored in a relational database. The bulk file transfer interface is implemented by a GridFTP server with an appropriate authorization plugin. Both the front-end and the back-end share a filesystem, so that the file-type plugins linked into the front-end access their files via POSIX I/O. The interface between the Primary Storage and an Index Server (data feeding) is completely based on web services. The Index Server is designed to use a web-service interface both to communicate with the JPPS and to provide services to users. It stores all the data in a relational database. The database schema is adapted automatically, based on the Index Server configuration, in order to store the defined data subset in a form most suitable for the expected queries.

4 Grid job as a scientific experiment — application view

The quality of insight into the records of experiment history is a key factor in the efficient and coherent operation of any scientific group. Each user group keeps records of the experiments prepared and executed, typically combined with evaluation data. This information is typically used to propose further experiments. Large application communities often develop their custom systems to keep such records. However, due to strong specialization, these systems are difficult to reuse in another application. On the contrary, JP is designed as a generic service to be used as a basis for building such application-specific systems fairly easily. Via user attributes, information gained automatically from the Grid infrastructure can be augmented with application-specific data describing Grid jobs as particular scientific experiments. In a series of experiments, we demonstrated both the usability of JP to serve a particular scientific record-keeping and inspection need, and its generic nature as a tool for custom-built front-end applications. Besides these demonstrations, we also did an assessment of JP capabilities w.r.t. a large application-specific repository (the ATLAS experiment) [14].

4.1 Computational Chemistry – Molecular Docking

The JP infrastructure has been used to support the specific collaborative workflow employed by the Czech National Center for Biomolecular Research for molecular docking studies. Their experiments require pre-screening of a fairly large number (hundreds of thousands) of potential matches, identifying prospective results, and running more experiments on these. The task is done in two stages. First, the results are evaluated automatically in the course of the pre-screening process, and the evaluations are stored as user attributes in JP. A custom-built graphical client accessing the JP through its web service interface allows users to see a visual summary of all experiments run, with potential matches highlighted. Users may browse individual jobs and attributes, visualize results, and append text-based annotations and numeric ranks (additional JP user attributes) to individual jobs. Based on both the automatic evaluation and the expert-assigned rank, team members may then use the front-end application to submit more experiments for relevant molecule combinations, typically using more complex algorithms and finer grids to achieve more precise results. Since JP keeps track of all parameters used to run the pre-screening jobs, it is possible to modify and re-submit the same computation easily.


The work was presented as a demonstration at the 2nd EGEE User Forum and described in [13].

4.2 Particle Physics – The Pierre Auger Experiment

A custom-built client relying on the web-service interface has also been used to support the particle physics community involved in the Pierre Auger Experiment. The user group uses JP to keep track of jobs that have been submitted into the Grid environment throughout the project's history, and to monitor job completion rates and identify computing-element reliability issues. JP user attributes allow them to sort the results and to focus on particular job classes [15].

4.3 Medical Research – Support for Parametric Studies

One of the most recent experiments focuses on helping experts evaluate results gathered in massive (millions of jobs) parametric simulations. A simple graphical interface allows them to compare results achieved with various parameter settings (stored as user attributes in JP). Given the large amount of information to process, a new approach has been tested that bypasses the JP's web service environment and cooperates directly with the JP Index Server's database [16].

5 Summary

The L&B service represents a complex distributed component of a general Grid middleware. Having been deployed and extensively used for several years on the largest Grid infrastructure (the EGEE Grid) as well as in other environments, it demonstrates the strength of simple architectural concepts when dealing with large-scale distributed systems. The high redundancy, leading to resilience and high reliability, the independence of any third-party service (e. g. the L&B server address directly incorporated in the jobid), and the asynchronous (store-and-forward) reliable delivery model contributed to the high performance, overall success and usability of this service. Its success motivated further work to cover the need for very long-term storage of job-related data. The Job Provenance service, recently introduced, has already attracted interest from scientific groups as a "laboratory notebook". Several demonstrations confirmed that JP is a suitable background tool to support complex scientific Grid-use workflows, checking the progress of complex simulations (composed of thousands to millions of jobs), revealing hidden patterns in the computation (including detection of configuration or input errors), and thus serving as a new powerful tool directly for scientists. In the future we plan to focus on further improvement of both services, their integration with new components (e. g. for Grid status monitoring and inspection), and their use as tools that help make Grid use an easy and valuable task.


References

1. D. Kouřil et al. Distributed tracking, storage, and re-use of job state information on the Grid. In Computing in High Energy and Nuclear Physics (CHEP04), 2004.
2. F. Dvořák et al. Services for tracking and archival of Grid job information. In Proceedings of the Cracow Grid Workshop, 2005.
3. R. Henderson and D. Tweten. Portable Batch System: External reference specification. NASA, Ames Research Center, 1996.
4. I. C. Legrand et al. MonALISA: An Agent based, Dynamic Service System to Monitor, Control and Optimize Grid based Applications. In Computing in High Energy and Nuclear Physics (CHEP04), 2004.
5. E. Laure et al. Middleware for the next generation Grid infrastructure. In Computing in High Energy and Nuclear Physics (CHEP04), 2004.
6. M. Ruda et al. A uniform job monitoring service in multiple job universes. In GMW '07: Proceedings of the 2007 Workshop on Grid Monitoring, ACM, 2007, pp. 17–22.
7. J. Andreeva et al. Experiment Dashboard: the monitoring system for the LHC experiments. In GMW '07: Proceedings of the 2007 Workshop on Grid Monitoring, ACM, 2007, pp. 45–49.
8. J. Andreeva, A. Křenek, J. Casey et al. Monitoring Grid Jobs with L&B Notifications in GridView and Experiment Dashboard. Submitted for EGEE '08, 2008.
9. C. Germain-Renaud and A. Křenek. Early failure detection: a method and some applications. 3rd EGEE User Forum, Clermont-Ferrand, 2008.
10. L. Matyska et al. Job Tracking on a Grid – the Logging and Bookkeeping and Job Provenance Services. CESNET technical report 9/2007.
11. P. Andreetto et al. Job Submission and Management Through Web Services: the Experience with the CREAM Service. In Proc. Computing in High Energy and Nuclear Physics (CHEP07), J. Phys.: Conf. Series, 2008.
12. F. Dvořák et al. gLite Job Provenance. In Provenance and Annotation of Data, International Provenance and Annotation Workshop (IPAW06), Lecture Notes in Computer Science, vol. 4145, 2006.
13. A. Křenek et al. Multiple Ligand Trajectory Docking Study – Semiautomatic Analysis of Molecular Dynamics Simulations using EGEE gLite Services. In Proc. Euromicro Conference on Parallel, Distributed and Network-based Processing (PDP'08), 2008.
14. A. Křenek et al. Experimental Evaluation of Job Provenance in ATLAS Environment. In Proc. Computing in High Energy and Nuclear Physics (CHEP07), J. Phys.: Conf. Series, 2008 (accepted).
15. J. Schovancová et al. VO AUGER Large Scale Monte Carlo Simulations using the EGEE Grid Environment. 3rd EGEE User Forum, Clermont-Ferrand, France, 2008.
16. A. Křenek et al. Job Provenance – Insight into very large provenance datasets. IPAW 2008, Salt Lake City, USA, 2008 (accepted).
17. I. Foster. Globus Toolkit version 4: Software for service-oriented systems. In IFIP International Conference on Network and Parallel Computing, Springer-Verlag LNCS 3779, pp. 2–13, 2005.
18. D. Thain, T. Tannenbaum, and M. Livny. Distributed computing in practice: the Condor experience. Concurrency and Computation: Practice and Experience, 17(2–4):323–356, 2005.
19. M. Voců et al. The megajob challenge – L&B performance tests. EGEE JRA1 All-hands meeting, 2006. http://indico.cern.ch/conferenceDisplay.py?confId=a062598

Quantification of Traffic Burstiness with MAPI Middleware

Sven Ubik, Aleš Friedl, Stanislav Hotmar

CESNET, Prague, Czech Republic

Key words: network traffic dynamics, passive monitoring, packet bursts, traffic burstiness, delay and jitter

1 Introduction and motivation

Network load monitoring is useful for network planning and performance troubleshooting. We can monitor network load on different time scales, ranging from long-term averages to short-term peak monitoring. Network traffic tends to be bursty for a variety of reasons, including protocol design, user behaviour and traffic aggregation [1]. Consequently, load monitoring on fine time scales often reveals peaks of load much higher than the long-term averages. If the network is not provisioned for these peaks of load, packet queues in routers provide temporary buffers. In the extreme case, when a queue overflows, packet loss occurs. But even when packets fit into the queue, additional delay and jitter are introduced, which can negatively affect real-time video and audio applications as well as the behaviour of data transport protocols. Link capacity has recently been precisely defined [2]. But this definition applies only to relatively long-term traffic dynamics. Metrics for traffic burstiness have not yet been defined, and methods to monitor traffic burstiness are not well understood. In this paper we explore how traffic burstiness can be quantified, how it can be monitored transparently using various packet capture hardware, and what the likely consequences of traffic bursts are.

2 Link capacity and traffic burstiness

When observing network load, we can define several terms [2]:

Nominal physical link capacity is the theoretical maximum amount of data that the link can support, measured at the physical layer. It includes any inter-frame gaps. For example, for Gigabit Ethernet it equals 10^9 bits per second.

IP-layer link capacity is the maximum number of IP-layer bits that can be transmitted over the link during a specified time interval (usually per second).

IP-layer link usage is the actual number of IP-layer bits that are correctly received over the link during a specified time interval.

IP-layer link utilization is equal to the link usage divided by the link capacity and is therefore between zero and one. It can also be multiplied by 100 and given as a percentage.

IP-layer available link capacity is the complement of link usage to link capacity. It can also be defined for a network path as the minimum of the available capacities of all links comprising the path.

While not specified in [2], some additional metrics are frequently used:


– All the above metrics can also be defined for a network path (in addition to the available link capacity). In that case we are interested in the maximum value over all links for usage and utilization, and the minimum value for nominal physical link capacity and IP-layer link capacity.
– The link with the smallest IP-layer link capacity is commonly referred to as the "narrow link", and the link with the smallest available capacity is often referred to as the "tight link".
– Throughput, sometimes also called bulk transfer capacity or goodput, is the volume of additional data that can be transferred over the network path. It is measured at the transport layer (in the payload of a transport-layer protocol) and depends on the transport protocols used to transfer the data already carried over the network and the newly added data. Elastic transport protocols, such as TCP, react to network overload (detected by packet drops) by reducing the speed of sending data into the network.
– The term bandwidth is sometimes used interchangeably with capacity, although electrical engineers use bandwidth to denote the signal spectrum width.

The relationships between these metrics are illustrated in Fig. 1. While the nominal physical link capacity is typically constant, the other metrics vary in time. Available capacity cannot be measured directly, but we can measure link usage and compute available capacity as its complement to link capacity.
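A worked example of these definitions with made-up per-link numbers (IP-layer capacity and usage in bits per second over the same interval); note that the narrow link and the tight link of a path need not coincide.

```python
links = {             # link: (IP-layer capacity, IP-layer usage), both in bit/s
    "A-B": (1_000_000_000,    900_000_000),
    "B-C": (  622_000_000,    400_000_000),
    "C-D": (10_000_000_000, 2_000_000_000),
}

for name, (capacity, usage) in links.items():
    utilization = usage / capacity             # between zero and one
    available   = capacity - usage             # complement of usage to capacity
    print(f"{name}: utilization {utilization:.0%}, available {available/1e6:.0f} Mb/s")

narrow = min(links, key=lambda l: links[l][0])                 # smallest capacity
tight  = min(links, key=lambda l: links[l][0] - links[l][1])   # smallest available
print("narrow link:", narrow, "  tight link:", tight)          # B-C vs. A-B
```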

[Figure: capacity versus time – the nominal physical capacity at the top, the IP-layer capacity below it (the difference being link-layer overhead), the IP-layer usage varying in time together with its average over a time period, and the IP-layer available capacity as the remainder up to the IP-layer capacity.]

Fig. 1. Metrics for link load measurements

Different time periods of link usage monitoring commonly show different fluctuations measured over the same traffic. Long-term averages smooth out short peaks and drops, which are observable in short-term averages. We may ask what the "right" time period is. It depends on the purpose of monitoring. For accounting purposes, long-term averages can be sufficient. In order to detect peaks of load that can result in delay or loss in packet queues, the shorter the time period, the more detailed the view of traffic dynamics. However, when we get down to packet transmission times, the link is always either fully loaded, when a packet is currently being transmitted, or fully unloaded, when no packet is currently being transmitted. The fully loaded condition can span more packets when they are transmitted one after another, thus forming a packet burst. The situation is illustrated in Fig. 2.

[Figure: packets on the link separated by inter-frame gaps; consecutive packets whose gaps do not exceed the maximum inter-frame gap within a burst form bursts, characterized by burst time and frame arrival times.]

Fig. 2. Packet bursts

While there is a metric for burst loss [3,4], we currently do not have an official definition of a packet burst.

We will therefore define a packet burst as a sequence of consecutive packets with inter-frame gaps no greater than a specified parameter, while the inter-frame gap before and after this sequence of packets is greater than the specified parameter. One possible inter-frame gap that can limit a packet burst is the minimum possible inter-frame gap on the given link type. For Ethernet links ranging from 10 Mb/s to 10 Gb/s, the minimum inter-frame gap (IFG), measured from the last bit of the CRC field of one packet to the first bit of the preamble of the next packet, is equal to 12 byte slots at the given nominal physical link capacity. For example, for 10 Gigabit Ethernet 12 byte slots equal 9.6 ns. Packet bursts can be quantified by measuring their length in time, in a number of packets or in a number of bytes. We can also quantify the inverse of packet bursts, that is the space between bursts, and the burst arrival times, that is the times between the starts of consecutive bursts. Various statistics can be applied to quantified packet bursts, such as the minimum, maximum, average or a quantile of packet burst sizes, spaces between bursts and their arrival times.

Another possible reasoning is that an inter-frame gap so small that no valid packet can fit into it (including the minimum possible inter-frame gaps before and after the packet) is a continuation of a burst. Alternatively, to assess the effect of packet bursts on delay and packet loss due to packet queue build-up, a packet burst can be considered as continuing when the usable byte slots in the inter-arrival time between two packets destined for the same output router port are fewer than the space needed to send the first packet in a frame at the nominal physical link capacity of the output port. This can happen when the sum of the nominal physical link capacities of all input ports exceeds the nominal physical link capacity of the output port, which is a common case. This situation is shown in Fig. 3. During a packet burst, the packet queue builds up. After a packet burst ends, the available link capacity (considering only currently arriving packets as link usage) is used to empty the queue. Of course, it can happen that another burst arrives before the queue is emptied, and it can even fill the queue to a higher level than the previous burst. Although there was work based on monitoring the volume of traffic that temporarily exceeds a predefined or dynamically computed average link usage [7], it is difficult to capture and monitor traffic from all input router ports destined to a particular output port. It is much easier to monitor the output port directly, such as a port to the network backbone on a border router whose input ports aggregate user traffic. Since the queue is emptied at the nominal physical link capacity, we can make a hypothesis that the size of a packet burst in bits, measured on the router output port, divided by the nominal physical link capacity is the upper limit of the delay added due to queue build-up. The distribution of packet burst sizes among the total volume of traffic is then a probability distribution of the upper limits of delay that packets can encounter when passing through the router to the monitored link.

[Figure: incoming traffic exceeding the outgoing capacity during a burst causes the packet queue to build up; the queue is drained at the outgoing capacity after the burst ends.]

Fig. 3. Packet queue build-up due to a packet burst
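The following sketch puts the burst definition and the delay hypothesis together: packets are grouped into bursts by a maximum gap between successive arrivals, and each burst's size in bits divided by the nominal physical capacity of the output link gives the hypothesised upper bound on added queueing delay. For simplicity the gap is taken between arrival timestamps (so it includes transmission time, unlike a true inter-frame gap), and all numbers are illustrative.

```python
def bursts(packets, max_gap):
    """Group (arrival_time_s, size_bytes) records, sorted by time, into bursts:
    a gap larger than max_gap (seconds) starts a new burst."""
    groups, current = [], [packets[0]]
    for prev, cur in zip(packets, packets[1:]):
        if cur[0] - prev[0] <= max_gap:
            current.append(cur)
        else:
            groups.append(current)
            current = [cur]
    groups.append(current)
    return groups

def delay_upper_bound(burst, capacity_bps):
    """Hypothesised bound: burst size in bits over nominal physical capacity."""
    return 8 * sum(size for _, size in burst) / capacity_bps

# Three nearly back-to-back 1500 B packets, a 1 ms pause, then a lone 64 B
# packet, observed on the Gigabit Ethernet output port of a router.
packets = [(0.0, 1500), (13e-6, 1500), (26e-6, 1500), (1.1e-3, 64)]
for b in bursts(packets, max_gap=15e-6):
    print(len(b), "packets, delay bound:",
          round(delay_upper_bound(b, 1e9) * 1e6, 1), "microseconds")
```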

3 Traffic burstiness monitoring with MAPI middleware

We designed a system for monitoring the distribution of packet burst sizes. It can be used to assess the effects of packet bursts in real network traffic on added delay, jitter and possibly packet loss. It is a passive monitoring system that works on captured packets and therefore does not affect user traffic in any way. Only packet sizes and packet capture times are used in the system. User privacy is not affected, because no information inside the packets is used. The system also works for encrypted user traffic.

We implemented our application using the MAPI [8] middleware. MAPI is a library of functions for the development of portable monitoring applications at a higher level of abstraction. Application functionality is determined by a selection and ordering of predefined or user-defined monitoring functions applied to the captured packet stream. A key benefit of MAPI is that it allows applications to run transparently on different packet capture hardware. This is achieved by supporting each type of hardware with a separate implementation of the monitoring functions. These functions can utilize whatever hardware acceleration is possible with the given hardware. A software implementation is provided in the stdflib library, which works for all hardware including regular Ethernet cards.

The architecture is shown in Fig. 4. A monitored link is tapped by an optical splitter or a mirroring port on a router. Packets are captured by an Ethernet card or by a specialized monitoring card that can provide hardware acceleration, such as a DAG [9] or COMBO [10] card. We implemented two versions of a BURST monitoring function for MAPI: in the stdflib library for regular Ethernet cards and in the combo6flib library for a COMBO6X card. Both versions are designed for Gigabit Ethernet links. We are currently working on an implementation for our MTPP (Modular Traffic Processing Platform) hardware, which will operate on 10 Gigabit Ethernet at the full line rate.

[Figure: the configuration application talks through the DiMAPI stub over the network to remote monitoring stations (PCs) running mapicommd + mapid; the stations capture packets from the monitored line either via an optical splitter and a standard NIC or via port mirroring and a monitoring adapter (DAG, ...), and return results.]

Fig. 4. Application architecture

The BURST function accepts the following set of arguments:

iftime – The maximum inter-frame gap that is still considered a continuation of a burst. For implementation purposes, it is specified including the packet preamble and the start-of-frame delimiter, and it is expressed in nanoseconds. For example, the minimum value for Gigabit Ethernet is 160 ns.
min – Bursts smaller than this value are counted together.
max – Bursts larger than or equal to this value are counted together.
step – In between the minimum and maximum burst sizes specified by the previous parameters, bursts are counted in classes separated by this step.

The min, max and step arguments determine the categories into which bursts and gaps between bursts are sorted (a sketch of this mapping follows the table below):

Category      Burst or gap size in bytes
0
1
…
254
255
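As a sketch of the mapping referenced above (an interpretation of the description, not the actual BURST internals): sizes below min fall into the first category, sizes at or above max into the last one, and the range in between is split into classes of width step, which with min = 1,000, max = 255,000 and step = 1,000 bytes would yield categories 0 through 255.

```python
def category(size, lo, hi, step):
    """Map a burst or gap size to a histogram category (sketch only)."""
    if size < lo:
        return 0                          # everything smaller than min
    if size >= hi:
        return 1 + (hi - lo) // step      # everything >= max
    return 1 + (size - lo) // step        # classes of width `step` in between

for size in (500, 1_000, 128_499, 999_999):
    print(size, "->", category(size, lo=1_000, hi=255_000, step=1_000))
# 500 -> 0, 1000 -> 1, 128499 -> 128, 999999 -> 255
```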
