Investigating uncertainty and sensitivity in integrated, multimedia environmental models: tools for FRAMES-3MRA

Environmental Modelling & Software 20 (2005) 1043–1055 www.elsevier.com/locate/envsoft

J.E. Babendreier a,*, K.J. Castleton b

a Ecosystems Research Division, National Exposure Research Laboratory, Office of Research and Development, United States Environmental Protection Agency, 960 College Station Road, Athens, GA 30605, USA
b Pacific Northwest National Laboratory, United States Department of Energy, operated by Battelle Memorial Institute, Richland, WA 99352, USA

Received 19 May 2004; received in revised form 2 July 2004; accepted 23 September 2004

Abstract

Elucidating uncertainty and sensitivity structures in environmental models can be a difficult task, even for low-order, single-medium constructs driven by a unique set of site-specific data. Quantitative assessment of integrated, multimedia models that simulate hundreds of sites, spanning multiple geographical and ecological regions, will ultimately require a comparative approach using several techniques, coupled with sufficient computational power. The Framework for Risk Analysis in Multimedia Environmental Systems – Multimedia, Multipathway, and Multireceptor Risk Assessment (FRAMES-3MRA) is an important software model being developed by the United States Environmental Protection Agency for use in risk assessment of hazardous waste management facilities. The 3MRA modeling system includes a set of 17 science modules that collectively simulate release, fate and transport, exposure, and risk associated with hazardous contaminants disposed of in land-based waste management units (WMU). The 3MRA model encompasses 966 multi-dimensional input variables, over 185 of which are explicitly stochastic. Design of SuperMUSE, a 215 GHz PC-based, Windows-based Supercomputer for Model Uncertainty and Sensitivity Evaluation is described. Developed for 3MRA and extendable to other computer models, an accompanying platform-independent, Java-based parallel processing software toolset is also discussed. For 3MRA, comparison of stand-alone PC versus SuperMUSE simulation executions showed a parallel computing overhead of only 0.57 seconds/simulation, a relative cost increase of 0.7% over average model runtime. Parallel computing software tools represent a critical aspect of exploiting the capabilities of such modeling systems. The Java toolset developed here readily handled machine and job management tasks over the Windows cluster, and is currently capable of completing over 3 million 3MRA model simulations per month on SuperMUSE. Preliminary work is reported for an example uncertainty analysis of Benzene disposal that describes the relative importance of various exposure pathways in driving risk levels for ecological receptors and human health. Incorporating landfills, waste piles, aerated tanks, surface impoundments, and land application units, the site-based data used in the analysis included 201 facilities across the United States representing 419 site-WMU combinations.

Published by Elsevier Ltd.

Keywords: Multimedia model; Parallel computing; PC-based supercomputing; Uncertainty analysis; Sensitivity analysis; Benzene disposal; Java

* Corresponding author. Tel.: +1 706 355 8344; fax: +1 706 355 8302. E-mail address: [email protected] (J.E. Babendreier).
1364-8152/$ - see front matter Published by Elsevier Ltd. doi:10.1016/j.envsoft.2004.09.013


Software availability

Program title: Multimedia, Multipathway, and Multireceptor Risk Assessment (3MRA Version 1.0)
Contact address: US Environmental Protection Agency (EPA), Center for Exposure Assessment Modeling (CEAM), http://www.epa.gov/ceampubl/mmedia/3mra/index.htm
Programming languages: Java, C++, Visual Basic, Fortran 77/90/95
Hardware requirements: PC (recommended 1+ GHz, 256+ MB RAM)
Operating System Requirements: Windows 98, NT, 2000, XP
Cost: Free, limited technical support

1. Introduction

Elucidating uncertainty and sensitivity structures in environmental models can be a difficult task, even for low-order, single-medium constructs driven by a unique set of site-specific data. The ensuing challenge of examining ever more complex, integrated, higher-order models is a formidable one. This is particularly true in regulatory settings applied on national scales that must ensure the continued protection of humans and ecology, while preserving the economic viability of industry. Achieving adequate quality assurance in modeling requires a battery of tests designed to establish the model’s validity, trustworthiness, and relevance in performing a prospective task of prediction (Chen and Beck, 1999). To this end, model evaluation is seen as an increasingly critical step in the process of establishing confidence in a model’s use, and providing a requisite level of safety that decision-makers may rely upon.

Aspects of sensitivity for a given model may be evaluated through a wide array of computational techniques, for example screening methods, local differential-based methods, and global methods (Saltelli et al., 2000). In addition to the variance-based global sensitivity methods outlined by Saltelli et al. (2000), which provide an ability to quantitatively relate variance in input to variance in output, there are equally provocative schemes (Funtowicz and Ravetz, 1990) to be investigated to more fully characterize elements of uncertainty; reaching well beyond quantifiable, commonly applied Monte Carlo based probabilistic assessments (Cullen and Frey, 1999; Robert and Casella, 1999). In the NUSAP (Numeral, Unit, Spread, Assessment, and Pedigree) scheme of Funtowicz and Ravetz (1990), uncertainty is constructed along a continuum of familiar quantitative information, and less familiar qualitative information that asserts a level of confidence in the former. Together, the NUSAP entities impart a deep structure of quality assurance in the information system otherwise historically represented by a model’s prediction and the best of intentions.

To sustain our current course of evaluating ever more complex questions through use of increasingly complex models, many of these uncertainty and sensitivity analysis approaches will likely continue to rely on the application of Monte Carlo based techniques. The future will also continue to see advances in methodological approach, and all will desire to apply these computationally demanding model evaluation procedures in a timely fashion (Beck, 1999). Thoroughly evaluating integrated multimedia model applications that simulate hundreds of sites, spanning multiple geographical and ecological regions will ultimately require a comparative approach using several techniques coupled with sufficient computational power.

This paper provides an overview of the multimedia model FRAMES-3MRA, which was designed for application on site-specific, regional, or national scales. It subsequently describes a set of hardware and software supercomputing tools created to facilitate model evaluation, and summarizes 3MRA runtime costs associated with PC-based distributed processing. It finally provides an illustrative national-scale example of 3MRA uncertainty analysis for disposal of Benzene in five land-based waste management units.

2. FRAMES-3MRA model

During the past five years, USEPA’s Office of Research and Development (ORD) and the Office of Solid Waste have sponsored, along with other U.S. Federal Agencies, the development of the Framework for Risk Analysis in Multimedia Environmental Systems (FRAMES), a Windows-based modeling infrastructure which supports both model development activities and model applications. The Multimedia, Multipathway, Multireceptor Risk Assessment (3MRA) modeling system comprises a unique set of simulation models developed by EPA which currently reside within FRAMES (Fig. 1). The underlying scientific basis for 3MRA exposure and risk assessment was first developed (Marin et al., 2003), peer-reviewed, and eventually implemented as software technology. Designed by researchers at the U.S. Department of Energy’s Pacific Northwest National Laboratory in collaboration with U.S. EPA, the FRAMES-3MRA Version 1.0 modeling system includes a set of 17 science modules that collectively simulate release, fate and transport, exposure, and risk associated with hazardous contaminants disposed of in five land-based waste management units. The 3MRA modeling system has undergone extensive quality assurance testing throughout model development, including module-level and system-level peer-reviews, and independent model

Fig. 1. FRAMES-3MRA Version 1.0 system design: stand-alone workstation application. [Flowchart not reproduced. Key: user interface, data file, processor, database. Nested loops: site loop (201 national sites), source loop (5 source types), sampled input data iteration loop (nr), chemical loop (43 metals and organics), Cw loop. Processors: System User Interface (SUI), Site Definition Processor, Chemical Properties Processor, Multimedia Multipathway Simulation Processor, Exit Level Processors I and II, Risk Visualization Processor. Databases: national, regional, site-based, MET, chemical properties, metal isotherms. Files: header info from SUI, list of sites, list of chemicals, site simulation files, global results files, risk summary output file, protective summary output file; warnings/errors are returned to the SUI. Cw ≡ waste stream concentration.]

compilation and test plan execution (SAB, 2003). 3MRA Version 1.0 was made available to the public in 2003. For the example time trial and uncertainty analyses discussed here, an interim 3MRA Version 1.0 (Developers Release – January, 2002) was used for all simulations.

2.1. Multimedia risk assessment

The 3MRA model encompasses 966 input variables, 185 of which are explicitly stochastic, and 372 module-level output variables that are further summarized in exit level post-processing routines. 3MRA starts with a waste stream concentration in a waste management unit (landfill, waste pile, aerated tank, surface impoundment, or land application unit), estimates the release and transport of the chemical throughout the environment, and predicts associated exposure and risk. Using a feed-forward approach, 3MRA simulates multimedia (air, water, soil, sediments), fate and transport, multipathway exposure routes (food ingestion, water ingestion, soil ingestion, air inhalation, etc.), multireceptor exposures (resident, gardener, farmer, fisher, ecological habitats and populations; all with various cohort considerations), and resulting risk (human cancer and non-cancer effects, ecological population and community effects). Example processes, inter-media fluxes, parameters, and exposure pathways considered in 3MRA are more fully outlined in Table 1 (PNNL, 1999).

For each particular site and simulation description, appropriate modules are serially executed by the system. Science modules available include those to simulate contaminant release from sources; contaminant movement through the air, groundwater, soil, watersheds, rivers, and lakes, ponds, and wetlands; direct contact of humans, plants, and animals with the waste contaminants; contamination of drinking water wells, farms (through irrigation water or direct atmospheric deposition), plants, and animals (both on land and in water bodies); ingestion by humans and animals of contaminated materials such as food and

Table 1
Conceptual model elements considered in FRAMES-3MRA

CONTAMINANTS (a): Organics (28); Metals (15)
SOURCE TYPES (a): Aerated Tank (137 sites); Surface Impoundment (137 sites); Land Application Unit (28 sites); Waste Pile (61 sites); Landfill (56 sites)
SOURCE TERM CHARACTERISTICS: Mass Balance; Multimedia/Multiphase Partitioning; Source Degradation (anaerobic/aerobic)
SOURCE RELEASE MECHANISMS: Erosion; Volatilization; Runoff; Leaching; Particle Suspension
TRANSPORT MEDIA: Air; Soil; Vadose Zone; Groundwater; Surface Water; Sediment
FATE PROCESSES: Chemical/Biological Transformation; Linear Partitioning (water/air, water/solids, air/plant, water/biota); Non-linear Partitioning (metals in vadose zone); Chemical Reaction/Speciation
FOOD CHAIN: Human (Farm); Human (Aquatic); Ecological (Aquatic Habitat); Ecological (Terrestrial Habitat)
INTERMEDIA CONTAMINANT FLUXES:
  Source → Air (volatilization, resuspension)
  Source → Vadose Zone (leaching)
  Source Surface Soil → Watershed Soil (erosion, runoff)
  Air → Watershed/Farm/Habitat Soil (wet/dry deposition)
  Air → Surface Water (wet/dry deposition)
  Air → Vegetation (deposition/uptake)
  Farm/Habitat Soil → Vegetation (root uptake)
  Watershed Soil → Surface Water (erosion, runoff)
  Surface Water → Aquatic Organisms (uptake)
  Surface Water → Sediment (sedimentation, resuspension)
  Vadose Zone → Groundwater (percolation)
  Vadose Zone → Air (volatilization)
  Groundwater → Surface Water
  Soil → Vegetation (uptake, deposition)
  Vegetation, Soil, Water → Beef and Dairy (uptake)
RECEPTORS:
  Human: Resident (adult and child); Farmer (adult and child); Home Gardener (adult and child); Recreational Fisher (adult and child); Summation of Receptors
  Ecological: Mammals, Birds, Amphibians, Reptiles; Soil Biota, Terrestrial Plants, Aquatic Biota, Aquatic Plants, Sediment Biota
AGE GROUPS FOR HUMAN RECEPTORS:
  Calculated: Infant (<1 year); Child-a (1–5 years); Child-b (6–11 years); Child-c (12–19 years); Adult (20+ years)
  Reported: Infant (<1 year); Child (1–12 years); Young adults and adults (13+ years); Summation of Groups
EXPOSURE PATHWAYS: Ingestion (plant, meat, milk, aquatic food, water, soil, breast milk); Inhalation (particulates and gases, showering with groundwater); Direct Contact (soil, water); Summation of Ingestion; Summation of Inhalation; Summation of Inhalation and Ingestion
HUMAN AND ECOLOGICAL RISK ENDPOINTS: Human Cancer Risk; Human Non-cancer Hazard Quotient; Ecological Population and Community Hazard Quotients

(a) Counts represent current database for FRAMES-3MRA Version 1.0.

soil; and risks to humans, plants, and animals from all potential methods of exposure being modeled.

2.2. 3MRA modeling system overview

For the 3MRA stand-alone PC application (Fig. 1), the user opens the System User Interface (SUI), and selects the sites of interest, source types, chemicals, and waste stream influent concentration ranges (defined as Cw). The SUI generates a header file that contains all the information defining the simulation experiment (i.e. a set of simulations), and in stand-alone mode, controls its execution. Per simulation, the Site Definition Processor (SDP) reads in the header file information to determine the next site and scenario to be simulated and creates a complete set of associated flat-ASCII site simulation files (SSFs) for input to the various science modules. The SSFs are created by extracting information from existing databases or randomly generating data using distribution information and statistical subroutines. Various databases represent different hierarchical levels of data availability for a given site, covering site-based, regional, and national scales. These data are constant or stochastic, with cross-correlation data associated with certain variables. The chemical properties database contains needed data related to all chemical-dependent variables. The MET database contains five meteorological datasets describing various national monitoring stations (i.e. representing hourly, daily, monthly, annual, and long-term climatological data). Like the other processors shown in Fig. 1, the SDP reports any processor-specific warnings or errors to the SUI. The Multimedia Multipathway Simulation Processor (MMSP), which executes each site simulation, uses the generated SSF files as initial input.
The SSF file set includes information describing chemical properties, site layout, sources, air data, vadose zone data, aquifer data, watershed data, waterbody network data, farm, terrestrial, and aquatic food chain data, human and ecological exposure data, and human and ecological risk data. The global results files (GRFs) contain all the key output data from the MMSP modules that were executed during a given simulation, and are consumed as additional inputs by downstream modules. The Exit Level Processor I (ELPI) output database stores key exposure and risk results for the entire simulation set. The Exit Level Processor II (ELPII) interprets ELPI output data and presents population and sub-population based summaries of cumulative risk.

2.3. National assessment strategies

The development of 3MRA was originally driven by its initial intended application to a nationwide risk assessment for USEPA’s Hazardous Waste Identification Rule (HWIR). The 3MRA model was designed to identify which waste streams can safely be released from existing hazardous waste disposal requirements. Providing an integrated, quantitative risk-based assessment approach for regulatory decision-making, ‘‘low-level’’ hazardous waste with constituent chemical concentrations less than ‘‘exit’’ levels calculated by 3MRA could be reclassified as non-hazardous solid waste. The 3MRA application for the national assessment is currently based on data collected from 419 representative waste management units located at 201 sites across the United States. To extend eventually to hundreds of pollutants, a dataset for 43 metals and organics was initially developed. A key question 3MRA is capable of answering may be stated as follows: At what waste stream concentration (Cw) will wastes, when placed in a non-hazardous waste management unit over the unit’s life, result in:

• Greater than A% of the people living within B distance of the facility with a risk/hazard of C or less, and
• Greater than D% of the habitats within E distance of the facility with an ecological hazard less than F,
• at G% of facilities nationwide?

Further, an overall probability or confidence level (%H) may be assigned to empirical uncertainty associated with the derived Cw, and confidence (%I) assigned to precision in simulating this outcome. Defining the assessment profile (A, B, C, D, E, F, G, H, I), 3MRA embodies an integrated, probabilistic risk assessment strategy for protection of both ecological and human health.

2.4. Waste stream concentration exit level

An identified waste stream concentration Cw that satisfies the posed assessment profile question is referred to as the exit level or Cwexit, indicating the exit threshold for transition from hazardous to non-hazardous waste. For the 3MRA assessment strategy, in addition to values for A, B, C, D, E, F, G, H, and I noted, Cwexit will also depend on the chemical of concern and waste management unit type (e.g. landfills-LF, waste piles-WP, aerated tanks-AT, surface impoundments-SI, or land application units-LAU). While 3MRA facilitates derivation of Cwexit for use in regulatory rulemaking, determinations of values A through I by-and-large represent policy decisions, albeit greatly informed by 3MRA simulation. Exposure pathway, receptor group, and cohort group also can be used to further define an assessment profile. One can thus evaluate which subpopulations are at greatest risk, and which pathways drive that risk, elucidating alternative strategies that might be employed in a given regulatory process.
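As an illustration of the exit-level idea, the following minimal Python sketch reads a Cwexit value off a small set of simulated waste stream concentrations. It is not the 3MRA exit level processing code. The function name `exit_level`, the example concentrations and protected fractions, and the choice of linear interpolation in log10(Cw) are all assumptions made for illustration, and the sketch further assumes that the fraction of protected sites does not increase as Cw increases.

```python
import math

def exit_level(cw_levels, fraction_protected, g):
    """Estimate the largest Cw at which at least a fraction g of sites is protected.

    Hypothetical helper: interpolates linearly in log10(Cw) between the two
    simulated levels that bracket g. Assumes fraction_protected is
    non-increasing in Cw and that all Cw values are positive.
    """
    pairs = sorted(zip(cw_levels, fraction_protected))
    if pairs[0][1] < g:
        return None  # even the lowest simulated concentration fails the target
    for (c_lo, f_lo), (c_hi, f_hi) in zip(pairs, pairs[1:]):
        if f_hi < g <= f_lo:
            t = (f_lo - g) / (f_lo - f_hi)
            log_cw = math.log10(c_lo) + t * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_cw
    return pairs[-1][0]  # target met at every simulated concentration

# Illustrative (made-up) numbers: five Cw levels and the fraction of sites
# meeting a chosen profile (A, B, C, D, E, F) at each level.
cw = [1.0, 10.0, 100.0, 1000.0, 10000.0]
protected = [1.00, 0.98, 0.90, 0.60, 0.20]
print(exit_level(cw, protected, g=0.95))
```

With these made-up inputs the 95% target falls between the second and third concentration levels, so the estimate lies between 10 and 100 in the same units as Cw.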


Per 3MRA simulation, the ELPI (see Fig. 1) calculates, receptor location by receptor location, if a site meets various pre-selected, discrete risk criteria (C, F) at various discrete distances (B, E), for various population protection percentiles (A, D). After all simulations are completed, the ELPII extracts information from ELPI output data based on a given user selected profile (A, B, C, D, E, F), providing interpolation capabilities to derive Cwexit for a user-selected site protection level %G.

2.5. FRAMES Version 2.0

Currently in initial beta testing, FRAMES Version 2.0 will house 3MRA and additional models, modeling systems, and databases, and offers a number of enhanced capabilities over FRAMES 1.0. First and foremost is the ability to conceptualize a risk assessment on the fly. In FRAMES-3MRA Version 1.0, the Conceptual Site Model (CSM) is stored in a database, which makes it more difficult to change, and connections between models are strictly confined to a particular subset of simulation alternatives. Free-form CSM definition will provide the user with the ability to combine 3MRA components, and other models or databases in FRAMES 2.0, in different ways to answer alternative risk assessment questions without having to modify existing 3MRA Version 1.0 components. Other key enhancements found in the FRAMES 2.0 architecture include automated units conversion, a user interface for site-specific data entry, and an overall more flexible framework that provides for relatively rapid importation of new models and databases. Germane to this work is the expansion and facilitation of tools used to conduct uncertainty and sensitivity analyses, including an n-stage iterator that allows nesting of Monte Carlo sampling design, and inclusion of alternative sampling strategies such as Latin Hypercube Sampling (Cullen and Frey, 1999).
These features will facilitate a much-needed capability to jointly explore parameter, model, and modeler uncertainty within a common modeling architecture rooted in object-oriented software design.

2.6. 3MRA model evaluation

As part of an extensive model evaluation process, uncertainty and sensitivity analyses (UA/SA) are being undertaken to fully evaluate the 3MRA modeling system. These efforts will emphasize the use of tens of millions of simulations to test effects of small and large changes in model inputs and parameters, and will require a computational effort not practically achieved through use of one or a few desktop PCs. The simulation set for a national assessment involving 43 chemicals, 419 site-source combinations, 5 Cw’s, and 1000 iterations is over 90 × 10^6 simulations, where this level assumes 966 input variables will express sufficient variation and interaction within 1000 iterations as needed to confidently describe uncertainty and sensitivity of 3MRA.
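The size of the national simulation set follows directly from the four counts quoted above. The short Python calculation below reproduces it and, as a rough illustration only, divides by the throughput of "over 3 million" simulations per month quoted in the abstract. The variable names are ours, and the resulting duration is an upper-bound estimate for the cluster as described, not a figure reported in the paper.

```python
# Counts quoted in the text for a national assessment.
chemicals = 43
site_source_combinations = 419
cw_levels = 5
iterations = 1000

total_simulations = chemicals * site_source_combinations * cw_levels * iterations
print(total_simulations)  # 90085000

# Throughput quoted in the abstract ("over 3 million" per month), taken here
# as exactly 3 million, so the duration below is an upper bound.
simulations_per_month = 3_000_000
months = total_simulations / simulations_per_month
print(round(months, 1))  # 30.0
```

Under these assumptions a full national simulation set would occupy the cluster for up to about two and a half years, which illustrates why such a set is out of reach for one or a few desktop PCs.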

3. Parallel computing cluster

A characteristic of sampling-based UA/SA, particularly for complex models, is their need for high levels of computational capacity. Typically there are many more model simulations needed than PCs available. Computational needs for UA/SA represent a fundamental departure from what is commonly referred to as ‘‘massively’’ parallel computing (Brightwell et al., 2000) where inter-nodal communication dependencies prevail. One might refer to this UA/SA computational problem as being ‘‘embarrassingly’’ parallel or ‘‘node-independent’’. Here, being unlimited by the slowest CPU, grid-computing solutions can leverage the full power of heterogeneous PC hardware.

3.1. SuperMUSE

While UA/SA is emerging as a critical area for environmental model evaluation, resources to conduct parallel computations, especially for Windows-based, PC-based modeling, have often been limited by lack of easy access to supercomputing capacity. This has been a driving factor in the typical reluctance of model developers and users to perform extended or even minimal UA/SA. Increasingly common, particularly for Linux-based systems and massively parallel problems, distributed PC-based supercomputing has expanded rapidly in recent years (e.g. Top 500 Supercomputer Sites). Less common though are PC-clusters that support Windows-based models for massively parallel problems or those encountered in UA/SA for 3MRA. Dual-boot systems that facilitate both Linux-based and Windows-based parallel computing also appear uncommon, as are associated hardware systems offering dedicated KVM (keyboard, video, mouse) control, the latter of which can provide efficient capabilities in managing many PC clients.

To facilitate UA/SA model evaluation research supporting USEPA’s modeling systems and applications, ORD has recently developed a 215-GHz Supercomputer for Model Uncertainty and Sensitivity Evaluation (SuperMUSE), with companion efforts to develop a supporting software infrastructure to conduct node-independent parallel computations.

3.2. Design, cost, and effort

Major components of the existing SuperMUSE (Fig. 2) include a front-end program server, a back-end data server, and 180 client PCs, each with a minimum of 256 MB RAM. A variety of Windows

operating systems are supported (i.e. Windows 95, 98, NT, 2000). Interconnections were achieved through use of 16-port Raritan KVM switches, and 24-port Linksys (10/100) network switches branching to a master CISCO 3550-24/2 network switch. The system network protocol is based on TCP/IP. The system design currently provides for gigE channel (1000 megabits/sec) data flow to and from servers, and also allows for single-user KVM remote access. Various combinations of the cluster design are easily achieved and depend on financial resources (e.g. client speeds, server storage capacities, etc.). Representing a capacity to support 192 clients, the initial SuperMUSE layout (with 121 client PCs) was acquired for $125,000 in early 2001. This cost excludes servers and 16 older 333 to 450 MHz processors routed to the project. Optimal purchasing based on $/GHz for client PCs will typically identify 3 to 6 month-old CPU technology. Plated wire shelving is best for cabling and heat dissipation. Pre-design considerations include available space, and room heating and cooling capacities.

Representing a desired maximum capacity, SuperMUSE will be expanded in the near future to 384 client nodes, with supporting servers, totaling roughly 1000 GHz. Multi-user KVM access will also be provided, facilitating an existing SuperMUSE software capability that allows multiple experiments (i.e. multiple model evaluations) to be simulated concurrently. The primary hardware design criterion seeks a feasible cycle of PC upgrading on a 3 to 5 year life cycle, while maintaining a system machine speed 100 to 1000 times faster than the standard desktop PC typically available to model developers and users. Important to the hopeful exploitation of this ORD effort, a PC cluster can be scaled to any user’s needs, and can be constructed and configured by relative novices using off-the-shelf hardware technology and familiar Windows operating systems. Together with FRAMES 2.0 and supporting Java software code developed for the SuperMUSE concept, parallel solution of UA/SA problems for many Windows-based models becomes realistically easy to reach.

Fig. 2. Conceptual layout for PC-based SuperMUSE parallel computing cluster. [Diagram not reproduced. Labelled elements: master console; network switches; KVM switches; program server (CPU Allocator, Model Tasker(s)); data server with data analysis software and results database(s) (e.g. MySQL Server); client PC hardware, each running the Tasker Client software and a stand-alone version of the model software.]

4. SuperMUSE software toolset

With the proliferation of workstation clusters connected by high-speed networks, providing efficient system support has become an important problem (Cruz and Park, 1999). As a recognized modern programming language for solution of distributed high performance computing problems on heterogeneous platforms (Laure, 2001), Java was selected for supporting software development for SuperMUSE. The Transmission Control Protocol (TCP) and the Internet Protocol (IP) were selected as the underlying basis for network communication. One popular alternative standard to TCP/IP for programming for PC clusters is the Message Passing Interface (MPI) (Sunderam et al., 1994). MPI allows the developer to write a program such that when an opportunity exists for the process to be run on another machine, then that will occur. Parallel Virtual Machine (PVM) (Gropp et al., 1999), another approach, allows


for similar functionality. For the node-independent parallelism facing UA/SA, the complexity that could be handled by MPI and PVM is absent, since each client executes simulations independent from other clients. Only during synchronization of the tasks (which clients take which tasks) and collation of client databases is client-client network communication active. Even at these times only brief or summary information is being passed.

4.1. Supporting software system needs

To exploit capabilities of the SuperMUSE parallel computing environment, several software tools were needed. Key functionalities that were identified included: 1) managing files across PCs; 2) facilitating the distribution of workloads among PCs; and 3) facilitating data analysis tasks. Java software application programs developed here included: file management tools (Client Update, Command Tasker, Process Messages, Site Summary, Client Collector); a distributed management program toolset (CPU Allocator, Model Tasker, Tasker Client); and tools for data analysis (Site Visualization Tool, Exit Level Visualization).

Client Update is a file/system management tool employing a dropdown equipment list. For selected machines, the tool executes DOS or shell commands over the network using a batch file script and wildcard designations for various attributes (e.g. machine ID, operating system, etc.). Commands are essentially executed serially from a single server. The tool facilitates client-list selection by individual PCs or as designated PC groups (i.e. Win98, WinXP).

The Site Visualization Tool (SVT) extracts data from SSF and GRF file sets for an individual scenario and waste concentration. With integration and statistical processing capabilities, the SVT charts, produced via GNUPlot, show the major outputs of each MMSP module, starting with the waste management unit source term and ending with human and ecological exposure modules.
Collecting the charts into an html file, the SVT allows for visual summary of model outputs spanning the entire MMSP modeling domain. Risk assessment is handled via computational and visualization tools within the ELPI and ELPII post-processors, originally coded using MS-Access data structures. In addition to the SVT tool, MySQL database versions for the ELPI and ELPII processors were also created with extended ELPII visualization capabilities using scalable vector graphics (SVG), allowing expanded risk visualization options for large numbers of simulations.

4.2. Distributed management program tools

Shown in Fig. 3, with program locations shown in Fig. 2, the distributed management program toolset provides an effective, platform-independent parallel tool. Supporting uncertainty and sensitivity analysis evaluation tasks, the distributed processing scheme is capable of easily managing millions of simulations for 3MRA or other computer models. The CPU Allocator and Tasker Client are model independent. A Model Tasker is model dependent, and deconstructs a model’s system user interface to generate a set of tasks (e.g. individual model simulation header files) amenable to distributed processing across the PC-based parallel supercomputer. For example, a Model Tasker was developed for 3MRA, identified in Fig. 3 as the SUITasker. Several Model Taskers can be active, but must currently reside on separate machines. The Java parallel toolset is readily extended to Linux by additional recompilation of the 3MRA input/output dll (io.dll) called by the SUITasker, and could run 3MRA science modules equally well on Linux if these were compiled for Linux operating systems.

4.2.1. CPU allocator

The CPU Allocator accepts job descriptions from one or several Model Taskers, and provides proportional load balancing across active Taskers. It functions as a TCP/IP server that accepts Model Tasker scenario set descriptions, and randomly assigns Tasker Clients to Model Taskers when clients indicate that they are free to execute tasks. Several CPU Allocators can be active, but must reside on separate machines. The CPU Allocator can also be used to restart or shutdown clients without affecting the status of active Model Taskers.

4.2.2. Model tasker

For 3MRA, the SUITasker reads a stand-alone 3MRA header file (created in stand-alone mode) to define the overall scenario set to be simulated. It functions as a TCP/IP server that accepts Tasker Clients directed to it by the CPU Allocator. Providing equivalent stand-alone scenario looping (Fig. 1), the task list is created, managed, and updated with various statistics to track job performance.
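The scenario looping mentioned above can be pictured as expanding one experiment definition into a flat list of independent tasks, one per unique site, source, iteration, chemical, and Cw combination. The Python sketch below is an assumption-laden illustration, not the Java SUITasker code: the function name `build_task_list`, the dictionary layout, and the `status` field are hypothetical, and the site identifiers are made up. The loop nesting follows the order shown in Fig. 1 (site, source, iteration, chemical, Cw).

```python
from itertools import product

def build_task_list(site_sources, n_iterations, chemicals, cw_levels):
    """Expand an experiment into one task per unique simulation.

    Hypothetical sketch: site_sources is a list of (site, source type) pairs,
    and each task is a plain dict with an illustrative 'status' field.
    """
    tasks = []
    # product() varies its last argument fastest, so Cw is the innermost loop.
    for (site, source), iteration, chemical, cw in product(
        site_sources, range(1, n_iterations + 1), chemicals, cw_levels
    ):
        tasks.append({
            "site": site,
            "source": source,
            "iteration": iteration,
            "chemical": chemical,
            "cw": cw,
            "status": "queued",  # assumed bookkeeping field, not from the paper
        })
    return tasks

# Made-up example: three site-source combinations, two iterations,
# one chemical, five Cw levels.
site_sources = [("site001", "LF"), ("site001", "WP"), ("site002", "AT")]
tasks = build_task_list(site_sources, 2, ["benzene"], [1, 2, 3, 4, 5])
print(len(tasks))  # 30
print(tasks[0])
```

Because every task is independent of the others, a list of this kind can be handed out to clients in any order, which is what makes the problem node-independent.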
The SUITasker maintains a watchdog queuing approach and can handle errant clients that unexpectedly fail to complete requested tasking (e.g. manages client power failures without client UPS backup). Client-side error trapping criteria can also be used to flag failed tasks in the SUITasker queue, which can be subsequently reassigned to the general queue. Additional features include the ability to reassign or remove individual tasks from the queue, and an ability to specify chemical-specific runtime timeout values. If a client fails to return results within a specified timeout period, the task is reassigned to another client and the errant client is assigned a temporary failed state until it resumes requesting jobs. If a client requests a second job within the first job’s timeout period, the existing job in the queue originally assigned to the client is simply reissued.

Fig. 3. SuperMUSE distributed management program toolset – conceptual layout for 3MRA.
[Diagram components and message sequence, as labeled in the figure:
CPU Allocator (launched on Program Server 0101): oversees Model Taskers (MTi; i = 1…nt). At launch, MTs register with the CPU Allocator.
Model Tasker (MT1, launched on Program Server 0101).
Tasker Client (TCk; k = 1…nc): model independent. Executes DOS commands in batch files delivered by an MT. For 3MRA, runs a single simulation defined by a unique site, source, iteration, chemical, and Cw combination.
1. Tasker Client TC1 announces availability.
2. If no MT, then idles. This TC1 call was assigned to MT1.
3. TC1 requests job from MT1.
4. Do job X ≡ a single task line in any MT. For the 3MRA Tasker (i.e. SUITasker), represents a unique scenario.
5. Send job X warnings, errors, and results to Data Server’s project file area identified by MT1 (e.g. MySQL ParSUITest database).
6. TC1 says done with job X.]

4.2.3. Tasker client

The Tasker Client is loaded on each client PC at start-up. Each Tasker Client then periodically calls the CPU Allocator when not tasked. If no Model Tasker is active, it is told to idle. The Tasker Client currently has no user interface, and functions as a TCP/IP client for the CPU Allocator and active Model Taskers. The client software will connect to the CPU Allocator, receive a Model Tasker machine ID, disconnect, and then connect to the assigned Model Tasker. It then receives a command from the Model Tasker with associated files to execute. In the case of the 3MRA SUITasker, this is a master batch file managing file cleanup and a single 3MRA header file. The file set is first written to the client disk, and the model is then executed in batch sequence. The Tasker Client will also restart or shutdown the local PC if told to do so by the CPU Allocator.

Representing file management tools, providing key connections to the back-end data server, two auxiliary Java applications were also created for client job processing. These also have no user interface. For 3MRA, the tools represent additional calls within the batch file scheme, for each simulation (i.e. header file) executed. The first, a Process Messages Tool, reads normally produced warning and error files and, via Java DataBase Connectivity (JDBC), updates a central MySQL database identified by the SUITasker. The second tool, a Site Summary Tool, extracts results from SSF and GRF files, storing user-selected variables defined by an experiment-wide delimited script file. The Site Summary Tool facilitates surgical data extraction for all available input and output data, per simulation, for subsequent sensitivity and uncertainty analysis processing.

An additional client boot-up tool, a Client Monitor Tool currently compiled in Visual Basic, was also developed, and is comprised of two utilities: Launch&Watch (client-end) and Client Monitor (server-end). This tool orchestrates and monitors client boot-up activity to ensure successful launching of local client-based MySQL servers, and other critical support applications needed to ready clients for Model Tasker jobs prior to launching the Tasker Client.
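The timeout handling described for the SUITasker in Section 4.2.2 can be sketched as a small in-memory queue. This is only a minimal illustration under stated assumptions: the class and method names are hypothetical, time is passed in explicitly rather than read from a clock, and the real toolset is written in Java and communicates over TCP/IP, none of which is modeled here.

```python
from collections import deque


class WatchdogQueue:
    """Toy task queue with timeout-based reassignment (illustrative only)."""

    def __init__(self, tasks, timeout):
        self.pending = deque(tasks)   # tasks not yet handed out
        self.active = {}              # client -> (task, time issued)
        self.done = set()
        self.failed_clients = set()   # clients in a temporary failed state
        self.timeout = timeout

    def _expire(self, now):
        """Return overdue tasks to the queue and flag their clients."""
        for client, (task, issued) in list(self.active.items()):
            if now - issued > self.timeout:
                del self.active[client]
                self.pending.append(task)
                self.failed_clients.add(client)

    def request(self, client, now):
        """Hand a task to a client that reports itself free."""
        if client in self.active:
            task, issued = self.active[client]
            if now - issued <= self.timeout:
                return task  # second request within the timeout: reissue
        self._expire(now)
        self.failed_clients.discard(client)  # client has resumed requesting
        if not self.pending:
            return None
        task = self.pending.popleft()
        self.active[client] = (task, now)
        return task

    def complete(self, client, task):
        """Record a finished task if the client still owns it."""
        if self.active.get(client, (None, None))[0] == task:
            del self.active[client]
            self.done.add(task)


q = WatchdogQueue(["A", "B"], timeout=10)
q.request("pc1", now=0)    # pc1 receives task "A"
q.request("pc1", now=5)    # still within the timeout: "A" is reissued
q.request("pc2", now=20)   # "A" has timed out and is requeued; pc2 gets "B"
```

In this sketch a late result from a client whose task has already been reassigned is simply ignored; whether the actual SUITasker accepts such results is not stated in the text.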

Fig. 4. Comparison of average 3MRA runtimes: SuperMUSE versus stand-alone execution.
[Bar chart for all sites-sources, Carbon Disulfide; 2095 simulations. Axes: distributed processing added runtime cost per simulation (overhead, secs) and distributed processing cost relative to average stand-alone PC model runtime (relative overhead, %). Categories: warnings, errors, and results; results only; no MySQL data; stand-alone PC.]

4.3. Command tasker and client collector

External to actual Model Tasker and Tasker Client simulation-based operations, two other tools were developed to expand file management capabilities. The first is the Command Tasker, essentially a server-end batch file manager. Similar to the capabilities of the Client Update Tool, and actually representing a Model Tasker, this tool delivers binary tree task dependencies in collection of common aggregated data/files, or reversibly, in distribution of common data/files. The Command Tasker acts as a Model Tasker in managing activities across the cluster, allowing the user to issue commands to clients (e.g. DOS commands for Windows or shell scripts for Linux) that are executed by the Tasker Client. Extensively generic in form, it is currently used for conducting log-scale database collections for 3MRA experiments, and for more quickly executing file management tasks that take individual PCs substantial time to complete. Supporting utilities allow quick generation of the command list based on PC-client equipment profiles.

Finally, a Client Collector specific to 3MRA was also created, and utilizes the Command Tasker to effect execution. Providing a connection to the back-end data server, this tool facilitates scalability in parallel execution database design. Intensive model output database operations can be implemented where client simulation processing utilizes, for example, local MySQL hosts on clients for interim data storage (e.g. when dealing with >1000s of query operations/minute/client).
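The binary tree task dependencies mentioned for the Command Tasker suggest a collection pattern in which clients merge their data pairwise, so that the number of sequential rounds grows with the logarithm of the cluster size. The paper does not detail the actual dependency scheme, so the following is only one plausible sketch; the function name and the choice of the first client as the final collector are assumptions.

```python
def binary_tree_schedule(clients):
    """Return rounds of (receiver, sender) merges that funnel data to clients[0]."""
    rounds = []
    active = list(clients)
    while len(active) > 1:
        # Pair neighbours; within a round the merges are independent of each other.
        pairs = [(active[i], active[i + 1]) for i in range(0, len(active) - 1, 2)]
        rounds.append(pairs)
        active = active[::2]  # receivers (and any unpaired client) advance
    return rounds


# 85 clients, the number used in the Section 5 time trial.
rounds = binary_tree_schedule(list(range(1, 86)))
print(len(rounds))  # 7 rounds, versus 84 sequential transfers to a single server
```

Running the schedule in reverse, with senders and receivers swapped, gives the corresponding distribution of common files from one machine to all clients.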

5. 3MRA time trial analysis

To benchmark the time added by distributed processing, 3MRA runtimes were compared between stand-alone execution and parallel execution on SuperMUSE. Using 85 identical 1 GHz PC clients in SuperMUSE for parallel computations, a scenario set describing the 201 national assessment sites was selected, representing all 419 site and waste management unit combinations. One chemical, Carbon Disulfide, was chosen with 5 Cw values, using a single iteration. In this analysis, the Site Summary Tool captured a total of 61 input and output variables. Shown in Fig. 4, the average overhead runtime cost due to parallelizing the 3MRA code was 6.0 seconds/simulation for full messaging capabilities, 7.2% of the average stand-alone model runtime. A more direct comparison of stand-alone PC versus SuperMUSE capabilities, with maximum storage turned off and no message or Site Summary Tool result processing on SuperMUSE, showed an increase of only 0.57 seconds/simulation, a relative cost increase of 0.7% over average model runtime.

For comparison, in a similar parallel execution totaling 90,085 simulations, the average runtime for 15 metal and 28 organic chemicals, with full messaging, was 120 seconds, compared to an average of 90 seconds for Carbon Disulfide. On average, SuperMUSE can currently complete over 3 million 3MRA model simulations per month. Such execution in stand-alone mode, using a few PCs, would be precluded by: 1) the actual time expended to execute a given scenario set in stand-alone mode; 2) the need to optimize job assignments across PCs; and 3) the human capital needed to collect and collate errors, warnings, model input data, and model results.

6. Example toolset application

Representing a typical application of FRAMES-3MRA Version 1.0 and the supporting SuperMUSE hardware and software toolset, an evaluation of Benzene disposal in various land-based waste management units was conducted.

6.1. Benzene disposal simulation design

The simulation design employed two basic experiments. The first was a simulation of Benzene across all 419 waste management units in the existing 3MRA site database. Analyzing five source types (AT, SI, LAU, WP, and LF), five Cw’s (appropriately selected based on source type, chemical properties, and known toxicity), and using an initial Monte Carlo random seed value of 11031, 100 iterations were conducted for each unique site-source-Cw combination, totaling 209,500 simulations. In this experiment, due to data storage limitations and still pending automation of ELPII processing for multiple, individual iterations, the 100 iterations were aggregated during simulation within a single database structure, resulting in the pooling of protected, and separately, unprotected populations across all iterations. From data generated in this first experiment, the ELPII can be used to determine a single Cw,exit value for any definable assessment profile (A, B, C, D, E, F, G).

For the second experiment, the same selections were made, but only 10 national iterations were conducted. In this case the Exit Level Processor I output (see Fig. 1) was segregated by iteration into separate databases during simulation. Here, the ELPII could be used to analyze individual national iterations to determine a single average Cw,exit value for any assessment profile (A, B, C, D, E, F, G), and, through additional uncertainty analysis, any combination of concerns represented by (A, B, C, D, E, F, G, H, I). Had the second experiment been completed for 100 segregated iterations, the average of all Cw,exit values calculated by iteration would equal the single average Cw,exit calculated in the first experiment. While the second experiment allowed for an initial estimation of uncertainties H and I in Cw,exit for Benzene, the first experiment allowed for examination of trends in Cw,exit for more extensive coverage of the input parameter space.

For purposes of discussion and analysis, two assessment profiles (A, B, C, D, E, F, G) were examined using the associated ELPI databases: (95%, 500 m, 1 × 10⁻⁶, 95%, 1000 m, 1, 95%) and (99%, 2000 m, 1 × 10⁻⁶, 99%, 2000 m, 1, 95%). These are both defined by the general assessment profile (% human population protection, radial distance from the source for human concern, increased risk of cancer in humans, % ecological population protection, radial distance from the source for ecological concern, ecological risk hazard quotient, and % national sites protected). In 3MRA there is no applicable human hazard risk considered for Benzene.

6.2. Benzene disposal uncertainty analysis

Shown in Fig. 5, waste stream exit levels for the landfill source were calculated for the two assessment profiles (95%, 500 m, 1 × 10⁻⁶, 95%, 1000 m, 1, 95%) and (99%, 2000 m, 1 × 10⁻⁶, 99%, 2000 m, 1, 95%), for the sum of all ingestion and inhalation pathways. Data are based on individual calculations completed for each of 10 iterations simulated across 56 sites. For each iteration, the ELPII was used to provide values for actual % site protection at the five Cw’s evaluated, where average values were then determined at these Cw’s, along with 98% confidence levels (Zar, 1999) (e.g. H = 50%, I = 98%). Exit levels at 95% site protection were next derived for the average of 10 iterations, along with associated confidence intervals and minimum and maximum values observed. Actual values are presented in Table 2 for each profile examined. Log-linear interpolation was used between Cw pairs; alternative schemes could also be investigated to provide insight between Cw pairs. For all Benzene analyses described in Table 2, human cancer risk was the determinant concern at all associated waste concentration levels, for all source types.

Not shown in Fig. 5, all sites were protected at the lowest Cw evaluated for landfills (0.001 μg/g). Based on 10 iterations representing a total of 2800 actual simulations, average landfill Cw,exit values of 138 ppm and 184 ppm were derived for the (95% pop. protection, 500 m) and (99% pop. protection, 2000 m) profiles, respectively. For the (95%, 500 m) profile, the upper and lower 98% confidence limits were 179 and 108 ppm, respectively, with a maximum observed value of 233 ppm and a minimum observed value of 29 ppm. Based on the simulation experiment for 100 iterations (totaling 28,000 simulations), average Cw,exit values of 135 ppm and 195 ppm were derived for the (95%, 500 m) and (99%, 2000 m) profiles, respectively. These values were relatively close to those found for only 10 iterations, with 98% and 106% recovery, respectively. Landfill exit levels for Benzene are also depicted graphically in Fig. 5 for both profiles considered. In this case, a lower population protection level (95%) examined at a closer distance to the facility (500 m) determined significantly lower exit level thresholds for hazardous waste.

Fig. 5. Benzene disposal in landfills: uncertainty for sum of all ingestion and inhalation pathways (10 iterations at 56 sites).
[Two panels plotting % sites protected (80–100%) against waste stream concentration (μg/g, log scale, 0.1–1000). Series: average of all iterations, 98% confidence level, max/min, and exit level. Upper panel: waste stream exit levels shown for 95% site protection and 99% population protection within 2000 m radial distance from source. Lower panel: waste stream exit levels shown for 95% site protection and 95% population protection within 500 m radial distance from source.]

6.3. Analysis by source type and exposure pathways

Comparing waste management unit types, using data for 100 national iterations and the (95%, 500 m) profile, average exit levels were lowest for surface impoundments (0.19 ppm), followed by aerated tanks (2.0 ppm), land application units (3.1 ppm), waste piles (9.2 ppm), and landfills (135 ppm). While more simulation is needed to properly evaluate uncertainty and sensitivity of 3MRA predictions, from this example ecological and human health risk-based analysis, one can envision potential future applications for substantive cost-benefit analyses. By also addressing external economic factors, ultimately one can determine cost-effective strategies for pre-treatment of hazardous wastes followed by optimal disposal as a non-hazardous waste, normalized to risk.

Table 2
Benzene disposal: uncertainty analysis for summation of all ingestion and inhalation pathways

Source type               Surface           Aerated           Land Application  Waste             Landfill
                          Impoundment       Tank              Unit              Pile
# Sites evaluated         137               137               28                61                56

10 iterations simulated at each site
Total simulations         6850              6850              1400              3050              2800
Radial distance (m)       500 [a]  2000     500 [a]  2000     500 [a]  2000     500 [a]  2000     500 [a]  2000
% Sites protected         95%      95%      95%      95%      95%      95%      95%      95%      95%      95%
% Population protected    95%      99%      95%      99%      95%      99%      95%      99%      95%      99%
Maximum value (ppm)       0.36     0.38     8.5      11       7.4      5.0      145      183      233      501
Upper 98% C.L. (ppm) [b]  0.24     0.32     4.8      6.8      4.5      3.8      65       138      179      249
Avg. exit level (ppm)     0.19     0.26     3.5      4.7      3.5      3.0      17       79       138      184
Lower 98% C.L. (ppm) [b]  0.15     0.22     2.8      3.6      2.6      2.3      6.6      26       108      144
Minimum value (ppm)       0.12     0.18     2.4      2.6      1.4      1.4      2.1      11       29       123

100 iterations simulated at each site
Avg. exit level (ppm)     0.19     0.27     2.0      2.4      3.1      3.1      9.2      22       135      195
Relative difference [c]   100%     103%     55%      51%      89%      102%     54%      28%      98%      106%

Scenario considered all human receptors/cohorts and an increased cancer risk of 1 × 10⁻⁶. Cancer risk was determinant for all sources. There is no applicable human hazard risk. Evaluated ecological receptors by ring and habitat groups (terrestrial, aquatic, wetland). No concerns observed for ecological hazard quotient = 1.0, for all waste concentrations considered.
[a] For ecological population concerns, radial distance was 1000 meters.
[b] C.L. indicates normal distribution confidence limit on average exit level (significance level α = 0.02).
[c] Ratio of average waste stream exit levels calculated for 100 iterations and 10 iterations.

Fig. 6. Benzene disposal in landfills: exposure pathway analysis for increased human cancer risk (10 iterations at 56 sites).
[Increased cancer risk level (log scale, 1.E-09 to 1.E-05) plotted against waste stream concentration (μg/g, log scale, 0.1–1000). Increased human cancer risk levels shown for 95% site protection and 95% population protection within 500 m radial distance from source. Series: sum of ingestion and inhalation pathways; sum of inhalation pathways; sum of ingestion pathways; air inhalation; shower inhalation; water ingestion; groundwater inhalation and ingestion; crop ingestion; fish/milk/beef/soil ingestion.]

Exposure pathways driving human cancer risks can also be examined as shown in Fig. 6. For disposal of Benzene in landfills, the sum of all inhalation pathways dominated the sum of all ingestion pathways, for all Cw’s examined. At the 1 × 10⁻⁶ increased cancer risk level, shower inhalation of contaminated groundwater had a smaller impact on total inhalation risk than ambient outdoor air inhalation. Compared to specific inhalation pathways, water ingestion and crop ingestion contributed significant, but relatively smaller risks to total cancer risk from all pathways, and fish, milk, beef, and soil ingestion played relatively insignificant roles. Fig. 6 also shows in general the effect that different cancer risk criteria would have on determination of exit level values for Benzene disposal. For all pathways shown, a maximum value of 1000 ppm was simulated for landfills, and extrapolation beyond this level was not conducted. For values shown at 1000 ppm, actual Cw’s will be >1000 ppm at the identified cancer risk, and >1000 ppm at higher cancer risk thresholds.
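The log-linear interpolation used in Section 6.2 to derive exit levels between Cw pairs can be illustrated with a short sketch. It assumes that "log-linear" means interpolating the percentage of sites protected linearly against log10(Cw); the function name is hypothetical, and the protection curve below is made up for illustration rather than taken from the 3MRA results.

```python
import math


def exit_level(cw_values, pct_protected, target=95.0):
    """Interpolate, linearly in log10(Cw), the Cw at which protection equals target.

    Assumes cw_values ascend and pct_protected does not increase with Cw.
    Returns None when the target is not bracketed by the evaluated Cw's.
    """
    points = list(zip(cw_values, pct_protected))
    for (c_lo, p_lo), (c_hi, p_hi) in zip(points, points[1:]):
        if p_lo >= target >= p_hi and p_lo != p_hi:
            frac = (p_lo - target) / (p_lo - p_hi)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10.0 ** log_c
    return None


# Made-up protection curve for illustration; not data from the paper.
cw = [1.0, 10.0, 100.0, 1000.0]          # waste concentrations (ppm)
protected = [100.0, 100.0, 97.5, 92.5]   # % sites protected at each Cw
print(round(exit_level(cw, protected), 1))  # 316.2, halfway between 100 and 1000 in log space
```

A `None` result corresponds to the situation noted above for values plotted at 1000 ppm, where the target is never reached within the simulated range and no extrapolation is attempted.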

7. Conclusions

The SuperMUSE computing cluster is ideal for conducting uncertainty and sensitivity analysis tasking involving embarrassingly parallel model simulations, in either Windows or Linux environments. The supporting Java toolset developed for parallel computing represents a critical aspect of exploiting the capabilities of such systems. Fairly small, easy to write, and well suited for the application, the Java toolset readily handled the tasks of machine and job management over the node-independent distributed computing system. For 3MRA, added runtime costs were negligible compared to stand-alone PC execution, while the benefits delivered represent powerful, efficient model evaluation capabilities. The toolset is generally applicable to similar evaluation efforts for other models, where only a Model Tasker, which parallelizes a model user interface, would need to be developed. Alternatively, integration of any model into FRAMES 2.0 could be used as a path to apply SuperMUSE and the supporting software tools directly. As the example of Benzene disposal showed, 3MRA, together with SuperMUSE capabilities, embodies a robust, integrated, probabilistic risk assessment strategy for protection of both ecological and human health, and assessment of alternative strategies for hazardous waste identification and management.

Acknowledgements

The authors are indebted for the support and expertise provided during development of 3MRA by Barnes Johnson, Stephen Kroner, David Cozzie, and Zubair Saleem of USEPA/OSWER/OSW, and Gerard Laniak, Robert Ambrose, and Donna Schwede of USEPA/ORD/NERL. The authors also wish to thank Gene Whelan and Mitch Pelton at Pacific Northwest National Laboratory for their contributions to design and development of FRAMES-3MRA, and countless others on the development team who have contributed. Thanks are finally given to Kurt Wolfe, Robert Swank, David Brown, and Candida West of USEPA/ORD/NERL for their helpful comments during manuscript preparation.

This paper has been reviewed in accordance with the U.S. Environmental Protection Agency’s peer and administrative review policies and approved for publication. Mention of trade names does not constitute endorsement.

References

Beck, M.B., 1999. Coping with ever larger problems, models, and databases. Water Science and Technology 39 (4), 1–11.
Brightwell, R., Fisk, L.A., Greenberg, D.S., Hudson, T., Levenhagen, M., Maccabe, A.B., Riesen, R., 2000. Massively parallel computing using commodity components. Parallel Computing 26 (2–3), 243–266.
Chen, J., Beck, M.B., 1999. Quality Assurance of Multi-Media Model For Predictive Screening Tasks. USEPA, EPA/600/R-98-106.
Cruz, J., Park, K., 1999. Toward performance-driven system support for distributed computing in clustered environments. Journal of Parallel and Distributed Computing 59 (2), 132–154.
Cullen, A.C., Frey, H.C., 1999. Probabilistic Techniques in Exposure Assessment: A Handbook for Dealing with Variability and Uncertainty in Models and Inputs. Plenum Press, New York.
Funtowicz, S.O., Ravetz, J.R., 1990. Uncertainty and Quality in Science for Policy. Kluwer Academic Publishers, Dordrecht, The Netherlands.
Gropp, W., Lusk, E., Skjellum, A., 1999. Using MPI: Portable Parallel Programming With The Message-Passing Interface. 2nd ed. MIT Press, Cambridge, MA.
Laure, E., 2001. OpusJava: a Java framework for distributed high performance computing. Future Generation Computer Systems 18 (2), 235–251.
Marin, C.M., Guvanasen, V., Saleem, Z.A., 2003. The 3MRA risk assessment framework – a flexible approach for performing multimedia, multipathway, and multireceptor risk assessments under uncertainty. Human and Ecological Risk Assessment 9 (7), 1655–1678.
PNNL, 1999. Overview of the FRAMES-HWIR Technology Software System. PNNL-11914, vol. 1. Pacific Northwest National Laboratory, Richland, WA.
Robert, C.P., Casella, G., 1999. Monte Carlo Statistical Methods. Springer-Verlag, New York.
SAB, 2003. Project #03-13: U.S. Environmental Protection Agency Science Advisory Board’s Executive Committee (EC) Multimedia Multipathway Multireceptor Risk Assessment (3MRA) Modeling System Peer-Review Panel, 1400A, 1200 Pennsylvania Avenue, Washington, DC.
Saltelli, A., Chan, K., Scott, E.M., 2000. Sensitivity Analysis. John Wiley & Sons, West Sussex, England.
Sunderam, V., Geist, A., Dongarra, J., Manchek, R., 1994. The PVM concurrent computing system – evolution, experiences, and trends. Parallel Computing 20 (4), 531–545.
Zar, J.H., 1999. Biostatistical Analysis. 4th ed. Prentice-Hall, Upper Saddle River, NJ.
