Architectural Models for Deploying and Running Virtual Laboratories in the Cloud

E. Afgan1,2, A. Lonie3, J. Taylor1, K. Skala2, N. Goonasekera3,*

1 Johns Hopkins University, Biology Department, Baltimore, MD, USA
2 Ruder Boskovic Institute (RBI), Centre for Informatics and Computing, Zagreb, Croatia
3 University of Melbourne, Victorian Life Sciences Computation Initiative, Melbourne, Australia

[email protected], [email protected], [email protected], [email protected], [email protected]

Abstract - Running virtual laboratories as software services in cloud computing environments requires that numerous technical challenges be addressed. Domain scientists using these virtual laboratories desire powerful, effective, and simple-to-use systems. To meet those requirements, these systems are deployed as sophisticated services that require a high level of autonomy and resilience. In this paper we describe a number of deployment models, based on technical solutions and experiences that have enabled our users to deploy and use thousands of virtual laboratory instances.

I. INTRODUCTION

The past decade has seen cloud computing go from conception to the de facto standard platform for application deployment. Cloud infrastructures deliver resources for deploying today's applications that are more scalable [1], more cost-effective [2], more robust [3], more easily managed [4], and more economical [5]. Researchers and research groups are no different from the rest of the industry, expecting robust, powerful cloud platforms capable of handling their data analysis needs. However, deploying such platforms still requires a significant amount of effort and technical expertise. In this paper, we build on our experiences from more than five years of building and managing virtual laboratories that have been deployed thousands of times on clouds around the world. We present viable architectural deployment models and extract best practices for others developing or deploying their own versions of robust research platforms.

The theme of this paper revolves around the concept of a virtual laboratory as a platform for performing data analysis [6]. Virtual labs offer access to a gamut of data analysis tools and workflow platforms that are closely linked to commonly used datasets; they offer access to scalable infrastructure that has been appropriately configured, both beforehand and dynamically at runtime. Once built, virtual labs are often deployed on demand by the researchers themselves. However, in order to make these platforms available to domain researchers, the necessary components must first be built, configured, and provisioned. Depending on the complexity of a virtual lab, this is often a complex task spanning expertise in system administration, platform development, and domain-specific application setup.

In addition to deploying variations of said virtual labs on public clouds, institutions are increasingly setting up academic clouds (e.g., NeCTAR, Chameleon, JetStream). Locality of the infrastructure, restrictions on off-shoring data [7], avoidance of vendor lock-in, and no-cost or merit-based allocation of resources are attractive reasons for utilizing those clouds. From a platform deployment standpoint, this brings up additional challenges because the platforms need to be deployed, managed, maintained, and supported on these additional clouds while coping with any differences among the cloud providers. It is hence imperative to design scalable, robust, and cloud-agnostic models for deploying these systems. Figure 1 captures the core concepts enabling the development of such models: (a) a cross-cloud API layer; (b) automation; (c) a configurable and 'composable' set of resources. These concepts, detailed in the remainder of the paper, embody the notion that successfully building a global virtual lab requires a common platform rooted in automation.

[Figure 1 shows multiple virtual lab (VL) instances, each composed from a resource set, produced by an automated build-and-deploy process that targets a common API; the common API maps onto the native APIs of Cloud 1, Cloud 2, ..., Cloud n.]

Figure 1. Virtual lab deployment stack unified for multiple clouds.
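As a concrete illustration of the cross-cloud API layer of Figure 1, the sketch below defines a minimal provider-agnostic interface. All class and method names here are hypothetical, not taken from any real library (production systems can rely on cross-cloud libraries such as Apache Libcloud); the point is that deployment logic written once against the abstract interface runs unchanged against any provider.

```python
from abc import ABC, abstractmethod


class CloudProvider(ABC):
    """Hypothetical minimal cross-cloud interface (illustrative names only)."""

    @abstractmethod
    def launch_instance(self, image_id: str, size: str) -> str:
        """Start an instance from an image; return its instance id."""

    @abstractmethod
    def attach_volume(self, instance_id: str, volume_id: str) -> None:
        """Attach a data volume to a running instance."""


class InMemoryCloud(CloudProvider):
    """Toy in-memory provider, standing in for a real cloud driver."""

    def __init__(self):
        self._instances = {}
        self._counter = 0

    def launch_instance(self, image_id, size):
        self._counter += 1
        iid = f"i-{self._counter}"
        self._instances[iid] = {"image": image_id, "size": size, "volumes": []}
        return iid

    def attach_volume(self, instance_id, volume_id):
        self._instances[instance_id]["volumes"].append(volume_id)


def deploy_virtual_lab(cloud: CloudProvider, image_id: str, data_volume: str) -> str:
    """Cloud-agnostic deployment logic: written once, runs on any provider."""
    iid = cloud.launch_instance(image_id, size="m1.medium")
    cloud.attach_volume(iid, data_volume)
    return iid
```

Adding support for a new cloud then amounts to writing one more `CloudProvider` implementation, while the deployment logic itself stays untouched.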

II. FUNCTIONAL REQUIREMENTS

The choice of a virtual lab architecture is driven by a variety of aspects and cross-cutting concerns [8]. While some of these are general architectural decisions applicable to software in general, and some are highly specific to the domain in question, a number of concerns apply to virtual labs in general. In this section, we treat such concerns and list architectural questions that must be addressed when designing and developing a virtual lab environment.

For example, a virtual lab would need to determine the level of customisation required by its users. If significant customisation is required, it will often impact other users; in that case, isolated or individualised access to resources (for example, a user-owned container or virtual machine) is preferable over access to a common pool of shared resources (such as a pre-deployed web service). Similarly, small job sizes can typically be catered for by a single, individualised VM, whereas large job sizes may require an architecture that can dynamically scale to accommodate more diverse needs.

The choice of appropriate strategy depends on several additional factors, including the purpose of the virtual lab, the capabilities of the target cloud(s), the available people effort, and similar. Table 1 supplies a core list of design questions to answer when weighing the available options. Note that there is no single answer to these questions; the acceptable answers depend largely on the aims of the virtual lab, and deciding on them will help guide the myriad of technical choices related to implementation. In the sections that follow, we discuss various compute and data provisioning strategies that can accommodate these decisions.

* Corresponding author

MIPRO 2016/DC VIS

TABLE I. FUNCTIONAL DESIGN QUESTIONS TO CONSIDER WHEN DESIGNING A VIRTUAL LAB.

Infrastructure
  Infrastructure maturity: Which cloud to use? How stable/mature is the infrastructure? Will the deployed lab be robust for users?
  Infrastructure agnosticism: How easy is it to support multiple infrastructure providers? Is this desirable/necessary to increase accessibility/robustness?
  Support: What type of support does the provider offer, for the virtual lab and individual users?

User Management
  Per-user customisation: Can each user customise the virtual lab according to their needs, and have a safe environment in which to learn through failure?
  Data management: How is data put into/taken out of the virtual lab?
  Quota management: What resource quotas should be enforced for the user? Does the infrastructure provider support that?

Users' Management of VL
  Instance lock-in: Is an upgrade path available so that the user can always use the latest version of the virtual lab?
  Replicability: Can the user replicate their experiments, with a guarantee that all software versions remain unchanged?
  Reliability: How reliable should the service be? Can losses be tolerated?

Service Management
  Software management: Can a user manage the software on the virtual lab on their own, or do they need system administration skills?
  Licensing constraints: Are there specific licensing constraints that limit the use of the software?

Security
  Authentication: Should the virtual lab allow for single sign-on with institutional credentials?
  Credentials: How are institutional credentials translated into cloud provider credentials?
  Authorization: What actions are users allowed to perform within a virtual lab?

III. VIRTUAL LAB INFRASTRUCTURE COMPONENTS

Answering the above questions and provisioning a virtual lab requires a marriage of various complex software components to their required storage and processing resources. Depending on the intended usage of the virtual lab, there are a number of choices regarding the use of appropriate cloud resources. Table 2 provides a snapshot of the available approaches for supplying compute capacity, along with the pros and cons of each option.

TABLE II. COMPUTE PROVISIONING STRATEGIES.

Machine Image
  Description: A pre-built machine image with all required software already installed.
  Pros: Quick startup; excellent reproducibility.
  Cons: Difficult to upgrade due to monolithic nature; software packages not self-contained, causing potential version conflicts; software potentially tied to OS version; limitations on size; a breached application may affect the entire machine.

Container
  Description: A pre-built container (such as Docker or LXC), deployed on top of a running machine image or a cloud container service.
  Pros: Extremely quick startup; excellent reproducibility; containers mostly independent of the underlying machine's operating system and version; easier updates to individual components; breaches contained to the container.
  Cons: Must be pre-built; still a very new and quickly changing technology.

Runtime
  Description: Required software installed at runtime, using automation software such as Ansible, Chef, Puppet, etc.
  Pros: Push or pull update models; updates easier to make.
  Cons: Slow deployment/startup times; less reproducible (transient network errors, software version changes).

Hybrid
  Description: A pre-built machine image/container for quick startup, brought up-to-date through runtime deployment/extensibility.
  Pros: Can combine the advantages of all of the above models.
  Cons: More complex to implement.
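One way to realize the hybrid strategy from Table 2 is to launch a pre-built image and pass it cloud-init user-data that brings the installed software up to date at first boot, for example by running ansible-pull against a configuration repository. The sketch below assembles such a user-data script in Python; the repository URL and playbook name are hypothetical placeholders, and the resulting string would be handed to the cloud provider's instance-launch call.

```python
# Hybrid provisioning sketch: pre-built image + first-boot update.
# The playbook repository URL used below is a hypothetical placeholder.

def build_user_data(repo_url: str, playbook: str) -> str:
    """Assemble a cloud-init user-data script (runs as root at first boot)."""
    lines = [
        "#!/bin/bash",
        "set -euo pipefail",
        "apt-get update -qq",
        "apt-get install -y -qq ansible git",
        # ansible-pull clones the playbook repository and applies it locally,
        # updating the pre-built image to the latest configuration.
        f"ansible-pull -U {repo_url} {playbook}",
    ]
    return "\n".join(lines) + "\n"


user_data = build_user_data(
    "https://example.org/vl-playbooks.git",  # hypothetical repository
    "site.yml",
)
```

This keeps the fast startup of the Machine Image strategy while recovering most of the upgradability of the Runtime strategy, at the cost of a somewhat longer first boot.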

Compute resources need to be matched with suitable storage capacity. Table 3 differentiates among the currently available cloud storage resource types and captures the pros and cons of each. The supplied information examines ways that data can be brought to the compute infrastructure, since this is still the dominant way to work with existing scientific software. We do not consider the reverse model, as many virtual labs still struggle to shed the weight of their accumulated legacy software that requires a shared, UNIX-based file system to run, and are not yet in a position to take advantage of such models despite their benefits.

TABLE III. DATA PROVISIONING STRATEGIES FOR GETTING REQUIRED STATIC DATA TO THE COMPUTE.

Volumes / Snapshots
  Description: A volume or snapshot containing the required data, which is attached to an instance at runtime.
  Pros: Quick to create/attach; suitable for large amounts of data.
  Cons: Not shareable between clouds (in OpenStack, not shareable between projects); not guaranteed to be available (e.g., infrastructure/quota restrictions); limited sharing ability between nodes (e.g., volumes only attachable to one instance at a time).

Shared POSIX filesystem
  Description: A shared filesystem containing the required data (e.g., NFS, Gluster), which is mounted on the target node at runtime.
  Pros: One-time setup; very fast to attach; updates visible at runtime to all virtual lab instances; suitable for very large amounts of data.
  Cons: Must be set up on each supported cloud; centralised management/single point of failure; not geographically scalable.

Remotely fetched data archive
  Description: An http/ftp link to the required data, which is downloaded and extracted onto local/transient storage.
  Pros: Cloud agnostic; scalable.
  Cons: Slow, as fetching and extracting takes a long time; not suited to very large amounts of data, in order to limit download times/costs.

Object store
  Description: Object-storage service provided by the cloud provider (e.g., S3, Swift).
  Pros: High scalability.
  Cons: Not suitable for random access; not supported by legacy tools.
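The remotely fetched data archive strategy from Table 3 can be sketched in a few lines of standard-library Python: download the archive, then unpack it onto local or transient storage. This is a minimal illustration rather than production code (no retries, resumption, or checksum verification, all of which a real deployment would want given the strategy's sensitivity to transient network errors).

```python
import tarfile
import tempfile
import urllib.request
from pathlib import Path


def fetch_and_extract(archive_url: str, dest_dir: str) -> Path:
    """Download a .tar.gz of reference data and unpack it onto local storage.

    Cloud-agnostic, at the cost of transfer time on every deployment.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tempfile.NamedTemporaryFile(suffix=".tar.gz") as tmp:
        # Stage the archive on transient storage before unpacking.
        with urllib.request.urlopen(archive_url) as resp:
            tmp.write(resp.read())
        tmp.flush()
        with tarfile.open(tmp.name, "r:gz") as tar:
            tar.extractall(dest)  # trusts the archive source
    return dest
```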


IV. DEPLOYMENT OPTIONS

The resources available and required to compose a virtual lab can be assembled in a variety of configurations. Different configurations support different use cases and require varying levels of technical complexity to deploy. Hence, depending on the intended purpose of the virtual lab, it is important to choose an appropriate deployment model. We define the following deployment models and supply a flowchart in Figure 2 to navigate among them:

• Centrally managed resource is a virtual lab presented as a public service to the community. Typically available as a web portal, this virtual lab requires little or no setup on the user's side and permits the user to readily utilize the resources it offers. Because it is a public resource, the user is likely to experience limited functionality, such as usage quotas, no ssh access, no possibility for customisation, and other similar constraints typical of public services. While users do not require any setup for this type of virtual lab, the lab maintainers need to manage and update the underlying infrastructure supporting the supplied services. Resource management needs to account for the scaling of the supplied services, upgrades, and reproducibility of users' results. In addition to accessibility, other main drivers for choosing this model are data management restrictions (in case data is too large for feasible sharing) and software licensing constraints. Examples of this type of virtual lab include XSEDE science gateways (https://portal.xsede.org/web/guest/gateways-listing), the Characterisation Virtual Lab (https://www.massive.org.au/cvl/), and the usegalaxy.org portal [9];

• Standalone image represents a feature-full version of the virtual lab in a small package. A user is required to have appropriate access to the cloud provider where the image is available and must personally launch an instance of the virtual lab; various launcher applications can make this a straightforward process. Once launched, the user has full control over the virtual lab services but also needs to manage those services, particularly when upgrades or fixes are necessary. In addition to managing the services, the user is in charge of data management, ensuring that data is not lost when an instance is terminated. Virtual lab providers need to bundle the virtual lab into an instantiatable image, such as a virtual machine image or a container, and provide periodic upgrades to the image. Examples include CloudBioLinux [10];

• Persistent short-lived scalable cluster is a dynamically scalable version of the virtual lab image with additional services to handle infrastructure scaling. These services (i.e., cluster management services) are used to provision a virtual cluster at runtime (e.g., Slurm, SGE, Hadoop) or utilize cloud provider services for scaling (e.g., a container engine). The cluster management software also supplies additional services, such as cluster persistence, allowing a user to shut down the cluster when not in use while ensuring the data is preserved. This deployment model requires coordination of several resource types from Section III and use of cluster management software, implying a significant deployment effort from the virtual lab deployers. Examples include the Genomics Virtual Lab (GVL) [11];

• Long-lived scalable cluster has the same characteristics as a short-lived cluster, as well as the ability to upgrade running services. The upgrades are typically handled by the cluster management software.

[Figure 2 is a decision flowchart: if the virtual lab will be used as a shared, non-customisable community service, choose a statically sized centrally managed resource when the number of users and workload sizes are predictable, and a dynamically scalable centrally managed resource otherwise. If it will not be a shared service, choose a standalone VM/container from an image when the anticipated workloads are small; for larger workloads, choose a persistent short-lived scalable virtual cluster when the data analysis needs are periodic, and a long-lived scalable virtual cluster otherwise.]

Figure 2. Flowchart for selecting an appropriate virtual lab deployment model.
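The decision logic of Figure 2 is small enough to transcribe directly as a function, which makes the model-selection criteria explicit. The parameter names below are illustrative; the branching follows the figure.

```python
# The Figure 2 decision flowchart, transcribed as a function.
# Parameter names are illustrative; the branching follows the figure.

def choose_deployment_model(shared_service: bool,
                            predictable_load: bool = False,
                            small_workloads: bool = False,
                            periodic_needs: bool = False) -> str:
    """Return the deployment model suggested by Figure 2."""
    if shared_service:
        # A community service is centrally managed; its sizing depends on
        # whether the user numbers and workload sizes are predictable.
        if predictable_load:
            return "statically sized centrally managed resource"
        return "dynamically scalable centrally managed resource"
    if small_workloads:
        return "standalone VM/container from an image"
    if periodic_needs:
        return "persistent short-lived scalable virtual cluster"
    return "long-lived scalable virtual cluster"
```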

V. DISCUSSION

In addition to the hardware and functional requirements for establishing a virtual lab, there are other important technical and management decisions that affect its deployment. One of the key attractions of virtual labs is the high-level, software-as-a-service experience delivered to the user. An implication is that the functions offered by the deployed services need to function well,

which drives a need for good testing strategies and quality assurance. However, complex services and frequent releases make this a challenge for the deployers. It is hence advisable to automate the testing procedure and develop a quality control process before each release. Ideally, the testing process is decentralized: individual services are tested for proper functionality, while testing of the virtual lab itself focuses on the configuration setup. Such testing can be achieved by adopting a user-centric view of the virtual lab and using tools such as Selenium to automate typical user actions [12].

Further, virtual labs should be complemented with a set of training materials describing all the steps required to access the services supplied by a virtual lab. Additional training materials for using the services are also beneficial, particularly when accompanied by webinars or hands-on workshops.

Besides the technical implementation, arguably the most challenging piece of a virtual lab is long-term support. Domain researchers will rely on the virtual lab to perform data analyses and publish new knowledge based on the obtained results. For reproducibility purposes, it is hence important to maintain their access to the resources required to use a virtual lab. When using a commercial cloud provider and supplying shared resources, attention should be paid to questions such as what happens when the project funding runs out or the software being used becomes obsolete. Such questions imply that the virtual lab should make appropriate provisions to allow all the data and accompanying methods to be downloaded or transferred off the infrastructure initially used.

Related to long-term support is the notion of upgradeability. For example, a user working with a particular version of a virtual lab may wish to upgrade to the latest available version.
It is generally undesirable to foist an upgrade on users, as this can adversely affect reproducibility when software versions change. Therefore, a controlled migration or exit path is often necessary so that users can switch to newer versions of a virtual lab when appropriate for their circumstances.
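The user-centric testing approach mentioned above, automating typical user actions with a tool such as Selenium [12], can be sketched as follows. The page URL and element IDs are hypothetical, and the driver argument is duck-typed: any object exposing the small WebDriver-like surface used below works, whether a real Selenium driver or a stub, which keeps the check unit-testable without a browser.

```python
# Sketch of a user-centric smoke test in the style of Selenium automation.
# The URL and element IDs are hypothetical; `driver` is any object exposing
# the WebDriver-like surface used below (a real selenium.webdriver instance
# would work, but so does a stub).

def smoke_test_login(driver, base_url: str, user: str, password: str) -> bool:
    """Drive the virtual lab's login page the way a user would."""
    driver.get(base_url + "/login")
    driver.find_element("id", "username").send_keys(user)
    driver.find_element("id", "password").send_keys(password)
    driver.find_element("id", "submit").click()
    # After login, the portal is expected to show a dashboard page.
    return "Dashboard" in driver.title
```

Checks like this exercise the virtual lab's configuration end-to-end, complementing the per-service tests run by each component's own test suite.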

VI. SUMMARY

With the increased proliferation of cloud computing infrastructures, we believe the concept of a virtual lab - a composite platform capable of performing open-ended data analyses - will become a prevalent means for researchers to utilize cloud resources. In this paper we have described the components required to compose a virtual lab. Technical and managerial aspects of the decision-making process have been presented, shedding light on the trade-offs among viable options. Looking to the future, we expect the concept of a virtual lab to continue to evolve towards a more integrated, quickly deployable system that is instantly accessible to users. Containers, automation solutions, and serverless runtime platforms are likely key technologies that will be adopted to realize this evolution.

ACKNOWLEDGMENTS

This project was supported in part through grant VLS402 from National eCollaboration Tools and Resources, grant eRIC07 from the Australian National Data Service, grant number HG006620 from the National Human Genome Research Institute, and grant number CA184826 from the National Cancer Institute, National Institutes of Health.

REFERENCES

[1] Q. Zhang, L. Cheng, and R. Boutaba, "Cloud computing: state-of-the-art and research challenges," J. Internet Serv. Appl., vol. 1, no. 1, pp. 7–18, Apr. 2010.
[2] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, et al., "Above the clouds: A Berkeley view of cloud computing," Univ. of California, Berkeley, Tech. Rep. UCB/EECS-2009-28, 2009.
[3] G. Garrison, S. Kim, and R. L. Wakefield, "Success factors for deploying cloud computing," Commun. ACM, vol. 55, no. 9, p. 62, Sep. 2012.
[4] G. Garrison, R. L. Wakefield, and S. Kim, "The effects of IT capabilities and delivery model on cloud computing success and firm performance for cloud supported processes and operations," Int. J. Inf. Manage., vol. 35, no. 4, pp. 377–393, Aug. 2015.
[5] S. P. Ahuja, S. Mani, and J. Zambrano, "A Survey of the State of Cloud Computing in Healthcare," Netw. Commun. Technol., vol. 1, no. 2, p. 12, Sep. 2012.
[6] S. D. Burd, X. Luo, and A. F. Seazzu, "Cloud-Based Virtual Computing Laboratories," in 2013 46th Hawaii International Conference on System Sciences, 2013, pp. 5079–5088.
[7] J. J. M. Seddon and W. L. Currie, "Cloud computing and trans-border health data: Unpacking U.S. and EU healthcare regulation and compliance," Health Policy Technol., vol. 2, no. 4, pp. 229–241, Dec. 2013.
[8] A. Garcia, T. Batista, A. Rashid, and C. Sant'Anna, "Driving and managing architectural decisions with aspects," ACM SIGSOFT Softw. Eng. Notes, vol. 31, no. 5, p. 6, Sep. 2006.
[9] E. Afgan, J. Goecks, D. Baker, N. Coraor, A. Nekrutenko, and J. Taylor, "Galaxy - a Gateway to Tools in e-Science," in Guide to e-Science, X. Yang, L. Wang, and W. Jie, Eds. Springer, 2011, pp. 145–177.
[10] K. Krampis, T. Booth, B. Chapman, B. Tiwari, M. Bicak, D. Field, and K. Nelson, "Cloud BioLinux: pre-configured and on-demand bioinformatics computing for the genomics community," BMC Bioinformatics, vol. 13, p. 42, 2012.
[11] E. Afgan, C. Sloggett, N. Goonasekera, I. Makunin, D. Benson, M. Crowe, S. Gladman, Y. Kowsar, M. Pheasant, R. Horst, and A. Lonie, "Genomics Virtual Laboratory: A Practical Bioinformatics Workbench for the Cloud," PLoS One, vol. 10, no. 10, p. e0140829, Jan. 2015.
[12] E. Afgan, D. Benson, and N. Goonasekera, "Test-driven Evaluation of Galaxy Scalability on the Cloud," in Galaxy Community Conference, 2014.
