Grid Computing: Addressing Today's Business Issues

A Viewpoint by Deloitte Consulting LLP

Grid Computing: The Next Step
Addressing Today's Business Issues with Grid

During the 1990s, productivity growth accelerated and has remained historically high, driven in part by developments in information technology and business processes. This boom in IT has helped produce productivity gains and new market opportunities for the world's major economies. Some have gone so far as to call the boom a technology revolution but, in fact, it seems to be more of an evolution. And the evolution continues as organizations look again to IT to address issues of efficiency, speed, and cost savings in their supporting infrastructure. Grid computing could be the next step in dramatically improving IT operations and leveraging existing capability to support the enterprise.

Ask any group of IT professionals what grid computing is and what benefits it provides, and you will get a wide range of definitions and opinions. Everyone agrees that, for IT to realize its promise of streamlining business processes, it needs to deliver business solutions with greater flexibility and speed while optimizing resources. To quote an old axiom, IT needs to lead the way in helping businesses do "more with less." Many organizations are moving in this direction and away from a hardware-based, or supply-side, perspective. What is needed are demand-side solutions, that is, better alignment between business and IT. Grid computing offers the flexibility to direct computing resources to support the business, making underutilized capacity available to process critical business transactions and to support new IT-enabled business solutions.

Defining Grid Computing

At its most basic, grid computing can be defined as a virtual environment in which an application is no longer reliant on a single, self-contained system, but rather has access to the computing resources it needs, as those resources are needed. Grid environments are collections of dispersed resources that can be shared to help address specific business requirements.

Grid computing is often compared to the electrical grid. We take what is needed, and only what is needed, to perform a specific task, be it providing light, listening to music, or drying our hair in the morning. The rest is there to be used by others. Grid is also frequently compared to, or even considered similar to, other technologies:

Virtualization
  Similarities: A grid enables components to be viewed as a similarly available pool of resources; work can be performed on any of them in a like manner.
  Differences: Virtualization is usually confined to a single system, or to a set of similar components managed to change overall unit size. Unlike virtualization, a grid pulls together heterogeneous components from anywhere, even globally, to form an IT service delivery pool.

Clusters
  Similarities: Clusters and grids both bring computing resources together to solve a single problem and can have single management tools for all the components.
  Differences: Clusters are typically made up of components with similar hardware and software, while grids are made of disparate hardware, possibly separated by vast distances and running different operating environments.

Peer-to-peer
  Similarities: Both can share files and communicate directly with one another, sometimes through the implementation of a central broker.
  Differences: Grids can use many-to-many relationships to share a variety of resources other than just files.

The web
  Similarities: Like the web, a grid keeps all the complexity hidden; it connects dissimilar resources to each other and enables a single unified view of the environment.
  Differences: A grid allows machines to work together and collaborate rather than just talk to each other.



While compute grids (the sharing of CPUs) are what most people mean by "grid," there are actually many more implementations of grid technologies. Information grids, for example, provide virtualized, federated, and consolidated views of enterprise information, while other grids share memory and network resources; even this is not a comprehensive list. For the purposes of assessing the highest impact on business, we'll limit our discussion to computational grids.

The Value of Grid

Imagine a server requiring many minutes, or even hours, at full capacity to perform a particularly complex and critical calculation whose results are needed right away. Now imagine that server being able to access the idle capacity of another server that is not operating at its peak. Add in the computing capability of dozens, hundreds, or even thousands of machines and you can see the real value of grid computing: the ability to do things that could not previously have been considered, and in less time.

The value is not in doing one small piece more efficiently; it is the net result of the overall workflow that can deliver a real business difference. As organizations evaluate the business lifecycle and identify the component processes that are part of their automated support systems, those component processes, each with its own workload attributes, represent functions or services that deliver end value to the business. By applying grid technology to some or many of those components, the overall time to completion can be shortened. All of these "little" items add up: they have upstream and downstream correlations and impact other processes.


In our experience, grid clients have found that by taking typically long-running processes, cutting delivery time through grid computing, and transforming those processes into real-time transactions, new revenue could be generated. Conversely, clients in other situations discovered that no matter how much hardware was thrown at the problem, they were not achieving the desired results in a timely fashion. The latter is an example of a typical "supply-side" solution. Grid computing is a "demand-side" solution, addressing the business environment's demand for greater computing power and speed.

Such improvements enable business transformation without reengineering the overall process. This is the real "aha" that leading companies discover with grid computing. They realize it is much more than driving up server utilization, although that might have been the starting point. The value comes from working more efficiently, optimizing existing resources, and delivering on your service promises now, rather than hours from now. Bottom line: the analysis to identify these opportunities can be performed in a few days to a couple of weeks with the right participants, and the benefits can continue for years to come.

The "Other" Workload Pattern

There are two main types of business process patterns that grid computing technology supports. These patterns are typically components or subcomponents of a larger workload. "Parallel execution" is the first of the two, and the one most people think of when considering grid. In reality it represents only about 20 percent of where grid technology can provide value.

The other workload pattern is "high concurrency." High concurrency patterns are typical request/respond processes in which multiple requests for similar functions or services are made. Examples include an application server environment, a publish/subscribe (pub/sub) environment, or an enterprise server environment. Eighty percent of observed grid implementations are of this pattern, yet initial grid projects rarely consider these operations applicable to grid-enablement.

"Parallel execution" is one-to-many: one request results in many tasks that are spread out and executed over a broad environment. The workload is fairly predictable. Every hour, day, week, or month, a business process is performed and resources are consumed while performing a particular analysis, running a report, doing a batch job, or running other processes of this nature. These workloads can be real-time or batch, with a typical work pattern of load, execute, read, and write. They are measured in seconds and frequently run over minutes or hours, sometimes even longer. When you apply grid to such parallel processing models, the objective is typically to collapse execution times and to drive a higher load on resources: to use more of the available capacity and leave fewer idle assets.
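The one-to-many scatter/gather shape of parallel execution can be sketched in a few lines. This is a minimal, single-machine illustration using threads as stand-ins for grid nodes; the function and parameter names are hypothetical, not part of any grid product.

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_chunk(chunk):
    # Hypothetical unit of work, e.g., one slice of a risk calculation.
    return sum(x * x for x in chunk)

def run_on_grid(data, n_workers=4, chunk_size=1000):
    # One request fans out into many independent tasks (one-to-many);
    # on a real grid each task could land on any available node.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(analyze_chunk, chunks)
    # Gather: combine the partial results into the final answer.
    return sum(partials)

print(run_on_grid(list(range(10_000))))
```

The business-level point is in the structure: the split, the independent execution, and the gather are generic, so collapsing elapsed time is a matter of adding workers rather than rewriting the calculation.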


Across industries, high-performance computing in the form of parallel processing is being performed over grids. In financial services it includes risk analysis, valuations, and simulations; in electronics, semiconductor design; in industrial manufacturing, crash and clash simulation; in life sciences, drug discovery research; in energy, seismic processing and reservoir simulation; in utilities, statement generation; and in entertainment and media, graphics processing and rendering over grid-enabled server infrastructure.

There are many other grid opportunities that are overlooked. For example, there are opportunities to apply grid to unstructured data processing. This may take the form of PDF rendering, converting XML data into and out of databases, or creating compressed (ZIP) files to reduce file size before network transmission.

Though it is not typically considered, "high concurrency" can achieve similar or even greater benefits from a grid computing environment. While parallel execution represents the one-to-many model, high concurrency is the many-to-one model. Simply put, high concurrency is multiple requests for the same or similar service (versus the same service making many requests). Requesting a bank balance is one example: the requester does not care who else may be making such a request, only that his or her request is satisfied in the expected timeframe. While that request is being served, a thousand other requests might arrive, all having to be satisfied at the same level of service. These tasks or services are typically short-lived, and their volume may be very volatile and unpredictable. They might last milliseconds but arrive at very high volumes. Thus, it is important that the service or application not be loaded every time a request is made.

Loading the application or service with every request would result in constant reprovisioning of the service to meet each request. The related cost, overhead, and latency can create significant performance issues, and as volumes peak the situation only gets worse by multiplying the latency. Grid computing provides a solution: grid nodes can maintain their "state" for a predefined duration, and additional nodes can be provisioned if loads exceed certain thresholds. In other words, grid management tools can ensure that the code to be executed remains loaded even if there are no immediate requests.

In both models (parallel execution and high concurrency), creating a solution with horizontal scalability, as in grid computing, provides value to the business. Rather than having to upgrade larger SMP servers at considerable expense, the grid can grow its capacity through the addition of lower-cost servers, such as blades. As opposed to the traditional model of shared computing, the grid processing model allows for improvements in granular growth, application scalability, resiliency, and the tying of business needs to IT performance. Furthermore, this growth can be obtained through idle resources or from third parties.

The objective of looking at any new technology should be to answer the question, "how can improvements be made to change the way the organization processes, services, bills, and supports its users and their applications?" Addressing this organizational issue can determine where grid technology can most effectively be applied.
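The "keep the service warm, add nodes past a threshold" behavior described for high concurrency can be sketched as follows. This is an illustrative toy, not a real grid middleware API; the class names, the fake service, and the threshold are all hypothetical.

```python
class WarmNode:
    """A grid node that loads its service once and keeps it resident."""

    def __init__(self):
        self.service = self._load_service()  # startup cost paid once, not per request

    def _load_service(self):
        # Stand-in for expensive startup: loading code, caches, connections.
        return lambda account: {"account": account, "balance": 100.0}

    def handle(self, request):
        return self.service(request)         # no reload, no reprovisioning

class GridPool:
    """Provisions additional warm nodes when the backlog crosses a threshold."""

    def __init__(self, scale_threshold=10):
        self.nodes = [WarmNode()]
        self.scale_threshold = scale_threshold

    def dispatch(self, pending_requests):
        # Scale out while the backlog exceeds the per-node threshold.
        while len(pending_requests) > self.scale_threshold * len(self.nodes):
            self.nodes.append(WarmNode())
        # Round-robin the requests across the warm nodes.
        return [self.nodes[i % len(self.nodes)].handle(r)
                for i, r in enumerate(pending_requests)]

pool = GridPool(scale_threshold=10)
results = pool.dispatch([f"acct-{i}" for i in range(45)])
print(len(pool.nodes), len(results))  # a backlog of 45 provisions 5 warm nodes
```

The point the sketch makes is the one in the text: the expensive load happens once per node, not once per request, so peak volumes multiply node count rather than latency.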

Grid is Now

We often hear that grid is not ready for the production environment. Yet leading companies are using grid computing in production today, in environments measured in tens of thousands of nodes, or CPUs, spanning multiple continents. Even now, grid environments are on par with traditional enterprise application servers and growing. Grid has become a better alternative for service execution than traditional application server environments, largely because of its horizontal scaling, inherent resiliency, and dynamic alignment to business priorities. The application server environment requires that applications execute on specific servers and therefore depends on their availability; growth typically comes through upgrades of individual servers, and workload management is a labor-intensive activity.

Grid middleware is most often the software enabler of a grid environment. Some companies have created custom solutions that they consider grid solutions. Custom grid solutions are usually written directly into an application to provide grid-like benefits. These take considerably longer and cost more to develop and maintain than commercially available middleware. The one-off nature of a custom solution may also prevent moving to an enterprise-wide solution and potential inclusion in a utility environment. Proceeding along a strategy of custom-developed grid solutions might maintain technology silos, perpetuate inefficiencies, and result in continued pools of unused capacity.


Grid middleware typically sits in the application layer and is not the resource management play many consider it to be. There are many tools in the marketplace today that perform resource management functions independently of grid. Grid middleware provides additional business value when integrated with resource management tools, enabling further optimization of IT performance against business rules.

Single-application or departmental grid environments certainly provide value as point solutions, but building multiple grids in multiple organizational silos simply maintains inefficiencies. Only when organizations broadly share and pool resources, and consider grid as part of a shared service, can all the IT and business benefits be achieved. Enterprise grid deployment strategies are where the IT leaders are today. Getting "server-huggers" to share, and getting to an enterprise-enabled architecture, is a political and emotional battle, not a technical one.

Getting From Here to There

IT organizations have traditionally built out their infrastructures (and for the most part still do) from the supply side up, providing hardware and software on which to run applications without fully considering the application environment. The IT organization tries to improve on this today through server consolidations that expand utilization by focusing on tools that partition the hardware into more granular components. By implementing a "service-contract view," organizations can realize the benefits of a more holistic approach from the demand side: under what rules and schedules the business processes need to complete.


Different users have different needs at different times of the day (or month, or year) and ask for different tasks to be executed. Those tasks might be one-to-many, many-to-one, or both. IT needs to take that demand and apply the business rules to ensure the needed supply is available. This ensures that supply is matched to demand based on the rules: who someone is, what they are, and what resources they need. No longer would the limitations of a user's hardware impact performance; in the grid environment the limitations are the same for everyone, imposed by the business rather than by hardware.

Designing a comprehensive set of rules helps put in place a grid service fabric that enables the creation of a virtual execution platform tied to the operational needs of the business. The business requirements would no longer contain technical specifications; instead, the operational needs would reflect the business-critical success factors. With such an environmental design, working from demand down becomes truly possible. The rule definition provided by grid middleware drives real gains in productivity.

Frequently there are critical business functions, such as the front and middle office in a financial services company, where IT (application or infrastructure) is sometimes viewed as an obstacle. In reality, IT is what enables a company to compete effectively, or not, and it can have a real impact on company margins. What is needed is simply a way to supply all the IT that business demand dictates. And the answer, in our experience, is grid.

Defining the Rules: Service Level Agreements

SLA definition is an important component in designing a grid environment. When resources can be shared by anyone (as opposed to being dedicated to a single process), identifying the workflow process limitations, the composition of the workload, the resource requirements, and the constraints the workload can take advantage of becomes critical to ensuring the right activities take place at the right times. A "processing channel" can be created that is defined by business rules, processing rules, resource rules, and priority rules. Tasks can be time driven, event driven, or on-demand service driven, but the rules need to be identified and defined so they can be properly managed and fulfilled to meet the business requirements.

From the IT supply side, what is needed is an understanding of the state of the infrastructure, what is being requested, and the types of I/O necessary to optimize application performance and resource utilization while leveraging a shared-services infrastructure. Without the right analysis and planning, simply enacting naive priorities might mean low-priority tasks are never executed at all while high-priority tasks have access to the entire grid. The appropriate balance needs to be established, one that maps to the priorities of the business.

SLA management in a grid environment is a very important focus area because it provides the ability to dynamically alter the configuration of the environment. Grid offers this flexibility. The "dials" that can be used to tune and adjust for demand, based on time of day, day of the week, or other events, further enhance performance. A "new" environment and set of configurations, based on provisioning or changing SLAs, can assure that work is performed in the proper sequence.
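One way to avoid starving low-priority work, while still favoring critical channels, is to allocate each workload class a weighted share of grid slots instead of using strict priority. The sketch below is illustrative only; the channel names and weights are hypothetical, and the largest-remainder rounding is just one reasonable choice.

```python
def allocate_slots(total_slots, channels):
    """channels: {name: weight}. Returns {name: slots} using
    proportional shares with largest-remainder rounding, so every
    channel with a nonzero weight gets a guaranteed share."""
    total_weight = sum(channels.values())
    exact = {n: total_slots * w / total_weight for n, w in channels.items()}
    slots = {n: int(v) for n, v in exact.items()}
    leftover = total_slots - sum(slots.values())
    # Hand any remaining slots to the channels with the largest fractions.
    for n in sorted(exact, key=lambda n: exact[n] - slots[n], reverse=True)[:leftover]:
        slots[n] += 1
    return slots

# Hypothetical processing channels with priority expressed as weights.
channels = {"trading-risk": 6, "batch-reports": 3, "housekeeping": 1}
print(allocate_slots(100, channels))
```

Under this scheme a burst of high-priority work enlarges its share but never drives any channel's allocation to zero, which is the balance the SLA rules are meant to enforce.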


The result of this type of analysis, which all businesses should be performing on a regular basis, is an understanding that the old methods of supplying IT services no longer work in today’s dynamic environment. The SLAs that are developed need to be about more than just “availability,” and also should include business rules and metrics supporting the needs of the enterprise, such as when exceptions should be made and what specific events could cause priorities to change. For anyone who doubts the value of grid technologies, this exercise very quickly justifies and explains why grid computing is not some future technology but rather a competitive advantage that must not be postponed.

The Benefits of Hardware Reprovisioning

Most IT organizations traditionally over-provision for peak periods. Some servers are in place only to handle an annual, limited set of tasks and are otherwise idle the rest of the year. When the processing environment is siloed, the excess-capacity issue can be compounded: most likely there are redundant systems at the ready for the "just in case" need but idle most of the time. All of these resources can be used in the grid until their original purpose or triggering event occurs. Automated tools can be used to drive up supply efficiency, improve overall application performance, and do the same work with fewer resources than are on the floor now.

Grid's management components can make requests to provisioning tools for additional resources. When SLAs approach their set limits, and demand is identified as growing beyond what is currently available, additional supply can be provided in order to fulfill the SLAs. Once the workload is completed, the provisioning tools return the resources to a defined state, or the resources can maintain their current configuration. This temporary, or on-demand, scaling out is only now available as a result of the combination of defined business rules, the virtualization aspects of grid, and the current state of provisioning tools.

Think of the possibilities. Business volumes can double in this model without purchasing more hardware; and when business volumes decline, the resources work on something else or are returned. Companies have computing environments with capacities designed to handle volumes seen for just a few weeks, yet idle (and being paid for) 11 months of the year. Many tax preparation systems and online sales systems are designed for exactly such short-lived, high-volume resource needs.

In addition, grid and provisioning tools can supply the resources without the cost overhead. In the old computing model, such growth would require not only the hardware expense but, more importantly, the larger expense of additional system administrators. The grid management tools available today enable system administrators to work smarter and at a higher level, with the tools handling the mundane provisioning and configuration tasks. It is a completely new way to provide flexible capacity while keeping costs in line with current business revenue.
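The SLA-driven scale-out/scale-in decision can be reduced to a small control rule. The sketch below is a hypothetical controller, not any vendor's provisioning API; the thresholds, doubling policy, and node limits are illustrative assumptions.

```python
def desired_nodes(current_nodes, response_ms, sla_ms,
                  scale_at=0.8, min_nodes=2, max_nodes=64):
    """Return the node count a provisioning tool should target.
    Scale out when response time approaches the SLA limit; hand
    nodes back once demand leaves ample headroom."""
    headroom = response_ms / sla_ms
    if headroom > scale_at:            # SLA at risk: double capacity
        return min(max_nodes, current_nodes * 2)
    if headroom < scale_at / 2:        # plenty of headroom: release half
        return max(min_nodes, current_nodes // 2)
    return current_nodes               # within band: leave as-is

print(desired_nodes(8, response_ms=450, sla_ms=500))   # 450/500 = 0.9 > 0.8: 16
print(desired_nodes(16, response_ms=120, sla_ms=500))  # 0.24 < 0.4: release to 8
```

The essential property is symmetry: the same rule that borrows capacity when the SLA is threatened also returns it afterward, which is what turns over-provisioned fixed capacity into an on-demand pool.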

Taking the First Step

Most of the issues and activities presented so far focus on understanding the business and when and how it consumes IT resources. Ultimately it is from the business aspects that the greatest benefits are obtained. To get there, however, the first steps are very much technology focused.

From an IT perspective, implementing a grid solution can be fairly simple and straightforward. The path to grid-enablement is simple in the micro and should be thoughtfully planned in the macro. Individual legacy applications can be grid-enabled in a few hours at one extreme, or a few months at the other, when major architectural, platform, and language changes are part of the project. Initial grid projects usually run four to six weeks.

Grid-enabling an application is not about rearchitecting; it is about altering where process flow components are executed. The easiest application to add to a grid is one that is already distributed (SOA or Web services), where value is achieved quickly. Other legacy applications, written in almost any language, can also achieve results quickly depending on language and workflow patterns. It is the rare situation where an entire application is sent across a grid; usually only certain business logic and data services are extracted and sent to a different execution point for the request and fulfillment of that service. Yet even environments as obscure as a Smalltalk CICS transaction running under z/OS, or APL running under Solaris, have been initial grid projects that yielded dramatic business results.


Where to Start Looking

There are some basic considerations for selecting the initial applications for a grid project:

• Applications that run for hours, or transaction tasks taking longer than 30 to 40 milliseconds, are primary candidates for processing on the grid. The results can be dramatic and achieved quickly.

• If an application's processing volumes are high and/or growing, the application should be considered a candidate for running over the grid. In most cases, throwing hardware at the problem is the traditional solution; grid can stop and even reverse that trend very quickly, creating a smaller environment that produces better results.

• Grid can help address compliance requirements (Sarbanes-Oxley, Basel II, SEC filings) that are driving data volumes and processing to new highs. These must be managed within specific timeframes, at considerable expense, and grid can help by guaranteeing execution at a lower cost.

• Large cost-cutting consolidation or rightsizing efforts can be made more effective by considering the automated workload management and scheduling that grid middleware provides. Data on applications, server locations, and utilization are already collected as part of such efforts, so the grid work can be less costly here than in other environments.

Applying grid concepts will not only improve the project results but will also position the organization to take advantage of broader resource sharing, inherent business resiliency, and readiness for moving to a utility environment for peak processing needs.

Grid Enabling an Enterprise

As soon as two or more grid projects have been completed, the enterprise needs to begin to break what we call the "server-hugging" mentality and start crossing organizational boundaries with IT components. When the political and emotional issues associated with this mentality are addressed, small internal utilities (obtaining computing capacity from resources other than your own) can begin to be achieved on a small scale.

Imagine two grid-enabled applications on different continents. A "follow-the-moon" model, i.e., work free of time constraints, can be created. Enterprises have huge capacities in "sleeping" servers, the idle resources of business lines that are closed for the evening, just waiting to be used. In many cases the delays caused by increased network latency are completely overshadowed, sometimes by orders of magnitude, by the decrease in elapsed time. The consideration needs to be the overall performance improvement, not the delay increase in a component function. Does it really matter that half the execution time of a new, global, grid-enabled two-hour process is network latency when the original job took 18 hours?

Create a plan that considers all business applications and when, and whether, they can be put onto the grid. Analyze the basic business processes through an application portfolio review and determine response time and time to results on current and parallel workloads. Look at what serial workloads could be turned into parallel workloads. Investigate load and throughput measurements: when and how often do performance peaks and lulls take place? What do those demands look like? Look at the supporting workflows and decompose them to determine bottlenecks and areas that need to be addressed.
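The latency trade-off above is worth making explicit with numbers. The figures below simply restate the 18-hour versus two-hour example from the text; the exact split between compute and latency is an illustrative assumption.

```python
# Back-of-the-envelope check: added wide-area latency is dwarfed by the
# reduction in elapsed time when idle overseas capacity is used.
original_hours = 18.0        # single-site serial run
grid_compute_hours = 1.0     # work spread across remote idle nodes
grid_latency_hours = 1.0     # extra network latency (half the new total)

grid_total = grid_compute_hours + grid_latency_hours
speedup = original_hours / grid_total
print(f"grid total: {grid_total:.0f}h, speedup: {speedup:.0f}x")  # 2h, 9x
```

Even with latency consuming fully half of the new elapsed time, the process still finishes nine times faster, which is why the component-level delay is the wrong thing to optimize.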

Then look ahead to what the next 12 months hold from a volume perspective; after all, the plan is to address future demand, not the problems of the past. Take a look from the business side: with a business that wants to grow, IT needs to provide the support to perform faster and cheaper. With the creation of an enterprise virtual resource pool, the organization will have the ability to do more, and to do it better. In the old model, server size and cost might have prohibited the creation of applications needed by the business; now, without the physical limitations of servers, these issues can be addressed. Identify the what-if business situations and understand how the resulting demand will be satisfied. For example, what if business volumes double? How will that demand be fulfilled? Fulfilling it might not require the purchase of additional resources at all.

The Utility Computing Model

By decoupling applications from servers and creating pools of shared resources, where applications execute becomes transparent to them. No longer does an application have to run on a specific server; it can run on any appropriate server, regardless of location. When the multiple groups within an enterprise start sharing resources and operating like utilities and customers, the next logical step is to include third parties as providers of computing capacity.


Current capacity is most likely at a level to handle some predicted growth above the prior peak load. The hardware, software, maintenance, and people costs associated with this capacity are incurred at a fixed rate: whether the business demand for this supply occurs for just four weeks or all year long, the costs remain fairly fixed.

In the grid computing environment, the 80/20 rule applies very well. Plan an infrastructure that can handle the capacity needs for 80 percent of the business demand, and as the need for additional resources occurs, look to third parties to provide it. This ties costs to new business opportunities or revenue growth and helps maintain profitability, and when the need goes away, the infrastructure is not left with idle resources and expenses. Moving to a true utility model creates a variable-cost model, which is certainly more attractive than the fixed-cost model it replaces. Additionally, should demand grow beyond the anticipated peaks, the link to third-party on-demand capacity providers means seamless processing of unanticipated volumes. There are a number of providers offering capacity on demand, in many cases at less than a dollar per CPU-hour.

Some businesses are even looking to establish relationships with companies in different industries that have mutually exclusive peak computing needs. By creating a utility with the right terms between two such companies, both can achieve substantial financial benefit: the supplier receives incremental revenue to offset the fixed costs of its idle resources, and the consumer pays attractive competitive rates that are less than the total cost of ownership of the equivalent installed hardware and software.
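The fixed-versus-variable cost argument can be illustrated with simple arithmetic. All figures below are hypothetical, chosen only to echo the "four-week peak" and "under a dollar per CPU-hour" points from the text; they are not benchmarks.

```python
# Illustrative comparison: own capacity for the peak year-round, versus
# owning the 80% baseline and renting the peak from a utility provider.
OWNED_COST_PER_CPU_YEAR = 2000.0   # hardware, software, people (fixed)
UTILITY_COST_PER_CPU_HOUR = 0.90   # hypothetical on-demand rate

baseline_cpus = 800                # handles ~80% of annual demand
peak_extra_cpus = 200              # needed only during the peak
peak_weeks = 4

# Model 1: own everything, pay fixed costs all year.
fixed_model = (baseline_cpus + peak_extra_cpus) * OWNED_COST_PER_CPU_YEAR

# Model 2: own the baseline, buy peak CPU-hours only when used.
utility_hours = peak_extra_cpus * peak_weeks * 7 * 24
utility_model = (baseline_cpus * OWNED_COST_PER_CPU_YEAR
                 + utility_hours * UTILITY_COST_PER_CPU_HOUR)

print(f"own-the-peak:    ${fixed_model:,.0f}")
print(f"80/20 + utility: ${utility_model:,.0f}")
```

Under these assumptions the utility model saves the full annual carrying cost of the 200 peak-only CPUs and replaces it with a four-week rental bill, and that rental cost disappears entirely in a year without the peak.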

Competitive Advantage

Grid computing is not some great panacea that will solve every business and IT challenge. It fits within a certain area of business workflow processes and does very well there. Even those who have worked with grid, and are using it in production, are barely exploiting all the benefits it can provide, and even the leaders in this space have yet to fully recognize the broad applicability of follow-on uses.

To achieve the greatest advantage, now is the time to look at grid for your organization. In a few short years most organizations will have grid environments. The benefits can be realized by implementing early and gaining the competitive edge now instead of playing catch-up later.

Grid computing is all about an evolutionary path from where you are today to a more dynamic, flexible business model enabled by new technologies. Grid is a business activity, not a technology play. Understanding the difference and creating appropriate projects can open dramatic business opportunities.

The grid journey begins with learning and the enabling of a single application. Then more enabled applications are added to the grid, and the definition of policies and practices establishes orderly and logical sharing. The results are dramatic cost reductions and increased market competitiveness. Servers are no longer idle. Business processes are completed in less time. Transaction volumes can increase while infrastructure costs are reduced. Service level agreements are met and surpassed. The business transforms. Capacity at this point can come from anywhere, and perhaps not simply from the least expensive source, but from the one that meets the deadlines set not only by the company but by its regulators.


About the Author

Lucian Lipinsky de Orlov
Deloitte Consulting LLP
Tel: +1 203-905-2679
Email: [email protected]

Lucian Lipinsky de Orlov is a Senior Manager in Technology Integration at Deloitte Consulting. He is an expert on grid and utility computing issues and has assisted clients in architecting and executing their grid computing strategies. Mr. Lipinsky de Orlov has written and spoken internationally on the business benefits grid computing can provide and how to obtain them. He holds a bachelor's degree in computer science and a master's degree in advanced technology from Binghamton University.


About Deloitte

Deloitte refers to one or more of Deloitte Touche Tohmatsu, a Swiss Verein, its member firms and their respective subsidiaries and affiliates. As a Swiss Verein (association), neither Deloitte Touche Tohmatsu nor any of its member firms has any liability for each other's acts or omissions. Each of the member firms is a separate and independent legal entity operating under the names "Deloitte," "Deloitte & Touche," "Deloitte Touche Tohmatsu," or other related names. Services are provided by the member firms or their subsidiaries or affiliates and not by the Deloitte Touche Tohmatsu Verein. Deloitte & Touche USA LLP is the US member firm of Deloitte Touche Tohmatsu. In the US, services are provided by the subsidiaries of Deloitte & Touche USA LLP (Deloitte & Touche LLP, Deloitte Consulting LLP, Deloitte Financial Advisory Services LLP, Deloitte Tax LLP, and their subsidiaries), and not by Deloitte & Touche USA LLP.

Copyright © 2005 Deloitte Development LLC. All rights reserved.

Member of Deloitte Touche Tohmatsu