GemFire Enterprise Data Fabric for Grid Computing

GEMFIRE GRID COMPUTING USE CASE

THE ENTERPRISE DATA FABRIC

To exploit fluctuating markets, manage growth, and outrun competitors, today's companies bank on information technology to help them keep global customers, business partners, and information workers connected in real-time with the end goal of maximizing operational efficiency.

As a way of dramatically enhancing operational efficiency and business agility while lowering TCO, IT departments are discovering Grid Computing. Grid Computing aggregates all IT resources into an enterprise-wide, virtual resource pool. Resources can then be dynamically provisioned across lines-of-business to accommodate fluctuating load and to reliably fulfill demanding service level agreements. With Grid technology, IT departments maximize the utility of their current IT assets, thus minimizing the need for future IT spending and ultimately enabling businesses to deliver their most consistent, highest quality-of-service.

Conceived as a powerful approach to simulation and problem solving, Grid computing has sprouted from its historical roots in the world of scientific computation to the world of business, where it is quickly flourishing as a competitive way to speed embarrassingly parallel applications in risk management, options pricing, and trading. Today, Compute Grids are surfacing in investment banks, brokerage houses, and trading floors, utilizing spare CPU cycles from available nodes and then applying aggregate compute capacity to speed business operations. But the expansion of Grid technology into application domains beyond embarrassingly parallel computation is limited today by a simple reality: the absence of high performance data services for Grid Computing.

WHAT IS GRID COMPUTING?

The integration of various techniques to address the problem of efficiently utilizing dispersed resources is called a "grid", a name arising by analogy with the grid that supplies ubiquitous access to electric power. Grid Computing is a term coined to describe the aggregation of large amounts of computing resources, which can be geographically dispersed, to tackle large problems and workloads as if all the servers and resources were located at a single site. Grid Computing enables businesses to form virtual, collaborative organizations that share applications and data in an open, heterogeneous server environment in order to work on common problems.

FILLING GAPS IN THE GRID

Nearly all business processes depend on reliable, fast access to shared data. As the ability to move computation on-demand throughout the enterprise increases with the rise of Grid computing, so too does the need for distributed access to shared data. High performance business processes depend on matching computation with concomitant working data sets and on sharing the results transparently across the enterprise, where subsequent workflow logic can be executed. Business processes thus require load balancing data across available storage, as well as replicating data to sites where computational resources can be used most efficiently (to minimize waits before jobs are run, and to cut down on network utilization).

High performance data services address the quality-of-service requirements currently limiting Grid Computing:

• From a performance viewpoint, the Grid needs data locality. The proximity of data to computation is paramount for speeding computation, minimizing network bandwidth, and enabling just-in-time (JIT) information integration. Data services enable the co-location of business computation with business data that is essential for high throughput.

• From a reliability viewpoint, the Grid needs high-availability guarantees. Fine-grained RAID-like mirroring at the object-graph level is needed for data safety and availability. Data services provide replication capabilities vital for redundancy and load balancing.

• From a scalability viewpoint, the Grid needs ways to keep larger and larger data volumes operationally on-tap for ready access by distributed business processes. By caching sedentary data trapped in high latency storage devices and transforming data from arcane formats, data services enable scale-out of datasets into an operational fabric that spans the Grid.

High performance data services for the Grid manage important runtime quality-of-service issues and create design-time efficiencies. Exposing data sources as services (that is, network-enabled resources that can deliver capabilities through messaging) enables Service-Oriented Architectures (SOA). SOA efficiencies are based on the economies of scale rooted in homogeneity. Services define a contract for identifying, discovering, invoking, and provisioning any IT asset. Because of this consistent representation, IT resource pools can be created in SOA runtime environments that respond dynamically to requested load. This creates a layer of abstraction between the capabilities proffered by IT services and the dynamic, underlying implementation. This abstraction decouples application clients from service providers, allowing the service providers to advertise service level agreements with varying qualities of service. From a data management perspective, exposing data sources as services likewise decouples computation from data dependencies, enabling application logic and data sources to be modified separately.

A service-oriented view of data ultimately creates data virtualization. Data virtualization offers location transparency, enabling application clients to discover and access data independent of the physical location of the data. The data may be local or remote to the requesting client application process. Data virtualization also provides heterogeneity transparency, enabling applications to remain insulated from underlying raw data formats. Thus, by defining a common, unified data access interface, data services ultimately enable simplified programmatic access, faster development times, and an extensible, future-proof architecture.

EMERGING DATA STANDARDS FOR THE GRID

GemStone Enterprise Data Fabric (EDF) Grid Computing

The GemFire EDF provides high performance caching, high availability, and data distribution capabilities for a grid. GemFire combines memory from physically distributed Grid nodes into a single, extensible, enterprise-wide distributed cache, enabling any process to reliably share, store, replicate, transform, route, and synchronize large volumes of data across the Grid in real-time. By reliably caching and integrating dynamic events with static data from multiple Grid data sources, and repeatedly serving up that information at high speed from synchronized copies placed near information stakeholders, GemFire dramatically speeds up query performance, improves resiliency, and conserves system resources. By centralizing integration logic and insulating business processes from underlying changes in data formats, GemFire ensures optimal reuse of unified data views and minimizes brittle, point-to-point connections to origin data sources, ultimately reducing code complexity, time-to-market, and IT costs.
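As a rough illustration of the location and heterogeneity transparency described above, the following minimal Python sketch lets a client request data by logical name only. It is not the GemFire API: the `DataFabric` class, its methods, the `rates` region, and the sample values are all hypothetical, chosen only to show the idea of a single access interface in front of sources with different raw formats.

```python
from typing import Any, Callable, Dict


class DataFabric:
    """Hypothetical facade (not a GemFire class): data is requested by logical name."""

    def __init__(self) -> None:
        self._local: Dict[str, Any] = {}                      # entries already held locally
        self._sources: Dict[str, Callable[[str], Any]] = {}   # region name -> fetch function
        self.remote_fetches = 0                               # trips made to an origin source

    def register_source(self, region: str, fetch: Callable[[str], Any]) -> None:
        """Attach an origin data source to a region of the logical namespace."""
        self._sources[region] = fetch

    def get(self, logical_name: str) -> Any:
        """Return the value for 'region/key'; the caller never learns where it lives."""
        if logical_name in self._local:
            return self._local[logical_name]      # served locally, no network trip
        region, sep, key = logical_name.partition("/")
        if not sep or region not in self._sources:
            raise KeyError(f"no data source registered for {logical_name!r}")
        value = self._sources[region](key)        # fetched from the origin source
        self.remote_fetches += 1
        self._local[logical_name] = value         # keep a local copy for later reads
        return value


# Made-up sample data in a raw CSV format that the client never sees directly.
CSV_RATES = "EUR,1.25\nGBP,1.80"


def rate_from_csv(code: str) -> float:
    """Hypothetical adapter that hides the CSV format behind the fabric."""
    for line in CSV_RATES.splitlines():
        name, value = line.split(",")
        if name == code:
            return float(value)
    raise KeyError(code)
```

A client would call `fabric.get("rates/EUR")` after `fabric.register_source("rates", rate_from_csv)`; the second read of the same name is served from the local copy, which is the effect that data locality is meant to provide.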

Technical benefits of GemFire include:

• Data locality: removes latency; replicates data to idle CPUs; minimizes network utilization; offloads EII federation work

• On-demand data provisioning: dynamic provisioning of caching service instances on any Grid node to accommodate load-balancing requests and to enable intelligent scheduling of work jobs to specific data sets

• Dynamic data partitioning: enables management of large volumes of data on distributed nodes with dynamic scale-out or shrinkage of memory based on need

• Data virtualization: single-system-image access to data via a logical namespace; aggregate view of data with automated consistency

• Scale-out: increased operational access to large volumes of data; offloads long-running, expensive analysis tasks

• Notification: complex event processing framework enables signaling between workflow activities

• Security: SLAs can guarantee user account sand-boxing

• High Availability: RAID-like mirroring at the object level, avoiding the need for costly data reconfiguration during hardware servicing

GEMFIRE AND COMPUTE GRIDS

GemFire EDF complements Compute Grids by serving as a high-speed operational fabric for storing, transporting, synchronizing, and reusing data specifically routed to individual compute task nodes. Without high performance data services, Compute Grid applications degenerate into problem set domains which require little or no data, or problem sets which require manual ftp-like solutions to the transportation and synchronization of compute data. GemFire creates transparent access to shared data that can be easily load-balanced across a Compute Grid, providing distributed caching to support both intra-node and inter-node collaboration.

[Figure: GemFire aggregates OGSA Grid Data Services. GemFire can access data from anywhere on the Grid without having to know physical resource location. Diagram labels: GemFire; Grid Data Service (two instances); Replication.]

GemFire works synergistically with Compute Grids, providing the following benefits:

• Data-Aware Routing. A Compute Grid job scheduler connected to GemFire can manage the initial distribution of data among processing units to ensure that required data is present or en route as jobs are dispatched. As tasks finish, the scheduler can either direct new jobs to processing units that are already equipped with the right data, or it can initiate additional data distribution as needed. A scheduler able to exploit detailed knowledge of data locality can minimize the total data distribution load on the network by sending only needed information instead of supersets, and it can maximize processor utilization by reducing the time processors must spend waiting for data.

• Data-based Dynamic Computation Control. Compute Grid processing nodes plugged into GemFire can share intermediate results and progress metrics with one another and with the job scheduler. This enables fine-grained global adjustment of computational tasks for maximum efficiency. In the context of Monte Carlo simulations, early and efficient feedback may make it possible to remove unprofitable paths from consideration to save time. For example, the job scheduler might elect to forgo assigning certain tasks, or it could even choose to abort unpromising
jobs already in progress. Furthermore, application code can be modified to distribute progress information via GemFire on a periodic basis. This enables the job scheduler to build detailed knowledge (for example, about specific server characteristics) very early in a processing run, so that subsequent job assignments can be made with increasing intelligence.

• Non-blocking Interactions with Clients. It may be undesirable for clients to block while waiting for long Compute Grid jobs to finish. This situation can be easily avoided if a client is connected to GemFire. The client need only spawn a separate thread that waits on a GemFire cache listener, which will be triggered by the scheduler on job completion.

[Figure: How GemFire Complements Compute Grid. Diagram labels: Clients; Workload Manager (Globus, Platform Computing, Data Synapse); Engines (Engine 1, Engine 2, Engine 3, each running a Task); GemFire (Region 1 in, Region 1 out, Region 2 in, Region 2 out, Region N in, Region N out). Annotations: Stores calculation data for input and output; Direct client access to results; Control plane for signaling job completion.]

GEMFIRE AND GRID INFORMATION INTEGRATION

For wide-area information integration challenges found on the Grid, real-time connectivity to origin data sources, as commonly used by federation technologies like EII (Enterprise Information Integration), can be expensive, brittle, and incur unacceptable latency. Live connections to databases are susceptible to network partitions, leading to inconsistent response times and quality of service. Database resources can also be overwhelmed by voluminous ad-hoc integration requests. GemFire provides traditional EII solutions with a complementary integration strategy by enabling data locality. For long-running data analysis queries and data mining operations, GemFire can replicate Grid data sources to offload EII services, ultimately delivering better responsiveness and more consistent quality of service. Data freshness can be controlled by declarative synchronization policies and cache loaders.

[Figure: How GemFire Aggregates Information Across the Grid. Diagram labels: Client Systems (remote access by any line of business); Aggregated Market Data; Front Office (three instances); Internet; Firewall; GDS 1 Stocks; GDS 2 Currency; GDS 3 Counterparty; GDS 4 Bank Identification; GDS 5 Custodians.]

SUMMARY

It's an exciting time for information technology. The confluence of web services, virtualization, and service-oriented architectures is leading to a new paradigm in resource utilization and program development called Grid computing.

The ultimate goal of Grid computing is simple: to increase operational efficiency by making IT cheaper, better, and faster. CIOs are banking on Grid technology to lower TCO by optimizing IT resource utilization, to improve customer experiences through dynamic personalization, and to maximize business velocity through instant global access to integrated information.

GemFire EDF provides high performance data caching, distribution, and replication for Grids, helping to transform promises into reality. GemFire virtualizes Grid data resources, allowing them to be transparently accessed from remote clients and to be provisioned on-demand. GemFire maximizes data resource utilization across the Grid, ultimately leading to better quality of service and allowing IT planners to dynamically accommodate changing load and to satisfy their most competitive service level agreements. Scalability, high throughput, resource distribution, remote access, and data replication are key ingredients for a reusable data service for high performance Grid applications. GemFire high performance data services ultimately enable businesses to execute, analyze, and adapt their most data-intensive processes to next-generation Grid architectures.

[Figure: GemFire complements existing IT ecosystems. Diagram labels: Grid Engines; Grid Data Service; GemFire; C/C++ Apps; App Servers (EJB/Servlet); Workflow (BPEL); XML; Java; Cache Loader (two instances); RDB; Reference Data, Historical Data, Market Data; Prejoined Data, Materialized Query Tables; Unstructured Content; EII; JMS Messaging; CRM/SCM/ERP.]
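The data-aware routing benefit described earlier (direct new jobs to processing units that already hold the right data, and send only the needed information rather than supersets) can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not a GemFire or scheduler API: the function names, the node names, and the idea of tracking each node's data as a set of keys are all hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple, Union


def pick_node(required: Set[str],
              node_data: Dict[str, Set[str]],
              idle_nodes: Iterable[str]) -> Tuple[str, Set[str]]:
    """Choose the idle node that already holds the most required keys.

    Returns the chosen node and the set of keys that still have to be shipped.
    Ties are broken alphabetically so the choice is deterministic.
    """
    candidates = sorted(idle_nodes)
    if not candidates:
        raise ValueError("no idle nodes available")
    best = max(candidates, key=lambda n: len(required & node_data.get(n, set())))
    missing = required - node_data.get(best, set())
    return best, missing


def dispatch(job_id: str,
             required: Set[str],
             node_data: Dict[str, Set[str]],
             idle_nodes: Iterable[str]) -> Dict[str, Union[str, List[str]]]:
    """Plan one job: pick a node, then record that the missing keys were shipped."""
    node, missing = pick_node(required, node_data, idle_nodes)
    # Only the missing keys travel over the network, not the whole data set.
    node_data.setdefault(node, set()).update(missing)
    return {"job": job_id, "node": node, "shipped": sorted(missing)}
```

Because `dispatch` updates `node_data` after each assignment, later jobs that need the same keys are drawn toward nodes that are already "warm", which is the locality effect the scheduler is meant to exploit. A real scheduler would also weigh load, data size, and network distance, which this sketch ignores.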

WHAT IS OGSA?

Open Grid Services Architecture (OGSA) is a proposed standard for a Grid system architecture based on web services, put forward jointly by the Globus Project and IBM. OGSA integrates Grid computing and Web services technologies by using the Web Services Description Language (WSDL) to achieve self-describing, discoverable services and interoperable protocols, with extensions to support multiple coordinated interfaces and change management. Within OGSA, everything is represented as a Grid service: a Web service that provides a set of well-defined interfaces and that follows specific conventions. The interfaces address discovery, dynamic service creation, lifetime management, notification, and manageability; the conventions address naming and upgradeability. Grid services are not only a static set of persistent services; they can also be transient service instances such as a query against a database, a data mining operation, a network bandwidth allocation, a running data transfer, or an advance reservation for processing capability. There may be one or more instances of a particular Grid service.
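Notification appears both among the Grid service interfaces listed above and in the non-blocking client interaction described earlier, where a client spawns a separate thread that waits on a cache listener triggered at job completion. The Python sketch below illustrates that pattern using only the standard library. It is an assumption-laden stand-in, not GemFire's listener API: `ResultRegion`, `watch_job`, and the job identifier are hypothetical names.

```python
import threading
from typing import Any, Callable, Dict, List


class ResultRegion:
    """Hypothetical stand-in for a shared cache region that notifies listeners on put."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._listeners: List[Callable[[str, Any], None]] = []
        self._lock = threading.Lock()

    def add_listener(self, listener: Callable[[str, Any], None]) -> None:
        with self._lock:
            self._listeners.append(listener)

    def put(self, key: str, value: Any) -> None:
        """Store a value, then notify every listener (outside the lock)."""
        with self._lock:
            self._data[key] = value
            listeners = list(self._listeners)
        for listener in listeners:
            listener(key, value)

    def get(self, key: str) -> Any:
        with self._lock:
            return self._data.get(key)


def watch_job(region: ResultRegion, job_id: str,
              on_done: Callable[[Any], None]) -> threading.Thread:
    """Start a background thread that calls on_done(result) once job_id is written.

    The calling thread returns immediately and is free to do other work.
    Results are assumed to be non-None in this sketch.
    """
    done = threading.Event()

    def listener(key: str, value: Any) -> None:
        if key == job_id:
            done.set()

    region.add_listener(listener)
    if region.get(job_id) is not None:   # result may already have been published
        done.set()

    def waiter() -> None:
        done.wait()                      # blocks only this helper thread
        on_done(region.get(job_id))

    thread = threading.Thread(target=waiter, daemon=True)
    thread.start()
    return thread
```

In this arrangement the scheduler's role is played by whoever calls `region.put(job_id, result)`; the client's main thread never blocks, and only the helper thread waits for the completion signal.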

Corporate Headquarters: 1260 NW Waterhouse Ave., Suite 200, Beaverton, OR 97006 | Phone: 503.533.3000 | Fax: 503.629.8556 | www.gemstone.com

Regional Sales Offices:
New York | 90 Park Avenue, 17th Floor, New York, NY 10016 | Phone: 212.786.7328
Washington D.C. | 3 Bethesda Metro Center, Suite 778, Bethesda, MD 20814 | Phone: 301.664.8494
Santa Clara | 2880 Lakeside Drive, Suite 331, Santa Clara, CA 95054 | Phone: 408.496.0242

Copyright © 2005 by GemStone Systems, Inc. All rights reserved. GemStone®, GemFire™, and the GemStone logo are trademarks or registered trademarks of GemStone Systems, Inc. Information in this document is subject to change without notice. 09/05