GEMFIRE GRID COMPUTING USE CASE
THE ENTERPRISE DATA FABRIC
GemFire Enterprise Data Fabric for Grid Computing

To exploit fluctuating markets, manage growth, and outrun competitors, today's companies bank on information technology to keep global customers, business partners, and information workers connected in real time, with the end goal of maximizing operational efficiency.

As a way of dramatically enhancing operational efficiency and business agility while lowering TCO, IT departments are discovering Grid Computing. Grid Computing aggregates all IT resources into an enterprise-wide, virtual resource pool. Resources can then be dynamically provisioned across lines of business to accommodate fluctuating load and to reliably fulfill demanding service level agreements. With Grid technology, IT departments maximize the utility of their current IT assets, thus minimizing the need for future IT spending and ultimately enabling businesses to deliver their most consistent, highest quality of service.

Conceived as a powerful approach to simulation and problem solving, Grid computing has spread from its historical roots in the world of scientific computation to the world of business, where it is quickly flourishing as a competitive way to speed embarrassingly parallel applications in risk management, options pricing, and trading. Today, Compute Grids are surfacing in investment banks, brokerage houses, and trading floors, utilizing spare CPU cycles from available nodes and then applying aggregate compute capacity to speed business operations. But the expansion of Grid technology into application domains beyond embarrassingly parallel computation is limited today by a simple reality: the absence of high performance data services for Grid Computing.

WHAT IS GRID COMPUTING?
The integration of various techniques to address the problem of efficiently utilizing dispersed resources is called a "grid", a name arising by analogy with the grid that supplies ubiquitous access to electric power. Grid Computing is a term coined to describe the aggregation of large amounts of computing resources, which can be geographically dispersed, to tackle large problems and workloads as if all the servers and resources were located at a single site. Grid Computing enables businesses to form virtual, collaborative organizations that share applications and data in an open, heterogeneous server environment in order to work on common problems.

FILLING GAPS IN THE GRID
Nearly all business processes depend on reliable, fast access to shared data. As the ability to move computation on demand throughout the enterprise increases with the rise of Grid computing, so too does the need for distributed access to shared data. High performance business processes depend on matching computation with concomitant working data sets and on sharing the results transparently across the enterprise, where subsequent workflow logic can be executed. Business processes thus require load balancing data across available storage, as well as replicating data to sites where computational resources can be used most efficiently (to minimize waits before jobs are run and to cut down on network utilization).
High performance data services address the quality-of-service requirements currently limiting Grid Computing:
• From a performance viewpoint, the Grid needs data locality. The proximity of data to computation is paramount for speeding computation, minimizing network bandwidth, and enabling JIT information integration. Data services enable the co-location of business computation with business data that is essential for high throughput.
• From a reliability viewpoint, the Grid needs high-availability guarantees. Fine-grained RAID-like mirroring at the object-graph level is needed for data safety and availability. Data services provide replication capabilities vital for redundancy and load balancing.
• From a scalability viewpoint, the Grid needs ways to keep larger and larger data volumes operationally on tap for ready access by distributed business processes. By caching sedentary data trapped in high latency storage devices and transforming data from arcane formats, data services enable scale-out of datasets into an operational fabric that spans the Grid.

High performance data services for the Grid manage important runtime quality-of-service issues and create design-time efficiencies. Exposing data sources as services (that is, network-enabled resources that can deliver capabilities through messaging) enables Service-Oriented Architectures (SOA). SOA efficiencies are based on the economies of scale rooted in homogeneity. Services provide a well-defined contract for identifying, discovering, invoking, and provisioning any IT asset. Because of this consistent representation, IT resource pools can be created in SOA runtime environments that respond dynamically to requested load. This creates a layer of abstraction between the capabilities proffered by IT services and the dynamic, underlying implementation. This abstraction decouples application clients from service providers, allowing the service providers to advertise service level agreements with varying quality of service. From a data management perspective, exposing data sources as services likewise decouples computation from data dependencies, enabling application logic and data sources to be modified separately.

A service-oriented view of data ultimately creates data virtualization. Data virtualization offers location transparency, enabling application clients to discover and access data independent of the physical location of the data. The data may be local or remote to the requesting client application process. Data virtualization also provides heterogeneity transparency, enabling applications to remain insulated from underlying raw data formats. Thus, by defining a common, unified data access interface, data services ultimately enable simplified programmatic access, faster development times, and an extensible, future-proof architecture.

EMERGING DATA STANDARDS FOR THE GRID
GemStone Enterprise Data Fabric (EDF) Grid Computing
The GemFire EDF provides high performance caching, high availability, and data distribution capabilities for a grid. GemFire combines memory from physically distributed Grid nodes into a single, extensible, enterprise-wide distributed cache, enabling any process to reliably share, store, replicate, transform, route, and synchronize large volumes of data across the Grid in real time. By reliably caching and integrating dynamic events with static data from multiple Grid data sources, and by repeatedly serving up that information at high speed from synchronized copies placed near information stakeholders, GemFire dramatically speeds up query performance, improves resiliency, and conserves system resources. By centralizing integration logic and insulating business processes from underlying changes in data formats, GemFire ensures optimal reuse of unified data views and minimizes brittle, point-to-point connections to origin data sources, ultimately reducing code complexity, time-to-market, and IT costs.
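To make the idea of one logical cache built from the memory of several Grid nodes more concrete, the following Python sketch may help. It is only an illustration under stated assumptions and is not GemFire's API: the `Node` and `FabricCache` classes, the hash-based placement, and the `copies` parameter are all hypothetical names chosen for this example.

```python
import hashlib
from typing import Any, Dict, List, Optional


class Node:
    """One Grid node contributing local memory to the shared cache (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, Any] = {}
        self.alive = True


class FabricCache:
    """Presents the memory of several nodes as one logical key/value cache."""

    def __init__(self, nodes: List[Node], copies: int = 2) -> None:
        if not 1 <= copies <= len(nodes):
            raise ValueError("copies must be between 1 and the number of nodes")
        self.nodes = nodes
        self.copies = copies

    def _owners(self, key: str) -> List[Node]:
        # Assumption: hash the key to pick a primary node, then treat the
        # following nodes (wrapping around) as mirrors of that entry.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        start = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.copies)]

    def put(self, key: str, value: Any) -> None:
        # Write every live copy so that losing one node does not lose the entry.
        for node in self._owners(key):
            if node.alive:
                node.store[key] = value

    def get(self, key: str) -> Optional[Any]:
        # Callers never name a node: the first live copy that holds the key answers.
        for node in self._owners(key):
            if node.alive and key in node.store:
                return node.store[key]
        return None
```

A caller would create a few `Node` objects, wrap them in `FabricCache(nodes, copies=2)`, and then use `put` and `get` with plain keys. Marking the primary owner of a key as not alive leaves the entry readable from its mirror, which is the location transparency and redundancy the text describes, in miniature.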
Technical benefits of GemFire include:
• Data locality: removes latency; replicates data to idle CPUs; minimizes network utilization; offloads EII federation work
• On-demand data provisioning: dynamic provisioning of caching service instances on any Grid node to accommodate load-balancing requests and to enable intelligent scheduling of work jobs to specific data sets
• Dynamic data partitioning: enables management of large volumes of data on distributed nodes with dynamic scale-out or shrinkage of memory based on need
• Data virtualization: single-system-image access to data via a logical namespace; aggregate view of data with automated consistency
• Scale-out: increased operational access to large volumes of data offloads long-running, expensive analysis tasks
• Notification: complex event processing framework enables signaling between workflow activities
• Security: SLAs can guarantee user account sandboxing
• High Availability: RAID-like mirroring at the object level, avoiding the need for costly data reconfiguration during hardware servicing

GEMFIRE AND COMPUTE GRIDS
GemFire EDF complements Compute Grids by serving as a high-speed operational fabric for storing, transporting, synchronizing, and reusing data specifically routed to individual compute task nodes. Without high performance data services, Compute Grid applications degenerate into problem set domains which require little or no data, or problem sets which require manual ftp-like solutions to the transportation and synchronization of compute data. GemFire creates transparent access to shared data that can be easily load-balanced across a Compute Grid, providing distributed caching to support both intra-node and inter-node collaboration.

[Figure: GemFire aggregates OGSA Grid Data Services. GemFire can access data from anywhere on the Grid without having to know physical resource location. Labels: GemFire; Grid Data Service (two instances); Replication.]

GemFire works synergistically with Compute Grids, providing the following benefits:
• Data-Aware Routing. A Compute Grid job scheduler connected to GemFire can manage the initial distribution of data among processing units to ensure that required data is present or en route as jobs are dispatched. As tasks finish, the scheduler can either direct new jobs to processing units that are already equipped with the right data, or it can initiate additional data distribution as needed. A scheduler able to exploit detailed knowledge of data locality can minimize the total data distribution load on the network by sending only needed information instead of supersets, and it can maximize processor utilization by reducing the time processors must spend waiting for data.
• Data-based Dynamic Computation Control. Compute Grid processing nodes plugged into GemFire can share intermediate results and progress metrics with one another and with the job scheduler. This enables fine-grained global adjustment of computational tasks for maximum efficiency. In the context of Monte Carlo simulations, early and efficient feedback may make it possible to remove unprofitable paths from consideration to save time. For example, the job scheduler might elect to forgo assigning certain tasks, or it could even choose to abort unpromising
jobs already in progress. Furthermore, application code can be modified to distribute progress information via GemFire on a periodic basis. This enables the job scheduler to build detailed knowledge (for example, about specific server characteristics) very early in a processing run so that subsequent job assignments can be made with increasing intelligence.
• Non-blocking Interactions with Clients. It may be undesirable for clients to block while waiting for long Compute Grid jobs to finish. This situation can be easily avoided if a client is connected to GemFire. The client need only spawn a separate thread that waits on a GemFire cache listener, which will be triggered by the scheduler on job completion.

[Figure: How GemFire Complements Compute Grid. Labels: Clients; Workload Manager (Globus, Platform Computing, Data Synapse); Engines 1 to 3, each running a Task; GemFire regions (Region 1 in, Region 1 out, Region 2 in, Region 2 out, ..., Region N in, Region N out). Callouts: stores calculation data for input and output; direct client access to results; control plane for signaling job completion.]

GEMFIRE AND GRID INFORMATION INTEGRATION
For wide-area information integration challenges found on the Grid, real-time connectivity to origin data sources, commonly used by federation technologies like EII (Enterprise Information Integration), can be expensive, brittle, and incur unacceptable latency. Live connections to databases are susceptible to network partitions, leading to inconsistent response times and quality of service. Database resources can also be overwhelmed by voluminous ad-hoc integration requests. GemFire provides traditional EII solutions with a complementary integration strategy by enabling data locality. For long-running data analysis queries and data mining operations, GemFire can replicate Grid data sources to offload EII services, ultimately delivering better responsiveness and more consistent quality of service. Data freshness can be controlled by declarative synchronization policies and cache loaders.

[Figure: How GemFire Aggregates Information Across the Grid. Labels: Client Systems (remote access by any line of business); Internet; Firewall; Front Office; Aggregated Market Data; Grid Data Services GDS 1 Stocks, GDS 2 Currency, GDS 3 Counterparty, GDS 4 Bank Identification, GDS 5 Custodians.]

SUMMARY
It's an exciting time for information technology. The confluence of web services, virtualization, and service-oriented architectures is leading to a new paradigm in resource utilization and program development called Grid computing. The ultimate goal of Grid computing is simple: to increase operational efficiency by making IT cheaper, better, and faster. CIOs are banking on Grid technology to lower TCO by optimizing IT resource utilization, to improve customer experiences through dynamic personalization, and to maximize business velocity through instant global access to integrated information.
GemFire EDF provides high performance data caching, distribution, and replication for Grids, helping to transform promises into reality. GemFire virtualizes Grid data resources, allowing them to be transparently accessed from remote clients and to be provisioned on demand. GemFire maximizes data resource utilization across the Grid, ultimately leading to better quality of service and allowing IT planners to dynamically accommodate changing load and to satisfy their most competitive service level agreements. Scalability, high throughput, resource distribution, remote access, and data replication are key ingredients for a reusable data service for high performance Grid applications. GemFire high performance data services ultimately enable businesses to execute, analyze, and adapt their most data-intensive processes to next-generation Grid architectures.

[Figure: GemFire complements existing IT ecosystems. Labels: Grid Engines; Grid Data Service; GemFire; Cache Loader; C/C++ Apps; App Servers (EJB/Servlet); Workflow (BPEL); XML; Java; RDB; EII; JMS Messaging; CRM/SCM/ERP; Reference Data; Historical Data; Market Data; Prejoined Data; Materialized Query Tables; Unstructured Content.]
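The data-aware routing described under GEMFIRE AND COMPUTE GRIDS (send each job to a processing unit that already holds the required data, and ship only what is missing) can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not GemFire or any scheduler's actual interface: `pick_engine`, `dispatch`, and the dictionary that tracks which data sets each engine holds are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def pick_engine(
    required: Set[str],
    engine_data: Dict[str, Set[str]],
    idle: List[str],
) -> Optional[str]:
    """Return the idle engine that needs the fewest additional data sets."""
    best: Optional[str] = None
    best_missing: Optional[int] = None
    for engine in idle:
        # Count the required data sets this engine does not hold yet.
        missing = len(required - engine_data.get(engine, set()))
        if best_missing is None or missing < best_missing:
            best, best_missing = engine, missing
    return best


def dispatch(
    required: Set[str],
    engine_data: Dict[str, Set[str]],
    idle: List[str],
) -> Optional[Tuple[str, Set[str]]]:
    """Assign a job to an engine and report which data sets must be shipped."""
    engine = pick_engine(required, engine_data, idle)
    if engine is None:
        return None  # no idle engine; the caller would queue the job
    held = engine_data.setdefault(engine, set())
    to_ship = required - held   # only the missing data, not a superset
    held |= to_ship             # record that the engine now holds it
    idle.remove(engine)
    return engine, to_ship
```

With engines `e1` holding `{"rates"}`, `e2` holding `{"rates", "vols"}`, and `e3` holding nothing, a job that needs both data sets goes to `e2` with nothing to ship. A second identical job goes to `e1`, and only `"vols"` is shipped. Ties are broken by the order of the idle list, which is an arbitrary choice made for this sketch.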
WHAT IS OGSA?
Open Grid Services Architecture (OGSA) is a standard Grid system architecture based on web services, jointly proposed by the Globus Project and IBM. OGSA integrates Grid computing and Web services technologies by using the Web Services Description Language (WSDL) to achieve self-describing, discoverable services and interoperable protocols, with extensions to support multiple coordinated interfaces and change management. Within OGSA, everything is represented as a Grid service: a Web service that provides a set of well-defined interfaces and that follows specific conventions. The interfaces address discovery, dynamic service creation, lifetime management, notification, and manageability; the conventions address naming and upgradeability. Grid services are not only a static set of persistent services; they can also be transient service instances such as a query against a database, a data mining operation, a network bandwidth allocation, a running data transfer, or an advance reservation for processing capability. There may be one or more instances of a particular Grid service.
Corporate Headquarters: 1260 NW Waterhouse Ave., Suite 200 Beaverton, OR 97006 | Phone: 503.533.3000 | Fax: 503.629.8556 |
[email protected] | www.gemstone.com
Regional Sales Offices:
New York | 90 Park Avenue 17th Floor New York, NY 10016 | Phone: 212.786.7328
Washington D.C. | 3 Bethesda Metro Center Suite 778 Bethesda, MD 20814 | Phone: 301.664.8494
Santa Clara | 2880 Lakeside Drive Suite 331 Santa Clara, CA 95054 | Phone: 408.496.0242
Copyright © 2005 by GemStone Systems, Inc. All rights reserved. GemStone®, GemFire™, and the GemStone logo are trademarks or registered trademarks of GemStone Systems, Inc. Information in this document is subject to change without notice. 09/05