Dell’s High Performance Computing Clusters

Author: Lionel Kelly
Summit Strategies QuickTake Product and Strategy Overview

Who would ever have thought that Dell, with a product line consisting of x86-based servers that scale only up to 4-way configurations, would become a market leader in high-performance supercomputing? Surely, this cannot be true. Everyone knows that high-performance supercomputers must be built using tightly coupled, monolithic, single-system designs, right?

According to IDC statistics, Dell is now number three in “bright cluster” revenue (a bright cluster is a pre-configured, high-performance computing (HPC) configuration that behaves and is sold like a monolithic system). And Dell is number one in node volume! This means that Dell has figured out how to make HPC a volume-based business, exactly the kind of business that Dell has become extremely proficient in exploiting. Further, due to significant price advantages and straightforward deployment packaging, former buyers of big iron, monolithic systems are now buying into Dell’s HPC cluster design architecture.

How did Dell manage this feat? What does Dell’s high performance computing cluster (HPCC) architecture look like? How does Dell’s approach compare to its competition? These are but a few of the questions that we consider as we examine Dell’s strategy and product offerings in this HPCC Product and Strategy Overview.

Dell in High Performance Computing: The Big Picture

At an industry/financial analyst event in New York City on April 2, 2003, Michael Dell, CEO of Dell Inc., and Larry Ellison, CEO of Oracle, revealed their combined vision of the future of information systems design. They described an architecture based on the use of standards-based, low-cost 2-way and 4-way clustered server building blocks capable of delivering a combination of unlimited scalability and strong application/database performance to information technology (IT) systems buyers. They claimed that this small server/cluster design architecture would rival traditional vertically scaled SMP systems in terms of overall performance, at a significantly lower price.

As it turns out, they were quite correct in their assertion. Clustered, standards-based, building block architectures can now challenge traditional multiple-processor, tightly coupled, vertically scaled computing designs. As of June 2005, over 300 of the world’s top 500 supercomputers use a clustered rack-mounted server approach, with some achieving teraflop processing speeds (a teraflop represents a trillion floating-point calculations per second). These rack/cluster designs now directly rival multiple-processor/single-system designs in terms of overall compute power. Yet these rack/cluster designs cost significantly less than their big iron brethren, are significantly easier to expand and can easily incorporate older servers into clusters (providing ongoing investment protection).

August, 2005

©2005 Summit Strategies


Dell has “bet the farm” on this building block approach to scalability (what Dell calls “the scalable enterprise”). And Dell’s performance in HPC is one indication that its technology strategy is succeeding.

Market Positioning

In information technology markets, Dell is primarily known for producing high-quality, standards-based PCs and servers, using volume manufacturing as a means to create a distinct competitive price advantage. Until recently, Dell has not been known as a supercomputer provider. However, with big wins at the National Center for Supercomputing Applications (NCSA), the University of Sherbrooke, Brigham Young University and other research institutions, Dell has established a solid presence in the high-performance supercomputer market. In fact, Dell has now deployed twenty-one of the top 500 supercomputers in the world according to Top 500 Supercomputers’ June 2005 survey (survey results can be found at http://www.top500.org/).

Competitive Positioning

High-performance computers are designed to maximize processing speed in order to solve problems that are either very complex or massive in nature. Typical applications include modeling (for instance, weather forecasting), animated graphics, nuclear research, seismic research, and oil and gas exploration. Systems at the highest end of the HPC marketplace (positions 1-19 of the rankings) use monolithic, multiple-processor/tightly coupled single-system designs or clusters of such designs featuring IBM Power, Intel Itanium or NEC 64-bit microprocessors. But starting at position twenty, clustered rack systems composed of multiple industry-standard x86-based processors in 2-way, rack-mounted configurations start to appear en masse. (Intel processors can now be found in 333 of the world’s top 500 supercomputers.)

One of the key differences between large monolithic and clustered x86 HPC systems is price. For example, one of the world’s largest supercomputers, Japan’s Earth Simulator, cost $350 million to deploy and delivers 35.86 teraflops of performance. By contrast, the 9.81-teraflop clustered system architected by Dell for the NCSA costs around $14 million when fully deployed.
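The price gap is easier to judge when normalized by performance. The short Python sketch below divides each quoted deployment cost by the quoted teraflop rating; the function and variable names are illustrative only, and the calculation ignores factors the text does not quantify (software, facilities, operating costs). On these figures the Earth Simulator works out to roughly $9.8 million per teraflop and the NCSA cluster to roughly $1.4 million per teraflop, about a sevenfold difference.

```python
# Figures quoted in the text: deployment cost in US dollars, performance in teraflops.
systems = {
    "Earth Simulator": {"cost_usd": 350e6, "teraflops": 35.86},
    "NCSA cluster (Dell)": {"cost_usd": 14e6, "teraflops": 9.81},
}


def cost_per_teraflop(cost_usd, teraflops):
    """Return deployment cost in US dollars per teraflop (illustrative helper)."""
    return cost_usd / teraflops


# Cost per teraflop for each system, keyed by system name.
per_tf = {name: cost_per_teraflop(**spec) for name, spec in systems.items()}

# How many times more the monolithic system costs per teraflop.
ratio = per_tf["Earth Simulator"] / per_tf["NCSA cluster (Dell)"]

for name, value in per_tf.items():
    print(f"{name}: ${value / 1e6:.2f} million per teraflop")
print(f"Earth Simulator costs about {ratio:.1f}x more per teraflop")
```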

Another important difference among supercomputers is which operating environments they use. Unix used to dominate the supercomputer marketplace—but over the past few years Linux has made a strong showing.

Product Offerings

A close look at Dell’s high-performance computing clusters reveals that Dell uses a building block approach to configuring turnkey HPC hardware solutions. Furthermore, Dell has done a stellar job pre-testing packaged HPCC solutions.


Dell HPCC: The Basic Design

Figure 1 provides an overview of Dell’s HPCC design. Its basic components consist of 1U (1.75 inches tall) rack-mounted or blade servers interconnected using a choice of three high-speed intra-cluster communications switches (InfiniBand, Gigabit Ethernet or Myrinet), all controlled by a master node that provides management functions.

Figure 1: Dell’s High-Performance Computing Cluster Design

Its compute nodes consist of Dell PowerEdge SC1425 (dual Xeon, 1U), PowerEdge 1850 (dual Xeon, 1U), and/or blade-based PowerEdge 1855 servers. Dell has been able to optimize the performance of these servers in rack/blade configurations. Accordingly, it offers the fastest Xeon-based HPCC configuration on the market today according to the June 2005 Top 500 Supercomputer list.

Its storage technologies include SCSI and Fibre Channel for attachment to Dell PowerVault or Dell/EMC storage. Intra-cluster switch connections include Gigabit Ethernet, InfiniBand and Myrinet.

Dell’s HPCC design enables a master node to manage clusters using a KVM (keyboard, video, mouse) switch, an out-of-band management network (in order to activate Dell Remote Assistant Cards or the Baseboard Management Controller for management purposes), and/or an in-band management network that provides an additional means of monitoring and managing compute nodes.

A Complete, Integrated Infrastructure Stack

But there’s more to building an HPC environment than interconnecting compute, storage and management nodes. Depending upon the type of application a buyer wishes to run, numerous other decisions need to be made in order to build an HPC system optimized to produce the fastest results. Some of these decisions include: which operating environment to use; what level of fault tolerance is required; which compilers, debuggers, math libraries and performance tools should be used to optimize application performance; how jobs should be scheduled and controlled; which file system should be used; and which management tools and applications are needed.

A close examination of Dell’s HPCC design shows that Dell supports two operating environments and provides several different options for middleware (especially message passing interface), development/tuning tools, schedulers, file systems and management tools (see Figure 2).

Operating system support includes Linux and Windows, although all of Dell’s major HPCC installs are Linux today (we expect Dell to become far more aggressive in Windows HPCC when Microsoft releases its Windows Server 2003 Compute Cluster Edition later in 2005).

Message passing interface options that help users utilize the aggregate computational power include MPICH, MPICH-GM, LAM/MPI, and PVM (parallel virtual machine). Support for these options is extremely important in environments where applications are processed in parallel and need to be executed in an efficient and scalable manner.
Application optimization and performance tuning tools include various compilers and math libraries, as well as an MPI analyzer/profiler, a debugger, and other performance analyzers and optimizers.

Like the above components, job scheduling is extremely important in massively parallel computing environments. Dell uses open source and third-party independent software vendor (ISV) products (such as Platform Computing’s LAVA and LSF scheduler and middleware) for scheduling and managing jobs efficiently across multiple nodes.

From a file systems perspective, Dell’s HPCC environment supports numerous file systems, including NFS, PVFS (parallel virtual file system), and the IBRIX Fusion file system. For more details on several of these file systems, visit: http://www.dell.com/downloads/global/power/ps2q05-20040179-Saify-OE.pdf.


Figure 2: Beyond the Hardware: A Complete Infrastructure Stack

In short, Dell has assembled (and tested) a complete ecosystem of infrastructure components needed to build and deploy integrated, high performance computing clusters.

Platform Rocks for Easy Deployment

For some IT buyers, having access to numerous components that have been tested and verified to run on Dell servers only partially meets their needs. The really hard work in building an HPC cluster (and often the majority of the cost) is in performing component integration, deployment, and performance tuning and optimization. Many of these customers would prefer to use a pre-integrated infrastructure/toolset to build their HPC environment.

To address this issue, Dell has partnered with Platform Computing (a leading grid middleware maker) to market and support a standardized, integrated cluster management environment known as Platform Rocks. Platform Rocks is based on the National Partnership for Advanced Computational Infrastructure (NPACI) Rocks toolkit developed by the San Diego Supercomputer Center. Platform Rocks is a comprehensive cluster management toolkit that blends open source clustering solutions with commercial closed-source solutions in order to enable IT architects to rapidly assemble, integrate and manage large-scale Linux cluster environments. And best of all, Platform Rocks is free of charge for customers that have a Red Hat Enterprise Linux license (although an Annual Cluster Care package that includes value-added services, 24/7 commercial support and upgrades is available on a chargeable basis from Platform Computing).


Dell’s Real Value: Pre-Tested/Validated Packaging

As mentioned, Dell has established HPC as a volume business, exactly the type of business at which Dell excels. By combining validated sets of components into pre-packaged HPC clustered solutions, Dell has found a way to simplify the design, ordering and deployment of high-end, scalable Linux clusters, and thus change the HPC business from a custom design business to a volume-oriented business.

Dell’s basic pre-packaged HPCC solutions use rack or blade servers, a choice of high-speed intra-cluster interconnects, and a management console. With this basic design, Dell is able to create 8, 16, 32, 64, 128 and 256-node PowerEdge rack server bundles that support both IA32 and EM64T for 32-bit and 64-bit computing; or 10, 20, 40, 70, 130 and 260-node PowerEdge blade server configurations. Upon this hardware base, Dell provides its OpenManage suite of management software, its Baseboard Management Controller (for remote server management), as well as intelligent platform management interface (IPMI) capability.

Red Hat Linux is the only operating system used in Dell’s HPCC pre-tested bundle. Upon this operating environment, Dell packages a few of Platform Computing’s products, including Platform Rocks for ease of deployment and LSF for job scheduling. Dell also supplies systems monitoring tools, the IBRIX Fusion cluster file system and integrated MPI for message passing, as well as other related drivers.

Dell also offers HPCC professional services, including both pre-sales and post-sales support. Finally, for those customers who want to roll their own clusters, Dell also provides custom HPCC professional services. Dell’s packaging and support offerings are illustrated in Figure 3.

Figure 3: Dell’s HPCC Packaging
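Because the bundles come in fixed sizes, sizing a cluster amounts to rounding a required node count up to the next available bundle. The Python sketch below illustrates that lookup using the rack and blade bundle sizes listed in the text; `smallest_bundle` is a hypothetical helper written for this illustration, not a Dell ordering tool, and it assumes a buyer picks a single bundle rather than combining several.

```python
# Bundle sizes (in nodes) as listed in the text.
RACK_BUNDLES = [8, 16, 32, 64, 128, 256]
BLADE_BUNDLES = [10, 20, 40, 70, 130, 260]


def smallest_bundle(nodes_needed, bundles):
    """Return the smallest listed bundle with at least nodes_needed nodes.

    Returns None when the requirement exceeds the largest listed bundle.
    Hypothetical helper for illustration only.
    """
    for size in sorted(bundles):
        if size >= nodes_needed:
            return size
    return None


# Example: a workload sized at 50 compute nodes.
print(smallest_bundle(50, RACK_BUNDLES))   # rack bundle that covers 50 nodes
print(smallest_bundle(50, BLADE_BUNDLES))  # blade bundle that covers 50 nodes
```

For a 50-node requirement this selects the 64-node rack bundle or the 70-node blade configuration; a requirement above the largest listed size returns `None`, which is where a custom design would come in.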


Summary Observations

Over the past two years Dell has made strong progress in the HPC marketplace. The company now has twenty-one Linux HPCC clusters ranked in the world’s top 500 supercomputers. Dell is also experiencing great success with smaller configurations in markets such as life sciences (for instance, in drug discovery), financial services (including banking and financial analysis), manufacturing (including automotive and other complex designs), and energy (seismic analysis, oil and gas exploration).

One factor contributing to Dell’s success in these fields is that the company has spent a lot of time and effort making it easy for its customers to design, deploy and manage their HPCC environments. Dell was the first major computer maker to create packaged HPC bundles. And the company’s investment in creating pre-packaged HPC solutions, combined with simplified ordering, easy deployment and ongoing support, is paying back in increased market share.

But HPCC buyers must be aware that hardware speed and pricing are not the whole game when it comes to high-performance computing. They must factor in additional costs for software applications that can exploit the speed of the underlying hardware.

Still, Dell’s lead in HPC is not a given. While the vendor has systems design, deployment, tuning and testing expertise, it relies heavily on partners for middleware and application design/deployment. Market leadership will require Dell to grow the size of its HPC professional services organization. And while Dell has partnered extremely closely with Platform Computing, it has not partnered as closely with other leading grid middleware vendors (such as DataSynapse, United Devices, etc.). These other vendors bring expertise in financial and scientific markets, and could help Dell expand its market reach.

When all is said and done, Dell has done a remarkable job “reinventing” the HPC marketplace. Its Xeon-based HPC clusters are best-in-class from a performance perspective, its building block approach is creative and its packaging is quite innovative. We also expect vast improvements in overall system performance with the advent of dual-core Xeons by year end. With this performance edge and its innovative packaging, we expect that Dell will greatly improve its HPC market share over the forthcoming year.

Summit Strategies (617) 266-9050

© Summit Strategies All rights reserved


Summit Strategies QuickTakes are high-level overviews of recent announcements, industry trends or product positioning overviews. They are not intended to provide the level of analysis or depth of a vendor's broad overall strategy or the competitive environment that we provide in full Summit Strategies reports. www.summitstrat.com
