Power Lock Dr. Arshad Ali & Tahir Azim
Grid Computing:
The concept of Grid computing has been in the making since the 1960s, but it was not until the late 1990s, with the advent of faster networks and a better understanding of distributed computing, that some of those concepts were realized. Even now, the world’s computer networks are primitive compared to other networks, such as the electrical power grid, the telephone network, or radio and television networks, which have matured over a long period to provide pervasive access to their resources. Contemporary computer networks have yet to deliver this kind of pervasive access to computing power to people’s homes.
The term “Grid computing” is borrowed from the electric power “grid”: it promises to provide a ubiquitous source of computing power for users throughout the world, just like the electricity delivered by power grids. Grid computing aims at sharing the computing resources of organizations and individuals across the globe to create a single, vast source of computing power. At the same time, it stresses the security of these shared resources, to prevent unauthenticated access and abuse.
In a way, Grid computing is an extension of the World Wide Web (WWW). The WWW is currently the world’s source of information, text, images, and multimedia. The Grid aims to become the world’s source of raw computing power and other computing facilities. The WWW expanded to revolutionize the way we share information and do business. Similarly, the Grid is now expected to radically transform the way we do our computing - instead of doing our work on our local desktop machines, we will be able to compute on a world-wide Grid, without having to know where our jobs are actually being executed.
Grid computing concepts may be applied to a world-wide distributed set of computing systems, or to a smaller set of computing elements such as a blade server farm located within one organization.
Either or both of these topologies can be used, depending on the computational task at hand and the advantages we are aiming to achieve. The former can act as a source of computing power for users all over the world who cannot afford computers of such power themselves. The latter can be used, for example, by a software company that runs compute-intensive tasks daily and therefore needs a high-performance computing system that can cheaply provide the processing power it requires.
Grid computing is about getting computers to work together. Almost every organization is sitting on top of enormous, unused, widely distributed computing capacity. Mainframes are idle 40% of the time. UNIX servers are actually "serving" something less than 10% of the time. And most PCs do nothing for 95% of a typical day. Imagine an airline with 90% of its fleet on the ground, an automaker with 40% of its assembly plants idle, a hotel chain with 95% of its rooms unoccupied. With grid computing, businesses can optimize computing and data resources, pool them for large-capacity workloads, share them across networks, and enable collaboration.
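To get a feel for how much capacity those idle percentages represent, the arithmetic can be sketched as below. The idle fractions are the ones quoted above (with 90% taken as a lower bound for UNIX servers); the machine counts and relative power figures are purely hypothetical and only serve to illustrate the calculation.

```python
# Idle fractions quoted in the text: mainframes 40%, UNIX servers at least 90%,
# PCs 95%.
IDLE_FRACTION = {"mainframe": 0.40, "unix_server": 0.90, "pc": 0.95}

# Hypothetical inventory for one organization. Counts and relative compute
# power (arbitrary units per machine) are assumptions, not figures from the
# article.
INVENTORY = {
    "mainframe": {"count": 2, "power": 50.0},
    "unix_server": {"count": 20, "power": 4.0},
    "pc": {"count": 500, "power": 1.0},
}


def idle_capacity(inventory, idle_fraction):
    """Return (total, idle) compute capacity in the same arbitrary units."""
    total = sum(m["count"] * m["power"] for m in inventory.values())
    idle = sum(
        m["count"] * m["power"] * idle_fraction[kind]
        for kind, m in inventory.items()
    )
    return total, idle


total, idle = idle_capacity(INVENTORY, IDLE_FRACTION)
print(f"total: {total:.0f} units, idle: {idle:.0f} units ({idle / total:.0%})")
```

Under these assumed numbers, most of the organization's capacity sits unused, which is the pool a Grid would try to make available.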
Powering science
Applications of Grid computing in science and academia have already started to appear in several fields. Grid computing can be of special importance to Pakistan, since our scientists, engineers and technicians currently lag far behind in applying science and technology for the benefit of society, owing to the absence of a research culture in our universities. An important reason for this is that research and development requires a large amount of resources, including lots of computing equipment and expensive hi-tech equipment that our country can ill afford. Grid computing can step in to give scientists a way to carry out heavy computational tasks that previously had to be run on supercomputers or mainframes. Some of the fields where Grids are envisaged to play an important role are described below:
High Energy Physics (HEP) is one of the prime drivers of Grid technology. HEP is getting ready to face the greatest data deluge in history, as the Large Hadron Collider (LHC) comes online in 2006-2007. This machine, itself a marvel of modern engineering, is expected to generate petabytes (10^15 bytes) of data each year. Grid computing will be the technology used to handle the storage and analysis of this data. This will involve shipping the data to remote locations around the world for storage, cataloging it for fast subsequent retrieval, and then replicating and analyzing smaller subsets of it to reconstruct the events taking place within the LHC. Physicists throughout the world, as well as in Pakistan, will get a unique opportunity to participate in original research with their international counterparts, because they will have access to actual data from these physics experiments without having to build the machines that carry them out.
In this way, Grid computing provides a great window of opportunity for Pakistani physicists to participate in cutting-edge research activities for their country.
Astronomy is another field where interesting work is in progress using the Grid as an enabling technology. The universe is a complicated place, as shown by the images generated by modern telescopes such as the Hubble Space Telescope. Storing the data generated by these research facilities, processing it, and deducing results from the analysis is a complicated process requiring huge amounts of data storage and processing power. This has been made possible by the power of Grid computing. It also makes it possible for astronomers who do not have access to such high-tech, expensive research equipment to receive astronomical data and engage in active research.
Combinatorial Chemistry is another field that is being greatly influenced by Grid technologies. Chemistry has close links with 3D graphics and visualization, because molecular structures, especially those of organic compounds, have complex 3D representations. Computing the exact geometry of a molecule from raw experimental data is a highly computation-intensive task, and Grid computing is well suited to it and is already proving its capabilities. Comb-e-Chem, a part of the UK eScience program, is an example of a Grid project for combinatorial chemistry.
Genetics is another field that requires tremendous amounts of computing power to unravel its secrets. Discovering the exact genetic map of human DNA (the human genome) is a very complex task that cannot be carried out on a single computer, as it involves searching through billions of possibilities to find an exact match. Similarly, finding the structures of the various proteins needed by the human body involves finding the correct match among hundreds of billions of combinations of the twenty or so basic amino acids. Grid computing is now making this possible by providing a way to utilize the world’s vast computing resources for the task. Discovery Net and MyGrid of the UK eScience initiative are Grid projects aimed at biological research. In a country like ours, where we cannot afford the biological equipment needed for this research, this is an exciting prospect: our doctors and biologists can participate in the latest experiments by getting data from equipment deployed in developed countries, and then processing and analyzing it on local Grids and compute server farms. This will enable our biologists to become true genetic and biological researchers in their own right.
Military Applications:
Grid computing has opened up a new era of computing applications for the military. A vivid example of the application of Grids to military requirements is the SF (Synthetic Forces) Express software developed at the California Institute of Technology from 1996 to 1998. In 1998, this software was able to simulate a war game comprising 113,000 vehicles in combat simultaneously, using compute clusters at 13 sites spread across the United States. The power of Grid computing enabled the developers to launch jobs at all 13 sites simultaneously, using a script run manually at only one location. This tremendous achievement is just one example of how Grid computing can revolutionize military planning, strategy and decision support. By using the power of hundreds of computers, military analysts and decision makers can simulate entire wars, predict enemy movements, calculate trajectories of enemy missiles, and test hundreds of possibilities in a war-like situation in order to take the correct decision. Even more effective can be the use of computing Grids to crack enemy codes and passwords, or to block enemy propaganda using denial-of-service (DoS) attacks.
As we shall see later, a new generation of games is now coming up that allows thousands of gamers to participate in a single gaming session at a time. The driving force behind these games is again computational and data Grids. War games based on these games can allow the army to train its personnel of all ranks mentally for real-life war scenarios, so that they are well prepared to make correct decisions in critical situations. All these cases clearly show that Grids can play a vital role in training and upgrading our armed forces.
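The idea of launching jobs at all 13 sites from a script run at one location can be sketched as a simple fan-out. This is only an illustration of the pattern, not the SF Express tooling: the site names, the job name and the `submit_job` stub are all hypothetical, and a real Grid would contact each site's job manager instead of returning a string.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical site names; the article says 13 sites but does not list them.
SITES = [f"site-{i:02d}" for i in range(1, 14)]


def submit_job(site, job):
    """Stand-in for a remote submission (assumption: no real Grid API is used).

    It only reports what it would have launched at the given site.
    """
    return f"{job} started at {site}"


def launch_everywhere(sites, job):
    """Start the same job at every site concurrently and collect the replies."""
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        # pool.map returns results in the same order as the input sites.
        return list(pool.map(lambda site: submit_job(site, job), sites))


results = launch_everywhere(SITES, "wargame-sim")
for line in results:
    print(line)
```

One thread per site is enough here because each submission would spend its time waiting on the network rather than computing.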
Grids for the Economy:
A large number of major international software companies have begun to make major investments in Grid computing projects. These companies include Oracle, IBM, Sun and Hewlett-Packard, with many more getting ready to join the fold. By taking on Grid computing as a major development environment, Pakistani companies can quickly grab a significant share of the world market in this field. They can acquire projects with these high-profile companies, and thus start major software businesses in this country. This will also create employment opportunities for the hundreds of computer graduates being churned out by our universities annually. Thus, getting a foothold in the field of Grid computing as soon as possible should be one of the major priorities of Pakistan’s IT companies.
As described in the making of Oracle Database 10g, Grid computing enables enterprises to speed up their workflows and simplify deployment of their resources. Instead of having separate clusters or compute servers for different tasks, the same resources can be used repeatedly for different tasks. At the same time, these resources can also be used as personal workstations. Therefore, the amount of infrastructure needed is reduced. According to GridComputingPlanet.com, some companies in the insurance industry are already utilizing grids to cut the run-time of actuarial programs from hours to minutes, allowing them to use risk analysis and exposure information many times a day versus just once. Similarly, IBM has been able to cut a 22-hour run-time down to just 20 minutes by grid-enabling the application. Grid computing also allows enterprises to breeze through their regression testing cycles, and therefore significantly reduce development time for their products. This allows enterprises to get faster results, thereby increasing productivity.
Pakistan’s small but important aerospace industry can get a major boost by using Grid computing for its computations and simulations, as results will be computed faster, calculations carried out more efficiently, and designs prepared and implemented more effectively.
Rendering for animation and digital movie processing is one of the most time-consuming operations faced by the animation and multimedia industries. Grid computing has recently come up with a solution to this problem as well. An animation company named Axyz Animation used Sun’s Grid Engine along with Globus to reduce its rendering times by almost 80%. By distributing rendering tasks across the animators’ machines as well as dedicated machines, the company was able to cut rendering times from 10 hours to 2-3 hours. This is an excellent opportunity for Pakistani animation and graphics development companies (like PostAmazers and Trivor) to get a huge advantage over their competitors by using Grids to render their work and achieve much faster development times.
A new era of multiplayer gaming is about to start with the launch of Electronic Arts’ “The Sims Online”, which is built on an infrastructure based on Oracle’s Grid technology to allow almost 250,000 gamers to play simultaneously in a virtual world.
With the arrival of this game, and the presence of Sony’s massively multiplayer game “EverQuest”, a new generation of games has arrived, where thousands of gamers can participate in a single game at the same time. (Imagine playing as a single armed unit in Red Alert, in an army comprising thousands of other units, each one a separate human player.) As Pakistani game developers begin to flourish (with the release of several 3D games this year, this certainly is the case), more attention should be paid to the potential of games powered by the Grid, which can be much more powerful and richer than other contemporary games. This will give Pakistani game developers a definite edge in this field.
A major goal of Grid computing is to provide huge amounts of data storage space to meet the requirements of large datasets. Existing distributed databases can scale to a maximum of 100 terabytes, but for databases and files of even larger sizes, no database is available today. With the information deluge expected in the next few years, data storage requirements are bound to increase, and Grid computing is being touted as the way to go for storing and cataloging this data. Data storage services will therefore play an important role in the future of computing. With Pakistani businesses running out of ideas for new kinds of software development, data storage services for local and foreign companies can become another field adopted by Pakistani software companies. Various other kinds of data can be stored as well, such as population and census data, data for the social sciences, and statistical data about homes and families. Grid computing concepts also make fast query processing on such large datasets, and efficient cataloging, possible.
Thus, GIS ventures in Pakistan looking to store geographic and location-specific information, and statistical organizations such as NADRA, can use Grid computing to create an effective computing environment for efficient querying and storage.
Grid computing is all about providing various kinds of services to consumers. Pakistani companies can promote Grid computing in Pakistan (and do business at the same time) by offering various kinds of computing services to common users. Many users and even organizations in Pakistan cannot afford fast computing resources. These users can be served by offering them services to carry out such operations over the Grid. They can then be charged nominally, in proportion to the amount of resources they have utilized. Of course, to make this possible, it is necessary to set up a Grid infrastructure in the country, especially fast networks, so that large amounts of data can be transferred effectively to and from the customers.
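Charging users in proportion to the resources they consume could look roughly like the sketch below. The three metered resources and the rupee rates are hypothetical choices made for illustration; the article does not specify what would be metered or at what price.

```python
from dataclasses import dataclass

# Hypothetical unit prices in rupees; the article gives no actual rates.
RATE_PER_CPU_HOUR = 5.0
RATE_PER_GB_STORED = 0.5
RATE_PER_GB_TRANSFERRED = 1.0


@dataclass
class Usage:
    """Resources one customer consumed during a billing period."""
    cpu_hours: float
    gb_stored: float
    gb_transferred: float


def charge(usage: Usage) -> float:
    """Bill strictly in proportion to each resource consumed."""
    return (
        usage.cpu_hours * RATE_PER_CPU_HOUR
        + usage.gb_stored * RATE_PER_GB_STORED
        + usage.gb_transferred * RATE_PER_GB_TRANSFERRED
    )


small_job = Usage(cpu_hours=10, gb_stored=4, gb_transferred=2)
large_job = Usage(cpu_hours=20, gb_stored=8, gb_transferred=4)
print(charge(small_job), charge(large_job))
```

Because the charge is linear in every resource, a customer who uses twice as much of everything pays exactly twice as much, which is the proportionality the text describes.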
Outlook
The importance of Grid computing in the future of computing can be gauged by looking at some of the projects being undertaken in this field around the world. A brief overview of some of these projects follows:
CERN provides an opportunity for scientists around the world to carry out peaceful research on subatomic particle physics in a fully international environment. Through CERN’s World Wide Web (WWW), particle physicists were the first to collaborate remotely. The WWW made it possible for a scientist working at CERN to maintain contacts with colleagues around the world, and to contribute to software development, data analysis and hardware construction. CERN is now a lead player in Grid computing activities because of its need to meet the data storage challenge posed by the LHC. CERN is keen to share the burden of preparing the grid with developing countries. For example, several LHC Grid packages have been offered to India, and discussions are in progress for the participation of the National University of Sciences & Technology (NUST) and the National Center for Physics (NCP) in LHC grid development activities.
Other corporate Grid projects include:
• IBM IntraGrid, aimed at developing a Grid test bed linking IBM laboratories
• International Virtual Data Grid Laboratory (iVDGL), supported by the National Science Foundation (NSF), to create an international Data Grid to enable large-scale experimentation on Grid technologies & applications
• Oracle 10g, the first Grid-enabled application and database server
• Global Grid Forum, established as a community standards organization for
Grid computing in Pakistan has yet to really take off. Research on Grid computing is limited to a handful of universities. A pioneer of Grid computing in Pakistan is the National University of Sciences and Technology (NUST), whose collaboration with CERN and Caltech has led to a significant amount of research being conducted in this field. NUST is now a Grid node in an international Grid started from Melbourne, Australia. It also has the status of a CMS Production Center, which simulates particle physics events for CERN. Some of the completed projects at NUST include the WISDOM project, which aimed at serialization and migration of database objects in the form of XML across globally scattered Grid nodes, and DiAMoNDS (DIstributed Agents for MObile aNd Dynamic Services), whose objective was to use mobile agents based on Jini to act as mobile services over a WAN or a Grid. Some of the projects currently in progress include:
• JClarens, which aims at creating a Java-based peer-to-peer set of portals for hosting Grid services (especially HEP analysis services), for subsequent lookup and discovery by clients.
• The Semantic Grid project, which is based on the idea that Grid services should be described by semantic descriptions instead of such syntax… The project aims at creating a logical extension to the Semantic Web, an idea launched by Tim Berners-Lee, the inventor of the Web and head of the W3C.
• End-host Monitoring Agent (EMA), which will be used to monitor end hosts and clients connected to the Grid, and report their status information to a monitoring tool such as MonALISA.
• Interactive Analysis Applications for handheld devices, whose objective is to give handheld devices such as the PocketPC the ability to access physics data generated by the Grid, and analyze this data interactively.
• The Network Topology Discovery project, which is meant to provide an application that can quickly describe the topology of an entire WAN.
These projects have enabled NUST to acquire millions of rupees in funding, and allowed it to send several of its students to CERN for knowledge exchange. They have also allowed it to send students abroad for advanced research leading to PhDs in countries including South Korea, Switzerland and England. Another major development is that NUST has now been included in a consortium of universities pursuing research in the analysis of mammograms for breast cancer research using Grids.
The huge advantages and prospective benefits of Grid computing outlined above quite clearly indicate that this is one field that Pakistan cannot ignore. The Pakistan government must try to invest as much as possible in this field by devoting its talent and resources. It should also strive to set up an infrastructure that can facilitate computational and data Grids, especially high-speed data networks and ubiquitous Internet connectivity, to make possible data grids capable of processing and sharing massive datasets. It should also encourage work in this field by promoting it as the next step of IT. Clearly, research and development in this field can be a major source of investment and economic development in this country, as well as encourage the use of computing as a utility among the common public.
References:
LHC Computing Grid: lcg.web.cern.ch/LCG/
Particle Physics Data Grid: www.ppdg.net/
Grid Physics Network: www.griphyn.org/
Comb-e-Chem: www.it-innovation.soton.ac.uk/research/grid/comb_e_chem.shtml
Reality Grid: www.realitygrid.org/information.html
Discovery Net: ex.doc.ic.ac.uk/new/
eDiamond: e-science.ox.ac.uk/public/eprojects/e-diamond/
SF Express: www.cacr.caltech.edu/SFExpress/
Oracle 10g: otn.oracle.com/products/database/oracle10g/
Axyz Animation and Sun Grid Engine: http://www.embeddedstar.com/press/content/2002/7/embedded4468.html
PostAmazers: www.postamazers.com/
The Sims Online and the Grid: otn.oracle.com/products/oracle9i/grid_computing/EA_Grid
Everquest: everquest.station.sony.com/
DataTag: datatag.web.cern.ch/datatag/
International Virtual Data Grid Laboratory: www.ivdgl.org/
Global Grid Forum: www.ggf.org/
EuroGrid & Grid Interoperability (GRIP): eurogrid.org
Distributed Aircraft Maintenance Environment (DAME): www.informatics.leeds.ac.uk/pages/03_research/rp_dame.htm
UK Grid Center: grid-support.ac.uk
Globus: www.globus.org
GridLab: www.gridlab.org
TeraGrid: teragrid.org
Network for Earthquake Engineering Simulation Grid: neesgrid.org
Information Power Grid: www.ipg.nasa.gov
Grid Application Development Software: hipersoft.rice.edu/grads
Grid Research Integration Development & Support (GRIDS) Center: grids-center.org
DOE Science Grid: doesciencegrid.org
DISCOM: www.cs.sandia.gov/discom
Access Grid: www.accessgrid.org