WHITE PAPER: SERVER MANAGEMENT


Enhancing Availability with Configuration Management Using Veritas™ Configuration Manager to More Effectively Manage the IT Infrastructure and Help Prevent Downtime


Contents

Executive summary
Clustering and availability
Beyond clustering
Enhancing availability with Veritas Configuration Manager
Managing configuration drift
Identifying unknown application dependencies
Providing full-context visibility into change
Example: BEA WebLogic database connection pool configurations in a clustered environment
Conclusion


Executive summary

Large organizations are expected to operate 24 hours a day, seven days a week in today’s global economy. Any unplanned downtime in mission-critical environments is costly in terms of business productivity and revenue.

There are several approaches to meeting the challenge of application availability. Nearly all employ fault-tolerant and redundant hardware components. And many approaches include more flexible and advanced clustering technologies in which several servers (or nodes) are configured so that each node in the cluster can take over in the event of partner node failures. An example of such clustering software is Veritas™ Cluster Server, the number one cross-platform clustering software solution on the market according to a 2005 IDC study.* All of these approaches are targeted at business-critical application environments, specifically, individual components of those environments, such as the primary database or application server.

Current approaches for maintaining the availability of business-critical applications protect against application, server, storage, network, and even site failures, but they are not designed to tackle other threats that cause downtime. Manual changes to the infrastructure, for example, are one of the leading causes of application downtime today and often increase the amount of time required to repair outages. Accordingly, a comprehensive configuration management strategy that targets changes and controls their adverse impact must be a part of the availability solution.

Veritas Configuration Manager provides three ways to gain the data and visibility required to more effectively manage change and maintain consistency in modern data centers and reduce application downtime:

• Allows system administrators to proactively identify configuration drift between nodes in a cluster, live servers and their “ideal state” gold standards, and each promotional environment (Dev to QA, QA to UAT).
• Helps identify potentially unknown and undocumented resource dependencies to support proper availability planning for all resources involved in delivery of the application.
• Provides real-time, full-context visibility into the changes occurring in the environment so potential problems can be identified before customers report them and so the business can meet audit obligations.

Incorporating a comprehensive configuration management solution into the IT environment allows IT managers to operate with confidence. Applications fail less often, and when they do, they fail over successfully every time. Recovering from disrupted states becomes easier, and future disruptions can be avoided.

* IDC, “Worldwide Clustering and Availability Software 2005 Vendor Shares”, #203676, 10/2006


Clustering and availability

Application and server solutions can be employed to address high availability needs at various levels of the infrastructure. For example, at the lowest level, providing limited availability, a server can be used with disk mirroring to enable some redundancy in terms of swappable hard disks. To achieve higher availability, many organizations strive to implement 100 percent component and functional redundancy, so that any hardware failure is transparent to the user, work is not interrupted, and no transactions are lost. If any component fails, the system is still alive and switches over.

Technologies like disk mirroring and disk snapshots effectively address physical problems such as a bad block on a disk. However, physical component redundancy may not provide a complete high availability solution for a complex application infrastructure. Levels of data protection must be in place to ensure both information access and data integrity in the event that an interruption does occur. For example, if data corruption is caused by a software error in the I/O device driver or DBMS layer, that corruption will simply be propagated to the remote device. The value of physical component redundancy as a solution is also limited when the distance between a secondary site and a primary processing site is taken into consideration.

Clustering is used to minimize downtime in environments where managing availability at the application layer is the primary factor. Various clustering solution architectures exist, but at the highest conceptual level, if a server or application fault is detected in a clustered environment, the application is made available on other servers in the cluster. A failover routine is initiated, and applications and databases are started on available nodes.

Veritas Cluster Server provides a solution for reducing both planned and unplanned downtime that helps ensure maximum data and application availability across operating systems, applications, hardware components, and data center locations. By monitoring the status of applications and automatically moving them to available servers in the event of a fault, Veritas Cluster Server can dramatically increase the availability of an application or database. Veritas Cluster Server can detect faults in an application and all of its dependent components, including the associated database, operating system, network, and storage resources. When a failure is detected, it gracefully shuts down the application, restarts it on an available server, connects it to the appropriate storage device, and resumes normal operations.

Veritas Cluster Server supports UNIX®, Linux®, Windows®, and virtualization platforms such as VMware and Microsoft Virtual Server and provides out-of-the-box support and protection for the leading enterprise applications and databases, including SAP, Oracle®, BEA WebLogic®, and Microsoft® SQL and Exchange to name a few. Veritas Cluster Server HA/DR provides protection against site failures by enabling automated disaster recovery and integration with major replication technologies.


Beyond clustering

While the benefits of clustering mission-critical components are apparent, clustering alone is not sufficient to guarantee high availability for a business application. Increasing complexity within the data center has created new challenges to maintaining application availability, challenges for which existing technologies do not have a solution.

To meet the task of ensuring that today’s digital infrastructure is up and running whenever needed, systems must be provisioned, patched, backed up, and running at optimal performance. At the same time, the infrastructure in today’s data center is becoming increasingly complex with heterogeneous server platforms and operating systems, multiple storage devices, and huge numbers of applications. This limits visibility into the IT environment and consequently increases the potential for costly mistakes. As a result, manual changes to the infrastructure are one of the leading causes of downtime today.

To promote efficiency in these computing environments, simplifying the administration and management of heterogeneous environments is absolutely essential. The combination of clustering solutions and configuration management can significantly improve an organization’s ability to avoid disruptions and help ensure that when outages do occur, existing technologies will work as expected.

Enhancing availability with Veritas Configuration Manager

Veritas Configuration Manager provides three ways to gain the data and visibility required to more effectively manage change and maintain consistency in modern data centers and reduce application downtime:

• Allows system administrators to proactively identify configuration drift between nodes in a cluster, live servers and their “ideal state” gold standards, and each promotional environment (Dev to QA, QA to UAT).
• Helps identify potentially unknown and undocumented resource dependencies to support proper availability planning for all resources involved in delivery of the application.
• Provides real-time, full-context visibility into the changes occurring in the environment so potential problems can be identified before customers report them and so the business can meet audit obligations. This visibility enables more efficient root-cause analysis and greatly reduces the time to repair when change-related downtime occurs.


Managing configuration drift

While clustering provides outstanding technological resiliency for mission-critical applications, the need to control the impact of manual changes and enforce consistency across clustered nodes remains critical. Following are some key considerations to help manage configuration drift in high availability environments.

First, clustered environments must remain consistent to provide failover protection. In a production environment, applications can stay running on an active node for weeks or months at a time. In the event of an application failover, it is critical that the application configuration be valid so that the application starts successfully on the new node. Configuration files are typically only referenced upon startup, and without attention to changes, irregularities with a configuration may not be discovered until the failover is attempted. The challenge of maintaining multi-tiered enterprise applications is significant considering the great number of software components, libraries, scripts, and support files necessary for a successful application startup. Effective controls must be in place to ensure that this environment is always ready for a failover event.

Veritas Configuration Manager provides the capability to store a “snapshot” of the hardware and software configurations and patch levels on a reference or “gold” server so that it can be compared to the running configuration on a regular basis. This validation is a critical assurance that a difference in hardware configuration or patch level will not prevent successful startup of a clustered application. To achieve the highest level of confidence, administrators can regularly compare running configurations against established baselines identified as the “gold standard” for an application or server. These comparisons can be scheduled or run on-demand and provide comprehensive reports identifying any configuration components that deviate from the defined standard.

Second, configuration drift also presents problems when trying to manage the application life-cycle process. As applications are moved through promotional environments, different tests are performed with the assumption that the environment in which they are being performed is the same as and/or sufficiently similar to the production environment in which the tests are actually valid. In practice, this is rarely true. Inconsistency between environments is a drag on a company’s ability to roll out new revenue-producing applications and a contributor to production failures. The same capabilities that Veritas Configuration Manager provides to manage consistency in cluster nodes can be applied to comparing servers in production to servers in QA/UAT to help ensure more successful application rollouts, or to servers in disaster recovery facilities to provide a higher degree of confidence in a company’s ability to recover.
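The gold-standard comparison described above can be illustrated with a small sketch. This is not how Veritas Configuration Manager is implemented; it only shows the idea of checking a running configuration against a stored baseline. The function name, setting names, and values are all hypothetical and were invented for this illustration.

```python
from typing import Dict, List, Tuple

Config = Dict[str, str]


def find_drift(gold: Config, running: Config) -> List[Tuple[str, str, str]]:
    """Return (key, gold_value, running_value) for every setting that differs.

    A key that is absent on one side is reported as "<missing>".
    """
    drift = []
    for key in sorted(set(gold) | set(running)):
        expected = gold.get(key, "<missing>")
        actual = running.get(key, "<missing>")
        if expected != actual:
            drift.append((key, expected, actual))
    return drift


# Hypothetical snapshot data, invented for illustration only.
gold_snapshot = {"os_patch_level": "p36", "jdk_version": "1.4.2", "MaxCapacity": "50"}
standby_node = {"os_patch_level": "p24", "jdk_version": "1.4.2"}

drift_report = find_drift(gold_snapshot, standby_node)
for key, expected, actual in drift_report:
    print(f"{key}: gold={expected} running={actual}")
```

The same function could be pointed at a QA server and a production server instead of two cluster nodes, which mirrors the promotional-environment comparison discussed in this section.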


Identifying unknown application dependencies

Oftentimes, application downtime is caused by factors outside of the clustered resources. For instance, if you have a web application with a J2EE application server, a clustered Oracle back end, and an authentication server, failures could occur in multiple places without causing any problems within the clustered environment. The authentication server could fail outright or begin responding too slowly. Changes that cause failures can also be made to servers not known to be involved in the delivery of an application, such as a web server or database in a different data center, or in extreme cases, a development system underneath someone’s desk.

Veritas Configuration Manager can provide visibility into the relationships between all of the software components involved in delivering an application. This allows system administrators to identify resources that depend on the cluster, resources the cluster depends on, and other resources that are required for application availability. With this information, system administrators can add these resources to the cluster and also perform an availability risk assessment to understand where other potential risks reside. Then they can establish high availability strategies as necessary to meet committed Service Level Agreements.
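To make the dependency questions above concrete, the following sketch walks a small dependency map in both directions. The map, the resource names, and the helper functions are hypothetical; they loosely follow the web application example in this section and do not represent the product's data model.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical dependency map: each key depends on the resources in its list.
DEPENDS_ON: Dict[str, List[str]] = {
    "web_app": ["j2ee_server", "auth_server"],
    "j2ee_server": ["oracle_cluster"],
    "reporting_job": ["oracle_cluster"],
    "auth_server": [],
    "oracle_cluster": [],
}


def reachable(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Breadth-first walk returning every resource reachable from start."""
    seen: Set[str] = set()
    pending = deque(graph.get(start, []))
    while pending:
        node = pending.popleft()
        if node not in seen:
            seen.add(node)
            pending.extend(graph.get(node, []))
    return seen


def invert(graph: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Flip the edges so each key maps to the resources that depend on it."""
    inverted: Dict[str, List[str]] = {name: [] for name in graph}
    for name, deps in graph.items():
        for dep in deps:
            inverted.setdefault(dep, []).append(name)
    return inverted


required_by_web_app = reachable(DEPENDS_ON, "web_app")
dependents_of_cluster = reachable(invert(DEPENDS_ON), "oracle_cluster")
# Resources the application needs that are not protected by the cluster.
unclustered_risks = required_by_web_app - {"oracle_cluster"}
print(sorted(unclustered_risks))
```

In this toy map the authentication server shows up as an unclustered requirement of the web application, which is the kind of gap an availability risk assessment is meant to surface.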

Providing full-context visibility into change

Change management is the discipline of ensuring that when change is introduced into the environment, it is done so in a methodical, planned manner and that all reasonable steps to avoid problems are taken. In today’s interconnected, constantly changing data centers, change management is both a necessity and a nuisance. The process is not easy, and many IT administrators avoid it when possible. And when the process is used, it is only as good as the information and expertise applied to it. Changes that are subjected to solid change management practices may still cause outages if dependencies are not fully understood.

Until now, change managers have had to enforce the change management process without visibility. They would only find out about unplanned changes when problems arose that could be tracked back to such a change. Veritas Configuration Manager provides the data needed for the change manager, the application owner, and the system administrator to be alerted in real time when a change is occurring in their environment outside of expected change windows.

With a strong change-control strategy that includes both real-time change tracking and scheduled consistency checks, potentially disruptive changes can be rolled back before they cause a customer-visible outage. Veritas Configuration Manager’s comprehensive change tracking provides the information necessary for your team to diagnose disruptive events historically so that root cause is established and faults are permanently corrected.
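The idea of alerting on changes made outside expected change windows can be sketched as a simple filter over change events. The event fields, server names, users, and window times below are assumptions made up for the example; the product's actual event format is not described in this paper.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical record of one detected change."""
    server: str
    path: str
    user: str
    when: datetime


Window = Tuple[datetime, datetime]  # (start inclusive, end exclusive)


def unplanned_changes(events: List[ChangeEvent], windows: List[Window]) -> List[ChangeEvent]:
    """Return the events whose timestamp falls outside every approved window."""
    def in_window(ts: datetime) -> bool:
        return any(start <= ts < end for start, end in windows)

    return [event for event in events if not in_window(event.when)]


# Invented sample data: one approved maintenance window and two changes.
approved_windows = [(datetime(2007, 3, 3, 22, 0), datetime(2007, 3, 4, 2, 0))]
observed = [
    ChangeEvent("node-a", "config.xml", "alice", datetime(2007, 3, 3, 23, 15)),
    ChangeEvent("node-b", "config.xml", "bob", datetime(2007, 3, 5, 14, 30)),
]

alerts = unplanned_changes(observed, approved_windows)
for event in alerts:
    print(f"ALERT: {event.user} changed {event.path} on {event.server} at {event.when}")
```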


Example: BEA WebLogic database connection pool configurations in a clustered environment

Financial analysis and planning often involve viewing summary data in a visual dashboard and then drilling down to transactional detail to explore and analyze the underlying data. Setting up these enterprise-class database connections in a multi-tiered environment can involve several configuration files and hundreds of configuration parameters. Configuring the parameters for implementing an application is straightforward. However, in production, it is not unusual for software performance attributes to be adjusted to accommodate new modules and updated environments. At this point, it is essential to maintain proper change control and ensure that configurations are consistent across servers and environments.

In an ideal world, all assumptions and reasons for a change would be recorded as well as a complete description of the desired results. A careful assessment of the change would be performed by personnel involved with the maintenance and delivery who knew enough about the design and its interfaces to avoid adversely affecting other areas of the application. Those responsible for maintaining the high availability architecture also play an important role in ensuring that the change does not adversely affect the ability of the application to fail over and that consistency is maintained across all environments.

In today’s fast-paced and complex IT environments, however, these processes still leave room for error. While a team is likely to recover quickly from a configuration problem that affects a production system, seeing the necessary updates through to all high availability environments and clustered configurations poses a formidable challenge.

For example, suppose we are creating a web-based analysis and planning application for a large retail company. This application will need to provide access to multiple large transactional data sources simultaneously to a department of financial analysts. Rather than instantiating a new database connection each time one is needed, permanent database connections can be maintained at the application layer using database connection pooling. Reducing unnecessary overhead by reusing database connection objects allows the application to deliver the required high-performance benefits.

We’ll use the BEA WebLogic Java™ application server platform and configure the connection pools to attach to Oracle relational databases. The definition for the database connection pools can be found in the JDBCConnectionPool entry in the WebLogic application config.xml file. The Java application will be written to use these connection objects rather than create direct connections to the underlying data sources. Note that this is one of many configuration files used by the application components, and this example focuses on this one set of configuration parameters.
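For readers less familiar with connection pooling, the sketch below shows the general technique of reusing a fixed set of connections rather than opening a new one per request. It is written in Python for brevity and is not WebLogic code. The class name, the reserve method, and the stand-in connection factory are assumptions made for illustration.

```python
import queue
from contextlib import contextmanager
from typing import Callable, Iterator, List


class ConnectionPool:
    """Toy pool that opens a fixed number of connections once and reuses them."""

    def __init__(self, factory: Callable[[], object], size: int) -> None:
        self._idle: "queue.Queue[object]" = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def reserve(self, timeout: float = 1.0) -> Iterator[object]:
        # Blocks until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand the same connection back for reuse


created: List[object] = []


def fake_connect() -> object:
    """Stand-in for an expensive database connect call."""
    conn = object()
    created.append(conn)
    return conn


pool = ConnectionPool(fake_connect, size=2)
for _ in range(5):
    with pool.reserve() as conn:
        pass  # application work would use conn here

print(len(created))  # 2: five requests were served by two connections
```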




Let’s take a closer look at the ways in which Veritas Configuration Manager provides the control and visibility essential to effectively manage change by updating the TestConnectionsOnReserve and CountOfTestFailuresTillFlush attributes in the WebLogic connection pool definition. Ordinarily, when a database connection comes back online after being unavailable to WebLogic, the refresh process restores the connection quickly. However, in some cases and for some modes of failure, the process of testing for dead connections can impose a long delay. This delay occurs for each dead connection in the connection pool until all connections are replaced. To minimize the delay that occurs during the testing of dead database connections, you can set the CountOfTestFailuresTillFlush attribute on the connection pool so that WebLogic considers all connections in the connection pool dead after a specified number of consecutive test failures and automatically closes all connections in the connection pool. When an application requests a connection, the connection pool creates a connection without first having to test a dead connection, which minimizes the delay for connection requests following the connection pool flush.


The CountOfTestFailuresTillFlush attribute is set in the JDBCConnectionPool entry in the config.xml file. TestConnectionsOnReserve must also be set to true.
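As an illustration, the sketch below embeds a hypothetical JDBCConnectionPool fragment and checks the two attributes discussed here. The element and the two attribute names come from the text above. The pool name, driver, URL, test table, and attribute values are placeholders invented for the example and should not be read as the configuration used in this paper.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical fragment: names, URL, and values are placeholders.
CONFIG_XML = """
<Domain Name="exampleDomain">
  <JDBCConnectionPool
      Name="examplePool"
      DriverName="oracle.jdbc.driver.OracleDriver"
      URL="jdbc:oracle:thin:@dbhost:1521:EXAMPLE"
      TestTableName="dual"
      TestConnectionsOnReserve="true"
      CountOfTestFailuresTillFlush="2"/>
</Domain>
"""


def check_pool(pool: ET.Element) -> List[str]:
    """Return a list of problems with the flush-related attributes of one pool."""
    problems = []
    if pool.get("TestConnectionsOnReserve", "false").lower() != "true":
        problems.append("TestConnectionsOnReserve is not true")
    flush = pool.get("CountOfTestFailuresTillFlush")
    if flush is None or not flush.isdigit() or int(flush) < 1:
        problems.append("CountOfTestFailuresTillFlush is unset or not a positive integer")
    return problems


root = ET.fromstring(CONFIG_XML)
results = {pool.get("Name"): check_pool(pool) for pool in root.iter("JDBCConnectionPool")}
print(results)  # {'examplePool': []}
```

A check like this could be run against the config.xml on each cluster node so that a node missing either attribute is noticed before a failover depends on it.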



Let’s examine how these configuration parameters are managed by Veritas Configuration Manager by looking at an implementation of the Petstore J2EE reference application. Without manual input or prior knowledge, Veritas Configuration Manager automatically builds a comprehensive hardware and software configuration inventory, including the Petstore application along with its dependencies—WebLogic, Java components, and databases (see Figure 1).

Figure 1. Automatic hardware and software inventory


Dependency maps show how applications and servers are related to one another within the overall business service delivery environment and where changes are taking place. This enables more efficient change management by allowing administrators to see and understand the impact of change across servers and tiers. In this example, Veritas Configuration Manager automatically recognizes the Petstore application dependencies on Java components, the WebLogic platform, and a back-end Oracle database (see Figure 2).

Figure 2. Dependency map


Veritas Configuration Manager automatically captures the configuration data for discovered applications. In this example, the configuration data for the Petstore application shows the configuration parameters from the application config.xml file, including TestConnectionsOnReserve and CountOfTestFailuresTillFlush (see Figure 3).

Figure 3. Configuration data

Veritas Configuration Manager tracks changes to server and software configurations in real time to reduce change-related downtime. It captures what changed, when it changed, and in what order as well as who did it and what’s affected. With this capability, administrators can validate in-process changes and immediately identify unauthorized changes made outside of the change management process.


In Figure 2, the Petstore application block is highlighted in red, an indication that a recent change occurred in that environment. The change history is stored in the database and can be viewed in the user interface by application, server, or business service (see Figure 4).

Figure 4. History

Veritas Configuration Manager shows the config.xml file change as well as modifications to individual configuration parameters, which appear as detailed line items. Furthermore, you can drill into the text file change entry by clicking the DIFFS button to see everything that changed inside the file (see Figure 5).

Figure 5. Content differences
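A content difference of this kind can be approximated with a standard line-by-line diff. The sketch below uses Python's difflib module on two invented versions of a connection pool entry. The file contents are hypothetical, and the output format is that of difflib, not of the DIFFS view in the product.

```python
import difflib

# Two invented versions of the same config.xml excerpt.
before = [
    '<JDBCConnectionPool Name="examplePool"',
    '    TestConnectionsOnReserve="false"',
    '    MaxCapacity="50"/>',
]
after = [
    '<JDBCConnectionPool Name="examplePool"',
    '    TestConnectionsOnReserve="true"',
    '    CountOfTestFailuresTillFlush="2"',
    '    MaxCapacity="50"/>',
]

diff_lines = list(difflib.unified_diff(
    before, after,
    fromfile="config.xml (baseline)",
    tofile="config.xml (current)",
    lineterm="",
))

# Keep only added and removed lines, skipping the two file header lines.
changed = [
    line for line in diff_lines
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
for line in changed:
    print(line)
```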

Veritas Configuration Manager can also compare server and application configuration information to establish baselines, define corporate gold standards, and help ensure consistency and compliance.


In this example, the discovered Petstore application configuration has been saved as a snapshot. This baseline can then be used to check running configurations on demand within the user interface or through automated comparisons and exception reports. Configuration inconsistencies are shown in a hierarchical tree view that highlights differences in red, allowing operators to drill into complex configurations and quickly identify inconsistencies (see Figure 6).

Figure 6. Automatic comparison
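The hierarchical comparison can be pictured as a recursive walk over two configuration trees that reports the path of each difference. The sketch below assumes configurations are held as nested dictionaries, which is an assumption made for the example; the tree contents, the tree_diff function, and the path format are hypothetical.

```python
from typing import Any, Dict, List

_MISSING = object()  # marks a key that exists on only one side


def tree_diff(baseline: Dict[str, Any], current: Dict[str, Any], path: str = "") -> List[str]:
    """Return the slash-separated paths at which two nested configurations differ."""
    diffs: List[str] = []
    for key in sorted(set(baseline) | set(current)):
        here = f"{path}/{key}"
        left = baseline.get(key, _MISSING)
        right = current.get(key, _MISSING)
        if isinstance(left, dict) and isinstance(right, dict):
            diffs.extend(tree_diff(left, right, here))  # descend into the subtree
        elif left != right:
            diffs.append(here)
    return diffs


# Invented snapshot and running configuration for illustration.
snapshot = {
    "petstore": {
        "weblogic": {"TestConnectionsOnReserve": "true", "CountOfTestFailuresTillFlush": "2"},
        "jdk": {"version": "1.4.2"},
    }
}
running = {
    "petstore": {
        "weblogic": {"TestConnectionsOnReserve": "false", "CountOfTestFailuresTillFlush": "2"},
        "jdk": {"version": "1.4.2"},
    }
}

differences = tree_diff(snapshot, running)
print(differences)  # ['/petstore/weblogic/TestConnectionsOnReserve']
```

Reporting full paths rather than bare setting names keeps each difference tied to its place in the hierarchy, which is what lets an operator drill down from application to component to parameter.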

Conclusion

A comprehensive configuration management strategy that targets people and process issues is critical to maintaining availability. Veritas Configuration Manager gives system administrators the data and visibility they need to more effectively manage change and reduce application downtime. With such a solution in place, IT managers can operate with confidence, knowing that applications will fail less often, and when they do, they will fail over successfully. Recovering from disrupted states becomes easier, and future disruptions can be avoided.


About Symantec

Symantec is a global leader in infrastructure software, enabling businesses and consumers to have confidence in a connected world. The company helps customers protect their infrastructure, information, and interactions by delivering software and services that address risks to security, availability, compliance, and performance. Headquartered in Cupertino, Calif., Symantec has operations in 40 countries. More information is available at www.symantec.com.

For specific country offices and contact numbers, please visit our Web site. For product information in the U.S., call toll-free 1 (800) 745 6054.

Symantec Corporation
World Headquarters
20330 Stevens Creek Boulevard
Cupertino, CA 95014 USA
+1 (408) 517 8000
1 (800) 721 3934
www.symantec.com

Copyright © 2007 Symantec Corporation. All rights reserved. Symantec, the Symantec Logo, and Veritas are trademarks or registered trademarks of Symantec Corporation or its affiliates in the U.S. and other countries. Microsoft and Windows are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. Java is a trademark or registered trademark of Sun Microsystems, Inc. in the United States and other countries. Other names may be trademarks of their respective owners. Printed in the U.S.A. 03/07 11902373
