White Paper

USING VPLEX™ METRO WITH VMWARE HIGH AVAILABILITY AND FAULT TOLERANCE FOR ULTIMATE AVAILABILITY

Abstract

This white paper discusses using best-of-breed technologies from VMware® and EMC® to create federated continuous availability solutions. The following topics are reviewed:

• Choosing between federated Fault Tolerance or federated High Availability
• Design considerations and constraints
• Operational best practice

November 2012 Revision 1.1

Copyright © 2012 EMC Corporation. All Rights Reserved. EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice. The information in this publication is provided “as is.” EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.


Table of Contents

Executive summary
    Audience
    Document scope and limitations
Introduction
EMC VPLEX technology
    VPLEX terms and Glossary
    EMC VPLEX architecture
    EMC VPLEX Metro overview
    Understanding VPLEX Metro active/active distributed volumes
    VPLEX Witness – An introduction
    Protecting VPLEX Witness using VMware FT
    VPLEX Metro HA
    VPLEX Metro cross cluster connect
Uniform and non-uniform access explained
    Uniform and non-uniform I/O access
    Uniform access for split clusters (non-VPLEX)
    Uniform access with active/passive replication (non-VPLEX)
    Non-Uniform Access (VPLEX I/O access pattern)
    VPLEX with cross-connect and non-uniform mode
    VPLEX with cross-connect and forced uniform mode
Combining VPLEX HA with VMware HA and/or FT
    vSphere HA and VPLEX Metro HA (federated HA)
    Use Cases for federated HA
    Datacenter pooling using DRS with federated HA
    Avoiding downtime and disasters using federated HA and vMotion
    Failure scenarios and recovery using federated HA
    vSphere FT and VPLEX Metro (federated FT)
    Use cases for a federated FT solution
    Failure scenarios and recovery using federated FT
    Choosing between federated availability or disaster recovery (or both)
    Augmenting DR with federated HA and/or FT
    Environments where federated HA and/or FT should not replace DR
Best Practices and considerations for VPLEX HA with VMware HA
    VMware HA and FT best practice requirements
    Networking principles and pre-requisites
    vCenter placement options
    Path loss handling semantics (PDL and APD)
    Cross-connect Topologies and failure scenarios
    Cross-connect and multipathing
    VPLEX site preference rules
    DRS and site affinity rules
Additional best practices and considerations for VMware FT
    Secondary VM placement considerations
    DRS affinity and cluster node count
    VPLEX preference rule considerations for FT
    Other generic recommendations for FT
Conclusion
References
Appendix A – vMotioning over longer distances (10ms)


Executive summary

The EMC® VPLEX™ family removes physical barriers within, across, and between datacenters. VPLEX Local provides simplified management and non-disruptive data mobility for heterogeneous arrays. VPLEX Metro and Geo provide data access and mobility between two VPLEX clusters within synchronous and asynchronous distances respectively. With a unique scale-out architecture, VPLEX's advanced data caching and distributed cache coherency provide workload resiliency, automatic sharing, balancing and failover of storage domains, and enable both local and remote data access with predictable service levels.

VMware vSphere makes it simpler and less expensive to provide higher levels of availability for important applications. With vSphere, organizations can easily increase the baseline level of availability provided for all applications, as well as provide higher levels of availability more easily and cost-effectively. vSphere makes it possible to reduce both planned and unplanned downtime. The revolutionary VMware vMotion™ (vMotion) capabilities in vSphere make it possible to perform planned maintenance with zero application downtime. VMware High Availability (HA), a feature of vSphere, reduces unplanned downtime by leveraging multiple VMware ESX® and VMware ESXi™ hosts configured as a cluster to provide automatic recovery from outages as well as cost-effective high availability for applications running in virtual machines. VMware Fault Tolerance (FT) leverages the well-known encapsulation properties of virtualization by building fault tolerance directly into the ESXi hypervisor in order to deliver hardware-style fault tolerance to virtual machines. Guest operating systems and applications do not require modifications or reconfiguration; in fact, they remain unaware of the protection transparently delivered by ESXi and the underlying architecture.

By leveraging distance, VPLEX Metro builds on the strengths of VMware FT and HA to provide solutions that go beyond traditional "Disaster Recovery". These solutions provide a new type of deployment which achieves the absolute highest levels of continuous availability over distance for today's enterprise storage and cloud environments. When using such technologies, it is now possible to provide a solution that has zero Recovery Point Objective (RPO) with zero "storage" Recovery Time Objective (RTO) (and zero "application" RTO when using VMware FT).

This white paper is designed to give technology decision-makers a deeper understanding of VPLEX Metro in conjunction with VMware Fault Tolerance and/or High Availability, discussing design, features, functionality and benefits. This paper also highlights the key technical considerations for implementing VMware Fault Tolerance and/or High Availability with VPLEX Metro technology to achieve "Federated Availability" over distance.

Audience

This white paper is intended for technology architects, storage administrators and EMC professional services partners who are responsible for architecting, creating, managing and using IT environments that utilize EMC VPLEX and VMware Fault Tolerance and/or High Availability technologies (FT and HA respectively). The white paper assumes that the reader is familiar with EMC VPLEX and VMware technologies and concepts.

Document scope and limitations

This document applies to EMC VPLEX Metro configured with VPLEX Witness. The details provided in this white paper are based on the following configurations:

• VPLEX GeoSynchrony 5.1 (patch 2) or higher
• VPLEX Metro HA only (Local and Geo are not supported with FT or HA in a stretched configuration)
• VPLEX clusters are within 5 milliseconds (ms) round trip time (RTT) of each other for VMware HA
• VPLEX clusters are within 1 millisecond (ms) round trip time (RTT) of each other for VMware FT
• Cross-connected configurations can be optionally deployed for VMware HA solutions (not mandatory)
• For VMware FT configurations, VPLEX cross cluster connect is in place (mandatory requirement)
• VPLEX Witness is deployed to a third failure domain (mandatory). The Witness functionality is required for VPLEX Metro to become a true active/active continuously available storage cluster
• ESXi and vSphere 5.0 Update 1 or later are used
• Any qualified pair of arrays (both EMC and non-EMC) listed on the EMC Simple Support Matrix (ESSM) found here: https://elabnavigator.emc.com/vault/pdf/EMC_VPLEX.pdf
• The configuration is in full compliance with VPLEX best practices found here: http://powerlink.emc.com/km/live1/en_US/Offering_Technical/Technical_Documentation/h7139-implementation-planning-vplex-tn.pdf

Please consult with your local EMC Support representative if you are uncertain as to the applicability of these requirements.

Note: While out of scope for this document, all federated FT and HA solutions also carry the best practices and limitations imposed by the VMware HA and FT technologies themselves, in addition to the best practices within this paper. For instance, at the time of writing, VMware FT supports only a single vCPU per VM (VMware HA does not carry the same vCPU limitation), and this limitation prevails when federating a VMware FT cluster. Please review the VMware best practice documentation as well as the limitations and considerations documentation (see the References section) for further information.


Introduction

Increasingly, customers wish to protect their business services from any event imaginable that could lead to downtime. Previously (i.e. prior to VPLEX), solutions to prevent downtime fell into two camps:

1. Highly available and fault tolerant systems within a datacenter
2. Disaster recovery solutions outside of a datacenter

The benefit of FT and HA solutions is that they provide automatic recovery in the event of a failure. However, the geographical protection range is limited to a single datacenter, so business services are not protected from a datacenter failure. On the other hand, disaster recovery solutions typically protect business services using geographic dispersion, so that if a datacenter fails, recovery is achieved using another datacenter in a separate fault domain from the primary. Some of the drawbacks with disaster recovery solutions, however, are that they are human decision based (i.e. not automatic) and typically require a second, disruptive failback once the primary site is repaired. In other words, should a primary datacenter fail, the business would need to make a non-trivial decision to invoke disaster recovery. Since disaster recovery is decision-based (i.e. manually invoked), it can lead to extended outages: the decision itself takes time, and it is generally made at the business level involving key stakeholders. As most site outages are caused by recoverable events (e.g. an elongated power outage), when faced with the "invoke DR" decision some businesses choose not to invoke DR and to ride through the outage instead. This means that critical business IT services remain offline for the duration of the event. These types of scenarios are not uncommon in "disaster" situations, and non-invocation can be for various reasons. The two biggest ones are:

1. The primary site that failed can be recovered within 24-48 hours, therefore not warranting the complexity and risk of invoking DR.
2. Invoking DR will require a "failback" at some point in the future, which in turn will bring more disruption.

Other potential concerns with invoking disaster recovery include complexity, lack of testing, lack of resources, lack of skill sets and lengthy recovery time. To avoid such pitfalls, VPLEX and VMware offer a more comprehensive answer to safeguarding your environments. By combining the benefits of HA and FT, a new category of availability is created. This new category provides the automatic (non-decision based) benefits of FT and HA, but allows them to be leveraged over distance by using VPLEX Metro. This brings the geographical distance benefits normally associated with disaster recovery, significantly enhancing the HA and FT propositions. The new category is known as "Federated Availability" and enables bullet-proof availability, which in turn significantly lessens the chance of downtime for both planned and unplanned events.


EMC VPLEX technology

VPLEX encapsulates traditional physical storage array devices and applies three layers of logical abstraction to them. The logical relationships of each layer are shown in Figure 1. Extents are the mechanism VPLEX uses to divide storage volumes. Extents may be all or part of the underlying storage volume. EMC VPLEX aggregates extents and applies RAID protection in the device layer. Devices are constructed using one or more extents and can be combined into more complex RAID schemes and device structures as desired. At the top layer of the VPLEX storage structures are virtual volumes. Virtual volumes are created from devices and inherit the size of the underlying device. Virtual volumes are the elements VPLEX exposes to hosts using its Front End (FE) ports. Access to virtual volumes is controlled using storage views. Storage views are comparable to Auto-provisioning Groups on EMC Symmetrix® or to storage groups on EMC VNX®. They act as logical containers determining host initiator access to VPLEX FE ports and virtual volumes.

Figure 1 EMC VPLEX Logical Storage Structures


VPLEX terms and Glossary

VPLEX Virtual Volume – Unit of storage presented by the VPLEX front-end ports to hosts.

VPLEX Distributed Volume – A single unit of storage presented by the VPLEX front-end ports of both VPLEX clusters in a VPLEX Metro configuration separated by distance.

VPLEX Director – The central processing and intelligence of the VPLEX solution. There are redundant (A and B) directors in each VPLEX Engine.

VPLEX Engine – Consists of two directors and is the unit of scale for the VPLEX solution.

VPLEX cluster – A collection of VPLEX engines in one rack.

VPLEX Metro – The cooperation of two VPLEX clusters, each serving their own storage domain, over synchronous distance, forming active/active distributed volume(s).

VPLEX Metro HA – As per VPLEX Metro, but configured with VPLEX Witness to provide fully automatic recovery from the loss of any failure domain. This can also be thought of as an active/active continuously available storage cluster over distance.

Access Anywhere – The term used to describe a distributed volume using VPLEX Metro which has active/active characteristics.

Federation – The cooperation of storage elements at a peer level over distance, enabling mobility, availability and collaboration.

Automatic – No human intervention whatsoever (e.g. HA and FT).

Automated – No human intervention required once a decision has been made (e.g. disaster recovery with VMware's SRM technology).


EMC VPLEX architecture

EMC VPLEX represents the next-generation architecture for data mobility and information access. The new architecture is based on EMC's more than 20 years of expertise in designing, implementing, and perfecting enterprise-class intelligent cache and distributed data protection solutions. As shown in Figure 2, VPLEX is a solution for virtualizing and federating both EMC and non-EMC storage systems. VPLEX resides between servers and heterogeneous storage assets (abstracting the storage subsystem from the host) and introduces a new architecture with these unique characteristics:

• Scale-out clustering hardware, which lets customers start small and grow big with predictable service levels
• Advanced data caching, which utilizes large-scale SDRAM cache to improve performance and reduce I/O latency and array contention
• Distributed cache coherence for automatic sharing, balancing, and failover of I/O across the cluster
• A consistent view of one or more LUNs across VPLEX clusters separated either by a few feet within a datacenter or across synchronous distances, enabling new models of high availability and workload relocation

Figure 2 Capability of an EMC VPLEX local system to abstract Heterogeneous Storage


EMC VPLEX Metro overview

VPLEX Metro brings mobility and access across two locations separated by an inter-site round trip time of up to 5 milliseconds (host application permitting). VPLEX Metro uses two VPLEX clusters (one at each location) and includes the unique capability to support synchronous distributed volumes that mirror data between the two clusters using write-through caching. Since a VPLEX Metro distributed volume is under the control of the VPLEX Metro advanced cache coherency algorithms, active data I/O access to the distributed volume is possible at either VPLEX cluster. VPLEX Metro is therefore a truly active/active solution which goes far beyond traditional active/passive legacy replication solutions.

VPLEX Metro distributes the same block volume to more than one location, so standard HA cluster environments (e.g. VMware HA and FT) can simply leverage this capability and be easily and transparently deployed over distance too. The key to this is to make the host cluster believe there is no distance between the nodes, so they behave exactly as they would in a single data center. This is known as "dissolving distance" and is a key deliverable of VPLEX Metro. The other piece of delivering truly active/active FT or HA environments is an active/active network topology whereby the same Layer 2 network resides in each location, giving truly seamless datacenter pooling. Whilst Layer 2 network stretching is a pre-requisite for any FT or HA solution based on VPLEX Metro, it is outside the scope of this document. Throughout this document it is assumed that there is a stretched Layer 2 network between the datacenters where a VPLEX Metro resides.

Note: For technologies that stretch a Layer 2 network over distance, please see the further information on Cisco Overlay Transport Virtualization (OTV) found here: http://www.cisco.com/en/US/docs/solutions/Enterprise/Data_Center/DCI/whitepaper/DCI_1.html and on Brocade Virtual Private LAN Service (VPLS) found here: http://www.brocade.com/downloads/documents/white_papers/Offering_Scalable_Layer2_Services_with_VPLS_and_VLL.pdf


Understanding VPLEX Metro active/active distributed volumes

Unlike traditional legacy replication, where access to a replicated volume is either in one location or another (i.e. an active/passive-only paradigm), VPLEX distributes a virtual device over distance, which means host access is now possible in more than one location to the same (distributed) volume. In engineering terms, the distributed volume that is presented from VPLEX Metro is said to have "single disk semantics", meaning that in every way (including failure) the disk behaves as one object, just as any traditional block device would. This means that all the rules associated with a single disk are fully applicable to a VPLEX Metro distributed volume. For instance, the following figure shows a single host accessing a single JBOD type volume:

Figure 3 Single host access to a single disk

Clearly the host in the diagram is the only host initiator accessing the single volume. The next figure shows a local two-node cluster, where a cluster of hosts coordinates for access.

Figure 4 Multiple host access to a single disk

As shown in the diagram, there are now two hosts contending for the single volume. The dashed orange rectangle shows that each of the nodes is required to be in a cluster or to utilize a cluster file system so they can effectively coordinate locking and ensure the volume remains consistent. The next figure shows the same two-node cluster, but now connected to a VPLEX distributed volume using VPLEX cache coherency technology.

Figure 5 Multiple host access to a VPLEX distributed volume

In this example there is no difference to the fundamental dynamics of the two-node cluster's access pattern to the single volume. As far as the hosts are concerned, they cannot see any difference between this and the previous example, since VPLEX is distributing the device between datacenters via AccessAnywhere™ (which is a type of federation). This means that the hosts are still required to coordinate locking to ensure the volume remains consistent. For ESXi this mechanism is controlled by the Virtual Machine File System (VMFS) cluster file system within each datastore; in this case each distributed volume is presented from VPLEX and formatted with the VMFS file system. The figure below shows a high-level physical topology of a VPLEX Metro distributed device.

Figure 6 Multiple host access to a VPLEX distributed volume

This figure is a physical representation of the logical configuration shown in Figure 5. Effectively, with this topology deployed, the distributed volume can be treated just like any other volume; the only difference is that it is now distributed and available in two locations at the same time. Another benefit of this type of architecture is its extreme simplicity, since it is no more difficult to configure a cluster across distance than it is in a single data center.

Note: VPLEX Metro can use either 8Gb FC or native 10Gb Ethernet WAN connectivity (the inter-site link shown in Figure 6). When using FC connectivity this can be configured with either a dedicated channel (i.e. separate, non-merged fabrics) or ISL based (i.e. where fabrics have been merged across sites). It is assumed that any WAN link will have a second physically redundant circuit.

Note: It is vital that VPLEX Metro has enough bandwidth between clusters to meet requirements. EMC can assist in the qualification of this through the Business Continuity Solution Designer (BCSD) tool. Please engage your EMC account team to perform a sizing exercise.

For further details on VPLEX Metro architecture, please see the VPLEX HA Techbook found here: http://www.emc.com/collateral/hardware/technical-documentation/h7113-vplex-architecture-deployment.pdf


VPLEX Witness – An introduction

As mentioned previously, VPLEX Metro goes beyond the realms of legacy active/passive replication technologies since it can deliver true active/active storage over distance as well as federated availability. There are three main items required to deliver true "Federated Availability":

1. True active/active Fibre Channel block storage over distance.
2. Synchronous mirroring to ensure both locations are in lock step with each other from a data perspective.
3. External arbitration to ensure that under all failure conditions automatic recovery is possible.

The previous sections discussed items 1 and 2; this section looks at external arbitration, which is enabled by VPLEX Witness. VPLEX Witness is delivered as a zero-cost VMware Virtual Appliance (vApp) which runs on a customer-supplied ESXi server. The ESXi server resides in a physically separate failure domain from either VPLEX cluster and uses different storage from the VPLEX clusters. Using VPLEX Witness ensures that true Federated Availability can be delivered: regardless of site or link/WAN failure, a copy of the data will automatically remain online in at least one of the locations.

When setting up a single distributed volume or a group of distributed volumes, the user chooses a "preference rule", a special property that each individual or group of distributed volumes has. It is the preference rule that determines the outcome after failure conditions such as site failure or link partition. The preference rule can be set to cluster A preferred, cluster B preferred, or no automatic winner. At a high level this has the following effect on a single or group of distributed volumes under different failure conditions, as listed below:


Cluster A preferred:
  VPLEX cluster partition – Site A: ONLINE, Site B: SUSPENDED (GOOD)
  Site A fails – Site A: FAILED, Site B: SUSPENDED (BAD, by design)
  Site B fails – Site A: ONLINE, Site B: FAILED (GOOD)

Cluster B preferred:
  VPLEX cluster partition – Site A: SUSPENDED, Site B: ONLINE (GOOD)
  Site A fails – Site A: FAILED, Site B: ONLINE (GOOD)
  Site B fails – Site A: SUSPENDED, Site B: FAILED (BAD, by design)

No automatic winner:
  VPLEX cluster partition – Site A: SUSPENDED, Site B: SUSPENDED (by design)
  Site A fails – Site A: FAILED, Site B: SUSPENDED (by design)
  Site B fails – Site A: SUSPENDED (by design), Site B: FAILED

Table 1 Failure scenarios without VPLEX Witness

As we can see in Table 1 (above), if we only used the preference rules without VPLEX Witness then under some scenarios manual intervention would be required to bring the volume online at a given VPLEX cluster (e.g. if site A is the preferred site, and site A fails, site B would also suspend). This is where VPLEX Witness assists, since it can better diagnose failures due to the network triangulation and ensures that at any time at least one of the VPLEX clusters has an active path to the data, as shown in the table below:

Cluster A preferred:
  VPLEX cluster partition – Site A: ONLINE, Site B: SUSPENDED (GOOD)
  Site A fails – Site A: FAILED, Site B: ONLINE (GOOD)
  Site B fails – Site A: ONLINE, Site B: FAILED (GOOD)

Cluster B preferred:
  VPLEX cluster partition – Site A: SUSPENDED, Site B: ONLINE (GOOD)
  Site A fails – Site A: FAILED, Site B: ONLINE (GOOD)
  Site B fails – Site A: ONLINE, Site B: FAILED (GOOD)

No automatic winner:
  VPLEX cluster partition – Site A: SUSPENDED, Site B: SUSPENDED (by design)
  Site A fails – Site A: FAILED, Site B: SUSPENDED (by design)
  Site B fails – Site A: SUSPENDED (by design), Site B: FAILED

Table 2 Failure scenarios with VPLEX Witness

As one can see from Table 2, VPLEX Witness converts a VPLEX Metro from an active/active mobility and collaboration solution into an active/active continuously available storage cluster. Furthermore, once VPLEX Witness is deployed, failure scenarios become self-managing (i.e. fully automatic), which makes the solution extremely simple since there is nothing to do regardless of the failure condition.


Figure 7 below shows the high-level topology of VPLEX Witness.

Figure 7 VPLEX configured for VPLEX Witness

As depicted in Figure 7, the Witness VM is deployed in a separate fault domain (as defined by the customer) and connected to both VPLEX management stations via an IP network.

Note: The fault domain is decided by the customer and can range from different racks in the same datacenter all the way up to VPLEX clusters 5ms apart (5ms measured round trip time latency, or typical synchronous distance). The distance that VPLEX Witness can be placed from the two VPLEX clusters can be even greater; the current supported maximum round trip latency for this is 1 second.


Figure 8 below shows a more detailed connectivity diagram of VPLEX Witness.

Figure 8 Detailed VPLEX Witness network layout

The witness network is physically separate from the VPLEX inter-cluster network, and VPLEX Witness also uses storage that is physically separate from either VPLEX cluster. As stated previously, it is critical to deploy VPLEX Witness into a third failure domain, and the definition of this domain changes depending on where the VPLEX clusters are deployed. For instance, if the VPLEX Metro clusters are deployed into the same physical building, perhaps in different areas of the datacenter, then the failure domain would be deemed the VPLEX rack itself; VPLEX Witness could then also be deployed into the same physical building but in a separate rack. If, however, each VPLEX cluster was deployed 50 miles apart in totally different buildings, then the failure domain would be the physical building and/or town. In that scenario it would make sense to deploy VPLEX Witness in another town altogether; and since the maximum supported round trip latency can be as much as one second, you could effectively pick any city in the world, especially given that the bandwidth requirement is as low as 3Kb/sec.


For more in-depth VPLEX Witness architecture details please refer to the VPLEX HA Techbook that can be found here: http://www.emc.com/collateral/hardware/technical-documentation/h7113-vplex-architecture-deployment.pdf

Note: Always deploy VPLEX Witness in a third failure domain and ensure that all distributed volumes reside in a consistency group with the witness function enabled. Also ensure that the EMC Secure Remote Support (ESRS) Gateway is fully configured and that the witness has the capability to alert if it, for whatever reason, fails. It is important to note that there is no impact to I/O if the witness fails.

Protecting VPLEX Witness using VMware FT

Under normal operational conditions VPLEX Witness is not required to drive active/active I/O (i.e. if the Witness is disconnected or lost, I/O still continues). It does, however, become a crucial component for ensuring availability in the event of site loss at either of the locations where the VPLEX clusters reside. If, for whatever reason, the VPLEX Witness were lost and soon afterwards there was a catastrophic site failure at a site containing a VPLEX cluster, then the hosts at the remaining site would also lose access to the remaining VPLEX volumes, since the surviving VPLEX cluster would consider itself isolated while the VPLEX Witness is also unavailable. To minimize this risk, it is considered best practice to disable the VPLEX Witness function if it has been lost and will remain offline for a long time. Another way to ensure availability is to minimize the risk of a VPLEX Witness loss in the first place by increasing the availability of the VPLEX Witness VM running in the third location. A way to significantly boost availability for this individual VM is to use VMware FT to protect VPLEX Witness at the third location. This ensures that the VPLEX Witness remains unaffected should a hardware failure occur on the ESXi server in the third failure domain that is supporting the VPLEX Witness VM. To deploy this functionality, simply enable ESXi HA clustering for the VPLEX Witness VM across two or more ESXi hosts (in the same location), and once this has been configured, right-click the VPLEX Witness VM and enable Fault Tolerance.
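
The same step can also be scripted. The following is a minimal, hedged sketch using the open-source pyVmomi library; the vCenter address, credentials and the VM name "VPLEX-Witness" are illustrative assumptions, and it presumes the witness VM already lives in a local (third-site) HA cluster that meets the FT prerequisites.

```python
# Hedged sketch: enable VMware FT on the VPLEX Witness VM via pyVmomi.
# The vCenter name, credentials and VM name below are example values.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def find_vm_by_name(content, name):
    """Walk the vCenter inventory and return the first VM whose name matches."""
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    try:
        return next(vm for vm in view.view if vm.name == name)
    finally:
        view.Destroy()

ctx = ssl._create_unverified_context()   # lab only; use verified certificates in production
si = SmartConnect(host="vc.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)
try:
    witness = find_vm_by_name(si.RetrieveContent(), "VPLEX-Witness")
    # CreateSecondaryVM_Task asks vSphere to create the FT secondary; with
    # host=None, vCenter picks a compatible ESXi host in the same HA cluster.
    task = witness.CreateSecondaryVM_Task(host=None)
    print("FT enable task submitted:", task.info.key)
finally:
    Disconnect(si)
```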


Note: The FT configuration protecting VPLEX Witness must reside within one location only; it must not be a stretched/federated FT configuration. The storage that the VPLEX Witness uses should be physically contained within the boundaries of the third failure domain on local (i.e. not VPLEX Metro distributed) volumes. Additionally, it should be noted that currently HA alone is not supported for the Witness VM; only FT-protected or unprotected configurations are supported.


VPLEX Metro HA

As discussed in the two previous sections, VPLEX Metro is able to provide active/active distributed storage; however, without a witness, loss of access to the storage volume could occur in some failure cases, for example if the preferred site fails and causes the non-preferred site to suspend too. Using VPLEX Witness overcomes this scenario and ensures that access to a VPLEX cluster is always maintained regardless of which site fails. VPLEX Metro HA describes a VPLEX Metro solution that has also been deployed with VPLEX Witness. As the name suggests, VPLEX Metro HA effectively delivers truly available distributed storage volumes over distance and forms a solid foundation for additional layers of VMware technology such as HA and FT.

Note: It is assumed that all topologies discussed within this white paper use VPLEX Metro HA (i.e. VPLEX Metro plus VPLEX Witness). This is mandatory to ensure fully automatic (i.e. decision-less) recovery under all the failure conditions outlined within this document.

VPLEX Metro cross cluster connect

Another important feature of VPLEX Metro that can be optionally deployed within a campus topology (i.e. up to 1ms round trip time) is cross cluster connect.

Note: Cross-connect is a mandatory requirement for VMware FT implementations.

This feature pushes VPLEX HA to an even greater level of availability, since an entire VPLEX cluster failure at a single location would not cause an interruption to host I/O at either location (using either VMware FT or HA).


Figure 9 below shows the topology of a cross-connected configuration:

Figure 9 VPLEX Metro deployment with cross-connect

As we can see in the diagram, the cross-connect offers an alternate path or paths from each ESXi server to the remote VPLEX cluster. This ensures that if for any reason an entire VPLEX cluster were to fail (which is unlikely since there is no single point of failure within a cluster), there would be no interruption to I/O, since the remaining VPLEX cluster will continue to service I/O across the remote cross link (alternate path).

It is recommended when deploying cross-connect that, rather than merging fabrics and using an Inter Switch Link (ISL), additional host bus adapters (HBAs) should be used to connect directly to the remote datacenter's switch fabric. This ensures that fabrics do not merge and span failure domains. Another important point to remember is that cross-connect is only supported for campus environments up to 1ms round trip time.

Note: When setting up cross-connect, each ESXi server will see double the paths to the datastore (50% local and 50% remote). It is best practice to ensure that the pathing policy is set to Fixed and to mark the remote paths across to the other cluster as standby. This ensures that the workload remains balanced and commits to only a single cluster at any one time.
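
A hedged pyVmomi sketch of pinning a cross-connected volume to its local cluster is shown below. The device ID, preferred path name, host name and vCenter address are all illustrative assumptions, and the exact path-name format varies by ESXi release, so treat this as a starting point rather than a definitive procedure.

```python
# Hedged sketch: set the PSP to Fixed for a VPLEX distributed volume and prefer
# a path to the LOCAL VPLEX cluster, leaving cross-connect paths as fallback.
# All identifiers below (naa ID, path name, host, vCenter) are example values.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

DEVICE_ID = "naa.6000144000000010a00e8d4c2f5e0001"   # VPLEX distributed volume (example)
PREFERRED_PATH = "vmhba2:C0:T0:L1"                    # path to the local VPLEX front end (example)

ctx = ssl._create_unverified_context()
si = SmartConnect(host="vc.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    host = next(h for h in view.view if h.name == "esxi-a1.example.com")
    storage = host.configManager.storageSystem

    # Locate the SCSI device, then its multipath logical unit entry.
    scsi_lun = next(d for d in storage.storageDeviceInfo.scsiLun
                    if getattr(d, "canonicalName", "") == DEVICE_ID)
    mp_lun = next(l for l in storage.storageDeviceInfo.multipathInfo.lun
                  if l.lun == scsi_lun.key)

    # Fixed policy with a preferred local path; the remote (cross-connect)
    # paths are then only used if the local VPLEX cluster becomes unavailable.
    policy = vim.host.MultipathInfo.FixedLogicalUnitPolicy(
        policy="VMW_PSP_FIXED", prefer=PREFERRED_PATH)
    storage.SetMultipathLunPolicy(lunId=mp_lun.id, policy=policy)
    print("Fixed path policy applied to", DEVICE_ID)
finally:
    Disconnect(si)
```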


Uniform and non-uniform access explained

VPLEX is built from the ground up to perform block storage distribution over long distances at enterprise scale and performance. One of the unique core principles of VPLEX that enables this is its underlying and extremely efficient cache coherency algorithms, which enable an active/active topology without compromise. Since VPLEX is architecturally unique from other virtual storage products, two simple categories are used to easily distinguish between the architectures.

Uniform and non-uniform I/O access

Essentially these two categories are a way to describe the I/O access pattern from the host to the storage system when using a stretched or distributed cluster configuration. VPLEX Metro (under normal conditions) follows what is known technically as a non-uniform access pattern, whereas other products that function differently from VPLEX follow what is known as a uniform I/O access pattern. On the surface, both types of topology seem to deliver active/active storage over distance; however, it is only the non-uniform category that delivers true active/active, which carries some significant benefits over uniform-type solutions. The terms are defined as follows:

1. Uniform access – Uniform access is typically based on active/passive technology where all I/O is serviced by only 50% of the available storage controllers, all in the same physical location (i.e. 50% of the controllers are passive); therefore all I/O is sent to or received from the same location where the active controller resides, hence the term "uniform". Typically this involves "stretching" dual-controller active/passive mid-range storage products, but it can also be architected by using legacy active/passive replication. In both cases the use of an ISL is typically required so all hosts can access the active storage controllers at the remote location. These two types of uniform access are known as "split cluster" and "replication" uniform access respectively.

2. Non-uniform access – I/O can be serviced by any available storage controller (100%) at any given location; therefore I/O can be sent to or received from any storage target location, hence the term "non-uniform". This is derived from "distributing" multiple active controllers/directors in each location and does not require an ISL (although an ISL can be optionally deployed).

To understand this in greater detail and to quantify the benefits of non-uniform access, we must first understand uniform access.

Uniform access for split clusters (non-VPLEX) Split cluster uniform access works in a very similar way to any legacy dual controller array that uses active/passive storage controllers, the main difference being that the controllers are split apart from each other. In a typical dual controller array setup (i.e. an array without split controllers) a host would generally be connected to both controllers in a HA configuration so if one controller failed the other one would continue to process I/O. However since the secondary storage controller is passive, no write or read I/O can be propagated to it or from it except in a failover condition. It is important to note, for the sake of redundancy, that these types of architectures typically employ synchronous cache mirroring to synchronize any write I/O from the primary controller/director to the secondary, passive controller. If one were to take a dual controller active/passive array and physically split the nodes/controllers apart, it would effectively create what is known as a "split cluster uniform" configuration. This would provide a multi-site configuration whereby the controllers would now be stretched over distance with the active controller/node residing in site A and the secondary passive controller/node residing in site B. In such a configuration, however, there is now only a single controller at each location compromising the local HA ability of the solution since each site has a single point of failure. Another challenge in this setup is to maintain host access to both controllers from either location. Let's suppose we have an ESXi server in site A and a second one in site B. If the only active storage controller resides at A, then we need to ensure that hosts in both sites A and site B have access to the storage controller in site A (uniform access). This is important since if we want to run a host workload at site B we will need an active path to connect it back to the active director in site A since the controller at site B is passive. This may be handled by a standard FC ISL which stretches the fabric across sites. Additionally we will also require a physical path from the ESXi hosts in site A to the passive controller at site B. The reason for this is just in case there is a controller failure at site A, the controller at site B should be able to service I/O.


As noted in the previous section this type of configuration is known as "Uniform Access" since all I/O will be serviced uniformly by the exact same controller for any given storage volume, passing all I/O to and from the same location. The diagram in Figure 10 below shows a typical example of a uniform architecture.

Figure 10 A typical uniform access layout

As we can see in the above diagram, hosts at each site connect to both controllers by way of the stretched fabric; however, the active controller (for any given LUN) is only at one of the sites (in this case site A).

Uniform access with active/passive replication (non-VPLEX)

Another way to architect a uniform access topology is to use legacy active/passive replication and enhance it with cross-site ISLs for remote host access. Since the remote volume is not active and has a different identity from the primary, path management software then has to be used at the host layer to spoof the identity and control the replication failover when required. Although this type of architecture in some ways improves on the split cluster uniform topology, since each location can have more than one active controller (in enterprise-type arrays), many of the remaining drawbacks previously mentioned still exist, as well as these additional ones:


1. Like the split cluster topology, this is still an active/passive solution and requires additional FC networking between locations. All hosts at the passive location need to access the active storage through some kind of cross-site ISL connection. Also similar to the split cluster topology, this will introduce higher response times at the passive site for both read and write I/O, as well as increased bandwidth utilization, since data has to traverse the WAN twice.

2. This type of configuration requires deep integration into the host I/O stack and adds complexity, as the passive volume needs to have its identity "spoofed". This is due to the fact that (unlike VPLEX) the passive volume has a different WWN and UUID (identity) than the active volume.

3. The host path management software has to be configured and maintained on all of the connected hosts at initial deployment time, as well as each time a new volume is added to the configuration. Also, since this is a non-standard configuration, only the vendor's path management software can be used, and therefore it will be host and operating system dependent.

4. Manual intervention is required under some failure scenarios due to APD (All Paths Down).

Non-Uniform Access (VPLEX I/O access pattern)

While VPLEX can be configured to provide uniform access, the typical VPLEX Metro deployment uses non-uniform access. VPLEX was built from the ground up for extremely efficient non-uniform access. This means it has a different hardware and cache architecture relative to uniform access solutions and, contrary to what you might have already read about non-uniform access clusters, provides significant advantages over uniform access for several reasons:

1. All controllers in a VPLEX distributed cluster are fully active. Therefore, if an I/O is initiated at site A, the write will be issued to the director in site A directly and then mirrored to B before the acknowledgement is given, and vice versa if the I/O is initiated at site B.

2. Since all controllers are active, a cross-connection where hosts at site A connect to the storage controllers at site B is not a mandatory requirement (unless using VMware FT), simplifying the deployment. Additionally, if a cross-connect is deployed with VPLEX, it is only used as a last resort in the unlikely event that a full VPLEX cluster has been lost (this would be deemed a double failure since a single VPLEX cluster has no SPOFs) or the WAN has failed/been partitioned.

3. Non-uniform access is typically more efficient when compared to uniform access since, under normal conditions, all I/O is handled by the local active controllers (all controllers are active).

4. Interestingly, due to the active/active nature of VPLEX, should a full site outage occur, VPLEX does not need to perform a failover since the remaining copy of the data was already active. This is another key difference when compared to uniform access, where if the primary active node is lost, a failover to the passive node is required.

The diagram below shows a high-level architecture of VPLEX when distributed over a Metro distance:

Figure 11 VPLEX non-uniform access layout

As we can see in Figure 11, each host is only connected to the local VPLEX cluster, ensuring that I/O from either location is always serviced by the local storage controllers. VPLEX can achieve this because all of the controllers at both sites are in an active state and able to service I/O. Some other key points to observe from the diagram are:

1. Storage devices behind VPLEX are only connected to their respective local VPLEX cluster and are not connected across the WAN, dramatically simplifying fabric design.

2. VPLEX has dedicated redundant WAN ports that can be connected natively to either 10Gb Ethernet or 8Gb FC.

3. VPLEX has multiple active controllers in each location, ensuring there are no local single points of failure. With up to eight controllers in each location, VPLEX provides N+1 redundancy.

4. VPLEX uses and maintains single disk semantics across clusters at two different locations.

VPLEX with cross-connect and non-uniform mode

Using VPLEX Metro with a cross cluster connect configuration (up to 1ms round-trip time) is sometimes referred to as "VPLEX in uniform mode", since each ESXi host is now connected to both the local and remote VPLEX clusters. While on the surface this does look similar to uniform mode, it still typically functions in a non-uniform mode. This is because, under the covers, all VPLEX directors remain active and able to serve data locally, maintaining the efficiencies of the VPLEX cache-coherent architecture. Additionally, when using cross-connected clusters it is recommended to configure the ESXi servers so that the cross-connected paths are only standby paths. Therefore, even with a VPLEX cross-connected configuration, I/O is still serviced locally from each local VPLEX cluster and does not traverse the cross-connect link. The diagram below shows an example of this:

Figure 12 High-level VPLEX cross-connect with non-uniform I/O access

In Figure 12, each ESXi host now has an alternate path (in standby) to the remote VPLEX cluster. Compared to the typical uniform diagram in the previous section, however, we can still see that the underlying VPLEX architecture differs significantly, since it remains identical to the non-uniform layout, servicing I/O locally at either location.


VPLEX with cross-connect and forced uniform mode

Although VPLEX functions primarily in a non-uniform model, there are certain conditions under which VPLEX can sustain a type of uniform access mode. One such condition is if cross-connect is used and certain failures occur, causing uniform mode to be forced. One of the scenarios where this may occur is when VPLEX and the cross-connect network are using physically separate channels and the VPLEX clusters are partitioned while the cross-connect network remains in place.

The diagram below shows an example of this:

Figure 13 Forced uniform mode due to WAN partition

As illustrated in Figure 13, VPLEX will invoke the "site preference rule", suspending access to a given distributed virtual volume at one of the locations (in this case site B). This ultimately means that I/O at site B has to traverse the link to site A, since the VPLEX controller path at site B is now suspended due to the preference rule. Another scenario where this might occur is if one of the VPLEX clusters at either location becomes isolated or destroyed. The diagram below shows an example of a localized rack failure at site B which has taken the VPLEX cluster at site B offline.


Figure 14 VPLEX forced uniform mode due to cluster failure

In this scenario the VPLEX cluster remains online at site A (through VPLEX Witness), and any I/O at site B will automatically access the VPLEX cluster at site A over the cross-connect, thereby turning the standby path into an active path. In summary, VPLEX can use "forced uniform" mode as a failsafe to ensure that the highest possible level of availability is maintained at all times.

Note: Cross-connected VPLEX clusters are only supported with distances up to 1 ms round trip time.


Combining VPLEX HA with VMware HA and/or FT

Due to its core design, EMC VPLEX Metro provides the perfect foundation for VMware Fault Tolerance and High Availability clustering over distance, ensuring simple and transparent deployment of stretched clusters without any added complexity.

vSphere HA and VPLEX Metro HA (federated HA)

VPLEX Metro takes a single block storage device in one location and "distributes" it to provide single disk semantics across two locations. This enables a "distributed" VMFS datastore to be created on that virtual volume. Furthermore, if the Layer 2 network has also been "stretched", then a single vSphere instance (including a single logical datacenter) can now also be "distributed" into more than one location and VMware HA can be enabled for any given vSphere cluster. This is possible since the storage federation layer of VPLEX is completely transparent to ESXi; it therefore enables the user to add ESXi hosts at two different locations to the same HA cluster. Stretching an HA failover cluster (such as VMware HA) with VPLEX creates a "Federated HA" cluster over distance. This blurs the boundaries between local HA and disaster recovery, since the configuration has the automatic restart capabilities of HA combined with the geographical distance typically associated with synchronous DR.

Figure 15 VPLEX Metro HA with vSphere HA
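
Because the VPLEX distributed volume is presented to ESXi like any ordinary SAN LUN, the shared datastore is created with the standard VMFS workflow. The following is a minimal, hedged pyVmomi sketch; the vCenter address, host name, datastore name and device ID are illustrative assumptions rather than values from this paper.

```python
# Hedged sketch: format a VPLEX distributed volume with VMFS so it can back a
# stretched-cluster datastore. Host, vCenter and naa identifiers are examples.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

DEVICE_ID = "naa.6000144000000010a00e8d4c2f5e0001"   # VPLEX distributed volume (example)

ctx = ssl._create_unverified_context()
si = SmartConnect(host="vc.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    host = next(h for h in view.view if h.name == "esxi-a1.example.com")
    ds_system = host.configManager.datastoreSystem

    # Find the unformatted VPLEX device, ask ESXi for a suggested VMFS layout,
    # accept the first suggestion and name the new datastore.
    disk = next(d for d in ds_system.QueryAvailableDisksForVmfs()
                if d.canonicalName == DEVICE_ID)
    options = ds_system.QueryVmfsDatastoreCreateOptions(devicePath=disk.devicePath)
    spec = options[0].spec
    spec.vmfs.volumeName = "VPLEX_Distributed_DS01"
    datastore = ds_system.CreateVmfsDatastore(spec=spec)
    print("Created datastore:", datastore.name)
finally:
    Disconnect(si)
```

Once the datastore exists on one host, a storage rescan on the remaining ESXi hosts at both sites should make the same VMFS volume visible to every node in the stretched cluster.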


For detailed technical setup instructions please see the VPLEX Procedure Generator (Configuring a distributed volume) as well as the "VMware vSphere® Metro Storage Cluster Case Study" white paper found here: http://www.vmware.com/files/pdf/techpaper/vSPHR-CS-MTRO-STORCLSTR-USLET-102-HI-RES.pdf for additional information around:

• Setting up Persistent Device Loss (PDL) handling
• vCenter placement options and considerations
• DRS enablement and affinity rules
• Controlling restart priorities (High/Medium/Low)

A hedged example covering two of these items appears after this list.
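
The sketch below illustrates, with pyVmomi, how an HA advanced option commonly paired with PDL handling in vSphere 5.0 Update 1 (das.maskCleanShutdownEnabled) and a per-VM restart priority could be set. The cluster name and VM name are assumptions, the cluster is assumed to already have HA enabled, and the authoritative list of settings remains the VMware documentation referenced above.

```python
# Hedged sketch: set an HA (das) advanced option and a per-VM restart priority
# on a stretched cluster via pyVmomi. Cluster and VM names are example values.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host="vc.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)
try:
    content = si.RetrieveContent()
    cl_view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in cl_view.view if c.name == "Stretched-Cluster-01")
    vm_view = content.viewManager.CreateContainerView(
        cluster, [vim.VirtualMachine], True)
    critical_vm = next(v for v in vm_view.view if v.name == "critical-app-01")

    spec = vim.cluster.ConfigSpecEx()
    # HA advanced option used alongside PDL-triggered VM termination in 5.0 U1.
    spec.dasConfig = vim.cluster.DasConfigInfo(
        option=[vim.option.OptionValue(key="das.maskCleanShutdownEnabled",
                                       value="True")])
    # Restart this VM first after a host or site failure.
    spec.dasVmConfigSpec = [vim.cluster.DasVmConfigSpec(
        operation=vim.option.ArrayUpdateOperation.add,
        info=vim.cluster.DasVmConfigInfo(
            key=critical_vm,
            dasSettings=vim.cluster.DasVmSettings(restartPriority="high")))]
    task = cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
    print("Cluster reconfigure task:", task.info.key)
finally:
    Disconnect(si)
```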

Use Cases for federated HA

A federated HA solution is an ideal fit if a customer has two datacenters that are no more than 5ms (round trip latency) apart and wants to enable an active/active datacenter design whilst also significantly enhancing availability. This type of solution brings several key business continuity capabilities, including downtime and disaster avoidance as well as fully automatic service restart in the event of a total site outage. This type of configuration would also need to be deployed with a stretched Layer 2 network to ensure seamless network capability regardless of which location the VM runs in.

Datacenter pooling using DRS with federated HA

Another useful feature of federated HA solutions is that VMware DRS (Distributed Resource Scheduler) can be enabled and functions relatively transparently within the stretched cluster. Using DRS effectively means that the vCenter/ESXi server load can be distributed over two separate locations, driving up utilization and using all available, formerly passive, assets. Effectively, with DRS enabled, the configuration can be considered as two physical datacenters acting as a single logical datacenter. This has significant benefits since it brings what were once passive assets at a remote location into a fully active state. To enable this functionality, DRS can simply be switched on within the stretched cluster and configured by the user to the desired automation level. Depending on the setting, VMs will then automatically start to distribute between the datacenters (for more details, please read http://www.vmware.com/files/pdf/techpaper/vSPHR-CS-MTRO-STORCLSTR-USLET-102-HI-RES.pdf).

Note: A design consideration to take into account if DRS is desired within a solution is to ensure that there are enough compute and network resources at each location to take the full load of the business services should either site fail.
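A minimal sketch of switching DRS on in fully automated mode follows. It is illustrative only, reuses the same hypothetical cluster lookup as the earlier HA sketch, and leaves the migration threshold at its default; the behavior value is passed as a string, which pyVmomi accepts for enum-typed properties.

    # Hedged sketch: enable DRS in fully automated mode on the stretched cluster.
    from pyVim.connect import SmartConnect
    from pyVmomi import vim

    si = SmartConnect(host="vcenter.example.com", user="administrator",
                      pwd="password")
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == "Stretched-Cluster")

    # Enable DRS in fully automated mode; the migration threshold is left at
    # the default value of 3 (valid range 1-5).
    drs = vim.cluster.DrsConfigInfo(enabled=True,
                                    defaultVmBehavior="fullyAutomated",
                                    vmotionRate=3)
    cluster.ReconfigureComputeResource_Task(
        vim.cluster.ConfigSpecEx(drsConfig=drs), modify=True)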

Avoiding downtime and disasters using federated HA and vMotion

Another important feature of a federated HA solution with vSphere is the ability to avoid planned as well as unplanned downtime. This is achievable using the vMotion capability of vCenter to move a running VM (or group of VMs) to any ESXi server in another (physical) datacenter. Since the vMotion capability is now federated over distance, planned downtime can be avoided for events that affect an entire datacenter location. For instance, suppose a power upgrade at datacenter A will result in the power being offline for two hours. Downtime can be avoided since all running VMs at site A can be moved to site B before the outage. Once the outage has ended, the VMs can be moved back to site A using vMotion while keeping everything completely online. This use case can also be employed for anticipated, yet unplanned, events. For instance, if a hurricane is in close proximity to your datacenter, this solution brings the ability to move the VMs elsewhere, avoiding any potential disaster.

Note: During a planned event where power will be taken offline, it is best to engage EMC support to bring the VPLEX down gracefully. However, in a scenario where time does not permit (perhaps a hurricane), it may not be possible to involve EMC support. In this case, if site A were destroyed there would still be no interruption, assuming the VMs were vMotioned ahead of time, since VPLEX Witness would ensure that the surviving site keeps full access to the storage volume once site A has been powered off. Please see the section Failure scenarios and recovery using federated HA below for more details.
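To make the planned-outage example concrete, the hedged sketch below moves every powered-on VM off the site A hosts to a host at site B. The host naming convention ("esx-a*", "esx-b*") and credentials are invented for illustration, and a real evacuation would normally also check capacity and task results.

    # Hedged sketch: vMotion all running VMs from site A hosts to a site B host
    # ahead of a planned site A power outage. Names below are placeholders.
    from pyVim.connect import SmartConnect
    from pyVmomi import vim

    si = SmartConnect(host="vcenter.example.com", user="administrator",
                      pwd="password")
    content = si.RetrieveContent()
    hosts = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True).view

    # Hypothetical naming convention: site A hosts start with "esx-a",
    # site B hosts with "esx-b".
    site_a_hosts = [h for h in hosts if h.name.startswith("esx-a")]
    site_b_host = next(h for h in hosts if h.name.startswith("esx-b"))

    for host in site_a_hosts:
        for vm in host.vm:
            if vm.runtime.powerState == vim.VirtualMachinePowerState.poweredOn:
                # Compute-only vMotion: the VPLEX distributed volume means the
                # same datastore is already presented to the site B hosts.
                vm.MigrateVM_Task(
                    host=site_b_host,
                    priority=vim.VirtualMachineMovePriority.highPriority)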


Failure scenarios and recovery using federated HA

This section addresses the different types of failures and shows how, in each case, VMware HA is able to continue or restart operations, ensuring maximum uptime. The configuration below is a representation of a typical federated HA solution:


Figure 16 Typical VPLEX federated HA layout (multi-node cluster)


The table below shows the different failure scenarios and the outcome:

Failure: Storage failure at site A
VMs at A: Remain online / uninterrupted
VMs at B: Remain online / uninterrupted
Notes: A cache read miss at site A now incurs additional link latency; cache read hits and write I/O response times are unchanged.

Failure: Storage failure at site B
VMs at A: Remain online / uninterrupted
VMs at B: Remain online / uninterrupted
Notes: A cache read miss at site B now incurs additional link latency; cache read hits and write I/O response times are unchanged.

Failure: VPLEX Witness failure
VMs at A: Remain online / uninterrupted
VMs at B: Remain online / uninterrupted
Notes: Both VPLEX clusters dial home.

Failure: All ESXi hosts fail at A
VMs at A: All VMs are restarted automatically on the ESXi hosts at site B
VMs at B: Remain online / uninterrupted
Notes: Once the ESXi hosts are recovered, DRS (if configured) will move the VMs back automatically.

Failure: All ESXi hosts fail at B
VMs at A: Remain online / uninterrupted
VMs at B: All VMs are restarted automatically on the ESXi hosts at site A
Notes: Once the ESXi hosts are recovered, DRS (if configured) will move the VMs back automatically.

Failure: Total cross-connect failure (if using cross-connect)
VMs at A: Remain online / uninterrupted
VMs at B: Remain online / uninterrupted
Notes: Cross-connect is not normally in use and access remains non-uniform.

Failure: Full WAN failure (no cross-connect in place) and VPLEX preference at site A
VMs at A: Remain online / uninterrupted
VMs at B: Distributed volume suspended at B and Persistent Device Loss (PDL) sent to ESXi servers at B, causing VMs to die. This invokes an HA restart and VMs start coming online at A.
Notes: Site B notes only valid for ESXi 5.0 update 1 and above. ESXi versions prior to 5.0 update 1 will require manual intervention for VMs at site B. Use DRS site affinity to avoid manual intervention for older versions.

Failure: Full WAN failure (no cross-connect in place) and VPLEX preference at site B
VMs at A: Distributed volume suspended at A and Persistent Device Loss (PDL) sent to ESXi servers at A, causing VMs to die. This invokes an HA restart and VMs start coming online at B.
VMs at B: Remain online / uninterrupted
Notes: Site A notes only valid for ESXi 5.0 update 1 and above. ESXi versions prior to 5.0 update 1 will require manual intervention for VMs at site A. Use DRS site affinity to avoid manual intervention for older versions.

Failure: WAN failure with cross-connect intact
VMs at A: Remain online / uninterrupted
VMs at B: Remain online / uninterrupted
Notes: Cross-connect is now in use for the hosts at the "non-preferred" site (this is called forced uniform mode).

Failure: Full WAN failure with cross-connect partitioned and VPLEX preference at site A
VMs at A: Remain online / uninterrupted
VMs at B: Distributed volume suspended at B and Persistent Device Loss (PDL) sent to ESXi servers at B, causing VMs to die. This invokes an HA restart and VMs start coming online at A.
Notes: Site B notes only valid for ESXi 5.1 and above. ESXi versions prior to 5.1 (including 5.0 update 1) will require manual intervention for VMs at site B. Use DRS site affinity to avoid manual intervention for older versions. *See note below.

Failure: Full WAN failure with cross-connect partitioned and VPLEX preference at site B
VMs at A: Distributed volume suspended at A and Persistent Device Loss (PDL) sent to ESXi servers at A, causing VMs to die. This invokes an HA restart and VMs start coming online at B.
VMs at B: Remain online / uninterrupted
Notes: Site A notes only valid for ESXi 5.1 and above. ESXi versions prior to 5.1 (including 5.0 update 1) will require manual intervention for VMs at site A. Use DRS site affinity to avoid manual intervention for older versions. *See note below.

Failure: VPLEX cluster outage at A (with cross-connect)
VMs at A: Remain online / uninterrupted
VMs at B: Remain online / uninterrupted
Notes: Highly unlikely since VPLEX has no single point of failure; a full site failure is more likely.

Failure: VPLEX cluster outage at B (with cross-connect)
VMs at A: Remain online / uninterrupted
VMs at B: Remain online / uninterrupted
Notes: Highly unlikely since VPLEX has no single point of failure; a full site failure is more likely.

Failure: VPLEX cluster outage at A (without cross-connect)
VMs at A: ESXi detects an all paths down (APD) condition; VMs cannot continue and are not restarted.
VMs at B: Remain online / uninterrupted
Notes: Highly unlikely since VPLEX has no single point of failure; a full site failure is more likely.

Failure: VPLEX cluster outage at B (without cross-connect)
VMs at A: Remain online / uninterrupted
VMs at B: ESXi detects an all paths down (APD) condition; VMs cannot continue and are not restarted.
Notes: Highly unlikely since VPLEX has no single point of failure; a full site failure is more likely.

Failure: Full site failure at A
VMs at A: Since VPLEX Witness ensures that the datastore remains online at B, all VMs at A die but are restarted automatically at B.
VMs at B: Remain online / uninterrupted
Notes: A disaster recovery solution would need a manual decision at this point, whereas the VPLEX HA layer ensures fully automatic operation with minimal downtime.

Failure: Full site failure at B
VMs at A: Remain online / uninterrupted
VMs at B: Since VPLEX Witness ensures that the datastore remains online at A, all VMs at B die but are restarted automatically at A.
Notes: A disaster recovery solution would need a manual decision at this point, whereas the VPLEX HA layer ensures fully automatic operation with minimal downtime.

Table 3 Federated HA failure scenarios
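Several of the notes above recommend DRS site affinity so that VMs normally run at the preferred site for their distributed volume. The sketch below shows one possible way to express that with DRS host and VM groups plus a non-mandatory "should run on" rule. It is an assumption-laden illustration: the group names, host and VM naming conventions are invented, string enum values are relied on, and older vSphere/pyVmomi combinations may require different constructs, so treat it only as a starting point.

    # Hedged sketch: create a site A host group, a site A VM group, and a
    # non-mandatory "should run on hosts in group" rule. All names/members are
    # hypothetical; verify the data objects against your vSphere API version.
    from pyVim.connect import SmartConnect
    from pyVmomi import vim

    si = SmartConnect(host="vcenter.example.com", user="administrator",
                      pwd="password")
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == "Stretched-Cluster")

    # Hypothetical membership: hosts named "esx-a*" and VMs named "siteA-*"
    # belong at site A (the preferred site for their distributed volumes).
    hosts_a = [h for h in cluster.host if h.name.startswith("esx-a")]
    vms_a = [v for v in cluster.resourcePool.vm if v.name.startswith("siteA-")]

    spec = vim.cluster.ConfigSpecEx(
        groupSpec=[
            vim.cluster.GroupSpec(operation="add",
                                  info=vim.cluster.HostGroup(name="SiteA-Hosts",
                                                             host=hosts_a)),
            vim.cluster.GroupSpec(operation="add",
                                  info=vim.cluster.VmGroup(name="SiteA-VMs",
                                                           vm=vms_a)),
        ],
        rulesSpec=[
            vim.cluster.RuleSpec(operation="add",
                                 info=vim.cluster.VmHostRuleInfo(
                                     name="SiteA-VMs-should-run-at-SiteA",
                                     enabled=True,
                                     mandatory=False,  # "should run", not "must run"
                                     vmGroupName="SiteA-VMs",
                                     affineHostGroupName="SiteA-Hosts")),
        ])
    cluster.ReconfigureComputeResource_Task(spec, modify=True)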

Note: In a full WAN partition that includes the cross-connect, VPLEX can only send the SCSI sense code (2/4/3+5) across 50% of the paths, since the cross-connected paths are effectively dead. When using ESXi 5.1 and above, ESXi servers at the non-preferred site will declare PDL and kill VMs, causing them to restart elsewhere (assuming the advanced settings are in place); however, ESXi 5.0 update 1 and below will only declare APD (even though VPLEX is sending sense code 2/4/3+5). This will result in a VM zombie state. Please see the section Path loss handling semantics (PDL and APD) for more details.


vSphere FT and VPLEX Metro (federated FT)

Deploying VMware FT on top of a VPLEX Metro HA configuration goes another step beyond traditional availability (even when compared to federated HA) by enabling a "continuous availability" type of solution. This means that for any failure there is no downtime whatsoever (zero RPO and zero RTO). The figure below shows a high-level view of a federated FT configuration whereby a two-node ESXi cluster is distributed over distance and two VMs are configured with secondary VMs at the remote locations in a bidirectional configuration.




Figure 17 VPLEX Metro HA with vSphere FT (federated FT)

Use cases for a federated FT solution

This type of solution is an ideal fit if a customer has two datacenters that are no more than 1 ms (round-trip latency) apart (typically associated with campus-type distances). If the customer wants to protect the most critical parts of the business at the highest tier, enabling continuous availability, then an active/active datacenter design can be enabled whereby one datacenter is effectively kept in full lock step with the other. This type of configuration can be thought of as two datacenters configured using RAID-1, where the D in RAID now stands for datacenter rather than disk (Redundant Array of Independent Datacenters).


Similar to federated HA, this type of configuration requires a stretched layer 2 network to ensure seamless capability regardless of which location the VM runs in.

Note: A further design consideration to take into account is that any limitation that exists with VMware FT compared to HA will also apply to the federated FT solution. Currently, with vSphere 5.1 and earlier, VMware FT can only support a single vCPU per VM. See the following paper for more details: http://www.vmware.com/files/pdf/fault_tolerance_recommendations_considerations_on_vmw_vsphere4.pdf.
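As an illustration of the single-vCPU constraint just described, the hedged sketch below checks the vCPU count before asking vCenter to create the secondary VM, which is how FT is switched on through the vSphere API. The VM name is hypothetical, and placement of the secondary at the opposite site is assumed to be handled by DRS affinity or explicit host selection.

    # Hedged sketch: enable FT for one VM, refusing if it has more than one vCPU
    # (the vSphere 5.x limitation discussed above). VM name is a placeholder.
    from pyVim.connect import SmartConnect
    from pyVmomi import vim

    si = SmartConnect(host="vcenter.example.com", user="administrator",
                      pwd="password")
    content = si.RetrieveContent()
    vms = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True).view
    vm = next(v for v in vms if v.name == "tier1-app-01")  # hypothetical VM name

    if vm.config.hardware.numCPU != 1:
        raise SystemExit("vSphere 5.x FT only supports single-vCPU VMs")

    # Creating the secondary turns Fault Tolerance on for the VM; an optional
    # host argument (omitted here) can be used to place the secondary explicitly.
    vm.CreateSecondaryVM_Task()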

Failure scenarios and recovery using federated FT

This section addresses the different types of failures and shows how, in each case, VMware FT is able to keep the service online without any downtime. The configuration below shows a typical federated FT solution using a two-node cluster with cross-connect over a physically separate network from the VPLEX WAN.


Figure 18 Typical VPLEX federated FT layout (2 node cluster)


The table below shows the different failure scenarios and the outcome:

Failure: Storage failure at A
VM state (assuming primary at A): Remain online / uninterrupted
VM using primary or secondary: Primary
Notes: Cache read hits remain the same, as do write I/O response times. A cache read miss at A now incurs additional link latency (