• VMware Solutions and Business Continuity – Availability
• Fault Domain Manager (FDM)
• Fault Tolerance (FT) Overview
• Additional Resources
• Q&A
VMware Solutions and Business Continuity
VMware Offers Protection at Every Level
VMware HA
VMware HA:
• Provides automatic restart of virtual machines in case of physical host failures
• Removes the need for passive standby hardware and dedicated administrators
• Monitors virtual machines for failures, with no additional software required
VMware HA has several advantages over traditional failover solutions, including:
• Minimal setup
• Reduced complexity
• Reduced hardware cost and setup
• Increased application availability without the expense of additional failover hosts
• VMware DRS and vMotion integration
VMware HA in Action
[Figure: three hosts managed by vCenter Server run virtual machines A through F on shared storage (LUN 1 through LUN 6). Virtual machines A and B each appear twice in the diagram.]
Architecture of a VMware HA Cluster
[Figure: vCenter Server and three hosts. Each host runs a vCenter Server agent and a VMware HA agent.]
Enabling VMware HA
Enable VMware HA by creating a cluster or modifying a DRS cluster.
Configuring VMware HA Settings
Disable Host Monitoring when performing maintenance activities on the host.
Admission control helps ensure that sufficient resources are available to provide high availability.
Admission Control Policy Choices
| Policy | Description | Recommended use |
| --- | --- | --- |
| Number of host failures | Reserves enough resources to tolerate the specified number of host failures | When virtual machines have similar CPU and memory reservations |
| Percentage of resources reserved | Reserves the specified percentage of total capacity | When virtual machines have highly variable CPU and memory reservations |
| Specific failover host | Dedicates a host exclusively to failover service | To accommodate organizational policies that dictate the use of a passive failover host |
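The percentage-based policy can be pictured with a small sketch. It is a simplified model only: the names `Reservation` and `can_power_on` are hypothetical, and the product's actual calculation is not described in these slides and is likely to involve more detail.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    """CPU and memory figures, used for both capacity and reservations."""
    cpu_mhz: int
    mem_mb: int


def can_power_on(total, used, new_vm, reserved_pct):
    """Return True if new_vm fits without touching the failover reserve.

    Assumption: reserved_pct percent of total capacity is held back for
    failover, and only reservations count against the remainder.
    """
    usable_cpu = total.cpu_mhz * (100 - reserved_pct) / 100
    usable_mem = total.mem_mb * (100 - reserved_pct) / 100
    return (used.cpu_mhz + new_vm.cpu_mhz <= usable_cpu
            and used.mem_mb + new_vm.mem_mb <= usable_mem)


cluster = Reservation(cpu_mhz=10000, mem_mb=32768)
in_use = Reservation(cpu_mhz=6000, mem_mb=16000)
small_vm = Reservation(cpu_mhz=1000, mem_mb=4096)
large_vm = Reservation(cpu_mhz=2000, mem_mb=4096)

print(can_power_on(cluster, in_use, small_vm, reserved_pct=25))
print(can_power_on(cluster, in_use, large_vm, reserved_pct=25))
```

With 25% held back, the small virtual machine is admitted and the large one is refused because its CPU reservation would cut into the failover reserve.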
Configuring Virtual Machine Options
Configure options at the cluster level or per virtual machine.
VM restart priority determines the relative order in which virtual machines are restarted after a host failure.
Host isolation response determines what happens when a host loses its management (or service console) network connection but continues running.
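A minimal sketch of how a cluster-level restart priority combines with per-virtual-machine overrides to give a restart order. The function `restart_order`, the numeric ranks, and the treatment of a "disabled" priority are assumptions made for illustration.

```python
# Lower rank restarts first. The rank values are illustrative assumptions.
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


def restart_order(vms, cluster_default="medium", overrides=None):
    """Return the VM names in the order they would be restarted.

    A per-VM override wins over the cluster default. VMs whose effective
    priority is "disabled" are left out. sorted() is stable, so VMs with
    the same priority keep their input order.
    """
    overrides = overrides or {}
    effective = {vm: overrides.get(vm, cluster_default) for vm in vms}
    restartable = [vm for vm in vms if effective[vm] != "disabled"]
    return sorted(restartable, key=lambda vm: PRIORITY_RANK[effective[vm]])


order = restart_order(
    ["web", "db", "test", "batch"],
    cluster_default="medium",
    overrides={"db": "high", "test": "disabled", "batch": "low"},
)
print(order)
```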
Configuring Virtual Machine Monitoring
Restart a virtual machine if its VMware Tools Heartbeats are not received.
Determine how quickly failures are detected.
Set monitoring sensitivity for individual virtual machines.
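The monitoring options above can be sketched as a timeout check on VMware Tools heartbeats. The interval values and the names `FAILURE_INTERVAL_S`, `effective_sensitivity`, and `needs_restart` are assumptions for illustration, not settings taken from these slides.

```python
# Illustrative failure intervals in seconds; these numbers are assumptions.
FAILURE_INTERVAL_S = {"low": 120, "medium": 60, "high": 30}


def effective_sensitivity(vm_name, cluster_level, per_vm_overrides):
    """A per-VM sensitivity, when set, replaces the cluster-level one."""
    return per_vm_overrides.get(vm_name, cluster_level)


def needs_restart(seconds_since_heartbeat, sensitivity):
    """Flag a VM once its last heartbeat is older than the failure interval."""
    return seconds_since_heartbeat > FAILURE_INTERVAL_S[sensitivity]


overrides = {"db01": "high"}
for vm in ("db01", "web01"):
    level = effective_sensitivity(vm, "medium", overrides)
    print(vm, level, needs_restart(45, level))
```

Higher sensitivity detects failures sooner: with 45 seconds of silence, the virtual machine set to "high" is flagged while the one at "medium" is not.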
Fault Domain Manager/vSphere HA 5.0
FDM Architecture:
FDM uses an FDM agent on each host:
• The agent on one host acts as the master; the agents on all other hosts act as slaves
• Agents communicate over both the management network and storage devices
• FDM also introduces IPv6 support
• A new configuration mechanism reduces cluster configuration time to about 1 minute
• When HA is enabled, all FDM agents participate in an election to choose the master
A master is elected when:
• vSphere HA is enabled
• A master host fails
• A management network partition occurs
Heartbeat Datastore:
• A new feature of FDM is its ability to use heartbeat datastores
• Datastores are used for communication only when the management network is lost
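The election and re-election described above can be sketched as follows. The slides do not say how the winner is chosen, so the ranking rule here (most connected datastores, then the highest host ID as a tiebreaker) is an assumption for illustration, and `elect_master` is a hypothetical name.

```python
def elect_master(datastores_by_host):
    """Pick a master from a mapping of host ID to connected datastore count.

    Assumed rule: the host with the most datastores wins, and ties go to
    the host ID that compares highest as a string.
    """
    if not datastores_by_host:
        raise ValueError("no hosts available for election")
    return max(datastores_by_host, key=lambda h: (datastores_by_host[h], h))


hosts = {"host-10": 4, "host-22": 6, "host-31": 6}
master = elect_master(hosts)
print(master)

# A master failure triggers a new election among the remaining hosts.
survivors = {h: n for h, n in hosts.items() if h != master}
print(elect_master(survivors))
```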
FDM Master and Slave Views:
[Figures: Master View and Slave View.]
Heartbeat Datastores
Heartbeat datastores can be selected only after hosts have been added to the cluster.
Using the datastore as a method of communication allows a master to:
• Monitor the availability of slave hosts and the VMs running on them
• Determine whether a host has become network isolated rather than network partitioned
• Coordinate with other masters; VM ownership is coordinated through datastore communication
HA States
A new host property reports the HA state of a host. Possible states are:
• N/A (HA not configured)
• Election (master election in progress)
• Master (can be more than one)
• Connected (to master over network)
• Network partitioned
• Network isolated
• Dead
• Agent unreachable
• Initialization error
• Unconfig error
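A rough sketch of how a master might tell several of these states apart by combining network heartbeats with datastore heartbeats. The function `classify_slave` and the `isolation_flag` input (a marker an isolated host is assumed to leave on a heartbeat datastore) are hypothetical, and the logic is deliberately simplified.

```python
def classify_slave(network_heartbeat, datastore_heartbeat, isolation_flag=False):
    """Classify a slave host from the master's point of view.

    network_heartbeat:   the master still hears the host on the management network
    datastore_heartbeat: the host is still updating a heartbeat datastore
    isolation_flag:      assumed marker meaning the host declared itself isolated
    """
    if network_heartbeat:
        return "Connected"
    if not datastore_heartbeat:
        # Silent on both the network and the datastore.
        return "Dead"
    # Alive on storage but unreachable over the management network.
    return "Network isolated" if isolation_flag else "Network partitioned"


print(classify_slave(True, True))
print(classify_slave(False, True))
print(classify_slave(False, True, isolation_flag=True))
print(classify_slave(False, False))
```

Without the datastore heartbeat, the last three cases would look the same to the master, which is why the datastore channel matters when the management network is lost.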
Fault Tolerance
What Is FT?
FT:
• Provides a higher level of business continuity than VMware HA
• Provides zero downtime and zero data loss for applications
FT can be used for:
• Any application that needs to be available at all times
• Custom applications that have no other means of clustering
• Cases where high availability could be provided through Microsoft Cluster Service (MSCS), but MSCS is too complicated to configure and maintain
FT can be used with DRS:
• Fault-tolerant virtual machines benefit from better initial placement and are included in the cluster’s load-balancing calculations