Thursday, August 29, 2019

vSphere HA vs vCenter HA



 
My students often ask what vCenter HA (VCHA) really is and how it differs from vSphere HA.
vSphere HA is a cluster-level feature that increases the overall availability of the VMs in a cluster. It acts when an ESXi host crashes: HA restarts the VMs of the failed host on other available hosts in the cluster. HA works directly with the HA agent on each ESXi host and monitors the state of every host in the cluster through its heartbeats. So if a network segmentation, partition, or outage occurs and the ESXi host also cannot write its heartbeat to the shared datastore, HA considers the host failed and restarts its VMs elsewhere.
vCenter HA, by contrast, is a feature introduced in vSphere 6.5 and tied directly to the vCenter Server Appliance (VCSA). It builds a three-node cluster of the VCSA VM: an Active node (the primary vCenter Server), a Passive node (the secondary vCenter, which takes over after a failure), and a Witness node (which acts as a quorum). It addresses VCSA availability only. vCenter HA is available only for the VCSA because it relies on native PostgreSQL replication, and it provides more availability for this mission-critical service in the virtualization infrastructure.
According to VMware, when VCHA is enabled and vCenter fails, service is restored after about 2 to 4 minutes, depending on the vCenter configuration and inventory size. The VCHA activation process itself can be completed in less than 10 minutes.

Now I want to compare these two features with respect to several related aspects of IT infrastructure:

1. Network Complexity:

vCenter HA needs a dedicated network that is completely separate from the vCenter management network. To run a VCHA cluster successfully, you only need three static IP addresses (or dedicated FQDNs), one for each cluster node. (I always prefer to use a /29 subnet for them.) After the Active node fails, the Passive node automatically takes over the vCenter management traffic, and users just need to log in again to vCenter (VPXD through the API or the Web Client).
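To see why a /29 is a comfortable fit for three nodes, here is a small sketch using Python's standard `ipaddress` module. The 192.168.100.0/29 range and the order in which addresses are handed to the nodes are my own assumptions for illustration, not anything VCHA requires.

```python
import ipaddress

# Hypothetical private range for the VCHA network; replace it with your own.
vcha_net = ipaddress.ip_network("192.168.100.0/29")

# hosts() skips the network and broadcast addresses of the subnet.
usable = list(vcha_net.hosts())

# Assumed ordering: hand the first three usable addresses to the three nodes.
roles = ["Active", "Passive", "Witness"]
assignment = dict(zip(roles, usable))

for role, ip in assignment.items():
    print(f"{role:8} {ip}/{vcha_net.prefixlen}")

spare = len(usable) - len(roles)
print(f"usable addresses: {len(usable)}, spare: {spare}")
```

A /29 offers six usable addresses, so three remain free after the Active, Passive, and Witness nodes get theirs.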
A healthy vSphere HA operation, on the other hand, depends mostly on the cluster settings, so you don't need any extra network configuration specifically for HA. (In some situations you may only need to separate the host management and vMotion port groups, depending on network throughput.)


2. Network Isolation:

When the hosts of a cluster are partitioned from each other and a host also cannot send any heartbeat to the shared datastore, it is considered a failed host. HA then tries to restart all the VMs that were running on that host on other healthy hosts. I want to emphasize that, for the availability of the VMs that belong to the cluster, there are two mechanisms for detecting failures: network heartbeats (exchanged between the hosts over the management network) and storage heartbeats (inside the SAN area).
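The two checks can be sketched as a small decision function. This is only a simplified model of the logic described above, not the actual HA agent code: the names `HostState`, `classify_host`, and `should_restart_vms` are hypothetical, and real vSphere HA also applies a configurable host isolation response that is left out here.

```python
from enum import Enum


class HostState(Enum):
    HEALTHY = "healthy"
    NETWORK_PROBLEM = "partitioned or isolated, but still alive"
    FAILED = "failed"


def classify_host(network_heartbeat: bool, datastore_heartbeat: bool) -> HostState:
    """Classify a host from its two heartbeat signals (simplified model)."""
    if network_heartbeat:
        return HostState.HEALTHY
    # No network heartbeat: the datastore heartbeat tells us whether
    # the host itself is still running.
    if datastore_heartbeat:
        return HostState.NETWORK_PROBLEM
    return HostState.FAILED


def should_restart_vms(state: HostState) -> bool:
    """In this sketch, only a failed host triggers VM restarts elsewhere."""
    return state is HostState.FAILED


for net, ds in [(True, True), (False, True), (False, False)]:
    state = classify_host(net, ds)
    print(f"network={net}, datastore={ds} -> {state.name}, "
          f"restart VMs: {should_restart_vms(state)}")
```

The point of the second check is that a lost network heartbeat alone does not prove the host is dead.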
With a network segmentation between the vCenter HA nodes, however, we must ask what is really going on: between which nodes of the cluster did the separation happen? If the Active and Passive nodes, or even just the Active and Witness nodes, are still connected, there is no need to worry, because the Active node remains responsible for the VI management operations. But what happens if the Active node is the isolated one? It drops out of the VCHA cluster and stops serving, and the Passive node takes over its job.
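The cases above can be written down as a tiny decision function. This is a simplified model under my own assumptions (three boolean links, and a hypothetical function name `serving_node`), not the real VCHA election logic.

```python
def serving_node(active_passive: bool, active_witness: bool,
                 passive_witness: bool) -> str:
    """Return which node serves vCenter, given which links are up.

    Simplified model: a node keeps or takes the Active role only while
    it can reach at least one other node of the cluster.
    """
    if active_passive or active_witness:
        # The Active node still sees a peer, so it keeps serving.
        return "active"
    if passive_witness:
        # The Active node is isolated; Passive and Witness still see
        # each other, so the Passive node takes over.
        return "passive"
    # Every node is cut off from the others: nobody serves.
    return "none"


print(serving_node(True, True, True))     # all links up
print(serving_node(False, True, False))   # only Active-Witness link up
print(serving_node(False, False, True))   # Active node isolated
print(serving_node(False, False, False))  # every node isolated
```

The last case is the one where no node can see any other, so in this model the service is down.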

3. Multiple failures:

In the case of consecutive failures, the cluster can cope as long as it has enough resources (RAM and CPU), because vSphere HA keeps restarting VMs on the remaining available ESXi hosts. Just remember to check that the Admission Control Policy settings are sized to handle multiple ESXi host failures.
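A rough way to reason about "enough resources" is to count how many host failures the cluster can absorb in the worst case. The sketch below is a back-of-the-envelope estimate, not the algorithm that Admission Control actually uses: the `Host` class, the `tolerable_failures` function, and all the numbers are hypothetical, and it pessimistically assumes the largest hosts fail first for each resource.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    cpu_mhz: int
    ram_mb: int


def tolerable_failures(hosts: list[Host], vm_cpu_mhz: int, vm_ram_mb: int) -> int:
    """Count worst-case host failures the remaining hosts can still absorb."""
    # Sort ascending so that slicing keeps the smallest hosts as survivors.
    cpus = sorted(h.cpu_mhz for h in hosts)
    rams = sorted(h.ram_mb for h in hosts)
    tolerated = 0
    for failed in range(1, len(hosts)):
        survivors = len(hosts) - failed
        enough_cpu = sum(cpus[:survivors]) >= vm_cpu_mhz
        enough_ram = sum(rams[:survivors]) >= vm_ram_mb
        if not (enough_cpu and enough_ram):
            break
        tolerated = failed
    return tolerated


# Hypothetical cluster: four identical hosts and the total demand of all VMs.
cluster = [Host(f"esxi-{i}", cpu_mhz=20_000, ram_mb=131_072) for i in range(1, 5)]
print(tolerable_failures(cluster, vm_cpu_mhz=35_000, vm_ram_mb=200_000))
```

With these numbers, two surviving hosts still cover the demand but a single host does not, so the estimate is two tolerable failures.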
vCenter HA, on the other hand, is not designed for multiple failures, so after a second failure the VCHA cluster is no longer available or functional.

4. Utilization, Performance and Overhead:

There is a little overhead on the primary vCenter when VCHA is enabled, especially whenever vCenter Server has many tasks to process.
The Witness node needs the least CPU, because only the VCHA service runs on it. It is almost the same for the Passive node, which runs just VCHA and PostgreSQL. Memory usage is not a concern.
But if you want vSphere HA to work at its best, you must pay attention to the resources remaining in the cluster, because a bad HA configuration can make the cluster unstable. For the best performance across the whole cluster, calculate the availability rate from the remaining and used physical resources. Specifying at least two dedicated failover ESXi hosts to guard against failures can be a suitable HA configuration.





