Thursday, July 22, 2021

An unexpected error during VM migration!

In one of my projects, we were forced to change part of the AD infrastructure and redefine the newly established DNS servers for the vSphere environment. I should mention up front that no FQDN or IP address changed, because this was just a server replacement with the same configuration. Two days after things returned to normal, one of my team members in the virtualization support group told me we had a problem with vMotion and also with manual VM migration. He said that whenever we tried to move a VM between ESXi hosts inside the primary cluster, the compatibility check section of the migration operation showed irrelevant errors.

All hosts in this cluster were healthy at that moment, and all of them ran the same ESXi version and build number. We reviewed the corresponding ESXi checklist to make sure nothing unusual had happened around the hypervisor operations, and then worked through the following items to find the cause of the issue and fix it:

  1. Firewall settings on the source and destination ESXi hosts.
  2. The VMkernel interfaces' TCP/IP settings (IP address, subnet mask) and the default gateway.
  3. Firewall configuration on the new DCs and DNS servers.
  4. Capturing traffic between the two hosts via tcpdump-uw and analyzing the results.
  5. Reconfiguring the HA and DRS settings (I couldn't accept the risk of rebuilding the cluster, because of its critical role in the availability of the infrastructure services).
  6. Removing both of these ESXi hosts from the cluster and adding them back again.
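
For the connectivity-related items in this list, a quick reachability test can complement the firewall review and the packet capture. The following is only a minimal sketch, not what we actually ran: it assumes the check is made from a machine that can reach the vMotion VMkernel addresses, and that vMotion uses its usual TCP port 8000. The function name `tcp_port_open` is hypothetical.

```python
import socket

# Assumption: vMotion traffic uses its usual TCP port 8000.
VMOTION_PORT = 8000


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling `tcp_port_open("<vmkernel-ip>", VMOTION_PORT)` for the source and destination hosts gives a rough yes/no answer. A check from a workstation only approximates the host-to-host path, so a negative result is a hint to look closer, not a proof.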

But the problem remained. We even restored the latest VCSA backup, but sadly it made no difference (of course, I didn't replace the current vCenter VM with it). As a last resort, out of desperation, I decided to upgrade vCenter Server to a newer version, because I remembered that its current version was 6.7 U1 (6.7.0.20000). So I upgraded it to 6.7 U3m (6.7.0.47000). Strangely, after the successful upgrade everything worked fine again, and virtual machine migration was possible once more, just as before.
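
To spot this kind of version lag earlier, the installed version string can be compared with the target release. This is just a small sketch using the two version strings mentioned above; the helper names `parse_version` and `needs_upgrade` are my own and are not part of any VMware tool.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Split a dotted version such as '6.7.0.20000' into a tuple of integers."""
    return tuple(int(part) for part in version.strip().split("."))


def needs_upgrade(current: str, target: str) -> bool:
    """Return True if the current version is older than the target version."""
    return parse_version(current) < parse_version(target)


current = "6.7.0.20000"  # vCenter Server 6.7 U1, as in this post
target = "6.7.0.47000"   # vCenter Server 6.7 U3m, as in this post
print(needs_upgrade(current, target))  # True
```

The comparison is done on integers instead of strings on purpose: as plain text, "6.7.0.9000" would sort after "6.7.0.20000", which is the wrong answer.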

In conclusion, many of us have seen in similar situations that stable versions with a good track record can help IT staff fix many seemingly unreasonable and irrational issues in infrastructure services. So never forget to have a solid plan for patching, hotfixes, and upgrade operations.
