Undercity of Virtualization
In the deep darkness of IT Infrastructure, there is too much to learn and so many ways to go ...
Saturday, January 28, 2023
I will start a new journey soon ...
Wednesday, November 30, 2022
VMware VDI (Horizon View) Troubleshooting - Part VI
Before everything, please accept my apologies for not posting for nearly two months. The primary reason for this gap is the recent catastrophic suppression inside my homeland, Iran. During these two months of Iranian protests, the IRGC military forces, like brutal alien mercenaries, murdered many of my people, including more than 60 children, because they are fighting for their freedom and shouting the “Woman, Life, Freedom” slogan. This revolution has disrupted the lives of all Iranians around the world and plunged us into deep anger and sadness. Thanks to the people of other countries for their support; I hope they keep it up until freedom is achieved for all.
For an Instant Clone desktop pool, if you change any of the related vSphere objects, including the cluster, snapshot, master image, and so on, you should schedule a Maintain operation to push the new snapshot of the modified image. Otherwise the desktop pool cannot find the previously defined objects and provisioning operations stop. This does not affect desktops that are already provisioned, because they have already been generated and registered with the related Horizon Connection Servers. For new machines, however, Horizon reports errors saying that it is not possible to provide new desktops, and the corresponding VM generation inside the vSphere environment fails too.
So run the Maintain wizard and fix the issue by adding the new candidate snapshot for future desktop deployment. Note also that if you want to modify any vSphere objects that are related to Horizon, you should double-check them before editing.
Changing the AD credential, or letting the account expire, makes desktop generation fail in its final steps. Changing the related OU structure breaks new desktop provisioning at the computer account (SID) generation step. So, as with vSphere modifications, keep in mind that if you change Active Directory object values, you should immediately update them in the corresponding Desktop Pool settings inside the Horizon administration console. The following error reports that computer account creation has failed.
Review the Domain definitions, for example all related Active Directory objects such as the OU and account credentials. Also, if you delegate control on a specific OU for VDI desktops, check its AD privileges.
Review all vSphere hierarchy objects that are related to your VDI: Datacenter, Cluster, VM folder, Golden Image VM name and the related Snapshot.
Also review the defined permissions in both environments (Active Directory/vSphere) if you cannot identify the root cause of the errors or if the infrastructure has changed recently.
If you removed or changed the VM’s snapshot manually or through actions like Disk Consolidation, you should provide a new snapshot that meets the required attributes for an Instant Clone desktop pool.
Regardless of the benefits of an Instant Clone desktop pool (like faster desktop provisioning and deployment), you should note that desktops deployed via this method are naturally harder to maintain and troubleshoot because of their complex structure. Instant Clone generates a lot of vSphere objects for the Horizon environment, like the Internal Template, Replica and Parent VMs, and the Clone itself. A Full Image desktop, in contrast, is a fully independent VM generated from a candidate Template, although the whole preparation procedure has many manual steps and takes more time.
Check the Events in the monitoring section. If you don’t find a related error there, you have to dig into the log files in the following default path:
C:\ProgramData\VMware\VDM\logs\
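When the Events view shows nothing useful, a first pass over that folder can be scripted. The following is only a minimal sketch in Python: the keywords ("ERROR", "FATAL") and the match-every-file pattern are my assumptions for illustration, not a documented Horizon log format, so adjust them to whatever your log lines actually contain.

```python
from pathlib import Path


def find_log_lines(log_dir, keywords=("ERROR", "FATAL"), pattern="*"):
    """Return (file name, line number, text) for every line containing a keyword.

    keywords and pattern are illustrative assumptions, not a Horizon standard.
    """
    root = Path(log_dir)
    if not root.is_dir():
        return []
    hits = []
    for path in sorted(root.glob(pattern)):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a log contains odd bytes
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if any(word in line for word in keywords):
                    hits.append((path.name, number, line.rstrip()))
    return hits


if __name__ == "__main__":
    # Default path quoted in the post; run this on the Connection Server itself.
    for name, number, text in find_log_lines(r"C:\ProgramData\VMware\VDM\logs"):
        print(f"{name}:{number}: {text}")
```

It only narrows down where to start reading; the surrounding lines in the log file still carry the real context.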
As the last conceptual point, to truly troubleshoot a service we need to understand deeply what is going on inside it and how its network connections and communication work, and VMware Horizon View is no exception. As an important use case, I have often seen IT staff who didn't know how to configure the firewalls between the different parts of a VDI environment, especially when the VLANs/subnets of the desktop pools and the management servers are separated. For example, the Blast Extreme protocol has two sides: the initial connection step uses port TCP/UDP 8443, but the next step, the communication between the VDI client and the Desktop/RDSH, uses TCP/UDP 22443. So you should understand the difference in both the setup and the troubleshooting steps.
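A quick way to see whether a firewall between two subnets blocks either Blast step is to probe both ports from the client side. This is a minimal sketch in Python under my own assumptions: it tests only the TCP side (a UDP port cannot be verified with a simple connect), and the host names you pass in are whatever your gateway and desktop are called.

```python
import socket

# Ports named in the post; only the TCP side is probed here.
BLAST_TCP_PORTS = {"gateway": 8443, "desktop": 22443}


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_blast_path(gateway_host, desktop_host):
    """Probe both Blast steps and return a small result table."""
    return {
        f"{gateway_host}:8443": tcp_port_open(gateway_host, BLAST_TCP_PORTS["gateway"]),
        f"{desktop_host}:22443": tcp_port_open(desktop_host, BLAST_TCP_PORTS["desktop"]),
    }
```

Calling check_blast_path with your own gateway and desktop names from a machine in the client VLAN shows at a glance which of the two steps is blocked. A False result only says the TCP connection failed; it does not say which device dropped it.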
This post is dedicated to Kian Pirfalak, a 9-year-old boy who was killed by the regime forces. He was a very creative young boy and dreamed of becoming a Robotics Engineer in the future. R.I.P our lovely Son.
Tuesday, September 20, 2022
Bursting of new releases for the vSphere 7.0
- Security: Encrypted VM fails to power on Trusted Cluster containing an unattested host in migrating/cloning states, or in the HA/DRS-enabled cluster.
- Networking: VM might lose Ethernet traffic after hot-add, hot-remove or storage vMotion.
- Networking: IPv6 traffic fails to pass through VMkernel ports using IPsec.
- Networking: When upgrading from vSphere 6.7 to vSphere 7.0, high-throughput virtual machines may experience degraded network performance while NIOC is enabled.
- Storage: VOMA check is not supported for NVMe-based VMFS datastores and will fail with an error.
- Storage: After recovering from APD/PDL conditions, a VMFS datastore with support for clustered virtual disks enabled might remain inaccessible. The VMkernel log might show multiple "SCSI3 reservation conflict" messages.
- VSAN: VMs lose connectivity due to a network outage in the preferred site of a vSAN stretched cluster and remain inaccessible, although they should fail over to the secondary site.
- vCLS: System VMs that are added for ensuring healthy operations of the vSphere Cluster Services, might impact cluster and datastore maintenance workflows in vCenter 7.0 U1.
- vSphere client: Cross vCenter migration of a VM fails with an error: "The operation is not allowed in the current state".
- VM MGMT: The post customization section of the script runs before the guest customization, if you enable Cloud-Init in a Linux Guest OS.
- VM MGMT: Deploying an OVF or OVA template from a URL fails with a 403 Forbidden error. A local OVF deployment might also fail with an error if it contains files with non-ASCII characters in their names.
- VM MGMT: You cannot add or modify an existing network adapter on a virtual machine:
Wednesday, August 31, 2022
Which level of authentication?
- SFA: Although single-factor authentication is the easiest way, it's ideal for situations where the user has no way to enter his/her credentials into the system. For example, inside an industrial environment where there may be no keyboard for authentication, SFA is the best solution. Most SFA deployments rely on biometric factors like face detection, fingerprint reading, or eye scanners. So despite being single-factor, it's a secure way, especially for OT networks.
- 2FA: Two-factor authentication is the most popular method because it can easily block more than 90% of threats like brute forcing, password guessing, and dictionary attacks. Regardless of the possible limitations of any selected solution, I strongly suggest considering the SSO (Single Sign-On) attributes of your 2FA system. I mean that 2FA should be compatible with regular authentication services like Microsoft Active Directory or any common LDAP service, because this reduces the overall complexity of 2FA/MFA systems. Based on your traditional first level of authentication and on SSO/SAML integration, you can select the suitable option for your infra.
- MFA: Increasing the number of AUTH levels will definitely strengthen security. But you shouldn't ignore inhibiting factors, like how well users know how to use each authentication level of the configured MFA system. Next, we should provide a failover option at each authentication level. For example, a Help Desk team that can quickly reset TOTP paired tokens for problematic users, or a backup solution if a user loses their hardware token or the smartphone holding the software (app) token. Also consider mixing hardware and software solutions, and using an easy-to-use biometric method like an eye scanner inside the hard-working areas of your organization as a replacement for regular TOTP-based tokens.
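To make the TOTP tokens mentioned above less of a black box, here is a minimal sketch of how a time-based code is derived from a shared secret, following RFC 6238 with HMAC-SHA1 and using only the Python standard library. It is an illustration, not the implementation of any particular vendor's token; the function names and the one-step drift window are my own choices.

```python
import hashlib
import hmac
import struct
import time


def totp(secret: bytes, at_time=None, step=30, digits=6):
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) for a raw shared secret."""
    if at_time is None:
        at_time = time.time()
    counter = int(at_time) // step
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation, as in RFC 4226
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify(secret: bytes, submitted: str, at_time=None, step=30, window=1):
    """Accept codes from the current time step and +/- window steps (clock drift)."""
    if at_time is None:
        at_time = time.time()
    for shift in range(-window, window + 1):
        expected = totp(secret, at_time + shift * step, step, len(submitted))
        if hmac.compare_digest(expected, submitted):
            return True
    return False
```

Because the code depends only on the secret and the clock, "resetting a paired token" really means issuing a new secret and pairing it again, which is why the Help Desk needs a quick and safe procedure for it.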
Using all AUTH levels from the same provider can improve integration and compatibility, but it makes the system vulnerable to unknown architectural bugs like Zero-Days in that company's products. Combining it with a 3rd-party solution increases both the security and the complexity of the design, so you should consider all aspects of MFA system maintenance.
In conclusion, before constructing the final authentication system, check how well each selected AUTH level integrates with the others, and then plan a backup method in case you lose physical/logical access to the provided tokens.
Congrats, and enjoy your MFA solution.
Monday, August 22, 2022
Configure ESXi SNMPv3 via PowerCLI
In another post, I described configuring SNMPv3 via the VMware ESXi ESXCLI command line. In this post, I want to combine esxcli with PowerCLI cmdlets to automate the procedure: get the corresponding ESXi hosts inside the vSphere environment and set the required SNMP(v3) configuration on them. If you aren't aware of how to connect to the vCenter via PowerCLI, read this first.
As the initial step, I get a list of the ESXi hosts and put them inside a for loop, then call esxcli inside PowerCLI.
In the next step, I recommend building an arguments object that includes each field of the ESXi SNMP v3 configuration. Finally, we invoke the set command with the filled arguments. The configuration is then applied to each ESXi host selected by the condition (via the Where-Object cmdlet).
You can also verify the result by running esxcli system snmp get in the ESXi shell, or $esxcli.system.snmp.get.invoke() inside the PowerCLI connection.
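The shape of the procedure (filter the hosts, fill one set of arguments, apply it per host) can be sketched outside PowerCLI as well. The Python below does not talk to any host: it only renders the equivalent esxcli system snmp set command line per selected host so you can review it first. The host names and the SHA1/AES128 values are placeholders of mine, and the remaining SNMPv3 fields (engine ID, users, targets, enable) are deliberately left out.

```python
import shlex


def select_hosts(hosts, name_filter):
    """Keep only hosts whose name contains name_filter (the Where-Object step)."""
    return [host for host in hosts if name_filter in host]


def snmp_set_command(settings):
    """Render an 'esxcli system snmp set' command line from a dict of options."""
    parts = ["esxcli", "system", "snmp", "set"]
    for option, value in settings.items():
        parts += [f"--{option}", str(value)]
    return " ".join(shlex.quote(part) for part in parts)


if __name__ == "__main__":
    # Hypothetical host names and placeholder protocol choices.
    hosts = ["esxi01.lab.local", "esxi02.lab.local", "mgmt01.lab.local"]
    settings = {"authentication": "SHA1", "privacy": "AES128"}
    for host in select_hosts(hosts, "esxi"):
        print(host, "->", snmp_set_command(settings))
```

Keeping the arguments in one place is the point: every selected host receives exactly the same configuration, which is what the PowerCLI loop achieves with its arguments object.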
Sunday, July 31, 2022
Investigation around the vSphere objects via PowerCLI
A VM went missing inside a cluster, and we couldn't work out what had happened or which ESXi host it belonged to. I should mention that this is an enterprise environment that sadly has no logging solution such as vRealize Log Insight (vRLI) or a 3rd-party product like Splunk. So there was no way to sort, filter, and search through thousands of daily logs other than vSphere itself: the Monitor\Event section. We couldn't find any cause there, and sadly there was no time to inspect the log files of every ESXi host in the cluster to find out exactly what had occurred. However, I guessed that a Help Desk staff member had renamed the VM by mistake without telling any vSphere admin (which also points to a wrong access definition, because we should remove this privilege from their permission list). So I decided to inspect the event details via PowerCLI by running the Get-VIEvent cmdlet.
This problem prompted me to post some use cases for working with this useful PowerCLI cmdlet. In the following I will show you some practical examples:
1. As the first sample, you can list all events at the Warning severity level by running this:
2. In the second example, I ran a slightly more complex filter based on the start time, where the Event Type ID matches 'com.vmware.vc.authorization*'. You can also add an end date with the -Finish parameter.
3. As the last one, I ran the command against a cluster object named "CLS", kept the events whose log message includes a word like "Vm", and showed the result in the PowerShell GridView.
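The three filters above all follow the same logic: take a stream of events and keep those that pass every condition. The sketch below shows that logic in plain Python rather than PowerCLI. The record layout (dicts with time, severity, type_id and message keys) is an assumption made for illustration and is not the object model that Get-VIEvent returns.

```python
from fnmatch import fnmatchcase


def filter_events(events, severity=None, type_pattern=None,
                  start=None, finish=None, message_contains=None):
    """Return the events that pass every filter that was supplied.

    Each event is assumed to be a dict with the keys
    'time', 'severity', 'type_id' and 'message' (hypothetical layout).
    """
    selected = []
    for event in events:
        if severity is not None and event["severity"] != severity:
            continue
        if type_pattern is not None and not fnmatchcase(event["type_id"], type_pattern):
            continue
        if start is not None and event["time"] < start:
            continue
        if finish is not None and event["time"] > finish:
            continue
        # Case-insensitive substring match on the message text
        if (message_contains is not None
                and message_contains.lower() not in event["message"].lower()):
            continue
        selected.append(event)
    return selected
```

Passing only severity reproduces the first example, adding start and type_pattern gives the second, and message_contains covers the third.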
There are many other ways of combining and pipelining cmdlets to get the expected results. It just needs a little patience and a clear understanding of what you want to do. I hope your log management system is always in good shape.
Saturday, July 16, 2022
Desktop Pool deployment failure factors
While there are many reasons for a VDI deployment to stop, I want to investigate some cases of vDesktop provisioning failure and how to resolve them quickly, or how to bring the provisioning operation back to its normal mode. This matters especially for an Instant Clone Desktop Pool: because its architecture is more complicated than the Full Clone type, we can experience deployment failures, and sadly it's not possible to change all settings of this desktop pool easily. Modifying the Golden Image requires maintenance actions and running the Publish wizard, so any modification can lead to an unstable state of vDesktop generation. So let's check the most common situations:
- Accounts: Generally, two types of credentials are required to build Instant Clones. The first is an account for accessing the vSphere infrastructure, which can be part of vCenter SSO or any other connected LDAP repository. The second is an AD domain account used to join the OS of the deployed virtual machines to the domain. Modifying either of them in a maintenance interval without informing the VDI Admin team may stop the vDesktop deployment: VM deployment failure (vCenter account) or VDI error (AD account).
- Directories: Renaming the VM Folder or changing its hierarchy can cause the Reference VM to be lost and new deployments to fail. Although you can still select the new path through the Publish wizard, the virtual desktop recover option will stop working. In this situation you should know that it's not possible to edit the Desktop Pool easily, so as a good recommendation, first define the VM folder structure and precedence, then create the required desktop pools based on the design. It's the same story for AD OU changes, although it's easier to set the corresponding OU path inside the desktop pool edit section.
- Privileges: Some permissions are necessary for successfully creating a Virtual Desktop. Part one: vCenter privileges on each level of the vSphere object hierarchy, for automatically deploying virtual machines inside the Cluster and placing them in the corresponding VM folder. Part two: AD privileges for computer object creation inside the chosen OU. Both procedures have their own required permissions. Always follow these rules: do not modify the permissions defined for the Horizon Connection Server in the vSphere environment, and do not change the access granted to the VDI account that is authorized for domain joining. It's good to create a vSphere role with the required privileges and use it to grant permissions to the Horizon Service Account. For AD accounts, do not set a higher administrator level than is required. I think it's enough to delegate control with the required AD permissions to the mentioned account at the corresponding OU level.
- Defined Assets: If you change primary vSphere components that are selected as part of the Instant Clone desktop deployment, like the Cluster and Shared Datastores, this may break new vDesktop generation without any clear indication of what happened. Of course, you can investigate the detailed Horizon logs to find out what's going on, for example by checking this path (C:\Program Files\VMware\VMware View\Server\Broker\Logs), but it's a complicated and time-consuming troubleshooting operation. So as a good recommendation, define a naming pattern for each type of vSphere object and configure them all before starting the VDI construction.
- Name Resolution: Whenever you use FQDNs instead of IP addresses, changing the naming convention or any of the VDI-related DNS records may cause the Connection Server, Domain Controller, Event Database Server, and vCenter Server to lose each other. Best practice here is still to define all servers by their DNS names. For example, it's not possible to change the defined vCenter Server while even one related desktop pool exists (which in practice means never!). With DNS names, if you decide to change the network subnets, it's enough to update the DNS records and cache so the vCenter Server address resolves again. However, be careful: if you defined an Alias or CNAME record for the Horizon Server definition, never remove it.
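After any DNS or subnet change, it's worth confirming that every VDI-related name still resolves from the Connection Server. A minimal Python sketch of such a check follows. It uses only the operating system's resolver through the standard library, and the list of names to check is yours to supply; I invent none here.

```python
import socket


def resolve(name):
    """Return the sorted addresses a name resolves to, or [] if resolution fails."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP
    return sorted({info[4][0] for info in infos})


def unresolved(names):
    """Return the subset of names that do not resolve at all."""
    return [name for name in names if not resolve(name)]
```

Running unresolved() over the FQDNs of your Connection Server, vCenter Server, Domain Controller, and Event Database Server before and after a change gives a quick before/after comparison. It does not tell you whether the returned address is the right one, only that the record exists.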
In this post, I tried to mention some of the most likely failure factors. However, there are many other reasons for vDesktop provisioning failure that you may encounter in the future, like virtual machine snapshot issues (I think I should speak about them in another post). Before starting a VDI project, it's highly recommended to build the server and datacenter virtualization infrastructure carefully and with enough scalability to avoid unnecessary changes, especially object renaming, changing directory patterns, and so on.