Saturday, January 28, 2023

I will start a new journey soon ...

In 2023 I want to start a new roadmap alongside my earlier experience and my documentation focused on virtualization with VMware products. The new posts will cover more Linux security and hardening with open-source solutions, and I will publish two of them soon. Please bear with me and forgive my absence in recent months. As you may know, I am still angry about the recent severe suppression of the revolutionary actions of Iranian society, especially the Woman, Life, Freedom movement. Still, I will start writing again.


Wednesday, November 30, 2022

VMware VDI (Horizon View) Troubleshooting - Part VI

First of all, please accept my apologies for not posting for nearly two months. The primary reason for this gap is the recent catastrophic suppression inside my homeland, Iran. During these two months of Iranian protests, the IRGC military forces, like brutal alien mercenaries, murdered many of my people, including more than 60 children, because they are fighting for their freedom and shouting the “Woman, Life, Freedom” slogan. This revolution has disrupted the lives of all Iranians around the world and plunged us into deep anger and sadness. Thanks to the people of other countries for their support; I hope they keep it up until freedom is achieved for all.

 

After a long time, I decided to continue the VDI troubleshooting series (Part V was the previous one). The Instant Clone desktop pool is one of the three major desktop provisioning types in VMware Horizon View. However, the Linked Clone type is deprecated in recent versions, so the real question of this topic is: which of the remaining types is suitable for your company’s VDI deployment, and why? Instant Clone or Full Clone?

For an Instant Clone desktop pool, if you change any of the related vSphere objects, including the cluster, snapshot, master image, and so on, you should schedule a Maintain operation to push the new snapshot of the modified image. Otherwise the desktop pool can’t find the previously defined objects, and provisioning operations will stop. This does not stop the desktops that are already provisioned, because they have been generated and registered with the related Horizon Connection Servers. For new machines, however, Horizon raises errors announcing that it’s not possible to provide new desktops, and the corresponding VM generation inside the vSphere environment fails too.

So run the Maintain wizard and fix the issue by adding the new candidate snapshot for future desktop deployment. Note that if you plan to modify any vSphere objects related to Horizon, you should double-check them before editing.

Changing the AD credentials, or letting the account expire, makes desktop generation fail in its final steps. Changing the related OU structure breaks new desktop provisioning at the computer account (SID) generation step. So, as with vSphere modifications, keep in mind that if you change Active Directory object values, you should fix them immediately in the corresponding desktop pool settings inside the Horizon administration console. The resulting error reports that computer account creation has failed.


Although the Instant Clone type is an efficient method for desktop generation, its maintenance operations are not easy. To avoid issues like those mentioned above, you can review the following checklist of considerations. Most of them are based on my personal experience in different VDI projects, so it would be my pleasure to read and add your similar experiences:

  1. Review the domain definitions, that is, all related Active Directory objects such as the OU and the account credentials. Also, if you delegate control on a specific OU for VDI desktops, check its AD privileges.

  2. Review all vSphere hierarchy objects related to your VDI: Datacenter, Cluster, VM folder, Golden Image VM name, and the related snapshot.

  3. Also review the permissions defined in both environments (Active Directory and vSphere) if you can’t find the root cause of the errors or if the infrastructure has changed recently.

  4. If you removed or changed the VM’s snapshot manually or through actions like disk consolidation, provide a new snapshot that meets the requirements of an Instant Clone desktop pool.

  5. Regardless of the benefits of an Instant Clone desktop pool (such as faster desktop provisioning and deployment), keep in mind that desktops deployed this way are naturally harder to maintain and troubleshoot because of their complex structure. The method generates a lot of vSphere objects for the Horizon environment, such as the Internal Template, Replica, and Parent VMs, and the clone itself. A Full Clone, in contrast, is a fully independent VM generated from a candidate template, although the whole preparation procedure has many manual steps and takes more time.

  6. Check the Events in the Monitoring section. If you don’t find a related error there, you have to dig into the log files in the following default path:

    C:\ProgramData\VMware\VDM\logs\
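
The last checklist item can be partly automated. The following is only a minimal Python sketch, not an official Horizon tool: it walks a log directory and collects every line that contains a keyword such as ERROR. The function name find_log_lines, the default keyword, and the assumption that the logs are plain text files ending in .log or .txt are mine for illustration, so adjust them to what you actually see in the folder.

```python
import os


def find_log_lines(log_dir, keyword="ERROR", extensions=(".log", ".txt")):
    """Return (file name, line number, text) for every line containing keyword."""
    matches = []
    for root, _dirs, files in os.walk(log_dir):
        for name in sorted(files):
            # Skip files that do not look like text logs (assumed extensions).
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(root, name)
            # Ignore undecodable bytes so one odd character doesn't stop the scan.
            with open(path, "r", encoding="utf-8", errors="ignore") as handle:
                for number, line in enumerate(handle, start=1):
                    if keyword.lower() in line.lower():
                        matches.append((name, number, line.strip()))
    return matches
```

On a Connection Server you could call find_log_lines(r"C:\ProgramData\VMware\VDM\logs") and print the returned tuples, then narrow the keyword (for example to a desktop pool name) once you know roughly what you are looking for.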

As the last conceptual point, to truly troubleshoot a service we should deeply understand what's going on inside it, including its network connections and communication, and VMware Horizon View is no exception. As an important use case, I have often seen IT staff who didn't know how to configure the firewalls between the different sections of a VDI environment, especially when the VLANs/subnets of the desktop pools and the management servers are separated. For example, the Blast Extreme protocol has two stages: the initial connection uses TCP/UDP 8443, while the subsequent communication between the VDI client and the desktop/RDSH uses TCP/UDP 22443. So you should understand the difference in both the setup and the troubleshooting steps.
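
A quick way to see whether a firewall sits between two VDI segments is to test the ports from the client side. The sketch below is a minimal example under my own assumptions: the names BLAST_TCP_PORTS, tcp_port_open, and check_ports are hypothetical, and only the TCP side of the two ports mentioned above is tested, because UDP reachability cannot be confirmed with a simple connect.

```python
import socket

# TCP side of the two Blast Extreme ports named in the paragraph above.
BLAST_TCP_PORTS = (8443, 22443)


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and names that do not resolve.
        return False


def check_ports(host, ports=BLAST_TCP_PORTS):
    """Map each port to True (reachable) or False (blocked or closed)."""
    return {port: tcp_port_open(host, port) for port in ports}
```

Running check_ports against a desktop or gateway address from the client subnet, and again from the management subnet, shows on which path a port is filtered. A False result only says the connection failed; it does not tell you whether the firewall or the service itself is responsible.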

This post is dedicated to Kian Pirfalak, a 9-year-old boy who was killed by the regime forces. He was a very creative young boy who dreamed of becoming a robotics engineer. R.I.P., our lovely son.

 





Tuesday, September 20, 2022

Bursting of new releases for the vSphere 7.0

I think the recent release of vSphere (version 7.0 Update 3) is one of the VMware products with the most patch releases in a short period (8 versions from Oct 2021 until now). Regardless of the reasons, which were usually the security weaknesses of each release, this leads to a clear conclusion: every new product, although it has passed many complex security tests and reviews, can still contain zero-days and deep flaws inside its architecture. The Log4j vulnerability proved that an old behavior of a service or solution can be a perfect target for hackers, because most of the time an unexpected reaction or response to a complex request may lead the whole system into an unstable state.

In recent years, most Unix/Linux-based operating systems and services have become targets of new attacks and other threats such as ransomware, and VMware products are not excluded from these critical risks. It's natural to see many new patches in a short time. However, that is no reason to go without a well-designed, properly scheduled plan for protecting against new disasters such as the announcement of a new vulnerability. I think that even if we are forced to check the new build numbers of ESXi or vCenter Server weekly, it should be part of our IT staff's primary tasks.
Some releases include many security fixes, such as vCenter Server 7.0 U3f.
 

In this post, I want to mention some of the known issues in the recent vSphere 7.0 U3 patch releases:
  1. Security: An encrypted VM fails to power on in a Trusted Cluster containing an unattested host, in migrating/cloning states, or in an HA/DRS-enabled cluster.
  2. Networking: A VM might lose Ethernet traffic after hot-add, hot-remove, or Storage vMotion.
  3. Networking: IPv6 traffic fails to pass through VMkernel ports using IPsec.
  4. Networking: When upgrading from vSphere 6.7 to vSphere 7.0, high-throughput virtual machines may experience degraded network performance while NIOC is enabled.
  5. Storage: The VOMA check is not supported for NVMe-based VMFS datastores and will fail with an error.
  6. Storage: After recovering from APD/PDL conditions, a VMFS datastore with support for clustered virtual disks enabled might remain inaccessible. The VMkernel log might show multiple "SCSI3 reservation conflict" messages.
  7. vSAN: VMs lose connectivity due to a network outage in the preferred site of a vSAN stretched cluster and remain inaccessible, although they should fail over to the secondary site.
  8. vCLS: System VMs that are added to ensure the healthy operation of vSphere Cluster Services might impact cluster and datastore maintenance workflows in vCenter 7.0 U1.
  9. vSphere Client: Cross-vCenter migration of a VM fails with an error: "The operation is not allowed in the current state".
  10. VM MGMT: The post-customization section of the script runs before the guest customization if you enable Cloud-Init in a Linux guest OS.
  11. VM MGMT: Deploying an OVF or OVA template from a URL fails with a 403 Forbidden error. A local OVF deployment containing files with non-ASCII characters in their names might also fail with an error.
  12. VM MGMT: You cannot add or modify an existing network adapter on a virtual machine.

This is just a part of the whole story. You should read the known issues in the full VMware vCenter Server 7.0 U3g Release Notes to truly understand which actions and workarounds are required, or which update you should install to fix them. I always prefer the CLI way over the GUI methods.
 

 

Wednesday, August 31, 2022

Which level of authentication?


Have you ever asked yourself which level of authentication is required to secure access to the infrastructure? Multi-Factor Authentication (MFA) solutions have recently become one of the best ways to secure any IT infrastructure and to improve identity management platforms. Among all IT security hardening actions, MFA is the most powerful way to strengthen the authentication procedure for an organization's users, especially when you combine it with PAM/IAM solutions. But what information do we need in order to understand and select a suitable solution? First of all, we need to decide exactly which level of authentication we need, based on our network architecture, the types of internal or external incoming authentication requests, and the risk level of the data or service reached through this method of access. Based on these considerations for each authentication level, you can decide how many AUTH factors you need and, finally, which solution is the best for you:

  1. SFA: Single-factor authentication is the easiest way, and it's ideal for situations where the user has no way to type credentials into the system. For example, inside an industrial environment where there may be no keyboard for authentication, SFA is the best solution. Most SFA deployments of this kind rely on biometric factors such as face detection, fingerprints, or eye scanners. So despite being a single factor, it can be a secure method, especially for OT networks.
  2. 2FA: Two-factor authentication is the most popular method because it can easily block more than 90% of threats like brute forcing, password guessing, and dictionary attacks. Regardless of the possible limitations of any selected solution, I strongly suggest considering the SSO (Single Sign-On) attributes of your 2FA system. I mean that 2FA should be compatible with regular authentication services like Microsoft Active Directory or any common LDAP service, because this reduces the general complexity of 2FA/MFA systems. Based on your traditional first level of authentication and on SSO/SAML integration, you can select the suitable option for your infrastructure.
  3. MFA: Increasing the level of AUTH will increase the overall security. But you shouldn't ignore inhibiting factors, such as how well users know how to use each authentication level of the configured MFA system. Next, we should provide a fallback option at each authentication level. For example, a Help Desk team that can quickly reset TOTP paired tokens for users with problems, or a backup solution if a user loses a hardware token or the smartphone with the software (app) token. You can also mix hardware and software solutions and use an easy-to-use biometric method, such as an eye scanner, in the hard-working areas of your organization as a replacement for regular TOTP-based tokens.
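
Since TOTP tokens come up in the MFA item above, it may help to see what such a token actually computes. This is a minimal sketch of the standard algorithms (HOTP from RFC 4226 and TOTP from RFC 6238) with HMAC-SHA1, a 30-second time step, and a tolerance of one step on either side. It is not the implementation of any particular MFA product; the function names and the window parameter are my own choices for illustration.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret, counter, digits=6):
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation: low nibble of the last byte
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)


def totp(secret, now=None, step=30, digits=6):
    """Time-based one-time password (RFC 6238): HOTP over the current time step."""
    if now is None:
        now = time.time()
    return hotp(secret, int(now // step), digits)


def verify(secret, submitted, now=None, window=1, step=30, digits=6):
    """Accept the code of the current step or of `window` steps before or after it."""
    if now is None:
        now = time.time()
    counter = int(now // step)
    for drift in range(-window, window + 1):
        if counter + drift < 0:
            continue  # no time steps exist before the epoch
        expected = hotp(secret, counter + drift, digits)
        # Constant-time comparison avoids leaking how many digits matched.
        if hmac.compare_digest(expected.encode(), submitted.encode()):
            return True
    return False
```

The shared secret is the only thing that pairs a token with an account, which is why a Help Desk reset simply means issuing a new secret. The window parameter is the usual trade-off between tolerating clock drift on the user's device and keeping a code valid for as short a time as possible.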

Using all AUTH levels from the same provider can improve integration and compatibility, but it makes the system vulnerable to unknown architectural bugs such as zero-days in that company's products. Combining it with a third-party solution increases both the security and the complexity of the design, so you should consider all aspects of maintaining the MFA system.

In conclusion, before building the final authentication system, I think you should check how well each selected AUTH level integrates with similar solutions, and then plan a backup method in case you lose physical or logical access to the provided tokens.

 Congrats, and enjoy your MFA solution.

Monday, August 22, 2022

Configure ESXi SNMPv3 via PowerCLI

In another post, I described configuring SNMPv3 via the VMware ESXi ESXCLI command line. In this post, I want to combine esxcli with PowerCLI cmdlets to automate the procedure: get the corresponding ESXi hosts inside the vSphere environment and set the required SNMP (v3) configuration on them. If you aren't sure how to connect to vCenter via PowerCLI, read this first.

As the initial step, I get a list of the ESXi hosts, put them inside a for loop, and then call esxcli from PowerCLI.

 

In the next step, I recommend preparing the arguments, including each field of the ESXi SNMP v3 configuration. Finally, we run the set command by invoking it with the filled arguments. The configuration is then applied to each ESXi host selected by the condition (via the Where-Object cmdlet).

You can also verify the result by running esxcli system snmp get in the ESXi shell, or $esxcli.system.snmp.get.invoke() inside the PowerCLI connection.


Sunday, July 31, 2022

Investigation around the vSphere objects via PowerCLI

A VM went missing inside a cluster, and we couldn't understand what had happened or which ESXi host it belonged to. I should mention that this is an enterprise environment that sadly has no logging solution such as vRealize Log Insight (vRLI) or a third-party solution like Splunk. So there was no way of sorting, filtering, and searching among thousands of daily logs except vSphere itself: the Monitor\Event section. We couldn't find the cause there, and sadly there was no time to inspect the log files of all ESXi hosts in this cluster to find out exactly what had occurred. However, I guessed that a Help Desk staff member had renamed the VM by mistake without informing any vSphere admins (which also points to a wrong access grant, because we should remove this privilege from their permission list). So I decided to inspect the event details via PowerCLI by running the Get-VIEvent cmdlet.

This problem prompted me to post some use cases for working with this useful PowerCLI cmdlet. Below I will show you some practical examples:

1. As the first sample, you can watch the result of all events in the Warning severity level by running this:

 

 

 

2. In the second example, I ran a slightly more complex filter based on the start time, where the Event Type ID matches 'com.vmware.vc.authorization*'. An end date can also be included with the -Finish parameter.



3. In the last one, I ran the command against a cluster object named "CLS", kept the events whose log message includes a word like "Vm", and showed the result in a PowerShell GridView.


 

There are many other ways of combining and pipelining cmdlets to get the expected results. It just needs a little patience and a clear understanding of what you want to do. I hope your log management system is always in good shape.

Saturday, July 16, 2022

Desktop Pool deployment failure factors

 
Have you ever faced a sudden virtual desktop deployment failure without being able to figure out what was happening in your VDI environment and its related services? As you may know, the first deployment of most VDI solutions, such as VMware Horizon View, is not a complex task for an expert administrator. But understanding the cause of each issue that stops desktop pool provisioning, and then finding a way to troubleshoot it, is not easy at all.

While there are many reasons for a VDI deployment to stop, I want to look at some cases of vDesktop provisioning failure and at how to resolve them quickly or bring the provisioning operation back to normal. This applies especially to an Instant Clone desktop pool: because its architecture is more complicated than that of the Full Clone type, deployment failures are more likely, and sadly it's not possible to change all settings of this desktop pool easily. Modifying the Golden Image requires maintenance actions and running the Publish wizard, so any modification can lead to an unstable state of vDesktop generation. So let's check the most common situations:
  1. Accounts: Generally, two types of credentials are required to build an Instant Clone pool. The first is an account for accessing the vSphere infrastructure, which can be part of vCenter SSO or any other connected LDAP repository. The second is an AD domain account used to join the OS of the deployed virtual machines to the domain. Modifying either of them in a maintenance window without informing the VDI admin team may stop the vDesktop deployment: VM deployment failure (vCenter account) or VDI error (AD account).
  2. Directories: Renaming the VM folder or changing its hierarchy can cause the reference VM to be lost and new deployments to fail. Although you can still find the new placement path through the Publish wizard, the change will break the virtual desktop recover option anyway. In this situation, you should know that it's not possible to edit the desktop pool easily, so as a good recommendation, first define the VM folder structure and hierarchy, then create the required desktop pools based on the design. Although it's the same story for AD OU changes, it's easier to set the corresponding OU path in the desktop pool edit section.
  3. Privileges: Some permissions are necessary for successfully creating a virtual desktop. Part one: vCenter privileges at each level of the vSphere object hierarchy, for automatically deploying the virtual machine inside the cluster and placing it in the corresponding VM folder. Part two: AD privileges for creating computer objects inside the chosen OU. Both procedures have their own required permissions. Always follow these rules: do not modify the permissions defined for the Horizon Connection Server in the vSphere environment, and do not change the access granted to the VDI account that is authorized for domain joining. It's good to create a vSphere role with the required privileges and grant it to the Horizon service account. For AD accounts, do not set a higher administrator level than is required. I think it's enough to delegate control with the required AD permissions to the mentioned account at the corresponding OU level.
  4. Defined Assets: If you change some primary vSphere components that were selected as part of the Instant Clone desktop deployment, like the cluster and shared datastores, this may break new vDesktop generation without any clear indication of what happened. Of course, you can investigate the detailed Horizon logs to find out what's going on, for example in this path (C:\Program Files\VMware\VMware View\Server\Broker\Logs), but it's a complicated and time-consuming troubleshooting operation. So as a good recommendation, define a naming pattern for each type of vSphere object and configure them all before building the VDI.
  5. Name Resolution: Whenever you use FQDNs instead of IP addresses, changing the naming convention or any of the VDI-related DNS records may cause the Connection Server, Domain Controller, Event Database Server, and vCenter Server to lose each other. Best practice here is to define all servers by their DNS names. For example, it's not possible to change the defined vCenter Server while even one related desktop pool exists (which in practice means never!). With DNS names, if you decide to change the network subnets, it's enough to update the DNS cache so that the vCenter Server address resolves correctly. However, be careful: if you defined an alias or CNAME record for the Horizon Server definition, never delete it.
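
For the name resolution item, a simple pre-check before and after any DNS or subnet change is to confirm that every VDI-related server name still resolves. The following is a minimal sketch using only the Python standard library; the function names resolve and report are my own, and the server names you pass in are whatever FQDNs your environment uses.

```python
import socket


def resolve(name):
    """Return the sorted IP addresses a name resolves to, or [] if it does not resolve."""
    try:
        infos = socket.getaddrinfo(name, None)
    except (socket.gaierror, UnicodeError):
        # gaierror: the resolver has no answer; UnicodeError: malformed name.
        return []
    # Each entry ends with a sockaddr tuple whose first element is the address.
    return sorted({info[4][0] for info in infos})


def report(names):
    """Map each server name to the list of addresses it resolves to."""
    return {name: resolve(name) for name in names}
```

Calling report with the FQDNs of the Connection Server, vCenter Server, Domain Controller, and Event Database Server from a machine in the VDI network gives a quick picture: an empty list means the name is no longer resolvable from there, and an unexpected address suggests a stale record or cache.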

In this post, I tried to mention some of the most likely failure factors. However, there are many other reasons for vDesktop provisioning failure that you may encounter in the future, like virtual machine snapshot issues (I think I should speak about them in another post). Before starting a VDI project, it's highly recommended to build the server and datacenter virtualization infrastructure carefully and with scalability in mind, to avoid unnecessary changes, especially object renaming, changing directory patterns, and so on.
