Thursday, December 30, 2021

VMFS Recovery before the last night of 2021 :)

We had a situation in which an ESXi host crashed because of multiple power outages, and unfortunately one of its virtual machines could not be powered on by any means. I tried to start the VM through the GUI and the CLI, and also attached its VMDK files to a new VM, but every operation failed and I couldn't bring the VM back into service, because the VMFS datastore was truly corrupted.

Reinstalling the ESXi host and copying the virtual machine's files to another datastore were two options that weren't possible. Why? There was no stable backup and everything was in datastore1, so any type of ESXi installation could have led to data loss and was too risky. Copying the virtual disk files was not possible in that state either, since all of the VMDK files were corrupted. So we decided to attach the whole array to a Linux system such as Ubuntu and use a popular recovery method that you may know: mounting the VMFS partition inside Ubuntu:

# vmfs-fuse /dev/sdb1 /mnt/myvol/    (For example in my case sdb1 means disk 2 partition 1)

However, we faced another error saying that VMFS version 6 is not supported. So if you want to recover a VMFS6 datastore, you should install vmfs6-tools, not the vmfs-tools package that is used for older versions such as VMFS5. You can get it through the apt command or clone it from its GitHub repository:

# apt-get install vmfs6-tools

 


Now we could run vmfs6-fuse without any problem and mount the datastore in the mentioned directory. I don't want to go into the details of file- and directory-related applications in this post, but here is another important point: in my experience you can mount the datastore only inside the /mnt/ directory, and if you try to move the files to another directory, apps like Nautilus may refuse to work with them because of security considerations.

Finally, we managed to mount the datastore, migrate the VMDK files to another storage device, attach them to a new VM with a freshly installed OS, and power it up. Inside the guest OS, we still had to reset the NTFS permissions and take ownership of the required files to regain access. It was absolutely a long way to go for us, but we did it successfully.

As an easy-to-follow recommendation, keep your backup files in at least two safe places; then you are less likely to encounter such situations. I hope you all have a safe year in 2022, with more protection and truly fewer issues.
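The choice between the two FUSE tools can be sketched in a few lines of Python. This is only a minimal illustration: the helper name build_mount_command and the lookup table are my own, and the device and mount point are the example values from this post.

```python
import shlex

# Hypothetical lookup table: VMFS major version -> FUSE tool named in this post.
FUSE_TOOL_BY_VMFS_VERSION = {
    5: "vmfs-fuse",   # shipped in the vmfs-tools package
    6: "vmfs6-fuse",  # shipped in the vmfs6-tools package
}


def build_mount_command(vmfs_version, device, mount_point):
    """Return the argument list that mounts a VMFS partition with the matching tool."""
    try:
        tool = FUSE_TOOL_BY_VMFS_VERSION[vmfs_version]
    except KeyError:
        raise ValueError(f"unsupported VMFS version: {vmfs_version}")
    return [tool, device, mount_point]


# Print the command for a VMFS6 datastore on the example device.
print(shlex.join(build_mount_command(6, "/dev/sdb1", "/mnt/myvol/")))
```

The function only builds the command; running it still requires root privileges and the matching package on the recovery system.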

 



 


Wednesday, December 15, 2021

Log4Shell (Log4j) or Log 4 Hell ?!

 

For the past four days, we did whatever we could to secure many infrastructures against the recent RCE (remote code execution) zero-day vulnerability in Apache Log4j. We focused especially on virtualization environments and critical components like vCenter Server, vRealize Suite, the Horizon Connection Server's Apache Tomcat, and Unified Access Gateway (UAG). I truly confess it's a bad situation, very very bad. Personally, I believe that no exploit was worse than this in 2021, because many solutions from a variety of vendors use Apache components for their web applications (and consequently Log4j as their logging component): from Cisco, VMware, and Dell to many industrial applications in OT networks.

Log4j is a Java-based logging framework from the Apache Software Foundation, and its JNDI (Java Naming and Directory Interface) lookup feature, which is enabled by default, is the core of this zero-day. The vulnerability is rated critical (CVSS 10/10); it was publicly announced on Dec 9th, 2021 and is tracked as CVE-2021-44228.

All Log4j 2 versions up to and including 2.14.1 are vulnerable by default. However, within the last day the Apache Software Foundation (ASF) announced a second vulnerability, CVE-2021-45046, which affects version 2.15.0 too. So the suggested workarounds are not enough and cannot address this catastrophic exploit on their own. SANS said: "The new version 2.16.0 has been made available to completely fix the issue (so far at least)"

As ASF explained about the Apache log4j 2 update: "In version 2.16.0 the message lookups feature has been completely removed. Lookups in configuration still work. Furthermore, Log4j now disables access to JNDI by default. JNDI lookups in configuration now need to be enabled explicitly. Also, Log4j now limits the protocols by default to only Java, LDAP, and LDAPS and limits the LDAP protocols to only accessing Java primitive objects. Hosts other than the localhost need to be explicitly allowed."
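To make the version boundaries easier to follow, here is a minimal Python sketch that classifies a Log4j 2 version string against the two CVEs discussed in this post. The function names are hypothetical, it assumes plain numeric versions such as 2.14.1, and it reflects only the situation described here, not any later advisories.

```python
def parse_version(text):
    """Turn '2.14.1' into (2, 14, 1); assumes plain numeric dotted versions."""
    return tuple(int(part) for part in text.split("."))


def log4j_status(version_text):
    """Classify a Log4j version against the two CVEs mentioned in this post."""
    version = parse_version(version_text)
    if version[0] != 2:
        return "outside the scope of this check (not Log4j 2.x)"
    if version <= (2, 14, 1):
        return "affected by CVE-2021-44228"
    if version < (2, 16, 0):
        return "affected by CVE-2021-45046"
    return "contains the 2.16.0 fix"


for candidate in ("2.14.1", "2.15.0", "2.16.0"):
    print(candidate, "->", log4j_status(candidate))
```

In appliances you usually cannot swap the library yourself, so a check like this is mainly useful for inventory purposes while you wait for the vendor's fix.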

This vulnerability lets an unauthenticated attacker execute code remotely on the target, which is a high risk that you should never ignore. VMware published VMSA-2021-0028, which lists all the affected products along with their current workarounds and available fixes. Of course, many other products are still under investigation for exposure to this vulnerability. So I strongly recommend taking immediate remediation action, because the situation is still @#$%ed up, and one of the best resources is the following FAQ on this "in the wild" exploit.

Finally, you should be aware that in most products you cannot handle this issue directly and need to follow the vendor's provided solution to fix it. While these actions alone are not enough to protect against an attacker's activities, many FAQs mention that after adding the option "-Dlog4j2.formatMsgNoLookups=true" there is no need to revert it later. So it seems that patching will roll the products forward and is compatible with the provided workarounds. I therefore think it's not just a temporary solution.


Friday, December 10, 2021

VMware UAG deployment and thumbprint detection issue in front of the firewall



 In isolated network structures where you have to secure incoming connections from outside the network, it's absolutely necessary to deploy the VMware Unified Access Gateway (UAG) to secure the VDI environment, both by limiting the types of access and by hardening the internal Horizon Connection Servers (CS). In most scenarios, the UAG server(s) are deployed in the DMZ (subnet / VLAN) in standalone form or in a load-balanced structure, so if there is a firewall between these network boundaries, it will probably restrict the communication between the CS and UAG servers.
 UAG, as the heir of Security Server and Access Point, is based on Photon OS, VMware's own Linux distribution, and can act as the edge service for VMware Workspace ONE and the Horizon suite. It also supports the Java KeyStore (JKS) certificate format, so it's recommended that you learn how to work with SSL tools like OpenSSL and keytool, because in the first step of the UAG configuration you have to set the Horizon Connection Server's URL and thumbprint, and the UAG then validates the CS certificate. There are two methods for this operation:

  1. Set the SHA1 thumbprint of the deployed CS manually, no matter whether it's a self-signed certificate or one generated by an internal CA service.
  2. Add the CA certification path (root & subordinate) for an externally valid certificate, if it doesn't already exist, so that the certificate presented by the CS can be validated.

 

 In some cases, I saw that a firewall policy that includes SSL inspection may prevent a secured channel from being established directly, even if the firewall trusts the corresponding SSL certificate. The firewall changes the SSL connection between the CS and UAG: it sits in the middle of the communication and secures the two sides of the channel separately (UAG to FW and CS to FW, plus their reverse paths in the firewall rules).

 As I mentioned before, we can import the trusted certificate used by the Horizon CS (the one that has "vdm" as its friendly name) into the UAG Java KeyStore, but you still need to consider the firewall's intervention in the SSL communication, especially if you use hardware appliances from vendors like Sophos, Juniper, Fortinet, or Palo Alto. You can convert the certificate (openssl), import its root CA into the Java KeyStore (keytool), and explore the cacerts file (the UAG Java KeyStore) at the following path:

# vi /lib/jvm/jre/lib/security/cacerts 

Even if you import the SSL certificate anyway, you will still encounter some other issues on the road to certificate validation in UAG, so I will explain them later in another post. Once I understood that my firewall acts as an intermediary in the SSL communication between the CS and UAG, it was clear why UAG couldn't connect to the CS successfully although I had set the SHA1 thumbprint. So as the next step, I tried the curl command like this:

# curl -v https://horizon-cs.company.com:443



After reviewing the output more carefully, I saw that the firewall accepts the certificate but changes the issuer's information, replacing it with the firewall's own default certificate. Now I was sure that this was the primary issue. So I ran the following command to get the SHA1 thumbprint (fingerprint) of the X.509 certificate that actually answers:

# echo | openssl s_client -connect horizon-cs.company.com:443 2>/dev/null | openssl x509 -noout -fingerprint -sha1


Then I found out that if I replace the Horizon certificate thumbprint in the UAG settings with the thumbprint returned by this command, the secured connection is established successfully.

 

Congratulations! Now the UAG is connected to the CS.
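If OpenSSL is not at hand, the same thumbprint can be obtained with Python's standard library. This is a minimal sketch, not the method used in this post: the function names are my own, and the host name in the usage note is the example one from the commands above.

```python
import hashlib
import ssl


def sha1_thumbprint(der_bytes):
    """Format the SHA-1 digest of a DER-encoded certificate as AA:BB:CC:..."""
    digest = hashlib.sha1(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def presented_thumbprint(host, port=443):
    """Fingerprint whatever certificate the peer presents, without validating it."""
    pem = ssl.get_server_certificate((host, port))
    return sha1_thumbprint(ssl.PEM_cert_to_DER_cert(pem))
```

Calling presented_thumbprint("horizon-cs.company.com") from a machine behind the same firewall should show the thumbprint of the certificate that arrives after SSL inspection, which is the value the UAG will actually see.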

Tuesday, November 30, 2021

Restriction of file transfer option in the Horizon Client



 In some situations, you may need to disable or enable the file transfer option (copy/paste) in the Horizon Client / Web Client because of security considerations, such as preventing users from sharing files through virtual apps or virtual desktops. You can change it by creating or editing the following Windows registry key (it may already exist) in the guest OS of the relevant virtual desktop or RDS host.

Registry Path: HKLM\SOFTWARE\Policies\VMware, Inc.\VMware Blast\Config

Create (for the first time) or modify this value: FileTransferState, of type REG_SZ.
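For scripted rollouts, the registry change can be wrapped in a small Python helper that calls the standard reg.exe tool. This is only a sketch under assumptions: the function names are hypothetical, the state value is left as a parameter on purpose, and it has to run elevated inside the Windows guest OS.

```python
import subprocess

# Registry key quoted in this post.
KEY = r"HKLM\SOFTWARE\Policies\VMware, Inc.\VMware Blast\Config"


def file_transfer_command(state):
    """Build the reg.exe arguments that write FileTransferState as REG_SZ."""
    return ["reg", "add", KEY, "/v", "FileTransferState",
            "/t", "REG_SZ", "/d", str(state), "/f"]


def apply_file_transfer_state(state):
    """Run the command; only meaningful inside the Windows guest, as administrator."""
    subprocess.run(file_transfer_command(state), check=True)
```

Keeping the command builder separate from the call makes it easy to print and review the exact command before applying it to a golden image.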

So whatever you need to do, set the following settings:

 

Should I replace the SD/USB disks, or can I still keep them?

I think many of us have read about the new features in the vSphere 7.0 Update 3 release, and like many people who have asked me about it, you may be concerned about whether to keep installing ESXi on SD cards or USB drives. Using these types of devices as the ESXi boot device is common, simply because they cost less than other types of storage.

VMware has changed the ESXi BootBank architecture and created the ESX-OSData partition, which is a major difference between the current and older versions. I mentioned some of the changes in this post: ESXi BootBank partitions. I suggest reading it if you decide to change your hypervisor setup plan. Bob Plankers also described the limitations and restrictions of NAND flash memory really well in vSphere 7 Update 3 What's New.

In addition, you should know that ESXi 7.0 requires at least 32GB of disk space to store ESX-OSData. So keep in mind that boot devices, especially lower-capacity ones (less than 32GB), aren't suitable anymore in the new version.

You need enough space for ESX-OSData (which holds the dump and log files), and if you want to keep the existing SD/USB boot device structure, it's still recommended to move its location to local HDD/SSD media. So, given the recent changes in the ESXi structure, you need to choose one of the following installation procedures:

 1. Using local HDD/SSD disks for both ESX-OSData and boot. This is the oldest installation method and is still supported by VMware, so you can keep going this way.

2. Using local HDD/SSD disks for ESX-OSData and NAND flash for the system boot and BootBank partitions. VMware calls this structure Legacy; it is still supported, but it's better to change your boot devices ASAP.

3. Using only SD/USB disks. This is deprecated in vSphere 7.0 Update 3 because of their poor performance and lower capacity compared with HDD/SSD disks, so it's strongly recommended to reconsider the setup plan for new ESXi hosts.
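The three options above can be summarized as a tiny decision function. It is just a sketch of the list in this post; the function name and the two boolean parameters are my own labels, not VMware terminology.

```python
def boot_layout_status(boot_on_flash, osdata_on_flash):
    """Map a boot/ESX-OSData placement to the status described in this post."""
    if not boot_on_flash and not osdata_on_flash:
        return "supported"    # option 1: HDD/SSD for both boot and ESX-OSData
    if boot_on_flash and not osdata_on_flash:
        return "legacy"       # option 2: SD/USB boot, ESX-OSData on HDD/SSD
    if boot_on_flash and osdata_on_flash:
        return "deprecated"   # option 3: SD/USB only
    raise ValueError("layout not covered by the three options in this post")


print(boot_layout_status(boot_on_flash=True, osdata_on_flash=False))
```

A helper like this is handy when you review a host inventory and want to flag which hosts need a new boot device first.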

You can read the vSphere Team's blog Post about the Boot Media Considerations.

For more information: if you need to move the log directory (/scratch/log) to other persistent storage, you can do so by modifying this advanced system setting: Syslog.global.LogDir
Also, to change the coredump location to another partition for collecting dump files, review this post.

Friday, October 29, 2021

VMware VDI (Horizon View) Troubleshooting - Part V

 
 As I promised, I want to dive deep into some tricks of VMware Horizon View in every part of a VMware VDI deployment. So, continuing the following troubleshooting series, I will mention some important considerations:

VMware VDI (Horizon View) Troubleshooting - Part I

VMware VDI (Horizon View) Troubleshooting - Part II

 VMware VDI (Horizon View) Troubleshooting - Part III

VMware VDI (Horizon View) Troubleshooting - Part IV

1. Agent Restriction: While the Horizon Agent in a virtual desktop/app is connected to the Connection Server, if you monitor the network connections (with a simple command like netstat) you will see that the only established session uses JMS-SSL (TCP 4002). However, if you limit the permitted ports on an external/internal firewall to just that port, you will certainly run into a provisioning issue whenever the VM behind that virtual desktop is being recovered. The error shows the Agent as "Unreachable", although you could reach it through Horizon before the desktop was re-provisioned. In this state (recovering the VM), be aware that JMS (TCP 4001) is also needed in the background. After changing the firewall policies to permit both TCP 4001 and 4002, the Agent status becomes "Available" once again.
By the way, if any virtual desktop is stuck in the "Unknown" state, you can remove its object from the directory service through the ADSI Edit console (I explained how to connect it in Part I).

2. New Certificate Generation: If you generate or obtain a new valid certificate for the Horizon environment and, for example, want to create a PFX certificate file, you should select the "Mark this key as exportable" checkbox in the *.pfx wizard to make the private key exportable. If you don't choose this option, or if you use another certificate format (like *.cer) that doesn't include the private key, you get the following error and the Connection Server cannot handle any secured communication.

Finally, never forget to set "vdm" as the "Friendly Name" of the chosen certificate.

 
3. UAG DNS Setup: At the beginning of a UAG (Unified Access Gateway) deployment, when you configure the TCP/IP settings through the OVF wizard, the NIC setup may not be applied. In that case, you can run the following CLI tool to set up networking:

 

/opt/vmware/share/vami/vami_config_net

 In versions 2103/2106, although you have configured the DNS servers, the CLI and GUI do not show them correctly (they still show the local caching resolver address, 127.0.0.53). If you are in the initial steps of the deployment and name resolution is not ready yet, you can temporarily edit the /etc/hosts file with an editor like vi and add the FQDNs of the external load balancer, all UAG appliances, and the Connection/Replica Servers until the permanent DNS configuration is done, because modifying the hosts file is not a stable solution.

4. End-to-End Communication: Apart from the connection between the Horizon servers and the virtual desktops, you should consider the ports required for the secure channel between the Horizon Clients or Web Access and the provisioned VMs, especially through the UAG. When you connect to your desktop via the Blast Extreme protocol, in addition to port 443 your client needs to establish a session on TCP 8443 to the UAG appliance, and UDP/TCP 22443 must also be open for access to the virtual desktop or RDS host.

In the next part, I will describe the UAG configuration in more depth, especially how to import the certificate chain.
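The temporary hosts entries can be generated instead of typed by hand. The following Python sketch is only an illustration: the function name is hypothetical, and the addresses (from the 192.0.2.0/24 documentation range) and host names in the example are placeholders, not values from my environment.

```python
import ipaddress


def render_hosts_entries(records):
    """Render (ip, fqdn) pairs as /etc/hosts lines with the short name as alias."""
    lines = []
    for ip, fqdn in records:
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
        short_name = fqdn.split(".", 1)[0]
        lines.append(f"{ip}\t{fqdn} {short_name}")
    return "\n".join(lines)


# Placeholder records: external load balancer, a UAG appliance, a Connection Server.
print(render_hosts_entries([
    ("192.0.2.10", "vdi.company.com"),
    ("192.0.2.11", "uag01.company.com"),
    ("192.0.2.21", "horizon-cs.company.com"),
]))
```

Remember to remove these lines again once the permanent DNS configuration is in place.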
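The TCP port requirements from items 1 and 4 can be verified quickly from the machine that initiates the connection. Here is a minimal Python sketch with hypothetical function names; it only covers TCP, so the UDP side of 22443 has to be checked another way.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host, ports, timeout=3.0):
    """Check several TCP ports on one host and return a {port: reachable} map."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

For example, check_ports("horizon-cs.company.com", (4001, 4002)) run inside a virtual desktop shows whether the firewall permits both JMS ports toward the Connection Server, and the same call with (443, 8443) against the UAG address can be run from a client network.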

Tuesday, October 19, 2021

New VDI posts will come soon ...

I'm back after being less active in recent weeks, but with some good news. I'm building a complicated VDI project based on the VMware Horizon suite from scratch. The planning and design phases are finished and I'm now in the pilot phase, so I decided to review and share some of my hardening experience in a new series of VDI troubleshooting posts. Stay tuned, I'll be back ASAP ...





Sunday, September 26, 2021

VMSA-2021-0020

 

We should be aware of the following critical vulnerabilities, published five days ago in this VMSA, that affect the vCenter Server (VC) and Cloud Foundation (VCF) products. Here are the CVE titles of some of them:

1. CVE-2021-21991 is a local privilege escalation vulnerability

2. CVE-2021-22018 / CVE-2021-22013 / CVE-2021-22005 are file-related vulnerabilities

3. CVE-2021-22017 / CVE-2021-22006 are related to the reverse proxy (rhttpproxy)

You can read VMware's full document about all 19 vulnerabilities here

Investigation about the physical devices information through the ESXi shell

Sometimes you may need to find more details about a specific device installed in your physical servers from inside the ESXi shell, because the GUI doesn't offer enough information. There are many command lines that return related information, and in this post I want to show some of them.

1.  lspci -vvv | grep controller 

(you can replace controller with any other keyword to limit the results to that subject)


2. esxcfg-info | grep controller 

3. esxcli storage core adapter list  or  esxcli storage core device list


4. esxcfg-scsidevs -l | grep vendor
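Keep in mind that a plain grep is case-sensitive, so a line containing "Controller" would be skipped by the filters above. As a minimal sketch, the same filtering can be done case-insensitively in Python; the function names are my own, and the command passed in is whichever of the CLIs above you want to search.

```python
import subprocess


def filter_lines(text, keyword):
    """Return the lines of text that contain keyword, ignoring case."""
    needle = keyword.lower()
    return [line for line in text.splitlines() if needle in line.lower()]


def run_and_filter(command, keyword):
    """Run a command (given as an argument list) and filter its output."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return filter_lines(result.stdout, keyword)
```

For example, run_and_filter(["lspci", "-vvv"], "controller") returns every matching line regardless of capitalization.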

Of course, there are many other ways to achieve this goal; I mentioned just some of the useful CLIs. I will be happy if you mention other ones that you use in your management work.

Thursday, September 2, 2021

Windows Server Failover Clustering (WSFC) - Basic Intro

In this video, I talk about the architecture of the Failover Clustering feature in Windows Server, with a focus on failover detection via LAN and SAN heartbeats, and also describe what happens whenever a network isolation occurs inside the cluster.


Monday, August 30, 2021

VMware Horizon View Deployment - Part 2: Connecting to the VCSA & Event Database

In the second part, I will show you how to set up the initial configuration of the Connection Server, including the vCenter Server configuration and the database server required for event collection.

Monday, August 9, 2021

SRM and VCSA registration problem, DNS issue

Recently I successfully deployed a new SRM appliance in one of my projects, but after starting the vCenter Server registration, I got the following error:

"http failure response for https://srm:5480/configure/requestHandlers/... :500 Internal Server Error"



Next, I tested with the VCSA's IP address and accepted the security warning about its self-signed certificate, but I knew it would fail again, because the vCenter had been deployed with its FQDN, not the IP address, and would eventually redirect to the DNS name again. As I guessed, this retry brought the following error:

"http://127.0.0.1:9286/sdk invocation failed ... 30,000 ms timeout on connection ... "

Now I was sure it had to be a DNS problem, so I checked the DNS settings in the VAMI interface of SRM. I couldn't change the settings via the GUI (I don't know why), so I decided to do it through the CLI via VAMI_DNS commands in the following order:


As I thought, it was caused by wrong DNS server entries: they were external DNS servers, which cannot resolve the corresponding domain name. So as the next step, I changed them to the local DNS servers, like the following:


This time, when I tried again to register the VCSA for SRM, it went through without any special issues. I also had a small problem with time synchronization between SRM and the VCSA, so I decided to show the troubleshooting methods for both of these problems in a separate video ASAP.
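Before retrying a registration like this, it can help to confirm that the configured resolver actually returns an address for the vCenter FQDN. A minimal Python sketch, with a hypothetical function name, could look like this:

```python
import socket


def resolve(fqdn):
    """Return the sorted addresses the system resolver gives for fqdn, or [] on failure."""
    try:
        infos = socket.getaddrinfo(fqdn, None)
    except socket.gaierror:
        return []
    # Each entry ends with a sockaddr tuple whose first field is the address.
    return sorted({info[4][0] for info in infos})
```

An empty list for the VCSA name, while the IP address itself is reachable, points to the same kind of DNS misconfiguration described above.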

Thursday, July 29, 2021

VMware Horizon View Deployment - Part 1: Install the Connection Server

In this video, I talk about the Connection Server installation types and the restrictions on installing it alongside other server roles like directory services or web servers.

Thursday, July 22, 2021

An unexpected error while VM migration!

In one of my projects, we were forced to change part of the AD infrastructure and define the newly established DNS servers for the vSphere environment. I should mention first that no FQDN or IP address changed, because this was just a server replacement with the same configuration. Two days after things were back to normal, one of my team members in the virtualization support group told me we had a problem with vMotion and also with manual VM migration. He said that whenever we wanted to move a VM between the ESXi hosts inside the primary cluster, irrelevant errors showed up in the compatibility check section of the migration wizard.

All hosts in this cluster were healthy at that moment, and all of them had the same ESXi version and build number. We reviewed the corresponding ESXi checklist to be sure that no special incident had happened around the hypervisor operations, and after that we worked through the following items to find the cause of the issue and fix it:

  1. Firewall settings on the source and destination ESXi hosts.
  2. VMkernel interface's TCP/IP settings (IP Address, Subnet Mask) and also Default Gateway.
  3. Firewall configuration on the new DCs and DNS servers.
  4. Capturing traffic between the two hosts via tcpdump-uw and analyzing the results.
  5. Reconfiguring the HA and DRS settings (I couldn't accept the risk of rebuilding the cluster, because of its critical role in the availability of the infrastructure services).
  6. Removing both of these ESXi hosts from the cluster and adding them back again.

But the problem remained. We even restored the latest VCSA backup, but sadly it brought no success (of course, I didn't replace the current vCenter VM with it). As a final, desperate thought, I decided to upgrade the vCenter Server to a newer version, because I remembered that its current version was 6.7 U1 (6.7.0.20000). So I upgraded it to 6.7 U3m (6.7.0.47000). Strangely, after the successful upgrade everything worked fine once more, and virtual machine migration was possible again, just as before.

In conclusion, many of us have experienced in similar situations that stable versions with good feedback and results can help IT staff fix many unreasonable and irrational issues in infrastructure services. So never, ever forget to have a solid plan for patching, hotfixes, and upgrades.

Friday, July 2, 2021

Check out the vulnerability (CVE-2021-21985)

 Following up on the vulnerabilities mentioned in this post: VMSA-2021-0010, I prepared a video about how to check a vCenter Server against CVE-2021-21985 using the NSE checker script published in the corresponding GitHub repository. But honestly, I forgot to publish it for a month ... LOL :) Anyway, now it's time to watch it:

YouTube: https://youtu.be/uMsZSVZJTBc

Instagram: https://www.instagram.com/tv/CQ1tdf4g8jD 

 

Tuesday, June 22, 2021

I'm still on the road of vExpert ...

It's a great experience to be a part of the vExpert community. Three years ago, when I decided to register for the first time, I didn't have any special clue at the beginning of the road and had just read about the benefits of the vExpert badge. So I decided to work hard for it.

As the first step, I participated actively in the VMTN forum and started answering people's questions more regularly (honestly, I joined in 2010 but I wasn't very active there). As the second step, I created this blog (VirtualUndercity) after two years of delay. Although I wanted to do it in 2016, as the manager of some massive projects I was too busy and couldn't reach this goal until 2018.

In the beginning, I wrote posts about some of my old virtualization projects, most of which were based on vSphere and Horizon solutions. A few months later, I decided to post about planning and design in virtualized environments and then to describe my recent activities with a focus on VMware solutions. My first vExpert application failed. I think I will never forget that e-mail, because the company, and especially dear Corey, encouraged me not to give up.

In the third step of my vExpert plan, I created my YouTube channel and published some of my old recorded videos after editing them. I fully agree that their multimedia quality needs to improve, but I'm still focusing on their content by sharing the details of my virtualization knowledge. Fortunately, thanks to the video publishing, I was finally able to achieve my first vExpert badge that year (2019), but it was not the end of the road!

The final stage in my vExpert roadmap was starting an Instagram page for publishing educational content, especially video training in the style of the YouTube channel, and also for introducing recent VMware technologies and products, talking about webinars and events, and sharing useful tips, advanced CLIs, and other tricks.

I still continue to produce and post more information, and by sharing my knowledge and experience I train myself too. I hope everybody who reads this post achieves this badge of glory. Now it's time to register for the second half of the 2021 vExpert program. If you need to know its timeline, please read Corey Romero's recent post on the VMware blog. You can apply here and enjoy the many benefits of this program if you are accepted. I strongly suggest reviewing the example application before filling in the form.


Wednesday, June 9, 2021

Fundamental storage concepts in VMware - Part 1

  In this video, I talk about the basic definitions, with special attention to VMware storage design. I explain the concepts of array, volume, and LUN, and briefly describe the RDM and VMFS technologies.

 

Thursday, June 3, 2021

VMSA-2021-0010


 

 New vulnerabilities in the HTML5 web client of the vCenter Server have been reported to VMware. As VMware announced here, this new RCE affects the Virtual SAN Health Check plug-in (which is enabled by default in all vCenter Server deployments, whether or not vSAN is being used).

Bob Plankers wrote everything about CVE-2021-21985 and CVE-2021-21986 here:

https://blogs.vmware.com/vsphere/2021/05/vmsa-2021-0010.html

 Also, product patches are available and you can download them based on your VCSA version.


 


Monday, May 31, 2021

Basic configuration of HP SAN Storage MSA 2040 (Video Series - Part1)

 In this video, I show you how to configure a SAN storage device (HP MSA 2040). In the next part of this video series I will upgrade the firmware of this storage device, and in the final part I will work with some basic and useful CLI commands for managing this SAN storage:



Saturday, May 22, 2021

Importance of upgrade to solve the problems

    Last month I had a bad experience with one of the vSphere environments I manage, after a sudden, long power outage in the datacenter caused most of the ESXi hosts in the primary vSphere cluster to reboot. After that disastrous day, I noticed by chance that VM migration failed with the following compatibility issue:


 "A general system error occurred: Connection Refused; The remote service is not running, OR is overloaded, OR a firewall is rejecting connections."

    Our operations team decided to investigate every possible reason from scratch, so we checked the following list:

  1. TCP/IP stack (especially the VMkernel Gateway) configuration for each ESXi host
  2. Disable/Enable vMotion capability of the related VMKernels
  3. Disable/Enable HA feature in the Cluster settings
  4. The built-in firewall status of ESXi hosts
  5. Restart the management services, even reboot one of the mentioned servers
  6. Configuration of each networking component in this path
  7. Remove and re-Add the ESXi host to the Cluster

    Finally, to find more detail I checked the log files under /var/log too, but sadly nothing gave me the information I needed! So I decided to go after the next component: the vCenter Server. First of all, I checked the VAMI: the health status and services were OK, and nothing special was mentioned in the log files. But while checking the VCSA details, I noticed the current version of this service:

    Because I got no results from all my troubleshooting, I decided to upgrade it to the latest published build (17713310) of the stable version (6.7.0.47000).

    After the successful upgrade and a restart of the vCenter Server, we fortunately saw that the problem had been fixed without any special action on the cluster or ESXi settings. Once more, I strongly recommend taking upgrade/update tasks seriously, because they fix many unexpected issues with less effort. By the way, don't forget to take a backup before the upgrade and to pilot the installation in your test environment before running it on the main system.

 


Tuesday, May 4, 2021

Change VMkernel port settings remotely

In this video, I show you how to change the IP settings and VLAN ID of the VMkernel interface quickly enough that you don't lose the ESXi management connection permanently.

 

Wednesday, April 28, 2021

Disable vSphere client session timeout for monitoring purposes

 One of the usual ways of monitoring network infrastructure is watching the environment through web-based consoles, directly in a 24/7 real-time NOC unit. The vSphere Web Client, with the built-in Performance section in its Monitor tab, is one of the most useful tools for monitoring VI components; however, as you certainly know, it automatically logs out and terminates the current session after an idle period. You can modify the timeout duration by editing the webclient.properties file in the VCSA shell and changing the value as follows:

 

cd /etc/vmware/vsphere-client

vi webclient.properties   

Change the session.timeout value from 120 to 0 if you need to disable the idle timeout logout operation. Then type :wq to save and quit the vi editor.
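If you have to repeat this change on several appliances, the edit itself can be scripted. The following Python sketch only transforms the text of the properties file; the function name is hypothetical, and reading and writing the real file (after taking a copy of it) is left to you.

```python
import re

# Matches an existing session.timeout line, even if it is commented out.
_TIMEOUT_LINE = re.compile(r"^[ \t]*#?[ \t]*session\.timeout[ \t]*=.*$", re.MULTILINE)


def set_session_timeout(properties_text, minutes):
    """Return the properties text with session.timeout set to the given value."""
    new_line = f"session.timeout = {minutes}"
    if _TIMEOUT_LINE.search(properties_text):
        return _TIMEOUT_LINE.sub(new_line, properties_text, count=1)
    # No existing line: append one, adding a newline separator only if needed.
    separator = "" if properties_text.endswith("\n") or not properties_text else "\n"
    return f"{properties_text}{separator}{new_line}\n"


print(set_session_timeout("session.timeout = 120\n", 0), end="")
```

The service restart described in the next step is still required before the new value takes effect.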

 

Finally, log in to the VAMI console (vcsa:5480) and restart the "vsphere client web service", or do it from the CLI by running:

service-control --stop vsphere-ui && service-control --start vsphere-ui

 

 

Monday, April 19, 2021

Importance of monitoring virtualized environments


Which factors should we consider as the monitoring metrics?
Choosing a monitoring solution is one of the important decisions that every IT manager has to make. But before selecting the software/platform, it's vital to identify the critical components of the infrastructure that need to be monitored. With the growth of virtualization and cloud computing technologies, we have to pay attention to many aspects, especially physical resources and virtualization components. Analyzing detailed metrics for each of these parts gives us many benefits, like protecting virtual machines against failures and increasing availability in the virtualization infrastructure. There are many factors to consider, such as the following:


 Physical components:

  1. Computing resources consumption: Physical processor and memory
  2. Cluster available capacity: the cluster’s remaining resources during host failures
  3. Availability of hosts: ESXi heartbeat issues, failover network infrastructure, and VMkernel settings
  4. Storage usage during work hours and backup operations: Datastore IOPS and rate of space usage
  5. Over-allocation & under-allocation of each physical resource: CPU, RAM, NIC, Disk
  6. Memory ballooning and dedicated datastores for swap files
Virtual components:
  1. Extra VM log files and their unexpected storage usage
  2. Missing or outdated VMware Tools on the virtual machines
  3. Old leftover snapshots and long chains of parent-child VMDK files
  4. Inactive or unused virtual machines
  5. Unused mounted ISO files and old connected physical media
  6. Orphaned VM files, especially VMDK files
  7. Rate of VM’s memory swapping and overall memory performance
 
Although some of these issues are very easy to resolve, they require a real-time 24/7 monitoring system and dedicated response teams that can react properly to expected or even unexpected problems. Regardless of the monitoring solution you choose, it’s more important to have well-prepared plans for responding to forecasted challenges, availability issues, and every detected incident that puts the infrastructure or data center at risk.
 

Thursday, March 25, 2021

Fun post: Be more than the highest version ... of course sometimes ;)

 This is a slightly funny and strange thing I ran into. I installed a security hotfix, ESXi670-202103001 (17700523), on one of my ESXi hosts. After the successful reboot, I wanted to compare its version with the official versions listed in VMware KB2143832, and I saw that the host had a higher version than the top listed one for ESXi v6.7 (EP18). By the time I wrote this post, however, the list had been updated ...

So sometimes it's not bad to be more up to date than the listed versions :D

Wednesday, March 24, 2021

Cluster Remediation settings: Suspend to memory

 One of the newest features in vSphere 7.0 U2 is "Suspend to memory (STM)", which is very useful during maintenance operations and can make updates with vSphere Lifecycle Manager (vLCM) much faster, especially when moving the virtual machines to another host takes a long time or when there is not enough capacity to migrate them. In many ESXi upgrades you run into exactly these two bottlenecks: the time needed for the temporary migration and insufficient computing resources on the destination.

So STM can help in such situations. However, this feature has some restrictions, such as requiring the "Quick Boot" feature to be enabled. To learn more about its limitations, requirements, and best practices, please check the following link:

https://kb.vmware.com/s/article/81555

However, hardware support for STM is very restricted, and based on KB82558 only a few server models can handle it (at the time of writing this post): HPE ProLiant DL380 G10 and Dell PowerEdge R740 family.


 Also, Niels Hagoort wrote a short article about the STM feature on the VMware blog:

https://core.vmware.com/blog/make-esxi-upgrades-faster-suspend-memory


Sunday, March 21, 2021

A Future Defined by Cloud: Challenges and Capabilities


 

 Hello guys. Today is a very important day, because it's the first day of spring and, based on the Persian solar calendar, the first day of a new century: 14th century! So I decided to introduce a new cloud-related VMware event on March 31st (10 days from now, so if you don't want to miss it, please save it in your calendar).

https://www.vmware.com/app-cloud-event.html?src=em_602d770dd929b&cid=7012H000001Ysn5

I also attached the list of speakers in the screenshot, and I hope you enjoy the event and learn more about VMware App and Cloud Transformation.

💚💚Happy Nowruz All💚💚

 





Tuesday, March 9, 2021

VMSA-2021-0002


 Around two weeks ago VMware announced a new series of vulnerabilities, including RCE (Remote Code Execution), in some of its products, especially the two primary components of vSphere in versions 6.5, 6.7, and 7.0 (ESXi and vCenter Server), and also Cloud Foundation (3.x, 4.x). Three CVEs have been published for these vulnerabilities, and most of them are related to the HTML5 version of the vSphere Client (HTTPS:443), because they give attackers unrestricted access to execute commands. To read more about these vulnerabilities (CVE-2021-21972, CVE-2021-21973, CVE-2021-21974), see VMSA-2021-0002.

Here is a brief summary of what each one targets:

21972: Lets an attacker with network access to port 443 execute commands with unrestricted privileges on the VCSA (RCE).

21973: Lets an attacker send a POST request to the VCSA HTML5 client on port 443, leading to information disclosure through an SSRF (Server Side Request Forgery) vulnerability.

21974: Lets an attacker with access to port 427 on ESXi trigger a heap overflow in the OpenSLP service, resulting in RCE.

Also, for information about another vulnerability, in vSphere Replication, read VMSA-2021-0001.

 




Tuesday, March 2, 2021

VMware Horizon View - Part1: VDI features

In the first part of the VDI Tutorial, I will review its architecture and talk a little about each part of VMware Horizon View: Client types, Connection Server, Replica, Composer, and so on.

 

Thursday, February 25, 2021

VCSA7 : No healthy upstream!


 I think many of us have seen something like the following error right after the VCSA7 deployment finished. First of all, I went directly to the VAMI and checked the health of the services, and everything was working. Admittedly, some of them were healthy with a warning, and after a restart all of them were healthy, but sadly the problem persisted, even after restarting the vCenter Server. I searched the log files and found nothing specifically related to this issue.


As you know, if you type the IP address of the vCenter Server, it is redirected to the FQDN (if you set it correctly during the deployment phase). So I decided to look into "/etc/hosts": I created a backup and then edited the original file, changing each IP/FQDN line to the IPv4 and IPv6 loopback addresses and replacing the FQDN with "localhost", something like the following picture.


 




 


Finally, after the second restart, the mentioned warning was gone.

Sunday, February 14, 2021

Detecting source of ESXi login failure


  Sometimes we encounter unexpected or unexplained login issues on ESXi hosts and see errors in the vSphere client such as "remote access for ESXi local user account 'root' has been locked for 900 seconds after xx failed login attempt". In most cases they are caused by changing credentials and forgetting to update them on connected solutions, like backup servers or monitoring systems. But what can we do if we can't find the reason for the login failure? What if we can't reach the real source of the problem? Is it really a wrong credential, or is it part of an attack (like password guessing)?

  We know the ESXi host keeps many log files (in /var/log or /scratch/log), and for troubleshooting they are very useful for understanding each aspect of a problematic situation. For this issue, we can go to the following log file (hostd.log) and dig into its details.

# grep Rejected /var/log/hostd.log

or 

# cat /var/log/hostd.log | grep Rejected

Once you find the source of the rejected credentials, you can work out the root cause: is it an attack in preparation, or a password change that someone forgot to apply everywhere?
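
When the log contains many rejections, it helps to count them per source instead of reading them one by one. The sketch below is only an illustration: count_rejected is a hypothetical helper, and it assumes each matching line ends in the form "... from <source>", as in the made-up sample lines used for the demo. Check a few real lines of your own hostd.log first and adjust the pattern if they look different.

```shell
#!/bin/sh
# Sketch only: count_rejected is a hypothetical helper.
# Assumption: rejected-login lines contain the word "from" followed by the
# source address. It prints "<count> <source>", most frequent source first.
count_rejected() {
    awk '/Rejected/ {
             for (i = 1; i < NF; i++)
                 if ($i == "from") hits[$(i + 1)]++
         }
         END { for (src in hits) print hits[src], src }' "$1" | sort -rn
}

# Demo with illustrative log lines (not copied from a real host)
demo="$(mktemp)"
cat > "$demo" <<'EOF'
2021-02-14T08:00:01.000Z info hostd[1000] Rejected password for user root from 192.0.2.10
2021-02-14T08:00:05.000Z info hostd[1000] Rejected password for user root from 192.0.2.10
2021-02-14T08:01:00.000Z info hostd[1000] Rejected password for user root from 198.51.100.7
2021-02-14T08:02:00.000Z info hostd[1000] Accepted password for user root from 192.0.2.20
EOF
count_rejected "$demo"
rm -f "$demo"
```

A single address with a steady stream of rejections usually points to a forgotten service account on a backup or monitoring server, while many different sources are a stronger hint of password guessing. You can run the function against a copy of /var/log/hostd.log on a Linux workstation.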

Saturday, February 13, 2021

VMware ESXCLI: how to upgrade and patch

In this video, I show how to upgrade an ESXi host from the CLI by using ESXCLI. I also install a VIB bundle file on the host. I hope it will be helpful for you all.

Friday, January 29, 2021

VMware vSphere Planning - Part 1: HCL

One of the major steps of vSphere planning is checking the VMware hardware compatibility guide and also interoperability between chosen solutions before deploying any virtualization products in the infrastructure. In this video, I speak about the importance of these two subjects with some examples.


 

Thursday, January 28, 2021

VMware Skyline: avoidance before occurring!


Fast and easy ways to fix known or unknown problems in every part of the IT infrastructure are a big concern for all admins, especially those who manage modern data centers. As you know, each layer, asset, and component of a data center has its own risks. VMware introduced Skyline as a proactive cloud-based platform that monitors the virtualized environments of VMware customers, deployed based on VMware Validated Design or the VxRail solution, and gives feedback for fixing configuration problems and security issues. This solution helps us reach more reliability in production and basically has two parts: the Collector appliance inside the customer's virtualization infrastructure and VMware Skyline Advisor. It operates through the following two actions:

  1. Listen for new events, aggregate performance data, and collect usage information of the VMware products (vSphere, NSX-V, NSX-T, Horizon View, vSAN, vROps, and VCF) installed in your infrastructure via a locally deployed virtual appliance (VMware Skyline Collector).
  2. Send the collected data securely to VMware Skyline Advisor for analysis inside the VMware Cloud. The data is transferred over an encrypted channel and kept in a safe repository. (VMware guarantees the privacy of your collected information.)

There is also another way to reach this goal: with Skyline Log Assist you can upload log files manually, but depending on the size of the files, this operation can take a long time to complete.

VMware Skyline Advisor is a self-service portal that you can access with your VMware customer account. There you can see recommended actions and improvement suggestions for resolving simple or complex problems. This procedure also gives VMware a better chance of assessing security risks and fixing issues more easily and earlier, and it gives us the opportunity to handle security concerns.

Just remember that the VMware Skyline approach is to solve the detected problems, not just to show us symptoms. It helps the IT department become more prominent in the company's business roadmap and act as a part of value-added services.


Setting up the Skyline Collector is very easy, like deploying other VMware or third-party virtual appliances, and does not involve a complex deployment procedure. Also, like other VMware solutions, the collector has a VAMI web interface (and, like the VCSA, it uses port 5480).

For frequently asked questions about VMware Skyline, you can check the Skyline FAQ and also KB55928. To learn how VMware Global Support Services can help customers benefit from VMware Skyline, read the Support Entitlement Modes of Skyline Advisor. For further steps with VMware Skyline, you can watch the many videos presented by VMware technical support or customer services.


Tuesday, January 19, 2021

Planning for upgrading to VMware vSphere 7

 In this video, I go into more detail about planning and performing a VMware vSphere upgrade in 3 phases, following the sections of the "Upgrading to VMware vSphere 7" document. Thanks to Nigel Hickey for writing this eBook. I review the upgrade requirements, planning steps, tips and notes, and also the benefits, limitations, and changes in vSphere 7.

  

 

Thursday, January 14, 2021

Planning for upgrading VMware ESXi hosts (Part I)


      

I review the considerations, limitations, challenges, and risks of updating and upgrading VMware ESXi via the direct installation method or by using vSphere Update Manager (VUM) or Lifecycle Manager (LCM).

 

I will start a new journey soon ...