Monday, December 21, 2020

Which RAID type is better?

 

Which one is better for your storage configuration: RAID 5 + 1 hot spare, or RAID 6?

It’s a very common question: which array configuration should we choose when setting up storage devices such as SAN storage, or even the local disks of an enterprise server? To answer this simple but important question, we need to dive into three vital factors: storage capacity, overall performance (IOPS), and the data protection mechanism (fault tolerance). First of all, we need to look at the characteristics of the most popular RAID types: 0+1, 1+0, 5, and 6.

With mirrored RAID levels (1+0 or 0+1), we gain strong data protection and good performance, but we lose 50% of the raw storage capacity. In medium-scale environments that is a massive limitation and can become a bottleneck in the storage design. For parity-based configurations like RAID 5 (a single parity tier, tolerating one disk failure) and RAID 6 (two parity tiers, tolerating two simultaneous disk failures), most design documents add a good recommendation: configure one or more disks of the storage device as hot spares for your arrays. If a failure happens, a spare disk automatically replaces the failed one, and after the failed disk is physically replaced, the new healthy disk becomes the next hot spare. This is not a bad idea, but what happens if multiple disks fail concurrently, or a second failure occurs before the parity structure finishes rebuilding the array? In that situation we unfortunately lose everything in the failed array. So what can we do then? Nothing, except re-design the storage infrastructure!

Assuming the same usable capacity for RAID 6 and RAID 5 + 1 spare (in both cases, the capacity of two disks is sacrificed for the array construction), we have to choose one of them based on either the protection metric or the performance metric. After running many performance tests (for example, measuring the latency of read/write operations), I observed no significant difference between them. I am not saying there is no overhead for calculating the second parity tier in RAID 6; however, compared with the protection it buys, the additional mathematical work in RAID 6 does not degrade performance in any meaningful way. So if you need to choose between RAID 5 + 1 spare and RAID 6, and pick the best option for your storage infrastructure, first answer this question: which is more important to you, overall IOPS (performance) or fault tolerance (data protection)?
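As a quick sanity check on the capacity claim, here is a tiny shell sketch for a hypothetical shelf of 12 disks of 4 TB each (the disk count and size are illustrative assumptions, not taken from any specific array):

N=12; SIZE_TB=4
echo "RAID 6 usable capacity: $(( (N - 2) * SIZE_TB )) TB"    # two disks' worth of capacity go to the two parity tiers
echo "RAID 5 + 1 hot spare:   $(( (N - 2) * SIZE_TB )) TB"    # one disk's worth of parity plus one idle spare disk

Both layouts end up at 40 TB usable in this example, which is why the decision comes down to rebuild risk versus the small second-parity overhead rather than capacity.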

If you choose the first option, configure RAID 5 for all arrays, with one or more hot-spare disks depending on your needs and design. If you choose the second, RAID 6 is the best choice you can make. Since we traditionally build most RAID structures to prevent data loss, I conclude that RAID 6 is the winner of this game, because its volumes can tolerate a double disk failure. Especially with large-capacity disks (more than 1 TB), the risk of a second disk failing before the rebuild operation finishes is very high, and that is a point you should always keep in mind.

 

Sunday, December 13, 2020

vSphere Standard Switch (VSS) - Introduction (Part 2)

The second part of my video series introducing the basics of vSphere networking is ready now. I hope you enjoy it and find it helpful :)

Sunday, December 6, 2020

How to get the ESXi host info via CLI

In this video, I will show you how to get the version and build number of an ESXi host from the CLI.
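For reference, these are the usual starting points in the ESXi shell (the output format can differ slightly between versions):

vmware -vl                   # prints the product name, version, and build number
esxcli system version get    # returns the same information as separate fields (Product, Version, Build, Update, Patch)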

 

Monday, November 30, 2020

vSphere Standard Switch: Introduction (P1)

In this video, I talk about the fundamental concepts of networking in virtualization and the differences between physical and virtual switches. This is the first part; the second video will be uploaded soon.

 

Friday, November 13, 2020

ESXi BootBank partitions

VMware ESXi creates several system partitions on its boot storage device, and understanding them is very useful for troubleshooting tasks. So let's look at them briefly:

1. System Boot (FAT16): contains the boot loader and has a fixed size: 4 MB in versions prior to 7.0 and 100 MB in ESXi 7.0.

2. BootBanks 0 & 1: each BootBank partition holds a compressed copy of the ESXi boot files and modules. BB0 acts as the active boot partition and BB1 as the alternate (AltBootBank). When you upgrade the ESXi host, the new image is written into the alternate bank (which is empty after a fresh installation) and the system is set to boot from that updated bank at the next normal reboot, while the previously used bank keeps the old version for fail-safe purposes. If ESXi fails to boot, or the active BootBank becomes inaccessible for any reason, the system automatically boots from the previously used BootBank to return the host to its last known-good state (you can also choose between the banks manually by pressing "Shift + R" while ESXi is booting). A quick way to inspect the two banks from the shell is shown right after this list.

3. ESX-OSData: all the other system partitions that hold non-boot data, such as the Scratch partition and the CoreDump, are consolidated in ESXi 7.0 into a newly introduced unified partition called ESX-OSData. (I think I have written enough about the importance of CoreDump on this blog, most recently in "Why CoreDump files are useful?") This partition can also be used for storing virtual machine files when there is no secondary storage device and the single chosen device must provide all the space the VMs require.
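A hedged sketch of how to inspect the two boot banks from the ESXi shell, as referenced in item 2 (the symlink targets and values depend on your boot device and ESXi version):

ls -l /bootbank /altbootbank                      # symlinks pointing to the two boot bank volumes
grep -E 'bootstate|updated' /bootbank/boot.cfg    # bootstate=0 indicates the last boot from this bank succeeded

The "updated" counter increases with each image update, so comparing it between the two boot.cfg files shows which bank carries the newer image.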

One of the major limitations of the ESXi system partitions has been their fixed size, and to avoid the issues this causes, VMware made the partition sizes flexible in v7.0 (you can read more details here). So, depending on the capacity of the disk chosen as the boot device, only the size of the BootBank partitions will differ, not the system boot partition.
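If you want to verify the resulting layout and partition sizes on your own boot device, something like this works from the ESXi shell (the disk identifier below is a placeholder, not a real device name):

ls /vmfs/devices/disks/                                   # find the identifier of the boot disk
partedUtil getptbl /vmfs/devices/disks/<your-boot-disk>   # prints the partition table with start/end sectors per partition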

Finally, if you need to know how to recover a failed ESXi host and bring it back to a normal boot, check KB 59418.


Monday, November 9, 2020

VMware Carbon Black


For me, some of the best webinar moments at VMworld 2020 were the ones where I learned more about VMware Carbon Black; I believe it was one of the best topics across this event's presentations. It is all about analyzing unrecognized patterns and automating threat detection: the VMware Carbon Black Threat Analysis Unit (TAU) applies the latest advanced malware detection and prevention mechanisms to increase security and keep us safe. With this cloud-based platform, focused on system hardening and threat detection, VMware tries to discover every global attack, especially those targeting unknown vulnerabilities that lead to zero-day attacks, because most of their anomalous behaviors involve undetected or unfamiliar patterns. So TAU can help us, in every corner of the world, protect our infrastructures against infection and attack.


On August 7, 2019, this cloud-native endpoint protection platform announced the discovery of a well-known cryptomining campaign that had affected more than 500,000 computers worldwide, stealing system access information for possible sale on the dark web, and published a full report on the matter.

If you wish to see the VMware Carbon Black global threat report covering most countries over the last 12 months, especially the side effects of COVID-19, the shift to working from home, and the observed increase in cyberattacks and threats such as malware, review the following infographic:

https://www.carbonblack.com/resources/global-threat-report-extended-enterprise-under-attack-infographic


Also, if you want to learn about global incident response, the biggest threats and most common cybercrimes, and notes on how to fight back against them, read the full VMware Carbon Black report:

https://www.carbonblack.com/resources/tipping-point-election-covid-19-create-perfect-storm-cyberattacks

 


Friday, October 30, 2020

DHCP Basics (Part2)

The second part has been published ...

Following the first part of DHCP Basics, I will talk about how to configure the DHCP service in Windows Server.

Saturday, October 24, 2020

Why CoreDump files are useful?

In the case of an unrecoverable system error, you may see a blue screen (Windows), a kernel panic (Linux), or a purple screen (ESXi) showing some useful information about what went wrong on the server and why the crash happened. In most failures there is a hardware reason behind the problem, often related to memory, storage devices, or even the processors. This diagnostic information can be captured in a Core Dump file if you configure a disk dump or network dump (Net Dump) target on the ESXi host.

Beyond the reasons mentioned above, many other problems can crash an ESXi host, such as an incompatibility between the physical server and the installed ESXi version. In most PSOD (Purple Screen of Death) cases the host does not reboot and hangs on the error screen until you restart it manually, although in some cases it restarts automatically. If you cannot see the error message (because the server is frozen), or you have no physical access to the host in the server room (via a monitor or KVM console), you can use the out-of-band management platform provided by your server vendor (such as Intel AMT, Dell iDRAC, or HP iLO) to analyze and check the current or historical status of the ESXi host.

In older versions there was a size limitation on the default CoreDump partition (100 MB), and as you can imagine that is not enough for such an important file, so VMware highly recommended generating core dumps in one of your VMFS datastores. VMware has stated that starting with ESXi 7.0, the installer creates a VMFS-L based ESX-OSData volume and configures a CoreDump file stored in it, provided the volume is larger than 4 GB.
So if ESXi is installed on a USB device or SD memory card, the following boot option must be set before the host starts up:

allowCoreDumpOnUsb=TRUE  

Also, if you need to check which CoreDump partition is currently configured, run the following esxcli command:

esxcli system coredump partition get
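For completeness, a short sketch of the related sub-commands (verify the exact flags against the esxcli reference of your ESXi version):

esxcli system coredump partition list                        # shows candidate dump partitions and which one is active
esxcli system coredump partition set --enable true --smart   # lets ESXi pick and activate an accessible partition automatically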

If you need more details about how to modify CoreDump files, you can read this post too. As a last point, keep in mind that software-based iSCSI and FCoE adapters are still not supported as targets for storing the CoreDump file, up to the current ESXi version.
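And since the recommendation above is to keep the dump on a VMFS datastore, here is a hedged sketch of configuring a file-based CoreDump with esxcli (the datastore and file names are illustrative placeholders):

esxcli system coredump file add -d datastore1 -f dumpfile01   # create a dump file on the chosen VMFS datastore
esxcli system coredump file list                              # confirm the file and note its full path
esxcli system coredump file set --smart --enable true         # activate it as the dump target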


 

Sunday, October 11, 2020

DHCP Basics - Video Series (Part One)

This is the first video of a new series named "Windows Server: Basic Introduction of Services". It is the first part of the DHCP introduction, with a fundamental review of the architecture of this service.

Thursday, October 8, 2020

Best online sessions of VMworld 2020

A week has passed since the end of the great VMware global event, VMworld 2020. Many excellent interviews, webinars, new features, and technologies were presented by VMware experts and managers. Personally, I especially enjoyed the two podcasts mentioned below:

1. "East-West is the New Perimeter: The Cutting Edge of Datacenter Firewalling", presented by Tom Gillis. In this video he spoke about the benefits of using VMware Carbon Black TAU (Threat Analysis Unit) to control east-west traffic in the datacenter and to defend against well-known types of attacks such as DDoS or SQL injection, compared with implementing hardware firewall appliances.

2. "The Future Ready Security Operations Center", presented by Tom Corn. He reviewed five key challenges facing security operations teams, explained how the Carbon Black platform responds to them, and also spoke about the role of VMware XDR (eXtended Detection and Response) in today's virtualization security challenges.

Thanks to both of them and all other virtualization experts for creating this memorable event.

Wednesday, September 30, 2020

NVMe architecture and its pros and cons for VMware ESXi


NVMe (Non-Volatile Memory Express) is a great technology that can be considered the storage access and transport protocol for the next generation of SSDs (with a PCIe interface). However, you should keep in mind that NVMe is not just a physical connector for flash-based disks, because it is also used as a networking protocol. NVMe is an end-to-end standard (with its own command set) designed to reach the highest levels of throughput and performance (IOPS) for storage systems across most of today's enterprise workloads, especially IoT, data mining, and virtualized datacenters. Thanks to its parallel architecture, NVMe offers significantly higher performance and lower latency than the legacy SAS and SATA protocols. As a networking protocol, NVMe enables a high-performance storage networking fabric and provides a common framework for a variety of transports.

Unlike other storage protocols, which were designed for mechanical hard disks, NVMe is not just simple SSD storage, because it can exploit multi-processing architectures. NVMe is also a NUMA-optimized, highly scalable storage protocol that connects the host to the memory subsystem and delivers lower latency than other types of storage devices and protocols, even legacy SSDs. The technology has many distinctive features, such as multi-stream writes and over-provisioning, that are very useful in virtualization environments.

NVMe brings many advantages over legacy protocols. But why should we use it in virtualized infrastructure, and what effect does NVMe have on performance, data transmission, and storage in our virtualization and storage environments? As Western Digital put it nicely when describing NVMe features: IO virtualization, together with namespaces, makes NVMe very interesting for enterprise SAN, hyper-scale server SAN, virtualization, and hyper-convergence use cases. Taking it one step further, SR-IOV allows different VMs to share a single PCIe hardware interface.

As I mentioned previously, NVMe was designed for flash-based disks, and it communicates between the storage device and the system CPU over high-speed PCIe lanes. So another big difference between NVMe and other types of flash-based storage is how the processing resources are accessed: legacy SSDs go through an HBA controller, while NVMe SSDs connect directly to the CPU via PCIe. Because of these benefits, NVMe will play an effective role in the design of modern datacenters, especially as local storage for ESXi hosts. As an important note, never forget that we cannot use NVMe disks as part of a disk array (like RAID), because NVMe devices are attached via PCIe rather than through a hardware RAID controller (the PCIe slot itself has no RAID capability). And since VMware ESXi does not support software RAID, it is not possible to use NVMe as an array of disks in the ESXi storage design. So be careful about when and why you use them in a vSphere environment!

VMware has officially supported NVMe since vSphere 5.5 (November 2014), but only as a separate driver; it became part of the ESXi base image in vSphere 6.0. Through the VMware IOVP (I/O Vendor Partner) program, storage drivers (VIB and binary files) developed by many vendors such as Dell, Intel, HP, Western Digital, and Samsung have been certified. VMware also released a dedicated virtual NVMe controller for use as VM virtual hardware. It can be used inside virtual machines with hardware version 13 or higher, so since the release of ESXi 6.5 it has been possible to use the NVMe controller inside virtual machines.

Compared with other types of storage controllers, using this virtual controller significantly reduces the software overhead of processing guest OS I/O. That makes it very useful for most virtualization solutions, especially VDI environments, because NVMe lets us run more virtual desktops per ESXi host. (Each virtual machine supports 4 NVMe controllers and up to 15 devices per controller.) You can list the NVMe devices installed in an ESXi host by running the following esxcli command:

# esxcli nvme device list
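If you also want to see how those devices surface at the generic storage layer, the standard storage namespace works as well (output naturally varies with the driver and ESXi build):

# esxcli storage core adapter list | grep -i nvme    # NVMe controllers appear as vmhba adapters
# esxcli storage core device list | grep -i nvme     # each namespace shows up as a regular storage device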

For more information about NVMe technology, you can read the links below. There are also related features such as NVMe over Fabrics (NVMe-oF) and NVMe over Fibre Channel (NVMe/FC) that I may write about in another post later ;)

https://kb.vmware.com/s/article/2147714

https://blog.westerndigital.com/nvme-important-data-driven-businesses/

I will start a new journey soon ...