Saturday, January 26, 2019

VMware SDDC Design Considerations - PART Four: Leaf-Spine Architecture

Continuing the SDDC Design series (based on the VMware Validated Design Reference Architecture Guide), this fourth part looks more closely at the Leaf-Spine model and its topology characteristics. Before explaining it, an important question must be answered: why is the two-tier Leaf-Spine model recommended for datacenter network design instead of the three-tier Core-Aggregation-Access model (or, as Cisco names it, Core-Distribution-Access)? The first and main reason is simplicity of design, especially for communication inside the datacenter. Networking terminology defines two types of traffic flow, North-South and East-West. Let's see what they are.
From the datacenter design perspective, North-South traffic means traffic entering and leaving the datacenter (usually traffic from servers in branch offices, secondary sites, client subnets in other regions, and the rest of the network infrastructure). In contrast, East-West traffic stays inside the datacenter and is essentially server-to-server communication; it never leaves the datacenter, while North-South traffic always does. For North-South traffic the traditional three-tier model is still a good fit, because you may need redundancy at every layer to survive network failures. Inside the datacenter, however, fully redundant full-mesh switch connections are not essential for the Leaf-Spine uplinks: high bandwidth and throughput matter more than full-mesh resilience, so there is no need for spine-to-spine or leaf-to-leaf links. We must also consider the role of STP, which blocks one of two redundant uplinks to prevent loops, so simplicity and bandwidth growth count for more than full redundancy inside each tier of the datacenter. Redundancy matters only in the spine-leaf connectivity.


In the Leaf-Spine architecture, think of the spine switches as a merger of the Core and Aggregation layers (a distributed core) at the heart of the fabric, and the leaf switches as the Access layer for servers, especially hypervisors in a virtualized environment. Instead of one or two massive core switches handling the total network load, all traffic generated on the leaf uplinks is distributed across all spine switches. Scalability is therefore the second key factor in datacenter network design.
Many characteristics affect the size of a datacenter and its rate of growth: for example, the total number of racks in the fabric, the provisioned bandwidth between any two racks, and the type and speed of the connections between the leaf (ToR) switches and the spine. For a better and more reliable design, it is important to build the uplink connections on ECMP. Equal-Cost Multi-Pathing lets traffic be forwarded across multiple equal-cost paths, so all of them carry packets evenly, which gives the infrastructure better load balancing and more aggregate bandwidth. This also prevents load from piling up on one or two uplinks.
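
Just to illustrate the idea of equal-cost multipath (this is not a leaf or spine switch configuration, only a minimal Linux sketch with illustrative addresses and interface names), a multipath route over two next hops can be expressed with iproute2 like this:

# ip route add 10.20.0.0/16 \
    nexthop via 192.168.1.1 dev eth0 weight 1 \
    nexthop via 192.168.2.1 dev eth1 weight 1

The kernel then spreads flows across both next hops, which is the same behavior a leaf switch gets from ECMP-enabled uplinks towards multiple spines.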

 
The type of service provided by each rack should also make us think about how to keep the design scalable. For example, if you dedicate one rack to storage devices, growing the number of hypervisors in the other racks can turn that rack into a serious bottleneck, because you will need additional storage connectivity (iSCSI or FC HBAs, SAN switches, and so on) and there may be no room left for new equipment such as another storage enclosure. Jumbo frames are a great help if you can configure them end to end, so the MTU value on the vSwitches (VSS/VDS) and on the physical devices must match (MTU 9000 is the usual choice), as in the sketch below.
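
As a quick sketch (the vSwitch name and IP address are illustrative), setting jumbo frames on a standard vSwitch and testing the path end to end could look like this:

# esxcli network vswitch standard set -v vSwitch0 -m 9000
# esxcli network vswitch standard list -v vSwitch0
# vmkping -d -s 8972 10.10.10.20

The vmkping test uses -d (don't fragment) and an 8972-byte payload, which is 9000 minus the IP and ICMP headers; for a distributed switch (VDS) the MTU is set on the switch object in the vSphere Client rather than per host.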

Or, if you need to expand your compute pools with new CPU cores, what will you do when you need more physical servers? So before dedicating each rack to a specific workload, service, or storage role, you should estimate the growth rate of your datacenter, especially the hardware resources and physical equipment, so it can satisfy future service needs.
Finally, always remember to consider the peak rate and average bandwidth of the different traffic types inside a virtual environment, such as vMotion, vSAN, NFS, VXLAN, and vSphere Replication, to calculate the datacenter infrastructure requirements accurately.

Sunday, January 20, 2019

Set Manual Routing for VCSA

Although we would like to manage all of our deployed hosts inside a single subnet or VLAN, in some situations a number of hypervisors have to be placed on other subnets / VLANs. If the vCenter traffic can be routed to them through its default gateway, there is no problem; only the traffic required for initial management (incoming TCP 443 / TCP 902 in both directions / outgoing UDP 902) must be permitted on your gateway / router / firewall. But if that is not possible because of management or security considerations, you can enter the required routes in the vCenter Server Appliance shell. There are two ways to do this. The first is the "route add" command from shell access. For example:

# route add -net 10.10.10.0 netmask 255.255.255.0 gw 10.10.100.1 dev eth0  

The result of this method is not persistent and will be gone after a VCSA restart, so it is useful only for testing or temporary situations. If you want to make it permanent, the second way is to edit the *.network file (such as 10-eth0.network) in the path "/etc/systemd/network" and add the intended routes in this form:
   
[Route]
Destination=10.10.20.0/24
Gateway=10.10.100.2
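
For example (the file name and addresses are only illustrative), a 10-eth0.network file carrying two static routes would repeat the section like this:

[Route]
Destination=10.10.20.0/24
Gateway=10.10.100.2

[Route]
Destination=10.10.30.0/24
Gateway=10.10.100.2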

As in the example above, each additional route must go in its own [Route] section, otherwise it will not work as expected. Then restart the network interface:

# ifdown eth0 && ifup eth0

or restart networkd with one of these commands:

# systemctl restart systemd-networkd
# service network restart

And now if you want to check the results, run: 

# route -n 
# ip route show

If you do not have shell access and only log in to the VCSA appliance console, there are several CLI commands for checking and configuring routes that you can use instead. To see them and how to use them:

> routes.list --help
> routes.add --help
> routes.delete --help
> routes.test --help 

Note I: There is another file, "/etc/sysconfig/network/routes"; if you view its content, it shows only the system default gateway, and no other routes appear there.

Note II: If you want to add routing to your ESXi hosts, just do:

# esxcli network ip route ipv4 add -n 10.10.20.0/24 -g 10.10.100.2
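
And to verify the result on the host, the standard list command should do:

# esxcli network ip route ipv4 list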


Tuesday, January 15, 2019

Special Events: Remember 10 years of Mastering VMware vSphere


 

Everything started from this moment! Yes, exactly when I found the best book ever, written by one of the legendary trainers, Scott Lowe (at least as I believe), and now, after more than 10 years of learning, studying, researching, working, and teaching around the VMware vSphere products, I'm proud to call it "the true beginning of me in virtualization". Thanks to Scott Lowe, I still use this book (this exact edition) to review my knowledge of VMware virtual world fundamentals.



P.S 1: This lovely mug is my old friend and partner in many virtualization projects, and it still stands beside me as we turn more and more things into virtual assets.
P.S 2: There are two great trainers named Scott Lowe; I mean the right one ;)

Interview with EMC’s Scott Lowe on Cloud Computing and vCloud Director



Saturday, January 12, 2019

IT Benefits Management (from Alex Jauch)

Today I read a good post by Alex Jauch about "IT Benefits Management". In the first part he talks about what we generally don't know and how to avoid the "Expert Syndrome" trap. He then discusses a real challenge: when IT cannot handle a service request from the business internally, it should accept the benefits of cloud services ...
Thanks to Alex, I am publishing the link to his post here:


https://cloud.vmware.com/community/2018/02/20/ask-alex-part-3-benefits-management/#comment-1534


Wednesday, January 9, 2019

Time difference between ESXi host & NTP Server

Yes, exactly: another post about the NTP service and the important role of time synchronization between virtual infrastructure components. In an earlier post I described a problem with ESXi 6.7 time settings and also covered some useful CLIs for time configuration, both manual and automated. But in a lab scenario with several ESXi versions (because of the server models, we cannot upgrade some of them to a newer ESXi release), we planned to configure an NTP server as the time source of the whole virtual environment (PSC / VC / ESXi hosts and so on).
Our first deployed NTP server was a Microsoft Windows Server 2012 machine, and it caused a deceptive issue. Although the time configuration had been done correctly and time synchronization succeeded, when I monitored the NTP packets with tcpdump I suddenly saw the time shift to another timestamp.
 


As the first troubleshooting step, I thought it might be caused by the time zone of the vCenter Server (but that was correct) or by a mismatch between the NTP client and NTP server versions. To check the NTP version on ESXi, use the NTP query utility (ntpq --version), and you can also edit the ntp.conf file to pin an exact NTP version (vi /etc/ntp.conf and add "version #" to the end of the server line). But NTP is a backward-compatible protocol, so I concluded this was not the cause of the issue.
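
As a minimal sketch (the server address is illustrative), the pinned server line in /etc/ntp.conf would look like this:

server 10.10.100.10 version 3

After editing the file, restart the NTP daemon on the host (/etc/init.d/ntpd restart) so the new setting takes effect.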

After further investigation into the cause of the problem, we decided to replace our NTP server, in this case with a MikroTik router appliance. After the initial setup and NTP configuration on the MikroTik OVF, we switched our time source to it. Then, after setting the time manually again with "esxcli hardware clock" and "esxcli system time", we configured the host time synchronization with NTP; this initial manual setting is needed because the time delta between the host and the NTP server must be small enough (roughly less than a minute) for synchronization to work.
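
A minimal sketch of that manual step (the date and time values are just placeholders):

# esxcli system time get
# esxcli hardware clock set -y 2019 -M 1 -d 9 -H 10 -m 30 -s 0
# esxcli system time set -y 2019 -M 1 -d 9 -H 10 -m 30 -s 0
# esxcli system time get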



Then, after restarting the NTP service on the host (/etc/init.d/ntpd restart), I checked again to make sure the problem was resolved.



Wednesday, January 2, 2019

VMKernel Core Dump - Part II: How to add & remove coredump files


In an earlier post I explained the VMkernel coredump files, but today I want to show how to delete coredump files and point them to a new path. Before any operation, let's check the current settings via ESXCLI:
 # esxcli system coredump file list





Each line of the result consists of a path {/vmfs/volumes/5c2...(volume UUID)/vmkdump/filename.dumpfile}, plus Active, Configured and Size values. If the Active and Configured fields show "False", the coredump file is not activated. You can remove any of them with:
# esxcli system coredump file remove -F -f /vmfs/volumes/5c2.../vmkdump/filename.dumpfile

(-F to run the command forcefully, especially if the file is active, and -f to specify the dump file)

But before running the last line, it's better to deactivate your dump files with:
# esxcli system coredump file set -u 

Finally, if you want to point your host's coredump to a new file, you can run:
# esxcli system coredump file add -d (DatastoreName) -f (FileName) -s (FileSize{>100MB})

But it is still not activated, so run:
# esxcli system coredump file set -p /vmfs/volumes/5c2.../vmkdump/filename.dumpfile
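
To double-check which file is active and configured afterwards, this should also work:

# esxcli system coredump file get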

In the picture below you can see that the second file is active and configured.






Note that you can also change all of these settings by editing the host's advanced settings or by attaching a Host Profile. For more information, check KB2090057 and KB2077516.
Don't forget to run /sbin/auto-backup.sh ;)
 

I will start a new journey soon ...