Wednesday, February 12, 2020

vSphere Storage Troubleshooting - Part 1: HBA & Connectivity

 Storage infrastructure is one of the main parts of any IT environment, so a good design and a principled configuration make troubleshooting any storage-related issue much easier. One of the primary components of the storage infrastructure is the HBA (Host Bus Adapter), which connects the servers to the storage area network. As a result, many storage-related problems can be traced back to the HBA installed in the ESXi host and to its physical connections to the SAN storage or SAN switch. So let's go step by step through storage troubleshooting inside a VMware infrastructure.

 The first situation can occur when a local array of disks is not detected as a local datastore. You can check the status of the internal disk controller (for example, in an HP ProLiant server) by running the following command:
  
cat /proc/driver/hpsa/hpsa0

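The output shows the current status of the controller.
(Please pay attention to where I use the lowercase word "hba" and where I use the uppercase "HBA"; grep is case-sensitive, so the difference matters in the commands below.)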


 But if the datastore in question is not local and is instead a shared volume on the SAN storage in our infrastructure, then we must check the HBA status:

./usr/lib/vmware/vmkmgmt_keyval -a | less
 The vmkmgmt_keyval command is available in ESXi 5.5 and later. For older versions, check the following directories for the two most popular HBA vendors:
  •    QLogic:   /proc/scsi/qla2xxx
  •    Emulex: /proc/scsi/lpfc
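If you check many hosts running mixed versions, the rule above fits in a small helper. This is only a sketch: the function names are hypothetical, and all it does is return which source to inspect based on the ESXi version string, using the 5.5 cutoff described above.

```python
from typing import List, Tuple

# Hypothetical helper: decide where to read FC HBA status from,
# using the ESXi 5.5 cutoff described in the post.
KEYVAL_SOURCE = "vmkmgmt_keyval -a"
LEGACY_PROC_DIRS = {
    "QLogic": "/proc/scsi/qla2xxx",
    "Emulex": "/proc/scsi/lpfc",
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a version string such as '6.7.0' into (6, 7, 0) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def hba_status_sources(esxi_version: str) -> List[str]:
    """Return the command or /proc directories to check for the given ESXi version."""
    if parse_version(esxi_version) >= (5, 5):
        return [KEYVAL_SOURCE]
    # Older hosts expose per-vendor driver details under /proc/scsi instead.
    return list(LEGACY_PROC_DIRS.values())
```

For example, hba_status_sources("5.1.0") returns both /proc directories, while any 5.5 or later version points you to vmkmgmt_keyval.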

 Also, if you don't find the related vmhba adapter in the output of the following commands, it means the ESXi host has not detected your HBA yet:
  • vmkchdev -l | grep hba
  • esxcfg-info | grep HBA
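The grep in the first command is simply looking for vmhba aliases. If you save the output to a file and want to check it from a script (for example, across several hosts), a minimal sketch could look like the one below. The function names and the sample lines are illustrative only; I assume the vmkchdev -l layout of PCI address, IDs, owner and alias per line, and the IDs shown are made up.

```python
import re
from typing import Set

# Matches adapter aliases such as vmhba0 or vmhba32 in captured command output.
VMHBA_PATTERN = re.compile(r"\bvmhba\d+\b")


def detected_hbas(command_output: str) -> Set[str]:
    """Return every vmhba alias that appears in the captured output."""
    return set(VMHBA_PATTERN.findall(command_output))


def hba_detected(command_output: str, adapter: str) -> bool:
    """True if the given adapter alias (e.g. 'vmhba2') shows up in the output."""
    return adapter in detected_hbas(command_output)


# Illustrative `vmkchdev -l` style lines; PCI IDs and aliases are made up.
SAMPLE = "\n".join([
    "0000:03:00.0 103c:323a 103c:3245 vmkernel vmhba0",
    "0000:04:00.0 1077:2532 1077:015d vmkernel vmhba2",
    "0000:05:00.0 8086:10fb 8086:000c vmkernel vmnic2",
])
```

Here hba_detected(SAMPLE, "vmhba3") is False, which is the same situation as an empty grep result: the host has not detected that adapter.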

 
 You can also run the swfw.sh script and combine it with grep to find information about the HBA devices connected to the ESXi host, including the device model, driver, firmware and, for an FC HBA, the WWNN (the InstanceID value):

 ./usr/lib/vmware/vm-support/bin/swfw.sh | grep HBA

 
 In another situation, imagine you have deployed a new SAN storage in a vSphere cluster, but you are not sure whether the HBA can detect the presented LUN. As the first step, run the following ESXCLI command:
esxcli storage core device list

 In the output, check the important fields, such as Display Name, Device Type, Devfs Path, Vendor and Model.
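When the list is long, it helps to pull out only those fields. The sketch below assumes the usual layout of this command's output: each device ID starts in column 0, followed by indented "Key: Value" lines. The device ID, vendor and model in the sample are hypothetical, and real output contains many more fields per device.

```python
from typing import Dict, Optional

# Fields worth checking first, as listed above.
FIELDS_OF_INTEREST = ("Display Name", "Device Type", "Devfs Path", "Vendor", "Model")


def parse_device_list(output: str) -> Dict[str, Dict[str, str]]:
    """Group indented 'Key: Value' lines under the unindented device ID above them."""
    devices: Dict[str, Dict[str, str]] = {}
    current: Optional[str] = None
    for line in output.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # A line starting in column 0 is a new device identifier.
            current = line.strip()
            devices[current] = {}
        elif current is not None and ":" in line:
            # Split on the first colon only, so values may contain colons.
            key, _, value = line.strip().partition(":")
            devices[current][key.strip()] = value.strip()
    return devices


def summarize(devices: Dict[str, Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Keep only the fields of interest; missing fields become empty strings."""
    return {
        dev: {field: attrs.get(field, "") for field in FIELDS_OF_INTEREST}
        for dev, attrs in devices.items()
    }


# Trimmed, illustrative sample with a made-up device ID.
DEVICE_ID = "naa.60000000000000000000000000000001"
SAMPLE = "\n".join([
    DEVICE_ID,
    f"   Display Name: Example FC Disk ({DEVICE_ID})",
    "   Device Type: Direct-Access",
    f"   Devfs Path: /vmfs/devices/disks/{DEVICE_ID}",
    "   Vendor: EXAMPLE",
    "   Model: LUN",
])
```

If the new LUN does not appear as a key in the parsed result at all, the host does not see it yet, which points back to zoning, masking or the HBA itself.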

 Next, run the following command, which returns more information about the HBA adapters and the state of each one:

esxcli storage core adapter list
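This command prints a fixed-width table with a dashed line under the header. The sketch below uses that dashed line to find the column edges, then lists FC adapters whose link is not up. It assumes FC adapters have a UID starting with "fc." and report "link-up" when healthy; local SAS/RAID controllers typically show "link-n/a", so they are ignored. Function names are hypothetical.

```python
import re
from typing import Dict, List


def parse_esxcli_table(output: str) -> List[Dict[str, str]]:
    """Split esxcli table output into dicts, using the dashed row to find column edges."""
    lines = [line for line in output.splitlines() if line.strip()]
    if len(lines) < 2:
        return []
    header, dashes, rows = lines[0], lines[1], lines[2:]
    starts = [m.start() for m in re.finditer(r"-+", dashes)]
    # Each column runs from its own start to the start of the next column.
    bounds = list(zip(starts, starts[1:] + [None]))
    names = [header[s:e].strip() for s, e in bounds]
    return [
        {name: row[s:e].strip() for name, (s, e) in zip(names, bounds)}
        for row in rows
    ]


def fc_adapters_not_up(adapters: List[Dict[str, str]]) -> List[str]:
    """Names of FC adapters (UID starting with 'fc.') whose link state is not 'link-up'."""
    return [
        a["HBA Name"]
        for a in adapters
        if a.get("UID", "").startswith("fc.") and a.get("Link State") != "link-up"
    ]
```

Feeding it the captured output gives one dict per adapter (keys such as "HBA Name", "Driver" and "Link State"), so an FC port reporting "link-down" stands out immediately.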

 VMware Definition Tip 1: NAA (Network Addressing Authority) or EUI (Extended Unique Identifier) is the preferred method of identifying LUNs, and the number that follows is generated by the storage device itself. Since the NAA or EUI is unique to the LUN, if the LUN is presented the same way across all ESXi hosts, the NAA or EUI identifier remains the same.
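Because the NAA/EUI value comes from the array and stays the same on every host, comparing the IDs each host reports is a quick way to spot a LUN that was not presented to the whole cluster. A minimal sketch, assuming you have already collected the device IDs per host (for example, from esxcli storage core device list); the host names and IDs are hypothetical. Path-based mpx. identifiers, which VMware generates for devices that report no unique ID, are skipped because they are not comparable across hosts.

```python
from typing import Dict, List, Set

# Only array-generated identifiers are compared across hosts.
PREFERRED_PREFIXES = ("naa.", "eui.")


def is_preferred_id(device_id: str) -> bool:
    """True for NAA or EUI identifiers."""
    return device_id.lower().startswith(PREFERRED_PREFIXES)


def luns_missing_on_some_hosts(ids_by_host: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """For each NAA/EUI ID seen on any host, list the hosts that do not see it."""
    preferred = {
        host: {i for i in ids if is_preferred_id(i)}
        for host, ids in ids_by_host.items()
    }
    all_ids = set().union(*preferred.values())
    missing: Dict[str, List[str]] = {}
    for lun in sorted(all_ids):
        hosts_without = sorted(h for h, ids in preferred.items() if lun not in ids)
        if hosts_without:
            missing[lun] = hosts_without
    return missing
```

An empty result means every host sees the same set of NAA/EUI devices.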

 The following command lists the available partitions detected by the ESXi host:
esxcli storage core device partition list


 VMware Definition Tip 2: You will see two partition types, fb and fc. fb is the system ID for VMFS, and fc is the VMkernel core dump partition (vmkcore).
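To turn those type codes into something readable, here is a small sketch. It assumes the usual column order of the partition list (Device, Partition, Start Sector, End Sector, Type, then Size) and that data rows follow a dashed separator line. It also skips partition 0, the entry that represents the whole device. The sample values are made up.

```python
from typing import Dict, List, Tuple

# Type codes from the tip above; anything else is reported as-is.
PARTITION_TYPE_LABELS = {"fb": "VMFS", "fc": "vmkcore (core dump)"}


def parse_partition_list(output: str) -> List[Tuple[str, int, str]]:
    """Return (device, partition number, type code) for each data row."""
    lines = output.splitlines()
    # Data rows start after the dashed separator line under the header.
    sep = next((i for i, line in enumerate(lines) if line.lstrip().startswith("-")), None)
    if sep is None:
        return []
    rows = []
    for line in lines[sep + 1:]:
        tokens = line.split()
        # Assumed columns: Device, Partition, Start Sector, End Sector, Type[, Size]
        if len(tokens) >= 5:
            rows.append((tokens[0], int(tokens[1]), tokens[4].lower()))
    return rows


def label_partitions(rows: List[Tuple[str, int, str]]) -> Dict[str, str]:
    """Map 'device:partition' to a readable type, skipping partition 0 (the whole device)."""
    return {
        f"{device}:{number}": PARTITION_TYPE_LABELS.get(code, f"other (type {code})")
        for device, number, code in rows
        if number != 0
    }
```

A LUN that should hold a datastore but shows no fb partition is a good hint that it was never formatted with VMFS, or that the partition table is damaged.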


 There are more useful storage commands, such as the legacy esxcfg-scsidevs CLI (-a shows HBA devices, -m lists mapped VMFS volumes, and -l lists all known logical devices).


 So, to conclude this first part on troubleshooting storage-related problems in a vSphere environment: we need to check the status of the HBAs, how they are performing, and the disk devices, LUNs and volumes connected through each of them. I hope it is helpful for you all ;)
