Sunday, November 11, 2018

What is the VMkernel Core Dump - Part I

Generally, a coredump is generated when the OS kernel sends certain signals to a process, typically when the process tries to access memory outside its address space. The system often crashes in this situation, and the generated errors give us information about hardware faults or application bugs.
Sometimes you may encounter an ESXi host that has crashed. When that happens, the host tries to write diagnostic information to a file called the "VMkernel Core Dump". This file describes the state of the host at the moment it halted, known as the purple screen state, and it is very important because in this situation you don't have access to your system data and logs. So it's necessary to gather the coredump files from all ESXi hosts into one or more repositories and analyze them there.
There are two mechanisms for collecting coredump files: DiskDump, which saves the dump to a specified, permitted disk, and NetDump, which sends the coredump information over the network. If ESXi can't save coredump information to its disk, there may be an issue with the storage devices or their connection to the host (failed array controller, RAID problem, broken physical path to storage, FC/SCSI connectivity problem, SAN switch failure and so on). So you should configure at least one alternative target for saving coredump information.
But before that, let's look at what netdump exactly is.
netdump is a protocol for sending coredump information from a failed ESXi host to the dump collector service. It has these characteristics:
1. Listens on UDP port 6500
2. Supports only IPv4
3. Clear-text network traffic
4. No authentication / authorization
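
Since netdump supports only IPv4, it can help to sanity-check the collector address before putting it into the host configuration. The following is a minimal sketch in plain POSIX shell; is_ipv4 is a hypothetical helper written for this post, not part of esxcli or any VMware tool.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if $1 looks like a dotted-quad IPv4 address.
is_ipv4() {
    # Reject empty input, non-digit/non-dot characters and empty fields.
    case "$1" in
        ""|*[!0-9.]*|*..*|.*|*.) return 1 ;;
    esac
    # Split the address on dots into positional parameters.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    [ "$#" -eq 4 ] || return 1
    # Each field must be at most 3 digits and not larger than 255.
    for octet in "$@"; do
        [ "${#octet}" -le 3 ] || return 1
        [ "$octet" -le 255 ] || return 1
    done
    return 0
}

if is_ipv4 "10.10.10.10"; then
    echo "10.10.10.10 looks like an IPv4 address"
fi
```

An IPv6 address or a host name fails this check, which matches the IPv4-only limitation listed above.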

To retrieve the current configuration of the coredump location:
# esxcli system coredump partition get
# esxcli system coredump network get  (you can also run "esxcli system coredump network check" to test the configured collector)

If the service is not enabled:
# esxcli system coredump network set --enable true
# esxcli system coredump partition set --enable true --smart

To set a new configuration for coredump:
# esxcli system coredump partition set --partition="mpx.vmhba2:C0:T0:L0"
# esxcli system coredump network set --interface-name vmk0 --server-ipv4 10.10.10.10 --server-port 6500
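
The network commands from this post can be collected into one sequence: set the collector parameters first, then enable the service, then read the configuration back. The sketch below only prints the commands instead of running them, so it is safe to try anywhere. netdump_commands is a hypothetical helper, and the defaults (vmk0, 10.10.10.10, 6500) are just the example values used above.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the esxcli commands for a netdump setup.
# Arguments: VMkernel interface, collector IPv4 address, collector UDP port.
netdump_commands() {
    vmk="${1:-vmk0}"
    server="${2:-10.10.10.10}"
    port="${3:-6500}"
    # 1. Point the host at the dump collector.
    echo "esxcli system coredump network set --interface-name $vmk --server-ipv4 $server --server-port $port"
    # 2. Enable network coredump.
    echo "esxcli system coredump network set --enable true"
    # 3. Read the resulting configuration back.
    echo "esxcli system coredump network get"
}

netdump_commands vmk0 10.10.10.10 6500
```

Review the printed lines, then paste them into the ESXi shell of the host you want to configure.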

To find out which storage devices we have on the host:
# esxcli storage core path list

For older versions of VMware ESXi:
# esxcfg-dumppart --list
# esxcfg-dumppart --get-active
# esxcfg-dumppart --smart-activate
  

Network Dump Collector is a built-in service of vCenter Server that provides a way to gather coredump information from hosts. But remember that NetDump does not work if aggregation protocols such as LACP or EtherChannel have been configured for the VMkernel traffic. VMware recommends segregating the VMkernel networking used for NetDump, by VLAN or by physical LAN separation, to prevent traffic interception. (In ESXi 5.0, VLAN tagging configured at the vSwitch level is ignored during network core dump transmission.)
Also, the name of a received coredump file has a format like this: yyyy-mm-dd-hh_mm-N.zdump .
The default maximum size of a zdump file is 2GB, and older dump files are deleted automatically. (The Dump Collector service has a non-configurable 60-second timeout; if no information is received in this period, the partial file is deleted.)
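
When you sort or report on collected dumps, the file name pattern mentioned above (yyyy-mm-dd-hh_mm-N.zdump) can be split into its parts. This is a small sketch in POSIX shell; parse_zdump_name is a hypothetical helper, and the sample file name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: split yyyy-mm-dd-hh_mm-N.zdump into date, time and N.
parse_zdump_name() {
    name="${1##*/}"    # drop any directory prefix
    case "$name" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]-[0-9][0-9]_[0-9][0-9]-[0-9]*.zdump) ;;
        *) echo "unexpected file name: $name" >&2; return 1 ;;
    esac
    base="${name%.zdump}"                            # strip the extension
    d=$(echo "$base" | cut -c1-10)                   # yyyy-mm-dd
    t=$(echo "$base" | cut -c12-16 | tr '_' ':')     # hh_mm -> hh:mm
    n="${base#????-??-??-??_??-}"                    # trailing N field
    echo "date=$d time=$t n=$n"
}

parse_zdump_name "2018-11-11-09_30-0.zdump"
# prints: date=2018-11-11 time=09:30 n=0
```

A name that does not match the pattern makes the function return a non-zero status, so it can also be used as a filter in a loop over the dump directory.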
Thanks to VMware for more information about it:  
After you finish your work in the CLI, remember to run /sbin/auto-backup.sh to save the configuration changes on your hosts :)
