Kdump is a Linux kernel crash-dumping mechanism. When the kernel hits a critical error, kdump saves an exact copy of system memory to a file named vmcore, which can then be analyzed to find the root cause of the crash. Kdump relies on kexec, a fast-boot mechanism that boots a new Linux kernel from the context of a running kernel without going through the firmware. When the system crashes, kexec boots a second kernel (the crash kernel), and the dump is captured from the context of this freshly booted kernel rather than from the context of the crashed kernel.
This post explains the steps to troubleshoot common kdump issues.
Verifying the kdump setup
1. Check if the kexec-tools package is installed in the system.
# rpm -qa | grep kexec
2. Check the command line of the currently running kernel for the ‘crashkernel’ parameter:
# cat /proc/cmdline
3. Check if the memory is reserved for the crashkernel when the kernel started:
# dmesg | grep Reserving
4. Check the path of the dump:
# grep -v ^# /etc/kdump.conf
5. Check the storage space available on the filesystem specified in the path parameter in the previous step:
# df -h
6. Check the status of the kdump service:
# service kdump status ### In CentOS/RHEL 6
# systemctl status kdump ### In CentOS/RHEL 7
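Some of the checks above can be scripted. The following is a minimal sketch, not part of kexec-tools: the helper names crashkernel_value and kdump_path are hypothetical, and the fallback to /var/crash assumes the documented kdump.conf default applies when no path line is set.

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical helpers.

# Print the crashkernel= value found in a kernel command line string.
# Returns non-zero if the parameter is absent.
crashkernel_value() {
    # $1: a kernel command line, e.g. the contents of /proc/cmdline
    v=$(printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^crashkernel=//p' | head -n 1)
    [ -n "$v" ] && printf '%s\n' "$v"
}

# Print the dump directory configured in a kdump.conf-style file.
# Commented-out lines are ignored; falls back to /var/crash (assumed default).
kdump_path() {
    # $1: config file, normally /etc/kdump.conf
    p=$(awk '$1 == "path" { print $2; exit }' "$1" 2>/dev/null)
    printf '%s\n' "${p:-/var/crash}"
}

# Typical use:
#   crashkernel_value "$(cat /proc/cmdline)" || echo "crashkernel parameter missing"
#   df -h "$(kdump_path /etc/kdump.conf)"
```

Keeping the parsing in small functions means the same checks can be run against a saved copy of the command line or configuration file from another machine.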
When the kdump service is not operational
1. Verify the kdump setup following the above section.
2. Start the kdump service:
# service kdump start ### In CentOS/RHEL 6
# systemctl start kdump ### In CentOS/RHEL 7
3. Check the terminal for any error output.
4. Look in /var/log/messages for more information about why the kdump service failed to start.
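The start-and-inspect steps above can be sketched as a pair of small shell helpers. The function names are hypothetical, and the sketch assumes that the presence of systemctl distinguishes a CentOS/RHEL 7 system from a CentOS/RHEL 6 one.

```shell
#!/bin/sh
# Sketch only: the function names are hypothetical helpers.

# Print the command that starts kdump with this system's init tooling.
kdump_start_cmd() {
    if command -v systemctl >/dev/null 2>&1; then
        echo "systemctl start kdump"   # CentOS/RHEL 7
    else
        echo "service kdump start"     # CentOS/RHEL 6
    fi
}

# Print the last 20 kdump-related lines from a log file.
kdump_log_tail() {
    # $1: log file, normally /var/log/messages
    grep -i 'kdump' "$1" | tail -n 20
}

# Typical use, as root:
#   $(kdump_start_cmd) || kdump_log_tail /var/log/messages
```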
When the kdump setup is fine and the kdump service is operational, but no vmcore is generated when a crash is triggered
1. Edit the file /etc/kdump.conf and add the line below to obtain a shell when the vmcore generation fails:
default shell
2. In the shell, check the available storage and whether the vmcore destination filesystem is mounted, then try to copy the vmcore manually to see whether the copy fails:
# cp /proc/vmcore [destination]
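The mount check in the shell can be done without extra tools. This is a minimal sketch, assuming the shell you are dropped into is POSIX-compatible and that /proc/mounts is available; is_mounted is a hypothetical helper name and /mnt/dump is only a placeholder for your real destination.

```shell
#!/bin/sh
# Sketch only: is_mounted is a hypothetical helper, written in plain POSIX sh
# so that it does not depend on awk or grep being present.

# Succeed if the given directory appears as a mount point in the mounts table.
is_mounted() {
    # $1: mount point to look for
    # $2: mounts table to read (defaults to /proc/mounts)
    while read -r _dev mnt _rest; do
        [ "$mnt" = "$1" ] && return 0
    done < "${2:-/proc/mounts}"
    return 1
}

# Typical use (/mnt/dump is a placeholder destination):
#   is_mounted /mnt/dump && df -h /mnt/dump && cp /proc/vmcore /mnt/dump/
```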
When a shell is not obtained and the crashkernel is stuck while booting up
1. Check the console for the startup messages of the crashkernel and note where it gets stuck.
2. If you see page allocation error messages, the memory reserved for the crashkernel is most likely insufficient, and you need to increase the value of the ‘crashkernel’ kernel parameter.
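If the console output has been saved to a file (for example from a serial console), the search for allocation errors can be scripted. This is a sketch under assumptions: has_alloc_failure is a hypothetical helper, the pattern assumes the kernel reports these errors with the text "page allocation failure", and the 256M value in the comment is only an example, not a recommendation.

```shell
#!/bin/sh
# Sketch only: has_alloc_failure is a hypothetical helper name.

# Succeed if a saved console log contains page allocation failures.
has_alloc_failure() {
    # $1: text file holding the captured console output
    grep -q 'page allocation failure' "$1"
}

# On CentOS/RHEL 7, one way to raise the reservation and then reboot
# (256M is only an example value; size it for your system):
#   grubby --update-kernel=ALL --args="crashkernel=256M"
```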