CentOS / RHEL 7 : How to configure kdump

kdump is an advanced crash dumping mechanism. When enabled, the running kernel reserves a small amount of memory for a second kernel. If the system crashes, kexec boots this second kernel, whose only purpose is to capture the core dump image (vmcore). Since analyzing the core dump helps significantly in determining the exact cause of a system failure, it is strongly recommended to have this feature enabled.

1. Install the kexec-tools package if not already installed
The kdump service requires the kexec-tools package. If it is not already installed, install it with yum:

# yum install kexec-tools

2. Configuring Memory Usage in GRUB2
To set the amount of memory reserved for the kdump kernel, edit /etc/default/grub and add the crashkernel=[size] parameter to the list of kernel options in GRUB_CMDLINE_LINUX. The example below reserves 128 MB:

GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="rd.lvm.lv=centos/swap vconsole.font=latarcyrheb-sun16 rd.lvm.lv=centos/root crashkernel=128M  vconsole.keymap=us rhgb quiet"
GRUB_DISABLE_RECOVERY="true"
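
Before going further, it can help to confirm that the option was actually saved. The sketch below is one way to do that. The function name crashkernel_value is made up for this article; it prints whatever follows crashkernel= in the file it is given and prints nothing if the option is missing.

```shell
# Print the value of the crashkernel= option found in a file.
# The function name is illustrative; /proc/cmdline is used when no file is given.
crashkernel_value() {
    file="${1:-/proc/cmdline}"
    # Turn spaces and double quotes into newlines so each option sits on its own line,
    # then keep only the text after "crashkernel=" (first match wins).
    tr ' "' '\n\n' < "$file" | sed -n 's/^crashkernel=//p' | head -n 1
}
```

Called as crashkernel_value /etc/default/grub against the example file above, it would print 128M. With no argument it reads /proc/cmdline, which is handy after the reboot.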

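Run the command below to regenerate the GRUB configuration (on UEFI systems the output file is typically /boot/efi/EFI/centos/grub.cfg or /boot/efi/EFI/redhat/grub.cfg instead):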

# grub2-mkconfig -o /boot/grub2/grub.cfg

Reboot the system for the kernel parameter to take effect.

# shutdown -r now

3. Configuring Dump Location
kdump is configured through the /etc/kdump.conf file. By default, the vmcore file is stored in the /var/crash/ directory of the local file system. To save the core dump to a different local directory, edit the path directive and replace its value with the desired directory path.
For example:

path /usr/local/cores

Optionally, you can also save the core dump directly to a raw partition.
For example:

raw /dev/sdb4

To store the dump to a remote machine using the NFS protocol, remove the hash sign (“#”) from the beginning of the #nfs my.server.com:/export/tmp line, and replace the value with a valid hostname and directory path.
For example:

nfs my.server.com:/export/tmp

4. Configuring Core Collector
To reduce the size of the vmcore dump file, kdump allows you to specify an external application to compress the data, and optionally leave out all irrelevant information. Currently, the only fully supported core collector is makedumpfile.
To enable the core collector, edit /etc/kdump.conf, remove the hash sign (“#”) from the beginning of the #core_collector makedumpfile -c --message-level 1 -d 31 line, and adjust the command line options as needed. The -c option compresses the dump, and -d sets the dump level, which controls which types of pages are left out.
For example:

core_collector makedumpfile -c
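
The dump level passed to -d is a bitmask, which is why 31 means "leave out everything that can be left out". As a rough illustration, the helper below decodes a level into the page types it excludes. The function name and the short labels are invented here; the bit values (1 zero pages, 2 cache pages, 4 private cache pages, 8 user data, 16 free pages) follow the table in the makedumpfile man page.

```shell
# Decode a makedumpfile dump level (0-31) into the page types it excludes.
# Function name and labels are illustrative, not part of makedumpfile itself.
dump_level_excludes() {
    level="$1"
    out=""
    for pair in "1:zero" "2:cache" "4:cache-private" "8:user" "16:free"; do
        bit="${pair%%:*}"     # numeric bit value before the colon
        name="${pair#*:}"     # label after the colon
        if [ $(( level & bit )) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
    done
    echo "$out"
}
```

For example, dump_level_excludes 31 lists all five page types, while dump_level_excludes 17 lists only zero and free pages.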

5. Changing Default Action
You can also specify the default action to perform when the core dump cannot be saved to the configured location. If no default action is specified, “reboot” is assumed.
For example:

default halt
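
After editing /etc/kdump.conf in steps 3 to 5, it is easy to lose track of which lines are still commented out. One possible way to review the result is to print only the active directives, as in this small sketch (the function name is hypothetical):

```shell
# List the directives in a kdump configuration file that are actually in effect,
# skipping comment lines and blank lines. Defaults to /etc/kdump.conf.
kdump_active_directives() {
    file="${1:-/etc/kdump.conf}"
    grep -Ev '^[[:space:]]*(#|$)' "$file"
}
```

The output should contain exactly the dump target, core_collector, and default lines you intended to enable. If kdump is already running, restart the service afterwards so the changes are picked up.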

6. Start kdump daemon
Confirm that the running kernel's command line includes the crashkernel= parameter, so that memory is reserved for the crash kernel:

# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.8.13-98.2.1.el7uek.x86_64 root=/dev/mapper/rhel-root ro rd.lvm.lv=rhel/root crashkernel=128M rd.lvm.lv=rhel/swap vconsole.font=latarcyrheb-sun16 vconsole.keymap=us rhgb quiet nomodeset
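
The command line only shows what was requested. To see how much memory the kernel actually set aside, the size in bytes can be read from /sys/kernel/kexec_crash_size. The helper below is a minimal sketch that converts that value to MiB; its name is an assumption made for this example.

```shell
# Print the crash kernel reservation in whole MiB.
# Reads a byte count from the given file (default: /sys/kernel/kexec_crash_size).
crash_reserved_mib() {
    file="${1:-/sys/kernel/kexec_crash_size}"
    bytes=$(cat "$file")
    echo $(( bytes / 1024 / 1024 ))
}
```

A result of 0 would suggest that no memory was reserved, in which case the kdump service will not be able to load the crash kernel.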

Enable the kdump service so that it starts automatically at boot:

# systemctl enable kdump.service

To start the service in the current session, use the following command:

# systemctl start kdump.service

7. Testing kdump (manually trigger kdump)
To test the configuration, we can reboot the system with kdump enabled, and make sure that the service is running.

For example:

# systemctl is-active kdump
active
# service kdump status
Redirecting to /bin/systemctl status  kdump.service
kdump.service - Crash recovery kernel arming
Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled)
Active: active (exited) since Mon 2015-08-31 05:12:57 GMT; 1min 6s ago
Process: 19104 ExecStop=/usr/bin/kdumpctl stop (code=exited, status=0/SUCCESS)
Process: 19116 ExecStart=/usr/bin/kdumpctl start (code=exited, status=0/SUCCESS)
Main PID: 19116 (code=exited, status=0/SUCCESS)
Aug 31 05:12:57 ol7 kdumpctl[19116]: kexec: loaded kdump kernel
Aug 31 05:12:57 ol7 kdumpctl[19116]: Starting kdump: [OK]
Aug 31 05:12:57 ol7 systemd[1]: Started Crash recovery kernel arming.

Then type the following commands at a shell prompt:

# echo 1 > /proc/sys/kernel/sysrq
# echo c > /proc/sysrq-trigger

This forces the Linux kernel to crash, and the address-YYYY-MM-DD-HH:MM:SS/vmcore file is copied to the location you selected in the configuration (that is, /var/crash/ by default).
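
Once the system is back up, you can check that a dump was really written. The following sketch looks for vmcore files under the dump directory and prints the last one in sorted order. The function name is hypothetical, and the sort only reflects age when all subdirectories share the same address prefix, since it relies on the timestamp embedded in the directory name.

```shell
# Print the path of the most recent vmcore under a dump directory.
# Defaults to /var/crash; assumes address-YYYY-MM-DD-HH:MM:SS subdirectories.
latest_vmcore() {
    dir="${1:-/var/crash}"
    # Timestamped directory names sort chronologically, so the newest is last
    find "$dir" -type f -name vmcore | sort | tail -n 1
}
```

If the function prints nothing, no vmcore was found under that directory, and the kdump configuration and service status are worth rechecking.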
