The Problem
Occasionally, INFO messages such as the following are logged to /var/log/messages:
Apr 19 03:33:22 host kernel: INFO: task kjournald:2046 blocked for more than 120 seconds.
Apr 19 03:33:22 host kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 19 03:33:22 host kernel: kjournald D ffff810001004420 0 2046 49 2476 2044 (L-TLB)
Apr 19 03:33:22 host kernel: ffff81013ce9fdd0 0000000000000046 0000000000000100 0000000000000000
Apr 19 03:33:22 host kernel: 0000000000000000 000000000000000a ffff81013c7d0820 ffffffff80309b60
Apr 19 03:33:22 host kernel: 001b51e0f255644a 0000000000000a88 ffff81013c7d0a08 0000000000000000
Apr 19 03:33:22 host kernel: Call Trace:
Apr 19 03:33:22 host kernel: [] :jbd:journal_commit_transaction+0x16d/0x1066
Apr 19 03:33:22 host kernel: [] autoremove_wake_function+0x0/0x2e
Apr 19 03:33:22 host kernel: [] try_to_del_timer_sync+0x7f/0x88
Apr 19 03:33:22 host kernel: [] :jbd:kjournald+0xc1/0x213
Apr 19 03:33:22 host kernel: [] autoremove_wake_function+0x0/0x2e
Apr 19 03:33:22 host kernel: [] keventd_create_kthread+0x0/0xc4
Apr 19 03:33:23 host kernel: [] :jbd:kjournald+0x0/0x213
Apr 19 03:33:24 host kernel: [] keventd_create_kthread+0x0/0xc4
Apr 19 03:33:24 host kernel: [] kthread+0xfe/0x132
Apr 19 03:33:24 host kernel: [] child_rip+0xa/0x11
Apr 19 03:33:24 host kernel: [] keventd_create_kthread+0x0/0xc4
Apr 19 03:33:24 host kernel: [] kthread+0x0/0x132
Apr 19 03:33:24 host kernel: [] child_rip+0x0/0x11
What are these messages and what is their impact?
The Solution
This message means that the process has been in the “D” (uninterruptible sleep) state for more than 120 seconds. In this example, kjournald waited more than 120 seconds for journal_commit_transaction() to finish, most likely because of heavy I/O from other processes, since a journal commit is an atomic operation.
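To see which tasks are in the “D” state at a given moment, the output of ps can be filtered on its state column. The following is a minimal sketch that assumes the procps version of ps; the function name filter_d_state is only illustrative.

```shell
# Keep the header row plus every process whose state column starts with D
# (uninterruptible sleep). Reads ps-style output on standard input.
filter_d_state() {
    awk 'NR == 1 || $1 ~ /^D/'
}

# Typical use on a live system (assumes procps ps; wchan shows the kernel
# function the task is sleeping in, when the kernel exposes it):
#   ps -eo state,pid,wchan:32,comm | filter_d_state
```

A task that appears in this list only briefly is normal; one that stays there across repeated runs is a candidate for the hung-task message.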
Generally speaking, these messages can be ignored if they do not appear in /var/log/messages frequently. They are purely informational and indicate that a process was stuck for some reason, such as heavy I/O or a storage or network disconnection or delay.
So the first thing to do after finding this message is to check whether it is logged frequently, and then whether there was a network or storage problem at that time.
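One way to judge how often the message is logged is to count its occurrences per day. The sketch below assumes the traditional syslog timestamp layout seen in the sample (month and day as the first two fields); the function name count_hung_tasks is hypothetical.

```shell
# Count "blocked for more than" messages per day in a syslog-format file.
# Usage: count_hung_tasks [logfile]   (defaults to /var/log/messages)
count_hung_tasks() {
    grep 'blocked for more than' "${1:-/var/log/messages}" |
        awk '{ print $1, $2 }' |   # keep only month and day
        uniq -c                    # log is chronological, so days are adjacent
}
```

Each output line is a count followed by the date. A handful of hits clustered around one known busy period points to transient load, while hits spread over many days suggest a recurring I/O or storage problem worth investigating.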
Conclusion
These messages typically mean that the system is experiencing disk or memory congestion and processes are being starved of available resources. They serve as a warning that something may not be operating optimally. They do not necessarily indicate a serious problem, and any blocked processes should eventually proceed when the system recovers. If possible, capture the output of the following commands while the issue is occurring:
# top -n 5 -b > /tmp/top.out
# vmstat 1 50 > /tmp/vm.out
# iostat -x 2 10 > /tmp/io.out
# ps aux > /tmp/ps.out
# ps auxH > /tmp/psh.out
# sar -A > /tmp/sar.out
# free > /tmp/free.out
# lsof > /tmp/lsof.out
These outputs can be useful when diagnosing a system hang or the cause of the “task blocked for more than 120 seconds” INFO message.
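Because the issue may be short-lived, it can help to have the commands listed above wrapped in a small script that is ready to run. The following is only a sketch: the function names capture and collect_diag and the timestamped directory under /tmp are assumptions, and the commands and options are the ones given above.

```shell
# capture OUTDIR NAME CMD [ARGS...]
# Runs CMD and stores its stdout and stderr in OUTDIR/NAME.out. A missing
# command is noted in the file instead of aborting, since the sysstat tools
# (iostat, sar) are not always installed.
capture() {
    cap_dir=$1; cap_name=$2; shift 2
    if command -v "$1" >/dev/null 2>&1; then
        "$@" > "$cap_dir/$cap_name.out" 2>&1
    else
        echo "skipped: $1 not found" > "$cap_dir/$cap_name.out"
    fi
}

# collect_diag [OUTDIR]
# Runs the full set of captures one after another and prints the directory
# used. The vmstat and iostat steps sample over time, so this takes about
# a minute to finish.
collect_diag() {
    diag_dir="${1:-/tmp/hung_task_$(date +%Y%m%d_%H%M%S)}"
    mkdir -p "$diag_dir" || return 1
    capture "$diag_dir" top  top -n 5 -b
    capture "$diag_dir" vm   vmstat 1 50
    capture "$diag_dir" io   iostat -x 2 10
    capture "$diag_dir" ps   ps aux
    capture "$diag_dir" psh  ps auxH
    capture "$diag_dir" sar  sar -A
    capture "$diag_dir" free free
    capture "$diag_dir" lsof lsof
    echo "$diag_dir"
}
```

Running collect_diag as root while the messages are appearing leaves one .out file per command in a single directory, which keeps captures from separate incidents from overwriting each other.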