Using vmstat to troubleshoot performance issues in Linux

The vmstat (virtual memory statistics) command lets you monitor your system’s memory usage. It shows how much virtual memory is in use, how much memory is free, and how much paging activity is taking place. You can observe page-ins and page-outs as they happen, which is extremely useful for detecting shortages of physical memory that can adversely affect system performance.

Running vmstat without any arguments

Before getting started, it is important to note that the first line of output from vmstat (and the only one given if it is run with no arguments) reports averages since the system was booted. It is usually not very useful for diagnosing a current performance issue, especially if the system has been up for a long time. It may still contain helpful information about events that happened in the past but are no longer occurring.

# vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  0 387560  62140     44 348320    1    2    94    20   90   15  1  1 98  0  0

How to read vmstat output

The vmstat output contains more than just memory statistics. As with iostat and mpstat, vmstat accepts interval and count arguments. The following example runs 3 reports 5 seconds apart:

# vmstat 5 3
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  0 384120  68604   8088 327332    1    2    90    20   90   14  1  1 98  0  0
 1  0 384120  68604   8088 327364    0    0     0     0   91  179  1  0 99  0  0
 0  0 384120  68232   8088 327364    0    0     0     0  107  190  1  1 98  0  0
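
Because the first data row is the since-boot average, it can be convenient to drop it before looking at the interval samples. The following is a minimal sketch, assuming a POSIX shell and awk; the function name skip_boot_sample is hypothetical and not part of vmstat:

```shell
# skip_boot_sample: read vmstat output on stdin and print it without the
# first data row (the since-boot averages). Header lines are kept.
skip_boot_sample() {
  awk '
    # Data rows start with a number; drop only the first one we see.
    $1 ~ /^[0-9]+$/ && !seen { seen = 1; next }
    { print }
  '
}
```

For example, vmstat 5 4 | skip_boot_sample would leave the headers and the three interval samples.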

Output is broken up into six sections:
1. procs
2. memory
3. swap
4. io
5. system
6. cpu

procs

The first two columns give information about processes:

r Number of runnable processes (running or waiting for CPU time)
b Number of processes in uninterruptible sleep, typically blocked waiting for I/O to complete

memory

The next four columns give information about memory (values are in KiB by default):

swpd Amount of swap space (virtual memory) in use
free Amount of idle memory
buff Amount of memory used as buffers
cache Amount of memory used as cache

swap

The next two columns give information about swap:

si Amount of memory swapped in from disk (per second)
so Amount of memory swapped out to disk (per second)

Sustained nonzero si and so values indicate that physical memory is running short and that the kernel is actively swapping memory to and from disk.
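
As a rough way to spot swap activity over a run, the si and so columns can be checked with awk. This is a sketch under stated assumptions: the default vmstat column layout (si is field 7, so is field 8), and a hypothetical function name swap_active_samples. The first data row is ignored because it holds since-boot averages.

```shell
# swap_active_samples: read vmstat output on stdin and print how many
# interval samples show swap-in (si, field 7) or swap-out (so, field 8).
swap_active_samples() {
  awk '
    $1 ~ /^[0-9]+$/ {                             # data rows only, skip headers
      if (!first_seen) { first_seen = 1; next }   # ignore since-boot averages
      if ($7 > 0 || $8 > 0) n++
    }
    END { print n + 0 }
  '
}
```

For example, vmstat 5 13 | swap_active_samples would count how many of the twelve interval samples showed swapping.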

io

The next two columns give information about I/O (input-output):

bi Number of blocks per second received from a block device
bo Number of blocks per second sent to a block device

system

The next two columns give the following system information:

in Number of interrupts per second, including the clock
cs Number of context switches per second

cpu

The last five columns give the percentages of total CPU time:

us Percentage of CPU time spent running user (non-kernel) code
sy Percentage of CPU time spent running kernel (system) code
id Percentage of CPU time spent idle
wa Percentage of CPU time spent waiting for I/O
st Percentage of CPU time stolen from a virtual machine by the hypervisor

Command-line options

Additional information can be included by passing options to the vmstat command. Some of the more useful options are listed below:

-a Display active and inactive memory.
-f Display the number of forks since boot.
-t Add a time stamp to the output.
-d Report the disk statistics.

CPU Bottlenecks

There are two important areas of the vmstat output that pertain to CPU performance. The first is the r column, which is the first column in the output. Its value is the number of threads in the run queue when the sample was taken, that is, threads that were running or waiting for a CPU to become available. There are several schools of thought on the maximum number that’s appropriate here, but most people agree that a value consistently above 2 to 5 times the number of CPUs on the system shows a bottleneck (this estimate needs to be adjusted for multi-core CPUs).
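
The run-queue rule of thumb can be sketched as a small filter. The function name runq_over_limit and its two parameters are hypothetical; the CPU count is passed in explicitly (it could come from nproc), and the multiplier is whichever value between 2 and 5 you settle on.

```shell
# runq_over_limit NCPU FACTOR: read vmstat output on stdin and print how
# many interval samples have an r value (field 1) above NCPU * FACTOR.
runq_over_limit() {
  ncpu=$1      # number of CPUs, e.g. the output of: nproc
  factor=$2    # illustrative multiplier, e.g. 2
  awk -v limit="$((ncpu * factor))" '
    $1 ~ /^[0-9]+$/ {                             # data rows only
      if (!first_seen) { first_seen = 1; next }   # ignore since-boot averages
      if ($1 > limit) n++
    }
    END { print n + 0 }
  '
}
```

For example, vmstat 5 13 | runq_over_limit "$(nproc)" 2 would report how many samples had more than twice as many runnable threads as CPUs.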

The second place to look for CPU-related data is in the right-hand columns of the output: us (user), sy (system), id (idle), wa (I/O wait), and st (stolen) time. These break down the use of CPU time in percentages and should add up to roughly 100%. Ideally, a CPU will spend most of its time in the us and id categories. The sy category refers to time the CPU spends doing driver/kernel-level work, which is time taken away from user applications. If the CPUs are spending most of their time in this category, it could indicate excessive context switching due to either CPU or memory bottlenecks, issues with kernel-level locking, or other problems. A busy system will show a constant idle percentage near zero, but a busy system is not necessarily an overloaded one.
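
To see where CPU time goes over a whole run rather than row by row, the CPU columns can be averaged. This is only a sketch: it assumes the default layout in which us, sy, id, wa and st are fields 13 to 17, and the function name cpu_averages is hypothetical.

```shell
# cpu_averages: read vmstat output on stdin and print the mean of the
# us, sy, id, wa and st columns over the interval samples.
cpu_averages() {
  awk '
    $1 ~ /^[0-9]+$/ {                             # data rows only
      if (!first_seen) { first_seen = 1; next }   # ignore since-boot averages
      us += $13; sy += $14; id += $15; wa += $16; st += $17; n++
    }
    END {
      if (n) printf "us=%.1f sy=%.1f id=%.1f wa=%.1f st=%.1f\n",
                    us / n, sy / n, id / n, wa / n, st / n
    }
  '
}
```

A high sy average relative to us is the pattern described above and is a cue to look at the cs (context switches) column as well.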

Disk / IO Performance

The vmstat utility cannot tell us which disks have a bottleneck, but it can tell us whether there is an I/O problem overall. The important column in the output is the b (blocked) column, which shows the number of threads that were blocked waiting for I/O to complete. The b column should be 0 the majority of the time. If it is constantly non-zero, investigate further with iostat.
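
One way to judge whether b is non-zero "constantly" is to count how many samples in a run had blocked threads. The sketch below assumes b is the second field, as in the default layout; the function name blocked_samples is hypothetical.

```shell
# blocked_samples: read vmstat output on stdin and print
# "<samples with b > 0>/<total interval samples>".
blocked_samples() {
  awk '
    $1 ~ /^[0-9]+$/ {                             # data rows only
      if (!first_seen) { first_seen = 1; next }   # ignore since-boot averages
      total++
      if ($2 > 0) blocked++
    }
    END { printf "%d/%d\n", blocked, total }
  '
}
```

A result where most samples had blocked threads suggests moving on to iostat to find the device involved.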

Memory Bottlenecks

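Analyzing memory-related issues should start with checking the amount of free memory in the vmstat output, which is in the 4th column. A low free value on its own is not necessarily a problem, because Linux uses otherwise idle memory for buffers and cache, so read it together with the buff and cache columns and with the si and so columns. If free memory is low and the system is swapping, investigate which processes are consuming the most memory.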
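
The most recent memory figures can be pulled out of a run with a short filter. This is a sketch assuming the default layout (free, buff and cache in fields 4 to 6, in KiB) and not the -a layout; the function name mem_snapshot is hypothetical. Note that not all of buff and cache can necessarily be reclaimed, so the second figure is only an upper bound on what the kernel could free.

```shell
# mem_snapshot: read vmstat output on stdin and print the free memory and
# the combined buffer + cache memory (both in KiB) from the last sample.
mem_snapshot() {
  awk '
    $1 ~ /^[0-9]+$/ { free = $4; buffcache = $5 + $6; seen = 1 }  # keep latest row
    END { if (seen) printf "free=%d buffcache=%d\n", free, buffcache }
  '
}
```

For example, vmstat 5 3 | mem_snapshot prints the values from the third report.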

Conclusion

The vmstat command can be a useful tool for triaging performance problems. It can tell you which subsystems to examine more closely to further diagnose the problem.
