An I/O performance bottleneck can be caused by a disk, or even by an HBA or its driver. The iostat (input/output statistics) command helps us get started with analyzing a disk I/O bottleneck.
A standard iostat output would look like:

# iostat -xn 1 5
                    extended device statistics
    r/s    w/s    kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  293.1    0.0 37510.5    0.0  0.0 31.7    0.0  108.3   1 100 c0t0d0
  294.0    0.0 37632.9    0.0  0.0 31.9    0.0  108.6   0 100 c0t0d0
  293.0    0.0 37504.4    0.0  0.0 31.9    0.0 1032.0   0 100 c0t0d0
  294.0    0.0 37631.3    0.0  0.0 31.8    0.0  108.1   1 100 c0t0d0
  294.0    0.0 37628.1    0.0  0.0 31.9    0.0  108.6   1 100 c0t0d0
The various options that can be used with iostat are:

-x : Extended disk statistics. Prints one line per device, with a breakdown that includes r/s, w/s, kr/s, kw/s, wait, actv, svc_t, %w and %b.
-t : Print terminal I/O statistics.
-n : Use logical disk names rather than instance names.
-c : Print the standard system time percentages: us, sy, wt, id.
-z : Don't print lines having all zeros.
The meaning of each column in the iostat output is:

r/s    : reads per second
w/s    : writes per second
kr/s   : kilobytes read per second
kw/s   : kilobytes written per second
wait   : average number of transactions waiting for service (queue length)
actv   : average number of transactions actively being serviced (removed from the queue but not yet completed)
svc_t  : average response time of transactions, in milliseconds
%w     : percent of time there are transactions waiting for service (queue non-empty)
%b     : percent of time the disk is busy (transactions in progress)
wsvc_t : average service time in the wait queue, in milliseconds
asvc_t : average service time of active transactions, in milliseconds
wt     : the I/O wait time is no longer calculated as a percentage of CPU time; this statistic always returns zero
The first line of the iostat output is a summary of activity since boot. It gives you a rough idea of the server's average I/O, which makes a useful baseline to compare against when a performance bottleneck occurs. Now look at the asvc_t column: a constantly high value there indicates a problem. Generally, anything above 30 to 40 ms is considered high, but you can safely ignore an occasional one-off spike (say, a single reading of 200 ms); it is the sustained values that matter. Here the interval was 1 second with a count of 5.
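The asvc_t check above is easy to script for quick triage. Below is a minimal sketch that filters iostat-style output through awk and flags any device whose active service time exceeds a 30 ms threshold; the here-document and its numbers are sample data embedded only so the filter runs standalone.

```shell
# Flag any device whose asvc_t (field 8 of 'iostat -xn' data lines) exceeds
# 30 ms. The numeric guard on the first field skips the header lines that
# real iostat output contains.
awk '$1 ~ /^[0-9]/ && $8 > 30 {
    printf "%s: asvc_t=%s ms (high)\n", $11, $8
}' <<'EOF'
293.1 0.0 37510.5 0.0 0.0 31.7 0.0 108.3 1 100 c0t0d0
  1.8 0.5    34.7 2.6 0.0  0.0 0.0  19.3 0   2 c1t0d0
EOF
```

On a live system you would replace the here-document by piping `iostat -xn 1 5` into the same awk filter; here it prints only the c0t0d0 line, since c1t0d0's 19.3 ms is under the threshold.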
Check for Disk Failures
Disk failure can be a major cause, in fact the only cause, of many disk I/O bottleneck issues. To check for disk failures:
# iostat -xne
                            extended device statistics       ---- errors ---
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 fd0
    1.8    0.5   34.7    2.6  0.0  0.0    0.0   19.3   0   2   0   0   0   0 c1t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c0t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 geeklab01:vold(pid555)
Check the columns s/w (soft errors), h/w (hard errors), trn (transport errors) and tot (total errors). The meaning of each error type is:

Soft error      : a disk sector fails the CRC check and needs to be re-read
Hard error      : the re-read fails several times for the CRC check
Transport error : errors reported by the I/O bus
Total errors    : soft errors + hard errors + transport errors
A large number of any of these errors, especially a growing count of hard errors, may indicate that the disk has already failed or is on its way to failing. Another command to check the errors on a disk is:
# iostat -E
sd0       Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: VMware,  Product: VMware Virtual S  Revision: 1.0 Serial No:
Size: 10.74GB [10737418240 bytes]
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 9 Predictive Failure Analysis: 0
sd1       Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: VMware,  Product: VMware Virtual S  Revision: 1.0 Serial No:
Size: 24.70GB [24696061952 bytes]
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 6 Predictive Failure Analysis: 0
System activity reporter (sar) to check disk I/O
sar (system activity reporter) is another command to check disk I/O. Before using sar, enable the sar service with svcadm if it is not already enabled:
# svcadm enable sar
# svcs sar
STATE          STIME    FMRI
online          4:15:08 svc:/system/sar:default
To check the disk I/O statistics using sar, run it with an interval of 2 seconds and a count of 10:
# sar -d 2 10

SunOS geeklab 5.11 11.1 i86pc    12/13/2013

10:11:46   device    %busy   avque   r+w/s  blks/s  avwait  avserv
10:11:48   ata1          0     0.0       0       0     0.0     0.0
           iscsi0        0     0.0       0       0     0.0     0.0
           mpt0          0     0.0       0       0     0.0     0.0
           scsi_vhc      0     0.0       0       0     0.0     0.0
           sd0           0     0.0       0       0     0.0     0.0
           sd0,a         0     0.0       0       0     0.0     0.0
           sd0,b         0     0.0       0       0     0.0     0.0
           sd0,h         0     0.0       0       0     0.0     0.0
           sd0,i         0     0.0       0       0     0.0     0.0
           sd0,q         0     0.0       0       0     0.0     0.0
           sd0,r         0     0.0       0       0     0.0     0.0
The sar -d command reports almost the same data as iostat, except that it shows reads plus writes per second (r+w/s) and the number of 512-byte blocks per second (blks/s). The other important columns are the average wait queue length (avque), average wait queue time (avwait), average service time (avserv) and percent busy (%busy).
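Since avwait is the time a transaction spends queued and avserv the time it spends being serviced, their sum approximates the total per-transaction response time, which is the number you would compare against the 30-40 ms rule of thumb above. A minimal sketch, with made-up sample values in sar -d's two line shapes (timestamped first line, continuation lines without a timestamp):

```shell
# Estimate response time per device as avwait + avserv from 'sar -d' style
# lines. Timestamped lines have 8 fields (time first); continuation lines
# have 7. The numeric guards skip the column-header line.
awk 'NF == 8 && $3 ~ /^[0-9]/ { printf "%s: response ~ %.1f ms\n", $2, $7 + $8 }
     NF == 7 && $2 ~ /^[0-9]/ { printf "%s: response ~ %.1f ms\n", $1, $6 + $7 }' <<'EOF'
10:11:48   sd0      85     0.4     120     960     2.1     6.5
           sd1       5     0.0      10      80     0.0     1.2
EOF
```

With the sample data this reports roughly 8.6 ms for sd0 and 1.2 ms for sd1, both well within the healthy range.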
You can also use the top command to get the percentage of I/O wait time. Solaris 11 has the top package installed by default; on Solaris 10 you will have to install a third-party top package.
# top
last pid:  7448;  load avg:  0.01,  0.13,  0.11;  up 0+13:54:41
60 processes: 59 sleeping, 1 on cpu
CPU states: 99.5% idle, 0.0% user, 0.5% kernel, 0.0% iowait, 0.0% swap
Kernel: 187 ctxsw, 1 trap, 516 intr, 421 syscall, 1 flt
Memory: 2048M phys mem, 205M free mem, 1024M total swap, 1024M free swap
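For monitoring scripts, the iowait figure can be pulled out of top's "CPU states" line. A small sketch, assuming the line has the same shape as in the sample above (the here-document stands in for captured top output):

```shell
# Extract the percentage immediately preceding the "iowait," token on the
# "CPU states" line of top output.
awk '/^CPU states:/ {
    for (i = 1; i <= NF; i++)
        if ($i == "iowait,") print $(i-1)
}' <<'EOF'
CPU states: 99.5% idle, 0.0% user, 0.5% kernel, 0.0% iowait, 0.0% swap
EOF
```

For the sample line this prints 0.0%; a persistently high value here is the same signal as a sustained high asvc_t in iostat.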