Oracle provides a collection of scripts that gather and store metrics for CPU, memory, disk, and network usage. The OSWatcher tool suite automates the gathering of statistics using tools such as top, vmstat, iostat, mpstat, netstat, and traceroute.
The general file format for the oswmpstat data is: [node_name]_mpstat_YY.MM.DD:HH24.dat
These files contain output from the ‘mpstat’ command, which OSWatcher runs and archives at specified intervals. The files exist only if ‘mpstat’ is installed on the OS and the oswbb user has privileges to run the utility. Keep in mind that what mpstat reports may differ depending upon your platform. Refer to your OS mpstat man pages for the most accurate, up-to-date descriptions of these fields.
The mpstat command collects and displays performance statistics for all logical CPUs in the system.
The mpstat utility is fairly standard across UNIX platforms, although each platform has a slightly different version. Consult your operating system man pages for specifics. The sample provided below is for Solaris.
oswbb runs the mpstat utility at the specified interval and stores the data in the oswmpstat subdirectory under the archive directory. The data is stored in hourly archive files. Each entry in the file contains a timestamp, prefixed by ***, embedded in the mpstat output. Note that there are 2 entries for each timestamp. Always ignore the first entry, as it is always invalid: the first mpstat report summarizes activity since boot rather than activity during the sampling interval.
Sample mpstat file produced by oswbb:
    ***Fri Jan 28 12:50:36 EST 2005
    CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
    0 0 0 0 483 383 118 1 0 0 0 64 0 0 0 100
    0 1268 0 0 486 382 414 42 0 0 0 2902 8 24 0 68
    0 4 0 0 479 379 144 3 0 0 0 96 0 0 0 100
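The file layout described above can be read with a short script. The following is a minimal Python sketch, not part of OSWatcher: the function name `parse_oswmpstat` is hypothetical, it assumes the Solaris column layout shown in the sample, and it interprets "ignore the first entry" as dropping the first row reported for each CPU after every `***` timestamp. Adjust that rule if your platform's mpstat output is laid out differently.

```python
def parse_oswmpstat(text):
    """Return a list of (timestamp, row) pairs from oswmpstat archive text.

    Assumptions (not taken from OSWatcher itself): the Solaris layout with a
    header line starting with "CPU", and the first row seen for each CPU
    after a *** timestamp is the entry to ignore.
    """
    samples = []
    timestamp = None
    header = None
    seen_cpus = set()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("***"):
            # A new timestamp starts a new collection block.
            timestamp = line[3:].strip()
            header = None
            seen_cpus = set()
            continue
        fields = line.split()
        if fields[0] == "CPU":
            # Column names come from the file, so extra columns are tolerated.
            header = fields
            continue
        if header is None or len(fields) != len(header):
            continue  # not a data row we understand
        if not all(f.isdigit() for f in fields):
            continue  # skip rows with non-integer values
        row = dict(zip(header, (int(f) for f in fields)))
        if row["CPU"] not in seen_cpus:
            # First entry for this CPU in the block: ignore it.
            seen_cpus.add(row["CPU"])
            continue
        samples.append((timestamp, row))
    return samples


SAMPLE = """\
***Fri Jan 28 12:50:36 EST 2005
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 0 483 383 118 1 0 0 0 64 0 0 0 100
0 1268 0 0 486 382 414 42 0 0 0 2902 8 24 0 68
0 4 0 0 479 379 144 3 0 0 0 96 0 0 0 100
"""

samples = parse_oswmpstat(SAMPLE)
for ts, row in samples:
    print(ts, "CPU", row["CPU"], "usr", row["usr"], "sys", row["sys"], "idl", row["idl"])
```

Reading the column names from the header line, rather than hard-coding them, keeps the sketch usable if a platform adds or removes a column, although platforms that prefix each row with a time value would need extra handling.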
Field Descriptions
The fields and their meanings are as follows.
Field | Description |
---|---|
CPU | Processor ID |
minf | Minor faults |
mjf | Major faults |
xcal | Processor cross-calls (when one CPU wakes up another by interrupting it). |
intr | Interrupts |
ithr | Interrupts as threads (except clock) |
csw | Context switches |
icsw | Involuntary context switches |
migr | Thread migrations to another processor |
smtx | Number of times a CPU failed to obtain a mutex |
srw | Number of times a CPU failed to obtain a read/write lock on the first try |
syscl | Number of system calls |
usr | Percentage of CPU cycles spent on user processes |
sys | Percentage of CPU cycles spent on system processes |
wt | Percentage of CPU cycles spent waiting on an event |
idl | Percentage of unused CPU cycles or idle time when the CPU is basically doing nothing |
What to look for
– icsw: involuntary context switches. This is probably the most relevant statistic when examining performance issues.
– smtx: number of times a CPU failed to obtain a mutex. Values consistently greater than 200 per CPU cause system time to increase.
– xcal (processor cross-calls) is very important, as it shows processor migration.
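The mutex guideline above can be checked mechanically once the rows are available as dictionaries keyed by field name. This is a minimal Python sketch under stated assumptions: the function name `cpus_with_mutex_contention` and the `min_fraction` parameter are hypothetical, and "consistently" is interpreted here as "in at least 80% of samples", which is an illustrative choice rather than a documented rule. Only the threshold of 200 comes from the guideline.

```python
from collections import defaultdict

SMTX_THRESHOLD = 200  # per-CPU smtx value from the guideline above


def cpus_with_mutex_contention(rows, threshold=SMTX_THRESHOLD, min_fraction=0.8):
    """Return sorted CPU IDs whose smtx exceeds `threshold` in at least
    `min_fraction` of their samples (min_fraction is an assumed cutoff)."""
    total = defaultdict(int)
    high = defaultdict(int)
    for row in rows:
        cpu = row["CPU"]
        total[cpu] += 1
        if row["smtx"] > threshold:
            high[cpu] += 1
    return sorted(cpu for cpu in total if high[cpu] / total[cpu] >= min_fraction)


# Made-up rows for illustration only; real rows would come from an archive file.
example_rows = [
    {"CPU": 0, "smtx": 250},
    {"CPU": 0, "smtx": 300},
    {"CPU": 1, "smtx": 10},
    {"CPU": 1, "smtx": 500},
]
flagged = cpus_with_mutex_contention(example_rows)
print(flagged)
```

In this illustration CPU 0 exceeds the threshold in every sample and is flagged, while CPU 1 exceeds it only half the time and is not. The first entry after each timestamp should already have been discarded before rows are passed in.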