
The Geek Diary


How to Use the ‘truss’ Command for Program and Error Analysis in Solaris

by admin

Truss is a debugging tool which provides insight into how a program operates by printing out system service calls along with their arguments and return statuses, faults, and signals. As such it is extremely useful for debugging errors and for figuring out how programs work.

Truss is easy to use: in its simplest form, just prepend the word “truss” to any command (including its arguments), and output abounds. By default, truss output goes to stderr (error output), not to stdout (normal output), which makes it easy to separate truss output from program output with shell I/O redirection. Truss output can also be saved to a file with the -o switch.
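
As a minimal sketch of the two ways to keep the trace apart from the program's normal output, the helpers below wrap both forms. The function names and the idea of passing the trace file as the first argument are hypothetical; only truss itself and its -o switch come from the text above.

```shell
# Hypothetical helpers; only `truss` and its -o switch come from the article.

# Form 1: rely on truss writing the trace to stderr and redirect fd 2.
# The traced command's stdout still goes wherever the caller's stdout goes.
# Note that anything the command itself writes to stderr lands in the
# same file, mixed in with the trace.
trace_via_stderr() {
    tracefile=$1
    shift
    truss "$@" 2> "$tracefile"
}

# Form 2: let truss write the trace file itself with -o, leaving the
# traced command's stdout and stderr untouched.
trace_via_o() {
    tracefile=$1
    shift
    truss -o "$tracefile" "$@"
}
```

For example, `trace_via_o /tmp/cat.truss cat junk` would leave the trace in /tmp/cat.truss while the error message from cat still appears on the terminal.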

With no switches, truss traces only the process started by the command it is given; it does not follow any children that process forks. The -f switch extends the trace to all forked children as well as the parent specified as the command.
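
A rough sketch of tracing a whole process tree follows. The helper names are hypothetical, and the second helper assumes that truss labels each line of a -f trace with the process ID followed by a colon, which should be checked against the output on the system at hand.

```shell
# Hypothetical helper: trace a command together with every child it forks,
# saving the trace to the file named by the first argument.
trace_with_children() {
    tracefile=$1
    shift
    truss -f -o "$tracefile" "$@"
}

# Hypothetical helper: list the process IDs that appear in such a trace.
# Assumption: each traced line starts with "PID:" when -f is in effect.
pids_in_trace() {
    sed -n 's/^ *\([0-9][0-9]*\):.*/\1/p' "$1" | sort -un
}
```

Listing the process IDs first makes it easier to pull out the lines belonging to a single child when the combined trace is long.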

Truss with the -p switch can attach to a process that is already running, without having to start it over. This is useful when a suspect program runs for a while and then gives some indication (output of some kind, for example) that a crash or problem is imminent. For example, suppose a program runs for several minutes and then prints a particular message shortly before it terminates abnormally. Run the program, wait for that message, and then begin trussing it with the -p switch (the command is “truss -p” followed by the process ID) before the program terminates. Truss will provide a clue as to what causes the abnormal termination, without accumulating output during the first several minutes of operation.

Output of truss can be restricted to a given set of system service calls using the -t switch. This, too, cuts down the amount of output to wade through in finding an answer, assuming the user knows what to look for.
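
The two switches combine naturally. The sketch below attaches to a running process and optionally limits the trace to a comma-separated list of calls; the function name and argument order are hypothetical, while -p and -t are the switches described above.

```shell
# Hypothetical helper: attach to a running process by PID.
# Usage: attach_trace PID [SYSCALLS]
#   attach_trace 1316               trace every system call
#   attach_trace 1316 open,write    trace only open() and write()
attach_trace() {
    pid=$1
    calls=${2:-}
    if [ -n "$calls" ]; then
        truss -t "$calls" -p "$pid"
    else
        truss -p "$pid"
    fi
}
```

The process ID can be taken from ps output; 1316 here is simply the ksh process that appears in the ps trace later in this article.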

How truss can be used to debug program execution

Generally, when a program terminates abnormally but does not crash, it is because a system service call returned an error. This may be due to a programming error, or may be due to circumstance. Truss can help determine the cause of the problem. Truss can also help debug where a program crashed by providing a trace of the system service calls up until the crash. This output, plus source code, can help determine where the program crashed.

Debugging a program with a circumstantial error

For an example of a program with a circumstantial error, suppose a program exits with the error “No such file or directory.” Truss can show which file the program was looking for by listing all of the open() system service calls made by the program, including those which failed. For example, suppose one executes the command:

# cat junk

where the file “junk” does not exist. A portion of the truss of the “cat” program, near the end, shows:

fstat(1, 0xEFFFF6C8) = 0
open("junk", O_RDONLY) Err#2 ENOENT
sigfillset(0xEF744060) = 0
cat: cannot open write(2, " c a t : c a n n o t ".., 17) = 17
write(2, " j u n k", 4) = 4
write(2, "\n", 1) = 1
lseek(0, 0, SEEK_CUR) = 56338
_exit(2)

In order to display the file “junk” on the screen, cat has to open it first. The open() system call listed above shows “junk” as the first argument and O_RDONLY as the second, indicating that open() was invoked to open the file “junk” in read-only mode. Note that the arguments correspond to those documented for the system call in section 2 of the man pages.

The return status of that open() call is ENOENT, or error number 2, which corresponds to “No such file or directory” and means that the call could not open the file.

Note: Consult the file /usr/include/sys/errno.h for a list of error return codes and their meanings.
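
A small sketch of that lookup is shown below. The function name is hypothetical, and it assumes the header defines each code on a single `#define` line with the meaning in a trailing comment, which may not hold on every system.

```shell
# Hypothetical helper: show the definition of an error name from errno.h.
# Usage: errno_lookup NAME [HEADER]
# The header defaults to the path given in the note above.
errno_lookup() {
    grep -w "$1" "${2:-/usr/include/sys/errno.h}"
}
```

Running `errno_lookup ENOENT` should print the line that ties the name to error number 2 and its description.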

If the call had succeeded, it would have returned a valid file descriptor (indicated by an “=” and a number corresponding to the file descriptor). Noting the sequence of events, the failing open() call is the beginning of the end; we see the program call write() three times, and can see from the arguments to the write() calls that the program is printing an error message to file descriptor 2 (stderr). The “cat: cannot open” text at the start of the first write() line appears to be cat's own output, interleaved with the trace because both go to the terminal. This corresponds to the message users see when cat cannot open a file:

cat: cannot open junk

Two lines down from the last write() is the final _exit(). In this example, “cat” names the file that it could not open in its error message, and this matches the arguments of the open() call in the truss output. Other programs may not provide an error message containing the filename, but a search for open() calls near the end of their truss output can show which file they could not find.
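
That search can be sketched with sed on a saved trace. The function name and scratch file are hypothetical, and the pattern assumes the line layout shown in the excerpt above: a plain open() call, one call per line, with no process ID prefix.

```shell
# Hypothetical helper: print the path of every open() that failed with ENOENT.
# Assumes one call per line and no PID prefix (a trace made without -f).
missing_files() {
    sed -n 's/^open("\([^"]*\)".*Err#2 ENOENT.*/\1/p' "$1"
}

# Try it on three lines taken from the "cat junk" excerpt above.
sample=${TMPDIR:-/tmp}/cat.truss.$$
cat > "$sample" <<'EOF'
fstat(1, 0xEFFFF6C8) = 0
open("junk", O_RDONLY) Err#2 ENOENT
sigfillset(0xEF744060) = 0
EOF
missing_files "$sample"    # prints: junk
rm -f "$sample"
```

Programs that use a different call to open files would need the pattern adjusted to match that call's name.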

Debugging a program with a programming error

Programming errors may be found by first searching truss output for error return statuses, and then checking the arguments to the system service calls that returned them. System service calls fail when their arguments are incorrect or unexpected. Most system calls fail gracefully, that is, without crashing the program. When this happens, though, they can leave the variables they affect in unexpected states, which can cause the program to crash later.
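
A first pass over a saved trace can be sketched as follows. Both function names are hypothetical; the only thing taken from the truss output above is that a failed call is marked with “Err#” followed by the error number and name. The summary assumes a trace made without -f, so that each line starts with the call name.

```shell
# Hypothetical helper: print every traced call that returned an error.
failed_calls() {
    grep 'Err#[0-9]' "$1"
}

# Hypothetical helper: count failures per system call and error name,
# printing lines of the form "COUNT CALL ERRNAME".
error_summary() {
    awk '/Err#[0-9]/ {
        call = $1
        sub(/\(.*/, "", call)      # strip the argument list, keep the name
        for (i = 1; i < NF; i++)
            if ($i ~ /^Err#[0-9]+$/) print call, $(i + 1)
    }' "$1" | LC_ALL=C sort | uniq -c | awk '{ print $1, $2, $3 }'
}
```

The summary gives a quick view of which calls fail most often; the full lines from `failed_calls` then show the arguments that need checking.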

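Trying, for example, to malloc() an extremely large chunk of memory may cause malloc() to return a NULL pointer instead of a pointer to memory. If that NULL pointer is dereferenced later in the program, the program will crash. Note that malloc() is a library function rather than a system call, so a default truss trace typically shows the underlying memory request (such as brk() or mmap()) failing with ENOMEM rather than the malloc() call itself; on Solaris releases whose truss supports the -u switch, library calls such as malloc() can be traced directly. Seeing that failure, together with an abnormally large size argument, indicates what happened.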

Getting insight into how factory Solaris programs work

Ever wonder how ps works? Trussing it will tell how. Trussing programs which do certain things can help one discover new system calls which accomplish those tasks. Truss can therefore be a very valuable educational tool. Taking the ps example, we see the following:

execve("/usr/bin/ps", 0xEFFFF910, 0xEFFFF918) argc = 1
...
write(1, " P I D T T Y ".., 26) = 26 (1)
open("/proc", O_RDONLY|O_NDELAY) = 3 (2)
fcntl(3, F_SETFD, 0x00000001) = 0
fstat(3, 0xEFFFF760) = 0
getdents(3, 0x00026928, 1048) = 972 (3)
open("/proc/00000", O_RDONLY) = 4 (4)
ioctl(4, PIOCPSINFO, 0x00024C58) = 0 (5)
close(4) = 0
open("/proc/00001", O_RDONLY) = 4
ioctl(4, PIOCPSINFO, 0x00024C58) = 0
close(4) = 0
...
open("/proc/01316", O_RDONLY) = 4
ioctl(4, PIOCPSINFO, 0x00024C58) = 0
close(4) = 0
1316 pts/1 0:00 ksh
write(1, " 1 3 1 6 p t s / 1".., 25) = 25 (6)
open("/proc/01317", O_RDONLY) = 4
ioctl(4, PIOCPSINFO, 0x00024C58) = 0
close(4) = 0
1317 pts/1 0:00 truss
write(1, " 1 3 1 7 p t s / 1".., 27) = 27
open("/proc/00905", O_RDONLY) = 4
ioctl(4, PIOCPSINFO, 0x00024C58) = 0
close(4) = 0
...
_exit(0)

From this example, it appears that ps prints the header (1), opens the “/proc” directory (2), calls getdents() with the “/proc” file descriptor as its first argument (3), then opens each of the files in the “/proc” directory (4) and does an ioctl() on each to get information (5). When the information meets some condition (which cannot be determined from this output, but can be inferred from knowing what the ps command does), the program writes a line of output (6).
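
One way to see the shape of such a trace at a glance is to tally the calls. The sketch below does this with awk on a saved trace; the function name is hypothetical, and it assumes a trace made without -f, so that every call line starts with the call name followed by “(”.

```shell
# Hypothetical helper: tally how often each system call appears in a trace,
# most frequent first. Lines that are not calls (program output, "...")
# are ignored because they do not start with "name(".
syscall_counts() {
    awk '/^[A-Za-z_][A-Za-z_0-9]*\(/ {
        name = $0
        sub(/\(.*/, "", name)      # keep only the call name
        count[name]++
    }
    END { for (n in count) print count[n], n }' "$1" |
        LC_ALL=C sort -k1,1nr -k2,2
}
```

On the ps excerpt above, open(), close() and ioctl() come out on top, which reflects the open/ioctl/close loop over the entries in “/proc”. Truss also has a -c switch that reports counts instead of the line-by-line trace.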

What do all these vague things mean?

Look at the man pages for getdents(), search the man pages for “/proc” and “PIOCPSINFO”, and find out…

Truss changes the environment in which a program runs

Truss changes the timing of a program, so it could change the nature of the problem being sought. Truss works by stopping execution of a program at system call entry points, examining the arguments, resuming execution until the system call returns, suspending execution again while the return status is read, and then resuming execution until the next event (fault, signal or system service call) happens. This starting and stopping of execution changes the timing of a program. For single-threaded, single-process programs this is not a problem. If, however, a problem is due to a race condition or other timing-related issue, truss may not be a good option.

Filed Under: Solaris


© 2023 · The Geek Diary
