Truss is a debugging tool which provides insight into how a program operates by printing out system service calls along with their arguments and return statuses, faults, and signals. As such it is extremely useful for debugging errors and for figuring out how programs work.
Truss is easy to use: in its simplest form, just prepend the word “truss” to any command (including its arguments), and output abounds. By default, truss output goes to stderr (error output), not to stdout (normal output). This makes it easy to separate truss output from program output with the shell I/O redirection tokens. Truss output can also be saved to a file with the -o switch.
Truss with no switches traces only the process started by the command, which is sufficient for a program that does not fork. The -f switch includes tracing of all forked children as well as the parent specified as the command.
Truss with the -p switch can be used to begin tracing a process which is already running, without having to start it over. This is useful when a suspect program runs for a while and then gives some indication (output of some kind, for example) that a crash or problem is imminent. For example, suppose a program runs for several minutes, and then gives a certain output shortly before it terminates abnormally. Run the program, then begin trussing it with the -p switch (the command is “truss -p” followed by the process ID) after that output appears but before the program terminates. Truss will provide a clue as to what causes the program to terminate abnormally, without accumulating output during the first several minutes of operation.
Output of truss can be restricted to a given set of system service calls using the -t switch. This, too, can help cut down on the amount of output to wade through in finding an answer, assuming the user knows what to look for.
How truss can be used to debug program execution
Generally, when a program terminates abnormally but does not crash, it is because a system service call returned an error. This may be due to a programming error, or may be due to circumstance. Truss can help determine the cause of the problem. Truss can also help debug where a program crashed by providing a trace of the system service calls up until the crash. This output, plus source code, can help determine where the program crashed.
Debugging a program with a circumstantial error
For an example of a program with a circumstantial error, suppose a program exits with an error condition of “No such file or directory”. Truss can show which file the program was looking for by listing all of the open() system service calls made by the program, including those which failed. For example, suppose one executes the command:
# cat junk
where the file “junk” does not exist. A portion of the truss of the “cat” program, near the end, shows:
fstat(1, 0xEFFFF6C8) = 0
open("junk", O_RDONLY) Err#2 ENOENT
sigfillset(0xEF744060) = 0
cat: cannot open write(2, " c a t : c a n n o t ".., 17) = 17
write(2, " j u n k", 4) = 4
write(2, "\n", 1) = 1
lseek(0, 0, SEEK_CUR) = 56338
_exit(2)
In order to display the file “junk” on the screen, cat has to open it first. The open() system call listed above shows “junk” as the first argument and O_RDONLY as the second, indicating that open() was invoked to open the file “junk” in read-only mode. Note that the arguments correspond to those documented for the system call in section 2 of the man pages.
The return status of that open() call is ENOENT, or error number 2, which corresponds to “No such file or directory,” which means that it could not open the file.
If it could have opened the file, it would have returned a valid file descriptor (indicated by an “=” and a number corresponding to the file descriptor). Noting the sequence of events, the failing open() call is the beginning of the end; we see the program call write() three times on file descriptor 2 (stderr), and can see from the arguments to the write() calls that the program is printing an error message in pieces. This corresponds to the message users see when cat cannot open a file:
cat: cannot open junk
Two lines down from the last write() in the truss output is the final _exit(). In this example, “cat” names the file that it could not open in its error message, and this can be correlated with the arguments of the open() call in the truss output. Other programs may not provide an error message containing the filename, but a search for failed open() calls near the end of their truss output can show which file they could not find.
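The failing open() call in the trace can be reproduced with a few lines of C. The following is only a minimal sketch: try_open() is a hypothetical helper name chosen for illustration and is not how cat itself is written. It shows where the ENOENT value reported by truss comes from on the program side.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: try to open a file read-only, as in the trace.
 * Returns 0 on success, or the errno value set by the failed open(). */
int try_open(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        int err = errno;  /* save errno before any other library call */
        fprintf(stderr, "cannot open %s: %s\n", path, strerror(err));
        return err;
    }
    close(fd);
    return 0;
}
```

Calling try_open("junk") from a program run under truss, in a directory where “junk” does not exist, should produce an open() line ending in an ENOENT error much like the one shown above.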
Debugging a program with a programming error
Programming errors may be found by first searching truss output for error return statuses, and then checking the arguments to the system service calls that returned them. System service calls fail when their arguments are incorrect or unexpected. Most system calls fail gracefully, that is, without crashing the program. When this happens, though, they can leave the variables they affect in unexpected states, which can cause the program to crash later.
Trying, for example, to malloc() an extremely large chunk of memory may force malloc() to return a NULL pointer instead of a pointer to memory. If that NULL pointer is accessed later on in the program, the program will crash. Because malloc() is a library function rather than a system call, truss by default shows the system calls it makes on the program's behalf (typically brk() or mmap()) rather than malloc() itself. Looking through the truss output and seeing one of these calls fail with ENOMEM after an abnormally large request indicates what happened.
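The usual defense on the program side is to check the pointer before using it. The sketch below assumes a hypothetical wrapper named alloc_checked(); the name and interface are illustrative only.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocate nbytes (assumed nonzero) and report
 * failure explicitly instead of letting a NULL pointer travel further.
 * Returns 0 on success, -1 if malloc() returned NULL. */
int alloc_checked(size_t nbytes, void **out)
{
    void *p = malloc(nbytes);
    if (p == NULL) {
        fprintf(stderr, "malloc(%zu) failed\n", nbytes);
        *out = NULL;
        return -1;
    }
    *out = p;
    return 0;
}
```

A caller that tests the return value can stop at the point of failure, which is far easier to diagnose than a crash some time later when the NULL pointer is finally dereferenced.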
Getting insight into how factory Solaris programs work
Ever wonder how ps works? Trussing it will tell how. Trussing programs which do certain things can help one discover new system calls which accomplish those tasks. Truss can therefore be a very valuable educational tool. Taking the ps example, we see the following:
execve("/usr/bin/ps", 0xEFFFF910, 0xEFFFF918) argc = 1
...
write(1, " P I D T T Y ".., 26) = 26 (1)
open("/proc", O_RDONLY|O_NDELAY) = 3 (2)
fcntl(3, F_SETFD, 0x00000001) = 0
fstat(3, 0xEFFFF760) = 0
getdents(3, 0x00026928, 1048) = 972 (3)
open("/proc/00000", O_RDONLY) = 4 (4)
ioctl(4, PIOCPSINFO, 0x00024C58) = 0 (5)
close(4) = 0
open("/proc/00001", O_RDONLY) = 4
ioctl(4, PIOCPSINFO, 0x00024C58) = 0
close(4) = 0
...
open("/proc/01316", O_RDONLY) = 4
ioctl(4, PIOCPSINFO, 0x00024C58) = 0
close(4) = 0
1316 pts/1 0:00 ksh
write(1, " 1 3 1 6 p t s / 1".., 25) = 25 (6)
open("/proc/01317", O_RDONLY) = 4
ioctl(4, PIOCPSINFO, 0x00024C58) = 0
close(4) = 0
1317 pts/1 0:00 truss
write(1, " 1 3 1 7 p t s / 1".., 27) = 27
open("/proc/00905", O_RDONLY) = 4
ioctl(4, PIOCPSINFO, 0x00024C58) = 0
close(4) = 0
...
_exit(0)
From this example, it appears that ps prints the header (1), then opens the “/proc” directory (2), calls getdents() with the “/proc” file descriptor as its first argument (3), then starts opening each of the files in the “/proc” directory (4) and does an ioctl() on each one to get information (5). When the information meets some condition (which cannot be determined from this output but can be determined by knowing what the ps command does), the program writes a line of output (6).
What do all these vague things mean?
Look at the man pages for getdents(), search the man pages for “/proc” and “PIOCPSINFO”, and find out…
Truss does cause an environmental change to program context
Truss changes the timing of a program, so it could change the nature of the problem being sought. Truss works by stopping execution of a program at system call entry points, examining arguments, resuming execution until the system call returns, suspending execution again while the return status is read, then resuming execution until the next event (fault, signal or system service call) happens. This starting and stopping of execution changes the timing of a program. For single-threaded, single-process programs this is not a problem. If, however, a problem is due to a race condition or other timing-related issue, truss may not be a good option.