1. Why do ‘du’ and ‘df’ report different values?
– df reports how much space the file system itself says is in use.
– du adds up the sizes of all the files it finds on a file system.
They can report wildly different figures.
One reason:
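Processes write to files, and a file can be deleted while a process still has it open. The file's name is gone, but its inode and data blocks are not freed until the last process closes it. du only sees files that still have names, so it no longer counts the file, while df still counts its blocks. For example: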
# cat > bogus
testing testing
^Z
Stopped (user)
# ls -i bogus
2921 bogus
# fuser bogus
bogus: 14947o
# rm bogus
# ls -i bogus
bogus: No such file or directory
# /usr/proc/bin/pfiles 14947
14947: cat
  Current rlimit: 64 file descriptors
   0: S_IFCHR mode:0620 dev:32,120 ino:211552 uid:0 gid:7 rdev:24,17
      O_RDWR
   1: S_IFREG mode:0644 dev:32,5 ino:2921 uid:0 gid:1 size:16
      O_WRONLY|O_LARGEFILE
   2: S_IFCHR mode:0620 dev:32,120 ino:211552 uid:0 gid:7 rdev:24,17
      O_RDWR
#
This example shows how a file can be deleted but still take up room on a hard drive. A file (bogus, inode 2921) is created by a process (pid 14947). The file is then deleted, and ls confirms that it no longer has a name on the file system. But pfiles shows that the inode is still open and is still taking up 16 bytes of disk space.
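The same effect can be reproduced without pfiles. What follows is a minimal sketch rather than part of the original session; it assumes a POSIX-style shell and a `mktemp` command, and the names `demo_dir` and `demo_file` are made up for illustration. It holds a file open on descriptor 3, removes its only name, and shows that the 16 bytes remain readable until the descriptor is closed.

```shell
#!/bin/sh
# Illustrative sketch: demo_dir and demo_file are made-up names.
demo_dir=$(mktemp -d) || exit 1
demo_file="$demo_dir/bogus"

printf 'testing testing\n' > "$demo_file"   # 16 bytes, as in the example
exec 3< "$demo_file"                        # keep the file open on fd 3
rm "$demo_file"                             # remove its only name

if [ -e "$demo_file" ]; then name_state=present; else name_state=gone; fi

held_data=$(cat <&3)                        # the data is still readable
held_bytes=$(printf '%s\n' "$held_data" | wc -c | tr -d ' ')

exec 3<&-                                   # last reference closed: inode can be freed
rmdir "$demo_dir"

echo "name: $name_state, bytes still held: $held_bytes"
```

While descriptor 3 is open, du cannot see the file because it has no name, but its blocks are still allocated and df still counts them.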
2. Why might deleting a file not free up disk space?
This is the same reason a disk may stay full after a file is removed: if a process still has the file open, removing it does not release its space. Running fuser on the file before removing it tells you whether any process has it open, and so whether removing it will do any good. For example:
# ls -l /var/adm/messages
-rw-r--r-- 1 root other 218611 Jul 29 02:18 messages
# fuser /var/adm/messages
messages: 164o
# ps -ef | grep 164
root 164 1 0 22:47:49 0:03 /usr/sbin/syslogd
# kill 164
# rm /var/adm/messages
# touch /var/adm/messages
# /usr/sbin/syslogd
# ls -l /var/adm/messages
-rw-r--r-- 1 root other 7592 Jul 29 11:46 messages
#
This example demonstrates a way to remove /var/adm/messages and actually clear disk space: stop syslogd (the process holding the file open), remove and recreate the file, then restart syslogd. Notice how the messages file jumps to about 7 KB as soon as syslogd is restarted; syslogd dumps the contents of the dmesg buffer into /var/adm/messages when it starts.
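When stopping the writer is inconvenient, a commonly used alternative (not part of the procedure above) is to truncate the file in place instead of removing it. The name and inode stay, but the blocks are released. The following is only a sketch under stated assumptions: a POSIX-style shell, a `mktemp` command, and a writer that opened the file in append mode. The names `log_dir` and `log_file` are illustrative stand-ins for a real log.

```shell
#!/bin/sh
# Illustrative sketch: log_dir/log_file stand in for a real log file.
log_dir=$(mktemp -d) || exit 1
log_file="$log_dir/messages"

exec 4>> "$log_file"                 # "writer": holds the log open in append mode
echo "old entry that fills the disk" >&4
size_before=$(wc -c < "$log_file" | tr -d ' ')

: > "$log_file"                      # truncate in place; fd 4 stays valid
size_after=$(wc -c < "$log_file" | tr -d ' ')

echo "new entry" >&4                 # the writer carries on in the same file
size_final=$(wc -c < "$log_file" | tr -d ' ')

exec 4>&-
rm "$log_file"
rmdir "$log_dir"

echo "before: $size_before, after truncate: $size_after, after next write: $size_final"
```

If the writer did not open the file in append mode, its next write lands at the old offset instead of at the start of the file, so check how the program opens its log before relying on this.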
3. How can I be out of inodes?
Not all inodes captured by a process are normal files. Many are character devices, which take no disk space but still remove an inode from possible use. These character devices may also take up memory, which in turn reduces the amount of memory available to your tmpfs file systems (i.e., /tmp). For example:
# /usr/proc/bin/pfiles 10234
10234: xterm
  Current rlimit: 64 file descriptors
   0: S_IFCHR mode:0620 dev:32,120 ino:211538 uid:52475 gid:7 rdev:24,3
      O_RDWR|O_LARGEFILE
   1: S_IFCHR mode:0620 dev:32,120 ino:210782 uid:52475 gid:7 rdev:0,0
      O_WRONLY|O_LARGEFILE
   2: S_IFCHR mode:0620 dev:32,120 ino:210782 uid:52475 gid:7 rdev:0,0
      O_WRONLY|O_LARGEFILE
   3: S_IFCHR mode:0666 dev:32,120 ino:210791 uid:0 gid:3 rdev:13,12
      O_RDWR
   4: S_IFIFO mode:0666 dev:171,0 ino:1626250456 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
   5: S_IFCHR mode:0000 dev:32,120 ino:65232 uid:0 gid:0 rdev:23,13
      O_RDWR|O_NDELAY
# ps -ef | grep 10234
dsweet 10234 10220 0 23:16:58 0:01 xterm
# ls -lL /dev/dsk/c0t0d0s0
brw-r----- 1 root sys 32,120 Jun 24 19:51 /dev/dsk/c0t0d0s0
# df /
/ (/dev/dsk/c0t0d0s0 ): 1097712 blocks 456994 files
# find / -mount -inum 211538
/devices/pseudo/pts@0:3
# find / -mount -inum 210782
/devices/pseudo/cn@0:console
# find / -mount -inum 65232
#
This example shows that dsweet’s xterm has captured several inodes on his root partition (dev:32,120), three of which are looked up here, and none of them is a normal file. pfiles reports them as S_IFCHR with no size, so they are character special files (raw devices). The last inode (65232) was not found on the file system, so it cannot be freed until the process lets go of it or is killed. Inode 210782 is the console, and it was captured twice.
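The link between a pfiles entry and a disk is the major,minor pair: `dev:32,120` in the pfiles output corresponds to `32,120` in the `ls -lL` listing of the block device. Below is a small sketch of pulling that pair out of a listing line, assuming the `ls -lL` layout shown above. The function name `dev_numbers` is hypothetical.

```shell
#!/bin/sh
# dev_numbers (hypothetical helper): read one "ls -lL" line for a device
# file on stdin and print "major,minor" in the form pfiles uses.
dev_numbers() {
    awk -F, '{
        n = split($1, left, " ")    # major is the last word before the comma
        split($2, right, " ")       # minor is the first word after the comma
        print left[n] "," right[1]
    }'
}

line='brw-r----- 1 root sys 32,120 Jun 24 19:51 /dev/dsk/c0t0d0s0'
devnum=$(printf '%s\n' "$line" | dev_numbers)
echo "pfiles pattern: dev:$devnum"
```

Splitting at the comma rather than counting columns also copes with listings that pad the minor number with spaces after the comma.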
4. A diagnostic procedure
This is all well and good, but how does a user figure out where all his inodes have gone? The procedure:
1. Determine the major and minor numbers for the troublesome device:
# df /
/ (/dev/dsk/c0t0d0s0 ): 1097710 blocks 456994 files
# ls -lL /dev/dsk/c0t0d0s0
brw-r----- 1 root sys 32,120 Jun 24 19:51 /dev/dsk/c0t0d0s0
2. Determine which inodes on the device are opened by processes:
# ls /proc
0      10220  10235  10251  10282  10386  136    160    227    298    306
1      10222  10236  10253  10300  10429  141    178    235    3      3060
10120  10227  10237  10255  10310  10446  143    184    236    301    308
10130  10228  10238  10266  10314  10484  14778  194    267    302    99
10134  10229  10239  10268  10338  10485  14934  2      270    303
10144  10230  10240  10270  10352  10567  14935  2115   275    304
10167  10231  10241  10272  10353  106    14937  2117   285    305
10169  10232  10242  10276  10364  10605  14947  212    288    3055
10170  10233  10243  10278  10366  108    14969  215    289    3057
10219  10234  10245  10280  10385  119    15008  225    295    3058
# /usr/proc/bin/pfiles 0 | grep "dev:32,120"
# /usr/proc/bin/pfiles 10220 | grep "dev:32,120"
   0: S_IFCHR mode:0620 dev:32,120 ino:211538 uid:52475 gid:7 rdev:24,3
   1: S_IFCHR mode:0620 dev:32,120 ino:210782 uid:52475 gid:7 rdev:0,0
   2: S_IFCHR mode:0620 dev:32,120 ino:210782 uid:52475 gid:7 rdev:0,0
   3: S_IFCHR mode:0666 dev:32,120 ino:210791 uid:0 gid:3 rdev:13,12
   8: S_IFCHR mode:0000 dev:32,120 ino:62920 uid:0 gid:0 rdev:42,126
   9: S_IFCHR mode:0000 dev:32,120 ino:55000 uid:0 gid:0 rdev:41,171
  10: S_IFCHR mode:0000 dev:32,120 ino:64688 uid:0 gid:0 rdev:42,130
  11: S_IFREG mode:0644 dev:32,120 ino:313393 uid:0 gid:3 size:316
#
Just by checking 2 processes I found 7 unique inodes that are open on the file system. You have to repeat this step for every process listed in /proc to get all the open inodes on the file system.
3. Determine whether the inodes exist on the file system:
# find / -mount -inum 211538
/devices/pseudo/pts@0:3
# find / -mount -inum 210782
/devices/pseudo/cn@0:console
# find / -mount -inum 210791
/devices/pseudo/mm@0:zero
# find / -mount -inum 62920
/usr/lib/lpshut
# find / -mount -inum 55000
# find / -mount -inum 64688
# find / -mount -inum 313393
/etc/group
#
Therefore only 2 of the 7 inodes captured by process 10220 are rogue (55000 and 64688). Rogue is my way of describing an open inode that doesn’t have a corresponding file in the file system. You would have to repeat this step for all the inodes found in step 2 to see which open inodes are rogue. Once that is done, you can go back and see how many inodes are rogue, whether they are special character devices, and how much space they are taking up.
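Steps 2 and 3 lend themselves to small helpers. The sketch below is illustrative only and assumes the pfiles line layout shown above and a find that supports `-inum`. `open_inodes_on_dev` and `has_name` are hypothetical names: the first filters pfiles-style text on stdin down to the unique inode numbers on one device, and the second wraps the find lookup. `-xdev` is used as the portable spelling of `-mount`.

```shell
#!/bin/sh
# open_inodes_on_dev MAJOR MINOR (hypothetical helper): read pfiles-style
# lines on stdin and print each inode number on that device once.
open_inodes_on_dev() {
    grep " dev:$1,$2 " |
        sed -n 's/.* ino:\([0-9][0-9]*\).*/\1/p' |
        sort -u
}

# has_name DIR INODE (hypothetical helper): succeed if some name under DIR,
# on the same file system, still refers to INODE.
has_name() {
    [ -n "$(find "$1" -xdev -inum "$2" 2>/dev/null)" ]
}

# Sample lines copied from the pfiles output for process 10220.
sample='0: S_IFCHR mode:0620 dev:32,120 ino:211538 uid:52475 gid:7 rdev:24,3
1: S_IFCHR mode:0620 dev:32,120 ino:210782 uid:52475 gid:7 rdev:0,0
2: S_IFCHR mode:0620 dev:32,120 ino:210782 uid:52475 gid:7 rdev:0,0
3: S_IFCHR mode:0666 dev:32,120 ino:210791 uid:0 gid:3 rdev:13,12
8: S_IFCHR mode:0000 dev:32,120 ino:62920 uid:0 gid:0 rdev:42,126
9: S_IFCHR mode:0000 dev:32,120 ino:55000 uid:0 gid:0 rdev:41,171
10: S_IFCHR mode:0000 dev:32,120 ino:64688 uid:0 gid:0 rdev:42,130
11: S_IFREG mode:0644 dev:32,120 ino:313393 uid:0 gid:3 size:316'

unique_count=$(printf '%s\n' "$sample" | open_inodes_on_dev 32 120 | wc -l | tr -d ' ')
echo "$unique_count unique open inodes on dev:32,120"
```

On the system in the example, `has_name / 55000` would correspond to the find lookup that printed nothing, marking inode 55000 as rogue.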
5. A diagnostic script
There has to be an easier way than the procedure just described. You could write a script to do the work for you, or use mine:
#!/sbin/sh
# icheck.sh: report inodes on a block device that are held open by
# processes but no longer have a name on the file system.

tmpfile=/tmp/icheck.$$.tmp

if [ "$1" ]
then
    device=$1
    if [ -b "$device" ]
    then
        # df prints "mountpoint (device ): ...", so field 1 is the mount point.
        mntpoint=`/usr/sbin/df | /usr/bin/grep "$device" | /usr/bin/awk ' { print $1 } '`
        if [ "$mntpoint" ]
        then
            # Split the ls -lL line at the comma: the major number is field 5
            # of the first half, the minor number field 1 of the second half.
            firsthalf=`/usr/bin/ls -lL $device | /usr/bin/awk -F\, ' { print $1 } '`
            secondhalf=`/usr/bin/ls -lL $device | /usr/bin/awk -F\, ' { print $2 } '`
            major=`/usr/bin/echo $firsthalf | /usr/bin/awk ' { print $5 } '`
            minor=`/usr/bin/echo $secondhalf | /usr/bin/awk ' { print $1 } '`

            # Pass 1: collect every inode on the device that a process has open.
            possiblePIDs=""
            for pid in `/usr/bin/ls /proc`
            do
                # The trailing space keeps dev:32,1 from matching dev:32,120.
                inodes=`/usr/proc/bin/pfiles $pid 2> /dev/null |
                    /usr/bin/grep "dev:${major},${minor} " |
                    /usr/bin/awk ' { print $5 } ' |
                    /usr/bin/awk -F: ' { print $2 } ' |
                    /usr/bin/sort | /usr/bin/uniq`
                if [ "$inodes" ]
                then
                    possiblePIDs="$possiblePIDs $pid"
                fi
                for inode in $inodes
                do
                    /usr/bin/echo $inode >> $tmpfile
                done
            done

            if [ -f $tmpfile ]
            then
                inodes=`/usr/bin/sort $tmpfile | /usr/bin/uniq`
                /usr/bin/rm $tmpfile
            fi

            inum=0
            for ino in $inodes
            do
                inum=`/usr/bin/echo $inum + 1 | bc`
            done
            /usr/bin/echo $inum open inodes found on $device

            # Pass 2: look each inode up by number; no name means it is rogue.
            binum=0
            inum=0
            badinodes=""
            for ino in $inodes
            do
                /bin/printf "\r%d inodes found without files on file " $binum
                /bin/printf "system (%d searched)" $inum
                filename=`/usr/bin/find $mntpoint -mount -inum $ino 2>/dev/null`
                inum=`/usr/bin/echo $inum + 1 | bc`
                if [ -z "$filename" ]
                then
                    badinodes="$badinodes $ino"
                    binum=`/usr/bin/echo $binum + 1 | bc`
                fi
            done
            /bin/printf "\r%d inodes found without files on file " $binum
            /bin/printf "system (%d searched)\n" $inum

            # Pass 3: name the processes holding each rogue inode.
            for badino in $badinodes
            do
                /bin/printf "the following processes have captured rogue "
                /bin/printf "inode %s" $badino
                notfound="1"
                firsttime=1
                for pid in $possiblePIDs
                do
                    # Match device and inode together so that the same inode
                    # number on another device is not reported.
                    response=`/usr/proc/bin/pfiles $pid 2>/dev/null |
                        /usr/bin/grep "dev:${major},${minor} ino:${badino} "`
                    if [ "$response" ]
                    then
                        notfound=""
                        size=""
                        if [ $firsttime ]
                        then
                            firsttime=""
                            havesize=`/usr/bin/echo $response | /usr/bin/grep "size:"`
                            if [ "$havesize" ]
                            then
                                size=`/usr/bin/echo $havesize | /usr/bin/awk ' { print $8 } '`
                            fi
                            if [ "$size" ]
                            then
                                /usr/bin/echo " (${size}):"
                            else
                                /usr/bin/echo ":"
                            fi
                        fi
                        /usr/bin/echo " $pid"
                    fi
                done
                if [ $notfound ]
                then
                    /usr/bin/echo ":"
                fi
            done
            exit 0
        else
            /usr/bin/echo error: $device is not mounted
            exit 1
        fi
    else
        /usr/bin/echo "usage: icheck.sh <device>"
        /usr/bin/echo error: $device is not a block special file
        /usr/bin/echo example: icheck.sh /dev/dsk/c0t0d0s0
        exit 1
    fi
fi
/usr/bin/echo "usage: icheck.sh <device>"
/usr/bin/echo example: icheck.sh /dev/dsk/c0t0d0s0
exit 1
Sample output from icheck.sh being run on the file system first shown in section 1’s example:
# /usr/local/sbin/icheck.sh /dev/dsk/c1t0d0s5
12 open inodes found on /dev/dsk/c1t0d0s5
1 inodes found without files on file system (12 searched)
the following processes have captured rogue inode 2921 (size:16):
 14947
#