The Geek Diary

Solaris Interview Questions – Inodes and the Filesystem Troubleshooting

by admin

1. Why do ‘du’ and ‘df’ report different values?

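– df reports the space the file system itself counts as allocated, including blocks that belong to files with no remaining directory entry.
– du adds up the sizes of all the files it can reach by walking the directory tree.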

They can report wildly different figures.

One reason:
A file can be deleted while a process still has it open for writing. The file name is gone, but the inode and its data blocks are not freed until the process closes the file. For example:

# cat > bogus
testing
testing
^Z
Stopped (user)
# ls -i bogus
2921 bogus
# fuser bogus
bogus: 14947o
# rm bogus
# ls -i bogus
bogus: No such file or directory
# /usr/proc/bin/pfiles 14947
14947: cat
Current rlimit: 64 file descriptors
0: S_IFCHR mode:0620 dev:32,120 ino:211552 uid:0 gid:7 rdev:24,17
O_RDWR
1: S_IFREG mode:0644 dev:32,5 ino:2921 uid:0 gid:1 size:16
O_WRONLY|O_LARGEFILE
2: S_IFCHR mode:0620 dev:32,120 ino:211552 uid:0 gid:7 rdev:24,17
O_RDWR
#

This example shows how a file can be deleted but still take up room on a hard drive. A file (bogus, inode 2921) is created by a process (pid 14947). The file is then deleted, and ls confirms that it no longer has a name in the file system. But pfiles shows that the inode is still open and is taking up 16 bytes of disk space.
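The same effect can be reproduced without stopping a job by hand. The following is a minimal sketch, not part of the original transcript: the scratch file name is hypothetical, and descriptor 3 of the shell itself plays the role of the suspended cat process.

```sh
#!/bin/sh
# Sketch: a file's data outlives its name while a descriptor stays open.
f=/tmp/unlink_demo.$$          # hypothetical scratch file name

exec 3> "$f"                   # open the file for writing on descriptor 3
echo testing >&3               # first write, while the name still exists
rm "$f"                        # remove the only directory entry

if [ -f "$f" ]
then
    name_state=present
else
    name_state=gone
fi
echo "name is $name_state"

echo testing >&3               # still succeeds: the inode is still allocated
write_status=$?
echo "write after rm exit status: $write_status"

exec 3>&-                      # closing the last descriptor releases the inode
```

Until the final exec line runs, the blocks written through descriptor 3 are counted by df but cannot be seen by du.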

2. Why might deleting a file not free up disk space?

This is the same reason a disk may be full and removing a file does not clear space. Running fuser on a file before removing it tells you whether any process still has it open, and therefore whether removing it will do any good. For example:

# ls -l /var/adm/messages
-rw-r--r-- 1 root other 218611 Jul 29 02:18 messages
# fuser /var/adm/messages
messages: 164o
# ps -ef | grep 164
root 164 1 0 22:47:49 0:03 /usr/sbin/syslogd
# kill 164
# rm /var/adm/messages
# touch /var/adm/messages
# /usr/sbin/syslogd
# ls -l /var/adm/messages
-rw-r--r-- 1 root other 7592 Jul 29 11:46 messages
#

This example demonstrates a way to remove /var/adm/messages and actually clear disk space: stop syslogd, which holds the file open, remove and recreate the file, then restart syslogd. Notice how the messages file jumps to about 7k after syslogd is restarted. syslogd dumps the contents of the dmesg buffer into /var/adm/messages when it is started.
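The fuser check can be wrapped in a small guard so that a file is only removed when no process holds it. This is a sketch under assumptions: remove_if_unused is a hypothetical helper name, fuser is assumed to be installed and to print the holding PIDs on standard output, and the scratch file stands in for a real log file.

```sh
#!/bin/sh
# remove_if_unused: hypothetical helper. Removes a file only when fuser
# reports no process holding it open; otherwise it lists the holders.
remove_if_unused() {
    file=$1
    # fuser prints the PIDs on stdout; the file name goes to stderr
    holders=`fuser "$file" 2>/dev/null` || true
    holders=`echo $holders`     # collapse stray whitespace
    if [ -n "$holders" ]
    then
        echo "$file is held open by: $holders (stop these first)"
        return 1
    fi
    rm "$file"
}

# Usage on a scratch file that nothing has open
scratch=/tmp/remove_demo.$$
: > "$scratch"
remove_if_unused "$scratch"
status=$?
echo "remove_if_unused exit status: $status"
```

For a file such as /var/adm/messages the guard would refuse and name syslogd's PID, which is the cue to stop the daemon first as in the transcript above.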

3. How can I be out of inodes?

Not all inodes captured by a process are normal files. Many are character devices, which take no disk space but still remove an inode from possible use. These character devices may take up memory, which in turn reduces the amount of memory available to your tmpfs file systems (i.e., /tmp). For example:

# /usr/proc/bin/pfiles 10234
10234: xterm
Current rlimit: 64 file descriptors
0: S_IFCHR mode:0620 dev:32,120 ino:211538 uid:52475 gid:7 rdev:24,3
O_RDWR|O_LARGEFILE
1: S_IFCHR mode:0620 dev:32,120 ino:210782 uid:52475 gid:7 rdev:0,0
O_WRONLY|O_LARGEFILE
2: S_IFCHR mode:0620 dev:32,120 ino:210782 uid:52475 gid:7 rdev:0,0
O_WRONLY|O_LARGEFILE
3: S_IFCHR mode:0666 dev:32,120 ino:210791 uid:0 gid:3 rdev:13,12
O_RDWR
4: S_IFIFO mode:0666 dev:171,0 ino:1626250456 uid:0 gid:0 size:0
O_RDWR|O_NONBLOCK FD_CLOEXEC
5: S_IFCHR mode:0000 dev:32,120 ino:65232 uid:0 gid:0 rdev:23,13
O_RDWR|O_NDELAY
# ps -ef | grep 10234
dsweet 10234 10220 0 23:16:58 0:01 xterm
# ls -lL /dev/dsk/c0t0d0s0
brw-r----- 1 root sys 32,120 Jun 24 19:51 /dev/dsk/c0t0d0s0
# df /
/ (/dev/dsk/c0t0d0s0 ): 1097712 blocks 456994 files
# find / -mount -inum 211538
/devices/pseudo/pts@0:3
# find / -mount -inum 210782
/devices/pseudo/cn@0:console
# find / -mount -inum 65232
#

This example shows that dsweet’s xterm has captured inodes on his root partition, and none of the three looked up here are normal files. pfiles reports them as S_IFCHR with no size, so they are character special files. The last inode (65232) was not found on the file system, so it cannot be freed unless its process lets go of it or the process is killed. Inode 210782 is the console, and it was captured twice.
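To see how close a file system is to running out of inodes, the count can be pulled out of the df line. A minimal sketch, assuming the default Solaris df format shown in the transcript above, where the number before "files" is taken to be the count of free inodes; free_files is a hypothetical helper name.

```sh
#!/bin/sh
# free_files: hypothetical helper. Reads df output in the format shown in
# the transcript and prints the number that precedes the word "files".
free_files() {
    sed -n 's/.*blocks *\([0-9][0-9]*\) files.*/\1/p'
}

# Sample line copied from the transcript; on a live system the input
# would come from: df / | free_files
sample='/ (/dev/dsk/c0t0d0s0 ): 1097712 blocks 456994 files'
free=`echo "$sample" | free_files`
echo "free inodes: $free"
```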

4. A diagnostic procedure

This is all well and good, but how does a user figure out where all his inodes have gone? The procedure:

1. Determine the major and minor numbers for the troublesome device:

# df /
/ (/dev/dsk/c0t0d0s0 ): 1097710 blocks 456994 files
# ls -lL /dev/dsk/c0t0d0s0
brw-r----- 1 root sys 32,120 Jun 24 19:51 /dev/dsk/c0t0d0s0

2. Determine which inodes on the device are opened by processes:

# ls /proc
0 10220 10235 10251 10282 10386 136 160 227 298 306
1 10222 10236 10253 10300 10429 141 178 235 3 3060
10120 10227 10237 10255 10310 10446 143 184 236 301 308
10130 10228 10238 10266 10314 10484 14778 194 267 302 99
10134 10229 10239 10268 10338 10485 14934 2 270 303
10144 10230 10240 10270 10352 10567 14935 2115 275 304
10167 10231 10241 10272 10353 106 14937 2117 285 305
10169 10232 10242 10276 10364 10605 14947 212 288 3055
10170 10233 10243 10278 10366 108 14969 215 289 3057
10219 10234 10245 10280 10385 119 15008 225 295 3058

# /usr/proc/bin/pfiles 0 | grep "dev:32,120"

# /usr/proc/bin/pfiles 10220 | grep "dev:32,120"
0: S_IFCHR mode:0620 dev:32,120 ino:211538 uid:52475 gid:7 rdev:24,3
1: S_IFCHR mode:0620 dev:32,120 ino:210782 uid:52475 gid:7 rdev:0,0
2: S_IFCHR mode:0620 dev:32,120 ino:210782 uid:52475 gid:7 rdev:0,0
3: S_IFCHR mode:0666 dev:32,120 ino:210791 uid:0 gid:3 rdev:13,12
8: S_IFCHR mode:0000 dev:32,120 ino:62920 uid:0 gid:0 rdev:42,126
9: S_IFCHR mode:0000 dev:32,120 ino:55000 uid:0 gid:0 rdev:41,171
10: S_IFCHR mode:0000 dev:32,120 ino:64688 uid:0 gid:0 rdev:42,130
11: S_IFREG mode:0644 dev:32,120 ino:313393 uid:0 gid:3 size:316
#

Just by checking the first 2 processes I found 7 unique inodes that are open on the file system. You have to repeat this step for every process listed in /proc to get all the open inodes on the file system.

3. Determine if the inodes exist on the file system:

# find / -mount -inum 211538
/devices/pseudo/pts@0:3

# find / -mount -inum 210782
/devices/pseudo/cn@0:console

# find / -mount -inum 210791
/devices/pseudo/mm@0:zero

# find / -mount -inum 62920
/usr/lib/lpshut

# find / -mount -inum 55000
# find / -mount -inum 64688
# find / -mount -inum 313393
/etc/group
#

Therefore only 2 of the 7 inodes captured by process 10220 are rogue. Rogue is my way of describing an open inode that doesn’t have a corresponding file in the file system. You would have to repeat this step for all the inodes found above to see which open inodes are rogue. Once done, you can go back to see how many inodes are rogue, whether they are special character devices, and how much space they are taking up.
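The steps above can be sketched as three small filters. The function names are hypothetical, the sample input is copied from the transcripts, and find is called with -xdev, which behaves like the -mount used above. Treat this as an illustration of the parsing involved, not as a replacement for running pfiles against every process.

```sh
#!/bin/sh
# Device numbers: read one "ls -lL" line for a device, print "major minor".
dev_numbers() {
    sed -n 's/.* \([0-9][0-9]*\), *\([0-9][0-9]*\) .*/\1 \2/p'
}

# Open inodes: read pfiles output on stdin and print the unique inodes on
# device $1,$2. The trailing space stops minor 120 from matching 1200.
inodes_on_dev() {
    grep "dev:$1,$2 " | sed 's/.*ino:\([0-9]*\).*/\1/' | sort -n | uniq
}

# Rogue check: $1 is the mount point, the remaining arguments are inode
# numbers; print the ones that find cannot locate under the mount point.
rogue_inodes() {
    root=$1
    shift
    for ino in "$@"
    do
        path=`find "$root" -xdev -inum "$ino" 2>/dev/null`
        if [ -z "$path" ]
        then
            echo "$ino"
        fi
    done
}

# Sample input copied from the transcripts above
ls_line='brw-r----- 1 root sys 32,120 Jun 24 19:51 /dev/dsk/c0t0d0s0'
pfiles_text='0: S_IFCHR mode:0620 dev:32,120 ino:211538 uid:52475 gid:7 rdev:24,3
1: S_IFCHR mode:0620 dev:32,120 ino:210782 uid:52475 gid:7 rdev:0,0
2: S_IFCHR mode:0620 dev:32,120 ino:210782 uid:52475 gid:7 rdev:0,0
4: S_IFIFO mode:0666 dev:171,0 ino:1626250456 uid:0 gid:0 size:0'

numbers=`echo "$ls_line" | dev_numbers`
set -- $numbers
open_inodes=`echo "$pfiles_text" | inodes_on_dev "$1" "$2"`
echo "device $1,$2 open inodes:" $open_inodes
```

On a live system the result of inodes_on_dev would then be passed to rogue_inodes together with the mount point, for example rogue_inodes / 55000 64688.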

5. A diagnostic script

There has to be an easier way than the procedure just described. You could write a script to do the work for you or use mine:

#!/sbin/sh
# icheck.sh: report open inodes on a block device that no longer have a
# file name on the mounted file system ("rogue" inodes).
tmpfile=/tmp/icheck.$$.tmp
if [ "$1" ]
then
    device=$1
    if [ -b "$device" ]
    then
        # Solaris df prints "mountpoint (device ): ...", so field 1 is the mount point
        mntpoint=`/usr/sbin/df | /usr/bin/grep "$device" | /usr/bin/awk ' { print $1 } '`
        if [ "$mntpoint" ]
        then
            # split the "major,minor" pair out of the ls -lL listing
            firsthalf=`/usr/bin/ls -lL $device | /usr/bin/awk -F\, ' { print $1 } '`
            secondhalf=`/usr/bin/ls -lL $device | /usr/bin/awk -F\, ' { print $2 } '`
            major=`/usr/bin/echo $firsthalf | /usr/bin/awk ' { print $5 } '`
            minor=`/usr/bin/echo $secondhalf | /usr/bin/awk ' { print $1 } '`
            # collect every inode on this device that some process has open;
            # the trailing space in the pattern keeps minor 120 from matching 1200
            possiblePIDs=""
            for pid in `/usr/bin/ls /proc`
            do
                inodes=`/usr/proc/bin/pfiles $pid 2> /dev/null | /usr/bin/grep "dev:${major},${minor} " | /usr/bin/awk ' { print $5 } ' | /usr/bin/awk -F: ' { print $2 } ' | /usr/bin/sort | /usr/bin/uniq`
                if [ "$inodes" ]
                then
                    possiblePIDs="$possiblePIDs $pid"
                fi
                for inode in $inodes
                do
                    /usr/bin/echo $inode >> $tmpfile
                done
            done
            # reset first so a leftover value from the last pid is never reused
            inodes=""
            if [ -f $tmpfile ]
            then
                inodes=`/usr/bin/sort $tmpfile | /usr/bin/uniq`
                /usr/bin/rm $tmpfile
            fi
            inum=0
            for ino in $inodes
            do
                inum=`/usr/bin/echo $inum + 1 | bc`
            done
            /usr/bin/echo $inum open inodes found on $device
            # look for each open inode on the file system; those without a name are rogue
            binum=0
            inum=0
            badinodes=""
            for ino in $inodes
            do
                /bin/printf "\r%d inodes found without files on file " $binum
                /bin/printf "system (%d searched)" $inum
                filename=`/usr/bin/find $mntpoint -mount -inum $ino 2>/dev/null`
                inum=`/usr/bin/echo $inum + 1 | bc`
                if [ -z "$filename" ]
                then
                    badinodes="$badinodes $ino"
                    binum=`/usr/bin/echo $binum + 1 | bc`
                fi
            done
            /bin/printf "\r%d inodes found without files on file " $binum
            /bin/printf "system (%d searched)\n" $inum
            # report which processes hold each rogue inode
            for badino in $badinodes
            do
                /bin/printf "the following processes have captured rogue "
                /bin/printf "inode %s" $badino
                notfound="1"
                firsttime=1
                for pid in $possiblePIDs
                do
                    # match device and full inode number so 29 does not match 2921
                    response=`/usr/proc/bin/pfiles $pid 2>/dev/null | /usr/bin/grep "dev:${major},${minor} ino:${badino} "`
                    if [ "$response" ]
                    then
                        notfound=""
                        size=""
                        if [ "$firsttime" ]
                        then
                            firsttime=""
                            havesize=`/usr/bin/echo $response | /usr/bin/grep "size:"`
                            if [ "$havesize" ]
                            then
                                size=`/usr/bin/echo $havesize | /usr/bin/awk ' { print $8 } '`
                            fi
                            if [ "$size" ]
                            then
                                /usr/bin/echo " (${size}):"
                            else
                                /usr/bin/echo ":"
                            fi
                        fi
                        /usr/bin/echo "   $pid"
                    fi
                done
                if [ "$notfound" ]
                then
                    /usr/bin/echo ":"
                fi
            done
            exit 0
        else
            /usr/bin/echo "error: $device is not mounted"
            exit 1
        fi
    else
        /usr/bin/echo "usage: icheck.sh <device>"
        /usr/bin/echo "error: $device is not a block special file"
        /usr/bin/echo "example: icheck.sh /dev/dsk/c0t0d0s0"
        exit 1
    fi
fi
/usr/bin/echo "usage: icheck.sh <device>"
/usr/bin/echo "example: icheck.sh /dev/dsk/c0t0d0s0"
exit 1

Sample output from icheck.sh being run on the file system first shown in section 1’s example:

# /usr/local/sbin/icheck.sh /dev/dsk/c1t0d0s5
12 open inodes found on /dev/dsk/c1t0d0s5
1 inodes found without files on file system (12 searched)
the following processes have captured rogue inode 2921 (size:16):
14947
#

Filed Under: Solaris

© 2023 · The Geek Diary
