What are sparse files in Linux

Have you ever observed a difference in the disk usage of a file between the du and ls commands? For example:

$ du -sh /u02/ticoprd/redo/*
515M redo1a.rdo
524M redo3b.rdo
518M redo4a.rdo
$ ls -ltrh /u02/ticoprd/redo/*
-rw-r----- 1 oticoprd dba 1.1G Aug 4 01:09 redo1a.rdo
-rw-r----- 1 oticoprd dba 1.1G Aug 4 02:32 redo3b.rdo
-rw-r----- 1 oticoprd dba 1.1G Aug 4 03:51 redo4a.rdo

We can see that the “ls” command reports a file size of 1.1GB, whereas the “du” command reports only about 515MB. These files are sparse files. The “ls” command displays the apparent size of the file, while “du” displays the space the file actually occupies on disk.

A sparse file is a type of computer file that uses file system space more efficiently when the file is mostly empty. Instead of writing the “empty” blocks themselves, the file system writes brief information (metadata) describing them, so less disk space is used. In other words, a sparse file contains ranges of zeros (holes) whose existence is recorded but which have no space allocated on disk. A block is fully written to disk, and counted in the actual size, only when it contains “real” (non-empty) data.

When reading sparse files, the file system transparently converts metadata representing empty blocks into “real” blocks filled with zero bytes at run-time. The application is unaware of this conversion. Sparse files are commonly used for disk images, database snapshots, log files, etc.
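The transparent conversion described above can be checked with a small experiment. This is only a sketch, assuming GNU coreutils; the file name hole_demo and the temporary directory are made up for the illustration.

```shell
#!/bin/sh
# Sketch: a hole reads back as ordinary zero bytes.
# Assumes GNU coreutils; "hole_demo" is a hypothetical file name.
workdir=$(mktemp -d)
demo="$workdir/hole_demo"

# Create a 1 MiB file without writing any data, so it is one big hole.
truncate -s 1M "$demo"

# Total number of bytes an application can read from the file.
total=$(wc -c < "$demo")

# Strip every zero byte and count whatever is left.
nonzero=$(tr -d '\000' < "$demo" | wc -c)

echo "bytes read: $total, non-zero bytes: $nonzero"

rm -rf "$workdir"
```

All 1048576 bytes are readable and none of them is non-zero, even though no data was ever written to the file.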

The advantage of sparse files is that storage is only allocated when actually needed: disk space is saved, and large files can be created even if there is insufficient free space on the file system.

Sparse files also have disadvantages. They may become fragmented, and file system free space reports may be misleading. In addition, copying a sparse file with a program that does not explicitly support sparse files may write out the full apparent size of the file, including the zero-filled sections that were never stored on disk, which loses the benefit of sparseness.
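The copying problem can be reproduced with a plain dd copy, which reads the holes as zeros and writes those zeros out. The sketch below assumes GNU coreutils; the file names are hypothetical, and the space actually allocated for the copy depends on the file system, so no du figures are shown here.

```shell
#!/bin/sh
# Sketch: copying a sparse file with a tool that does not preserve holes.
# Assumes GNU coreutils; file names are hypothetical.
workdir=$(mktemp -d)
src="$workdir/sparse_src"
dst="$workdir/plain_copy"

# 8 MiB hole, no data written.
truncate -s 8M "$src"

# Plain dd (without conv=sparse) writes every zero byte it reads.
dd if="$src" of="$dst" bs=1M 2>/dev/null

# Apparent size and content are the same for both files ...
src_size=$(stat -c %s "$src")
dst_size=$(stat -c %s "$dst")
cmp -s "$src" "$dst" && same=yes || same=no
echo "apparent: src=$src_size dst=$dst_size identical=$same"

# ... but the copy usually occupies far more space on disk.
du -h "$src" "$dst"
```

On most file systems du reports close to 0 for the source and about 8M for the copy.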

We can see the difference between apparent and actual size with the /var/log/lastlog file:

# ls -lh /var/log/lastlog
-rw-r--r--. 1 root root 144K Sep 8 22:45 /var/log/lastlog
# du -sh /var/log/lastlog
40K /var/log/lastlog 

Creating a Sparse File

We can create a sparse file using the dd command:

# dd if=/dev/zero of=sparse_file bs=1 count=0 seek=512M
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000381714 s, 0.0 kB/s
# ls -hl sparse_file
-rw-r--r--. 1 root root 512M Sep 9 00:13 sparse_file
# du -sh sparse_file
0 sparse_file
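As an alternative to dd, the truncate command from GNU coreutils can set a file to a given size without writing any data. A minimal sketch, assuming GNU coreutils; the temporary directory is only there to keep the example self-contained:

```shell
#!/bin/sh
# Sketch: create a sparse file with truncate instead of dd.
# Assumes GNU coreutils (truncate, stat -c).
workdir=$(mktemp -d)
file="$workdir/sparse_file"

# Set the size of the new file to 512 MiB without writing any data blocks.
truncate -s 512M "$file"

# Apparent size in bytes, i.e. the size that ls -l reports.
apparent=$(stat -c %s "$file")
echo "apparent size: $apparent bytes"

# Space actually allocated on disk.
du -sh "$file"
```

The apparent size is 536870912 bytes (512 MiB), while du should report little or no allocated space, as with the dd example.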

To see the disk usage of a file with the “ls” command, we can use the “-s” option:

# ls -lhs sparse_file
0 -rw-r--r--. 1 root root 512M Sep 9 00:13 sparse_file

To see the apparent size of a file with “du”, we can use the --apparent-size option:

# du -h --apparent-size sparse_file
512M sparse_file 
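Both numbers can also be read in one place with stat. The sketch below assumes GNU stat; sparse_report is a hypothetical helper written for this illustration, not an existing command.

```shell
#!/bin/sh
# Sketch: report apparent and allocated size of a file with GNU stat.
# sparse_report is a hypothetical helper, not a standard tool.
sparse_report() {
    apparent=$(stat -c %s "$1")   # %s: apparent size in bytes
    blocks=$(stat -c %b "$1")     # %b: number of blocks allocated
    blksize=$(stat -c %B "$1")    # %B: size in bytes of each %b block
    allocated=$((blocks * blksize))
    echo "$1: apparent=$apparent bytes, allocated=$allocated bytes"
}

workdir=$(mktemp -d)
file="$workdir/sparse_file"
truncate -s 512M "$file"

sparse_report "$file"
```

A file whose allocated size is much smaller than its apparent size is sparse.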

Copying a Sparse File with the “cp” Command

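The “cp” command supports sparse files. By default (--sparse=auto) it detects a sparse source file with a simple heuristic and makes the copy sparse as well, so in most cases it suffices to run plain “cp”. The behavior can be controlled with the --sparse=WHEN option. For example, --sparse=always creates a hole in the destination wherever the source contains a long enough run of zero bytes: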

# cp --sparse=always sparse_file sparse_file.2
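The effect of the WHEN values is easier to see on a source that is not sparse. The sketch below assumes GNU coreutils; the file names are hypothetical, and the allocated sizes depend on the file system, so no output is shown.

```shell
#!/bin/sh
# Sketch: compare cp --sparse=never and cp --sparse=always.
# Assumes GNU coreutils; file names are hypothetical.
workdir=$(mktemp -d)
dense="$workdir/dense_zeros"
never="$workdir/copy_never"
always="$workdir/copy_always"

# Write 4 MiB of real zero bytes, so the source is normally not sparse.
dd if=/dev/zero of="$dense" bs=1M count=4 2>/dev/null

cp --sparse=never  "$dense" "$never"    # never create holes in the copy
cp --sparse=always "$dense" "$always"   # turn runs of zeros into holes

# First column: allocated blocks. Size column: apparent size in bytes.
ls -ls "$dense" "$never" "$always"

# The content is identical, only the on-disk allocation can differ.
cmp -s "$dense" "$always" && same=yes || same=no
echo "content identical: $same"
```

On most file systems copy_always shows close to 0 allocated blocks, while dense_zeros and copy_never each occupy about 4 MiB.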