Basic Linux File system tutorial – ext2, ext3, ext4, JFS and XFS

The original Linux system used a simple file system that mimicked the functionality of the Unix file system. In this tutorial, we will discuss the basic file systems used in Linux, from ext through ext4, along with ReiserFS, JFS, and XFS.

The ext file system

The original file system introduced with the Linux operating system is called the extended file system (or just ext for short). It provides a basic Unix-like file system for Linux, using virtual directories to handle physical devices, and storing data in fixed-length blocks on the physical devices.

The ext file system uses a system called inodes to track information about the files stored in the virtual directory. The inode system creates a separate table on each physical device, called the inode table, to store the file information. Each stored file in the virtual directory has an entry in the inode table. The extended part of the name comes from the additional data that it tracks on each file, which consists of:

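  • The file name (strictly speaking, the name is kept in the directory entry that points to the inode, not in the inode itself)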
  • The file size
  • The owner of the file
  • The group the file belongs to
  • Access permissions for the file
  • Pointers to each disk block that contains data from the file

Linux references each inode in the inode table using a unique number (called the inode number), assigned by the file system as data files are created. The file system uses the inode number to identify the file rather than having to use the full file name and path.
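As a rough illustration, most of the fields listed above can be read back from a live system. The sketch below uses Python's standard os.stat call on a Unix-like system; the helper name describe_inode is made up for this example. It also creates a hard link to show that two file names can refer to the same inode number.

```python
import os
import stat
import tempfile


def describe_inode(path):
    """Return a dict of inode metadata for path (hypothetical helper)."""
    info = os.stat(path)
    return {
        "inode": info.st_ino,                        # the inode number
        "size": info.st_size,                        # file size in bytes
        "owner_uid": info.st_uid,                    # numeric owner ID
        "group_gid": info.st_gid,                    # numeric group ID
        "permissions": stat.filemode(info.st_mode),  # e.g. "-rw-r--r--"
        "links": info.st_nlink,                      # names pointing at this inode
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "report.txt")
        with open(original, "w") as handle:
            handle.write("hello")
        alias = os.path.join(tmp, "alias.txt")
        os.link(original, alias)  # second name, same inode
        print(describe_inode(original))
        print(describe_inode(alias))
```

Both printed dictionaries should show the same inode number, since the file system identifies the file by that number rather than by either name.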

The ext2 file system

The original ext file system had quite a few limitations, such as limiting files to only 2GB in size. Not too long after Linux was first introduced, the ext file system was upgraded to create the second extended file system, called ext2. As you can guess, the ext2 file system is an expansion of the basic abilities of the ext file system but maintains the same structure. The ext2 file system expands the inode table format to track additional information about each file on the system.

The ext2 inode table adds the inode change, modification, and last access time values for files to help system administrators track file access on the system. The ext2 file system also increases the maximum file size allowed to 2TB (increased to 32TB in later versions of ext2) to help accommodate large files commonly found in database servers.
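The per-file timestamps can be inspected with Python's standard os.stat call. This is a minimal sketch for a Unix-like system; the helper name file_times and the fixed epoch values are made up for illustration. Note that on Linux st_ctime reports the time of the last inode (metadata) change, not the creation time.

```python
import os
import tempfile
import time


def file_times(path):
    """Return the three classic timestamps kept for path."""
    info = os.stat(path)
    return {
        "accessed": info.st_atime,  # last read
        "modified": info.st_mtime,  # last content change
        "changed": info.st_ctime,   # last inode (metadata) change on Linux
    }


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        handle.write(b"data")
        path = handle.name
    try:
        # Set access and modification times to fixed values (seconds since epoch).
        os.utime(path, (1_000_000_000, 1_100_000_000))
        for name, value in file_times(path).items():
            print(name, time.ctime(value))
    finally:
        os.remove(path)
```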

In addition to expanding the inode table, the ext2 file system also changed the way in which files are stored in the data blocks. A common problem with the ext file system was that as a file is written to the physical device, the blocks used to store the data tend to be scattered throughout the device (called fragmentation). Fragmentation of data blocks can reduce the performance of the file system, as it takes longer to search the storage device to access all of the blocks for a specific file.

The ext2 file system helps reduce fragmentation by allocating disk blocks in groups when you save a file. By grouping the data blocks for a file, the file system doesn’t have to search all over the physical device for the data blocks to read the file.

The ext2 file system was the default file system used in Linux distributions for many years, but it, too, had its limitations. The inode table, while a nice feature that allows the file system to track additional information about files, can cause problems that can be fatal to the system. Each time the file system stores or updates a file, it has to modify the inode table with the new information. The problem is that writing the file data and updating the inode table are two separate steps, and they don’t always complete together.

If something should happen to the computer system between the file being stored and the inode table being updated, the two would become out of sync. The ext2 file system is notorious for easily becoming corrupted due to system crashes and power outages. Even if the file data is stored just fine on the physical device, if the inode table entry wasn’t completed, the ext2 file system wouldn’t even know that the file existed! It wasn’t long before developers were exploring a different avenue of Linux file systems.

Journaling file systems

Journaling file systems provide a new level of safety to the Linux system. Instead of writing data directly to the storage device and then updating the inode table, journaling file systems write file changes into a temporary file (called the journal) first. After data is successfully written to the storage device and the inode table, the journal entry is deleted.

If the system should crash or suffer a power outage before the data can be written to the storage device, the journaling file system just reads through the journal file and processes any uncommitted data left over. There are three different methods of journaling commonly used in Linux, each with different levels of protection. These are shown below in the table.

Journaling file system methods:

Method          Description
Data mode       Both inode and file data are journaled. Low risk of losing data, but poor performance.
Ordered mode    Only inode data written to the journal, but not removed until file data is successfully written. Good compromise between performance and safety.
Writeback mode  Only inode data written to the journal, no control over when the file data is written. Higher risk of losing data, but still better than not using journaling.
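The journal-first sequence described above can be sketched as a toy model. This is only an illustration of the idea in its data mode form (the full change goes into the journal), not how any real file system is implemented. The class name ToyJournal and the crash_before_commit flag are invented for the example.

```python
class ToyJournal:
    """Toy model of data-mode journaling; not real file system code."""

    def __init__(self):
        self.disk = {}     # "final" storage: file name -> contents
        self.journal = []  # pending changes that survive a crash

    def write(self, name, data, crash_before_commit=False):
        # Step 1: record the intended change in the journal first.
        self.journal.append((name, data))
        if crash_before_commit:
            return False   # simulate power loss: disk was never updated
        # Step 2: apply the change to the real storage.
        self.disk[name] = data
        # Step 3: the change is safe, so drop the journal entry.
        self.journal.remove((name, data))
        return True

    def recover(self):
        """Replay every leftover journal entry, then clear the journal."""
        replayed = len(self.journal)
        for name, data in self.journal:
            self.disk[name] = data
        self.journal.clear()
        return replayed


if __name__ == "__main__":
    fs = ToyJournal()
    fs.write("a.txt", "first")
    fs.write("b.txt", "second", crash_before_commit=True)
    print(fs.disk)       # b.txt is missing after the simulated crash
    print(fs.recover())  # number of journal entries replayed
    print(fs.disk)       # b.txt is present after recovery
```

In ordered and writeback modes only the metadata would go through the journal, which is why they are faster but protect less.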

Limitation

The data mode journaling method is by far the safest for protecting data, but it is also the slowest. All of the data written to a storage device must be written twice, once to the journal, then again to the actual storage device. This can cause poor performance, especially for systems that do a lot of data writing. Over the years, a few different journaling file systems have appeared in Linux. The following sections describe the popular Linux journaling file systems available.

The extended Linux journaling filesystems

The same group that developed the ext and ext2 file systems as part of the Linux project also created journaling versions of the file systems. These journaling file systems are compatible with the ext2 file system, and it’s easy to convert back and forth between them. There are currently two separate journaling file systems based on the ext2 file system.

The ext3 file system

The ext3 file system was added to the Linux kernel in 2001, and up until recently was the default file system used by just about all Linux distributions. It uses the same inode table structure as the ext2 file system, but adds a journal file to each storage device to journal the data written to the storage device.

By default, the ext3 file system uses the ordered mode method of journaling, only writing the inode information to the journal file, but not removing it until the data blocks have been successfully written to the storage device. You can change the journaling method used in the ext3 file system to either data or writeback mode with a mount option (data=journal or data=writeback) when mounting the file system.
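On a running Linux system, the journaling mode appears as the data= mount option (data=journal, data=ordered, or data=writeback). The sketch below parses lines in the /proc/mounts format to pull that option out. The helper name journal_mode is hypothetical, and a missing data= entry is treated as "default" on the assumption that the kernel may not list default options.

```python
def journal_mode(mounts_line):
    """Return (mount_point, fs_type, data_mode) from one /proc/mounts line.

    data_mode is None when no data= option is listed, which usually means
    the file system is using its default journaling mode.
    """
    fields = mounts_line.split()
    if len(fields) < 4:
        return None, None, None  # not a well-formed mounts line
    mount_point, fs_type, options = fields[1], fields[2], fields[3]
    mode = None
    for option in options.split(","):
        if option.startswith("data="):
            mode = option[len("data="):]
    return mount_point, fs_type, mode


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as handle:
            for line in handle:
                mount_point, fs_type, mode = journal_mode(line)
                if fs_type in ("ext3", "ext4"):
                    print(mount_point, fs_type, mode or "(default)")
    except FileNotFoundError:
        print("/proc/mounts is not available on this system")
```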

While the ext3 file system added basic journaling to the Linux file system, there were still a few things it lacked. For example, the ext3 file system doesn’t provide any recovery from accidental deletion of files, there’s no built-in data compression available (although there is a patch that can be installed separately that provides this feature), and the ext3 file system doesn’t support encrypting files. For those reasons, developers in the Linux project chose to continue work on improving the ext3 file system.

The ext4 file system

The result of expanding the ext3 file system was (as you probably guessed) the ext4 file system. The ext4 file system was officially supported in the Linux kernel in 2008, and is now the default file system used in most popular Linux distributions, such as Fedora and Ubuntu.

In addition to adding support for file encryption (transparent compression is still not part of mainline ext4), the ext4 file system supports a feature called extents. An extent describes a contiguous run of blocks on a storage device, so the inode only needs to store the starting block location and the length of the run. This helps save space in the inode table by not having to list all of the data blocks used to store data from the file.
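To see why extents save space, consider a toy model in which each extent is a starting block plus a length. The function below collapses a plain list of block numbers into such pairs. The (start, length) tuple layout and the name blocks_to_extents are assumptions for illustration, not the ext4 on-disk format.

```python
def blocks_to_extents(blocks):
    """Collapse a list of block numbers into (start, length) extents."""
    extents = []
    for block in blocks:
        if extents and block == extents[-1][0] + extents[-1][1]:
            start, length = extents[-1]
            extents[-1] = (start, length + 1)  # extend the current run
        else:
            extents.append((block, 1))         # start a new run
    return extents


if __name__ == "__main__":
    contiguous = list(range(100, 108))
    fragmented = [100, 101, 102, 250, 251, 400]
    print(blocks_to_extents(contiguous))   # one extent covers all 8 blocks
    print(blocks_to_extents(fragmented))   # one extent per fragment
```

A file stored in one contiguous run needs a single entry however large it is, while a fragmented file needs one entry per fragment.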

The ext4 file system also incorporates block pre-allocation. If you want to reserve space on a storage device for a file that you know will grow in size, with the ext4 file system it’s possible to allocate all of the expected blocks for the file, not just the blocks that currently hold data. The ext4 file system marks the reserved data blocks so that they read back as zeroes, and knows not to allocate them for any other file.
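From a program, pre-allocation is typically requested through the POSIX posix_fallocate call, which Python exposes as os.posix_fallocate on Unix-like systems. The sketch below reserves space for a new file; the helper name preallocate and the file name database.img are made up. On file systems without native support, the C library may fall back to writing zeroes instead.

```python
import os
import tempfile


def preallocate(path, size_bytes):
    """Reserve size_bytes for path using posix_fallocate (Unix only)."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.posix_fallocate(fd, 0, size_bytes)  # offset 0, length size_bytes
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "database.img")
        preallocate(target, 1024 * 1024)       # reserve 1 MiB
        info = os.stat(target)
        print("apparent size:", info.st_size)
        print("512-byte blocks allocated:", info.st_blocks)
```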

The Reiser file system

In 2001, Hans Reiser created the first journaling file system for Linux, called ReiserFS. The ReiserFS file system only supports writeback journaling mode, writing only the inode table data to the journal file. Because it writes only the inode table data to the journal, the ReiserFS file system is one of the fastest journaling file systems in Linux.

Two interesting features incorporated into the ReiserFS file system are that you can resize an existing file system while it’s still active, and that it uses a technique called tailpacking, which stuffs data from one file into empty space in a data block from another file. The active file system resizing feature is great if you have to expand an already created file system to accommodate more data.
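The space that tailpacking tries to reclaim is easy to estimate. When every file receives whole blocks, the last block of each file is usually only partly used. The arithmetic below totals that unused tail space, assuming a 4096-byte block size; both the block size and the function names are chosen for illustration.

```python
def slack_bytes(file_size, block_size=4096):
    """Bytes left unused in the last block when a file gets whole blocks."""
    remainder = file_size % block_size
    return 0 if remainder == 0 else block_size - remainder


def total_slack(file_sizes, block_size=4096):
    """Total unused bytes across many files stored without tailpacking."""
    return sum(slack_bytes(size, block_size) for size in file_sizes)


if __name__ == "__main__":
    sizes = [100, 5000, 8192]   # file sizes in bytes
    print(total_slack(sizes))   # 3996 + 3192 + 0 = 7188
```

The effect is largest on systems holding many small files, where most of each file's final block would otherwise sit empty.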

The journaled file system (JFS)

Possibly one of the oldest journaling file systems around, the Journaled File System (JFS) was developed by IBM in 1990 for its AIX flavor of Unix. However, it wasn’t until its second version that it was ported to the Linux environment.

Note – The official IBM name of the second version of the JFS file system is JFS2, but most Linux systems refer to it as just JFS.

The JFS file system uses the ordered journaling method, storing only the inode table data in the journal, but not removing it until the actual file data is written to the storage device. This method is a compromise between the speed of the ReiserFS and the integrity of the data mode journaling method.

The JFS file system uses extent-based file allocation, allocating a group of blocks for each file written to the storage device. This method provides for less fragmentation on the storage device. Outside of the IBM Linux offerings, the JFS file system isn’t popularly used, but you may run into it in your Linux journey.

The XFS file system

The XFS journaling file system is yet another file system originally created for a commercial Unix system that made its way into the Linux world. Silicon Graphics Incorporated (SGI) originally created XFS in 1994 for its commercial IRIX Unix system. It was released to the Linux environment for common use in 2002.

The XFS file system uses the writeback mode of journaling, which provides high performance but introduces some risk because the actual data isn’t stored in the journal file. The XFS file system also allows online resizing of the file system, similar to the ReiserFS file system, except XFS file systems can only be expanded and not shrunk.
