comm: Select or reject lines common to two files. Both files must be sorted

The “comm” command is a Unix utility that allows you to compare two sorted files and select or reject lines that are common or unique between them. It is primarily used to find the similarities or differences between two sets of data.

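To use the “comm” command effectively, both input files must be sorted in the same collating order, which is the order of the current locale (the order “sort” produces when run in the same environment). If the files are not sorted, lines that are in fact common can be reported as unique, and GNU “comm” prints a warning that the input is not in sorted order. Sort both inputs before using the command.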
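As a minimal sketch of the sorting requirement, the following sorts two unsorted inputs under one locale before comparing them. The file names (raw1, raw2, sorted1, sorted2) and their contents are made up for illustration, and pinning LC_ALL=C is just one way to keep “sort” and “comm” in agreement.

```shell
# Hypothetical sample data in a scratch directory (names are placeholders).
dir=$(mktemp -d)
printf 'cherry\napple\nbanana\n' > "$dir/raw1"
printf 'date\ncherry\nbanana\n' > "$dir/raw2"

# Sort both inputs under the same locale that comm will run in.
LC_ALL=C sort "$dir/raw1" > "$dir/sorted1"
LC_ALL=C sort "$dir/raw2" > "$dir/sorted2"

# Keep only the lines common to both sorted files.
common=$(LC_ALL=C comm -12 "$dir/sorted1" "$dir/sorted2")
printf '%s\n' "$common"
```

With this sample data, the common lines are banana and cherry.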

When you run the “comm” command, it compares the lines from the two input files and categorizes them into three columns: lines unique to the first file, lines unique to the second file, and lines common to both files.

The output of the “comm” command consists of three tab-separated columns, distinguished by their indentation:

  • Lines only in the first file: These lines appear in the first column, with no leading tab.
  • Lines only in the second file: These lines appear in the second column, preceded by one tab character.
  • Lines common to both files: These lines appear in the third column, preceded by two tab characters.
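The column layout can be seen with a small worked example. This is only a sketch: the two files and their contents are invented for illustration.

```shell
# Hypothetical sample data; both files are already sorted.
dir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$dir/file1"
printf 'banana\ncherry\ndate\n' > "$dir/file2"

# Column 1 has no indent, column 2 has one tab, column 3 has two tabs.
out=$(LC_ALL=C comm "$dir/file1" "$dir/file2")
printf '%s\n' "$out"
```

Here apple is printed with no indentation (only in file1), banana and cherry are printed after two tabs (common to both), and date is printed after one tab (only in file2).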

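The “comm” command provides options to modify its behavior. The -1, -2, and -3 options suppress the first, second, and third columns respectively, and they can be combined (for example, -12 leaves only the common lines). When a column is suppressed, the tab that would have indented the following columns is dropped as well. The GNU implementation additionally offers --output-delimiter to replace the tab that separates columns, and --check-order and --nocheck-order to control whether the sort order of the input is checked.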
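The column-suppression options can be sketched as follows, again with invented sample files. The --output-delimiter part assumes GNU coreutils, so it is guarded and skipped where the option is not available.

```shell
# Hypothetical sample data; both files are already sorted.
dir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$dir/file1"
printf 'banana\ncherry\ndate\n' > "$dir/file2"

# Suppress columns 2 and 3: lines only in file1.
only_first=$(LC_ALL=C comm -23 "$dir/file1" "$dir/file2")
# Suppress columns 1 and 3: lines only in file2.
only_second=$(LC_ALL=C comm -13 "$dir/file1" "$dir/file2")
# Suppress columns 1 and 2: lines common to both files.
common=$(LC_ALL=C comm -12 "$dir/file1" "$dir/file2")

printf 'only in file1: %s\n' "$only_first"
printf 'only in file2: %s\n' "$only_second"
printf 'common:\n%s\n' "$common"

# GNU-only extension (assumption about the platform): use '|' instead of a tab.
if comm --output-delimiter='|' /dev/null /dev/null >/dev/null 2>&1; then
    LC_ALL=C comm --output-delimiter='|' "$dir/file1" "$dir/file2"
fi
```

Because the suppressed columns take their tabs with them, each of the three filtered results is printed flush left and can be piped straight into other tools.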

The primary use of the “comm” command is to identify differences and similarities between two sorted files. It is commonly employed in tasks such as comparing sorted lists, finding changes between different versions of files, or identifying overlaps between datasets.

comm Command Examples

1. Produce three tab-separated columns: lines only in first file, lines only in second file and common lines:

# comm file1 file2

2. Print only lines common to both files:

# comm -12 file1 file2

3. Print only lines common to both files, reading one file from stdin:

# cat file1 | comm -12 - file2

4. Get lines only found in first file, saving the result to a third file:

# comm -23 file1 file2 > file1_only

5. Print lines only found in second file, when the files aren’t sorted:

# comm -13 <(sort file1) <(sort file2)

Summary

The “comm” command is a Unix utility that compares two sorted files line by line and sorts each line into one of three columns: lines unique to the first file, lines unique to the second file, and lines common to both files. Combined with the -1, -2, and -3 options, it offers a quick way to extract the differences or the overlap between two sets of sorted data.
