Today I Learned

hashrocket A Hashrocket project

Use comm to Compare Files

Today I learned about comm, which is used to select the common lines in two files. It's pretty neat, but has a strange output format.

Say we have two text files:

# first.txt
one
two
three

# second.txt
one
three
four

We can run the below command to find the common lines. In the output, the first column is what's only in first.txt, the second column is what's in second.txt, and the third column is what's common.

$ comm first.txt second.txt                                                                                         
                one
        three
        four
two
three

Hmm, that doesn't look right - one and three are common, not just one. The caveat with comm is that the files need to be sorted lexically. You can sort easily in bash with bird beak notation for process substitution.

$ comm <(sort first.txt) <(sort second.txt)
        four
                one
                three
two

If we only want the common lines, we can apply the flags -12 to hide the first and second columns:

$ comm -12 <(sort first.txt) <(sort second.txt)
one
three
See More #command-line TILs