Outputting the common lines of two files, plus the lines unique to each, into one output file

files, text-processing

I have two text files. Let's call them file1.txt and file2.txt.

file1.txt is as follows:

chr10   181144  225933
chr10   181243  225933
chr10   181500  225933
chr10   226069  255828
chr10   255989  267134
chr10   255989  282777
chr10   267297  282777
chr10   282856  283524
chr10   283618  285377
chr10   285466  285995

file2.txt is as follows:

chr10   181144  225933
chr10   181243  225933
chr10   181500  225933
chr10   255989  282777
chr10   267297  282777
chr10   282856  283524
chr10   375542  387138
chr10   386930  387138
chr10   387270  390748
chr10   390859  390938
chr10   391051  394580
chr10   394703  395270

What I want to output in a single file is:

  1. All the common lines between file1 and file2
  2. All the lines which are in file1 but not in file2
  3. All the lines which are in file2 but not in file1.

I wrote a Perl script to do this, but I am pretty sure there must be a command-line tool or an easier way to do it.

Best Answer

Lines common to both files:

comm -12 file1.txt file2.txt > results.txt

Add lines unique to file1.txt:

comm -23 file1.txt file2.txt >> results.txt

Add lines unique to file2.txt:

comm -13 file1.txt file2.txt >> results.txt
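Incidentally, the three groups together cover every line that appears in either file, so the concatenated result is just the union of the two files. If you don't need the output grouped into common/unique sections, a single command produces the same set of lines. A minimal sketch, using hypothetical sample data in place of the real files:

```shell
# Hypothetical sample data standing in for file1.txt and file2.txt
printf 'a\nb\nc\n' > file1.txt
printf 'b\nc\nd\n' > file2.txt

# Union of both files: common lines plus lines unique to each,
# each printed once, in sorted order
sort -u file1.txt file2.txt > results.txt
```

Note that, unlike the three `comm` commands, this emits one sorted sequence rather than keeping the common and unique lines grouped separately.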

If the files are not already sorted (comm requires sorted input), sort them first. For example, if your shell supports process substitution:

comm -12 <(sort file1.txt) <(sort file2.txt)

etc.
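Putting the pieces together for unsorted input, the whole job can be sketched as follows (the sample data is hypothetical, and the `.sorted` file names are just illustrative temporaries):

```shell
# Hypothetical unsorted input files
printf 'chr10 2\nchr10 1\n' > file1.txt
printf 'chr10 1\nchr10 3\n' > file2.txt

# comm requires sorted input, so sort each file once up front
sort file1.txt > file1.sorted
sort file2.txt > file2.sorted

comm -12 file1.sorted file2.sorted  > results.txt   # lines common to both
comm -23 file1.sorted file2.sorted >> results.txt   # lines only in file1
comm -13 file1.sorted file2.sorted >> results.txt   # lines only in file2
```

Sorting into temporary files once, rather than using process substitution in each command, avoids re-sorting the inputs three times.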
