Ubuntu – How to delete words from a txt file that exist in another txt file

text processing

File a.txt has about 100k words, each word on its own line:

july.cpp
windows.exe
ttm.rar
document.zip

File b.txt has 150k words, one word per line – some words are from file a.txt, but some words are new:

july.cpp
NOVEMBER.txt
windows.exe
ttm.rar
document.zip
diary.txt

How can I merge these files into one, delete all duplicate lines, and keep only the lines that are new (lines that exist in a.txt but don't exist in b.txt, and vice versa)?

Best Answer

There is a command that does this: comm. As stated in man comm, it is quite simple:

   comm -3 file1 file2
          Print lines in file1 not in file2, and vice versa.

Note that comm expects the contents of both files to be sorted, so you must sort them before calling comm on them, like this:

sort unsorted-file.txt > sorted-file.txt
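If it is unclear whether a file is already sorted, sort -c can check it without producing output. The following is only a minimal sketch: the scratch directory and the variable names (workdir, before, after) are made up for illustration, and LC_ALL=C is an assumption chosen so that every tool uses the same ordering.

```shell
# Hypothetical scratch directory so no real files are touched.
workdir=$(mktemp -d)

# Sample input from the question, in its original (unsorted) order.
printf '%s\n' july.cpp windows.exe ttm.rar document.zip > "$workdir/a.txt"

# Assumption: pin the collation so every tool uses the same ordering.
export LC_ALL=C

# sort -c only checks the order; it exits non-zero if the input is unsorted.
if sort -c "$workdir/a.txt" 2>/dev/null; then before=sorted; else before=unsorted; fi

# Write a sorted copy, as in the answer.
sort "$workdir/a.txt" > "$workdir/as.txt"

if sort -c "$workdir/as.txt" 2>/dev/null; then after=sorted; else after=unsorted; fi

echo "a.txt: $before, as.txt: $after"
```

Running sort and comm under the same locale setting matters, because comm compares lines using the current collating order.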

So to sum up:

sort a.txt > as.txt

sort b.txt > bs.txt

comm -3 as.txt bs.txt > result.txt

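After running the commands above, result.txt will contain the expected lines. Lines that occur only in bs.txt are indented with a leading tab, because comm prints them in a second column.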
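The whole procedure can be tried end to end on the sample data from the question. This is a sketch under a few assumptions that are not part of the answer: the scratch directory (workdir) is hypothetical, LC_ALL=C is chosen so sort and comm agree on the ordering, and tr -d '\t' is added to strip the tab that comm puts in front of lines unique to the second file, which is only safe if the words themselves contain no tabs.

```shell
# Hypothetical scratch directory so no real files are overwritten.
workdir=$(mktemp -d)

# Sample data taken from the question.
printf '%s\n' july.cpp windows.exe ttm.rar document.zip > "$workdir/a.txt"
printf '%s\n' july.cpp NOVEMBER.txt windows.exe ttm.rar document.zip diary.txt > "$workdir/b.txt"

# Assumption: use the same collation for sort and comm.
export LC_ALL=C

# comm needs sorted input.
sort "$workdir/a.txt" > "$workdir/as.txt"
sort "$workdir/b.txt" > "$workdir/bs.txt"

# -3 suppresses the column of lines common to both files.
# Lines unique to the second file come out prefixed with a tab,
# so tr removes it (assumes the words contain no tabs).
comm -3 "$workdir/as.txt" "$workdir/bs.txt" | tr -d '\t' > "$workdir/result.txt"

cat "$workdir/result.txt"
```

With this sample, only NOVEMBER.txt and diary.txt remain, since every word in a.txt also appears in b.txt.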