Bash – Compare two URL lists and print newly added URLs to a new file

bash, command line, diff(), scripting

I am producing two files, each containing a list of URLs; I will refer to them as old and new. I would like to compare the two files, and if there are any URLs in the new file that are not in the old file, I would like them written to an extra_urls file.

Now, I've read a bit about the diff command, but from what I can tell it also takes the order of the lines into account. I don't want the order to have any effect on the output. I just want the extra URLs in new printed to the extra_urls file, no matter what order they appear in within either of the other two files.

How can I do this?

Best Answer

You can use the comm command to compare two files and selectively show the lines unique to one file, the lines unique to the other, or the lines common to both. It requires its inputs to be sorted, but you can sort them on the fly using process substitution.

comm -13 <(sort old.txt) <(sort new.txt)
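To illustrate, here is a small self-contained run with made-up sample data (the URLs and file names are just placeholders); note the `> extra_urls` redirection, which writes the result to the file the question asks for:

```shell
# Hypothetical sample data: old.txt has two URLs, new.txt has those plus one more.
printf '%s\n' 'https://example.com/a' 'https://example.com/b' > old.txt
printf '%s\n' 'https://example.com/b' 'https://example.com/c' 'https://example.com/a' > new.txt

# comm prints three columns: lines only in file 1, only in file 2, and common.
# -1 suppresses the first column and -3 the third, leaving only the lines
# unique to new.txt. Sorting on the fly means the original order is irrelevant.
comm -13 <(sort old.txt) <(sort new.txt) > extra_urls

cat extra_urls   # prints https://example.com/c
```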

If you're using a version of bash that doesn't support process substitution, it can be emulated using named pipes. An example is shown in Wikipedia.
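A minimal sketch of that emulation with mkfifo, assuming the same old.txt/new.txt files as above (the pipe names old_sorted and new_sorted are arbitrary):

```shell
# Hypothetical sample data, as before.
printf '%s\n' 'https://example.com/a' 'https://example.com/b' > old.txt
printf '%s\n' 'https://example.com/b' 'https://example.com/c' 'https://example.com/a' > new.txt

# Create two named pipes to stand in for the process substitutions.
mkfifo old_sorted new_sorted

# Feed each pipe from a background sort; comm then reads from both pipes,
# just as it would read from <(sort ...).
sort old.txt > old_sorted &
sort new.txt > new_sorted &
comm -13 old_sorted new_sorted > extra_urls

# Remove the pipes once comm has finished.
rm old_sorted new_sorted
```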

Related Question