I am initially producing two files which contain lists of URLs; I will refer to them as old and new. I would like to compare the two files and, if there are any URLs in the new file which are not in the old file, I would like these to be displayed in an extra_urls file.
Now, I've read some stuff about using the diff command but, from what I can tell, this also analyses the order of the information. I don't want the order to have any effect on the output. I just want the extra URLs in new printed to the extra_urls file, no matter what order they are placed in either of the other two files.
How can I do this?
Best Answer
You can use the comm command to compare two files, and selectively show lines unique to one or the other, or the lines in common. It requires the inputs to be sorted, but you can sort them on the fly by using process substitution. If you're using a version of bash that doesn't support process substitution, it can be emulated using named pipes. An example is shown in Wikipedia.
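
As a rough sketch of the approach described above, something like the following should work in bash. The file names old, new and extra_urls are taken from the question; the example.com URLs and the temporary directory are made up purely so the snippet can be run on its own.

```shell
#!/usr/bin/env bash
# Illustration only: build two small sample files in a scratch directory.
# The URLs below are hypothetical placeholders, not from the question.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'http://example.com/a' 'http://example.com/b' > old
printf '%s\n' 'http://example.com/c' 'http://example.com/b' \
              'http://example.com/a' 'http://example.com/d' > new

# comm expects sorted input, so each file is sorted on the fly
# with process substitution: <(sort file).
#   -1 suppresses lines found only in the first input (old)
#   -3 suppresses lines found in both inputs
# What remains are the lines found only in the second input (new).
comm -13 <(sort old) <(sort new) > extra_urls

cat extra_urls
```

With real data, only the comm line is needed. If either file may contain the same URL more than once, sorting with sort -u instead of sort removes the duplicates before the comparison.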