Diff reports the same line as different in 2 files

I have two files, hdsongs.txt and sdsongs.txt, each containing a list of songs.

I wrote a simple script to list all songs and output to text files, to then run a diff against.
It works fine for the most part, but the actual diff command in the script is showing the same line as being different. This is actually happening for multiple lines, but not all.

Here is an example of a song in both files:

$ grep Apologize *songs*
hdsongs.txt:Timbaland/Apologize.mp3
sdsongs.txt:Timbaland/Apologize.mp3

There is no trailing special character that I can see:

$ cat -A hdsongs.txt sdsongs.txt | grep Apologize
Timbaland/Apologize.mp3$
Timbaland/Apologize.mp3$

When I run diff, it reports that line as differing between the two files, but aren't the lines the same?

$ diff hdsongs.txt sdsongs.txt | grep Apologize
> Timbaland/Apologize.mp3
< Timbaland/Apologize.mp3

This is similar to the thread here:
diff reports two files differ, although they are the same!

but this is for lines within the file, not the whole file, and the resolution there doesn't seem to fit in this case.

$ diff <(cat -A phonesongsonly.txt) <(cat -A passportsongsonly.txt) | grep Apologize
< Timbaland/Apologize.mp3$
> Timbaland/Apologize.mp3$

$ wdiff -w "$(tput bold;tput setaf 1)" -x "$(tput sgr0)" -y "$(tput bold;tput setaf 2)" -z "$(tput sgr0)" hdsongs.txt sdsongs.txt | grep Apologize
>Timbaland/Apologize.mp3
>Timbaland/Apologize.mp3

Does anyone know why diff would report the same line twice like this?

Best Answer

My guess is you simply haven't sorted the files. That's one of the behaviors you can get on unsorted input:

$ cat file1 
foo
bar
$ cat file2
bar
foo
$ diff file1 file2
1d0
< foo
2a2
> foo

But, if you sort:

$ diff <(sort file1) <(sort file2)
$ 

The diff program's job is to tell you whether two files are identical and, if not, where they differ. It compares the files as ordered sequences of lines, not as unordered collections. If the same line sits at a different position in each file, diff reports it as removed from one place (<) and added at another (>). So even when both files contain exactly the same lines, they are reported as different if those lines are arranged in a different order.
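If the goal is only to find songs that exist in one list but not the other, regardless of order, comm on sorted copies may be a better fit than diff. The following is a minimal sketch, not the asker's actual script: the temporary file names and the two "ArtistA"/"ArtistB" entries are made-up sample data standing in for hdsongs.txt and sdsongs.txt.

```shell
#!/bin/sh
# Hypothetical sample data: the same song appears in both lists,
# but at different positions, plus one song unique to each list.
tmpdir=$(mktemp -d)
printf '%s\n' 'Timbaland/Apologize.mp3' 'ArtistA/OnlyOnHd.mp3' > "$tmpdir/hd.txt"
printf '%s\n' 'ArtistB/OnlyOnSd.mp3' 'Timbaland/Apologize.mp3' > "$tmpdir/sd.txt"

# comm expects sorted input, so sort each list first.
# LC_ALL=C keeps sort and comm using the same byte-wise ordering.
LC_ALL=C sort "$tmpdir/hd.txt" > "$tmpdir/hd.sorted"
LC_ALL=C sort "$tmpdir/sd.txt" > "$tmpdir/sd.sorted"

# -23 suppresses columns 2 and 3, leaving lines only in the first file.
# -13 suppresses columns 1 and 3, leaving lines only in the second file.
only_hd=$(LC_ALL=C comm -23 "$tmpdir/hd.sorted" "$tmpdir/sd.sorted")
only_sd=$(LC_ALL=C comm -13 "$tmpdir/hd.sorted" "$tmpdir/sd.sorted")

printf 'only in hd list: %s\n' "$only_hd"
printf 'only in sd list: %s\n' "$only_sd"

rm -rf "$tmpdir"
```

The shared line (Timbaland/Apologize.mp3) is not listed in either result, because after sorting it is recognized as common to both files.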