I have two files containing a list of songs: hdsongs.txt and sdsongs.txt.
I wrote a simple script that lists all the songs and writes them to these text files, so that I can run diff against them.
It mostly works, but the diff command in the script reports some lines as different even though they are identical in both files. This happens for several lines, but not all of them.
Here is an example of a song in both files:
$ grep Apologize *songs*
hdsongs.txt:Timbaland/Apologize.mp3
sdsongs.txt:Timbaland/Apologize.mp3
There is no trailing special character that I can see:
$ cat -A hdsongs.txt sdsongs.txt | grep Apologize
Timbaland/Apologize.mp3$
Timbaland/Apologize.mp3$
When I run diff, it lists the same line under both files. Aren't the lines the same?
$ diff hdsongs.txt sdsongs.txt | grep Apologize
> Timbaland/Apologize.mp3
< Timbaland/Apologize.mp3
This is similar to the thread "diff reports two files differ, although they are the same!", but that one is about whole files rather than individual lines within them, and the resolution there doesn't seem to fit this case.
$ diff <(cat -A phonesongsonly.txt) <(cat -A passportsongsonly.txt) | grep Apologize
< Timbaland/Apologize.mp3$
> Timbaland/Apologize.mp3$
$ wdiff -w "$(tput bold;tput setaf 1)" -x "$(tput sgr0)" -y "$(tput bold;tput setaf 2)" -z "$(tput sgr0)" hdsongs.txt sdsongs.txt | grep Apologize
>Timbaland/Apologize.mp3
>Timbaland/Apologize.mp3
Does anyone know why diff would report the same line twice like this?
Best Answer
My guess is that you simply haven't sorted the files. Unsorted input can produce exactly this behavior, and sorting both files before running diff makes it go away.
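A minimal reproduction of that behavior, assuming two small made-up lists that hold the same three songs in a different order (only the Apologize entry comes from the question; the other song names are invented for illustration):

```shell
#!/bin/sh
# Sketch: reproduce the "same line reported twice" behaviour with two
# made-up song lists holding the same lines in a different order.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' 'Timbaland/Apologize.mp3' 'Adele/Hello.mp3' 'Coldplay/Yellow.mp3' > "$tmp/hdsongs.txt"
printf '%s\n' 'Adele/Hello.mp3' 'Coldplay/Yellow.mp3' 'Timbaland/Apologize.mp3' > "$tmp/sdsongs.txt"

# Unsorted: Hello and Yellow are in the same relative order in both files,
# so diff keeps them as the common part and reports Apologize as removed
# from one position (<) and added at another (>).
diff "$tmp/hdsongs.txt" "$tmp/sdsongs.txt"

# Sorted copies are byte-for-byte identical, so diff prints nothing
# and exits with status 0.
sort "$tmp/hdsongs.txt" > "$tmp/hd.sorted"
sort "$tmp/sdsongs.txt" > "$tmp/sd.sorted"
diff "$tmp/hd.sorted" "$tmp/sd.sorted" && echo 'no differences after sorting'
```

The first diff shows the Apologize line once with < and once with >, just like in the question, while the comparison of the sorted copies reports nothing.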
The diff program's job is to tell you whether two files are identical and, if not, where they differ. It compares the files in order and is not designed to find matching lines that appear at different positions. If a line has moved relative to its neighbours, diff reports it as removed from one place (<) and added at another (>). It doesn't matter whether the files contain exactly the same information: if that information is organized differently, the files are reported as different.
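Applied to the question's workflow, here is a minimal sketch of a listing script that sorts before comparing. Since the original script isn't shown, this assumes the songs are .mp3 files under two directories and that each list should hold paths relative to its directory (like Timbaland/Apologize.mp3); the function names list_songs and compare_song_dirs are hypothetical.

```shell
#!/bin/sh
# Sketch with hypothetical helper names: build sorted song lists from
# two directories and diff them.

# Print .mp3 paths relative to directory $1 (e.g. Timbaland/Apologize.mp3),
# sorted with a fixed collation (LC_ALL=C) so both lists use the same order.
list_songs() {
    (cd "$1" && find . -type f -name '*.mp3' | sed 's|^\./||' | LC_ALL=C sort)
}

# Write hdsongs.txt and sdsongs.txt into directory $3, then diff them.
# diff exits 0 when the lists match and 1 when they differ.
compare_song_dirs() {
    list_songs "$1" > "$3/hdsongs.txt"
    list_songs "$2" > "$3/sdsongs.txt"
    diff "$3/hdsongs.txt" "$3/sdsongs.txt"
}
```

For example, `compare_song_dirs /path/to/hd /path/to/sd .` writes both lists to the current directory and prints only the songs missing from one side. Setting LC_ALL=C for sort keeps the ordering identical for both lists regardless of the user's locale settings.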