Diff ignoring eol and whitespace


I would like to diff two files so that line endings and white space are ignored. That is, I would like diff to report no difference between d1.txt and d2.txt:

$ cat d1.txt                                                                    
test1                                                                           

test2                                                                           

test3                                                                           

 test4                                                                          
$ cat d2.txt                                                                    
test1test2test3test4               

For some reason,

diff -d -w -a --strip-trailing-cr d1.txt d2.txt

does not do the job. Any help is appreciated.

Best Answer

diff compares lines, see man diff:

diff - compare files line by line

Ignoring white space means that foo bar will match foobar only if both are on the same line. Since the text in d1.txt spans multiple lines while d2.txt has it all on one line, the files will always differ. I haven't actually read the source code, but I guess diff works something like this:

for each line number X in file1:
    line1 = line X from file1
    line2 = line X from file2
    if line1 is equal to line2 then do something
    else do something else
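
The same-line versus split-line behaviour is easy to check. Here is a minimal sketch, assuming a diff that supports -w (GNU diff does); the file names a.txt, b.txt and c.txt are made up for the illustration and live in a temporary directory:

```shell
# Work in a throwaway directory so no real files are touched.
tmp=$(mktemp -d)
printf 'foo bar\n'  > "$tmp/a.txt"   # one line, with a space
printf 'foobar\n'   > "$tmp/b.txt"   # one line, no space
printf 'foo\nbar\n' > "$tmp/c.txt"   # same characters split over two lines

# Same line, different spacing: -w makes diff treat these as equal.
if diff -w "$tmp/a.txt" "$tmp/b.txt" > /dev/null; then
    same_line="equal"
else
    same_line="different"
fi

# Text split across lines: -w does not help, diff still reports a difference.
if diff -w "$tmp/b.txt" "$tmp/c.txt" > /dev/null; then
    split_line="equal"
else
    split_line="different"
fi

echo "same line: $same_line"
echo "split lines: $split_line"
rm -rf "$tmp"
```

This should print "same line: equal" followed by "split lines: different", which is the situation d1.txt and d2.txt are in.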

The first line of d1.txt is not the same as the first line of d2.txt, so a difference is reported. If you really want to check that the files contain exactly the same non-whitespace characters, you could try something like this:

diff <(perl -ne 's/\s+//g; print' d1.txt) <(perl -ne 's/\s+//g; print' d2.txt)
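
If Perl is not available, or the shell lacks the <(...) process substitution used above, a similar check can be done with tr. This is only a sketch of the same idea, not a drop-in replacement: strip_ws is a hypothetical helper name, the sample files are recreated in a temporary directory, and the result is a yes/no answer rather than a diff listing.

```shell
# Hypothetical helper: print a file's contents with all whitespace
# (spaces, tabs, newlines, carriage returns) removed.
strip_ws() {
    tr -d '[:space:]' < "$1"
}

# Recreate the two sample files from the question in a throwaway directory.
tmp=$(mktemp -d)
printf 'test1\n\ntest2\n\ntest3\n\n test4\n' > "$tmp/d1.txt"
printf 'test1test2test3test4\n' > "$tmp/d2.txt"

# Compare the whitespace-free contents as plain strings.
if [ "$(strip_ws "$tmp/d1.txt")" = "$(strip_ws "$tmp/d2.txt")" ]; then
    result="same"
else
    result="differ"
fi

echo "$result"
rm -rf "$tmp"
```

For the two files in the question this should print "same". Because the comparison happens on strings held in shell variables, it is best suited to small text files.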