I have two log files with thousands of lines. After pre-processing, only some lines differ. These remaining lines are either real differences or shuffled groups of lines.
Unified diffs let me see the detailed differences, but they make manual comparison by eye hard. Side-by-side diffs seem more useful for comparison, but they also add thousands of unchanged lines. Is there a way to get the best of both worlds?
Note: these log files are generated by xscope, a program that monitors Xorg protocol data. I am looking for general-purpose tools that can be applied to situations like the one above, not specialized tools such as webserver access-log analyzers.
Two example log files are available at http://lekensteyn.nl/files/qemu-sdl-debug/ (log13 and log14). A pre-processor command can be found in the xscope-filter file; it removes timestamps and other minor details.
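For illustration, a filter in that spirit could strip leading timestamps with sed. The timestamp format below is assumed purely for the example; the actual xscope-filter does more:

```shell
# Hypothetical pre-processor; the real xscope-filter differs in detail.
# Drops a leading HH:MM:SS.mmm timestamp (format assumed for illustration).
printf '12:34:56.789 Request: CreateWindow\n' |
  sed -E 's/^[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]+ //'
# prints: Request: CreateWindow
```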
Best Answer
The two diff tools I use the most are meld and sdiff.
meld
Meld is a GUI but does a great job of showing diffs between files. It's geared more toward software development, with features such as the ability to move changes from one side to the other to merge them, but it can also be used as a straight side-by-side diffing tool.
sdiff
I've used this tool for years. I generally run it with the following switches:
-b    Ignore changes in the amount of white space.
-W    Ignore all white space.
-B    Ignore changes whose lines are all blank.
-s    Do not output common lines.

Often with log files you'll need to make the columns wider; you can use -w <num> to widen the output.

Other tools that I use off and on
diffc
Diffc is a Python script which colorizes unified diff output.
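The idea behind such a colorizer can be sketched in a few lines of awk, assuming an ANSI-capable terminal. This is a toy stand-in, not diffc's actual code:

```shell
# Toy colorizer in the spirit of diffc: green for added lines, red for
# removed lines, everything else passed through unchanged.
colorize() {
    awk '/^\+/ {printf "\033[32m%s\033[0m\n", $0; next}
         /^-/  {printf "\033[31m%s\033[0m\n", $0; next}
         {print}'
}

printf 'old\n' > a.txt
printf 'new\n' > b.txt
diff -u a.txt b.txt | colorize
```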
vimdiff
Vimdiff is probably as good as meld, if not better, and it can be run from a terminal. I always forget to use it, though, which to me is a good indicator that I find the tool just a little too tough to use day to day. But YMMV.
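Putting the sdiff switches from above together, a run against the two logs from the question might look like this (tiny sample files stand in for log13 and log14 here):

```shell
# Create two small sample logs; substitute log13/log14 from the question.
printf 'alpha\nbeta\ngamma\n' > a.log
printf 'alpha\nBETA\ngamma\n' > b.log

# -s hides common lines, -b ignores changes in the amount of white space,
# -w 130 widens the two columns to fit long log lines.
sdiff -s -b -w 130 a.log b.log
```

Note that, like diff, sdiff exits with a non-zero status when the files differ, which matters in scripts running under set -e.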