I'm getting a "diff: memory exhausted" error when trying to diff two 27 GB files that are largely similar, on a Linux box with CentOS 5 and 4 GB of RAM. This is a known problem, it seems.
I would expect there to be an alternative for such an essential utility, but I can't find one. I imagine the solution would have to use temporary files rather than memory to store the information it needs.
- I tried rdiff and xdelta, but they are better suited to recording the changes between two files, like a patch, and are not that useful for inspecting the differences between them.
- I tried VBinDiff, but it is a visual tool that is better suited to comparing binary files. I need something that can pipe the differences to STDOUT like regular diff.
- There are a lot of other utilities, such as vimdiff, that only work with smaller files.
- I've also read about Solaris bdiff, but I could not find a port for Linux.
Any ideas besides splitting the files into smaller pieces? I have 40 of these files, so I'm trying to avoid the work of breaking them up.
Best Answer
cmp does things byte by byte, so it probably won't run out of memory (I just tested it on two 7 GB files), but you might be looking for more detail than a report like "files X and Y differ at byte x, line y". If the similar content in your files is offset (e.g., file Y has an identical block of text, but not at the same location), you can pass offsets to cmp; you could probably turn it into a resynchronizing compare with a small script.

Aside: In case anyone else lands here looking for a way to confirm that two directory structures (containing very large files) are identical: diff --recursive --brief (or diff -r -q for short, or even diff -rq) will work and not run out of memory.
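The resynchronizing compare suggested above could look something like the following Python sketch. It is an illustration under assumptions, not an existing tool: it assumes line-oriented text with reasonably short lines and a Python 3 interpreter (CentOS 5 does not ship one by default), and the name resync_diff, the window parameter and the "-N: / +N:" output format are hypothetical. It streams both files; when two lines disagree it reads ahead at most window lines in each file, picks the nearest line the two files share, reports the skipped lines as removed (-) or added (+), and carries on from there. Memory is bounded by the window, not by the file size.

```python
#!/usr/bin/env python3
"""Hypothetical bounded-memory, line-oriented "resynchronizing" compare.

A heuristic sketch, not GNU diff: it never holds more than `window`
lines per file in memory, so it does not find a minimal edit script.
"""
import sys
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")  # lines may be str or bytes


def _fill(buf: Deque[T], it: Iterator[T], n: int) -> None:
    """Top up buf from it until it holds n lines or the input is exhausted."""
    while len(buf) < n:
        try:
            buf.append(next(it))
        except StopIteration:
            return


def resync_diff(a: Iterable[T], b: Iterable[T],
                window: int = 1000) -> Iterator[Tuple[str, int, T]]:
    """Yield ("-", lineno, line) for lines only in a, ("+", lineno, line) for lines only in b."""
    ita, itb = iter(a), iter(b)
    abuf: Deque[T] = deque()
    bbuf: Deque[T] = deque()
    aline = bline = 1  # 1-based line numbers of abuf[0] and bbuf[0]
    while True:
        _fill(abuf, ita, 1)
        _fill(bbuf, itb, 1)
        if not abuf and not bbuf:
            return  # both inputs exhausted
        if abuf and bbuf and abuf[0] == bbuf[0]:
            abuf.popleft()
            bbuf.popleft()
            aline += 1
            bline += 1
            continue
        # Mismatch, or one input has ended: read ahead up to `window` lines
        # and find a shared line (index i in a, j in b) with the smallest i + j.
        _fill(abuf, ita, window)
        _fill(bbuf, itb, window)
        first_in_b = {}
        for j, line in enumerate(bbuf):
            first_in_b.setdefault(line, j)
        best = None
        for i, line in enumerate(abuf):
            j = first_in_b.get(line)
            if j is not None and (best is None or i + j < sum(best)):
                best = (i, j)
        # No shared line within the window: report everything buffered.
        ndel, nadd = best if best is not None else (len(abuf), len(bbuf))
        for _ in range(ndel):
            yield ("-", aline, abuf.popleft())
            aline += 1
        for _ in range(nadd):
            yield ("+", bline, bbuf.popleft())
            bline += 1


def main(path_a: str, path_b: str) -> None:
    out = sys.stdout.buffer
    # Binary mode: lines are compared as raw bytes, so no decoding is needed.
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        for tag, lineno, line in resync_diff(fa, fb):
            out.write(b"%s%d: %s" % (tag.encode(), lineno, line))
            if not line.endswith(b"\n"):
                out.write(b"\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Run it as, for example, python3 resync_diff.py old.txt new.txt > changes.txt. Because a single matching line (a blank line, say) is enough to resynchronize, the output can be noisier than diff's, and a block that moved further than window lines shows up as a removal plus an addition. Raising window trades memory and lookahead time for better matches.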