What effect does the “-d” option have with diff?

The diff implementation on OpenBSD has a non-standard -d option with the following documentation:

-d

Try very hard to produce a diff as small as possible. This may
consume a lot of processing power and memory when processing
large files with many changes.

The GNU diff implementation has the same option, with shorter documentation:

-d, --minimal

try hard to find a smaller set of changes

From time to time I've used this option just to see if it generates output that is in any shape or form different from the same diff command without the option, but I've never seen any difference (no pun intended).

Could someone provide or point to an example where this option actually produces a different result from the same command without -d? Alternatively, could someone explain the circumstances required for this option to kick in? I'm also uncertain whether "minimal" means "fewer lines of output" or "fewer hunks".

An uneducated guess is that it has to do with very large hunks.

Best Answer

In GNU diff, which is also used on FreeBSD, the --minimal flag turns off an algorithm variation by Paul Eggert that exists "to limit the cost to O(N**1.5 log N) at the price of producing suboptimal output for large inputs with many differences". More specifically, with --minimal GNU diff skips several heuristics: those that settle for a merely close-to-optimal solution, and those that throw out "confusing" lines and treat them as extra differences.
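
On the "fewer lines or fewer hunks" part of the question: as far as I can tell, what both implementations try to minimise is the number of inserted and deleted lines, not the number of hunks. A rough way to check whether plain diff was already minimal for a given pair of files is to compute that lower bound independently. The Python sketch below does so with a textbook longest-common-subsequence table. It is not the algorithm either diff uses, the function names are made up for illustration, and it is only practical for small inputs because its running time is proportional to the product of the two file lengths.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two lists of lines."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            # Extend the diagonal on a match, otherwise carry the best neighbour.
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def minimal_changed_lines(a, b):
    """Fewest deleted plus inserted lines any line-based diff of a and b can have."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


def changed_lines_in_normal_diff(text):
    """Count the '<' and '>' lines in normal-format diff output (no -u or -c)."""
    return sum(1 for line in text.splitlines() if line.startswith(("<", ">")))
```

If the count taken from plain diff's normal-format output equals minimal_changed_lines for the same two files, then -d has nothing left to improve for that pair.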

In OpenBSD diff, which uses the older Unix diff algorithm from the 1970s (credited to Harold Stone), the --minimal flag raises the bound on the search to the maximum value of an unsigned integer, which makes the search effectively unbounded. Without the flag, the bound is the square root of the size of the range of lines being compared, or 256 if that is larger.
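
To get a feel for the numbers, here is a small Python sketch of that bound as described above. It is not the OpenBSD source: search_bound is a hypothetical name, and UINT_MAX is assumed to be the 32-bit value.

```python
from math import isqrt

UINT_MAX = 2**32 - 1  # assumption: unsigned int is 32 bits wide


def search_bound(n, minimal=False):
    """Search bound for a range of n lines, following the description above."""
    if minimal:
        return UINT_MAX  # with -d: effectively no limit
    return max(256, isqrt(n))  # default: the larger of 256 and sqrt(n)
```

Under this reading, search_bound(1000) is 256 and search_bound(1_000_000) is 1000, so the default cap grows only slowly with input size, while -d removes it for all practical purposes.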

Further reading