Getting diff (or git diff) to show inserted hunks properly


Let's say I have two files. The first one has the contents:

line 1
foo
line 2

line 1
bar
line 2

And the second one has a new section inserted in the middle, so it looks like this:

line 1
foo
line 2

line 1
new text
line 2

line 1
bar
line 2

Now, when I do a "diff -u", I get output like this:

--- file1   2013-06-25 16:27:43.170231844 -0500
+++ file2   2013-06-25 16:27:59.218757056 -0500
@@ -1,7 +1,11 @@
 line 1
 foo
 line 2
 
 line 1
+new text
+line 2
+
+line 1
 bar
 line 2

This doesn't properly reflect that the middle stanza was inserted. Instead, it makes it look like the second stanza was changed and a new one was added at the end (this is because the algorithm starts the hunk at the first differing line).
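To see the same placement outside of diff, here is a minimal Python sketch using the standard library's difflib. It is only a stand-in: difflib does not use the same algorithm as GNU diff or Git, and the names `old`, `new`, and `inserted_ranges` are made up for illustration. It does land on the same alignment for these two files, since the first five lines match and the insertion is reported from the first differing line onward.

```python
import difflib

old = ["line 1", "foo", "line 2", "", "line 1", "bar", "line 2"]
new = ["line 1", "foo", "line 2", "",
       "line 1", "new text", "line 2", "",
       "line 1", "bar", "line 2"]


def inserted_ranges(a, b):
    """Return (start, end) index pairs in b that the matcher reports as inserted."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [(j1, j2)
            for tag, _i1, _i2, j1, j2 in matcher.get_opcodes()
            if tag == "insert"]


# Print each inserted range of the new file together with its lines.
for start, end in inserted_ranges(old, new):
    print(start, end, new[start:end])
```

The single inserted range starts at "new text" and ends on the "line 1" that really belongs to the following stanza, which is the same shape as the diff -u output above.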

Is there any way to get diff (either by itself, or using git diff) to show this output instead?

--- file1   2013-06-25 16:27:43.170231844 -0500
+++ file2   2013-06-25 16:27:59.218757056 -0500
@@ -1,7 +1,11 @@
 line 1
 foo
 line 2
+
+line 1
+new text
+line 2
 
 line 1
 bar
 line 2

This is mostly an issue when generating a patch for someone to review, where a new function gets inserted into a group of similar functions. The default behavior doesn't reflect what really changed.

Best Answer

Git 2.9, released in June 2016, added an experimental --compaction-heuristic flag to the git diff command. From the release announcement:

In 2.9, Git's diff engine learned a new heuristic: it tries to keep hunk boundaries at blank lines, shifting the hunk "up" whenever the bottom of the hunk matches the bottom of the preceding context, until we hit a blank line.
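The sliding rule described in that quote can be sketched in a few lines of Python. This is a simplified model based only on the description above, not Git's actual implementation, and the function names (`slide_up_to_blank`, `render`) are hypothetical. It assumes the hunk is a pure insertion given as an index range into the new file.

```python
def slide_up_to_blank(lines, start, end):
    """Slide the inserted range lines[start:end] upward.

    Moving the range up by one line is only valid when the line just above
    it equals the last line of the range, because the patched file is then
    unchanged. Stop as soon as the range ends on a blank line.
    """
    while (
        start > 0
        and lines[end - 1].strip() != ""        # stop at a blank-line boundary
        and lines[start - 1] == lines[end - 1]  # the slide keeps the result identical
    ):
        start -= 1
        end -= 1
    return start, end


def render(lines, start, end):
    """Prefix lines[start:end] with '+', all other lines with a space."""
    return [
        ("+" if start <= i < end else " ") + text
        for i, text in enumerate(lines)
    ]


new_file = [
    "line 1", "foo", "line 2", "",
    "line 1", "new text", "line 2", "",
    "line 1", "bar", "line 2",
]

# diff -u marked indices 5..8 of the new file ("new text" .. "line 1") as added.
start, end = slide_up_to_blank(new_file, 5, 9)
print("\n".join(render(new_file, start, end)))
```

On the example from the question, the range moves up by one line and then stops, so the added lines become "line 1", "new text", "line 2" and the blank line after them. That puts the blank separator after the new stanza rather than before it as in the requested output, but it describes the same change and keeps the inserted stanza intact.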

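I don't think GitHub has it enabled for diffs in the web UI for pull requests and comparisons, but you can use it locally. I'd recommend combining it with --word-diff if you need that level of granularity. Note that later Git releases dropped this flag in favor of --indent-heuristic, which has been on by default since Git 2.14.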

More details available on the GitHub blog: https://github.com/blog/2188-git-2-9-has-been-released
