How should I understand the unified format of diff output?


From the diffutils manual:

Next come one or more hunks of differences; each hunk shows one area
where the files differ. Unified format hunks look like this:

@@ from-file-line-numbers to-file-line-numbers @@
line-from-either-file
line-from-either-file...

If a hunk contains just one line, only its start line number appears.
Otherwise its line numbers look like ‘start,count’. An empty hunk is
considered to start at the line that follows the hunk.

If a hunk and its context contain two or more lines, its line numbers
look like ‘start,count’. Otherwise only its end line number appears.
An empty hunk is considered to end at the line that precedes the hunk.

What do they mean? Could you also give some examples to show what they mean?

In particular, I can't tell the difference between the cases in the last two paragraphs. They seem to describe the same cases, but I suspect they don't.

  • What is the difference between the "if" case in the first paragraph and the "otherwise" case in the second?

  • What is the difference between the "otherwise" case in the first paragraph and the "if" case in the second?

Best Answer

I suspect the first paragraph (of the two you highlight) attempts to explain from-file-line-numbers whereas the second one attempts to explain to-file-line-numbers.

I’ll ignore the text, which is obscure, and explain how GNU diff implements unified diffs (addressing the title of your question).

diff -u <(printf "a\nb\nc\n") <(printf "a\n")

produces the following:

--- /proc/self/fd/11    2018-11-08 11:16:09.183611033 +0100
+++ /proc/self/fd/12    2018-11-08 11:16:09.184611029 +0100
@@ -1,3 +1 @@
 a
-b
-c

(I’ll omit the first two lines from subsequent examples since they don’t need much explaining.)

This shows that our two “files” differ, with one set of differences (a “hunk”). In a unified patch, each file comparison is introduced by a pair of lines starting with --- (the “from” file) and +++ (the “to” file). Inside each file comparison, each hunk is introduced by a line starting and ending with @@, which identifies the location of the change in the from and to files.

The from location starts with - (which isn’t part of the number that follows), and the to location starts with +. Each location is a pair of numbers: the start line and the length (the length is omitted if it’s 1). So the patch above describes a change that transforms the three lines starting at line 1 in the from file into the single line starting at line 1 in the to file.
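To make the location syntax concrete, here is a minimal sketch of a hunk-header parser in POSIX shell and awk. The function name parse_hunk_header is hypothetical (it is not part of diffutils); it assumes a well-formed header and fills in the omitted length of 1.

```sh
#!/bin/sh
# parse_hunk_header LINE: print "from_start from_count to_start to_count"
# for a unified hunk header. Hypothetical helper, not part of diffutils.
# A missing ",count" means the range is one line long.
parse_hunk_header() {
  printf '%s\n' "$1" | awk '
    /^@@ -[0-9]+(,[0-9]+)? [+][0-9]+(,[0-9]+)? @@/ {
      from = substr($2, 2)               # drop the leading "-"
      to = substr($3, 2)                 # drop the leading "+"
      if (index(from, ",") == 0) from = from ",1"
      if (index(to, ",") == 0) to = to ",1"
      split(from, f, ",")
      split(to, t, ",")
      print f[1], f[2], t[1], t[2]
    }'
}

parse_hunk_header '@@ -1,3 +1 @@'
```

Run on the header above, it prints 1 3 1 1: three lines starting at line 1 of the from file become one line starting at line 1 of the to file.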

Hunks can include context, which is the case above. By default, diff includes three lines of context, if available; it will also merge hunks whose context overlaps. If there aren’t three lines of context before and/or after the change, the context is reduced; thus above we only have one line of context before the change, and none after. This context is counted as part of the change given in the hunk, so it contributes to the start line and length.
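The effect of context on the numbers can be checked directly. This sketch assumes a diff that accepts -U NUM, as GNU diff does, and uses made-up ten-line files whose fifth line differs; hunk_header is a hypothetical helper that keeps only the @@ line.

```sh
#!/bin/sh
# Two scratch files: the numbers 1..10, and a copy with line 5 replaced.
from=$(mktemp) to=$(mktemp)
trap 'rm -f "$from" "$to"' EXIT
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > "$from"
printf '%s\n' 1 2 3 4 X 6 7 8 9 10 > "$to"

# hunk_header N: print only the @@ line produced with N lines of context.
hunk_header() {
  diff -U "$1" "$from" "$to" | grep '^@@'
}

for n in 0 1 3; do
  hunk_header "$n"
done
```

With GNU diff this should print @@ -5 +5 @@, then @@ -4,3 +4,3 @@, then @@ -2,7 +2,7 @@: as long as enough lines are available, each additional line of context moves the start back by one and lengthens both ranges by two (one line before the change, one after).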

diff -U0 <(printf "a\nb\nc\n") <(printf "a\n")

illustrates this:

@@ -2,2 +1,0 @@
-b
-c

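This is the same change, but with no context, so it is reduced to a change transforming the two lines starting at line 2 of the from file into no lines in the to file. For an empty range like +1,0, GNU diff prints the number of the line that precedes the change, here line 1 (“a”).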
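The empty-range rule is easiest to see by looking at insertions as well as deletions. In this sketch, unified_zero is a hypothetical helper that writes its two printf-style arguments to temporary files, runs diff -U0 on them and drops the ---/+++ lines; it assumes a diff that accepts -U0, such as GNU diff.

```sh
#!/bin/sh
# unified_zero FROM_TEXT TO_TEXT: print the zero-context hunk(s) between
# two strings; backslash escapes such as \n are expanded by printf %b.
unified_zero() {
  from=$(mktemp) to=$(mktemp)
  printf '%b' "$1" > "$from"
  printf '%b' "$2" > "$to"
  diff -U0 "$from" "$to" | tail -n +3   # skip the ---/+++ header lines
  rm -f "$from" "$to"
}

unified_zero 'a\nb\nc\n' 'a\n'    # delete two lines after line 1
unified_zero 'a\n' 'a\nb\nc\n'    # insert two lines after line 1
unified_zero 'b\n' 'a\nb\n'       # insert one line before line 1
```

With GNU diff, the first call prints @@ -2,2 +1,0 @@ as above, the second prints @@ -1,0 +2,2 @@ (the empty from range sits after line 1), and the third prints @@ -0,0 +1 @@, where 0 stands for the position before the first line.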

The simplest locations correspond to patches which change a single line, with no context:

$ diff -U0 <(printf "a\nb\nc\n") <(printf "a\nb\nd\n")
@@ -3 +3 @@
-c
+d

With the default context, this would be:

@@ -1,3 +1,3 @@
 a
 b
-c
+d
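Because context lines belong to both files, each length in the header is the number of context lines plus the number of - lines (for the from file) or + lines (for the to file). A small awk sketch can tally this for a single hunk; check_hunk_counts is a hypothetical name, and its input is assumed to be exactly one hunk without the ---/+++ lines.

```sh
#!/bin/sh
# check_hunk_counts: read one unified hunk on stdin and print its header
# next to the line counts implied by its body.
check_hunk_counts() {
  awk '
    /^@@ /  { header = $0; next }
    /^ /    { from++; to++ }   # context line: present in both files
    /^-/    { from++ }         # only in the "from" file
    /^[+]/  { to++ }           # only in the "to" file
    END     { printf "%s -> from %d, to %d\n", header, from, to }
  '
}

printf '%s\n' '@@ -1,3 +1,3 @@' ' a' ' b' '-c' '+d' | check_hunk_counts
```

For the hunk above it reports from 3, to 3 (two context lines plus one changed line on each side), matching -1,3 +1,3.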

(Context is what lets a patch stay applicable to “from” files that don’t quite match the original. patch searches for a hunk’s context near the line numbers given in its header and applies the hunk at an offset if it finds the context within a certain distance; it can also apply “fuzzy” matches that ignore some of the context lines.)
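The offset behaviour can be tried out in a scratch directory. This sketch assumes GNU patch (or a compatible patch) is installed; the file names old, new, target and change.patch are arbitrary. The patch is made against a ten-line file and then applied to a copy with two extra lines at the top.

```sh
#!/bin/sh
# Work in a scratch directory that is removed on exit.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# A patch that changes line 5 of a ten-line file (3 lines of context).
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > old
printf '%s\n' 1 2 3 4 X 6 7 8 9 10 > new
diff -u old new > change.patch

# The file being patched has drifted: two extra lines at the top, so the
# hunk's line numbers (@@ -2,7 +2,7 @@) are now off by two.
printf '%s\n' extra1 extra2 1 2 3 4 5 6 7 8 9 10 > target

# patch finds the hunk's context two lines further down and applies it there.
patch target < change.patch
cat target
```

patch should report that hunk #1 succeeded with an offset of 2 lines, and target ends up with X in place of 5 on its seventh line.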
