How to diff files ignoring comments (lines starting with #)

I have two configuration files: the original from the package manager and a customized one that I modified myself. I've added some comments to describe behavior.

How can I run diff on the configuration files, skipping the comments? A commented line is defined by:

  • optional leading whitespace (tabs and spaces)
  • hash sign (#)
  • any other characters

The (simplest) regular expression, skipping the first requirement, would be #.*. I tried the --ignore-matching-lines=RE (-I RE) option of GNU diff 3.0, but I couldn't get it working with that RE. I also tried .*#.* and .*\#.* without luck. Literally putting the line (Port 631) as the RE does not match anything, nor does it help to put the RE between slashes.

As suggested in “diff” tool's flavor of regex seems lacking?, I tried grep -G:

grep -G '#.*' file

This seems to match the comments, but it does not work for diff -I '#.*' file1 file2.

So, how should this option be used? How can I make diff skip certain lines (in my case, comments)? Please do not suggest grepping the file and comparing the temporary files.

Best Answer

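According to Gilles, the -I option ignores a change only if every inserted or deleted line in that hunk matches the regular expression. If any non-matching line in the same hunk changed as well, the whole hunk is shown, matching lines included. I didn't fully get it until I tested it.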

The Test

Three files are involved in my test:
File test1:

    text

File test2:

    text
    #comment

File test3:

    changed text
    #comment

The commands:

$ # comparing files with comment-only changes
$ diff -u -I '#.*' test{1,2}
$ # comparing files that also differ in a regular line
$ diff -u -I '#.*' test{2,3}
--- test2       2011-07-20 16:38:59.717701430 +0200
+++ test3       2011-07-20 16:39:10.187701435 +0200
@@ -1,2 +1,2 @@
-text
+changed text
 #comment
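
To see the "whole hunk" rule more directly, here is a small sketch that recreates the three test files in a scratch directory and also compares test1 with test3, where a regular change and an added comment land in the same hunk. It assumes GNU diff (the -I option is not available everywhere), and the anchored pattern '^[[:space:]]*#' is my own choice rather than something from the question.

```shell
#!/bin/sh
# Sketch: reproduce the -I behaviour in a scratch directory (assumes GNU diff).
dir=$(mktemp -d) || exit 1

printf 'text\n'                   > "$dir/test1"
printf 'text\n#comment\n'         > "$dir/test2"
printf 'changed text\n#comment\n' > "$dir/test3"

# Only a comment line was added: every changed line matches the RE,
# so the hunk is ignored and diff prints nothing.
only_comment=$(diff -I '^[[:space:]]*#' "$dir/test1" "$dir/test2") || true

# A regular line changed AND a comment was added in the same hunk:
# the hunk is not ignorable, so the comment line is reported as well.
mixed=$(diff -I '^[[:space:]]*#' "$dir/test1" "$dir/test3") || true

printf 'comment-only diff: [%s]\n' "$only_comment"
printf 'mixed diff:\n%s\n' "$mixed"

rm -rf "$dir"
```

The first result is empty, while the second one lists the #comment line next to the changed text line. That is why -I alone cannot hide comments sitting next to real changes.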

The alternative way

Since there is no answer so far explaining how to use the -I option correctly, I'll provide an alternative which works in bash shells:

diff -u -B <(grep -vE '^\s*(#|$)' test1)  <(grep -vE '^\s*(#|$)' test2)
  • diff -u - unified diff
    • -B - ignore blank lines
  • <(command) - a bash feature called process substitution; it opens a file descriptor for the command's output, which removes the need for a temporary file
  • grep - command for printing lines (not) matching a pattern
    • -v - show non-matching lines
    • -E - use extended regular expressions
    • '^\s*(#|$)' - a regular expression matching comments and empty lines
      • ^ - match the beginning of a line
      • \s* - match whitespace (tabs and spaces), if any; \s is a GNU extension, the POSIX spelling is [[:space:]]
      • (#|$) - match a hash mark or, alternatively, the end of the line
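
If you need this regularly, the one-liner can be wrapped in a function. The following is only a sketch: strip_noise and diff_nocomments are names I made up, it requires bash for the process substitution, and it assumes GNU diff for --label (used so that the header shows the real file names instead of /dev/fd/N). Since blank lines are already filtered out by grep, -B is not needed here.

```shell
#!/usr/bin/env bash
# Sketch only: strip_noise and diff_nocomments are hypothetical helper names.

# Print a file without comment lines and blank lines.
# [[:space:]] is the POSIX spelling of the GNU-specific \s.
strip_noise() {
    grep -vE '^[[:space:]]*(#|$)' -- "$1"
}

# Unified diff of two files, ignoring comment lines and blank lines.
# --label is a GNU diff option; it replaces the /dev/fd/N names in the header.
diff_nocomments() {
    diff -u --label "$1" --label "$2" \
        <(strip_noise "$1") <(strip_noise "$2")
}

# Demo with two throwaway files that differ only in a comment and a blank line.
demo_dir=$(mktemp -d)
printf 'Port 631\n'              > "$demo_dir/orig.conf"
printf '# my note\n\nPort 631\n' > "$demo_dir/custom.conf"
if diff_nocomments "$demo_dir/orig.conf" "$demo_dir/custom.conf"; then
    echo "no differences outside comments"
fi
rm -rf "$demo_dir"
```

Like diff itself, diff_nocomments returns 0 when the filtered files are identical and 1 when they differ, so it can be used in conditions as in the demo.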