How to remove all comments from a file preserving escaped hash chars

text processing

I know that this has been asked before, but this case is slightly different: I need to remove all comments while keeping any # that is escaped or otherwise not meant to start a comment (for example, one between single or double quotes).

Starting with the following text:

test
# comment
comment on midline # comment
escaped hash "\# this is an escaped hash"
escaped hash "\\# this is not a comment"
not a comment "# this is not a comment - double apices"
not a comment '# this is not a comment - single apices'
this is a comment \\# this is a comment
this is not a comment \# this is not a comment

I would like to obtain

test
comment on midline
escaped hash "\# this is an escaped hash"
escaped hash "\\# this is not a comment"
not a comment "# this is not a comment - double apices"
not a comment '# this is not a comment - single apices'
this is a comment \\
this is not a comment \# this is not a comment

I tried

grep -o '^[^#]*' file

but this also deletes escaped hashes.

NOTE: the text I'm working on does have escaped hashes (\#) but no double-escaped ones (\\#), so it does not matter to me whether those are kept. I guess it's neater to delete them, since in that case the hash is not actually escaped.

Best Answer

With sed you could delete the lines that start with a # (preceded by zero or more blanks) and remove every string starting with a # that doesn't follow a single backslash (and only if it's not between quotes [1]):

sed '/^[[:blank:]]*#/d
/["'\''].*#.*["'\'']/!{
s/\\\\#.*/\\\\/
s/\([^\]\)#.*/\1/
}' infile

[1] This solution assumes a single pair of quotes on a line.
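
If a line may contain more than one pair of quotes, a character-by-character scan is one way around that limitation. The following is only a rough sketch, not part of the sed solution above: strip_comments is a hypothetical helper name, and the sketch assumes that a backslash escapes the next character both inside and outside quotes, that quotes are balanced on each line, and that every unescaped ' or " opens a quoted string (so an apostrophe in plain text such as don't would be misread as an opening quote). It also trims the blanks left in front of a removed comment.

```shell
# Hypothetical helper: remove unquoted, unescaped # comments.
# Reads the files given as arguments, or standard input if there are none.
strip_comments() {
    awk '
    {
        out = ""; quote = ""; esc = 0; cut = 0
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (esc) {
                esc = 0                     # escaped character: keep it as is
            } else if (c == "\\") {
                esc = 1                     # a backslash escapes the next character
            } else if (quote != "") {
                if (c == quote) quote = ""  # closing quote
            } else if (c == "\"" || c == "\047") {
                quote = c                   # opening double or single (\047) quote
            } else if (c == "#") {
                cut = 1                     # unquoted, unescaped hash: comment starts
                break
            }
            out = out c
        }
        if (cut) {
            sub(/[ \t]+$/, "", out)         # trim blanks before the removed comment
            if (out == "") next             # the line held only a comment: drop it
        }
        print out
    }' "$@"
}
```

It can be called as strip_comments infile, or used at the end of a pipeline. Lines that contain no comment, including empty ones, are printed unchanged.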
