Sed command that would ignore any commented match

regular expression, sed, text processing

I'm trying to create a sed command using regex to substitute something in a text file only if it is not commented, but I'm running into trouble because I know almost nothing about sed's commands.

I found solutions for small parts of the problem but some aren't complete enough or I just cannot put them together. TL;DR version available at the end.

Let's first define my ultimate goal

I'd like to match anything (like any regular regex (hehe)) in a text file only if it is NOT commented. As I'd like to do it for multiple languages, let's just take the common C comments.

So, in this case, words or lines can be commented in different ways. We have // to comment out the rest of the line, and we also have the /* */ comment block.


Environment

I'm currently working on Mac OS X, which ships only with a BSD sed (roughly POSIX), but I installed GNU sed, which I find better. (Thanks to Homebrew: the package is gnu-sed and the command is gsed.) So both are available to me if you prefer one or the other.

I'm writing this assuming a GNU-sed is used.


Ignoring a case

First problem, how to ignore some cases. I found that quite easily in this topic.

Now, the // part seems easy for me to do and I would just have to add an OR ( | ) condition to join it with the other condition.

It would look something like this:

    sed -E "/\/\/.*/! s/foo/bar/" file

Then, if the input file is:

foo
42
test
//foo
//42
//    foo
//something foo
foo
42
something foo
  foo

The output is:

bar
42
test
//foo
//42
//    foo
//something foo
bar
42
something bar
  bar
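One limitation of the command above: it skips every line that contains // anywhere, so the first foo in a line such as `foo // foo` is left alone. A minimal sketch of one way around that is to mark the start of the comment with a newline and substitute only in front of that marker. It assumes GNU sed (for `\n` in the replacement and inside a bracket expression), and the helper name is hypothetical:

```shell
# Hypothetical helper: replace foo with bar only in the part of each line
# that precedes the first "//". Assumes GNU sed (gsed on macOS).
replace_before_line_comment() {
  sed '
    # put a newline marker in front of the first "//" ...
    s|//|\n&|
    # ... or at the end of the line when there is no "//"
    /\n/!s/$/\n/
    :a
    # replace one foo that sits before the marker, then try again
    s/foo\([^\n]*\n\)/bar\1/
    ta
    # drop the marker
    s/\n//
  '
}

printf '%s\n' 'something foo // foo' '//foo' 'foo' | replace_before_line_comment
```

This would not recognise a // that appears inside a string literal, and the loop assumes the replacement text does not itself contain the search pattern.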

So now, I'm just going to focus on the /* */ comment block.


Matching through multiple lines

Second problem: how to make the regex match across multiple lines. I think this is the major problem. I found this topic, which explains how to match across a single newline character. It took me a moment to understand how it works, but this partial solution brings me a new problem and new questions.

It can obviously ignore only one newline ( \n ). I now want to do the same for an unknown number of lines (from 0 to infinity ( * )). I bet I have to loop through the lines, but this is where I'm stuck because I know nothing about sed's commands and they feel really awkward to me.

During my searches, I found a little script meant to replace the tail command; it uses a loop (I guess), but I fail to understand how it works.
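The looping idea can be sketched with sed's `N` command (append the next input line to the pattern space) and a label to branch back to. The following is only an illustration under simplifying assumptions: the helper name is hypothetical, and a line is assumed to open at most one block comment. It collapses each multi-line block onto one line purely to make the effect of the loop visible:

```shell
# Hypothetical helper: pull every line of a /* ... */ block into the
# pattern space, then join the lines so the result of the loop is visible.
flatten_block_comments() {
  sed '
    :a
    # a comment has been opened on this line ...
    /\/\*/{
      # ... but it is not closed yet
      /\*\//!{
        # and there is still input left
        $!{
          # append the next line to the pattern space and test again
          N
          ba
        }
      }
    }
    # the pattern space now holds the whole block; join its lines
    s/\n/ /g
  '
}

printf '%s\n' 'a' '/* b' 'c' 'd */' 'e' | flatten_block_comments
```

Once the whole block sits in the pattern space, a single regex can see across what used to be several lines, which is the prerequisite for the "unknown number of lines" case.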

Make it so it matches only before the */ part

The third part would be to make sure the ignored case only matches things before the end of the comment block ( */ ). So, in the end, the ignored case would only match things between /* and */. The final command would then completely ignore anything written inside a comment block.

I did no real research on this part as I didn't solve the previous point, and it appears to me that this */ problem depends on the previous /* problem.
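For the simple case where /* and */ sit on lines of their own, sed's range addresses already express "between /* and */". A rough sketch, with a hypothetical helper name:

```shell
# Hypothetical helper: substitute only on lines that fall outside a
# range starting at a line with "/*" and ending at a later line with "*/".
replace_outside_block_lines() {
  sed '/\/\*/,/\*\//!s/foo/bar/g'
}

printf '%s\n' 'foo' '/*' 'foo' '*/' 'foo' | replace_outside_block_lines
```

This is line-based, so it has clear limits: code that shares a line with /* or */ is skipped along with the comment, and a comment written as `/* x */` on a single line opens a range that only ends at a later line containing */ (the end pattern is checked starting from the line after the one that opened the range).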


Final part: Putting all this together

Well, it is obvious I completely failed at this at the moment.


TL;DR

My question is: what would be the sed command to substitute anything we want in a text file only if it is not commented?


Appendix

If you know an easier way to do it, using any other language, it's also very welcome. So, if you know how to do it with awk, python or anything else, feel free to share it.
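As one possible alternative along those lines, here is a minimal awk sketch wrapped in a shell function. The function name and its two arguments (a regex and a replacement) are assumptions made for illustration. It walks each line as a small state machine that tracks whether it is inside a /* */ block, and it does not try to recognise comment markers inside string literals:

```shell
# Hypothetical helper: replace_outside_comments REGEX REPLACEMENT
# Substitutes only in text that is outside // and /* */ comments.
replace_outside_comments() {
  awk -v from="$1" -v to="$2" '
  {
    line = $0; out = ""
    while (length(line) > 0) {
      if (in_block) {
        # inside a block comment: copy text unchanged up to the closing */
        i = index(line, "*/")
        if (i == 0) { out = out line; line = "" }
        else { out = out substr(line, 1, i + 1); line = substr(line, i + 2); in_block = 0 }
      } else {
        b = index(line, "/*"); s = index(line, "//")
        if (s > 0 && (b == 0 || s < b)) {
          # a line comment comes first: substitute before it, keep the rest
          code = substr(line, 1, s - 1); gsub(from, to, code)
          out = out code substr(line, s); line = ""
        } else if (b > 0) {
          # a block comment opens: substitute before it, then switch state
          code = substr(line, 1, b - 1); gsub(from, to, code)
          out = out code "/*"; line = substr(line, b + 2); in_block = 1
        } else {
          # no comment on the rest of the line
          gsub(from, to, line); out = out line; line = ""
        }
      }
    }
    print out
  }'
}

printf '%s\n' 'foo /* foo */ foo // foo' | replace_outside_comments foo bar
```

Because the state is kept in a variable rather than encoded in the regex, supporting another language would mostly mean swapping the comment delimiters.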

Best Answer

You should not believe them if they tell you it cannot be done. You should believe them, however, if they tell you it's not easy.

sed '\|*/|!{ s|/\*|\n&|              #if ! */ repl 1st /* w/ \n/*
     h;      s|foo|bar|g;/\n/!b      #hold; repl all foo/bar; if ! \n branch
     G;      s|\n.*\n||;:n           #Get; clear difference; :new label
     n;      \|*/|!bn;s|^|\n/*|      #new line; if ! */ branch new label
     };s|*/|\n&|g                    #repl all */ w/ \n*/
       s|foo|&\nbar|g;:r             #repl all foo w/ foo\nbar
       s|\(/\*[^\n]*\)\nbar|\1|g;tr  #repl all /*[^\n]*\nbar w/ foo
       s|foo\n\(b\)|\1|g             #repl all foo\nbar w/ bar
       s|^\n/.||;s|\n||g             #clear any \n inserts
'    <<\INPUT
asfoo   /* asdfooasdfoo


asdfasdfoo
asdfasdfoo
foo */foo /*foo*/ foo
/*.
foo*/
foo
hello

INPUT

OUTPUT

asbar   /* asdfooasdfoo


asdfasdfoo
asdfasdfoo
foo */bar /*foo*/ bar
/*.
foo*/
bar
hello