Ubuntu – How to replace every nth occurrence of pattern using sed ONLY

awk, bash, sed

I have searched all over the web but I can't find any good solutions. I have read on many forums that sed is not the tool for this (but awk is), and I don't understand why. You can do a simple substitution for a specific occurrence, for example sed 's/pattern/replace/5' to replace only the 5th instance. Why can't sed do 's/pattern/replace/<every 5th occurrence>' on one line? Isn't that a basic function?

Anyway, my problem is that our client's software runs in a Windows environment that has only a limited set of Linux tools, and unfortunately awk is not available on the system. Installing new commands is NOT an option for the company, for whatever reason.

So my question is: what is the best way to replace every nth occurrence of a pattern on a line using sed, possibly combined with bash? I only have one line anyway, as I have already stripped all the newline characters. Is a bash for loop the only way?

Best Answer

Isn't that a basic function?

Obviously, it isn't. But it's still possible:

sed 's/a\([^a]*\($\|a[^a]*\($\|a[^a]*\($\|a[^a]*\($\|a\)\)\)\)\)/A\1/g'

It is more readable with -r (extended regular expressions), where the backslashes before the parentheses and the | can be dropped:

sed -r 's/a([^a]*($|a[^a]*($|a[^a]*($|a[^a]*($|a)))))/A\1/g'

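That is: an a, followed by any number of non-a characters, followed by either the end of the line or another a, followed by any number of non-a characters, and so on, up to five a's. Everything after the first a is captured in \1, so A\1 uppercases only the first a of each group. Because of the g flag, matching resumes after the group, so the 1st, 6th, 11th, ... a on the line are replaced.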
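As a sketch, assuming GNU sed (the nested command relies on GNU's \| alternation in basic regular expressions, and the variants below use POSIX interval expressions \{m,n\}), here is the command above wrapped in a shell function next to two parameterized variants. The function names are hypothetical, the pattern is fixed to the single character a (the [^a] trick only works for one character), and the interval forms are an alternative technique swapped in for illustration, not the method from the answer:

```shell
#!/bin/sh
# Assumes GNU sed. Function names are hypothetical; "a" is the pattern
# and "A" the replacement throughout.

# The nested command from the answer: replaces the 1st, 6th, 11th, ... "a".
first_of_each_five() {
    sed 's/a\([^a]*\($\|a[^a]*\($\|a[^a]*\($\|a[^a]*\($\|a\)\)\)\)\)/A\1/g'
}

# Same effect for any group size $1, using an interval instead of nested
# alternations: one "a", then up to $1-1 more "[^a]*a" runs kept in \1.
first_of_each_n() {
    sed "s/a\(\([^a]*a\)\{0,$(($1 - 1))\}\)/A\1/g"
}

# Variant that replaces the nth, 2nth, 3nth, ... "a": keep exactly $1-1
# "a"s (plus the text around them) in \1, then replace the next "a".
every_nth() {
    sed "s/\(\([^a]*a\)\{$(($1 - 1))\}[^a]*\)a/\1A/g"
}

echo aaaaaaaaaaaa | first_of_each_five   # AaaaaAaaaaAa
echo aaaaaaaaaaaa | first_of_each_n 5    # AaaaaAaaaaAa
echo aaaaaaaaaaaa | every_nth 5          # aaaaAaaaaAaa
```

The every_nth form may be closer to what the question asks for (5th, 10th, ...), while the other two reproduce the answer's behaviour (1st, 6th, ...). In every_nth, a trailing run with fewer than n a's simply doesn't match, so it is left unchanged.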

BTW, if Perl is available, you can use:

perl -pe 'BEGIN { @A = qw(a A) } s/a/$A[not $i++ % 5]/g'

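% is the modulo operator. $i is incremented with each match, so $i++ % 5 yields 0, 1, 2, 3, 4, 0, ...; its negation is 1, 0, 0, 0, 0, 1, ..., which, used as the index into @A, picks A for the 1st, 6th, 11th, ... match and a for the others. Note that $i is not reset between lines, which doesn't matter here because the input is a single line.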

Related Solutions

Ubuntu – How to edit a range of text between 2 symbols? awk, sed, regex

This awk code could be enough:

awk -F'*' 'NF == 2 {label = $2; next} {$0 = $0 label} 1'

To break it down:

Use * as the field separator. This way, we can simply examine the number of fields (NF) to determine if the beginning or end of a block is reached.
When there are two fields, we save the second field in label and continue to the next line.
For every other line, we append the label to the line and then print it (the trailing 1 is an always-true pattern whose default action is to print). If the label is empty, we are outside a block and the line is unchanged; otherwise, the label is appended as required.
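To see the script in action, here is a run on a hypothetical input, since the original data isn't shown here: a line *foo opens a block labelled foo, and a lone * closes it (both have exactly two *-separated fields). The wrapper function name is also just for illustration:

```shell
#!/bin/sh
# Hypothetical wrapper around the awk script above.
# "*name" sets label to "name"; a lone "*" resets label to "".
append_label() {
    awk -F'*' 'NF == 2 {label = $2; next} {$0 = $0 label} 1'
}

printf '%s\n' '*foo' 'x' 'y' '*' 'z' | append_label
# xfoo
# yfoo
# z
```

The marker lines themselves are dropped by next, and z, which comes after the closing *, is printed unchanged.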

Ubuntu – Why does sed substitution command with the flag p print the modified output twice

sed 's/123/AAA/p' a.txt

This expression combines a command and a flag:

s/123/AAA/ means: on each line that contains 123, replace the first instance of 123 with AAA. By default, sed prints every line at the end of each cycle, so the whole stream is printed with the modifications.
p, used as a flag of s, means: if a substitution was made, print the pattern space. At that point the pattern space contains the modified line, so that line is printed once by p and once more by the automatic print at the end of the cycle.

The p flag is usually combined with -n, which turns off the automatic printing, when we only want to print the lines of the stream that s actually changed.
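For example, with the four lines of a.txt implied by the sample output in this answer (apple, 123, pear, 1234), generated here by a small hypothetical sample function so the snippet needs no file:

```shell
#!/bin/sh
# Hypothetical stand-in for a.txt, with the lines implied by the sample output.
sample() { printf '%s\n' apple 123 pear 1234; }

# -n disables the automatic print, so only lines changed by s are printed.
sample | sed -n 's/123/AAA/p'
# AAA
# AAA4
```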

The order of commands matters. If you put a separate p command first, it prints each line before s modifies it, which is close to what you expected:

$ sed 'p;s/123/AAA/' a.txt
apple
apple
123
AAA
pear
pear
1234
AAA4

Here, sed reads one line at a time into the pattern space. When p runs, the pattern space holds the current line, which s hasn't changed yet, so it is printed unmodified. Then s modifies the line (if it contains 123), and the automatic print at the end of the cycle prints it again, so we see each line twice: first unmodified, then modified.¹

So it may help to think of the commands in a sed script as being applied to each line in turn, from left to right. As steeldriver said, the duplicated output in your original command comes from the automatic print at the end of the cycle, which happens after s has modified the line and the p flag has already printed it once.
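The difference between p as a flag of s and p as a separate command can be seen side by side. Again, the input lines are the ones implied by the sample output, produced by a hypothetical sample function:

```shell
#!/bin/sh
# Hypothetical stand-in for a.txt.
sample() { printf '%s\n' apple 123 pear 1234; }

# p as a flag of s: prints the line again only if a substitution was made.
sample | sed 's/123/AAA/p'
# apple, AAA, AAA, pear, AAA4, AAA4 (one per line)

# p as a separate command after s: prints every line again, changed or not.
sample | sed 's/123/AAA/;p'
# apple, apple, AAA, AAA, pear, pear, AAA4, AAA4 (one per line)
```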

¹ Originally, the command I had here was sed 'ps/123/AAA/' a.txt. As helpfully pointed out in a comment by mxmlnkn, this command does not work in GNU sed. I had been using BusyBox sed (in Termux on Android), and didn't test the command properly in GNU sed on Ubuntu (bad!). But it's interesting to know that GNU sed differs from others in requiring a semicolon here.
