FreeBSD – BSD sed: Replace only the Nth occurrence of a pattern


Using BSD `sed`, how can I perform the following substitution?

Before:

hello hello hello
hello hello hello

After:

hello world hello
hello hello hello

In other words, how can I replace only the Nth occurrence of a pattern in the whole input?
(In this case, the 2nd occurrence.)

Best Answer

With any POSIX sed:

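$ sed -e'/hello/{' -e:1 -e'$!N;s/hello/world/2;t2' -e'$!b1' -e\} -eb -e:2 -en\;b2 <file
hello world hello
hello hello hello

The $!b1 repeats loop :1 only while input remains, so a file with fewer than two matches ends normally instead of looping forever on its last line. The b after the closing brace ends the cycle for lines that do not contain hello, so lines before the first match are printed as usual instead of falling through into loop :2.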

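After the first line matching /hello/, we enter a loop.
Inside loop :1, N appends the next line to the pattern space, and s/hello/world/2 tries to replace the 2nd occurrence. Because the 2 flag counts matches across the whole pattern space, which by now spans several lines, this finds the 2nd occurrence in the file rather than the 2nd occurrence on each line. t2 tests whether the substitution succeeded: if it did, we jump to loop :2; otherwise b1 repeats loop :1.
Inside loop :2, n prints the pattern space and reads the next line, so the remaining lines are printed unchanged until the end of the file.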
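The same loop works for any N, since the numeric flag of s counts matches over the whole pattern space. Below is a minimal sketch of a reusable shell function; the name replace_nth is hypothetical, and it assumes the pattern and replacement contain no / or newline, because they are pasted into the s command unescaped. It also guards b1 with $! so input with fewer than N matches terminates, and ends the cycle with b for non-matching lines.

```shell
# Hypothetical helper (not a standard tool):
#   replace_nth PATTERN REPLACEMENT N [FILE...]
# Replaces only the Nth match of PATTERN counted over the whole input,
# using the N-loop from the answer above. Assumes PATTERN and
# REPLACEMENT contain no '/' or newlines (they are not escaped).
replace_nth() {
  rn_pattern=$1
  rn_replacement=$2
  rn_count=$3
  shift 3
  # Each -e becomes its own script line, so BSD sed sees the labels
  # ":1" and ":2" end where intended.
  sed -e "/$rn_pattern/{" \
      -e ':1' \
      -e '$!N' \
      -e "s/$rn_pattern/$rn_replacement/$rn_count" \
      -e 't2' \
      -e '$!b1' \
      -e '}' \
      -e 'b' \
      -e ':2' \
      -e 'n' \
      -e 'b2' \
      "$@"
}

# Prints:
# hello world hello
# hello hello hello
printf 'hello hello hello\nhello hello hello\n' | replace_nth hello world 2
```

Splitting the script into one -e per command avoids relying on ; after labels, which BSD sed would treat as part of the label name.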

Note that this approach keeps everything between the first and the second hello in the pattern space, which can be a problem with huge files when the two occurrences are far apart.
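If holding that much text is a concern, one alternative is to count matches while streaming one line at a time. This sketch swaps in awk for sed (a different technique from the answer). The function name replace_nth_stream is hypothetical; the pattern is treated as an awk extended regexp passed via ENVIRON, the replacement is literal text (no & handling), and the pattern is assumed never to match the empty string.

```shell
# Hypothetical helper: replace_nth_stream PATTERN REPLACEMENT N [FILE...]
# Replaces only the Nth match of PATTERN counted across the whole input,
# holding only the current line in memory. REPLACEMENT is used literally.
# Assumes PATTERN never matches the empty string (that would loop forever).
replace_nth_stream() {
  rns_pattern=$1
  rns_replacement=$2
  rns_count=$3
  shift 3
  PATTERN=$rns_pattern REPLACEMENT=$rns_replacement awk -v n="$rns_count" '
    {
      out = ""
      rest = $0
      # Walk this line match by match until the Nth match overall.
      while (n > 0 && match(rest, ENVIRON["PATTERN"])) {
        n--
        if (n == 0)
          out = out substr(rest, 1, RSTART - 1) ENVIRON["REPLACEMENT"]
        else
          out = out substr(rest, 1, RSTART + RLENGTH - 1)
        rest = substr(rest, RSTART + RLENGTH)
      }
      print out rest
    }' "$@"
}

# Prints:
# hello world hello
# hello hello hello
printf 'hello hello hello\nhello hello hello\n' | replace_nth_stream hello world 2
```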

Related Solutions

Text Processing – Print Line After nth Occurrence of a Match

awk -v n=3 '/<Car>/ && !--n {getline; print; exit}'

Or:

awk '/<Car>/ && ++n == 3 {getline; print; exit}'
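To see what both one-liners do, here is a run on a small invented input (the sample lines are made up for illustration). Each prints the line that follows the third line containing <Car>, then stops reading.

```shell
# Invented sample input: each <Car> line is followed by a detail line.
sample='<Car>ford
red
<Car>fiat
blue
<Car>audi
green'

# Counting down from n=3: the third match makes !--n true. Prints: green
printf '%s\n' "$sample" | awk -v n=3 '/<Car>/ && !--n {getline; print; exit}'

# Counting up from 0: the third match makes ++n == 3 true. Prints: green
printf '%s\n' "$sample" | awk '/<Car>/ && ++n == 3 {getline; print; exit}'
```

If the third match is the last line, getline has nothing to read and leaves $0 unchanged, so the matching line itself is printed.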

To pass the search pattern as a variable:

var='<Car>'
PATTERN="$var" awk -v n=3 '
  $0 ~ ENVIRON["PATTERN"] && !--n {getline; print; exit}'

Here ENVIRON is used instead of -v because -v expands backslash escape sequences, and backslashes are common in regular expressions (with -v they would need to be doubled).
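The difference shows up as soon as you print the value each mechanism delivers. In this sketch, var holds a regexp with an escaped backslash (\\, which as a regexp matches one literal backslash); -v collapses it to a single backslash, while ENVIRON passes it through untouched.

```shell
# A regexp meant to match "a\b" literally: the backslash is escaped.
var='a\\b'

# -v processes escape sequences, so \\ becomes \ and the regexp changes.
awk -v re="$var" 'BEGIN { print re }'                      # a\b

# ENVIRON delivers the environment variable's value as-is.
PATTERN="$var" awk 'BEGIN { print ENVIRON["PATTERN"] }'    # a\\b
```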

GNU awk 4.2 or above lets you assign strongly typed regexps to variables. As long as its POSIX mode is not enabled (for instance via the $POSIXLY_CORRECT environment variable), you can do:

# GNU awk 4.2 or above only, when not in POSIX mode
gawk -v n=3 -v pattern="@/$var/" '
  $0 ~ pattern && !--n {getline; print; exit}'

How to keep a part of the pattern matched and use it to replace on BSD sed

sed 's,\([a-z]\)1\.gif$,\1.gif,g'

Or, if you want to allow any non-digit before the 1:

sed 's,\([^0-9]\)1\.gif$,\1.gif,g'

The backslash-parenthesis construct delimits a capture group, which the FreeBSD man page calls a “bracket expression” (despite the use of parentheses; square brackets mean something else). Note that sed uses basic regular expressions (BRE), not extended regular expressions (ERE); the man page describes ERE, and its last paragraph explains the difference between BRE syntax and ERE syntax. I find the POSIX specification more readable than the BSD man page here; it calls capture groups subexpressions, and calls the \1 construct that refers back to one a back-reference expression. The GNU sed manual is more readable than either; just avoid the features described as GNU extensions.

Given a capture group, you can use backslash+digit in the replacement text to mean “the text matched by the corresponding capture group”. For example, \1 in the replacement text is replaced by the text matched by the first capture group in the regular expression. Here there is a single capture group, which captures the letter before 1.gif.
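A quick check of both commands from this answer on a few invented file names (the names are made up for illustration) shows what the captured character does:

```shell
# Invented sample names.
names='icon1.gif
logo_1.gif
v21.gif'

# [a-z] before the 1: only a lowercase letter is captured and kept.
# Prints: icon.gif, logo_1.gif, v21.gif (one per line)
printf '%s\n' "$names" | sed 's,\([a-z]\)1\.gif$,\1.gif,g'

# [^0-9] before the 1: any non-digit is captured and kept.
# Prints: icon.gif, logo_.gif, v21.gif (one per line)
printf '%s\n' "$names" | sed 's,\([^0-9]\)1\.gif$,\1.gif,g'
```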

I changed 1.gif to 1\.gif to match the dot literally, and added a trailing $ to match only at the end of the line.
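Why the backslash matters: an unescaped . matches any character, so without it the pattern also rewrites names where something other than a dot precedes gif. The sample name below is invented.

```shell
# Invented name where the character before "gif" is not a dot.
name='x1_gif'

# Unescaped dot: "." matches the "_", so this prints x.gif
printf '%s\n' "$name" | sed 's,\([a-z]\)1.gif$,\1.gif,'

# Escaped dot: "\." only matches a literal dot, so this prints x1_gif
printf '%s\n' "$name" | sed 's,\([a-z]\)1\.gif$,\1.gif,'
```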

To give another example of capture groups, if you wanted to operate on arbitrary extensions, you could use something like:

sed 's,\([^0-9]\)1\(\.[^./]*\)$,\1\2,g'
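Running it on a few invented names shows what it does and does not touch: the extension is kept through \2, a digit before the 1 blocks the match, and because [^./]* stops at / and ., a 1. inside a directory name is left alone.

```shell
# Invented sample names with different extensions.
files='report1.txt
photo1.jpeg
v21.png
backup1.d/notes'

# Prints: report.txt, photo.jpeg, v21.png, backup1.d/notes (one per line)
printf '%s\n' "$files" | sed 's,\([^0-9]\)1\(\.[^./]*\)$,\1\2,g'
```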
