I want to replace only the first k instances of a word. How can I do this?
For example, say the file foo.txt contains 100 occurrences of the word 'linux'. I need to replace only the first 50 occurrences.
awk, sed, text processing
Best Answer
The first section below describes using sed to change the first k occurrences on a line. The second section extends this approach to change only the first k occurrences in a file, regardless of which line they appear on.

Line-oriented solution
With standard sed, there is a command to replace the k-th occurrence of a word on a line. If k is 3, for example, the command is s/old/new/3. Alternatively, one can replace all occurrences with s/old/new/g.
Neither of these is what you want.
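To make the difference between the two standard flags concrete, here is a minimal sketch. The sample line and the variable names are made up for illustration.

```shell
#!/bin/sh
# Made-up sample line with five occurrences of the word.
line='old old old old old'

# Numeric flag: replace only the 3rd occurrence on each line.
third_only=$(printf '%s\n' "$line" | sed 's/old/new/3')

# g flag: replace every occurrence on each line.
all=$(printf '%s\n' "$line" | sed 's/old/new/g')

printf '%s\n' "$third_only"   # old old new old old
printf '%s\n' "$all"          # new new new new new
```

The numeric flag touches exactly one occurrence and g touches all of them, so neither selects "the first k" on its own.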
GNU sed offers an extension that changes the k-th occurrence and all after it. If k is 3, for example, the command is s/old/new/g3. These can be combined to do what you want. To change the first 3 occurrences, use sed -E 's/\<old\>/\n/g4; s/\<old\>/new/g; s/\n/old/g'. The placeholder \n is useful here because we can be sure that it never occurs on a line.

Explanation:
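Before the step-by-step breakdown, a short run may help make the combined command concrete. This is a sketch assuming GNU sed and a made-up input line. The @ placeholder in the first command is only a visible stand-in for \n, so that the intermediate state can be printed.

```shell
#!/bin/sh
# Assumes GNU sed: combining g with a number (g4) and writing \n in the
# replacement are GNU behaviour.
line='old old old old old'

# First step alone, with "@" standing in for the newline placeholder
# (illustration only) so the intermediate state is visible.
step1=$(printf '%s\n' "$line" | sed -E 's/\<old\>/@/g4')

# The full three-step command with the real \n placeholder.
final=$(printf '%s\n' "$line" |
  sed -E 's/\<old\>/\n/g4; s/\<old\>/new/g; s/\n/old/g')

printf '%s\n' "$step1"   # old old old @ @
printf '%s\n' "$final"   # new new new old old
```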
We use three sed substitution commands:

s/\<old\>/\n/g4
This is the GNU extension to replace the fourth and all subsequent occurrences of old with \n. The regex \< matches the beginning of a word and \> matches the end of a word. This ensures that only complete words are matched. These word-boundary symbols are a GNU regex extension rather than part of standard extended regex; the -E option, which enables extended regex, is used here, but GNU sed accepts \< and \> without it too.

s/\<old\>/new/g
Only the first three occurrences of old remain, and this replaces them all with new.

s/\n/old/g
The fourth and all remaining occurrences of old were replaced with \n in the first step. This returns them to their original state.

Non-GNU solution
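As a minimal sketch of the variant this section describes, the commands below chain one s command per replacement, each in its own -e option. It assumes a sed that supports -E and the \< and \> word-boundary symbols. The input line, the variable names, and the loop are made up for illustration; the loop only builds k identical -e options.

```shell
#!/bin/sh
# Made-up sample line with five occurrences of the word.
line='old old old old old'

# One -e per s command; each s replaces the first remaining whole-word "old".
first_three=$(printf '%s\n' "$line" |
  sed -E -e 's/\<old\>/new/' -e 's/\<old\>/new/' -e 's/\<old\>/new/')

# The same idea for an arbitrary k: append k copies of the -e option
# to the positional parameters, then hand them to sed.
k=3
set --
i=0
while [ "$i" -lt "$k" ]; do
  set -- "$@" -e 's/\<old\>/new/'
  i=$((i + 1))
done
looped=$(printf '%s\n' "$line" | sed -E "$@")

printf '%s\n' "$first_three"   # new new new old old
printf '%s\n' "$looped"        # new new new old old
```

Each extra replacement costs one more s command, which is why the command line grows with k.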
If GNU sed is not available and you want to change the first 3 occurrences of old to new, then use three s commands, one for each occurrence to be replaced. This works well when k is a small number but scales poorly to large k.

Since some non-GNU seds do not support combining commands with semicolons, each command here is introduced with its own -e option. It may also be necessary to verify that your sed supports the word boundary symbols \< and \>.

File-oriented solution
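The line-oriented commands from the previous sections act on each line separately, so on a multi-line file they change the first k occurrences of every line rather than of the file. A small sketch of that behaviour, assuming GNU sed and a made-up two-line input:

```shell
#!/bin/sh
# Assumes GNU sed (g4 flag, \n in the replacement).
# Made-up input: two lines with four occurrences each.
input='old old old old
old old old old'

# The line-oriented command is applied to each line independently.
per_line=$(printf '%s\n' "$input" |
  sed -E 's/\<old\>/\n/g4; s/\<old\>/new/g; s/\n/old/g')

printf '%s\n' "$per_line"
# new new new old
# new new new old
```

Six occurrences are replaced here, three per line, which is why the whole file has to be treated as one unit.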
We can tell sed to read the whole file in and then perform the substitutions. For example, to replace the first three occurrences of old using a BSD-style sed, put the commands H;1h;$!d;x in front of the three s commands, each in its own -e option. The sed commands H;1h;$!d;x read the whole file in.

Because the above does not use any GNU extension, it should work on BSD (OSX) sed. Note, though, that this approach requires a sed that can handle long lines. GNU sed should be fine. Those using a non-GNU version of sed should test its ability to handle long lines.

With GNU sed, we can further use the g trick described above, but with \n replaced with \x00, to replace the first three occurrences. The placeholder changes because, once the whole file has been read in, \n does occur in the text being edited. This approach scales well as k becomes large. This assumes, though, that \x00 is not in your original input. Since it is impossible to put the character \x00 in a bash string, this is usually a safe assumption.
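The two whole-file variants can be sketched as follows. The input, the variable names, and the k variable are made up for illustration. The first command assumes a sed that supports -E and the \< and \> symbols; the second assumes GNU sed.

```shell
#!/bin/sh
# Made-up input: five occurrences spread over two lines.
input='old old
old old old'

# BSD-style form: H;1h;$!d;x gathers every line into the pattern space,
# then one s command per replacement is applied to the whole text.
portable=$(printf '%s\n' "$input" |
  sed -E -e 'H;1h;$!d;x' \
      -e 's/\<old\>/new/' -e 's/\<old\>/new/' -e 's/\<old\>/new/')

# GNU form: park occurrence k+1 and later as \x00, replace what is left,
# then restore the parked occurrences.
k=3
gnu=$(printf '%s\n' "$input" |
  sed -E 'H;1h;$!d;x; s/\<old\>/\x00/g'"$((k + 1))"'; s/\<old\>/new/g; s/\x00/old/g')

printf '%s\n' "$portable"
printf '%s\n' "$gnu"
# Both print:
# new new
# new old old
```

For the case in the question, k would be 50, old would be linux, and sed would read foo.txt instead of the piped input.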