How to Search and Replace Strings Not Substrings of Other Strings

regular expression, replace, sed, text processing

I have a list of replacements like so:

search_and -> replace
big_boy -> bb
little_boy -> lb
good_dog -> gd
...

I need to make the replacements above, but at the same time avoid matching longer strings that merely contain one of the search strings, like these:

big_boys
good_little_boy

I tried this:

sed -i -r "s/(\W)${search}(\W)/\1${replacement}\2/g"

But this does not work when the string ("good_dog" in this case) occurs at the end of a line. Given this input:

Mary had a 'little_boy', good_little_boy, $big_boy, big_boys and good_dog

the result is this, with the trailing good_dog left untouched:

Mary had a 'lb', good_little_boy, $bb, big_boys and good_dog

I doubt it would work when the string occurs at the start of a line either. Is there a good way to do the search and replacement?

Best Answer

If you're using GNU sed (which bare -i suggests you are), there is a "word boundary" escape \b:

sed -i "s/\b$SEARCH\b/$REPLACE/g"

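\b matches exactly on a word boundary: the character on one side is a "word" character (a letter, digit, or underscore), and the character on the other side is not, or there is no character there at all, as at the start or end of a line. It is a zero-width match, so you don't need capturing subgroups to put the neighbouring characters back with \1 and \2. There is also \B, which is exactly the opposite: it matches only where there is no word boundary.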
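
As a rough sketch of how this behaves on the sample line from the question, assuming GNU sed (the variable names line and result are only for illustration, and the three -e expressions stand in for whatever replacement list you actually have):

```shell
# Sample line from the question; \$ keeps the shell from expanding $big_boy.
line="Mary had a 'little_boy', good_little_boy, \$big_boy, big_boys and good_dog"

# Assumes GNU sed, where \b is a zero-width word-boundary match.
# One -e expression per search/replace pair.
result=$(printf '%s\n' "$line" | sed \
  -e 's/\bbig_boy\b/bb/g' \
  -e 's/\blittle_boy\b/lb/g' \
  -e 's/\bgood_dog\b/gd/g')

printf '%s\n' "$result"
# prints: Mary had a 'lb', good_little_boy, $bb, big_boys and gd
```

Because the underscore counts as a word character, good_little_boy and big_boys are left alone, while the good_dog at the end of the line is replaced.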


If you're not using GNU sed, you can use alternation with the start and end of line in your capturing subpatterns: (\W|^). That will match either a non-word character or the start of a line, and (\W|$) will match either a non-word character or the end of a line. In that case you still use \1 and \2 as you were. Some non-GNU seds do support \b anyway, at least in an extended mode, so it's worth giving that a try regardless.
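
A minimal sketch of that fallback, under a few assumptions: your sed accepts -E for extended regular expressions, the search strings are plain identifiers with no regex metacharacters, and \W is spelled out as the bracket expression [^[:alnum:]_] in case your sed lacks \W. The function name replace_word is made up for this example.

```shell
# Hypothetical helper: replace whole-word occurrences of $1 with $2 on stdin.
# Uses no \b and no \W; word edges are matched with explicit alternation.
replace_word() {
  # (^|[^[:alnum:]_])  start of line, or one non-word character (kept as \1)
  # ([^[:alnum:]_]|$)  one non-word character, or end of line (kept as \2)
  sed -E "s/(^|[^[:alnum:]_])$1([^[:alnum:]_]|\$)/\1$2\2/g"
}

printf 'good_dog barks\n' | replace_word good_dog gd
# prints: gd barks

printf 'big_boys and good_dog\n' | replace_word good_dog gd
# prints: big_boys and good_dog becomes: big_boys and gd
```

One thing to watch for: unlike \b, the two groups consume the characters on either side, so two occurrences separated by a single character (for example "good_dog good_dog") will not both be replaced in one pass, because the separating space is used up by the first match.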