Shell – Replacing Text with List of Replacements Including Backslashes

Tags: perl, scripting, sed, shell, text processing

I have a file A that contains pairs of strings, one per line:

\old1 \new1
\old2 \new2
.....

I would like to iterate over file A, and for each line perform the replacement (e.g. "\old1 -> \new1") globally in some file B. I had no trouble getting it to work without backslashes, using sed or perl -pi -e with something like the following:

while read -r line
do
 set -- $line
 sed -i -e s/$1/$2/g target
done < replacements

However, I can't figure out how to make either sed or perl treat the backslashes verbatim in the replacement strings. Is there a clean solution for this?

Best Answer

You'll need to escape all characters that are special in regexps, not just backslashes but also [.*^$ and the s delimiter (for sed). On the replacement side of sed's s command, \ and & and the delimiter are special as well. In Perl, use the quotemeta function.
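As a rough sketch of the escaping described above, the pattern side and the replacement side can be handled by two small helpers. The function names escape_pattern and escape_replacement are made up for this illustration, and the sketch assumes basic regular expressions, / as the s delimiter, and strings without newlines.

```shell
# Hypothetical helpers, not part of sed: they prepare literal strings
# for use in  s/PATTERN/REPLACEMENT/g  with / as the delimiter.

# Pattern side: put a backslash before ] [ \ . * ^ $ and /
escape_pattern() {
    printf '%s\n' "$1" | sed -e 's~[][\\.*^$/]~\\&~g'
}

# Replacement side: put a backslash before \ & and /
escape_replacement() {
    printf '%s\n' "$1" | sed -e 's~[\\&/]~\\&~g'
}

from=$(escape_pattern '\old1')
to=$(escape_replacement '\new1')

# The backslashes in \old1 and \new1 are now treated verbatim.
printf '%s\n' 'x \old1 y' | sed -e "s/$from/$to/g"
```

The helpers use ~ as their own delimiter so that / can appear in the bracket expression without further quoting.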

A further issue with your attempt is that when you run set -- $line, the shell performs its own expansion: it performs globbing in addition to word splitting, so if your line contains a* b* and there are files called a1 and a2 in the current directory then you'll be replacing a1 with a2. You need to turn off globbing with set -f in this approach.
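The globbing problem can be reproduced in a scratch directory. This is only a demonstration under assumed conditions: a POSIX-style shell with default options (no nullglob or similar), and the file names a1 and a2 taken from the example above.

```shell
# Work in a throwaway directory so the demo cannot touch real files.
old_dir=$PWD
dir=$(mktemp -d)
cd "$dir" || exit 1
touch a1 a2

line='a* b*'

# Globbing enabled: a* expands to the matching files, b* matches nothing
# and is left as is, so $1 and $2 become a1 and a2.
set -- $line
with_glob="$*"

# Globbing disabled: the line is only split into words.
set -f
set -- $line
without_glob="$*"
set +f

printf 'with globbing:    %s\n' "$with_glob"
printf 'without globbing: %s\n' "$without_glob"

cd "$old_dir" || exit 1
rm -rf "$dir"
```

With globbing on, the first two positional parameters are a1 and a2, which is exactly the accidental "replace a1 with a2" case.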

Here's a solution that mangles the replacement list directly into a list of sed arguments. It assumes that there is no space character in the source and replacement texts, but anything other than a space and a newline should be treated correctly. The first replacement adds a \ before the characters that need protecting, and the second replacement turns each line from foo bar into -e s/foo/bar/g. Warning, untested.

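set -f   # disable globbing so the unquoted $sed_args below is only word-split
# 1st expression: backslash-escape regexp specials and the / delimiter
# 2nd expression: turn each "foo bar" line into "-e s/foo/bar/g"
sed_args=$(<replacements sed -e 's~[/.*[\\^$]~\\&~g' \
                             -e 's~^\([^ ]*\)  *\([^ ]*\).*~-e s/\1/\2/g~')
sed -i $sed_args target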
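To see what this approach generates, here is a small trial run on made-up sample data in a temporary directory. It is a sketch rather than a tested solution: the sample lines are invented, and the target is read from standard input instead of being edited with -i, since -i behaves differently between sed implementations.

```shell
dir=$(mktemp -d)
printf '%s\n' '\old1 \new1' '\old2 \new2' >"$dir/replacements"
printf '%s\n' 'a \old1 b \old2 c' >"$dir/target"

set -f   # no globbing while the unquoted $sed_args is expanded
sed_args=$(<"$dir/replacements" sed -e 's~[/.*[\\^$]~\\&~g' \
                                    -e 's~^\([^ ]*\)  *\([^ ]*\).*~-e s/\1/\2/g~')

# Show the generated arguments, one "-e s/.../.../g" per replacement line.
printf '%s\n' "$sed_args"

# Word splitting turns them into separate arguments for sed.
result=$(sed $sed_args <"$dir/target")
set +f

printf '%s\n' "$result"
rm -rf "$dir"
```

The generated arguments have the backslashes doubled, so sed matches and inserts them literally.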

In Perl, you'll have fewer issues with quoting if you just let Perl read the replacement file directly. Again, untested.

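perl -i -pe 'BEGIN {
   open my $r, "<", "replacements" or die "replacements: $!";
   while (<$r>) {
       chomp;
       ($from, $to, @ignored) = split / +/;
       $s{$from} = $to;
   }
   close $r;
   # Quote regexp metacharacters so each source string is matched literally
   $regexp = join("|", map {quotemeta} keys %s);
}
s/($regexp)/$s{$1}/ego' target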