Shell – A POSIX-compliant function for replacing text with parameters and regex

posix, regular expression, replace, shell-script

I'm writing a function that replaces strings using regular expressions in a safe way: it must not be possible to inject characters, but I don't want to give up regular expressions:

#! /bin/sh

stringer()
{
    pattern="${1}"
    replace="${2}"

    printf '%s\n' "examp/e w\\th sed: " | sed "s/${pattern}/${replace}/g"
}

stringer "\\/" "l"

So far so good, but if I use:

stringer "/" "l"

it results in a sed error. I know the input parameters could be escaped, but then they could no longer be used as regular expressions, and I want to keep regex support. Any suggestions, with or without sed, that stay POSIX-compliant and avoid extensions?

Best Answer

Escaping only the / is very difficult to do with sed because, for instance, it would have to be escaped in:

Foo/bar
Foo[XY]/
Foo\[/x\]
Foo\\/bar

But not in:

Foo [/x]bar
Foo [^]/x]bar
Foo [x[:blank:]/y]
Foo\/bar

It may be easier to use awk instead:

repl() {
  PATTERN=$1 REPL=$2 awk '
    {gsub(ENVIRON["PATTERN"], ENVIRON["REPL"]); print}'
}

However, note that awk's regexps are extended regular expressions (as opposed to the basic ones in sed), and while awk understands & in the replacement part to mean the matched portion, it does not support sed's \1. Except with busybox awk, it doesn't support back references in the pattern either.
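As a quick illustration of the awk approach, here is a minimal sketch that reuses the repl function from above. The sample inputs are my own. It shows that a bare / needs no escaping, and that the pattern is treated as an extended regular expression with & standing for the matched text:

```shell
#!/bin/sh
# Pattern and replacement reach awk through the environment, so they are
# never parsed as part of the awk (or sed) program text.
repl() {
  PATTERN=$1 REPL=$2 awk '
    {gsub(ENVIRON["PATTERN"], ENVIRON["REPL"]); print}'
}

# A bare slash is just an ordinary character here.
printf '%s\n' 'examp/e' | repl '/' 'l'         # prints: example

# ERE syntax (+) works, and & expands to the matched portion.
printf '%s\n' 'baaanana' | repl 'a+' '[&]'     # prints: b[aaa]n[a]n[a]
```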

Here you could stick with your approach but document the fact that / needs to be escaped. You'll need to document which characters are regexp operators anyway (as the user may need to escape them), that newline can't be matched, that newline must be escaped in the replacement, and the special behaviour of & and backslash there.
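To make those documented rules concrete, here is a small sketch of the sed approach. It is a variant of the question's stringer that reads its input from stdin instead of a hard-coded string (that change is my assumption, made only so the cases are easy to try). The caller is responsible for escaping / and for knowing what & and backslash mean in the replacement:

```shell
#!/bin/sh
# Variant of the question's function: same s/pattern/replace/g command,
# but the text to edit comes from standard input.
stringer() {
    sed "s/${1}/${2}/g"
}

# The caller escapes the delimiter in the pattern.
printf '%s\n' 'examp/e' | stringer '\/' 'l'    # prints: example

# An unescaped & in the replacement is the matched text.
printf '%s\n' 'banana' | stringer 'a' '[&]'    # prints: b[a]n[a]n[a]

# A backslash makes & literal.
printf '%s\n' 'banana' | stringer 'a' '\&'     # prints: b&n&n&
```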

Related Solutions

Command line tool for easy multiline regex search and replace

I'm not sure why Perl isn't acceptable here. On the inputs you provided, this line gives the output you asked for:

perl -0777p -e 's/.* > (.*) joined the channel\.\n(((?!.* \1 (was kicked from channel\.|was banned from channel\.)\n).*\n)+?.*\1 disconnected)/\2/mg' irc.txt

The -e argument is exactly the first argument to your magicregextool except that I added the /mg regex modifier. This may not be "unmodified" but it doesn't seem unreasonable either. If you don't want to type in the whole line, how about this script as magicregextool:

#!/usr/bin/perl -0777p
BEGIN { $::arg = shift @ARGV; }
eval $arg;

Or even:

#!/bin/sh
perl -0777pe "$@"

Then you just type:

magicregextool 's/.* > (.*) joined the channel\.\n(((?!.* \1 (was kicked from channel\.|was banned from channel\.)\n).*\n)+?.*\1 disconnected)/\2/mg' irc.txt

Which is the same as your sample (again other than adding the /mg modifier).
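One thing to watch in the /bin/sh wrapper is how the arguments are forwarded. An unquoted $* is split on whitespace, so a substitution that contains spaces would reach perl as several arguments, whereas "$@" passes each argument through intact. A minimal sketch of the difference (count_args and the two wrapper functions are hypothetical names, used only for the demonstration):

```shell
#!/bin/sh
# Hypothetical helper: reports how many arguments it received.
count_args() { echo "$#"; }

# Forwarding with unquoted $* re-splits every argument on whitespace.
wrapper_unquoted() { count_args $*; }

# Forwarding with "$@" keeps each original argument as one word.
wrapper_quoted() { count_args "$@"; }

wrapper_unquoted 's/a b/c/' file.txt   # prints: 3
wrapper_quoted   's/a b/c/' file.txt   # prints: 2
```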

An additional benefit to this is that if you are running multiple related search/replace operations on each file, you can put them together in the same script:

#!/usr/bin/perl -0777p
s/.* > (.*) joined the channel\.\n(((?!.* \1 (was kicked from channel\.|was banned from channel\.)\n).*\n)+?.*\1 disconnected)/\2/mg;
s/(some other\n)matched text/\1/mg;

How to use sed to replace two instances of the same digits separated by a slash with one instance of those digits

The sed that ships with OS X is quite annoying (it's actually the BSD version). I usually install GNU sed via brew:

$ brew search sed
==> Formulae
gnu-sed ✔             libxdg-basedir        minised               ssed

==> Casks
eclipse-dsl                                  marsedit
exoduseden                                   microsoft-bing-ads-editor
focused                                      osxfuse-dev
google-adwords-editor                        physicseditor
lego-mindstorms-education-ev3                prefs-editor
licensed                                     subclassed-mnemosyne

Install it:

$ brew install gnu-sed

You can then use it like so:

$ gsed ....

And voila, your example now works:

$ echo 'text (1984/1984) text' | sed -E 's_\(([0-9]{4})/\1\)_\(\1\)_g'
text (1984/1984) text
$ echo 'text (1984/1984) text' | gsed -E 's_\(([0-9]{4})/\1\)_\(\1\)_g'
text (1984) text
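If installing GNU sed is not an option, a basic regular expression may be worth trying instead of -E, since back references in the pattern are part of POSIX BREs. This is a sketch under that assumption: I would expect it to behave the same with BSD and GNU sed, but I have not checked it against every OS X release. The function name dedupe_year is hypothetical:

```shell
#!/bin/sh
# BRE version: \( \) group, \{4\} is the interval, plain ( and ) are literal.
# "_" is used as the delimiter so the / in the pattern needs no escaping.
dedupe_year() {
    sed 's_(\([0-9]\{4\}\)/\1)_(\1)_g'
}

echo 'text (1984/1984) text' | dedupe_year   # prints: text (1984) text
```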

References

Differences between sed on Mac OSX and other "standard" sed?
