OK, a general solution. The following bash function requires 2k
arguments; each pair consists of a placeholder and a replacement. It's up to you to quote the strings appropriately to pass them into the function. If the number of arguments is odd, an implicit empty argument will be added, which will effectively delete occurrences of the last placeholder.
Neither placeholders nor replacements may contain NUL characters, but you may use standard C \-escapes such as \0 if you need NULs (and consequently you are required to write \\ if you want a \).
It requires the standard build tools which should be present on a POSIX-like system (lex and cc).
replaceholder() {
local dir=$(mktemp -d)
( cd "$dir"
{ printf %s\\n "%option 8bit noyywrap nounput" "%%"
printf '"%s" {fputs("%s", yyout);}\n' "${@//\"/\\\"}"
printf %s\\n "%%" "int main(int argc, char** argv) { return yylex(); }"
} | lex && cc lex.yy.c
) && "$dir"/a.out
rm -fR "$dir"
}
We assume that \ is already escaped if necessary in the arguments, but we need to escape double quotes, if present. That's what the substitution in the second printf's arguments does. Since the lex default action is ECHO, we don't need to worry about text that matches no placeholder; it is copied through unchanged.
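For example, replaceholder A B B A feeds lex the following scanner source, which follows directly from the three printfs above:

```
%option 8bit noyywrap nounput
%%
"A" {fputs("B", yyout);}
"B" {fputs("A", yyout);}
%%
int main(int argc, char** argv) { return yylex(); }
```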
Example run (with timings for the skeptical; it's just a cheap-o commodity laptop):
$ time echo AB | replaceholder A B B A
BA
real 0m0.128s
user 0m0.106s
sys 0m0.042s
$ time printf %s\\n AB{0000..9999} | replaceholder A B B A > /dev/null
real 0m0.118s
user 0m0.117s
sys 0m0.043s
For larger inputs it might be useful to provide an optimization flag to cc, and for current POSIX compatibility it would be better to use c99. An even more ambitious implementation might try to cache the generated executables instead of generating them each time, but they're not exactly expensive to generate.
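Such a caching variant might look like this. It's only a sketch (the function name, cache location, and cksum-based key are my own choices, not part of the answer above): the compiled scanner is keyed on a checksum of the NUL-separated argument list and reused on later calls.

```shell
creplaceholder() {
  local cache=${TMPDIR:-/tmp}/replaceholder-cache
  mkdir -p "$cache" || return
  local key bin dir
  # hash the NUL-separated argument list to get a stable cache key
  key=$(printf '%s\0' "$@" | cksum | tr -s ' \t' '-')
  bin=$cache/$key
  if [ ! -x "$bin" ]; then
    dir=$(mktemp -d) || return
    ( cd "$dir" || exit
      { printf %s\\n "%option 8bit noyywrap nounput" "%%"
        printf '"%s" {fputs("%s", yyout);}\n' "${@//\"/\\\"}"
        printf %s\\n "%%" "int main(int argc, char** argv) { return yylex(); }"
      } | lex && cc -O2 -o "$bin" lex.yy.c
    )
    rm -rf "$dir"
  fi
  [ -x "$bin" ] && "$bin"
}
```

Note that a collision in the checksum would silently reuse the wrong scanner; a real implementation would want a stronger hash or a verification step.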
Edit
If you have tcc, you can avoid the hassle of creating a temporary directory, and enjoy the faster compile time, which will help on normal-sized inputs:
treplaceholder () {
tcc -run <(
{
printf %s\\n "%option 8bit noyywrap nounput" "%%"
printf '"%s" {fputs("%s", yyout);}\n' "${@//\"/\\\"}"
printf %s\\n "%%" "int main(int argc, char** argv) { return yylex(); }"
} | lex -t)
}
$ time printf %s\\n AB{0000..9999} | treplaceholder A B B A > /dev/null
real 0m0.039s
user 0m0.041s
sys 0m0.031s
Typically, when you get a > prompt on the next line after hitting Enter, it means that one of your quotes isn't closed yet. I couldn't find that mistake in your regex. But you do not need to surround the path /var/www_data/somepath/ with single quotes. I assume there are no unusual characters in somepath?
Anyways, I tested your regex with sed. \d and \w look like vim syntax to me, which is why I translated them to ASCII character classes (those always work). Also, inside [] you do not need to escape the . (but the - has to go last in the bracket expression to be taken literally):

sed -r "s/'([A-Za-z0-9_.-]+)(@domain\.com)'/'adsf'/g" test.dat
Indeed you can use sed or perl for your task. You don't necessarily need grep to generate a file list, unless you have GBs of data; then presorting could result in a speed benefit.
To test your regex, you could do the following:
cd /var/www_data/somepath/
sed -r 's|pattern|replace-pattern|g' a_single_file.php
When you're satisfied with the result, just add the -i.bak (--in-place=.bak) argument and run it on all files:

find . -type f \( -name '*.php' -o -name '*.ini' -o -name '*.conf' -o -name '*.sh' \) \
    -exec sed -r -i.bak 's|pattern|replace-pattern|g' '{}' \;

The original files are saved as <originalname.php>.bak.
To answer your last question: for this job, grep is the tool you want. You could run it on the .bak files generated by sed above:

grep --recursive --include='*.bak' -E --files-with-matches 'pattern' . > files_fixed.txt
or, simply:
find . -type f -name '*.bak'
Best Answer
While that can be used to change the regex delimiter, embedding variable expansions inside sed (or any other interpreter) code is a very unwise thing to do in the general case. First, here, the delimiter is not the only character that needs to be escaped: all the regular expression operators need to be as well.

But more importantly, and especially with GNU sed, that's a command injection vulnerability. If the content of $var is not under your control, it's just as bad as passing arbitrary data to eval. Try for instance:
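A sketch of the kind of payload that makes this concrete, assuming the vulnerable code interpolates $var as the replacement, e.g. sed "s/input/$var/" (the exact command may differ):

```shell
# DANGEROUS -- demonstration only, never do this with untrusted input.
# A crafted replacement smuggles GNU sed's `e' (execute) command into the script:
var='replacement/;e uname
s/a/a'
echo 'some input text' | sed "s/input/$var/"
# With GNU sed, this runs uname and prints its output before the substituted line.
```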
The uname command was run; thankfully a harmless one... this time. Non-GNU sed implementations can't run arbitrary commands, but they can overwrite any file (with the w command), which is virtually as bad.

A more correct way is to escape the problematic characters in $var first.

Another approach is to use perl. Note that we're not expanding the content of $var inside the code passed to perl (which would be another command injection vulnerability), but are letting perl expand its content as part of its regexp processing (and within \Q...\E, which means regexp operators are not treated specially). If
$var contains newline characters, that may only match if there's only one at the end. Alternatively, one may pass the -0777 option so the input is processed as a single record instead of line by line.
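For concreteness, here's a sketch of both safer approaches described above; the sample pattern, replacement, and input are illustrative, not from the question:

```shell
var='a.b*'          # untrusted string that must be matched literally
rep='X'

# 1. sed: escape the BRE operators (and the / delimiter) in the pattern,
#    and \, & and / in the replacement, before embedding them in the script.
#    (Incomplete if $var contains newlines -- those need extra care.)
esc_var=$(printf '%s\n' "$var" | sed 's|[][\.*^$/]|\\&|g')
esc_rep=$(printf '%s\n' "$rep" | sed 's|[\/&]|\\&|g')
printf '%s\n' 'a.b* and axb' | sed "s/$esc_var/$esc_rep/g"

# 2. perl: pass $var through the environment and quote it with \Q...\E,
#    so perl treats it as a literal string, never as code.
printf '%s\n' 'a.b* and axb' | VAR=$var perl -pe 's/\Q$ENV{VAR}\E/X/g'

# both print: X and axb
```

With the escaped sed version, the . and * no longer act as regexp operators, so axb is left alone; the perl version gets the same effect without any hand-rolled escaping.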