OK, a general solution. The following bash function requires 2k
arguments; each pair consists of a placeholder and a replacement. It's up to you to quote the strings appropriately to pass them into the function. If the number of arguments is odd, an implicit empty argument will be added, which will effectively delete occurrences of the last placeholder.
Neither placeholders nor replacements may contain NUL characters, but you may use standard C \-escapes such as \0 if you need NULs (and consequently you are required to write \\ if you want a \).
It requires the standard build tools which should be present on a POSIX-like system (lex and cc).
replaceholder() {
local dir=$(mktemp -d)
( cd "$dir"
{ printf %s\\n "%option 8bit noyywrap nounput" "%%"
printf '"%s" {fputs("%s", yyout);}\n' "${@//\"/\\\"}"
printf %s\\n "%%" "int main(int argc, char** argv) { return yylex(); }"
} | lex && cc lex.yy.c
) && "$dir"/a.out
rm -fR "$dir"
}
We assume that \ is already escaped if necessary in the arguments, but we need to escape double quotes, if present. That's what the substitution in the second printf's arguments does. Since the lex default action is ECHO, we don't need to worry about text that matches no placeholder; it is copied through unchanged.
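For example, replaceholder A B B A feeds lex the following scanner source, which follows directly from the three printfs above:

```
%option 8bit noyywrap nounput
%%
"A" {fputs("B", yyout);}
"B" {fputs("A", yyout);}
%%
int main(int argc, char** argv) { return yylex(); }
```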
Example run (with timings for the skeptical; it's just a cheap-o commodity laptop):
$ time echo AB | replaceholder A B B A
BA
real 0m0.128s
user 0m0.106s
sys 0m0.042s
$ time printf %s\\n AB{0000..9999} | replaceholder A B B A > /dev/null
real 0m0.118s
user 0m0.117s
sys 0m0.043s
For larger inputs it might be useful to provide an optimization flag to cc, and for current POSIX compatibility it would be better to use c99. An even more ambitious implementation might try to cache the generated executables instead of generating them each time, but they're not exactly expensive to generate.
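Such a caching variant might look like this. It's only a sketch (the function name, cache location, and cksum-based key are my own choices, not part of the answer above): the compiled scanner is keyed on a checksum of the NUL-separated argument list and reused on later calls.

```shell
creplaceholder() {
  local cache=${TMPDIR:-/tmp}/replaceholder-cache
  mkdir -p "$cache" || return
  local key bin dir
  # hash the NUL-separated argument list to get a stable cache key
  key=$(printf '%s\0' "$@" | cksum | tr -s ' \t' '-')
  bin=$cache/$key
  if [ ! -x "$bin" ]; then
    dir=$(mktemp -d) || return
    ( cd "$dir" || exit
      { printf %s\\n "%option 8bit noyywrap nounput" "%%"
        printf '"%s" {fputs("%s", yyout);}\n' "${@//\"/\\\"}"
        printf %s\\n "%%" "int main(int argc, char** argv) { return yylex(); }"
      } | lex && cc -O2 -o "$bin" lex.yy.c
    )
    rm -rf "$dir"
  fi
  [ -x "$bin" ] && "$bin"
}
```

Note that a collision in the checksum would silently reuse the wrong scanner; a real implementation would want a stronger hash or a verification step.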
Edit
If you have tcc, you can avoid the hassle of creating a temporary directory, and enjoy the faster compile time, which will help on normal-sized inputs:
treplaceholder () {
tcc -run <(
{
printf %s\\n "%option 8bit noyywrap nounput" "%%"
printf '"%s" {fputs("%s", yyout);}\n' "${@//\"/\\\"}"
printf %s\\n "%%" "int main(int argc, char** argv) { return yylex(); }"
} | lex -t)
}
$ time printf %s\\n AB{0000..9999} | treplaceholder A B B A > /dev/null
real 0m0.039s
user 0m0.041s
sys 0m0.031s
Typically, when you get a > prompt on the next line after hitting Enter, it means that one of your quotes isn't closed yet. I couldn't find that mistake in your regex. But you do not need to surround the path /var/www_data/somepath/ with single quotes. I assume there are no unusual characters in somepath?
Anyways, I tested your regex with sed. \d and \w look like vim syntax to me, which is why I translated them to ASCII character classes (those always work). Also, inside [] you do not need to escape the . (but the - has to go last in the bracket expression to be taken literally):

sed -r "s/'([A-Za-z0-9_.-]+)(@domain\.com)'/'adsf'/g" test.dat
Indeed you can use sed or perl for your task. You don't necessarily need grep to generate a file list, unless you have GBs of data; then presorting could result in a speed benefit.
To test your regex, you could do the following:
cd /var/www_data/somepath/
sed -r 's|pattern|replace-pattern|g' a_single_file.php
When you're satisfied with the result, just add the -i.bak (--in-place=.bak) argument and run it on all files:

find . -type f \( -name '*.php' -o -name '*.ini' -o -name '*.conf' -o -name '*.sh' \) \
    -exec sed -r -i.bak 's|pattern|replace-pattern|g' '{}' \;

The original files are saved as <originalname.php>.bak.
To answer your last question: for this job, grep is the tool you want. You could run it on the .bak files generated by sed above:

grep --recursive --include='*.bak' -E --files-with-matches 'pattern' . > files_fixed.txt
or, simply:
find . -type f -name '*.bak'
Best Answer
While that can be used to change the regex delimiter, embedding variable expansions inside sed (or any other interpreter) code is a very unwise thing to do in the general case. First, here, the delimiter is not the only character that needs to be escaped: all the regular expression operators need to be as well.

But more importantly, and especially with GNU sed, that's a command injection vulnerability. If the content of $var is not under your control, it's just as bad as passing arbitrary data to eval. Try for instance:
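A sketch of the kind of payload that makes this concrete, assuming the vulnerable code interpolates $var as the replacement, e.g. sed "s/input/$var/" (the exact command may differ):

```shell
# DANGEROUS -- demonstration only, never do this with untrusted input.
# A crafted replacement smuggles GNU sed's `e' (execute) command into the script:
var='replacement/;e uname
s/a/a'
echo 'some input text' | sed "s/input/$var/"
# With GNU sed, this runs uname and prints its output before the substituted line.
```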
The uname command was run; thankfully a harmless one... this time. Non-GNU sed implementations can't run arbitrary commands, but they can overwrite any file (with the w command), which is virtually as bad.

A more correct way is to escape the problematic characters in $var first.

Another approach is to use perl. Note that we're not expanding the content of $var inside the code passed to perl (which would be another command injection vulnerability), but are letting perl expand its content as part of its regexp processing (and within \Q...\E, which means regexp operators are not treated specially). If
$var contains newline characters, that may only match if there's only one at the end. Alternatively, one may pass the -0777 option so the input is processed as a single record instead of line by line.
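For concreteness, here's a sketch of both safer approaches described above; the sample pattern, replacement, and input are illustrative, not from the question:

```shell
var='a.b*'          # untrusted string that must be matched literally
rep='X'

# 1. sed: escape the BRE operators (and the / delimiter) in the pattern,
#    and \, & and / in the replacement, before embedding them in the script.
#    (Incomplete if $var contains newlines -- those need extra care.)
esc_var=$(printf '%s\n' "$var" | sed 's|[][\.*^$/]|\\&|g')
esc_rep=$(printf '%s\n' "$rep" | sed 's|[\/&]|\\&|g')
printf '%s\n' 'a.b* and axb' | sed "s/$esc_var/$esc_rep/g"

# 2. perl: pass $var through the environment and quote it with \Q...\E,
#    so perl treats it as a literal string, never as code.
printf '%s\n' 'a.b* and axb' | VAR=$var perl -pe 's/\Q$ENV{VAR}\E/X/g'

# both print: X and axb
```

With the escaped sed version, the . and * no longer act as regexp operators, so axb is left alone; the perl version gets the same effect without any hand-rolled escaping.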