Answer
If speed is not a concern (this is a one-time task), you could try this:
cat map.txt | while IFS= read -r line; do
    neww=${line##* }    # last word on the line: the replacement
    oldw=${line%% *}    # first word on the line: the word to replace
    find /some/folder -type f -exec sed -i "s/$oldw/$neww/g" {} \;
done
Not optimal, I know... :-P
PS: check in a test folder to see if it works!
Explanation
Basically:
- Cat the file map.txt.
- Read each line and get the word to be replaced, $oldw, and the replacement, $neww.
- For each pair, execute the find command you were already using (notice the double quotes this time in order to allow variable substitution).
About parameter expansion
In order to set the variables $oldw and $neww we have to get the first and last word of each line. To do so, we use parameter expansion (a pure Bash implementation), although we could have used other ways to get the first and last word of the string (e.g. cut or awk).
${line##* }: from the variable line, remove the largest prefix (double #) matching the pattern, where the pattern is any characters (*) followed by a space ( ). So we get the last word in line.
${line%% *}: from the variable line, remove the largest suffix (double %) matching the pattern, where the pattern is a space ( ) followed by any characters (*). So we get the first word in line.
Words were separated by a space in this case, but we could have used any separator.
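As a quick illustration of the two expansions, here is a minimal sketch. The sample strings and the colon separator are made up for the example and are not taken from map.txt.

```shell
line='colour color'

oldw=${line%% *}   # strip the longest suffix matching " *": keeps the first word
neww=${line##* }   # strip the longest prefix matching "* ": keeps the last word
printf '%s -> %s\n' "$oldw" "$neww"   # prints: colour -> color

# The same idea with a different separator (a colon here).
pair='colour:color'
printf '%s -> %s\n' "${pair%%:*}" "${pair##*:}"   # prints: colour -> color
```

Note that with more than two words on a line, only the first and the last are picked up; anything in between is ignored.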
OK, a general solution. The following bash function requires 2k arguments; each pair consists of a placeholder and a replacement. It's up to you to quote the strings appropriately to pass them into the function. If the number of arguments is odd, an implicit empty argument will be added, which will effectively delete occurrences of the last placeholder.
Neither placeholders nor replacements may contain NUL characters, but you may use standard C \-escapes such as \0 if you need NULs (and consequently you are required to write \\ if you want a \).
It requires the standard build tools which should be present on a POSIX-like system (lex and cc).
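replaceholder() {
    local dir=$(mktemp -d)    # scratch directory for lex.yy.c and a.out
    ( cd "$dir"
      # Emit a lex spec with one rule per placeholder/replacement pair.
      { printf %s\\n "%option 8bit noyywrap nounput" "%%"
        printf '"%s" {fputs("%s", yyout);}\n' "${@//\"/\\\"}"
        printf %s\\n "%%" "int main(int argc, char** argv) { return yylex(); }"
      } | lex && cc lex.yy.c
    ) && "$dir"/a.out         # run the generated filter on stdin
    rm -fR "$dir"
}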
We assume that \ is already escaped if necessary in the arguments, but we need to escape double quotes, if present. That's what the second argument to the second printf does. Since the lex default action is ECHO, text that matches no placeholder is copied through unchanged, so we don't need to worry about it.
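To see what the function hands to lex, the generator part can be run on its own. This is only a sketch that reuses the three printf lines from the function above; the name gen_lex_spec is invented for illustration, and neither lex nor cc is needed to run it.

```shell
# Hypothetical helper: print the lex specification that would be piped to lex.
gen_lex_spec() {
    printf '%s\n' "%option 8bit noyywrap nounput" "%%"
    # printf reuses its format for every pair of arguments,
    # so each placeholder/replacement pair becomes one rule.
    printf '"%s" {fputs("%s", yyout);}\n' "${@//\"/\\\"}"
    printf '%s\n' "%%" "int main(int argc, char** argv) { return yylex(); }"
}

gen_lex_spec A B B A
```

With the arguments A B B A, the rules section contains one line mapping "A" to B and one mapping "B" to A. Because the scanner consumes each input character once, the two rules swap the letters instead of undoing each other.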
Example run (with timings for the skeptical; it's just a cheap-o commodity laptop):
$ time echo AB | replaceholder A B B A
BA
real 0m0.128s
user 0m0.106s
sys 0m0.042s
$ time printf %s\\n AB{0000..9999} | replaceholder A B B A > /dev/null
real 0m0.118s
user 0m0.117s
sys 0m0.043s
For larger inputs it might be useful to provide an optimization flag to cc, and for current POSIX compatibility, it would be better to use c99. An even more ambitious implementation might try to cache the generated executables instead of generating them each time, but they're not exactly expensive to generate.
Edit
If you have tcc, you can avoid the hassle of creating a temporary directory, and enjoy the faster compile time which will help on normal sized inputs:
treplaceholder () {
    tcc -run <(
        {
            printf %s\\n "%option 8bit noyywrap nounput" "%%"
            printf '"%s" {fputs("%s", yyout);}\n' "${@//\"/\\\"}"
            printf %s\\n "%%" "int main(int argc, char** argv) { return yylex(); }"
        } | lex -t)
}
$ time printf %s\\n AB{0000..9999} | treplaceholder A B B A > /dev/null
real 0m0.039s
user 0m0.041s
sys 0m0.031s
Best Answer
1. Replacing all occurrences of one string with another in all files in the current directory:
These are for cases where you know that the directory contains only regular files and that you want to process all non-hidden files. If that is not the case, use the approaches in 2.
All sed solutions in this answer assume GNU sed. If using FreeBSD or macOS, replace -i with -i ''. Also note that the use of the -i switch with any version of sed has certain filesystem security implications and is inadvisable in any script which you plan to distribute in any way.
Non-recursive, files in this directory only:
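The command for this case did not survive in the text, so here is a minimal sketch of what it could look like, assuming GNU sed. The temporary directory and file names are made up so that the example does not touch real files.

```shell
# Hypothetical sandbox with two sample files.
demo=$(mktemp -d)
printf 'foo one\n' > "$demo/a.txt"
printf 'two foo foo\n' > "$demo/b.txt"

# Replace every foo with bar in all non-hidden files of the directory.
# -- marks the end of options, so a file name starting with - is not read as a flag.
( cd "$demo" && sed -i -- 's/foo/bar/g' * )

# A Perl counterpart would be along the lines of:
#   perl -i -pe 's/foo/bar/g' ./*

cat "$demo/a.txt" "$demo/b.txt"
```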
(the perl one will fail for file names ending in | or space).
Recursive, regular files (including hidden ones) in this and all subdirectories:
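A sketch of the recursive case using find, again assuming GNU sed and a made-up sandbox directory:

```shell
# Hypothetical sandbox with nested and hidden files.
demo=$(mktemp -d)
mkdir -p "$demo/sub/deeper"
printf 'foo\n' > "$demo/top.txt"
printf 'foo\n' > "$demo/sub/.hidden"
printf 'a foo b\n' > "$demo/sub/deeper/leaf.txt"

# -type f keeps regular files only (hidden ones included);
# {} + passes many file names to each sed invocation.
find "$demo" -type f -exec sed -i 's/foo/bar/g' {} +

cat "$demo/top.txt" "$demo/sub/.hidden" "$demo/sub/deeper/leaf.txt"
```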
If you are using zsh:
(may fail if the list is too big, see zargs to work around).
Bash can't check directly for regular files, so a loop is needed (running it in a subshell avoids setting the options globally):
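A sketch of such a loop, assuming bash 4 or newer (for globstar) and GNU sed; the sandbox directory is made up for the example:

```shell
# Hypothetical sandbox.
demo=$(mktemp -d)
mkdir -p "$demo/sub"
printf 'foo\n' > "$demo/.dotfile"
printf 'foo foo\n' > "$demo/sub/x.txt"

(
    cd "$demo" || exit 1
    # globstar makes ** recurse into subdirectories;
    # dotglob lets globs match hidden names as well.
    shopt -s globstar dotglob
    for file in **; do
        if [[ -f $file && -w $file ]]; then
            sed -i -- 's/foo/bar/g' "$file"
        fi
    done
)

cat "$demo/.dotfile" "$demo/sub/x.txt"
```

Because shopt is called inside the parentheses, the options are gone again once the subshell exits.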
The files are selected when they are actual files (-f) and they are writable (-w).
2. Replace only if the file name matches another string / has a specific extension / is of a certain type etc:
Non-recursive, files in this directory only:
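For instance, restricting the edit to file names that contain baz could look like this. It is a sketch assuming GNU sed; the pattern *baz* and the sample files are illustrative.

```shell
# Hypothetical sandbox: one matching and one non-matching file name.
demo=$(mktemp -d)
printf 'foo\n' > "$demo/abazc.txt"
printf 'foo\n' > "$demo/other.txt"

# Only files whose name contains baz are edited.
( cd "$demo" && sed -i -- 's/foo/bar/g' *baz* )

cat "$demo/abazc.txt" "$demo/other.txt"
```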
Recursive, regular files in this and all subdirectories
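With find, the file name test can be expressed with -name. A sketch, assuming GNU sed and made-up sample files:

```shell
# Hypothetical sandbox.
demo=$(mktemp -d)
mkdir -p "$demo/sub"
printf 'foo\n' > "$demo/sub/notes_baz.txt"
printf 'foo\n' > "$demo/sub/plain.txt"

# -name filters on the file name; quote the pattern so the shell does not expand it.
find "$demo" -type f -name '*baz*' -exec sed -i 's/foo/bar/g' {} +

cat "$demo/sub/notes_baz.txt" "$demo/sub/plain.txt"
```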
If you are using bash (running it in a subshell avoids setting the options globally):
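A sketch of the bash variant, assuming bash 4 or newer and GNU sed. It also assumes that no directory name matches the pattern, since sed -i cannot edit a directory.

```shell
# Hypothetical sandbox.
demo=$(mktemp -d)
mkdir -p "$demo/sub"
printf 'foo\n' > "$demo/top_baz.txt"
printf 'foo\n' > "$demo/sub/.baz_hidden"
printf 'foo\n' > "$demo/sub/plain.txt"

(
    cd "$demo" || exit 1
    shopt -s globstar dotglob
    # **/ matches zero or more directory levels, so top-level files are included.
    sed -i -- 's/foo/bar/g' **/*baz*
)

cat "$demo/top_baz.txt" "$demo/sub/.baz_hidden" "$demo/sub/plain.txt"
```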
If you are using zsh:
The -- serves to tell sed that no more flags will be given on the command line. This is useful to protect against file names starting with -.
If a file is of a certain type, for example, executable (see man find for more options):
zsh:
3. Replace only if the string is found in a certain context
Replace foo with bar only if there is a baz later on the same line:
In sed, using \( \) saves whatever is in the parentheses and you can then access it with \1. There are many variations of this theme, to learn more about such regular expressions, see here.
Replace foo with bar only if foo is found on the 3rd column (field) of the input file (assuming whitespace-separated fields):
(needs gawk 4.1.0 or newer).
For a different field just use $N where N is the number of the field of interest. For a different field separator (: in this example) use:
Another solution using perl:
NOTE: both the awk and perl solutions will affect spacing in the file (remove the leading and trailing blanks, and convert sequences of blanks to one space character in those lines that match). For a different field, use $F[N-1] where N is the field number you want and for a different field separator use (the $"=":" sets the output field separator to :):
Replace foo with bar only on the 4th line:
4. Multiple replace operations: replace with different strings
You can combine sed commands:
Be aware that order matters (sed 's/foo/bar/g; s/bar/baz/g' will substitute foo with baz).
or Perl commands:
If you have a large number of patterns, it is easier to save your patterns and their replacements in a sed script file:
Or, if you have too many pattern pairs for the above to be feasible, you can read pattern pairs from a file (two space-separated patterns, $pattern and $replacement, per line):
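Both variants might be sketched as follows, assuming GNU sed. The file names script.sed, patterns.txt and the two target files are invented for the example.

```shell
# Hypothetical sandbox.
demo=$(mktemp -d)
printf 'foo says hello\n' > "$demo/script_target.txt"
printf 'foo says hello\n' > "$demo/loop_target.txt"

# Variant 1: keep the substitutions in a sed script file and apply it with -f.
printf '%s\n' 's/foo/bar/g' 's/hello/goodbye/g' > "$demo/script.sed"
sed -i -f "$demo/script.sed" "$demo/script_target.txt"

# Variant 2: read "pattern replacement" pairs from a file, one pair per line.
printf '%s\n' 'foo bar' 'hello goodbye' > "$demo/patterns.txt"
while read -r pattern replacement; do
    # One sed run per pair: simple, but slow for long lists.
    sed -i "s/$pattern/$replacement/g" "$demo/loop_target.txt"
done < "$demo/patterns.txt"

cat "$demo/script_target.txt" "$demo/loop_target.txt"
```

The loop passes the pattern to sed unescaped, so it only behaves as expected when the pairs contain no / and no regular expression metacharacters.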
That will be quite slow for long lists of patterns and large data files so you might want to read the patterns and create a sed script from them instead. The following assumes a <space> delimiter separates a list of MATCH<space>REPLACE pairs occurring one-per-line in the file patterns.txt:
The above format is largely arbitrary and, for example, doesn't allow for a <space> in either of MATCH or REPLACE. The method is very general though: basically, if you can create an output stream which looks like a sed script, then you can source that stream as a sed script by specifying sed's script file as - (stdin).
You can combine and concatenate multiple scripts in similar fashion:
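A sketch of the technique, assuming GNU sed (which accepts - as the script file name and then reads the script from stdin). The file names and the extra inline expression are illustrative.

```shell
# Hypothetical sandbox.
demo=$(mktemp -d)
printf '%s\n' 'foo bar' 'hello goodbye' > "$demo/patterns.txt"
printf 'foo says hello twice\n' > "$demo/editfile"

# The first sed turns each "MATCH REPLACE" line into an s/MATCH/REPLACE/g command.
# The second sed concatenates an inline -e expression with that generated
# script, which it reads from stdin via -f -.
sed 's|^ *\([^ ]*\)  *\([^ ]*\).*|s/\1/\2/g|' "$demo/patterns.txt" |
    sed -e 's/twice/once/' -f - "$demo/editfile" > "$demo/outfile"

cat "$demo/outfile"
```

The data file is read only once here, however many pairs patterns.txt contains, which is where the speed-up over the loop comes from.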
A POSIX sed will concatenate all scripts into one in the order they appear on the command-line. None of these need end in a newline.
grep can work the same way:
When working with fixed-strings as patterns, it is good practice to escape regular expression metacharacters. You can do this rather easily:
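One way to do the escaping is sketched below, assuming basic regular expressions and / as the delimiter of the final s command. The helper names escape_bre and escape_repl are invented for this example.

```shell
# Hypothetical helper: backslash-escape BRE metacharacters and the / delimiter.
escape_bre() {
    printf '%s\n' "$1" | sed 's|[][\.*^$/]|\\&|g'
}

# Hypothetical helper: escape what is special on the replacement side (\ / &).
escape_repl() {
    printf '%s\n' "$1" | sed 's|[\/&]|\\&|g'
}

match='1.50'
repl='N/A'

# Without escaping, the dot in 1.50 would also match 1x50.
result=$(printf 'price: 1.50 (was 1x50)\n' |
    sed "s/$(escape_bre "$match")/$(escape_repl "$repl")/g")
printf '%s\n' "$result"   # prints: price: N/A (was 1x50)
```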
5. Multiple replace operations: replace multiple patterns with the same string
Replace any of foo, bar or baz with foobar:
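A sketch using alternation in an extended regular expression, assuming a sed that supports -E (GNU sed does); the input line is made up:

```shell
# -E enables extended regular expressions, where | means "or".
out=$(printf 'foo bar baz qux\n' | sed -E 's/foo|bar|baz/foobar/g')
printf '%s\n' "$out"   # prints: foobar foobar foobar qux
```

Text that has already been replaced is not scanned again, so the foo and bar inside each new foobar are left alone.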