How to quote special characters (portably)
The following snippet uses sed to put a backslash before each of the characters ][()\.^$?*+, which are special in extended regular expressions. Every occurrence of one of these characters is replaced by a backslash followed by that character:
raw_string='test[string]\.wibble'
quoted_string=$(printf %s "$raw_string" | sed 's/[][()\.^$?*+]/\\&/g')
Command substitution strips trailing newlines, so any newlines at the end of $raw_string are lost. If that's a problem, add an inert character at the end so that the string no longer ends with a newline, then strip off that character:
quoted_string=$(printf %sa "$raw_string" | sed 's/[][()\.^$?*+]/\\&/g')
quoted_string=${quoted_string%?}
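As a quick check of the effect, here is a minimal sketch that uses the quoted string as a pattern for grep -E. The sample lines and the use of grep are assumptions made for illustration. Note that the bracket expression above does not cover |, { or }, which are also special in extended regular expressions, so the sample string avoids them.

```shell
#!/bin/sh
raw_string='test[string]\.wibble'
quoted_string=$(printf %s "$raw_string" | sed 's/[][()\.^$?*+]/\\&/g')
printf 'quoted: %s\n' "$quoted_string"

# Two sample lines (made up for this demo): the literal string itself, and a
# line that the unquoted string matches when it is interpreted as a regex.
sample() { printf '%s\n' 'test[string]\.wibble' 'tests.wibble'; }

# Unquoted: [string] acts as a bracket expression and \. as a literal dot
raw_match=$(sample | grep -E -e "$raw_string")
# Quoted: only the literal text matches
quoted_match=$(sample | grep -E -e "$quoted_string")

printf 'raw pattern matched:    %s\n' "$raw_match"
printf 'quoted pattern matched: %s\n' "$quoted_match"
```

The unquoted pattern picks up the wrong line, while the quoted one matches only the literal text.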
How to quote special characters (in bash or zsh)
Bash and zsh have a pattern replacement feature, which can be faster if the string is not very long. It's cumbersome here because the replacement must be a string, so each character needs to be replaced separately. Note that you must escape the backslashes first.
quoted_string=${raw_string//\\/\\\\}
for c in \[ \] \( \) \. \^ \$ \? \* \+; do
quoted_string=${quoted_string//"$c"/"\\$c"}
done
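One way to convince yourself that the loop is equivalent to the sed version is to run both on the same input. This is only a sketch, assuming bash; quote_ere is a hypothetical helper name and the sample string is made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stores its result in the global variable quoted_string
quote_ere() {
  local c
  # Backslashes first, so the ones added by the loop are not doubled
  quoted_string=${1//\\/\\\\}
  for c in \[ \] \( \) \. \^ \$ \? \* \+; do
    quoted_string=${quoted_string//"$c"/"\\$c"}
  done
}

raw_string='a.b*c\d[e]+(f)^$?'
quote_ere "$raw_string"
sed_quoted=$(printf %s "$raw_string" | sed 's/[][()\.^$?*+]/\\&/g')

printf 'bash: %s\nsed:  %s\n' "$quoted_string" "$sed_quoted"
if [ "$quoted_string" = "$sed_quoted" ]; then
  echo "both methods agree"
fi
```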
How to quote special characters (in ksh93)
Ksh's string replacement construct is more powerful than the watered-down version in bash and zsh. It supports references to groups in the pattern.
quoted_string=${raw_string//@([][()\.^$?*+])/\\\1}
What you actually want
You don't need find here: shell patterns are sufficient to match files ending with three digits. If no part file exists, the glob pattern is left unexpanded. There's also a simpler way of adding the file sizes: rather than use stat (which exists on many unix variants but has a different syntax on each) and do complex pipelining to sum the values, you can call wc -c (on regular files, on most systems, wc will look at the file size and not bother to open the file and read the bytes).
set -- "$DESTINATION/$FILE_BASENAME".[0-9][0-9][0-9]
case $1 in
*\]) # The glob was left intact, so no part exists
do_split …;;
*) # The glob was expanded, so at least one part exists
# /dev/null forces a "total" line even when only one part exists
FILE_SIZE_EXISTING=$(wc -c "$@" /dev/null | sed -n '$s/[^0-9]//gp')
if [ "$FILE_SIZE_EXISTING" -ne "$(wc -c <"$DESTINATION/$FILE_BASENAME")" ]; then
do_split …
fi;;
esac
Note that your test on the total size is not very reliable: if the file has changed but remained the same size, you'll end up with stale parts. That's ok if the files never change and the only risk is that parts may be truncated or missing.
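To see the two mechanisms in isolation (the wc -c total and the unexpanded glob), here is a small self-contained sketch. It runs in a temporary directory, all file names are made up, and it assumes mktemp -d is available and that the shell leaves unmatched globs untouched, as sh and bash do by default.

```shell
#!/bin/sh
# Throwaway demo; every file name here is made up for illustration.
tmp=$(mktemp -d) || exit 1
printf 'abcdefghij' >"$tmp/data"        # 10-byte "original"
printf 'abcd' >"$tmp/data.000"          # three parts: 4 + 4 + 2 bytes
printf 'efgh' >"$tmp/data.001"
printf 'ij'   >"$tmp/data.002"

set -- "$tmp/data".[0-9][0-9][0-9]
part_count=$#

# With several operands, the last line of wc output is the "total" line
parts_size=$(wc -c "$@" | sed -n '$s/[^0-9]//gp')
orig_size=$(wc -c <"$tmp/data")

if [ "$parts_size" -eq "$orig_size" ]; then
    result="sizes match"
else
    result="sizes differ"
fi
echo "$part_count parts, $parts_size bytes in total: $result"

# An unmatched glob stays as the literal pattern, which ends in "]"
set -- "$tmp/missing".[0-9][0-9][0-9]
case $1 in
    *\]) glob_state="left intact (no parts)";;
    *)   glob_state="expanded";;
esac
echo "glob for a missing file: $glob_state"

rm -rf "$tmp"
```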
That was a hard one. Assuming you have a file like this:
$ cat file
word
line with a word and words and wording wordy words.
Where:
- Line 1 is the search pattern, which is kept in the hold space and gets quoted as `word`.
- Line 2 is the line to search and replace globally.
The sed command:
sed -n '1h; 2{x;G;:l;s/^\([^\n]\+\)\n\(.*[^`]\)\1\([^`]\)/\1\n\2`\1`\3/;tl;p}' file
Explanation:
- 1h; saves the first line to the hold space (this is what we want to search for). The hold space now contains:
word
- 2{...} applies the enclosed commands to the second line.
- x; exchanges the pattern space and the hold space.
- G; appends the hold space to the pattern space. The pattern space now contains:
word # I will call this line the "pattern line" from now on
line with a word and words and wording wordy words.
- :l; sets a label called l as the point to jump back to later.
- s/// does the actual search and replace in the pattern space shown above:
  - ^\([^\n]\+\)\n matches, from the beginning of the pattern space (^), one or more (\+) characters that are not a newline ([^\n]), up to a newline (\n). The match is stored in the back-reference \1, which therefore contains the "pattern line".
  - \(.*[^`]\) matches any characters (.*) followed by one character that is not a backtick ([^`]). This is stored in \2. Because .* is greedy, \2 now contains "line with a word and words and wording wordy ", that is, everything up to the last occurrence of word, because...
  - \1 is the next search term: the back-reference to what the "pattern line" contains, here word.
  - \([^`]\) requires that this is followed by another character that is not a backtick, saved in \3. Without this (and the [^`] at the end of \2), we would end up in an endless loop quoting the same word again and again (````word````), because s/// would always be successful and tl; would always jump back to :l (see tl; further down).
  - \1\n\2`\1`\3 is the replacement: everything matched above is put back from the back-references. The second \1 is the one that gets quoted (the first one is the "pattern line").
- tl; if the s/// was successful (something was replaced), jumps to the label l and starts again, until there is nothing left to search and replace. That is the case once all occurrences of word are quoted.
- p; once all is done, prints the altered pattern space.
The output:
$ sed -n '1h; 2{x;G;:l;s/^\([^\n]\+\)\n\(.*[^`]\)\1\([^`]\)/\1\n\2`\1`\3/;tl;p}' file
word
line with a `word` and `word`s and `word`ing `word`y `word`s.
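To experiment with other search words, the command can be wrapped in a small function. This is only a sketch: quote_matches is a hypothetical name, the second sed call just drops the "pattern line" from the output, and GNU sed is assumed, since \+ and \n inside a bracket expression are GNU extensions.

```shell
#!/bin/sh
# Assumes GNU sed (\+ and \n inside [...] are GNU extensions).
# quote_matches is a hypothetical wrapper name used only for this demo.
quote_matches() {
    # $1 = search word, $2 = line to process; prints only the rewritten line
    printf '%s\n' "$1" "$2" |
        sed -n '1h; 2{x;G;:l;s/^\([^\n]\+\)\n\(.*[^`]\)\1\([^`]\)/\1\n\2`\1`\3/;tl;p}' |
        sed -n '2p'
}

line='line with a word and words and wording wordy words.'
out_word=$(quote_matches word "$line")
out_and=$(quote_matches and "$line")
printf '%s\n' "$out_word" "$out_and"
```

Since the search word is used as a regular expression, a word containing special characters would need to be escaped first.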
Best Answer
This would be a job for csplit, except that the regex has to match within a single line. That also makes sed difficult; I'd go with Perl or Python. You could see if a single-line csplit pattern is good enough for your purposes. (csplit requires a POSIX BRE, so it can't use \d or +, among others.)