How to quote special characters (portably)
The following snippet adds a backslash before each character that's special in extended regular expressions, using sed to replace any occurrence of one of the characters ][()\.^$?*+ by a backslash followed by that character:
raw_string='test[string]\.wibble'
quoted_string=$(printf %s "$raw_string" | sed 's/[][()\.^$?*+]/\\&/g')
This will remove trailing newlines in $raw_string, because command substitution strips them; if that's a problem, ensure that the string doesn't end with a newline by adding an inert character at the end, then strip off that character.
quoted_string=$(printf %sa "$raw_string" | sed 's/[][()\.^$?*+]/\\&/g')
quoted_string=${quoted_string%?}
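As a minimal sketch, the portable snippet can be wrapped in a function and used to build a grep -E pattern that matches a string literally. The function name ere_quote and the variable _q are hypothetical, not part of the original snippet; the character set is the one used above.

```shell
# Hypothetical helper: print "$1" with each listed ERE-special character backslash-escaped.
ere_quote() {
    # Append an inert "a" so that command substitution cannot strip trailing newlines
    _q=$(printf %sa "$1" | sed 's/[][()\.^$?*+]/\\&/g')
    # Remove the inert character again
    printf %s "${_q%?}"
}

printf '%s\n' "$(ere_quote 'test[string]\.wibble')"   # prints: test\[string\]\\\.wibble

# Match the string a.b literally: only the first input line is printed
printf '%s\n' 'a.b' 'axb' | grep -E -e "^$(ere_quote 'a.b')\$"
```

POSIX sh has no local variables, so _q leaks into the caller's environment; in bash or ksh you would probably declare it local.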
How to quote special characters (in bash or zsh)
Bash and zsh have a pattern replacement feature, which can be faster if the string is not very long. It's cumbersome here because the replacement must be a string, so each character needs to be replaced separately. Note that you must escape the backslashes first.
quoted_string=${raw_string//\\/\\\\}
for c in \[ \] \( \) \. \^ \$ \? \* \+; do
  quoted_string=${quoted_string//"$c"/"\\$c"}
done
How to quote special characters (in ksh93)
Ksh's string replacement construct is more powerful than the watered-down version in bash and zsh. It supports references to groups in the pattern.
quoted_string=${raw_string//@([][()\.^$?*+])/\\\1}
What you actually want
You don't need find here: shell patterns are sufficient to match files ending with three digits. If no part file exists, the glob pattern is left unexpanded. There's also a simpler way of adding the file sizes: rather than use stat (which exists on many unix variants but has a different syntax on each) and do complex pipelining to sum the values, you can call wc -c (on regular files, on most systems, wc will look at the file size and not bother to open the file and read the bytes).
set -- "$DESTINATION/$FILE_BASENAME".[0-9][0-9][0-9]
case $1 in
  *\]) # The glob was left intact, so no part exists
    do_split …;;
  *) # The glob was expanded, so at least one part exists
    # /dev/null guarantees a final "total" line even if there is only one part
    FILE_SIZE_EXISTING=$(wc -c "$@" /dev/null | sed -n '$s/[^0-9]//gp')
    if [ "$FILE_SIZE_EXISTING" -ne "$(wc -c <"$DESTINATION/$FILE_BASENAME")" ]; then
      do_split …
    fi;;
esac
Note that your test on the total size is not very reliable: if the file has changed but remained the same size, you'll end up with stale parts. That's ok if the files never change and the only risk is that parts may be truncated or missing.
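The glob-and-wc check can be tried in isolation with a small sketch like the one below. The function name parts_size and the file names are hypothetical, and mktemp -d is assumed to be available (it is common but not required by POSIX). The function passes /dev/null to wc as an extra argument so that a total line is always printed, even with a single part.

```shell
# Hypothetical helper: print the combined size in bytes of the parts "$1.000", "$1.001", ...
# Prints nothing and returns 1 if no part exists.
parts_size() {
    set -- "$1".[0-9][0-9][0-9]
    case $1 in
        *\]) return 1 ;;   # the glob was left intact, so no part exists
    esac
    # The last line of the wc output is the total; keep only its digits
    wc -c "$@" /dev/null | sed -n '$s/[^0-9]//gp'
}

tmp=$(mktemp -d)
printf 'hello' >"$tmp/data.000"
printf 'world!' >"$tmp/data.001"
parts_size "$tmp/data"                        # prints: 11
parts_size "$tmp/missing" || echo "no parts"  # prints: no parts
rm -r "$tmp"
```

This relies on the shell's default globbing behaviour; with options such as bash's nullglob or failglob enabled, an unmatched pattern is not left intact and the case test would need to change.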
-e is strictly the flag for indicating the pattern you want to match against. -E controls whether you need to escape certain special characters. man grep explains -E a bit more:
Basic vs Extended Regular Expressions
In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).
Traditional egrep did not support the { meta-character, and some egrep implementations support \{ instead, so portable scripts should avoid { in grep -E patterns and should use [{] to match a literal {.
GNU grep -E attempts to support traditional usage by assuming that { is not special if it would be the start of an invalid interval specification. For example, the command grep -E '{1' searches for the two-character string {1 instead of reporting a syntax error in the regular expression. POSIX.2 allows this behavior as an extension, but portable scripts should avoid it.
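A short illustration of the difference, using made-up input lines: -e only marks the next argument as the pattern, while -E changes how characters such as + are interpreted.

```shell
# -e alone: still a basic regular expression, so + is an ordinary character
printf '%s\n' 'aaa' 'a+' | grep -e 'a+'       # prints: a+

# -E: extended syntax, where + means "one or more of the preceding item"
printf '%s\n' 'aaa' 'a+' | grep -E -e 'a+'    # prints both lines: aaa, then a+

# -e is also what lets a pattern begin with a dash without being taken as an option
printf '%s\n' '-v' 'x' | grep -e '-v'         # prints: -v
```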
Best Answer
Be aware that matching email addresses is a LOT harder than what you have. See an excerpt from the Mastering Regular Expressions book.
However, to answer your question: for a basic regular expression, your quantifiers need to be one of *, \+ or \{m,n\} (with the backslashes). You also need to quote the pattern variable.
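Both points can be sketched as follows. The pattern is a deliberately simplistic, hypothetical stand-in (three or more lowercase letters, an @, then at least one lowercase letter) and is nowhere near a real email matcher. It uses \{3,\} and *, which POSIX specifies for basic regular expressions; \+ is a GNU extension.

```shell
# Hypothetical pattern, written as a basic regular expression
pattern='^[a-z]\{3,\}@[a-z][a-z]*$'

# Quote the variable: unquoted, $pattern would undergo word splitting and
# filename expansion, and [a-z]* could be replaced by matching file names
printf '%s\n' 'bob@example' 'al@example' 'bob@' | grep -e "$pattern"   # prints: bob@example
```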