I encountered a problem with variable substitution in the BASH shell.
Say you define a variable a. Then the command
$> echo ${a//[0-4]/}
prints its value with every digit from 0 to 4 removed:
$> a="Hello1265-3World"
$> echo ${a//[0-4]/}
Hello65-World
This seems to work just fine, but let's take a look at the next example:
$> b="你1265-3好"
$> echo ${b//[0-4]/}
你1265-3好
The substitution did not take place; I assume that is because b contains CJK characters. The issue extends to every case in which square brackets are involved. Surprisingly, variable substitution without square brackets works fine in both cases:
$> a="Hello1265-3World"
$> echo ${a//2/}
Hello165-3World
$> b="你1265-3好"
$> echo ${b//2/}
你165-3好
Is it a bug or am I missing something?
I use Lubuntu 12.04, the terminal is lxterminal, and echo $BASH_VERSION returns 4.2.24(1)-release.
EDIT: Andrew Johnson stated in a comment that with gnome-terminal and Bash 4.2.37(1)-release the command works fine. I wonder whether the problem lies in lxterminal or in the specific 4.2.24(1)-release version of Bash.
EDIT: I tried it with gnome-terminal on Lubuntu 12.04, but the problem is still there.
Best Answer
Short answer:
Set LC_ALL=C to get the behaviour you expect.
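As a rough sketch of what that could look like in practice, the snippet below switches to the C locale inside a command substitution only, so the rest of the session keeps its normal locale. The variable name stripped is just an illustration, and the string is the one from the question.

```shell
#!/usr/bin/env bash
# String from the question: CJK characters around ASCII digits.
b="你1265-3好"

# The command substitution runs in a subshell, so assigning LC_ALL here
# does not leak into the calling shell. In the C locale the range [0-4]
# is compared by byte value.
stripped=$(LC_ALL=C; printf '%s' "${b//[0-4]/}")

printf '%s\n' "$stripped"
```

Keeping the locale change inside the subshell avoids side effects on sorting, messages, and other locale-sensitive commands later in the script.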
Long answer:
The behaviour you expect relies on the collation order, which depends on the locale and on the OS implementation. POSIX leaves the meaning of a range such as [0-4] unspecified in every locale except the C locale. (Bash calls an external library for this and, at a guess, that library falls back to ASCII ordering when only ASCII characters are present.)
Later versions of bash (4.3 and newer) have a shell option, globasciiranges, that makes ranges in bracket expressions use C-locale ordering, which is what you expect here.
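A minimal sketch of using such an option, assuming it is the globasciiranges option of Bash 4.3 and later: the script tries to enable it and, if the running shell does not know it, falls back to the C-locale approach from the short answer. The variable name result is only illustrative, and I have not checked this against the 4.2.24 build from the question.

```shell
#!/usr/bin/env bash
b="你1265-3好"

# shopt -s returns a non-zero status for an unknown option name,
# which is what happens on Bash versions older than 4.3.
if shopt -s globasciiranges 2>/dev/null; then
    # Ranges in bracket expressions now use C-locale ordering.
    result=${b//[0-4]/}
else
    # Fallback for older shells: switch locale in a subshell only.
    result=$(LC_ALL=C; printf '%s' "${b//[0-4]/}")
fi

printf '%s\n' "$result"
```

The option only affects how ranges are interpreted; the rest of the locale (messages, character encoding) stays as it was.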
See:
https://groups.google.com/forum/#!topic/gnu.bash.bug/S6cN9KI4vK4/discussion
for more background.