Most shells used in modern UNIX environments are meant to conform to the POSIX sh specification. POSIX sh is derived from the original Korn shell (ksh88), which is in turn derived from the earlier Bourne shell, but POSIX sh only specifies a small subset of even ksh88's functionality. A shell that only implements the minimum requirement is missing many features required for writing all but the most trivial of scripts in a safe and reasonable manner. For example, local variables and arrays are non-standard extras.
Therefore, the first reason is to extend the shell with extra features. Different shells choose to focus on different things: for example, Zsh focuses on advanced interactive features, while ksh93 (the current "original" Korn shell) focuses on powerful programming features and performance. Even very minimal shells like Dash add at least a few non-standard extras, such as local variables.
Extra features are rarely widely interoperable, if at all. Most of the ksh88 feature set, such as the extended globbing syntax, is fairly well supported across shells, but beyond that there are no guarantees, and you must really know what you're doing to use non-standard features in a portable way.
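As a concrete illustration, here's a sketch (in bash, with made-up variable and function names) of two such non-standard extras mentioned above, arrays and local variables:

```shell
#!/usr/bin/env bash
# Arrays are a non-POSIX extension; a minimal POSIX sh has no array syntax.
fruits=(apple banana cherry)
echo "${#fruits[@]} fruits, second is ${fruits[1]}"   # prints: 3 fruits, second is banana

# 'local' is another widely implemented but non-standard extra.
count_args() {
    local n=$#    # n is confined to this function in shells that support 'local'
    echo "$n"
}
count_args a b c    # prints: 3
```

Both snippets work in bash, ksh93 (with typeset instead of local), and Zsh, but a strict POSIX sh is not required to accept either.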
The second reason is legacy. There are still a lot of proprietary Unixes out there that use ancient non-standard implementations for their /bin/sh. Until recently, Solaris still used Bourne as its default and chose to maintain the Heirloom shell rather than upgrade to something modern. These systems usually come with different shells you can switch to, for instance by changing your PATH variable or altering shebangs within individual scripts.
So, to summarize, there are multiple shells, often installed by default, for several reasons:
- For extra features, especially for dealing with non-portable extras.
- To handle legacy scripts which are often unmaintained.
- Size and performance. Embedded systems often require small shells like mksh or BusyBox sh.
- Licensing reasons. AT&T ksh was proprietary software until around 2000 or so. This is largely what gave rise to all the ksh-like clones such as Zsh and Bash.
- Other historical reasons. Though not very popular today, there have been radical attempts at redesigning the language, such as scsh and es. The process substitution feature of many shells originally comes from rc (with slightly different syntax), and brace expansion from csh. Different shells have different combinations of such features available, usually with some subtle or not-so-subtle differences.
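To make that last point concrete, here's a bash sketch of the two borrowed features mentioned, brace expansion (from csh) and process substitution (descended from an rc feature, with different syntax):

```shell
#!/usr/bin/env bash
# Brace expansion, originally from csh:
echo file.{txt,log}    # prints: file.txt file.log

# Process substitution: feed two command outputs to comm as if they
# were files, without creating temporary files.
comm -12 <(printf 'a\nb\n') <(printf 'b\nc\n')    # prints: b (the common line)
```

Neither construct is POSIX: Dash, for instance, passes file.{txt,log} through literally and rejects <(...) as a syntax error.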
According to the bash manpage, the LC_COLLATE environment variable affects character ranges, exactly as per Hauke Laging's answer:

    LC_COLLATE
        This variable determines the collation order used when sorting the results of pathname expansion, and determines the behavior of range expressions, equivalence classes, and collating sequences within pathname expansion and pattern matching.

On the other hand, LC_CTYPE affects character classes:

    LC_CTYPE
        This variable determines the interpretation of characters and the behavior of character classes within pathname expansion and pattern matching.
What this means is that both cases are potentially problematic if you're thinking in an English, left-to-right, Latin-alphabet, Arabic-digit context.
If you're really proper, and/or are scripting for a multi-locale environment, it's probably best to make sure you know what your locale variables are when you're matching files, or to be sure that you're coding in a completely generic way.
It's very difficult to foresee some situations though, unless you've studied linguistics.
However, I don't know of a Latin-using locale that changes the order of letters, so [a-z] would work. There are extensions to the Latin alphabet that collate ligatures and diacriticals differently, though. In any case, here's a little experiment:
mkdir /tmp/test
cd /tmp/test
export LC_CTYPE=de_DE.UTF-8
export LC_COLLATE=de_DE.UTF-8
touch Grüßen
ls G* # This says ‘Grüßen’
ls *[a-z]en # This says nothing!
ls *[a-zß]en # This says ‘Grüßen’
ls Gr[a-z]*en # This says nothing!
This is interesting: at least for German, neither diacriticals like ü nor ligatures like ß are folded into plain Latin characters for range matching. (Either that, or I messed up the locale change!)
This may be bad for you, of course, if you use [a-z]* to find filenames that start with a letter and apply it to a file whose name starts with ‘Ä’.
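If you need matching that behaves the same regardless of locale, two common defensive options are sketched below (the filename is the one from the experiment above):

```shell
#!/usr/bin/env bash
# Option 1: POSIX character classes follow LC_CTYPE, so they also catch
# letters outside a-z instead of silently missing them.
case "Grüßen" in
    [[:alpha:]]*) echo "starts with a letter" ;;    # prints: starts with a letter
esac

# Option 2: pin the locale to C in a subshell, so [a-z] means exactly a..z:
( export LC_ALL=C
  case "b" in
      [a-z]) echo "plain ASCII lower-case" ;;       # prints: plain ASCII lower-case
  esac )
```

Which option is right depends on intent: use [[:alpha:]] when you mean "any letter in the user's locale", and LC_ALL=C when you mean "exactly the ASCII range".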
This is not a very good explanation. A token is a sequence of characters that forms a word or a punctuation mark. Characters like < and | are part of tokens too. You may call them metacharacters, but this is not useful terminology. The basic rules are: a token is a maximal sequence of characters that are either all ordinary characters or all operator characters from the set ()<>&|;, but not both. For example, foo<@a&>b consists of the tokens foo (ordinary), < (operator), @a (ordinary), &> (operator) and b (ordinary). Then there are additional rules about quoting: special characters lose their meaning if they're quoted, with different rules depending on the type of quote. For example, foo'&&'bar\|qux is a single token with the character sequence foo&&bar|qux.
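You can check the quoting behavior directly. Here's a sketch using printf to show that the shell delivers the quoted string as one single word:

```shell
#!/usr/bin/env bash
# The quoted && and the backslashed | lose their operator meaning,
# so the shell passes a single argument to printf:
printf '<%s>\n' foo'&&'bar\|qux    # prints: <foo&&bar|qux>
```

Unquoted, the same characters would instead be operator tokens, splitting the line into separate commands and a pipeline.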