Shell – Why zsh and ksh93 chose to be non-compliant in pattern matching

ksh, shell, zsh

The POSIX documentation for pattern matching says:

An ordinary character is a pattern that shall match itself. It can be
any character in the supported character set except for NUL, those
special shell characters in Quoting that require quoting, and the
following three special pattern characters. Matching shall be based on
the bit pattern used for encoding the character, not on the graphic
representation of the character. If any character (ordinary, shell
special, or pattern special) is quoted, that pattern shall match the
character itself. The shell special characters always require quoting.

As I understand it, the pattern ["!"a] should match either ! or a. That is also the behavior of most shells I tried, except zsh and ksh93:

$ for shell in /bin/*[^c]sh; do
  printf '=%-17s=\n' "$shell"
  "$shell" -c 'case a in ["!"a]) echo 1;; esac'
done
=/bin/ash         =
1
=/bin/bash        =
1
=/bin/dash        =
1
=/bin/heirloom-sh =
1
=/bin/ksh         =
=/bin/lksh        =
1
=/bin/mksh        =
1
=/bin/pdksh       =
1
=/bin/posh        =
1
=/bin/schily-osh  =
1
=/bin/schily-sh   =
1
=/bin/yash        =
1
=/bin/zsh         =

zsh and ksh93 seem to treat ["!"a] the same as [!a], which matches any character except a:

$ for shell in ksh93 zsh; do
  printf '=%-6s=\n' "$shell"
  "$shell" -c 'case b in ["!"a]) echo 1;; esac'
done
=ksh93 =
1
=zsh   =
1

Is there any reason (historical, development-related, …) for zsh and ksh93 to behave like that?


zsh does the same thing in both ksh and sh emulation.

busybox sh, Solaris /usr/xpg4/bin/sh and FreeBSD sh also behave as the POSIX documentation describes.


ksh88 also behaves like most other shells; the behavior changed between ksh88 and ksh93:

$ ksh88 -c 'case a in ["!a"]) echo yes; esac'
yes
$ ksh88 -c 'case b in ["a-c"]) echo yes; esac' 
$

Best Answer

The passage you quote does not mean what you say it means.

Patterns Matching a Single Character

(…) An ordinary character is a pattern that shall match itself. (…) If any character (ordinary, shell special, or pattern special) is quoted, that pattern shall match the character itself.

All of this applies only to characters that stand for themselves in a pattern. This does not apply to characters that appear in a context other than that where a pattern character is expected. In particular, it does not apply inside a bracket expression. The syntax of bracket expressions is described under the entry for [:

If an open bracket introduces a bracket expression as in XBD RE Bracket Expression, (…)

(I omitted the bit about ! vs ^ for complementation.) The description of RE bracket expressions doesn't say anything about quoting (which is unsurprising since it's about bracket expressions in general, not about bracket expressions in a pattern in a shell script).

Going by a strict interpretation of POSIX.1-2008, it isn't clear what the pattern ["!"a] should match. One interpretation is that it should match any of the characters ", ! or a: the character " has no special meaning inside a bracket expression. I can't find anything in the specification that would invalidate this interpretation. Another interpretation is that " retains its quoting behavior, but that means that the content of the bracket expression is !a, and since there is no particular treatment of quoted characters inside bracket expressions, the set is all-but-a. I can't find any support for your interpretation (and the behavior of dash, bash and other shells) in the POSIX specification. It makes sense, sure, but it isn't what the text says.
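The three readings can be told apart empirically. The following is only a sketch: the function names matches and probe_bracket_quoting and the labels they print are made up for illustration, and the probe assumes a shell that supports functions and the ! pipeline negation.

```shell
# Hypothetical helper: succeed if $1 matches the pattern under discussion.
matches() {
  case $1 in
    ["!"a]) return 0 ;;
  esac
  return 1
}

# Hypothetical probe: print a label naming the reading the running shell uses.
probe_bracket_quoting() {
  if matches a && ! matches b; then
    # The bracket expression is a plain set; check whether " belongs to it.
    if matches '"'; then
      echo 'quote-char'   # set is ", ! and a: the quote lost its quoting role
    else
      echo 'literal'      # set is ! and a: the quoted ! stands for itself
    fi
  elif ! matches a && matches b; then
    echo 'negation'       # same as [!a]: everything except a
  else
    echo 'unknown'
  fi
}

probe_bracket_quoting
```

Going by the results reported in the question, I would expect dash and bash to print literal, and ksh93 and zsh to print negation.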

It would make sense for a future version of POSIX to mandate your interpretation, by adding some wording to this effect. For example, the description of [ could be changed to

If an open bracket introduces a bracket expression as in XBD RE Bracket Expression, except that the <exclamation-mark> character ('!') shall replace the <circumflex> character ('^') in its role in a non-matching list in the regular expression notation, it shall introduce a pattern bracket expression, and that any character that is quoted shall stand for itself as an element of the enclosing bracket expression, collating element or class expression. A bracket expression starting with an unquoted <circumflex> character produces unspecified results. Otherwise, '[' shall match the character itself.

Given that POSIX is mostly descriptive rather than normative, I'd expect such a change that breaks ksh (usually the reference shell) to only be included in a major update of the standard, and any defect on the existing version to instead allow at least the existing different interpretations.
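In the meantime, a script that needs to match a literal ! can avoid the question entirely, because ! only denotes a non-matching list when it is the first character after the open bracket. A minimal sketch (is_bang_or_a is a made-up name for illustration):

```shell
# Hypothetical helper: succeed if $1 is exactly "!" or "a".
is_bang_or_a() {
  case $1 in
    [a!]) return 0 ;;   # "!" is not first, so it cannot mean negation
  esac
  return 1
}

for c in a '!' b; do
  if is_bang_or_a "$c"; then
    printf '%s: match\n' "$c"
  else
    printf '%s: no match\n' "$c"
  fi
done
```

Since no quoting is involved, this should not depend on how a given shell treats quotes inside a bracket expression.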