Regular Expression – Definition and Explanation

I recently got into a friendly argument with Ghoti about what constitutes a regular expression in the comments to my answer to this question. I claimed that the following is a regular expression:

`[Rr]eading[Tt]est[Dd]ata`

Ghoti disagreed, claiming it is a file glob instead. The Wikipedia page on glob states:

Globs do not include syntax for the Kleene star which allows multiple
repetitions of the preceding part of the expression; thus they are not
considered regular expressions, which can describe a larger set of
regular languages over any given finite alphabet.
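
Python's standard `fnmatch` and `re` modules make the difference concrete. They are illustrations only, not POSIX glob or POSIX regex implementations. In a glob, `*` stands on its own for "any string". In a regex, `*` is the Kleene star and repeats whatever precedes it. `fnmatch.translate` shows that a glob can be rewritten as a regex. A regex such as `(ab)*`, which repeats a group, has no counterpart in basic glob syntax. The helper names below are hypothetical.

```python
import fnmatch
import re


def glob_matches(pattern: str, name: str) -> bool:
    # Glob reading of "ab*": "ab" followed by any string (possibly empty).
    return fnmatch.fnmatchcase(name, pattern)


def regex_matches(pattern: str, text: str) -> bool:
    # Regex reading of "ab*": "a" followed by zero or more "b" (Kleene star on "b").
    return re.fullmatch(pattern, text) is not None


def glob_as_regex(pattern: str) -> re.Pattern:
    # fnmatch.translate rewrites a glob as an equivalent regex. The exact text it
    # produces differs between Python versions, so only its behavior is relied on.
    return re.compile(fnmatch.translate(pattern))


if __name__ == "__main__":
    for s in ["ab", "abbb", "abxyz"]:
        print(s, glob_matches("ab*", s), regex_matches("ab*", s),
              glob_as_regex("ab*").match(s) is not None)
    # A repeated group: expressible as a regex, not in basic glob syntax.
    print([regex_matches("(ab)*", s) for s in ["", "abab", "aba"]])
```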

However, that claim has no citation, which suggests it may be just one Wikipedia editor's opinion.

The Single UNIX® Specification, Version 2, states that a Basic Regular Expression (BRE) can be as small as a single character:

An ordinary character is a BRE that matches itself: any character in
the supported character set, except for the BRE special characters
listed in BRE Special Characters.

So, what is the definition of a regular expression in the *nix world, and does that definition exclude file globs?

Best Answer

As lk- said, the `-name` option of `find` treats its argument as a glob, not a regular expression.
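
Here is a rough Python approximation of what `-name` does, using a hypothetical helper `find_by_name` that is not part of any library. The pattern is treated as a glob and compared against each entry's basename, never the directory part. It is a sketch under simplifying assumptions: it skips the starting directory itself and ignores all of find's other options.

```python
import fnmatch
import os
import tempfile


def find_by_name(root: str, pattern: str) -> list[str]:
    """Hypothetical helper: rough approximation of `find ROOT -name PATTERN`.

    The glob is matched against each entry's basename only. Unlike find,
    ROOT itself is not tested and no other options are supported.
    """
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # Case-sensitive, as with -name (find's -iname ignores case).
            if fnmatch.fnmatchcase(name, pattern):
                hits.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(hits)


def demo() -> list[str]:
    # Throwaway tree; no two names in one directory differ only by case,
    # so this also works on case-insensitive filesystems.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "sub"))
        for rel in ("ReadingTestData", "ReadingTestData.bak",
                    os.path.join("sub", "readingtestdata"),
                    os.path.join("sub", "xReadingTestData")):
            open(os.path.join(root, rel), "w").close()
        return find_by_name(root, "[Rr]eading[Tt]est[Dd]ata")


if __name__ == "__main__":
    print(demo())
```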

Whether a string is interpreted as a glob, a regex, or just a plain string depends on what is doing the interpreting. It's a matter of context. The string in your example, `[Rr]eading[Tt]est[Dd]ata`, can be evaluated in a number of different ways, and what it is depends on how you use it. Use it as a glob, it's a glob. Use it as a regex, it's a regex. In the question where this originated, the OP described the string as a regex, so we can assume he planned to interpret it as one.
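
To see that context decides, here is a minimal Python sketch that feeds the same string to two interpreters. `fnmatch.fnmatchcase` treats it as a glob that must match the whole name. `re.search` treats it as a regex that may match anywhere in a line, roughly the way grep uses a pattern. The function names are hypothetical.

```python
import fnmatch
import re

PATTERN = "[Rr]eading[Tt]est[Dd]ata"  # the string from the question


def as_glob(name: str) -> bool:
    # Glob semantics, as with find -name: the whole name must match.
    return fnmatch.fnmatchcase(name, PATTERN)


def as_regex_search(line: str) -> bool:
    # Unanchored regex search: a match anywhere in the text is enough.
    return re.search(PATTERN, line) is not None


if __name__ == "__main__":
    for text in ["ReadingTestData", "readingtestdata", "old_ReadingTestData.bak"]:
        print(f"{text!r}: glob={as_glob(text)} regex={as_regex_search(text)}")
```

Both interpretations accept `ReadingTestData` and `readingtestdata`. Only the regex interpretation accepts `old_ReadingTestData.bak`, because an unanchored search is satisfied by a match anywhere in the text.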

A single character can also be a regex, absolutely. It can also be a string, and it can also be a glob. It could be interpreted as a byte or a tinyint, if you like. It all depends on context.
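
A single character shows the same thing. In this sketch, which uses a hypothetical helper `interpretations`, an ordinary character such as `a` matches only itself whether it is read as a regex, a glob, or a plain string. By contrast, `.` means "any one character" to a regex and just a literal dot to a glob or a string comparison.

```python
import fnmatch
import re


def interpretations(char: str, text: str) -> dict[str, bool]:
    """Hypothetical helper: does `text` match `char` under each reading?"""
    return {
        # Regex: "." matches any single character (except newline by default);
        # an ordinary character matches only itself.
        "regex": re.fullmatch(char, text) is not None,
        # Glob: "." is not special and matches only a literal dot.
        "glob": fnmatch.fnmatchcase(text, char),
        # Plain string: only exact equality counts.
        "string": text == char,
    }


if __name__ == "__main__":
    print(interpretations("a", "a"))
    print(interpretations(".", "x"))
```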

There are a number of specifications for regular expressions in various forms. BRE and ERE are well documented. PCRE (Perl Compatible Regular Expressions) adds scads of functionality. Many regex implementations support, for example, "all of ERE and some of PCRE", or ERE minus some feature. Measured against the formal specifications, many tools claim regex support that turns out to be incomplete or incorrect. Knowing the details lets you adapt your solutions to whatever functionality is available in the tool evaluating your regex.
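
Python's `re` module is one example of such a mix. It follows Perl-style syntax rather than POSIX BRE/ERE. As a result, it accepts lookahead `(?=...)`, which POSIX ERE does not define, but it does not support POSIX bracket classes such as `[[:digit:]]`. The sketch below (helper names hypothetical) writes the same constraint twice: once with lookahead, and once using only constructs that ERE also has. That second version is the kind of adaptation the paragraph above describes.

```python
import re

# Lookahead (?=...) is a Perl-style extension; POSIX BRE and ERE do not define it.
# Accepts a run of lowercase letters/digits that contains at least one digit.
WITH_LOOKAHEAD = re.compile(r"(?=[a-z0-9]*[0-9])[a-z0-9]+")

# The same set of strings, written with ERE-style constructs only.
ERE_STYLE = re.compile(r"[a-z0-9]*[0-9][a-z0-9]*")


def has_digit_word(s: str) -> bool:
    """Hypothetical helper using the lookahead pattern."""
    return WITH_LOOKAHEAD.fullmatch(s) is not None


def has_digit_word_portable(s: str) -> bool:
    """Hypothetical helper using the lookahead-free pattern."""
    return ERE_STYLE.fullmatch(s) is not None


if __name__ == "__main__":
    for s in ["abc1", "abc", "ABC1", "a1b"]:
        print(s, has_digit_word(s), has_digit_word_portable(s))
```

The two patterns accept exactly the same strings. The second is the one to reach for when the tool evaluating it implements only ERE.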

So, if you're looking for a definition that "excludes" globs, you're looking at this from the wrong perspective. What a pattern is depends on how you use it.
