Find Command – Why a Hyphen Is Not Matched by a Regular Expression

find, regular expression

The find command combined with a regular expression does not appear to match files whose names contain a hyphen (-), such as test-19.1.txt.

Running find . -maxdepth 1 -regextype posix-egrep -regex '.*/[a-z0-9\-\.]+\.txt' -exec echo {} \; in a bash shell finds no such file. If the hyphen is removed from the file name, the regular expression matches.

The same regular expression matches when tested on regexr.com.

Best Answer

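To include a literal hyphen in a character class, it must be at the first or last position. A backslash does not escape it there: in a POSIX bracket expression the backslash is an ordinary character.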

From the find manual: "the type of regular expression used by find and locate is almost identical to that used in GNU Emacs", and from the Emacs manual:

  • [ ... ]
    • To include a ‘-’, write ‘-’ as the first or last character of the set, or put it after a range. Thus, ‘[]-]’ matches both ‘]’ and ‘-’.

So your regex should be '.*/[a-z0-9.-]+\.txt'
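The difference between the two patterns can be checked with a small experiment. The following is a minimal sketch, assuming GNU find (both -regextype and -printf are GNU extensions); the scratch directory and the variable names are made up for illustration. It rests on the reading that, inside a POSIX bracket expression, `\-\` is a range from backslash to backslash, so the class in the question contains no hyphen at all.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU find, since -regextype and -printf are GNU extensions.
export LC_ALL=C                      # fixed locale so ranges and sort order are predictable
tmpdir=$(mktemp -d)                  # hypothetical scratch directory for the demo
trap 'rm -rf "$tmpdir"' EXIT         # clean up when the script exits
touch "$tmpdir/test-19.1.txt" "$tmpdir/test19.1.txt"

# Pattern from the question: the backslashes are literal inside [...], so the
# class holds a-z, 0-9, backslash and dot, but no hyphen.
question_matches=$(find "$tmpdir" -maxdepth 1 -regextype posix-egrep \
    -regex '.*/[a-z0-9\-\.]+\.txt' -printf '%f\n' | sort | paste -sd' ' -)

# Pattern from the answer: the hyphen is placed last in the class.
answer_matches=$(find "$tmpdir" -maxdepth 1 -regextype posix-egrep \
    -regex '.*/[a-z0-9.-]+\.txt' -printf '%f\n' | sort | paste -sd' ' -)

echo "question pattern: $question_matches"
echo "answer pattern:   $answer_matches"
```

With GNU find, the first line should list only test19.1.txt, while the second should list both files.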

The same rule applies in POSIX BRE and ERE:

The <hyphen-minus> character shall be treated as itself if it occurs first (after an initial '^', if any) or last in the list, or as an ending range point in a range expression. As examples, the expressions "[-ac]" and "[ac-]" are equivalent and match any of the characters 'a', 'c', or '-'; "[^-ac]" and "[^ac-]" are equivalent and match any characters except 'a', 'c', or '-'; the expression "[%--]" matches any of the characters between '%' and '-' inclusive; the expression "[--@]" matches any of the characters between '-' and '@' inclusive; and the expression "[a--@]" is either invalid or equivalent to '@', because the letter 'a' follows the symbol '-' in the POSIX locale. To use a <hyphen-minus> as the starting range point, it shall either come first in the bracket expression or be specified as a collating symbol; for example, "[][.-.]-0]", which matches either a <right-square-bracket> or any character or collating element that collates between <hyphen-minus> and 0, inclusive.

If a bracket expression specifies both '-' and ']', the ']' shall be placed first (after the '^', if any) and the '-' last within the bracket expression.

Regular Expressions
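Some of the bracket expressions quoted above can be tried directly with grep -E, which uses POSIX ERE. This is only a sketch: matches and check are hypothetical helper functions written for this illustration, and the C locale is forced so that ranges follow byte order.

```shell
#!/usr/bin/env bash
# Sketch only: matches and check are hypothetical helpers, not standard commands.

# Succeeds when the whole string $2 matches the bracket expression $1.
matches() {
    printf '%s\n' "$2" | LC_ALL=C grep -Eqx -e "$1"
}

# Prints a one-line verdict for a pattern/string pair.
check() {
    if matches "$1" "$2"; then
        echo "$1 matches '$2'"
    else
        echo "$1 does not match '$2'"
    fi
}

check '[-ac]'  '-'   # hyphen first: literal
check '[ac-]'  '-'   # hyphen last: literal
check '[^-ac]' '-'   # hyphen right after ^: literal, so '-' is excluded
check '[%--]'  '+'   # range from % to -, which includes +
check '[%--]'  'a'   # 'a' lies outside that range
check '[]-]'   ']'   # ] first and - last: both literal
check '[]-]'   '-'
```

The third and fifth checks are expected to report no match; the others should match.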

In fact, most regex flavors have the same rule for matching a hyphen:

The hyphen can be included right after the opening bracket, or right before the closing bracket, or right after the negating caret. Both [-x] and [x-] match an x or a hyphen. [^-x] and [^x-] match any character that is not an x or a hyphen. This works in all flavors discussed in this tutorial. Hyphens at other positions in character classes where they can’t form a range may be interpreted as literals or as errors. Regex flavors are quite inconsistent about this.

Character Classes or Character Sets
