Regular Expression to match (not x) and y (!x & y)

regular expression, scripting, search

I received a desktop day-to-day calendar with puzzles. One such puzzle was deciphering a quote where the letters were substituted with symbols. I used some RegExs to find longer words, then used the returned words to solve smaller words. In the puzzle, white background symbols were vowels (including 'y') and shaded background symbols were consonants.

I'll use random letters below where bolded means consonant, plain means vowel, and italicized means the letter was given in the directions.

BOQQE

The example above was deciphered as "happy" ('e' was already given in the puzzle) by using the RegEx

 egrep -i '^[bcdfghjklmnpqrstvwxz][aiouy][bcdfghjklmnpqrstvwxz]{2}[aiouy]$' words

There were a lot of results, and I feel I could have narrowed them down by expressing the following constraints in the RegEx:

  1. Char 1 is a consonant
  2. Char 2 is a vowel, but not 'e' because that was given in the directions.
  3. Chars 3 & 4 are the same consonant, but are different from char 1.
  4. Char 5 is a vowel, but different from char 2.

Another example would be

O R e W Y D O n

where I used the grep statement

egrep -i '^([aiouy])[bcdfghjklmnpqrstvwxz]e[bcdfghjklmnpqrstvwxz][aiouy][bcdfghjklmnpqrstvwxz]\1n$' words

to logically define the search as

  1. Char 1 is a vowel, and is a captured group because the same char appears later in the word.
  2. Char 2 is a consonant.
  3. Char 3 is 'e', given.
  4. Char 4 is a consonant.
  5. Char 5 is a vowel.
  6. Char 6 is a consonant.
  7. Char 7 is the same vowel as Char 1.
  8. Char 8 is 'n', given.

Fortunately, the grep statement returned one word, "American" (the cipher-text was a movie quote). I would like to have been able to specify in the RegEx that char 4 is a consonant and not the same as char 2, char 5 is a vowel and not the same as char 1, etc.

Is it possible to express this kind of pattern matching with RegExs? I'm aware of the (x|y) syntax to state that a character may be 'x' or 'y', but I don't know the syntax, if it exists, to specify (!x) & y.

Best Answer

You can, using Perl-compatible regular expressions (grep -P), which support negative look-aheads.

$ grep -Pi '^([aeiouy])([bcdfghjklmnpqrstvwxz])e(?!\2)([bcdfghjklmnpqrstvwxz])(?!\1)([aeiouy])(?!\2)(?!\3)([bcdfghjklmnpqrstvwxz])\1n$' /usr/share/dict/words
American
american
everymen

Expanded:

$ perl -lnE '
    BEGIN { $vowel = qr/[aeiouy]/i; $consonant = qr/[bcdfghjklmnpqrstvwxz]/i }
    say if /^ ($vowel)                  # vowel
              ($consonant)              # consonant
              e                         # literal
              (?!\2)($consonant)        # different consonant
              (?!\1)($vowel)            # different vowel
              (?!\2)(?!\3)($consonant)  # 3rd different consonant
              \1                        # first vowel again
              n                         # literal
            $/xi
' /usr/share/dict/words
American
american
everymen
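
For anyone without grep -P or perl at hand, the same look-ahead idiom appears to carry over to other engines that support look-aheads and backreferences. Below is a rough sketch using Python's re module (a swap from the grep/Perl used above, not what produced the results shown). The build helper and the two made-up non-words are only there for illustration. It also suggests that with the question's vowel class [aiouy] ('e' left out because it was given), "everymen" drops out of the results.

```python
import re

CONSONANT = "[bcdfghjklmnpqrstvwxz]"


def build(vowel):
    """Hypothetical helper: compile the 8-letter pattern for a given vowel class."""
    # (?!\N) rejects a repeat of group N without consuming anything;
    # the character class right after it still has to match ("not x, and y").
    return re.compile(
        rf"^({vowel})({CONSONANT})e"      # vowel, consonant, literal 'e'
        rf"(?!\2)({CONSONANT})"           # consonant different from group 2
        rf"(?!\1)({vowel})"               # vowel different from group 1
        rf"(?!\2)(?!\3)({CONSONANT})"     # consonant different from groups 2 and 3
        rf"\1n$",                         # first vowel again, literal 'n'
        re.IGNORECASE,
    )


# "amemican" and "ameracan" are made-up strings that each break one constraint.
candidates = ["American", "everymen", "amemican", "ameracan"]

with_e = build("[aeiouy]")     # vowel class used in the answer
without_e = build("[aiouy]")   # vowel class from the question ('e' was given)

matches_with_e = [w for w in candidates if with_e.match(w)]
matches_without_e = [w for w in candidates if without_e.match(w)]
print(matches_with_e)
print(matches_without_e)
```

In practice the candidates list would be replaced by the lines of a dictionary file such as /usr/share/dict/words.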

The BOQQE example would be

grep -Pi '^([bcdfghjklmnpqrstvwxz])([aiouy])(?!\1)([bcdfghjklmnpqrstvwxz])\3(?!\2)([aiouy])$' /usr/share/dict/words

which returns 779 results (444 with a case-sensitive match) with my dictionary.
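
Since both patterns follow the same recipe (capture each new symbol, back-reference repeats, and put a negative look-ahead in front of each new symbol for every earlier symbol of the same kind), the pattern could probably be generated instead of typed by hand. The sketch below is one way to do that in Python. The cipher_regex function and its input convention (uppercase for a cipher symbol, lowercase for a letter given in the directions) are my own assumptions, not part of the puzzle or of any library.

```python
import re

CONSONANTS = "bcdfghjklmnpqrstvwxz"
VOWELS = "aeiouy"


def cipher_regex(cipher, vowel_symbols, given=""):
    """Build a regex string for one cipher word (hypothetical helper).

    cipher:        uppercase = cipher symbol, lowercase = plaintext letter given
    vowel_symbols: the cipher symbols known to stand for vowels
    given:         plaintext letters already revealed; removed from both classes
    """
    vowel_class = "[" + "".join(c for c in VOWELS if c not in given) + "]"
    consonant_class = "[" + "".join(c for c in CONSONANTS if c not in given) + "]"
    groups = {}   # symbol -> (group number, is_vowel)
    parts = []
    for sym in cipher:
        if sym.islower():
            parts.append(sym)                       # given letter: literal
        elif sym in groups:
            parts.append("\\%d" % groups[sym][0])   # repeated symbol: backreference
        else:
            is_vowel = sym in vowel_symbols
            # A new symbol must differ from every earlier symbol of the same kind.
            for num, kind in groups.values():
                if kind == is_vowel:
                    parts.append("(?!\\%d)" % num)
            parts.append("(" + (vowel_class if is_vowel else consonant_class) + ")")
            groups[sym] = (len(groups) + 1, is_vowel)
    return "^" + "".join(parts) + "$"


boqqe = cipher_regex("BOQQE", vowel_symbols="OE", given="e")
print(boqqe)  # same pattern as the grep -P line for BOQQE above

sample = ["happy", "jolly", "daddy", "hello"]  # tiny stand-in for a dictionary
print([w for w in sample if re.match(boqqe, w, re.IGNORECASE)])
```

For the second puzzle, the call would be cipher_regex("OReWYDOn", vowel_symbols="OY", given="en"). Note that this version also removes 'n' from the consonant class, which is slightly stricter than the hand-written pattern.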
