Making grep understand byte escapes

Tags: character-encoding, escape-characters, grep, unicode

I'm trying to match against some UTF-8 characters.
The problem is grep doesn't translate \x byte escapes, so
this fails:

echo -e '\xd8\xaa' | grep -P '\xd8\xaa'

while this succeeds:

echo -e '\xd8\xaa' | grep -P "$(printf '\xd8\xaa')"

Can grep understand byte escapes directly without using printf? How?

Best Answer

This fails (grep matches nothing, so hexdump prints nothing):

$ echo -e '\xd8\xaa' | grep -P '\xd8\xaa' | hexdump

This succeeds, because bash expands $'\xd8\xaa' into the two raw bytes before grep sees the pattern:

$ echo -e '\xd8\xaa' | grep -P $'\xd8\xaa' | hexdump
0000000 aad8 000a
0000003
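
To make the mechanism easier to see, here is a minimal sketch that dumps both the pattern and the matched line one byte at a time. It assumes bash and a grep that supports -a (GNU and BSD grep do). The variable names are only illustrative, and -F is used here instead of -P so that no regex engine has to interpret anything.

```shell
#!/usr/bin/env bash
# bash expands $'...' itself, so grep receives two raw bytes, not a backslash sequence.
pattern=$'\xd8\xaa'

# od -An -tx1 prints one hex value per byte, without an offset column.
pattern_bytes=$(printf '%s' "$pattern" | od -An -tx1 | tr -d ' \n')
echo "pattern bytes: $pattern_bytes"        # d8aa

# -F: fixed-string match, -a: treat input as text even if it looks binary.
# LC_ALL=C makes grep compare raw bytes regardless of the current locale.
match_bytes=$(printf '\xd8\xaa\n' | LC_ALL=C grep -a -F -- "$pattern" | od -An -tx1 | tr -d ' \n')
echo "matched line bytes: $match_bytes"     # d8aa0a
```

The `aad8 000a` in the hexdump output above is the same three bytes (d8 aa 0a): hexdump's default format groups the input into 16-bit words, which appear byte-swapped on a little-endian machine.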

Documentation

From man bash:

Words of the form $'string' are treated specially. The word expands to string, with backslash-escaped characters replaced as specified by the ANSI C standard. Backslash escape sequences, if present, are decoded as follows:

          \a     alert (bell)
          \b     backspace
          \e
          \E     an escape character
          \f     form feed
          \n     new line
          \r     carriage return
          \t     horizontal tab
          \v     vertical tab
          \\     backslash
          \'     single quote
          \"     double quote
          \?     question mark
          \nnn   the eight-bit character whose value is the octal value nnn (one to three digits)
          \xHH   the eight-bit character whose value is the hexadecimal value HH (one or two hex digits)
          \uHHHH the Unicode (ISO/IEC 10646) character whose value is the hexadecimal value HHHH (one to four hex digits)
          \UHHHHHHHH
                 the Unicode (ISO/IEC 10646) character whose value is the hexadecimal value HHHHHHHH (one to eight hex digits)
          \cx    a control-x character

The expanded result is single-quoted, as if the dollar sign had not been present.
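
As a small illustration of the table above, the same two bytes can be written with the \xHH or the \nnn form. This is a sketch assuming bash, with illustrative variable names. The \uHHHH form names a code point rather than bytes, so what it expands to depends on the bash version and the current locale; the sketch only prints that result instead of assuming it.

```shell
#!/usr/bin/env bash
# Two spellings of the same two bytes, using escapes from the table above.
hex_form=$'\xd8\xaa'      # \xHH: hexadecimal byte values
octal_form=$'\330\252'    # \nnn: the same bytes in octal (0xd8 = 0330, 0xaa = 0252)

if [ "$hex_form" = "$octal_form" ]; then
    same_bytes=yes
else
    same_bytes=no
fi
echo "hex and octal forms identical: $same_bytes"

# \uHHHH names a code point. U+062A is encoded as d8 aa in UTF-8, but the bytes
# bash actually produces depend on its version and the locale, so just print them.
unicode_form=$'\u062a'
printf '%s' "$unicode_form" | od -An -tx1
```

Either byte-level form can be passed to grep in place of $'\xd8\xaa' from the answer.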