Mysql – ny danger using binary data over an UTF-8 MySQL connection

MySQLutf-8

My application uses UTF-8 as the default encoding for all fields, and the connection itself; every connection starts with:

SET NAMES utf8;

I need to store a few SHA-1 hashes though, and for efficiency I'd like to store them in their compact, binary form (20 bytes) instead of their hexadecimal form (40 chars).

These fields are declared as BINARY(20).

I've tested it and everything seems to be working fine, but just to be on the safe side, I thought I'd ask if there is anything I need to know, any pitfalls to avoid? Or is it perfectly safe to transmit binary data over an UTF-8 connection?

Best Answer

From my experience storing such data, is it perfectly safe to transmit binary data when the character encoding of the connection is configured as UTF-8. The character encoding is just that: a way of representing the characters in binary form (this is the crux of the issue in Python bytes/str problems). MySQL rightly does not try to 'encode' binary data.

From the fine manual (emphasis added by me):

The BINARY and VARBINARY types are similar to CHAR and VARCHAR, except that they contain binary strings rather than nonbinary strings. That is, they contain byte strings rather than character strings. This means that they have no character set, and sorting and comparison are based on the numeric values of the bytes in the values.

You should be aware that "the connection" is not UTF-8, but rather UTF-8 is the encoding that the application and MySQL have agreed upon in order to represent text characters.