My application uses UTF-8 as the default encoding for all fields, and the connection itself; every connection starts with:
SET NAMES utf8;
I need to store a few SHA-1 hashes though, and for efficiency I'd like to store them in their compact, binary form (20 bytes) instead of their hexadecimal form (40 chars).
These fields are declared as BINARY(20)
.
I've tested it and everything seems to be working fine, but just to be on the safe side, I thought I'd ask if there is anything I need to know, any pitfalls to avoid? Or is it perfectly safe to transmit binary data over an UTF-8 connection?
Best Answer
From my experience storing such data, is it perfectly safe to transmit binary data when the character encoding of the connection is configured as UTF-8. The character encoding is just that: a way of representing the characters in binary form (this is the crux of the issue in Python bytes/str problems). MySQL rightly does not try to 'encode' binary data.
From the fine manual (emphasis added by me):
You should be aware that "the connection" is not UTF-8, but rather UTF-8 is the encoding that the application and MySQL have agreed upon in order to represent text characters.