Mysql – best column type for hashed data in thesql

datatypesMySQL

In my use case I have a table with a column containing hashed (sha256) IDs. I wonder which data type is better considering that It must be unique and I need to search on it.

The sha algorithm produce a binary stream out of the box. I can transform it hexadecimal or base64 before storing it. I wonder if it is convenient or necessary for my needs

Best Answer

If you are starting with hex, use UNHEX() when storing and HEX() when fetching. Use BINARY(32) for the datatype. This occupies a constant 32 bytes.

If you are starting with binary, then be careful when escaping to store/load. Probably best to do a double convert each way.

Be sure to have lots of RAM. Once the index becomes bigger than the buffer_pool, the system will slow down because of the randomness of sha256 (or any other hash or uuid).

If you go with Base64, then use CHAR(43) COLLATE ascii_bin, which will be a constant 43 bytes, and will be case-sensitive.

256 is gross overkill. If you have about 2^85 entries, there is only one chance in 2^85 of having any hash collisions. For a mere MD5, those numbers are 9 trillion.