I have a table that contains data (blobs). Data is often read, rarely inserted, and never modified.
For all of this data, I compute hashes that I also store in the DB. These hashes are only checked when inserting new data, therefore, they are hardly used in normal, day-to-day routine.
My question is, is it worth the trouble to make a separate table for the hashes, is it reasonable to expect any performance gain for putting them aside? Or can I confidently just add an extra column to the table for the hashes, even though the hashes are rarely used?
(Although I would be interested in a general answer, I must point out that I use MariaDB and sqlite).
Best Answer
(re: MySQL/MariaDB) It depends.
SELECT *
.SELECT *
; instead, spell out the columns you need. (*
will bite you if you ever add a column to the table.)BINARY(16)
for MD5, etc., rather than aVARCHAR
of twice the size.The hash (whether md5, sha1, etc) is a bigger nuisance. It is very 'random', thereby leading to cache misses when inserting a new hash or looking up by the hash. This issue comes into play when the BTree with the hash is bigger than can be cached in the buffer_pool. To phrase it another way, use of a hash on a billion-row table will slow down all accesses to the speed of the disk. And it will slow down other processes due to cache misses.
InnoDB blocks are 16KB; the underlying OS or drive block size does not factor into this discussion.
In the opposite direction, consider putting frequently updated columns (clicks, Likes, views) in a separate table. (But that's another Q&A.)