Mysql – Can/Should rarely used columns be placed in a different table for performance

mariadbMySQLsqlite

I have a table that contains data (blobs). Data is often read, rarely inserted, and never modified.

For all of this data, I compute hashes that I also store in the DB. These hashes are only checked when inserting new data, therefore, they are hardly used in normal, day-to-day routine.

My question is, is it worth the trouble to make a separate table for the hashes, is it reasonable to expect any performance gain for putting them aside? Or can I confidently just add an extra column to the table for the hashes, even though the hashes are rarely used?

(Although I would be interested in a general answer, I must point out that I use MariaDB and sqlite).

Best Answer

(re: MySQL/MariaDB) It depends.

  • For small columns, say under 1KB, don't bother with the hash.
  • For large columns, say over 7KB, it may not matter -- InnoDB puts large columns "off-record" in a block separate from the main part of the record.
  • For medium-sized columns, say 1KB-7KB, the column is likely to be kept "on record", thereby making any action on the row fetch the bulky column, too. This is even if you don't say SELECT *.
  • Regardless of what you do, avoid using SELECT *; instead, spell out the columns you need. (* will bite you if you ever add a column to the table.)
  • Pack the hash into BINARY(16) for MD5, etc., rather than a VARCHAR of twice the size.
  • The time taken to fetch a row is more than unpacking the columns. That is, "how many columns" is not a metric to worry about. (No, I don't have numbers to back that up; just decades of experience.)

The hash (whether md5, sha1, etc) is a bigger nuisance. It is very 'random', thereby leading to cache misses when inserting a new hash or looking up by the hash. This issue comes into play when the BTree with the hash is bigger than can be cached in the buffer_pool. To phrase it another way, use of a hash on a billion-row table will slow down all accesses to the speed of the disk. And it will slow down other processes due to cache misses.

InnoDB blocks are 16KB; the underlying OS or drive block size does not factor into this discussion.

In the opposite direction, consider putting frequently updated columns (clicks, Likes, views) in a separate table. (But that's another Q&A.)