MySQL – BLObs Hitting Performance

blobMySQLmysql-5.6

One of my tables contains seven BLObs. Six of these refer to raw data and are typically 1.0 – 1.5Mb each. The seventh is the result of processing this data and can be nearly equal in size to the sum of the other BLObs but may be significant less if some of the data has been rejected for various reasons. The raw data was originally kept in a directory structure but there were some issues with logical naming of directories and also with someone changing some directory names. Therefore keeping all the data in the database would appear the correct way to do it. The raw data is needed in case a different algorithm is developed and it needs to be reprocessed. The problem is that as information is added the performance is degrading. If a copy is put on another server and the six raw data BLObs are deleted performance improves significantly. Research on the web suggests that the raw data is read even if it is not used in the query. Creating two tables, one for the raw data and the other for the result of processing, seems to be a solution but this would mean repeating some of the meta data which seems bad design. Is there a better way of tacking this?

Best Answer

Do not use ENGINE=MyISAM, that will lead to blobs being costly for any operation on the table. Instead, use InnoDB.

"Vertical partitioning" is generally not needed, because of the way InnoDB effectively does it for you....

With InnoDB, there are 4 ROW_FORMATs; let's try to pick the optimal one. In all cases, some, or all, of each blob is stored away from the row itself. More in a minute.

But first, let's check the queries. Do not use SELECT *; always specify only the necessary columns. This avoids reaching into other blocks to get the blobs.

The ROW_FORMATs are redundant, compact, dynamic, and compressed. For your case, I recommend either DYNAMIC or COMPRESSED. These store none of the blob in the record itself; instead they have a 20-byte pointer to where the blob is stored. Only if you need to touch the blob does it run off to that location.

COMPRESSED is the same as DYNAMIC, except for a complicated block-level compression technique. Most benchmarks I see say you get about 2x compression. This is not as good as you might get by compressing the blobs in the client. For example, most 'text' compresses 3x. But most image formats are already compressed, so compression is a waste.

So, I leave it to you to decide between dynamic and compressed. If in doubt, test with your data and your queries.

You should have innodb_file_per_table=1 when you CREATE the table. You may need BARRACUDA and you do need at least version 5.6; 5.7 would be better.

Two tables, one for raw, one for combined? Maybe. Given the benefits described above, there may not be any benefit in having two table. However, if there is a lot of activity in one column, it might be better to move it to a separate table (plus 1:1 id, etc). (This is especially important for things like "like/unlike" counters.)

Since you are working with bulky things, be careful that you do not have long-running transactions. Such can lead to timeouts, etc. Still, use BEGIN...COMMIT where appropriate.