MySQL – Where can I find metrics on the performance of Blob vs file system

blobMySQLperformance

I know and understand that there are performance hits in storing blob data in the database, but the blob portion of the data is going to be rarely retrieved/viewed, it is for smaller data (the vast majority under 256k with a max of 10mb), it is not going to be used by most customers, and the total rows is expected to be relatively low, very likely under half a million, if not less. Also some of the data is dynamic and can change for some users, as in it's not static images. In other words we're at the edge of whether or not it's worth it.

I keep reading that it's better to store in the file system but I can't find actual metrics that show the performance difference, just people repeating each other without any concrete proof or metrics. For us it may be worth the performance cost in exchange for being fully ACID as well as guaranteeing that all our backups are completely synched.

That being said does anyone know or have any real world metrics to show the performance difference between storing items as blobs vs in the file system. I'm trying to understand if the performance penalty is worth it or not rather than blindly following the general rule of thumb and after spending at least 2-3 hours and I've yet to be able to see anyone show any actual numbers. It's all just words with nothing concrete.

By the way this is a MySQL InnoDB table. The actual data table has a link to the blob data table, so the blob is not in the main able and is only retrieved when need be to avoid any I/O issues. In other words instead of the path to the data on the filesystem, it's an ID to another table with only blobs. How does that compare in terms of performance? Is it 25% worse? Is it 100%? Is it 200-500%? Is it 1000%?

If the cost is only 100%-200% it is probably worth it for us because again the data is rarely retrieved. So even if we had say 10,000 concurrent users, maybe only 50 users would be retrieving their blob data concurrently at best. Yes the data is specific to each user, it isn't images.

Best Answer

The main cost of handling the data is the I/O. You are doing approximately the same amount of I/O, whether it is in 4KB chunks in the OS (plus directory traversal) or 16KB chunks in InnoDB (plus indirect block lookup).

The filesystem and InnoDB are cached in radically different ways; this may factor into a difference -- depending on how cacheable the blogs are.

You say "rarely retrieved". So why does speed matter?

So, I doubt if it will be more than 25% difference. And I can't predict which will be faster.

As for space, again there are several differences, so it is hard to predict which would be tighter. In any case, the diff can't be more than about 2% for the size blobs you mentioned.

How compressible are the blobs? (Most image formats are already compressed; text is typically compressible 3:1.) If compressible, then do so in the client. (InnoDB's builtin optional compression is easier, but not as good.)

And, yes, having it in a "parallel table" (as you mentioned) is often better.

Another point -- If the blob is an image destined for a web page, it is more efficient to simply have it in a file and say <img srg=file-path>. If it is in a table BLOB, you have to do extra work to hand it off to the web page. Since I/O is the main difference, I might expect the img tag to be 2x faster.