MySQL – use MySQL to serve hundreds of millions of small image files

Tags: blob, myisam, MySQL, performance

I need to serve 300,000,000 small image files totaling 1.5 TB. Unpacking these into individual files on the file system is next to impossible (we see roughly 1 MByte/sec of throughput when creating the small files).

For the time being, I have dumped the files into large 12 GB binary files and keep an index that lets me seek to the right offset and read the correct number of bytes.
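
For reference, the read path of that stop-gap is only a few lines. Here's a minimal sketch in Python, assuming a fixed-width index of (image ID, pack number, offset, length) records; the record layout and file naming are my invention, not necessarily what the poster uses.

```python
import struct

# Hypothetical index format: each record maps an image ID to
# (pack file number, byte offset, length).
INDEX_RECORD = struct.Struct("<QIQI")  # image_id, pack_no, offset, length

def read_image(index_path, packs_dir, image_id):
    """Linear scan of the index for clarity; a real index would be
    sorted, mmapped, or held entirely in memory."""
    with open(index_path, "rb") as idx:
        while True:
            rec = idx.read(INDEX_RECORD.size)
            if len(rec) < INDEX_RECORD.size:
                raise KeyError(image_id)
            iid, pack_no, offset, length = INDEX_RECORD.unpack(rec)
            if iid == image_id:
                break
    # Seek into the matching 12 GB pack file and read exactly one blob.
    with open(f"{packs_dir}/pack_{pack_no:04d}.bin", "rb") as pack:
        pack.seek(offset)
        return pack.read(length)
```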

But that solution was a stop-gap. I also need to update the files daily; generally this means adding new files, and once in a while (monthly/quarterly) doing housekeeping to get rid of old ones.

So I wonder: will I be shooting myself in the foot by trying to create a massive MyISAM table to store these 2-10 KB blobs?

On a 64-bit Linux system there's a 4 TB limit on DB file size, right? Can I split such a massive table into multiple physical files as the dataset grows? Eventually we might serve 6+ TB of small image files.

My options right now are between:

  • Dump the binary data to a large flat file and manage the indexes on my own (either with metadata in the URL or a separate index file on the webserver)
  • Use a MySQL table and add new data via fast LOAD DATA INFILE, which we have had good experience with on another 100 GB reference table (a sketch of the blob-staging step follows this list).
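
On the second option, one wrinkle is that raw blob bytes can collide with LOAD DATA INFILE's tab and newline delimiters, so a common workaround is to hex-encode the blobs in the staging file and decode with UNHEX() at load time. A hedged sketch, with made-up table and column names:

```python
import binascii

# Matching DDL / load statement (names are illustrative):
#   CREATE TABLE images (
#       id BIGINT UNSIGNED NOT NULL PRIMARY KEY,
#       data MEDIUMBLOB NOT NULL
#   ) ENGINE=MyISAM;
#
#   LOAD DATA INFILE '/tmp/images.tsv' INTO TABLE images
#       (id, @hexdata) SET data = UNHEX(@hexdata);

def write_load_file(rows, out_path="/tmp/images.tsv"):
    """rows yields (image_id, blob_bytes) pairs; hex-encoding keeps the
    staging file plain text so the bulk load parses it safely."""
    with open(out_path, "w") as out:
        for image_id, blob in rows:
            out.write(f"{image_id}\t{binascii.hexlify(blob).decode()}\n")
```

Hex-encoding doubles the size of the staging file, but the load itself stays a single bulk statement.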

Thoughts? Cautions?

Best Answer

I wonder why any kind of DB is needed. It sounds like you'd like it if these were just files, but that the number of them and their small size are problematic.

Two strategies can help here. The first is to use a few levels of directories to partition the files into smaller clusters.
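
As a concrete illustration, here is one common sharding scheme in Python: hash the image ID and use the first hex digits to pick the directories. The two-level, 256 x 256 fan-out is an assumption; tune it so each leaf holds a few thousand files.

```python
import hashlib
import os

def shard_path(root, image_id):
    # Hash the ID so files spread evenly regardless of ID distribution.
    digest = hashlib.md5(str(image_id).encode()).hexdigest()
    return os.path.join(root, digest[:2], digest[2:4], f"{image_id}.jpg")

def store(root, image_id, blob):
    path = shard_path(root, image_id)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(blob)
```

With 300 million files, that fan-out works out to roughly 4,500 files per leaf directory.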

Now, some file systems may have inode issues or become very wasteful of space when the files are tiny and the block sizes are large. Indeed, if you're on Linux it may be worth spending some time looking at the alternative file systems available to see whether one fits your use case well.

A 'simple' way to avoid the issue is, for the last layer, to keep the files inside a zip archive with zero compression rather than as loose files in a leaf directory. This works almost like a directory of files but sidesteps the two issues above. It effectively gives you what you've been doing by hand, with a more standard tool-chain.
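
Here is a sketch of that leaf-zip idea using Python's standard zipfile module. ZIP_STORED writes entries uncompressed, so a read is effectively a seek plus a read, and the zip's central directory plays the role of the hand-rolled index; paths and names are illustrative.

```python
import zipfile

def add_image(leaf_zip, name, blob):
    # Append mode creates the archive on first use; ZIP_STORED means
    # the bytes land in the archive verbatim, with no compression pass.
    with zipfile.ZipFile(leaf_zip, "a", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr(name, blob)

def get_image(leaf_zip, name):
    # The central directory lookup replaces a per-file open() on disk.
    with zipfile.ZipFile(leaf_zip, "r") as zf:
        return zf.read(name)
```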

From what you've described so far, I personally wouldn't be putting this anywhere near any kind of DB, NoSQL or traditional.

If you really want a DB solution, you could look at some form of partitioning across multiple tables to avoid any file size limits, but it wouldn't be my first option!
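
If it came to that, the least exotic form is application-level sharding: hash the key to one of N identically-shaped tables, each backed by its own files on disk. A tiny sketch, with the shard count and naming scheme invented for illustration:

```python
NUM_SHARDS = 64  # assumption; size so each shard stays under the file limit

def shard_table(image_id):
    # Map an image ID to one of N identical tables, e.g. images_00..images_63.
    return f"images_{image_id % NUM_SHARDS:02d}"

# e.g. SELECT data FROM images_21 WHERE id = 123456789
print(shard_table(123456789))  # -> images_21
```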