Mysql – Insert a large TEXT. Is INSERT the only way

blobMySQL

I am not a strong expert in databases at this level, so I was wondering the following.

A database I am currently working on has a large TEXT field where potentially hundreds of megabytes are stored in each row. I understand this is not optimal design but it predates me and I can't change it now.

The insertion of the data happens with a single SQL query INSERT INTO, which of course results in a gigantic query. I was wondering the following:

  • Is INSERT the only option in this case when submitting large blob of data? maybe there's a SQL extension to handle such large transfers that is better optimized.
  • Having a single query of this size obviously requires that the whole transfer of these hundreds of megabytes goes smooth without any network issue for the whole duration of the transfer. How resilient is this to network issues, and can it be restarted in case of failure without having to reissue the whole query again (and therefore restart the whole transfer from scratch)?

I am using the latest MySQL, but I wonder also about other databases for personal awareness.

Best Answer

I recommend avoiding putting more than 16MB (perhaps less) in a column in MySQL. There are many limitations in MySQL and possible in the companion tools you are using.

For storing Text (as opposed to blobs) in a column, I would strongly consider compressing it. This is likely to shrink the data by 3x. Do the compression in the Client, not the Server.

What is the Text for? How will be it be used eventually? An analogy: with images (which must be stored in BLOB/MEDIUMBLOB/LONGBLOB), it can be better to store them as files when the use is HTML's <img src=file...>. This makes the fetching trivial.

I once had the task of storing large data, even multi-GB movies. Obviously, I could not even use LONGBLOB (4GB limit). I chose to break the stream into 50KB (pick a number) chunks and store them in a BLOB in rows of a table. Together with it was a sequence number to allow reconstructing the result. This allowed me to "stream" things rather than "store (or fetch) it all at once". Hence I was not straining the limits of the client software (which was PHP and Apache).

As for inserting blob (or compressed text), it may be optimal to convert it to hex, then use UNHEX("...") in the INSERT statement. This completely avoids any issue of escaping strings, etc. (But does not avoid SQL Injection problems.) It does, however, cut in half some of the limits you may encounter.