MySQL Database Import – Determining Size from SQL Dump

MySQL

I had to manually duplicate a server for development, and the only way I could duplicate the database was via SQL dump. I was importing into MySQL on a virtual machine running Ubuntu 14; the mysql command-line client reports version 14.14, which corresponds to an old 5.x server. (Yeah, really old stuff, but the client isn't ready to allow an upgrade.)

The SQL dump came to about 6GB of uncompressed, plain ANSI text. Since the dump bloats the data with query instructions, character-escape sequences, and textual representations of binary data, I assumed the resulting database ought to be smaller, right? So I figured 6GB for the temporary SQL dump, plus 6GB for the database, plus another 8GB for the rest of the system (20GB total) ought to be more than enough. It wasn't.

I rebuilt my virtual machine with a 30GB disk, thinking surely that would be enough, but it still wasn't. So I rebuilt it again with 50GB and ended up with 16GB left over when all was said and done. It turns out the database blew up to 20.5GB!

Somewhere along the way I figured out how to check the size of an existing database in MySQL. What I would like to know is this: when the original database is no longer available to query, is there an application or MySQL command that can passively process a SQL dump (without actually building the database) to estimate the size of the database it would produce?
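
For reference, the size check I eventually used on the live server was a query against information_schema, something along these lines (the figures are approximate, but close enough for capacity planning):

SELECT table_schema,
       ROUND(SUM(data_length + index_length) / 1024 / 1024 / 1024, 2) AS size_gb
FROM information_schema.tables
GROUP BY table_schema;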

Best Answer

Look at the disk footprint for the database. That might be within a factor of 2 of the size of the output of mysqldump.
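
If you still have access to the source server, that footprint is quick to check directly (this assumes the default data directory of /var/lib/mysql; adjust if yours differs):

host1$ du -sh /var/lib/mysql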

Your example of 6GB --> 20.5GB (a ratio of about 3.4) suggests that there are a lot of BIGINTs holding small numbers and/or CHAR columns that should be VARCHARs.

For example: a BIGINT takes 8 bytes of data, plus maybe another 6 bytes of overhead, so roughly 14 bytes in the table. A single-digit value in the dump is just the digit plus a comma ("5," -- 2 bytes). That alone gives about a 7:1 ratio of database to dump.

Your specific question probably calls for Plan B (below) to help avoid that kind of surprise bloat.

Other details inflate the database side, too: a one-row table still takes 16KB, and a PARTITIONed table wastes at least 4MB per partition.

So, providing some of the CREATE TABLEs can help in guessing whether there are extreme cases that push the database well beyond the simple 2:1 I started with.
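
If the original server is gone and only the dump remains, the table definitions can be pulled straight out of the dump file. In a typical mysqldump file each definition starts on a "CREATE TABLE" line and ends on the ") ENGINE=..." line, so something like this works:

sed -n '/^CREATE TABLE/,/ENGINE=/p' dump.sql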

Plan A: Write the dump on another machine. For example, run this from a second machine ("host2"):

host2$ mysqldump -h host1 ... >dump.sql

Note that this would have avoided your problem: by treating the VM as a separate "host", none of the VM's disk is wasted on the dump file.

Plan B: If your VM is Docker, note that you can "mount a volume" -- that is, have dump.sql live outside the VM but still access it from within.
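
A rough sketch of that idea with the official mysql image; the container name (mysql-dev), host path (/srv/dumps), database name (mydb), and password below are placeholders, not anything you must use:

host$ docker run --name mysql-dev -e MYSQL_ROOT_PASSWORD=secret -e MYSQL_DATABASE=mydb -v /srv/dumps:/dumps -d mysql:5.6
host$ docker exec mysql-dev sh -c 'mysql -uroot -psecret mydb < /dumps/dump.sql'

The dump stays at /srv/dumps on the host; the container reads it through the mounted /dumps path, so none of the container's own disk is spent on the dump file. (Give the container a moment to finish initializing before running the import.)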

Plan C: Never write the dump; simply consume it immediately:

mysqldump -h host1 ... | mysql -h host2

will mostly clone the data onto another machine. (That can be run on either host.)

Plan D: Compress on the fly:

mysqldump ... | gzip >dump.sql.gzip

Ordinary text compresses about 3:1. (But if you have only numbers, I can't predict the compression factor.)
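
To see what ratio you actually got, compare the file on disk with the uncompressed byte count (the second command decompresses to a pipe, so nothing extra is written):

ls -l dump.sql.gzip
gunzip <dump.sql.gzip | wc -c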

Reloading that later:

gunzip <dump.sql.gzip | mysql