Mysql – Any way to estimate the required space for a SQL dump file

dumpimportMySQLstoragexml

I want to import a Wikipedia XML dump file into MySQL. The uncompressed XML dump file is 65GB. Does this mean I should free up almost 64GB in the MySQL data path? Is there any way to estimate the required space?

Here is the sql schema:https://git.wikimedia.org/blob/mediawiki%2Fcore.git/HEAD/maintenance%2Ftables.sql

Best Answer

I have dealt with questions like this in both directions

Your question is a little different because you did not give anything about the target MySQL Instance (what Storage Engine will be used, which columns are TEXT/BLOB, what indexing is needed for your data, etc.)

IMHO the most straightforward way I can think of is the following

METHOD #1

Write some Perl/Python script to sum the length of all XML tag delimiters. Subtract the sum you get from the length of the file. That would essentially be the size of a CSV file less delimiters. This would be the bytes of the raw data.

METHOD #2

Do METHOD #1 and also take the count of all XML tag delimiters and add that count to the result of METHOD #1. That would like adding commas back into a CSV file. This make be closer in size to an actual CSV should you print the output to such a file.