MySQL Import Performance – Troubleshooting Slow and Failing Large File Imports

Tags: etl, mysql, postgresql

I have a 47 GB MySQL dump of a single table:

http://dumps.wikimedia.org/commonswiki/latest/commonswiki-latest-image.sql.gz

I ultimately want it in PostgreSQL, but since I couldn't find an easy way to transform the MySQL dump into PostgreSQL-compatible SQL, I figured I would load it into MySQL first and then write a small ETL script to move the data over.

I initially tried to just load it using MySQL Workbench's Data Import/Restore, but it kept failing.

Now I've run split -l 2000 commonswiki-latest-image.sql and have 20 files of roughly 2 GB each, but the import still fails with:

23:12:28 Restoring C:\temp\commonswiki\xaa.sql

Running: mysql.exe --defaults-extra-file="c:\users\jmurdoch\appdata\local\temp\tmpi_ltz8.cnf"  --host=127.0.0.1 --user=root --port=3306 --default-character-set=utf8 --comments --database=mediawikiimages < "C:\\temp\\commonswiki\\xaa.sql"

ERROR 2013 (HY000) at line 331: Lost connection to MySQL server during query


Operation failed with exitcode 1

00:54:17 Import of C:\temp\commonswiki\xaa.sql has finished with 1 errors

It's also terribly slow: it imported only 209,236 rows in nearly 2 hours. There are about 20 million rows to import in total, so at that rate the whole import would take roughly 200 hours.

I'm using Windows 7, MySQL 5.6.14 and MySQL Workbench 6.

My main questions:

  • Is there a script that would feed the split files to MySQL one by one and automatically recover from errors when the connection times out?

  • Is there a conversion tool that would convert MySQL SQL to PostgreSQL SQL without first loading it into a database?
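For the first question, the kind of feeder script I have in mind could be sketched roughly like this in Python. The command-line flags mirror the mysql invocation from the Workbench log above; the retry counts, delays, and paths are illustrative assumptions, not a tested solution:

```python
import glob
import subprocess
import time

# Client invocation based on the Workbench log above; host, user, port and
# database name are the ones from the question and may need adjusting.
MYSQL_CMD = ["mysql", "--host=127.0.0.1", "--user=root", "--port=3306",
             "--default-character-set=utf8", "mediawikiimages"]

def import_chunk(path, cmd=None, retries=5, delay=5):
    """Pipe one chunk file into the client, retrying on failure.

    Returns True on success, False once all retries are exhausted.
    """
    cmd = cmd or MYSQL_CMD
    for attempt in range(retries):
        with open(path, "rb") as f:
            # A non-zero exit code (e.g. the "Lost connection" error 2013)
            # triggers another attempt after a short pause.
            if subprocess.run(cmd, stdin=f).returncode == 0:
                return True
        time.sleep(delay)
    return False

if __name__ == "__main__":
    # Feed the split files (xaa.sql, xab.sql, ...) in order.
    for chunk in sorted(glob.glob(r"C:\temp\commonswiki\x*.sql")):
        if not import_chunk(chunk):
            print("giving up on", chunk)
            break
```

One caveat with naive retrying: re-running a chunk that partially imported will hit duplicate-key errors unless the INSERTs are idempotent (e.g. INSERT IGNORE) or the table is cleaned up first.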

Best Answer

Using

max_allowed_packet = 256M

in the MySQL config allowed the files to be imported without errors.
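For reference, the setting belongs in the [mysqld] section of the server's option file (on Windows typically my.ini; the exact location depends on the installation), and the server must be restarted for it to take effect:

```ini
# Server-side option file (my.ini on Windows); restart MySQL afterwards.
[mysqld]
max_allowed_packet = 256M
```

The error 2013 ("Lost connection to MySQL server during query") fits this cause: mysqldump writes extended INSERT statements as very long single lines, and any statement larger than max_allowed_packet makes the server drop the connection.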