MySQL Load from Infile Stuck – Troubleshooting Guide

Tags: bulk, innodb, insert, MySQL, mysql-5.5

I've got a Windows 7 64-bit machine that I'm using for some load testing of a MySQL database. My program uses SQLAlchemy to connect and run several LOAD DATA INFILE statements against that database. These bulk loads all happen within a single transaction, all keys are disabled beforehand, and each CSV file is only a few megabytes in size.

The problem I've run into is that the test machine gets I/O bound. It has enough RAM available (12G) to hold the entire transaction in memory and do a single flush out the other end. As far as I understand the manual, the InnoDB tables shouldn't touch the hard drive until the dirty pages are flushed at transaction completion.

The total data to be loaded is about 1G, spread across the different tables. It ends up taking 37 minutes to load it all. Here are my current test settings for perusal. I'd also be happy to report the results from SHOW ENGINE INNODB STATUS or similar queries if necessary.

To recap, I need to know if 37 minutes is a fast insert speed for this data size, and what I can do to increase the insert speed.

Edit:

Whoops! I forgot some important info.

MySQL version 5.5
Server has 12G total RAM
Total rows inserted ~2,597,240
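
For a rough sense of scale, the figures above work out to roughly 1,170 rows per second and under half a megabyte per second. This is only a back-of-the-envelope sketch: it assumes "about 1G" means 1024 MB and that the 37 minutes covers every load.

```python
# Back-of-the-envelope throughput from the numbers in the question.
# Assumption: "about 1G" is taken as 1024 MB; the real size may differ.
total_rows = 2_597_240
total_mb = 1024
elapsed_s = 37 * 60  # 37 minutes in seconds

rows_per_s = total_rows / elapsed_s
mb_per_s = total_mb / elapsed_s

print(f"{rows_per_s:.0f} rows/s, {mb_per_s:.2f} MB/s")
```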

Best Answer

Your bulk insert buffer is 4G. That's great ... FOR MyISAM !!!

InnoDB does not use the bulk insert buffer.

You may need to have sqlalchemy throttle the load data infile calls into multiple transactions.
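
One way to picture that throttling is to split each CSV into smaller files and commit after every chunk. The sketch below is an illustration only: split_csv, load_in_batches, the chunk file names, and the execute/commit callables are hypothetical names, and the callables are assumed to be supplied by whatever connection object your program already has (for example, one obtained through sqlalchemy). The FIELDS clause is also an assumption about the file format.

```python
import csv
import os


def _write_chunk(rows, out_dir, index):
    """Write one chunk file with '\n' line endings and return its path."""
    chunk_path = os.path.join(out_dir, f"chunk_{index:04d}.csv")
    with open(chunk_path, "w", newline="") as dst:
        csv.writer(dst, lineterminator="\n").writerows(rows)
    return chunk_path


def split_csv(path, rows_per_chunk, out_dir):
    """Split one CSV into files holding at most rows_per_chunk rows each."""
    chunk_paths = []
    rows = []
    with open(path, newline="") as src:
        for row in csv.reader(src):
            rows.append(row)
            if len(rows) == rows_per_chunk:
                chunk_paths.append(_write_chunk(rows, out_dir, len(chunk_paths)))
                rows = []
    if rows:  # leftover rows that did not fill a whole chunk
        chunk_paths.append(_write_chunk(rows, out_dir, len(chunk_paths)))
    return chunk_paths


def load_in_batches(chunk_paths, table, execute, commit):
    """Issue one LOAD DATA INFILE per chunk and commit after each one."""
    for chunk_path in chunk_paths:
        # Forward slashes avoid backslash escaping in the SQL string on Windows.
        sql_path = chunk_path.replace("\\", "/")
        execute(
            f"LOAD DATA INFILE '{sql_path}' INTO TABLE {table} "
            "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'"
        )
        commit()
```

The chunk size is something to experiment with; smaller chunks mean more commits but less work held open in any one transaction.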

You may also want to restrict innodb_change_buffering by setting it to inserts. If SET GLOBAL innodb_change_buffering = 'inserts'; is rejected on your server, set it in my.cnf and restart mysql instead.

UPDATE 2012-07-13 16:53 EDT

I just noticed that you have two values in the my.cnf for innodb_buffer_pool_size. First one is 2385M, and the last one is 14G. If MySQL for Windows accepted 14G and you only have 12G of RAM, your server must be having a good old time swapping.
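
Duplicate settings like this are easy to miss by eye. Here is a minimal sketch of a checker that lists options set more than once in the same section. find_duplicate_options is a hypothetical helper, not part of MySQL, and the sample text is made up from the values mentioned in this answer rather than copied from the real my.cnf.

```python
from collections import defaultdict


def find_duplicate_options(cnf_text):
    """Return {(section, option): [values, ...]} for options set more than once."""
    seen = defaultdict(list)
    section = None
    for raw in cnf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            continue
        name, _, value = line.partition("=")
        # MySQL treats dashes and underscores in option names as equivalent.
        key = name.strip().replace("-", "_")
        seen[(section, key)].append(value.strip())
    return {k: v for k, v in seen.items() if len(v) > 1}


# Illustrative sample only; the layout of the real file is an assumption.
sample = """
[mysqld]
innodb_buffer_pool_size=2385M
bulk_insert_buffer_size=4G
innodb_buffer_pool_size=14G
"""

dupes = find_duplicate_options(sample)
for (section, option), values in dupes.items():
    print(f"[{section}] {option} is set {len(values)} times: {values}")
```

When an option appears more than once, the value that comes last is normally the one in effect, which is why the 14G entry is the one to worry about here.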

You can verify what the buffer pool size is with

SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

UPDATE 2012-07-13 16:58 EDT

You may also want to check how full the buffer pool is with this

SELECT FORMAT(A.num * 100.0 / B.num,2) BufferPoolFullPct FROM
(SELECT variable_value num FROM information_schema.global_status
WHERE variable_name = 'Innodb_buffer_pool_pages_data') A,
(SELECT variable_value num FROM information_schema.global_status
WHERE variable_name = 'Innodb_buffer_pool_pages_total') B;

Related Solutions

MySQL – Can MySQL Use More Than One Core?

I actually discussed innodb_thread_concurrency with a MySQL Expert at the Percona Live NYC conference back in May 2011.

I learned something surprising: In spite of the documentation, it is best to leave innodb_thread_concurrency at 0 (infinite concurrency). That way, InnoDB decides the best number of innodb_concurrency_tickets to open for a given MySQL instance setup.

Once you set innodb_thread_concurrency to 0, you can set innodb_read_io_threads and innodb_write_io_threads (both since MySQL 5.1.38) to the maximum value of 64. This should engage more cores.

MySQL Load Data Infile Slow Query Logging – How to Resolve

The query under process ID 180233 looks like it is in distress.

Here is the query itself

SELECT COUNT(DISTINCT A.`campaignid`)  INTO _c
FROM `ox_campaigns` A 
INNER JOIN `selfserving_users` B ON B.`user_id` = A.`uid`
INNER JOIN `v3_cam_date` C ON C.`campaignid` = A.`campaignid`
WHERE A.`revenue_type` = 5 AND A.`deleted` = 0
AND A.`expire` = DATE_ADD(CURRENT_DATE, INTERVAL 1 DAY) 
AND A.`isExpired` = 0
AND IF( NAME_CONST('_permitid',3) = -1, 1=1,
IF( NAME_CONST('_permitid',3) = 0, A.`uid` IN
(SELECT C.`user_id` FROM `selfserving_users` C
WHERE C.`groupid` =  NAME_CONST('_groupid',12) ) ,
A.`uid` =  NAME_CONST('userid',388)));

The scary part about the query is the self-reference

You have selfserving_users acting in a self serving manner against itself.

Sometimes, the MySQL Query Optimizer will play bait-and-switch, smoke-and-mirrors games with data, especially with a self-reference, in order to formulate the best EXPLAIN plan possible. While mysql is very capable of completing sub-SELECTs, they can still be expensive.

However, this is just a symptom that manifested because of Process ID 97. What is really the issue here?

LOAD DATA INFILE against an InnoDB table could make mysqld a little punch drunk. I don't believe (or at least I don't exercise full confidence) you can encapsulate it as a normal transaction although this was addressed back in MySQL 5.0.

Just picture it:

You are hammering the InnoDB Buffer
Some memory swapping may be going on
Possible full table locking issues that are affecting data pages outside the v3_zone_date table (such as with the selfserving_users table)

There may be a way to throttle the LOAD DATA INFILE process on an InnoDB table. I cannot give you a solid answer on this one, but try this link from Baron Schwartz.

UPDATE 2012-02-22 12:00 EST

There is an open bug report against MySQL 5.5.7 called Deadlock when DDL under LOCK TABLES WRITE, READ + PREPARE. At the bottom of the report, a person complained about a blocking problem caused by the explicit LOCK TABLES.

Launching a COMMIT on locked rows in a table would hang while trying to unravel the MVCC data associated with the locked rows. Based on the InnoDB Status you have shown, there would be 6933 row locks on the table you are importing. I know that in Oracle, when introducing new rows to a table, MVCC is still generated because the previous version of the newly inserted row is a nonexistent row. The same must be occurring for InnoDB.

UPDATE 2012-02-22 12:42 EDT

In your question you stated the following about your .NET process

LOCK TABLES;
SET autocommit=0;
SET unique_checks=0;
SET foreign_key_checks=0;
LOAD DATA;
COMMIT;
UNLOCK TABLES;
SET autocommit=1;
SET unique_checks=1;

All of these events are running within the same DB Session. This is also happening within one DB Connection. Thus, this is not a deadlock in the traditional sense. It is just a case of blocking your COMMIT within a given DB Connection/Session because the tables were locked within the same DB Connection/Session.

UPDATE 2012-02-23 19:00 EDT

I would change the sequence to be this:

SET autocommit=0;
SET unique_checks=0;
SET foreign_key_checks=0;
LOCK TABLES;
LOAD DATA;
UNLOCK TABLES;
COMMIT;
SET autocommit=1;
SET unique_checks=1;
SET foreign_key_checks=1;

Please remember, a COMMIT cannot proceed if you have the tables locked in serial fashion. Therefore, UNLOCK TABLES must precede COMMIT.
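
If the statements are issued from client code, one way to keep that order from drifting is to build the list in one place. The sketch below only mirrors the sequence proposed above; it does not verify that the order cures the hang. bulk_load_statements and run_bulk_load are hypothetical helpers, the CSV path is whatever your process uses, and cursor is assumed to be any DB-API style cursor with an execute method.

```python
def bulk_load_statements(table, csv_path):
    """Return the statements in the order suggested above."""
    return [
        "SET autocommit=0",
        "SET unique_checks=0",
        "SET foreign_key_checks=0",
        f"LOCK TABLES {table} WRITE",
        f"LOAD DATA INFILE '{csv_path}' INTO TABLE {table}",
        "UNLOCK TABLES",  # release the table lock first ...
        "COMMIT",         # ... then commit, as recommended above
        "SET autocommit=1",
        "SET unique_checks=1",
        "SET foreign_key_checks=1",
    ]


def run_bulk_load(cursor, table, csv_path):
    """Execute every statement on one cursor so they share a single session."""
    for statement in bulk_load_statements(table, csv_path):
        cursor.execute(statement)
```

Running everything through a single cursor keeps all the statements in the same DB Connection/Session, which is the situation described in the earlier update.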