MySQL – LOAD DATA stuck at NULL state

bulk-insert, innodb, insert, MySQL, performance, query-performance

I'm stuck trying to move one table to another because the column titles no longer properly reflect the data they hold. To fix this, I've created a new table with the same number of columns as the original, simply with the column names changed to better describe the data coming in.

My problem is that the table holds over 50 million rows, so I'm unable to simply alter the columns in place; instead I've made a new table that I want to use going forward anyway. I've made the necessary column changes, but now when I try to load the data in, it looks like the process is simply stuck.

+-----+------+-----------+----------+---------+------+-------+-----------------------------------------------------------------------------------+
| Id  | User | Host      | db       | Command | Time | State | Info                                                                              |
+-----+------+-----------+----------+---------+------+-------+-----------------------------------------------------------------------------------+
| 481 | root | localhost | database | Query   | 8617 | NULL  | LOAD DATA  LOCAL INFILE '/home/SQLBackUps/file.csv' INTO TABLE `tablename` FIELDS |
| 483 | root | localhost | database | Sleep   | 7174 |       | NULL                                                                              |
| 485 | root | localhost | database | Query   |    0 | NULL  | show processlist                                                                  |
+-----+------+-----------+----------+---------+------+-------+-----------------------------------------------------------------------------------+

I ran the task using mysqlimport with the necessary flags, so it should in theory at least be running, but since the state shows NULL I'm not sure what that really means; according to the documentation, NULL only appears for show processlist. The file's permissions are as follows:

-rw-rw-rw-  1 mysql mysql 14141022004 May 15 16:42 file.csv

The file was created using SELECT INTO OUTFILE …
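
For reference, the pair of statements looks roughly like the following; the actual FIELDS/LINES options are cut off in the processlist above, so those clauses (and the old_table name) are only placeholders:

-- old_table is a placeholder for the source table name; the FIELDS/LINES
-- clauses are illustrative, since the real ones are truncated above.
SELECT *
  INTO OUTFILE '/home/SQLBackUps/file.csv'
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
  FROM old_table;

LOAD DATA LOCAL INFILE '/home/SQLBackUps/file.csv'
  INTO TABLE `tablename`
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\n';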

Is there anything I can do to speed up this process? Right now it's on a very limited dev machine with 2GB of RAM and an okay processor, so I understand it will take a while, but given it's been over two hours and the query is still showing a NULL state, I'm left wondering whether something is wrong.

Are there other ways to import the data quickly? Are there any flags I can use to speed this process up? Any, and all help is appreciated.

Update:

I started the process again very early in the morning and have let it run for about seven and a half hours now; the processlist looks as follows:

mysql> show processlist;
+----+------+-----------+------+---------+-------+-------+------------------------------------------------------------------------------------------------------+
| Id | User | Host      | db   | Command | Time  | State | Info                                                                                                 |
+----+------+-----------+------+---------+-------+-------+------------------------------------------------------------------------------------------------------+
|  9 | root | localhost | db   | Query   | 22978 | NULL  | LOAD DATA INFILE '/home/SQLBackUps/file.csv' INTO TABLE table FIELDS TERMINATED |
| 15 | root | localhost | db   | Query   |     0 | NULL  | show processlist                                                                                     |
+----+------+-----------+------+---------+-------+-------+------------------------------------------------------------------------------------------------------+

Best Answer

Here is what caught my eye when you replied to my comment: the target table is InnoDB and you are using LOAD DATA INFILE. I see two issues:

ISSUE #1 : LOAD DATA INFILE

While LOAD DATA INFILE can load InnoDB tables, the command can really only be tuned for loading MyISAM tables. There is just one option for that tuning: bulk_insert_buffer_size, which you can either set very large or set to zero to disable.

There is no equivalent provision for InnoDB.
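
For a MyISAM target, that tuning would look something like the following; the 256M figure is purely illustrative, and none of this helps an InnoDB load:

-- Illustrative values only: bulk_insert_buffer_size matters for MyISAM
-- bulk loads, not for an InnoDB target table.
SET GLOBAL  bulk_insert_buffer_size = 256 * 1024 * 1024;   -- new connections
SET SESSION bulk_insert_buffer_size = 256 * 1024 * 1024;   -- the session running LOAD DATA
-- or disable the cache entirely:
SET SESSION bulk_insert_buffer_size = 0;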

ISSUE #2 : InnoDB Storage Engine

Let's take a look at the InnoDB Architecture

[Image: InnoDB Architecture diagram]

Now, picture yourself pushing 50 million rows into one InnoDB table as a single transaction, given all the plumbing depicted in this elaborate illustration.

To ensure data consistency in the event of a crash, your data has to be written in three places:

  • There are 128 rollback segments in the System Tablespace (physical file ibdata1). Your incoming table data must pile up on one rollback segment like defensive tackles on a quarterback.
  • You have an active Double Write Buffer in the System Tablespace. As the name implies, data is written twice: InnoDB writes to the Double Write Buffer first, before writing to the .ibd files. That copy is used as the source data for crash recovery.
  • The data is also written to the Transaction Logs (the Redo Logs in the bottom-right corner of the diagram) via the Log Buffer. The physical files are ib_logfile0 and ib_logfile1.
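
You can see how much of that plumbing is in play on your server by checking the relevant settings (all of these are standard InnoDB variables):

SHOW GLOBAL VARIABLES LIKE 'innodb_doublewrite';
SHOW GLOBAL VARIABLES LIKE 'innodb_log_file_size';
SHOW GLOBAL VARIABLES LIKE 'innodb_log_buffer_size';
SHOW GLOBAL VARIABLES LIKE 'innodb_flush_log_at_trx_commit';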

My Perspective

InnoDB can handle 1024 concurrent transactions, but there are only 128 rollback segments. If there are other transactions going on, you've got a New York City traffic jam on your hands. With all the InnoDB internals to manage during your bulk insert, seeing NULL in the processlist should not be a surprise. You should look at four (4) things to make sure they are still being updated (a quick way to check them follows the list):

  1. filesize of ibdata1
  2. timestamp on ibdata1
  3. timestamp on ib_logfile0
  4. timestamp on ib_logfile1
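
A minimal way to watch all four, assuming the default datadir of /var/lib/mysql (confirm it with SHOW GLOBAL VARIABLES LIKE 'datadir'; and adjust the path if yours differs):

# Assumes the default datadir; adjust the path to match your server.
ls -lh /var/lib/mysql/ibdata1 /var/lib/mysql/ib_logfile0 /var/lib/mysql/ib_logfile1

If the sizes and timestamps keep moving while the LOAD DATA is running, the import is still making progress even though the State column shows NULL.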

SUGGESTIONS

You could set one or more of the following:
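
The specific variables and values below are just one plausible combination for a 2GB box, not a prescription; revert them once the load is finished, and restart mysqld for the changes to take effect.

[mysqld]
innodb_buffer_pool_size        = 1G     # more room to cache pages on a 2GB machine
innodb_log_buffer_size         = 64M    # fewer redo-log flushes mid-transaction
innodb_log_file_size           = 256M   # bigger redo logs; on older versions, shut down
                                        # cleanly and move the old ib_logfile* aside first
innodb_flush_log_at_trx_commit = 2      # flush redo once per second instead of per commit
innodb_doublewrite             = 0      # skip the Double Write Buffer during the load only

Disabling the Double Write Buffer and relaxing the flush setting trade crash safety for speed, so only run that way for the duration of the import.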

GIVE IT A TRY !!!