Have you looked into the SELECT INTO OUTFILE syntax?
Here is a quote on the NULL string issue:
If the FIELDS ESCAPED BY character is empty, no characters are escaped and NULL is output as NULL, not \N. It is probably not a good idea to specify an empty escape character, particularly if field values in your data contain any of the characters in the list just given.
So make sure you indicate a FIELDS ESCAPED BY
clause to get NULLs written as \N
Hope this helps
OBSERVATION #1
I noticed you turned off autocommit
. That will pile up so much data in ibdata1. Why?
There are seven(7) classes of information that is stored in ibdata1:
- Data Pages for InnoDB Tables
- Index Pages for InnoDB Tables
- Data Dictionary
- Double Write Buffer
- Safety Net to Prevent Data Corruption
- Helps Bypass OS for Caching
- Insert Buffer (Streamlines Changes to Secondary Indexes)
- Rollback Segments
- Undo Logs
- Click Here to see a Pictorial Representation of
ibdata1
Some of this info is made visible to certain transactions depending on the isolation level. Such actions could produce unintended primary key locks and lots of phantom data. As these two things increase, you should expect a fair slow down.
Recommendation: Leave autocommit on
OBSERVATION #2
I see you have this:
alter table item_load disable keys;
DISABLE KEYS does not work with InnoDB. Here is why:
- MyISAM :
DISABLE KEYS
simply shuts off Secondary Index updating for the MyISAM table. When you mass INSERT into a MyISAM table with keys disabled results in a fast table load along with a building of the PRIMARY KEY and all unique indexes. When you run ENABLE KEYS
, all Secondary Indexes are built linearly on the table and appended to the .MYD
.
- InnoDB : As shown in the internals picture of InnoDB, the system tablespave
ibdata1
has a structure dedicated to Secondary Index Insertions. At present, there is no provision to handle indexes the same as MyISAM.
To illustrate this, note my attempt to run DISABLE KEYS on an InnoDB table in MySQL
mysql> show create table webform\G
*************************** 1. row ***************************
Table: webform
Create Table: CREATE TABLE `webform` (
`nid` int(10) unsigned NOT NULL,
`confirmation` text NOT NULL,
`confirmation_format` tinyint(4) NOT NULL DEFAULT '0',
`redirect_url` varchar(255) DEFAULT '<confirmation>',
`status` tinyint(4) NOT NULL DEFAULT '1',
`block` tinyint(4) NOT NULL DEFAULT '0',
`teaser` tinyint(4) NOT NULL DEFAULT '0',
`allow_draft` tinyint(4) NOT NULL DEFAULT '0',
`submit_notice` tinyint(4) NOT NULL DEFAULT '1',
`submit_text` varchar(255) DEFAULT NULL,
`submit_limit` tinyint(4) NOT NULL DEFAULT '-1',
`submit_interval` int(11) NOT NULL DEFAULT '-1',
PRIMARY KEY (`nid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
mysql> alter table webform disable keys;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> show warnings;
+-------+------+-------------------------------------------------------------+
| Level | Code | Message |
+-------+------+-------------------------------------------------------------+
| Note | 1031 | Table storage engine for 'webform' doesn't have this option |
+-------+------+-------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> select version();
+------------+
| version() |
+------------+
| 5.5.27-log |
+------------+
1 row in set (0.00 sec)
mysql>
OBSERVATION #3
You noticed that MyISAM loads 20x faster than InnoDB. Would you like that to be more like 24-25 times faster? Then run the following:
ALTER TABLE item_load ROW_FORMAT=Fixed;
This will speed up INSERTs times 20-25% without any other DDL Changes. Side effect: The MyISAM table can grow 80%-100% in size, possibly larger.
You could run this on an InnoDB table as well, but the ACID-compliant behavior and MVCC of InnoDB would still be the bottleneck of its performance, especially if VARCHAR fields increase significantly are are written to ibdata1
.
Best Answer
You are absolutely right. it is not replication-safe. In fact, I wrote about how LOAD DATA INFILE replicates back on Jan 22, 2012 : MySql shell command not replicated to slave. Basically, the entire data file is stored in the binary logs, replicated to the Slave, manifested as a text file in /tmp, and the LOAD DATA INFILE is executed.
What could help is setting sync-binlog=1 before running LOAD DATA INFILE. Notwithstanding, Replication of a large CSV file is at the mercy of replication and the network.
SUGGESTION
For a Text File called
mydata.csv
, you are better off doing the following:Step 01 :
cp mydata.csv to mydata.csv2
Step 02 : Make the script LoadMaster.sql
Step 03 : Make the script LoadSlave.sql
Step 04: Load both in parallel
If you have multiple slaves, you could do this:
That way