MySQL Data Import – Is Loading Data with LOAD DATA INFILE Replication Safe?

csvimportMySQLreplication

I am trying to load data into mysql database form CSV file. I found that we could use LOAD DATA INFILE command to do it. But as per the mysql documentation it is not replication safe. (See here)

Is there a better way to do it rather than to do it via application?

Best Answer

You are absolutely right. it is not replication-safe. In fact, I wrote about how LOAD DATA INFILE replicates back on Jan 22, 2012 : MySql shell command not replicated to slave. Basically, the entire data file is stored in the binary logs, replicated to the Slave, manifested as a text file in /tmp, and the LOAD DATA INFILE is executed.

What could help is setting sync-binlog=1 before running LOAD DATA INFILE. Notwithstanding, Replication of a large CSV file is at the mercy of replication and the network.

SUGGESTION

For a Text File called mydata.csv, you are better off doing the following:

Step 01 : cp mydata.csv to mydata.csv2

Step 02 : Make the script LoadMaster.sql

SET SQL_LOG_BIN=0;
SET bulk_insert_buffer_size = 1024 * 1024 * 256;
LOAD DATA INFILE 'mydata.csv' INTO tb1 ... ;

Step 03 : Make the script LoadSlave.sql

SET SQL_LOG_BIN=0;
SET bulk_insert_buffer_size = 1024 * 1024 * 256;
LOAD DATA INFILE 'mydata.csv2' INTO tb1 ... ;

Step 04: Load both in parallel

mysql -hIPMaster -u... -p... < LoadMaster.sql &
mysql -hIPSlave  -u... -p... < LoadSlave.sql &
wait

If you have multiple slaves, you could do this:

mysql -hIPMaster -u... -p... < LoadMaster.sql &
mysql -hIPSlave1  -u... -p... < LoadSlave.sql &
mysql -hIPSlave2  -u... -p... < LoadSlave.sql &
mysql -hIPSlave3  -u... -p... < LoadSlave.sql &
....
wait

That way

  • both imports are done together
  • binlogs are not bloated
  • CSV file can always be deleted