MySQL – Replacing Table Data with CSV

MySQL

I have a CSV file on about 200 MB/1 mill rows, which is updated every night. I need the values from this file available in a MySQL table.

I get this working by using LOAD DATA INFILE filepath REPLACE INTO TABLE tablename.
This is fast and it keep my indexes.
The problem is that values which are no longer in the csv will not be deleted.

Another attempt has been to TRUNCATE the table, and then LOAD DATA INFILE filepath INSERT INTO TABLE tablename afterwards. This seems to be about 25% faster. But by doing this I've had a lot of trouble with indexes being disabled(?) or stopped working. I've been reading on the MySQL docs and SO for some days now, but I can't figure that out.

My next attempt has been similar to the first. I added a column "update", and set every row to update=0 before doing a REPLACE INTO with a SET update=1, and then delete all remaining rows with update=0. This is my current method.

My question is this: Which method is considered better for completely replace a table with info from csv, but keep/refresh indexes. I'd appreciate any advice on the subject.

I am using the MyISAM engine.

Best Answer

When populating the data there are a number of issues you may run in to. The data size itself is not huge, but more inconvenient... you could choose any of the mechnisms for populating it including regular selects.

The right way to be doing it is with the TRUNCATE followed by the LOAD. If you are having problems with indexes then you should take that up with the MySQL folk, it is a bug, and it should be working.

If you temporarily need a work around for it then I recommend that you add a column to your table which will be unique for each LOAD cycle. A date may be a good candidate. You have already chosen a column 'update', which is a fine column to use.... but, instead of changing the data back to 0 each time, just increment it for each load, or use a number that matches the date like 20140110 for example. Then, after you LOAD the data you can:

DELETE from TABLE where update <> 20140110