MySQL – Database design: importing massive CSV files – is there a more efficient way?

database-design, MySQL

I need to let users import huge CSV files of between 200k and 1.5M lines. For this I use LOAD DATA INFILE, which is the fastest way to import those files. The problem is that this way I can't validate the user input, so I've solved that with a trigger on a table. My table structure is as follows (narrowed down to make it more readable):

tbl_debtor: debtor_id (AI), debtor

tbl_debtor_tmp: debtor

tbl_debtor_err: error

The BEFORE INSERT trigger is on tbl_debtor_tmp, into which users can literally import anything. In that trigger I check the user input: if it is okay I insert into tbl_debtor, and if not I write to tbl_debtor_err. After the import is done I do a DELETE FROM tbl_debtor_tmp.
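
For reference, a minimal sketch of that setup could look like the one below. The column sizes and the validation rule (a simple length check) are assumptions, since the real checks aren't shown in the question:

    -- Sketch only: column sizes and the validation rule are assumptions.
    CREATE TABLE tbl_debtor     (debtor_id INT AUTO_INCREMENT PRIMARY KEY, debtor VARCHAR(50));
    CREATE TABLE tbl_debtor_tmp (debtor VARCHAR(255));
    CREATE TABLE tbl_debtor_err (error  VARCHAR(255));

    DELIMITER //
    CREATE TRIGGER trg_debtor_tmp_bi
    BEFORE INSERT ON tbl_debtor_tmp
    FOR EACH ROW
    BEGIN
        -- Hypothetical check: accept values up to 50 characters, log the rest.
        IF CHAR_LENGTH(NEW.debtor) <= 50 THEN
            INSERT INTO tbl_debtor (debtor) VALUES (NEW.debtor);
        ELSE
            INSERT INTO tbl_debtor_err (error)
                VALUES (CONCAT('debtor too long: ', LEFT(NEW.debtor, 200)));
        END IF;
    END//
    DELIMITER ;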

My question here is: is there a more efficient way? I have about 20 of those table structures, which I think is a lot.

Best Answer

  1. LOAD DATA ... INTO staging ...; The staging table has VARCHARs in place of INTs, etc. It also has some unique id (perhaps auto_increment) for step 3. (A complete sketch of steps 1–4 follows at the end of this answer.)
  2. Copy "bad" rows to debtor_err in one pass:

    INSERT INTO debtor_err
        SELECT ...
            WHERE LENGTH(..) > ..
               OR NOT EXISTS( SELECT * ... )   -- FK check
               OR ... > 12

  3. Copy "good" rows to debtor table -- Use JOIN to see which ones are good

    INSERT INTO debtor
        SELECT ...                                 -- where nec, convert from VARCHAR
            FROM staging s
            LEFT JOIN debtor_err e ON s.id = e.id  -- suitable UNIQUE key
            WHERE e.id IS NULL                     -- to get the non-error rows

  4. DELETE FROM staging; -- Reset for next time. (Using DELETE instead of TRUNCATE may avoid id not working 'right' the second time around.)

Suggestion: Use MyISAM for staging; InnoDB for permanent tables.
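
Putting the pieces together, a minimal end-to-end sketch of steps 1–4 might look like the following. The file path, the extra group_id column, the debtor_group lookup table, and the specific checks (a length test plus an existence test standing in for the FK check) are all assumptions to keep the example self-contained:

    -- Hypothetical permanent tables (InnoDB, as suggested); columns are assumptions.
    CREATE TABLE debtor (
        debtor_id INT AUTO_INCREMENT PRIMARY KEY,
        debtor    VARCHAR(50),
        group_id  INT
    ) ENGINE=InnoDB;

    CREATE TABLE debtor_group (
        group_id INT PRIMARY KEY
    ) ENGINE=InnoDB;

    CREATE TABLE debtor_err (
        id    INT PRIMARY KEY,      -- the staging row's id, so step 3 can anti-join on it
        error VARCHAR(300)
    ) ENGINE=InnoDB;

    -- Staging table: everything as VARCHAR, plus an AUTO_INCREMENT id (MyISAM, as suggested).
    CREATE TABLE staging (
        id       INT AUTO_INCREMENT PRIMARY KEY,
        debtor   VARCHAR(255),
        group_id VARCHAR(255)
    ) ENGINE=MyISAM;

    -- Step 1: bulk-load the raw file (path and terminators are assumptions).
    LOAD DATA INFILE '/tmp/debtors.csv'
        INTO TABLE staging
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
        LINES TERMINATED BY '\n'
        (debtor, group_id);

    -- Step 2: copy "bad" rows to the error table in one pass.
    INSERT INTO debtor_err (id, error)
        SELECT s.id, CONCAT('bad row: ', LEFT(s.debtor, 200))
            FROM staging s
            WHERE CHAR_LENGTH(s.debtor) > 50                   -- length check
               OR NOT EXISTS ( SELECT 1 FROM debtor_group g    -- FK check
                                   WHERE g.group_id = s.group_id );

    -- Step 3: copy the remaining "good" rows, converting types where necessary.
    INSERT INTO debtor (debtor, group_id)
        SELECT s.debtor, CAST(s.group_id AS UNSIGNED)
            FROM staging s
            LEFT JOIN debtor_err e ON s.id = e.id
            WHERE e.id IS NULL;

    -- Step 4: reset the staging table for the next import.
    DELETE FROM staging;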