MySQL – Resolving LOAD DATA with Subquery Performance Issues

loadMySQLsubquery

I'm migrating data into a MySql (5.7.26) database (32GB ram), running as a managed service on AWS. While importing the data, I need to map one of the columns of the CSV being imported to another value using a MEMORY table lookup; so my LOAD DATA resembles the following:

LOAD DATA LOCAL INFILE 'file.csv'
INTO TABLE table_1(col_1, @var1)
SET col_2 = (select mapped_value from table_2 where id = @var1)

table_2 is a 2-column (id, mapped_value) MEMORY table with 3.4MM rows.

When I import the CSV without the subquery, I get several million inserts per minute. However, when I run the same import with the subquery the LOAD DATA performance degrades to near zero (~100 inserts per minute). Is this to be expected with a subquery, or is there something I'm doing wrong in the example above?

Best Answer

without the subquery ... several million inserts per minute ... with the subquery ... ~100 inserts per minute

Of course. Server executes the subquery per each separate imported record!


Specify datatypes for table_1.col2 and table2.id.

If table2.id can be stored into table_1.col2 then perform 2-step importing:

LOAD DATA LOCAL INFILE 'file.csv'
INTO TABLE table_1(col_1, col_2);
UPDATE table_1, table_2
SET table_1.col2 = table_2.mapped_value
WHERE table_2.id = table_1.col2;

If the types are not compatible then I'd recommend to import the data into temporary table and then copy the data into working table with needed substitution. Anycase it will more fast then one-step import with subquery.