I'm migrating data into a MySQL 5.7.26 database (32 GB RAM), running as a managed service on AWS. While importing the data, I need to map one of the columns of the CSV being imported to another value using a MEMORY table lookup, so my LOAD DATA resembles the following:
LOAD DATA LOCAL INFILE 'file.csv'
INTO TABLE table_1(col_1, @var1)
SET col_2 = (select mapped_value from table_2 where id = @var1)
table_2 is a 2-column (id, mapped_value) MEMORY table with 3.4 million rows.
When I import the CSV without the subquery, I get several million inserts per minute. However, when I run the same import with the subquery the LOAD DATA performance degrades to near zero (~100 inserts per minute). Is this to be expected with a subquery, or is there something I'm doing wrong in the example above?
Best Answer
Of course. The server executes the subquery once for each imported record!
Specify matching datatypes for table_1.col_2 and table_2.id.

If a table_2.id value can be stored directly into table_1.col_2, then perform a 2-step import: load the raw value first, then substitute the mapped value afterwards in a single set-based statement. If the types are not compatible, I'd recommend importing the data into a temporary table and then copying it into the working table with the needed substitution. Either way it will be much faster than a one-step import with the subquery.