I currently have a PHP script that inserts thousands to millions of rows into an InnoDB table. To prevent duplicates, the table has a UNIQUE index set up on the combination of four columns, and I use INSERT IGNORE to allow the insert to continue regardless of duplicates.
For performance reasons, rather than inserting one row at a time, I batch them up into 2000 rows per query. I want to know which individual rows are ignored due to a key violation. The only way I can think of to do that is to insert one row at a time and check the value of mysqli_affected_rows after each insert, but that feels inefficient and loses the advantage of batching the inserts. Alternatively, I could remove the UNIQUE index and retrospectively check for duplicates with some SQL at the end.
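For illustration, each batched statement looks roughly like this (the table name, column names, and values are made up; the real table has a UNIQUE index across four columns):

```sql
-- One batch of ~2000 rows; any row whose four-column key
-- (a, b, c, d) already exists is silently dropped by IGNORE.
INSERT IGNORE INTO target (a, b, c, d, payload) VALUES
    (1, 2, 3, 4, 'first'),
    (1, 2, 3, 5, 'second');
    -- ... up to 2000 rows per statement
```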
Any other suggestions?
Many thanks.
Best Answer
This can probably be solved with a more complex set of steps. You still get the benefit of the bulk `INSERT`.

1. Build a temp table (probably permanent, but `TRUNCATE` it between uses).
2. `INSERT` the 2K rows into it.
3. Analyze the rows via a `JOIN` between the temp table and the real table.
4. `INSERT INTO real ... SELECT ... JOIN ...` to put the "new" rows into the real table.
More details are in http://mysql.rjweb.org/doc.php/staging_table , especially the section on 'Normalization', which uses two queries in a slightly different way -- one to insert the 'new' rows, another to pull back the `id` for every row. Your app may also benefit from the "flip-flop" technique described there.
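As a rough illustration of that two-query idea, again with the placeholder names from above and assuming `target` has an AUTO_INCREMENT `id` column:

```sql
-- 1) Insert only the rows that are not already in the real table;
--    IGNORE skips any key that is already present.
INSERT IGNORE INTO target (a, b, c, d, payload)
SELECT a, b, c, d, payload FROM staging;

-- 2) Pull back the id for every row in the batch, new or pre-existing.
SELECT t.id, s.a, s.b, s.c, s.d
FROM   staging s
JOIN   target  t
       ON t.a = s.a AND t.b = s.b AND t.c = s.c AND t.d = s.d;
```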