MySQL – best practices / how to speed up "update, else insert" queries? Is insert + delete faster?

insert, MySQL, performance, update

I'm running an ETL process and streaming data into a MySQL table.

The data is being written over a web connection (a fairly fast one), so that could be a bottleneck.

Anyway, it's a basic insert/update function: a list of IDs as the primary key/index, and then a few attributes.

If a new ID is found, insert; otherwise, update. You get the idea.

Currently, an "update, else insert" function based on the (indexed) ID processes 13 rows/second, which seems pretty abysmal, right? For context, this is comparing 1000 incoming rows against a table of 250k records.

For comparison, a "pure" insert-everything approach already speeds the process up to 26 rows/second.

The thing with the pure "insert" approach is that I can have 20 parallel connections inserting at once (20 is the maximum allowed by the web host), whereas the "update" function cannot run in parallel at all.

Thus 26 x 20 = 520 rows/second, far more than 13 rows/second, especially if I can rig something up that pushes even more data through in parallel.

My question is: given the massive benefit of inserting vs. updating, is there a way to duplicate the "update" functionality (I only want the most recent insert of a given ID to survive) by doing a massive insert and then running a delete afterwards that removes the duplicate IDs that aren't the newest?

Is this something easy to implement, or something that comes up often?

What else can I do to make this update process faster? I know getting rid of the web connection between the ETL tool and the DB is a start, but what else? This seems like it would be a fairly common problem.

Ultimately there are 20 columns, probably varchar(50) at most. Should I be getting a lot more than 13 rows processed per second?
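For illustration only, the "insert everything, then delete all but the newest" idea described above might be sketched as follows. This is a minimal sketch, not a tested MySQL setup: it uses Python's built-in sqlite3 module as a stand-in database, and the table `items_log`, the arrival-order column `seq`, and the sample rows are all hypothetical. It also assumes the ID column is no longer a unique key (otherwise the duplicate inserts would fail) and that "newest" simply means "inserted last".

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumption: `id` is NOT unique in this table; `seq` records arrival order.
conn.execute("""
    CREATE TABLE items_log (
        seq  INTEGER PRIMARY KEY AUTOINCREMENT,
        id   INTEGER NOT NULL,
        attr TEXT
    )
""")
conn.execute("CREATE INDEX idx_items_log_id ON items_log (id)")

# Append-only load: duplicates of an ID are allowed while streaming.
conn.executemany(
    "INSERT INTO items_log (id, attr) VALUES (?, ?)",
    [(1, "a-v1"), (2, "b-v1"), (1, "a-v2"), (3, "c-v1"), (1, "a-v3")],
)

# Cleanup pass: keep only the highest `seq` per ID, delete everything else.
# The inner derived table avoids selecting directly from the table being
# deleted from inside the subquery.
conn.execute("""
    DELETE FROM items_log
    WHERE seq NOT IN (
        SELECT newest FROM (
            SELECT MAX(seq) AS newest FROM items_log GROUP BY id
        ) AS keep
    )
""")
conn.commit()

survivors = conn.execute(
    "SELECT id, attr FROM items_log ORDER BY id"
).fetchall()
print(survivors)
```

With several parallel writers, the order in which rows receive their `seq` values may not match the order of the source data, so this only works if "last inserted wins" is really the intended rule.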

Best Answer

  1. You may use REPLACE INTO. Its disadvantage is that it creates high IO, as each existing record is deleted and then inserted (as opposed to being updated in place).
  2. Try loading the new rows' IDs into a separate table on the destination server, then run a delete on the destination that joins this new table with the existing table on the ID. After that, run your ETL with INSERTs only.
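
The two options above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the answerer's exact setup: it runs against an in-memory SQLite database through Python's built-in sqlite3 module as a stand-in for MySQL, and the table names `items` and `incoming_ids`, the column `attr`, and the sample rows are hypothetical. It also assumes each ID appears at most once per incoming batch.

```python
import sqlite3


def make_db():
    """Create a small table keyed by ID (stand-in for the destination table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, attr TEXT)")
    conn.executemany(
        "INSERT INTO items (id, attr) VALUES (?, ?)",
        [(1, "old-1"), (2, "old-2"), (3, "old-3")],
    )
    conn.commit()
    return conn


def replace_rows(conn, rows):
    """Option 1: REPLACE INTO removes any row with the same key, then inserts."""
    conn.executemany("REPLACE INTO items (id, attr) VALUES (?, ?)", rows)
    conn.commit()


def delete_then_insert(conn, rows):
    """Option 2: stage the incoming IDs, delete the matches, then plain INSERT."""
    conn.execute("CREATE TEMP TABLE incoming_ids (id INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO incoming_ids (id) VALUES (?)",
        [(row_id,) for row_id, _ in rows],
    )
    # One set-based delete instead of one existence check per incoming row.
    conn.execute("DELETE FROM items WHERE id IN (SELECT id FROM incoming_ids)")
    # Every incoming ID is now absent, so the inserts cannot hit a key clash.
    conn.executemany("INSERT INTO items (id, attr) VALUES (?, ?)", rows)
    conn.execute("DROP TABLE incoming_ids")
    conn.commit()


def snapshot(conn):
    """Return all rows ordered by ID for easy comparison."""
    return conn.execute("SELECT id, attr FROM items ORDER BY id").fetchall()


batch = [(2, "new-2"), (4, "new-4")]  # one existing ID, one new ID
expected = [(1, "old-1"), (2, "new-2"), (3, "old-3"), (4, "new-4")]

conn_a = make_db()
replace_rows(conn_a, batch)

conn_b = make_db()
delete_then_insert(conn_b, batch)

print(snapshot(conn_a) == expected, snapshot(conn_b) == expected)
```

Both routes end with the same table contents. On MySQL the delete in option 2 could also be written in its multi-table form (`DELETE items FROM items JOIN incoming_ids ON items.id = incoming_ids.id`), and once the delete has run, the insert-only load is the part that can be spread across parallel connections.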