MySQL – Improving performance of insert..on duplicate key update

innodb, mariadb, MySQL

You know the feeling when you think you've had the best idea but then realise you might not know what you're talking about? This is one of those moments, and I'm hoping my idea can be confirmed or refuted.

I have a database table that saves activity in the form of "date, sender, recipient, action count", that counts how many interactions one player of the game (which you just lost :p) gives to another on a given day.

Right now, it uses insert..on duplicate key update (with the unique key being (date, sender, recipient)), and while it works, I have noticed that it causes deadlock errors during periods of intense activity.
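Roughly speaking, the table and statement look like this (names simplified for illustration):

```sql
-- Simplified version of the activity table (names are illustrative)
CREATE TABLE activity (
    date         DATE         NOT NULL,
    sender       INT UNSIGNED NOT NULL,
    recipient    INT UNSIGNED NOT NULL,
    action_count INT UNSIGNED NOT NULL DEFAULT 0,
    PRIMARY KEY (date, sender, recipient)
) ENGINE=InnoDB;

-- Current approach: insert a new row, or bump the counter when the
-- (date, sender, recipient) key already exists
INSERT INTO activity (date, sender, recipient, action_count)
VALUES (CURDATE(), 42, 99, 1)
ON DUPLICATE KEY UPDATE action_count = action_count + 1;
```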

I have fixed deadlock errors in other parts of the game by avoiding "insert..select from", "insert ignore" and "insert..on duplicate key update", and this appears to be one of the few such statements left, if not the only one.

My new idea is as follows:

  • Attempt update
  • If the number of affected rows is zero…
    • Perform insert

Essentially it's become "update..on not found insert", done manually to try to avoid locking rows.
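In rough SQL terms, using the same illustrative names as above:

```sql
-- Step 1: try to bump an existing counter row
UPDATE activity
   SET action_count = action_count + 1
 WHERE date = CURDATE()
   AND sender = 42
   AND recipient = 99;

-- Step 2: only when step 1 affected zero rows, insert a fresh row.
-- In practice the affected-row count comes back through the driver;
-- ROW_COUNT() is used here only to keep the sketch in pure SQL.
INSERT INTO activity (date, sender, recipient, action_count)
SELECT CURDATE(), 42, 99, 1 FROM DUAL
 WHERE ROW_COUNT() = 0;
```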

Caveats I've thought of:

  • Race conditions are not an issue because the client-side code already batches these operations, so a given user should only be sending one request of this type at a time. If the client bypasses this code (never trust the client!), the worst case is that their displayed numbers will be lower than expected.

  • This may execute two queries instead of one, but I believe an update matching zero rows is basically a no-op.

That's about it really. Am I missing anything here or should this solve my issues?

Best Answer

Any approach will lock something. Even your SELECT. You cannot avoid it. The best approach is to minimize the locks.

Normalization

Let's look at the bigger picture. If the purpose of this is "normalization", then perform the upsert outside the main transaction. This prevents many possible deadlocks and speeds up the processing.

If you are doing "batch normalization", see my tips here. Those two statements (not unlike your two-step approach) are designed to efficiently copy multiple items into a normalization table without "burning" AUTO_INCREMENT ids. Again, run them with autocommit=1, not inside some bigger transaction.
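The pattern in question looks roughly like this (a sketch only; `staging`, `hosts`, and their columns are placeholder names):

```sql
-- 1. Insert only genuinely new names; the LEFT JOIN filter keeps
--    existing ones out, so no AUTO_INCREMENT ids are "burned"
INSERT INTO hosts (host_name)
SELECT DISTINCT s.host_name
  FROM staging AS s
  LEFT JOIN hosts AS h ON h.host_name = s.host_name
 WHERE h.host_id IS NULL;

-- 2. Copy the assigned ids back into the staging rows for later use
UPDATE staging AS s
  JOIN hosts AS h ON h.host_name = s.host_name
   SET s.host_id = h.host_id;
```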

If the main transaction eventually rolls back, then the worst that could happen is that you inserted something unused in the normalization table. This is not harmful.

INSERTing 100 rows in a single statement is typically about 10 times as fast as running 100 one-row INSERTs. That speedup alone helps avoid deadlocks.
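For example, with the illustrative table from the question:

```sql
-- One statement, one round trip, far less per-row overhead
-- than issuing 100 separate INSERTs
INSERT INTO activity (date, sender, recipient, action_count)
VALUES ('2023-05-01', 42, 99, 1),
       ('2023-05-01', 42, 77, 3),
       ('2023-05-01', 13, 99, 2);  -- ...and so on, up to ~100 rows
```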

Regardless of what prevention measures you take, deadlocks will occur. Write your code to re-run the entire transaction, as this is usually the 'right fix'.

Likes, upvotes, etc

If the purpose of the UPDATE is a high-volume 'increment', then it may be best to move the counter out of the main table into its own table (together with a minimal PK). That way, "regular" processing is not held up by "Likes".
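A sketch of that split, with hypothetical names:

```sql
-- Narrow table holding only the hot counter
CREATE TABLE post_likes (
    post_id INT UNSIGNED NOT NULL,
    likes   INT UNSIGNED NOT NULL DEFAULT 0,
    PRIMARY KEY (post_id)
) ENGINE=InnoDB;

-- High-volume increments lock only this tiny row, so they do not
-- block "regular" processing of the main table
INSERT INTO post_likes (post_id, likes)
VALUES (12345, 1)
ON DUPLICATE KEY UPDATE likes = likes + 1;
```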