Mysql – Real-time deduplication strategy for relational DB

concurrencydeadlockmariadbMySQLstored-procedures

I have a messages table with some fields like id, version, timestamp

I'm trying to insert and retrieve messages with real-time deduplication logic, implemented in stored procedures.

  • On insert, if a message with given id already exists in the table, I invoke a complex deduplication logic (a number of IF/ELSE statements) and then update messages's version, timestamp, etc. Otherwise I insert a new row.
  • On retrieve, I search for the oldest message and not just select it, but also update a few fields (changing state, setting transient_token, etc)

The problem is, it does not work as fast as I need: ~ 200 rps on insert and the same on retrieve

Separately my stored procedures work good, but when I execute both of them concurrently, they start fighting for locks under load > 30 rps, performance dramatically decreases and deadlocks appear periodically

Are there some best practices to solve my problem? I feel that the whole approach sucks, not my concrete implementation.

May be I should use some temp buffer table for all incoming messages and perform dedup logic afterwords? I would appreciate any ideas.

* I can't use nosql DB in my case (management decision), I need to gain max performance with Mysql/MariaDb

Best Answer

Good news. There is a SQL command usually called MERGE or UPSERT that basically does this.

See this page for more information: https://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html

You probably want something like:

INSERT INTO messasges (id,version,timestamp) VALUES (1123,1,1/1/2017)
  ON DUPLICATE KEY UPDATE version=2;