MySQL ON DUPLICATE KEY UPDATE monitoring

Tags: monitoring, mysql, performance

How do I monitor the performance of the following SQL in-depth?

INSERT INTO rating (account_id, product_id, rating)
    VALUES (133236, 107, 0), (133236, 513, 1), (133236, 575, 2), (133236, 677, 3)
    ON DUPLICATE KEY UPDATE rating=VALUES(rating)

EXPLAIN is next to useless for this statement.
The slow query log is turned on, but I don't expect this statement to be slow enough to show up there (InnoDB row-locking should keep contention low).

But I have no way to be sure. I'm interested in the time taken to complete, but also in any nasty bugs or side-effects I may not have considered. Is it possible this could make the entire table unavailable?

It is wrapped in a transaction which is rolled back if the query fails (in application code). I'm really looking for a deep dive around this simple code.

Background

So I'm currently tinkering with a greenfield toy application, and because I am mildly lazy, I want to use the following.

In my application, all the values are bound parameter values, rather than hard-coded like in the above example. I've got a way to generate as many placeholders as needed and then send in lists.
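The placeholder generation described above can be sketched like this (a minimal sketch assuming MySQL-style `%s` placeholders, as used by drivers such as mysqlclient or PyMySQL; `build_upsert` is a hypothetical helper name):

```python
# Generate one placeholder group per row and flatten the parameters,
# producing a multi-row INSERT ... ON DUPLICATE KEY UPDATE statement.
def build_upsert(rows):
    """rows: list of (account_id, product_id, rating) tuples."""
    placeholders = ", ".join(["(%s, %s, %s)"] * len(rows))
    sql = (
        "INSERT INTO rating (account_id, product_id, rating) "
        f"VALUES {placeholders} "
        "ON DUPLICATE KEY UPDATE rating = VALUES(rating)"
    )
    params = [v for row in rows for v in row]  # flatten for the driver
    return sql, params

sql, params = build_upsert([(133236, 107, 0), (133236, 513, 1)])
```

The driver then sends `sql` with `params` as a single bound-parameter statement, however many rows are in the batch.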

How do I instrument monitoring/performance for that query, and for others where I simplify my application's database interaction in this way?

My goal is to reduce unnecessary querying by deferring to the DBMS as the engine of state. But I'm not assuming this is zero-cost or magic, and I'd love to know where I can look to monitor such decisions, so that I can feed the results back into my own experiments.

The most viable alternatives seem to be

  • attempt to insert new records, then attempt to update existing records
  • delete all records for each account and insert new ones
  • pre-insert all account/product combinations, ignoring collisions, then issue an update for all of them.

Rows will never be deleted from the ratings table (I didn't pick the column or table names, BTW). It's like an ever-growing hash map keyed on account_id and product_id, storing a canonical "latest rating" for each account/product combination.

I did consider doing one record at a time, but that just seems inefficient.

My main concern with two separate insert and update statements is that accounting for concurrent changes to the table becomes a problem, one that I think the single upsert sidesteps.

I also have my application code wrapped in a try/catch block, which begins a transaction and attempts to roll back on error, checking that the connection is still viable before issuing the rollback.
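That wrapper looks roughly like this (a sketch assuming a DB-API-style connection object; `run_in_transaction` and the `open` attribute are hypothetical names, so adapt to your driver):

```python
# Run one statement inside an explicit transaction, rolling back on
# error only if the connection still looks usable.
def run_in_transaction(conn, sql, params):
    cur = conn.cursor()
    try:
        cur.execute("BEGIN")
        cur.execute(sql, params)
        conn.commit()
        return True
    except Exception:
        # Only attempt the rollback if the connection still looks viable;
        # on a lost connection the server rolls back for us anyway.
        if getattr(conn, "open", False):
            conn.rollback()
        return False
```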

One other oddity I am currently working around: if a customer ordered product-A on 4 occasions, and I were to allow 4 input values for that single product, I don't know what would happen. For now I'm restricting updates to a single order per product for my own sanity.
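That restriction can also be enforced client-side by collapsing duplicate (account_id, product_id) pairs before sending the batch, keeping the last rating seen. A sketch (`dedupe_ratings` is a hypothetical helper; it relies on Python dicts preserving insertion order):

```python
# Collapse duplicate (account_id, product_id) pairs, keeping the last
# rating seen, so the upsert never carries two values for one key.
def dedupe_ratings(rows):
    """rows: iterable of (account_id, product_id, rating); later rows win."""
    latest = {}
    for account_id, product_id, rating in rows:
        latest[(account_id, product_id)] = rating
    return [(a, p, r) for (a, p), r in latest.items()]

rows = dedupe_ratings([(42, 107, 0), (42, 107, 3), (42, 513, 1)])
# rows is now [(42, 107, 3), (42, 513, 1)]
```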

Notes

Handler method linked by @RickJames

FLUSH STATUS;  -- zero out most of SESSION STATUS (non existent on really old versions of MySQL)
INSERT INTO rating (account_id, product_id, rating)
    VALUES (42, 107, 0), (42, 513, 1), (42, 575, 2), (42, 677, 3)
    ON DUPLICATE KEY UPDATE rating=VALUES(rating);
SHOW SESSION STATUS LIKE 'Handler%';

This produces the following in a table where all the records already exist (maximum collisions)

# Variable_name, Value
Handler_commit, 1
Handler_delete, 0
Handler_discover, 0
Handler_external_lock, 0
Handler_icp_attempts, 0
Handler_icp_match, 0
Handler_mrr_init, 0
Handler_mrr_key_refills, 0
Handler_mrr_rowid_refills, 0
Handler_prepare, 0
Handler_read_first, 0
Handler_read_key, 4
Handler_read_last, 0
Handler_read_next, 0
Handler_read_prev, 0
Handler_read_retry, 0
Handler_read_rnd, 0
Handler_read_rnd_deleted, 0
Handler_read_rnd_next, 0
Handler_rollback, 0
Handler_savepoint, 0
Handler_savepoint_rollback, 0
Handler_tmp_delete, 0
Handler_tmp_update, 0
Handler_tmp_write, 0
Handler_update, 0
Handler_write, 4

27 Handlers

I then deleted the rows and ran the insert without the ON DUPLICATE KEY UPDATE clause

# Variable_name, Value
Handler_commit, 1
Handler_delete, 0
Handler_discover, 0
Handler_external_lock, 0
Handler_icp_attempts, 0
Handler_icp_match, 0
Handler_mrr_init, 0
Handler_mrr_key_refills, 0
Handler_mrr_rowid_refills, 0
Handler_prepare, 0
Handler_read_first, 0
Handler_read_key, 0
Handler_read_last, 0
Handler_read_next, 0
Handler_read_prev, 0
Handler_read_retry, 0
Handler_read_rnd, 0
Handler_read_rnd_deleted, 0
Handler_read_rnd_next, 0
Handler_rollback, 0
Handler_savepoint, 0
Handler_savepoint_rollback, 0
Handler_tmp_delete, 0
Handler_tmp_update, 0
Handler_tmp_write, 0
Handler_update, 0
Handler_write, 4

27 Handlers

Next I tried clearing the session status, DELETEing all the rows, then doing a plain insert…

FLUSH STATUS;  -- zero out most of SESSION STATUS (non existent on really old versions of MySQL)
DELETE FROM rating WHERE account_id = 42 AND product_id IN(107, 513, 575, 677);
INSERT INTO rating (account_id, product_id, rating)
    VALUES (42, 107, 0), (42, 513, 1), (42, 575, 2), (42, 677, 3);
SHOW SESSION STATUS LIKE 'Handler%';

The results were as follows

# Variable_name, Value
Handler_commit, 2
Handler_delete, 4
Handler_discover, 0
Handler_external_lock, 0
Handler_icp_attempts, 0
Handler_icp_match, 0
Handler_mrr_init, 0
Handler_mrr_key_refills, 0
Handler_mrr_rowid_refills, 0
Handler_prepare, 0
Handler_read_first, 0
Handler_read_key, 4
Handler_read_last, 0
Handler_read_next, 4
Handler_read_prev, 0
Handler_read_retry, 0
Handler_read_rnd, 0
Handler_read_rnd_deleted, 0
Handler_read_rnd_next, 0
Handler_rollback, 0
Handler_savepoint, 0
Handler_savepoint_rollback, 0
Handler_tmp_delete, 0
Handler_tmp_update, 0
Handler_tmp_write, 0
Handler_update, 0
Handler_write, 4

27 handlers…

Then I ran the plain insert again, knowing that without the ON DUPLICATE clause it would fail on the now-existing rows.

# Variable_name, Value
Handler_commit, 0
Handler_delete, 0
Handler_discover, 0
Handler_external_lock, 0
Handler_icp_attempts, 0
Handler_icp_match, 0
Handler_mrr_init, 0
Handler_mrr_key_refills, 0
Handler_mrr_rowid_refills, 0
Handler_prepare, 0
Handler_read_first, 0
Handler_read_key, 0
Handler_read_last, 0
Handler_read_next, 0
Handler_read_prev, 0
Handler_read_retry, 0
Handler_read_rnd, 0
Handler_read_rnd_deleted, 0
Handler_read_rnd_next, 0
Handler_rollback, 1
Handler_savepoint, 0
Handler_savepoint_rollback, 0
Handler_tmp_delete, 0
Handler_tmp_update, 0
Handler_tmp_write, 0
Handler_update, 0
Handler_write, 1

27 handlers…

Here is an interesting-ish case: updates only, which requires all the rows to already exist. It's an edge case, similar to inserting records where none exist.

FLUSH STATUS;  -- zero out most of SESSION STATUS (non existent on really old versions of MySQL)
UPDATE rating
    SET rating = (case when product_id = 107 then 3
                        when product_id = 513 then 2
                        when product_id = 575 then 1
                        when product_id = 677 then 0
                    end)
    WHERE account_id = 42;
SHOW SESSION STATUS LIKE 'Handler%';

This is interesting because the CASE expression would need to be generated by the application, much like the VALUES lists, and I've not tested this from application code.
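Generating that CASE expression with bound parameters could look like this (a sketch assuming `%s`-style placeholders; `build_case_update` is a hypothetical helper name):

```python
# Build an UPDATE ... SET rating = (CASE ...) statement with one
# WHEN/THEN pair per product, mirroring the hand-written query above.
def build_case_update(account_id, product_ratings):
    """product_ratings: list of (product_id, rating) pairs."""
    whens = " ".join(["WHEN product_id = %s THEN %s"] * len(product_ratings))
    sql = (
        "UPDATE rating SET rating = (CASE "
        + whens
        + " END) WHERE account_id = %s"
    )
    params = [v for pair in product_ratings for v in pair] + [account_id]
    return sql, params

sql, params = build_case_update(42, [(107, 3), (513, 2)])
```

Note that, unlike IODKU, this only works when every targeted row already exists; products outside the CASE would have their rating set to NULL unless the WHERE clause also lists the product ids.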

# Variable_name, Value
Handler_commit, 1
Handler_delete, 0
Handler_discover, 0
Handler_external_lock, 0
Handler_icp_attempts, 0
Handler_icp_match, 0
Handler_mrr_init, 0
Handler_mrr_key_refills, 0
Handler_mrr_rowid_refills, 0
Handler_prepare, 0
Handler_read_first, 0
Handler_read_key, 1
Handler_read_last, 0
Handler_read_next, 4
Handler_read_prev, 0
Handler_read_retry, 0
Handler_read_rnd, 0
Handler_read_rnd_deleted, 0
Handler_read_rnd_next, 0
Handler_rollback, 0
Handler_savepoint, 0
Handler_savepoint_rollback, 0
Handler_tmp_delete, 0
Handler_tmp_update, 0
Handler_tmp_write, 0
Handler_update, 4
Handler_write, 0

At this point I began to think the right metric is the sum of the non-zero counter values, not the number of handler rows (that is always 27).

9 | insert with on duplicate update
5 | insert (ideal case only insert needed)
2 | failed insert
18 | delete then insert
10 | bulk single-query updates

So, judging by the handler counts, upserts are half as costly as a single request with two queries that deletes then inserts (impressive).
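The totals above can be reproduced by summing the non-zero Handler% counters from each experiment (a quick sanity check; the dict literals below just transcribe the non-zero rows from the tables above):

```python
# Sum the non-zero Handler% counters for each experiment above;
# the totals should match the table: 9, 5, 2, 18, 10.
runs = {
    "iodku":         {"Handler_commit": 1, "Handler_read_key": 4,
                      "Handler_write": 4},
    "plain_insert":  {"Handler_commit": 1, "Handler_write": 4},
    "failed_insert": {"Handler_rollback": 1, "Handler_write": 1},
    "delete_insert": {"Handler_commit": 2, "Handler_delete": 4,
                      "Handler_read_key": 4, "Handler_read_next": 4,
                      "Handler_write": 4},
    "bulk_update":   {"Handler_commit": 1, "Handler_read_key": 1,
                      "Handler_read_next": 4, "Handler_update": 4},
}
totals = {name: sum(counters.values()) for name, counters in runs.items()}
```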

With 24 rows

49  |   2n+1 INSERT ON DUPLICATE KEY UPDATE (single field)
49  |   UPDATE (requires all rows to exist)
75  |   DELETE, then insert
25  |   INSERT (requires no records)

Best Answer

Short answer: IODKU (as you have it) does it all in one efficient SQL statement; use it.

Long answer:

Please tighten up the statement of the problem. As I read it...

There is a list of 'things', uniquely identified by account_id + product_id, each with a value, rating.

Choices:

  • IODKU (aka upsert, essentially a blend of INSERT and UPDATE), like your example, is optimal for updating the rating for an existing (account, product) pair or adding a new tuple. That one statement does either the INSERT or the UPDATE as needed. This seems to be what you need; was there a problem with it? IODKU does require, in your case, that the pair (account_id, product_id) be the PRIMARY KEY or a UNIQUE key; that way it can correctly decide between UPDATE (if the row exists) and INSERT (if it is new). (This should address "to allow 4 input values for the single product; I don't know what would happen". Namely, it depends on what PRIMARY or UNIQUE key you have.)
  • REPLACE is especially bad since it is two somewhat independent steps: DELETE and INSERT.
  • SELECT, then INSERT/UPDATE -- several lines of client code; IODKU blends the two together. Internally, INSERT and UPDATE do something like SELECT to find where the row is, or should be, on the disk.
  • DELETE first -- that's akin to REPLACE

There is no "multi-row UPDATE". However, IODKU provides that functionality and it is not too kludgy.

In InnoDB, note that IODKU, as with many multi-row statements, is all-or-none (even without BEGIN and COMMIT around it). If for some reason you need it to do some of the rows but avoid locking on others, there is no option for that. The IGNORE option lets INSERT skip rows that are already in the table (according to the PK or a unique key).

Correction to the previous paragraph. MySQL 8.0 and MariaDB have SELECT ... NOWAIT and SKIP LOCKED. I am not fluent in these options; they might be useful to you.

In my opinion, the best tool for peering into the inner workings is the "Handler%" values in SHOW SESSION STATUS. They are a bit tricky to get. Here is a discussion of such: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#handler_counts They count the actual number of reads/writes (whether cached or not). They are useful for comparing two techniques -- fewer reads/writes usually means faster.

Statements have overhead. So fewer statements usually means faster code overall. A specific metric: Using multi-row INSERTs runs at a speed approaching 10 times as fast as one row per INSERT.
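The batching point can be illustrated by chunking rows into multi-row statements instead of issuing one statement per row (a sketch; the chunk size of 100 and the `chunk` helper are illustrative, not prescribed):

```python
# Chunk rows into multi-row INSERT ... ON DUPLICATE KEY UPDATE
# statements, 100 rows per statement, instead of one statement per row.
def chunk(rows, size=100):
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

rows = [(42, pid, 0) for pid in range(250)]
statements = [
    "INSERT INTO rating (account_id, product_id, rating) VALUES "
    + ", ".join("(%s, %s, %s)" for _ in batch)
    + " ON DUPLICATE KEY UPDATE rating = VALUES(rating)"
    for batch in chunk(rows)
]
# 250 rows -> 3 statements (100 + 100 + 50) instead of 250
```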

"checking if the connection is viable before issuing rollback" -- If the connection is lost, a ROLLBACK is done for you. A spurious ROLLBACK is harmless. Do not enable "auto-reconnect", it defeats the auto-rollback-on-disconnect. Do check for errors (or try-catch) after each SQL.