MySQL ON DUPLICATE KEY UPDATE monitoring

Tags: monitoring, mysql, performance

How do I monitor the performance of the following SQL in-depth?

INSERT INTO rating (account_id, product_id, rating)
    VALUES (133236, 107, 0), (133236, 513, 1), (133236, 575, 2), (133236, 677, 3)
    ON DUPLICATE KEY UPDATE rating=VALUES(rating)

EXPLAIN is next to useless for this statement.
The slow query log is turned on, but I don't expect this statement to be slow enough to show up there (InnoDB row-locking should keep contention low).

But I have no way to be sure. I'm interested in the time taken to complete, but also in any nasty bugs or side-effects I may not have considered. Is it possible this could make the entire table unavailable?

It is wrapped in a transaction which is rolled back if the query fails (in application code). I'm really looking for a deep dive around this simple code.

Background

So I'm currently tinkering with a greenfield toy application, and because I am mildly lazy, I want to use the following.

In my application, all the values are bound parameter values, rather than hard-coded like in the above example. I've got a way to generate as many placeholders as needed and then send in lists.
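The placeholder generation described above can be sketched like this (a minimal sketch assuming MySQL-style `%s` placeholders, as used by drivers such as mysqlclient or PyMySQL; `build_upsert` is a hypothetical helper name):

```python
# Generate one placeholder group per row and flatten the parameters,
# producing a multi-row INSERT ... ON DUPLICATE KEY UPDATE statement.
def build_upsert(rows):
    """rows: list of (account_id, product_id, rating) tuples."""
    placeholders = ", ".join(["(%s, %s, %s)"] * len(rows))
    sql = (
        "INSERT INTO rating (account_id, product_id, rating) "
        f"VALUES {placeholders} "
        "ON DUPLICATE KEY UPDATE rating = VALUES(rating)"
    )
    params = [v for row in rows for v in row]  # flatten for the driver
    return sql, params

sql, params = build_upsert([(133236, 107, 0), (133236, 513, 1)])
```

The driver then sends `sql` with `params` as a single bound-parameter statement, however many rows are in the batch.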

How do I instrument monitoring/performance for that query, and for others where I simplify my application's database interaction in this way?

My goal is to reduce unnecessary querying by deferring to the DBMS as the engine of state. But I'm not assuming this is zero-cost or magic, and I'd love to know where I can look to monitor such decisions, so that I can feed the results back into my own experiments.

The most viable alternatives seem to be

  • attempt to insert new records, then attempt to update existing records
  • delete all records for each account and insert new ones
  • pre-insert all account/product combinations, ignoring collisions, then issue an update for all of them.

Rows will never be deleted from the ratings table (I didn't pick the column or table names, BTW). It's like an ever-growing hash map keyed on account_id and product_id, storing a canonical "latest rating" for each account/product combination.

I did consider doing one record at a time, but that just seems inefficient.

My main concern with two separate insert and update statements is that accounting for concurrent changes to the table becomes a problem, one that I think the single upsert sidesteps.

I also have my application code wrapped in a try/catch block, which begins a transaction and attempts to roll back on error, checking that the connection is still viable before issuing the rollback.
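That wrapper looks roughly like this (a sketch assuming a DB-API-style connection object; `run_in_transaction` and the `open` attribute are hypothetical names, so adapt to your driver):

```python
# Run one statement inside an explicit transaction, rolling back on
# error only if the connection still looks usable.
def run_in_transaction(conn, sql, params):
    cur = conn.cursor()
    try:
        cur.execute("BEGIN")
        cur.execute(sql, params)
        conn.commit()
        return True
    except Exception:
        # Only attempt the rollback if the connection still looks viable;
        # on a lost connection the server rolls back for us anyway.
        if getattr(conn, "open", False):
            conn.rollback()
        return False
```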

One other oddity I am currently working around: if a customer ordered product-A on 4 occasions, and I were to allow 4 input values for that single product, I don't know what would happen. For now I'm restricting updates to a single order per product for my own sanity.
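That restriction can also be enforced client-side by collapsing duplicate (account_id, product_id) pairs before sending the batch, keeping the last rating seen. A sketch (`dedupe_ratings` is a hypothetical helper; it relies on Python dicts preserving insertion order):

```python
# Collapse duplicate (account_id, product_id) pairs, keeping the last
# rating seen, so the upsert never carries two values for one key.
def dedupe_ratings(rows):
    """rows: iterable of (account_id, product_id, rating); later rows win."""
    latest = {}
    for account_id, product_id, rating in rows:
        latest[(account_id, product_id)] = rating
    return [(a, p, r) for (a, p), r in latest.items()]

rows = dedupe_ratings([(42, 107, 0), (42, 107, 3), (42, 513, 1)])
# rows is now [(42, 107, 3), (42, 513, 1)]
```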

Notes

Handler method linked by @RickJames

FLUSH STATUS;  -- zero out most of SESSION STATUS (non existent on really old versions of MySQL)
INSERT INTO rating (account_id, product_id, rating)
    VALUES (42, 107, 0), (42, 513, 1), (42, 575, 2), (42, 677, 3)
    ON DUPLICATE KEY UPDATE rating=VALUES(rating);
SHOW SESSION STATUS LIKE 'Handler%';

This produces the following in a table where all the records already exist (maximum collisions)

# Variable_name, Value
Handler_commit, 1
Handler_delete, 0
Handler_discover, 0
Handler_external_lock, 0
Handler_icp_attempts, 0
Handler_icp_match, 0
Handler_mrr_init, 0
Handler_mrr_key_refills, 0
Handler_mrr_rowid_refills, 0
Handler_prepare, 0
Handler_read_first, 0
Handler_read_key, 4
Handler_read_last, 0
Handler_read_next, 0
Handler_read_prev, 0
Handler_read_retry, 0
Handler_read_rnd, 0
Handler_read_rnd_deleted, 0
Handler_read_rnd_next, 0
Handler_rollback, 0
Handler_savepoint, 0
Handler_savepoint_rollback, 0
Handler_tmp_delete, 0
Handler_tmp_update, 0
Handler_tmp_write, 0
Handler_update, 0
Handler_write, 4

27 Handlers

I then deleted the rows and ran the insert without the ON DUPLICATE KEY UPDATE clause

# Variable_name, Value
Handler_commit, 1
Handler_delete, 0
Handler_discover, 0
Handler_external_lock, 0
Handler_icp_attempts, 0
Handler_icp_match, 0
Handler_mrr_init, 0
Handler_mrr_key_refills, 0
Handler_mrr_rowid_refills, 0
Handler_prepare, 0
Handler_read_first, 0
Handler_read_key, 0
Handler_read_last, 0
Handler_read_next, 0
Handler_read_prev, 0
Handler_read_retry, 0
Handler_read_rnd, 0
Handler_read_rnd_deleted, 0
Handler_read_rnd_next, 0
Handler_rollback, 0
Handler_savepoint, 0
Handler_savepoint_rollback, 0
Handler_tmp_delete, 0
Handler_tmp_update, 0
Handler_tmp_write, 0
Handler_update, 0
Handler_write, 4

27 Handlers

Next I tried clearing the session status, DELETEing all the rows, then doing a plain insert…

FLUSH STATUS;  -- zero out most of SESSION STATUS (non existent on really old versions of MySQL)
DELETE FROM rating WHERE account_id = 42 AND product_id IN(107, 513, 575, 677);
INSERT INTO rating (account_id, product_id, rating)
    VALUES (42, 107, 0), (42, 513, 1), (42, 575, 2), (42, 677, 3);
SHOW SESSION STATUS LIKE 'Handler%';

The results were as follows

# Variable_name, Value
Handler_commit, 2
Handler_delete, 4
Handler_discover, 0
Handler_external_lock, 0
Handler_icp_attempts, 0
Handler_icp_match, 0
Handler_mrr_init, 0
Handler_mrr_key_refills, 0
Handler_mrr_rowid_refills, 0
Handler_prepare, 0
Handler_read_first, 0
Handler_read_key, 4
Handler_read_last, 0
Handler_read_next, 4
Handler_read_prev, 0
Handler_read_retry, 0
Handler_read_rnd, 0
Handler_read_rnd_deleted, 0
Handler_read_rnd_next, 0
Handler_rollback, 0
Handler_savepoint, 0
Handler_savepoint_rollback, 0
Handler_tmp_delete, 0
Handler_tmp_update, 0
Handler_tmp_write, 0
Handler_update, 0
Handler_write, 4

27 handlers…

Then I ran the plain insert again, knowing that without the ON DUPLICATE clause it would fail on the now-existing rows.

# Variable_name, Value
Handler_commit, 0
Handler_delete, 0
Handler_discover, 0
Handler_external_lock, 0
Handler_icp_attempts, 0
Handler_icp_match, 0
Handler_mrr_init, 0
Handler_mrr_key_refills, 0
Handler_mrr_rowid_refills, 0
Handler_prepare, 0
Handler_read_first, 0
Handler_read_key, 0
Handler_read_last, 0
Handler_read_next, 0
Handler_read_prev, 0
Handler_read_retry, 0
Handler_read_rnd, 0
Handler_read_rnd_deleted, 0
Handler_read_rnd_next, 0
Handler_rollback, 1
Handler_savepoint, 0
Handler_savepoint_rollback, 0
Handler_tmp_delete, 0
Handler_tmp_update, 0
Handler_tmp_write, 0
Handler_update, 0
Handler_write, 1

27 handlers…

Here is an interesting-ish case: updates only, which requires all the rows to already exist. It's an edge case, similar to inserting records where none exist.

FLUSH STATUS;  -- zero out most of SESSION STATUS (non existent on really old versions of MySQL)
UPDATE rating
    SET rating = (case when product_id = 107 then 3
                        when product_id = 513 then 2
                        when product_id = 575 then 1
                        when product_id = 677 then 0
                    end)
    WHERE account_id = 42;
SHOW SESSION STATUS LIKE 'Handler%';

This is interesting because the CASE expression would need to be generated by the application, much like the VALUES lists, and I've not tested this from application code.
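Generating that CASE expression with bound parameters could look like this (a sketch assuming `%s`-style placeholders; `build_case_update` is a hypothetical helper name):

```python
# Build an UPDATE ... SET rating = (CASE ...) statement with one
# WHEN/THEN pair per product, mirroring the hand-written query above.
def build_case_update(account_id, product_ratings):
    """product_ratings: list of (product_id, rating) pairs."""
    whens = " ".join(["WHEN product_id = %s THEN %s"] * len(product_ratings))
    sql = (
        "UPDATE rating SET rating = (CASE "
        + whens
        + " END) WHERE account_id = %s"
    )
    params = [v for pair in product_ratings for v in pair] + [account_id]
    return sql, params

sql, params = build_case_update(42, [(107, 3), (513, 2)])
```

Note that, unlike IODKU, this only works when every targeted row already exists; products outside the CASE would have their rating set to NULL unless the WHERE clause also lists the product ids.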

# Variable_name, Value
Handler_commit, 1
Handler_delete, 0
Handler_discover, 0
Handler_external_lock, 0
Handler_icp_attempts, 0
Handler_icp_match, 0
Handler_mrr_init, 0
Handler_mrr_key_refills, 0
Handler_mrr_rowid_refills, 0
Handler_prepare, 0
Handler_read_first, 0
Handler_read_key, 1
Handler_read_last, 0
Handler_read_next, 4
Handler_read_prev, 0
Handler_read_retry, 0
Handler_read_rnd, 0
Handler_read_rnd_deleted, 0
Handler_read_rnd_next, 0
Handler_rollback, 0
Handler_savepoint, 0
Handler_savepoint_rollback, 0
Handler_tmp_delete, 0
Handler_tmp_update, 0
Handler_tmp_write, 0
Handler_update, 4
Handler_write, 0

At this point I began to think the right metric is the sum of the non-zero counter values, not the number of handler rows (that is always 27).

9 | insert with on duplicate update
5 | insert (ideal case only insert needed)
2 | failed insert
18 | delete then insert
10 | bulk single-query updates

So, judging by the handler counts, upserts are half as costly as a single request with two queries that deletes then inserts (impressive).
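The totals above can be reproduced by summing the non-zero Handler% counters from each experiment (a quick sanity check; the dict literals below just transcribe the non-zero rows from the tables above):

```python
# Sum the non-zero Handler% counters for each experiment above;
# the totals should match the table: 9, 5, 2, 18, 10.
runs = {
    "iodku":         {"Handler_commit": 1, "Handler_read_key": 4,
                      "Handler_write": 4},
    "plain_insert":  {"Handler_commit": 1, "Handler_write": 4},
    "failed_insert": {"Handler_rollback": 1, "Handler_write": 1},
    "delete_insert": {"Handler_commit": 2, "Handler_delete": 4,
                      "Handler_read_key": 4, "Handler_read_next": 4,
                      "Handler_write": 4},
    "bulk_update":   {"Handler_commit": 1, "Handler_read_key": 1,
                      "Handler_read_next": 4, "Handler_update": 4},
}
totals = {name: sum(counters.values()) for name, counters in runs.items()}
```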

With 24 rows

49  |   2n+1 INSERT ON DUPLICATE KEY UPDATE (single field)
49  |   UPDATE (requires all rows to exist)
75  |   DELETE, then insert
25  |   INSERT (requires no records)

Best Answer

Short answer: IODKU (as you have it) does it all in one efficient SQL statement; use it.

Long answer:

Please tighten up the statement of the problem. As I read it...

There is a list of 'things', uniquely identified by account_id + product_id, each with a value, rating.

Choices:

  • IODKU (aka upsert, essentially a blend of INSERT and UPDATE), like your example, is optimal for updating the rating for an existing (account, product) pair or adding a new tuple. That one statement does either the INSERT or the UPDATE as needed. This seems to be what you need; was there a problem with it? IODKU does require, in your case, that the pair (account_id, product_id) be the PRIMARY KEY or a UNIQUE key; that way it can correctly decide between UPDATE (if the row exists) and INSERT (if it is new). (This should address "to allow 4 input values for the single product; I don't know what would happen". Namely, it depends on what PRIMARY or UNIQUE key you have.)
  • REPLACE is especially bad since it is two somewhat independent steps: DELETE and INSERT.
  • SELECT, then INSERT/UPDATE -- several lines of client code; IODKU blends the two together. Internally, INSERT and UPDATE do something like SELECT to find where the row is, or should be, on the disk.
  • DELETE first -- that's akin to REPLACE

There is no "multi-row UPDATE". However, IODKU provides that functionality and it is not too kludgy.

In InnoDB, note that IODKU, as with many multi-row statements, is all-or-none (even without BEGIN and COMMIT around it). If for some reason you need it to do some of the rows but avoid locking on others, there is no option for that. The IGNORE option lets INSERT skip rows that are already in the table (according to the PK or a unique key).

Correction to the previous paragraph. MySQL 8.0 and MariaDB have SELECT ... NOWAIT and SKIP LOCKED. I am not fluent in these options; they might be useful to you.

In my opinion, the best tool for peering into the inner workings is the "Handler%" values in SHOW SESSION STATUS. They are a bit tricky to get. Here is a discussion of such: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#handler_counts They count the actual number of reads/writes (whether cached or not). They are useful for comparing two techniques -- fewer reads/writes usually means faster.

Statements have overhead. So fewer statements usually means faster code overall. A specific metric: Using multi-row INSERTs runs at a speed approaching 10 times as fast as one row per INSERT.
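The batching point can be illustrated by chunking rows into multi-row statements instead of issuing one statement per row (a sketch; the chunk size of 100 and the `chunk` helper are illustrative, not prescribed):

```python
# Chunk rows into multi-row INSERT ... ON DUPLICATE KEY UPDATE
# statements, 100 rows per statement, instead of one statement per row.
def chunk(rows, size=100):
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

rows = [(42, pid, 0) for pid in range(250)]
statements = [
    "INSERT INTO rating (account_id, product_id, rating) VALUES "
    + ", ".join("(%s, %s, %s)" for _ in batch)
    + " ON DUPLICATE KEY UPDATE rating = VALUES(rating)"
    for batch in chunk(rows)
]
# 250 rows -> 3 statements (100 + 100 + 50) instead of 250
```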

"checking if the connection is viable before issuing rollback" -- If the connection is lost, a ROLLBACK is done for you. A spurious ROLLBACK is harmless. Do not enable "auto-reconnect", it defeats the auto-rollback-on-disconnect. Do check for errors (or try-catch) after each SQL.