Mysql – Unable to understand this thesql deadlock log

deadlockinnodbMySQLmysql-5.7

I am getting a good number of these deadlocks (low % of total requests, but still high in absolute errors). For all the deadlocks, the pattern is very similar; but I am unable to understand why there is a deadlock

------------------------
LATEST DETECTED DEADLOCK
------------------------
2020-02-05 03:52:47 0x2b38b7b91700
*** (1) TRANSACTION:
TRANSACTION 257063481717, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 4990288, OS thread handle 47474114635520, query id 359069256602 172.26.107.139 update
INSERT INTO `message_user_records` (`message_template_id`, `user_id`, `created_at`, `updated_at`, `expires_at`, `token`) VALUES (1294, 26396574, '2020-02-05 03:52:47', '2020-02-05 03:52:47', '2020-02-05 10:52:47', 'XTBLUQXF2GVPW') 
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 676 page no 548680 n bits 1120 index index_message_user_records_on_user_id_and_dismissed of table `ebdb`.`message_user_records` trx id 257063481717 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 124 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 4; hex 8192c7a2; asc     ;;
 1: len 1; hex 81; asc  ;;
 2: len 4; hex 83fc1ead; asc     ;;

*** (2) TRANSACTION:
TRANSACTION 257063481715, ACTIVE 0 sec inserting, thread declared inside InnoDB 5000
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 4991120, OS thread handle 47522600523520, query id 359069256588 172.26.144.135 update
INSERT INTO `message_user_records` (`message_template_id`, `user_id`, `created_at`, `updated_at`, `expires_at`, `token`) VALUES (1294, 26396574, '2020-02-05 03:52:47', '2020-02-05 03:52:47', '2020-02-05 10:52:47', 'QA4DL6VSMTWR4')
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 676 page no 548680 n bits 1120 index index_message_user_records_on_user_id_and_dismissed of table `ebdb`.`message_user_records` trx id 257063481715 lock_mode X locks gap before rec
Record lock, heap no 124 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 4; hex 8192c7a2; asc     ;;
 1: len 1; hex 81; asc  ;;
 2: len 4; hex 83fc1ead; asc     ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 676 page no 548680 n bits 1120 index index_message_user_records_on_user_id_and_dismissed of table `ebdb`.`message_user_records` trx id 257063481715 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 124 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 4; hex 8192c7a2; asc     ;;
 1: len 1; hex 81; asc  ;;
 2: len 4; hex 83fc1ead; asc     ;;

*** WE ROLL BACK TRANSACTION (2)

If I am understanding correctly, txn2 holds the lock that txn1 is waiting for. But the lock that txn2 is waiting for seems to be the same one it already holds. What am I missing here?

For context on the application code, it seems to be doing these statements –

BEGIN;
SELECT `message_user_records`.* FROM `message_user_records` WHERE `message_user_records`.`user_id` = <some-id> AND `message_user_records`.`message_template_id` = <some-another-id> FOR UPDATE;

...

INSERT INTO `message_user_records` (`message_template_id`, `user_id`, `created_at`, `updated_at`, `expires_at`, `token`) VALUES (list of values with same ids as above select) 
COMMIT

Edit

Here's the table description

CREATE TABLE `message_user_records` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `message_template_id` int(11) NOT NULL,
  `user_id` int(11) NOT NULL,
  `dismissed` tinyint(1) DEFAULT '0',
  `input` varchar(191) COLLATE utf8mb4_bin DEFAULT NULL,
  `created_at` datetime NOT NULL,
  `updated_at` datetime NOT NULL,
  `expires_at` datetime DEFAULT NULL,
  `token` varchar(191) COLLATE utf8mb4_bin DEFAULT NULL,
  `user_rating` int(11) DEFAULT '0',
  `option_1` tinyint(1) DEFAULT '0',
  `option_2` tinyint(1) DEFAULT '0',
  `option_3` tinyint(1) DEFAULT '0',
  `option_4` tinyint(1) DEFAULT '0',
  `option_5` tinyint(1) DEFAULT '0',
  `option_6` tinyint(1) DEFAULT '0',
  PRIMARY KEY (`id`),
  UNIQUE KEY `index_message_user_records_on_token` (`token`) USING BTREE,
  KEY `index_message_user_records_on_message_template_id` (`message_template_id`) USING BTREE,
  KEY `index_message_user_records_on_user_id_and_dismissed` (`user_id`,`dismissed`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

Best Answer

Understanding -- I can't explain the details, but I think this is the big clue: "lock_mode X locks gap before rec insert".

Avoiding

Are there other statements in each transaction? They may have impact.

The two INSERTs are hitting the same pair of columns that you are SELECTing FOR UPDATE: (1294, 26396574)

You are only providing a conflicting token for that pair.

Please explain the business logic that calls for two different tokens.

You could try replacing the KEY(message_template_id) with INDEX(message_template_id, user_id). It would potentially make the SELECT run faster. That might help the transactions get finished faster, hence diminishing the frequency of, but not eliminating, deadlocks.

A more likely solution is to stretch the index further, into

INDEX(message_template_id, user_id, token)

This may eliminate the deadlock, perhaps by turning it into a "lock wait timeout".

But, in any case, you should plan on getting deadlocks, test for them in your code, and have the code re-play the transaction.