Thesql deadlock when update some items whose id are selected from another table

deadlockinnodbjavaMySQL

I'm sorry, I just can speak a little English.

I have two machines(10.10.102.101, 10.10.102.104) which connect to the same mysql datebase.

This is a table and deadlock happened on it.

mysql> mysql> SHOW CREATE TABLE TxTimeout \G;
*************************** 1. row ***************************
       Table: TxTimeout
Create Table: CREATE TABLE `TxTimeout` (
  `surrogateId` bigint(20) NOT NULL AUTO_INCREMENT,
  `eventId` bigint(20) NOT NULL,
  `serviceName` varchar(36) NOT NULL,
  `instanceId` varchar(36) NOT NULL,
  `globalTxId` varchar(36) NOT NULL,
  `localTxId` varchar(36) NOT NULL,
  `parentTxId` varchar(36) DEFAULT NULL,
  `type` varchar(50) NOT NULL,
  `expiryTime` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `status` varchar(12) DEFAULT NULL,
  `version` bigint(20) NOT NULL,
  PRIMARY KEY (`surrogateId`),
  UNIQUE KEY `eventId` (`eventId`),
  KEY `saga_timeouts_index` (`surrogateId`, `expiryTime`, `globalTxId`, `localTxId`, `status`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

This is another table TxEvent.

mysql> SHOW CREATE TABLE TxEvent \G;
*************************** 1. row ***************************
       Table: TxEvent
Create Table: CREATE TABLE `TxEvent` (
  `surrogateId` bigint(20) NOT NULL AUTO_INCREMENT,
  `serviceName` varchar(36) NOT NULL,
  `instanceId` varchar(36) NOT NULL,
  `creationTime` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `globalTxId` varchar(36) NOT NULL,
  `localTxId` varchar(36) NOT NULL,
  `parentTxId` varchar(36) DEFAULT NULL,
  `type` varchar(50) NOT NULL,
  `compensationMethod` varchar(256) NOT NULL,
  `expiryTime` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `payloads` blob,
  `retries` int(11) NOT NULL DEFAULT '0',
  `retryMethod` varchar(256) DEFAULT NULL,
  PRIMARY KEY (`surrogateId`),
  KEY `saga_events_index` (`surrogateId`,`globalTxId`,`localTxId`,`type`,`expiryTime`),
  KEY `saga_global_tx_index` (`globalTxId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

Isolation level is RR.

I use “show engine innodb status \G” and get the following message:

------------------------
LATEST DETECTED DEADLOCK
------------------------
2018-10-30 15:14:00 0x7f9199a58700
*** (1) TRANSACTION:
TRANSACTION 4719387, ACTIVE 0 sec starting index read
mysql tables in use 2, locked 2
LOCK WAIT 13 lock struct(s), heap size 1136, 55 row lock(s), undo log entries 1
MySQL thread id 334, OS thread handle 140263340189440, query id 549588 10.10.102.101 root Sending data
UPDATE TxTimeout, TL_TxTimeout SET TxTimeout.STATUS = TL_TxTimeout.STATUS, TxTimeout.VERSION = TL_TxTimeout.VERSION WHERE TxTimeout.SURROGATEID = TL_TxTimeout.SURROGATEID
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 147 page no 3 n bits 136 index PRIMARY of table `saga`.`TxTimeout` trx id 4719387 lock_mode X locks rec but not gap waiting
……
*** (2) TRANSACTION:
TRANSACTION 4719388, ACTIVE 0 sec starting index read
mysql tables in use 2, locked 2
13 lock struct(s), heap size 1136, 55 row lock(s), undo log entries 1
MySQL thread id 411, OS thread handle 140263324747520, query id 549587 10.10.102.104 root Sending data
UPDATE TxTimeout, TL_TxTimeout SET TxTimeout.STATUS = TL_TxTimeout.STATUS, TxTimeout.VERSION = TL_TxTimeout.VERSION WHERE TxTimeout.SURROGATEID = TL_TxTimeout.SURROGATEID
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 147 page no 3 n bits 136 index PRIMARY of table `saga`.`TxTimeout` trx id 4719388 lock mode S
……
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 147 page no 3 n bits 136 index PRIMARY of table `saga`.`TxTimeout` trx id 4719388 lock_mode X locks rec but not gap waiting
……

The code in project is

import org.springframework.data.jpa.repository.Modifying;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.CrudRepository;

interface TxTimeoutEntityRepository extends CrudRepository<TxTimeout, Long> {
  @Transactional   
  @Modifying(clearAutomatically = true)   
  @Query("UPDATE TxTimeout t "
      + "SET t.status = 'DONE' "
      + "WHERE t.status != 'DONE' AND EXISTS ("
      + "  SELECT t1.globalTxId FROM TxEvent t1 "
      + "  WHERE t1.globalTxId = t.globalTxId "
      + "    AND t1.localTxId = t.localTxId "
      + "    AND t1.type != t.type"
      + ")")   void updateStatusOfFinishedTx();
}

Then I use command "explain" to analyze this sql statement, but it makes me confused.

mysql> explain UPDATE TxTimeout t SET t.status = 'DONE' WHERE t.status != 'DONE' AND EXISTS (SELECT t1.globalTxId FROM TxEvent t1 WHERE t1.globalTxId = t.globalTxId AND t1.localTxId = t.localTxId and t1.type != t.type);
+----+--------------------+-------+------------+-------+----------------------+----------------------+---------+-------------------+------+----------+-------------+
| id | select_type        | table | partitions | type  | possible_keys        | key                  | key_len | ref               | rows | filtered | Extra       |
+----+--------------------+-------+------------+-------+----------------------+----------------------+---------+-------------------+------+----------+-------------+
|  1 | UPDATE             | t     | NULL       | index | NULL                 | PRIMARY              | 8       | NULL              |    1 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | t1    | NULL       | ref   | saga_global_tx_index | saga_global_tx_index | 110     | saga.t.globalTxId |    7 |     9.00 | Using where |
+----+--------------------+-------+------------+-------+----------------------+----------------------+---------+-------------------+------+----------+-------------+

It seems to use primary key of TxTimeout for update, so it will get X locks. but where does S lock come from?

At the same time, I find the sql statement in logs is not same with the sql statement in code.
So I read "general_log" and find the sql statement has been translated into a transaction.

431 Query   SET autocommit=0 
431 Query   CREATE TEMPORARY TABLE IF NOT EXISTS TL_TxTimeout (SURROGATEID BIGINT NOT NULL, EVENTID BIGINT, EXPIRYTIME DATETIME, GLOBALTXID VARCHAR(255), INSTANCEID VARCHAR(255), LOCALTXID VARCHAR(255), PARENTTXID VARCHAR(255), SERVICENAME VARCHAR(255), STATUS VARCHAR(255), TYPE VARCHAR(255), VERSION BIGINT, PRIMARY KEY (SURROGATEID)) 
431 Query   INSERT INTO TL_TxTimeout (SURROGATEID, STATUS, VERSION) SELECT t0.SURROGATEID, 'DONE', (t0.VERSION + 1) FROM TxTimeout t0 WHERE ((t0.STATUS <> 'DONE') AND EXISTS (SELECT 1 FROM TxEvent t1 WHERE (((t1.GLOBALTXID = t0.GLOBALTXID) AND (t1.LOCALTXID = t0.LOCALTXID)) AND (t1.TYPE <> t0.TYPE))) ) 
431 Query   UPDATE TxTimeout, TL_TxTimeout SET TxTimeout.STATUS = TL_TxTimeout.STATUS, TxTimeout.VERSION = TL_TxTimeout.VERSION WHERE TxTimeout.SURROGATEID = TL_TxTimeout.SURROGATEID 
431 Query   DELETE FROM TL_TxTimeout 
431 Query   commit
431 Query   SET autocommit=1

and I use command "explain" to analyze two statements of the transaction,but I still cannot understand how the deadlock happened. Who can give me some advice?

mysql> explain INSERT INTO TL_TxTimeout (SURROGATEID, STATUS, VERSION) SELECT t0.SURROGATEID, 'DONE', (t0.VERSION + 1) FROM TxTimeout t0 WHERE ((t0.STATUS <> 'DONE') AND EXISTS (SELECT 1 FROM TxEvent t1 WHERE (((t1.GLOBALTXID = t0.GLOBALTXID) AND (t1.LOCALTXID = t0.LOCALTXID)) AND (t1.TYPE <> t0.TYPE))) );
+----+--------------------+--------------+------------+------+----------------------+----------------------+---------+--------------------+------+----------+-------------+
| id | select_type        | table        | partitions | type | possible_keys        | key                  | key_len | ref                | rows | filtered | Extra       |
+----+--------------------+--------------+------------+------+----------------------+----------------------+---------+--------------------+------+----------+-------------+
|  1 | INSERT             | TL_TxTimeout | NULL       | ALL  | NULL                 | NULL                 | NULL    | NULL               | NULL |     NULL | NULL        |
|  1 | PRIMARY            | t0           | NULL       | ALL  | NULL                 | NULL                 | NULL    | NULL               |    1 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | t1           | NULL       | ref  | saga_global_tx_index | saga_global_tx_index | 110     | saga.t0.globalTxId |    7 |     9.00 | Using where |
+----+--------------------+--------------+------------+------+----------------------+----------------------+---------+--------------------+------+----------+-------------+

mysql> explain UPDATE TxTimeout, TL_TxTimeout SET TxTimeout.STATUS = TL_TxTimeout.STATUS, TxTimeout.VERSION = TL_TxTimeout.VERSION WHERE TxTimeout.SURROGATEID = TL_TxTimeout.SURROGATEID ;
+----+-------------+--------------+------------+--------+-----------------------------+---------+---------+----------------------------+------+----------+-------+
| id | select_type | table        | partitions | type   | possible_keys               | key     | key_len | ref                        | rows | filtered | Extra |
+----+-------------+--------------+------------+--------+-----------------------------+---------+---------+----------------------------+------+----------+-------+
|  1 | UPDATE      | TxTimeout    | NULL       | ALL    | PRIMARY,saga_timeouts_index | NULL    | NULL    | NULL                       |    1 |   100.00 | NULL  |
|  1 | SIMPLE      | TL_TxTimeout | NULL       | eq_ref | PRIMARY                     | PRIMARY | 8       | saga.TxTimeout.surrogateId |    1 |   100.00 | NULL  |
+----+-------------+--------------+------------+--------+-----------------------------+---------+---------+----------------------------+------+----------+-------+

Best Answer

  • Deadlocks happen; plan for them by checking and rerunning the transaction.
  • How many rows in TL_TxTimeout? Are those rows sorted the same way? I suspect they are not, hence getting a classic deadlock. Sorting may significantly decrease the frequency of deadlocks.
  • If those VARCHARs are UUIDs, make them CHARACTER SET ascii, not utf8.
  • Please provide SHOW CREATE TABLE TxEvent.
  • TxEvent needs INDEX(globalTxId, t1.localTxId) (in either order).
  • The secondary key in TxTimeout is useless since its first column is the same as the PRIMARY KEY.
  • Thanks for all the research (EXPLAINs, general log, etc).
  • What does @Modifying(clearAutomatically = true) mean? Are you saying that you did not explicitly create TL_TxTimeout? I'm not impressed with whatever was creating it.
  • Where could (t0.VERSION + 1) be coming from?