MySQL: delete…where..in( ) vs delete..from..join, and locked tables on delete with subselect

delete; index; join; MySQL; optimization

Disclaimer: please excuse my lack of knowledge about database internals. Here it goes:

We run an application (not written by us) which has a big performance problem in a periodic cleanup job in the database. The query looks like this:

delete from VARIABLE_SUBSTITUTION where BUILDRESULTSUMMARY_ID in (
       select BUILDRESULTSUMMARY_ID from BUILDRESULTSUMMARY
       where BUILDRESULTSUMMARY.BUILD_KEY = "BAM-1");

Straightforward, easy to read, and standard SQL. But unfortunately very slow.
Explaining the query shows that the existing index on VARIABLE_SUBSTITUTION.BUILDRESULTSUMMARY_ID is not used:

mysql> explain delete from VARIABLE_SUBSTITUTION where BUILDRESULTSUMMARY_ID in (
    ->        select BUILDRESULTSUMMARY_ID from BUILDRESULTSUMMARY
    ->        where BUILDRESULTSUMMARY.BUILD_KEY = "BAM-1");
| id | select_type        | table                 | type            | possible_keys                    | key     | key_len | ref  | rows    | Extra       |
+----+--------------------+-----------------------+-----------------+----------------------------------+---------+---------+------+---------+-------------+
|  1 | PRIMARY            | VARIABLE_SUBSTITUTION | ALL             | NULL                             | NULL    | NULL    | NULL | 7300039 | Using where |
|  2 | DEPENDENT SUBQUERY | BUILDRESULTSUMMARY    | unique_subquery | PRIMARY,key_number_results_index | PRIMARY | 8       | func |       1 | Using where |

This makes it very slow (120 seconds and more). In addition, it seems to block queries that try to insert into or update BUILDRESULTSUMMARY. Output from show engine innodb status:

---TRANSACTION 68603695, ACTIVE 157 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 360, 1 row lock(s)
MySQL thread id 127964, OS thread handle 0x7facd0670700, query id 956555826 localhost 127.0.0.1 bamboosrv updating
update BUILDRESULTSUMMARY set CREATED_DATE='2015-06-18 09:22:05', UPDATED_DATE='2015-06-18 09:22:32', BUILD_KEY='BLA-RELEASE1-JOB1', BUILD_NUMBER=8, BUILD_STATE='Unknown', LIFE_CYCLE_STATE='InProgress', BUILD_DATE='2015-06-18 09:22:31.792', BUILD_CANCELLED_DATE=null, BUILD_COMPLETED_DATE='2015-06-18 09:52:02.483', DURATION=1770691, PROCESSING_DURATION=1770691, TIME_TO_FIX=null, TRIGGER_REASON='com.atlassian.bamboo.plugin.system.triggerReason:CodeChangedTriggerReason', DELTA_STATE=null, BUILD_AGENT_ID=199688199, STAGERESULT_ID=230943366, RESTART_COUNT=0, QUEUE_TIME='2015-06-18 09:22:04.52
------- TRX HAS BEEN WAITING 157 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 38 page no 30140 n bits 112 index `PRIMARY` of table `bamboong`.`BUILDRESULTSUMMARY` trx id 68603695 lock_mode X locks rec but not gap waiting
------------------
---TRANSACTION 68594818, ACTIVE 378 sec starting index read
mysql tables in use 2, locked 2
646590 lock struct(s), heap size 63993384, 3775190 row lock(s), undo log entries 117
MySQL thread id 127845, OS thread handle 0x7facc6bf8700, query id 956652201 localhost 127.0.0.1 bamboosrv preparing
delete from VARIABLE_SUBSTITUTION  where BUILDRESULTSUMMARY_ID in   (select BUILDRESULTSUMMARY_ID from BUILDRESULTSUMMARY where BUILDRESULTSUMMARY.BUILD_KEY = 'BLA-BLUBB10-SON')

This slows down the system and forced us to increase innodb_lock_wait_timeout.

As we run MySQL, we rewrote the delete query to use "delete from join":

delete VARIABLE_SUBSTITUTION from VARIABLE_SUBSTITUTION join BUILDRESULTSUMMARY
   on VARIABLE_SUBSTITUTION.BUILDRESULTSUMMARY_ID = BUILDRESULTSUMMARY.BUILDRESULTSUMMARY_ID
   where BUILDRESULTSUMMARY.BUILD_KEY = "BAM-1";

This is slightly less easy to read and unfortunately not standard SQL (as far as I was able to find out), but it is a lot faster (0.02 seconds or so) because it uses the index:

mysql> explain delete VARIABLE_SUBSTITUTION from VARIABLE_SUBSTITUTION join BUILDRESULTSUMMARY
    ->    on VARIABLE_SUBSTITUTION.BUILDRESULTSUMMARY_ID = BUILDRESULTSUMMARY.BUILDRESULTSUMMARY_ID
    ->    where BUILDRESULTSUMMARY.BUILD_KEY = "BAM-1";
| id | select_type | table                 | type | possible_keys                    | key                      | key_len | ref                                                    | rows | Extra                    |
+----+-------------+-----------------------+------+----------------------------------+--------------------------+---------+--------------------------------------------------------+------+--------------------------+
|  1 | SIMPLE      | BUILDRESULTSUMMARY    | ref  | PRIMARY,key_number_results_index | key_number_results_index | 768     | const                                                  |    1 | Using where; Using index |
|  1 | SIMPLE      | VARIABLE_SUBSTITUTION | ref  | var_subst_result_idx             | var_subst_result_idx     | 8       | bamboo_latest.BUILDRESULTSUMMARY.BUILDRESULTSUMMARY_ID |   26 | NULL                     |

Additional info:

mysql> SHOW CREATE TABLE VARIABLE_SUBSTITUTION;
| Table                 | Create Table |
| VARIABLE_SUBSTITUTION | CREATE TABLE `VARIABLE_SUBSTITUTION` (
  `VARIABLE_SUBSTITUTION_ID` bigint(20) NOT NULL,
  `VARIABLE_KEY` varchar(255) COLLATE utf8_bin NOT NULL,
  `VARIABLE_VALUE` varchar(4000) COLLATE utf8_bin DEFAULT NULL,
  `VARIABLE_TYPE` varchar(255) COLLATE utf8_bin DEFAULT NULL,
  `BUILDRESULTSUMMARY_ID` bigint(20) NOT NULL,
  PRIMARY KEY (`VARIABLE_SUBSTITUTION_ID`),
  KEY `var_subst_result_idx` (`BUILDRESULTSUMMARY_ID`),
  KEY `var_subst_type_idx` (`VARIABLE_TYPE`),
  CONSTRAINT `FK684A7BE0A958B29F` FOREIGN KEY (`BUILDRESULTSUMMARY_ID`) REFERENCES `BUILDRESULTSUMMARY` (`BUILDRESULTSUMMARY_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin |

mysql> SHOW CREATE TABLE BUILDRESULTSUMMARY;
| Table              | Create Table |
| BUILDRESULTSUMMARY | CREATE TABLE `BUILDRESULTSUMMARY` (
  `BUILDRESULTSUMMARY_ID` bigint(20) NOT NULL,
....
  `SKIPPED_TEST_COUNT` int(11) DEFAULT NULL,
  PRIMARY KEY (`BUILDRESULTSUMMARY_ID`),
  KEY `FK26506D3B9E6537B` (`CHAIN_RESULT`),
  KEY `FK26506D3BCCACF65` (`MERGERESULT_ID`),
  KEY `key_number_delta_state` (`DELTA_STATE`),
  KEY `brs_build_state_idx` (`BUILD_STATE`),
  KEY `brs_life_cycle_state_idx` (`LIFE_CYCLE_STATE`),
  KEY `brs_deletion_idx` (`MARKED_FOR_DELETION`),
  KEY `brs_stage_result_id_idx` (`STAGERESULT_ID`),
  KEY `key_number_results_index` (`BUILD_KEY`,`BUILD_NUMBER`),
  KEY `brs_agent_idx` (`BUILD_AGENT_ID`),
  KEY `rs_ctx_baseline_idx` (`VARIABLE_CONTEXT_BASELINE_ID`),
  KEY `brs_chain_result_summary_idx` (`CHAIN_RESULT`),
  KEY `brs_log_size_idx` (`LOG_SIZE`),
  CONSTRAINT `FK26506D3B9E6537B` FOREIGN KEY (`CHAIN_RESULT`) REFERENCES `BUILDRESULTSUMMARY` (`BUILDRESULTSUMMARY_ID`),
  CONSTRAINT `FK26506D3BCCACF65` FOREIGN KEY (`MERGERESULT_ID`) REFERENCES `MERGE_RESULT` (`MERGERESULT_ID`),
  CONSTRAINT `FK26506D3BCEDEEF5F` FOREIGN KEY (`STAGERESULT_ID`) REFERENCES `CHAIN_STAGE_RESULT` (`STAGERESULT_ID`),
  CONSTRAINT `FK26506D3BE3B5B062` FOREIGN KEY (`VARIABLE_CONTEXT_BASELINE_ID`) REFERENCES `VARIABLE_CONTEXT_BASELINE` (`VARIABLE_CONTEXT_BASELINE_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin |

(some stuff omitted, it is quite a wide table).

So I have a few questions about this:

  • why is the query optimizer not able to use the index for deleting with the subquery version, while it can with the join version?
  • is there any (ideally standards-conforming) way to trick it into using the index? or
  • is there a portable way to write a delete from join? The application supports PostgreSQL, MySQL, Oracle and Microsoft SQL Server, used via JDBC and Hibernate.
  • why is the delete from VARIABLE_SUBSTITUTION blocking inserts into BUILDRESULTSUMMARY, which is only used in the subselect?

Best Answer

  • why is the query optimizer not able to use the index for deleting with the subquery version, while it can with the join version?

Because the optimizer is/was a bit dumb in that regard. Not only for DELETE and UPDATE but for SELECT statements as well, anything like WHERE column IN (SELECT ...) was not fully optimized. The execution plan usually involved running the subquery once for every row of the outer table (VARIABLE_SUBSTITUTION in this case). If that table is small, everything is fine. If it's large, no hope. In even older versions, an IN subquery with an IN sub-subquery would make even the EXPLAIN run for ages.
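
To make the "subquery for every row" point concrete, here is a sketch of the correlated form that the IN predicate is effectively evaluated as. This is my reading of the DEPENDENT SUBQUERY / unique_subquery lines in your EXPLAIN output, not something taken from the server itself, and it is meant as an illustration of the plan rather than as a fix:

```sql
-- Roughly how the IN (...) version is evaluated: for each of the ~7.3 million
-- rows of VARIABLE_SUBSTITUTION (type = ALL), probe BUILDRESULTSUMMARY by its
-- primary key and then check BUILD_KEY.
DELETE FROM VARIABLE_SUBSTITUTION
WHERE EXISTS (
    SELECT 1
    FROM BUILDRESULTSUMMARY
    WHERE BUILDRESULTSUMMARY.BUILD_KEY = 'BAM-1'
      AND BUILDRESULTSUMMARY.BUILDRESULTSUMMARY_ID =
          VARIABLE_SUBSTITUTION.BUILDRESULTSUMMARY_ID
);
```

Written this way it is easier to see why var_subst_result_idx cannot help: the outer table drives the loop, so there is no known value to look up in that index. The join version reverses the order and starts from the few BUILDRESULTSUMMARY rows that match the key.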

What you can do - if you want to keep this query - is to use the latest versions that have implemented several optimizations and test again. Latest versions meaning: MySQL 5.6 (and 5.7 when it comes out of beta) and MariaDB 5.5 / 10.0

(update) You already use 5.6 which has optimization improvements, and this one is relevant: Optimizing Subqueries with Semi-Join Transformations
I suggest adding an index on (BUILD_KEY) alone. There is a composite one but that's not very useful for this query.
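
A rough way to check both points on your own data is sketched below. The index name brs_build_key_idx is made up for illustration. As far as I know, the semi-join transformations in 5.6 are applied to SELECT statements and were extended to single-table UPDATE and DELETE only in much later releases, so comparing the two plans should show whether that is what you are hitting. Also note that your existing composite index already starts with BUILD_KEY, and the join plan in the question does use it, so it is worth measuring before keeping a second index:

```sql
-- Hypothetical index name; the column comes from the table definition above.
CREATE INDEX brs_build_key_idx ON BUILDRESULTSUMMARY (BUILD_KEY);

-- Same predicate as a SELECT: here the 5.6 optimizer may pick a semi-join plan.
EXPLAIN SELECT *
FROM VARIABLE_SUBSTITUTION
WHERE BUILDRESULTSUMMARY_ID IN (
    SELECT BUILDRESULTSUMMARY_ID
    FROM BUILDRESULTSUMMARY
    WHERE BUILDRESULTSUMMARY.BUILD_KEY = 'BAM-1');

-- The DELETE itself, for comparison with the plan above.
EXPLAIN DELETE FROM VARIABLE_SUBSTITUTION
WHERE BUILDRESULTSUMMARY_ID IN (
    SELECT BUILDRESULTSUMMARY_ID
    FROM BUILDRESULTSUMMARY
    WHERE BUILDRESULTSUMMARY.BUILD_KEY = 'BAM-1');
```

If the SELECT plan starts from BUILDRESULTSUMMARY and the DELETE plan still shows a full scan of VARIABLE_SUBSTITUTION, the index on BUILD_KEY is not the limiting factor and the join rewrite remains the practical option on this version.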

  • is there any (ideally standards-conforming) way to trick it into using the index?

None that I can think of. In my opinion, there is not much worth in trying to use standard SQL. There are so many differences and minor quirks that each DBMS has (UPDATE and DELETE statements are good examples of such differences) that when you try to use something that works everywhere, the result is a very limited subset of SQL.

  • is there a portable way to write a delete from join? The application supports PostgreSQL, MySQL, Oracle and Microsoft SQL Server, used via JDBC and Hibernate.

Same answer as the previous question.
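
To illustrate how far apart the dialects are, here is a sketch of the same delete in the forms I would expect each DBMS to accept. The aliases vs and brs are mine, and none of these statements were run against your schema, so treat them as a starting point:

```sql
-- MySQL (multi-table DELETE syntax, as in the question)
DELETE VARIABLE_SUBSTITUTION
FROM VARIABLE_SUBSTITUTION
JOIN BUILDRESULTSUMMARY
  ON VARIABLE_SUBSTITUTION.BUILDRESULTSUMMARY_ID = BUILDRESULTSUMMARY.BUILDRESULTSUMMARY_ID
WHERE BUILDRESULTSUMMARY.BUILD_KEY = 'BAM-1';

-- PostgreSQL (DELETE ... USING)
DELETE FROM VARIABLE_SUBSTITUTION vs
USING BUILDRESULTSUMMARY brs
WHERE vs.BUILDRESULTSUMMARY_ID = brs.BUILDRESULTSUMMARY_ID
  AND brs.BUILD_KEY = 'BAM-1';

-- Microsoft SQL Server (DELETE alias FROM ... JOIN)
DELETE vs
FROM VARIABLE_SUBSTITUTION AS vs
JOIN BUILDRESULTSUMMARY AS brs
  ON vs.BUILDRESULTSUMMARY_ID = brs.BUILDRESULTSUMMARY_ID
WHERE brs.BUILD_KEY = 'BAM-1';

-- Oracle (no join syntax in DELETE; the subquery form is the usual choice)
DELETE FROM VARIABLE_SUBSTITUTION
WHERE BUILDRESULTSUMMARY_ID IN (
    SELECT BUILDRESULTSUMMARY_ID
    FROM BUILDRESULTSUMMARY
    WHERE BUILD_KEY = 'BAM-1');
```

Note the single quotes around the string literal: the double-quoted "BAM-1" in the original query is itself a MySQL-ism, since standard SQL reserves double quotes for identifiers.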

  • why is the delete from VARIABLE_SUBSTITUTION blocking inserts into BUILDRESULTSUMMARY, which is only used in the subselect?

Not 100% sure but I think it has to do with running the subquery multiple times and what type of locks it is taking on the table.
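
For what it is worth, the InnoDB documentation on locks set by different statements says that for UPDATE t ... WHERE col IN (SELECT ... FROM s ...) InnoDB sets shared next-key locks on the rows it reads from s. I am assuming DELETE with a subquery behaves the same way. That would fit the 3775190 row locks held by the delete transaction in your status output and the X lock the update is waiting for. A sketch for watching who blocks whom while the cleanup job runs, using the information_schema tables available in 5.5 to 5.7 (they were removed in 8.0):

```sql
-- Lists each waiting transaction together with the transaction blocking it.
SELECT r.trx_id              AS waiting_trx_id,
       r.trx_mysql_thread_id AS waiting_thread,
       r.trx_query           AS waiting_query,
       b.trx_id              AS blocking_trx_id,
       b.trx_mysql_thread_id AS blocking_thread,
       b.trx_query           AS blocking_query
FROM information_schema.INNODB_LOCK_WAITS w
JOIN information_schema.INNODB_TRX b ON b.trx_id = w.blocking_trx_id
JOIN information_schema.INNODB_TRX r ON r.trx_id = w.requesting_trx_id;
```

With the join version the delete touches only a handful of BUILDRESULTSUMMARY rows, so even if it takes the same kind of locks, it should hold far fewer of them and for a much shorter time.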