Mysql – DELETE … IN hangs after deleting half of a large table (even for deleting 10 rows)

archivedeletemariadbMySQLtemporary-tables

I'm having trouble with a DELETE hanging, even for a small number of rows (10) in a large MariaDB InnoDB table.

Background

I had a large table (~400,000,000 rows) that needed to have rows archived and removed. After removing about half of the rows, my archive process started to hang on the DELETE statement. I took the time to run the (painfully slow) 11-hour OPTIMIZE (ALTER for InnoDB) query to reclaim disk space and hopefully unblock the DELETE process. This worked until I deleted about half of the remaining rows (down to about 106m out of 200m rows). I figure I could keep running OPTIMIZE on my giant table to let me remove another 1/2 of the rows, but there must be a better way…

Note that the table does get a lot of INSERT/UPDATE traffic, 1000s of changes per minute.

Archive Process

To simplify the archive script, I put the whole process into a stored procedure. The steps in that procedure are:

  1. Flatten and copy up to N rows from the live DB to the archive DB, where the rows are older than 1 month
  2. Make a list of archived IDs that are probably in the live DB.
    Store these in a temporary table A
  3. Filter temp table A to only include row IDs that are actually in the live DB.
    Store these in a temporary table B

  4. DELETE FROM live_db.test_table WHERE test_id IN (SELECT * FROM temp_table_B) ORDER BY test_id ASC LIMIT N;

Now that the table is half empty, step 4 is hanging, even for N = 10
Before the table was half-empty, it worked fine and N = 1000 took ~ 1 sec

Diagnostics

Adding an EXPLAIN in the stored procedure lets me know MySQL is planning on using the PRIMARY index for the DELETE, as I hoped it would:

id select_type         table          type   possible_keys  key      key_len  ref   rows     Extra
1  PRIMARY             test_runs      index  NULL           PRIMARY  4        NULL  10       Using where
8  DEPENDENT SUBQUERY  tests_to_drop  ALL    NULL           NULL     NULL     NULL  1        Using where

Output selected from SHOW ENGINE INNODB STATUS (some things redacted):

---TRANSACTION 6457899980, ACTIVE 14 sec starting index read, thread declared inside InnoDB 3008
mysql tables in use 2, locked 2
242717 lock struct(s), heap size 25507368, 5532268 row lock(s)
MySQL thread id 1834147, OS thread handle 0x7f0f68c4a700, query id 204328569 1.2.3.4 user123 Sending data
DELETE FROM live_db.test_table
WHERE test_id IN (SELECT * FROM temp_table_B)
ORDER BY test_id ASC
LIMIT 10
Trx #rec lock waits 0 #table lock waits 0
Trx total rec lock wait time 0 SEC
Trx total table lock wait time 0 SEC

(From Comment)

CREATE  TEMPORARY TABLE temp_table_B (
        test_id int(10) unsigned NOT NULL DEFAULT '0'
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1

Best Answer

I found out what's going on here:

Because of how I was choosing test_id values in steps 1 and 2, temp_table B would sometimes be empty. (see question description for step/table context)

Then, with an empty temp_table_B, I was running step 4:

DELETE FROM live_db.test_table WHERE test_id
    IN (SELECT * FROM temp_table_B) ORDER BY test_id ASC LIMIT N;

which gave

DELETE FROM live_db.test_table WHERE test_id
    IN ( <empty> ) ORDER BY test_id ASC LIMIT N;

which would hang for quite a while.

Based on this answer: https://dba.stackexchange.com/a/86103/144766 , it seems like that empty table was probably compared against each of my 106 million rows, making the do-nothing DELETE take 25 minutes.

Now that I have modified my process to detect the empty table and abort before calling DELETE, I can manually watch for the (data-specific) condition that's causing my list-of-rows-to-delete to occasionally become empty.

EDIT: Even after fixing the empty-in-list problem, my query was greatly sped up by switching from the " IN (SELECT * temp_table) " version to a JOIN version, similar to what is suggested in comments by ypercube above. This modification eliminates the 'SUBQUERY' from the execution plan and the query becomes a 'SIMPLE' query:

'IN' version:
id select_type         table           type  key     key_len rows Extra
1  PRIMARY             test_runs       index PRIMARY 4       10   Using where
8  DEPENDENT SUBQUERY  temp_table_B    ALL   NULL    NULL    10   Using where

'JOIN' version:
id  select_type  table        type   key     key_len ref                  rows Extra
1   SIMPLE       temp_table_B ALL    NULL    NULL    NULL                 10   Using where
1   SIMPLE       test_runs    eq_ref PRIMARY 4       temp_table_B.test_id 1