MySQL DELETE statement doesn’t use index although the same SELECT query does

deleteexplainMySQLmysql-5.7performancequery-performance

I've got a table with ~30 million rows (and soon twice/triple times more) where I have to do quite regular updates. The table structure is like the following:

id, 
cookie_id VARCHAR(45), 
country VARCHAR(45), 
category VARCHAR(45), 
other_non_relevant_columns

Indexes look like this:

SHOW INDEX FROM data;
+-------+------------+------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name               | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| data  |          0 | PRIMARY                |            1 | id          | A         |    24767570 |     NULL | NULL   |      | BTREE      |         |               |
| data  |          1 | cookie_index           |            1 | cookie_id   | A         |    14440214 |     NULL | NULL   |      | BTREE      |         |               |
| data  |          1 | country_category_index |            1 | country     | A         |         498 |     NULL | NULL   |      | BTREE      |         |               |
| data  |          1 | country_category_index |            2 | category    | A         |         997 |     NULL | NULL   | YES  | BTREE      |         |               |
+-------+------------+------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
4 rows in set (0.00 sec)

So there's a non-unique index on cookie_id, and non-unique index on country+category columns. Now the case is, every week I should run query to

  1. Delete all data belonging to country='Y' AND category='X' (5 to 20 million rows)
  2. Import fresh data (similar amount)

The problem is, deleting the data takes crazy amount of time – that's why I've set up an index on country+category columns. However, 'DELETE' statement still doesn't use the index and instead tries to scan the whole table:

mysql> EXPLAIN DELETE FROM data WHERE country='Y' and category='X';
+----+-------------+-------+------------+------+------------------------+------+---------+------+----------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys          | key  | key_len | ref  | rows     | filtered | Extra       |
+----+-------------+-------+------------+------+------------------------+------+---------+------+----------+----------+-------------+
|  1 | DELETE      | data  | NULL       | ALL  | country_category_index | NULL | NULL    | NULL | 24767570 |   100.00 | Using where |
+----+-------------+-------+------------+------+------------------------+------+---------+------+----------+----------+-------------+

Select works fine:

mysql> EXPLAIN SELECT id, cookie_id FROM data WHERE country='Y' and category='X';
+----+-------------+-------+------------+------+------------------------+------------------------+---------+-------------+----------+----------+-------+
| id | select_type | table | partitions | type | possible_keys          | key                    | key_len | ref         | rows     | filtered | Extra |
+----+-------------+-------+------------+------+------------------------+------------------------+---------+-------------+----------+----------+-------+
|  1 | SIMPLE      | data  | NULL       | ref  | country_category_index | country_category_index | 365     | const,const | 10130630 |   100.00 | NULL  |
+----+-------------+-------+------------+------+------------------------+------------------------+---------+-------------+----------+----------+-------+

Is there any way to optimize DELETE query?

Best Answer

By deleting the data first, you are essentially making the data inaccessible. Don't you want to avoid this "downtime"?

Consider loading the replacement data into a temp table, then doing IODKU to update the main data:

INSERT INTO main (...)
        ON DUPLICATE KEY UPDATE 
             col1 = VALUES(col1),
             ...
    SELECT ... FROM temp;

If there could be rows to remove; IODKU won't provide that feature. However, you could precede it with something like

ALTER TABLE temp ADD INDEX (...);  -- to speed up the LEFT JOIN below

DELETE FROM main
       USING main
        LEFT JOIN temp ON ...
       WHERE temp... IS NULL;