I've got a table with ~30 million rows (and soon twice/triple times more) where I have to do quite regular updates. The table structure is like the following:
id,
cookie_id VARCHAR(45),
country VARCHAR(45),
category VARCHAR(45),
other_non_relevant_columns
Indexes look like this:
SHOW INDEX FROM data;
+-------+------------+------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| data | 0 | PRIMARY | 1 | id | A | 24767570 | NULL | NULL | | BTREE | | |
| data | 1 | cookie_index | 1 | cookie_id | A | 14440214 | NULL | NULL | | BTREE | | |
| data | 1 | country_category_index | 1 | country | A | 498 | NULL | NULL | | BTREE | | |
| data | 1 | country_category_index | 2 | category | A | 997 | NULL | NULL | YES | BTREE | | |
+-------+------------+------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
4 rows in set (0.00 sec)
So there's a non-unique index on cookie_id, and non-unique index on country+category columns. Now the case is, every week I should run query to
- Delete all data belonging to country='Y' AND category='X' (5 to 20 million rows)
- Import fresh data (similar amount)
The problem is, deleting the data takes crazy amount of time – that's why I've set up an index on country+category columns. However, 'DELETE' statement still doesn't use the index and instead tries to scan the whole table:
mysql> EXPLAIN DELETE FROM data WHERE country='Y' and category='X';
+----+-------------+-------+------------+------+------------------------+------+---------+------+----------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+------------------------+------+---------+------+----------+----------+-------------+
| 1 | DELETE | data | NULL | ALL | country_category_index | NULL | NULL | NULL | 24767570 | 100.00 | Using where |
+----+-------------+-------+------------+------+------------------------+------+---------+------+----------+----------+-------------+
Select works fine:
mysql> EXPLAIN SELECT id, cookie_id FROM data WHERE country='Y' and category='X';
+----+-------------+-------+------------+------+------------------------+------------------------+---------+-------------+----------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+------------------------+------------------------+---------+-------------+----------+----------+-------+
| 1 | SIMPLE | data | NULL | ref | country_category_index | country_category_index | 365 | const,const | 10130630 | 100.00 | NULL |
+----+-------------+-------+------------+------+------------------------+------------------------+---------+-------------+----------+----------+-------+
Is there any way to optimize DELETE query?
Best Answer
By deleting the data first, you are essentially making the data inaccessible. Don't you want to avoid this "downtime"?
Consider loading the replacement data into a temp table, then doing IODKU to update the main data:
If there could be rows to remove; IODKU won't provide that feature. However, you could precede it with something like