We have a table similar to this one
mysql> SHOW CREATE TABLE my_requests\G *************************** 1. row *************************** Table: my_requests Create Table: CREATE TABLE `my_requests` ( `rq_id` bigint(20) NOT NULL, `t_id` bigint(20) NOT NULL, `u_id` bigint(20) DEFAULT NULL, `rq_date` datetime DEFAULT NULL, `rq_type` tinyint(4) DEFAULT '1', `rq_creationdate` datetime NOT NULL, `rq_modifydate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, `rq_ttotal` mediumint(8) unsigned DEFAULT '0', KEY `index_rqcd` (`t_id`,`rq_date`), KEY `idx_rq_u` (`u_id`), KEY `idx_rq_id` (`rq_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 /*!50100 PARTITION BY LIST (to_days(rq_date)) SUBPARTITION BY HASH (t_id) SUBPARTITIONS 10 (PARTITION p736417 VALUES IN (736417) ENGINE = InnoDB, PARTITION p736419 VALUES IN (736419) ENGINE = InnoDB, PARTITION p736430 VALUES IN (736430) ENGINE = InnoDB, PARTITION p736439 VALUES IN (736439) ENGINE = InnoDB, PARTITION p736443 VALUES IN (736443) ENGINE = InnoDB, PARTITION p736444 VALUES IN (736444) ENGINE = InnoDB, PARTITION p736445 VALUES IN (736445) ENGINE = InnoDB, PARTITION p736446 VALUES IN (736446) ENGINE = InnoDB, PARTITION p736447 VALUES IN (736447) ENGINE = InnoDB, PARTITION p736448 VALUES IN (736448) ENGINE = InnoDB, PARTITION p736449 VALUES IN (736449) ENGINE = InnoDB, PARTITION p736450 VALUES IN (736450) ENGINE = InnoDB, PARTITION p736451 VALUES IN (736451) ENGINE = InnoDB, PARTITION p736452 VALUES IN (736452) ENGINE = InnoDB, PARTITION p736453 VALUES IN (736453) ENGINE = InnoDB, PARTITION p736454 VALUES IN (736454) ENGINE = InnoDB, PARTITION p736455 VALUES IN (736455) ENGINE = InnoDB, PARTITION p736456 VALUES IN (736456) ENGINE = InnoDB, PARTITION p736457 VALUES IN (736457) ENGINE = InnoDB, PARTITION p736458 VALUES IN (736458) ENGINE = InnoDB, PARTITION p736459 VALUES IN (736459) ENGINE = InnoDB, PARTITION p736460 VALUES IN (736460) ENGINE = InnoDB, PARTITION p736461 VALUES IN (736461) ENGINE = InnoDB, PARTITION p736462 VALUES IN (736462) ENGINE = InnoDB, PARTITION p736463 VALUES IN (736463) ENGINE = InnoDB, PARTITION p736464 VALUES IN (736464) ENGINE = InnoDB, PARTITION p736465 VALUES IN (736465) ENGINE = InnoDB, PARTITION p736466 VALUES IN (736466) ENGINE = InnoDB, PARTITION p736467 VALUES IN (736467) ENGINE = InnoDB, PARTITION p736468 VALUES IN (736468) ENGINE = InnoDB, PARTITION p736469 VALUES IN (736469) ENGINE = InnoDB) */
with partitions by day, the table is huge and has hundreds of millions of registers, and I'm trying to select the data from the last day and avoid a full scan and access only the last partitions.
But when I do a explain the index is not used.
EXPLAIN SELECT * FROM my_requests USE INDEX(index_rqcd) WHERE t_id <> -1 AND rq_date = DATE_SUB(now(), INTERVAL 1 DAY); +----+-------------+-------------+------+---------------+------+---------+------+---------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------------+------+---------------+------+---------+------+---------+-------------+ | 1 | SIMPLE | my_requests | ALL | index_rqcd | NULL | NULL | NULL | 4020468 | Using where | +----+-------------+-------------+------+---------------+------+---------+------+---------+-------------+
Edit:
The EXPLAIN PARTITIONS
shows that the query only affect part of the partitions, but I still don't understand why the index is not used.
mysql> EXPLAIN PARTITIONS SELECT * FROM my_requests WHERE t_id <> -1 AND rq_date = DATE_SUB(now(), INTERVAL 1 DAY)\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: my_requests partitions: p736467_p736467sp0,p736467_p736467sp1,p736467_p736467sp2,p736467_p736467sp3,p736467_p736467sp4,p736467_p736467sp5,p736467_p736467sp6,p736467_p736467sp7,p736467_p736467sp8,p736467_p736467sp9 type: ALL possible_keys: index_rqcd key: NULL key_len: NULL ref: NULL rows: 4064737 Extra: Using where 1 row in set (0.06 sec)
Can you help me to understand why this index is not used for that query?
Best Answer
There are a few problems here:
First, as jkavalik says in the comments on the OP, the order of columns in an index matters. Basically, in your case for
index_rqcd
to be used for filtering onrq_date
,t_id
has to be used before it can "see" and filter onrq_date
. Since usually only one range scan on an index can be done for a query and it has to be the last part of the index used, the optimizer correctly skips using this index because it would probably have to use the entire index which cause a full scan on the index and then a full scan on the table, or in this case partition. If the table and had an index starting withrq_date
, that index would likely be used.Second, something you should be aware of when using partitions is that MySQL doesn't have global indexes for partitioned tables. What this means is that each partition has its own indexes and therefore can only be used inside that partition. Since the optimizer knows that the query only needs to use partition
p736467
based on the query and table definition (an optimization known as partition pruning), only that partition, and in your case the associated sub-partitions, are considered when running the query. So now that the optimizer is only using partitionp736467
, it can consider usingindex_rqcd
, but since it is still affected by the first problem and its use would lead to a full scan on the index, it chooses to do a full scan on the data. Even though theEXPLAIN
showsALL
for the type, it is only scanning the data in thep736467
partition, not the entire table, because of the partition pruning. Even if you add an index starting withrq_date
, it will not be used if you want to get all of the rows for the day the partition is for. However, the query you give will only return rows for exactly 1 day ago down to the second, not the whole day as you say when describing what you want to do. If you really do want to get rows for a specific time and not the whole day, and index starting withrq_date
will be useful.Probably the most important thing here is that, as mentioned by the link jkavalik posted, you shouldn't use partitions unless you have a good reason to. Time ranges can be a valid use for partitions, but usually only when used with a sliding window of chronological data, e.g. 7 days of log messages that are pruned daily. Generally unless you can use partition pruning in (almost) every query, they will hurt performance since every partition will likely need to be accessed. If you are always querying this table for rows on a specific date and then filtering on a range on some other indexed column, partitions might be useful. More likely, partitions are not the right way to go. Unless you have a very good reason to use the sub-partitions, they are almost certainly just adding overhead to every query.
My suggestion, assuming that most of your queries are based on a range of
rq_date
, is to not use partitions and add an index starting withrq_date
.Also, when using InnoDB, you should always define a primary key. I don't know about the data you're storing, but it looks like
rq_id
might be unique and a good candidate.