Using version 5.5 (on which I'm stuck), I have a table I want to partition, here is the DDL:
CREATE TABLE `emp` (
`id` int(11) DEFAULT NULL,
`date1` bigint(20) DEFAULT NULL,
`gen_year` int(11) GENERATED ALWAYS AS (year(from_unixtime(`date1`))) VIRTUAL,
`gen_month` int(11) GENERATED ALWAYS AS (month(from_unixtime(`date1`))) VIRTUAL,
UNIQUE KEY `id` (`id`,`gen_year`,`gen_month`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
PARTITION BY HASH (12*gen_year+gen_month) PARTITIONS 77 ;
As you can see I want to partition by unique month (taking the year into account). I set 77 partitions because the test data I have put in there uniformly cover 77 months.
When I run the 1st query below, I get pruning just fine, only one partition gets queried.
When I run the 2nd query, I'd expect just a few partitions to be queried, but instead ALL are.
select count(*) from emp where gen_year=1984 and month=8;
select count(*) from emp where gen_year=1984 and month>8;
select count(*) from emp where gen_year=1984 and month BETWEEN 8 AND 12;
EXPLAIN PARTITIONS
gives one partition for the first query, but all 77 partitions for the others.
Please advise on what I'm missing.
Best Answer
Any range (eg
month>8
ormonth BETWEEN 8 AND 12
) leads to punting on partition pruning, hence 77 partitions are selected. That is,PARTITION BY HASH
is useless for ranges. Meanwhile, non-partitioning is excellent for the other type of queries -- "point queries". So don't bother usingBY HASH
!Where does
id
come from? Typically it is anAUTO_INCREMENT
, but apparently not for your table.can be significantly sped up by
date1
as either aTIMESTAMP
orDATETIME
. With fractional seconds if needed.INDEX(date1)
Writing the query thus:
SELECT COUNT(*) FROM EMP WHERE date1 >= '1984-08-01' AND date1 < '1984-08-01' + INTERVAL 1 MONTH
If you build and maintain a Summary table, then the query becomes 'instantaneous', and even the other query will be faster than
InnoDB really needs a
PRIMARY KEY
. Promote theUNIQUE
key to primary. (Though I have serious doubts about that index.)These are all "range" tests:
BETWEEN
,>=
,<
. All may work well with non-partitioned tables, but fail miserably onPARTITION BY HASH
tables (when the range involves the partition-key).