Mysql – Question on MySQL Partitioning

MySQLmysql-5.5partitioning

Using version 5.5 (on which I'm stuck), I have a table I want to partition, here is the DDL:

CREATE TABLE `emp` (
  `id` int(11) DEFAULT NULL,
  `date1` bigint(20) DEFAULT NULL,
  `gen_year` int(11) GENERATED ALWAYS AS (year(from_unixtime(`date1`))) VIRTUAL,
  `gen_month` int(11) GENERATED ALWAYS AS (month(from_unixtime(`date1`))) VIRTUAL,
  UNIQUE KEY `id` (`id`,`gen_year`,`gen_month`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
PARTITION BY HASH (12*gen_year+gen_month) PARTITIONS 77 ;

As you can see I want to partition by unique month (taking the year into account). I set 77 partitions because the test data I have put in there uniformly cover 77 months.

When I run the 1st query below, I get pruning just fine, only one partition gets queried.
When I run the 2nd query, I'd expect just a few partitions to be queried, but instead ALL are.

  1. select count(*) from emp where gen_year=1984 and month=8;
  2. select count(*) from emp where gen_year=1984 and month>8;
  3. select count(*) from emp where gen_year=1984 and month BETWEEN 8 AND 12;

EXPLAIN PARTITIONS gives one partition for the first query, but all 77 partitions for the others.

Please advise on what I'm missing.

Best Answer

Any range (eg month>8 or month BETWEEN 8 AND 12) leads to punting on partition pruning, hence 77 partitions are selected. That is, PARTITION BY HASH is useless for ranges. Meanwhile, non-partitioning is excellent for the other type of queries -- "point queries". So don't bother using BY HASH!

Where does id come from? Typically it is an AUTO_INCREMENT, but apparently not for your table.

select count(*) from emp where gen_year=1984 and month=8;

can be significantly sped up by

  • Storing date1 as either a TIMESTAMP or DATETIME. With fractional seconds if needed.
  • Adding INDEX(date1)
  • Writing the query thus:

    SELECT COUNT(*) FROM EMP WHERE date1 >= '1984-08-01' AND date1 < '1984-08-01' + INTERVAL 1 MONTH

If you build and maintain a Summary table, then the query becomes 'instantaneous', and even the other query will be faster than

     WHERE date1 >= '1984-09-01'
       AND date1  < '1985-01-01'

InnoDB really needs a PRIMARY KEY. Promote the UNIQUE key to primary. (Though I have serious doubts about that index.)

These are all "range" tests: BETWEEN, >=, <. All may work well with non-partitioned tables, but fail miserably on PARTITION BY HASH tables (when the range involves the partition-key).