I have a table with a year
column that has consecutive values — 2000
, 2001
, and so on. It's a read-heavy MyISAM table, with 25+ million rows for every value of year
and a comparable set inserted in bulk approximately annually.
Most of our queries include a WHERE year = N
condition. In a few cases we do a self-join ON T1.year = T2.year - 1
or similar. So I'd like to explore partitioning by the year to see if we get any performance benefit. But since I don't store a full date, it's not exactly a range, and since I want on year per partition, it's not exactly a list.
So, assuming that I have a table definition that starts like this:
CREATE TABLE foo (
id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, year SMALLINT UNSIGNED NOT NULL
, ...
) ENGINE=MyISAM
(Note: I'm leaving the bad PRIMARY KEY
in this example so the answer below that addresses it will still make sense – but it's tangential to the real question, and not a problem for my actual schema. For the purposes of answering, please assume PRIMARY KEY (id, year)
is defined instead.)
Assuming every value of year
that might end up in the table is accounted for in one explicit partition (no years < 2001 and no LESS THAN MAXVALUE
partition for the RANGE
case), would the following definitions be equivalent? If not, what's the difference?
-- Definition 1
PARTITION BY RANGE (year) (
PARTITION p0 VALUES LESS THAN (2001)
, PARTITION p1 VALUES LESS THAN (2002)
, ...
);
-- Definition 2
PARTITION BY LIST (year) (
PARTITION p0 VALUES IN (2000)
, PARTITION p1 VALUES IN (2001)
, ...
);
Best Answer
The 2 will be equivalently impossible
;-)
. If you try that, you will get the following error:Which is one of the biggest limitations of partitioning. Any unique key must contain the columns of the partitioning function. So either:
Independently of that, I can see at least one big difference (and the reason why I would recommend you to use LIST) which is that even if your application cannot insert lower values, it is not restricted at database level using the range. Not only that could potentially break our data consistency in case of an application error, it would be advantageous if you had a query like this:
That also has a potential impact on performance, if you run
EXPLAIN PARTITIONS
on that query with theRANGE
partitioning, the full first partition would be scanned (using an index or not); in the second case, partition pruning can automatically return the empty set. In practice you may not see a big difference if you do things correctly (the partitions will be physically equivalent), but I cannot see any case whereRANGE
is better, while I can forLIST
.Addendum: If you are thinking about changing the physical structure of the table, and you are running a recent version of MySQL (5.5, 5.6), you should think about changing to InnoDB. I am not saying that you should, there are reasons to maintain the format, but it is something that you may want to ask yourself, as the table will have to be recreated anyway for partitioning.