MySQL – Difference Between RANGE and LIST Partitioning Methods

MySQLpartitioning

I have a table with a year column that has consecutive values — 2000, 2001, and so on. It's a read-heavy MyISAM table, with 25+ million rows for every value of year and a comparable set inserted in bulk approximately annually.

Most of our queries include a WHERE year = N condition. In a few cases we do a self-join ON T1.year = T2.year - 1 or similar. So I'd like to explore partitioning by the year to see if we get any performance benefit. But since I don't store a full date, it's not exactly a range, and since I want on year per partition, it's not exactly a list.

So, assuming that I have a table definition that starts like this:

CREATE TABLE foo (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, year SMALLINT UNSIGNED NOT NULL
, ...
) ENGINE=MyISAM

(Note: I'm leaving the bad PRIMARY KEY in this example so the answer below that addresses it will still make sense – but it's tangential to the real question, and not a problem for my actual schema. For the purposes of answering, please assume PRIMARY KEY (id, year) is defined instead.)

Assuming every value of year that might end up in the table is accounted for in one explicit partition (no years < 2001 and no LESS THAN MAXVALUE partition for the RANGE case), would the following definitions be equivalent? If not, what's the difference?

-- Definition 1

PARTITION BY RANGE (year) (
  PARTITION p0 VALUES LESS THAN (2001)
, PARTITION p1 VALUES LESS THAN (2002)
, ...
);


-- Definition 2

PARTITION BY LIST (year) (
  PARTITION p0 VALUES IN (2000)
, PARTITION p1 VALUES IN (2001)
, ...
);

Best Answer

The 2 will be equivalently impossible ;-). If you try that, you will get the following error:

ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the table's partitioning function

Which is one of the biggest limitations of partitioning. Any unique key must contain the columns of the partitioning function. So either:

  • Drop the primary key (not usually a good idea)
  • Add the year to the primary key (so you can have duplicated ids)
  • Create your own partitioning (forcing you to maintain yourself the constraints and making the queries a bit more difficult)

Independently of that, I can see at least one big difference (and the reason why I would recommend you to use LIST) which is that even if your application cannot insert lower values, it is not restricted at database level using the range. Not only that could potentially break our data consistency in case of an application error, it would be advantageous if you had a query like this:

SELECT * FROM foo WHERE `year` = 1999;

That also has a potential impact on performance, if you run EXPLAIN PARTITIONS on that query with the RANGE partitioning, the full first partition would be scanned (using an index or not); in the second case, partition pruning can automatically return the empty set. In practice you may not see a big difference if you do things correctly (the partitions will be physically equivalent), but I cannot see any case where RANGE is better, while I can for LIST.

Addendum: If you are thinking about changing the physical structure of the table, and you are running a recent version of MySQL (5.5, 5.6), you should think about changing to InnoDB. I am not saying that you should, there are reasons to maintain the format, but it is something that you may want to ask yourself, as the table will have to be recreated anyway for partitioning.