MySQL – Difference Between RANGE and LIST Partitioning Methods

MySQL, partitioning

I have a table with a year column that has consecutive values — 2000, 2001, and so on. It's a read-heavy MyISAM table, with 25+ million rows for every value of year and a comparable set inserted in bulk approximately annually.

Most of our queries include a WHERE year = N condition. In a few cases we do a self-join ON T1.year = T2.year - 1 or similar. So I'd like to explore partitioning by the year to see if we get any performance benefit. But since I don't store a full date, it's not exactly a range, and since I want one year per partition, it's not exactly a list.

So, assuming that I have a table definition that starts like this:

CREATE TABLE foo (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, year SMALLINT UNSIGNED NOT NULL
, ...
) ENGINE=MyISAM

(Note: I'm leaving the bad PRIMARY KEY in this example so the answer below that addresses it will still make sense – but it's tangential to the real question, and not a problem for my actual schema. For the purposes of answering, please assume PRIMARY KEY (id, year) is defined instead.)

Assuming every value of year that might end up in the table is accounted for in one explicit partition (no years < 2000 and no LESS THAN MAXVALUE partition for the RANGE case), would the following definitions be equivalent? If not, what's the difference?

-- Definition 1

PARTITION BY RANGE (year) (
  PARTITION p0 VALUES LESS THAN (2001)
, PARTITION p1 VALUES LESS THAN (2002)
, ...
);


-- Definition 2

PARTITION BY LIST (year) (
  PARTITION p0 VALUES IN (2000)
, PARTITION p1 VALUES IN (2001)
, ...
);

Best Answer

The 2 will be equivalently impossible ;-). If you try that, you will get the following error:

ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the table's partitioning function

This is one of the biggest limitations of partitioning: every unique key, including the primary key, must contain all the columns used in the partitioning function. So you can either:

Drop the primary key (not usually a good idea)
Add the year to the primary key (so you can have duplicated ids)
Create your own partitioning (which forces you to maintain the constraints yourself and makes the queries a bit more difficult)
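
As a rough sketch of the second option, the table from the question could be declared as follows. Only the two columns shown in the question are included (the elided ones are left out), and the partition list is cut to two entries for brevity. ENGINE=MyISAM matches the question and assumes MySQL 5.x, since MySQL 8.0 no longer supports partitioned MyISAM tables.

```sql
-- Sketch only: the question's remaining columns are omitted.
CREATE TABLE foo (
  id     INT UNSIGNED      NOT NULL AUTO_INCREMENT
, `year` SMALLINT UNSIGNED NOT NULL
, PRIMARY KEY (id, `year`)  -- the partitioning column is now part of every unique key
) ENGINE=MyISAM
PARTITION BY LIST (`year`) (
  PARTITION p0 VALUES IN (2000)
, PARTITION p1 VALUES IN (2001)
);
```

Note that with this key the database only guarantees that the pair (id, year) is unique; id on its own is no longer enforced as unique.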

Independently of that, I can see at least one big difference, and it is the reason I would recommend LIST: with RANGE, even if your application never inserts lower values, nothing at the database level prevents it. An application error could therefore break your data consistency without anyone noticing. LIST is also advantageous if you have a query like this:

SELECT * FROM foo WHERE `year` = 1999;

That also has a potential impact on performance. If you run EXPLAIN PARTITIONS on that query with RANGE partitioning, the full first partition would be scanned (using an index or not), because every value below 2001 maps to it; with LIST, partition pruning can return the empty set right away. In practice you may not see a big difference if you do things correctly (the partitions will be physically equivalent), but I cannot see any case where RANGE is better, while I can for LIST.
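
To see the difference for yourself, something along these lines should work. The table names foo_range and foo_list are made up for the comparison, and the tables are reduced to the two columns from the question. On MySQL 5.7 the PARTITIONS keyword is deprecated and in 8.0 it is removed, because plain EXPLAIN already shows a partitions column.

```sql
-- Hypothetical comparison tables; names and the reduced column set are assumptions.
CREATE TABLE foo_range (
  id     INT UNSIGNED      NOT NULL AUTO_INCREMENT,
  `year` SMALLINT UNSIGNED NOT NULL,
  PRIMARY KEY (id, `year`)
)
PARTITION BY RANGE (`year`) (
  PARTITION p0 VALUES LESS THAN (2001),
  PARTITION p1 VALUES LESS THAN (2002)
);

CREATE TABLE foo_list (
  id     INT UNSIGNED      NOT NULL AUTO_INCREMENT,
  `year` SMALLINT UNSIGNED NOT NULL,
  PRIMARY KEY (id, `year`)
)
PARTITION BY LIST (`year`) (
  PARTITION p0 VALUES IN (2000),
  PARTITION p1 VALUES IN (2001)
);

-- RANGE accepts the stray value and stores it in p0 (1999 < 2001).
INSERT INTO foo_range (`year`) VALUES (1999);

-- LIST rejects it with ERROR 1526: Table has no partition for value 1999.
INSERT INTO foo_list (`year`) VALUES (1999);

-- Compare the partitions column of the two plans.
EXPLAIN PARTITIONS SELECT * FROM foo_range WHERE `year` = 1999;
EXPLAIN PARTITIONS SELECT * FROM foo_list  WHERE `year` = 1999;
```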

Addendum: If you are thinking about changing the physical structure of the table, and you are running a recent version of MySQL (5.5, 5.6), you should think about changing to InnoDB. I am not saying that you must; there can be good reasons to keep the current format. But it is worth asking yourself the question, as the table will have to be recreated anyway for partitioning.

Related Solutions

MySQL – Partitioning Large Table

First, you should consider solving the problem in another way.

Upgrade to MySQL 5.6, where OPTIMIZE TABLE works without blocking (for an InnoDB table), as it is supported by InnoDB Online DDL.
If you can't upgrade, try using Percona Toolkit's pt-online-schema-change, which can perform the table rebuild without blocking.
```
$ pt-online-schema-change --alter "ENGINE=InnoDB" --execute \
    h=localhost,D=mydatabase,t=mytable
```
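
For the first option, a minimal sketch of the rebuild on MySQL 5.6 might look like this. It assumes mytable is already an InnoDB table and that the server is at least 5.6.17, the release where OPTIMIZE TABLE started using online DDL for InnoDB.

```sql
-- On InnoDB, OPTIMIZE TABLE is mapped to a table rebuild.
OPTIMIZE TABLE mytable;

-- The same rebuild, with the online behaviour requested explicitly.
-- The statement fails rather than silently blocking if it cannot run in place.
ALTER TABLE mytable FORCE, ALGORITHM=INPLACE, LOCK=NONE;
```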

If you're set on using partitioning, then yes, you must make id the partition key in the table you show. You can convert the table to partitioning with ALTER TABLE. If you need the conversion operation to be non-blocking, use pt-online-schema-change.
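
A plain conversion could be sketched like this. The boundary values are purely illustrative and should be chosen from your own id distribution. Run directly, the statement rebuilds the table; to avoid blocking, the same PARTITION BY clause can be passed to pt-online-schema-change through --alter.

```sql
-- Illustrative boundaries only; pick values that fit your data.
ALTER TABLE mytable
PARTITION BY RANGE (id) (
  PARTITION p1 VALUES LESS THAN (1000),
  PARTITION p2 VALUES LESS THAN (2000),
  PARTITION p3 VALUES LESS THAN MAXVALUE
);
```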

There's no way to partition to fixed-size partitions. You have to partition by values. But is it really that important to hit a specific size per partition?

Re your comment about partition size:

When using RANGE partitioning, what I do is set up a schedule to ALTER TABLE and split the last partition from time to time. If you have a regular rate of growth, this is easy, but if you have irregular patterns of growth, you might instead set up a periodic check that examines the number of rows per partition (query INFORMATION_SCHEMA.PARTITIONS) and emails you if the last partition is getting full.
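
Such a check could be built on a query like the one below. The schema name mydatabase is taken from the pt-online-schema-change example above and is an assumption about your setup. For InnoDB, TABLE_ROWS is only an estimate, which is usually good enough for an alert.

```sql
-- Approximate row count per partition, in partition order.
SELECT PARTITION_NAME,
       PARTITION_DESCRIPTION,   -- the LESS THAN boundary of each partition
       TABLE_ROWS               -- an estimate for InnoDB tables
FROM   INFORMATION_SCHEMA.PARTITIONS
WHERE  TABLE_SCHEMA = 'mydatabase'
  AND  TABLE_NAME   = 'mytable'
ORDER  BY PARTITION_ORDINAL_POSITION;
```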

For example, let's set up a table partitioned by range on id.

CREATE TABLE `mytable` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `transactionid` int(11) NOT NULL,
  `parent` int(11) NOT NULL,
  `headers` longtext,
  `creator` int(11) NOT NULL,
  `created` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `id` (`id`),
  KEY `transactionid` (`transactionid`,`parent`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50100 PARTITION BY RANGE (id)
(PARTITION p0 VALUES LESS THAN (0) ENGINE = InnoDB,
 PARTITION p1 VALUES LESS THAN (1000) ENGINE = InnoDB,
 PARTITION p2 VALUES LESS THAN (2000) ENGINE = InnoDB,
 PARTITION p3 VALUES LESS THAN (3000) ENGINE = InnoDB,
 PARTITION p4 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */

As the MAX(id) approaches 3000, it's getting close to filling up p3 and spilling over into p4. So it's time to reorganize. It's good to do this before any data spills over into p4, because the reorg will affect only the last, empty partition and will therefore be very quick.

ALTER TABLE mytable REORGANIZE PARTITION p4 INTO 
(PARTITION p4 VALUES LESS THAN (4000), PARTITION p5 VALUES LESS THAN MAXVALUE);

Even if you miss a day and you get some data into the old p4, chances are it's not much data. But if you neglect this for a month or two, and p4 fills up with a lot of data, then the REORGANIZE will take longer.

MySQL – Should a Multi-Column UNIQUE Index Be Created?

The good thing about unique indexes is that the search stops at the first match, but that requires the WHERE clause to match the index columns exactly. In your case the index will be big. If you are lucky, the value might be found quickly in the B-tree; otherwise, it might need to scan almost the entire index.
