Personally, I'd use the date for your partition function and partition by a hash of the year and month, maybe splitting the data into 48 or more partitions. I've done this on some large-volume databases and had good results.
ALTER TABLE `your_table`
PARTITION BY HASH(YEAR(`date_field`)*12 + MONTH(`date_field`))
PARTITIONS 48;
This should create a nicely distributed set of data across 48 partitions (you may need to fiddle with the date calculation to get it quite right for your needs).
I built a model in Excel with all the dates down one column and the partition function in the second, showing which partition each date would fall into. You can then chart the frequency of the second column to see how the data is spread across the partitions; a really useful way of tinkering with your function before you alter your table!
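The same model can be sketched in a few lines of Python instead of Excel (the four-year date range and the 48-partition count here are just assumptions for illustration), counting how many days land in each partition:

```python
from collections import Counter
from datetime import date, timedelta

def partition_for(d, num_partitions=48):
    # Mirrors HASH(YEAR(date_field)*12 + MONTH(date_field)) PARTITIONS 48:
    # MySQL HASH partitioning takes the expression modulo the partition count.
    return (d.year * 12 + d.month) % num_partitions

# Tally how many days of a sample four-year range fall in each partition.
counts = Counter()
d = date(2020, 1, 1)
while d < date(2024, 1, 1):
    counts[partition_for(d)] += 1
    d += timedelta(days=1)

for p in sorted(counts):
    print(f"partition {p:2d}: {counts[p]} days")
```

With 48 consecutive months and 48 partitions, each month maps to its own partition, so the counts simply track month lengths; shrink the partition count or widen the date range to see how values collide.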
Hope that helps...
First, you should consider solving the problem in another way.
Upgrade to MySQL 5.6, where OPTIMIZE TABLE works without blocking (for an InnoDB table), as it is supported by InnoDB Online DDL.
If you can't upgrade, try using Percona Toolkit's pt-online-schema-change, which can perform the table rebuild without blocking.
$ pt-online-schema-change h=localhost,D=mydatabase,t=mytable \
    --alter="ENGINE=InnoDB" --execute
If you're stuck on using partitioning, then yes, you must make id the partition key in the table you show. You can convert the table to partitioning with ALTER TABLE. If you need the conversion operation to be non-blocking, use pt-online-schema-change.
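A minimal sketch of such a conversion (the partition boundaries here are placeholders; pick them based on your id range). Note that this rebuilds the table, which is why you'd run it through pt-online-schema-change if it must not block:

```sql
ALTER TABLE mytable
PARTITION BY RANGE (id) (
    PARTITION p0 VALUES LESS THAN (1000),
    PARTITION p1 VALUES LESS THAN MAXVALUE
);
```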
There's no way to partition into fixed-size partitions; you have to partition by values. But is it really that important to hit a specific size per partition?
Re your comment about partition size:
When using RANGE partitioning, what I do is set up a schedule to ALTER TABLE and split the last partition from time to time. If you have a regular rate of growth, this is easy; if your growth is irregular, you might instead set up a periodic check that examines the number of rows per partition (query INFORMATION_SCHEMA.PARTITIONS) and emails you if a partition is getting full.
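Such a check might query something like the following (the schema and table names are placeholders; note that TABLE_ROWS is an estimate for InnoDB tables, which is fine for a "getting full" alert):

```sql
SELECT PARTITION_NAME, TABLE_ROWS
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = 'mydatabase'
  AND TABLE_NAME = 'mytable'
ORDER BY PARTITION_ORDINAL_POSITION;
```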
For example, let's set up a table partitioned by RANGE on id.
CREATE TABLE `mytable` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`transactionid` int(11) NOT NULL,
`parent` int(11) NOT NULL,
`headers` longtext,
`creator` int(11) NOT NULL,
`created` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id` (`id`),
KEY `transactionid` (`transactionid`,`parent`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50100 PARTITION BY RANGE (id)
(PARTITION p0 VALUES LESS THAN (0) ENGINE = InnoDB,
PARTITION p1 VALUES LESS THAN (1000) ENGINE = InnoDB,
PARTITION p2 VALUES LESS THAN (2000) ENGINE = InnoDB,
PARTITION p3 VALUES LESS THAN (3000) ENGINE = InnoDB,
PARTITION p4 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */
As MAX(id) approaches 3000, the table is getting close to filling up p3 and spilling over into p4. So it's time to reorganize. It's best to do this before any data spills over into p4, because then the reorg will affect only the last, empty partition and will therefore be very quick.
ALTER TABLE mytable REORGANIZE PARTITION p4 INTO
(PARTITION p4 VALUES LESS THAN (4000), PARTITION p5 VALUES LESS THAN MAXVALUE);
Even if you miss a day and some data gets into the old p4, chances are it's not much data. But if you neglect this for a month or two and p4 fills up with a lot of data, then the REORGANIZE will take longer.
Best Answer
Partitioning usually decreases ALTER TABLE speed rather than improving it. Index and data are stored in separate files for each partition (a separate .ibd file for InnoDB if innodb_file_per_table=1), so you might be able to run alters in parallel. But partitioning adds an extra processing step to choose the partition for each row and index entry, so the whole operation ends up slower.

As for the shared lock for ALTER TABLE: all partitions are locked at the same time, so partitioning does not help.

Hints: