MySQL – How Table Partitioning Helps Performance

I am having difficulty grasping the pros and cons of table partitioning. I am about to start work on a project with 8 tables; one of them, the main data table, will hold 180-260 million records. The table will be properly indexed, so I am thinking of limiting each table to 20 million records, which means I would have to create 9-13 tables.

But I am not sure how this would improve performance, since all the tables will sit on the same machine (32GB RAM).

I am using MySQL and the tables will be MyISAM. The big table will have an index on the id field, and there are no further complexities such as full-text search.

Please also shed light on table partitioning vs database partitioning.

Best Answer

The following is just insane ranting and raving...

If you leave all data in one table (no partitioning), you will have O(log n) search times using a key. Let's take the worst index in the world, the binary tree. Each tree node has exactly one key. A perfectly balanced binary tree with 268,435,455 (2^28 - 1) tree nodes would have a height of 28. If you split up this binary tree into 16 separate trees, you get 16 binary trees each with 16,777,215 (2^24 - 1) tree nodes for a height of 24. The search path is reduced by 4 nodes, a 14.2857 % height reduction. If the search time is in microseconds, a 14.2857 % reduction in search time is nil-to-negligible.
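To make the arithmetic above easy to check, here is a minimal Python sketch. The helper name `perfect_tree_height` is my own, and it assumes a perfectly balanced binary tree with exactly 2^h - 1 nodes, as in the paragraph above.

```python
def perfect_tree_height(node_count):
    # A perfect binary tree with 2**h - 1 nodes has exactly h levels.
    return (node_count + 1).bit_length() - 1

full = perfect_tree_height(2**28 - 1)    # one tree holding everything
split = perfect_tree_height(2**24 - 1)   # each of the 16 smaller trees
reduction_pct = (full - split) / full * 100

print(full, split, round(reduction_pct, 4))
```

Four levels saved out of 28 is the whole gain, and that is with the least favorable index structure.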

Now in the real world, a BTREE index would have tree nodes with multiple keys. Each BTREE search would perform binary searching within the page with a possible descent into another page. For example, if each BTREE page contained 1024 keys, a tree height of 3 or 4 would be the norm, a short tree height indeed.

Notice that partitioning a table does not reduce the height of the BTREE, which is already small. Given a partitioning of 260 million rows, there is even the strong likelihood of having multiple BTREEs with the same height. Searching for a key may pass through all root BTREE pages every time. Only one will fulfill the path of the needed search range.
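As a rough check on the point about BTREE height, the sketch below compares the whole table with a single 20-million-row partition. It is a simplified model, not how MyISAM actually lays out its index pages: it assumes every page is full and holds 1024 keys, and the function name `btree_height` is hypothetical.

```python
def btree_height(rows, keys_per_page=1024):
    # Count levels until the tree's capacity (fanout ** height) covers all rows.
    # Assumes full pages with keys_per_page entries each (a simplification).
    height, capacity = 1, keys_per_page
    while capacity < rows:
        capacity *= keys_per_page
        height += 1
    return height

whole_table = btree_height(260_000_000)    # everything in one table
one_partition = btree_height(20_000_000)   # one of 13 partitions

print(whole_table, one_partition)
```

Under these assumptions both trees come out at the same height, so a key lookup descends through the same number of pages either way.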

Now expand on this. All the partitions exist on the same machine. If you do not have separate disks for each partition, you will have disk I/O and spindle rotations as an automatic bottleneck outside of partition search performance.

In this case, partitioning by database does not buy you anything either if id is the only search key being utilized.

Partitioning of data should serve to group data that are logically and cohesively in the same class. Performance of searching each partition need not be the main consideration as long as the data is correctly grouped. Once you have achieved the logical partitioning, then concentrate on search time. If you are just separating data by id only, it is possible that many rows of data may never be accessed for reads or writes. Now, that should be a major consideration: Locate all ids most frequently accessed and partition by that. All less frequently accessed ids should reside in one big archive table that is still accessible by index lookup for that 'once in a blue moon' query.

The overall impact should be to have at least two partitions: one for frequently accessed ids, and the other partition for the rest of the ids. If the set of frequently accessed ids is fairly large, you could optionally partition that as well.
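One way to find the frequently accessed ids is to count them in an access log. The sketch below is only an illustration under my own assumptions: the log is a plain list of ids, the function name `split_hot_cold` is hypothetical, and the 80 percent cutoff is an arbitrary choice rather than a recommendation.

```python
from collections import Counter

def split_hot_cold(access_log, hot_percent=80):
    # Collect the most-accessed ids until they cover hot_percent of all accesses.
    counts = Counter(access_log)
    total = sum(counts.values())
    hot, covered = set(), 0
    for row_id, hits in counts.most_common():
        if covered * 100 >= hot_percent * total:
            break
        hot.add(row_id)
        covered += hits
    return hot

# Hypothetical access log: id 1 read four times, id 2 three times, the rest once.
log = [1, 1, 1, 1, 2, 2, 2, 3, 4, 5]
hot_ids = split_hot_cold(log)
print(sorted(hot_ids))
```

Ids in the returned set would go to the frequently accessed partition; everything else would go to the archive table.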

Related Solutions

MySQL Partitioning

Personally I'd use the date as your partition function and partition by a hash of the year and month, maybe splitting the data into 48 or more partitions. I've done this on some large volume databases and had good results.

ALTER TABLE `your_table` 
PARTITION BY HASH(YEAR(`date_field`)*12 + MONTH(`date_field`)) 
  PARTITIONS 48;

This should create a nice distributed set of data across 48 partitions (you may need to fiddle with the calculation on the date to get it quite right for your needs).

I built a model in Excel, with all the dates down one column and the partition function in a second column showing which partition each date would land in. You can then chart the frequency of the second column to see how the data is distributed across the partitions - a really useful way of tinkering with your function before you alter your table!
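The same modelling can be done in a few lines of Python instead of a spreadsheet. This is a sketch under stated assumptions: the sample data (one row per day from 2020 through 2023) is made up, `partition_for` is my own helper name, and it relies on MySQL's HASH partitioning choosing the partition as MOD(expr, number of partitions).

```python
from collections import Counter
from datetime import date, timedelta

PARTITIONS = 48

def partition_for(d):
    # Mirrors HASH(YEAR(d)*12 + MONTH(d)) PARTITIONS 48, i.e. MOD(expr, 48).
    return (d.year * 12 + d.month) % PARTITIONS

# Hypothetical sample data: one row per day across four full years.
start = date(2020, 1, 1)
end = date(2024, 1, 1)
days = [start + timedelta(days=i) for i in range((end - start).days)]

counts = Counter(partition_for(d) for d in days)

# Print rows per partition; feed real dates in to see your own distribution.
for p in sorted(counts):
    print(p, counts[p])
```

With evenly spread dates each month lands in its own partition. Real data is rarely that even, so running your actual date column through `partition_for` shows where the skew is before you alter the table.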

Hope that helps...

MySQL – Partitioning MySQL vs PostgreSQL

Ok, PostgreSQL does not directly support foreign keys and partitions. You can create a foreign key on each partition, but there is no way to have a foreign key against a set of partitions without either coding the constraint triggers yourself or using a key management table.

There are several ways to resolve this, but they require some understanding of underlying db implementation concepts to make them perform well.

The first is to do your referential integrity enforcement manually through triggers. You can write constraint triggers which can be deferred (if needed) that check the existence of foreign key information. Note this is subject to ALTER TABLE DISABLE TRIGGER ALL issues but it should generally work.

The second is to have a separate table which stores keys and is maintained through a combination of triggers and bulk data operations. This can help a great deal.

So it is possible to enforce things but it requires a bit more work.

I don't know enough about MySQL's table partitioning capabilities to comment on it though.
