MySQL – How Table Partitioning Helps Performance

database-designmyisamMySQLpartitioningperformance

I am having difficulty to grab the idea of pros and cons of table partitioning. I am about to start work on a project which would have 8 tables and one of them will be the main data table which will hold 180-260 million records. As it will be properly indexed table, so I am thinking of limiting the table records to 20 million this way I would have to create 9-13 tables.

But I am not quite sure about how it will improve the performance because they will be sitting on same machine (32GB RAM)?

I am using MySQL and tables would be MyISAM and big table would have index on id field and there are not further complexities like full text search etc.

Please also shed light on table partitioning vs database partitioning.

Best Answer

The following is just insane ranting and raving...

If you leave all data in one table (no partitioning), you will have O(log n) search times using a key. Let's take the worst index in the world, the binary tree. Each tree node has exactly one key. A perfectly balanced binary tree with 268,435,455 (2^28 - 1) tree nodes would be a height of 28. If you split up this binary tree into 16 separate trees, you get 16 binary trees each with 16,777,215 (2^24 - 1) tree nodes for a height of 24. The search path is reduced by 4 nodes, a 14.2857 % height reduction. If the search time is in microseconds, a 14.2857 % reduction in search time is nil-to-negligible.

Now in the real world, a BTREE index would have treenodes with multiple keys. Each BTREE search would perform binary searching within the page with a possible decent into another page. For example, if each BTREE page contained 1024 keys, a tree height of 3 or 4 would be the norm, a short tree height indeed.

Notice that a partitiioning of a table does not reduce the height of the BTREE which is already small. Given a partitioning of 260 milliion rows, there is even the strong likelihood of having multiple BTREEs with the same height. Searching for a key may pass through all root BTREE pages every time. Only one will fulfill the path of the needed search range.

Now expand on this. All the partitions exist on the same machine. If you do not have separate disks for each partition, you will have disk I/O and spindle rotations as an automatic bottleneck outside of partition search performance.

In this case, paritioning by database does not buy you anything either if id is the only search key being utitlized.

Partitioning of data should serve to group data that are logically and cohesively in the same class. Performance of searching each partition need not be the main consideration as long as the data is correctly grouped. Once you have achieved the logical partitioning, then concentrate on search time. If you are just separating data by id only, it is possible that many rows of data may never be accessed for reads or writes. Now, that should be a major consideration: Locate all ids most frequently accessed and partition by that. All less frequently accessed ids should reside in one big archive table that is still accessible by index lookup for that 'once in a blue moon' query.

The overall impact should be to have at least two partitions: One for frequently accessed ids, and the other paritiion for the rest of the ids. If the frequently accessed ids is fairly large, you could optionally partition that.