MySQL Partitioning: Is there a performance tradeoff between number of partitions and size of each partition

Tags: mysql, partitioning, performance

I have a large table (several hundred million rows) that I would like to partition efficiently. My question is whether there is a tradeoff between partition size and number of partitions. As far as I understand, most queries on the partitioning column will be faster because the query will (in most cases) only have to search within the partition applicable to the query. It would therefore seem that, to maximize efficiency, you should divide a large table into the maximum number of partitions, making each partition as small as possible. In MySQL, this means 1024 partitions. But is there any performance drawback to having a large number of partitions? If so, how does one find the optimum number of partitions?

Note: There is a somewhat similar question on Stack Overflow already, but it has only one answer, which (from my perspective) misses the mark. So I will state the question in my own way, and hopefully it is clearer.

Best Answer

Let's compare them

PARTITION SIZE

If you have the following:

  • 100 million rows in a table
  • BTREE indexing
  • Each Page in the BTREE holds 1024 keys

What would the metrics look like?

Since LOG(100000000)/LOG(2) = 26.575424759099, a BTREE index with 1024 keys per treenode would have a tree height of only 3 (CEILING(LOG(100000000)/LOG(1024))). With only three node accesses, and a binary search within each 1024-key node costing about 10 key comparisons (LOG(1024)/LOG(2)), a lookup examines roughly 30 keys in total.
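The arithmetic above can be sanity-checked with a few lines of code (Python here purely for illustration; the 1024-keys-per-page figure is the same assumption as above):

```python
import math

rows = 100_000_000     # total rows in the table
keys_per_page = 1024   # assumed keys per BTREE page, as above

# Total binary-search depth over all keys: LOG(rows)/LOG(2)
print(math.log2(rows))  # ≈ 26.575

# Tree height: each level narrows the search by a factor of 1024
height = math.ceil(math.log(rows) / math.log(keys_per_page))
print(height)  # 3

# Binary search inside one 1024-key node ≈ 10 comparisons,
# so a lookup touches roughly height * 10 keys
print(height * math.log2(keys_per_page))  # 30.0
```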

NUMBER OF PARTITIONS

If you have the following:

  • 100 million rows in a table
  • BTREE indexing
  • Each Page in the BTREE holds 1024 keys
  • You create 1024 partitions

The numbers would be slightly different.

Each partition should have about 97,656 rows (100 million / 1024). What would the metrics become now?

Since LOG(97656)/LOG(2) = 16.575421065795, a BTREE index with 1024 keys per treenode would have a tree height of only 2 (CEILING(LOG(97656)/LOG(1024))). With only two node accesses, and about 10 key comparisons per node, a lookup examines roughly 20 keys in total.
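The same sanity check for the partitioned case (again a Python sketch for illustration, under the same 1024-keys-per-page assumption):

```python
import math

rows = 100_000_000
keys_per_page = 1024
partitions = 1024

# Rows per partition: 100 million / 1024 ≈ 97,656
rows_per_partition = rows // partitions
print(rows_per_partition)  # 97656

# Per-partition tree height drops to 2
height = math.ceil(math.log(rows_per_partition) / math.log(keys_per_page))
print(height)  # 2

# ~10 comparisons per node, so a lookup touches roughly 20 keys
print(height * math.log2(keys_per_page))  # 20.0
```

So partition pruning saves one BTREE level per lookup, which is the entire search-time benefit being weighed in the conclusion below.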

CONCLUSION

Spreading out the keys removes just one tree level but essentially creates 1024 separate indexes. The queries won't know the difference, and any search-time improvement from partitioning would be nominal at best.

However, make sure all the data is active. Otherwise, you may be hitting just a few partitions, while other partitions holding rarely-accessed data simply take up space and are never accessed frequently enough to justify the partitioning. You may also have more blatant performance factors to worry about (such as internal defragmentation in XFS, ext3 vs ext4, etc.). Finally, you need to consider which storage engine you are using, because:

  • InnoDB indexing would be a little messier in comparison to MyISAM because InnoDB has to manage a clustered index
  • InnoDB writes data pages twice (via the doublewrite buffer in ibdata1 and then to the tablespace), in addition to writing redo entries to the current log file (ib_logfile0 or ib_logfile1)