MySQL, “columns partitioning” when multiple columns are individually important


MySQL 5.5 introduces "columns partitioning".
http://dev.mysql.com/doc/refman/5.5/en/partitioning-columns.html

I'm trying to better understand how it works when two columns are important individually.

Take, for example, a table that holds the messages exchanged between system users. It would have "sender_id" and "receiver_id" columns, and we might want to query each of these columns on its own.

With a separate index on each of these columns, we can query either one on its own when necessary, and the results come back fast.

But what if the table grows to 100M rows and we consider partitioning it? My understanding is that multi-column partitioning compares the first column in the COLUMNS list first, and only looks at the second column when the first one ties. Here is a sample table structure:

CREATE TABLE messages (
    message_id INT,
    sender_id INT,
    receiver_id INT
)
PARTITION BY RANGE COLUMNS(sender_id, receiver_id) (
    PARTITION p0 VALUES LESS THAN (10, 10),
    PARTITION p1 VALUES LESS THAN (20, 20),
    PARTITION p3 VALUES LESS THAN (MAXVALUE, MAXVALUE)
);
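To make the "first column, then the second" behavior concrete, here is a small Python model of how a row of this table is routed to a partition. It is a sketch, not MySQL's code. It assumes RANGE COLUMNS compares the (sender_id, receiver_id) tuple against each VALUES LESS THAN bound lexicographically, which is exactly how Python compares tuples. MAXVALUE is modeled as +infinity, and `partition_for` is a hypothetical helper name.

```python
import math

# Bounds copied from the sample DDL; MAXVALUE is modeled as +infinity.
PARTITIONS = [
    ("p0", (10, 10)),
    ("p1", (20, 20)),
    ("p3", (math.inf, math.inf)),
]


def partition_for(sender_id: int, receiver_id: int) -> str:
    """Return the first partition whose bound is strictly greater than the row tuple.

    Python tuple comparison is lexicographic: receiver_id is only consulted
    when sender_id equals the bound's first value.
    """
    row = (sender_id, receiver_id)
    for name, bound in PARTITIONS:
        if row < bound:
            return name
    # Unreachable while the last bound is (MAXVALUE, MAXVALUE).
    raise ValueError(f"no partition accepts {row}")


if __name__ == "__main__":
    for row in [(5, 99), (10, 3), (10, 15), (15, 1), (25, 0)]:
        print(row, "->", partition_for(*row))
```

In this model, rows with sender_id = 10 can land in p0 or p1 depending on receiver_id. A tie on the first column is the only case where the second column decides anything.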

If we query for "WHERE receiver_id=5", partition pruning will not kick in, right? It will need to search all partitions. But if we were to search for "WHERE sender_id=5", then we would immediately know the result is in p0.
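The pruning decision can be sketched the same way. This is a simplified Python model, not MySQL's optimizer. It assumes MAXVALUE behaves like +infinity and treats receiver_id as an unbounded integer, and `partition_ranges` / `candidate_partitions` are hypothetical names.

Fixing sender_id turns the predicate into one contiguous range of (sender_id, receiver_id) tuples, which can be checked against each partition's [lower, upper) bounds. Fixing only receiver_id leaves the leading column free, so matching tuples are scattered through every partition.

```python
import math
from typing import List, Optional

INF = math.inf
# Bounds from the sample DDL; MAXVALUE modeled as +infinity.
BOUNDS = [("p0", (10, 10)), ("p1", (20, 20)), ("p3", (INF, INF))]


def partition_ranges():
    """Yield (name, lower, upper); a partition holds tuples t with lower <= t < upper."""
    lower = (-INF, -INF)
    for name, upper in BOUNDS:
        yield name, lower, upper
        lower = upper


def candidate_partitions(sender_id: Optional[int] = None,
                         receiver_id: Optional[int] = None) -> List[str]:
    """Partitions that may hold rows matching the given equality predicates."""
    names = [name for name, _, _ in partition_ranges()]
    if sender_id is None:
        # Leading column unconstrained: nothing can be pruned.
        return names
    # With sender_id fixed, matches form the tuple interval [lo, hi].
    lo = (sender_id, -INF if receiver_id is None else receiver_id)
    hi = (sender_id, INF if receiver_id is None else receiver_id)
    # Keep a partition if [lo, hi] overlaps its half-open range [lower, upper).
    return [name for name, lower, upper in partition_ranges()
            if lower <= hi and lo < upper]
```

In this model, `candidate_partitions(sender_id=5)` gives `["p0"]` and `candidate_partitions(receiver_id=5)` gives all three partitions, which matches the EXPLAIN output in the answer. `sender_id=10` keeps both p0 and p1, because receiver_id decides between them at that boundary.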

So for a table where two columns are each important on their own, partitioning might not be the best solution, since we lose the benefit of a full-table index on the second column in the COLUMNS list (receiver_id, in this case). Is that right?

Best Answer

You are right that MySQL checks only one partition for a specific sender_id, but checks all the partitions for a specific receiver_id, as shown here:

mysql> explain partitions select * from messages where sender_id =5;
+----+-------------+----------+------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table    | partitions | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | messages | p0         | ALL  | NULL          | NULL | NULL    | NULL |    2 | Using where |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

mysql> explain partitions select * from messages where receiver_id =5;
+----+-------------+----------+------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table    | partitions | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | messages | p0,p1,p3   | ALL  | NULL          | NULL | NULL    | NULL |    3 | Using where |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

However, that partitioning can still have benefits, depending on your workload. When searching all partitions for a receiver_id, MySQL is effectively running three lookups, one per partition. Don't count on those running in parallel, though: MySQL executes a single query in one thread, so the partitions are searched one after another. Additionally, if you index receiver_id, each lookup uses a smaller index, because indexes on a partitioned MySQL table are local to each partition.
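Here is a toy Python model of the "three smaller indexes" point. It assumes each partition keeps its own sorted index on receiver_id, since MySQL has no global indexes on partitioned tables. The rows and the names `build_local_indexes` / `count_receiver` are hypothetical. Each row was placed in the partition the sample table's (sender_id, receiver_id) bounds would choose.

```python
from bisect import bisect_left, bisect_right

# Hypothetical (sender_id, receiver_id) rows, grouped by partition.
PARTITION_ROWS = {
    "p0": [(1, 5), (3, 7), (9, 5)],
    "p1": [(12, 5), (15, 2)],
    "p3": [(25, 9), (30, 5)],
}


def build_local_indexes(partition_rows):
    """Build one sorted receiver_id list per partition, standing in for a local index."""
    return {name: sorted(r for _, r in rows)
            for name, rows in partition_rows.items()}


def count_receiver(local_indexes, receiver_id):
    """Count rows with receiver_id; return (matches, partitions probed).

    receiver_id alone allows no pruning, so every partition's index is probed.
    """
    matches = 0
    probes = 0
    for keys in local_indexes.values():
        probes += 1
        matches += bisect_right(keys, receiver_id) - bisect_left(keys, receiver_id)
    return matches, probes


if __name__ == "__main__":
    idx = build_local_indexes(PARTITION_ROWS)
    print(count_receiver(idx, 5))  # (4, 3)
```

Each probe searches a smaller structure, but there is one probe per partition. Whether that beats a single probe into one larger index is what benchmarking has to settle.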

In the end, you just have to do performance testing and see whether it pays off for your use case. Seeing as 100M rows of three INT columns is only about 1.2 GB of raw data (more once row overhead and indexes are added), which fits in RAM on many servers these days, I wouldn't consider partitioning a table this size unless you have specific reasons to do so.
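The sizing argument is simple arithmetic. The sketch below computes a lower bound only: 4 bytes per MySQL INT, times three columns, times 100M rows. It deliberately ignores InnoDB per-row overhead, page fill and secondary indexes, all of which make the real footprint larger. `raw_column_bytes` is a hypothetical helper.

```python
def raw_column_bytes(rows: int, int_columns: int, bytes_per_int: int = 4) -> int:
    """Bytes of raw column data only: no row headers, page overhead or indexes."""
    return rows * int_columns * bytes_per_int


if __name__ == "__main__":
    size = raw_column_bytes(100_000_000, 3)
    print(size)                       # 1200000000
    print(f"{size / 2**30:.2f} GiB")  # 1.12 GiB
```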