MySQL partitioned tables

MySQLpartitioning

I have a database that supports a web application with several large tables. I'm wondering if partitioned tables will help speed up certain queries. Each of these tables has a colum called client_id. Data for each client_id is independent from every other client_id. In other words, web queries will always contain a where clause with a single client_id. I'm thinking this may be a good column on which to partition my large tables.

After reading up on partitioned tables, I'm still a little unsure as to how best to partition. For example, a typical table may have 50 million rows distributed more or less evenly across 35 client_ids. We add new client_ids periodically but in the short term the number of client_ids is relatively fixed.

I was thinking something along these lines:

CREATE TABLE foo (
    id INT NOT NULL PRIMARY KEY,
    ... more column defs here...
    client_id int    
)
PARTITION BY KEY(client_id)
PARTITIONS 35;

My question. Is this an optimal strategy for partitioning these types of tables? My tests indicate a considerable speedup over indexing on client_id, but can I do better with some other form of partitioning (i.e. hash or range)?

Best Answer

Partitioning could quite possibly help you in this situation. By putting the data into a partition, the database will have a smaller table for each client to deal with. This could help performance a lot.

Here is an example of how it would help. The data in the table would otherwise be interleaved, so different clients would have rows on a data page. Reading the data from a single client would then "clutter up" the available page cache with records from other clients.

However, it will not help very much if all the clients are accessing data concurrently, so all are competing for available memory. Partitioning probably will not make this worse, but it won't necessarily help.

Also, MySQL automatically partitions the indexes on the table, which is generally a good thing.

Often partitioning is applied to a date field, so older data gets put into less used partitions. This may be a case where another field is useful. But, if all the clients need all their data at the same time, then the partitioning may not give you much of a gain.