MySQL – Is searching an alternate index on a partitioned MyISAM table concurrent or serial?

index, myisam, mysql, partitioning

I have a MyISAM table with 30 partitions. The primary key is a SHA1 hash value, which is also used in the partition selection function. I also have an alternate key: an auto-increment integer. Since the alternate key does not include the value used in the partition selection function, selecting on the alternate index can't use "partition pruning". Therefore, a select on the alternate key must search all 30 partitions. This much makes sense.

I was recently told that such a search is carried out serially — on one partition after another. I find this hard to believe, since it sounds like something that could easily be done concurrently. Can anyone confirm this behavior?

MySQL 5.6.14

Best Answer

I agree that MySQL has to search all 30 partitions. Since you are searching by an auto_increment column, it must be indexed, because MySQL requires an AUTO_INCREMENT column to be defined as a key.

Since indexes are BTREEs, I would expect O(log n) search time per partition.
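For reference, a table matching the description in the question might look like the sketch below. The names my_partitioned, sha1_hash, foo_ak, and bar are assumptions (the question does not give the schema), as is the choice of PARTITION BY KEY, which is one way to partition on a string column.

```sql
-- Hypothetical schema; column names and PARTITION BY KEY are assumptions.
CREATE TABLE my_partitioned (
    sha1_hash CHAR(40) NOT NULL,               -- SHA1 value as 40 hex characters
    foo_ak    INT UNSIGNED NOT NULL AUTO_INCREMENT,
    bar       VARCHAR(255),
    PRIMARY KEY (sha1_hash),                   -- includes the partitioning column
    KEY idx_foo_ak (foo_ak)                    -- non-unique alternate key
) ENGINE=MyISAM
PARTITION BY KEY (sha1_hash)
PARTITIONS 30;
```

The alternate key is declared non-unique here because every unique key on a partitioned table must include all columns used in the partitioning expression.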

MySQL (really Oracle) says in the Documentation on Partitioning Overview:

Other benefits usually associated with partitioning include those in the following list. These features are not currently implemented in MySQL Partitioning, but are high on our list of priorities.

Queries involving aggregate functions such as SUM() and COUNT() can easily be parallelized. A simple example of such a query might be SELECT salesperson_id, COUNT(orders) as order_total FROM sales GROUP BY salesperson_id;. By “parallelized,” we mean that the query can be run simultaneously on each partition, and the final result obtained merely by summing the results obtained for all partitions.

Achieving greater query throughput in virtue of spreading data seeks over multiple disks.

Since parallel processing of partitions is still on that wishlist, it follows that, at present, all partitions are searched sequentially.
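To illustrate what the quoted passage means by summing per-partition results, MySQL 5.6 supports explicit partition selection, which lets a client query one partition at a time. This is only a sketch: the table name my_partitioned is an assumption, and the partition names p0 through p29 assume the defaults MySQL assigns when none are given. Running the statements on separate connections would be a client-side workaround, not something the server does on its own.

```sql
-- Count rows in individual partitions (each statement could run on its own connection).
SELECT COUNT(*) FROM my_partitioned PARTITION (p0);
SELECT COUNT(*) FROM my_partitioned PARTITION (p1);
-- ... and so on through p29; the sum of the 30 counts equals:
SELECT COUNT(*) FROM my_partitioned;
```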

UPDATE 2014-02-21 10:51 EST

With regard to your original question, you stated

  • Partitioning on SHA1 PRIMARY KEY
  • Partitioned MyISAM table with 30 partitions

OBSERVATION #1

Given the fact that SHA1 output is always 40 characters, my guess is that you are not using RANGE partitioning.

Here is what the MySQL Documentation says about HASH partitioning:

Partitioning by HASH is used primarily to ensure an even distribution of data among a predetermined number of partitions. With range or list partitioning, you must specify explicitly into which partition a given column value or set of column values is to be stored; with hash partitioning, MySQL takes care of this for you, and you need only specify a column value or expression based on a column value to be hashed and the number of partitions into which the partitioned table is to be divided.

Since you are not using RANGE partitioning

  • MySQL chooses the partition for each new row from the value of the partitioning expression (for HASH, the expression modulo the number of partitions), which is what keeps the data evenly distributed.
  • There is no mechanism to reasonably determine any ordering of the PRIMARY KEY.
  • Using PARTITION BY HASH creates buckets for data with no rhyme or reason for ordering.
  • Using PARTITION BY LINEAR HASH could create ordered buckets.
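As a rough illustration of how plain HASH partitioning assigns rows, the MySQL manual describes the partition number as MOD(expr, num), where num is the number of partitions. The sample values below are made up, and a table partitioned on a SHA1 string would more likely use KEY partitioning, which applies an internal hashing function instead. The second query, which assumes a table named my_partitioned, reports how rows are actually spread across partitions.

```sql
-- Partition number under PARTITION BY HASH(id) PARTITIONS 30 for a few sample ids.
SELECT id, MOD(id, 30) AS partition_number
FROM (SELECT 7 AS id UNION ALL SELECT 42 UNION ALL SELECT 1000) AS sample;

-- Observed row counts per partition (table name is an assumption).
SELECT partition_name, table_rows
FROM information_schema.partitions
WHERE table_schema = DATABASE()
  AND table_name = 'my_partitioned'
ORDER BY partition_ordinal_position;
```

Neighbouring values end up in different partitions, which is why no partition holds a contiguous range of key values.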

The overall problem is that the PRIMARY KEY is not an INT. Plain RANGE partitioning requires an integer expression, so it cannot be applied directly to a CHAR string. Given this scenario, MySQL cannot perform the SELECT lookup without opening every partition.

OBSERVATION #2

A partitioned MyISAM table with 30 partitions is made up of

  • 30 .MYD files
  • 30 .MYI files

Each .MYD file requires a file handle, and so does each .MYI file. Thus, you can have 60 file handles open. Here is what you can do to see how many file handles are opened after a mysql restart:

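-- Close the table so that its file handles are released
FLUSH TABLES my_partitioned;
-- Record the number of open files before the query
SELECT variable_value INTO @open1
FROM information_schema.global_status
WHERE variable_name='open_files';
-- Look up a row by the alternate key
SELECT bar FROM my_partitioned WHERE foo_ak = 42;
-- Record the number of open files after the query
SELECT variable_value INTO @open2
FROM information_schema.global_status
WHERE variable_name='open_files';
-- The difference is the number of files the query opened
SET @opendiff = @open2 - @open1;
SELECT @opendiff;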

What you can conclude:

  • If @opendiff is close to 60, MySQL opened all the partitions.
  • If @opendiff is close to 0, MySQL pruned the partitions.

The second outcome is the one you want. Either way, the result shows how heavy-handed MySQL is about opening partitions.
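As a complementary check, EXPLAIN PARTITIONS (available in MySQL 5.6) lists the partitions the optimizer plans to read. The column name sha1_hash and the literal value are assumptions for illustration; my_partitioned, bar, and foo_ak follow the names used in the test above.

```sql
-- Lookup by the alternate key: the partitions column is expected to list every partition.
EXPLAIN PARTITIONS
SELECT bar FROM my_partitioned WHERE foo_ak = 42;

-- Lookup by the partitioning column: the partitions column should name only one partition.
EXPLAIN PARTITIONS
SELECT bar FROM my_partitioned
WHERE sha1_hash = 'da39a3ee5e6b4b0d3255bfef95601890afd80709';
```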