MySQL – Query Using WHERE BETWEEN Hits All Partitions

MySQLpartitioningperformancequery-performance

This is probably just me misunderstanding how MySQL partitions work, but I have a table defined with:

  `ID` int(11) NOT NULL,
  `target_id` int(11) NOT NULL,
  `created_at` datetime NOT NULL,

  PRIMARY KEY (`ID`, `created_at`),
  KEY `index_created_at_target_id` (`created_at` desc, `target_id`)
  KEY `index_on_created_at` (`created_at`)
)
ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
PARTITION BY HASH(MONTH(`created_at`))
PARTITIONS 12

If I query the data with a single created_at timestamp:

select * from my_table where `created_at`= '2018-12-00 05:00:00' AND target_id in (6,7,8);

Then the explain knows to only hit 1 partition. This is correct.


However, if I query on a range:

select * from my_table where `created_at` BETWEEN '2018-12-00 05:00:00' AND '2019-01-04 04:59:59' AND target_id in (6,7,8);

The explain now hits all partitions. Is this a known limitation, or am I just doing something wrong?


Side note; judging by other "partitioning" questions, I expect a couple "partitioning doesn't help just index correctly" responses… I'm just trying it for myself and gathering some metrics so we can find the solution that performs best for our needs. (also partitioning the table, even when it then checks all partitions, still cut the query time in half versus having 1 monolithic table with the same indexes; 2.3s vs 0.8s)


Edit 1:

I did make sure that both queries do hit the index_on_created_at index. The only difference is a 'non unique key lookup' versus an 'index range scan' on the index.

Best Answer

You have just found one of many reasons why PARTITIONing is mostly useless.

Even with your first SELECT, there is no performance benefit over what you could have with a non-partitioned table with a suitable index. j In particular:

INDEX(target_id, created_at)  -- in this order

The second SELECT shows how PARTITION BY HASH treats a range -- by looking in all partitions. It has no choice but to do that. Yeah, yeah, with a lot more smarts, it could notice that that date range should hit only two months. Even so, non-partitioned with a suitable index would run faster and be simpler.

Other comments about the indexes:

KEY `index_created_at_target_id` (`created_at` desc, `target_id`)
KEY `index_on_created_at` (`created_at`)

has two issues:

  • INDEX(a,b), INDEX(a) -- you may as well toss the latter since the former can serve its purposes.
  • Until MySQL 8.0, DESC is ignored in INDEX definitions. (Since you don't have an ORDER BY that depends on it, there is no harm done.)

If your goal is to understand what partitioning can/cannot do, I recommend my blog .

even when it then checks all partitions, still cut the query time in half versus having 1 monolithic table with the same indexes

I dispute that. Did you run with the Query cache off? Did you run each test twice -- to avoid buffer_pool_ caching differences? Actually, the key is probably "with the same indexes". Almost always, when switching between partitioned and non-partitioned, the indexes need to be revised.

  • When partitioned, put the partition key later in each index where it is needed. Or leave it out if the partition pruning suffices.
  • When not partitioned, non-range columns should be early in indexes.
  • When not partitioned, range columns (eg date range) should be late in indexes.

Why even a point query does not speed up with partitioning. Notice how either way seems to be about the same amount of work:

  • Non partitioned: BTree index that is used is, say, 3 levels deep. Hop-skip-jump, there is the row you desire.
  • Partitioned: First prune to the desired partition ("hop"). The, say, the BTree is only 2 levels deep ("skip-jump").

Ah. Maybe you can discover a 5th use case for Partitioning.