MariaDB – Why the Query Optimizer Chooses a Bad Execution Plan

Tags: execution-plan, innodb, mariadb

We have a MariaDB table (stories) with over 1 TB of data, and we periodically run a query that fetches recently added rows for indexing elsewhere.

innodb_version: 5.6.36-82.1
version       : 10.1.26-MariaDB-0+deb9u1

The query works just fine when the query optimizer decides to use the secondary index and do a range scan (in batches of 1,000):

explain extended     SELECT  stories.item_guid
    FROM  `stories`
    WHERE  (updated_at >= '2018-09-21 15:00:00')
      AND  (updated_at <= '2018-09-22 05:30:00')
    ORDER BY  `stories`.`id` ASC
    LIMIT  1000;

+------+-------------+---------+-------+-----------------------------+-----------------------------+---------+------+--------+----------+---------------------------------------+
| id   | select_type | table   | type  | possible_keys               | key                         | key_len | ref  | rows   | filtered | Extra                                 |
+------+-------------+---------+-------+-----------------------------+-----------------------------+---------+------+--------+----------+---------------------------------------+
|    1 | SIMPLE      | stories | range | index_stories_on_updated_at | index_stories_on_updated_at | 5       | NULL | 192912 |   100.00 | Using index condition; Using filesort |
+------+-------------+---------+-------+-----------------------------+-----------------------------+---------+------+--------+----------+---------------------------------------+
1 row in set, 1 warning (0.00 sec)

But occasionally, even with small differences in the data set, it decides to use the primary key index instead. (Note the second timestamp differs from the query above. It is also worth mentioning that the table holds data spanning several years and several dozen million rows.)

explain extended     SELECT  stories.item_guid
    FROM  `stories`
    WHERE  (updated_at >= '2018-09-21 15:00:00')
      AND  (updated_at <= '2018-09-22 06:30:00')
    ORDER BY  `stories`.`id` ASC
    LIMIT  1000;

+------+-------------+---------+-------+-----------------------------+---------+---------+------+--------+----------+-------------+
| id   | select_type | table   | type  | possible_keys               | key     | key_len | ref  | rows   | filtered | Extra       |
+------+-------------+---------+-------+-----------------------------+---------+---------+------+--------+----------+-------------+
|    1 | SIMPLE      | stories | index | index_stories_on_updated_at | PRIMARY | 8       | NULL | 240259 |    83.81 | Using where |
+------+-------------+---------+-------+-----------------------------+---------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

This causes it to walk through the whole primary key index (sequentially, I assume) and then filter on the updated_at field, taking several hours to complete instead.

The query was generated by the ORM (ActiveRecord) and is probably far from ideal. As a workaround we came up with a couple of hand-crafted queries that move the ORDER BY stories.id out and/or use USE INDEX / FORCE INDEX to avoid the PK, since we are really filtering our data set by updated_at.
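For illustration, a sketch of what one of those hand-crafted variants might look like (not our exact production query; the index name is taken from the EXPLAIN output above):

SELECT  stories.item_guid
    FROM  `stories` FORCE INDEX (index_stories_on_updated_at)   -- steer the optimizer away from the PK
    WHERE  (updated_at >= '2018-09-21 15:00:00')
      AND  (updated_at <= '2018-09-22 06:30:00')
    ORDER BY  `stories`.`id` ASC
    LIMIT  1000;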

What I'm interested in understanding here is how/why the query optimizer chooses that execution plan. I understand that the optimizer uses index/table statistics to make that decision, but in this case, if my understanding of how InnoDB works is correct, it seems clear that walking through a huge PK while not filtering on id at all is not ideal.

I'm essentially trying to understand where that "bad" decision comes from: what statistics or unknown variables lead it away from the good plan (the one that is usually chosen and is orders of magnitude faster).
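For reference, these are the standard statements I know of for inspecting and refreshing the estimates the optimizer works from (whether refreshing them actually changes the plan here is part of what I'm asking):

-- Cardinality estimates per index, derived from sampled pages:
SHOW INDEX FROM stories;

-- Re-sample the statistics; sometimes this alone flips a plan:
ANALYZE TABLE stories;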

Feel free to correct any of my assumptions, as I'm definitely not a DBA expert.

Thanks in advance!

Best Answer

Short Answer: The WHERE and the ORDER BY are tugging in different directions. The Optimizer is not yet smart enough to always correctly decide which direction to be pulled.

Long Answer:

The WHERE benefits from any index starting with updated_at; see the 100.00 in the filtered column. Such an index does a quick job of finding the desired rows, all 192K of them. But the query then needs to sort all 192K rows (ORDER BY) before it can get to the LIMIT.

The ORDER BY id benefits from the PRIMARY KEY. That index lets the query read rows already in order (thereby avoiding a sort) and hence get to the LIMIT early (thereby never shoveling around more than 1000 rows).

If the 1000 matching rows are found at small values of id, it will run fast. If the desired 1000 are late in the table, the query will run slowly. The Optimizer can't predict which (without a lot more smarts and complexity).

Does updated_at mostly track id? If so, then either index would lead to essentially the same blocks. But the Optimizer neither notices this nor takes advantage of it.
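A quick way to check that correlation (a sketch; it only shows the id range covered by the time window):

SELECT  MIN(id), MAX(id), COUNT(*)
    FROM  stories
    WHERE  updated_at >= '2018-09-21 15:00:00'
      AND  updated_at <= '2018-09-22 05:30:00';

If the id range that comes back is narrow and near the end of the table, the two orderings touch roughly the same blocks.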

A possible speedup:

A "covering" index is one that has all the needed columns. Since the PK is 'clustered' with the data, PRIMARY KEY(id) is sort of a covering index. But we saw how bad it could be. The following may be better:

INDEX(updated_at,      -- first, to deal with the WHERE
      id, item_guid)   -- in either order

The query would range-scan the 192K rows faster than if it needed to bounce between INDEX(updated_at) and the data 192K times to find item_guid. The sort would not be improved.
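As a concrete statement, that index could be added with something like the following (the index name is mine; adding an index to a 1TB table is a long operation, so schedule it carefully):

ALTER TABLE stories
    ADD INDEX idx_updated_at_id_guid (updated_at, id, item_guid);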

Alas, that only speeds up the plan that was already the fast one. So, let me put my thinking cap back on.

Partitioning?

Oh, you may have found a 5th use case for PARTITIONing. When I catalogued the use cases six years ago I had only 4, and I have not found a new one since. Let me talk through how your case might be different.

Suppose you used PARTITION BY RANGE (TO_DAYS(updated_at)) on the table, setting up about 50 partitions. (If you need to purge 'old' data, Partitioning is excellent.)

When setting up partitioning, one needs to rethink all the indexes. What indexes do you have now? I'll assume these:

PRIMARY KEY(id),
INDEX(updated_at)

In your case, perhaps not much would change:

PRIMARY KEY(id, updated_at),
INDEX(updated_at)
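A sketch of what the partitioned definition might look like; the column types, index name, and partition boundaries are guesses (the EXPLAIN key_len of 8 suggests a BIGINT id), and you would define roughly 50 partitions rather than the three shown:

CREATE TABLE stories (
    id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    updated_at  DATETIME NOT NULL,
    item_guid   VARCHAR(64) NOT NULL,
    -- ... other columns ...
    PRIMARY KEY (id, updated_at),          -- the PK must include the partitioning column
    INDEX index_stories_on_updated_at (updated_at)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(updated_at)) (
    PARTITION p2018_08 VALUES LESS THAN (TO_DAYS('2018-09-01')),
    PARTITION p2018_09 VALUES LESS THAN (TO_DAYS('2018-10-01')),
    PARTITION pmax     VALUES LESS THAN MAXVALUE
);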

What would happen to your queries?

Tackling the WHERE first would not change much. Yes, there would be partition pruning, plus the secondary index. Speed would stay about the same.

By tackling the ORDER BY first, the query plan could also prune down to one (or two) partitions. Then the scan in id order within that/those partition(s) would be 50 (or 25) times as fast. There would probably be an added sort, since the rows from separate partitions won't be in order.

Other notes

Could we see SHOW CREATE TABLE? A 1TB table needs a lot of things to help with performance. If all your numbers are INTs, then there might be a lot of space to recover by shrinking to MEDIUMINT, etc. Normalization may be worth doing. If the GUID is indexed, that must be a big burden on insert performance. Etc, etc.

Ypercube's suggestion

If ORDER BY updated_at, id is OK for the task, then that eliminates the temptation to use the slow execution plan. And INDEX(updated_at, id, item_guid) becomes the best index. And partitioning becomes useless.
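That variant of the fetch would look like this (a sketch; with the covering index it reads 1000 index entries in order, with no sort and no table lookups):

SELECT  item_guid
    FROM  stories
    WHERE  updated_at >= '2018-09-21 15:00:00'
      AND  updated_at <= '2018-09-22 06:30:00'
    ORDER BY  updated_at, id
    LIMIT  1000;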

Pagination

Not via OFFSET; instead, remember where you left off. Here, the "left off" position involves more than a single column.
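A sketch of such a batch fetch over (updated_at, id); the literal values stand for the last row of the previous batch and are placeholders:

SELECT  item_guid, updated_at, id
    FROM  stories
    WHERE  ( updated_at > '2018-09-21 17:23:45'
          OR (updated_at = '2018-09-21 17:23:45' AND id > 1234567) )
      AND  updated_at <= '2018-09-22 06:30:00'
    ORDER BY  updated_at, id
    LIMIT  1000;

Each batch starts where the previous one stopped, so no batch ever has to shovel around more than its own 1000 rows.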