MariaDB – Why does the query plan start with a not-filtered-for table when the filtered-for table's entries exceed a certain threshold?

execution-plan mariadb-10.3 query-performance

We process data for our customers. Each customer request is associated with one or more items (fewer than two on average) and falls into one of fewer than 100 categories; foreign keys to the categories are also present on the items. Customers often want tabular views/exports of their items that include columns from all three tables. Our ORM generates a reasonable-looking query along the lines of

SELECT *
FROM customeritem itm
  INNER JOIN customerrequest req ON itm.request_id = req.id
  INNER JOIN category cat ON cat.id = itm.category_id
WHERE req.customer_id = 123

This works reasonably well for many customers, producing a plan whose ANALYZE output looks like this:

The query plan starts with the requests table on which the WHERE condition operates, then joins the items, then the categories, which intuitively seems the only way to go. However, for customers with many requests, a different plan is chosen and the query takes ages to execute (ANALYZE alone usually takes minutes):

Now the query plan starts with the (unfiltered) category table, then joins the items, and only as a last step joins the requests table on which the WHERE condition operates (that condition shrinks the result from an 8-digit to a 5- or 6-digit row count). If I remove the category join from the query, the query plan resembles the good one above (minus the final row for the category join, of course).

Is there an intuitive explanation why this happens?

Best Answer

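Perhaps you've noticed the large discrepancy between rows (the optimizer's estimate) and r_rows (the rows actually read) for customeritem in the second analysis. According to MariaDB's manual, that table's statistics are likely out of date, so the table could use an ANALYZE TABLE of its own.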
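A minimal sketch of acting on that advice, assuming the table names from the question: refresh the statistics for customeritem, then re-run the statement under ANALYZE to check whether rows and r_rows now agree and the plan starts from customerrequest again. The engine-independent statistics lines are optional and commented out; whether they help depends on your server settings (in MariaDB 10.3 use_stat_tables defaults to NEVER, so those statistics are not consulted unless you change it).

```sql
-- Recompute the optimizer statistics for the table whose estimate was far off.
ANALYZE TABLE customeritem;

-- Optional: collect engine-independent (histogram-capable) statistics as well.
-- The optimizer only uses them when use_stat_tables is COMPLEMENTARY or PREFERABLY;
-- changing the GLOBAL value affects new connections, such as the ORM's.
-- SET GLOBAL use_stat_tables = 'PREFERABLY';
-- ANALYZE TABLE customeritem PERSISTENT FOR ALL;

-- Re-run the query. Note: ANALYZE SELECT actually executes the statement,
-- so this is slow while the bad plan is still chosen.
ANALYZE SELECT *
FROM customeritem itm
  INNER JOIN customerrequest req ON itm.request_id = req.id
  INNER JOIN category cat ON cat.id = itm.category_id
WHERE req.customer_id = 123;
```

If the estimates still drift after the next large data load, re-running ANALYZE TABLE periodically is a cheaper first step than rewriting the ORM's query.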

Related Solutions

SQL Server – Should a Subquery Be Used to Help Find the Correct Plan?

For future readers, I went ahead and tested both methods (with and without the subquery), and they both worked equally well on all tested environments. I do not know whether the database statistics needed to be updated on the databases that originally had a problem, but like many developers, I am not in charge of that and have no ability to change it.

From what I can see, the advantage of using LEFT JOIN in this case is that it removes the plan builder's incentive to treat that join as beneficial (which it isn't, in this case), while still letting the plan builder choose freely among all the other options. The advantage of the subquery is that you keep the INNER JOIN's ability to remove any (rare) invalid rows, but the plan builder cannot use that join in the critical part of the plan, because the INNER JOIN is in the outer query, not the subquery. In essence, placing some joins in the subquery and doing the others in the main query restricts which joins the plan builder can use.
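The two rewrites described above can be sketched as follows. The original query is not reproduced here, so every table and column name (Orders, Products, Customers, OrderDate and so on) is hypothetical; Customers stands in for the joined table whose foreign key is almost always valid. Also note that SQL Server's optimizer may still expand a derived table and reorder joins across it, so the subquery form is a nudge rather than a guaranteed plan boundary.

```sql
-- Variant 1: LEFT JOIN. The optimizer no longer gets to use this join to
-- eliminate rows, but orders whose CustomerID has no match are now kept
-- (with NULL in c.Name) instead of being filtered out.
SELECT o.OrderID, p.Title, c.Name
FROM Orders o
  INNER JOIN Products p ON p.ProductID = o.ProductID
  LEFT JOIN Customers c ON c.CustomerID = o.CustomerID
WHERE o.OrderDate >= '2016-01-01';

-- Variant 2: subquery. The selective joins and filter live in the derived
-- table; the INNER JOIN in the outer query still drops the rare rows with
-- an invalid CustomerID, so the result matches the original INNER JOIN query.
SELECT x.OrderID, x.Title, c.Name
FROM (
    SELECT o.OrderID, o.CustomerID, p.Title
    FROM Orders o
      INNER JOIN Products p ON p.ProductID = o.ProductID
    WHERE o.OrderDate >= '2016-01-01'
) AS x
  INNER JOIN Customers c ON c.CustomerID = x.CustomerID;
```

Variant 1 changes the result set only for orphaned rows; Variant 2 keeps the original semantics, which is the trade-off the answer describes.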

Of course, any time you restrict the plan builder, you are taking a chance that you restrict it too far and it comes up with a non-optimal plan. Going forward, I won't use either technique unless I find I need to. Note that I also tried using hints to explicitly tell SQL Server to use a particular type of join, but those were never as fast as when I let it choose everything itself.

In the end, it turned out that, due to the ugliness of the code that built this query (on the fly, even), the LEFT JOIN was far easier to implement and have some confidence that I wouldn't break anything, so I went that way. But that decision was made by factors external to the database.

Query Plan Missing ParameterCompiledValue in SQL Server

The XML plan uploaded includes:

<ParameterList>
    <ColumnReference Column="@today" ParameterCompiledValue="'2016-03-04 00:00:00.000'" />
</ParameterList>

There are multiple statements in the batch, so the XML contains multiple <StmtSimple> elements under <Statements>. Aside from the final select, the other statements are all assignments, which are not queries, so there is no parameter list at that level.

As Kin mentioned, you may find it useful to look at the plans in SQL Sentry Plan Explorer.

Related question:

Parameter Sniffing vs VARIABLES vs Recompile vs OPTIMIZE FOR UNKNOWN
