Mariadb – Why does the query plan start with a not-filtered-for table when the filtered-for table’s entries exceed a certain threshold

execution-planmariadb-10.3query-performance

We process data for our customers. Each customerrequest and is associated with one or more items (less than two on average) and falls into one of <100 categories, foreign keys to which are also present on the items. Customers often want tabular views/exports of their items that include columns of all three tables. Our ORM generates a reasonable-looking query along the lines of

SELECT *
FROM customeritem itm
  INNER JOIN customerrequest req ON itm.request_id=req.id
  INNER JOIN category cat ON cat.id=itm.category_id
WHERE customerrequest.customer_id=123

This works reasonably well for many customers, producing a query whose ANALYZE-is looks like this:

query plan for customers with not too many requests

The query plan starts with the requests table on which the WHERE condition operates, then joins the items, then the categories, which intuitively seems the only way to go. However, for customers with many requests, a different plan is chosen and the query takes ages to execute (ANALYZE alone usually takes minutes):

query plan with many requests

Now the query plan starts with the (unfiltered) category table, then joins the items and only as a last step the requests table on which the WEHRE condition (which shrinks to results from an 8-digit to a 5 or 6-digit number) operates. If I remove the category join from the select, the query plan resembles the good one above (minus the final row for category join, of course).

Is there an intuitive explanation why this happens?

Best Answer

Perhaps you've noticed the large discrepancy between rows and r_rows for customeritem in the second analysis. According to MariaDB's manual, that table could use an "analyze" itself.