We process data for our customers. Each customerrequest and is associated with one or more items (less than two on average) and falls into one of <100 categories, foreign keys to which are also present on the items. Customers often want tabular views/exports of their items that include columns of all three tables. Our ORM generates a reasonable-looking query along the lines of
SELECT *
FROM customeritem itm
INNER JOIN customerrequest req ON itm.request_id=req.id
INNER JOIN category cat ON cat.id=itm.category_id
WHERE customerrequest.customer_id=123
This works reasonably well for many customers, producing a query whose ANALYZE-is looks like this:
The query plan starts with the requests table on which the WHERE condition operates, then joins the items, then the categories, which intuitively seems the only way to go. However, for customers with many requests, a different plan is chosen and the query takes ages to execute (ANALYZE alone usually takes minutes):
Now the query plan starts with the (unfiltered) category table, then joins the items and only as a last step the requests table on which the WEHRE condition (which shrinks to results from an 8-digit to a 5 or 6-digit number) operates. If I remove the category join from the select, the query plan resembles the good one above (minus the final row for category join, of course).
Is there an intuitive explanation why this happens?
Best Answer
Perhaps you've noticed the large discrepancy between rows and r_rows for customeritem in the second analysis. According to MariaDB's manual, that table could use an "analyze" itself.