In the below query, there are repeated calculations such as the three calls to SUM(p.amount)
. Does MySQL re-calculate for each function call or is there some kind of memoization optimization under the hood? If not, how can this kind of query be optimized for maximum performance?
It seems like it would be faster after the first calculation to get the next one by the alias name, total_payemnts
, but that just throws an error.
SELECT LEFT(c.last_name, 1) AS 'last_names',
SUM(p.amount) AS 'total_payments',
COUNT(p.rental_id) AS 'num_rentals',
SUM(p.amount) / COUNT(p.rental_id) AS 'avg_pay'
FROM customer c
JOIN payment p ON p.customer_id = c.customer_id
GROUP BY LEFT(c.last_name, 1)
ORDER BY SUM(p.amount) DESC;
This query runs on the MySQL Sakila sample database.
Best Answer
A definite answer is really difficult. However, some trials might give us some hints.
We start with a few data:
We first define a
DETERMINISTIC
(although it has side effects) function to log calls:We have previously defined a
log
table, to just store counts:At this point, if MySQL were able to just perform the common subexpression elimination optimization (common for compilers), it would just make ONE call when we issue:
Because
log_numeric(30, 'x')
would return the same value in both cases (because the function is deterministic).In fact, it makes two calls to this function, as proven by watching counter 'x';
[NOTE: Tested with MySQL 5.7.12]
It might be the case that MySQL has decided that the
log_numeric
function has some side effects, or that User Defined Functions do not work the same way as built-in functions. Built-in functions could be optimized, whereas UDF would not. In any case, if MySQL would have optimized the two expressions with just one call, it would be clear that this optimization is standard. As it did not, we can make an educated guess (which is not the same as a proof) and assume that the functions are called as many times as they're present.The same can be done with the case posted:
And the results point to all calls being evaluated as many times as they appear.
You can check all theses tests at http://rextester.com/ZKS91815.
NOTE: For curiosity, postgreSQL (9.6.1) does a little better: it recognizes the ORDER BY and GROUP BY as "already computed" on the
SELECT
, but it doesn't optimize either for Common subexpression elimination. Room for improvement.Alternative query avoiding repeating functions
If you're worried things are computed more than necessary, use:
I honestly don't think the difference will be much. The weight of the functions in the time spent for the whole query is probably (ver) small. It might be signficant if the functions were complex. Depending on how MySQL deals with Subqueries (that might be temporary tables), this could actually be worse.