Under GROUP BY
and HAVING
with Hidden Columns, the MySQL manual documents (added emphasis):
MySQL extends the use of
GROUP BY
so that the select list can refer to nonaggregated columns not named in theGROUP BY
clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in theGROUP BY
are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
Despite this explicit warning from the developers, some people continue to rely on undocumented behaviour to determine the value that will be selected from a hidden column.
In particular, MySQL often appears to select the "first" record from each group (where the notion of "first" is itself undocumented, such as the oldest record on some storage engines or according to some sort order applied to a materialised table from a subquery). I've seen this exploited to retrieve, for example, a groupwise maximum:
SELECT * FROM (
SELECT * FROM my_table ORDER BY sort_col DESC
) t GROUP BY group_col
For completeness, the same can be accomplished in a standard and documented fashion with a simple join:
SELECT * FROM my_table NATURAL JOIN (
SELECT group_col, MAX(sort_col) sort_col
FROM my_table
GROUP BY group_col
) t
I believe that one should never rely on undocumented behaviour because there may be unforeseen corner cases that cause that behaviour to break. For example, in satisfying a GROUP BY
operation with an index, MySQL sorts the results and may thereby choose an unexpected value.
What other corner cases can break this behaviour? Or is it sufficiently reliable for production systems?
Best Answer
I was thinking the
NATURAL JOIN
example you just usedIf you shift to another type of
JOIN
and imposeWHERE
, ordering can come and go without warning in spite of the ill-advised reliance on undocumented behavior of theGROUP BY
.For this example, I will
LEFT JOIN
without aWHERE
clauseLEFT JOIN
with a WHERE clauseFor the DB Environment
Using this script to generate sample data
and these two queries for testing the
GROUP BY
subsequent use;Let's test the durability of the
GROUP BY
's results;STEP 01 : Create the Sample Data
STEP 02 : Use
LEFT JOIN
without aWHERE
clauseSTEP 03 : Use
LEFT JOIN
with aWHERE
clauseANALYSIS
Looking at the aforementioned results, here are two questions:
LEFT JOIN
keep an ordering byid
?WHERE
impose a reordering ?No one foresaw any of these effects because the behavior of explicit clauses was relied upon by the implicit behavior of the Query Optimizer.
CONCLUSION
From my perspective, corner cases can only be of an external nature. In light of this, developers must be willing to fully evaluate the results of a
GROUP BY
in conjunction with the following twelve(12) aspects:JOINs
clausesWHERE
clausesORDER BY
clausemy.cnf
Here is the key thing to remember : Any instance of MySQL that works for your query in a specific environment is itself a corner case. Once you change one or more of the twelve(12) evaluation aspects, the corner case is due to break, especially given the first nine(9) aspects.