I have a join that works but seems hacky because it orders the subquery and uses group by to filter everything but the top subquery result. Without doing so, the cartesian product of s and print_issues.pub_date would be returned.
SELECT
print_issues.pub_date,
print_issues.x,
s.page_size
FROM print_issues,
( select pub_date, page_size
from print_page_size,
print_issues
where effective_date < print_issues.pub_date
order by effective_date desc
) as s
WHERE s.pub_date = print_issues.pub_date
group by print_issues.pub_date
order by print_issues.pub_date desc
The query is an artifact of the schema of print_page_size
. That table has the columns page_size
and effective-date
. New entries with a greater effective_date
supersede old ones. The goal of the query is to unite the print_issue
table with the page_size
in effect when each issue was published.
Sample Data:
TABLE: print_page_size
======================
effective_date page_size
2014-01-01 100
2017-05-01 105
TABLE: print_issues
===================
pub_date x
2017-04-26 "Random"
2017-05-02 "Data"
OUTPUT
======
print_issues.pub_date print_issues.x s.page_size
2017-04-26 "Random" 100
2017-05-02 "Data" 105
My questions: Is there a problem with this approach? Is there a more direct way to to achieve the desired result?
Best Answer
Your query is non-deterministic as it abuses
GROUP BY
.It may give incorrect/unexpected results in any version of MySQL, depending on the execution plan and available indexes. You may not have seen such behaviour because it hasn't happened yet (you were just lucky or the indexes and size of the table have up to now lead to plans that give correct results) or it happened but you didn't notice.
There are versions (MariaDB 5.3+) where similar queries - not sure about this one but similar - will almost always give incorrect results due to some optimizations that remove the redundant
ORDER BY
in the subquery.There are versions of MySQL (5.7) where you'll get an error if you try the query - exactly because it might give inconsistent results. In that version, you'll have to change the default settings (and get the wrong non-deterministic behaviour) or you'll have to rewrite/correct the query.
In short, don't use it.
I would write the query like this. Much simpler, deterministic and probably more efficient as there is no
GROUP BY
and doesn't join thepi
table twice:An index on
print_page_size (effective_date, pagesize)
would be further help.