Imagine issues with linked
- Time entries to track the time spent
- Journal entries to track progression evolution
(Actually a part of Redmine schema)
Let's take some sample data:
Issues
id | subject | estimated_hours |
---|---|---|
1 | First issue | 4 |
2 | Second issue | 8 |
Time entries
id | issue_id | spent_on | hours |
---|---|---|---|
1 | 1 | 2020-12-02 | 1 |
2 | 1 | 2020-12-05 | 1.5 |
3 | 2 | 2020-12-06 | 0.5 |
4 | 1 | 2020-12-10 | 1 |
5 | 2 | 2020-12-10 | 1 |
6 | 2 | 2020-12-15 | 2 |
7 | 2 | 2020-12-15 | 1 |
Journal
id | issue_id | updated_on | done_ratio |
---|---|---|---|
1 | 1 | 2020-11-20 | 0 |
2 | 2 | 2020-11-20 | 5 |
3 | 1 | 2020-12-01 | 20 |
4 | 1 | 2020-12-03 | 40 |
5 | 2 | 2020-12-06 | 30 |
6 | 1 | 2020-12-07 | 70 |
7 | 1 | 2020-12-11 | 100 |
8 | 2 | 2020-12-12 | 50 |
Now for each time entry I need to know the latest done ratio.
Here is what I'm expecting
id | subject | spent_on | hours | done_ratio |
---|---|---|---|---|
1 | First issue | 2020-12-02 | 1 | 20 |
2 | First issue | 2020-12-05 | 1.5 | 40 |
3 | Second issue | 2020-12-06 | 0.5 | 30 |
4 | First issue | 2020-12-10 | 1 | 70 |
5 | Second issue | 2020-12-10 | 1 | 30 |
6 | Second issue | 2020-12-15 | 2 | 50 |
7 | Second issue | 2020-12-15 | 1 | 50 |
I thought I could get that with this query:
SELECT
tie.id
, iss.subject
, tie.spent_on
, tie.hours
, jou.done_ratio
FROM time_entry AS tie
JOIN issue AS iss ON (tie.issue_id = iss.id)
JOIN (
SELECT
issue_id
, updated_on
, done_ratio
FROM journal
ORDER BY updated_on DESC
) AS jou ON (
jou.issue_id = tie.issue_id
AND jou.updated_on <= tie.spent_on
)
GROUP BY tie.id, tie.spent_on
ORDER BY tie.spent_on
But it's returning the first done ratio for every time entry.
Here is a playground to fiddle with the mentioned data
How can I get the done ratio of the latest journal prior to (<=) each time entry ?
I'm using mariadb:10.2.9 here but this should pretty much be standard ANSI SQL
Best Answer
To solve your problem, I did the following:
Took your fiddle and put it into dbfiddle.uk - the best one around with the most servers! I used MariaDB 10.3 (10.2 isn't available on dbfiddle) and I also used PostgreSQL - the most standards compliant F/LOSS server around (possibly along with Firebird). MySQL and its derivatives are not great - I presume that fixing some of the bugs/inconsistencies would break a lot of systems now - see the discussion below on
ONLY_FULL_GROUP_BY
).The MariaDB fiddle is here and the PostgreSQL one is here if you're interested.
I'll show some SQL that demonstrates my train of thought and then the final (correct) result. I'm a big fan of Window functions in SQL and I thought that I could use them to answer this question. They are really powerful and well worth getting to know!
So, I was attempting SQL like this:
Result (better viewed in the fiddle):
As you can see, it took considerable experimentation (not to mention most of the night, much to my financée's displeasure...) to find a unique combination of fields using ROW_NUMBER(), but I eventually found it - using
does the trick as follows:
Result:
which is as requested! QED - et voilà!
Just for kicks, I decided to look at the
performance of my own and the other answers
to your question. This is best done using MySQL 8 (version >= 8.0.18) which has the newEXPLAIN ANALYZE
functionality, which is a vast improvement over the oldEXPLAIN EXTENDED
- it's much closer to PostgreSQL's functionality now (although PG is still by far and away the superior database).You can check these out for yourself in the fiddle - it's a lot of text! Furthermore, unfortunately, the elapsed time for a query functionality covered in this article isn't returned by dbfiddle - and I don't have a running instance of MySQL at the minute. The plans are rather cryptic unless you're an expert, but a decent rule of thumb is that fewer lines mean better performance!
My own query appears to do rather poorly - but it would take (and this applies to the other solutions also) testing with a realistic dataset (plus with proper
PRIMARY
andFOREIGN
KEY
s) to be able to pronounce definitively on the optimal solution.Your first query fails under MySQL 8:
also under PostgreSQL:
The ONLY_FULL_GROUP_BY problem in the MySQL error message is well known - by not having it enabled, you are running a very high risk of your query not returning correct results - it may work for small test cases, but then you'll start to obtain (ahem...) anomalous results (i.e. just plain WRONG!). Enable it on all of your systems NOW - it's enabled by default on MySQL 8, but not on any of the MariaDBs on dbfiddle.uk. I cannot stress strongly enough how important it is that you make this change!
Your second query works under version 8, but also has the biggest (most lines) in the plan, so mightn't have optimal performance.
@bbaird's plans are the best (i.e. smallest) - the best one being:
which gives the plan:
which is very compact relative to the others. However, my remarks about realistic testing and proper indexing still apply!
+1 pour une excellente question qui m'a obligé à beaucoup réfléchir (avec fiddle en plus!). p.s. bienvenu sur le forum!