I have a situation which can be boiled down to the following:
TABLE students
COLUMNS id, name, email
TABLE attendances
COLUMNS id, student_id, event_id
TABLE events
COLUMNS id, name, date_of_event
I want to be able to see all the students, and how many events they've attended, so I have:
select name, email, count(attendances.id) as number_of_shows_attended from students
left join attendances
on student_id = students.id
group by name, email
RESULT
Jane Doe | jane@uni.edu | 0
John Doe | john@uni.edu | 10
Will Doe | will@uni.edu | 2
Perfect. Except I also wish to have a final column which lists the most recent event attended (which can be based on the events.date_of_event
date column).
I have tried simply adding the missing column, like so:
select name, email, count(attendances.id) as number_of_shows_attended,
event.name as most_recent_event_attended from students
left join attendances
on student_id = students.id
inner join events
on attendances.event_id on events.id
group by name, email
It gives me an event name as expected, but how can I ensure it is the most recent one? I tried adding order by events.date_of_event desc
, but of course it's sorting the results of the query.
Best Answer
You could do something like this (see fiddle here - it shows development of all stages of the query). There is a small possible flaw in your design - see discussion and (slightly changed) DDL and DML in an alternative schema available here (with slightly changed SQL also!).
In future, could you please provide us with some sample tables (DDL) data and data (DML) so that we have a Single Source of Truth for the question - it avoids errors and eliminates duplication of effort on behalf of those trying to help you!
I couldn't test my solution because I didn't have any data and I've stopped doing my own data mockups since when I do, the OP (frequently) comes back with something like "Oh, but, if I put in records with x, y and z...) your solution doens't work..." - that's then me then going #FF0000/rgb(255,0,0) and swearing at the computer (and the OP)...I simulated data and tested - my previous solution fails - this one now works - it even takes account (through
LEFT JOIN
s) of students who haven't attended any events!Please test with real datasets with a large number of records and apply indexes to your tables' fields as appropriate. You can use MySQL 8.0.18 and above's new
EXPLAIN ANALYZE
(and see here) functionality to get a good idea of where indexes are required! If you run into problems, ask a new question here.Populate the tables:
The run the query:
Result:
Alternative schema:
The
"flaw"
to which I referred at the beginning is that you have an attendance table with an identity typePRIMARY KEY
- it is not required - at least in this case.Your use case (at least for this query) is a classic example of the
JOIN
ing table design pattern - aka alinking
table (and many other names - there are 16 listed on the reference below). The formal name for this type of database object is anAssociative Entity
and thePK
for such a table is normally just the twoJOIN
ing fields - no need for an identity column at all!So, your
attendance
table definition should look more like this:The final query is now as below - the joins are now done exclusively using the
JOIN
ing fields and not the, no longer needed, surrogatePRIMARY KEY
.Result identical!
Performance seems very similar - see the profiling at the end of the fiddle - however, the composite
PK
appears on average - tried 5 runs of each - to be slightly better but YMMV depending on your data distribution?