Mysql – How to handle nested groupings with possible null elements in one query

group bygroup-concatenationMySQL

As you can see in this sql-fiddle, I have three tables: events, event_dates, and event_times.

The original tables contain various other fields, which I omitted as they aren't relevant, I just need to have them in my final result, but I do no operation/sort/filtering/grouping on them.

An event has 1+ event_dates and an event_date has 0+ event_times.

From an object point of view, an event object has a list of event_date objects, which in turn can have a list of event_time objects.

My final goal is to obtain an array of event objects.

My plan is to design a query returning the ids of matching records and from these results fetch the records from their tables in 3 queries and assemble everything in code (this is necessary because of various data-transformations outside the scope of SQL).

My problem is getting the first query right

At the moment I have the following query

SELECT events.id,
CONCAT('[', GROUP_CONCAT(event_dates.id SEPARATOR ','), ']') AS event_date_ids,
CONCAT('[', COALESCE(GROUP_CONCAT(event_times_per_event_date.event_time_ids SEPARATOR ','), '[]'), ']') AS event_time_ids
FROM events
INNER JOIN event_dates ON event_dates.event_id = events.id
LEFT JOIN (
  SELECT date_id AS event_date_id, CONCAT('[', COALESCE(GROUP_CONCAT(event_times.id SEPARATOR ','), ''), ']') AS event_time_ids
  FROM event_dates
  LEFT JOIN event_times ON event_times.date_id = event_dates.id
  GROUP BY event_dates.id
) AS event_times_per_event_date ON event_times_per_event_date.event_date_id = event_dates.id
GROUP BY events.id

which fails in the following situation:

Given event e with three dates d1,d2 and d3, where d1 has two times (t1,t2), d2 has zero times, d3 has two times(t3,t4).
The result of the query would be:

event_id  |  event_date_ids  |  event_time_ids
1         |  [1,2,3]         |  [[1,2],[3,4]] 

What I'd expect would be an empty array in event_time_ids, so that I could match the indices of the array in event_date_ids with the one for the array in event_time_ids:

event_id  |  event_date_ids  |  event_time_ids
1         |  [1,2,3]         |  [[1,2], [], [3,4]] 

How can I obtain this?

P.S.: I deeply apologise if my question is chaotic or unclear and I'm ready to provide every other necessary information.

Thank you!

UPDATE 1

Thanks a lot to @Andriy M for spotting my mistake.

Moving on, I have to add to the equation event_categories where events have 1+ categories.

Adding a DISTINCT in the GROUP_CONCAT would break my current solution as empty arrays would be removed.
The solution is to make this query into a subquery, and join its results to a query fetching all other event data and grouping by events.id.
The event_time_ids would be duplicate for every event row, thus selecting any row for that column is fine.

Best Answer

You need to apply one small change to your query which is emphasised below:

SELECT events.id,
CONCAT('[', GROUP_CONCAT(event_dates.id SEPARATOR ','), ']') AS event_date_ids,
CONCAT('[', COALESCE(GROUP_CONCAT(event_times_per_event_date.event_time_ids SEPARATOR ','), '[]'), ']') AS event_time_ids
FROM events
INNER JOIN event_dates ON event_dates.event_id = events.id
LEFT JOIN (
  SELECT event_dates.id AS event_date_id, CONCAT('[', COALESCE(GROUP_CONCAT(event_times.id SEPARATOR ','), ''), ']') AS event_time_ids
  FROM event_dates
  LEFT JOIN event_times ON event_times.date_id = event_dates.id
  GROUP BY event_dates.id
) AS event_times_per_event_date ON event_times_per_event_date.event_date_id = event_dates.id
GROUP BY events.id;

In the derived table you are correctly grouping by event_dates.id but retrieving event_times.date_id instead. That is a mistake because when there is no match, it is null. And it is because one value is returned as null that the outer query omits the corresponding empty group. Replacing event_times.date_id AS event_date_id with event_dates.id AS event_date_id fixes the issue.