Mysql – How to group same columns from same two tables, but from two different result sets and/or conditions

group bygroup-concatenationMySQL

I am having troubles coming up with a query for something that should be quite simple.

Initially, I had two tables of data that I generated joint result sets from; an events table, and a groups table (think communities or organisations). I would use a query to get event data, such as event name, ID, the group that hosted the event (single ID only), amongst others. Every single event and group has its own unique ID number, given by AUTO_INCREMENT. I would use the following query:

SELECT g.group_id,
       g.name,
       prev_id,
       activity_state+0,
       state,
       first_event_id,
       last_event_id,
       GROUP_CONCAT(e.event_id ORDER BY start_date_time SEPARATOR ','),
       GROUP_CONCAT(e.name ORDER BY start_date_time SEPARATOR '__'),
       GROUP_CONCAT(start_date_time ORDER BY start_date_time SEPARATOR ',') AS first_dates,
       GROUP_CONCAT(end_date_time ORDER BY start_date_time SEPARATOR ',') 
FROM events e 
INNER JOIN groups g ON e.group_id=g.group_id 
WHERE start_date_time BETWEEN ? AND ? 
GROUP BY g.name 
ORDER BY first_dates

The two question marks represent two bound dates and times in a prepared statement. This query would work perfectly fine.

However, I recently updated the structure of the database to include so-called "joint events", where multiple groups host the same events together. I added a joint_event table, which simply contains an event_id column, and a group_id column. The same event IDs appear multiple times in the table (at least twice), and is used to indicate which groups hosted particular events. Joint events are also added to the event table, but are given an ID of 0 (zero).

I now need a query that is similar in functionality to the one above. The query must include the initial result set from the event table and group table join condition, also any occurrences of joint events matching the same conditions from the joint events table. They must be appropriately grouped together, so that I don't get duplicates of the same data (each group should appear in its own row, and only one row per group).

Separately, the join conditions in SQL-89 syntax would be as follows:

events.group_id = groups.group_id

(gives the bulk of results, at least 98%, plus…)

events.event_id = joint_events.event_id AND joint_events.group_id = groups.group_id

(for the joint events)

I have tried using UNION, but it doesn't appear to support grouping by the entire data set returned whole. I have tried using subqueries, but run into issues when using aliases to describe the same columns across two separate data result sets, as it usually produces a "ambiguous" column name error. Using DISTINCT didn't work, as it would either remove events that have the exact same name as each other (these need to be kept), or create an inconsistency in the grouped columns when not applied to all four columns that need to be grouped. The closest I came to was a query that resulted in the joint event data being separated from the rest, instead of being grouped into the rest; or the data was being included, but event data was being duplicated.

I cannot have NULLs anywhere, or use two completely separate queries, otherwise I would have to refactor my PHP script, and would likely make the operation inefficient using PHP to sort the data out.

Thanks in advance.

Best Answer

Keep in mind that queries like this operate thus:

  1. JOIN all the tables together, generating a 'large' temp table with all combinations that have not been filtered out.
  2. Then do the aggregates (SUM, GROUP_CONCAT, etc)

A common problem is that SUMs and COUNTs are inflated because of all the combinations. In your case, GROUP_CONCAT may have duplicates; this can be handled by saying GROUP_CONCAT(DISTINCT ...).

A common solution is to focus on the aggregates, using a minimum of JOINs to get the aggregates right, then JOIN to the rest of the tables.

Another solution that sometimes works is to replace a JOIN with a subquery (in the SELECT part) that computes "looks up" a value (such as mapping an id to a name).

You must be using an older version of MySQL; new versions don't like you to say GROUP BY one-column while listing 7 columns in the SELECT.

Sorry, this is only "handwaving", but your question is also missing details, such as details of "when I added...", SHOW CREATE TABLE, specifying which table each column belongs to, etc.