MySQL – GROUP_CONCAT returns duplicate data, can’t use DISTINCT

duplicationgroup-concatenationjoin;MySQL

I have a normalized database and I'm trying to return data from multiple tables using JOINs and GROUP_CONCAT.

Problem: Rows are being duplicated with GROUP_CONCAT. I can't use DISTINCT because some of the data (ingredient mfr) does need to be duplicated.

Here is my current query and db structure (SQL Fiddle):

SELECT recipe.*, 
GROUP_CONCAT(recipe_detail.ingredient_id) AS iid,  
GROUP_CONCAT(ingredient.name) AS iname, 
GROUP_CONCAT(ingredient_mfr.abbr) AS mabbr, 
GROUP_CONCAT(recipe_tag.name) AS tag
FROM  recipe
LEFT JOIN recipe_detail
    ON recipe.id = recipe_detail.recipe_id
LEFT JOIN ingredient
    ON recipe_detail.ingredient_id = ingredient.id
LEFT JOIN ingredient_mfr
    ON ingredient.mfr_id = ingredient_mfr.id
LEFT JOIN recipe_tagmap
    ON recipe.id = recipe_tagmap.recipe_id
LEFT JOIN recipe_tag
    ON recipe_tagmap.tag_id = recipe_tag.id
WHERE recipe.user_id = 1
GROUP BY recipe.id

recipe
+------------+------------+-----------+
|    id      |    name    |  user_id  |
+============+============+===========+
|     1      |  Test123   |     1     |
+------------+------------+-----------+
|     2      |  Test456   |     1     |
+------------+------------+-----------+
|     3      |  Test789   |     1     |
+------------+------------+-----------+

recipe_detail
+------------+---------------+
| recipe_id  | ingredient_id |
+============+===============+
|     1      |      193      |
+------------+---------------+
|     1      |      194      |
+------------+---------------+
|     2      |       16      |
+------------+---------------+
|     3      |      277      |
+------------+---------------+

ingredient
+------------+---------------+---------+
|     id     |      name     |  mfr_id |
+============+===============+=========+
|     16     |       Gin     |    4    |
+------------+---------------+---------+
|     193    |       Fig     |    3    |
+------------+---------------+---------+
|     194    |       Tea     |    3    |
+------------+---------------+---------+
|     277    |       Nut     |    2    |
+------------+---------------+---------+

ingredient_mfr
+------------+------------+
|    id      |    abbr    |
+============+============+
|     2      |    TFA     |
+------------+------------+
|     3      |    FA      |
+------------+------------+
|     4      |    LOR     |
+------------+------------+

recipe_tag
+------------+------------+
|    id      |    name    |
+============+============+
|     1      |    one     |
+------------+------------+
|     2      |    two     |
+------------+------------+
|     3      |    three   |
+------------+------------+
|     4      |    four    |
+------------+------------+
|     5      |    five    |
+------------+------------+
|     6      |    six     |
+------------+------------+
|     7      |    seven   |
+------------+------------+
|     8      |    eight   |
+------------+------------+
|     9      |    nine    |
+------------+------------+

recipe_tagmap
+------------+---------------+---------+
|     id     |   recipe_id   |  tag_id |
+============+===============+=========+
|     1      |       1       |    1    |
+------------+---------------+---------+
|     2      |       1       |    2    |
+------------+---------------+---------+
|     3      |       1       |    3    |
+------------+---------------+---------+
|     4      |       2       |    4    |
+------------+---------------+---------+
|     5      |       2       |    5    |
+------------+---------------+---------+
|     6      |       2       |    6    |
+------------+---------------+---------+
|     7      |       3       |    7    |
+------------+---------------+---------+
|     8      |       3       |    8    |
+------------+---------------+---------+
|     9      |       3       |    9    |
+------------+---------------+---------+

With my current query, my results look like this:

+------+---------+--------------+----------- ----+---------------+------------------+
|  id  |  name   |      iid     |     iname      |    mabbr      |       tag        |
+======+=========+==============+================+===============+==================+
|   1  | Test123 | 193,193,193, | Fig, Fig, Fig, | FA, FA, FA,   | one, two, three, |
|      |         | 194,194,194  | Tea, Tea, Tea  | FA, FA, FA    | one, two, three  |
+------+---------+--------------+----------------+---------------+------------------+
|   2  | Test456 | 16,16,16     | Gin, Gin, Gin  | LOR, LOR, LOR | four, five six   |
+------+---------+--------------+----------------+---------------+------------------+
|   3  | Test789 | 277,277,277  | Nut, Nut, Nut  | TFA, TFA, TFA | seven,eight,nine |
+------+---------+--------------+----------------+---------------+------------------+

What I would like my results to look like:

+------+---------+--------------+----------- ----+---------------+------------------+
|  id  |  name   |      iid     |     iname      |    mabbr      |       tag        |
+======+=========+==============+================+===============+==================+
|   1  | Test123 |   193, 194   |    Fig, Tea    |    FA, FA     | one, two, three, |
+------+---------+--------------+----------------+---------------+------------------+
|   2  | Test456 |      16      |      Gin       |     LOR       | four, five six   |
+------+---------+--------------+----------------+---------------+------------------+
|   3  | Test789 |     277      |      Nut       |     TFA       | seven,eight,nine |
+------+---------+--------------+----------------+---------------+------------------+

As you can see, the presence of multiple tags causes the ingredient data to duplicate. The presence of multiple ingredients causes the tags to duplicate. I have tried to use DISTINCT, but sometimes I will have multiple ingredients and each one of those will return it's own "mabbr", which may be the same it's other ingredient (see first row of expected results). Using DISTINCT, it will only return one instance of that "mabbr".

Is there a change I can make to my query to achieve what I'd like to do?

SQL Fiddle

Best Answer

You have identified the source of the problem: that recipe is joined to two tables, recipe_detail and recipe_tagmap (and these to several other tables related to respectively "ingredients" and "tags"), and recipe is having one-to-many relationships with both of them.

One solution is to individually GROUP BY and aggregate first (one aggregation for the list of the tables related to ingredients and another for the group of tables related to tags, and then join back (again) to the main table (recipe):

SELECT recipe.*, 
       iid,  
       iname, 
       mabbr, 
       tag
FROM  recipe

  LEFT JOIN 
    ( SELECT recipe_detail.recipe_id,
             GROUP_CONCAT(recipe_detail.ingredient_id) AS iid,  
             GROUP_CONCAT(ingredient.name) AS iname, 
             GROUP_CONCAT(ingredient_mfr.abbr) AS mabbr
      FROM recipe
        JOIN recipe_detail
          ON recipe.id = recipe_detail.recipe_id
        LEFT JOIN ingredient
          ON recipe_detail.ingredient_id = ingredient.id
        LEFT JOIN ingredient_mfr
          ON ingredient.mfr_id = ingredient_mfr.id
      WHERE recipe.user_id = 1
      GROUP BY recipe_detail.recipe_id
    ) AS details
        ON recipe.id = details.recipe_id

  LEFT JOIN
    ( SELECT recipe_tagmap.recipe_id,
             GROUP_CONCAT(recipe_tag.name) AS tag 
      FROM recipe
        JOIN recipe_tagmap
          ON recipe.id = recipe_tagmap.recipe_id
        LEFT JOIN recipe_tag
         ON recipe_tagmap.tag_id = recipe_tag.id
      WHERE recipe.user_id = 1
      GROUP BY recipe_tagmap.recipe_id
    ) AS tags
      ON recipe.id = tags.recipe_id

WHERE recipe.user_id = 1 ;

Tested at: SQL-Fiddle

(Using the recipe table inside the 2 aggregations is not strictly needed but since you only want the recipes of one user, it will help for efficiency, restricting the number of rows retrieved from several tables and aggregated.)