Mysql – Alternative to removing ONLY_FULL_GROUP_BY

group byMySQLmysql-5.5mysql-5.7

After upgrading from MySQL 5.5 to MySQL 5.7 I'm getting an error with some of my queries:

ERROR 1055 (42000):

Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'grocery.Product_Category.category_id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

I did my research and found the cause of the problem and how to solve it, basically I just need to remove ONLY_FULL_GROUP_BY from @@sql_mode and everything will work again.

However I was wondering if that is the right course of action. Is there an alternative for this, maybe a better way to build the query?

This is my case (http://sqlfiddle.com/#!9/6f1bd):

I have two tables (I simplified their structure here but is basically the same): Product and Category and a many to many relationship table to allow products to belong to more than one category:

SELECT * FROM Product;

+------------+---------+
| product_id | name    |
+------------+---------+
|          1 | Tomato  |
|          2 | Orange  |
|          3 | Banana  |
|          4 | Lettuce |
|          5 | Carrot  |
+------------+---------+
5 rows in set (0,00 sec)

SELECT * FROM Category;

+-------------+------------+
| category_id | name       |
+-------------+------------+
|           1 | Fruits     |
|           2 | Vegetables |
+-------------+------------+
2 rows in set (0,00 sec)

I want to get the products from both categories so the simplest query wold be:

SELECT * FROM Product JOIN Product_Category USING(product_id)
  JOIN Category USING(category_id);

+-------------+------------+---------+------------+
| category_id | product_id | name    | name       |
+-------------+------------+---------+------------+
|           1 |          1 | Tomato  | Fruits     |
|           1 |          2 | Orange  | Fruits     |
|           1 |          3 | Banana  | Fruits     |
|           2 |          1 | Tomato  | Vegetables |
|           2 |          4 | Lettuce | Vegetables |
|           2 |          5 | Carrot  | Vegetables |
+-------------+------------+---------+------------+
6 rows in set (0,00 sec

But if a product exists in both categories I just want it once, doing a DISTINCT select won't help since the category_id differs:

SELECT DISTINCT * FROM Product JOIN Product_Category USING(product_id) JOIN Category USING(category_id);

+-------------+------------+---------+------------+
| category_id | product_id | name    | name       |
+-------------+------------+---------+------------+
|           1 |          1 | Tomato  | Fruits     |
|           1 |          2 | Orange  | Fruits     |
|           1 |          3 | Banana  | Fruits     |
|           2 |          1 | Tomato  | Vegetables |
|           2 |          4 | Lettuce | Vegetables |
|           2 |          5 | Carrot  | Vegetables |
+-------------+------------+---------+------------+
6 rows in set (0,00 sec)

So with MySQL 5.5 I used a GROUP BY clause on the product_id field:

SELECT * FROM Product JOIN Product_Category USING(product_id)
  JOIN Category USING(category_id) GROUP BY product_id;

+-------------+------------+---------+------------+
| category_id | product_id | name    | name       |
+-------------+------------+---------+------------+
|           1 |          1 | Tomato  | Fruits     |
|           1 |          2 | Orange  | Fruits     |
|           1 |          3 | Banana  | Fruits     |
|           2 |          4 | Lettuce | Vegetables |
|           2 |          5 | Carrot  | Vegetables |
+-------------+------------+---------+------------+
5 rows in set (0,00 sec)

Which effectively removed the duplicate, I know the result is not deterministic but I don't care if Tomato appears listed in Fruits or Vegetables category, all I care about is getting it only once in the result set.

But this query with MySQL 5.7 causes the error mentioned above.

So, my question is: Is there another (perhaps better) way to get the same result without having to remove ONLY_FULL_GROUP_BY from @@sql_mode?

Best Answer

I would suggest rewriting the query in a way that minimises the number of columns necessary to put into GROUP BY. In your case you can do that by applying the grouping to the Product_Category table only.

According to your example, that table has the following entries:

+------------+-------------+
| product_id | category_id |
+------------+-------------+
|          1 |           1 |
|          2 |           1 |
|          3 |           1 |
|          1 |           2 |
|          4 |           2 |
|          5 |           2 |
+------------+-------------+

Since you want product names to be unique in the output, group this table by product_id, and for category_id select e.g. the minimum value per product:

SELECT
  product_id,
  MIN(category_id) AS category_id
FROM
  Product_Category
GROUP BY
  product_id

That will give you an output like this:

+------------+-------------+
| product_id | category_id |
+------------+-------------+
|          1 |           1 |
|          2 |           1 |
|          3 |           1 |
|          4 |           2 |
|          5 |           2 |
+------------+-------------+

You can see that each product is listed only once. Joining that table to the other two will not produce duplicates. Therefore, just substitute the above query, as a derived table, for the Product_Category in your query (also removing your GROUP BY from it, of course):

SELECT
  *
FROM
  Product
  JOIN (
    SELECT
      product_id,
      MIN(category_id) AS category_id
    FROM
      Product_Category
    GROUP BY
      product_id
  ) AS pc USING(product_id)
  JOIN Category USING(category_id)
;