Schema
I have the following set-up in MySQL database:
CREATE TABLE items (
id SERIAL,
name VARCHAR(100),
group_id INT,
price DECIMAL(10,2),
KEY items_group_id_idx (group_id),
PRIMARY KEY (id)
);
INSERT INTO items VALUES
(1, 'Item A', NULL, 10),
(2, 'Item B', NULL, 20),
(3, 'Item C', NULL, 30),
(4, 'Item D', 1, 40),
(5, 'Item E', 2, 50),
(6, 'Item F', 2, 60),
(7, 'Item G', 2, 70);
Problem
I need to select:
- All items with
group_id
that hasNULL
value, and - One item from each group identified by
group_id
having the lowest price.
Expected results
+----+--------+----------+-------+
| id | name | group_id | price |
+----+--------+----------+-------+
| 1 | Item A | NULL | 10.00 |
| 2 | Item B | NULL | 20.00 |
| 3 | Item C | NULL | 30.00 |
| 4 | Item D | 1 | 40.00 |
| 5 | Item E | 2 | 50.00 |
+----+--------+----------+-------+
Possible solution 1
Two queries with UNION ALL
:
SELECT id, name, group_id, price FROM items
WHERE group_id IS NULL
UNION ALL
SELECT id, name, MIN(price) FROM items
WHERE group_id IS NOT NULL
GROUP BY group_id;
/* EXPLAIN */
+----+--------------+------------+------+--------------------+--------------------+---------+-------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+------------+------+--------------------+--------------------+---------+-------+------+----------------------------------------------+
| 1 | PRIMARY | items | ref | items_group_id_idx | items_group_id_idx | 5 | const | 3 | Using where |
| 2 | UNION | items | ALL | items_group_id_idx | NULL | NULL | NULL | 7 | Using where; Using temporary; Using filesort |
| NULL | UNION RESULT | <union1,2> | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+------------+------+--------------------+--------------------+---------+-------+------+----------------------------------------------+
However it is undesirable to have two queries since there will be more complex condition in WHERE
clause and I would need to sort the final results.
Possible solution 2
GROUP BY
on expression (reference)
SELECT id, name, group_id, MIN(price) FROM items
GROUP BY CASE WHEN group_id IS NOT NULL THEN group_id ELSE RAND() END;
/* EXPLAIN */
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
| 1 | SIMPLE | items | ALL | NULL | NULL | NULL | NULL | 7 | Using temporary; Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
Solution 2 seems to be faster and simple to use but I'm wondering whether there is a better approach in terms of performance.
Update: According to MySQL documentation, this query is illegal in SQL92 and earlier and may work in MySQL only.
Best Answer
First, do not use either of your two queries. Both have a group by some column (
GROUP BY group_id
) and then select other columns, non-aggregated (SELECT id, name
). This may give you wrong and unexpected results, despite that it may work in your tests, with some small sized table.Second, the
UNION ALL
is not a problem. If the two subqueries perform efficiently, then the final union is ok, too. If you need a sort, the efficiency will depend on how that sort differs from the indexes used.Now, the problem of "groupwise-max" or "greatest-n-per-group" has many solutions (and even a tag, both at SO and here). There are two sub-problems, depending on whether ties can happen and what the wanted results are in those cases.
If you want all the tied rows, the solution with
GROUP BY
inside a derived table is usually good. In your case, that you want just one row returned per group, another approach is easier to write and usually performs very well when there is a small number of group overall:An index on
(group_id, price, id)
will be helpful