Optimizing MySQL Queries – Replace Left Joins with Union

MySQL

I'm trying to optimize a query that returns the membership of objects in two groups. Group "a" is a one-to-one link table (ie. object_id is a unique foreign key in group_a) and group "b" is one-to-many (object_id can have multiple entries in group_b). Here is a simplified version of the query:

-- show objects and their membership w/ LEFT JOIN
SELECT DISTINCT o.name,
    a.object_id IS NOT NULL as in_group_a,
    b.object_id IS NOT NULL as in_group_b
FROM object o
LEFT JOIN group_a a ON (a.object_id = o.object_id)
LEFT JOIN group_b b ON (b.object_id = o.object_id)
WHERE a.object_id IS NOT NULL
OR b.object_id IS NOT NULL;

It returns something like this:

+--------+------------+------------+
| name   | in_group_a | in_group_b |
+--------+------------+------------+
| Fred   |          1 |          0 |
| Barney |          1 |          1 |
| Betty  |          0 |          1 |
+--------+------------+------------+

Note that Barney is in both groups but only one row is returned; this is critical. I tried to rewrite this query with a union:

-- show objects and their membership w/ UNION
SELECT o.name, tmp.in_group_a, tmp.in_group_b
FROM ((
    SELECT a.object_id, 1 as in_group_a, 0 as in_group_b
    FROM group_a a
) UNION (
    SELECT b.object_id, 0 as in_group_a, 1 as in_group_b
    FROM group_b b
)) tmp
INNER JOIN object o ON (tmp.object_id = o.object_id);

This query is 10 times faster but I get multiple rows for objects in both groups:

+--------+------------+------------+
| name   | in_group_a | in_group_b |
+--------+------------+------------+
| Fred   |          1 |          0 |
| Barney |          1 |          0 |
| Barney |          0 |          1 |
| Betty  |          0 |          1 |
+--------+------------+------------+

I'm guessing the original query is slow because it starts with the object table, which is large and most entries are not in either group.

Is it possible to make the union query return the results I want, and if not is there another way to optimize this query?

FYI, here are my simplified example tables and data:

CREATE TABLE object (
    object_id INTEGER UNSIGNED NOT NULL DEFAULT 0,
    name VARCHAR(64) NOT NULL,
    PRIMARY KEY (object_id)
);

CREATE TABLE group_a (
    object_id INTEGER UNSIGNED NOT NULL DEFAULT 0,
    UNIQUE KEY (object_id),
    FOREIGN KEY (object_id) REFERENCES object (object_id)
);

CREATE TABLE group_b (
    group_b_id INTEGER UNSIGNED NOT NULL DEFAULT 0,
    object_id INTEGER UNSIGNED NOT NULL DEFAULT 0,
    description VARCHAR(64) NOT NULL,
    UNIQUE KEY (group_b_id),
    KEY (object_id),
    FOREIGN KEY (object_id) REFERENCES object (object_id)
);

INSERT INTO object VALUES (1, "Fred"), (2, "Barney"), (3, "Wilma"), (4, "Betty");
INSERT INTO group_a VALUES (1), (2);
INSERT INTO group_b VALUES (1, 2, "a"), (2, 4, "a"), (3, 4, "b"), (4, 4, "c");

Thanks!

Best Answer

@Jorge Campos is correct

SELECT o.name, sum(tmp.in_group_a), sum(tmp.in_group_b)
FROM ((
   SELECT a.object_id, 1 as in_group_a, 0 as in_group_b
   FROM group_a a
   ) UNION (
   SELECT b.object_id, 0 as in_group_a, 1 as in_group_b
   FROM group_b b
   )) tmp
INNER JOIN object o ON (tmp.object_id = o.object_id)
GROUP BY o.name;