The semantics of the two queries are not the same - UNION
removes duplicates, whereas the FULL OUTER JOIN
will not:
DECLARE @T1 AS table (id bigint NULL, val integer NULL);
DECLARE @T2 AS table (id bigint NULL, val integer NULL);
INSERT @T1 (id, val) VALUES (1, 1);
INSERT @T1 (id, val) VALUES (1, 1);
INSERT @T2 (id, val) VALUES (1, 1);
INSERT @T2 (id, val) VALUES (1, 1);
SELECT COALESCE(t1.id, t2.id) AS id, COALESCE(t1.val, t2.val) AS val
FROM @t1 t1
FULL OUTER JOIN @t2 t2
ON t2.id = t1.id
AND t2.val = t1.val;
SELECT t1.id, t1.val
FROM @t1 t1
UNION
SELECT t2.id, t2.val
FROM @t2 t2;
Output:
╔════╦═════╗
║ id ║ val ║
╠════╬═════╣
║ 1 ║ 1 ║
║ 1 ║ 1 ║
║ 1 ║ 1 ║
║ 1 ║ 1 ║
╚════╩═════╝
╔════╦═════╗
║ id ║ val ║
╠════╬═════╣
║ 1 ║ 1 ║
╚════╩═════╝
That said, the optimizer does not know many FOJN
tricks, so it is always possible that there is a better way to express the query than the natural UNION
. Only commonly-useful and always-correct transformations are implemented.
Note that with a unique constraint only on the larger table, the optimizer chooses a hash union, without expensive duplicate-removal on the probe input, that makes it choose Concat Union All in the question example:
ALTER TABLE #t2
ADD CONSTRAINT UQ2
UNIQUE CLUSTERED (id);
SELECT COUNT(*), AVG(x.id), AVG(x.val)
FROM (
SELECT t1.id, t1.val
FROM #t1 t1
UNION
SELECT t2.id, t2.val
FROM #t2 t2
) AS x;
The FOJN
rewrite may well be a useful one in cases where you know there cannot be duplicates within each input set, but this condition is not enforced with a unique constraint or index (particularly on the large input).
If such a uniqueness guarantee does exist, and yet the optimizer does not select a Hash Union, you might try an OPTION (HASH UNION)
hint, to see how it compares.
Best Answer
When you hint a query like that, you force the join order -- you've actually answered your own question.
If you want to test join types and still leave the optimizer with room to explore things, use query options, like this: