I think the general advice of this community is to avoid temp tables in favor of CTEs. However, I sometimes encounter situations in which CTE constructions are very slow, while their temp table equivalents are very fast.
For example, this spins for hours and never seems to yield results. The query plan is full of nested loops.
CREATE TABLE #TRIANGLES
(
NODE_A VARCHAR(22),
NODE_B VARCHAR(22),
NODE_C VARCHAR(22)
)
;
INSERT INTO #TRIANGLES VALUES
/* 150,000 ROWS */
;
CREATE NONCLUSTERED INDEX IDX_A ON #TRIANGLES (NODE_A);
CREATE NONCLUSTERED INDEX IDX_B ON #TRIANGLES (NODE_B);
CREATE NONCLUSTERED INDEX IDX_C ON #TRIANGLES (NODE_C);
WITH
TRIANGLES_FILTERED AS
(
-- **** FILTERING OF THE TRIANGLE TABLE OCCURS IN A CTE ****
SELECT *
FROM #TRIANGLES AS T
WHERE LEN(T.NODE_A) = 2 AND
LEN(T.NODE_B) = 2 AND
LEN(T.NODE_C) = 2
),
CONNECTABLE_NODES AS
(
SELECT DISTINCT T1.NODE_C AS [NODE]
FROM TRIANGLES_FILTERED AS T1
INNER JOIN
TRIANGLES_FILTERED AS T2
ON T1.NODE_B = T2.NODE_A AND
T1.NODE_C = T2.NODE_B
INNER JOIN
TRIANGLES_FILTERED AS T3
ON T2.NODE_B = T3.NODE_A AND
T2.NODE_C = T3.NODE_B
WHERE T1.NODE_A <> T2.NODE_C AND
T1.NODE_A <> T3.NODE_C AND
T2.NODE_A <> T3.NODE_C
)
SELECT *
FROM #TRIANGLES AS T1
WHERE T1.NODE_A IN (SELECT * FROM CONNECTABLE_NODES) AND
T1.NODE_B IN (SELECT * FROM CONNECTABLE_NODES) AND
T1.NODE_C IN (SELECT * FROM CONNECTABLE_NODES)
;
Query plan:
https://www.brentozar.com/pastetheplan/?id=rk_5TaiiP
Whereas, the query plan for this uses hash matches and it runs in a flash:
CREATE TABLE #TRIANGLES
(
NODE_A VARCHAR(22),
NODE_B VARCHAR(22),
NODE_C VARCHAR(22)
)
;
INSERT INTO #TRIANGLES VALUES
/* 150,000 ROWS */
;
CREATE NONCLUSTERED INDEX IDX_A ON #TRIANGLES (NODE_A);
CREATE NONCLUSTERED INDEX IDX_B ON #TRIANGLES (NODE_B);
CREATE NONCLUSTERED INDEX IDX_C ON #TRIANGLES (NODE_C);
-- **** FILTERING OF THE TRIANGLE TABLE SAVED INTO A TEMP TABLE ****
SELECT *
INTO #TRIANGLES_FILTERED
FROM #TRIANGLES AS T
WHERE LEN(T.NODE_A) = 2 AND
LEN(T.NODE_B) = 2 AND
LEN(T.NODE_C) = 2
;
CREATE NONCLUSTERED INDEX IDX_A ON #TRIANGLES_FILTERED (NODE_A);
CREATE NONCLUSTERED INDEX IDX_B ON #TRIANGLES_FILTERED (NODE_B);
CREATE NONCLUSTERED INDEX IDX_C ON #TRIANGLES_FILTERED (NODE_C);
WITH
CONNECTABLE_NODES AS
(
SELECT DISTINCT T1.NODE_C AS [NODE]
FROM #TRIANGLES_FILTERED AS T1
INNER JOIN
#TRIANGLES_FILTERED AS T2
ON T1.NODE_B = T2.NODE_A AND
T1.NODE_C = T2.NODE_B
INNER JOIN
#TRIANGLES_FILTERED AS T3
ON T2.NODE_B = T3.NODE_A AND
T2.NODE_C = T3.NODE_B
WHERE T1.NODE_A <> T2.NODE_C AND
T1.NODE_A <> T3.NODE_C AND
T2.NODE_A <> T3.NODE_C
)
SELECT *
FROM #TRIANGLES AS T1
WHERE T1.NODE_A IN (SELECT * FROM CONNECTABLE_NODES) AND
T1.NODE_B IN (SELECT * FROM CONNECTABLE_NODES) AND
T1.NODE_C IN (SELECT * FROM CONNECTABLE_NODES)
;
Query plan:
https://www.brentozar.com/pastetheplan/?id=B1cZC6isD
How would I rewrite the first one to be as fast as a second one?
BTW, if you're wondering what all the geometry/topology is about, I needed to know how all the triangles connect with each other in the creation of this puzzle:
https://puzzling.stackexchange.com/questions/105275/dragon-summoning-spell
Best Answer
Sometimes CTE has got the wrong estimation. The temp table is good at it.
So, the CTE uses those indexes because they think fewer rows are there. The reason for the slowness of the first one is RID Lookup. If you drop your indexes or add your output column as include on your index. It will faster.
There is an awesome blog post here.
I think there is no win between them. You should use them to depends on the situation. And try them both in the same situation. In this way you can see the costs.