I have 2 different queries for identifying dups (below). The only difference between the 2 queries is that one uses a CTE and the other uses a #Temp table.
Does anyone know why the CTE is so much faster than the #Temp table (0:20 seconds VS. 1:22)?
I would rather use the CTE but I need to run 2 STATEMENTS using the CTE (DELETE from CTE and then INSERT INTO table FROM CTE) but SQL Server only allows you to write one STATEMENT on the CTE.
Query 1:
WITH DUPS AS(
Id,
Column1,
Column2,
Column3,
RN = ROW_NUMBER()OVER(PARTITION BY ID ORDER BY id)
FROM mytable
)
Select top 1 * FROM DUPS WHERE RN > 1
Query 2:
SELECT
Id,
Column1,
Column2,
Column3,
RN = ROW_NUMBER()OVER(PARTITION BY ID ORDER BY id)
INTO #DUPS
FROM mytable
Select top 1 * FROM #DUPS WHERE RN > 1
Best Answer
I believe if you look at the execution plans you'll see that the CTE gives you a better plan because it can take better advantage of the
TOP
. The Temp table will have to create the table in TempDB, persist all of the data to disk, and then select the top record. If you remove theTOP
from both queries I bet you'll get much closer performance. That being said I would generally expect the CTE to out perform the temp table in a single use scenario like you have here since the CTE is doing less work than creating/inserting into a temp table.This is probably just sample code but it is worth mentioning your results are non-determinate because you're using
TOP
without anORDER BY
clause.For further reading you should check out this question. This answer to that question seems to apply to your situation in particular.
For the implied second half of your question:
Since you say you need to use the data from the CTE at least twice then you have four options:
I tend to prefer the option 2 (table variable) or option 4 (tailored CTE's) approach. I don't like the duplication and extra maintenance of copy/pasted CTE's. I also like the explicitly reduced scope of the table variable over a temp table. For an authoritative treatment on the differences between table variables and temp tables check out this answer. As far as choosing between options 2 and 4; that will depend on whatever your profiling dictates and/or what constraints you have to work within.