SQL Server – Using EXCEPT in a Recursive Common Table Expression

exceptexecution-planrecursivesql serversql-server-2008

Why does the following query return infinite rows? I would have expected the EXCEPT clause to terminate the recursion..

with cte as (
    select *
    from (
        values(1),(2),(3),(4),(5)
    ) v (a)
)
,r as (
    select a
    from cte
    where a in (1,2,3)
    union all
    select a
    from (
        select a
        from cte
        except
        select a
        from r
    ) x
)
select a
from r

I came across this while trying to answer a question on Stack Overflow.

Best Answer

See Martin Smith's answer for information about the current status of EXCEPT in a recursive CTE.

To explain what you were seeing, and why:

I'm using a table variable here, to make the distinction between the anchor values and recursive item clearer (it does not change the semantic).

DECLARE @V TABLE (a INTEGER NOT NULL)
INSERT  @V (a) VALUES (1),(2)
;
WITH rCTE AS 
(
    -- Anchor
    SELECT
        v.a
    FROM @V AS v

    UNION ALL

    -- Recursive
    SELECT
        x.a
    FROM
    (
        SELECT
            v2.a
        FROM @V AS v2

        EXCEPT

        SELECT
            r.a
        FROM rCTE AS r
    ) AS x
)
SELECT
    r2.a
FROM rCTE AS r2
OPTION (MAXRECURSION 0)

The query plan is:

Recursive CTE Plan

Execution starts at the root of the plan (the SELECT) and control passes down the tree to the Index Spool, Concatenation, and then to the top-level Table Scan.

The first row from the scan passes up the tree and is (a) stored in the Stack Spool, and (b) returned to the client. Which row is first is not defined, but let us assume it is the row with the value {1}, for the sake of argument. The first row to appear is therefore {1}.

Control again passes down to the Table Scan (the Concatenation operator consumes all rows from its outermost input before opening the next one). The scan emits the second row (value {2}), and this again passes up the tree to be stored on the stack and output to the client. The client has now received the sequence {1}, {2}.

Adopting a convention where the top of the LIFO stack is on the left, the stack now contains {2, 1}. As control again passes to the Table Scan, it reports no more rows, and control passes back to the Concatenation operator, which opens it's second input (it needs a row to pass up to the stack spool), and control passes to the Inner Join for the first time.

The Inner join calls the Table Spool on its outer input, which reads the top row from the stack {2} and deletes it from the worktable. The stack now contains {1}.

Having received a row on its outer input, the Inner Join passes control down its inner input to the Left Anti-Semi Join (LASJ). This requests a row from its outer input, passing control to the Sort. Sort is a blocking iterator, so it reads all rows from the table variable and sorts them ascending (as it happens).

The first row emitted by the Sort is therefore the value {1}. The inner side of the LASJ returns the current value of the recursive member (the value just popped off the stack), which is {2}. The values at the LASJ are {1} and {2} so {1} is emitted, since the values do not match.

This row {1} flows up the query plan tree to the Index (Stack) Spool where it is added to the stack, which now contains {1, 1}, and emitted to the client. The client has now received the sequence {1}, {2}, {1}.

Control now passes back to the Concatenation, back down the inner side (it returned a row last time, might do again), down through the Inner Join, to the LASJ. It reads its inner input again, obtaining the value {2} from the Sort.

The recursive member is still {2}, so this time the LASJ finds {2} and {2}, resulting in no row being emitted. Finding no more rows on its inner input (the Sort is now out of rows), control passes back up to the Inner Join.

The Inner Join reads its outer input, which results in the value {1} being popped off the stack {1, 1}, leaving the stack with just {1}. The process now repeats, with the value {2} from a new invocation of the Table Scan and Sort passing the LASJ test and being added to the stack, and passes to the client, which has now received {1}, {2}, {1}, {2}...and round we go.

My favourite explanation of the Stack spool used in recursive CTE plans is Craig Freedman's.