Coming to SQL from other programming languages, the structure of a recursive query looks rather odd. Walk through it step by step, and it seems to fall apart.
Consider the following simple example:
CREATE TABLE #NUMS
(N BIGINT);
INSERT INTO #NUMS
VALUES (3), (5), (7);
WITH R AS
(
SELECT N FROM #NUMS
UNION ALL
SELECT N*N AS N FROM R WHERE N*N < 10000000
)
SELECT N FROM R ORDER BY N;
Let's walk through it.
First, the anchor member executes and the result set is put into R. So R is initialized to {3, 5, 7}.
Then, execution drops below the UNION ALL and the recursive member is executed for the first time. It executes on R (that is, on the R that we currently have in hand: {3, 5, 7}). This results in {9, 25, 49}.
What does it do with this new result? Does it append {9, 25, 49} to the existing {3, 5, 7}, label the resulting union R, and then carry on with the recursion from there? Or does it redefine R to be only this new result {9, 25, 49} and do all the union-ing later?
Neither choice makes sense.
If R is now {3, 5, 7, 9, 25, 49} and we execute the next iteration of the recursion, then we will end up with {9, 25, 49, 81, 625, 2401} and we've lost {3, 5, 7}.
If R is now only {9, 25, 49}, then we have a mislabeling problem. R is understood to be the union of the anchor member result set and all the subsequent recursive member result sets. Whereas {9, 25, 49} is only a component of R. It is not the full R that we have accrued so far. Therefore, to write the recursive member as selecting from R makes no sense.
I certainly appreciate what @Max Vernon and @Michael S. have detailed below. Namely, that (1) all the components are created up to the recursion limit or null set, and then (2) all the components are unioned together. This is how I understand SQL recursion to actually work.
If we were redesigning SQL, maybe we would enforce a more clear and explicit syntax, something like this:
WITH R AS
(
SELECT N
INTO R[0]
FROM #NUMS
UNION ALL
SELECT N*N AS N
INTO R[K+1]
FROM R[K]
WHERE N*N < 10000000
)
SELECT N FROM R ORDER BY N;
Sort of like an inductive proof in mathematics.
The problem with SQL recursion as it currently stands is that it is written in a confusing way. The way it is written says that each component is formed by selecting from R, but it does not mean the full R that has been (or, appears to have been) constructed so far. It just means the previous component.
Best Answer
The BOL description of recursive CTEs describes the semantics of recursive execution as being as follows:
So each level only has as input the level above not the entire result set accumulated so far.
The above is how it works logically. Physically recursive CTEs are currently always implemented with nested loops and a stack spool in SQL Server. This is described here and here and means that in practice each recursive element is just working with the parent row from the previous level, not the whole level. But the various restrictions on allowable syntax in recursive CTEs mean this approach works.
If you remove the
ORDER BY
from your query the results are ordered as followsThis is because the execution plan operates very similarly to the following
C#
NB1: As above by the time the first child of anchor member
3
is being processed all information about its siblings,5
and7
, and their descendants, has already been discarded from the spool and is no longer accessible.NB2: The C# above has the same overall semantics as the execution plan but the flow in the execution plan is not identical, as there the operators work in a pipelined exection fashion. This is a simplified example to demonstrate the gist of the approach. See the earlier links for more details on the plan itself.
NB3: The stack spool itself is apparently implemented as a non unique clustered index with key column of recursion level and uniqueifiers added as needed (source)