Sql-server – Why doesn’t this recursive CTE with a parameter use an index when it does with a literal

cteexecution-planoptimizationrecursivesql server

I am using a recursive CTE on a tree structure to list all the descendants of a particular node in the tree. If I write a literal node value in my WHERE clause, SQL Server seems to actually apply the CTE just to that value, giving a query plan with low actual rows counts, et cetera:

However, if I pass the value as a parameter, it seems to realize (spool) the CTE and then filter it after the fact:

I could be reading the plans wrong. I haven’t noticed a performance issue, but I am worried that realization of the CTE could cause issues with larger data sets, especially in a busier system. Also, I normally compound this traversal on itself: I traverse up to ancestors and back down to descendants (to ensure that I gather all related nodes). Due to how my data is, each set of “related” nodes is rather small, so realization of the CTE doesn’t make sense. And when SQL Server seems to realize the CTE, it is giving me some quite large numbers in its “actual” counts.

Is there a way to get the parameterized version of the query to act like the literal version? I want to put the CTE in a reusable view.

Query with literal:

CREATE PROCEDURE #c AS BEGIN;
    WITH descendants AS (SELECT
         t.ParentId Id
        ,t.Id DescendantId
    FROM #tree t
    WHERE t.ParentId IS NOT NULL
    UNION ALL SELECT
         d.Id
        ,t.Id DescendantId
    FROM descendants d
    JOIN #tree t ON d.DescendantId = t.ParentId)
    SELECT d.*
    FROM descendants d
    WHERE d.Id = 24
    ORDER BY d.Id, d.DescendantId;
END;
GO
EXEC #c;

Query with parameter:

CREATE PROCEDURE #c (@Id BIGINT) AS BEGIN;
    WITH descendants AS (SELECT
         t.ParentId Id
        ,t.Id DescendantId
    FROM #tree t
    WHERE t.ParentId IS NOT NULL
    UNION ALL SELECT
         d.Id
        ,t.Id DescendantId
    FROM descendants d
    JOIN #tree t ON d.DescendantId = t.ParentId)
    SELECT d.*
    FROM descendants d
    WHERE d.Id = @Id
    ORDER BY d.Id, d.DescendantId;
END;
GO
EXEC #c 24;

Setup code:

DECLARE @count BIGINT = 100000;
CREATE TABLE #tree (
     Id BIGINT NOT NULL PRIMARY KEY
    ,ParentId BIGINT
);
CREATE INDEX tree_23lk4j23lk4j ON #tree (ParentId);
WITH number AS (SELECT
         CAST(1 AS BIGINT) Value
    UNION ALL SELECT
         n.Value * 2 + 1
    FROM number n
    WHERE n.Value * 2 + 1 <= @count
    UNION ALL SELECT
         n.Value * 2
    FROM number n
    WHERE n.Value * 2 <= @count)
INSERT #tree (Id, ParentId)
SELECT n.Value, CASE WHEN n.Value % 3 = 0 THEN n.Value / 4 END
FROM number n;

Best Answer

Randi Vertongen's answer correctly addresses how you can get the plan you want with the parameterized version of the query. This answer supplements that by addressing the title of the question in case you are interested in the details.

SQL Server rewrites tail-recursive common table expressions (CTEs) as iteration. Everything from the Lazy Index Spool down is the runtime implementation of the iterative translation. I wrote a detailed account of how this section of an execution plan works in answer to Using EXCEPT in a recursive common table expression.

You want to specify a predicate (filter) outside the CTE and have the query optimizer push this filter down inside the recursion (rewritten as iteration) and have it applied to the anchor member. This will mean the recursion starts with only those records that match ParentId = @Id.

This is quite a reasonable expectation, whether a literal value, variable, or parameter is used; however, the optimizer can only do things for which rules have been written. Rules specify how a logical query tree is modified to achieve a particular transformation. They include logic to make sure the end result is safe - i.e. it returns exactly the same data as the original query specification in all possible cases.

The rule responsible for pushing predicates on a recursive CTE is called SelOnIterator - a relational selection (= predicate) on an iterator implementing recursion. More precisely, this rule can copy a selection down to the anchor part of recursive iteration:

Sel(Iter(A,R)) -> Sel(Iter(Sel(A),R))

This rule can be disabled with the undocumented hint OPTION(QUERYRULEOFF SelOnIterator). When this is used, the optimizer can no longer push predicates with a literal value down to the anchor of a recursive CTE. You don't want that, but it illustrates the point.

Originally, this rule was limited to working on predicates with literal values only. It could also be made to work with variables or parameters by specifying OPTION (RECOMPILE), since that hint enables the Parameter Embedding Optimization, whereby the runtime literal value of the variable (or parameter) is used when compiling the plan. The plan is not cached, so the downside of this is a fresh compilation on each execution.

At some point, the SelOnIterator rule was improved to also work with variables and parameters. To avoid unexpected plan changes, this was protected under the 4199 trace flag, database compatibility level, and query optimizer hotfix compatibility level. This is quite a normal pattern for optimizer improvements, which are not always documented. Improvements are normally good for most people, but there is always a chance that any change will introduce a regression for someone.

I want to put the CTE in a reusable view

You could use an inline table-valued function instead of a view. Provide the value you want to push down as a parameter, and place the predicate in the recursive anchor member.

If you prefer, enabling trace flag 4199 globally is also an option. There are many optimizer changes covered by this flag, so you would need to carefully test your workload with it enabled, and be prepared to handle regressions.

Best Answer

Related Solutions

Sql-server – Oracle GoldenGate add trandata errors

Sql-server – Why does Recursive CTE estimate just 1 row

Related Question