You say that the hierarchy gets modified. Presumably some amount of blocking takes place while this operation is running, then?
Even if the hierarchy is changing, are the roots for items changing?
Have you looked at the time it would take to just make the mapping table from root to item and index it?
I would like to see the execution plan to see what is happening - the CTE should get spooled, but as a manually materialized and indexed table it might perform better in the later steps.
Even with heavy activity, it would seem to me that someone has to be blocked if DML operations are changing data which this process is reading.
So I'd strongly consider taking a snapshot of the hierarchy.
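For illustration, a minimal sketch of that snapshot approach, materializing the hierarchy walk into an indexed temporary table up front (dbo.Hierarchy, #HierarchyMap, and the column names are hypothetical stand-ins for your schema):

-- Snapshot the root-to-item mapping once, before the main query runs.
CREATE TABLE #HierarchyMap
(
    RootID integer NOT NULL,
    ItemID integer NOT NULL
);

WITH Walk AS
(
    -- Anchor: each top-level item is its own root
    SELECT H.ItemID AS RootID, H.ItemID
    FROM dbo.Hierarchy AS H
    WHERE H.ParentID IS NULL

    UNION ALL

    -- Recursive step: children inherit their ancestor's root
    SELECT W.RootID, H.ItemID
    FROM Walk AS W
    JOIN dbo.Hierarchy AS H
        ON H.ParentID = W.ItemID
)
INSERT #HierarchyMap (RootID, ItemID)
SELECT W.RootID, W.ItemID
FROM Walk AS W;

-- Index the snapshot so the later joins can seek rather than scan.
CREATE UNIQUE CLUSTERED INDEX cx ON #HierarchyMap (RootID, ItemID);

The main query then joins to #HierarchyMap instead of the recursive CTE, which also gives the optimizer real statistics to work with.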
In addition, you have a number of other INNER JOINs - you should review whether it is, in fact, the CTEs at all and whether there are any indexes which are missing to make those joins effective. The execution plan should tell you that.
You appear to have quite a few things in the WHERE clause which might help reduce some operations (and determine which indexes might be the best), but it's hard to tell without looking at the execution plan or the indexes.
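As a purely illustrative sketch (the right index depends entirely on your actual join keys and WHERE clause, which the execution plan will reveal), the shape to look for is an index keyed on the join column, with frequently filtered and selected columns included:

-- Hypothetical example: supports a join on CustomerID filtered by OrderDate.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_OrderDate
ON dbo.Orders (CustomerID, OrderDate)
INCLUDE (Amount);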
I would have guessed that when a query includes TOP n the database engine would run the query ignoring the TOP clause, and then at the end just shrink that result set down to the n number of rows that was requested. The graphical execution plan seems to indicate this is the case -- TOP is the "last" step. But it appears there is more going on.
The way the above is phrased makes me think you may have an incorrect mental picture of how a query executes. An operator in a query plan is not a step (where the full result set of a previous step is evaluated by the next one).

SQL Server uses a pipelined execution model, where each operator exposes methods like Init(), GetRow(), and Close(). As the GetRow() name suggests, an operator produces one row at a time on demand (as required by its parent operator). This is documented in the Books Online Logical and Physical Operators reference, with more detail in my blog post Why Query Plans Run Backwards. This row-at-a-time model is essential to forming a sound intuition for query execution.
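A quick way to see the row-at-a-time model in action (a sketch; any sufficiently large table will do in place of the hypothetical dbo.BigTable):

-- Returns almost immediately even over a huge table: the Top operator
-- calls GetRow() on its child once, receives a row, and never asks again.
SELECT TOP (1) BT.*
FROM dbo.BigTable AS BT;

-- Contrast with an aggregate, which is a blocking operator: it must
-- consume every row from its child before it can produce its single row.
SELECT COUNT_BIG(*)
FROM dbo.BigTable AS BT;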
My question is, how (and why) does a TOP n clause impact the execution plan of a query?
Some logical operations like TOP, semi joins, and the FAST n query hint affect the way the query optimizer costs execution plan alternatives. The basic idea is that one possible plan shape might return the first n rows more quickly than a different plan that was optimized to return all rows.
For example, indexed nested loops join is often the fastest way to return a small number of rows, though hash or merge join with scans might be more efficient on larger sets. The way the query optimizer reasons about these choices is by setting a Row Goal at a particular point in the logical tree of operations.
A row goal modifies the way query plan alternatives are costed. The essence of it is that the optimizer starts by costing each operator as if the full result set were required, sets a row goal at the appropriate point, and then works back down the plan tree estimating the number of rows it expects to need to examine to meet the row goal.
For example, a logical TOP(10) sets a row goal of 10 at a particular point in the logical query tree. The costs of operators leading up to the row goal are modified to estimate how many rows they need to produce to meet the row goal. This calculation can become complex, so it is easier to understand all this with a fully worked example and annotated execution plans. Row goals can affect more than the choice of join type or whether seeks and lookups are preferred to scans. More details on that here.
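As a sketch of the effect (dbo.T1 and dbo.T2 are hypothetical tables joined on an indexed key):

-- Optimized for all rows: likely a hash or merge join with scans.
SELECT T1.col1, T2.col2
FROM dbo.T1 AS T1
JOIN dbo.T2 AS T2
    ON T2.id = T1.id;

-- TOP (10) sets a row goal of 10, so the optimizer may instead prefer
-- an indexed nested loops join that can deliver the first ten rows
-- without reading the remainder of either table.
SELECT TOP (10) T1.col1, T2.col2
FROM dbo.T1 AS T1
JOIN dbo.T2 AS T2
    ON T2.id = T1.id;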
As always, an execution plan selected on the basis of a row goal is subject to the optimizer's reasoning abilities and the quality of information provided to it. Not every plan with a row goal will produce the required number of rows faster in practice, but according to the costing model it will.
Where a row goal plan proves not to be faster, there are usually ways to modify the query or provide better information to the optimizer such that the naturally selected plan is best. Which option is appropriate in your case depends on the details of course. The row goal feature is generally very effective (though there is a bug to watch out for when used in parallel execution plans).
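Where a row goal plan does turn out to be a poor choice, one option (assuming SQL Server 2016 SP1 or later; on older builds, documented trace flag 4138 has a similar effect) is to ask the optimizer to cost the plan as if no row goal were present:

SELECT TOP (10) T1.col1
FROM dbo.T1 AS T1
ORDER BY T1.col1
OPTION (USE HINT ('DISABLE_OPTIMIZER_ROWGOAL'));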
Your particular query and plan may not be suitable for detailed analysis here (by all means provide an actual execution plan if you wish) but hopefully the ideas outlined here will allow you to make forward progress.
Best Answer
Cardinality estimation for recursive common table expressions is extremely limited.
Under the original cardinality estimation model, the estimate is a simple sum of the cardinality estimates for the anchor and recursive parts. This is equivalent to assuming the recursive part is executed exactly once.
In SQL Server 2014, with the new cardinality estimation model enabled, the logic is slightly changed to assume three executions of the recursive part, with the same number of rows returned by each iteration.
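For illustration: if the anchor part is estimated at 10 rows and the recursive part at 10 rows per execution, the original model estimates 10 + 10 = 20 rows in total, while the 2014 model estimates 10 + (3 × 10) = 40 rows, regardless of how deep the recursion actually runs.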
Both of these are uneducated guesses, so it is no surprise that the use of recursive CTEs usually results in poor quality estimates. More generally, estimating the result of a recursive process is nigh-on impossible, so the optimizer doesn't even try. This is not changed by using a particularly simple recursive structure, clearly intended to produce sequential numbers - the optimizer has no logic to detect this pattern.
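The pattern being discussed looks something like this minimal sketch (a common way to generate a number sequence on the fly; the exact query here is an assumption):

WITH Nums AS
(
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1
    FROM Nums
    WHERE n < 100
)
SELECT n
FROM Nums
OPTION (MAXRECURSION 0); -- needed if the recursion exceeds the default depth of 100

However obvious the intent is to a human reader, the optimizer sees only an anchor and a recursive reference.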
In your particular case, the final estimate is one row because the optimizer makes further guesses about the selectivity of predicates like [Recr1007] < (100) (in the Filter at Node ID 3) and ([Recr1003] * (100) + [Recr1007]) <= (10000) (the residual predicate on the Nested Loops Join at Node ID 2). Again, these are guesses, and the results are unfortunate, though not surprising.

Not that I am aware of.
Not directly in those terms. There have been plenty of requests for materializing a CTE, which would help if such materialization came with automatic statistics generation. I won't list the others because you seem to have commented on most of those suggestions already :)
There have also been suggestions for selectivity hints, but nothing like this has made its way into the product yet.
As noted in comments on the question, your best bet right now is to use a real numbers table instead of generating one on-the-fly using a recursive CTE. A second option is to use manual materialization - a temporary table - as I am sure you are aware.
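A minimal sketch of the real numbers table option (names illustrative; populate once and reuse):

-- One-time setup: a permanent numbers table with a unique clustered index
-- gives the optimizer accurate row counts and statistics to work with.
CREATE TABLE dbo.Numbers
(
    n integer NOT NULL,
    CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (n)
);

INSERT dbo.Numbers (n)
SELECT TOP (10000)
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns AS C1
CROSS JOIN sys.all_columns AS C2;

The temporary table approach works the same way: insert the CTE output into a #temp table, index it, and join to that instead.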