SQL Server Cardinality Estimates – Why Subquery Reduces Row Estimate to 1

cardinality-estimatessql serversql-server-2016

Consider the following contrived but simple query:

SELECT 
  ID
, CASE
    WHEN ID <> 0 
    THEN (SELECT TOP 1 ID FROM X_OTHER_TABLE) 
    ELSE (SELECT TOP 1 ID FROM X_OTHER_TABLE_2) 
  END AS ID2
FROM X_HEAP;

I would expect the final row estimate for this query to be equal to the number of rows in the X_HEAP table. Whatever I'm doing in the subquery shouldn't matter for the row estimate because it cannot filter out any rows. However, on SQL Server 2016 I see the row estimate reduced to 1 because of the subquery:

Why does this happen? What can I do about it?

It's very easy to reproduce this issue with the right syntax. Here is one set of table definitions that will do it:

CREATE TABLE dbo.X_HEAP (ID INT NOT NULL)
CREATE TABLE dbo.X_OTHER_TABLE (ID INT NOT NULL);
CREATE TABLE dbo.X_OTHER_TABLE_2 (ID INT NOT NULL);

INSERT INTO dbo.X_HEAP WITH (TABLOCK)
SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values;

CREATE STATISTICS X_HEAP__ID ON X_HEAP (ID) WITH FULLSCAN;

db fiddle link.

Best Answer

This cardinality estimation (CE) issue surfaces when:

The join is an outer join with a pass-through predicate
The selectivity of the pass-through predicate is estimated to be exactly 1.

Note: The particular calculator used to determine the selectivity is not important.

Details

The CE computes the selectivity of the outer join as the sum of:

The inner join selectivity with the same predicate
The anti join selectivity with the same predicate

The only difference between an outer and inner join is that an outer join also returns rows that do not match on the join predicate. The anti join provides exactly this difference. Cardinality estimation for inner and anti join is easier than for outer join directly.

The join selectivity estimation process is very straightforward:

First, the selectivity S_PT of the pass-through predicate is assessed.
- This is done using whichever calculator is appropriate to the circumstances.
- The predicate is the whole thing, including any negating IsFalseOrNull component.
Inner join selectivity := 1 - S_PT
Anti join selectivity := S_PT

The anti join represents rows that will 'pass through' the join. The inner join represents rows that will not 'pass through'. Note that 'pass through' means rows that flow through the join without running the inner side at all. To emphasise: all rows will be returned by the join, the distinction is between rows that run the inner side of the join before emerging, and those that do not.

Clearly, adding 1 - S_PT to S_PT should always give a total selectivity of 1, meaning all rows are returned by the join, as expected.

Indeed, the above calculation works exactly as described for all values of S_PT except 1.

When S_PT = 1, both inner join and anti join selectivities are estimated to be zero, resulting in a cardinality estimate (for the join as a whole) of one row. As far as I can tell, this is unintentional, and should be reported as a bug.

A related issue

This bug is more likely to manifest than one might think, due to a separate CE limitation. This arises when the CASE expression uses an EXISTS clause (as is common). For example the following modified query from the question does not encounter the unexpected cardinality estimate:

-- This is fine
SELECT 
    CASE
        WHEN XH.ID = 1
        THEN (SELECT TOP (1) XOT.ID FROM dbo.X_OTHER_TABLE AS XOT) 
    END
FROM dbo.X_HEAP AS XH;

Introducing a trivial EXISTS does cause the issue to surface:

-- This is not fine
SELECT 
    CASE
        WHEN EXISTS (SELECT 1 WHERE XH.ID = 1)
        THEN (SELECT TOP (1) XOT.ID FROM dbo.X_OTHER_TABLE AS XOT) 
    END
FROM dbo.X_HEAP AS XH;

Using EXISTS introduces a semi join (highlighted) to the execution plan:

The estimate for the semi join is fine. The problem is that the CE treats the associated probe column as a simple projection, with a fixed selectivity of 1:

Semijoin with probe column treated as a Project.

Selectivity of probe column = 1

This automatically meets one of the conditions required for this CE issue to manifest, regardless of the contents of the EXISTS clause.

For important background information, see Subqueries in CASE Expressions by Craig Freedman.

Best Answer

Details

A related issue

Related Solutions

Sql-server – Why is an additional filtered statistic being ignored (EAV schema)

SQL Server 2012 – Why Does Recursive CTE Estimate Just 1 Row

Related Question