SQL Server Execution Plans – Understanding Odd Stream Aggregate Behavior

execution-plan sql-server

Query:

declare @X xml = '
<item ID = "0"/>
<item ID = "1"/>
<item/>
<item/>';

select I.X.value('@ID', 'int')
from @X.nodes('/item') as I(X);

Result:

-----------
0
1
NULL
NULL

Execution plan:

[Image: graphical execution plan for the query above]

The top branch shreds the XML into four rows, and the bottom branch fetches the value of the ID attribute.

What strikes me as odd is the number of rows returned from the Stream Aggregate operator. The two rows that come from the Filter are the ID attributes of the first and second item nodes in the XML. The Stream Aggregate nevertheless returns four rows, one for each row on the outer input of the join, effectively turning the Inner Join into an Outer Join.

Is this something that Stream Aggregate does in other circumstances as well, or is it just something odd that happens with XML queries?

I cannot see any hint in the XML version of the query plan that this Stream Aggregate should behave differently from any other Stream Aggregate I have seen before.

Best Answer

The aggregate is a scalar aggregate (no GROUP BY clause). In SQL Server, a scalar aggregate is defined to always produce exactly one row, even if its input is empty. A vector aggregate (one with a GROUP BY clause) produces one row per group, so it returns no rows for an empty input.

For example, for a scalar aggregate, MAX of no rows is NULL and COUNT of no rows is zero. The optimizer knows all about this, and can transform an outer join into an inner join in suitable circumstances.

-- Scalar aggregate: one row containing NULL
SELECT MAX(V.v) FROM (VALUES(1)) AS V (v) WHERE V.v = 2;

-- Vector aggregate (empty grouping set): no rows
SELECT MAX(V.v) FROM (VALUES(1)) AS V (v) WHERE V.v = 2 GROUP BY ();
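
The same rule accounts for the four rows in the question. The sketch below is only a relational analogue, not the plan the XML query actually produces: the table variables @Items and @Attrs and their columns are hypothetical stand-ins for the shredded item nodes and the ID attributes that survive the Filter. Because the scalar aggregate inside the CROSS APPLY returns one row for every outer row, the apply preserves all four items, just as an outer join would.

```sql
-- Hypothetical stand-ins for the two branches of the plan:
-- @Items = the four shredded <item> nodes,
-- @Attrs = the two ID attributes that pass the Filter.
DECLARE @Items TABLE (item_no int PRIMARY KEY);
DECLARE @Attrs TABLE (item_no int PRIMARY KEY, id int NOT NULL);

INSERT @Items (item_no) VALUES (1), (2), (3), (4);
INSERT @Attrs (item_no, id) VALUES (1, 0), (2, 1);

-- Scalar aggregate per outer row: always one row, so all four
-- items are returned (ids 0, 1, NULL, NULL).
SELECT I.item_no, CA.id
FROM @Items AS I
CROSS APPLY
(
    SELECT MAX(A.id) AS id
    FROM @Attrs AS A
    WHERE A.item_no = I.item_no
) AS CA
ORDER BY I.item_no;

-- Vector aggregate per outer row: no row when the input is empty,
-- so only the two items that have an ID are returned.
SELECT I.item_no, CA.id
FROM @Items AS I
CROSS APPLY
(
    SELECT MAX(A.id) AS id
    FROM @Attrs AS A
    WHERE A.item_no = I.item_no
    GROUP BY ()
) AS CA
ORDER BY I.item_no;
```

The only difference between the two queries is the GROUP BY () clause, which turns the scalar aggregate into a vector aggregate. The plan shapes the optimizer chooses for these queries may differ from the XML plan, but the row counts follow from the aggregate semantics described above.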

For more about aggregates, see my article Fun With Scalar and Vector Aggregates.