Sql-server – Why is an empty rowset not equivalent to null when assigning a query result to a variable and how best to work around

sql serversql-server-2016subqueryt-sql

I have a cursor. Within my loop I select a single column and assign the result to a variable. When the select returns an empty rowset, nothing is assigned to the variable. In particular, the variable retains its current value which, in my case, is the value it held on the last iteration of the loop.

The following code illustrates this.

declare @testA int;
declare @testB int;

set @testA = 1;
set @testB = 2;

select @testA =         id from (values (3)) as testtable(id) where testtable.id = 4 ; -- empty rowset
select @testB = (select id from (values (5)) as testtable(id) where testtable.id = 6); -- empty rowset

select @testA as testA, @testB as testB;

@testA retained its value (1) while @testB was updated to null.

The only difference between the two assignment attempts was that in case B a subquery was used (via the addition of the parentheses and an additional select keyword).

This says to me a subquery (which always returns a single value) represents an empty rowset as null in that return value, but attempting to select a single column from an empty rowset does not return null. Rather, it kind of says "I have no value, so there is nothing to assign".

My understanding of the definition of null is that it represents an "undefined" value.

If I am saying "select 'id' where these conditions are met" and there are no 'id' values meeting such conditions, does this not imply the 'id' value I want is undefined?

Best Answer

This says to me a subquery (which always returns a single value) represents an empty rowset as null in that return value, ...

A subquery used in a context that expects a single, scalar value needs to return a single, scalar value. Such contexts are when used as an expression or with operators that expect a single, scalar value. When no scalar value exists, it returns NULL for precisely the reason you thought: it means "unknown". Think of a subquery in these contexts as being a replacement for a scalar UDF. It will always return a single something (T-SQL does not allow for a return type of void). So, what you have is effectively: select @testB = dbo.Function(6); assuming that "dbo.Function" is defined as:

CREATE FUNCTION dbo.Function (@Input INT)
RETURNS INT
AS
BEGIN
  DECLARE @Temp INT;
  SELECT @Temp = TestTable.[id]
  FROM   (VALUES (5)) TestTable([id])
  WHERE  TestTable.[id] = @Input;

  RETURN @Temp;
END;

More examples:

SELECT 1 AS [one], (SELECT 2 WHERE 1 = 0) AS [nothing];
-- 1   NULL

and:

DECLARE @MoreThanOneRow INT;

SELECT @MoreThanOneRow = (SELECT [object_id] FROM sys.objects);
/*
Msg 512, Level 16, State 1, Line XXXXX
Subquery returned more than 1 value. This is not permitted when the subquery
  follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
*/

and:

DECLARE @MoreThanOneColumn INT;

SELECT @MoreThanOneColumn = (SELECT TOP (1) [object_id], [schema_id] FROM sys.objects);
/*
Msg 116, Level 16, State 1, Line XXXXX
Only one expression can be specified in the select list when the
  subquery is not introduced with EXISTS.
*/

... but attempting to select a single column from an empty rowset does not return null. Rather, it kind of says "I have no value, so there is nothing to assign".

I wouldn't think of an empty result set in terms of "I have no value, so there is nothing to assign". Instead, think of it in terms of: an assignment will be performed for every row returned, and no rows returned means that there is no action to take. It might be a subtle difference, but the focus really needs to be on "rows = actions" more so than "returned value is something that can be assigned".

This distinction is not so subtle when:

You are executing a T-SQL scalar UDF in a query and passing in a column. Assuming we are not dealing with Scalar UDF inlining (starting in SQL Server 2019), then scalar UDFs are executed per each row, hence the performance issues associated with them and the impetus for the new feature in SQL Server 2019 to inline UDFs (if they meet the requirements).
You are setting a value like you are showing for select @testA = id from .... The assignment will happen per each and every row returned. And the final value of the variable will be the last row returned. That can be predictable if you use an ORDER BY, or you can live life on the edge and go for unpredictable results by not using an ORDER BY. For example:
```
DECLARE @Temp INT;

SELECT @Temp = tab.col
FROM   (VALUES (1), (3), (4), (2)) tab(col);

SELECT @Temp AS [NotOrdered];
-- 2


SELECT @Temp = tab.col
FROM   (VALUES (1), (3), (4), (2)) tab(col)
ORDER BY tab.col;

SELECT @Temp AS [Ordered];
-- 4
```

Another way to look at this is that you are using a result set as a loop:

-- pseudo-code
while (ResultSet.GetNextRow())
{
    set @testA = ResultSet.GetColumnValue("id");
}

If the first call to ResultSet.GetNextRow() returns false (due to no rows in the result set), then the loop is never entered and the assignment never happens.
If there are multiple rows in the result set, then the assignment happens for each and every row returned, and in the order that they are returned.

In fact, this is how string concatenation works when done via result set (please note: as Martin Smith pointed out in this Stack Overflow answer, string concatenation done in this manner is not guaranteed to produce the expected result, even if most of the time it does; however, that does not take away from this method proving my point here):

DECLARE @List NVARCHAR(MAX) = N'**';

SELECT @List = @List + N', ' + QUOTENAME([name]) 
FROM   sys.schemas
WHERE  1 = 0
ORDER BY [name];

SELECT @List AS [NoChange];
-- **

SELECT @List = @List + N', ' + QUOTENAME([name]) 
FROM   sys.schemas
ORDER BY [name];

SELECT TRIM(N'*, ' FROM @List) AS [SQL2017orNewer];
--SELECT SUBSTRING(@List, 5, 80000) AS [PreSQL2017];

I even made use of this behavior to generate a complex T-SQL script:

Uppercase All String Columns (and in a Single Query!)

Best Answer

Related Solutions

Sql-server – How to check for a null or empty table-valued parameter

Mysql – Why does IN (subquery) perform bad when = (subquery) is blazing fast

Related Question