Sql-server – NEWID() In Joined Virtual Table Causes Unintended Cross Apply Behavior

Tags: database-internals, sql-server, sql-server-2008

My actual work query was an inner join, but this simple example with a cross join seems to reproduce the problem nearly every time:

SELECT *
FROM (
    SELECT 1 UNION ALL
    SELECT 2
) AA ( A )
CROSS JOIN (
    SELECT NEWID() TEST_ID
) BB ( B )

In my inner join I had many rows, and I added a GUID to each one using the NEWID() function. For about 9 out of 10 such rows, multiplying by the 2-row virtual table produced the expected result: two copies of the same GUID. For about 1 in 10, the two copies differed. This was unexpected, to say the least, and made the bug in my test data generation script very hard to track down.
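One way to check a single execution is to count how many distinct GUIDs the two multiplied rows carry. This is only a sketch: wrapping the query in an aggregate can itself change the plan, so the number it reports (1 or 2) is not guaranteed to match what the bare query does on the same server.

```sql
-- Count distinct GUIDs across the two multiplied rows.
-- 1 means NEWID() was evaluated once; 2 means it was evaluated per row.
SELECT COUNT(DISTINCT X.B) AS distinct_guids
FROM (
    SELECT AA.A, BB.B
    FROM (
        SELECT 1 UNION ALL
        SELECT 2
    ) AA ( A )
    CROSS JOIN (
        SELECT NEWID() TEST_ID
    ) BB ( B )
) AS X;
```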

The following queries use GETDATE and SYSDATETIME, which are also non-deterministic, but they don't show this behavior, at least not for me: I always see the same datetime value in both result rows.

SELECT *
FROM (
    SELECT 1 UNION ALL
    SELECT 2
) AA ( A )
CROSS JOIN (
    SELECT GETDATE() TEST_ID
) BB ( B )

SELECT *
FROM (
    SELECT 1 UNION ALL
    SELECT 2
) AA ( A )
CROSS JOIN (
    SELECT SYSDATETIME() TEST_ID
) BB ( B )

I'm currently using SQL Server 2008, and my workaround for now is to load my rows with GUIDs into a table variable before finishing my random data generation script. Once the GUIDs are stored as values in a table rather than in a virtual table, the problem goes away.
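The table-variable workaround described above might look roughly like this. It is a sketch, not the original script: the three-row VALUES list is a hypothetical stand-in for the real source rows, and the names @Ids and RowNo are made up for illustration.

```sql
-- Hypothetical source rows; replace the VALUES list with the real query.
DECLARE @Ids TABLE
(
    RowNo   integer          NOT NULL PRIMARY KEY,
    TEST_ID uniqueidentifier NOT NULL
);

-- NEWID() is evaluated while the rows are inserted, so each stored
-- row keeps exactly one GUID.
INSERT @Ids (RowNo, TEST_ID)
SELECT V.RowNo, NEWID()
FROM (VALUES (1), (2), (3)) AS V (RowNo);

-- Multiplying stored rows can only copy existing values; there is no
-- NEWID() left in the query to re-evaluate.
SELECT I.RowNo, I.TEST_ID, AA.A
FROM @Ids AS I
CROSS JOIN (
    SELECT 1 UNION ALL
    SELECT 2
) AA ( A )
ORDER BY I.RowNo, AA.A;
```

Because the GUIDs are persisted before the join runs, the join's plan shape no longer matters for correctness.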

I have a workaround, but I'm looking for ways to avoid the problem without using actual tables or table variables.

While writing this, I tried the following without success:
1) placing NEWID() in a nested virtual table:

SELECT *
FROM (
    SELECT 1 UNION ALL
    SELECT 2
) AA ( A )
CROSS JOIN (
    SELECT TEST_ID
    FROM (
        SELECT NEWID() TEST_ID
    ) TT
) BB ( B )

2) wrapping NEWID() in a CAST expression, such as:

SELECT CAST(NEWID() AS VARCHAR(100)) TEST_ID

3) reversing the order of the virtual tables in the join expression:

SELECT *
FROM (
    SELECT NEWID() TEST_ID
) BB ( B )
CROSS JOIN (
    SELECT 1 UNION ALL
    SELECT 2
) AA ( A )

4) using an uncorrelated CROSS APPLY:

SELECT *
FROM (
    SELECT NEWID() TEST_ID
) BB ( B )
CROSS APPLY (
    SELECT 1 UNION ALL
    SELECT 2
) AA ( A )

Just before posting this question, I tried a correlated CROSS APPLY, and it seems to work:

SELECT *
FROM (
    SELECT NEWID() TEST_ID
) BB ( B )
CROSS APPLY (
    SELECT A
    FROM (
        SELECT 1 UNION ALL
        SELECT 2
    ) TT ( A )
    WHERE BB.B IS NOT NULL
) AA ( A )

Does anyone have a simpler, more elegant workaround? I'd rather not use CROSS APPLY or correlation for a simple row multiplication if I don't have to.

Best Answer

This behaviour is by design, as explained in detail on this Connect bug report. The most pertinent Microsoft reply is reproduced below for convenience (and in case the link dies at some point):

Posted by Microsoft on 7/7/2008 at 9:27 AM

Closing the loop . . . I've discussed this question with the Dev team. And eventually we have decided not to change current behavior, for the following reasons:

The optimizer does not guarantee timing or number of executions of scalar functions. This is a long-established tenet. It's the fundamental 'leeway' that allows the optimizer enough freedom to gain significant improvements in query-plan execution.

This "once-per-row behavior" is not a new issue, although it's not widely discussed. We started to tweak its behavior back in the Yukon release. But it's quite hard to pin down precisely, in all cases, exactly what it means! For example, does it apply to interim rows calculated 'on the way' to the final result? - in which case it clearly depends on the plan chosen. Or does it apply only to the rows that will eventually appear in the completed result? - there's a nasty recursion going on here, as I'm sure you'll agree!

As I mentioned earlier, we default to "optimize performance" - which is good for 99% of cases. The 1% of cases where it might change results are fairly easy to spot - side-effecting 'functions' such as NEWID - and easy to 'fix' (trading perf, as a consequence). This default to "optimize performance" again, is long-established, and accepted. (Yes, it's not the stance chosen by compilers for conventional programming languages, but so be it).

So, our recommendations are:

Avoid reliance on non-guaranteed timing and number-of-executions semantics.

Avoid using NEWID() deep in table expressions.

Use OPTION to force a particular behavior (trading perf)

Hope this explanation helps clarify our reasons for closing this bug as "won't fix".

The GETDATE and SYSDATETIME functions are indeed non-deterministic, but they are treated as runtime constants for a particular query. Broadly, this means the function's value is cached when query execution starts, and the result re-used for all references within the query.
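The runtime-constant difference can be seen by evaluating both functions over many rows in one query. In this sketch sys.all_objects is used only as a convenient row source (an assumption; any table with many rows would do). distinct_getdate should be 1, while distinct_newid would normally equal total_rows because NEWID() is evaluated per row.

```sql
-- GETDATE() is a runtime constant: one value for the whole query.
-- NEWID() is not: it is normally evaluated once per row.
SELECT
    COUNT_BIG(*)            AS total_rows,
    COUNT_BIG(DISTINCT X.d) AS distinct_getdate,
    COUNT_BIG(DISTINCT X.g) AS distinct_newid
FROM (
    SELECT GETDATE() AS d, NEWID() AS g
    FROM sys.all_objects
) AS X;
```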

None of the 'workarounds' in the question are safe; there is no guarantee the behaviour will not change the next time the plan is compiled, when you next apply a service pack or cumulative update...or for other reasons.

The only safe solution is to use a temporary object of some kind - a variable, table, or multi-statement function for example. Using a workaround that appears to work today based on observation is a great way to experience unexpected behaviours in future, typically in the form of a paging alert at 3am on Sunday morning.
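For the single-GUID query in the question, a variable is the simplest such temporary object. The sketch below evaluates NEWID() once in its own statement, so the SELECT only ever sees a stored value. It covers one GUID per batch; if each source row needs its own GUID, a table variable or temporary table is the equivalent approach. The name @TestId is illustrative.

```sql
-- Evaluate NEWID() exactly once, in a separate statement.
DECLARE @TestId uniqueidentifier = NEWID();

-- The query references only the stored value, so both rows
-- necessarily carry the same GUID regardless of plan shape.
SELECT AA.A, BB.B
FROM (
    SELECT 1 UNION ALL
    SELECT 2
) AA ( A )
CROSS JOIN (
    SELECT @TestId
) BB ( B );
```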

More details

A Lazy Index Spool lazily caches inner side result rows, in a work table indexed by outer reference (correlated parameter) values. If a Lazy Index Spool is asked for an outer reference it has seen before, it fetches the cached result row from its work table (a "rewind"). If the spool is asked for an outer reference value it has not seen before, it runs its subtree with the current outer reference value and caches the result (a "rebind"). The seek predicate on the Lazy Index Spool indicates the key(s) for its work table.

The problem occurs in this specific plan shape when the spool checks to see if a new outer reference is the same as one it has seen before. The Nested Loops Join updates its outer references correctly, and notifies operators on its inner input via their PrepRecompute interface methods. At the start of this check, inner side operators read the CParamBounds:FNeedToReload property to see if the outer reference has changed from last time. An example stack trace is shown below:

[Image: stack trace showing CParamBounds:FNeedToReload]

When the subtree shown above exists, specifically where Concatenation is used, something goes wrong (perhaps a ByVal/ByRef/Copy problem) with the bindings such that CParamBounds:FNeedToReload always returns false, regardless of whether the outer reference actually changed or not.

When the same subtree exists, but a Merge Union or Hash Union is used, this essential property is set correctly on each iteration, and the Lazy Index Spool rewinds or rebinds each time as appropriate. The Distinct Sort and Stream Aggregate are blameless, by the way. My suspicion is that Merge and Hash Union make a copy of the previous value, whereas Concatenation uses a reference. It is just about impossible to verify this without access to the SQL Server source code, unfortunately.

The net result is that the Lazy Index Spool in the problematic plan shape always thinks it has already seen the current outer reference, rewinds by seeking into its work table, generally finds nothing, so no row is returned for that outer reference. Stepping through the execution in a debugger, the spool only ever executes its RewindHelper method, and never its ReloadHelper method (reload = rebind in this context). This is evident in the execution plan because operators under the spool all have 'Number of Executions = 1'.
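The per-operator counters can be captured by returning the actual plan as XML with the results. This sketch assumes the #T1 table from the Demo section below has already been created and loaded. In the returned plan, each operator's RunTimeCountersPerThread element carries ActualExecutions, ActualRebinds and ActualRewinds attributes; on an affected build, the operators below the Lazy Index Spool would be expected to show ActualExecutions="1".

```sql
-- Return the actual execution plan (XML) as an extra result set.
SET STATISTICS XML ON;

SELECT T1.c1, C.c1
FROM #T1 AS T1
CROSS APPLY 
(
    SELECT COUNT_BIG(*) AS c1
    FROM
    (
        SELECT T1.c1
        UNION
        SELECT NULL
    ) AS U
) AS C
OPTION (CONCAT UNION);

SET STATISTICS XML OFF;
```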

[Image: debugger view of RewindHelper]

The exception, of course, is for the first outer reference the Lazy Index Spool is given. This always executes the subtree and caches a result row in the work table. All subsequent iterations result in a rewind, which will only produce a row (the single cached row) when the current iteration has the same value for the outer reference as the first time around.

So, for any given input set on the outer side of the Nested Loops Join, the query will return as many rows as there are duplicates of the first row processed (plus one for the first row itself of course).

Demo

Table and sample data:

CREATE TABLE #T1 
(
    pk integer IDENTITY NOT NULL,
    c1 integer NOT NULL,

    CONSTRAINT PK_T1
    PRIMARY KEY CLUSTERED (pk)
);
GO
INSERT #T1 (c1)
VALUES
    (1), (2), (3), (4), (5), (6),
    (1), (2), (3), (4), (5), (6),
    (1), (2), (3), (4), (5), (6);

The following (trivial) query produces a correct count of two for each row (18 in total) using a Merge Union:

SELECT T1.c1, C.c1
FROM #T1 AS T1
CROSS APPLY 
(
    SELECT COUNT_BIG(*) AS c1
    FROM
    (
        SELECT T1.c1
        UNION
        SELECT NULL
    ) AS U
) AS C;

[Image: Merge Union plan]

If we now add a query hint to force a Concatenation:

SELECT T1.c1, C.c1
FROM #T1 AS T1
CROSS APPLY 
(
    SELECT COUNT_BIG(*) AS c1
    FROM
    (
        SELECT T1.c1
        UNION
        SELECT NULL
    ) AS U
) AS C
OPTION (CONCAT UNION);

The execution plan has the problematic shape:

[Image: Concatenation plan]

And the result is now incorrect, just three rows:

[Image: three-row result]

Though this behaviour is not guaranteed, the first row from the Clustered Index Scan has a c1 value of 1. There are two other rows with this value, so three rows are produced in total.

Now truncate the data table and load it with more duplicates of the 'first' row:

TRUNCATE TABLE #T1;

INSERT #T1 (c1)
VALUES
    (1), (2), (3), (4), (5), (6),
    (1), (2), (3), (4), (5), (6),
    (1), (1), (1), (1), (1), (1);

Now the Concatenation plan is:

[Image: 8-row Concatenation plan]

And, as indicated, 8 rows are produced, all with c1 = 1 of course:

[Image: 8-row result]
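The row count the faulty plan returns can be predicted from the data: it is the number of rows sharing c1 with the first row the scan delivers. This sketch assumes that first row is pk = 1, which matches the demo but is not guaranteed by SQL Server. Because TRUNCATE TABLE resets the IDENTITY seed, pk = 1 is again a c1 = 1 row after the second load, so this should return 8; against the first data set it would return 3.

```sql
-- Predicted row count for the faulty Concatenation plan.
-- Assumes the Clustered Index Scan delivers pk = 1 first (not guaranteed).
SELECT COUNT_BIG(*) AS predicted_rows
FROM #T1 AS T1
WHERE T1.c1 =
(
    SELECT F.c1
    FROM #T1 AS F
    WHERE F.pk = 1
);
```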

I notice you have opened a Connect item for this bug, but Connect is not really the place to report issues that are having a production impact. If that is the case here, you really ought to contact Microsoft Support.

This wrong-results bug was fixed at some stage. It no longer reproduces for me on any version of SQL Server from 2012 onward. It does repro on SQL Server 2008 R2 SP3-GDR build 10.50.6560.0 (X64).
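To see which build an instance is running, and so whether it falls in the range reported above, the server properties can be queried directly. This is a small sketch; the comparison against the builds mentioned in the answer is left to the reader.

```sql
-- Build number, service pack level and edition of the current instance.
-- 10.50.x is SQL Server 2008 R2; 11.x and later is SQL Server 2012 onward.
SELECT
    SERVERPROPERTY('ProductVersion') AS product_version,
    SERVERPROPERTY('ProductLevel')   AS product_level,
    SERVERPROPERTY('Edition')        AS edition;
```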
