SQL Server – Does Concatenation Physical Operation Guarantee Order of Execution?

database-internalsexecution-plansql serverunion

In standard SQL, the result of a union all is not guaranteed to be in any order. So, something like:

select 'A' as c union all select 'B'

Could return two rows in any order (although, in practice on any database I know of, 'A' will come before 'B').

In SQL Server, this turns into an execution plan using a "concatenation" physical operation.

I could easily imagine that the concatenation operation would scan its inputs, returning whatever input has records available. However, I found the following statement on the web (here):

The Query Processor will execute this plan in the order that the
operators appear in the plan, the first is the top one and the last is
the end one.

Question: Is this true in practice? Is this guaranteed to be true?

I haven't found any reference in Microsoft documentation that the inputs are scanned in order, from the first to the last. On the other hand, whenever I try running it, the results suggest that the inputs are, indeed, processed in order.

Is there a way to have the engine process more than one input at a time? My tests (using much more complicated expressions than constants) are on a parallel-enabled 8-core machine, and most queries do take advantage of the parallelism.

Best Answer

No, there is no documentation from Microsoft guaranteeing the behavior, therefore it is not guaranteed.

Additionally, assuming that the Simple Talk article is correct, and that the Concatenation physical operator always processes inputs in the order shown in the plan (very likely to be true), then without a guarantee that SQL Server will always generate plans that keep the same the order between the query text and the query plan, you're only slightly better off.

We can investigate this further though. If the query optimizer was able to reorder the Concatenation operator input, there should exist rows in the undocumented DMV, sys.dm_exec_query_transformation_stats corresponding to that optimization.

SELECT * FROM sys.dm_exec_query_transformation_stats 
    WHERE name LIKE '%CON%' OR name LIKE '%UNIA%'

On SQL Server 2012 Enterprise Edition, this produces 24 rows. Ignoring the false matches for transformations related to constants, there is one transformation related to the Concatenation Physical Operator UNIAtoCON (Union All to Concatenation). So, at the physical operator level, it appears that once a concatenation operator is selected, it will be processed in the order of the logical Union All operator it was derived from.

In fact that is not quite true. Post-optimization rewrites exist that can reorder the inputs to a physical Concatenation operator after cost-based optimization has completed. One example occurs when the Concatenation is subject to a row goal (so it may be important to read from the cheaper input first). See UNION ALL Optimization by Paul White for more details.

That late physical rewrite was functional up to and including SQL Server 2008 R2, but a regression meant it no longer applied to SQL Server 2012 and later. A fix has been issued that reinstates this rewrite for SQL Server 2014 and later (not 2012) with query optimizer hotfixes enabled (e.g. trace flag 4199).

But about the Logical Union All operator (UNIA)? There is a UNIAReorderInputs transformation, which can reorder the inputs. There are also two physical operators that can be used to implement a logical Union All, UNIAtoCON and UNIAtoMERGE (Union All to Merge Union).

Therefore it appears that the query optimizer can reorder the inputs for a UNION ALL; however, it doesn't appear to be a common transformation (zero uses of UNIAReorderInputs on the SQL Servers I have readily accessible. We don't know the circumstances that would make the optimizer use UNIAReorderInputs; though it is certainly used when a plan guide or use plan hint is used to force a plan generated using the row goal physical reordered inputs mentioned above.

Is there a way to have the engine process more than one input at a time?

The Concatenation physical operator can exist within a parallel section of a plan. With some difficulty, I was able to produce a plan with parallel concatenations using the following query:

SELECT userid, regdate  FROM (  --Users table is around 3mil rows
    SELECT  userid, RegDate FROM users WHERE userid > 1000000
    UNION 
    SELECT  userid, RegDate FROM users WHERE userid < 1000000
    UNION all
    SELECT userid, RegDate FROM users WHERE userid < 2000000
    ) d ORDER BY RegDate OPTION (RECOMPILE)

So, in the strictest sense, the physical Concatenation operator does seem to always process inputs in a consistent fashion (top one first, bottom second); however, the optimizer could switch the order of the inputs before choosing the physical operator, or use a Merge union instead of a Concatenation.

Best Answer

Related Solutions

Sql-server – stored procedure times out while subsequent runs take 1/6 the time

Sql-server – How to determine cause of runtime increase given two query plans with SpillToTempDb warning

Related Question