SQL Server – Creating a Plan Guide to Cache CTE Result

ctesql servert-sql

I normally create plan guides by first constructing a query that uses the correct plan, and copying it across to the similar query that doesn't. However, that is sometimes tricky, especially if the query is not exactly the same. What is the right way of creating plan guides from scratch?

SQLKiwi has mentioned drawing up plans in SSIS, is there a way or useful tool to assist in laying out a good plan for SQL Server?

The specific instance in question is this CTE: SQLFiddle

with cte(guid,other) as (
  select newid(),1 union all
  select newid(),2 union all
  select newid(),3)
select a.guid, a.other, b.guid guidb, b.other otherb
from cte a
cross join cte b
order by a.other, b.other;

Is there ANY way to make the result come up with exactly 3 distinct guids and no more? I'm hoping to be able to better answer questions in future by including plan guides with CTE-type queries that are referenced multiple times to overcome some SQL Server CTE quirks.

Best Answer

Is there ANY way to make the result come up with exactly 3 distinct guids and no more? I'm hoping to be able to better answer questions in future by including plan guides with CTE-type queries that are referenced multiple times to overcome some SQL Server CTE quirks.

Not today. Non-recursive common table expressions (CTEs) are treated as in-line view definitions and expanded into the logical query tree at each place they are referenced (just like regular view definitions are) before optimization. The logical tree for your query is:

LogOp_OrderByCOL: Union1007 ASC COL: Union1015 ASC 
    LogOp_Project COL: Union1006 COL: Union1007 COL: Union1014 COL: Union1015
        LogOp_Join
            LogOp_ViewAnchor
                LogOp_UnionAll
                    LogOp_Project ScaOp_Intrinsic newid, ScaOp_Const
                    LogOp_Project ScaOp_Intrinsic newid, ScaOp_Const
                    LogOp_Project ScaOp_Intrinsic newid, ScaOp_Const

            LogOp_ViewAnchor
                LogOp_UnionAll
                    LogOp_Project ScaOp_Intrinsic newid, ScaOp_Const
                    LogOp_Project ScaOp_Intrinsic newid, ScaOp_Const
                    LogOp_Project ScaOp_Intrinsic newid, ScaOp_Const

Notice the two View Anchors and the six calls to the intrinsic function newid before optimization gets started. Nevertheless, many people consider that the optimizer ought to be able to identify that the expanded sub-trees were originally a single referenced object and simplify accordingly. There have also been several Connect requests to allow explicit materialization of a CTE or derived table.

A more general implementation would have the optimizer consider materializing arbitrary common expressions to improve performance (CASE with a subquery is another example where problems can occur today). Microsoft Research published a paper (PDF) on that back in 2007, though it remains unimplemented to date. For the time being, we are limited to explicit materialization using things like table variables and temporary tables.

SQLKiwi has mentioned drawing up plans in SSIS, is there a way or useful tool to assist in laying out a good plan for SQL Server?

This was just wishful thinking on my part, and went well beyond the idea of modifying plan guides. It is possible, in principle, to write a tool to manipulate show plan XML directly, but without specific optimizer instrumentation using the tool would likely be a frustrating experience for the user (and the developer come to think of it).

In the particular context of this question, such a tool would still be unable to materialize the CTE contents in a way that could used by multiple consumers (to feed both inputs to the cross join in this case). The optimizer and execution engine do support multi-consumer spools, but only for specific purposes - none of which could be made to apply to this particular example.

While I'm not certain, I have a fairly strong hunch that the RelOps can be followed (Nested Loop, Lazy Spool) even if the query is not exactly the same as the plan - for instance if you added 4 and 5 to the CTE, it still continues to use the same plan (seemingly - tested on SQL Server 2012 RTM Express).

There is a reasonable amount of flexibility here. The broad shape of the XML plan is used to guide the search for a final plan (though many attributes are ignored completely e.g. partitioning type on exchanges) and the normal search rules are considerably relaxed as well. For example, early pruning of alternatives based on cost considerations is disabled, the explicit introduction of cross joins is allowed, and scalar operations are ignored.

There are too many details to go into in depth, but the placement of Filters and Compute Scalars cannot be forced, and predicates of the form column = value are generalized so a plan containing X = 1 or X = @X can be applied to a query containing X = 502 or X = @Y. This particular flexibility can help greatly in finding a natural plan to force.

In the specific example, constant Union All can always be implemented as a Constant Scan; the number of inputs to the Union All does not matter.