Sql-server – Persisted computed column causing scan

execution-planoptimizationsql server

Converting a regular column to a persisted computed column is causing this query to not be able to do index seeks. Why?

Tested on several SQL Server versions, including 2016 SP1 CU1.

Repros

The trouble is with table1, col7.

The tables and query are a partial (and simplified) version of the originals. I'm aware the query could be rewritten differently, and for some reason avoid the problem, but we need to avoid touching the code, and the question of why table1 cannot be seeked still stands.

As Paul White showed (thanks!), the seek is available if forced, so the question is: Why the seek is not chosen by the optimizer, and whether we can do something differently to make the seek happen as it should, without changing the code?

To clarify the problematic part, here's the relevant scan in the bad execution plan:

plan

Best Answer

Why the seek is not chosen by the optimizer


TL:DR The expanded computed column definition interferes with the optimizer's ability to reorder joins initially. With a different starting point, cost-based optimization takes a different path through the optimizer, and ends up with a different final plan choice.


Details

For all but the very simplest of queries, the optimizer does not attempt to explore anything like the whole space of possible plans. Instead, it picks a reasonable-looking starting point, then spends a budgeted amount of effort exploring logical and physical variations, in one or more search phases, until it finds a reasonable plan.

The main reason you get different plans (with different final cost estimates) for the two cases is that there are different starting points. Starting from a different place, optimization ends up at a different place (after its limited number of exploration and implementation iterations). I hope this is reasonably intuitive.

The starting point I mentioned, is somewhat based on the textual representation of the query, but changes are made to the internal tree representation as it passes through the parsing, binding, normalization, and simplification stages of query compilation.

Importantly, the exact starting point depends heavily on the initial join order selected by the optimizer. This choice is made before statistics are loaded, and before any cardinality estimations have been derived. The total cardinality (number of rows) in each table is however known, having been obtained from system metadata.

The initial join ordering is therefore based on heuristics. For example, the optimizer tries to rewrite the tree such that smaller tables are joined before larger ones, and inner joins come before outer joins (and cross joins).

The presence of the computed column interferes with this process, most specifically with the optimizer's ability to push outer joins down the query tree. This is because the computed column is expanded into its underlying expression before join reordering occurs, and moving a join past a complex expression is much more difficult than moving it past a simple column reference.

The trees involved are quite large, but to illustrate, the non-computed column initial query tree begins with: (note the two outer joins at the top)

LogOp_Select
    LogOp_Apply (x_jtLeftOuter)
        LogOp_LeftOuterJoin
            LogOp_NAryJoin
                LogOp_LeftAntiSemiJoin
                    LogOp_NAryJoin
                        LogOp_Get TBL: dbo.table1(alias TBL: a4)
                        LogOp_Select
                            LogOp_Get TBL: dbo.table6(alias TBL: a3)
                            ScaOp_Comp x_cmpEq
                                ScaOp_Identifier QCOL: [a3].col18
                                ScaOp_Const TI(varchar collate 53256,Var,Trim,ML=16)
                        LogOp_Select
                            LogOp_Get TBL: dbo.table1(alias TBL: a1)
                            ScaOp_Comp x_cmpEq
                                ScaOp_Identifier QCOL: [a1].col2
                                ScaOp_Const TI(varchar collate 53256,Var,Trim,ML=16)
                        LogOp_Select
                            LogOp_Get TBL: dbo.table5(alias TBL: a2)
                            ScaOp_Comp x_cmpEq
                                ScaOp_Identifier QCOL: [a2].col2
                                ScaOp_Const TI(varchar collate 53256,Var,Trim,ML=16)
                        ScaOp_Comp x_cmpEq
                            ScaOp_Identifier QCOL: [a4].col2
                            ScaOp_Identifier QCOL: [a3].col19
                    LogOp_Select
                        LogOp_Get TBL: dbo.table7(alias TBL: a7)
                        ScaOp_Comp x_cmpEq
                            ScaOp_Identifier QCOL: [a7].col22
                            ScaOp_Const TI(varchar collate 53256,Var,Trim,ML=16)
                    ScaOp_Comp x_cmpEq
                        ScaOp_Identifier QCOL: [a4].col2
                        ScaOp_Identifier QCOL: [a7].col23
                LogOp_Select
                    LogOp_Get TBL: table1(alias TBL: cdc)
                    ScaOp_Comp x_cmpEq
                        ScaOp_Identifier QCOL: [cdc].col6
                        ScaOp_Const TI(smallint,ML=2) XVAR(smallint,Not Owned,Value=4)
                LogOp_Get TBL: dbo.table5(alias TBL: a5) 
                LogOp_Get TBL: table2(alias TBL: cdt)  
                ScaOp_Logical x_lopAnd
                    ScaOp_Comp x_cmpEq
                        ScaOp_Identifier QCOL: [a5].col2
                        ScaOp_Identifier QCOL: [cdc].col2
                    ScaOp_Comp x_cmpEq
                        ScaOp_Identifier QCOL: [a4].col2
                        ScaOp_Identifier QCOL: [cdc].col2
                    ScaOp_Comp x_cmpEq
                        ScaOp_Identifier QCOL: [cdt].col1
                        ScaOp_Identifier QCOL: [cdc].col1
            LogOp_Get TBL: table3(alias TBL: ahcr)
            ScaOp_Comp x_cmpEq
                ScaOp_Identifier QCOL: [ahcr].col9
                ScaOp_Identifier QCOL: [cdt].col1

The same fragment of the computed column query is: (note the outer join much lower down, the expanded computed column definition, and some other subtle differences in (inner) join ordering)

LogOp_Select
    LogOp_Apply (x_jtLeftOuter)
        LogOp_NAryJoin
            LogOp_LeftAntiSemiJoin
                LogOp_NAryJoin
                    LogOp_Get TBL: dbo.table1(alias TBL: a4)
                    LogOp_Select
                        LogOp_Get TBL: dbo.table6(alias TBL: a3)
                        ScaOp_Comp x_cmpEq
                            ScaOp_Identifier QCOL: [a3].col18
                            ScaOp_Const TI(varchar collate 53256,Var,Trim,ML=16)
                    LogOp_Select
                        LogOp_Get TBL: dbo.table1(alias TBL: a1
                        ScaOp_Comp x_cmpEq
                            ScaOp_Identifier QCOL: [a1].col2
                            ScaOp_Const TI(varchar collate 53256,Var,Trim,ML=16)
                    LogOp_Select
                        LogOp_Get TBL: dbo.table5(alias TBL: a2)
                        ScaOp_Comp x_cmpEq
                            ScaOp_Identifier QCOL: [a2].col2
                            ScaOp_Const TI(varchar collate 53256,Var,Trim,ML=16)
                    ScaOp_Comp x_cmpEq
                        ScaOp_Identifier QCOL: [a4].col2
                        ScaOp_Identifier QCOL: [a3].col19
                LogOp_Select
                    LogOp_Get TBL: dbo.table7(alias TBL: a7) 
                    ScaOp_Comp x_cmpEq
                        ScaOp_Identifier QCOL: [a7].col22
                        ScaOp_Const TI(varchar collate 53256,Var,Trim,ML=16)
                ScaOp_Comp x_cmpEq
                    ScaOp_Identifier QCOL: [a4].col2
                    ScaOp_Identifier QCOL: [a7].col23
            LogOp_Project
                LogOp_LeftOuterJoin
                    LogOp_Join
                        LogOp_Select
                            LogOp_Get TBL: table1(alias TBL: cdc) 
                            ScaOp_Comp x_cmpEq
                                ScaOp_Identifier QCOL: [cdc].col6
                                ScaOp_Const TI(smallint,ML=2) XVAR(smallint,Not Owned,Value=4)
                        LogOp_Get TBL: table2(alias TBL: cdt) 
                        ScaOp_Comp x_cmpEq
                            ScaOp_Identifier QCOL: [cdc].col1
                            ScaOp_Identifier QCOL: [cdt].col1
                    LogOp_Get TBL: table3(alias TBL: ahcr) 
                    ScaOp_Comp x_cmpEq
                        ScaOp_Identifier QCOL: [ahcr].col9
                        ScaOp_Identifier QCOL: [cdt].col1
                AncOp_PrjList 
                    AncOp_PrjEl QCOL: [cdc].col7
                        ScaOp_Convert char collate 53256,Null,Trim,ML=6
                            ScaOp_IIF varchar collate 53256,Null,Var,Trim,ML=6
                                ScaOp_Comp x_cmpEq
                                    ScaOp_Intrinsic isnumeric
                                        ScaOp_Intrinsic right
                                            ScaOp_Identifier QCOL: [cdc].col4
                                            ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=4)
                                    ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=0)
                                ScaOp_Const TI(varchar collate 53256,Var,Trim,ML=1) XVAR(varchar,Owned,Value=Len,Data = (0,))
                                ScaOp_Intrinsic substring
                                    ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=6)
                                    ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=1)
                                    ScaOp_Identifier QCOL: [cdc].col4
            LogOp_Get TBL: dbo.table5(alias TBL: a5)
            ScaOp_Logical x_lopAnd
                ScaOp_Comp x_cmpEq
                    ScaOp_Identifier QCOL: [a5].col2
                    ScaOp_Identifier QCOL: [cdc].col2
                ScaOp_Comp x_cmpEq
                    ScaOp_Identifier QCOL: [a4].col2
                    ScaOp_Identifier QCOL: [cdc].col2

Statistics are loaded and an initial cardinality estimation is performed on the tree just after the initial join order is set. Having the joins in different orders also affects these estimates, and so has a knock-on effect during later cost-based optimization.

Finally for this section, having an outer join stuck in the middle of the tree can prevent some further join reordering rules matching during cost-based optimization.


Using a plan guide (or, equivalently a USE PLAN hint - example for your query) changes the search strategy to a more goal-oriented approach, guided by the general shape and features of the supplied template. This explains why the optimizer can find the same table1 seek plan against both computed and non-computed column schemas, when a plan guide or hint is used.

Whether we can do something differently to make the seek happen

This is something you only need to worry about if the optimizer does not find a plan with acceptable performance characteristics on its own.

All the normal tuning tools are potentially applicable. You can, for example, break the query up into simpler parts, review and improve the available indexing, update or create new statistics...and so on.

All these things can affect cardinality estimates, the code path taken through the optimizer, and influence cost-based decisions in subtle ways.

You may ultimately resort to using hints (or a plan guide), but that's not usually the ideal solution.


Additional questions from comments

I agree that it's best to simplify the query etc., but is there a way (trace flag) to make the optimizer continue with the optimization and reach the same result?

No, there's no trace flag to perform an exhaustive search, and you don't want one. The possible search space is vast, and compilation times that exceed the age of the universe would not be well-received. Also, the optimizer doesn't know every possible logical transform (no one does).

Also, why is the complex expansion needed, as the column is persisted? Why can't the optimizer avoid expanding it, treat it like a regular column, and reach the same starting point?

Computed columns are expanded (like views are) to enable additional optimization opportunities. The expansion may be matched back to e.g. a persisted column or index later in the process, but this happens after the initial join order is fixed.