Sql-server – Warning in query plan “Cardinality Estimate”

cardinality-estimatesexecution-plansql serversql-server-2012type conversion

create table T(ID int identity primary key)
insert into T default values
insert into T default values

go

select cast(ID as varchar(10)) as ID
from T
where ID = 1

The query above has a warning in the query plan.

<Warnings>
  <PlanAffectingConvert ConvertIssue="Cardinality Estimate" Expression="CONVERT(varchar(10),[xx].[dbo].[T].[ID],0)" />
</Warnings>

Why does it have the warning?

How could a cast in the field list affect the cardinality estimate?

Best Answer

This warning was new for SQL Server 2012.

From New "Type Conversion in Expression....." warning in SQL2012 ,to noisy to practical use

I see what you mean. While I agree that this is noise in most cases, it is low priority for us to fix. We will look at it if we get more feedback. For now I have closed this by design.

Connect was killed and it doesn’t look like the original issue was transferred to UserVoice. Here’s a different UserVoice issue about the same problem, Type conversion in may affect CardinalityEstimate - Convert/cast on selected columns

I will provide the boring answer until someone comes along with a better one.

Why does it have the warning?

Speculation on my part.
There is a cast on a column that is used in the where clause which make statistics of that column interesting. A change of datatype makes the statistics no good so lets warn about that in case the value from the field list might end up to be used somewhere.

How could a cast in the field list affect the cardinality estimate?

It can't unless it is the field list in a derived table.

Related Solutions

Sql-server – Why is this join cardinality estimate so large

I know that doing COALESCE on a few columns and joining on them isn't a good practice.

Generating good cardinality and distribution estimates is hard enough when the schema is 3NF+ (with keys and constraints) and the query is relational and primarily SPJG (selection-projection-join-group by). The CE model is built on those principles. The more unusual or non-relational features there are in a query, the closer one gets to the boundaries of what the cardinality and selectivity framework can handle. Go too far and CE will give up and guess.

Most of the MCVE example is simple SPJ (no G), albeit with predominantly outer equijoins (modelled as inner join plus anti-semijoin) rather than the simpler inner equijoin (or semijoin). All the relations have keys, though no foreign keys or other constraints. All but one of the joins are one-to-many, which is good.

The exception is the many-to-many outer join between X_DETAIL_1 and X_DETAIL_LINK. The only function of this join in the MCVE is to potentially duplicate rows in X_DETAIL_1. This is an unusual sort of a thing.

Simple equality predicates (selections) and scalar operators are also better. For example attribute compare-equal attribute/constant normally works well in the model. It is relatively "easy" to modify histograms and frequency statistics to reflect the application of such predicates.

COALESCE is built on CASE, which is in turn implemented internally as IIF (and this was true well before IIF appeared in the Transact-SQL language). The CE models IIF as a UNION with two mutually-exclusive children, each consisting of a project on a selection on the input relation. Each of the listed components has model support, so combining them is relatively straightforward. Even so, the more one layers abstractions, the less accurate the end result tends to be - a reason why larger execution plans tend to be less stable and reliable.

ISNULL, on the other hand, is intrinsic to the engine. It is not built up using any more basic components. Applying the effect of ISNULL to a histogram, for example, is as simple as replacing the step for NULL values (and compacting as necessary). It is still relatively opaque, as scalar operators go, and so best avoided where possible. Nevertheless, it is generally speaking more optimizer-friendly (less optimizer-unfriendly) than a CASE-based alternate.

The CE (70 and 120+) is very complex, even by SQL Server standards. It is not a case of applying simple logic (with a secret formula) to each operator. The CE knows about keys and functional dependencies; it knows how to estimate using frequencies, multivariate statistics, and histograms; and there is an absolute ton of special cases, refinements, checks & balances, and supporting structures. It often estimates e.g. joins in multiple ways (frequency, histogram) and decides on an outcome or adjustment based on the differences between the two.

One last basic thing to cover: Initial cardinality estimation runs for every operation in the query tree, from the bottom up. Selectivity and cardinality is derived for leaf operators first (base relations). Modified histograms and density/frequency information is derived for parent operators. The further up the tree we go, the lower the quality of estimates tends to be as errors tend to accumulate.

This single initial comprehensive estimation provides a starting point, and occurs well before any consideration is given to a final execution plan (it happens a way before even the trivial plan compilation stage). The query tree at this point tends to reflect the written form of the query fairly closely (though with subqueries removed, and simplifications applied etc.)

Immediately after initial estimation, SQL Server performs heuristic join reordering, which loosely speaking tries to reorder the tree to place smaller tables and high-selectivity joins first. It also tries to position inner joins before outer joins and cross products. Its capabilities are not extensive; its efforts are not exhaustive; and it does not consider physical costs (since they do not exist yet - only statistical information and metadata information are present). Heuristic reorder is most successful on simple inner equijoin trees. It exists to provide a "better" starting point for cost-based optimization.

Why is this join cardinality estimate so large?

The MCVE has an "unusual" mostly-redundant many-to-many join, and an equi join with COALESCE in the predicate. The operator tree also has an inner join last, which heuristic join reorder was unable to move up the tree to a more preferred position. Leaving aside all the scalars and projections, the join tree is:

LogOp_Join [ Card=4.52803e+009 ]
    LogOp_LeftOuterJoin [ Card=481577 ]
        LogOp_LeftOuterJoin [ Card=481577 ]
            LogOp_LeftOuterJoin [ Card=481577 ]
                LogOp_LeftOuterJoin [ Card=481577 ]
                LogOp_Get TBL: X_DRIVING_TABLE(alias TBL: dt) [ Card=481577 ]
                LogOp_Get TBL: X_DETAIL_1(alias TBL: d1) [ Card=70 ]
                LogOp_Get TBL: X_DETAIL_LINK(alias TBL: lnk) [ Card=47 ]
            LogOp_Get TBL: X_DETAIL_2(alias TBL: d2) X_DETAIL_2 [ Card=119 ]
        LogOp_Get TBL: X_DETAIL_3(alias TBL: d3) X_DETAIL_3 [ Card=281 ]
    LogOp_Get TBL: X_LAST_TABLE(alias TBL: lst) X_LAST_TABLE [ Card=94025 ]

Note the faulty final estimate is already in place. It is printed as Card=4.52803e+009 and stored internally as the double precision floating point value 4.5280277425e+9 (4528027742.5 in decimal).

The derived table in the original query has been removed, and projections normalized. A SQL representation of the tree on which initial cardinality and selectivity estimation was performed is:

SELECT 
    PRIMARY_ID = COALESCE(d1.JOIN_ID, d2.JOIN_ID, d3.JOIN_ID)
FROM X_DRIVING_TABLE dt
LEFT OUTER JOIN X_DETAIL_1 d1
    ON dt.ID = d1.ID
LEFT OUTER JOIN X_DETAIL_LINK lnk 
    ON d1.LINK_ID = lnk.LINK_ID
LEFT OUTER JOIN X_DETAIL_2 d2 
    ON dt.ID = d2.ID
LEFT OUTER JOIN X_DETAIL_3 d3 
    ON dt.ID = d3.ID
INNER JOIN X_LAST_TABLE lst 
    ON lst.JOIN_ID = COALESCE(d1.JOIN_ID, d2.JOIN_ID, d3.JOIN_ID)

(As a aside, the repeated COALESCE is also present in the final plan - once in the final Compute Scalar, and once on the inner side of the inner join).

Notice the final join. This inner join is (by definition) the cartesian product of X_LAST_TABLE and the preceding join output, with a selection (join predicate) of lst.JOIN_ID = COALESCE(d1.JOIN_ID, d2.JOIN_ID, d3.JOIN_ID) applied. The cardinality of the cartesian product is simply 481577 * 94025 = 45280277425.

To that, we need to determine and apply the selectivity of the predicate. The combination of the opaque expanded COALESCE tree (in terms of UNION and IIF, remember) together with the impact on key information, derived histograms and frequencies of the earlier "unusual" mostly-redundant many-to-many outer join combined means the CE is unable to derive an acceptable estimate in any of the normal ways.

As a result, it enters the Guess Logic. The guess logic is moderately complex, with layers of "educated" guess and "not-so educated" guess algorithms tried. If no better basis for a guess is found, the model uses a guess of last resort, which for an equality comparison is: sqllang!x_Selectivity_Equal = fixed 0.1 selectivity (10% guess):

-- the moment of doom
movsd xmm0,mmword ptr [sqllang!x_Selectivity_Equal

The result is 0.1 selectivity on the cartesian product: 481577 * 94025 * 0.1 = 4528027742.5 (~4.52803e+009) as mentioned before.

Rewrites

When the problematic join is commented out, a better estimate is produced because the fixed-selectivity "guess of last resort" is avoided (key information is retained by the 1-M joins). The quality of the estimate is still low-confidence, because a COALESCE join predicate is not at all CE-friendly. The revised estimate does at least look more reasonable to humans, I suppose.

When the query is written with the outer join to X_DETAIL_LINK placed last, heuristic reorder is able to swap it with the final inner join to X_LAST_TABLE. Putting the inner join right next to the problem outer join gives the limited abilities of early reorder the opportunity to improve the final estimate, since the effects of the mostly-redundant "unusual" many-to-many outer join come after the tricky selectivity estimation for COALESCE. Again, the estimates are little better than fixed guesses, and probably would not stand up to determined cross-examination in a court of law.

Reordering a mixture of inner and outer joins is hard and time-consuming (even stage 2 full optimization only attempts a limited subset of theoretical moves).

The nested ISNULL suggested in Max Vernon's answer manages to avoid the bail-out fixed guess, but the final estimate is an improbable zero rows (uplifted to one row for decency). This might as well be a fixed guess of 1 row, for all the statistical basis the calculation has.

I would expect a join cardinality estimate between 0 and 481577 rows.

This is a reasonable expectation, even if one accepts that cardinality estimation can occur at different times (during cost-based optimization) on physically different, but logically and semantically identical subtrees - with the final plan being a sort of stitched-together best of the best (per memo group). The lack of a plan-wide consistency guarantee does not mean that an individual join should be able to flout respectability, I get that.

On the other hand, if we end up at the guess of last resort, hope is already lost, so why bother. We tried all the tricks we knew, and gave up. If nothing else, the wild final estimate is a great warning sign that not everything went well inside the CE during the compilation and optimization of this query.

When I tried the MCVE, the 120+ CE produced a zero (= 1) row final estimate (like the nested ISNULL) for the original query, which is just as unacceptable to my way of thinking.

The real solution probably involves a design change, to allow simple equi-joins without COALESCE or ISNULL, and ideally foreign keys & other constraints useful for query compilation.

Sql-server – Update statistics on sys.objects

The warnings you're seeing most likely come from the sys.sysdepends view.

If you script it out using

EXEC sys.sp_helptext @objname = N'sys.sysdepends'

The definition has a bunch of converts and other nonsense going on.

CREATE VIEW sys.sysdepends AS  
 SELECT  
  id = object_id,  
  depid = referenced_major_id,  
  number = convert(smallint,  
   case when objectproperty(object_id, 'isprocedure') = 1 then 1 else column_id end),  
  depnumber = convert(smallint, referenced_minor_id),  
  status = convert(smallint, is_select_all * 2 + is_updated * 4 + is_selected * 8),  
  deptype = class,  
  depdbid = convert(smallint, 0),  
  depsiteid = convert(smallint, 0),  
  selall = is_select_all,  
  resultobj = is_updated,  
  readobj = is_selected  
 FROM sys.sql_dependencies  
 WHERE class < 2  
 UNION ALL  
 SELECT  -- blobtype dependencies  
  id = object_id, depid = object_id,  
  number = convert(smallint, column_id), depnumber = convert(smallint, type_column_id),  
  status = convert(smallint, 0), deptype = sysconv(tinyint, 1),  
  depdbid = convert(smallint, 0), depsiteid = convert(smallint, 0),  
  selall = sysconv(bit, 0), resultobj = sysconv(bit, 0), readobj = sysconv(bit, 0)  
 FROM sys.fulltext_index_columns  
 WHERE type_column_id IS NOT NULL

sys.objects, on the other hand, is fairly straightforward.

CREATE VIEW sys.objects AS  
 SELECT name,   
  object_id,  
  principal_id,  
  schema_id,  
  parent_object_id,  
  type,   
  type_desc,   
  create_date,   
  modify_date,  
  is_ms_shipped,  
  is_published,  
  is_schema_published  
 FROM sys.objects$

The view definition for sys.sysdepends causes the same warnings when queried on its own.

SELECT *
FROM  sys.sysdepends

In general, if you want to control datatypes and indexes and have some performance tuning ability when referencing system views or tables, your best bet is to dump them into a temp table first.

Best Answer

Related Solutions

Sql-server – Why is this join cardinality estimate so large

Rewrites

Sql-server – Update statistics on sys.objects

Related Question