Sql-server – Incorrect statistics are used on index seek

execution-planset-returning-functionssql server

I have a table-valued function and ordinary table and need to perform INNER JOIN on them. Because, the engine thinks that the function is returning always one row, I try to store the results in temporary table and then to perform the join operation – in both cases, the Estimate number of rows count that are read from the table/index is wrong.

enter image description here

enter image description here

When the temporary table is used, the correct Estimate number of rows count value is used for the rows returned by the function. All involved indexes are rebuilt.

Why I am worried about this? This join is part of more complex query which execution is very slow. I believe this is due to the fact that the statistics that are used by the engine are wrong and not the optimal execution plan is used. Even when the query use parallelism again nested loop is used to join the tables.

If I use a HASH or MERGE hints on join the correct statistics are used and better execution plan is built but I prefer to find a way to provide the correct statistics instead of forcing the engine to do something.

Could anyone tell why always one row is expected? Maybe because table-valued function is used, the engine thinks one row is going to be read from the table/index?

Best Answer

The estimate for the inner (lower) input to a nested loops join is per iteration in SSMS. The 'actual' number of rows shown in SSMS is a total over all iterations. This often causes confusion, and is the result of a questionable design decision.

The one row estimate therefore needs to be multiplied by the estimated number of executions. Many people prefer to view plans in SQL Sentry Plan Explorer, which does this for you, so the numbers are directly comparable (both estimate and actual are shown over all iterations).

Using a temporary table to materialize a relatively small result to improve cardinality estimation is a very valid tuning technique. If the code executes as part of a function or stored procedure, you may need to manually update statistics on the temporary table and request a recompile to get the best plan in some circumstances, see my article Temporary tables in Stored Procedures for details.

There are still times when the optimizer will choose the 'wrong' join type even with good cardinality and distribution information, but these are relatively rare. You certainly maximize your chances of getting a good plan by ensuring the optimizer has accurate information to base its decisions on.

I realize this answer is a bit generic, but there's only so much to be said on the basis of a couple of partial execution plan images. If you add showplan XML for both queries to the question (link to an off-site host e.g. gist), I might may be able say more.