Sql-server – Scalar Operator in Seek Predicate and a Row Estimate of 1

execution-plansql serversql-server-2012

I'm trying to understand some behavior I'm seeing in a query plan that's rather large and unruly. In particular, I'm looking at a Clustered Index Seek Operation containing a Scalar Operation within its Predicate. I suspect the Scalar operation is simply the aliasing of one of the tables (as described in Scalar Operator in Seek Predicate) as both columns are of the same type and this operation feeds into a Parallel Nested Loops (Left Outer Join) operator.

My question though is more about the Row Estimate coming in at 1 instead of a number closer to the Actual Number of Rows (~6.7 million). Is the Scalar Operation killing the optimizer's ability to estimate the rows properly? I assume so and also assume this is hurting my query execution plan from being optimal, but I don't really know for certain. Can someone confirm or refute my suspicions along with why?

Here's the operation in question:

enter image description here

Version: SQL Server 2012 Enterprise

Best Answer

On the inner side of a nested loop join, the estimated number of rows represents an estimate per iteration of the loop. The actual number of rows is the total number of rows returned for all iterations of the join. This is unintuitive and a common source of confusion.

We can see this action with a simple demo:

CREATE TABLE dbo.X_OUTER_TABLE (ID INT NOT NULL);

INSERT INTO dbo.X_OUTER_TABLE WITH (TABLOCK)
SELECT TOP 100 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values;

CREATE TABLE dbo.X_INNER_TABLE (ID INT NOT NULL, PRIMARY KEY (ID));

INSERT INTO dbo.X_INNER_TABLE WITH (TABLOCK)
SELECT TOP 1000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values;

Detailed information for the clustered index seek against X_INNER_TABLE:

node info

The estimated number of rows is 1 because the seek is performed against the full primary key of the table. The actual number of rows is 100 because the outer table has 100 rows, the loop iterates 100 times, and 1 row is found in each loop.

A similar scalar operator shows up in the simple demo as you observed in your query plan. There isn't a lot to go on in your original question, but it's likely that the scalar is indeed a red herring. Instead, I would focus on the difference between the estimated number of executions (580308) and the actual number of executions (6740779). SQL Server estimated that the outer table would have 10% of the rows that it actually had, so the nested loop is costed inappropriately low. It's possible that if the optimizer had a more accurate cardinality estimate for the outer table it would have switched from a nested loop join against this table to a different plan.