SQL Server – Why would the Parallelism (Repartition Streams) operator reduce row estimates to 1?

cardinality-estimates, execution-plan, sql-server, sql-server-2012

I'm using SQL Server 2012 Enterprise. I've come across a query plan that exhibits some behavior I don't find entirely intuitive: after a heavy parallel Index Scan (Object10.Index2), a Parallelism (Repartition Streams) operator kills the row estimate coming out of the scan, reducing it to 1. I've done some searching but haven't come across anything that explains this behavior. The query is quite simple, though each of the tables contains records in the low millions. This is part of a DWH load process, and this intermediate data set is touched a few times throughout, but my question relates to the row estimates in particular. Can someone explain why accurate row estimates drop to 1 inside the Parallelism (Repartition Streams) operator? Also, is this something I should be concerned about in this particular situation?

I've posted the full plan to Paste the Plan.

Here's the operation in question:

[Screenshot: Repartition Streams operator with the row estimate reduced to 1]

Including the Plan Tree in case that adds any more context:

[Screenshot: full plan tree]

Could I be running into some variation of this Connect item filed by Paul White (with a further in-depth explanation on his blog here)? It's the only thing I've found that seems even remotely close to what I'm running into, even though there is no TOP operator in play.

Best Answer

Query plans with bitmap filters can sometimes be tricky to read. From the BOL article for repartition streams (emphasis mine):

The Repartition Streams operator consumes multiple streams and produces multiple streams of records. The record contents and format are not changed. If the query optimizer uses a bitmap filter, the number of rows in the output stream is reduced.

The article on bitmap filtering is also helpful:

When analyzing an execution plan containing bitmap filtering, it is important to understand how the data flows through the plan and where filtering is applied. The bitmap filter and optimized bitmap is created on the build input (the dimension table) side of a hash join; however, the actual filtering is typically done within the Parallelism operator, which is on the probe input (the fact table) side of the hash join. However, when the bitmap filter is based on an integer column, the filter can be applied directly to the initial table or index scan operation rather than the Parallelism operator. This technique is called in-row optimization.

I believe that's what you're observing with your query. It is possible to come up with a relatively simple demo to show a repartition streams operator reducing a cardinality estimate, even when the bitmap operator is IN_ROW against the fact table. Data prep:

CREATE TABLE outer_tbl (ID BIGINT NOT NULL);

INSERT INTO outer_tbl WITH (TABLOCK)
SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values;

CREATE TABLE inner_tbl_1 (ID BIGINT NULL);
CREATE TABLE inner_tbl_2 (ID BIGINT NULL);

INSERT INTO inner_tbl_1 WITH (TABLOCK)
SELECT (ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) / 2000000 - 2) NUM
FROM master..spt_values t1
CROSS JOIN master..spt_values t2;

INSERT INTO inner_tbl_2 WITH (TABLOCK)
SELECT (ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) / 2000000 - 2) NUM
FROM master..spt_values t1
CROSS JOIN master..spt_values t2;

Here is a query that you should not run (it joins tables with millions of rows; trace flag 9481 forces the legacy cardinality estimator and 8649 forces a parallel plan):

SELECT *
FROM outer_tbl o
INNER JOIN inner_tbl_1 i ON o.ID = i.ID
INNER JOIN inner_tbl_2 i2 ON o.ID = i2.ID
OPTION (HASH JOIN, QUERYTRACEON 9481, QUERYTRACEON 8649);
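One way to confirm where the bitmap probe lands is to look at the plan XML: with in-row optimization the `PROBE(...IN ROW)` expression appears in the scan's predicate rather than in the Parallelism operator. As a sketch (assuming the query is still in the plan cache and you have VIEW SERVER STATE permission), you can search cached plans for that pattern:

```sql
-- Sketch: find cached plans whose XML contains an in-row bitmap probe.
-- Scans the plan cache, so prefer a non-production server.
SELECT TOP (10)
    qs.query_hash,
    qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE CAST(qp.query_plan AS NVARCHAR(MAX)) LIKE N'%PROBE(%IN ROW%';
```

The pattern deliberately avoids square brackets, which `LIKE` would otherwise treat as wildcard character classes.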

I uploaded the plan. Take a look at the operator near inner_tbl_2:

[Screenshot: Repartition Streams operator reducing the row estimate]

You may also find the second test in Hash Joins on Nullable Columns by Paul White helpful.

There are some inconsistencies in how the row reduction is applied; I was only able to see it in a plan with at least three tables. However, the reduction in expected rows is reasonable given the right data distribution. Suppose the join column in the fact table has many repeated values that aren't present in the dimension table. A bitmap filter might eliminate those rows before they reach the join. For your query the estimate is reduced all the way to 1. How the rows are distributed among the threads by the hash partitioning function provides a good hint:

[Screenshot: row distribution across threads]

Based on that, I suspect you have a lot of repeated values in the Object1.Column21 column. If those repeated values happen not to appear in the statistics histogram for Object4.Column19, SQL Server could get the cardinality estimate very wrong.
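You can check whether the heavily repeated values show up in the histogram with `DBCC SHOW_STATISTICS`. A sketch, using the anonymized names from the plan (substitute your real table, and note that `StatName` is a placeholder for a name returned by the first query):

```sql
-- Find the statistics objects on the table.
SELECT s.name, s.stats_id
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'Schema1.Object4');

-- Inspect the histogram for the join column's statistics object;
-- look for whether the repeated join values appear as RANGE_HI_KEY steps.
DBCC SHOW_STATISTICS ('Schema1.Object4', StatName) WITH HISTOGRAM;
```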

I think that you should be concerned, in that it might be possible to improve the performance of the query. Of course, if the query meets response time or SLA requirements then it may not be worth further investigation. However, if you do wish to investigate further, there are a few things you can do (other than updating stats) to get an idea of whether the query optimizer would pick a better plan if it had better information:

- Put the results of the join between Database1.Schema1.Object10 and Database1.Schema1.Object11 into a temp table and see if you continue to get nested loop joins.
- Change that join to a LEFT OUTER JOIN so the query optimizer won't reduce the number of rows at that step.
- Add a MAXDOP 1 hint to the query to see what happens.
- Use TOP along with a derived table to force that join to go last, or even comment the join out of the query.

Hopefully these suggestions are enough to get you started.
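As a sketch of the temp-table suggestion (all object names here are the anonymized ones from the plan, and the join column is an assumption purely for illustration), materializing the intermediate join gives the optimizer an exact row count and fresh statistics for the rest of the plan:

```sql
-- Sketch only: materialize the Object10/Object11 join first.
-- The join column (Column19) is assumed for illustration; list the
-- Object11 columns you need explicitly to avoid duplicate names.
SELECT o10.*
INTO #staged
FROM Database1.Schema1.Object10 AS o10
INNER JOIN Database1.Schema1.Object11 AS o11
    ON o10.Column19 = o11.Column19;

-- Then reference #staged in place of the original join in the main
-- query; OPTION (MAXDOP 1) can be added there to test a serial plan.
```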

Regarding the Connect item in the question, it is extremely unlikely to be related. That issue isn't about poor row estimates; it's about a race condition in parallelism that causes too many rows to be processed behind the scenes in the query plan. Here, your query doesn't appear to be doing any extra work.