Sql-server – Parallel Plan Selection

execution-planparallelismsql serversql-server-2008-r2

I have a weird query plan problem. I have two databases (Let's call them DB1 and DB2) where both are sitting in the same SQL-Server instance and have identical schema. In there, we have a couple tables, dbo.CostCard where we have 43258326 rows, and dbo.CostType, where we have 150 rows for both databases.

We have been doing some application tests for the past few weeks against DB1. As a result of these test, data has changed for both tables. Currently, table dbo.CostCard increased to 43379268 (An addition of 120942 rows), and table dbo.CostType increased to 199 (Addition of 49 rows). We also have implemented a maintenance strategy that uses logic to reorganize/rebuild indexes based on fragmentation, and also update statistics if data has been changed, where we're updating with full scan.

Currently, only DB1 has this maintenance routine setup, and we've noticed that stats and indexes for both tables have been updated correctly. So far so good!

Now here comes the weird part. We have a fairly simple statement that we've noticed a pretty big performance degradation. Here's the statement:

SELECT DISTINCT TOP 100
    n1t1.Description
FROM
    CostCard AS n0t0
JOIN CostType AS n1t1
    ON ( n0t0.CostType ) = ( n1t1.Code )
WHERE
    ( ( n1t1.Description ) LIKE ( '%legal research%' ) )
    AND ( ( n1t1.Description ) IS NOT NULL )
ORDER BY
    n1t1.Description

What we noticed is that the query optimizer is creating a Serial plan (We've recompile it many times) for DB1, where we know stats and indexes are being regularly updated, and is using a Parallel plan (Better Plan, and yes, we've recompiled many times as well) for DB2 which has been sitting idle for the past few months!!!

How is this possible? I've been trying to figure this out for the past couple weeks but have ran out of ideas. Can someone shed some light here?

P.S.: I have attached a compressed file with all the info, including the query plans and statistics info.

https://dl.dropboxusercontent.com/u/72497299/Terrible%20Bad%20Query%20Plan.zip

Thanks and REALLY, REALLY appreciate any help!!!

Best Answer

From the SQL Server query optimizer's point of view, there is not much to choose between the parallel and serial execution plans in this case.

In general, the optimizer's cost model reduces the CPU cost (not the I/O cost) of operators in a parallel plan in proportion to the estimated degree of parallelism available. This CPU adjustment explains why the optimizer ever chooses a parallel plan (which will generally consume more resources) over a serial plan.

Unfortunately, the cost model does not apply this CPU reduction to the inner side of a nested loops join. It makes no sense to me (because the inner side still uses parallelism efficiently), but I didn't design the cost model.

Anyway, because the majority of the CPU cost in this execution plan is associated with the Clustered Index Scan (for which CPU reduction does not apply), the choice between serial and parallel is a close one. In broad terms, to be selected, a parallel plan must save enough using the local/global Stream Aggregate to compensate for the extra exchanges (Distribute and Gather Streams). The costs involved in that decision depend sensitively on the distribution of row values, as well as the number of rows. With relatively few rows and low-CPU operators, the trade-off can easily go either way.

In short, this query suffers from a debatable design choice applied to the costing of parallel nested loops joins. You can force the selection of a parallel plan using a plan guide, or by using the undocumented trace flag 8649. In SQL Server 2016 SP1 CU2 onward, you can also use the undocumented ENABLE_PARALLEL_PLAN_PREFERENCE hint.