Sql-server – Parallel Plan Selection

execution-plan parallelism sql-server sql-server-2008-r2

I have a weird query plan problem. I have two databases (let's call them DB1 and DB2) that sit on the same SQL Server instance and have identical schemas. Each contains a couple of tables: dbo.CostCard, with 43258326 rows, and dbo.CostType, with 150 rows, in both databases.

We have been running application tests against DB1 for the past few weeks. As a result of these tests, the data in both tables has changed. Currently, dbo.CostCard has grown to 43379268 rows (an addition of 120942 rows), and dbo.CostType has grown to 199 rows (an addition of 49 rows). We have also implemented a maintenance strategy that reorganizes or rebuilds indexes based on fragmentation, and updates statistics with a full scan whenever data has changed.

Currently, only DB1 has this maintenance routine set up, and we've confirmed that the stats and indexes for both tables have been updated correctly. So far so good!

Now here comes the weird part. We have a fairly simple statement for which we've noticed a pretty big performance degradation. Here's the statement:

SELECT DISTINCT TOP 100
    n1t1.Description
FROM
    CostCard AS n0t0
JOIN CostType AS n1t1
    ON ( n0t0.CostType ) = ( n1t1.Code )
WHERE
    ( ( n1t1.Description ) LIKE ( '%legal research%' ) )
    AND ( ( n1t1.Description ) IS NOT NULL )
ORDER BY
    n1t1.Description

What we noticed is that the query optimizer creates a serial plan for DB1 (we've recompiled it many times), where we know stats and indexes are being regularly updated, and a parallel plan for DB2 (the better plan, and yes, we've recompiled it many times as well), which has been sitting idle for the past few months!!!

How is this possible? I've been trying to figure this out for the past couple of weeks but have run out of ideas. Can someone shed some light here?

P.S.: I have attached a compressed file with all the info, including the query plans and statistics info.

https://dl.dropboxusercontent.com/u/72497299/Terrible%20Bad%20Query%20Plan.zip

Thanks and REALLY, REALLY appreciate any help!!!

Best Answer

From the SQL Server query optimizer's point of view, there is not much to choose between the parallel and serial execution plans in this case.

In general, the optimizer's cost model reduces the CPU cost (not the I/O cost) of operators in a parallel plan in proportion to the estimated degree of parallelism available. This CPU adjustment explains why the optimizer ever chooses a parallel plan (which will generally consume more resources) over a serial plan.

Unfortunately, the cost model does not apply this CPU reduction to the inner side of a nested loops join. It makes no sense to me (because the inner side still uses parallelism efficiently), but I didn't design the cost model.

Anyway, because the majority of the CPU cost in this execution plan is associated with the Clustered Index Scan (for which CPU reduction does not apply), the choice between serial and parallel is a close one. In broad terms, to be selected, a parallel plan must save enough using the local/global Stream Aggregate to compensate for the extra exchanges (Distribute and Gather Streams). The costs involved in that decision depend sensitively on the distribution of row values, as well as the number of rows. With relatively few rows and low-CPU operators, the trade-off can easily go either way.

In short, this query suffers from a debatable design choice applied to the costing of parallel nested loops joins. You can force the selection of a parallel plan using a plan guide, or by using the undocumented trace flag 8649. In SQL Server 2016 SP1 CU2 onward, you can also use the undocumented ENABLE_PARALLEL_PLAN_PREFERENCE hint.
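As a minimal sketch of the options above, the query from the question could be run with the trace flag applied at the statement level. This assumes the caller is allowed to use QUERYTRACEON (it normally requires sysadmin membership unless it is applied through a plan guide). Because both options are undocumented, treat this as something to try on a test system rather than a supported fix.

```sql
-- Sketch only: the question's query with a parallel-plan preference.
-- Trace flag 8649 is undocumented; test before relying on it.
SELECT DISTINCT TOP 100
    n1t1.Description
FROM
    CostCard AS n0t0
JOIN CostType AS n1t1
    ON n0t0.CostType = n1t1.Code
WHERE
    n1t1.Description LIKE '%legal research%'
    AND n1t1.Description IS NOT NULL
ORDER BY
    n1t1.Description
OPTION (QUERYTRACEON 8649);

-- On SQL Server 2016 SP1 CU2 onward, the (also undocumented) hint
-- can replace the trace flag in the OPTION clause:
-- OPTION (USE HINT ('ENABLE_PARALLEL_PLAN_PREFERENCE'));
```

Either option only tells the optimizer to prefer a parallel plan when one is available; the plan can still come out serial if something in the query prevents parallelism.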

Related Solutions

Sql-server – SQL Server Maintenance Plan TSQL query to log the backup databases

You can get the backup dates for each database backed up since the plan last started with the following query:

-- Optionally uncomment the INSERT to log the results to your own table.
--INSERT INTO YourBackupLogTable
--     (DatabaseName, LastBackupStartDate, LastBackupFinishDate, LastBackupDurationSecs)
SELECT db.name AS DatabaseName
    , MAX(lb.backup_start_date) AS LastBackupStartDate
    , MAX(lb.backup_finish_date) AS LastBackupFinishDate
    , DATEDIFF(SECOND, MAX(lb.backup_start_date), MAX(lb.backup_finish_date)) AS LastBackupDurationSecs
FROM sys.databases AS db
JOIN msdb.dbo.backupset AS lb ON lb.database_name = db.name
-- Only consider backups started since the most recent run of the named plan.
WHERE lb.backup_start_date >= (SELECT MAX(l.start_time)
                               FROM msdb.dbo.sysmaintplan_plans AS p
                               INNER JOIN msdb.dbo.sysmaintplan_log AS l ON p.id = l.plan_id
                               WHERE p.name = 'Your backup plan')
GROUP BY db.name

Sql-server – Should I refresh query plan cache

Let's take your problem step by step:

It has been running great until one fine morning I pushed a lot of rows to a table that is heavily used and since then I was getting query timeouts.

Whenever you do large updates or inserts to your tables, I highly recommend updating statistics and reorganizing or rebuilding indexes. That way the query optimizer is less likely to produce bad plans based on wrong estimates.

In order to fix the problem I run the Database Tuning Advisor and I added some Indexes and Statistics.

Never do that without understanding your workload and properly testing the recommendations. Refer to Don’t just blindly create those “missing” indexes! by Aaron Bertrand.

I also created a maintenance plan to rebuild indexes daily.

I would recommend looking at the fragmentation level of the indexes and reorganizing or rebuilding them accordingly. The best option is to use SQL Server Index and Statistics Maintenance, as it is among the best free tools available and is widely used.
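As a rough illustration of checking fragmentation before choosing between a reorganize and a rebuild, a query along these lines can be run in the database in question. The thresholds in the comments are a commonly cited rule of thumb, not something stated in the answer, so adjust them to your workload.

```sql
-- Sketch: list index fragmentation for the current database.
-- Rule of thumb (an assumption, tune as needed):
--   roughly 5-30% fragmentation -> ALTER INDEX ... REORGANIZE
--   above roughly 30%           -> ALTER INDEX ... REBUILD
SELECT OBJECT_NAME(ips.object_id) AS TableName
    , i.name AS IndexName
    , ips.avg_fragmentation_in_percent
    , ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON i.object_id = ips.object_id
    AND i.index_id = ips.index_id
WHERE ips.index_id > 0          -- skip heaps
ORDER BY ips.avg_fragmentation_in_percent DESC;
```

Very small indexes often report high fragmentation that has little practical effect, which is why page_count is included in the output.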

Next, life forced me to clean up the table in matter, and the amount of rows in it is even smaller now than it was before but the timeouts are still happening. So, I removed all indexes that I created and now the website is much more stable but from time to time I still see some timeouts.

Same as above. The optimizer now has wrong statistics, so it can produce an inefficient query plan. The best approach is to run UPDATE STATISTICS and mark the table for recompilation using sp_recompile, so that the next time the optimizer will generate a new plan based on the updated stats.
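A minimal sketch of those two steps, where dbo.YourTable is a hypothetical placeholder for the heavily used table:

```sql
-- dbo.YourTable is a placeholder name, not taken from the question.
-- Rebuild the table's statistics from every row rather than a sample.
UPDATE STATISTICS dbo.YourTable WITH FULLSCAN;

-- Mark cached plans that reference the table for recompilation, so the
-- next execution compiles a fresh plan against the updated statistics.
EXEC sp_recompile N'dbo.YourTable';
```

FULLSCAN can take a while on a large table, so a sampled update may be a reasonable trade-off if the maintenance window is tight.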

Also read: Slow in the Application, Fast in SSMS?
