Sql-server – SQL not engaging parallelism for extremely large query

parallelismsql server

I have a very large query (~630 lines) that involves a lot of nested SELECT statements and pulls from multiple views. Our SQL server has parallelism set to 2 with a threshold of 95 (set that way because our DBA was optimizing it based on some other applications). This query has recently started taking 5-10 minutes to complete, up from usually less than a minute. While investigating the cause, we noticed that it never seems to trigger parallelism, always running serial, and suspect that may have something to do with its performance. What's weird is while experimenting, we even dropped the threshold back down to the default value of 5 seconds, and it still won't run parallel. What could be preventing it?

We've been doing the testing on a non-production environment that nobody else was using at the time, so this was the only query being run. Our DBA also tried things like clearing caches and plans, and even recycling the system, but it had no effect.

Update 1: Per comments, I've verified that stats are updated nightly but the problem persists. We're actually rolling back the code to an earlier version that didn't have such serious performance issues, but will continue to test this code because it was supposed to improve performance over the old code, and actually did in initial testing. Will update here accordingly.

Best Answer

What's weird is while experimenting, we even dropped the threshold back down to the default value of 5 seconds, and it still won't run parallel. What could be preventing it?

There are many reasons that the optimizer will not (or cannot) produce a parallel plan. Broadly:

Obvious causes, such as the configuration setting max degree of parallelism being set to one, running under a Resource Governor workload group with MAX_DOP = 1, or having only one logical processor available to SQL Server
Parallelism-inhibiting operations

There are many operation that prevent parallelism, either because they make no sense in a parallel plan, or because the product just does not support them (yet). Some examples (not exhaustive!) of things that force the whole plan to be serial:
- Modifying the contents of a table variable (reading is fine)
- Any T-SQL scalar function (which are evil anyway)
- CLR scalar functions marked as performing data access (normal ones are fine)
- Some intrinsic functions including OBJECT_NAME, ENCYPTBYCERT, and IDENT_CURRENT
- Fast Forward cursors
- System table access (e.g. sys.tables)
Any reference to a table (or view) with a computed column that uses a non-inline T-SQL scalar function will result in a serial plan, even if the that column is not referenced in the query.
Cardinality estimation errors

If there is nothing that absolutely prevents parallelism in the target query, the optimizer may still choose a serial alternative if it has a lower estimated cost. This can be caused by incorrect cardinality estimation.
Costing model limitations

SQL Server uses a model to estimate the runtime cost of each operator in a query plan. This model may incorrect cost a serial plan cheaper than the parallel alternative. The optimizer always chooses the cheapest-looking option it considers.

When SQL Server costs a parallel plan, it generally reduces the CPU cost for a parallel iterator by the a factor equal to the expected runtime DOP. Plans with Nested Loops joins can be a special problem, because the inner side almost always runs multiple threads serially. The parallelism icons are still present, but they indicate that there are DOP independent serial threads.

The distinction is perhaps a subtle one, but it (a) explains why operators that normally force a serial zone can run 'in parallel' on the inner side of a loops join; and (b) the optimizer does not reduce the CPU costs on the inner side by the estimated runtime DOP. This puts nested loops at a disadvantage when it comes to parallelism costing, compared with Hash and Merge Joins, and can explain why the optimizer may fail to choose a parallel Nested Loops plan.
Optimizer code path issues

The optimizer may not get as far as evaluating a parallel plan at all. One way this can occur is if the final plan is found during the Trivial Plan stage. If a Trivial Plan is possible, and the resulting cost is less than the configured cost threshold for parallelism, the full optimization stages are skipped and a serial plan is returned immediately.

Queries that pass Trivial Plan may still terminate optimization early (before parallelism is considered), either with a Good Enough Plan Found message, or a Time Out. Both of these are heuristics to prevent the optimizer spending more time optimizing than it stands to gain by reducing estimated execution time (cost).

Testing

There is no supported way to require a parallel plan but there are a couple of undocumented tricks (not suitable for production). One is to temporarily set the CPU weighting used during optimization much higher, and the other is to set trace flag 8649. For SQL Server 2016 SP1 CU2 and later, the undocumented query hint OPTION (USE HINT ('ENABLE_PARALLEL_PLAN_PREFERENCE')) performs the same function as TF 8649, but without the need for admin permissions.

The plan produced might not be one the optimizer would normally consider, but you may be able to capture it and use it in a plan guide in production after careful testing and review.

For more information, see my article Forcing a Parallel Query Execution Plan and Non Parallelizable operations in SQL Server by Simon Sabin.

Best Answer

Testing

Related Solutions

SQL Server – Dealing with CXPACKET Waits by Setting Cost Threshold for Parallelism

Sql-server – When to make changes to the cost threshold for parallelism

Related Question