I initially thought you were on to something here. My working assumption was that perhaps the buffer pool wasn't immediately flushed, since that requires "some work", and why bother until the memory is actually needed. But...
Your test is flawed.
What you're seeing in the buffer pool is the pages read as a result of re-attaching the database, not the remains of the previous instance of the database.
And we can see that the buffer pool was not totally blown away by the detach/attach. Seems like my buddy was wrong. Does anyone disagree or have a better argument?
Yes. You're interpreting physical reads 0 as meaning there were not any physical reads:
Table 'DatabaseLog'. Scan count 1, logical reads 782, physical reads 0, read-ahead reads 768, lob logical reads 94, lob physical reads 4, lob read-ahead reads 24.
As described on Craig Freedman's blog, the sequential read-ahead mechanism tries to ensure that pages are in memory before they're requested by the query processor, which is why you see a zero or lower-than-expected physical read count reported.
When SQL Server performs a sequential scan of a large table, the storage engine initiates the read ahead mechanism to ensure that pages are in memory and ready to scan before they are needed by the query processor. The read ahead mechanism tries to stay 500 pages ahead of the scan.
None of the pages required to satisfy your query were in memory until read-ahead put them there.
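You can reproduce this yourself. A minimal sketch, assuming a test instance with an AdventureWorks copy (DBCC DROPCLEANBUFFERS evicts clean pages from the buffer pool, so never run it on a production server):

CHECKPOINT;                -- flush dirty pages so DROPCLEANBUFFERS can evict everything
DBCC DROPCLEANBUFFERS;     -- start the test from a cold buffer pool
SET STATISTICS IO ON;

SELECT * FROM dbo.DatabaseLog;   -- large sequential scan

SET STATISTICS IO OFF;

Expect physical reads near zero but a large read-ahead reads count: read-ahead fetched the pages into memory just before the scan requested them.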
Why online/offline results in a different buffer pool profile warrants a little more idle investigation. @MarkSRasmussen might be able to help us out with that next time he visits.
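In the meantime, a starting point for that investigation; a sketch that profiles how many pages each database holds in the buffer pool, using the documented sys.dm_os_buffer_descriptors DMV:

SELECT d.name AS database_name,
       COUNT(*) AS cached_pages,
       COUNT(*) * 8 / 1024 AS cached_mb   -- buffer pool pages are 8 KB
FROM sys.dm_os_buffer_descriptors AS b
JOIN sys.databases AS d
  ON d.database_id = b.database_id
GROUP BY d.name
ORDER BY cached_pages DESC;

Run it before the detach and again after the attach (or around an offline/online cycle) and compare the counts for the database in question.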
For SQL Server 2000 → 2014
I generally agree with Paul Randal that this is something you could always turn on, but I have a little hesitation about saying it should always be on, and that you should leave it on regardless of what you observe (with or without adequate testing).
People can have poorly set up tempdb configurations: not enough files, files too small or not equally sized, improper growth settings, instant file initialization not enabled, and coding patterns where temp tables are not cached and reused (see here and here). In that scenario, with the perfect storm workload, it's conceivable that the trace flag could actually lead to worse overall performance. Think about a case where those settings are not optimal and you have tons of users all trying to create their own, not-from-cache copies of temp tables, all using their own dedicated extents instead of sharing them. There may be some level, even if I can't produce it at will, where the overhead of dedicated extents outweighs the allocation contention savings.
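As a quick sanity check for a few of those items, here is a sketch that lists tempdb's data files, so unequal sizes or percentage growth stand out:

SELECT name,
       size * 8 / 1024 AS size_mb,        -- size is reported in 8 KB pages
       CASE WHEN is_percent_growth = 1
            THEN CAST(growth AS varchar(10)) + '%'
            ELSE CAST(growth * 8 / 1024 AS varchar(10)) + ' MB'
       END AS growth_setting
FROM tempdb.sys.database_files
WHERE type_desc = 'ROWS';                 -- data files only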
Just because Paul hasn't seen a scenario where the scales tip at some threshold doesn't mean it's impossible, just that it's unlikely / uncommon.
You can likely avoid that by following all of the other best practices, and I suspect it really would require a perfect storm; I just don't feel strongly enough to say "always, without exception."
For SQL Server 2016
In SQL Server 2016, uniform extent allocation is the default behavior - see Bob Dorr's "It Just Runs Faster" post about it. Other than that, many of the same points above still hold true.
You can't turn this off for tempdb; so, effectively, trace flag 1118 is always on in 2016, and it is permanent. My suggestion, then, would be to make sure tempdb works effectively under TF 1118 on your current instances and under your current workload, to avoid any surprises when you upgrade (at which point you will no longer have the option to turn it off).
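On a pre-2016 instance, checking and enabling the flag globally looks like this (DBCC TRACEON does not survive a restart; add -T1118 as a startup parameter to make it stick):

DBCC TRACESTATUS(1118, -1);   -- is the flag already on globally?
DBCC TRACEON(1118, -1);       -- enable it globally for the running instance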
If you want to turn this behavior off for a user database in 2016 (that is, revert to the old mixed-extent allocation), you use ALTER DATABASE:
ALTER DATABASE MyDatabase SET MIXED_PAGE_ALLOCATION ON;
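You can verify the setting per database via sys.databases (the is_mixed_page_allocation_on column was added in 2016):

SELECT name, is_mixed_page_allocation_on
FROM sys.databases;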
Best Answer
There is no separate MAXDOP setting for each side. But you could play with the cost threshold for parallelism: setting it to 0 means the optimizer will consider a parallel plan even when the estimated costs are very low.
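As a sketch, using sp_configure (this is an instance-wide setting, so experiment on a test server, not in production):

EXEC sys.sp_configure N'show advanced options', 1;   -- expose the advanced settings
RECONFIGURE;
EXEC sys.sp_configure N'cost threshold for parallelism', 0;   -- consider a parallel plan at any cost
RECONFIGURE;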
You can also play with DBCC SETCPUWEIGHT, which Paul White describes here, or other techniques he has for forcing parallel plans here; or even play with DBCC OPTIMIZER_WHATIF, which really should just be for playing.
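For illustration, the SETCPUWEIGHT trick from Paul White's post looks roughly like this (undocumented and unsupported, strictly for test environments; the weight of 1000 is just an example value):

DBCC SETCPUWEIGHT(1000);   -- inflate CPU cost estimates so a parallel plan looks attractive
-- ... run the query you want to see a parallel plan for ...
DBCC SETCPUWEIGHT(1);      -- reset to the default weight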
There has been a suggestion on Connect to allow for a MINDOP syntax or something similar.

In any case, I'm not convinced that parallelism will necessarily help you here. Sure, you might get a parallel plan, but is it really going to reduce the runtime of the query? With all those GROUP BY and DISTINCT clauses (why would you ever need both?), I think you should focus your optimization elsewhere, such as pre-aggregating some of this information. Or try something simple, like assigning the COUNT(*) FROM #MatchValues to a variable instead of evaluating it twice (not sure if SQL Server will evaluate it twice in this case, but it can't hurt to remove the temptation).
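For example, a minimal sketch (#MatchValues being the temp table from the question):

DECLARE @MatchCount int;
SELECT @MatchCount = COUNT(*) FROM #MatchValues;   -- evaluate once
-- ...then reference @MatchCount wherever the query currently repeats COUNT(*) FROM #MatchValues.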