I initially thought you were on to something here. My working assumption was that perhaps the buffer pool wasn't immediately flushed, since that requires "some work", and why bother until the memory is actually needed. But...
Your test is flawed.
What you're seeing in the buffer pool is the pages read as a result of re-attaching the database, not the remains of the previous instance of the database.
And we can see that the buffer pool was not totally blown away by the detach/attach. Seems like my buddy was wrong. Does anyone disagree or have a better argument?
Yes. You're interpreting physical reads 0 as meaning there were not any physical reads:
Table 'DatabaseLog'. Scan count 1, logical reads 782, physical reads 0, read-ahead reads 768, lob logical reads 94, lob physical reads 4, lob read-ahead reads 24.
As described on Craig Freedman's blog, the sequential read-ahead mechanism tries to ensure that pages are in memory before they're requested by the query processor, which is why you see a zero or lower-than-expected physical read count reported.
When SQL Server performs a sequential scan of a large table, the storage engine initiates the read ahead mechanism to ensure that pages are in memory and ready to scan before they are needed by the query processor. The read ahead mechanism tries to stay 500 pages ahead of the scan.
None of the pages required to satisfy your query were in memory until read-ahead put them there.
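You can reproduce this yourself. A minimal sketch, assuming a test instance with an AdventureWorks copy (DBCC DROPCLEANBUFFERS evicts clean pages from the buffer pool, so never run it on a production server):

CHECKPOINT;                -- flush dirty pages so DROPCLEANBUFFERS can evict everything
DBCC DROPCLEANBUFFERS;     -- start the test from a cold buffer pool
SET STATISTICS IO ON;

SELECT * FROM dbo.DatabaseLog;   -- large sequential scan

SET STATISTICS IO OFF;

Expect physical reads near zero but a large read-ahead reads count: read-ahead fetched the pages into memory just before the scan requested them.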
Why online/offline results in a different buffer pool profile warrants a little more idle investigation. @MarkSRasmussen might be able to help us out with that next time he visits.
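In the meantime, a starting point for that investigation; a sketch that profiles how many pages each database holds in the buffer pool, using the documented sys.dm_os_buffer_descriptors DMV:

SELECT d.name AS database_name,
       COUNT(*) AS cached_pages,
       COUNT(*) * 8 / 1024 AS cached_mb   -- buffer pool pages are 8 KB
FROM sys.dm_os_buffer_descriptors AS b
JOIN sys.databases AS d
  ON d.database_id = b.database_id
GROUP BY d.name
ORDER BY cached_pages DESC;

Run it before the detach and again after the attach (or around an offline/online cycle) and compare the counts for the database in question.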
For SQL Server 2000 → 2014
I generally agree with Paul Randal that this is something you could always turn on, but I have a little hesitation about saying it should always be on, and that you should leave it on regardless of what you observe (with or without adequate testing).
People can have poorly set up tempdb configurations: not enough files, files too small or not equally sized, improper growth settings, instant file initialization not enabled, and coding patterns where temp tables are not cached and reused (see here and here). In that scenario, with the perfect storm workload, it's conceivable that the trace flag could actually lead to worse overall performance. Think about a case where those settings are not optimal and you have tons of users all trying to create their own, not-from-cache copies of temp tables, all using their own dedicated extents instead of sharing them. There may be some level, even if I can't produce it at will, where the overhead of dedicated extents outweighs the allocation contention savings.
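As a quick sanity check for a few of those items, here is a sketch that lists tempdb's data files, so unequal sizes or percentage growth stand out:

SELECT name,
       size * 8 / 1024 AS size_mb,        -- size is reported in 8 KB pages
       CASE WHEN is_percent_growth = 1
            THEN CAST(growth AS varchar(10)) + '%'
            ELSE CAST(growth * 8 / 1024 AS varchar(10)) + ' MB'
       END AS growth_setting
FROM tempdb.sys.database_files
WHERE type_desc = 'ROWS';                 -- data files only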
Just because Paul hasn't seen a scenario where the scales tip at some threshold doesn't mean it's impossible, just that it's unlikely / uncommon.
You can likely avoid that by following all of the other best practices, and I suspect it really would require a perfect storm; I just don't feel strongly enough to say "always, without exception."
For SQL Server 2016
In SQL Server 2016, uniform extent allocation is the default behavior - see Bob Dorr's "It Just Runs Faster" post about it. Other than that, many of the same points above still hold true.
You can't turn this off for tempdb; so, effectively, trace flag 1118 is always on in 2016, and it is permanent. My suggestion, then, would be to make sure tempdb works effectively under TF 1118 on your current instances and under your current workload, to avoid any surprises when you upgrade (at which point you will no longer have the option to turn it off).
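On a pre-2016 instance, checking and enabling the flag globally looks like this (DBCC TRACEON does not survive a restart; add -T1118 as a startup parameter to make it stick):

DBCC TRACESTATUS(1118, -1);   -- is the flag already on globally?
DBCC TRACEON(1118, -1);       -- enable it globally for the running instance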
If you want to turn this behavior off for a user database in 2016 (that is, revert to the old mixed-extent allocation), you use ALTER DATABASE:
ALTER DATABASE MyDatabase SET MIXED_PAGE_ALLOCATION ON;
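You can verify the setting per database via sys.databases (the is_mixed_page_allocation_on column was added in 2016):

SELECT name, is_mixed_page_allocation_on
FROM sys.databases;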
Best Answer
There is no separate MAXDOP setting for each side. But you could play with the cost threshold for parallelism: setting it to 0 means the optimizer will consider a parallel plan even when the estimated costs are very low.
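As a sketch, using sp_configure (this is an instance-wide setting, so experiment on a test server, not in production):

EXEC sys.sp_configure N'show advanced options', 1;   -- expose the advanced settings
RECONFIGURE;
EXEC sys.sp_configure N'cost threshold for parallelism', 0;   -- consider a parallel plan at any cost
RECONFIGURE;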
You can also play with DBCC SETCPUWEIGHT, which Paul White describes here, or other techniques he has for forcing parallel plans here; or even play with DBCC OPTIMIZER_WHATIF, which really should just be for playing.
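For illustration, the SETCPUWEIGHT trick from Paul White's post looks roughly like this (undocumented and unsupported, strictly for test environments; the weight of 1000 is just an example value):

DBCC SETCPUWEIGHT(1000);   -- inflate CPU cost estimates so a parallel plan looks attractive
-- ... run the query you want to see a parallel plan for ...
DBCC SETCPUWEIGHT(1);      -- reset to the default weight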
There has been a suggestion on Connect to allow for a MINDOP syntax or something similar.

In any case, I'm not convinced that parallelism will necessarily help you here. Sure, you might get a parallel plan, but is it really going to reduce the runtime of the query? With all those GROUP BY and DISTINCT clauses (why would you ever need both?), I think you should focus your optimization elsewhere, such as pre-aggregating some of this information. Or try something simple, like assigning the COUNT(*) FROM #MatchValues to a variable instead of evaluating it twice (not sure if SQL Server will evaluate it twice in this case, but it can't hurt to remove the temptation).
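For example, a minimal sketch (#MatchValues being the temp table from the question):

DECLARE @MatchCount int;
SELECT @MatchCount = COUNT(*) FROM #MatchValues;   -- evaluate once
-- ...then reference @MatchCount wherever the query currently repeats COUNT(*) FROM #MatchValues.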