Sql-server – Displaying an Estimated Execution Plan generates CXPACKET, PAGELATCH_SH, and LATCH_EX [ACCESS_METHODS_DATASET_PARENT] waits

performancesql serversql-server-2016

I'm running Microsoft SQL Server 2016 SP2-CU6 (13.0.5292.0) on a 4 vCPU VM with max degree of parallelism set to 2 and cost threshold for parallelism set to 50.

In the mornings, when trying to display an Estimated Execution Plan for a SELECT TOP 100 query, I run into massive waits and the operation to render the estimated plan takes minutes, often times in the 5 – 7 minute range. Again, this is not the actual execution of the query, this is just the process to display an Estimated Execution Plan.

sp_WhoIsActive will show either PAGEIOLATCH_SH waits or LATCH_EX [ACCESS_METHODS_DATASET_PARENT] waits and when I run Paul Randal's WaitingTasks.sql script during the operation it shows CXPACKET waits with the worker threads showing PAGEIOLATCH_SH waits:

enter image description here

*resource description field = exchangeEvent id=Port5f6069e600 WaitType=e_waitPortOpen waiterType=Coordinator nodeId=1 tid=0 ownerActivity=notYetOpened waiterActivity=waitForAllOwnersToOpen

The worker threads look to be bringing the entire stats table into memory (as those page numbers as well as subsequent page numbers shown from Paul Randal's query point back to clustered key for the stats table). Once the plan does come back, it's basically instantaneous for the remainder of the day, even after I see most of the stats table attrition from cache with only various records remaining (that I assume were pulled due to seek operations from similar queries).

I would expect this initial behavior if the query was actually executing with a plan that used SCAN operators, but why is it doing this when evaluating execution plans only to arrive at a SEEK operator as shown in the plan linked above? What can I do (aside from running this statement before office hours so my data is appropriately cached) to help improve performance here? I'm assuming a pair of covering indexes would be beneficial, but would they really guarantee any changes in behavior? I have to work within some storage and maintenance window limitations here, and the query itself is generated from a vendor solution, so any other suggestions (besides better indexing) would be welcome at this point.

Best Answer

It appears your request for an actual execution plan triggered stats updates. Since you mention this happens in the mornings, I imagine there's an overnight process that does a lot of modifications to the tables involved?

Thus SQL Server uses the stats to create the plan, has hit the modification threshold, and executes automatic stats updates as part of the operation.

In the XML for the estimated plan you shared, I see these close-together update dates for stats from this morning:

LastUpdate="2019-05-06T09:12:49.92"
LastUpdate="2019-05-06T09:12:58.3"
LastUpdate="2019-05-06T09:13:20.33"
LastUpdate="2019-05-06T09:13:09.67"
LastUpdate="2019-05-06T09:12:59.05"
LastUpdate="2019-05-06T09:12:39.56"

If these are very large, busy tables (seems likely based on the sampling percentages), then it's not too surprising that the stats updates are taking a lot of horsepower.