From the SQL Server query optimizer's point of view, there is not much to choose between the parallel and serial execution plans in this case.
In general, the optimizer's cost model reduces the CPU cost (not the I/O cost) of operators in a parallel plan in proportion to the estimated degree of parallelism available. This CPU adjustment explains why the optimizer ever chooses a parallel plan (which will generally consume more resources) over a serial plan.
Unfortunately, the cost model does not apply this CPU reduction to the inner side of a nested loops join. This makes no sense to me (the inner side still makes efficient use of parallelism), but I didn't design the cost model.
Anyway, because the majority of the CPU cost in this execution plan is associated with the Clustered Index Scan (for which CPU reduction does not apply), the choice between serial and parallel is a close one. In broad terms, to be selected, a parallel plan must save enough using the local/global Stream Aggregate to compensate for the extra exchanges (Distribute and Gather Streams). The costs involved in that decision depend sensitively on the distribution of row values, as well as the number of rows. With relatively few rows and low-CPU operators, the trade-off can easily go either way.
In short, this query suffers from a debatable design choice applied to the costing of parallel nested loops joins. You can force the selection of a parallel plan using a plan guide, or by using the undocumented trace flag 8649. From SQL Server 2016 SP1 CU2 onward, you can also use the undocumented ENABLE_PARALLEL_PLAN_PREFERENCE hint.
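For illustration, this is how the two options look in practice. The table and query below are hypothetical stand-ins, and both hints are undocumented and unsupported, so test carefully before relying on them:

```sql
-- Undocumented trace flag 8649: removes the parallelism cost penalty for
-- this query. QUERYTRACEON requires sysadmin membership (hence the plan
-- guide alternative for non-privileged callers).
SELECT COUNT_BIG(*)
FROM dbo.Sales AS S           -- hypothetical table
OPTION (QUERYTRACEON 8649);

-- From SQL Server 2016 SP1 CU2: same effect via USE HINT, no sysadmin needed.
SELECT COUNT_BIG(*)
FROM dbo.Sales AS S
OPTION (USE HINT ('ENABLE_PARALLEL_PLAN_PREFERENCE'));
```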
Am I correct in saying that statistics are only used when creating the execution plan for a stored procedure, and they are not used in the actual execution context?
No, what happens is that the execution plan for a stored procedure is cached. Assuming there is enough available memory to continue holding the plan, it won't change unless one of the following happens (from Execution Plan Caching and Reuse in the SQL Server documentation, emphasis added):
- Changes made to a table or view referenced by the query (ALTER TABLE and ALTER VIEW).
- Changes made to a single procedure, which would drop all plans for that procedure from the cache (ALTER PROCEDURE).
- Changes to any indexes used by the execution plan.
- **Updates on statistics used by the execution plan**, generated either explicitly from a statement, such as UPDATE STATISTICS, or generated automatically.
- Dropping an index used by the execution plan.
- An explicit call to sp_recompile.
- Large numbers of changes to keys (generated by INSERT or DELETE statements from other users that modify a table referenced by the query).
- For tables with triggers, if the number of rows in the inserted or deleted tables grows significantly.
- Executing a stored procedure using the WITH RECOMPILE option.
So if the statistics are updated, the next execution of the procedure will recompile the plan, and the new plan will take the fresh statistics into account.
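As a concrete illustration of two of the triggers listed above (the procedure name and parameter are hypothetical):

```sql
-- Mark the procedure so its next execution compiles a fresh plan:
EXEC sys.sp_recompile N'dbo.usp_GetOrders';

-- Or compile a one-off plan for a single execution, leaving the cache alone:
EXEC dbo.usp_GetOrders @StartDate = '2024-01-01' WITH RECOMPILE;
```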
How do you prevent execution plans from growing stale when you have a hundred thousand rows being added a day?
One way is the heavy-modification trigger mentioned above: a few hundred thousand changed rows per day may well satisfy that condition on its own. But if you want to be sure, or want more granular control, update your statistics yourself. You can allow SQL Server to auto-create and auto-update statistics, or manage them manually; you can find more information on either method at SQL Server Auto Update and Auto Create Statistics Options, and there is a sketch of both below. A weekly index rebuild, if you do one, will also refresh the statistics on those indexes and trigger plan recompilation. Do some testing to see what is most beneficial to you, as updating statistics too often may not yield any real performance gain.
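For example, either of the following would work; the database and table names are placeholders:

```sql
-- Let SQL Server create and update statistics on its own:
ALTER DATABASE SalesDB SET AUTO_CREATE_STATISTICS ON;
ALTER DATABASE SalesDB SET AUTO_UPDATE_STATISTICS ON;

-- Or refresh them manually as part of scheduled maintenance:
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;  -- one table, full scan
EXEC sys.sp_updatestats;                     -- whole database, default sampling
```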
If we're updating statistics frequently to combat this issue, would it make sense to use the OPTION (RECOMPILE) hint on this stored procedure's query?
You don't need to use RECOMPILE, since, based on the excerpt above, you can see that the execution plan is recompiled whenever new statistics are available. You may be fine with an end-of-day statistics update (if you're truly concerned), but I don't think it's strictly necessary based on what you've said so far. Again, though, I'd test it to see what impact this has on your stored procedure's performance and plan accordingly.
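For reference, if you did decide to force a fresh compilation of the statement on every execution anyway, the hint looks like this (hypothetical query inside the procedure):

```sql
-- @StartDate is assumed to be a parameter of the procedure.
SELECT O.OrderID, O.Total
FROM dbo.Orders AS O             -- hypothetical table
WHERE O.OrderDate >= @StartDate
OPTION (RECOMPILE);              -- compiled per execution, never cached
```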
Best Answer
Statistics are updated automatically once the number of modified rows in the table reaches 500 + 20% of the table's row count. For example, a table with 100,000 rows needs 500 + 20,000 = 20,500 modifications to trigger an update. There are two exceptions: a table with fewer than 500 rows uses a flat threshold of 500 modifications, and a table with fewer than 6 rows is updated after every 6 modifications.
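If you want to see how close a table is to that threshold, something like this sketch works (dbo.Orders is a placeholder; sys.dm_db_stats_properties is available from SQL Server 2008 R2 SP2 and 2012 SP1 onward):

```sql
SELECT s.name AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.modification_counter   -- changes since the last statistics update
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'dbo.Orders');
```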
Now, for the plan you displayed, it looks very weird. I can't repro it either. Can you tell us how you updated the statistics, please?