> It is fairly well documented that UDFs force an overall serial plan.
I'm not certain it is all that well documented.
- A scalar T-SQL function prevents parallelism anywhere in the plan.
- A scalar CLR function can be executed in parallel, so long as it does not access the database.
- A multi-statement table-valued T-SQL function forces a serial zone in a plan that may use parallelism elsewhere.
- An inline table-valued T-SQL function is expanded like a view, and so has no direct effect on parallelism.
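As a minimal sketch of the two extremes (the function names and logic are invented for illustration):

```sql
-- Scalar T-SQL function: referencing this anywhere in a query
-- forces the entire plan to run serially.
CREATE FUNCTION dbo.ScalarTax (@amount money)
RETURNS money
AS
BEGIN
    RETURN @amount * 0.2;
END;
GO

-- Inline table-valued function: expanded into the calling query
-- like a view, so parallelism is unaffected.
CREATE FUNCTION dbo.InlineTax (@amount money)
RETURNS TABLE
AS
RETURN (SELECT @amount * 0.2 AS tax);
GO
```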
See Forcing a Parallel Execution Plan and/or Craig Freedman's Parallel Execution presentation.
There are claims that because a UDF is a black box, the query must use a cursor. These claims are not correct.
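For instance, a scalar UDF is invoked row by row from an ordinary set-based query; no cursor is involved (dbo.ScalarTax is the hypothetical function from above, and dbo.Orders a hypothetical table):

```sql
-- The scalar UDF is evaluated once per row of a set-based query;
-- no cursor is required anywhere.
SELECT o.OrderID, dbo.ScalarTax(o.Total) AS tax
FROM dbo.Orders AS o;
```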
> Extra points for explaining why the engine forces the whole plan to be serial instead of just the UDF calculation stage.
My understanding is that the current restrictions are purely the result of certain implementation details; there is no fundamental reason why functions could not be executed using parallelism.
Specifically, T-SQL scalar functions execute inside a separate T-SQL context, which significantly complicates correct operation, coordination, and shutdown (especially in the case of an error).
Equally, table variables do support parallel reads (but not writes) in general, but the table variable exposed by a table-valued function cannot support parallel reads, for implementation-specific reasons. You would need someone with source code access (and the freedom to share details) to provide an authoritative answer, I'm afraid.
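If you want to see which restriction was recorded for a given plan, the plan XML carries a NonParallelPlanReason attribute on SQL Server 2012 and later; for a scalar T-SQL function the value shown is TSQLUserDefinedFunctionsNotParallelizable. A sketch against the plan cache DMVs:

```sql
-- Find cached plans whose XML records why parallelism was not used
-- (NonParallelPlanReason appears in SQL Server 2012 and later).
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
    st.text,
    qp.query_plan.value(
        '(//QueryPlan/@NonParallelPlanReason)[1]',
        'varchar(100)') AS non_parallel_reason
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE qp.query_plan.exist('//QueryPlan[@NonParallelPlanReason]') = 1;
```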
> Is support for parallel UDFs a reasonable feature to request?
Of course, if you can make a strong-enough case. My own feeling is that the work involved would be extensive, so your proposal would have to meet an extremely high bar. For example, a related (and much simpler) request to provide inline scalar functions has great support, but has languished unimplemented for years now.
You might like to read the Microsoft paper, Froid: Optimization of Imperative Programs in a Relational Database, which outlines the approach Microsoft look to be taking to address T-SQL scalar function performance issues in the release after SQL Server 2017:
> The goal of Froid is to enable developers to use the abstractions of UDFs and procedures without compromising on performance. Froid achieves this goal using a novel technique to automatically convert imperative programs into equivalent relational algebraic forms whenever possible. Froid models blocks of imperative code as relational expressions, and systematically combines them into a single expression using the Apply operator, thereby enabling the query optimizer to choose efficient set-oriented, **parallel** query plans.
(emphasis mine)
Scalar T-SQL function inlining is now implemented in SQL Server 2019.
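For example, you can check whether a given function qualifies for inlining, and opt out per function where the old behaviour is needed; a sketch using the hypothetical dbo.ScalarTax function from above:

```sql
-- Check whether the function qualifies for inlining (SQL Server 2019+).
SELECT OBJECT_NAME(object_id) AS function_name,
       is_inlineable
FROM sys.sql_modules
WHERE object_id = OBJECT_ID(N'dbo.ScalarTax');
GO

-- Opt a function out of inlining if the old behaviour is required.
ALTER FUNCTION dbo.ScalarTax (@amount money)
RETURNS money
WITH INLINE = OFF
AS
BEGIN
    RETURN @amount * 0.2;
END;
GO
```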
> After clearing the plan cache on SQL Server, how do I regenerate execution plans for stored procedures without executing the stored procedures themselves?
You don't / can't. Execution plans are generated when the object is executed; if you want a plan in the plan cache, then the object must be executed. But, assuming there is at least one input parameter to the stored procedure, you need to use a value that represents the majority case, else you could force a non-ideal / "bad" plan into the cache.
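So the only way to "pre-cache" a plan is to execute the procedure with a representative value and then verify the result; a sketch, with hypothetical procedure and parameter names:

```sql
-- Warm the cache by executing the procedure with a value that
-- represents the majority of real calls (names are hypothetical).
EXEC dbo.GetOrdersByCustomer @CustomerID = 42;

-- Confirm that a plan is now cached for the procedure.
SELECT OBJECT_NAME(ps.object_id) AS procedure_name,
       ps.cached_time,
       ps.execution_count
FROM sys.dm_exec_procedure_stats AS ps
WHERE ps.database_id = DB_ID()
  AND ps.object_id = OBJECT_ID(N'dbo.GetOrdersByCustomer');
```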
> If I have to do this again, I'd like to avoid the performance hit from regenerating the plans while a user is waiting for a response.
You are better off taking the performance hit, because you can't guarantee that the plan will even still be in the cache by the time the first person uses it. Plans can be dropped if SQL Server decides that it needs more memory for other operations. So if it does get dropped before the first person uses it, you've saved them nothing and have caused the server to generate the plan twice.
Besides, it is only a hit the first time; you are aware of this, as you are trying to use it to your advantage by "pre"-caching. But that also means only the first user, on that first execution, takes the slight performance hit. After that, as long as the plan isn't dropped, nobody else experiences it.
Save yourself a lot of wasted time / effort and don't worry about pre-caching. Worry more about performance problems with the queries in the stored procedures that would affect all users each time and spend your time wisely improving those trouble spots.
Best Answer
At first glance, this sounds like a classic parameter sniffing problem.
SQL Server builds an execution plan for the first set of parameters passed in when the plan needs to be compiled, and then reuses that plan over and over through the day. You can see what those parameters were: when you're viewing the serial plan, right-click the SELECT statement and go into Properties. In the Properties window, look for the Parameter List and its compiled values. That'll show you which values produce the serial plan.
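If you'd rather pull the compiled values out with a query than through the Properties window, they're recorded in the plan XML; a sketch (dbo.GetOrdersByCustomer is a hypothetical procedure name):

```sql
-- Extract the parameter values a cached procedure plan was compiled with.
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
    p.n.value('@Column', 'nvarchar(128)')                  AS parameter_name,
    p.n.value('@ParameterCompiledValue', 'nvarchar(max)')  AS compiled_value
FROM sys.dm_exec_procedure_stats AS ps
CROSS APPLY sys.dm_exec_query_plan(ps.plan_handle) AS qp
CROSS APPLY qp.query_plan.nodes('//ParameterList/ColumnReference') AS p(n)
WHERE ps.database_id = DB_ID()
  AND ps.object_id = OBJECT_ID(N'dbo.GetOrdersByCustomer');
```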
To force the plan to always run parallel, you have several different options (many of which Erland covers in his excellent post, linked below), including:

- Adding OPTION (RECOMPILE) to the query, so each execution compiles a plan for the actual parameter values
- Using OPTION (OPTIMIZE FOR ...) so the plan is always compiled for a value you know produces the parallel plan
- Applying one of those hints through a plan guide, if you can't change the code

A short sketch of the first two options follows.
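A minimal sketch of those two hints; the procedure, table, and column names are invented for illustration:

```sql
-- Hypothetical procedure showing two of the hints.
CREATE PROCEDURE dbo.GetOrdersByCustomer
    @CustomerID int
AS
BEGIN
    SELECT o.OrderID, o.Total
    FROM dbo.Orders AS o
    WHERE o.CustomerID = @CustomerID
    -- Compile for a value known to produce the parallel plan,
    -- regardless of the value actually passed in:
    OPTION (OPTIMIZE FOR (@CustomerID = 42));

    -- Alternative: use OPTION (RECOMPILE) instead, which builds a
    -- fresh plan on every execution so each parameter set is
    -- sniffed anew.
END;
```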
That's just a quick answer - but for much, much more, read Erland's excellent post, Slow in the App, Fast in SSMS that explains how one query can get different plans, and how to fix it.