Sql-server – Is support for Parallel Scalar UDF a reasonable feature request

functionsparallelismsql server

It is fairly well documented that scalar UDF's force an overall serial plan.

Running functions in parallel

Given a large number of rows coming into a point in the pipeline where a UDF must be calculated, why can't the engine just distribute them among the processors? If there is no state within a UDF then the order shouldn't matter.

There are claims about UDF's being a black box must use cursor. I can see that a user cursor cannot be parallelized within an SP for the cases where some state is maintained between iterations but seems like it should be parallelizable otherwise.

Extra points for explaining why the engine forces the whole plan to be serial instead of just the UDF calculation stage.

Is support for parallel UDF a reasonable feature to request?

Best Answer

It is fairly well documented that UDFs force an overall serial plan.

I'm not certain it is all that well documented.

  • A scalar T-SQL function prevents parallelism anywhere in the plan.
  • A scalar CLR function can be executed in parallel, so long as it does not access the database.
  • A multi-statement table-valued T-SQL function forces a serial zone in a plan that may use parallelism elsewhere.
  • An inline table-valued T-SQL function is expanded like a view, so has no direct effect.

See Forcing a Parallel Execution Plan and/or Craig Freedman's Parallel Execution presentation.

There are claims about UDFs being a black box must use cursor.

These claims are not correct.

Extra points for explaining why the engine forces the whole plan to be serial instead of just the UDF calculation stage.

My understanding is that the current restrictions are a purely the result of certain implementation details. There is no fundamental reason why functions could not be executed using parallelism.

Specifically, T-SQL scalar functions execute inside a separate T-SQL context, which complicates correct operation, coordination and shutdown (especially in the case of an error) significantly.

Equally, table variables do support parallel reads (but not writes) in general, but the table variable exposed by a table-valued function is not able to support parallel reads for implementation-specific reasons. You would need someone with source code access (and the freedom to share details) to provide an authoritative answer, I'm afraid.

Is support for parallel UDF a reasonable feature to request?

Of course, if you can make a strong-enough case. My own feeling is that the work involved would be extensive, so your proposal would have to meet an extremely high bar. For example, a related (and much simpler) request to provide inline scalar functions has great support, but has languished unimplemented for years now.


You might like to read the Microsoft paper:

...which outlines the approach Microsoft look to be taking to address T-SQL scalar function performance issues in the release after SQL Server 2017.

The goal of Froid is to enable developers to use the abstractions of UDFs and procedures without compromising on performance. Froid achieves this goal using a novel technique to automatically convert imperative programs into equivalent relational algebraic forms whenever possible. Froid models blocks of imperative code as relational expressions, and systematically combines them into a single expression using the Apply operator, thereby enabling the query optimizer to choose efficient set-oriented, parallel query plans.

(emphasis mine)


Inline scalar T-SQL functions are now implemented in SQL Server 2019.