Sql-server – Is support for Parallel Scalar UDF a reasonable feature request

functions, parallelism, sql-server

It is fairly well documented that scalar UDFs force an overall serial plan.

Given a large number of rows coming into a point in the pipeline where a UDF must be calculated, why can't the engine just distribute them among the processors? If there is no state within a UDF then the order shouldn't matter.

There are claims that UDFs, being a black box, must use a cursor. I can see that a user cursor cannot be parallelized within a stored procedure in cases where some state is maintained between iterations, but it seems it should be parallelizable otherwise.

Extra points for explaining why the engine forces the whole plan to be serial instead of just the UDF calculation stage.

Is support for parallel UDF a reasonable feature to request?

Best Answer

It is fairly well documented that UDFs force an overall serial plan.

I'm not certain it is all that well documented.

A scalar T-SQL function prevents parallelism anywhere in the plan.
A scalar CLR function can be executed in parallel, so long as it does not access the database.
A multi-statement table-valued T-SQL function forces a serial zone in a plan that may use parallelism elsewhere.
An inline table-valued T-SQL function is expanded like a view, so has no direct effect.
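
As a minimal sketch of the first and last cases in the list above, the same calculation can be written both ways. The function names, the dbo.Orders table and its OrderID and Amount columns are all hypothetical, invented only for illustration.

```sql
-- Hypothetical names throughout: dbo.AddTax_Scalar, dbo.AddTax_Inline, dbo.Orders.

-- Scalar T-SQL function: referencing it in a query prevents
-- parallelism anywhere in that query's plan.
CREATE FUNCTION dbo.AddTax_Scalar (@Amount decimal(19, 4))
RETURNS decimal(19, 4)
AS
BEGIN
    RETURN @Amount * 1.2;
END;
GO

-- Inline table-valued function: a single RETURN SELECT, no BEGIN...END.
-- It is expanded into the calling query like a view.
CREATE FUNCTION dbo.AddTax_Inline (@Amount decimal(19, 4))
RETURNS TABLE
AS
RETURN
    SELECT CAST(@Amount * 1.2 AS decimal(19, 4)) AS AmountWithTax;
GO

-- Calling the scalar version.
SELECT o.OrderID, dbo.AddTax_Scalar(o.Amount) AS AmountWithTax
FROM dbo.Orders AS o;

-- Calling the inline version via CROSS APPLY.
SELECT o.OrderID, t.AmountWithTax
FROM dbo.Orders AS o
CROSS APPLY dbo.AddTax_Inline(o.Amount) AS t;
```

Whether the second query actually gets a parallel plan still depends on the optimizer's costing; the inline form only removes the function as an obstacle.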

See Forcing a Parallel Execution Plan and/or Craig Freedman's Parallel Execution presentation.

There are claims that UDFs, being a black box, must use a cursor.

These claims are not correct.

Extra points for explaining why the engine forces the whole plan to be serial instead of just the UDF calculation stage.

My understanding is that the current restrictions are purely the result of certain implementation details. There is no fundamental reason why functions could not be executed using parallelism.

Specifically, T-SQL scalar functions execute inside a separate T-SQL context, which complicates correct operation, coordination and shutdown (especially in the case of an error) significantly.

Equally, table variables do support parallel reads (but not writes) in general, but the table variable exposed by a table-valued function is not able to support parallel reads for implementation-specific reasons. You would need someone with source code access (and the freedom to share details) to provide an authoritative answer, I'm afraid.

Is support for parallel UDF a reasonable feature to request?

Of course, if you can make a strong-enough case. My own feeling is that the work involved would be extensive, so your proposal would have to meet an extremely high bar. For example, a related (and much simpler) request to provide inline scalar functions has great support, but has languished unimplemented for years now.

You might like to read the Microsoft paper:

Froid: Optimization of Imperative Programs in a Relational Database (pdf)

...which outlines the approach Microsoft look to be taking to address T-SQL scalar function performance issues in the release after SQL Server 2017.

The goal of Froid is to enable developers to use the abstractions of UDFs and procedures without compromising on performance. Froid achieves this goal using a novel technique to automatically convert imperative programs into equivalent relational algebraic forms whenever possible. Froid models blocks of imperative code as relational expressions, and systematically combines them into a single expression using the Apply operator, thereby enabling the query optimizer to choose efficient set-oriented, parallel query plans.

(emphasis mine)

Inline scalar T-SQL functions are now implemented in SQL Server 2019.
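
For readers on SQL Server 2019 or later, the following sketch shows how one might check and control scalar UDF inlining. It assumes the database is at compatibility level 150 or higher; the dbo.Orders table and dbo.AddTax_Scalar function in the last query are hypothetical placeholders.

```sql
-- List scalar T-SQL functions and whether the engine considers them inlineable.
SELECT
    OBJECT_SCHEMA_NAME(m.object_id) AS SchemaName,
    OBJECT_NAME(m.object_id)        AS FunctionName,
    m.is_inlineable,                -- 1 = the function body qualifies for inlining
    m.inline_type                   -- 1 = inlining is currently turned on for it
FROM sys.sql_modules AS m
JOIN sys.objects AS o
    ON o.object_id = m.object_id
WHERE o.type = 'FN';                -- FN = SQL scalar function

-- Turn inlining off (or back on) for the current database.
ALTER DATABASE SCOPED CONFIGURATION SET TSQL_SCALAR_UDF_INLINING = OFF;
ALTER DATABASE SCOPED CONFIGURATION SET TSQL_SCALAR_UDF_INLINING = ON;

-- Disable inlining for a single query (hypothetical table and function).
SELECT o.OrderID, dbo.AddTax_Scalar(o.Amount) AS AmountWithTax
FROM dbo.Orders AS o
OPTION (USE HINT('DISABLE_TSQL_SCALAR_UDF_INLINING'));
```

An individual function can also opt out by being created or altered WITH INLINE = OFF.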

Related Solutions

Sql-server – Faster alternative to scalar UDF with recursion to walk tree hierarchy

Let's start with this. Let us know if it looks something like this. Please edit as necessary. I am adding data into the table:

INSERT INTO Customer (CustomerID, ParentCustomerID, IsBillToCustomer)
select  1   ,   0   ,   0   union all
select  2   ,   0   ,   0   union all
select  3   ,   0   ,   0       union all
select  4   ,   1   ,   0   union all
select  5   ,   2   ,   0   union all
select  6   ,   3   ,   1   union all
select  7   ,   1   ,   1   union all
select  8   ,   2   ,   1   union all
select  9   ,   3   ,   1   union all
select  10  ,   1   ,   0   union all
select  11  ,   2   ,   0   union all
select  12  ,   3   ,   0   union all
select  13  ,   1   ,   0   union all
select  14  ,   2   ,   0   union all
select  15  ,   3   ,   1   union all
select  16  ,   1   ,   1   union all
select  17  ,   2   ,   1   union all
select  18  ,   3   ,   1   union all
select  19  ,   1   ,   0   union all
select  20  ,   2   ,   1   union all
select  21  ,   3   ,   1   union all
select  22  ,   1   ,   1   union all
select  23  ,   2   ,   0   union all
select  24  ,   3   ,   0   union all
select  25  ,   1   ,   1   union all
select  26  ,   2   ,   1   union all
select  27  ,   3   ,   1   union all
select  28  ,   1   ,   1   union all
select  29  ,   2   ,   0   union all
select  30  ,   3   ,   0   union all
select  31  ,   1   ,   0   union all
select  32  ,   2   ,   0   union all
select  33  ,   3   ,   0   union all
select  34  ,   1   ,   1   union all
select  35  ,   2   ,   1   union all
select  36  ,   3   ,   1   union all
select  37  ,   1   ,   1   union all
select  38  ,   2   ,   0   union all
select  39  ,   3   ,   1   union all
select  40  ,   1   ,   1   union all
select  41  ,   2   ,   1   union all
select  42  ,   3   ,   0   union all
select  43  ,   1   ,   0   union all
select  44  ,   2   ,   1   union all
select  45  ,   3   ,   1   union all
select  46  ,   1   ,   1   union all
select  47  ,   2   ,   1   union all
select  48  ,   3   ,   0   union all
select  49  ,   1   ,   0   union all
select  50  ,   2   ,   0   union all
select  51  ,   3   ,   0   union all
select  52  ,   1   ,   0   union all
select  53  ,   2   ,   1   union all
select  54  ,   3   ,   1   union all
select  55  ,   1   ,   1   union all
select  56  ,   2   ,   1   union all
select  57  ,   3   ,   0   union all
select  58  ,   1   ,   1   union all
select  59  ,   2   ,   1   union all
select  60  ,   3   ,   1   union all
select  61  ,   1   ,   0   union all
select  62  ,   2   ,   0   union all
select  63  ,   3   ,   1   union all
select  64  ,   1   ,   1   union all
select  65  ,   2   ,   1   union all
select  66  ,   3   ,   1   union all
select  67  ,   1   ,   0   union all
select  68  ,   2   ,   0   union all
select  69  ,   3   ,   0   union all
select  70  ,   1   ,   0   union all
select  71  ,   2   ,   0   union all
select  72  ,   3   ,   1   union all
select  73  ,   1   ,   1   union all
select  74  ,   2   ,   1   union all
select  75  ,   3   ,   1   union all
select  76  ,   1   ,   0   union all
select  77  ,   2   ,   1   union all
select  78  ,   3   ,   1   union all
select  79  ,   1   ,   1   union all
select  80  ,   2   ,   0   union all
select  81  ,   3   ,   0   union all
select  82  ,   1   ,   1   union all
select  83  ,   2   ,   1   union all
select  84  ,   3   ,   1   union all
select  85  ,   1   ,   1   union all
select  86  ,   2   ,   0   union all
select  87  ,   3   ,   0   union all
select  88  ,   1   ,   0   union all
select  89  ,   2   ,   0   union all
select  90  ,   3   ,   0   union all
select  91  ,   1   ,   1   union all
select  92  ,   2   ,   1   union all
select  93  ,   3   ,   1   union all
select  94  ,   1   ,   1   union all
select  95  ,   2   ,   0   union all
select  96  ,   3   ,   1   union all
select  97  ,   1   ,   1   union all
select  98  ,   2   ,   1   union all
select  99  ,   3   ,   0   union all
select  100 ,   1   ,   0   union all
select  101 ,   2   ,   1   union all
select  102 ,   3   ,   1   union all
select  103 ,   1   ,   1

So, let's run a simple CTE with the data we have. Then from here show us where we are and what we're trying to achieve:

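;With Parent(CustomerID, ParentCustomerID, IsBillToCustomer)
As
(
    -- Anchor member: top-level customers (ParentCustomerID = 0 means no parent)
    Select c.CustomerID, c.ParentCustomerID, c.IsBillToCustomer
    from Customer c
    WHERE c.ParentCustomerID = 0
    UNION ALL
    -- Recursive member: join back to the CTE itself (not to Customer again)
    -- so that each pass picks up the next level of children
    Select c.CustomerID, c.ParentCustomerID, c.IsBillToCustomer
    from Customer c
    inner join Parent p on p.CustomerID = c.ParentCustomerID
)

Select *
  from Parent p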
Help us understand the problem. Thanks.

Sql-server – “Lock request time out period exceeded” when publishing SqlServer databases in parallel

Looking at the MSDN documentation for sp_fulltext_database we see the following note:

Has no effect on full-text catalogs in SQL Server 2008 and later versions and is supported for backward compatibility only. sp_fulltext_database does not disable the Full-Text Engine for a given database. All user-created databases in SQL Server 20xx are always enabled for full-text indexing.

The "xx" in the "20xx" above changes based on what version of the documentation you are looking at, but starting with SQL Server 2008, that xx will be: "08", "12", or "16".

You are on SQL Server 2014, so I am questioning why EXECUTE sp_fulltext_database 'disable'; is even showing up in your SSDT-generated deploy scripts. I just did some testing and it seems that no matter what the project's "Target platform" is set to, the deploy script always generates lines for:

IF fulltextserviceproperty(N'IsFulltextInstalled') = 1
    EXECUTE sp_fulltext_database 'enable';

The "enable" or "disable" is controlled by the "Database Setting..." on the "Project Settings" tab of Project Properties, in the "Miscellaneous" tab.

The only way to get rid of these 2 lines is to uncheck the "Deploy database properties" option, under "Deployment Options" on the "Debug" tab of Project Properties. But if you uncheck that option, then it won't enforce any of the database properties. So:

If you are not using the SSDT deployment to enforce the database properties then go ahead and uncheck this option. If you haven't made any other changes since the last build, this change alone will not update the deploy script, in which case you need to do a "rebuild".
If you are using the SSDT deployment to enforce the database properties, then either remove those two lines manually or find a way to do it programmatically.

I don't know why there is a call to an unused system stored procedure outside of it probably being a low priority to remove since it doesn't break anything and most people aren't doing parallel deployments ;-).
