Using T-SQL scalar functions will frequently lead to performance problems* because SQL Server makes a separate function call (using a whole new T-SQL context) for each row. In addition, parallel execution is disallowed for the whole query.
T-SQL scalar functions can also make it difficult to troubleshoot performance problems (whether those problems are caused by the function or not). The function appears as a 'black box' to the query optimizer: it is assigned a fixed low estimated cost, regardless of the actual content of the function.
See this and this for more on the pitfalls of scalar functions.
You will probably be better off using the new TRY_CONVERT function in SQL Server 2012:
SELECT
InsertID,
dt1 = TRY_CONVERT(smalldatetime, MangledDateTime1),
dt2 = TRY_CONVERT(smalldatetime, MangledDateTime2),
dt3 = TRY_CONVERT(smalldatetime, MangledDateTime3)
FROM dbo.RawData;
╔══════════╦═════════════════════╦═════════════════════╦═════════════════════╗
║ InsertID ║ dt1                 ║ dt2                 ║ dt3                 ║
╠══════════╬═════════════════════╬═════════════════════╬═════════════════════╣
║        1 ║ 2000-10-10 00:00:00 ║ NULL                ║ NULL                ║
║        1 ║ NULL                ║ 2013-06-30 00:00:00 ║ NULL                ║
║        1 ║ NULL                ║ NULL                ║ 2013-06-30 00:00:00 ║
╚══════════╩═════════════════════╩═════════════════════╩═════════════════════╝
After the edit to the question
I see the function contains some specific logic. You could still use TRY_CONVERT as part of that, but you should definitely convert the scalar function to an in-line function. In-line functions (RETURNS TABLE) use a single SELECT statement and are expanded into the calling query and fully optimized, in much the same way views are. It can be helpful to think of in-line functions as parameterized views.
For example, an approximate translation of the scalar function to an in-line version is:
CREATE FUNCTION dbo.CleanDate
(@UnformattedString varchar(12))
RETURNS TABLE
AS RETURN
SELECT Result =
-- Successful conversion or NULL after
-- workarounds applied in CROSS APPLY
-- clauses below
TRY_CONVERT(smalldatetime, ca3.string)
FROM
(
-- Logic starts here
SELECT
CASE
WHEN @UnformattedString IS NULL
THEN NULL
WHEN LEN(@UnformattedString) <= 1
THEN NULL
WHEN LEN(@UnformattedString) = 12
THEN LEFT(@UnformattedString, 8)
ELSE @UnformattedString
END
) AS Input (string)
CROSS APPLY
(
-- Next stage using result so far
SELECT
CASE
WHEN @UnformattedString = '20000000'
THEN '20790606'
ELSE Input.string
END
) AS ca1 (string)
CROSS APPLY
(
-- Next stage using result so far
SELECT CASE
WHEN LEFT(ca1.string, 2) = '00' THEN '20' + RIGHT(ca1.string, 6)
WHEN LEFT(ca1.string, 2) = '18' THEN '19' + RIGHT(ca1.string, 6)
WHEN LEFT(ca1.string, 2) = '19' THEN ca1.string
WHEN LEFT(ca1.string, 2) = '20' THEN ca1.string
WHEN LEN(ca1.string) <> 6 THEN '20' + RIGHT(ca1.string, 6)
ELSE ca1.string
END
) AS ca2 (string)
CROSS APPLY
(
-- Next stage using result so far
SELECT
CASE
WHEN TRY_CONVERT(integer, LEFT(ca2.string, 4)) > YEAR(GETDATE())
THEN '20790606'
WHEN YEAR(GETDATE()) - TRY_CONVERT(integer, LEFT(ca2.string, 4)) >= 100
THEN '20790606'
ELSE ca2.string
END
) AS ca3 (string);
The function used on the sample data:
SELECT
InsertID,
Result1 = CD1.Result,
Result2 = CD2.Result,
Result3 = CD3.Result
FROM dbo.RawData AS RD
CROSS APPLY dbo.CleanDate(RD.MangledDateTime1) AS CD1
CROSS APPLY dbo.CleanDate(RD.MangledDateTime2) AS CD2
CROSS APPLY dbo.CleanDate(RD.MangledDateTime3) AS CD3;
Output:
╔══════════╦═════════════════════╦═════════════════════╦═════════════════════╗
║ InsertID ║ Result1             ║ Result2             ║ Result3             ║
╠══════════╬═════════════════════╬═════════════════════╬═════════════════════╣
║        1 ║ 2000-10-10 00:00:00 ║ 2079-06-06 00:00:00 ║ NULL                ║
║        1 ║ 2079-06-06 00:00:00 ║ 2013-06-30 00:00:00 ║ 2079-06-06 00:00:00 ║
║        1 ║ 2000-10-10 00:00:00 ║ 2079-06-06 00:00:00 ║ 2013-06-30 00:00:00 ║
╚══════════╩═════════════════════╩═════════════════════╩═════════════════════╝
*CLR scalar functions have a much faster invocation path than T-SQL scalar functions and do not prevent parallelism.
As ypercube commented: "No, if the query is what you show, he is totally wrong. It's pretty sargable as it is."
You can verify this by:
- Create a simple test table with a [Date] column.
- Insert a large number of rows with varying dates. (The "large number" and "varying dates" are a precaution to ensure that your query is selective enough; otherwise the optimiser may choose not to use your index in any case.)
- Generate a query plan for your query, once with an index on the [Date] column and once without.
- You can also use STATISTICS IO to show the difference.
- If you have enough test data, the difference will be easily observable.
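The steps above can be sketched roughly as follows; the table name, row count, and date range are all illustrative assumptions, not a prescription:

```sql
-- Hypothetical test harness: names and row counts are illustrative only.
CREATE TABLE dbo.SargTest
(
    ID     integer IDENTITY PRIMARY KEY,
    [Date] date NOT NULL
);

-- Insert a large number of rows with varying dates.
INSERT dbo.SargTest ([Date])
SELECT TOP (100000)
    DATEADD(day, ABS(CHECKSUM(NEWID())) % 3650, '2010-01-01')
FROM sys.all_columns AS c1
CROSS JOIN sys.all_columns AS c2;

CREATE INDEX IX_SargTest_Date ON dbo.SargTest ([Date]);

-- Compare logical reads (and the plan) with and without the index.
SET STATISTICS IO ON;

SELECT COUNT_BIG(*)
FROM dbo.SargTest
WHERE [Date] >= '2013-06-01'
  AND [Date] <  '2013-06-02';
```

With the index in place you should see an index seek and far fewer logical reads than the table scan you get after dropping IX_SargTest_Date.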
Once you've got the evidence, I suggest you take it up with the DBA. However, don't get into an argument; just show the tests and data that demonstrate the index is used.
The point is that you don't want to be forced to bend over backwards to avoid non-existent issues.
Fortunately in the case you demonstrated it won't be a problem to move the functions outside the query. E.g.
DECLARE @FromDate date = 'inputdate',
        @ToDate   date = DATEADD(day, 1, @FromDate);
In fact, the above may even be more maintainable in the long run.
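A sketch of that pattern in full, with assumed table and column names:

```sql
-- Hypothetical names. The point is that the WHERE clause compares the
-- bare column against precomputed variables, so an index on a.[Date]
-- can still be used for a seek.
DECLARE @FromDate date = '2013-06-30',
        @ToDate   date = DATEADD(day, 1, @FromDate);

SELECT a.ID, a.[Date]
FROM dbo.TableA AS a
WHERE a.[Date] >= @FromDate
  AND a.[Date] <  @ToDate;
```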
However, there will come a time when you have something that cannot be trivially changed according to the DBA's wishes. Such as:
-- Granted, this probably belongs in a JOIN clause, but it is here for illustrative purposes.
-- The issue of sargability applies just as much to JOIN clauses as to WHERE clauses.
WHERE a.date >= b.date
AND   a.date < DATEADD(day, 1, b.date)
The only way to get this function out of the WHERE clause would be to precalculate another column b.NextDay. Which is exactly why you need the DBAs to understand sargability correctly. I.e. that the above:
- Would be able to leverage an index on a.date.
- But would not be able to leverage an index on b.date.
- So the most selective column/index should not have a function applied, but the other can.
- Attempting to hack a solution without a function in the WHERE clause will reduce both maintainability and performance.
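One way to precalculate that b.NextDay column, sketched here with an assumed table name, is a persisted computed column that can then be indexed (DATEADD is deterministic, so PERSISTED is allowed):

```sql
-- Hypothetical: add NextDay as a persisted computed column on table b,
-- so the join predicate can compare bare (indexable) columns on both sides.
ALTER TABLE dbo.TableB
    ADD NextDay AS DATEADD(day, 1, [Date]) PERSISTED;

-- An index on the computed column makes it seekable like any other column.
CREATE INDEX IX_TableB_NextDay ON dbo.TableB (NextDay);
```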
If you really can't get buy-in from the DBAs, perhaps the following will work and bypass their cargo-cult rules:
;WITH CTE_B AS (
    SELECT b.Date, DATEADD(day, 1, b.Date) AS NextDay
    FROM b
)
SELECT ...
FROM a, CTE_B
WHERE a.Date >= CTE_B.Date
AND   a.Date < CTE_B.NextDay
The optimiser will almost certainly optimise this in the same way as if the function were in the WHERE clause, so you shouldn't get a performance knock. But it's certainly an unnecessary reduction in maintainability.
Best Answer
You can rewrite it as an inline TVF returning a single column and row, and CROSS APPLY it, to get the benefits of inlining now (parallelism, no overhead of switching execution contexts, holistic query costing and optimisation) without having to wait for the work on inlining of scalar UDFs to be released.
So your function definition would be:
With example usage:
You don't really need the CASE expression, as the function will return NULL on NULL input anyway, and it may get better plans without it.
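Since the original function body isn't shown here, the following is only a sketch of the general pattern with hypothetical names: a single-statement inline TVF returning one row and one column, consumed via CROSS APPLY.

```sql
-- Hypothetical scalar logic wrapped as an inline TVF: RETURNS TABLE with
-- a single SELECT, so it is expanded into the calling query.
CREATE FUNCTION dbo.ComputeValue (@Input integer)
RETURNS TABLE
AS RETURN
SELECT Result = @Input * 2;  -- placeholder for the real expression
GO

-- Example usage; dbo.SomeTable and its columns are assumptions.
SELECT T.ID, CV.Result
FROM dbo.SomeTable AS T
CROSS APPLY dbo.ComputeValue(T.SomeColumn) AS CV;
```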