Sql-server – Why high compile times for functions with XML on SQL Server 2016

functionsoptimizationsql serversql-server-2016xml

We have inline functions which gather up data into XML, pass that derived XML to other functions, which then shred it and reform it as a string.

(Your "you shouldn't do that sort of stuff in T-SQL" is a discussion for another day.)

This has worked fine for years with 2005 and 2008 R2. We are now upgrading to 2016 SP1. A query using these functions which runs in under a second on our production server now runs even faster in 2016 SP1. That would be great, but it takes an hour to compile on 2016 SP1. Seriously:

enter image description here

Production environment:

enter image description here

(Those are both from SQL Sentry Plan Explorer.)

We have tried with the database in 2008 (100) and 2016 (130) compatibility levels (but we have not yet played with the "Legacy Cardinality Estimation" and "Query Optimizer Fixes" settings, which are both "OFF" currently). We have tried using QUERYTRACEON 9481, which seems to have no effect.

Again, this isn't about the resulting plans, as they all run in a fine amount of time. It's about the time it takes to make the plan.

We've managed to reproduce this issue– to an extent– with a simplified set of code. A statement calling the top-level function from the below example takes 30-60 seconds to compile on SQL Server 2016 (SP1-CU5), but it takes less than a second to compile and run on SQL Server 2008 R2 (SP3).

Example

/*

Create and populate table...

*/

CREATE TABLE TestXMLStuff (OrderID int, ProdLength int, ProdWidth int, ProdHeight int);
INSERT INTO TestXMLStuff (OrderID, ProdLength, ProdWidth, ProdHeight) VALUES
    (1, 10, 15, 20),
    (1, 15, 20, 25),
    (2, 20, 25, 30),
    (2, 25, 30, 35),
    (2, 30, 35, 40);
GO

/*

Function which accepts XML, shreds it and reforms it as a string...

*/


CREATE FUNCTION TestCalc
(   
    @T varchar(8000),
    @X xml
)
RETURNS TABLE 
AS
RETURN 
    WITH p AS
    (
        SELECT  
            LF = CHAR(13) + CHAR(10),
            Tab = CHAR(9),
            T = isNull(@T,'')
    ), pid AS
    (
        SELECT
            isNull(ProdInfoXMLTable.ProdInfoXML.query('(/ProdInfo)').value('(.)[1]','varchar(max)'),'') AS ProdInfoText
        FROM        (
                        SELECT
                            ProdInfoXML =
                                (
                                    SELECT
                                        ProdInfo = 
                                            CASE WHEN Products.ProdNum > 1 THEN '--' + p.LF ELSE '' END +
                                            'Product Number: ' + CONVERT(varchar(50),Products.ProdNum) + p.LF +
                                                CASE WHEN Products.ProdLength       = '' THEN '' ELSE p.Tab + 'Length: '                + Products.ProdLength       + p.LF END +
                                                CASE WHEN Products.ProdWidth        = '' THEN '' ELSE p.Tab + 'Width: '                 + Products.ProdHeight       + p.LF END +
                                                CASE WHEN Products.ProdHeight       = '' THEN '' ELSE p.Tab + 'Height: '                + Products.ProdHeight       + p.LF END
                                    FROM        (
                                                    SELECT
                                                        ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS ProdNum,
                                                        isNull(P.X.value('(./Length)[1]','varchar(500)'),'') AS ProdLength,
                                                        isNull(P.X.value('(./Width)[1]','varchar(500)'),'') AS ProdWidth,
                                                        isNull(P.X.value('(./Height)[1]','varchar(500)'),'') AS ProdHeight
                                                    FROM        @x.nodes('/Products/Product') AS P(X)
                                                ) AS Products
                                    CROSS JOIN  p
                                    FOR XML PATH(''), TYPE
                                )
                    ) AS ProdInfoXMLTable
    )
    SELECT
        Final = p.T + p.LF + p.LF + pid.ProdInfoText
    FROM        p
    CROSS JOIN  pid;

GO

/*

Function to create XML in the format required for TestCalc...

*/

CREATE FUNCTION TestGetXML
(   
    @N int
)
RETURNS TABLE 
AS
RETURN 
    WITH p AS
    (
        SELECT  
            N = isNull(@N,0)
    )
    SELECT
        ProdInfoXML =
            (
                SELECT
                    [Length] = ProdData.ProdLength,
                    [Width] = ProdData.ProdWidth,
                    [Height] = ProdData.ProdHeight
                FROM        TestXMLStuff ProdData
                WHERE       ProdData.OrderID = @N
                FOR XML PATH('Product'), ROOT('Products'), TYPE
            );
GO

/*

Function to join the other two functions, gathering the XML and feeding it to the string creator which shreds and reforms it...

*/

CREATE FUNCTION TestGetFromTableUsingFunc
(   
    @N int
)
RETURNS TABLE 
AS
RETURN 
    WITH p AS
    (
        SELECT  
            N = isNull(@N,0)
    )
    SELECT
        FinalResult = 'This is a ' + TestCalcResults.Final
    FROM        p
    CROSS APPLY TestGetXML
                (
                    p.N
                ) AS x
    CROSS APPLY TestCalc
                (
                    'test',
                    x.ProdInfoXML
                ) AS TestCalcResults;
GO

/*

Code to call the function. This is what takes around 60 seconds to compile on our 2016 system but basically no time on the 2008 R2 system.

*/

SELECT      *
FROM        TestXMLStuff
CROSS APPLY TestGetFromTableUsingFunc
            (
                OrderID
            )
OPTION      (RECOMPILE);
GO

Production @@version where compilation isn't an issue:

Microsoft SQL Server 2008 R2 (SP3) - 10.50.6000.34 (X64)

Test @@version where compilation takes "forever":

Microsoft SQL Server 2016 (SP1-CU5) (KB4040714) - 13.0.4451.0 (X64) 

Questions

  1. Why is there such a degradation in compile time going to 2016 from 2008 R2?

  2. Does that answer reveal an easy solution to this dilemma short of changing the way all of this works? (I'm hoping for magic trace flags or an upcoming Microsoft update.)

(If this were SQL Server 2017, I would use JSON to gather and pass the data, which seems to be faster and lower overhead, and then I would use the JSON functions to shred and STRING_AGG to reform into text. But alas, that's not yet available.)

Based on a tip from Joe Obbish, I gathered the results when using trace flag 8675, using this code:

DBCC TRACEON(3604)
SELECT      *
FROM        TestXMLStuff
CROSS APPLY TestGetFromTableUsingFunc
            (
                OrderID
            )
OPTION      (RECOMPILE, QUERYTRACEON 8675);
DBCC TRACEOFF(3604)

On a 2008 R2 instance, it took much less than a second, and produced this:

DBCC execution completed. If DBCC printed error messages, contact your system administrator. End of simplification, time: 0.008 net: 0.008 total: 0.008 net: 0.008

end exploration, tasks: 597 no total cost time: 0.005 net: 0.005 total: 0.014 net: 0.014

end search(0),  cost: 2071.66 tasks: 2267 time: 0.005 net: 0.005 total: 0.02 net: 0.02

end exploration, tasks: 2703 Cost = 2071.66 time: 0.002 net: 0.002 total: 0.022 net: 0.022

end search(1),  cost: 1731.11 tasks: 3362 time: 0.004 net: 0.004 total: 0.026 net: 0.026

end exploration, tasks: 3363 Cost = 1731.11 time: 0 net: 0 total:
0.026 net: 0.026

end search(1),  cost: 1731.11 tasks: 3382 time: 0 net: 0 total: 0.026 net: 0.026

end exploration, tasks: 3413 Cost = 1731.11 time: 0 net: 0 total:
0.027 net: 0.027

end search(2),  cost: 1731.11 tasks: 3515 time: 0 net: 0 total: 0.027 net: 0.027

End of post optimization rewrite, time: 0.001 net: 0.001 total: 0.029 net: 0.029

End of query plan compilation, time: 0.001 net: 0.001 total: 0.03 net:
0.03


(5 row(s) affected) DBCC execution completed. If DBCC printed error messages, contact your system administrator.

On a 2016 SP1-CU5 instance, it took 1 minute 11 seconds and produced this:

DBCC execution completed. If DBCC printed error messages, contact your system administrator. End of simplification, time: 0.004 net: 0.004 total: 0.004 net: 0.004

end exploration, tasks: 612 no total cost time: 0.003 net: 0.003 total: 0.008 net: 0.008

end exploration, tasks: 613 no total cost time: 0 net: 0 total: 0.008 net: 0.008

end exploration, tasks: 2305 no total cost time: 0.002 net: 0.002 total: 0.011 net: 0.011

end exploration, tasks: 2306 no total cost time: 0 net: 0 total: 0.011 net: 0.011

end search(0),  cost: 4402.32 tasks: 2306 time: 0 net: 0 total: 0.011 net: 0.011

end exploration, tasks: 2738 Cost = 4402.32 time: 0.001 net: 0.001 total: 0.013 net: 0.013

end exploration, tasks: 2739 Cost = 4402.32 time: 0 net: 0 total:
0.013 net: 0.013

end exploration, tasks: 3466 Cost = 4402.32 time: 0.002 net: 0.002 total: 0.015 net: 0.015

end exploration, tasks: 3467 Cost = 4402.32 time: 0 net: 0 total:
0.015 net: 0.015

end search(1),  cost: 3938.19 tasks: 3467 time: 0 net: 0 total: 0.015 net: 0.015

end exploration, tasks: 3468 Cost = 3938.19 time: 0 net: 0 total:
0.015 net: 0.015

end exploration, tasks: 3469 Cost = 3938.19 time: 0 net: 0 total:
0.015 net: 0.015

end exploration, tasks: 3489 Cost = 3938.19 time: 0 net: 0 total:
0.015 net: 0.015

end exploration, tasks: 3490 Cost = 3938.19 time: 0 net: 0 total:
0.015 net: 0.015

end search(1),  cost: 3938.19 tasks: 3490 time: 0 net: 0 total: 0.015 net: 0.015

end exploration, tasks: 3521 Cost = 3938.19 time: 0 net: 0 total:
0.015 net: 0.015

end exploration, tasks: 3522 Cost = 3938.19 time: 0 net: 0 total:
0.015 net: 0.015

end exploration, tasks: 3625 Cost = 3938.19 time: 0 net: 0 total:
0.016 net: 0.016

end exploration, tasks: 3626 Cost = 3938.19 time: 0 net: 0 total:
0.016 net: 0.016

end search(2),  cost: 3938.19 tasks: 3626 time: 0 net: 0 total: 0.016 net: 0.016

End of post optimization rewrite, time: 0 net: 0 total: 0.016 net:
0.016

End of query plan compilation, time: 0.001 net: 0.001 total: 0.018 net: 0.018


(5 row(s) affected) DBCC execution completed. If DBCC printed error messages, contact your system administrator.

Although more is going on, it looks like the time elapsed was only 0.018 (seconds?), which is less than the 0.03 on 2008 R2. So this phase of compilation/optimization/execution must not be what's taking so long. But something definitely is.


Our eventual 2016 production instance will have same "hardware" as current production 2008 R2. Test/Dev 2016 instances have different specs, so these comparisons aren't apples to apples. Prod is 78 gig. Dev is 16 gig. But I tested on another 2008 R2 box with 20 gig and it was fast. Also we're talking about well indexed, tiny amounts of data. And during compilation little IO but lots of CPU.

I could see stats being an issue with the real functions which hit real (big) tables, but in my contrived, simplified example, it takes 1+ minute to do some simple XML/text manipulation on 5 integer-laden rows. I could type the results faster. 🙂 I believe we have auto-stats on the production, and I see no other seemingly stats-related performance issues. And another non-production 2008 R2 dev/test environment (with similar stale production copy as 2016 dev/test) is lickety-split.

Best Answer

It's possible that CU7 for SQL Server 2016 SP1 resolves your issue. The relevant KB article appears to be KB 4056955 - FIX: Queries that casts string or binary data to XML take a long time to compile in SQL Server 2016.

I ran your repro code in SQL Server 2016 SP1 CU6 and got the following for compile times per SET STATISTICS TIME ON:

SQL Server parse and compile time: CPU time = 24437 ms, elapsed time = 24600 ms.

SQL Server parse and compile time: CPU time = 48968 ms, elapsed time = 49451 ms.

Here's how the compile times changed after upgrading to SP1 CU7:

SQL Server parse and compile time: CPU time = 16 ms, elapsed time = 16 ms.

SQL Server parse and compile time: CPU time = 16 ms, elapsed time = 17 ms.

SQL Server parse and compile time: CPU time = 16 ms, elapsed time = 17 ms.

SQL Server parse and compile time: CPU time = 44 ms, elapsed time = 44 ms.

It's about 750 times faster.