Sql-server – Super poor UDF performance on only 1 server, not identical other server

functionsperformancequery-performancesql serversql server 2014

I have the following code that takes 20 times longer on one production box (db35) than another identical box (db34).

Both have:

Microsoft SQL Server 2014 (SP2-CU3) (KB3204388) – 12.0.5538.0 (X64)
Dec 15 2016 17:52:48
Copyright (c) Microsoft Corporation
Standard Edition (64-bit) on Windows NT 6.3 (Build 9600: )

Any idea what could cause this? These are production servers, so a reboot is not available for 2 weeks for our normal maintenance window. I also know that using the UDF is terrible and should be replaced, but that still does not answer the basic question. Below is sample code and results in milliseconds. Any suggestions would be greatly appreciated.

DB34 – good server results

Table '#A945FC1F'. Scan count 0, logical reads 1004081, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysschobjs'. Scan count 3, logical reads 2021, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 2, logical reads 2950297, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:  

   CPU time = 23922 ms,  elapsed time = 26046 ms.

DB34 execution time 26050ms.

DB35 – bad server results

Table '#BC355925'. Scan count 0, logical reads 1004081, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysschobjs'. Scan count 3, logical reads 1755, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 2, logical reads 2949345, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

   CPU time = 407687 ms,  elapsed time = 415084 ms.

DB35 execution time 415093ms.

Function

use master;
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[GenerateDateKey]') AND type in (N'FN', N'IF', N'TF', N'FS', N'FT'))
BEGIN
execute dbo.sp_executesql @statement = N'
    CREATE FUNCTION [dbo].[GenerateDateKey] (
        @value [datetime]
      ) RETURNS [int] BEGIN

      RETURN DATEPART(yyyy, @value) * 10000
        + DATEPART(M, @value) * 100
        + DATEPART(d, @value)
    END
  ' 
END


set nocount on
declare @temp1 table (cd1 int, cd2 int, cd3 int, cd4 int, cd5 int, cd6 int);
declare @startDate datetime

set @startDate = getdate()

set statistics io on
set statistics time on

insert into @temp1
select top 1000000 [dbo].[GenerateDateKey](s1.create_date) as cd1,
                   [dbo].[GenerateDateKey](s1.modify_date) as cd2,
                   [dbo].[GenerateDateKey](s2.create_date) as cd3,
                   [dbo].[GenerateDateKey](s2.modify_date) as cd4,
                   [dbo].[GenerateDateKey](s3.create_date) as cd5,
                   [dbo].[GenerateDateKey](s3.modify_date) as cd6
from sys.objects s1 with (nolock), sys.objects s2 with (nolock), sys.objects s3 with (nolock)

print @@servername + ' generate  ' + cast(datediff(ms, @startdate, getdate()) as varchar(20))

Same exact hardware. I get the same results regardless of load/time of day run. The query plan is the same. The scans/reads are approx the same. Both physical Dell poweredge R610s. 192GB physical ram. 128GB to SQL Server. 50GB ramdisk for tempdb.

I have 32 total SQL Server 2014 instances in production, though some are different configurations (same version), but ALL of them run this query about the same except for the one (db35), which is 20 times worse.

If I replace the UDF with, for example, a cast/convert I get the same on both servers.

insert into @temp1
select top 1000000 
            cast(convert(char(10),s1.create_date,112) as int) as cd1,
            cast(convert(char(10),s1.modify_date,112) as int) as cd2,
            cast(convert(char(10),s2.create_date,112) as int) as cd3,
            cast(convert(char(10),s2.modify_date,112) as int) as cd4,
            cast(convert(char(10),s3.create_date,112) as int) as cd5,
            cast(convert(char(10),s3.modify_date,112) as int) as cd6
from sys.objects s1 with (nolock), sys.objects s2 with (nolock), sys.objects s3 with (nolock)

The issue isn't re-writing the query, that's easy with the cast/convert from above. The issue is why the UDF takes 20 times longer on one server than the other. I did notice when I updated the function with schemabinding that there was a 40% improvement, but it's still 4 minutes instead of 20 seconds.

Best Answer

Looks like Paul White nailed it in a comment:

Check for a profiler trace or extended events running on the slow server. Each iteration of the udf can generate multiple events, with shocking consequences.

There was a trace running against the server (by another DBA). He actually terminated just before Paul mentioned to look for traces, so that's why I didn't notice any. This trace had been running for 5 days, so it was apparently putting a lot of pressure on the memory utilization on the box. Anyway, I want to thank Paul and blobbles for their help.