I am trying to tune a query where the same table-valued function (TVF) is called on 20 columns.
The first thing I did was convert the scalar function into an inline table-valued function.
Is using CROSS APPLY
the best performing way to execute the same function on multiple columns in a query?
A simplistic example:
SELECT Col1 = A.val
,Col2 = B.val
,Col3 = C.val
--do the same for other 17 columns
,Col21
,Col22
,Col23
FROM t
CROSS APPLY
dbo.function1(Col1) A
CROSS APPLY
dbo.function1(Col2) B
CROSS APPLY
dbo.function1(Col3) C
--do the same for other 17 columns
Are there better alternatives?
The same function can be called in multiple queries against X number of columns.
Here's the function:
CREATE FUNCTION dbo.ConvertAmountVerified_TVF
(
@amt VARCHAR(60)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
WITH cteLastChar
AS(
SELECT LastChar = RIGHT(RTRIM(@amt), 1)
)
SELECT
AmountVerified = CAST(RET.Y AS NUMERIC(18,2))
FROM (SELECT 1 t) t
OUTER APPLY (
SELECT N =
CAST(
CASE
WHEN CHARINDEX(L.LastChar COLLATE Latin1_General_CS_AS, '{ABCDEFGHI}', 0) >0
THEN CHARINDEX(L.LastChar COLLATE Latin1_General_CS_AS, '{ABCDEFGHI}', 0)-1
WHEN CHARINDEX(L.LastChar COLLATE Latin1_General_CS_AS, 'JKLMNOPQR', 0) >0
THEN CHARINDEX(L.LastChar COLLATE Latin1_General_CS_AS, 'JKLMNOPQR', 0)-1
WHEN CHARINDEX(L.LastChar COLLATE Latin1_General_CS_AS, 'pqrstuvwxy', 0) >0
THEN CHARINDEX(L.LastChar COLLATE Latin1_General_CS_AS, 'pqrstuvwxy', 0)-1
ELSE
NULL
END
AS VARCHAR(1))
FROM
cteLastChar L
) NUM
OUTER APPLY (
SELECT N =
CASE
WHEN CHARINDEX(L.LastChar COLLATE Latin1_General_CS_AS, '{ABCDEFGHI}', 0) >0
THEN 0
WHEN CHARINDEX(L.LastChar COLLATE Latin1_General_CS_AS, 'JKLMNOPQRpqrstuvwxy', 0) >0
THEN 1
ELSE 0
END
FROM cteLastChar L
) NEG
OUTER APPLY(
SELECT Amt= CASE
WHEN NUM.N IS NULL
THEN @amt
ELSE
SUBSTRING(RTRIM(@amt),1, LEN(@amt) - 1) + Num.N
END
) TP
OUTER APPLY(
SELECT Y = CASE
WHEN NEG.N = 0
THEN (CAST(TP.Amt AS NUMERIC) / 100)
WHEN NEG.N = 1
THEN (CAST (TP.Amt AS NUMERIC) /100) * -1
END
) RET
) ;
GO
Here's the scalar function version that I inherited, if anyone is interested:
CREATE FUNCTION dbo.ConvertAmountVerified
(
@amt VARCHAR(50)
)
RETURNS NUMERIC (18,3)
AS
BEGIN
-- Declare the return variable here
DECLARE @Amount NUMERIC(18, 3);
DECLARE @TempAmount VARCHAR (50);
DECLARE @Num VARCHAR(1);
DECLARE @LastChar VARCHAR(1);
DECLARE @Negative BIT ;
-- Get Last Character
SELECT @LastChar = RIGHT(RTRIM(@amt), 1) ;
SELECT @Num = CASE @LastChar collate latin1_general_cs_as
WHEN '{' THEN '0'
WHEN 'A' THEN '1'
WHEN 'B' THEN '2'
WHEN 'C' THEN '3'
WHEN 'D' THEN '4'
WHEN 'E' THEN '5'
WHEN 'F' THEN '6'
WHEN 'G' THEN '7'
WHEN 'H' THEN '8'
WHEN 'I' THEN '9'
WHEN '}' THEN '0'
WHEN 'J' THEN '1'
WHEN 'K' THEN '2'
WHEN 'L' THEN '3'
WHEN 'M' THEN '4'
WHEN 'N' THEN '5'
WHEN 'O' THEN '6'
WHEN 'P' THEN '7'
WHEN 'Q' THEN '8'
WHEN 'R' THEN '9'
---ASCII
WHEN 'p' Then '0'
WHEN 'q' Then '1'
WHEN 'r' Then '2'
WHEN 's' Then '3'
WHEN 't' Then '4'
WHEN 'u' Then '5'
WHEN 'v' Then '6'
WHEN 'w' Then '7'
WHEN 'x' Then '8'
WHEN 'y' Then '9'
ELSE ''
END
SELECT @Negative = CASE @LastChar collate latin1_general_cs_as
WHEN '{' THEN 0
WHEN 'A' THEN 0
WHEN 'B' THEN 0
WHEN 'C' THEN 0
WHEN 'D' THEN 0
WHEN 'E' THEN 0
WHEN 'F' THEN 0
WHEN 'G' THEN 0
WHEN 'H' THEN 0
WHEN 'I' THEN 0
WHEN '}' THEN 1
WHEN 'J' THEN 1
WHEN 'K' THEN 1
WHEN 'L' THEN 1
WHEN 'M' THEN 1
WHEN 'N' THEN 1
WHEN 'O' THEN 1
WHEN 'P' THEN 1
WHEN 'Q' THEN 1
WHEN 'R' THEN 1
---ASCII
WHEN 'p' Then '1'
WHEN 'q' Then '1'
WHEN 'r' Then '1'
WHEN 's' Then '1'
WHEN 't' Then '1'
WHEN 'u' Then '1'
WHEN 'v' Then '1'
WHEN 'w' Then '1'
WHEN 'x' Then '1'
WHEN 'y' Then '1'
ELSE 0
END
-- Add the T-SQL statements to compute the return value here
if (@Num ='')
begin
SELECT @TempAmount=@amt;
end
else
begin
SELECT @TempAmount = SUBSTRING(RTRIM(@amt),1, LEN(@amt) - 1) + @Num;
end
SELECT @Amount = CASE @Negative
WHEN 0 THEN (CAST(@TempAmount AS NUMERIC) / 100)
WHEN 1 THEN (CAST (@TempAmount AS NUMERIC) /100) * -1
END ;
-- Return the result of the function
RETURN @Amount
END
Sample test data:
SELECT dbo.ConvertAmountVerified('00064170') -- 641.700
SELECT * FROM dbo.ConvertAmountVerified_TVF('00064170') -- 641.700
SELECT dbo.ConvertAmountVerified('00057600A') -- 5760.010
SELECT * FROM dbo.ConvertAmountVerified_TVF('00057600A') -- 5760.010
SELECT dbo.ConvertAmountVerified('00059224y') -- -5922.490
SELECT * FROM dbo.ConvertAmountVerified_TVF('00059224y') -- -5922.490
Best Answer
FIRST: it should be mentioned that the absolutely fastest method of getting the desired results is to do the following:
{name}_new
to the table with theDECIMAL(18, 3)
datatypeVARCHAR
columns to theDECIMAL
columns{name}_old
{name}
{table_name}_new
usingDECIMAL(18, 3)
datatypeDECIMAL
-based table._old
_new
from new tableTHAT BEING SAID: You can get rid of a lot of that code as it is largely unnecessary duplication. Also, there are at least two bugs that cause the output to sometimes be incorrect, or sometimes throw an error. And those bugs were copied into Joe's code as it produces the same results (including the error) as the O.P.'s code. For example:
These values produce a correct result:
These values produce an incorrect result:
This value produces an error:
Comparing all 3 versions against 448,740 rows using
SET STATISTICS TIME ON;
, they all ran in just over 5000 ms of elapsed time. But for CPU time, the results were:SETUP: DATA
The following creates a table and populates it. This should create the same data set across all systems running SQL Server 2017 since they will have the same rows in
spt_values
. This helps provide a basis of comparison across other people testing on their system since randomly generated data would factor into timing differences across systems, or even between tests on the same system if the sample data is regenerated. I started with the same 3 column table as Joe did, but used the sample values from the question as a template to come up with a variety of numeric values appended with each of the possible trailing character options (including no trailing character). This is also why I forced the Collation on the columns: I didn't want the fact that I am using a binary-Collation Instance to unfairly negate the effect of using theCOLLATE
keyword to force a different Collation in the TVF).The only difference is in the ordering of the rows in the table.
SETUP: TVF
Please note:
_BIN2
) Collation which is faster than a case-sensitive Collation as it does not need to account for any linguistic rules.VARCHAR(50)
toVARCHAR(60)
, and fromNUMERIC (18,3)
toNUMERIC (18,2)
(good reason would be "they were wrong"), then I would stick with the original signature / types.100.
,-1.
, and1.
. This was not in my original version of this TVF (in the history of this answer) but I noticed someCONVERT_IMPLICIT
calls in the XML execution plan (since100
is anINT
but the operation needs to beNUMERIC
/DECIMAL
) so I just took care of that ahead of time.CHAR()
function rather than passing a string version of a number (e.g.'2'
) into aCONVERT
function (which was what I was originally doing, again in the history). This appears to be ever so slightly faster. Only a few milliseconds, but still.TEST
Please note that I had to filter out rows ending with
}
as that caused the O.P.'s and Joe's TVFs to error. While my code handles the}
correctly, I wanted to be consistent with what rows were being tested across the 3 versions. This is why the number of rows generated by the setup query is slightly higher than the number I noted above the test results for how many rows were being tested.CPU time is only slightly lower when uncommenting the
--@Dummy =
, and the ranking among the 3 TVFs is the same. But interestingly enough, when uncommenting the variable, the rankings change a little:Not sure why the O.P.'s code would perform so much better in this scenario (whereas my and Joe's code only improved marginally), but it does seem consistent across many tests. And no, I did not look at execution plan differences as I don't have time to investigate that.
EVEN FASTERER
I have completed testing of the alternate approach and it does provide a slight but definite improvement to what is shown above. The new approach uses SQLCLR and it appears to scale better. I found that when adding in the second column to the query, the T-SQL approach double in time. But, when adding in additional columns using a SQLCLR Scalar UDF, the time went up, but not by the same amount as the single column timing. Maybe there is some initial overhead in invoking the SQLCLR method (not associated with the overhead of the initial loading of the App Domain and of the Assembly into the App Domain) because the timings were (elapsed time, not CPU time):
So it's possible that the timing (of dumping to a variable, not returning the result set) has a 200 ms - 250 ms overhead and then 750 ms - 800 ms per instance time. CPU timings were: 950 ms, 1750 ms, and 2400 ms for 1, 2, and 3 instances of the UDF, respectively.
C# CODE
I originally used
SqlDecimal
as the return type, but there is a performance penalty for using that as opposed toSqlDouble
/FLOAT
. Sometimes FLOAT has issues (due to it being an imprecise type), but I verified against the T-SQL TVF via the following query and no differences were detected:TEST