I'm attempting to sort a result from a table that is multilingual. I would like to have the sorting algorithm come from a function but then it adds a 8% hit to the performance, at least. So, I'm not quite sure how to go about it. So, for the sorting I'm using this method described in an article on how to sort multilingual tables as shown below:
select UnicodeData,Collation
from (
select
ML.UnicodeData,
ML.Collation,
RN =
CASE
when Collation = 'he-IL' then ROW_NUMBER() OVER (order by unicodedata Collate Hebrew_CI_AS )
when Collation = 'en-US' then ROW_NUMBER() OVER (order by unicodedata Collate SQL_Latin1_General_CP1_CI_AS )
when Collation = 'kn-IN' then ROW_NUMBER() OVER (order by unicodedata Collate Indic_General_100_CI_AS )
when Collation = 'hi-IN' then ROW_NUMBER() OVER (order by unicodedata Collate Indic_General_100_CI_AS )
when Collation = 'ar-EG' then ROW_NUMBER() OVER (order by unicodedata Collate Arabic_CI_AS )
when Collation = 'cs' then ROW_NUMBER() OVER (order by unicodedata Collate Czech_CI_AS )
END
from MultipleLanguages ML
) T
order by RN
Except I abstracted out the collation
code into its own function, like so:
CREATE FUNCTION [utils].[OrderByLanguage]
( @LanguageID tinyint
, @IDName utils.ID_Name READONLY
) RETURNS TABLE AS RETURN
SELECT
t.ID
, CASE @LanguageID
WHEN 1 THEN ROW_NUMBER() OVER (ORDER BY t.[Name]) -- en
WHEN 3 THEN ROW_NUMBER() OVER (ORDER BY t.[Name]) -- en-ca
WHEN 6 THEN ROW_NUMBER() OVER (ORDER BY t.[Name]) -- 'en-nz'
WHEN 5 THEN ROW_NUMBER() OVER (ORDER BY t.[Name]) -- 'en-za'
WHEN 2 THEN ROW_NUMBER() OVER (ORDER BY t.[Name] COLLATE Modern_Spanish_CI_AI) -- es
WHEN 4 THEN ROW_NUMBER() OVER (ORDER BY t.[Name] COLLATE French_CI_AI) -- 'fr-ca'
END RowNumber
FROM @IDName t
But then when I call this function I have to do this awkward double call on the table-valued function.
CREATE FUNCTION api.GetTable
( @LanguageCode VARCHAR(10)
) RETURNS NVARCHAR(MAX)
AS BEGIN
DECLARE
@Result NVARCHAR(MAX)
, @LangID tinyint
DECLARE @Sort utils.ID_Name
SET @LangID = api_utils.GetLanguageID(@LanguageCode)
INSERT INTO @Sort (ID, [Name])
SELECT
t.ID
, t.title
FROM api_utils.GetTable(@LangID) t
SET @Result = (
SELECT
CONVERT(VARCHAR(10), t.ID) id,
t.category,
t.[system],
t.title,
JSON_QUERY(utils.ToRawJsonArray((
SELECT x.[Description]
FROM api_utils.GetKeywords(t.ID, @LangID) x
ORDER BY x.[Description]
FOR JSON AUTO), 'Description')
) keywords
FROM api_utils.GetTable(@LangID) t
ORDER BY (SELECT s.RowNumber
FROM utils.OrderByLanguage(@LangID, @Sort) s
WHERE s.ID = t.ID)
FOR JSON AUTO, ROOT('titles')
)
RETURN @Result
END
So, you can see that I have to call the function api_utils.GetTable
twice. As far as I can tell the only other way to abstract out the collation sorting would be to put in the actual sorting algorithm and then have a script that searches all the code base and add in another collation language whenever I need to add another language. Is there any other way to do this? What have others done? What would be the best practice? The performance on this isn't absolutely critical but it is nice to keep it lean so doesn't take too long since it is already an intensive call.
Thanks in advance!
Update
Answering @srutzky's questions in comments:
1) How much data is returned by api_utils.GetTable?
There's about 150 records returned from the table.
2) Why call api_utils.GetTable twice when the first time the result is dumped into @Sort?
The @Sort
table is a memory optimized user defined table (UDT
). Since I'm passing a table into the utils.OrderByLanguage
function it needs to be a UDT
. Which means I need to get the data from the inline function api_utils.GetTable
twice. I'm not sure if it caused a performance issue calling api_utils.GetTable
twice. Maybe SQL Server
is smart enough to cache the result? Testing it again the INSERT
query cost is 38%. So, a fairly significant part of the query cost.
Wouldn't it be faster to add the category and system columns to @Sort and pull them back in the first call and then use @Sort in the FROM clause?
Since the UDT
is generic for use across all the different procedures that call the function utils.OrderByLanguage
It would make it very difficult to generalize for an unknown number of columns that the different procedures would use.
3) Does this have to be a function or can it be a stored procedure?
Are you talking about api_utils.GetTable
? I would prefer that api_utils.GetTable
stay a function as it makes it easier to use and test. Where I call api_utils.GetTable
it is a Stored Procedure
.
If you are talking about utils.OrderByLanguage
I don't mind if it would be a stored procedure
. I'm not sure how that would help though. So, if it would please let me know!
Update to Accepted Answer
I found adding a an index didn't make a difference performance wise. I also figured I might as well put the sort
column in the original #sort
table since it has to be the same anyways. This lowers the number of warnings in my SSDT project. Then I just do an alter
on the column like so:
ALTER TABLE #AlterSort ALTER COLUMN [sort] nvarchar(max) COLLATE SQL_Latin1_General_CP1_CI_AS
Best Answer
Looking at what you have so far, it's good that
[utils].[OrderByLanguage]
is an Inline Table-Valued Function (ITVF), but it still appears to be a correlated subquery where for each row inapi_utils.GetTable(@LangID)
it passes in all rows inapi_utils.GetTable(@LangID)
to sort it (i.e. theORDER BY
clause.Applying Collations at run-time can be quite costly since it has to generate the sort keys for those values in that moment. For the best performance, creating an index will generate the sort keys ahead of time and even put them in the proper order. But dealing with multiple locales is indeed tricky. Creating a copy of the source column per each locale can require a huge amount of additional disk space (and I/O), depending on the size of the strings, how many locales / Collations one needs, and how many rows there will be over the next 3 - 5 years. Fortunately, you can create non-persisted computed columns to be those copies (which don't take up any space) and index those (which does take up space). While this is likely not feasible if there were 10 locales, a base column of at least
NVARCHAR(200)
, and 1 million (or more) rows, for your situation it should work just fine. With this approach, the dynamic portion will be in choosing which column to select from (achievable via Dynamic SQL, orIF
statements, depending on the situation). But as you can see with the following example (turn on "Include Actual Execution Plan"), both filtered queries (last 2 queries) get Index Seeks on the intended indexes and return the expected results:You can see a live demo of this on dbfiddle.uk.
However, given the code for
api.GetTable
this might not be the best approach. If you wanted to keep the current structure (as much as possible), then you might could do the following:api.GetTable
into a Stored ProcedureOUTPUT
parameter so you don't need to deal with a result set@Sort
should be a temporary table, not a table variable#Sort
temporary table with all of the columns it will need except for[Name]
/title
):ID
,category
, and[system]
.In a series of
IF
statements, add the[title]
column, but with the proper Collation:FROM api_utils.GetTable(@LangID) t
in the main@Result=
query to be:FROM #Sort t
ORDER BY (SELECT s.RowNumber...
in the main@Result=
query to be:ORDER BY t.[title]
This will require re-applying the Collation at each execution, but:
api.GetTable
(no double call)[Name]
column[Name]
column after it has been populated.Code Reuse
By switching over to stored procedures and temporary tables we can actually keep with the goal of code reuse. The code to add the sort column to the local temp table can be abstracted to another stored procedure. While creating a temporary table in a sub-procedure call does not help as that temp table will disappear once that sub-procedure call ends, changes made to a temporary table that exists prior to the sub-procedure call will survive the completion of that call. For example:
Setup
Test
By placing the specific Collations only in this one Stored Procedure, you should have only this one place to update when adding a new language to support.
Putting it all together
With all of the above in mind, we end up with the following stored procedure containing a single call to
api_utils.GetTable()
, no correlated sub-query, andapi_utils.AddSortColumn
(which you need to create) can be used in other stored procedures and against other base tables: