T-SQL – split string for calculations

t-sql

I'm doing some basic statistical calculations on the results of a survey. A survey consists of several sections, and each section has some questions in it. The answers to each question in each section are stored.

The ResultsTable table has a Result column which contains the computed result for the entire survey as well as the overall result for each section, separated by double pipes. For example,

Moderate||Low||High||Low||Low||Low||NA||NA

Above, Moderate is the overall survey result, Low is the result for section 1, High is the result for section 2, and so on.

(I would then compute some statistics, such as the % of people who have High as their result for a specific section; or, of those who have High for a specific section, the % who answered Yes or No to a specific question in that section; etc.)

The problem I'm facing is that I need to split the string, which I think could cause performance problems. What I have right now is as follows.

SELECT COUNT(*) AS [Personal_Health_High_Risk] FROM 
    (
        SELECT rt.* 
        FROM ResultsTable rt
        WHERE 
            rt.CampaignID IN (SELECT Value FROM dbo.ufn_ParseDelimitedString(@campaignIDs, ',')) AND 
            rt.SectionID = @sectionID
    ) sq
WHERE 
    -- where the 3rd section's result is High (RowNumber 1 is the overall result, so section 3 is RowNumber 4)
    (SELECT Value FROM dbo.ufn_ParseDelimitedString(sq.Result, '||') WHERE RowNumber = 4) LIKE '%High%'

The ufn_ParseDelimitedString function splits a string using a given delimiter and returns the resulting tokens in a table. For example,

-- ufn_ParseDelimitedString output:
--  [RowNumber] [Value]
--  1           Moderate
--  2           Low
--  3           High
--  ...
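
For reference, a splitter with this shape can be written as an inline table-valued function using a recursive CTE. This is a simplified sketch, not necessarily the exact body of ufn_ParseDelimitedString:

CREATE FUNCTION dbo.ufn_ParseDelimitedString
(
    @input     NVARCHAR(MAX),
    @delimiter NVARCHAR(10)   -- supports multi-character delimiters like '||'
)
RETURNS TABLE
AS
RETURN
WITH Tokens AS
(
    -- Anchor member: the first token and whatever follows its delimiter
    SELECT
        RowNumber = 1,
        Value     = CASE WHEN CHARINDEX(@delimiter, @input) > 0
                         THEN LEFT(@input, CHARINDEX(@delimiter, @input) - 1)
                         ELSE @input END,
        Remainder = CASE WHEN CHARINDEX(@delimiter, @input) > 0
                         THEN SUBSTRING(@input, CHARINDEX(@delimiter, @input) + LEN(@delimiter), LEN(@input))
                         ELSE CAST(NULL AS NVARCHAR(MAX)) END
    UNION ALL
    -- Recursive member: peel the next token off the remainder
    SELECT
        RowNumber + 1,
        CASE WHEN CHARINDEX(@delimiter, Remainder) > 0
             THEN LEFT(Remainder, CHARINDEX(@delimiter, Remainder) - 1)
             ELSE Remainder END,
        CASE WHEN CHARINDEX(@delimiter, Remainder) > 0
             THEN SUBSTRING(Remainder, CHARINDEX(@delimiter, Remainder) + LEN(@delimiter), LEN(Remainder))
             ELSE NULL END
    FROM Tokens
    WHERE Remainder IS NOT NULL
)
-- Note: recursive CTEs default to a 100-level recursion limit, which is
-- plenty for 8 tokens but worth knowing for longer strings.
SELECT RowNumber, Value
FROM Tokens;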

So, is it better to continue with what I'm doing, basically splitting the string on each SELECT? Or I was thinking of taking the inner SELECT in the code above, saving its output into a temporary table, and actually expanding the concatenated result string into separate columns. That way I could reuse the temporary table for various other calculations. What are your thoughts on this?

Best Answer

Doing string manipulation in SQL is non-optimal; the engine isn't meant for that kind of row-by-row processing.

Your option of using temporary tables to hold intermediate results is going to be your best bet for solving the performance problem. At least that way the string is only split once.
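
As a minimal sketch of that split-once approach (assuming a hypothetical ResponseID key on ResultsTable; #Expanded and the section column names are illustrative):

-- Split each result string once, pivoting RowNumber into columns
SELECT
    rt.ResponseID,                                                   -- hypothetical key column
    OverallResult = MAX(CASE WHEN s.RowNumber = 1 THEN s.Value END),
    Section1      = MAX(CASE WHEN s.RowNumber = 2 THEN s.Value END),
    Section2      = MAX(CASE WHEN s.RowNumber = 3 THEN s.Value END),
    Section3      = MAX(CASE WHEN s.RowNumber = 4 THEN s.Value END)
INTO #Expanded
FROM ResultsTable rt
CROSS APPLY dbo.ufn_ParseDelimitedString(rt.Result, '||') s
WHERE rt.CampaignID IN (SELECT Value FROM dbo.ufn_ParseDelimitedString(@campaignIDs, ','))
  AND rt.SectionID = @sectionID
GROUP BY rt.ResponseID;

-- Every later calculation is then a plain column filter:
SELECT COUNT(*) AS [Personal_Health_High_Risk]
FROM #Expanded
WHERE Section3 LIKE '%High%';

But there are solutions that may work better for you: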

  • Can you modify the import process so the data is split up before it's saved to the database?
  • What about importing to staging tables and using either a SQL Server Agent job or an external process to split the data up into permanent tables? (See the sketch after this list.)
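
As a sketch of that staging idea, assuming a hypothetical dbo.StagingResults table holding the raw string and a permanent dbo.SectionResults table (all names here are illustrative):

-- Scheduled (e.g., Agent job) or one-time split from staging into a
-- permanent, already-normalized table; reporting queries never split again.
INSERT INTO dbo.SectionResults (ResponseID, SectionNumber, Result)
SELECT
    st.ResponseID,
    s.RowNumber - 1,        -- RowNumber 1 holds the overall survey result
    s.Value
FROM dbo.StagingResults st
CROSS APPLY dbo.ufn_ParseDelimitedString(st.Result, '||') s
WHERE s.RowNumber > 1;      -- store the overall result separately if needed

-- e.g. count of respondents with High in section 3, with no string splitting:
-- SELECT COUNT(*) FROM dbo.SectionResults WHERE SectionNumber = 3 AND Result = 'High';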

I'm not sure how much reporting you are doing on older data sets; if you only need to report on the data once or twice, then temp tables are probably your best bet. But if you want or need to do longer-term analysis, then continually splitting the same data is unnecessarily expensive. Storage is usually cheaper than CPU.