Sql-server – Split two delimited strings in same order without function

functionssql serversql-server-2012string-splittingt-sql

I am trying to split two columns with delimited strings into rows. The positions of the values in each string are related so I am trying to split it out so that the related values are in a row. I am unable to use function as I cannot create objects in the database

Here is sample table and data

CREATE TABLE #temp
(id   INT,
 keys VARCHAR(50),
 vals VARCHAR(50)
);

INSERT INTO #temp
VALUES
(1, '1,2,3', 'one,two,three'),
(2, '4,5,6', 'four,five,six'),
(3, '7,8,9', 'seven,eight,nine');

and my desired output would be

ID  key  val
1   1    one
1   2    two
1   3    three
2   4    four
2   5    five
2   6    six
3   7    seven
3   8    eight
3   9    nine

I got the query to work if I only split one column, so I define two CTEs with row_number and join on ID and row_number. This does give desired output but my live table is very large and I was hoping for a way to pass through the table only once, instead of twice.

with keys as(
SELECT id,keys,vals,
       keys.keyid.value('.', 'VARCHAR(8000)') AS keyid,
      row_number() over(order by (select null)) as rn
FROM
(SELECT id,keys,vals,
           CAST('<Keys><key>'+REPLACE(keys, ',', '</key><key>')+'</key></Keys>' AS XML) AS tempkeys
    FROM #temp
) AS temp
CROSS APPLY tempkeys.nodes('/Keys/key') AS keys(keyid)),
vals as(
SELECT id,keys,vals,
       vals.val.value('.', 'VARCHAR(8000)') AS valid,
      row_number() over(order by (select null)) as rn
FROM
(SELECT id,keys,vals,
           CAST('<vals><val>'+REPLACE(vals, ',', '</val><val>')+'</val></vals>' AS XML) AS tempvals
    FROM #temp
) AS temp
CROSS APPLY tempvals.nodes('/vals/val') AS vals(val))


SELECT k.id, k.keyid, v.valid
FROM keys AS k
     INNER JOIN vals AS v
     ON k.id = v.id
        AND k.rn = v.rn; 

Best Answer

Create the function in msdb or somewhere else.

CREATE FUNCTION dbo.SplitTwoStringsWithSameOrder
(
    @List1  varchar(50),
    @List2  varchar(50),
    @Delim  varchar(10)
)
RETURNS TABLE
AS
    RETURN
    (
      WITH src(r) AS 
      (
        SELECT 1 UNION ALL SELECT r + 1 FROM src WHERE r < 10
      ),
      Numbers(Number) AS 
      (
        SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) 
        FROM src AS s1, src AS s2 -- add more if you need longer strings
      ),
      parsed(s1,s2,r1,r2)
      AS
      (
        SELECT
          SUBSTRING(@List1, n1.Number, CHARINDEX(@Delim, @List1 
            + @Delim, n1.Number) - n1.Number),
          SUBSTRING(@List2, n2.Number, CHARINDEX(@Delim, @List2 
            + @Delim, n2.Number) - n2.Number),
          r1 = ROW_NUMBER() OVER (ORDER BY n1.Number),
          r2 = ROW_NUMBER() OVER (ORDER BY n2.Number)
        FROM Numbers AS n1, Numbers AS n2
        ON  n1.Number <= LEN(@List1)
        AND n2.Number <= LEN(@List2)
        AND SUBSTRING(@Delim + @List1, n1.Number, LEN(@Delim)) = @Delim
        AND SUBSTRING(@Delim + @List2, n2.Number, LEN(@Delim)) = @Delim
      )
      SELECT s1, s2, r1, r2 FROM parsed WHERE r1 = r2
    );

Then, as @gbn noted, reference it by 3-part name wherever your query has to run.

CREATE TABLE #temp
(id   INT,
 keys VARCHAR(50),
 vals VARCHAR(50)
);

INSERT INTO #temp
VALUES
(1, '1,2,3', 'one,two,three'),
(2, '4,5,6', 'four,five,six'),
(3, '7,8,9', 'seven,eight,nine');

SELECT t.id, f.s1, f.s2 FROM #temp AS t
  CROSS APPLY msdb.dbo.SplitTwoStringsWithSameOrder(keys, vals, ',') AS f
  ORDER BY t.id, f.r1;
GO

DROP TABLE #temp;

Results:

enter image description here

The resulting plan, shown in Plan Explorer (disclaimer: I'm the Product Manager), is not the prettiest thing I've ever seen (click to enlarge a little bit):

enter image description here

But there is exactly one scan of #temp (4% cost). The biggest costs are two sorts and a spool, and there is some I/O due to a worktable which I am not sure is avoidable.

If you KNOW you will only ever have 50 characters in either of these strings, then you can get a much simpler plan with a built-in Numbers table (people object to these, but they're very useful, and they are almost always in memory if you reference them enough). This doesn't help I/O but removing the recursive CTE and other constructs of building the numbers inside the function is quite helpful for CPU etc.

First, the numbers table:

DROP TABLE dbo.Numbers;

;WITH n AS
(
    SELECT
        TOP (50) rn = ROW_NUMBER() OVER
        (ORDER BY [object_id])
    FROM sys.all_columns 
    ORDER BY [object_id]
)
SELECT [Number] = rn - 1
INTO dbo.Numbers
FROM n;

CREATE UNIQUE CLUSTERED INDEX n ON dbo.Numbers([Number]);

Then a second version of the function:

CREATE FUNCTION dbo.SplitTwoStringsWithSameOrder2
(
    @List1  varchar(50),
    @List2  varchar(50),
    @Delim  nvarchar(10)
)
RETURNS TABLE
AS
    RETURN
    (
      WITH parsed(s1,s2,r1,r2)
      AS
      (
        SELECT
          SUBSTRING(@List1, n1.Number, CHARINDEX(@Delim, @List1 
            + @Delim, n1.Number) - n1.Number),
          SUBSTRING(@List2, n2.Number, CHARINDEX(@Delim, @List2 
            + @Delim, n2.Number) - n2.Number),
          r1 = ROW_NUMBER() OVER (ORDER BY n1.Number),
          r2 = ROW_NUMBER() OVER (ORDER BY n2.Number)
        FROM dbo.Numbers AS n1
        INNER JOIN dbo.Numbers AS n2
        ON  n1.Number <= LEN(@List1)
        AND n2.Number <= LEN(@List2)
        AND SUBSTRING(@Delim + @List1, n1.Number, LEN(@Delim)) = @Delim
        AND SUBSTRING(@Delim + @List2, n2.Number, LEN(@Delim)) = @Delim
      )
      SELECT s1, s2, r1, r2 FROM parsed WHERE r1 = r2
    );
GO

Here is the simpler plan that results (again, click to enlarge):

enter image description here

The plan still has two sort operations, but the spool is gone, there is still only one scan of #temp, and in my limited tests the cost numbers (absolute cost numbers, not %) were better every time.

I don't know precisely either of these will scale with a lot more rows, but it's worth testing, and if you weigh this against other solutions and it can't scale well, that may be a point your reconsider the design (store these relationally instead of as comma-separated sets).