Sql-server – How to you split multiple delimited strings of varying sizes into rows

sql serversql-server-2016

I have a situation where for each ID in my table there are multiple fields that each contain a comma-delimited string. The strings don't necessarily have the same number of elements as one another. The format of the data comes from another team and I can't change it.

How can I split these strings into rows with SQL Server 2016? I'd rather not add a function to the database if it can be avoided.

Here's a sample of what I'm starting with and what I'm looking to achieve:

DECLARE @SourceTable TABLE (ID INT, String1 VARCHAR(100), 
  String2 VARCHAR(100), String3 VARCHAR(100))

INSERT INTO @SourceTable (ID, String1, String2, String3)
VALUES (1, 'a1,a2,a3', 'b1,b2,b3,b4,b5,b6', 'c1,c2'),
       (2, 'd1,d2', 'e1', 'f1,f2')

--Source
SELECT *
FROM @SourceTable

Desired Output:

ID  Position    S1_Transformed  S2_Transformed  S3_Transformed
--  --------    --------------  --------------  --------------
1   1           a1              b1              c1
1   2           a2              b2              c2
1   3           a3              b3  
1   4                           b4  
1   5                           b5  
1   6                           b6  
2   1           d1              e1              f1
2   2           d2                              f2

EDIT: Corrected output based on Aaron's comment.

Best Answer

If you can bear to have a function to supply functionality not (yet) built into the engine, you can create this function which returns an ordinal position alongside each value (there are dozens of different ways to write this splitting function):

CREATE FUNCTION dbo.SplitCSVInOrder
(
    @List  nvarchar(max)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
    RETURN
    (
      WITH L0(c) AS (SELECT 1 UNION ALL SELECT 1),
           L1(c) AS (SELECT 1 FROM L0, L0 AS B),
           L2(c) AS (SELECT 1 FROM L1, L1 AS B),
           L3(c) AS (SELECT 1 FROM L2, L2 AS B),
           L4(c) AS (SELECT 1 FROM L3, L3 AS B),
           L5(c) AS (SELECT 1 FROM L4, L4 AS B),
           N (n) AS (SELECT ROW_NUMBER() OVER (ORDER BY c) FROM L5)
        SELECT 
           [Position] = ROW_NUMBER() OVER (ORDER BY n),
           [Value] = LTRIM(RTRIM(SUBSTRING(@List, n,
                CHARINDEX(',', @List + ',', n) - n)))
        FROM N WHERE n <= LEN(@List)
            AND SUBSTRING(',' + @List, n, 1) = ','
    );

Then you can unpivot / re-pivot to line up the values from each position:

DECLARE @SourceTable TABLE
( 
  ID INT, 
  String1 varchar(100), 
  String2 varchar(100), 
  String3 varchar(100)
);

INSERT INTO @SourceTable (ID, String1, String2, String3)
VALUES (1, 'a1,a2,a3', 'b1,b2,b3,b4,b5,b6', 'c1,c2'),
       (2, 'd1,d2', 'e1', 'f1,f2');

;WITH x AS 
(
  SELECT * FROM @SourceTable
    UNPIVOT (s FOR strings IN (String1, String2, String3)) AS up
    CROSS APPLY dbo.SplitCSVInOrder(s)
),
p AS 
(
  SELECT * FROM x PIVOT (MAX(Value) 
    FOR strings IN ([String1],[String2],[String3])) AS p
)
SELECT ID, Position, S1_Transformed = COALESCE(MIN(String1),''), 
                     S2_Transformed = COALESCE(MIN(String2),''), 
                     S3_Transformed = COALESCE(MIN(String3),'')
FROM p
GROUP BY ID, Position
ORDER BY ID, Position;

Results:

ID  Position    S1_Transformed  S2_Transformed  S3_Transformed
--  --------    --------------  --------------  --------------
1   1           a1              b1              c1
1   2           a2              b2              c2
1   3           a3              b3  
1   4                           b4  
1   5                           b5  
1   6                           b6  
2   1           d1              e1              f1
2   2           d2                              f2

I am sure there is a simpler way to do it but that's what my raised-mostly-on-SQL-Server-2005 brain came up with.

Do you wish that query could be easier to formulate? Me too! For one, it would be great if there were a native string splitting function that also returned ordinal position. That would negate the need for a function to ensure the output comes out in the same order as it's listed in the string. I've asked for everyone's help in getting Microsoft to add a function in the next version of SQL Server. Please vote and add a constructive comment that explains your business case here:

And see some background on what I'm asking for specifically here: