Sql-server – Select a CSV string as multiple columns

csvsql serversql server 2014string manipulation

I'm using SQL Server 2014 and I have a table with one column containing a CSV string:

110,200,310,130,null

The output from the table looks like this:

select result

I want to select the second column as multiple columns, putting each item of the CSV string in a separate column, like this:

expected result

So I created a function for splitting a string:

create FUNCTION [dbo].[fn_splitstring]
(
    @List nvarchar(2000),
    @SplitOn nvarchar(5)
)  
RETURNS @RtnValue table 
(
    Id int identity(1,1),
    Value nvarchar(100)
) 
AS  
BEGIN 
    while (Charindex(@SplitOn,@List)>0)
    begin
        insert into @RtnValue (value)
        select 
            Value = ltrim(rtrim(Substring(@List,1,Charindex(@SplitOn,@List)-1)))

        set @List = Substring(@List,Charindex(@SplitOn,@List)+len(@SplitOn),len(@List))
    end
    insert Into @RtnValue (Value)
    select Value = ltrim(rtrim(@List))

    return
END

I would like to use it similar to this:

select Val , (select value from tvf_split_string(cchar1,','))  from table1

But the above code obviously won't work, because the function will return more than one row causing the subquery to return more than one value and breaking the code.

I can use something like:

select Val ,
(select value from tvf_split_string(cchar1,',') order by id offset 0 rows fetch next 1 rows only ) as col1,
(select value from tvf_split_string(cchar1,',') order by id offset 1 rows fetch next 1 rows only ) as col2,
................
 from table1

but I don't think it's a good approach.

What is the correct way to do it?

Best Answer

For solving this, you will probably need some more procedural code. Different databases have different sets of built-in string functions (as you know). Thus, for finding a solution for this I have written "proof of concept" code that is rather generic and uses just the SUBSTR() function (the equivalent is SUBSTRING() in MS SQL).

When looking at LOAD DATA ... (MySQL) you can see that we need a .csv file, and an existing table. Using my function, you will be able to SELECT sections of the .csv file, by passing the column name plus 2 integers: one for the number of the "left-hand side delimiter", and one for the number of the "right-hand side delimiter". (Sounds horrible ...).

Example: suppose we have a comma-separated value looking like this, and it is stored in a colum called csv:

 aaa,111,zzz

If we want to "extract" 111 out of this, we call the function like this:

 select split(csv, 1, 2) ... ;
 -- 1: start at the first comma
 -- 2: end at the second comma

The "start" and the "end" of the string can be selected like this:

 select split(csv, 0, 1) ... ; -- from the start of the string (no comma) up to the first comma
 select split(csv, 2, 0) ... ; -- from the second comma right up to the end of the string

I know that the function code is not perfect, and that it can be simplified in places (eg in Oracle we should use INSTR(), which can find a particular occurrence of a part of a string). Also, there's no exception handling right now. It's just a first draft. Here goes ...

create or replace function split(
  csvstring varchar2
, lcpos number 
, rcpos number )
return varchar2
is
  slen pls_integer := 0 ;  -- string length
  comma constant varchar2(1) := ',' ;
  currentchar varchar2(1) := '' ;
  commacount pls_integer := 0 ;
  firstcommapos pls_integer := 0 ;
  secondcommapos pls_integer := 0 ;
begin

  slen := length(csvstring);

  -- special case: leftmost value
  if lcpos = 0 then
     firstcommapos := 0 ;
     for i in 1 .. slen
     loop    
        currentchar := substr(csvstring, i, 1) ;
        if currentchar = comma then
           secondcommapos := i - 1 ; 
           exit ;
        end if ;
     end loop ;
     return substr(csvstring, 1, secondcommapos) ;
  end if ;

  -- 2 commas somewhere in the middle of the string
  if lcpos > 0 and rcpos > 0 then
     for i in 1 .. slen
     loop    
        currentchar := substr(csvstring, i, 1) ;
        if currentchar = comma then
           commacount := commacount + 1;
           if commacount = lcpos then 
              firstcommapos := i ; 
           end if ;
           if commacount = rcpos then
              secondcommapos := i ;
           end if ;
        end if ;
     end loop ;
     return substr(csvstring, firstcommapos + 1, (secondcommapos-1) - firstcommapos ) ;
  end if ; 

  -- special case: rightmost value
  if rcpos = 0 then
     secondcommapos := slen ;
     for i in reverse 1 .. slen  -- caution: count DOWN!
     loop    
        currentchar := substr(csvstring, i, 1) ;
        if currentchar = comma then
           firstcommapos := i + 1  ; 
           exit ;
        end if ;
     end loop ;
     return substr(csvstring, firstcommapos, secondcommapos-(firstcommapos-1)) ;
  end if ;

end split;

Testing:

-- test table, test data
create table csv (
  id number generated always as identity primary key
, astring varchar2(256) 
);

-- insert some test data
begin
  insert into csv (astring) values ('123,456,88789,null,null');
  insert into csv (astring) values ('123,456,99789,1234,null');
  insert into csv (astring) values ('123,456,00789,1234,null');
  insert into csv (astring) values ('1,2222,77789,null,null');
  insert into csv (astring) values ('11,222,88789,null,');
  insert into csv (astring) values ('111,22,99789,,');
  insert into csv (astring) values ('1111,2,00789,oooo,null');
end;

-- testing:
select 
  split(astring,0,1) col1
, split(astring,1,2) col2
, split(astring,2,3) col3
, split(astring,3,4) col4
, split(astring,4,0) col5
from csv

-- output
COL1    COL2    COL3    COL4    COL5
123     456     88789   null    null
123     456     99789   1234    null
123     456     00789   1234    null
1       2222    77789   null    null
11      222     88789   null    -
111     22      99789   -       -
1111    2       00789   oooo    null

... The function seems to be overkill. However, if we write more procedural code, the SQL depending on it becomes rather "elegant". Best of luck with processing your csv!