Sql-server – Select a CSV string as multiple columns

csv, sql-server, sql-server-2014, string-manipulation

I'm using SQL Server 2014 and I have a table with one column containing a CSV string:

110,200,310,130,null

The output from the table looks like this:

I want to select the second column as multiple columns, putting each item of the CSV string in a separate column, like this:

So I created a function for splitting a string:

create FUNCTION [dbo].[fn_splitstring]
(
    @List nvarchar(2000),
    @SplitOn nvarchar(5)
)  
RETURNS @RtnValue table 
(
    Id int identity(1,1),
    Value nvarchar(100)
) 
AS  
BEGIN 
    while (Charindex(@SplitOn,@List)>0)
    begin
        insert into @RtnValue (value)
        select 
            Value = ltrim(rtrim(Substring(@List,1,Charindex(@SplitOn,@List)-1)))

        set @List = Substring(@List,Charindex(@SplitOn,@List)+len(@SplitOn),len(@List))
    end
    insert Into @RtnValue (Value)
    select Value = ltrim(rtrim(@List))

    return
END

I would like to use it similar to this:

select Val, (select value from dbo.fn_splitstring(cchar1, ',')) from table1

But the above code obviously won't work, because the function will return more than one row causing the subquery to return more than one value and breaking the code.

I can use something like:

select Val,
(select value from dbo.fn_splitstring(cchar1, ',') order by id offset 0 rows fetch next 1 rows only) as col1,
(select value from dbo.fn_splitstring(cchar1, ',') order by id offset 1 rows fetch next 1 rows only) as col2,
................
from table1

but I don't think it's a good approach.

What is the correct way to do it?

Best Answer

To solve this, you will probably need some more procedural code. Different databases have different sets of built-in string functions (as you know), so I have written generic "proof of concept" code in Oracle PL/SQL that uses essentially just the SUBSTR() function (the equivalent is SUBSTRING() in MS SQL).

A bulk loader such as LOAD DATA ... (MySQL) needs a .csv file and an existing table. With my function, you can instead SELECT individual values out of a CSV string stored in a column by passing the column name plus 2 integers: one for the number of the "left-hand side delimiter", and one for the number of the "right-hand side delimiter". (Sounds horrible ...).

Example: suppose we have a comma-separated value looking like this, and it is stored in a column called csv:

 aaa,111,zzz

If we want to "extract" 111 out of this, we call the function like this:

 select split(csv, 1, 2) ... ;
 -- 1: start at the first comma
 -- 2: end at the second comma

The "start" and the "end" of the string can be selected like this:

 select split(csv, 0, 1) ... ; -- from the start of the string (no comma) up to the first comma
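 select split(csv, 2, 0) ... ; -- from the second comma right up to the end of the string
 -- note: with 0 as the first number the function always stops at the FIRST comma,
 -- and with 0 as the second number it always starts after the LAST comma;
 -- the other number is not evaluated in either case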

I know that the function code is not perfect, and that it can be simplified in places (e.g. in Oracle we should use INSTR(), which can find a particular occurrence of a part of a string). Also, there's no exception handling right now. It's just a first draft. Here goes ...

create or replace function split(
  csvstring varchar2
, lcpos number 
, rcpos number )
return varchar2
is
  slen pls_integer := 0 ;  -- string length
  comma constant varchar2(1) := ',' ;
  currentchar varchar2(1) := '' ;
  commacount pls_integer := 0 ;
  firstcommapos pls_integer := 0 ;
  secondcommapos pls_integer := 0 ;
begin

  slen := length(csvstring);

  -- special case: leftmost value
  if lcpos = 0 then
     firstcommapos := 0 ;
     for i in 1 .. slen
     loop    
        currentchar := substr(csvstring, i, 1) ;
        if currentchar = comma then
           secondcommapos := i - 1 ; 
           exit ;
        end if ;
     end loop ;
     return substr(csvstring, 1, secondcommapos) ;
  end if ;

  -- 2 commas somewhere in the middle of the string
  if lcpos > 0 and rcpos > 0 then
     for i in 1 .. slen
     loop    
        currentchar := substr(csvstring, i, 1) ;
        if currentchar = comma then
           commacount := commacount + 1;
           if commacount = lcpos then 
              firstcommapos := i ; 
           end if ;
           if commacount = rcpos then
              secondcommapos := i ;
           end if ;
        end if ;
     end loop ;
     return substr(csvstring, firstcommapos + 1, (secondcommapos-1) - firstcommapos ) ;
  end if ; 

  -- special case: rightmost value
  if rcpos = 0 then
     secondcommapos := slen ;
     for i in reverse 1 .. slen  -- caution: count DOWN!
     loop    
        currentchar := substr(csvstring, i, 1) ;
        if currentchar = comma then
           firstcommapos := i + 1  ; 
           exit ;
        end if ;
     end loop ;
     return substr(csvstring, firstcommapos, secondcommapos-(firstcommapos-1)) ;
  end if ;

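  -- no branch matched (e.g. a NULL or negative position): return NULL explicitly
  -- instead of ending the function without a RETURN
  return null ;

end split;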

Testing:

-- test table, test data
create table csv (
  id number generated always as identity primary key
, astring varchar2(256) 
);

-- insert some test data
begin
  insert into csv (astring) values ('123,456,88789,null,null');
  insert into csv (astring) values ('123,456,99789,1234,null');
  insert into csv (astring) values ('123,456,00789,1234,null');
  insert into csv (astring) values ('1,2222,77789,null,null');
  insert into csv (astring) values ('11,222,88789,null,');
  insert into csv (astring) values ('111,22,99789,,');
  insert into csv (astring) values ('1111,2,00789,oooo,null');
end;

-- testing:
select 
  split(astring,0,1) col1
, split(astring,1,2) col2
, split(astring,2,3) col3
, split(astring,3,4) col4
, split(astring,4,0) col5
from csv

-- output
COL1    COL2    COL3    COL4    COL5
123     456     88789   null    null
123     456     99789   1234    null
123     456     00789   1234    null
1       2222    77789   null    null
11      222     88789   null    -
111     22      99789   -       -
1111    2       00789   oooo    null

... The function seems to be overkill. However, if we write more procedural code, the SQL depending on it becomes rather "elegant". Best of luck with processing your csv!
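Since the question is about SQL Server 2014, the same column layout can also be sketched in T-SQL without porting the function above: call the split function defined in the question (dbo.fn_splitstring) once per row with CROSS APPLY and pivot its rows by Id using conditional aggregation. This is only a sketch under assumptions: the table variable, its column types and the sample rows are made up for illustration, and the fixed list of five columns is assumed from the sample string.

```sql
-- Assumption: dbo.fn_splitstring from the question has already been created.
-- @table1, its column types and the sample rows are hypothetical stand-ins for table1.
DECLARE @table1 TABLE (Val int, cchar1 nvarchar(2000));

INSERT INTO @table1 (Val, cchar1)
VALUES (1, N'110,200,310,130,null'),
       (2, N'123,456,88789');          -- only three items

SELECT  t.Val,
        p.col1, p.col2, p.col3, p.col4, p.col5
FROM    @table1 AS t
CROSS APPLY
(
    -- The function is applied once per outer row; Id is the 1-based position of each item.
    -- An aggregate without GROUP BY always yields exactly one row.
    SELECT  MAX(CASE WHEN s.Id = 1 THEN s.Value END) AS col1,
            MAX(CASE WHEN s.Id = 2 THEN s.Value END) AS col2,
            MAX(CASE WHEN s.Id = 3 THEN s.Value END) AS col3,
            MAX(CASE WHEN s.Id = 4 THEN s.Value END) AS col4,
            MAX(CASE WHEN s.Id = 5 THEN s.Value END) AS col5
    FROM    dbo.fn_splitstring(t.cchar1, N',') AS s
) AS p;
```

Rows with fewer than five items get NULL in the remaining columns. The literal text null inside the CSV stays a string unless it is converted, for example with NULLIF(s.Value, N'null').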

Related Solutions

Sql-server – Using Select Statement to get column name and table name from the same table

Currently there is no syntax directly supporting what you are trying to do. As you probably know, names cannot be parametrised in a SQL statement. That means that when you need to substitute names from column values of another table, you have to use dynamic SQL: first build the query string and then execute it. There is just no working around using dynamic SQL in such cases. Furthermore, you have already established for yourself that you cannot use dynamic SQL in a function. So there you are, seemingly stumped.

However, if you insist on using a single SELECT statement for this, there is one way, provided you are willing to bend over backwards slightly to achieve the goal and to accept a major limitation of the method.

The solution involves creation of a loopback linked server and using the OPENQUERY function. But first you will need to make sure your dynamic SQL solution works as it is. For the purpose of this answer, I am going to assume that the dynamic SQL looks like this:

DECLARE @sql nvarchar (max) = '', @sqltemplate nvarchar(max) =
'UNION ALL
SELECT
  Column_Name = ''{Column_Name}'',
  Table_Name  = ''{Table_Name}'',
  Max_Length  = MAX(LEN([{Column_Name}]))
FROM
  [oil stop].dbo.[{Table_Name}]
';
SELECT
  @sql += REPLACE(
            REPLACE(
              @sqltemplate,
              '{Column_Name}',
              Column_Name
            ),
            '{Table_Name}',
            Table_Name
          )
FROM
  tempdb.dbo.YourMetaDataTable
;
SET @sql = STUFF(@sql, 1, 9, '');  -- remove the leading UNION ALL
EXECUTE sp_executesql @sql;
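The answer does not show how the loopback linked server itself is created. A minimal sketch might look like the following, assuming the SQL Server Native Client provider (SQLNCLI) is available on the instance and that you have permission to add linked servers. The name YourLinkedServerName is the same placeholder used in the OPENQUERY call, and the provider choice is an assumption, not something the answer specifies.

```sql
-- Sketch only: a linked server that points back at the local instance.
DECLARE @self sysname = @@SERVERNAME;   -- name of the local instance

EXEC sp_addlinkedserver
     @server     = N'YourLinkedServerName',  -- placeholder name, reused in OPENQUERY
     @srvproduct = N'',
     @provider   = N'SQLNCLI',               -- assumption: newer installations may need a different OLE DB provider
     @datasrc    = @self;

-- Make sure queries against the linked server are allowed.
EXEC sp_serveroption
     @server   = N'YourLinkedServerName',
     @optname  = N'data access',
     @optvalue = N'true';
```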

Once you have verified the script is working, and made sure the loopback linked server is created, just put the script inside the OPENQUERY function like this:

SELECT
  *
FROM
  OPENQUERY(
    YourLinkedServerName,
    '...'  -- the dynamic SQL script
  )
;

Remember to double each quotation mark (apostrophe) inside the script.
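As a small illustration of the doubling rule, here is a trivial query run directly and then passed through OPENQUERY. The query against sys.databases is only a stand-in chosen for this sketch, and YourLinkedServerName is the placeholder from above.

```sql
-- Run directly: the string literal uses single apostrophes.
SELECT name FROM sys.databases WHERE name = 'tempdb';

-- The same statement passed through OPENQUERY: the whole statement is now a
-- string literal, so every apostrophe inside it is written twice.
SELECT *
FROM   OPENQUERY(
         YourLinkedServerName,
         'SELECT name FROM sys.databases WHERE name = ''tempdb'';'
       );
```

Each additional level of nesting doubles the apostrophes again, which is why the template literals inside the script end up with four.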

One other important change you will likely need to make is to add a WITH RESULT SETS clause to the EXECUTE statement to describe the result set, so that OPENQUERY can process the output correctly for you. When describing the result set, you will likely just repeat the same type for Column_Name and Table_Name as defined for them in the metadata table. For the example below I am assuming the type to be sysname in both cases. And as for the Max_Length column, I believe int would work well there. So, the modified EXECUTE statement would look like this:

EXECUTE sp_executesql @sql
WITH RESULT SETS
(
  (Column_Name sysname, Table_Name sysname, Max_Length int)
);

For completeness, and to make the lack of elegance in this solution more evident for the wider audience, this is what the final query would look like:

SELECT
  *
FROM
  OPENQUERY(
    [OIL STOP],
    'DECLARE @sql nvarchar (max) = '''', @sqltemplate nvarchar(max) =
    ''UNION ALL
    SELECT
      Column_Name = ''''{Column_Name}'''',
      Table_Name  = ''''{Table_Name}'''',
      Max_Length  = MAX(LEN([{Column_Name}]))
    FROM
      [oil stop].dbo.[{Table_Name}]
    '';
    SELECT
      @sql += REPLACE(
                REPLACE(
                  @sqltemplate,
                  ''{Column_Name}'',
                  Column_Name
                ),
                ''{Table_Name}'',
                Table_Name
              )
    FROM
      tempdb.dbo.MetaData
    ;
    SET @sql = STUFF(@sql, 1, 9, '''');  -- remove the leading UNION ALL
    EXECUTE sp_executesql @sql
    WITH RESULT SETS
    (
      (Column_Name sysname, Table_Name sysname, Max_Length int)
    );
    '
  )
;

The main problem, though, is that the query above still cannot be parametrised, and that is the principal limitation I was talking about. Even though the OPENQUERY script is specified as a string literal, it can only be a single string literal – not a variable, not a complex expression. That means that if you want to apply the query to a different subset of rows of the metadata table, you will have to use a new script for that.

Sql-server – select distinct on csv data

You should be able to use CROSS APPLY.

In an effort to provide a Minimal, Complete, and Verifiable Example, I've included a 'simple' SPLIT TVF which is intended to emulate your INTLIST_TO_TBL function. This is for demonstration purposes only; there are more efficient ways to split a string, but they are beyond the scope of this answer.

IF EXISTS ( SELECT  *
            FROM    sys.objects
            WHERE   object_id = OBJECT_ID(N'[dbo].[Split]')
                    AND type IN (N'FN', N'IF', N'TF', N'FS', N'FT') ) 
    DROP FUNCTION [dbo].[Split] ;
GO
CREATE FUNCTION [dbo].[Split](@String varchar(8000), @Delimiter char(1))        
returns @temptable TABLE (element int,items varchar(8000))        
as        
begin        
    declare @element int=1
    declare @idx int        
    declare @slice varchar(8000)        

    select @idx = 1        
        if len(@String)<1 or @String is null  return        

    while @idx!= 0        
    begin        
        set @idx = charindex(@Delimiter,@String)        
        if @idx!=0        
            set @slice = left(@String,@idx - 1)        
        else        
            set @slice = @String        

        if(len(@slice)>0)
            begin   
                insert into @temptable(Element,Items) values(@element,@slice)        
                set @element=@element+1
            end 

        set @String = right(@String,len(@String) - @idx)        
        if len(@String) = 0 break        
    end    
return        
end


GO

Now that we have the Split TVF, let's see an example of how to incorporate it.

DECLARE @tbl TABLE (id VARCHAR(100))

INSERT @tbl
VALUES ('111,1487')
    ,('462')
    ,('2492')
    ,('3184')
    ,('3181,3184')
    ,('3181')
    ,('440')
    ,('1436')

SELECT DISTINCT b.items
FROM @tbl a
CROSS APPLY [dbo].[Split](a.id, ',') b
