Sql-server – SQL Split Row Data Separated by Spaces

split, sql-server, sql-server-2016, substring

I am looking for a query to find the nth value in a list. The separator is two or more spaces (it can be 3 or 5 spaces, for example).
I am trying to avoid scalar-valued functions, since their performance may be slower. The sentences can have any number of words, from 5 to 20.

CREATE TABLE dbo.TestWrite (TestWriteId int primary key identity(1,1), 
                            TextRow varchar(255))
INSERT INTO dbo.TestWrite (TextRow)
SELECT 'I am writing SQL Code.'
UNION ALL
SELECT 'SQL keywords include join, except, where.'


+-----+----------+---------+-------+---------+--------+
| SQL | keywords | include | join, | except, | where. |
+-----+----------+---------+-------+---------+--------+
| I   | am       | writing | SQL   | Code.   |        |
+-----+----------+---------+-------+---------+--------+

I would like each sentence in its own row, with each word in its own column, as shown in the table above.

This may be one solution I am trying to use:
https://stackoverflow.com/questions/19449492/using-t-sql-return-nth-delimited-element-from-a-string

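DECLARE @input NVARCHAR(MAX) = N'I am writing SQL Code.';
DECLARE @dlmt NVARCHAR(10) = N' ';
DECLARE @pos INT = 2;
-- Each delimiter becomes an XML element boundary; /x[@pos] returns the nth element.
-- Runs of several spaces produce empty <x/> elements, and characters such as & or < make the CAST to XML fail.
SELECT CAST(N'<x>' + REPLACE(@input, @dlmt, N'</x><x>') + N'</x>' AS XML).value('/x[sql:variable("@pos")][1]', 'nvarchar(max)');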
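A minimal sketch, assuming the XML approach from the linked answer is kept: if every run of spaces is first collapsed into a single space, the nth XML element really is the nth word. The `<>` / `><` replacement trick used for collapsing is an assumption of this sketch, not part of the question, and it only works when the text contains no `<`, `>` or `&` characters (those would also break the XML cast). The variable names `@collapsed` and `@nth` are illustrative.

```sql
-- Sample value taken from the answer's test data; some words are separated by 2+ spaces.
DECLARE @input NVARCHAR(4000) = N'SQL  keywords include   join,    except, where.';
DECLARE @pos INT = 4;  -- which word to return (1-based)

-- Collapse runs of spaces: ' ' -> '<>', remove every '><', then '<>' -> ' '.
-- Assumes @input contains no '<', '>' or '&' characters.
DECLARE @collapsed NVARCHAR(4000) =
    REPLACE(REPLACE(REPLACE(LTRIM(RTRIM(@input)), N' ', N'<>'), N'><', N''), N'<>', N' ');

-- Same XML trick as the question: each space becomes an element boundary, then pick /x[@pos].
DECLARE @nth NVARCHAR(4000) =
    CAST(N'<x>' + REPLACE(@collapsed, N' ', N'</x><x>') + N'</x>' AS XML).value('/x[sql:variable("@pos")][1]', 'nvarchar(4000)');

SELECT @nth AS NthWord;  -- join,
```

This still handles one string at a time; applying it per row would mean wrapping the same expressions in a CROSS APPLY.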

Best Answer

Splitting into separate fields of a result set is a bit tricky when the number of elements varies from row to row. There is a SQLCLR stored procedure in the SQL# library (which I wrote), String_SplitResultIntoFields, that breaks a delimited string with a varying number of elements into result-set fields based on a regular-expression delimiter; it is not in the Free version. Because the delimiter is a regular expression, treating "one or more spaces" as the delimiter is easy: \s+ (which matches any run of whitespace). For example:

EXEC SQL#.String_SplitResultIntoFields N'
SELECT [TextRow] FROM #TestWrite;',
N'\s+', NULL, NULL;
/*
Field1    Field2     Field3     Field4    Field5
I         am         writing    SQL       Code.
SQL       keywords   include    join,     except,
*/

As you can see, it determines the number of fields for the result set based on the first row, which is why there is no "Field6" to contain the final word in the second row.

Of course, you could always seed the first row with dashes separated by spaces to force a certain number of fields, but there is no way to then filter out that initial row:

EXEC SQL#.String_SplitResultIntoFields N'
SELECT N''- - - - - - - -''
UNION ALL
SELECT [TextRow] FROM #TestWrite;',
N'\s+', NULL, NULL;
/*
Field1    Field2     Field3     Field4    Field5    Field6   Field7    Field8
-         -          -          -         -         -        -         -
I         am         writing    SQL       Code.
SQL       keywords   include    join,     except,   where.
*/

I suppose I could easily add an optional @ForceResultSetFieldCount input parameter, but it does not exist as of today.
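For comparison, a rough sketch of a CLR-free alternative. This is a different technique from SQL#, and it assumes SQL Server 2016 or later with the database at compatibility level 130 or higher (required for OPENJSON). Each row is turned into a JSON array by replacing every space with `","`; OPENJSON returns the pieces with their 0-based array index in `[key]`; the empty pieces produced by runs of spaces are discarded; and ROW_NUMBER renumbers the remaining words. The column count is fixed by hand (six here; extend the CASE list up to Field20 for 20-word sentences), and unlike `\s+` it splits only on the space character, not on tabs or line breaks. `#Fields` is a hypothetical name for the result table.

```sql
DROP TABLE IF EXISTS #TestWrite;
DROP TABLE IF EXISTS #Fields;

CREATE TABLE #TestWrite
(
  TestWriteId INT PRIMARY KEY IDENTITY(1, 1),
  TextRow VARCHAR(255)
);

INSERT INTO #TestWrite (TextRow)
  SELECT 'I am     writing SQL                   Code.'
  UNION ALL
  SELECT 'SQL  keywords include   join,    except, where.';

WITH Words AS
(
    SELECT t.TestWriteId,
           j.[value] AS Word,
           -- [key] is the array index as a string; renumber after the empties are removed
           ROW_NUMBER() OVER (PARTITION BY t.TestWriteId
                              ORDER BY CAST(j.[key] AS INT)) AS WordPos
    FROM   #TestWrite AS t
           -- 'a  b' becomes ["a","","b"]; STRING_ESCAPE protects quotes and backslashes
           CROSS APPLY OPENJSON(N'["' + REPLACE(STRING_ESCAPE(t.TextRow, 'json'), N' ', N'","') + N'"]') AS j
    WHERE  j.[value] <> N''  -- empty strings come from runs of 2+ spaces
)
SELECT TestWriteId,
       MAX(CASE WHEN WordPos = 1 THEN Word END) AS Field1,
       MAX(CASE WHEN WordPos = 2 THEN Word END) AS Field2,
       MAX(CASE WHEN WordPos = 3 THEN Word END) AS Field3,
       MAX(CASE WHEN WordPos = 4 THEN Word END) AS Field4,
       MAX(CASE WHEN WordPos = 5 THEN Word END) AS Field5,
       MAX(CASE WHEN WordPos = 6 THEN Word END) AS Field6
INTO   #Fields
FROM   Words
GROUP BY TestWriteId;

SELECT * FROM #Fields ORDER BY TestWriteId;
/*
TestWriteId  Field1  Field2    Field3   Field4  Field5   Field6
1            I       am        writing  SQL     Code.    NULL
2            SQL     keywords  include  join,   except,  where.
*/
```

Because the columns are fixed in the query rather than inferred from the first row, shorter sentences simply get NULL in the trailing fields.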

If the request were only what the question initially states (i.e. "I am looking for a query to find nth value in a list"), that would be trivial, even with the "one or more spaces" delimiter requirement. It needs only a regular-expression function, RegEx_CaptureGroupCapture, which is in the Free version of SQL#. For example:

SETUP

CREATE TABLE #TestWrite
(
  TestWriteId INT PRIMARY KEY IDENTITY(1, 1),
  TextRow VARCHAR(255)
);

INSERT INTO #TestWrite (TextRow)
  SELECT 'I am     writing SQL                   Code.'
  UNION ALL
  SELECT 'SQL  keywords include   join,    except, where.';

TESTS

As you can see below, you can use either a pattern of "one or more word characters", which excludes both whitespace and punctuation (first example), or a pattern of "one or more non-whitespace characters", which includes punctuation, etc. (second example).

-- only get "word" characters:
SELECT SQL#.RegEx_CaptureGroupCapture(t.[TextRow], N'(\w+)', 1, 4, 1, NULL, 1, -1, NULL)
FROM   #TestWrite t;
/*
SQL
join
*/

-- get non-whitespace:
SELECT SQL#.RegEx_CaptureGroupCapture(t.[TextRow], N'([^\s]+)', 1, 4, 1, NULL, 1, -1, NULL)
FROM   #TestWrite t;
/*
SQL
join,
*/

Related Solutions

Sql-server – Execution plan flips Filter and Execute Scalar when using the PK, causes cast to fail

I'm sure there are dba.se and StackOverflow duplicates, but it was far faster for me to post this link:

https://feedback.azure.com/forums/908035-sql-server/suggestions/32912431-sql-server-should-not-raise-illogical-errors

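In short, the Query Optimizer is free to rewrite the plan as it sees fit. Sometimes this means a column is converted (CAST -> int) close to where it is retrieved, so that the column does not have to be carried through many later steps before the CAST is performed. When the scalar computation ends up before the Filter, the CAST runs against rows the filter would otherwise have removed, and the conversion error is raised.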

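The only reliable way to prevent the error is to guard the CAST expression itself, e.g.:

cast(CASE WHEN ISNUMERIC(ENTRY_CODE)=1 THEN ENTRY_CODE END as int)

Be aware that ISNUMERIC also returns 1 for values such as '1e5' or '1.5', which still fail a CAST to int. On SQL Server 2012 and later, TRY_CAST(ENTRY_CODE AS int) returns NULL instead of raising an error.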
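A small sketch comparing ISNUMERIC with TRY_CAST on a few sample strings. The table variable `@codes` and its values are hypothetical; only the column name ENTRY_CODE comes from the question. It assumes SQL Server 2012 or later, where TRY_CAST is available.

```sql
-- Hypothetical sample data; only the column name ENTRY_CODE comes from the question.
DECLARE @codes TABLE (ENTRY_CODE VARCHAR(20));
INSERT INTO @codes (ENTRY_CODE) VALUES ('123'), ('1e5'), ('1.5'), ('abc');

SELECT ENTRY_CODE,
       ISNUMERIC(ENTRY_CODE)       AS IsNumericFlag,  -- 1 for '123', '1e5' and '1.5'; 0 for 'abc'
       TRY_CAST(ENTRY_CODE AS INT) AS SafeInt         -- 123 for '123'; NULL for the other three
FROM   @codes;
```

Rows where IsNumericFlag is 1 but SafeInt is NULL are exactly the values that would still break the ISNUMERIC-guarded CAST.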

Sql-server – deteriorating stored procedure running times

What is the purpose of FROM part JOIN model ON 1=1? That is the same as FROM part, model, which is a Cartesian join and can result in a very large number of rows. Is that join supposed to be like that?
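A quick illustration, using two hypothetical table variables as stand-ins for part and model: because ON 1=1 is always true, every part row is paired with every model row, exactly as with CROSS JOIN.

```sql
-- Hypothetical stand-ins for the part and model tables.
DECLARE @part  TABLE (id INT);
DECLARE @model TABLE (id INT);
INSERT INTO @part  (id) VALUES (1), (2), (3);
INSERT INTO @model (id) VALUES (10), (20);

-- ON 1 = 1 never rejects a pair, so both counts are 3 x 2 = 6.
DECLARE @joinOn1Rows INT = (SELECT COUNT(*) FROM @part AS p INNER JOIN @model AS m ON 1 = 1);
DECLARE @crossRows   INT = (SELECT COUNT(*) FROM @part AS p CROSS JOIN @model AS m);

SELECT @joinOn1Rows AS JoinOn1Rows, @crossRows AS CrossJoinRows;  -- 6, 6
```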

You will likely help us help you if you provide details about the tables involved. Please "script" the definition of the tables, along with any indexes defined on those tables.

This sounds like a classic case of parameter sniffing resulting in good plan/bad plan choices for various scenarios in your data.

You may be able to get more reliable performance by making SQL Server cache different plans for different scenarios by using sp_executesql, as in the following example:

CREATE PROCEDURE [dbo].[create_grid_materials2]
(
    @partlistid bigint
    , @pid bigint
    , @masterid bigint
)
AS
BEGIN
    DECLARE @cmd NVARCHAR(MAX);

    -- @masterid and @partlistid are embedded as literals so that each combination
    -- gets its own cached plan; @pid remains a real parameter.
    SET @cmd = N'
    INSERT INTO material (partid, personid, modelID)
    SELECT
        partid = part.id
        , personid = @pid
        , modelid = model.id
    FROM part
        INNER JOIN model ON 1=1
    WHERE (
        model.masterid = ' + CONVERT(NVARCHAR(50), @masterid) + N'
            AND model.modelSetID IS NULL
            AND part.partlistid = ' + CONVERT(NVARCHAR(50), @partlistid) + N'
            AND part.partType IN (100, 120, 130)
        )
        AND NOT EXISTS (
            SELECT 1
            FROM material AS a1
            WHERE a1.partid = part.id
                AND a1.personid = @pid
                AND a1.modelid = model.id
            )';

    -- sp_executesql requires an NVARCHAR parameter list; @pid is bigint to match the procedure parameter.
    DECLARE @Params NVARCHAR(200) = N'@pid bigint';
    EXEC sys.sp_executesql @cmd
        , @Params
        , @pid = @pid;
END

Because @partlistid and @masterid are concatenated into the statement text as literals, each distinct combination produces a different query string, so the above code causes a new plan to be generated for each combination of @partlistid and @masterid.

The presumption here is that some combinations of those two variables lead to a very small number of rows, whereas other combinations lead to a very large number of rows.

Compiling a plan for each combination allows SQL Server to generate a more efficient plan for each. I've deliberately not included @pid, since you probably want to try this with a fairly small number of combinations first; adding a third variable to the mix would multiply the number of possible plans.
