SQL Server 2012 – Using LEFT or SUBSTRING with Wildcard

sql-server-2012t-sql

I am looking to retrieve information from a field, but am having troubles getting exactly what I want. The information I want has no set order and I am also looking for two different things. Here is an example of the information I am working with.

|PEvent|   PDate     |         PDescription                          |
|  3  | 2014-11-21 | Padding, 90 Series, Stage, SF550                |
|  4  | 2014-11-21 | Padding, 45 Series, 144 Stage SF90, 15 spor, AR |
|  5  | 2013-03-06 | 400 Series, 96 STAGE RE FLEX 30                 | 
|  6  | 2015-01-09 | Padding, 20 Series, 13 Stage SF 550, Housing    |

What I want to get is SF and the number and FLEX and the number that follows it, in these cases SF550, SF90, SF 550 FLEX 30. I would also like to get FLEX and the number that follows it. The problem I am having is that there are so many differences in the data. I cannot say 10 spaces from the left since it is not always 10 spaces. I have the same problem from the right. The other issue I have is that sometimes SF and FLEX will be right against the number where other times there will be a space in between them.

I have created a SQL Fiddle sample database with more records for testing.

SQL Fiddle

Best Answer

The following assumptions have been made (some of them possibly repeating parts of your description):

SF/FLEX is always followed either by a number immediately or by a space character and then a number.
There is always one space character before SF/FLEX, unless the item is at the beginning of the string.
There is always a comma or a space after the number that follows SF/FLEX, unless the item is at the end of the string.

With those in mind, here is my approach:

SELECT
  t.*,
  Item = LEFT(item.S, CHARINDEX(' ', item.S + ' ', LEN(prefix.S) + spaces.N + 1))
FROM
  dbo.testdesc AS t
  CROSS JOIN (VALUES ('SF'), ('FLEX')) AS prefix (S)
  CROSS JOIN (VALUES (0), (1)) AS spaces (N)
  CROSS APPLY (SELECT NULLIF(PATINDEX('% ' + prefix.S + SPACE(spaces.N) + '[0-9]%', ' ' + t.PDescription), 0)) AS startpos (N)
  CROSS APPLY (SELECT CHARINDEX(',', t.PDescription + ',', startpos.N + LEN(prefix.S) + spaces.N)) AS endpos (N)
  CROSS APPLY (SELECT SUBSTRING(t.PDescription, startpos.N, endpos.N - startpos.N)) AS item (S)
WHERE
  startpos.N IS NOT NULL
;

This is the output of the above query:

PEvent  PDate                    PDescription                                               Item
------  -----------------------  ---------------------------------------------------------  -------
3       2014-11-21 00:00:00.000  Padding, 90 Series, Stage, SF550                           SF550
4       2014-11-21 00:00:00.000  Padding, 45 Series, 144 Stage SF90, 15 spor, AR            SF90
8       2013-06-14 00:00:00.000  Padding, 900 Series, 140 Stage SF1100, 1/16, Money, TS XB  SF1100
10      2014-12-30 00:00:00.000  Padding, 444 Series, 47 Stage SF330, 5/9, house            SF330
6       2015-01-09 00:00:00.000  Padding, 20 Series, 13 Stage SF 550, Housing               SF 550
5       2013-03-06 00:00:00.000  400 Series, 96 STAGE RE FLEX 30                            FLEX 30
7       2013-06-14 00:00:00.000  Padding, FLEX 10 400 Series, 11 Stage, BYHMP               FLEX 10
9       2015-02-27 00:00:00.000  Flex 10 40 87 stage                                        Flex 10

And to explain some of the trickery employed by the query, this is how it works:

The original row set is cross joined with two inline views, one that represents the required prefixes (prefix) and one that specifies how many spaces there may be between the prefix and the subsequent number (spaces). This way the query will probe for all combinations of prefixes and numbers of delimiting spaces (SF, SF , FLEX, FLEX ).
The PDescription string is searched for each combination, one after one, to calculate the starting position of the prefix. It uses PATINDEX, which, for each prefix and space number, builds its own search pattern. As per the assumptions above, PATINDEX expects a space before the prefix and at least one digit after the prefix and, possibly, a space.

So, if, for instance, the current prefix is SF and the current expected number of spaces after it is 1, the pattern would be % SF [0-9]%. To make sure there is a space before the prefix, a space is added at the beginning of PDescription.

That trick accomplishes two things: it helps to find the prefix if it is at the beginning of the string and it makes the resulting starting position accurate. The latter is important because technically the returned position would match not the prefix itself but the preceding space. So, to get the actual position of the prefix, we would need to increment the result by 1. However, because we had added a space at the beginning of the source, all the positions were shifted forward already. So, in the end, the result points exactly at the beginning of the prefix.

One final touch here is that if the returned position is 0, it is transformed into NULL by NULLIF. That helps to avoid errors about negative length values passed to SUBSTRING later.
The found position is then used in CHARINDEX to find the first comma in PDescription found after that position. Again, to make sure there is a comma, the character is appended to PDescription. The result constitutes the ending position of the sought item.
Using the obtained starting and ending positions, the item is extracted from PDescription using the SUBSTRING function.
Finally, in the SELECT clause, before actually returning the item, the query additionally checks if there is a space after the number, just in case the item was not properly delimited by a comma. By appending a space to the item, it uses the same trick as when searching for the comma position, also taking into account the length of the prefix and the expected number of spaces after the prefix to make sure the found space character will be the one after the number.
In cases where the prefix is not found in the string, Item returns NULL. Such rows, however, are excluded from the output by the WHERE clause, which specifies that the starting position be not NULL.

I am not sure if the above description is clear enough but hopefully it will help together with the code itself.

Related Solutions

SQL Server 2012 – Using Charindex to Extract Specific Data from a String

I use charIndex and patIndex to resolve it.

CASE WHEN PatIndex('%, [0-9][0-9]% Stage%',Descr) > 0 
    AND LEN(SUBSTRING(Descr,1,CharIndex(' Stage', Descr)-1))>=3
    THEN RIGHT(SUBSTRING(Descr,1,CharIndex(' Stage', Descr)-1),3) 
    ELSE NULL END
    AS NewNo,   
CASE WHEN PatIndex('%, [0-9][0-9]% Stage%',Descr) > 0 
    AND LEN(SUBSTRING(Descr,1,CharIndex(' Stage', Descr)-1))>=3
    THEN SUBSTRING(Descr, PatIndex('%, [0-9][0-9]% Stage%',Descr)+2,3)
    ELSE NULL END
    AS NewNo2

With the output:

NewNo NewNo2
----- ------
 98   98 
111   111
125   125
134   134

Some explanation on it: I used to CharIndex(' Stage', Descr)to find the position when starts ' Stage'.
Then I use SUBSTRING(Descr,1,CharIndex(' Stage', Descr)-1) to cut the text so that your number will be in the right part. (like this :Playhouse, Virginia Series, 98).Then you can use different technique (right,another charIndex,reverse) to obtain the Number.

SQL Server Primary Key – Creating Unique Identifier Field with Conditional

You could use a key table to store the incrementing part of your second ID column. This solution does not rely on any client-side code, and is automatically multi-year aware; when the @DateAdded parameter passes in a previously unused year, it will automatically begin using a new set of values for the ID2 column, based on that year. If the proc is consequently used to insert rows from prior years, those rows will be inserted with "correct" values for the increment. The GetNextID() proc is geared to handle possible deadlocks gracefully, only passing an error to the caller if 5 sequential deadlocks occur when trying to update the tblIDs table.

Create a table to store one row per year containing the currently-used ID value, along with a stored procedure to return the new value to use:

CREATE TABLE [dbo].[tblIDs]
(
    IDName nvarchar(255) NOT NULL,
    LastID int NULL,
    CONSTRAINT [PK_tblIDs] PRIMARY KEY CLUSTERED 
    (
        [IDName] ASC
    ) WITH 
    (
        PAD_INDEX = OFF
        , STATISTICS_NORECOMPUTE = OFF
        , IGNORE_DUP_KEY = OFF
        , ALLOW_ROW_LOCKS = ON
        , ALLOW_PAGE_LOCKS = ON
        , FILLFACTOR = 100
    ) 
);
GO

CREATE PROCEDURE [dbo].[GetNextID](
    @IDName nvarchar(255)
)
AS
BEGIN
    /*
        Description:    Increments and returns the LastID value from
                        tblIDs for a given IDName
        Author:         Max Vernon / Mike Defehr
        Date:           2012-07-19

    */
    SET NOCOUNT ON;

    DECLARE @Retry int;
    DECLARE @EN int, @ES int, @ET int;
    SET @Retry = 5;
    DECLARE @NewID int;
    WHILE @Retry > 0
    BEGIN
        SET @NewID = NULL;
        BEGIN TRY
            UPDATE dbo.tblIDs 
            SET @NewID = LastID = LastID + 1 
            WHERE IDName = @IDName;

            IF @NewID IS NULL
            BEGIN
                SET @NewID = 1;
                INSERT INTO tblIDs (IDName, LastID) 
                VALUES (@IDName, @NewID);
            END
            SET @Retry = -2; /* no need to retry since the 
                                  operation completed */
        END TRY
        BEGIN CATCH
            IF (ERROR_NUMBER() = 1205) /* DEADLOCK */
                SET @Retry = @Retry - 1;
            ELSE
                BEGIN
                SET @Retry = -1;
                SET @EN = ERROR_NUMBER();
                SET @ES = ERROR_SEVERITY();
                SET @ET = ERROR_STATE()
                RAISERROR (@EN,@ES,@ET);
                END
        END CATCH
    END
    IF @Retry = 0 /* must have deadlock'd 5 times. */
    BEGIN
        SET @EN = 1205;
        SET @ES = 13;
        SET @ET = 1
        RAISERROR (@EN,@ES,@ET);
    END
    ELSE
        SELECT @NewID AS NewID;
END
GO

Your table, along with a proc to insert rows into it:

CREATE TABLE dbo.Cond
(
    CondID INT NOT NULL
        CONSTRAINT PK_Cond
        PRIMARY KEY CLUSTERED
        IDENTITY(1,1)
    , CondID2 VARCHAR(30) NOT NULL
    , Date_Added DATE NOT NULL
);

GO
CREATE PROCEDURE dbo.InsertCond
(
    @DateAdded DATE
)
AS
BEGIN
    DECLARE @NextID INT;
    DECLARE @Year INT;
    DECLARE @IDName NVARCHAR(255);
    SET @Year = DATEPART(YEAR, @DateAdded);
    DECLARE @Res TABLE
    (
        NextID INT NOT NULL
    );
    SET @IDName = 'Cond_' + CONVERT(VARCHAR(30), @Year, 0);
    INSERT INTO @Res (NextID)
    EXEC dbo.GetNextID @IDName;

    INSERT INTO dbo.Cond (CondID2, Date_Added)
    SELECT CONVERT(VARCHAR(30), NextID) + '/' + 
        SUBSTRING(CONVERT(VARCHAR(30), @Year), 3, 2), @DateAdded
    FROM @Res;
END
GO

Insert some sample data:

EXEC dbo.InsertCond @DateAdded = '2015-12-30';
EXEC dbo.InsertCond @DateAdded = '2015-12-31';
EXEC dbo.InsertCond @DateAdded = '2016-01-01';
EXEC dbo.InsertCond @DateAdded = '2016-01-02';

Show both tables:

SELECT *
FROM dbo.Cond;

SELECT *
FROM dbo.tblIDs;

Results:

The key table and stored proc come from this question.

Best Answer

Related Solutions

SQL Server 2012 – Using Charindex to Extract Specific Data from a String

SQL Server Primary Key – Creating Unique Identifier Field with Conditional

Related Question