SQL Server – Check if a String Contains a Number Exclusively

pattern matchingsql serversql-server-2005substringt-sql

I have been trying to write a function to check if a string contains a number without that number being part of a bigger number (in other words if the number being searched for is '6' and the string is '7+16+2' it should return false, because the '6' in this string is part of the number '16')

I wrote the function below (It is lengthy but I was going to test it first before refactoring it)

Upon testing I found a bug whereby it only runs the first instance of a found number through the logic. So running this function with '6' against '16+7+9+6' would return false because it determines that the firrst '6' is part of a bigger number and stops processing.

I figured that to get around this I have to implement a loop to shorten the 'haystack' string (so that, using the example '16+7+9+6' the function continues checking '+7+9+6' after eliminating the first '6') but before spending the time to make an already convoluted function even more convoluted I wanted to check if there is a simpler way to acheive the same goal?

drop function dbo.runners_contain_runner
go
create function dbo.runners_contain_runner(@runner varchar(max), @runners varchar(max))
returns int
as 
begin

    /*
    eliminate the plus sign from @runners so that the 
    'isnumeric' function doesn't return false positives (it returns 1 for '+') 
    */
    set @runners = replace(@runners,'+','_' ) 


    declare @ret int;
    set @ret = 0;

    -- if the runner is the only runner return 1
    if @runners = @runner 
        set @ret = 1
    else
    begin
        declare @charindex int;
        set @charindex = charindex(@runner,@runners)
        if @charindex > 0
        begin

            -- if it is at the beginning then check the char after it 
            if @charindex = 1 
            begin
                if isnumeric(substring(@runners,@charindex + len(@runner),1)) = 0
                    set @ret = @charindex
            end

            -- if it is at the end then check the char before it
            else if @charindex = len(@runners) - (len(@runner) - 1) 
            begin
                if isnumeric(substring(@runners,@charindex - 1,1)) = 0
                    set @ret = @charindex
            end

            -- if it is in the middle check the chars either side of it
            else
            begin
                if isnumeric(substring(@runners,@charindex - 1,1)) + 
                isnumeric(substring(@runners,@charindex + len(@runner),1)) = 0
                    set @ret = @charindex 
            end
        end
    end
    return @ret
end

Best Answer

Perhaps you are over-complicating this by focusing too much on wanting a number. Take a step back for a moment. What you actually want is a substring without any digits on either side of it. The only way a number could be part of a larger number is to have at least 1 digit on either side of it, right? So as long as you only pass in numbers, then this definition should still produce numbers that do not have any digits on either side.

With that in mind, we just need 3 PATINDEX predicates to cover the passed-in value being on the far left, on the far right, or in the middle. Try the following as it seems to work:

GO
CREATE PROCEDURE #TestFindRunner
(
  @Runner VARCHAR(10)
)
AS
SET NOCOUNT ON;

DECLARE @Data TABLE
(
  [ID] INT NOT NULL PRIMARY KEY,
  [Runners] VARCHAR(50) NULL
 );

INSERT INTO @Data ([ID], [Runners]) VALUES (1, '16+7+9+6');
INSERT INTO @Data ([ID], [Runners]) VALUES (2, '16+7+9+5');
INSERT INTO @Data ([ID], [Runners]) VALUES (3, '26+77+9+5');
INSERT INTO @Data ([ID], [Runners]) VALUES (4, '6+3+45');
INSERT INTO @Data ([ID], [Runners]) VALUES (5, '63,808,111,92');
INSERT INTO @Data ([ID], [Runners]) VALUES (6, '1-7-9,6');
INSERT INTO @Data ([ID], [Runners]) VALUES (7, '1-6-9,7');
INSERT INTO @Data ([ID], [Runners]) VALUES (8, '1-7-9,63');
INSERT INTO @Data ([ID], [Runners]) VALUES (9, '1-63-9,7');
INSERT INTO @Data ([ID], [Runners]) VALUES (10, NULL);
INSERT INTO @Data ([ID], [Runners]) VALUES (11, '6');

SELECT tmp.*
FROM   @Data tmp
WHERE  @Runner COLLATE Latin1_General_100_BIN2 = tmp.[Runners]
OR     PATINDEX('%[^0123456789]' + @Runner COLLATE Latin1_General_100_BIN2,
                tmp.[Runners]) > 0
OR     PATINDEX(@Runner + '[^0123456789]%' COLLATE Latin1_General_100_BIN2,
                tmp.[Runners]) > 0
OR     PATINDEX('%[^0123456789]' + @Runner + '[^0123456789]%'
                COLLATE Latin1_General_100_BIN2, tmp.[Runners]) > 0
GO

And then test with:

EXEC #TestFindRunner 0;
EXEC #TestFindRunner 2;
EXEC #TestFindRunner 4;
EXEC #TestFindRunner 8;
EXEC #TestFindRunner 11;
-- 0 rows

EXEC #TestFindRunner 3;   -- 4
EXEC #TestFindRunner 77;  -- 3
EXEC #TestFindRunner 111; -- 5
-- 1 row

EXEC #TestFindRunner 5; -- 2 and 3
-- 2 rows

EXEC #TestFindRunner 1; -- 6, 7, 8, and 9
-- 4 rows

EXEC #TestFindRunner 6; -- 1, 4, 6, 7, and 11
-- 5 rows

EXEC #TestFindRunner 7; -- 1, 2, 6, 7, 8, and 9
-- 6 rows

EXEC #TestFindRunner 9; -- 1, 2, 3, 6, 7, 8, and 9
-- 7 rows

The reason for having 3 variations of the PATINDEX is that PATINDEX search patterns are not Regular Expressions (RegeEx), contrary to what many people say / think (same with LIKE patterns). PATINDEX and LIKE patterns do not have quantifiers, so it is not possible to specify that the [^0123456789] single character replacement should be "0 or more"; it is "one and only one; no more, no less".

Forcing the binary collation (i.e. the COLLATE Latin1_General_100_BIN2 after each @Runner reference) ensures that we are only dealing with these 10 decimal digits and not any other character that might be considered equivalent


To put the above logic into an Inline Table-Valued Function (TVF) so that it is easier to use (and more efficient than a similarly easy-to-use scalar UDF), try the following:

USE [tempdb];
GO
CREATE FUNCTION dbo.IsRunnerPresent
(
  @Runner VARCHAR(10),
  @Runners VARCHAR(8000)
)
RETURNS TABLE
WITH SCHEMABINDING
AS RETURN

  SELECT CONVERT(BIT,
    CASE WHEN @Runner COLLATE Latin1_General_100_BIN2 = @Runners
           OR PATINDEX('%[^0123456789]' + @Runner
                       COLLATE Latin1_General_100_BIN2, @Runners) > 0
           OR PATINDEX(@Runner + '[^0123456789]%'
                       COLLATE Latin1_General_100_BIN2, @Runners) > 0
           OR PATINDEX('%[^0123456789]' + @Runner + '[^0123456789]%'
                       COLLATE Latin1_General_100_BIN2, @Runners) > 0
           THEN 1
         ELSE 0
      END) AS [RunnerFound];
GO

And then test with:

DECLARE @Runner VARCHAR(10);
SET @Runner = '6';


DECLARE @Data TABLE
(
  [ID] INT NOT NULL PRIMARY KEY,
  [Runners] VARCHAR(50) NULL
 );

INSERT INTO @Data ([ID], [Runners]) VALUES (1, '16+7+9+6');
INSERT INTO @Data ([ID], [Runners]) VALUES (2, '16+7+9+5');
INSERT INTO @Data ([ID], [Runners]) VALUES (3, '26+77+9+5');
INSERT INTO @Data ([ID], [Runners]) VALUES (4, '6+3+45');
INSERT INTO @Data ([ID], [Runners]) VALUES (5, '63,808,111,92');
INSERT INTO @Data ([ID], [Runners]) VALUES (6, '1-7-9,6');
INSERT INTO @Data ([ID], [Runners]) VALUES (7, '1-6-9,7');
INSERT INTO @Data ([ID], [Runners]) VALUES (8, '1-7-9,63');
INSERT INTO @Data ([ID], [Runners]) VALUES (9, '1-63-9,7');
INSERT INTO @Data ([ID], [Runners]) VALUES (10, NULL);
INSERT INTO @Data ([ID], [Runners]) VALUES (11, '6');

SELECT tmp.[ID],
       tmp.[Runners],
       fnd.[RunnerFound]
FROM   @Data tmp
CROSS APPLY dbo.IsRunnerPresentTVF(@Runner, tmp.[Runners]) fnd;

Which returns:

ID    Runners          RunnerFound
1     16+7+9+6         1
2     16+7+9+5         0
3     26+77+9+5        0
4     6+3+45           1
5     63,808,111,92    0
6     1-7-9,6          1
7     1-6-9,7          1
8     1-7-9,63         0
9     1-63-9,7         0
10    NULL             0
11    6                1