Sql-server – How to convert varchar iPv4 Address into a number deterministically

sql server

What are the various ways to convert a varchar iPv4 Address into a number without using PARSENAME() (non-deterministic) function in SQL Server, in order to create a persisted calculated column?

I used the string manipulation technique from Extract part of string based on nth instance of character to create the following:

Query

select 
((256*256*256)*convert(bigint, ltrim(rtrim(substring(replace('255.255.255.255','.',replicate(' ',8)),1, 9)))))
+
((256*256)*convert(bigint,ltrim(rtrim(substring(replace('255.255.255.255','.',replicate(' ',8)),10, 9)))))
+
((256)*convert(bigint,ltrim(rtrim(substring(replace('255.255.255.255','.',replicate(' ',8)),19, 9)))))
+
(convert(bigint,ltrim(rtrim(substring(replace('255.255.255.255','.',replicate(' ',8)),28, 9)))))

Here is a visual of what the string manipulation looks like before each segment is converted to BIGINT and multiplied:
IP String Manipulation

… This works to create a deterministic calculated column that can be persisted. What other more efficient techniques are there out there?

Best Answer

One option is to use SQLCLR. It is rather easy to create a method to split the string on the periods, convert each to an Int64, and multiply by the appropriate power of 256. Be sure to set IsDeterministic=true in the SqlFunction attribute, which will not only allow it to be persisted, but will also allow it to participate in parallel plans :-). And in fact, the SQL# SQLCLR library (that I created) contains functions (which are in the Free version) to do this conversion: INET_AddressToNumber and INET_NumberToAddress.

Test Setup:

CREATE TABLE #TempAddresses (IPAddress VARCHAR(15) NULL);

-- populate 1 million rows, with a NULL every 1000 rows
;WITH cte AS
(
  SELECT  TOP (1000000) ac1.column_id,
          ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS [RowNum]
  FROM    master.sys.all_columns ac1
  CROSS APPLY  master.sys.all_columns ac2
)
INSERT INTO #TempAddresses ([IPAddress])
SELECT CASE cte.[RowNum] % 1000
         WHEN 0 THEN NULL
         ELSE CONVERT(VARCHAR(3), CONVERT(INT, CRYPT_GEN_RANDOM(1)))
              + '.' +
              CONVERT(VARCHAR(3), CONVERT(INT, CRYPT_GEN_RANDOM(1)))
              + '.' +
              CONVERT(VARCHAR(3), CONVERT(INT, CRYPT_GEN_RANDOM(1)))
              + '.' +
              CONVERT(VARCHAR(3), CONVERT(INT, CRYPT_GEN_RANDOM(1)))
       END
FROM   cte

Regarding the following three tests, please note:

  • They are shown in order of fastest to slowest.
  • I ran each SELECT several times and kept the best time for each, not average time.

SQLCLR Test and Results:

DECLARE @Test1 BIGINT;
SET STATISTICS TIME ON;
SELECT @Test1 =  -- Comment out "@Test1 = " to test returning the value
       SQL#.INET_AddressToNumber(CONVERT(NVARCHAR(15), tmp.[IPAddress]))
FROM   #TempAddresses tmp;
SET STATISTICS TIME OFF;

-- CPU time = 2454 ms,  elapsed time = 5133 ms.
-- CPU time = 1594 ms,  elapsed time = 1617 ms. (into variable)

MartinSmith's Test and Results:

DECLARE @Test3 BIGINT;
SET STATISTICS TIME ON;
SELECT @Test3 =  -- Comment out "@Test3 = " to test returning the value
         256 * 256 * 256 * CAST(FLOOR(LEFT(tmp.[IPAddress],3)) AS BIGINT)
       +       256 * 256 * CAST(FLOOR(SUBSTRING(tmp.[IPAddress],1 + CHARINDEX('.', tmp.[IPAddress]),3)) AS BIGINT)
       +             256 * SUBSTRING(REPLACE(tmp.[IPAddress],'.',SPACE(8)),19, 9)
       +                   RIGHT(tmp.[IPAddress], -1 + CHARINDEX('.', REVERSE(tmp.[IPAddress])))
FROM   #TempAddresses tmp;
SET STATISTICS TIME OFF;

-- CPU time = 2781 ms,  elapsed time = 5151 ms.
-- CPU time = 2156 ms,  elapsed time = 2156 ms. (into variable)

O.P.'s Test and Results:

DECLARE @Test2 BIGINT;
SET STATISTICS TIME ON;
SELECT @Test2 =  -- Comment out "@Test2 = " to test returning the value
       ((256*256*256) * CONVERT(BIGINT, LTRIM(RTRIM(SUBSTRING(REPLACE(tmp.[IPAddress],'.',REPLICATE(' ',8)),1, 9)))))
       + ((256*256)*CONVERT(BIGINT,LTRIM(RTRIM(SUBSTRING(REPLACE(tmp.[IPAddress],'.',REPLICATE(' ',8)),10, 9)))))
       + ((256)*CONVERT(BIGINT,LTRIM(RTRIM(SUBSTRING(REPLACE(tmp.[IPAddress],'.',REPLICATE(' ',8)),19, 9)))))
       + (CONVERT(BIGINT,LTRIM(RTRIM(SUBSTRING(REPLACE(tmp.[IPAddress],'.',REPLICATE(' ',8)),28, 9)))))
FROM   #TempAddresses tmp;
SET STATISTICS TIME OFF;

-- CPU time = 3531 ms,  elapsed time = 5249 ms.
-- CPU time = 2515 ms,  elapsed time = 2534 ms. (into variable)

One thing to keep in mind regarding the SQLCLR option: using a SQLCLR function in a computed column is a dependency that will prevent the Assembly containing the function from being dropped. If you are able to use ALTER ASSEMBLY, then you won't being dropping and re-creating the Assembly. But ALTER ASSEMBLY is only allowed if there are no new or removed methods, and if the signatures of existing methods are still the same. If any of those changes are being made, then you will have to drop the Assembly and then create it again. And that will require that you first drop the computed column that is referencing the SQLCLR function contained in that Assembly. This might make the SQLCLR option less desirable in the use case of placing a function in a computed column, whereas if the function was in a query, Stored Procedure, View, etc, then this wouldn't be a concern.

It is possible to wrap the SQLCLR function in a T-SQL function and reference the T-SQL function in the computed column, but then you lose the benefit of SQLCLR functions being able to participate a parallel execution plan.