I am going to create a table with two fields – ID
as BIGINT
and IPAddress
as either varchar(45)
or varbinary(16)
. The idea is to store all unique IP addresses and use a reference ID
instead the actual IP address
in other tables.
Generally, I am going to create a stored procedure that is returning the ID
for given IP address
or (if the address was not found) insert the address and return the generated ID
.
I am expecting to have many records (I cannot tell exactly how many), but I need the stored procedure above to be executed as fast as possible. So, I am wondering how to stored the actual IP address – in text or bytes format. Which is going to be better?
I have already written SQL CLR
functions for transforming IP address bytes to string and the reverse, so transformation is not an issue (working with both IPv4
and IPv6
).
I guess I need to create an index to optimize the search, but I am not sure should I include the IP address
field to the clustered index, or to create a separate index and with which type the search will be faster?
Best Answer
Since "text" here refers to
VARCHAR(45)
and "bytes" refers toVARBINARY(16)
, I would say: neither.Given the following information (from Wikipedia article on IPv6):
I would start by using 8
VARBINARY(2)
fields to represent the 8 groups. The fields for Groups 5 - 8 should beNULL
as they will only be used for IPv6 addresses. The fields for Groups 1 - 4 should beNOT NULL
as they will be used for both IPv4 and IPv6 addresses.By keeping each group independent (as opposed to combining them into either a
VARCHAR(45)
or aVARBINARY(16)
or even twoBIGINT
fields) you get two main benefits:IF
/IIF
/CASE
statements to facilitate this.ROW COMPRESSION
orPAGE COMPRESSION
. Since both types of COMPRESSION will allow for fields that are0x00
to take up 0 bytes, all of those groups of zeroes will now not cost you anything. On the other hand, if you stored the example address from above (in the Wikipedia quote), then the 3 sets of all zeroes in the middle would take up their full amount of space (unless you were doing theVARCHAR(45)
and went with the reduced notation, but that might not work well for indexing and would require special parsing to reconstruct it to the full format, so let's assume that is not an option ;-).IF you need to capture the Network, create a
TINYINT
field for that called, um,[Network]
:-)For more info on the Network value, here is some info from another Wikipedia article on the IPv6 address:
For indexing, I would say create a Non-Clustered index on the 8 Group fields, and possibly the Network field if you decide to include that.
End result should be something like the following:
Notes:
BIGINT
for the ID field, but do you really expect to capture more than 4,294,967,295 unique values? If so then just change the field to be BIGINT and you can then even change the seed value to be 0. But otherwise you are better off using INT and starting with the minimum value so that you can make use of the entire range of that datatype.SELECT *
will return the fields in the expected order. But the index has them going up, from 1 to 8, as that is how they are filled out.An example (unfinished) of a computed column to represent the values in text form is:
Test:
Result: