Sql-server – TSQL performance – JOIN on value BETWEEN min and max

performancequery-performancesql-server-2008t-sql

I have two tables in which I store:

  • an IP range – country lookup table
  • a list of requests coming from different IPs

The IPs were stored as bigints to improve lookup performance.

This is the table structure:

create table [dbo].[ip2country](
    [begin_ip] [varchar](15) NOT NULL,
    [end_ip] [varchar](15) NOT NULL,
    [begin_num] [bigint] NOT NULL,
    [end_num] [bigint] NOT NULL,
    [IDCountry] [int] NULL,
    constraint [PK_ip2country] PRIMARY KEY CLUSTERED 
    (
        [begin_num] ASC,
        [end_num] ASC
    )
)

create table Request(
    Id int identity primary key, 
    [Date] datetime, 
    IP bigint, 
    CategoryId int
)

I want to get the request breakdown per country, so I perform the following query:

select 
    ic.IDCountry,
    count(r.Id) as CountryCount
from Request r
left join ip2country ic 
  on r.IP between ic.begin_num and ic.end_num
where r.CategoryId = 1
group by ic.IDCountry

I have a lot of records in the tables: about 200,000 in IP2Country and a few millions in Request, so the query takes a while.

Looking at the execution plan, the most expensive part is a Clustered Index Seek on index PK_IP2Country, which is executed many times (the number of rows in Request).

Also, something that I feel a little strange about is the left join ip2country ic on r.IP between ic.begin_num and ic.end_num part (don't know if there's a better way to perform the lookup).

The table structure, some sample data and query are available in SQLFiddle: http://www.sqlfiddle.com/#!3/a463e/3 (unfortunately I don't think I can insert many records to reproduce the problem, but this hopefully gives an idea).

I'm (obviously) not an expert in SQL performance/optimizations, so my question is: Are there any obvious ways in which this structure/query can be improved performance-wise that I am missing?

Best Answer

You need an additional index. In your Fiddle example I added:

CREATE UNIQUE INDEX ix_IP ON Request(CategoryID, IP)

Which covers you for the request table and gets an index seek instead of a clustered index scan.

See how that improves it and let me know. I'm guessing it'll help quite a bit since the scan on that index is I'm sure not cheap.