Sql-server – Over 99 million records with geography column & spatial index – VERY VERY VERY slow queries!

indexspatialsql server

I’ve been working with SQL Server for a long time but have only recently encountered the need to work with spatial data. I have come into an environment that uses it pretty heavily and my first real challenge is to get queries to run faster (and stop timing out) against a table with a geography data type column (and index).

We use a query that identifies all geography points in this table that are found within polygons and multi polygons. There are over 99 million records in this table and I have NO friggin’ idea how to performance tune this beast!
I have identified the clustered index as being a bit larger than necessary and intend to add an identity column to do two things: 1) Reduce the size of the clustered index. 2) Eliminate page splitting for inserts. Although I expect to get some relief from doing this, I am not optimistic that it will help the spatial queries very much.

Given my almost complete lack of knowledge/experience with spatial data, I am
cannot make this better.

Example query:

    Declare @OrgID int
    Declare @Geog geography

    Set @OrgID =100011

    /* This will return a multi polygon */
    SELECT @Geog = geog 
    FROM Organization
    WHERE orgid= @orgid

    Select count(*)
    FROM ProblemChild WITH (INDEX(IDX_geog))  
    WHERE Geog.STIntersects(@geog) = 1) 

    Table Design:
    CREATE TABLE [dbo].[ProblemChild](
        [Phone] [char](10) NOT NULL,
        [Lat] [float] NOT NULL,
        [Lon] [float] NOT NULL,
        [Geog] [geography] NOT NULL,
        [Recordsource] [varchar](2) NOT NULL
     CONSTRAINT [PK_Phone] PRIMARY KEY CLUSTERED 
    ([Phone] ASC)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]
    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
    GO

Index design:

    CREATE SPATIAL INDEX [IDX_geog] ON [dbo].[[ProblemChild]]
    ([Geog]
    )USING  GEOGRAPHY_GRID 
    WITH (GRIDS =(LEVEL_1 = HIGH,LEVEL_2 = HIGH,LEVEL_3 = HIGH,LEVEL_4 = MEDIUM), 
    CELLS_PER_OBJECT = 20, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    GO

With:

    Number of records:  99,155,267

    MinLat      MinLon      MaxLat      MaxLon
    18.957356   -166.512394 71.292528   -66.967883

Best Answer

Look at the estimated query plan, and make sure the index is being used.

Also, the complexity of your MultiPolygon could be a significant factor. If you imagine your index as a series of grids over your MultiPolygon, there will be grids that either completely covered by your MultiPolygon or completely not covered by your MultiPolygon. Your ProblemChild points that fall into these grids are easy. When the grids are only partly covered by your MultiPolygon, it'll drill into the next level in and try the same.

When there are no more grid levels to drill into, the complexity really kicks in. If you have a particularly crinkly line, and you need to figure out which side of that line a particular point is, you need to check a lot of line segments and do maths around those. If you can simplify your MultiPolygon a lot, it'll speed up your query significantly. You can do this using the .Reduce() method, but then you need to be careful about correctness, in case you have points near the borders that would be on the other side of the line if were simplified.

By simplifying, I mean reducing it. Imagine an octagon, with eight points. If that were expressed using six points, or a four-point square, then the shape becomes different and something near a corner may have moved inside or outside as it the number of points reduced.

Edit: I've blogged about this topic at http://sqlblog.com/blogs/rob_farley/archive/2015/04/29/tuning-slow-spatial-queries-in-sql-server.aspx