Sql-server – SQL Server spatial filter with additional conditions performance issue

performancequery-performancespatialsql server

I have a table Event that has 30 million rows and a table Location that has 35 million rows in SQL Server. An event can have multiple locations.

I have a spatial index, geometry auto grid (cells per object: 4000), on the Location table. The bounding box of all the locations covers and area of about 500 000 km2.

The performance is fine when only using STContains in the WHERE clause. If I try to filter on a additional column(s) the performance takes a big hit.

The following query takes 171ms:

SELECT TOP 2000 [E].*
FROM [dbo].[Event] AS [E]
INNER JOIN [dbo].[Location] AS [L] ON [E].[Id] = [L].[EventId]
WHERE 
    --[E].[Price] >= @Price
    --AND
    @BoundingBox.STContains([L].[Position]) = 1

If I add the filter for Price, the same query takes 10 seconds to run. If I remove the spatial index the query takes 500ms. Depending on the size of the bounding box and the value of price the query takes either like second or less or anywhere between three to thirty seconds. Only filtering on the spatial column gives good performance for all bounding box sizes.

Every article I've found about spatial queries in SQL Server never says anything about additional column(s) filters. They only describes the spatial filter and how to tune performance for the spatial index/query and it this case the spatial filter works fine with different sized bounding boxes. The problem is the additional filter columns I need.

Is it possible to get performance from a query with a spatial filter and with additional column(s) filters?

EDIT

95% of the price values are below 5000. So filtering on a high value will only return around 100 rows. I've tried with a bounding box that is 24000 sq km and contains around 4 million events. So the issue might be with the distribution of the price values. I have a second column Date which has a better distribution of values and filtering on that column is fast as long as we don't get to close the the max date.

SELECT TOP 2000 [E].*
FROM [dbo].[Event] AS [E]
INNER JOIN [dbo].[Location] AS [L] ON [E].[Id] = [L].[EventId]
WHERE 
    [E].[Date] >= @Date
    AND
    @BoundingBox.STContains([L].[Position]) = 1

Also the geometry have srid 4326 if that makes any difference.

Best Answer

The answer is yes it is possible to get performance with filters over multiple columns including spatial columns.

I've tried to replicate your tables and closely as I can to your brief description using RAND to create prices, location points and link them to events. But I can't duplicate you performance problem. Have I got the basic table details right?

--  Event table
SELECT TOP 30000000 
    N EventID, 
    RAND(CAST(NEWID() AS VARBINARY)) * 10000 + 5000 Price
    into Event
FROM Tally;

ALTER TABLE Event ALTER COLUMN EventID INT NOT NULL;
GO
ALTER TABLE Event ADD CONSTRAINT Event_PK PRIMARY KEY (EventID);
GO
CREATE INDEX Event_Price_IDX ON Event(Price);
GO

-- Location Table
SELECT TOP 35000000
    N LocationID, 
    CEILING(RAND(CAST(NEWID() AS VARBINARY)) * 30000000) EventID,
    Geometry::Point(RAND(CAST(NEWID() AS VARBINARY)) * 710000 + 40000, RAND(CAST(NEWID() AS VARBINARY)) * 710000 + 40000, 0) Location
    into Location
FROM Tally;

Running the 3 versions of your query with hopefully sensible parameters

-- Test Query
DECLARE @price FLOAT = 13000.00 -- ~20% of records
-- 10 sq km query window
DECLARE @BoundingBox Geometry = Geometry::STGeomFromText('POLYGON(( 300000 300000, 310000 300000, 310000 310000, 300000 310000, 300000 300000))',0)

SELECT TOP 2000 [E].*
FROM [dbo].[Event] AS [E]
INNER JOIN [dbo].[Location] AS [L] ON [E].[EventId] = [L].[EventId]
WHERE 
    --[E].[Price] >= @Price
    --AND
    @BoundingBox.STContains([L].[Location]) = 1

SELECT TOP 2000 [E].*
FROM [dbo].[Event] AS [E]
INNER JOIN [dbo].[Location] AS [L] ON [E].[EventId] = [L].[EventId]
WHERE 
    [E].[Price] >= @Price
    --AND
    --@BoundingBox.STContains([L].[Location]) = 1

SELECT TOP 2000 [E].*
FROM [dbo].[Event] AS [E]
INNER JOIN [dbo].[Location] AS [L] ON [E].[EventId] = [L].[EventId]
WHERE 
    [E].[Price] >= @Price
    AND
    @BoundingBox.STContains([L].[Location]) = 1

I get 93ms, 4ms and 93ms respectively, removing the TOP 2000 from the queries causes the price filtered query to blow out to a minute or so.

Are you able to give more details (execution plans, DDL, etc) to help us replicate your issue. My initial thought was that the TOP 2000 was hiding the underlying problem. I think that may still be the case.