I have a table Event
that has 30 million rows and a table Location
that has 35 million rows in SQL Server. An event can have multiple locations.
I have a spatial index, geometry auto grid (cells per object: 4000), on the Location
table. The bounding box of all the locations covers and area of about 500 000 km2.
The performance is fine when only using STContains
in the WHERE
clause. If I try to filter on a additional column(s) the performance takes a big hit.
The following query takes 171ms:
SELECT TOP 2000 [E].*
FROM [dbo].[Event] AS [E]
INNER JOIN [dbo].[Location] AS [L] ON [E].[Id] = [L].[EventId]
WHERE
--[E].[Price] >= @Price
--AND
@BoundingBox.STContains([L].[Position]) = 1
If I add the filter for Price
, the same query takes 10 seconds to run. If I remove the spatial index the query takes 500ms. Depending on the size of the bounding box and the value of price the query takes either like second or less or anywhere between three to thirty seconds. Only filtering on the spatial column gives good performance for all bounding box sizes.
Every article I've found about spatial queries in SQL Server never says anything about additional column(s) filters. They only describes the spatial filter and how to tune performance for the spatial index/query and it this case the spatial filter works fine with different sized bounding boxes. The problem is the additional filter columns I need.
Is it possible to get performance from a query with a spatial filter and with additional column(s) filters?
EDIT
95% of the price values are below 5000. So filtering on a high value will only return around 100 rows. I've tried with a bounding box that is 24000 sq km and contains around 4 million events. So the issue might be with the distribution of the price values. I have a second column Date which has a better distribution of values and filtering on that column is fast as long as we don't get to close the the max date.
SELECT TOP 2000 [E].*
FROM [dbo].[Event] AS [E]
INNER JOIN [dbo].[Location] AS [L] ON [E].[Id] = [L].[EventId]
WHERE
[E].[Date] >= @Date
AND
@BoundingBox.STContains([L].[Position]) = 1
Also the geometry have srid 4326 if that makes any difference.
Best Answer
The answer is yes it is possible to get performance with filters over multiple columns including spatial columns.
I've tried to replicate your tables and closely as I can to your brief description using RAND to create prices, location points and link them to events. But I can't duplicate you performance problem. Have I got the basic table details right?
Running the 3 versions of your query with hopefully sensible parameters
I get 93ms, 4ms and 93ms respectively, removing the TOP 2000 from the queries causes the price filtered query to blow out to a minute or so.
Are you able to give more details (execution plans, DDL, etc) to help us replicate your issue. My initial thought was that the
TOP 2000
was hiding the underlying problem. I think that may still be the case.