I want to query spatial data for nearest neighbor. I am using this article and the following query works perfectly:
SELECT TOP 1 [Location].STDistance(@Location)
FROM [DS1]
WHERE [Location].STDistance(@Location) IS NOT NULL
ORDER BY [Location].STDistance(@Location);
The issue is I need to calculate this distance for each MSPID
. I have try several things (using cross apply, creating separate function, etc) but nothing worked. So, I decided to create a loop which will calculate the distance for each MSPID
.
SELECT TOP 1 [Location].STDistance(@Location)
FROM [DS1]
WHERE [Location].STDistance(@Location) IS NOT NULL
AND @CurrentMSPID = [MSPID]
ORDER BY [Location].STDistance(@Location);
The problem is that the above statement does not use the index. If I replace the @CurrentMSPID
with a number the index is used, but when a variable is used, the clustered index is used instead the spatial one.
I have try a lot of options and the only one that works is:
OPTION (OPTIMIZE FOR ( @CurrentMSPID = 1001 ))
or when the value is hard-coded. I cannot leave it that way because in the future this id
could even not exists and it is sure that the number of rows which it matches will be changed.
The [DS1]
table has primary key on [MSPID]
column and spatial index on the [Location]
column.
How can I help the engine to generate better execution plan and use the spatial index without using the OPTIMIZE FOR
option?
It seems when other data is extracted from the table (we are filtering by MSPID
now) the engine is not able to use the spatial index and performs clustered index seek instead.
Best Answer
I'm going to go out on a bit of a limb here and guess what you are trying to achieve.
I suspect that you want to find the nearest neighbour from within the DS1 table for all the rows in DS1.
For a test set I created the following randomly filled table.
I a version of your initial query for a quick test, but unfortunately on my desktop the optimiser decide that parallel was better and ignored the spatial index resulting in a 3 second query. Restricting it to a single core caused it to use the index and returned in milliseconds.
To achieve what you want for a single row from DS1 and assuming that you want the nearest row from DS1 to the one you've picked, the following uses the index for me and again works in milliseconds.
This can be made to go through the entire table, but it will take some time to complete as it needs to find the nearest neighbour for each one.
It took ~1 minute for 10,000, so for the full million that would take almost 2 hours.I have found that restricting this query to a single core isn't required and allowing parallism will improve performance. For me it halved the time. I would also recommend that you have a look at this question and it's answer for other performance considerations.