I have a large Postgres table containing geographic positions (latitude and longitude). Both fields are indexed, and both are defined as NUMERIC(9,6).
If I run a query looking for an exact position match, something like this:
WHERE latitude = 1.234567890123456789
AND longitude = 9.876543210987654321
Then I get a very fast response, but very few results, because the database is searching for a very precise match.
For my purposes, I'm looking for positions that match to within a few meters so a match to 4 or 5 decimal places should be fine. This gives me the results I'm looking for:
WHERE ABS(latitude - 1.234567890123456789) < 0.0001
AND ABS(longitude - 9.876543210987654321) < 0.0001
But NOT the performance (it can take 5 minutes to run, compared to a fraction of a second for the exact search).
Next I tried rounding the precision down:
WHERE ROUND( latitude, 4) = ROUND( 1.234567890123456789, 4)
AND ROUND( longitude,4) = ROUND( 9.876543210987654321, 4)
Again, same problem. Got the results I wanted, but took far too long.
So, my question is how can I search for a close match between two numbers, without losing performance?
UPDATE – SOLVED:
As a couple of commenters have observed, using BETWEEN seems to work fine.
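The BETWEEN version was not posted, so the following is only a sketch of what it might look like. The table name `tbl` is a placeholder; the column names and the 0.0001 tolerance come from the queries above. Unlike `ABS(latitude - x)` or `ROUND(latitude, 4)`, each column is compared bare against a constant range, which is what allows the planner to use the existing indexes on `latitude` and `longitude`.

```sql
-- Assumption: "tbl" stands in for the real table name.
-- Each column is compared directly to a constant range, so the
-- existing B-tree indexes on latitude and longitude remain usable.
SELECT *
FROM   tbl
WHERE  latitude  BETWEEN 1.234567890123456789 - 0.0001
                     AND 1.234567890123456789 + 0.0001
AND    longitude BETWEEN 9.876543210987654321 - 0.0001
                     AND 9.876543210987654321 + 0.0001;
```

Note that BETWEEN includes both bounds, while the `ABS(...) < 0.0001` version excludes them; at this tolerance the difference should rarely matter.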
Best Answer
The smart and fast solution for this class of problems is an index-backed "nearest neighbor" search.
For the record: if you want precise results with spatial data, use PostGIS and operate with geometry or geography types and ST_DWithin().
Sticking to your setup (2-D points, no PostGIS), and ignoring the additional approximation error of handling spatial data in a 2-D plane, which seems negligible for the case at hand, I suggest an SP-GiST (space-partitioned GiST) index. In your case it has to be an index on an expression, since latitude and longitude are stored as separate columns.
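A minimal sketch of such an expression index follows. The table name `tbl` and the index name are placeholders, and building the point as `point(longitude, latitude)` (x first, then y) is an assumption about argument order. Any query that should use the index has to repeat the same expression.

```sql
-- Assumptions: "tbl" and the index name are placeholders.
-- point(float8, float8) is a built-in function; the NUMERIC columns
-- are cast to double precision implicitly.
CREATE INDEX tbl_location_spgist_idx
ON     tbl
USING  spgist (point(longitude, latitude));
```

Ordering by distance (the `<->` operator) with an SP-GiST index requires PostgreSQL 12 or later.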
Why SP-GiST? See:
Perform this hours of operation query in PostgreSQL
https://www.2ndquadrant.com/en/blog/postgresql-12-implementing-k-nearest-neighbor-space-partitioned-generalized-search-tree-indexes/
To get a maximum of 10 "nearest neighbors" in next to no time:
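A sketch of such a query, assuming an SP-GiST index on the expression `point(longitude, latitude)` and the placeholder table name `tbl`. The `<->` operator returns the Euclidean distance between two points, and the `ORDER BY` expression must match the indexed expression for the index to be used.

```sql
-- Assumption: "tbl" is a placeholder table name.
-- The search position is given as a point literal '(x,y)',
-- here (longitude, latitude), matching the indexed expression.
SELECT *
FROM   tbl
ORDER  BY point(longitude, latitude)
          <-> point '(9.876543210987654321,1.234567890123456789)'
LIMIT  10;
```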
You can then filter the ones close enough. To get a maximum of 10 closest within a maximum distance:
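One way to write this, again only as a sketch: `tbl`, `sub`, and `squared_dist` are illustrative names, and 0.0001 is the tolerance from the question. The inner query fetches the 10 nearest rows through the index; the outer query keeps only those within the maximum distance.

```sql
-- Assumptions: "tbl", "sub" and "squared_dist" are illustrative names.
SELECT *
FROM  (
   SELECT *
        , (longitude - 9.876543210987654321) ^ 2
        + (latitude  - 1.234567890123456789) ^ 2 AS squared_dist
   FROM   tbl
   ORDER  BY point(longitude, latitude)
             <-> point '(9.876543210987654321,1.234567890123456789)'
   LIMIT  10
   ) sub
WHERE  squared_dist < 0.0001 ^ 2   -- compare squares, no square root needed
ORDER  BY squared_dist;
```

This filters on a circle of radius 0.0001 degrees around the position, which is slightly stricter than the square box tested by the `ABS(...)` conditions in the question.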
I use a squared distance to find the nearest neighbors, based on the Pythagorean theorem. The beauty of it: the calculation is only performed on the nearest neighbors, so it stays very fast even when the calculation gets more expensive, and even in big tables.