PostgreSQL – Finding Nearest Geo Points Across Two Tables

postgispostgresqlspatial

I have two tables, both have lat/lon columns within same country (UK). Tables sizes are roughly 80M and 50M.

Along with the lat/lon columns I have created a geo-indexes for both tables in this way:

SELECT AddGeometryColumn('my_table_50/80', 'geom', 4326, 'POINT', 2);
UPDATE my_table SET geom = ST_SetSRID(ST_MakePoint(longitude, latitude), 4326);
CREATE INDEX my_table_geom_idx ON my_table USING gist(geom);

To find the nearest points from the table with 80M respect to 50M within 0.1 miles I run something like:

   SELECT A.latitude, A.longitude, B.latitude, B.longitude,
   FROM my_table_50 AS A, my_table_80 AS B
   where ST_Distance(A.geom, B.geom) < 0.1609 -- 1 mile / 10
   ORDER BY ST_Distance(A.geom, B.geom) ASC LIMIT 1;

The query is very slow to run (pretty much is cartesian 50M X 80M).

Is there a way to speed it up?

Also, is it really useful to use "postgis geo indexing" for a problem like this?
Using "pythagorean theorem" can be enough (like in the chosen answer here https://stackoverflow.com/questions/1664799/calculating-distance-between-two-points-using-pythagorean-theorem), as I am expecting distances way far shortest than earth radius or it may lead to some error?

Best Answer

Don't use ST_Distance. It will never use an index. Instead use KNN distance with <->, and use ST_DWithin when possible.

To find the nearest points from the table with 80M respect to 50M within 0.1 miles I run something like:

To find the nearest point, you can do..

SELECT A.latitude, A.longitude, B.latitude, B.longitude,
FROM my_table_50 AS A, my_table_80 AS B
ORDER BY A.geom <-> B.geom
LIMIT 1

To find the nearest point within one mile,

SELECT A.latitude, A.longitude, B.latitude, B.longitude,
FROM my_table_50 AS A, my_table_80 AS B
WHERE ST_DWithin(A.geom, B.geom, 1609.34)
ORDER BY A.geom <-> B.geom
LIMIT 1

Also, unless your postgis is old don't ever do

SELECT AddGeometryColumn('my_table_50/80', 'geom', 4326, 'POINT', 2);

From the docs,

Changed: 2.0.0 This function no longer updates geometry_columns since geometry_columns is a view that reads from system catalogs. It by default also does not create constraints, but instead uses the built in type modifier behavior of PostgreSQL. So for example building a wgs84 POINT column with this function is now equivalent to: ALTER TABLE some_table ADD COLUMN geom geometry(Point,4326);

Make sure in the above you have a spatial/gist index on A.geom and B.geom. And, consider clustering both of them on those indexes.