PostgreSQL – How to Improve Geo Spatial Query Performance

performancepostgispostgresqlpostgresql-10postgresql-performancespatial

After given up with MySQL i have tried Elasticsearch and now wan't to see if I could go with PostgreSQL/PostGIS that could allow me to go with PostgreSQL only.

I need to fetch records from a Table by distance (must not be exactly) and sort by distance. The table has 10 million records.

As I got for my Query on PostgreSQL slower speed then on MySQL I think I must do something wrong.

What I can do better?

Table:

id | hash_id | town | geo_pt2 

geo_pt2 is geography

Index:

CREATE INDEX geo_pt2_gix ON public.member_profile USING gist (geo_pt2)

Query:

SELECT hash_id, town
     , ST_Distance(t.x, geo_pt2) AS dist
FROM   member_profile, (SELECT ST_GeographyFromText('POINT(47.4667 8.3167)')) AS t(x)
WHERE  ST_DWithin(t.x, geo_pt2, 250000)
ORDER  BY dist
limit 100 offset 1000;

Explain:

Limit  (cost=9.08..9.08 rows=1 width=53)
  ->  Sort  (cost=9.07..9.08 rows=1 width=53)
        Sort Key: (_st_distance('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography, member_profile.geo_pt2, '0'::double precision, true))
        ->  Index Scan using geo_pt2_gix on member_profile  (cost=0.42..9.06 rows=1 width=53)
              Index Cond: (geo_pt2 && '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
              Filter: (('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography && _st_expand(geo_pt2, '250000'::double precision)) AND _st_dwithin('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography, geo_pt2, '250000'::double precision, true))

I'm using PostgreSQL 10 on a modern Server with high IOPS (NVMe) and the Query need 35 sec.

After suggestions of @Evan Carroll much better performance:

EXPLAIN ANALYZE SELECT hash_id, town
     , ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
FROM   member_profile
WHERE  ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
ORDER  BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
OFFSET 10000
FETCH NEXT 100 ROWS ONLY;

Limit  (cost=9.31..18.21 rows=1 width=61) (actual time=392.608..394.138 rows=100 loops=1)
  ->  Index Scan using geo_pt2_gix on member_profile  (cost=0.42..9.31 rows=1 width=61) (actual time=26.624..392.776 rows=10100 loops=1)
        Index Cond: (geo_pt2 && '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
        Order By: (geo_pt2 <-> '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
        Filter: (('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography && _st_expand(geo_pt2, '250000'::double precision)) AND _st_dwithin('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography, geo_pt2, '250000'::double precision, true))
Planning time: 89.020 ms
Execution time: 395.039 ms

If the user paginate to end it got slow:

EXPLAIN ANALYZE SELECT hash_id, town
     , ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
FROM   member_profile
WHERE  ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
ORDER  BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
OFFSET 1000000
FETCH NEXT 100 ROWS ONLY;

Limit  (cost=9.31..18.21 rows=1 width=61) (actual time=28872.156..28873.239 rows=100 loops=1)

->  Index Scan using geo_pt2_gix on member_profile  (cost=0.42..9.31 rows=1 width=61) (actual time=32.441..28764.569 rows=1000100 loops=1)
    Index Cond: (geo_pt2 && '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
    Order By: (geo_pt2 <-> '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
    Filter: (('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography && _st_expand(geo_pt2, '250000'::double precision)) AND _st_dwithin('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography, geo_pt2, '250000'::double precision, true))
Planning time: 50.979 ms
Execution time: 28875.403 ms

Best Answer

First, use EXPLAIN ANALYZE (not mere EXPLAIN) and show the result of \d on the table. (psql). As a first point this,

ST_GeographyFromText('POINT(47.4667 8.3167)')

should be written as ST_MakePoint(47.4667, 8.3167)::geography

Your problem here is this pattern,

SELECT ST_Distance( ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
...
ORDER  BY dist
LIMIT 100 OFFSET 1000;

Every time you do that, you must at least calculate the distance to 1100 rows. That said, it shouldn't be slow. It's slow because in order to that like that you're having to calculate the ST_Distance on all rows. We can stop that using KNN there using the <-> operator. MySQL doesn't support KNN.

SELECT hash_id, town
     , ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
FROM   member_profile
WHERE  ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
ORDER  BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
OFFSET 1000
FETCH NEXT 100 ROWS ONLY;

As a style critique, I personally prefer OFFSET/FETCH (the standardized method limit/offset).

Pagination

I'm not sure this will work. but, it may be worth trying (keep us updated).

SELECT hash_id, town
     , ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
     , ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2 AS myknn
FROM member_profile
WHERE ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
  AND ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2 > OLD_VALUE
ORDER BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
FETCH NEXT 100 ROWS ONLY;

So the first time you run this you save the value of last value of myknn then the second time you run this, you can replay that value back in this clause as OLD_VALUE,

AND ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2 > OLD_VALUE

So each time you run it you're saving the new point from which to continue at, and using FETCH NEXT x ROWS ONLY.

myknn and dist may be the same for you, if so you can just remove one of them.