After given up with MySQL i have tried Elasticsearch and now wan't to see if I could go with PostgreSQL/PostGIS that could allow me to go with PostgreSQL only.
I need to fetch records from a Table by distance (must not be exactly) and sort by distance. The table has 10 million records.
As I got for my Query on PostgreSQL slower speed then on MySQL I think I must do something wrong.
What I can do better?
Table:
id | hash_id | town | geo_pt2
geo_pt2 is geography
Index:
CREATE INDEX geo_pt2_gix ON public.member_profile USING gist (geo_pt2)
Query:
SELECT hash_id, town
, ST_Distance(t.x, geo_pt2) AS dist
FROM member_profile, (SELECT ST_GeographyFromText('POINT(47.4667 8.3167)')) AS t(x)
WHERE ST_DWithin(t.x, geo_pt2, 250000)
ORDER BY dist
limit 100 offset 1000;
Explain:
Limit (cost=9.08..9.08 rows=1 width=53)
-> Sort (cost=9.07..9.08 rows=1 width=53)
Sort Key: (_st_distance('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography, member_profile.geo_pt2, '0'::double precision, true))
-> Index Scan using geo_pt2_gix on member_profile (cost=0.42..9.06 rows=1 width=53)
Index Cond: (geo_pt2 && '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
Filter: (('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography && _st_expand(geo_pt2, '250000'::double precision)) AND _st_dwithin('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography, geo_pt2, '250000'::double precision, true))
I'm using PostgreSQL 10 on a modern Server with high IOPS (NVMe) and the Query need 35 sec.
After suggestions of @Evan Carroll much better performance:
EXPLAIN ANALYZE SELECT hash_id, town
, ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
FROM member_profile
WHERE ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
ORDER BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
OFFSET 10000
FETCH NEXT 100 ROWS ONLY;
Limit (cost=9.31..18.21 rows=1 width=61) (actual time=392.608..394.138 rows=100 loops=1)
-> Index Scan using geo_pt2_gix on member_profile (cost=0.42..9.31 rows=1 width=61) (actual time=26.624..392.776 rows=10100 loops=1)
Index Cond: (geo_pt2 && '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
Order By: (geo_pt2 <-> '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
Filter: (('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography && _st_expand(geo_pt2, '250000'::double precision)) AND _st_dwithin('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography, geo_pt2, '250000'::double precision, true))
Planning time: 89.020 ms
Execution time: 395.039 ms
If the user paginate to end it got slow:
EXPLAIN ANALYZE SELECT hash_id, town
, ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
FROM member_profile
WHERE ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
ORDER BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
OFFSET 1000000
FETCH NEXT 100 ROWS ONLY;
Limit (cost=9.31..18.21 rows=1 width=61) (actual time=28872.156..28873.239 rows=100 loops=1)
-> Index Scan using geo_pt2_gix on member_profile (cost=0.42..9.31 rows=1 width=61) (actual time=32.441..28764.569 rows=1000100 loops=1)
Index Cond: (geo_pt2 && '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
Order By: (geo_pt2 <-> '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
Filter: (('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography && _st_expand(geo_pt2, '250000'::double precision)) AND _st_dwithin('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography, geo_pt2, '250000'::double precision, true))
Planning time: 50.979 ms
Execution time: 28875.403 ms
Best Answer
First, use
EXPLAIN ANALYZE
(not mereEXPLAIN
) and show the result of\d
on the table. (psql). As a first point this,should be written as
ST_MakePoint(47.4667, 8.3167)::geography
Your problem here is this pattern,
Every time you do that, you must at least calculate the distance to 1100 rows. That said, it shouldn't be slow. It's slow because in order to that like that you're having to calculate the
ST_Distance
on all rows. We can stop that using KNN there using the<->
operator. MySQL doesn't support KNN.As a style critique, I personally prefer OFFSET/FETCH (the standardized method limit/offset).
Pagination
I'm not sure this will work. but, it may be worth trying (keep us updated).
So the first time you run this you save the value of last value of
myknn
then the second time you run this, you can replay that value back in this clause asOLD_VALUE
,So each time you run it you're saving the new point from which to continue at, and using
FETCH NEXT x ROWS ONLY
.myknn
anddist
may be the same for you, if so you can just remove one of them.