PostgreSQL – Subquery in WHERE Clause Causes Slow Query Performance

postgispostgresqlsubquery

I have this rather basic query that is very slow for reasons I can't figure out:

SELECT s.id 
FROM segments s
WHERE
    ST_DWithin(
        s.geom::GEOGRAPHY,
        ST_Envelope((SELECT ST_COLLECT(s2.geom) FROM segments s2 WHERE s2.id IN (407820025,  407820024,  407817407,  407817408,  407816908,  407816909,  407817413,  407817414,  407817409,  407817410,  407817405,  407817406,  407816905,  407816907,  407817412,  407817411,  407816906,  407816904,  407816764,  407816765)))::GEOGRAPHY,
        30
    );

                                                                                                                           QUERY PLAN                                                                                                                            
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on segments s  (cost=55.58..48476381.06 rows=7444984 width=4)
   Filter: st_dwithin((geom)::geography, (st_astext(st_envelope($0)))::geography, '30'::double precision)
   InitPlan 1 (returns $0)
     ->  Aggregate  (cost=55.57..55.58 rows=1 width=32)
           ->  Index Scan using segments_pkey on segments s2  (cost=0.44..55.52 rows=20 width=113)
                 Index Cond: (id = ANY ('{407820025,407820024,407817407,407817408,407816908,407816909,407817413,407817414,407817409,407817410,407817405,407817406,407816905,407816907,407817412,407817411,407816906,407816904,407816764,407816765}'::integer[]))

Where I'm really confused is that the ST_Envelope with the subquery is very fast by itself

SELECT ST_Envelope((SELECT ST_COLLECT(geom) FROM segments WHERE id IN (407820025,  407820024,  407817407,  407817408,  407816908,  407816909,  407817413,  407817414,  407817409,  407817410,  407817405,  407817406,  407816905,  407816907,  407817412,  407817411,  407816906,  407816904,  407816764,  407816765)))::GEOGRAPHY;

                                                                                                                           QUERY PLAN                                                                                                                            
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=55.58..55.60 rows=1 width=32)
   InitPlan 1 (returns $0)
     ->  Aggregate  (cost=55.57..55.58 rows=1 width=32)
           ->  Index Scan using segments_pkey on segments  (cost=0.44..55.52 rows=20 width=113)
                 Index Cond: (id = ANY ('{407820025,407820024,407817407,407817408,407816908,407816909,407817413,407817414,407817409,407817410,407817405,407817406,407816905,407816907,407817412,407817411,407816906,407816904,407816764,407816765}'::integer[]))

And so is the main query if I plug the result of the ST_Envelope

SELECT id 
FROM segments
WHERE
    st_dwithin(
        geom::geography,
        '0103000020E61000000100000005000000C87B6E0D8FB85EC04BFD8462B9C34640C87B6E0D8FB85EC0929B35C16DC44640BBF8DDA6F2B75EC0929B35C16DC44640BBF8DDA6F2B75EC04BFD8462B9C34640C87B6E0D8FB85EC04BFD8462B9C34640'::GEOGRAPHY,
        30
    );

                                                                                                                                                                                                                                                                                QUERY PLAN                                                                                                                                                                                                                                                                                
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using segments_geom_geo_idx on segments  (cost=0.42..4.82 rows=1 width=4)
   Index Cond: ((geom)::geography && '0103000020E61000000100000005000000C87B6E0D8FB85EC04BFD8462B9C34640C87B6E0D8FB85EC0929B35C16DC44640BBF8DDA6F2B75EC0929B35C16DC44640BBF8DDA6F2B75EC04BFD8462B9C34640C87B6E0D8FB85EC04BFD8462B9C34640'::geography)
   Filter: (('0103000020E61000000100000005000000C87B6E0D8FB85EC04BFD8462B9C34640C87B6E0D8FB85EC0929B35C16DC44640BBF8DDA6F2B75EC0929B35C16DC44640BBF8DDA6F2B75EC04BFD8462B9C34640C87B6E0D8FB85EC04BFD8462B9C34640'::geography && _st_expand((geom)::geography, '30'::double precision)) AND _st_dwithin((geom)::geography, '0103000020E61000000100000005000000C87B6E0D8FB85EC04BFD8462B9C34640C87B6E0D8FB85EC0929B35C16DC44640BBF8DDA6F2B75EC0929B35C16DC44640BBF8DDA6F2B75EC04BFD8462B9C34640C87B6E0D8FB85EC04BFD8462B9C34640'::geography, '30'::double precision, true))

Shouldn't Postgres compute the ST_Envelope once and then use it for the WHERE condition, effectively doing what I did manually? I also don't get why no index is used to do the Filter in the original query.

I tried putting the subquery in a CTE but that didn't solve the issue.

Best Answer

The reason is that with the constant geometry, the planner knows the value and estimates one result row, which makes an index scan a good strategy.

With the original query, the planner doesn't know the value, because it is only determined at execution time, so it guesses that there will be 7444984 result rows.

I would write two queries: one that calculates the geometry and one that uses the result as constant.