Postgresql subquery speed much slower than individual queries

join;performancepostgispostgresqlpostgresql-performance

In Postgres 9.1. I am attempting to do a single query to get the timezone of a given postal code. I have data loaded in my database, and I am using PostGIS to store the coordinates of each postal code. Here are the two queries that I need to do in order to get the data I need:

SELECT coord FROM public.postal_code WHERE postal_code = 'T1K0T4' LIMIT 1

query time: 6ms

This gives me the coordinate of the center of the postal_code area. Then I use this coordinate to find which timezone it intersects:

SELECT *
FROM public.timezones as tz
WHERE ST_Intersects(ST_GeomFromText('POINT(-112 49)',4326),geom)

query time: 36ms

Now when I combine the 2 queries, the query time jumps to 7-8 seconds. This is my query:

SELECT *
FROM public.timezones as tz
WHERE ST_Intersects((SELECT coord FROM taduler.postal_code
                     WHERE postal_code = 'T1K0T4' LIMIT 1),geom)

I have a spatial index on the coord column in the postal_code table, and also on the geom column in the timezone table, but it seems like it isn't being used for the subquery.

Does anyone know of a better way to optimize this query? I have tried several variations of this query, like joining the tables and such, but they have all resulted in the same query speed.

Output from EXPLAIN ANALYZE:

Seq Scan on timezones tz  (cost=8.37..167.47 rows=136 width=335547) (actual time=4606.136..7274.428 rows=1 loops=1)
  Filter: st_intersects($0, (geom)::geography)
  InitPlan 1 (returns $0)
    ->  Limit  (cost=0.00..8.37 rows=1 width=128) (actual time=0.011..0.011 rows=1 loops=1)
          ->  Index Scan using postal_code_idx on postal_code  (cost=0.00..8.37 rows=1 width=128) (actual time=0.010..0.010 rows=1 loops=1)
                Index Cond: ((postal_code)::text = 'T1K0T4'::text)
Total runtime: 7274.448 ms

EXPLAIN ANALYZE for the following query:

SELECT *
FROM public.timezones as tz
JOIN taduler.postal_code as pc on ST_Intersects(pc.coord, tz.geom)
WHERE pc.postal_code = 'T1K0T4'

Output:

Nested Loop  (cost=0.00..174.61 rows=1 width=335714) (actual time=4870.908..7572.723 rows=1 loops=1)
  Join Filter: ((pc.coord && (tz.geom)::geography) AND (_st_distance(pc.coord, (tz.geom)::geography, 0::double precision, false) < 1e-05::double precision))
  ->  Index Scan using postal_code_idx on postal_code pc  (cost=0.00..8.37 rows=1 width=167) (actual time=0.012..0.019 rows=1 loops=1)
        Index Cond: ((postal_code)::text = 'T1K0T4'::text)
  ->  Seq Scan on timezones tz  (cost=0.00..56.08 rows=408 width=335547) (actual time=0.002..2.795 rows=408 loops=1)
Total runtime: 7572.787 ms

Best Answer

Seems you are running in a weakness of the query planner: The best index is sometimes not used for joining tables. Had a similar problem here:
Algorithm for finding the longest prefix (Chapter "Failed attempt with text_pattern_ops")

In Postgres 9.3 You could try this version with LEFT JOIN LATERAL:

SELECT *
FROM  (
    SELECT coord
    FROM   taduler.postal_code
    WHERE  postal_code = 'T1K0T4'
    LIMIT  1
    ) pc
LEFT JOIN LATERAL (
    SELECT *
    FROM   public.timezones tz
    WHERE  ST_Intersects(pc.coord, tz.geom)
   ) tz ON TRUE;

Something similar Worked for @ypercube's solution in this related answer.
LATERAL requires Postgres 9.3+, though.

In PostgreSQL 9.1, it might help to encapsulate the first query in a CTE, but I doubt it. (Don't have a PostGis installation here to test.):

WITH pc AS (
    SELECT coord
    FROM   taduler.postal_code
    WHERE  postal_code = 'T1K0T4'
    LIMIT  1
    )
SELECT *
FROM   pc
JOIN   public.timezones tz ON ST_Intersects(pc.coord, tz.geom);

A plpgsql function to encapsulate two separate queries should certainly do the trick:

CREATE OR REPLACE FUNCTION f_get_tz(_pc text)
  RETURNS SETOF public.timezones AS
$func$
DECLARE
   _coord geom;
BEGIN

SELECT coord
INTO  _coord
FROM   taduler.postal_code
WHERE  postal_code = _pc
LIMIT  1;

RETURN QUERY
SELECT *
FROM   public.timezones tz
WHERE  ST_Intersects(_coord, tz.geom);

END
$func$  LANGUAGE plpgsql;

Call:

SELECT * FROM f_get_tz('T1K0T4');