Postgresql – Query planner enerates different plans in psql and application for the same query

execution-planoptimizationpostgresqlpostgresql-11query-performance

I am optimising an application on a development server; the database is being dropped and restored from time to time (always the same database and data) and the caches flushed.

I want to identify rows in a parent table which have rows in a child table.

The parent table has about 5000 rows, the child table about 1.8 million. 1795 rows satisfy the criteria.

-- parent
           Column            |            Type             | Collation | Nullable |                  Default                   
-----------------------------+-----------------------------+-----------+----------+--------------------------------------------
 id                          | integer                     |           | not null | nextval('parent_id_seq'::regclass)



Indexes:
    "parent_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    TABLE "child" CONSTRAINT "child_parent_id_fkey" FOREIGN KEY (parent_id) REFERENCES parent(id) ON DELETE RESTRICT


-- child
      Column       |            Type             | Collation | Nullable |                 Default                 
-------------------+-----------------------------+-----------+----------+-----------------------------------------
 id                | integer                     |           | not null | nextval('child_id_seq'::regclass)
 parent_id         | integer                     |           | not null | 
Indexes:
    "child_pkey" PRIMARY KEY, btree (id)
    "child_parent_id_index" btree (parent_id)
Foreign-key constraints:
    "child_parent_id_fkey" FOREIGN KEY (parent_id) REFERENCES parent(id) ON DELETE RESTRICT

I wrote a query, an the planner reports an execution time of 13ms:

# EXPLAIN ANALYZE SELECT p.id FROM parent p WHERE EXISTS (SELECT 1 FROM child c WHERE c.parent_id = p.id);
                                                                           QUERY PLAN                                                                            
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Semi Join  (cost=0.43..2299.35 rows=69 width=4) (actual time=0.193..13.188 rows=1795 loops=1)
   ->  Seq Scan on parent p  (cost=0.00..178.50 rows=4750 width=4) (actual time=0.008..0.715 rows=4750 loops=1)
   ->  Index Only Scan using child_parent_id_index on child c  (cost=0.43..487.99 rows=26447 width=4) (actual time=0.002..0.002 rows=0 loops=4750)
         Index Cond: (parent_id = p.id)
         Heap Fetches: 1795
 Planning Time: 1.197 ms
 Execution Time: 13.355 ms
(7 rows)

Adding an ORDER BY clause results in similar performance (16ms):

# EXPLAIN ANALYZE SELECT p.id FROM parent p WHERE EXISTS (SELECT 1 FROM child c WHERE c.parent_id = p.id) ORDER BY p.id;
                                                                              QUERY PLAN                                                                               
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=2301.45..2301.63 rows=69 width=4) (actual time=15.915..15.996 rows=1795 loops=1)
   Sort Key: p.id
   Sort Method: quicksort  Memory: 133kB
   ->  Nested Loop Semi Join  (cost=0.43..2299.35 rows=69 width=4) (actual time=0.181..15.191 rows=1795 loops=1)
         ->  Seq Scan on parent p  (cost=0.00..178.50 rows=4750 width=4) (actual time=0.018..0.729 rows=4750 loops=1)
         ->  Index Only Scan using child_parent_id_index on child c  (cost=0.43..487.99 rows=26447 width=4) (actual time=0.003..0.003 rows=0 loops=4750)
               Index Cond: (parent_id = p.id)
               Heap Fetches: 1795
 Planning Time: 1.870 ms
 Execution Time: 16.161 ms
(10 rows)

However, when I change application code to run either version of the query, execution times average 306ms over 1600 executions*, even if I try to "prime" the planner by executing the query in psql beforehand.

auto_explain logs this plan while the application is running (I'm assuming it's representative):

LOG:  duration: 451.723 ms  plan:
    Query Text: SELECT "parent"."id" FROM "parent"
                    WHERE EXISTS (SELECT 1 FROM child
                                    WHERE "child"."parent_id" = "parent"."id")
                    ORDER BY "parent"."id"

    Sort  (cost=47844.13..47844.30 rows=69 width=4) (actual time=451.327..451.433 rows=1796 loops=1)
      Sort Key: parent.id
      Sort Method: quicksort  Memory: 133kB
      Buffers: shared hit=8518 read=24207
      ->  Nested Loop  (cost=47271.56..47842.02 rows=69 width=4) (actual time=442.385..450.911 rows=1796 loops=1)
            Buffers: shared hit=8518 read=24207
            ->  HashAggregate  (cost=47271.14..47271.83 rows=69 width=4) (actual time=442.355..442.716 rows=1796 loops=1)
                  Group Key: child.parent_id
                  Buffers: shared hit=212 read=24207
                  ->  Seq Scan on child  (cost=0.00..42700.71 rows=1828171 width=4) (actual time=0.038..186.566 rows=1817908 loops=1)
                        Buffers: shared hit=212 read=24207
            ->  Index Only Scan using parent_pkey on parent  (cost=0.42..8.26 rows=1 width=4) (actual time=0.004..0.004 rows=1 loops=1796)
                  Index Cond: (id = child.parent_id)
                  Heap Fetches: 3234
                  Buffers: shared hit=8306

Why am I seeing such a difference between the plan generated by the planner in psql and that generated at runtime? (And how can I persuade postgres to select a better plan?)

SELECT version();
PostgreSQL 11.7 on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), 64-bit

Update

The values of child.parent_id are overwhelmingly skewed to a single value: 1.7M have value 9, none of the other values occur more than 10000 times.

pg_stats for child.parent_id looks like this (immediately after a restore):

+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname             | public
tablename              | child
attname                | parent_id
inherited              | f
null_frac              | 0
avg_width              | 4
n_distinct             | 62
most_common_vals       | {9,7895,7891,7893,7885,7907,9042,7903,7884,7902,7892,7894,7886,7899,7887,7898,9041,49,7906,45,7901,44}
most_common_freqs      | {0.968433,0.00366667,0.00343333,0.0031,0.0028,0.00173333,0.0016,0.00143333,0.00126667,0.00123333,0.00116667,0.0011,0.000966667,0.000933333,0.0009,0.0007,0.0006,0.000533333,0.000533333,0.0005,0.0004,0.0003}
histogram_bounds       | {5,5,5,8,20,42,42,42,43,46,47,47,48,48,48,48,3680,3975,4118,4367,4902,5236,5332,5793,6142,6421,6980,7272,8863,9006,9006,9006,9007,9007,9010,9010,9010,9010,9014,9035}
correlation            | 0.929476
most_common_elems      | 
most_common_elem_freqs | 
elem_count_histogram   | 

I've tried these changes, but they have made no difference to the plan selection:

  • drop all other dbs except postgres, template0, template1
  • increase RAM for buffers
  • VACUUM FULL ANALYZE for both tables
  • REINDEX TABLE child
  • ALTER TABLE child ALTER COLUMN parent_id SET STATISTICS 10000
  • SELECT pg_stat_reset()
  • redorder WHERE operands
  • changing random_page_cost to 1.0 (storage is SSD)

The planner generates the desired plan if HashAggregates are disabled.

Changing ndistinct to something more like the actual number of unique parent_ids tends to make the planner generate the slower plan in the console.

* As reported by pgbadger, slow queries > 100ms being logged

Best Answer

Eventually I found this Q&A on StackOverflow, asserting that adding OFFSET = 0 to a subquery prevents the query planner from inlining the subquery. Applying this change made the planner consistently* generate an efficient query in psql and in the application.

EXPLAIN (ANALYZE, BUFFERS) SELECT "parent"."id" FROM "parent"
                        WHERE EXISTS
                         (SELECT 1 FROM child
                            WHERE "child"."parent_id" = "parent"."id" OFFSET 0);
                                                                            QUERY PLAN                                                                            
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on parent  (cost=0.00..2739.35 rows=2375 width=4) (actual time=0.185..16.813 rows=1795 loops=1)
   Filter: (SubPlan 1)
   Rows Removed by Filter: 2955
   Buffers: shared hit=16342
   SubPlan 1
     ->  Index Only Scan using child_parent_id_index on child  (cost=0.43..3533.66 rows=31652 width=4) (actual time=0.003..0.003 rows=0 loops=4750)
           Index Cond: (parent_id = parent.id)
           Heap Fetches: 2025
           Buffers: shared hit=16211
 Planning Time: 0.328 ms
 Execution Time: 16.988 ms
(11 rows)

This is an effective workaround, but somewhat unsatisfactory because:

  • we're preventing the planner from selecting a more efficient plan, if one exists
  • we still don't understand why the planner generated different plans in psql and the application

*The log reported 33 slow queries, which is an acceptable reduction from 1600.