Postgresql Query plan different on two servers

execution-planoptimizationperformancepostgresqlpostgresql-11query-performance

I'm having some strange issue of running the same query between prod and dev and receiving a bit of a different query plan and I'm not sure why. I'll try to cover everything I can, but if I forgot something of importance please let me know.

Postgresql: 11.6

GCE Prod specs:

  • CPUS: 6
  • RAM: 64gigs
  • Storage: SSD

GCE Dev specs:

  • CPUS: 4
  • RAM: 24gigs
  • Storage: HDD

Prod configuration:

shared_buffers = 16GB
seq_page_cost = 1.0
random_page_cost = 3.0
effective_cache_size = 48GB
work_mem = 512MB
maintenance_work_mem = 2GB
effective_io_concurrency = 200

dev configuration:

shared_buffers = 3840MB
seq_page_cost = 1.0
random_page_cost = 4.0
effective_cache_size = 11520MB
work_mem = 512MB
maintenance_work_mem = 960MB
effective_io_concurrency = 4

Table Definitions

                                                Table "api.ms1_peak"
        Column         |       Type       | Collation | Nullable |                      Default                      
-----------------------+------------------+-----------+----------+---------------------------------------------------
 ms1_peak_id           | integer          |           | not null | nextval('api.ms1_peak_ms1_peak_id_seq'::regclass)
 mzml_id               | integer          |           |          | 
 charge                | integer          |           |          | 
 mz                    | double precision |           |          | 
 adduct_m              | double precision |           |          | 
 rt_corrected          | double precision |           |          | 
Indexes:
    "ms1_peak_pkey" PRIMARY KEY, btree (ms1_peak_id)
    "ms1_peak_ms1_peak_id_idx" btree (ms1_peak_id) INCLUDE (mz, rt_corrected)
    "ms1_peak_mz_rt_ms1_peak_id_idx" btree (mz, rt_corrected) INCLUDE (ms1_peak_id, charge)
    "ms1_peak_mzml_idx" btree (mzml_id)
Foreign-key constraints:
    "ms1_peak_mzml_id_fkey" FOREIGN KEY (mzml_id) REFERENCES api.mzml(mzml_id) ON DELETE CASCADE
                                                      Table "api.molecular_structure"
         Column         |       Type       | Collation | Nullable |                                 Default
------------------------+------------------+-----------+----------+-------------------------------------------------------------------------
 molecular_structure_id | integer          |           | not null | nextval('api.molecular_structure_molecular_structure_id_seq'::regclass)
 monoisotopic_mass      | double precision |           |          |
 mol                    | mol              |           | not null |
Indexes:
    "molecular_structure_pkey" PRIMARY KEY, btree (molecular_structure_id)
    "molecular_structure_unique_idx" UNIQUE, btree (md5(mol::text))
    "molecular_structure_monoisotopic_mass_idx" btree (monoisotopic_mass, molecular_structure_id)
Referenced by:
    TABLE "api.smallmolecule_fragment" CONSTRAINT "smallmolecule_molecular_structure_id_fkey" FOREIGN KEY (molecular_structure_id) REFERENCES api.molecular_structure(molecular_structure_id) ON DELETE CASCADE
    TABLE "api.smallmolecule" CONSTRAINT "smallmolecule_molecular_structure_id_fkey" FOREIGN KEY (molecular_structure_id) REFERENCES api.molecular_structure(molecular_structure_id) ON DELETE SET NULL
                                                       Table "api.smallmolecule"
         Column         |           Type           | Collation | Nullable |                           Default
------------------------+--------------------------+-----------+----------+-------------------------------------------------------------
 smallmolecule_id       | integer                  |           | not null | nextval('api.smallmolecule_smallmolecule_id_seq'::regclass)
 created_date           | timestamp with time zone |           |          | now()
 original_smiles        | text                     |           |          |
 reference              | json                     |           |          |
 names                  | jsonb                    |           |          |
 cas                    | text                     |           |          |
 names_tsvector         | tsvector                 |           |          |
 molecular_structure_id | integer                  |           |          |
Indexes:
    "smallmolecule_pkey" PRIMARY KEY, btree (smallmolecule_id)
    "smallmolecule_molecular_structure_id_idx" btree (molecular_structure_id, smallmolecule_id)
    "smallmolecule_names_idx" gist (names_tsvector)
Foreign-key constraints:
    "smallmolecule_molecular_structure_id_fkey" FOREIGN KEY (molecular_structure_id) REFERENCES api.molecular_structure(molecular_structure_id) ON DELETE SET NULL
    "smallmolecule_organization_id_fkey" FOREIGN KEY (organization_id) REFERENCES api.organization(organization_id) ON DELETE SET NULL

The query in question

select p.ms1_peak_id,
  (
    select coalesce(json_agg(sm.smallmolecule_id), '[]') from api.molecular_structure as ms
    join api.smallmolecule sm using(molecular_structure_id)
    where ms.monoisotopic_mass BETWEEN  p.adduct_m - public.mz_tol(p.adduct_m, 5) AND
            p.adduct_m + public.mz_tol(p.adduct_m, 5)
  ) as smallmolecules
from api.ms1_peak as p
where p.mzml_id = 20634
and adduct_m is not null
order by ms1_peak_id asc;

On the dev server, here is the query plan

 Sort  (cost=94236607.88..94236610.32 rows=973 width=36) (actual time=173.075..173.144 rows=1082 loops=1)
   Output: p.ms1_peak_id, ((SubPlan 1))
   Sort Key: p.ms1_peak_id
   Sort Method: quicksort  Memory: 106kB
   Buffers: shared hit=58597
   ->  Bitmap Heap Scan on api.ms1_peak p  (cost=39.56..94236559.59 rows=973 width=36) (actual time=17.735..172.456 rows=1082 loops=1)
         Output: p.ms1_peak_id, (SubPlan 1)
         Recheck Cond: (p.mzml_id = 20634)
         Filter: (p.adduct_m IS NOT NULL)
         Rows Removed by Filter: 1318
         Heap Blocks: exact=399
         Buffers: shared hit=58597
         ->  Bitmap Index Scan on ms1_peak_mzml_idx  (cost=0.00..39.32 rows=1967 width=0) (actual time=0.180..0.180 rows=2400 loops=1)
               Index Cond: (p.mzml_id = 20634)
               Buffers: shared hit=12
         SubPlan 1
           ->  Aggregate  (cost=96843.52..96843.53 rows=1 width=32) (actual time=0.157..0.157 rows=1 loops=1082)
                 Output: COALESCE(json_agg(sm.smallmolecule_id), '[]'::json)
                 Buffers: shared hit=58186
                 ->  Hash Join  (cost=6062.96..96842.90 rows=246 width=4) (actual time=0.136..0.156 rows=1 loops=1082)
                       Output: sm.smallmolecule_id
                       Hash Cond: (ms.molecular_structure_id = sm.molecular_structure_id)
                       Buffers: shared hit=58186
                       ->  Bitmap Heap Scan on api.molecular_structure ms  (cost=816.07..91096.68 rows=26500 width=4) (actual time=0.019..0.111 rows=89 loops=1082)
                             Output: ms.molecular_structure_id, ms.molecular_weight, ms.monoisotopic_mass, ms.molecular_formula, ms.flat_smiles, ms.mol, ms.ecfp6, ms.fcfp6
                             Recheck Cond: ((ms.monoisotopic_mass >= (p.adduct_m - (('5'::double precision * p.adduct_m) / '1000000'::double precision))) AND (ms.monoisotopic_mass <= (p.adduct_m + (('5'::double precision * p.adduct_m) / '1000000'::double precision))))
                             Heap Blocks: exact=54237
                             Buffers: shared hit=57961
                             ->  Bitmap Index Scan on molecular_structure_monoisotopic_mass_idx  (cost=0.00..809.45 rows=26500 width=0) (actual time=0.012..0.012 rows=89 loops=1082)
                                   Index Cond: ((ms.monoisotopic_mass >= (p.adduct_m - (('5'::double precision * p.adduct_m) / '1000000'::double precision))) AND (ms.monoisotopic_mass <= (p.adduct_m + (('5'::double precision * p.adduct_m) / '1000000'::double precision))))
                                   Buffers: shared hit=3724
                       ->  Hash  (cost=4633.03..4633.03 rows=49108 width=8) (actual time=16.781..16.782 rows=49102 loops=1)
                             Output: sm.smallmolecule_id, sm.molecular_structure_id
                             Buckets: 65536  Batches: 1  Memory Usage: 2431kB
                             Buffers: shared hit=225
                             ->  Index Only Scan using smallmolecule_molecular_structure_id_idx on api.smallmolecule sm  (cost=0.41..4633.03 rows=49108 width=8) (actual time=0.030..7.495 rows=49108 loops=1)
                                   Output: sm.smallmolecule_id, sm.molecular_structure_id
                                   Heap Fetches: 0
                                   Buffers: shared hit=225
 Planning Time: 0.440 ms
 Execution Time: 173.291 ms

On Production

 Sort  (cost=3158406.45..3158408.86 rows=965 width=36) (actual time=10773.364..10773.433 rows=1082 loops=1)
   Output: p.ms1_peak_id, ((SubPlan 1))
   Sort Key: p.ms1_peak_id
   Sort Method: quicksort  Memory: 106kB
   Buffers: shared hit=358573
   ->  Index Scan using ms1_peak_mzml_idx on api.ms1_peak p  (cost=0.56..3158358.61 rows=965 width=36) (actual time=14.920..10770.308 rows=1082 loops=1)
         Output: p.ms1_peak_id, (SubPlan 1)
         Index Cond: (p.mzml_id = 20634)
         Filter: (p.adduct_m IS NOT NULL)
         Rows Removed by Filter: 1318
         Buffers: shared hit=358573
         SubPlan 1
           ->  Aggregate  (cost=3266.75..3266.76 rows=1 width=32) (actual time=9.924..9.924 rows=1 loops=1082)
                 Output: COALESCE(json_agg(sm.smallmolecule_id), '[]'::json)
                 Buffers: shared hit=357961
                 ->  Hash Join  (cost=1164.60..3266.13 rows=246 width=4) (actual time=8.142..9.913 rows=1 loops=1082)
                       Output: sm.smallmolecule_id
                       Inner Unique: true
                       Hash Cond: (sm.molecular_structure_id = ms.molecular_structure_id)
                       Buffers: shared hit=357961
                       ->  Index Only Scan using smallmolecule_molecular_structure_id_idx on api.smallmolecule sm  (cost=0.41..1973.03 rows=49108 width=8) (actual time=0.009..5.957 rows=49108 loops=1017)
                             Output: sm.molecular_structure_id, sm.smallmolecule_id
                             Heap Fetches: 0
                             Buffers: shared hit=322390
                       ->  Hash  (cost=834.29..834.29 rows=26392 width=4) (actual time=0.087..0.087 rows=89 loops=1082)
                             Output: ms.molecular_structure_id
                             Buckets: 32768  Batches: 1  Memory Usage: 262kB
                             Buffers: shared hit=35571
                             ->  Index Only Scan using molecular_structure_monoisotopic_mass_idx on api.molecular_structure ms  (cost=0.45..834.29 rows=26392 width=4) (actual time=0.032..0.071 rows=89 loops=1082)
                                   Output: ms.molecular_structure_id
                                   Index Cond: ((ms.monoisotopic_mass >= (p.adduct_m - (('5'::double precision * p.adduct_m) / '1000000'::double precision))) AND (ms.monoisotopic_mass <= (p.adduct_m + (('5'::double precision * p.adduct_m) / '1000000'::double precision))))
                                   Heap Fetches: 0
                                   Buffers: shared hit=35571
 Planning Time: 2.329 ms
 Execution Time: 10773.584 ms

The interesting thing is on dev I get a different query plan. It generally is using bitmap scans, while on production it is using index/index only scans.

Now, I can get the dev server to output a closer query plan to prod if I disable bitmapscans forcing more of an index only route

Here is the dev output after setting set enable_bitmapscan = off;

 Sort  (cost=107687464.30..107687466.74 rows=973 width=36) (actual time=166.059..166.129 rows=1082 loops=1)
   Output: p.ms1_peak_id, ((SubPlan 1))
   Sort Key: p.ms1_peak_id
   Sort Method: quicksort  Memory: 106kB
   Buffers: shared hit=58988
   ->  Index Scan using ms1_peak_mzml_idx on api.ms1_peak p  (cost=0.56..107687416.01 rows=973 width=36) (actual time=17.591..165.425 rows=1082 loops=1)
         Output: p.ms1_peak_id, (SubPlan 1)
         Index Cond: (p.mzml_id = 20634)
         Filter: (p.adduct_m IS NOT NULL)
         Rows Removed by Filter: 1318
         Buffers: shared hit=58988
         SubPlan 1
           ->  Aggregate  (cost=110667.53..110667.54 rows=1 width=32) (actual time=0.150..0.150 rows=1 loops=1082)
                 Output: COALESCE(json_agg(sm.smallmolecule_id), '[]'::json)
                 Buffers: shared hit=58376
                 ->  Hash Join  (cost=5247.33..110666.91 rows=246 width=4) (actual time=0.123..0.149 rows=1 loops=1082)
                       Output: sm.smallmolecule_id
                       Hash Cond: (ms.molecular_structure_id = sm.molecular_structure_id)
                       Buffers: shared hit=58376
                       ->  Index Only Scan using molecular_structure_monoisotopic_mass_idx on api.molecular_structure ms  (cost=0.45..104920.69 rows=26500 width=4) (actual time=0.010..0.106 rows=89 loops=1082)
                             Output: ms.monoisotopic_mass, ms.molecular_structure_id
                             Index Cond: ((ms.monoisotopic_mass >= (p.adduct_m - (('5'::double precision * p.adduct_m) / '1000000'::double precision))) AND (ms.monoisotopic_mass <= (p.adduct_m + (('5'::double precision * p.adduct_m) / '1000000'::double precision))))
                             Heap Fetches: 96251
                             Buffers: shared hit=58151
                       ->  Hash  (cost=4633.03..4633.03 rows=49108 width=8) (actual time=16.941..16.941 rows=49102 loops=1)
                             Output: sm.smallmolecule_id, sm.molecular_structure_id
                             Buckets: 65536  Batches: 1  Memory Usage: 2431kB
                             Buffers: shared hit=225
                             ->  Index Only Scan using smallmolecule_molecular_structure_id_idx on api.smallmolecule sm  (cost=0.41..4633.03 rows=49108 width=8) (actual time=0.027..7.519 rows=49108 loops=1)
                                   Output: sm.smallmolecule_id, sm.molecular_structure_id
                                   Heap Fetches: 0
                                   Buffers: shared hit=225

This query is a bit closer to production, and still faster (both cases on dev are faster than prod) However the query plan between the two is still slightly different. Notice the prod query plan uses an inner unique: true while places the index on 'molecular_structure_monoisotopix_mass_idx` at the bottom and loops 1082 times while on dev, this index usage is higher up in the explanation while not using an inner unique: true

Putting these query plans through explain.depesz breaks it down and shows that the production query planner takes 6s on the index only scan on api.smallmolecule, and 4 seconds on the hash join.

While on dev, the index only scan on api.smallmolecule is 8ms and the hash join is 28ms.

The data on molecular_structure and smallmolecule on dev and production is the same. Both are vacuumed and analyzed. So the dataset for these tables is the same.

about 5.2million rows on molecular_structure and 49000 on smallmolecule.

Playing around with configuration tweaks, I can manage to get the dev server to perform and output the exact same plan, when I set random_page_cost = 2.0. This produces the exact same plan. So I'm starting to think this is definitely a tuning issue between the two. Tweaking with random_page_cost on prod seems to have no affect for this particular query.

These servers are on Google Cloud Engine, prod uses a persistent SSD so i'm aware tuning certain values may not be exact with dedicated SSD setups while dev uses an HDD.

Any ideas on how to tune or debug this on production to get better results would be great. Anything I'm missing let me know.

Edit:

Vacuum Analyze has been ran on production, this is new data so there are no new inserts/updates/deletes so I dont believe index fragmentation is the problem.

On production if I disable index scans set enable_indexscans = 'off'
I get bitmap heap scan plans and 180ms matching the performance of the dev machine.

 Sort  (cost=70758692.10..70758694.51 rows=966 width=36) (actual time=179.621..179.692 rows=1082 loops=1)
   Output: p.ms1_peak_id, ((SubPlan 1))
   Sort Key: p.ms1_peak_id
   Sort Method: quicksort  Memory: 106kB
   Buffers: shared hit=60536
   ->  Bitmap Heap Scan on api.ms1_peak p  (cost=42.52..70758644.20 rows=966 width=36) (actual time=29.350..179.233 rows=1082 loops=1)
         Output: p.ms1_peak_id, (SubPlan 1)
         Recheck Cond: (p.mzml_id = 20634)
         Filter: (p.adduct_m IS NOT NULL)
         Rows Removed by Filter: 1318
         Heap Blocks: exact=399
         Buffers: shared hit=60536
         ->  Bitmap Index Scan on ms1_peak_mzml_idx  (cost=0.00..42.28 rows=1962 width=0) (actual time=0.169..0.169 rows=2400 loops=1)
               Index Cond: (p.mzml_id = 20634)
               Buffers: shared hit=12
         SubPlan 1
           ->  Aggregate  (cost=73243.01..73243.02 rows=1 width=32) (actual time=0.163..0.163 rows=1 loops=1082)
                 Output: COALESCE(json_agg(sm.smallmolecule_id), '[]'::json)
                 Buffers: shared hit=60125
                 ->  Hash Join  (cost=3970.76..73242.40 rows=246 width=4) (actual time=0.145..0.162 rows=1 loops=1082)
                       Output: sm.smallmolecule_id
                       Hash Cond: (ms.molecular_structure_id = sm.molecular_structure_id)
                       Buffers: shared hit=60125
                       ->  Bitmap Heap Scan on api.molecular_structure ms  (cost=576.97..69450.26 rows=26392 width=4) (actual time=0.019..0.106 rows=89 loops=1082)
                             Output: ms.molecular_structure_id, ms.molecular_weight, ms.monoisotopic_mass, ms.molecular_formula, ms.flat_smiles, ms.mol, ms.ecfp6, ms.fcfp6
                             Recheck Cond: ((ms.monoisotopic_mass >= (p.adduct_m - (('5'::double precision * p.adduct_m) / '1000000'::double precision))) AND (ms.monoisotopic_mass <= (p.adduct_m + (('5'::double precision * p.adduct_m) / '1000000'::double precision))))
                             Heap Blocks: exact=54237
                             Buffers: shared hit=57836
                             ->  Bitmap Index Scan on molecular_structure_monoisotopic_mass_idx  (cost=0.00..570.37 rows=26392 width=0) (actual time=0.012..0.012 rows=89 loops=1082)
                                   Index Cond: ((ms.monoisotopic_mass >= (p.adduct_m - (('5'::double precision * p.adduct_m) / '1000000'::double precision))) AND (ms.monoisotopic_mass <= (p.adduct_m + (('5'::double precision * p.adduct_m) / '1000000'::double precision))))
                                   Buffers: shared hit=3599
                       ->  Hash  (cost=2780.02..2780.02 rows=49102 width=8) (actual time=28.821..28.822 rows=49102 loops=1)
                             Output: sm.smallmolecule_id, sm.molecular_structure_id
                             Buckets: 65536  Batches: 1  Memory Usage: 2431kB
                             Buffers: shared hit=2289
                             ->  Seq Scan on api.smallmolecule sm  (cost=0.00..2780.02 rows=49102 width=8) (actual time=0.007..19.592 rows=49108 loops=1)
                                   Output: sm.smallmolecule_id, sm.molecular_structure_id
                                   Buffers: shared hit=2289
 Planning Time: 0.461 ms
 Execution Time: 179.824 ms
(40 rows)

Another thing I have noticed is that if I remove the join to the smallmolecule table within the subquery, this query becomes very fast prod.

Edit 2:

I managed to do a pg_basebackup and sync the data fully between dev and production. The only table that had a few million more inserts was the ms1_peak table, the molecular_structure and smallmolecule table never changed between the two during these results.

The result set from the ms1_peak still returned the same results between outdated dev and prod, (1082 peaks).

But anyways I'm still not convinced a large peak table was the cause because that query is super fast. But now both dev and prod produce the exact same query plan and performance of 10s. Why is joining to the smallmolecule table super slow.

Best Answer

The problem comes from the bad row count estimate for the bitmap index scan on api.molecular_structure (26392 instead of the actual 89), and it must be a minor difference in the data that tilts the plan.

It is probably difficult to improve the estimate.

How about rewriting the query?

SELECT p.ms1_peak_id,
       coalesce(json_agg(sm.smallmolecule_id), '[]')
FROM api.ms1_peak AS p
   LEFT JOIN api.molecular_structure AS ms
      ON ms.monoisotopic_mass BETWEEN p.adduct_m - public.mz_tol(p.adduct_m, 5)
                                  AND p.adduct_m + public.mz_tol(p.adduct_m, 5)
   LEFT JOIN api.smallmolecule sm
      USING (molecular_structure_id)
WHERE p.mzml_id = 20634
  AND p.adduct_m IS NOT NULL
GROUP BY p.ms1_peak_id
ORDER BY p.ms1_peak_id ASC;

Not sure if that will improve the speed, but it is worth a try.