PostgreSQL – Getting started with query tuning

postgresql

I ran EXPLAIN ANALYZE in Postgres and got the plan below. I have indexes on all of the join columns, but the query is still slow. How can I optimize it?

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=338814.09..701327.89 rows=697105 width=32) (actual time=4566.092..6408.981 rows=886204 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Hash Join  (cost=337814.09..630617.39 rows=290460 width=32) (actual time=4537.457..4986.583 rows=295401 loops=3)
         Hash Cond: (s.imsi = i.imsi)
         ->  Parallel Seq Scan on subsloc_spatial s  (cost=0.00..254545.62 rows=983064 width=40) (actual time=484.231..812.885 rows=841711 loops=3)
               Filter: ((calltime IS NOT NULL) AND (calltime >= '2018-12-21 09:10:50'::timestamp without time zone) AND (calltime <= '2018-12-21 12:45:50'::timestamp without time zone))
               Rows Removed by Filter: 3721095
         ->  Parallel Hash  (cost=310079.62..310079.62 rows=1690438 width=8) (actual time=3516.791..3516.791 rows=999275 loops=3)
               Buckets: 131072  Batches: 64  Memory Usage: 2912kB
               ->  Parallel Hash Join  (cost=80542.85..310079.62 rows=1690438 width=8) (actual time=2233.489..3295.436 rows=999275 loops=3)
                     Hash Cond: (i.msisdn = l.msisdn)
                     ->  Parallel Seq Scan on imsi_msisdn i  (cost=0.00..131410.85 rows=5721285 width=16) (actual time=0.046..844.358 rows=4577028 loops=3)
                     ->  Parallel Hash  (cost=52808.38..52808.38 rows=1690438 width=8) (actual time=415.896..415.897 rows=1352350 loops=3)
                           Buckets: 131072  Batches: 64  Memory Usage: 3552kB
                           ->  Parallel Seq Scan on lte l  (cost=0.00..52808.38 rows=1690438 width=8) (actual time=0.039..180.049 rows=1352350 loops=3)


Best Answer

Seeing the query text, not just the plan, could help us help you.

Your hash joins are being split into multiple batches (Batches: 64 in the plan), so increasing work_mem (by a lot) could help, if you have the RAM to support it.
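If you want to experiment, you can raise it for just your session first. The 256MB figure below is only an illustration, not a recommendation; pick a value your RAM can sustain, keeping in mind that work_mem can be consumed per hash/sort node and per parallel worker:

    SET work_mem = '256MB';  -- session-only; large enough that the hash build side may fit in one batch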

Increasing "max_parallel_workers_per_gather" could lead to more parallelism which could be beneficial.

An index on (calltime, imsi), if you don't already have one, might help, but given that you are fetching nearly a million rows from that table, it might not.
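If you want to try it, something like this would do (the index name is just a placeholder):

    CREATE INDEX subsloc_spatial_calltime_imsi_idx
        ON subsloc_spatial (calltime, imsi);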

What are you going to do with all these rows once you get them? Maybe you could do the aggregation on the server rather than in the client (if aggregation is what you are doing with them).
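For example, if all you ultimately need is a count per subscriber, something along these lines (reconstructed from the join conditions and filter in your plan; the aggregate and grouping column are only a guess at what you actually need) would let the server return a handful of rows instead of nearly a million:

    SELECT s.imsi, count(*) AS events
    FROM subsloc_spatial s
    JOIN imsi_msisdn i ON i.imsi = s.imsi
    JOIN lte l ON l.msisdn = i.msisdn
    WHERE s.calltime BETWEEN '2018-12-21 09:10:50' AND '2018-12-21 12:45:50'
    GROUP BY s.imsi;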