Postgresql – How to increase timing performace of a SQL query

performancepostgresqlquery-performance

I'm using postgresql 9.6 on the 16GB Ubuntu-PC with 4 cores & SSD with following params:

max_connections = 200                   
shared_buffers = 6GB
work_mem = 256MB                              
maintenance_work_mem =1GB   

How to increase performance of the SQL-query:

WITH ss AS (SELECT  
      time_noise, track, slow, distance, last_time
    FROM
        eco.noise
    WHERE
        (track, time_noise, slow) IN (
        SELECT
          DISTINCT ON (track)
          track, time_noise, slow 
        FROM
          eco.noise
        WHERE
          base_name='B001' AND slow >= 50 AND slow <= 90 AND distance <= 10000 AND time_noise >= '06-01-2019' AND time_noise <= '07-01-2019'
          ORDER BY track, slow DESC
        ) 
    ORDER BY time_noise)
SELECT
    COALESCE(to_char(ss.time_noise, 'YYYY-MM-DD HH24:MI:SS'), '') AS time_noise, ss.slow, ss.track, COALESCE(to_char(ss.last_time, 'YYYY-MM-DD HH24:MI:SS'), '') as last_time, ss.distance,
    eco.tracks.callsign, eco.tracks.altitude,eco.tracks.speed,eco.tracks.angle,eco.tracks.latitude,eco.tracks.longitude,eco.tracks.vertical_speed,
    eco.routes.from, eco.routes.to
FROM ss
LEFT JOIN eco.tracks ON ss.track  = eco.tracks.track AND eco.tracks.time_track = ss.last_time
LEFT JOIN eco.routes ON eco.tracks.callsign  = eco.routes.callsign
ORDER BY ss.time_noise ASC;

I use the following indexes:

 postgres  | eco         | eco.noise            | test_unique_noise_3         | f         | f          | btree      | 3 8 26 2 | {base_name,slow,distance,time_noise} | f             | f
 postgres  | eco         | eco.tracks           | tracks_time_track_track_key | t         | f          | btree      | 2 3      | {time_track,track}                   | f             | f
 postgres  | eco         | eco.routes           | pr_routes                   | t         | t          | btree      | 1        | {callsign}                           | f             | f
 postgres  | eco         | eco.noise            | test_unique_noise           | f         | f          | btree      | 25 2 8   | {track,time_noise,slow}              | f             | f

EXPLAIN ANALYZE:

Sort  (cost=3739384.97..3742913.97 rows=1411600 width=131) (actual time=2377.640..2377.805 rows=3792 loops=1)
       Sort Key: ss.time_noise
       Sort Method: quicksort  Memory: 1087kB
       CTE ss
         ->  Sort  (cost=525472.51..529001.51 rows=1411600 width=32) (actual time=2161.522..2162.140 rows=3792 loops=1)
               Sort Key: noise.time_noise
               Sort Method: quicksort  Memory: 393kB
               ->  Nested Loop  (cost=369723.59..381285.34 rows=1411600 width=32) (actual time=2026.531..2160.723 rows=3792 loops=1)
                     ->  Unique  (cost=369723.02..370160.00 rows=3978 width=20) (actual time=2026.498..2142.201 rows=3791 loops=1)
                           ->  Sort  (cost=369723.02..369941.51 rows=87396 width=20) (actual time=2026.497..2117.520 rows=568059 loops=1)
                                 Sort Key: noise_1.track, noise_1.slow DESC
                                 Sort Method: quicksort  Memory: 68956kB
                                 ->  Index Scan using test_unique_noise_3 on noise noise_1  (cost=0.56..362549.87 rows=87396 width=20) (actual time=0.039..1751.161 rows=568059 loops=1)
                                       Index Cond: ((base_name = 'B001'::text) AND (slow >= '50'::double precision) AND (slow <= '90'::double precision) AND (distance <= 10000) AND (time_noise >= '2019-06-01 00:00:00'::timestamp without time zone) AND (time_noise <= '2019-07-01 00:00:00'::timestamp without time zone))
                     ->  Index Scan using test_unique_noise on noise  (cost=0.56..2.78 rows=1 width=32) (actual time=0.004..0.004 rows=1 loops=3791)
                           Index Cond: ((track = noise_1.track) AND (time_noise = noise_1.time_noise) AND (slow = noise_1.slow))
       ->  Hash Left Join  (cost=17792.76..3066196.28 rows=1411600 width=131) (actual time=2351.691..2376.080 rows=3792 loops=1)
             Hash Cond: (tracks.callsign = routes.callsign)
             ->  Nested Loop Left Join  (cost=0.56..3021978.00 rows=1411600 width=67) (actual time=2161.540..2179.800 rows=3792 loops=1)
                   ->  CTE Scan on ss  (cost=0.00..28232.00 rows=1411600 width=32) (actual time=2161.524..2162.558 rows=3792 loops=1)
                   ->  Index Scan using tracks_time_track_track_key on tracks  (cost=0.56..2.11 rows=1 width=51) (actual time=0.004..0.004 rows=1 loops=3792)
                         Index Cond: ((time_track = ss.last_time) AND (ss.track = track))
             ->  Hash  (cost=10017.64..10017.64 rows=621964 width=15) (actual time=189.279..189.279 rows=610992 loops=1)
                   Buckets: 1048576  Batches: 1  Memory Usage: 37585kB
                   ->  Seq Scan on routes  (cost=0.00..10017.64 rows=621964 width=15) (actual time=0.017..65.022 rows=610992 loops=1)
     Planning time: 0.646 ms
     Execution time: 2380.094 ms

As I understand the bottleneck is CTE because of 'WITH', so I've tried to use subquery.. but similar result;

   SELECT
      sub.time_noise AS time_noise, sub.slow, sub.track, sub.last_time as last_time, sub.temperature, sub.humadity, sub.presure, sub.wind, sub.distance,
      eco.tracks.callsign, eco.tracks.altitude,eco.tracks.speed,eco.tracks.angle,eco.tracks.latitude,eco.tracks.longitude,eco.tracks.vertical_speed,
      eco.routes.from, eco.routes.to
  FROM (SELECT  
    time_noise, track, slow, distance, last_time
  FROM
      eco.noise
  WHERE
      (track, time_noise, slow) IN (
      SELECT
        DISTINCT ON (track)
        track, time_noise, slow 
      FROM
        eco.noise
      WHERE
        base_name='B001' AND slow >= 50 AND slow <= 100 AND distance <= 10000 AND time_noise >= '01-01-2019' AND time_noise <= '07-01-2019' 
        ORDER BY track, slow DESC
      ) 
  ORDER BY time_noise) as sub
  LEFT JOIN eco.tracks ON sub.track  = eco.tracks.track AND eco.tracks.time_track = sub.last_time
  LEFT JOIN eco.routes ON eco.tracks.callsign  = eco.routes.callsign
  ORDER BY sub.time_noise ASC;

EXPLAIN ANALYZE:

Nested Loop Left Join  (cost=1066367.20..4077589.03 rows=1411652 width=83) (actual time=3943.936..3992.130 rows=9262 loops=1)
   ->  Sort  (cost=1066366.64..1069895.77 rows=1411652 width=48) (actual time=3943.917..3945.328 rows=9262 loops=1)
         Sort Key: noise.time_noise
         Sort Method: quicksort  Memory: 1108kB
         ->  Nested Loop  (cost=907649.38..922173.77 rows=1411652 width=48) (actual time=3582.100..3941.579 rows=9262 loops=1)
               ->  Unique  (cost=907648.81..911048.44 rows=3978 width=20) (actual time=3582.059..3891.212 rows=9253 loops=1)
                     ->  Sort  (cost=907648.81..909348.63 rows=679925 width=20) (actual time=3582.058..3826.172 rows=1450072 loops=1)
                           Sort Key: noise_1.track, noise_1.slow DESC
                           Sort Method: quicksort  Memory: 162439kB
                           ->  Index Scan using test_unique_noise_3 on noise noise_1  (cost=0.56..841781.02 rows=679925 width=20) (actual time=0.043..2864.183 rows=1450072 loops=1)
                                 Index Cond: ((base_name = 'B001'::text) AND (slow >= '50'::double precision) AND (slow <= '100'::double precision) AND (distance <= 10000) AND (time_noise >= '2019-01-01 00:00:00'::timestamp without time zone) AND (time_noise <= '2019-07-01 00:00:00'::timestamp without time zone))
               ->  Index Scan using test_unique_noise on noise  (cost=0.56..2.78 rows=1 width=48) (actual time=0.005..0.005 rows=1 loops=9253)
                     Index Cond: ((track = noise_1.track) AND (time_noise = noise_1.time_noise) AND (slow = noise_1.slow))
   ->  Index Scan using tracks_time_track_track_key on tracks  (cost=0.56..2.11 rows=1 width=51) (actual time=0.005..0.005 rows=1 loops=9262)
         Index Cond: ((time_track = noise.last_time) AND (noise.track = track))
 Planning time: 0.782 ms
 Execution time: 3994.348 ms

Should I use PL/pgpsql e.g?

Best Answer

The index scan can only use the first two columns (base_name and slow) for the scan itself, the remaining ones are used as a filter (before table access). But your query is probably slow because it has to access the table 1.5 million times to fetch the track column.

You should create an index like this:

CREATE INDEX ON noise (base_name, track, slow DESC, distance, time_noise);

Then make sure that noise is vacuumed regularly, and you should get an index only scan, which is much faster, and you won't need an explicit sort as well.