Postgresql – Improve DISTINCT SQL-query, PostgreSQL 9.6

performancepostgresqlquery-performance

I have a table:

CREATE TABLE noise (
    id bigint NOT NULL,
    time_noise timestamp without time zone NOT NULL,
    base_name text,
    level real,
);

And I want to get last data (by time) for every base_name:

\timing on
SELECT DISTINCT ON (base_name) base_name, time_noise, level 
FROM noise 
ORDER BY base_name, time_noise DESC;

But this query is too slow, ~2s.

explain (analyze, buffers):

                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=197699.46..204176.80 rows=1 width=20) (actual time=1617.637..1795.908 rows=1 loops=1)
   Buffers: shared hit=53222, temp read=4775 written=4775
   ->  Sort  (cost=197699.46..200938.13 rows=1295469 width=20) (actual time=1617.636..1719.406 rows=1302740 loops=1)
         Sort Key: base_name, time_noise DESC
         Sort Method: external merge  Disk: 38192kB
         Buffers: shared hit=53222, temp read=4775 written=4775
         ->  Seq Scan on noise  (cost=0.00..66176.69 rows=1295469 width=20) (actual time=0.006..223.798 rows=1302740 loops=1)
               Buffers: shared hit=53222
 Planning time: 0.053 ms
 Execution time: 1803.080 ms

How can it be improved?

with work_mem=160MB

 Unique  (cost=197699.46..204176.80 rows=1 width=20) (actual time=1559.037..1712.098 rows=1 loops=1)
   Buffers: shared hit=53222
   ->  Sort  (cost=197699.46..200938.13 rows=1295469 width=20) (actual time=1559.036..1635.336 rows=1302740 loops=1)
         Sort Key: base_name, time_noise DESC
         Sort Method: quicksort  Memory: 150929kB
         Buffers: shared hit=53222
         ->  Seq Scan on noise  (cost=0.00..66176.69 rows=1295469 width=20) (actual time=0.007..232.343 rows=1302740 loops=1)
               Buffers: shared hit=53222
 Planning time: 0.054 ms
 Execution time: 1713.740 ms

I found another approach, 10times faster:

select (base_name, time_noise, level)
from noise 
where (base_name, time_noise) in (select base_name, max(time_noise) from noise group by base_name);

And create INDEX for time_noise.

    QUERY PLAN                                                                           
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=81714.75..81719.74 rows=2 width=32) (actual time=193.519..193.537 rows=2 loops=1)
   Buffers: shared hit=74800
   ->  Finalize GroupAggregate  (cost=81714.32..81714.40 rows=2 width=15) (actual time=193.490..193.493 rows=2 loops=1)
         Group Key: noise_1.base_name
         Buffers: shared hit=74792
         ->  Sort  (cost=81714.32..81714.34 rows=8 width=15) (actual time=193.483..193.485 rows=10 loops=1)
               Sort Key: noise_1.base_name
               Sort Method: quicksort  Memory: 25kB
               Buffers: shared hit=74792
               ->  Gather  (cost=81713.38..81714.20 rows=8 width=15) (actual time=193.411..193.462 rows=10 loops=1)
                     Workers Planned: 4
                     Workers Launched: 4
                     Buffers: shared hit=74792
                     ->  Partial HashAggregate  (cost=80713.38..80713.40 rows=2 width=15) (actual time=189.863..189.864 rows=2 loops=5)
                           Group Key: noise_1.base_name
                           Buffers: shared hit=74076
                           ->  Parallel Seq Scan on noise noise_1  (cost=0.00..78500.92 rows=442492 width=15) (actual time=0.014..112.629 rows=353746 loops=5)
                                 Buffers: shared hit=74076
   ->  Index Scan using time_noise on noise  (cost=0.43..2.65 rows=1 width=19) (actual time=0.015..0.017 rows=1 loops=2)
         Index Cond: (time_noise = (max(noise_1.time_noise)))
         Filter: (noise_1.base_name = base_name)
         Rows Removed by Filter: 1
         Buffers: shared hit=8
 Planning time: 0.174 ms
 Execution time: 209.365 ms

Any other suggestions?

Best Answer

Another way is (here I also cut off the range of the date):

select base_name, time_noise, slow
from noise 
where (base_name, time_noise) in (select base_name, max(time_noise) from noise WHERE time_noise >= (NOW() - INTERVAL '5 minutes' ) group by base_name);

~2ms, thats fine.