I have a table:
CREATE TABLE noise (
id bigint NOT NULL,
time_noise timestamp without time zone NOT NULL,
base_name text,
level real,
);
And I want to get last data (by time) for every base_name:
\timing on
SELECT DISTINCT ON (base_name) base_name, time_noise, level
FROM noise
ORDER BY base_name, time_noise DESC;
But this query is too slow, ~2s.
explain (analyze, buffers):
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Unique (cost=197699.46..204176.80 rows=1 width=20) (actual time=1617.637..1795.908 rows=1 loops=1)
Buffers: shared hit=53222, temp read=4775 written=4775
-> Sort (cost=197699.46..200938.13 rows=1295469 width=20) (actual time=1617.636..1719.406 rows=1302740 loops=1)
Sort Key: base_name, time_noise DESC
Sort Method: external merge Disk: 38192kB
Buffers: shared hit=53222, temp read=4775 written=4775
-> Seq Scan on noise (cost=0.00..66176.69 rows=1295469 width=20) (actual time=0.006..223.798 rows=1302740 loops=1)
Buffers: shared hit=53222
Planning time: 0.053 ms
Execution time: 1803.080 ms
How can it be improved?
with work_mem=160MB
Unique (cost=197699.46..204176.80 rows=1 width=20) (actual time=1559.037..1712.098 rows=1 loops=1)
Buffers: shared hit=53222
-> Sort (cost=197699.46..200938.13 rows=1295469 width=20) (actual time=1559.036..1635.336 rows=1302740 loops=1)
Sort Key: base_name, time_noise DESC
Sort Method: quicksort Memory: 150929kB
Buffers: shared hit=53222
-> Seq Scan on noise (cost=0.00..66176.69 rows=1295469 width=20) (actual time=0.007..232.343 rows=1302740 loops=1)
Buffers: shared hit=53222
Planning time: 0.054 ms
Execution time: 1713.740 ms
I found another approach, 10times faster:
select (base_name, time_noise, level)
from noise
where (base_name, time_noise) in (select base_name, max(time_noise) from noise group by base_name);
And create INDEX
for time_noise.
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=81714.75..81719.74 rows=2 width=32) (actual time=193.519..193.537 rows=2 loops=1)
Buffers: shared hit=74800
-> Finalize GroupAggregate (cost=81714.32..81714.40 rows=2 width=15) (actual time=193.490..193.493 rows=2 loops=1)
Group Key: noise_1.base_name
Buffers: shared hit=74792
-> Sort (cost=81714.32..81714.34 rows=8 width=15) (actual time=193.483..193.485 rows=10 loops=1)
Sort Key: noise_1.base_name
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=74792
-> Gather (cost=81713.38..81714.20 rows=8 width=15) (actual time=193.411..193.462 rows=10 loops=1)
Workers Planned: 4
Workers Launched: 4
Buffers: shared hit=74792
-> Partial HashAggregate (cost=80713.38..80713.40 rows=2 width=15) (actual time=189.863..189.864 rows=2 loops=5)
Group Key: noise_1.base_name
Buffers: shared hit=74076
-> Parallel Seq Scan on noise noise_1 (cost=0.00..78500.92 rows=442492 width=15) (actual time=0.014..112.629 rows=353746 loops=5)
Buffers: shared hit=74076
-> Index Scan using time_noise on noise (cost=0.43..2.65 rows=1 width=19) (actual time=0.015..0.017 rows=1 loops=2)
Index Cond: (time_noise = (max(noise_1.time_noise)))
Filter: (noise_1.base_name = base_name)
Rows Removed by Filter: 1
Buffers: shared hit=8
Planning time: 0.174 ms
Execution time: 209.365 ms
Any other suggestions?
Best Answer
Another way is (here I also cut off the range of the date):
~2ms, thats fine.