SELECT count(c.id) as clickCount
FROM clicks c
LEFT JOIN links l on c.link_id = l.id
LEFT JOIN user_agents ua on c.user_agent_id = ua.id
AND ua.robot IS NULL
WHERE l.user_id = ?
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=337670.79..337670.80 rows=1 width=8) (actual time=3508.630..3508.630 rows=1 loops=1)
Buffers: shared hit=448334
-> Nested Loop (cost=0.84..334057.39 rows=1445360 width=4) (actual time=0.041..3040.606 rows=6110334 loops=1)
Buffers: shared hit=448334
-> Index Scan using link_user_idx on links l (cost=0.28..73.63 rows=136 width=4) (actual time=0.017..0.093 rows=136 loops=1)
Index Cond: (user_id = 1125)
Buffers: shared hit=24
-> Index Scan using click_link_idx on clicks c (cost=0.56..2208.66 rows=24710 width=12) (actual time=0.003..16.136 rows=44929 loops=136)
Index Cond: (link_id = l.id)
Buffers: shared hit=448310
Planning Time: 0.512 ms
Execution Time: 3508.683 ms
3.5 seconds
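Two things stand out in this plan. First, the planner has already eliminated the user_agents join entirely (there is no user_agents scan above) because nothing from ua appears in the output, so the AND ua.robot IS NULL condition filters nothing here. Second, the WHERE clause on l.user_id turns the LEFT JOIN to links into an effective inner join. If that matches the intent, the query can be written as (a sketch, assuming clicks.id is NOT NULL so count(*) is equivalent to count(c.id)):

```sql
-- Sketch: same result as the original query under the assumptions above.
SELECT count(*) AS clickCount
FROM clicks c
JOIN links l ON c.link_id = l.id
WHERE l.user_id = ?;
```

With count(*) instead of count(c.id), the planner is also free to satisfy the clicks side with an index-only scan on click_link_idx (provided the table is well vacuumed), which could cut most of the 448k buffer accesses that currently come from heap fetches.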
SELECT count(c.id) as clickCount, l.location
FROM clicks c
LEFT JOIN links l on c.link_id = l.id
LEFT JOIN user_agents ua on c.user_agent_id = ua.id
AND ua.robot IS NULL
WHERE l.user_id = 1125
GROUP BY l.location
ORDER BY clickCount, location DESC
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=495348.44..495348.49 rows=18 width=524) (actual time=1978.261..1978.261 rows=1 loops=1)
Sort Key: (count(c.id)), l.location DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=23740 read=38820
-> Finalize GroupAggregate (cost=495334.17..495348.07 rows=18 width=524) (actual time=1978.251..1978.251 rows=1 loops=1)
Group Key: l.location
Buffers: shared hit=23740 read=38820
-> Gather Merge (cost=495334.17..495347.35 rows=108 width=524) (actual time=1978.231..1997.869 rows=7 loops=1)
Workers Planned: 6
Workers Launched: 6
Buffers: shared hit=153563 read=257052
-> Sort (cost=494334.08..494334.12 rows=18 width=524) (actual time=1956.989..1956.990 rows=1 loops=7)
Sort Key: l.location DESC
Sort Method: quicksort Memory: 25kB
Worker 0: Sort Method: quicksort Memory: 25kB
Worker 1: Sort Method: quicksort Memory: 25kB
Worker 2: Sort Method: quicksort Memory: 25kB
Worker 3: Sort Method: quicksort Memory: 25kB
Worker 4: Sort Method: quicksort Memory: 25kB
Worker 5: Sort Method: quicksort Memory: 25kB
Buffers: shared hit=153563 read=257052
-> Partial HashAggregate (cost=494333.52..494333.70 rows=18 width=524) (actual time=1956.955..1956.956 rows=1 loops=7)
Group Key: l.location
Buffers: shared hit=153515 read=257052
-> Hash Join (cost=22.52..494165.25 rows=33654 width=520) (actual time=42.709..1738.826 rows=872905 loops=7)
Hash Cond: (c.link_id = l.id)
Buffers: shared hit=153515 read=257052
-> Parallel Seq Scan on clicks c (cost=0.00..476696.35 rows=6638735 width=12) (actual time=0.045..884.436 rows=5690344 loops=7)
Buffers: shared hit=153257 read=257052
-> Hash (cost=22.29..22.29 rows=19 width=520) (actual time=0.219..0.219 rows=136 loops=7)
Buckets: 1024 Batches: 1 Memory Usage: 15kB
Buffers: shared hit=174
-> Bitmap Heap Scan on links l (cost=1.53..22.29 rows=19 width=520) (actual time=0.050..0.186 rows=136 loops=7)
Recheck Cond: (user_id = 1125)
Heap Blocks: exact=22
Buffers: shared hit=174
-> Bitmap Index Scan on link_user_idx (cost=0.00..1.52 rows=19 width=0) (actual time=0.036..0.036 rows=136 loops=7)
Index Cond: (user_id = 1125)
Buffers: shared hit=20
Planning Time: 0.334 ms
Execution Time: 1998.071 ms
1.9 seconds
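For the grouped query the planner abandons click_link_idx and instead hash-joins a full parallel scan of clicks. One likely culprit is visible in the estimates: it expects 19 links rows but finds 136, and expects 33,654 joined rows but finds ~6.1M, so its costing of the alternatives is badly off. A low-risk first step is to raise the statistics targets on the skewed columns so the estimates improve (a sketch; 500 is an illustrative value, the current default_statistics_target is 100):

```sql
-- Collect finer-grained statistics on the columns driving the bad estimates,
-- then re-analyze so the planner sees them.
ALTER TABLE links ALTER COLUMN user_id SET STATISTICS 500;
ALTER TABLE clicks ALTER COLUMN link_id SET STATISTICS 500;
ANALYZE links;
ANALYZE clicks;
```

If the estimates come into line, the planner may return to the index-driven plan it chose for the first query.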
SELECT sum(r.amount * c.revshare_influencer) as amount, r.month as month, r.year as year
FROM revenue r
JOIN clicks c on c.id = r.click_id
WHERE r.user_id = 1125
GROUP BY r.year, r.month
ORDER BY r.year desc, r.month desc
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=57546.34..62229.59 rows=22711 width=40) (actual time=2765.425..2914.399 rows=13 loops=1)
Group Key: r.year, r.month
Buffers: shared hit=1755012 read=8976, temp read=1198 written=1204
-> Gather Merge (cost=57546.34..61621.83 rows=32388 width=40) (actual time=2762.219..2936.577 rows=52 loops=1)
Workers Planned: 3
Workers Launched: 3
Buffers: shared hit=6407432 read=20902, temp read=4374 written=4398
-> Partial GroupAggregate (cost=56546.30..56816.20 rows=10796 width=40) (actual time=2749.492..2865.023 rows=13 loops=4)
Group Key: r.year, r.month
Buffers: shared hit=6407432 read=20902, temp read=4374 written=4398
-> Sort (cost=56546.30..56573.29 rows=10796 width=52) (actual time=2681.282..2736.792 rows=318604 loops=4)
Sort Key: r.year DESC, r.month DESC
Sort Method: external merge Disk: 9584kB
Worker 0: Sort Method: external merge Disk: 8480kB
Worker 1: Sort Method: external merge Disk: 8608kB
Worker 2: Sort Method: external merge Disk: 8320kB
Buffers: shared hit=6407432 read=20902, temp read=4374 written=4398
-> Nested Loop (cost=361.57..55823.06 rows=10796 width=52) (actual time=113.962..2503.475 rows=318604 loops=4)
Buffers: shared hit=6407411 read=20902
-> Parallel Bitmap Heap Scan on revenue r (cost=361.01..27825.97 rows=10796 width=40) (actual time=113.683..358.398 rows=318604 loops=4)
Recheck Cond: (user_id = 1125)
Heap Blocks: exact=14271
Buffers: shared hit=37228 read=15519
-> Bitmap Index Scan on revenue_user_idx (cost=0.00..352.64 rows=33468 width=0) (actual time=104.289..104.289 rows=1274414 loops=1)
Index Cond: (user_id = 1125)
Buffers: shared read=3485
-> Index Scan using clicks_idx_id on clicks c (cost=0.56..2.59 rows=1 width=20) (actual time=0.006..0.006 rows=1 loops=1274414)
Index Cond: (id = r.click_id)
Buffers: shared hit=6370183 read=5383
Planning Time: 0.369 ms
Execution Time: 2939.455 ms
2.9 seconds
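This plan pays two separate costs: every worker's sort spills to disk (Sort Method: external merge Disk: ~8-9MB each) because work_mem is only 5242kB, and the nested loop performs 1.27M single-row index lookups into clicks. Two things worth trying, as a sketch — the index name is hypothetical and the values are illustrative:

```sql
-- Session-level bump so the per-worker sorts (roughly 10MB each, per the
-- plan) fit in memory instead of spilling to temp files.
SET work_mem = '64MB';

-- Hypothetical covering index: returns one user's revenue rows already in
-- (year, month) order, potentially letting the planner skip the sort.
CREATE INDEX CONCURRENTLY revenue_user_year_month_idx
    ON revenue (user_id, year DESC, month DESC);
```

Neither change touches the 1.27M lookups into clicks; if revshare_influencer could be denormalised onto revenue, that join would disappear entirely, but that is a schema decision rather than a tuning one.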
The clicks table contains ~35M rows and the revenue table contains ~56M rows. The server is a Linode with 32 GB of RAM and 16 cores, running Ubuntu.
What have I done about this so far?
My background is in MySQL, and I haven't dealt with this volume of data before. I have been researching how to improve this database's performance and have been tweaking Postgres settings (work_mem, shared_buffers) with zero improvement. I have also tried to understand the query plans, which I am not very good at reading.
I am considering table partitioning and/or materialized views, but I have also read that table partitioning might not be effective. Or would tweaking these queries give at least a 50% performance bump? Since these tables will keep piling up, what is the ideal way to overcome this issue?
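On the materialized-view idea: since the per-month revenue query returns only 13 rows, pre-aggregating it is a natural fit when slightly stale results are acceptable. A minimal sketch, with illustrative names:

```sql
-- Pre-aggregate revenue per user and month; refresh on a schedule.
CREATE MATERIALIZED VIEW revenue_monthly AS
SELECT r.user_id,
       r.year,
       r.month,
       sum(r.amount * c.revshare_influencer) AS amount
FROM revenue r
JOIN clicks c ON c.id = r.click_id
GROUP BY r.user_id, r.year, r.month;

-- Unique index: required for CONCURRENTLY, and makes per-user reads fast.
CREATE UNIQUE INDEX ON revenue_monthly (user_id, year, month);

-- Rebuild without blocking readers (re-runs the 56M-row join each time).
REFRESH MATERIALIZED VIEW CONCURRENTLY revenue_monthly;
```

Reads then become a trivial index lookup (SELECT amount, month, year FROM revenue_monthly WHERE user_id = 1125 ORDER BY year DESC, month DESC), at the cost of periodically paying for the full join during each refresh.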
Postgres Version:
PostgreSQL 11.5 (Ubuntu 11.5-1.pgdg16.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, 64-bit
Postgres Settings :
max_connections = 200
shared_buffers = 8GB
effective_cache_size = 24GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 5242kB
min_wal_size = 1GB
max_wal_size = 2GB
max_worker_processes = 16
max_parallel_workers_per_gather = 8
max_parallel_workers = 16
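One observation on these settings: work_mem = 5242kB is very small for a 32 GB machine, presumably to guard against max_connections = 200 sessions each using several multiples of it. Rather than raising it globally, it can be scoped to just the role (or session) that runs these reporting queries — a sketch, with a hypothetical role name:

```sql
-- Only sessions logged in as this role get the larger per-sort budget;
-- the global 5242kB default still protects the other 200 connections.
ALTER ROLE reporting SET work_mem = '64MB';
```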
Best Answer
Things that have helped so far (from ~10-15s to ~2-4s):
- Upgrading from PostgreSQL 9.5 to 11.5 (allowed for parallelisation, possibly other benefits too)
- DISTINCT
A couple of further ideas:
Additionally, there are several tools that make understanding and sharing query execution plans easier.
Disclaimer: I work on the most recent one listed, pgMustard.
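For reference, such tools generally consume the full output of EXPLAIN with ANALYZE and BUFFERS enabled, often in JSON form, e.g.:

```sql
-- Capture a complete, machine-readable plan for sharing or visualisation.
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT count(c.id) AS clickCount
FROM clicks c
JOIN links l ON c.link_id = l.id
WHERE l.user_id = 1125;
```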