I'm trying to select the most recent authors of messages in a room, using this (simplified) table:
Table "public.message"
Column | Type | Nullable | Default
---------+---------+----------+-------------------------------------
id | bigint | not null | nextval('message_id_seq'::regclass)
room | integer | not null |
author | integer | not null |
created | integer | not null |
Indexes:
"message_pkey" PRIMARY KEY, btree (id)
"message_author_created_room" btree (author, created, room)
"message_room_author_created" btree (room, author, created)
"message_room_created" btree (room, created)
"message_room_id" btree (room, id)
The problem is a query like this is slow:
select message.author as id, max(message.created) as mc from message
where room=12 group by message.author order by mc desc limit 50;
And here's the explain(analyze, verbose, buffers)
:
miaou=> explain (analyze, verbose, buffers) select message.author as id, max(message.created) as mc from message
where room=12 group by message.author order by mc desc limit 50;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=10627.14..10627.26 rows=50 width=8) (actual time=54.887..54.901 rows=50 loops=1)
Output: author, (max(created))
Buffers: shared hit=490
-> Sort (cost=10627.14..10629.19 rows=820 width=8) (actual time=54.885..54.891 rows=50 loops=1)
Output: author, (max(created))
Sort Key: (max(message.created)) DESC
Sort Method: top-N heapsort Memory: 29kB
Buffers: shared hit=490
-> Finalize GroupAggregate (cost=1000.46..10599.90 rows=820 width=8) (actual time=14.019..54.788 rows=160 loops=1)
Output: author, max(created)
Group Key: message.author
Buffers: shared hit=490
-> Gather Merge (cost=1000.46..10583.50 rows=1640 width=8) (actual time=14.007..54.636 rows=248 loops=1)
Output: author, (PARTIAL max(created))
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=490
-> Partial GroupAggregate (cost=0.43..9394.18 rows=820 width=8) (actual time=3.439..34.733 rows=83 loops=3)
Output: author, PARTIAL max(created)
Group Key: message.author
Buffers: shared hit=2989
Worker 0: actual time=0.297..49.593 rows=116 loops=1
Buffers: shared hit=1550
Worker 1: actual time=6.624..40.612 rows=60 loops=1
Buffers: shared hit=949
-> Parallel Index Only Scan using message_room_author_created on public.message (cost=0.43..8904.09 rows=96377 width=8) (actual time=0.030..20.067 rows=73907 loops=3)
Output: author, created
Index Cond: (message.room = 12)
Heap Fetches: 139
Buffers: shared hit=2989
Worker 0: actual time=0.035..28.355 rows=109834 loops=1
Buffers: shared hit=1550
Worker 1: actual time=0.030..23.723 rows=79112 loops=1
Buffers: shared hit=949
Planning time: 0.211 ms
Execution time: 57.071 ms
I'd like to know how to make that faster, the real important goal being to get the N most recent authors. Is there a faster way to query that information?
Best Answer
Because author can not be null just do
SELECT count(*)
. Doing so may result in this being an index-only scan on eithermessage_room_created
ormessage_room_id
. If you need the unique authors, which you don't have you'll have to usecount( DISTINCT author )
.Also this being planned as a parallel query. That's relatively new functionality. You may want to try
SET max_parallel_workers_per_gather = 0
to disable parallel query functionality and reposting the results.Either way
If you've got millions rows an index scan and heap fetches, it's not really fair to call 45ms "slow."