I've got comments that have 1..N comment_notifications. Each notification basically contains the username of the user notified. Each notification gets a resolved_at timestamp when the user is done with it.
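For context, here's a simplified sketch of the two tables (both have more columns than shown, and I'm glossing over the exact types):

CREATE TABLE comments (
    id                  bigserial PRIMARY KEY,
    commenter_username  varchar NOT NULL,
    -- also github_id, commit_sha, body, commented_at, path, position, payload, ...
    inserted_at         timestamp NOT NULL,
    updated_at          timestamp NOT NULL
);

CREATE TABLE comment_notifications (
    id           bigserial PRIMARY KEY,
    comment_id   bigint NOT NULL REFERENCES comments (id),
    username     varchar NOT NULL,  -- the user notified
    resolved_at  timestamp,         -- set when the user is done with it
    inserted_at  timestamp NOT NULL,
    updated_at   timestamp NOT NULL
);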
This query is slower than I'd like: I'm seeing times around 100–150 ms in production (hobby DB plan on Heroku) but would like to get it down to 50 ms or less.
EXPLAIN ANALYZE
SELECT c0."id", c0."resolved_at", c0."username", c0."comment_id", c0."inserted_at", c0."updated_at",
c1."id", c1."github_id", c1."commit_sha", c1."body", c1."commented_at", c1."commenter_username", c1."path", c1."position", c1."payload", c1."inserted_at", c1."updated_at"
FROM "comment_notifications" AS c0
INNER JOIN "comments" AS c1 ON c1."id" = c0."comment_id"
WHERE
(NOT (c0."resolved_at" IS NULL)) AND
(LOWER(c1."commenter_username") = 'henrik')
ORDER BY c0."resolved_at" DESC
LIMIT 300
These are some of my indexes:
CREATE INDEX comment_notifications_resolved_at_index ON public.comment_notifications USING btree (resolved_at);
CREATE INDEX comment_notifications_lower_username_index ON public.comment_notifications USING btree (lower((username)::text));
This is the result of the EXPLAIN ANALYZE:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=57.86..57.86 rows=5 width=298) (actual time=1.931..1.969 rows=300 loops=1)
-> Sort (cost=57.86..57.86 rows=5 width=298) (actual time=1.930..1.949 rows=300 loops=1)
Sort Key: c0.resolved_at DESC
Sort Method: quicksort Memory: 238kB
-> Hash Join (cost=43.64..57.85 rows=5 width=298) (actual time=1.101..1.520 rows=459 loops=1)
Hash Cond: (c0.comment_id = c1.id)
-> Seq Scan on comment_notifications c0 (cost=0.00..13.75 rows=866 width=48) (actual time=0.014..0.177 rows=871 loops=1)
Filter: (resolved_at IS NOT NULL)
Rows Removed by Filter: 46
-> Hash (cost=43.62..43.62 rows=6 width=250) (actual time=1.051..1.051 rows=473 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 140kB
-> Seq Scan on comments c1 (cost=0.00..43.62 rows=6 width=250) (actual time=0.058..0.805 rows=473 loops=1)
Filter: (lower((commenter_username)::text) = 'henrik'::text)
Rows Removed by Filter: 387
Planning Time: 1.221 ms
Execution Time: 2.056 ms
(16 rows)
My reading is that the username filter is the most expensive part by far. But I have an index on LOWER(username), as seen above. Any thoughts on why it's not being used? Any suggestions for what I could improve, or further experiments I could make?
Best Answer
Two different indexes for your query:
As you found out yourself, you had the column comment_notifications.username indexed, which is irrelevant for the case. You need this one instead:
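Something along these lines, matching the expression the query actually filters on (the index name here is just a placeholder):

CREATE INDEX comments_lower_commenter_username_idx
    ON comments (lower(commenter_username));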
More importantly, the plan also shows a sequential scan on comment_notifications - the bigger table. Another index on (comment_id) seems to be missing. Or, tailored to the query, this partial multicolumn index:
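One plausible shape, tailored to the WHERE and ORDER BY of the query (again, the name is a placeholder):

CREATE INDEX comment_notifications_comment_id_resolved_at_idx
    ON comment_notifications (comment_id, resolved_at DESC)
    WHERE resolved_at IS NOT NULL;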
Run ANALYZE on the tables after creating these (and possibly deleting the ones you had).
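In this case that's simply:

ANALYZE comment_notifications;
ANALYZE comments;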
Adding WHERE resolved_at IS NOT NULL only makes sense if a sizable proportion of rows with NULL are excluded.

If you can change the ORDER BY, the second index can be used much more efficiently yet. If you only added AND cn.resolved_at IS NOT NULL to get rid of the (few) rows with NULL being listed first, there is a more elegant solution with NULLS LAST.

Query:
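A sketch of the reworked query, keeping your column list (aliases renamed for readability; the IS NOT NULL predicate is dropped because NULLS LAST pushes those rows to the end instead):

SELECT cn.id, cn.resolved_at, cn.username, cn.comment_id, cn.inserted_at, cn.updated_at,
       c.id, c.github_id, c.commit_sha, c.body, c.commented_at, c.commenter_username,
       c.path, c."position", c.payload, c.inserted_at, c.updated_at
FROM   comment_notifications cn
JOIN   comments c ON c.id = cn.comment_id
WHERE  lower(c.commenter_username) = 'henrik'
ORDER  BY cn.resolved_at DESC NULLS LAST
LIMIT  300;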
Matching index:
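Along these lines (the name is once more a placeholder):

CREATE INDEX comment_notifications_comment_id_resolved_at_nulls_last_idx
    ON comment_notifications (comment_id, resolved_at DESC NULLS LAST);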
Aside 1: Why the cast to text? What's the actual data type of username and commenter_username? (Should be text to begin with.) And why the need for lower()? User names should not be saved with varying upper/lower case spellings to begin with; one way to enforce that is sketched below.

Aside 2: The explain plan shows low cardinalities. Indexes pay off for big tables, not so much for small tables, where queries are fast either way. 2 ms doesn't seem too bad. 150 ms seems to be a misunderstanding, probably including network latency or display time.
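For Aside 1, a minimal sketch of enforcing consistently lower-cased usernames at the database level (the constraint name is hypothetical; the citext extension would be another option for case-insensitive matching):

ALTER TABLE comments
    ADD CONSTRAINT comments_commenter_username_is_lower
    CHECK (commenter_username = lower(commenter_username));

With that guaranteed, the query can compare commenter_username = 'henrik' directly, and a plain index on the column applies.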