In an app called Links
, users post interesting content in a forum like interface, and others post replies or comments under this publicly posted content.
These posted replies are saved in a postgresql table named links_publicreply
(postgresql 9.6.5 DB).
One query run on the table keep on cropping up in the slow_log
(greater than 500ms
). This relates to showing the most-recent 25 public replies accumulated under a given piece of shared content.
Here's a sample from the slow log:
LOG: duration: 1614.030 ms statement:
SELECT "links_publicreply"."id",
"links_publicreply"."submitted_by_id",
"links_publicreply"."answer_to_id",
"links_publicreply"."submitted_on",
"links_publicreply"."description",
"links_publicreply"."abuse",
"auth_user"."id",
"auth_user"."username",
"links_userprofile"."id",
"links_userprofile"."user_id",
"links_userprofile"."score",
"links_userprofile"."avatar",
"links_link"."id",
"links_link"."description",
"links_link"."submitter_id",
"links_link"."submitted_on",
"links_link"."reply_count",
"links_link"."latest_reply_id"
FROM "links_publicreply"
INNER JOIN "links_link"
ON ( "links_publicreply"."answer_to_id" = "links_link"."id" )
INNER JOIN "auth_user"
ON ( "links_publicreply"."submitted_by_id" = "auth_user"."id" )
LEFT OUTER JOIN "links_userprofile"
ON ( "auth_user"."id" = "links_userprofile"."user_id" )
WHERE "links_publicreply"."answer_to_id" = 8936203
ORDER BY "links_publicreply"."id" DESC
LIMIT 25
Here are the explain analyze
results of the said query: https://explain.depesz.com/s/pVZ5
According to that, ~70% of the query time seems to be taken up by index scan. But to an accidental DBA like myself, it isn't immediately obvious what optimization I can do to make this more performant. Perhaps a composite index on links_publicreply.answer_to_id, links_publicreply.id
?
It would greatly help me learn if a domain expert can furnish guidance + intuition on solving this class of problem.
P.s. \d links_publicreply
is:
Table "public.links_publicreply"
Column | Type | Modifiers
-----------------+--------------------------+----------------------------------------------------------------
id | integer | not null default nextval('links_publicreply_id_seq'::regclass)
submitted_by_id | integer | not null
answer_to_id | integer | not null
submitted_on | timestamp with time zone | not null
description | text | not null
category | character varying(20) | not null
seen | boolean | not null
abuse | boolean | not null
device | character varying(10) | default '1'::character varying
Indexes:
"links_publicreply_pkey" PRIMARY KEY, btree (id)
"links_publicreply_answer_to_id" btree (answer_to_id)
"links_publicreply_submitted_by_id" btree (submitted_by_id)
Foreign-key constraints:
"links_publicreply_answer_to_id_fkey" FOREIGN KEY (answer_to_id) REFERENCES links_link(id) DEFERRABLE INITIALLY DEFERRED
"links_publicreply_submitted_by_id_fkey" FOREIGN KEY (submitted_by_id) REFERENCES auth_user(id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
TABLE "links_report" CONSTRAINT "links_report_which_publicreply_id_fkey" FOREIGN KEY (which_publicreply_id) REFERENCES links_publicreply(id) DEFERRABLE INITIALLY DEFERRED
TABLE "links_seen" CONSTRAINT "links_seen_which_reply_id_fkey" FOREIGN KEY (which_reply_id) REFERENCES links_publicreply(id) DEFERRABLE INITIALLY DEFERRED
TABLE "links_link" CONSTRAINT "publicreplyposter_link_fkey" FOREIGN KEY (latest_reply_id) REFERENCES links_publicreply(id) ON UPDATE CASCADE ON DELETE CASCADE
Best Answer
Firstly, trying to reduce the number of calling the index. This filter
answer_to_id = 8936203
returns14740
rows which used to check on the other tables. However you need only top25
ofid
. What if youLIMIT 25
thenJOIN
the other tables.The query above would work correctly if you have the constraint between
links_publicreply
and 2 tableslinks_link
andauth_user
. Why? Supposed that, youLIMIT 25
, then found nothing whenJOIN
because there is no rows inlinks_link
related toanswer_to_id = 8936203
.Then, creating the new index on
answer_to_id, id DESC
.Note:
WITH
(query above) called Common Table Expressions