Postgresql – Optimizing slow SELECT query (postgresql 9.6.5)

performancepostgresqlquery-performance

In an app called Links, users post interesting content in a forum like interface, and others post replies or comments under this publicly posted content.

These posted replies are saved in a postgresql table named links_publicreply (postgresql 9.6.5 DB).

One query run on the table keep on cropping up in the slow_log (greater than 500ms). This relates to showing the most-recent 25 public replies accumulated under a given piece of shared content.

Here's a sample from the slow log:

LOG: duration: 1614.030 ms statement:

 SELECT "links_publicreply"."id",
       "links_publicreply"."submitted_by_id",
       "links_publicreply"."answer_to_id",
       "links_publicreply"."submitted_on",
       "links_publicreply"."description",
       "links_publicreply"."abuse",
       "auth_user"."id",
       "auth_user"."username",
       "links_userprofile"."id",
       "links_userprofile"."user_id",
       "links_userprofile"."score",
       "links_userprofile"."avatar",
       "links_link"."id",
       "links_link"."description",
       "links_link"."submitter_id",
       "links_link"."submitted_on",
       "links_link"."reply_count",
       "links_link"."latest_reply_id"
FROM   "links_publicreply"
       INNER JOIN "links_link"
               ON ( "links_publicreply"."answer_to_id" = "links_link"."id" )
       INNER JOIN "auth_user"
               ON ( "links_publicreply"."submitted_by_id" = "auth_user"."id" )
       LEFT OUTER JOIN "links_userprofile"
                    ON ( "auth_user"."id" = "links_userprofile"."user_id" )
WHERE  "links_publicreply"."answer_to_id" = 8936203
ORDER  BY "links_publicreply"."id" DESC
LIMIT  25  

Here are the explain analyze results of the said query: https://explain.depesz.com/s/pVZ5

According to that, ~70% of the query time seems to be taken up by index scan. But to an accidental DBA like myself, it isn't immediately obvious what optimization I can do to make this more performant. Perhaps a composite index on links_publicreply.answer_to_id, links_publicreply.id?

It would greatly help me learn if a domain expert can furnish guidance + intuition on solving this class of problem.


P.s. \d links_publicreply is:

                                      Table "public.links_publicreply"
     Column      |           Type           |                           Modifiers                            
-----------------+--------------------------+----------------------------------------------------------------
 id              | integer                  | not null default nextval('links_publicreply_id_seq'::regclass)
 submitted_by_id | integer                  | not null
 answer_to_id    | integer                  | not null
 submitted_on    | timestamp with time zone | not null
 description     | text                     | not null
 category        | character varying(20)    | not null
 seen            | boolean                  | not null
 abuse           | boolean                  | not null
 device          | character varying(10)    | default '1'::character varying
Indexes:
    "links_publicreply_pkey" PRIMARY KEY, btree (id)
    "links_publicreply_answer_to_id" btree (answer_to_id)
    "links_publicreply_submitted_by_id" btree (submitted_by_id)
Foreign-key constraints:
    "links_publicreply_answer_to_id_fkey" FOREIGN KEY (answer_to_id) REFERENCES links_link(id) DEFERRABLE INITIALLY DEFERRED
    "links_publicreply_submitted_by_id_fkey" FOREIGN KEY (submitted_by_id) REFERENCES auth_user(id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
    TABLE "links_report" CONSTRAINT "links_report_which_publicreply_id_fkey" FOREIGN KEY (which_publicreply_id) REFERENCES links_publicreply(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "links_seen" CONSTRAINT "links_seen_which_reply_id_fkey" FOREIGN KEY (which_reply_id) REFERENCES links_publicreply(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "links_link" CONSTRAINT "publicreplyposter_link_fkey" FOREIGN KEY (latest_reply_id) REFERENCES links_publicreply(id) ON UPDATE CASCADE ON DELETE CASCADE

Best Answer

Firstly, trying to reduce the number of calling the index. This filter answer_to_id = 8936203 returns 14740 rows which used to check on the other tables. However you need only top 25 of id. What if you LIMIT 25 then JOIN the other tables.

WITH tmp_links_publicreply AS (
   SELECT ...
   FROM links_publicreply
   WHERE  "links_publicreply"."answer_to_id" = 8936203
   ORDER  BY "links_publicreply"."id" DESC
   LIMIT  25  
)
SELECT 
FROM tmp_links_publicreply t 
JOIN ... 
JOIN ...

The query above would work correctly if you have the constraint between links_publicreply and 2 tables links_link and auth_user. Why? Supposed that, you LIMIT 25, then found nothing when JOIN because there is no rows in links_link related to answer_to_id = 8936203.

Then, creating the new index on answer_to_id, id DESC.

Note: WITH (query above) called Common Table Expressions