Postgresql – Optimizing SELECT query consistently showing up in slow-log

performancepostgresqlpostgresql-9.6query-performance

In a app called 'Links', users post links and photos of interesting content they've discovered recently (and others comment on the said postings).

These posted comments under the photos are saved in a links_photocomment table in my postgresql 9.6.5 DB.

One SELECT query on the links_photocomment table is consistently showing up in slow_log. It's taking longer than 500ms, and is ~10X slower than what I'm experiencing in most other postgresql operations.

Here's an example of the corresponding SQL from my slow log:

LOG: duration: 5071.112 ms statement:

SELECT "links_photocomment"."abuse",
       "links_photocomment"."text",
       "links_photocomment"."id",
       "links_photocomment"."submitted_by_id",
       "links_photocomment"."submitted_on",
       "auth_user"."username",
       "links_userprofile"."score"
FROM   "links_photocomment"
       INNER JOIN "auth_user"
               ON ( "links_photocomment"."submitted_by_id" = "auth_user"."id" )
       LEFT OUTER JOIN "links_userprofile"
                    ON ( "auth_user"."id" = "links_userprofile"."user_id" )
WHERE  "links_photocomment"."which_photo_id" = 3115087
ORDER  BY "links_photocomment"."id" DESC
LIMIT  25;

See the the explain analyze results: https://explain.depesz.com/s/UuCk

The query ends up filtering 19,100,179 rows according to that!

What I've tried:

My gut feel is that Postgres bases this query plan on misleading statistics. Thus, I've run VACUUM ANALYZE on the said table. However this hasn't changed anything.

Being an accidental DBA of sorts, I'm looking for some quick expert guidance on the subject. Thanks in advance and apologies for the noob question (if it is).

Addendum:

Here's the entire output for \d links_photocomment:

                                      Table "public.links_photocomment"
     Column      |           Type           |                            Modifiers                            
-----------------+--------------------------+-----------------------------------------------------------------
 id              | integer                  | not null default nextval('links_photocomment_id_seq'::regclass)
 which_photo_id  | integer                  | not null
 text            | text                     | not null
 device          | character varying(10)    | not null
 submitted_by_id | integer                  | not null
 submitted_on    | timestamp with time zone | not null
 image_comment   | character varying(100)   | not null
 has_image       | boolean                  | not null
 abuse           | boolean                  | default false
Indexes:
    "links_photocomment_pkey" PRIMARY KEY, btree (id)
    "links_photocomment_submitted_by_id" btree (submitted_by_id)
    "links_photocomment_which_photo_id" btree (which_photo_id)
Foreign-key constraints:
    "links_photocomment_submitted_by_id_fkey" FOREIGN KEY (submitted_by_id) REFERENCES auth_user(id) DEFERRABLE INITIALLY DEFERRED
    "links_photocomment_which_photo_id_fkey" FOREIGN KEY (which_photo_id) REFERENCES links_photo(id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
    TABLE "links_photo" CONSTRAINT "latest_comment_id_refs_id_f2566197" FOREIGN KEY (latest_comment_id) REFERENCES links_photocomment(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "links_report" CONSTRAINT "links_report_which_photocomment_id_fkey" FOREIGN KEY (which_photocomment_id) REFERENCES links_photocomment(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "links_photo" CONSTRAINT "second_latest_comment_id_refs_id_f2566197" FOREIGN KEY (second_latest_comment_id) REFERENCES links_photocomment(id) DEFERRABLE INITIALLY DEFERRED

Best Answer

The plan is not using the index on (which_photo_id) but the PK (id) index instead, so it has to read a very large part of the index (and all of it if there are less than 25 rows that match the filter). This takes about 4.4 seconds in the specific execution (and finds the 25 rows after reading and rejecting 19M rows):

-> Index Scan Backward using links_photocomment_pkey on links_photocomment  
   (cost=0.57..2,819,246.22 rows=7,195 width=41)
   (actual time=555.830..4,929.154 rows=25 loops=1)

   Filter: (which_photo_id = 3115087)
   Rows Removed by Filter: 19100179

I would try these:

  • replacing the index on (which_photo_id) with an index on (which_photo_id, id).

  • rewrite the INNER join to a LEFT join (there is a FOREIGN KEY constraint that ensures that the two queries will produce the same results.)

  • rewrite with a subquery (derived table or CTE) moving the WHERE filter inside) in order to first get the 25 ids (hopefully with an index only scan) and then join the other 2 tables.

Query (with derived table):

SELECT "links_photocomment"."abuse",
       "links_photocomment"."text",
       "links_photocomment"."id",
       "links_photocomment"."submitted_by_id",
       "links_photocomment"."submitted_on",
       "auth_user"."username",
       "links_userprofile"."score"
FROM   
       (
         SELECT id
         FROM links_photocomment
         WHERE which_photo_id = 3115087
         ORDER BY id DESC
         LIMIT 25
       ) AS lim
       INNER JOIN "links_photocomment"
           ON ( "links_photocomment"."id" = lim.id )
       LEFT OUTER JOIN "auth_user"
           ON ( "links_photocomment"."submitted_by_id" = "auth_user"."id" )
       LEFT OUTER JOIN "links_userprofile"
           ON ( "auth_user"."id" = "links_userprofile"."user_id" )
ORDER  BY lim.id DESC
LIMIT  25;

Query (with CTE):

WITH lim AS
       (
         SELECT id
         FROM links_photocomment
         WHERE which_photo_id = 3115087
         ORDER BY id DESC
         LIMIT 25
       )
SELECT "links_photocomment"."abuse",
       "links_photocomment"."text",
       "links_photocomment"."id",
       "links_photocomment"."submitted_by_id",
       "links_photocomment"."submitted_on",
       "auth_user"."username",
       "links_userprofile"."score"
FROM   
       lim
       INNER JOIN "links_photocomment"
           ON ( "links_photocomment"."id" = lim.id )
       LEFT OUTER JOIN "auth_user"
           ON ( "links_photocomment"."submitted_by_id" = "auth_user"."id" )
       LEFT OUTER JOIN "links_userprofile"
           ON ( "auth_user"."id" = "links_userprofile"."user_id" )
ORDER  BY lim.id DESC
LIMIT  25;