In an app called 'Links', users post links and photos of interesting content they've discovered recently (and others comment on those postings). The comments posted under the photos are saved in a links_photocomment table in my PostgreSQL 9.6.5 DB.

One SELECT query on the links_photocomment table is consistently showing up in the slow log. It takes longer than 500 ms, and is roughly 10x slower than what I'm seeing in most other PostgreSQL operations.
Here's an example of the corresponding SQL from my slow log:
LOG: duration: 5071.112 ms statement:
SELECT "links_photocomment"."abuse",
"links_photocomment"."text",
"links_photocomment"."id",
"links_photocomment"."submitted_by_id",
"links_photocomment"."submitted_on",
"auth_user"."username",
"links_userprofile"."score"
FROM "links_photocomment"
INNER JOIN "auth_user"
ON ( "links_photocomment"."submitted_by_id" = "auth_user"."id" )
LEFT OUTER JOIN "links_userprofile"
ON ( "auth_user"."id" = "links_userprofile"."user_id" )
WHERE "links_photocomment"."which_photo_id" = 3115087
ORDER BY "links_photocomment"."id" DESC
LIMIT 25;
See the EXPLAIN ANALYZE results here: https://explain.depesz.com/s/UuCk
The query ends up filtering 19,100,179 rows according to that!
What I've tried:
My gut feeling is that Postgres is basing this query plan on misleading statistics, so I ran VACUUM ANALYZE on the table in question. However, that hasn't changed anything.
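For reference, that amounts to:

-- refresh planner statistics (and reclaim dead tuples) for the table
VACUUM ANALYZE links_photocomment;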
Being an accidental DBA of sorts, I'm looking for some quick expert guidance here. Thanks in advance, and apologies if this turns out to be a noob question.
Addendum:
Here's the entire output of \d links_photocomment:
Table "public.links_photocomment"
Column | Type | Modifiers
-----------------+--------------------------+-----------------------------------------------------------------
id | integer | not null default nextval('links_photocomment_id_seq'::regclass)
which_photo_id | integer | not null
text | text | not null
device | character varying(10) | not null
submitted_by_id | integer | not null
submitted_on | timestamp with time zone | not null
image_comment | character varying(100) | not null
has_image | boolean | not null
abuse | boolean | default false
Indexes:
"links_photocomment_pkey" PRIMARY KEY, btree (id)
"links_photocomment_submitted_by_id" btree (submitted_by_id)
"links_photocomment_which_photo_id" btree (which_photo_id)
Foreign-key constraints:
"links_photocomment_submitted_by_id_fkey" FOREIGN KEY (submitted_by_id) REFERENCES auth_user(id) DEFERRABLE INITIALLY DEFERRED
"links_photocomment_which_photo_id_fkey" FOREIGN KEY (which_photo_id) REFERENCES links_photo(id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
TABLE "links_photo" CONSTRAINT "latest_comment_id_refs_id_f2566197" FOREIGN KEY (latest_comment_id) REFERENCES links_photocomment(id) DEFERRABLE INITIALLY DEFERRED
TABLE "links_report" CONSTRAINT "links_report_which_photocomment_id_fkey" FOREIGN KEY (which_photocomment_id) REFERENCES links_photocomment(id) DEFERRABLE INITIALLY DEFERRED
TABLE "links_photo" CONSTRAINT "second_latest_comment_id_refs_id_f2566197" FOREIGN KEY (second_latest_comment_id) REFERENCES links_photocomment(id) DEFERRABLE INITIALLY DEFERRED
Best Answer
The plan is not using the index on (which_photo_id) but the PK (id) index instead, so it has to read a very large part of that index (and all of it if fewer than 25 rows match the filter). This takes about 4.4 seconds in this specific execution, and finds the 25 rows only after reading and rejecting 19M rows.

I would try these:
- Replace the index on (which_photo_id) with an index on (which_photo_id, id); see the sketch right after this list.
- Rewrite the INNER join as a LEFT join (there is a FOREIGN KEY constraint that ensures the two queries will produce the same results).
- Rewrite with a subquery (derived table or CTE), moving the WHERE filter inside, in order to first get the 25 ids (hopefully with an index-only scan) and then join the other two tables; both variants are shown below.
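For the index change, the DDL would be along these lines (the index name is illustrative; CONCURRENTLY, which cannot run inside a transaction block, avoids blocking writes during the build):

-- Composite index so the newest comments of one photo can be read
-- directly off the index, already ordered by id:
CREATE INDEX CONCURRENTLY links_photocomment_which_photo_id_id
    ON links_photocomment (which_photo_id, id);

-- The old single-column index then becomes redundant:
DROP INDEX CONCURRENTLY links_photocomment_which_photo_id;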
Query (with derived table):
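A sketch of the derived-table rewrite, reusing the columns from the original query and combining it with the LEFT join suggestion (untested):

SELECT pc.abuse,
       pc.text,
       pc.id,
       pc.submitted_by_id,
       pc.submitted_on,
       au.username,
       lup.score
FROM  ( -- fetch the 25 newest comment rows first, before any joins
        SELECT abuse, text, id, submitted_by_id, submitted_on
        FROM links_photocomment
        WHERE which_photo_id = 3115087
        ORDER BY id DESC
        LIMIT 25
      ) AS pc
LEFT JOIN auth_user AS au
       ON pc.submitted_by_id = au.id
LEFT JOIN links_userprofile AS lup
       ON au.id = lup.user_id
ORDER BY pc.id DESC;   -- joins don't preserve order, so repeat it here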
Query (with CTE):
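The same logic as a CTE; note that in 9.6 a CTE is an optimization fence, which is exactly what we want here (the LIMIT 25 is evaluated first):

WITH pc AS (
    -- evaluated first: the 25 newest comments for this photo
    SELECT abuse, text, id, submitted_by_id, submitted_on
    FROM links_photocomment
    WHERE which_photo_id = 3115087
    ORDER BY id DESC
    LIMIT 25
)
SELECT pc.abuse,
       pc.text,
       pc.id,
       pc.submitted_by_id,
       pc.submitted_on,
       au.username,
       lup.score
FROM pc
LEFT JOIN auth_user AS au
       ON pc.submitted_by_id = au.id
LEFT JOIN links_userprofile AS lup
       ON au.id = lup.user_id
ORDER BY pc.id DESC;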