PostgreSQL – Optimizing a slow SELECT query (PostgreSQL 9.6.5)

performance postgresql query-performance

In an app called Links, users post interesting content in a forum-like interface, and others post replies or comments under this publicly posted content.

These posted replies are saved in a PostgreSQL table named links_publicreply (PostgreSQL 9.6.5 DB).

One query run on the table keeps cropping up in the slow log (queries taking longer than 500 ms). It fetches the 25 most recent public replies under a given piece of shared content.

Here's a sample from the slow log:

LOG: duration: 1614.030 ms statement:

 SELECT "links_publicreply"."id",
       "links_publicreply"."submitted_by_id",
       "links_publicreply"."answer_to_id",
       "links_publicreply"."submitted_on",
       "links_publicreply"."description",
       "links_publicreply"."abuse",
       "auth_user"."id",
       "auth_user"."username",
       "links_userprofile"."id",
       "links_userprofile"."user_id",
       "links_userprofile"."score",
       "links_userprofile"."avatar",
       "links_link"."id",
       "links_link"."description",
       "links_link"."submitter_id",
       "links_link"."submitted_on",
       "links_link"."reply_count",
       "links_link"."latest_reply_id"
FROM   "links_publicreply"
       INNER JOIN "links_link"
               ON ( "links_publicreply"."answer_to_id" = "links_link"."id" )
       INNER JOIN "auth_user"
               ON ( "links_publicreply"."submitted_by_id" = "auth_user"."id" )
       LEFT OUTER JOIN "links_userprofile"
                    ON ( "auth_user"."id" = "links_userprofile"."user_id" )
WHERE  "links_publicreply"."answer_to_id" = 8936203
ORDER  BY "links_publicreply"."id" DESC
LIMIT  25

Here are the explain analyze results of the said query: https://explain.depesz.com/s/pVZ5

According to that, ~70% of the query time seems to be taken up by an index scan. But to an accidental DBA like myself, it isn't immediately obvious what optimization I can do to make this more performant. Perhaps a composite index on (links_publicreply.answer_to_id, links_publicreply.id)?

It would greatly help me learn if a domain expert can furnish guidance + intuition on solving this class of problem.

P.s. \d links_publicreply is:

                                      Table "public.links_publicreply"
     Column      |           Type           |                           Modifiers                            
-----------------+--------------------------+----------------------------------------------------------------
 id              | integer                  | not null default nextval('links_publicreply_id_seq'::regclass)
 submitted_by_id | integer                  | not null
 answer_to_id    | integer                  | not null
 submitted_on    | timestamp with time zone | not null
 description     | text                     | not null
 category        | character varying(20)    | not null
 seen            | boolean                  | not null
 abuse           | boolean                  | not null
 device          | character varying(10)    | default '1'::character varying
Indexes:
    "links_publicreply_pkey" PRIMARY KEY, btree (id)
    "links_publicreply_answer_to_id" btree (answer_to_id)
    "links_publicreply_submitted_by_id" btree (submitted_by_id)
Foreign-key constraints:
    "links_publicreply_answer_to_id_fkey" FOREIGN KEY (answer_to_id) REFERENCES links_link(id) DEFERRABLE INITIALLY DEFERRED
    "links_publicreply_submitted_by_id_fkey" FOREIGN KEY (submitted_by_id) REFERENCES auth_user(id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
    TABLE "links_report" CONSTRAINT "links_report_which_publicreply_id_fkey" FOREIGN KEY (which_publicreply_id) REFERENCES links_publicreply(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "links_seen" CONSTRAINT "links_seen_which_reply_id_fkey" FOREIGN KEY (which_reply_id) REFERENCES links_publicreply(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "links_link" CONSTRAINT "publicreplyposter_link_fkey" FOREIGN KEY (latest_reply_id) REFERENCES links_publicreply(id) ON UPDATE CASCADE ON DELETE CASCADE

Best Answer

First, try to reduce the number of index lookups. The filter answer_to_id = 8936203 returns 14740 rows, and each of them is then looked up in the other tables. However, you only need the top 25 rows by id. What if you apply LIMIT 25 first and only then JOIN the other tables?

WITH tmp_links_publicreply AS (
   -- pick the 25 newest replies first, before any join
   SELECT ...
   FROM links_publicreply
   WHERE  "links_publicreply"."answer_to_id" = 8936203
   ORDER  BY "links_publicreply"."id" DESC
   LIMIT  25
)
SELECT ...
FROM tmp_links_publicreply t
JOIN ...   -- the same tables as in the original query
JOIN ...

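The query above returns the same result as the original only if foreign-key constraints exist between links_publicreply and the two tables links_link and auth_user (the \d output in the question shows both). Why? Without those constraints, you could apply LIMIT 25 and then lose rows in the JOIN, for example because no row in links_link matches answer_to_id = 8936203, while the original query would have kept looking for 25 rows that do match.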
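Filling in the sketch with the column list from the question gives something like the following. This is an untested sketch: the short aliases (t, l, u, p) are mine, and it assumes links_userprofile has at most one row per user, since otherwise the LEFT JOIN could return more than 25 rows.

```sql
-- Step 1: pick the 25 newest replies using only links_publicreply.
WITH tmp_links_publicreply AS (
   SELECT id, submitted_by_id, answer_to_id, submitted_on, description, abuse
   FROM   links_publicreply
   WHERE  answer_to_id = 8936203
   ORDER  BY id DESC
   LIMIT  25
)
-- Step 2: join the other tables against those 25 rows only.
SELECT t.id, t.submitted_by_id, t.answer_to_id, t.submitted_on, t.description, t.abuse,
       u.id, u.username,
       p.id, p.user_id, p.score, p.avatar,
       l.id, l.description, l.submitter_id, l.submitted_on, l.reply_count, l.latest_reply_id
FROM   tmp_links_publicreply t
JOIN   links_link l ON t.answer_to_id = l.id
JOIN   auth_user u ON t.submitted_by_id = u.id
LEFT   JOIN links_userprofile p ON u.id = p.user_id
ORDER  BY t.id DESC;   -- the joins do not guarantee order, so sort again
```

On 9.6 a CTE is always evaluated on its own, which is what makes the LIMIT run before the joins. From PostgreSQL 12 on, the planner may inline a CTE like this one; writing AS MATERIALIZED there keeps the old behaviour.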

Then, create a new composite index on (answer_to_id, id DESC).
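A minimal sketch of that index follows. The index name is made up, and CONCURRENTLY is my addition so that writes to the table are not blocked while the index is built.

```sql
-- Hypothetical index name; the column list follows the answer above.
-- CONCURRENTLY cannot be run inside a transaction block.
CREATE INDEX CONCURRENTLY links_publicreply_answer_to_id_id_idx
    ON links_publicreply (answer_to_id, id DESC);

-- Refresh planner statistics afterwards.
ANALYZE links_publicreply;
```

With this index, the rows for one answer_to_id are already stored in id order, so the 25 newest can be read directly instead of fetching all matching rows and sorting them. A plain (answer_to_id, id) index should serve the same query, since a btree can also be scanned backwards. Once the composite index is in place, the existing single-column index on answer_to_id is probably redundant, but check that nothing else relies on it before dropping it.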

Note: the WITH clause used in the query above is called a Common Table Expression (CTE).

Related Solutions

Postgresql – Postgres multiple joins slow query, how to store default child record

You write:

Each customer can have multiple sites, but only one should be displayed in this list.

Yet, your query retrieves all rows. That would be a point to optimize. But you also do not define which site is to be picked.

Either way, it does not matter much here. Your EXPLAIN shows only 5026 rows for the site scan (5018 for the customer scan). So hardly any customer actually has more than one site. Did you ANALYZE your tables before running EXPLAIN?

From the numbers I see in your EXPLAIN, indexes will give you nothing for this query. Sequential table scans will be the fastest possible way. Half a second is rather slow for 5000 rows, though. Maybe your database needs some general performance tuning?

Maybe the query itself is faster, but "half a second" includes network transfer? EXPLAIN ANALYZE would tell us more.

If this query is your bottleneck, I would suggest you implement a materialized view.

After you provided more information I find that my diagnosis pretty much holds.

The query itself needs 27 ms. Not much of a problem there. "Half a second" was the kind of misunderstanding I had suspected. The slow part is the network transfer (plus ssh encoding / decoding, possibly rendering). You should retrieve only 100 rows at a time; that would solve most of it, even if it means executing the whole query every time.

If you go the route with a materialized view as I proposed, you could add a gapless serial number to it, plus an index on that column, by adding a column row_number() OVER (<your sort criteria here>) AS mv_id.
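As a rough sketch of such a materialized view: the tables customer and site and all of their columns are hypothetical, because the original schema is not shown here, and picking the site with the lowest site_id is an arbitrary assumption. Substitute your own sort criteria.

```sql
-- Hypothetical schema: customer(customer_id, customer_name),
--                      site(site_id, customer_id, site_name).
CREATE MATERIALIZED VIEW materialized_view AS
SELECT row_number() OVER (ORDER BY sub.customer_name, sub.customer_id) AS mv_id,
       sub.*
FROM  (
   -- one row per customer; the site with the lowest site_id wins (assumption)
   SELECT DISTINCT ON (c.customer_id)
          c.customer_id, c.customer_name, s.site_id, s.site_name
   FROM   customer c
   LEFT   JOIN site s ON s.customer_id = c.customer_id
   ORDER  BY c.customer_id, s.site_id
   ) sub;

-- Index for the range lookups on mv_id; a unique index is also required
-- for REFRESH ... CONCURRENTLY.
CREATE UNIQUE INDEX materialized_view_mv_id_idx ON materialized_view (mv_id);

-- Re-run whenever the underlying data has changed.
REFRESH MATERIALIZED VIEW CONCURRENTLY materialized_view;
```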

Then you can query:

SELECT *
FROM   materialized_view
WHERE  mv_id >= 2700
AND    mv_id <  2800;

This will be very fast. LIMIT / OFFSET cannot compete: it has to compute the whole result before it can sort and pick 100 rows.

pgAdmin timing

When you execute a query from the query tool, the message pane shows something like:

Total query runtime: 62 ms.

And the status line shows the same time. I quote pgAdmin help about that:

The status line will show how long the last query took to complete. If a dataset was returned, not only the elapsed time for server execution is displayed, but also the time to retrieve the data from the server to the Data Output page.

If you want to see the time on the server, you need to use SQL EXPLAIN ANALYZE, the built-in Shift + F7 keyboard shortcut, or Query -> Explain analyze. Then, at the bottom of the explain output, you get something like this:

Total runtime: 0.269 ms
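For example, to measure the server-side time of the paging query from the related answer above, you could run something like this (BUFFERS is an optional extra that I added):

```sql
-- EXPLAIN ANALYZE really executes the statement and reports actual timings.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   materialized_view
WHERE  mv_id >= 2700
AND    mv_id <  2800;
```

As far as I know, PostgreSQL 9.4 and later print separate planning and execution times at the bottom instead of "Total runtime". Either way, the figure excludes the time needed to transfer and display the rows on the client.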

Postgresql – Simple PostgreSQL lookup table is inexplicably slow

Rather than "VARCHAR(40000)" why not use "TEXT"?
HASH index use is discouraged (see docs at http://www.postgresql.org/docs/current/static/indexes-types.html).
Have you run "ANALYZE" on your tables before running the query?
Giant IN lists can be performance killers.
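One common way around a giant IN list is to join against the list of values instead. The sketch below assumes entityId is an integer and uses placeholder ids; the alias wanted is made up.

```sql
-- Placeholder ids; a real list would come from the application.
SELECT e.entityId
FROM   entities e
JOIN  (VALUES (101), (102), (103)) AS wanted(entityId)
       ON wanted.entityId = e.entityId;

-- Make sure the planner has current statistics for both tables.
ANALYZE entities;
ANALYZE triples;
```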

How do the following queries perform?

SELECT e.entityId
FROM entities e
INNER JOIN triples t ON (t.object = e.entityId)
LIMIT 10000;

SELECT e.entityId
FROM entities e
WHERE EXISTS (SELECT 1 FROM triples t WHERE t.object = e.entityId)
LIMIT 10000;
