PostgreSQL 9.3.10 – How to Improve Slow Performance of Maintenance/Deletion Query

postgresqlpostgresql-9.3

I have a reddit-type app called 'Links' where 1) users post interesting hyperlinks, and 2) other users publicly respond to such postings. The former is defined as links_link table in my postgresql 9.3.10 DB, the latter is links_publicreply.

I've noticed that deletion is quite fast across most of the tables in the DB of this web app.

However, links_publicreply is problematic. E.g., I just deleted 238 rows from that table, and waited for ~20 mins for it to complete. This isn't an exception; it's been more like the rule since over a year now.

Now confessedly, it's a big table (~75M rows). The hardware it's operating on has 16 cores and 120 GB memory. I've been monitoring the server's performance – there's no bottleneck there, far from it.

Look at the the explain analyze results here: https://explain.depesz.com/s/ATwE It seems time is eaten up by a sequential scan.

Moreover here's the output of\d links_publicreply:

                                      Table "public.links_publicreply"
     Column      |           Type           |                           Modifiers                            
-----------------+--------------------------+----------------------------------------------------------------
 id              | integer                  | not null default nextval('links_publicreply_id_seq'::regclass)
 submitted_by_id | integer                  | not null
 answer_to_id    | integer                  | not null
 submitted_on    | timestamp with time zone | not null
 description     | text                     | not null
 category        | character varying(20)    | not null
 seen            | boolean                  | not null
 abuse           | boolean                  | not null
 device          | character varying(10)    | default '1'::character varying
Indexes:
    "links_publicreply_pkey" PRIMARY KEY, btree (id)
    "links_publicreply_answer_to_id" btree (answer_to_id)
    "links_publicreply_submitted_by_id" btree (submitted_by_id)
Foreign-key constraints:
    "links_publicreply_answer_to_id_fkey" FOREIGN KEY (answer_to_id) REFERENCES links_link(id) DEFERRABLE INITIALLY DEFERRED
    "links_publicreply_submitted_by_id_fkey" FOREIGN KEY (submitted_by_id) REFERENCES auth_user(id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
    TABLE "links_report" CONSTRAINT "links_report_which_publicreply_id_fkey" FOREIGN KEY (which_publicreply_id) REFERENCES links_publicreply(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "links_seen" CONSTRAINT "links_seen_which_reply_id_fkey" FOREIGN KEY (which_reply_id) REFERENCES links_publicreply(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "links_link" CONSTRAINT "publicreplyposter_link_fkey" FOREIGN KEY (latest_reply_id) REFERENCES links_publicreply(id) ON UPDATE CASCADE ON DELETE CASCADE

Is there any way I could have performed a query like this differently? I need to perform maintenance on this table – but the times are exorbitant. Please advise on all options (if any) I have here.


In case it matters, also adding the output for \d links_link here:

                                         Table "public.links_link"
        Column        |           Type           |                        Modifiers                        
----------------------+--------------------------+---------------------------------------------------------
 id                   | integer                  | not null default nextval('links_link_id_seq'::regclass)
 description          | text                     | not null
 submitter_id         | integer                  | not null
 submitted_on         | timestamp with time zone | not null
 rank_score           | double precision         | not null
 url                  | character varying(250)   | not null
 cagtegory            | character varying(25)    | not null
 image_file           | character varying(100)   | 
 reply_count          | integer                  | default 0
 device               | character varying(10)    | default '1'::character varying
 latest_reply_id      | integer                  | 
 which_photostream_id | integer                  | 
 is_visible           | boolean                  | default true
 net_votes            | integer                  | default 0
Indexes:
    "links_link_pkey" PRIMARY KEY, btree (id)
    "links_link_submitter_id" btree (submitter_id)
Foreign-key constraints:
    "link_whichphotostreamid_fkey" FOREIGN KEY (which_photostream_id) REFERENCES links_photostream(id) ON UPDATE CASCADE ON DELETE CASCADE
    "links_link_submitter_id_fkey" FOREIGN KEY (submitter_id) REFERENCES auth_user(id) DEFERRABLE INITIALLY DEFERRED
    "publicreplyposter_link_fkey" FOREIGN KEY (latest_reply_id) REFERENCES links_publicreply(id) ON UPDATE CASCADE ON DELETE CASCADE
Referenced by:
    TABLE "links_publicreply" CONSTRAINT "links_publicreply_answer_to_id_fkey" FOREIGN KEY (answer_to_id) REFERENCES links_link(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "links_report" CONSTRAINT "links_report_which_link_id_fkey" FOREIGN KEY (which_link_id) REFERENCES links_link(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "links_vote" CONSTRAINT "links_vote_link_id_fkey" FOREIGN KEY (link_id) REFERENCES links_link(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "links_photoobjectsubscription" CONSTRAINT "which_link_id_photoobjectsubscription" FOREIGN KEY (which_link_id) REFERENCES links_link(id) ON DELETE CASCADE

Best Answer

The sequential scan is taking only 23 seconds. The key is to look at the full text of the plan, which shows the missing time:

Trigger for constraint publicreplyposter_link_fkey: time=766392.766 calls=238

(explain.depesz.com doesn't do such a great job with marking up the plans from non-select queries)

So, it looks like you are missing a create index on links_link (latest_reply_id).

That means every row you delete from links_publicreply has to do a full table scan on links_link to make sure it is not referencing that row.