Postgresql – Optimizing postgresql maintenance statements on psql

postgresqlpostgresql-9.3

In a reddit-like app (called Links) I maintain, users post hyperlinks and others can then publicly reply under the said links. The former is saved in a postgresql 9.3.10 table called links_link, the latter in a table called links_publicreply.

My task is to:

  1. Delete all link objects created in a 30 day time window.

  2. Delete all publicreply objects associated with objects in (1), and created within a 45 day time window.

Here's how I just did that for one time-window from 2016:

begin;
  delete from links_publicreply where submitted_on >= timestamp'2016-07-23 01:01:00' and submitted_on < timestamp'2016-09-07 01:00:00' and answer_to_id in (select id from links_link where submitted_on >=timestamp'2016-07-23 01:00:00' and submitted_on < timestamp'2016-08-23 01:00:00');
  delete from links_link where submitted_on >= timestamp'2016-07-23 01:00:00' and submitted_on < timestamp'2016-08-23 01:00:00';
commit;

Notice I'm querying links_link exactly the same way twice. I.e. once each in both delete statements. Now in Django ORM (of which I'm a native), I would have optimized this via first getting all the required links_link object ids separately, and then using them in all proceeding statements. How do I optimize that in the psql command line?


The most pertinent advice I've seen is in this SO answer, but it seems the with clause has to be inside the SQL statement it's being called in.

In other words, the following wouldn't work would it?

BEGIN;
  DELETE FROM links_publicreply
    WHERE submitted_on >= timestamp '2016-07-23 01:01:00'
        AND submitted_on < timestamp '2016-09-07 01:00:00'
        AND answer_to_id in (
            SELECT id
            FROM links_link
            WHERE submitted_on >= timestamp '2016-07-23 01:00:00'
                AND submitted_on < timestamp '2016-08-23 01:00:00'
    );
    DELETE FROM links_link
    WHERE submitted_on >= timestamp'2016-07-23 01:00:00'
        AND submitted_on < timestamp'2016-08-23 01:00:00';
COMMIT;

Best Answer

The most pertinent advice I've seen is in this SO answer, but it seems the with clause has to be inside the SQL statement it's being called in.

Yes, you can use a modifying WITH, i.e. a CTE that deletes from one table and returns the id values to the rest of the query, so they can be used to delete from the second table:

WITH
  links_link_deleted AS
    ( DELETE FROM links_link
        WHERE submitted_on >= timestamp'2016-07-23 01:00:00'
          AND submitted_on < timestamp'2016-08-23 01:00:00'
      RETURNING id
    ) 
DELETE FROM links_publicreply
  WHERE submitted_on >= timestamp '2016-07-23 01:01:00'
    AND submitted_on < timestamp '2016-09-07 01:00:00'
    AND answer_to_id IN
        ( SELECT id
          FROM links_link_deleted
        )
 ;

In other words, the following wouldn't work would it?

That would work but it would be two statements.