Delete Aged PostgreSQL Rows with Tricky Condition

postgresqlpostgresql-9.3

I have two postgresql tables I need to clean up as part of maintenance. The tables, as described in my Django front-end, are:

class Group(models.Model):
    owner = models.ForeignKey(User)
    unique = models.CharField(max_length=36, unique=True)
    created_at = models.DateTimeField(auto_now_add=True)

class Reply(models.Model):
    text = models.TextField(validators=[MaxLengthValidator(500)])
    which_group = models.ForeignKey(Group)
    writer = models.ForeignKey(User)
    submitted_on = models.DateTimeField(db_index=True, auto_now_add=True)

I essentially want to delete all groups where the most recent reply was submitted more than 14 days ago. What will be the correct sql queries to accomplish that? I'm thinking of:

DELETE FROM links_group WHERE id IN (SELECT which_group_id FROM links_reply WHERE "submitted_on" < now() - interval '14 days');

But this is missing the logic of most recent reply. How do go about with this?


Relevant portions of \d+ links_group; are:

                                                          Table "public.links_group"
     Column     |           Type           |                        Modifiers                         | Storage  | Stats target | Description 
----------------+--------------------------+----------------------------------------------------------+----------+--------------+-------------
 id             | integer                  | not null default nextval('links_group_id_seq'::regclass) | plain    |              |  
 owner_id       | integer                  | not null                                                 | plain    |              |  
 unique         | character varying(36)    | not null                                                 | extended |              | 
 created_at     | timestamp with time zone | not null                                                 | plain    |              |
Indexes:
    "links_group_pkey" PRIMARY KEY, btree (id)
    "links_group_unique_key" UNIQUE CONSTRAINT, btree ("unique")
    "links_group_owner_id" btree (owner_id)
    "links_group_unique_like" btree ("unique" varchar_pattern_ops)
Foreign-key constraints:
    "links_group_owner_id_fkey" FOREIGN KEY (owner_id) REFERENCES auth_user(id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
    TABLE "links_reply" CONSTRAINT "links_reply_which_group_id_fkey" FOREIGN KEY (which_group_id) REFERENCES links_group(id) DEFERRABLE INITIALLY DEFERRED
Has OIDs: no

Relevant portions of \d+ links_reply; are:

                                                          Table "public.links_reply"
     Column     |           Type           |                        Modifiers                         | Storage  | Stats target | Description 
----------------+--------------------------+----------------------------------------------------------+----------+--------------+-------------
 id             | integer                  | not null default nextval('links_reply_id_seq'::regclass) | plain    |              | 
 text           | text                     | not null                                                 | extended |              | 
 which_group_id | integer                  | not null                                                 | plain    |              | 
 writer_id      | integer                  | not null                                                 | plain    |              | 
 submitted_on   | timestamp with time zone | not null                                                 | plain    |              |
Indexes:
    "links_reply_pkey" PRIMARY KEY, btree (id)
    "links_reply_submitted_on" btree (submitted_on)
    "links_reply_which_group_id" btree (which_group_id)
    "links_reply_writer_id" btree (writer_id)
Foreign-key constraints:
    "links_reply_which_group_id_fkey" FOREIGN KEY (which_group_id) REFERENCES links_group(id) DEFERRABLE INITIALLY DEFERRED
    "links_reply_writer_id_fkey" FOREIGN KEY (writer_id) REFERENCES auth_user(id) DEFERRABLE INITIALLY DEFERRED
Has OIDs: no

Best Answer

You need something like this (the table and column names may differ, check the actual SQL CREATE TABLE statements):

DELETE FROM links_group AS g 
WHERE g.id IN 
      ( SELECT r.which_group_id
        FROM links_reply AS r
          -- WHERE r.which_group IS NOT NULL    
          -- not needed, the column is not nullable
        GROUP BY r.which_group_id
        HAVING MAX(r.submitted_on) < NOW() - INTERVAL '14 days'
      ) ;

Since there is a foreign key from links_reply that references links_group and does not have an ON DELETE CASCADE attribute set, the above will of course fail. To delete from both tables, you either need to run two statements or combine them in one with a modifying CTE:

WITH 
  groups_to_delete AS
    ( SELECT r.which_group_id
      FROM links_reply AS r
      GROUP BY r.which_group_id
      HAVING MAX(r.submitted_on) < NOW() - INTERVAL '14 days'
    ),
  delete_replies AS
    ( DELETE FROM links_reply AS r 
      WHERE r.which_group_id IN (TABLE groups_to_delete)
      RETURNING which_group_id
    )
DELETE FROM links_group AS g 
WHERE g.id IN (TABLE delete_replies) ;