Postgresql – Approaches for deleting unnecessary records from tables

deletepostgresqltruncate

We have a database which stores the temperatures of 20 thermometer every 7 seconds. We want to delete all records in the way that every minute holds just one tempereature instead of 8 which are older than 3 months. I was planning to do it as follows as discussed in that question here:

  1. Select the (relatively few) surviving rows into a temporary table.
  2. Truncate the table.
  3. Re-insert the survivors.

But all devices are working without interruption and inserting values into the DB so I can not Truncate and rename the temp_tables since the system should be on all the time. Or can I? If not, do I have to do it by DELETE VACUUM?

By the way we delete all unnecessary records which are older than 3 months just once, afterwards we have to repeat this process every month for the 4.th month previous (I hope that part was clear). If we repeat this process at the beginning of 05.2013, we have to delete the unnecessary records of the month 01.2013. How can I automatize this process? Should I better write a .bat file for scheduled task or is there any better approaches?

Best Answer

You seem to be misinterpreting a part of my advice to your previous question:

so I can not Truncate and rename the temp_tables since the system should be on all the time.

There was no renaming involved. After TRUNCATE you run an INSERT. The only blocking operation is the TRUNCATE. I quote the manual:

TRUNCATE acquires an ACCESS EXCLUSIVE lock on each table it operates on, which blocks all other concurrent operations on the table. (...) If concurrent access to a table is required, then the DELETE command should be used instead.

INSERT may still work, if it is not time-critical and allowed to wait until the lock is released. TRUNCATE is usually very fast, if you run it in a separate transaction, it should only block for a couple of milliseconds. Note: separate transaction, not separate session! Your temporary table lives and dies with the session.

The drawback of a separate transaction: if you lose your session before you are able to re-insert from the temporary table, you loose data. To be sure, you could use a plain table instead. (Which would offer the alternative path to drop the old and rename the new table.)

Either way, your updated question makes clear that you want to run this repeatedly, accumulating old rows in the same table. In this case, TRUNCATE is not a good option anyway. You can always just use a plain DELETE. Considerably slower with big tables, but concurrent INSERT is not blocked at all. The autovacuuming daemon will have to do some more work, too.

It's mostly a trade-off between speed and security.

SQL

The DELETE command for the slow and sure method could look like this:

DELETE FROM TABLE tbl t
USING (
   SELECT created_at FROM tbl WHERE created_at < now() - (interval '3 month')
   ) d
LEFT   JOIN (
   SELECT min(created_at) AS created_at
   FROM   tbl
   GROUP  BY date_trunc('minute', created_at)
   WHERE  created_at < now() - (interval '3 month')
   ) x USING (created_at)
WHERE  x.created_at IS NULL
AND    d.created_at = t.created_at;