PostgreSQL – Delete data from a billion-row table based on a WHERE clause

Tags: delete, postgresql, postgresql-9.2

I need to delete about 400 million rows from a 1.3 billion row table based on an indexed date field.

The table is about 800 GB in size.

What is the most efficient way to delete the data without causing adverse effects? The table is under heavy use, meaning lots of inserts and updates (which don't affect the rows to be deleted).

Doing it during a maintenance window would be fastest, but I may not be able to get one big enough. On the flip side, there is no rush, so I can take my time with the delete.
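For scale, the naive approach would be a single statement along these lines (using the start_date column and two-year cutoff that appear in the answer below); on ~400 million rows that means a long-running transaction, heavy WAL traffic, and a table full of dead tuples to vacuum afterwards:

    DELETE FROM old_tbl WHERE start_date < now() - INTERVAL '2 years';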

Best Answer

Figured out a better way to do this (a consolidated sketch follows the numbered steps):

  1. While the table is still live, copy the rows to keep into a new table:

    INSERT INTO new_tbl SELECT * FROM old_tbl WHERE start_date >= now() - INTERVAL '2 years';
  2. During the downtime window, copy over any qualifying rows that arrived after step 1:

    INSERT INTO new_tbl SELECT * FROM old_tbl WHERE start_date >= now() - INTERVAL '2 years' AND id NOT IN (SELECT id FROM new_tbl);
    
  3. Rename old_tbl to old_tbl_drop and new_tbl to old_tbl.

  4. Drop old_tbl_drop.
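
Putting the steps together, a minimal sketch of the whole procedure. It assumes an integer primary key id and the start_date column above, and that keeping the most recent two years is the goal; note that a plain INSERT ... SELECT does not copy indexes or constraints, which is why the new table is created with LIKE ... INCLUDING ALL:

    -- Create the replacement table with the same columns, defaults,
    -- constraints, and indexes as the original.
    CREATE TABLE new_tbl (LIKE old_tbl INCLUDING ALL);

    -- Step 1: while old_tbl is still taking traffic, bulk-copy the keepers.
    INSERT INTO new_tbl
    SELECT * FROM old_tbl
    WHERE start_date >= now() - INTERVAL '2 years';

    -- Steps 2 and 3: inside the maintenance window, catch up on rows that
    -- arrived since step 1, then swap the tables in a single transaction
    -- (DDL is transactional in PostgreSQL).
    BEGIN;
    INSERT INTO new_tbl
    SELECT * FROM old_tbl
    WHERE start_date >= now() - INTERVAL '2 years'
      AND id NOT IN (SELECT id FROM new_tbl);
    ALTER TABLE old_tbl RENAME TO old_tbl_drop;
    ALTER TABLE new_tbl RENAME TO old_tbl;
    COMMIT;

    -- Step 4: only after verifying the application against the new table.
    DROP TABLE old_tbl_drop;

Two caveats: the catch-up INSERT only picks up ids missing from new_tbl, so it assumes rows copied in step 1 are not updated before the swap; and views or foreign keys that referenced old_tbl follow it through the rename, so they have to be recreated against the new table (LIKE also does not copy GRANTs).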