PostgreSQL – How to keep PostgreSQL running forever with constant inserts and deletes per day

disk-space | postgresql | sqlite | vacuum

Question:

Lots of rows are getting inserted every second. Batch deletion is performed when the disk is close to full. This causes two problems:

  1. I cannot see how much space, or how many pages, are actually reusable. All I see is that the disk is almost full, because the database does not return disk space to the OS after deletion.

  2. Fragmentation happens. The disk usage slowly grows, and eventually becomes full, crashing the machine.
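(As a side note on problem 1: the amount of dead/reusable space can in fact be inspected with the `pgstattuple` contrib extension, assuming you are able to install extensions; `your_table` below is a placeholder name.)

```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- dead_tuple_len and free_space report how many bytes inside the
-- table's pages are reusable without growing the file
SELECT table_len, dead_tuple_len, free_space
FROM pgstattuple('your_table');
```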

Solutions I have considered:

  1. Monitor disk usage, and perform a VACUUM FULL when the disk is 48% full. But reserving 52% (yes, the disk is mostly occupied by the database) of the disk just to delete records sounds very wasteful.

  2. Partition my tables by hour of record creation time. Only delete old partitions, hoping that the dropped partitions free up complete pages so that autovacuum can return the disk space of those completely empty pages.

  3. Use a file-based database like SQLite. Separate data into files by hour. Only load the most recent 24 hours unless otherwise specified. When I need to delete, just delete the files. I haven't tried this, but it sounds like a lot of overhead to attach all 24 databases every hour.

I really appreciate any suggestion. Below are the additional details. Many many thanks!


Context:

I am currently using PostgreSQL to host a stream of live social media posts. There is a streamer that performs batch inserts, and a middleware that performs read-only queries for client requests. The whole application is delivered as a Docker image.

As the streamer inserts records, some day the disk will be full, so we need to delete records. But deleting records doesn't reclaim disk space and can cause fragmentation, so ultimately a VACUUM FULL is needed to compact the tables again.

Requirement:

  1. The users who deploy this app are non-technical. They don't even know what a shell/terminal is.

  2. The users will perform no maintenance on the database. So VACUUM FULL is an option only if the disk has enough space for the rewritten tables.

Best Answer

Based on the requirements, option 2 looks like the best solution: create new partitions at a fixed interval and drop the older ones.

Partitions in PostgreSQL are independent tables; when a partition is dropped, its underlying file is deleted, reclaiming disk space.
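(For illustration, you can confirm that each partition is its own file on disk; the `table_of_records_%` pattern below assumes the partition names used later in this answer.)

```sql
-- each partition is a separate relation with its own data file(s)
SELECT relname, pg_relation_filepath(oid)
FROM pg_class
WHERE relname LIKE 'table_of_records_%';
```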

You can do this with pgAgent jobs: one job that pre-creates the partitions for the next 24 hours, and another that runs every hour dropping the older partitions.

You want to use range partitioning, then use ALTER TABLE ... DETACH PARTITION or DROP TABLE.
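For example, using a hypothetical hour-3 partition named as in the code below:

```sql
-- DETACH keeps the data around as a standalone table;
-- only the subsequent DROP deletes its files and reclaims disk space
ALTER TABLE table_of_records DETACH PARTITION table_of_records_3;
DROP TABLE table_of_records_3;

-- or drop the partition (and its files) directly in one step
DROP TABLE table_of_records_3;
```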

This code will get you close; the partition rule will have to be tweaked.

create table table_of_records (
    insert_date  timestamp default now(),
    insert_hour  integer default date_part('hour', now()),
    other_columns
) PARTITION BY RANGE (insert_hour);

 -- pre-create the next day's tables
 create or replace function create_partitions()
 returns boolean
 language plpgsql
 as $code$
 declare
    _counter integer := 0;
 begin
    loop
       exit when _counter > 23;  -- one partition per hour, 0 .. 23

       -- the variable values must be concatenated into the DDL string;
       -- they are not interpolated inside a literal
       execute 'CREATE TABLE IF NOT EXISTS table_of_records_' || _counter::text
           || ' PARTITION OF table_of_records'
           || ' FOR VALUES FROM (' || _counter::text
           || ') TO (' || (_counter + 1)::text || ');';

       _counter := _counter + 1;
    end loop;
    return true;
 end;
 $code$;

 create or replace function drop_partition(phour integer)
 returns boolean
 language plpgsql
 as $code$
 begin
    execute 'drop table if exists table_of_records_' || phour::text;
    return true;
 end;
 $code$;
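With the two functions in place, the hourly pgAgent step could be as simple as the following sketch (the exact rotation rule, i.e. which hour to drop, depends on how much history you want to keep):

```sql
-- drop the partition for the upcoming hour, which still holds
-- yesterday's rows, then re-create any missing hourly partitions
SELECT drop_partition(((date_part('hour', now())::integer + 1) % 24));
SELECT create_partitions();
```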

https://www.postgresql.org/docs/10/ddl-partitioning.html