On one hand, you can have a look at one of my previous answers to see how you can keep a table size more or less steady. There you will find a solution with triggers - of course, this can be solved using a cron job as well. In the latter case I would first check whether the row count exceeds a certain limit, and then either delete the oldest rows or drop a partition.
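A minimal sketch of the cron-job variant (the table and column names, log_table and created_at, and the one-million-row limit are all made up for illustration):

    -- Keep roughly the newest 1,000,000 rows; delete everything older.
    -- If the table holds fewer rows than the limit, the subquery
    -- returns NULL and nothing is deleted.
    DELETE FROM log_table
    WHERE created_at < (
        SELECT created_at
        FROM log_table
        ORDER BY created_at DESC
        LIMIT 1 OFFSET 1000000
    );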
On the other hand, as you already noticed, one has to take care of the disk space where pg_xlog resides. When it gets full, it is not that easy to recover... But by checking your database settings you can get a fair estimate of how much space you need:
There will always be at least one WAL segment file, and will normally not be more than (2 + checkpoint_completion_target) * checkpoint_segments + 1 or checkpoint_segments + wal_keep_segments + 1 files. Each segment file is normally 16 MB (though this size can be altered when building the server). You can use this to estimate space requirements for WAL. Ordinarily, when old log segment files are no longer needed, they are recycled (renamed to become the next segments in the numbered sequence). If, due to a short-term peak of log output rate, there are more than 3 * checkpoint_segments + 1 segment files, the unneeded segment files will be deleted instead of recycled until the system gets back under this limit.
If you don't have replication set up, the maximum is 3 * checkpoint_segments + 1 (times 16 MB). A typical replicationless setup will need something under 10 GB for pg_xlog, I think.
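You can check the relevant settings and do the arithmetic from within the database; a quick sketch for PostgreSQL releases that still have checkpoint_segments (it was replaced by max_wal_size in 9.5):

    SHOW checkpoint_segments;
    SHOW wal_keep_segments;

    -- Rough upper bound for pg_xlog without replication, in MB:
    SELECT (3 * current_setting('checkpoint_segments')::int + 1) * 16
           AS max_pg_xlog_mb;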
If you have an index on LastLoggedIn, deleting a few tens of thousands of records should be fast without needing to partition. I ran a small test on MS SQL:
CREATE TABLE Temp.Temp
(
    UserID INT NOT NULL PRIMARY KEY,
    -- An inline index needs a name in T-SQL:
    LastLoggedIn DATETIME2(0) NOT NULL INDEX IX_Temp_LastLoggedIn,
    Dummy1 VARCHAR(30),
    Dummy2 INT
);
I inserted ~57 M dummy records, then selected a LastLoggedIn date such that 35,000 records were at least that old. Deleting those records took less than a second. This is on a four-core machine with 128 GB of memory, under light load; your mileage may vary.
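The test delete was essentially a range delete on the indexed column; a sketch of what it might look like (the cutoff date is made up):

    DECLARE @cutoff DATETIME2(0) = '2015-01-01';

    -- Fast because the index on LastLoggedIn turns this into a range scan.
    DELETE FROM Temp.Temp
    WHERE LastLoggedIn < @cutoff;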
Of course, if you run this purge only every few weeks, you may have millions of records to delete rather than thousands (300 K records took seven seconds for me; 1.3 M records took 30 seconds). But in that scenario, it's part of your monthly maintenance window rather than daily upkeep, and a brief delay should be acceptable.
If you get poor performance with just an index, then sure, consider partitioning. As others have pointed out, there will be a significant cost to UPDATE operations, a greater overhead than maintaining an index. You will have many more updates than purges, and the latter can happen during relatively idle periods, so I would worry about the updates more.
If you need always-on uptime, and the delete locks the table for an unacceptable length of time, that might justify using partitioning, but even there I'd look for alternatives. You could SELECT a list of old users and then delete them one at a time (confirming their LastLoggedIn dates first, of course) with a cursor, for example.
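A rough sketch of that cursor approach (the six-month cutoff is illustrative):

    DECLARE @id INT;
    DECLARE @cutoff DATETIME2(0) = DATEADD(MONTH, -6, SYSDATETIME());

    DECLARE old_users CURSOR LOCAL FAST_FORWARD FOR
        SELECT UserID FROM Temp.Temp WHERE LastLoggedIn < @cutoff;

    OPEN old_users;
    FETCH NEXT FROM old_users INTO @id;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- Re-check LastLoggedIn in the delete itself, in case the user
        -- has logged in since the list was built.
        DELETE FROM Temp.Temp
        WHERE UserID = @id AND LastLoggedIn < @cutoff;

        FETCH NEXT FROM old_users INTO @id;
    END;
    CLOSE old_users;
    DEALLOCATE old_users;

Each single-row delete holds its locks only briefly, which is the point of the exercise.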
I'm an MS SQL guy, so my apologies if any of this is incorrect for PostgreSQL.
Best Answer
You can create a new tablespace on a different disk and then move those tables to that new tablespace.
First, create a new tablespace on a disk that has enough space:
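Something along these lines (the path and the tablespace name are placeholders):

    -- The directory must already exist and be owned by the
    -- postgres operating-system user.
    CREATE TABLESPACE new_space
        OWNER your_postgres_user
        LOCATION '/mnt/bigdisk/pg_tablespace';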
By making your_postgres_user the owner of that tablespace, the user automatically gets all privileges on that tablespace.
Then you need to find out which tables those files belong to:
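Assuming what you have is a numeric file name from the data directory (that number is the table's relfilenode), one way to look it up:

    -- Replace 123456 with the file name you found on disk.
    SELECT relname
    FROM pg_class
    WHERE relfilenode = 123456;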
Then move the tables:
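For example (table and tablespace names are placeholders; note that this takes an exclusive lock on the table while its files are copied to the new location):

    ALTER TABLE big_table SET TABLESPACE new_space;

    -- Indexes live in their own files and must be moved separately:
    ALTER INDEX big_table_pkey SET TABLESPACE new_space;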