I'm looking at altering how backups are done and am wondering if there is a way to determine which databases in a PostgreSQL cluster have not been changed recently.
Instead of using pg_dumpall, I'd like to use pg_dump and only dump the databases that have changed since the last backup (some databases don't get updated very often). The idea is that if nothing has changed, the current backup should still be good.
Does anyone know of a way to determine when a specific database was last updated/changed?
Thanks…
Update:
I was hoping to avoid writing triggers all over the place, as I have no control over the creation of databases in one particular cluster (let alone the creation of objects within a database).
Digging further, it looks like there is a correlation between the contents of the $PGDATA/global/pg_database file (specifically the second field) and the directory names under $PGDATA/base.
Going out on a limb, I'd guess that the second field of the pg_database file is the database OID and that each database has its own subdirectory under $PGDATA/base, named after that OID. Is that correct? If so, is it reasonable to use the file timestamps of the files under $PGDATA/base/* as the trigger for needing a backup?
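The timestamp idea above can be sketched as a small check. This is only an illustration under the assumption that each database lives in an OID-named subdirectory of $PGDATA/base; the function name and threshold handling are my own, not anything PostgreSQL provides.

```python
import os

def dbs_changed_since(pgdata, last_backup_ts):
    """Return the set of database OIDs (directory names under
    $PGDATA/base) containing any file modified after last_backup_ts."""
    changed = set()
    base = os.path.join(pgdata, "base")
    for oid in os.listdir(base):
        dbdir = os.path.join(base, oid)
        if not os.path.isdir(dbdir):
            continue
        for name in os.listdir(dbdir):
            # A single file newer than the last backup marks the whole
            # database as changed; no need to look further.
            if os.path.getmtime(os.path.join(dbdir, name)) > last_backup_ts:
                changed.add(oid)
                break
    return changed
```

Note that this only sees writes that have reached the data files; changes still sitting in shared buffers or WAL may not have touched these timestamps yet.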
…or is there a better way?
Thanks again…
Best Answer
While using
select datname, xact_commit from pg_stat_database;
as suggested by @Jack Douglas doesn't quite work (apparently due to autovacuum),
select datname, tup_inserted, tup_updated, tup_deleted from pg_stat_database;
does appear to work. Both DML and DDL changes increment the tup_* columns, while a plain vacuum does not (vacuum analyze, on the other hand, does). On the off chance that this may be useful to others, I'm including the backup script that I've put in place. It works on Pg 8.4.x but not on 8.2.x, so YMMV depending on the version of Pg used.
Update: the script has been put on github here.