PostgreSQL WAL – How to Apportion Usage to Database, Table, or Query

postgresqlpostgresql-performance

I have a PostgreSQL 11 database which hosts several databases, some of which are contain tables that are frequently inserted and updated. Recently the amount of data being written to WAL files has increased greatly (about 400%), though I believe that the number of inserts to the database has increased by only about 20%.

As a consequence not only are we using more disk, but our database performance now seems constrained by the writer performance of the disks the WALs are on.

The application is updated very frequently, and it may be that something in the application (which we control, and are able to change) has changed, which is causing less efficient inserts/updates to be made, but I don't know how we would identify that. There have probably been a many changes since this change In behaviour started, and several since it was first noticed.

Is there a way of determining which databases/tables/queries are writing to the WAL, and in what (approximate) quantity?

Best Answer

If you are using PostgreSQL v13, you can install pg_stat_statements, which logs the amount of WAL per statement in the wal_bytes column. So you could run

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

SELECT wal_bytes, calls, query
FROM pg_stat_statements
ORDER BY wal_bytes DESC
LIMIT 10;

The WAL itself has no connection to a certain SQL statement, but if you examine the entries with pg_waldump, you can connect them to certain files, which correspond to database objects. That route is cumbersome.

Upgrade to v13.