Postgresql has some interesting monitoring tools for monitoring the new logical replication system's progress, but I don't really understand them. The two tools I'm aware of are:
pg_stat_replication
and it's sibling:
pg_stat_subscription
I've read the documentation for these, but they don't say how to know if a replica is actually synced, and interpreting these tables didn't seem obvious to me. Can anybody explain?
Best Answer
I used replication on postgres 10 last year and I had same problem understanding how to monitor the process, as documentation is not clear enough. Anyway I'll try to give you an example as to check if monitoring is going on.
On PUBLISHER side there are few things you can check:
- pg_catalog.pg_publication;
- pg_catalog.pg_publication_tables;
- pg_current_wal_lsn();
I'll create a publication "test_publication" with two tables: t_1 and t_2. I wont cover the needed pre-requisites (user, roles and so on).
On SUBSCRIBER SIDE:
Interesting information are on table pg_catalog.pg_stat_subscription.
Here important columns are:
- received_lsn: Last write-ahead log location received
- last_msg_send_time: send time of last message received from PUBLISHER
- last_msg_receipt_time: Receipt time of last message received from PUBLISHER
- latest_end_lsn: Last write-ahead log location reported to PUBLISHER
- latest_end_time: Time of last write-ahead log location reported to PUBLISHER
You have to check these columns to catch what is happening.
First check if two db are in sync;
PUBLISHER SIDE:
We can check on SUBSCRIBER that at moment the two database are in sync,because
value returned by function pg_current_wal_lsn() on publisher match the value in columns received_lsn and latest_end_lsn on subscriber:
I'll add 4000 rows on table t_1, and see what happens on PUBLISHER:
let's look how pg_catalog.pg_stat_subscription values change during replication on SUBSCRIBER:
As you can see, on subscriber the 4 columes shows how WAL is arriving from publisher and how it is applied. The difference in time in columns last_msg_send_time and last_msg_receipt_time can give information about lag between publisher and subscriber.
In this case the two server are on different subnets on the same datacenter. Take in count that the two server I used are test servers and ARE NOT perfect in sync between them. (subscriber server has not an ntp server configured at all).
Hope this help. Regards
MarcoP.