Postgresql – Using Postgresql logical replication, how do you know that the subscriber is caught up

postgresqlreplication

Postgresql has some interesting monitoring tools for monitoring the new logical replication system's progress, but I don't really understand them. The two tools I'm aware of are:

pg_stat_replication

and it's sibling:

pg_stat_subscription

I've read the documentation for these, but they don't say how to know if a replica is actually synced, and interpreting these tables didn't seem obvious to me. Can anybody explain?

Best Answer

I used replication on postgres 10 last year and I had same problem understanding how to monitor the process, as documentation is not clear enough. Anyway I'll try to give you an example as to check if monitoring is going on.

On PUBLISHER side there are few things you can check:
- pg_catalog.pg_publication;
- pg_catalog.pg_publication_tables;
- pg_current_wal_lsn();

I'll create a publication "test_publication" with two tables: t_1 and t_2. I wont cover the needed pre-requisites (user, roles and so on).

test_logical_replication=# create publication test_publication for table t_1, t_2;  
CREATE PUBLICATION  
test_logical_replication=# select * from pg_catalog.pg_publication;  
pubname      | pubowner | puballtables | pubinsert | pubupdate | pubdelete  
-----------------+----------+--------------+-----------+-----------+-----------  
test_publication |       10 | f            | t         | t         | t  
(1 row) 

test_logical_replication=# select * from pg_publication_tables;    
    pubname      | schemaname | tablename  
    ------------------+------------+-----------  
     test_publication | public     | t_1  
     test_publication | public     | t_2  
    (2 rows)  

On SUBSCRIBER SIDE:

test_logical_replication_subscriber=# create subscription test_subscription CONNECTION 'dbname=test_logical_replication host=XXX user=repuser' PUBLICATION test_publication;  
NOTICE:  created replication slot "test_subscription" on publisher   
CREATE SUBSCRIPTION 

Interesting information are on table pg_catalog.pg_stat_subscription.
Here important columns are:
- received_lsn: Last write-ahead log location received
- last_msg_send_time: send time of last message received from PUBLISHER
- last_msg_receipt_time: Receipt time of last message received from PUBLISHER
- latest_end_lsn: Last write-ahead log location reported to PUBLISHER
- latest_end_time: Time of last write-ahead log location reported to PUBLISHER

You have to check these columns to catch what is happening.
First check if two db are in sync;

PUBLISHER SIDE:

test_logical_replication=> select pg_current_wal_lsn();  
 pg_current_wal_lsn  
--------------------  
 0/8EB83768     << this is the location on Wal file where we are now, before starting new insert  

We can check on SUBSCRIBER that at moment the two database are in sync,because
value returned by function pg_current_wal_lsn() on publisher match the value in columns received_lsn and latest_end_lsn on subscriber:

test_logical_replication_subscriber=# select received_lsn, latest_end_lsn from pg_catalog.pg_stat_subscription;  

received_lsn    | latest_end_lsn  
----------------+------------------     
 0/8EB83768     | 0/8EB83768        

I'll add 4000 rows on table t_1, and see what happens on PUBLISHER:

test_logical_replication=> insert into t_1 select id+1, txt||'--BB' from t_1;  
INSERT 0 4000  


test_logical_replication=> select pg_current_wal_lsn();  
 pg_current_wal_lsn
--------------------
 0/8EC4B9D0             <<< this value in increasing
(1 row)

test_logical_replication=> select pg_current_wal_lsn();
 pg_current_wal_lsn
--------------------
 0/8EC4DE78             <<< this value in increasing
(1 row)

test_logical_replication=> select pg_current_wal_lsn();
 pg_current_wal_lsn
--------------------
 0/8EC4DEB0             <<< this value in increasing
(1 row) 

test_logical_replication=> select pg_current_wal_lsn();
 pg_current_wal_lsn
--------------------
 0/8EC4DEB0            <<< same value, wal sending has finished
(1 row)  

let's look how pg_catalog.pg_stat_subscription values change during replication on SUBSCRIBER:

test_logical_replication_subscriber=# select received_lsn,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time from pg_catalog.pg_stat_subscription;
 received_lsn |      last_msg_send_time       |    last_msg_receipt_time     | latest_end_lsn |        latest_end_time  
--------------+-------------------------------+------------------------------+----------------+-------------------------------
 0/8EC4B9D0   | 2018-12-17 11:39:56.014564+01 | 2018-12-17 11:39:56.07322+01 | 0/8EC4B9D0     | 2018-12-17 11:39:56.014564+01
(1 row)  

test_logical_replication_subscriber=# select received_lsn,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time from pg_catalog.pg_stat_subscription;
 received_lsn |      last_msg_send_time       |     last_msg_receipt_time     | latest_end_lsn |        latest_end_time
--------------+-------------------------------+-------------------------------+----------------+-------------------------------
 0/8EC4BA08   | 2018-12-17 11:39:56.737101+01 | 2018-12-17 11:39:56.736303+01 | 0/8EC4BA08     | 2018-12-17 11:39:56.737101+01  
(1 row)  

test_logical_replication_subscriber=# select received_lsn,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time from pg_catalog.pg_stat_subscription;
 received_lsn |      last_msg_send_time       |     last_msg_receipt_time     | latest_end_lsn |        latest_end_time
--------------+-------------------------------+-------------------------------+----------------+-------------------------------
 0/8EC4DE78   | 2018-12-17 11:40:04.184765+01 | 2018-12-17 11:40:04.183937+01 | 0/8EC4DE78     | 2018-12-17 11:40:04.184765+01
(1 row)  

 test_logical_replication_subscriber=# select received_lsn,last_msg_send_time,last_msg_receipt_time,latest_end_lsn,latest_end_time from pg_catalog.pg_stat_subscription;
 received_lsn |      last_msg_send_time       |   last_msg_receipt_time    | latest_end_lsn |        latest_end_time
--------------+-------------------------------+----------------------------+----------------+-------------------------------
 0/8EC4DEB0   | 2018-12-17 11:40:17.153797+01 | 2018-12-17 11:40:17.153+01 | 0/8EC4DEB0     | 2018-12-17 11:40:17.153797+01
(1 row)   

As you can see, on subscriber the 4 columes shows how WAL is arriving from publisher and how it is applied. The difference in time in columns last_msg_send_time and last_msg_receipt_time can give information about lag between publisher and subscriber.
In this case the two server are on different subnets on the same datacenter. Take in count that the two server I used are test servers and ARE NOT perfect in sync between them. (subscriber server has not an ntp server configured at all).

Hope this help. Regards
MarcoP.