Postgresql – Check integrity between servers

checksumdata integritypostgresqlreferential-integrity

I have three Database servers, running PostgreSQL 9.2:

master01 (Own data center USA)

slave01 (Own data center USA)

slave02 (AWS Australia)

Both slaves use streaming replication + wal_files (delivered by master01 via ssh).

The postgres process on the slave02 server has been recently stopped, due a maintenance on the slave02's postgres disk volume(/var/lib/pgsql).

Postgres is running on: /var/lib/pgsql

Wal files are shipped to this volume: /var/lib/walfiles

Once the volume was restored, I started postgres and got this log:

cp: cannot stat '/var/lib/walfiles/0000000500008AA800000069': No such file or directory
cp: cannot stat '/var/lib/walfiles/0000000500008AA80000006A': No such file or directory
consistent recovery state reached at 8AA8/8AD22F70
record with zero length at 8AA8/8AD22F70
database system is ready to accept read only connections
streaming replication successfully connected to primary

I have a very large table, that is constantly updated. I ran this query on slave01 and slave02 server:

select max(id) from ids_table;

SLAVE01:

   max    
----------
 45148099
(1 row)

SLAVE02:

   max    
----------
 45148095
(1 row)

Can you see the results are different? And that shouldn't be different!

Question:

How can I check the data between servers? How can I see if there is something wrong with slave02 due to those No such file or directory errors?

UPDATE 1:

As far as I could see, checksum is only available on Postgres 9.3+ ? So can't use that!

UPDATE 2: Anwsering Sahap Asci's questions

On the master:

synchronous_commit = on
#synchronous_standby_names = ''

Best Answer

First of all your replication is not synchronized. That means there can be always slight differences between master and slave. if you change configuration like below your master won't return commit until the replication processes.

synchronous_standby_names = '*'

In your case, synchronized replication can be problematic because your slave is quite distant from the master so that the commits get slower.

Still with synchronized commit running a query 2 times in different transactions can get different results because of data changes.

I recommend you to read the blog; Evolution of Fault Tolerance in PostgreSQL: Synchronous Commit.