Postgresql – How to verify postgresql db restore from pg_dump file

postgresql

How can I verify that my restored database is identical to my original database?

Here's what I did. Is this a valid way to verify a restore?

# dump mydb_orig:
$ pg_dump  --cluster 8.4/my.8.4.cluster mydb_orig > mydb_orig.dump

# create mydb_restored:
$ createdb --cluster 9.1/my.9.1.cluster mydb_restored

# load mydb_orig into mydb_restored:
$ psql -q  --cluster 9.1/my.9.1.cluster -d mydb_restored -f mydb_orig.dump

# dump mydb_restored:
$ pg_dump  --cluster 9.1/my.9.1.cluster > mydb_restored.dump

# verify mydb_orig and mydb_restored are identical:
$ diff mydb_orig.dump mydb_restored.dump

# The files have the same number of bytes, but some of the rows are in a different order, so sort them:

$ sort mydb_orig.dump > sorted.orig.dump
$ sort mydb_restored.dump > sorted.restored.dump
$ diff sorted.orig.dump sorted.restored.dump
# Now they are identical

Best Answer

What is the exact problem you are trying to solve? pg_dump and pg_restore are official supported PostgreSQL tools, so you should just assume they are correctly implemented. You have a few options on checking databases are identical though, depending on how certain you need to be they are identical.

Your most basic check would be to check that both databases have the same tables. You can find a list of tables via the information schema. For a loose check that the data is identical, you could look at the amount rows in these tables, and the upper and lower bound of primary keys (assuming you have primary keys where ordering makes sense).

If you really need to guarantee the data is identical, then you will need to produce some sort of hash and compare that, and that's essentially what you are doing with sorted dumps. It's worth to remember though that the database is more than just tables, you need to check sequences too.