PostgreSQL Database Comparison – How to Check if Two PostgreSQL Databases on Different Servers are Identical

postgresqlpostgresql-11postgresql-9.6

I copied an existing posttgresql database from one server to another. The source of my data is a postgresql 9.6 whilst the newer server is in postgresql 11.

So I want to know how I can check that both databases have the same schema and data in a fast and easy way?

Best Answer

You cannot do that in a fast way.

The best you can do is to dump both databases using pg_dump from v11 and compare the files with diff.

If you want to do that just to ensure that the migration was successful, don't. If you don't trust PostgreSQL's upgrade process, why should you trust pg_dump?

Related Solutions

PostgreSQL – How to manage WAL archiving for different databases on a server

You can't split WAL archiving out so one database goes in one WAL archive stream and other DBs go in another WAL archive stream. A single write-ahead log is shared by all databases in a PostgreSQL server ("cluster" in PostgreSQL parlance). All databases managed by a postmaster have the same WAL. This is a sometimes unfortunate limitation of the current replication/PITR/archiving design and is true for the in-development-at-time-of-writing 9.3 and all and prior releases.

If you want to separate the WAL archiving, you have to separate the databases into different clusters. You can do this by running more than one instance of PostgreSQL on a server; they simply have to have different data directories and different listening ports. Some packagings of PostgreSQL provide built-in tools to do this, like the pg_wrapper tool used on Debian and Ubuntu, which has pg_createcluster. Alternately, you can always just use initdb to create a cluster and pg_ctl to start the cluster after editing the new cluster's postgresql.conf to set an appropriate port.

Tools like pgbouncer may be used, if desired, to multiplex the multiple clusters transparently, so pgbouncer listens on port 5432 (the default Pg port) and decides which server to send the connection to based on which database it asks for. Generally it's easier to just specify a port in the connection setup of the applications that use each database, though.

On PostgreSQL 9.2 and below you must allocate each server an exclusive, non-overlapping shared memory region for its shared_buffers. One server cannot make use of the other server's shared_buffers space when the other server is not busy. That means you are wasting some system resources. PostgreSQL 9.3 improves this by allocating shared_buffers out of anonymous memory, giving the OS more freedom about how it manages it.

If you don't want to run multiple clusters, you can use alternative replication or backup solutions based on out-of-tree tools like Slony-I, Bucardo, etc to manage your data replication and archival.

PostgreSQL – How to Select Data in the Same Table on Different Rows That Are Empty

You explain:

I get result 1453 only when all items in column attributeY are empty for 1453.

But that's incorrect. Bold emphasis mine. The aggregate function count returns (per documentation):

number of input rows for which the value of expression is not null

The same is true for SQLite (per documentation):

The count(X) function returns a count of the number of times that X is not NULL in a group.

You obviously have one or more rows with a non-null value in attributeY in your Postgres table - probably an empty string ''.

Test with:

SELECT *
FROM   myschema.table
WHERE  attributeX = 1453
AND    attributeY IS NOT NULL;

Be sure to understand the difference between "empty" ('') and NULL:

Empty strings are character types (typically text, varchar or char) with 0 characters ('') - so basically "nothing", the equivalent of 0 for a numeric data type. NULL is possible for any data type and means "unknown". Some clients have a hard time making the difference clear in their display.

Alternative query

To find attributeX where all attributeY are either empty or NULL use this alternative query:

SELECT attributeX
FROM   myschema.table
GROUP  BY attributeX
HAVING NOT bool_or(attributeY <> '');

The expression attributeY <> '' is only true for non-null, non-empty attributeY. The aggregate function bool_or returns (per documentation):

true if at least one input value is true, otherwise false

Those are the results we exclude (NOT ...) and return the rest. Voilá.

There are many other (less elegant) ways to achieve the same. Like:

... HAVING count(CASE WHEN attributeY <> '' THEN 1 END) = 0;
... HAVING count(attributeY <> '' OR NULL) = 0;
... HAVING count(NULLIF(attributeY, '') = 0;

Best Answer

Related Solutions

PostgreSQL – How to manage WAL archiving for different databases on a server

PostgreSQL – How to Select Data in the Same Table on Different Rows That Are Empty

Alternative query

Related Question