PostgreSQL – Checking if Two Tables Have Identical Content

duplication postgresql

This has already been asked on Stack Overflow, but only for MySQL. I'm using PostgreSQL. Unfortunately (and surprisingly), PostgreSQL does not seem to have anything like MySQL's CHECKSUM TABLE.

A PostgreSQL solution would be fine, but a generic one would be better. I found http://www.besttechtools.com/articles/article/sql-query-to-check-two-tables-have-identical-data, but I don't understand the logic used.

Background: I re-wrote some database generating code, so I need to check whether the old and new code produce identical results.

Best Answer

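One option is to use a FULL OUTER JOIN between the two tables and count the rows that have no partner on the other side. A count of 0 means every row found a match. This form assumes id is a non-nullable column in both tables: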

SELECT count (1)
    FROM table_a a
    FULL OUTER JOIN table_b b 
        USING (<list of columns to compare>)
    WHERE a.id IS NULL
        OR b.id IS NULL ;

For example:

CREATE TABLE a (id int, val text);
INSERT INTO a VALUES (1, 'foo'), (2, 'bar');

CREATE TABLE b (id int, val text);
INSERT INTO b VALUES (1, 'foo'), (3, 'bar');

SELECT count (1)
    FROM a
    FULL OUTER JOIN b 
        USING (id, val)
    WHERE a.id IS NULL
        OR b.id IS NULL ;

returns a count of 2, because (2, 'bar') exists only in a and (3, 'bar') exists only in b, whereas:

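-- Drop the tables from the previous example so this block can be run after it.
DROP TABLE IF EXISTS a, b;

CREATE TABLE a (id int, val text);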
INSERT INTO a VALUES (1, 'foo'), (2, 'bar');

CREATE TABLE b (id int, val text);
INSERT INTO b VALUES (1, 'foo'), (2, 'bar');

SELECT count (1)
    FROM a
    FULL OUTER JOIN b 
        USING (id, val)
    WHERE a.id IS NULL
        OR b.id IS NULL ;

returns the hoped-for count of 0.

The thing I like about this method is that it only needs to read each table once, versus reading each table twice when using EXISTS. Additionally, this should work for any database that supports full outer joins (not just PostgreSQL).
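For comparison, the EXISTS-style check referred to above might look roughly like the following. The answer does not show one, so this is only a sketch; it assumes the tables a and b from the examples, and the mismatched_rows alias is made up for illustration.

```sql
-- Hypothetical EXISTS-based equivalent, shown for comparison only.
-- Each half scans one table and probes the other, so a and b are each read twice.
SELECT
    (SELECT count(*)
        FROM a
        WHERE NOT EXISTS (
            SELECT 1
                FROM b
                WHERE b.id = a.id
                    AND b.val = a.val))
  + (SELECT count(*)
        FROM b
        WHERE NOT EXISTS (
            SELECT 1
                FROM a
                WHERE a.id = b.id
                    AND a.val = b.val))
    AS mismatched_rows;
```

With the first example's data this returns 2, and with the second example's data it returns 0, matching the FULL OUTER JOIN results.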

I generally discourage use of the USING clause but here is one situation where I believe it to be the better approach.
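One caveat that the queries above do not cover: duplicate rows. If a contains (1, 'foo') twice and b contains it once, both copies in a join to the single row in b, so the count is still 0 even though the tables differ. If duplicates are possible, a set-difference check is one alternative. The following is a sketch, not part of the original answer; it assumes both tables have the same column list in the same order, and the differing_rows and diff names are made up for illustration.

```sql
-- EXCEPT ALL keeps duplicates: a row that appears twice in a and once in b
-- leaves one copy in the result. EXCEPT also treats two NULLs as equal.
SELECT count(*) AS differing_rows
    FROM (
        (SELECT * FROM a EXCEPT ALL SELECT * FROM b)
        UNION ALL
        (SELECT * FROM b EXCEPT ALL SELECT * FROM a)
    ) AS diff;
```

Note that this reads each table twice, so it gives up the single-pass advantage of the join.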

Addendum 2019-05-03:

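If the compared columns may contain NULLs (e.g. the id column is not nullable but val is), the USING form breaks down: USING compares with =, and NULL = NULL is never true, so rows with a NULL val are never matched. In that case you could try the following: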

SELECT count (1)
    FROM a
    FULL OUTER JOIN b
        ON ( a.id = b.id
            AND a.val IS NOT DISTINCT FROM b.val )
    WHERE a.id IS NULL
        OR b.id IS NULL ;
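To see the difference, here is a small worked example. The sample data is assumed for illustration: two identical tables in which one row has a NULL val.

```sql
-- Assumed sample data: a and b are identical, and val is NULL in one row.
DROP TABLE IF EXISTS a, b;

CREATE TABLE a (id int NOT NULL, val text);
INSERT INTO a VALUES (1, 'foo'), (2, NULL);

CREATE TABLE b (id int NOT NULL, val text);
INSERT INTO b VALUES (1, 'foo'), (2, NULL);

-- USING compares with =, and NULL = NULL is not true, so the two
-- (2, NULL) rows fail to pair up and each is counted as unmatched.
SELECT count(*)
    FROM a
    FULL OUTER JOIN b
        USING (id, val)
    WHERE a.id IS NULL
        OR b.id IS NULL;

-- IS NOT DISTINCT FROM treats two NULLs as equal, so the rows pair up.
SELECT count(*)
    FROM a
    FULL OUTER JOIN b
        ON (a.id = b.id
            AND a.val IS NOT DISTINCT FROM b.val)
    WHERE a.id IS NULL
        OR b.id IS NULL;
```

The first query returns 2 even though the tables are identical; the second returns 0.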