PostgreSQL – Export Data in Insert on Conflict Format

Tags: dump, duplication, export, postgresql

I’m using Postgres 9.5 on macOS Sierra. I want to export some records from my local db and import them into a db on a Linux machine (also running Postgres 9.5). I’m using this command to export data from my local machine:

localhost:myproject davea$ pg_dump -U myproject mydb -a -t table1 -t table2 -t table3 > /tmp/pgdata.sql

The data is exported as a series of COPY commands. Is there a way to export the table data so that the file contains a series of “INSERT … ON CONFLICT DO NOTHING;” statements instead? Some rows already exist in both the source and destination databases, but I don’t want those duplicates to derail the import of the non-duplicate data.
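
For reference, what I have in mind is a file where each row arrives as its own duplicate-tolerant statement, something like this (a hypothetical sketch; table1 and its columns are placeholders, not my real schema):

-- hypothetical target format; table and column names are made up
INSERT INTO table1 (id, name)
VALUES (1, 'example')
ON CONFLICT DO NOTHING;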

Best Answer

pg_dump won't let you do exactly what you ask for, but it has an option that might be good enough. According to the pg_dump documentation, this is what --inserts does:

--inserts

Dump data as INSERT commands (rather than COPY). This will make restoration very slow; it is mainly useful for making dumps that can be loaded into non-PostgreSQL databases. However, since this option generates a separate command for each row, an error in reloading a row causes only that row to be lost rather than the entire table contents. Note that the restore might fail altogether if you have rearranged column order. The --column-inserts option is safe against column order changes, though even slower.
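
To make the column-order point concrete, here's a sketch of what each flag emits for one row (illustrative only; it uses the example table created below):

-- --inserts: no column list, so the target's column order must match
INSERT INTO t1 VALUES (1, 'Some value', 'and another one');

-- --column-inserts: columns named explicitly, safe against reordering
INSERT INTO t1 (id, column_1, column_2) VALUES (1, 'Some value', 'and another one');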

For instance, let's assume your host is myhost and your database is mydb.

We create and populate one table (in one schema):

CREATE SCHEMA s1 ;

CREATE TABLE s1.t1
(
   id serial PRIMARY KEY, 
   column_1 text, 
   column_2 text
) ;

INSERT INTO 
    s1.t1 (column_1, column_2)
VALUES 
    ('Some value', 'and another one'),
    ('Again some value', 'and some more') ;
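
A quick check in psql should show both rows, with the serial id starting at 1:

SELECT * FROM s1.t1 ORDER BY id ;

 id |     column_1     |    column_2
----+------------------+-----------------
  1 | Some value       | and another one
  2 | Again some value | and some more
(2 rows)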

At this point, we back it up:

pg_dump --host myhost --format custom --section data --inserts --verbose --file "t1.backup" --table "s1.t1" "mydb"
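
Because the archive is in custom format, we can verify its contents before restoring anything: pg_restore without --dbname writes the SQL script to standard output, so a quick grep should reveal the generated INSERT commands:

pg_restore "t1.backup" | grep INSERT

INSERT INTO t1 VALUES (1, 'Some value', 'and another one');
INSERT INTO t1 VALUES (2, 'Again some value', 'and some more');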

After the backup, we delete one of the rows from the table, but leave the other one in place:

DELETE FROM  
    s1.t1 
WHERE
    id = 1 ;

At this point, we restore the backup (this is what you would normally do on your second database) and get the following messages:

pg_restore --host myhost --dbname "mydb" --section data --data-only --table t1 --schema s1 --verbose "t1.backup"

pg_restore: connecting to database for restore
pg_restore: processing data for table "s1.t1"
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 2759; 0 21286 TABLE DATA t1 postgres
pg_restore: [archiver (db)] could not execute query: ERROR:  duplicate key value violates unique constraint "t1_pkey"
DETAIL:  Key (id)=(2) already exists.
    Command was: INSERT INTO t1 VALUES (2, 'Again some value', 'and some more');
pg_restore: setting owner and privileges for TABLE DATA "s1.t1"
WARNING: errors ignored on restore: 1

Process returned exit code 1.

The restore process reported one error (for the row that was already in the table), but it inserted the rest of the data.
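
Re-running the earlier SELECT should confirm the outcome: the deleted row (id = 1) is back, and the pre-existing row (id = 2) is untouched:

SELECT id FROM s1.t1 ORDER BY id ;

 id
----
  1
  2
(2 rows)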

Although this is not exactly what you asked for, for all practical purposes it achieves the result you're looking for.
