PostgreSQL – Using COPY Command to stdout and Reading from stdin

Tags: copy, postgresql, psql

When transferring data between two databases, I would like to know whether there are any disadvantages to the following approach:

psql -h [HOST1] -U postgres -d [DB1] -c [SQL1] | psql -h [HOST2] -U postgres -d [DB2] -c [SQL2]

where

[SQL1]="\copy (SELECT [FIELDS_SUBSET] FROM [TABLE_NAME]) TO STDOUT"

[SQL2]="\copy [NEW_TABLE_NAME]( [FIELDS_SUBSET] ) FROM STDIN"

Basically, I extract some data from DB1 to stdout and immediately read it from stdin in order to import it into DB2. The two databases are on different networks. This approach removes the need for an intermediate file.
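To make the pipeline concrete, here is a filled-in sketch of the command above. The host, database, table, and column names are placeholders I made up for illustration; substitute your own. Note that in a plain pipeline the exit status is that of the last command, so enabling `pipefail` helps detect a failure of the first `psql`:

```shell
#!/bin/sh
# Fail the pipeline if either psql process fails.
set -o pipefail

# Copy the id and name columns of "customers" from DB1 on HOST1
# straight into "customers_copy" on DB2 on HOST2, no temp file.
psql -h host1.example.com -U postgres -d db1 \
     -c "\copy (SELECT id, name FROM customers) TO STDOUT" \
  | psql -h host2.example.com -U postgres -d db2 \
     -c "\copy customers_copy (id, name) FROM STDIN"
```

Since `\copy` is a `psql` meta-command (it runs `COPY ... TO/FROM STDOUT/STDIN` under the hood on the client side), it works fine inside `-c`.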

In this context, I would like to know whether this strategy has any drawbacks compared to using an intermediate file. For instance, is this approach suitable for transferring large volumes of data (GBs of data)?

Also, when using COPY with STDIN and STDOUT, do I still have to worry about the COPY command's caveats? (Reference)

Best Answer

The drawback is that if the operation fails partway through due to a network hiccup, a reboot of HOST1, or something like that, you've done a lot of work on DB2 that needs to be rolled back and repeated. This can blow up your WAL archive, and it can lead to table bloat if you don't vacuum the table in DB2 before repeating the operation. I'd spool it to a local-storage file on HOST2 first, unless I had a good reason not to.
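The spool-to-file variant can be sketched in two steps (hostnames, database names, table/column names, and the spool path are placeholders). If the load into DB2 fails, you can retry it from the local file without re-reading DB1 over the network:

```shell
#!/bin/sh
# Step 1 (run on HOST2): dump the data from DB1 to a local file.
psql -h host1.example.com -U postgres -d db1 \
     -c "\copy (SELECT id, name FROM customers) TO STDOUT" \
     > /tmp/customers.tsv

# Step 2: load the local file into DB2. On failure, vacuum the
# target table and rerun only this step.
psql -h host2.example.com -U postgres -d db2 \
     -c "\copy customers_copy (id, name) FROM STDIN" \
     < /tmp/customers.tsv
```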

Most of the caveats you link to are about PostgreSQL interpreting the data format differently than Excel, Python, Perl, human beings, etc. might interpret it. Going PostgreSQL to PostgreSQL (either through a pipe or through an intermediate file) eliminates most of those caveats, assuming you choose the same options to \copy on each end and have the same table structure. The caveats about performance and foreign keys, however, remain.