PostgreSQL – Using COPY Command to stdout and Reading from stdin

Tags: copy, postgresql, psql

When transferring data between two databases, I would like to know whether there are any disadvantages to the following approach:

psql -h [HOST1] -U postgres -d [DB1] -c [SQL1] | psql -h [HOST2] -U postgres -d [DB2] -c [SQL2]

where

[SQL1]="\copy (SELECT [FIELDS_SUBSET] FROM [TABLE_NAME]) TO STDOUT"

[SQL2]="\copy [NEW_TABLE_NAME]( [FIELDS_SUBSET] ) FROM STDIN"

Basically, I extract some data from DB1 to stdout and immediately read it from stdin in order to import it into DB2. The two databases are on different networks. This approach removes the need for an intermediate file.
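To make the pipeline concrete, here is a filled-in sketch of the command above. The host, database, table, and column names are placeholders I made up for illustration; substitute your own. Note that in a plain pipeline the exit status is that of the last command, so enabling `pipefail` helps detect a failure of the first `psql`:

```shell
#!/bin/sh
# Fail the pipeline if either psql process fails.
set -o pipefail

# Copy the id and name columns of "customers" from DB1 on HOST1
# straight into "customers_copy" on DB2 on HOST2, no temp file.
psql -h host1.example.com -U postgres -d db1 \
     -c "\copy (SELECT id, name FROM customers) TO STDOUT" \
  | psql -h host2.example.com -U postgres -d db2 \
     -c "\copy customers_copy (id, name) FROM STDIN"
```

Since `\copy` is a `psql` meta-command (it runs `COPY ... TO/FROM STDOUT/STDIN` under the hood on the client side), it works fine inside `-c`.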

In this context, I would like to know whether this strategy has any drawbacks compared to using an intermediate file. For instance, is this approach suitable for transferring large volumes of data (GBs of data)?

Also, when using COPY with STDIN and STDOUT, do I still have to worry about the COPY command's caveats? (Reference)

Best Answer

The drawback is that if the operation fails partway through due to a network hiccup, a reboot of HOST1, or something like that, you've done a lot of work on DB2 that needs to be rolled back and repeated. This can blow up your WAL archive, and it can lead to table bloat if you don't vacuum the table in DB2 before repeating the operation. I'd spool it to a local-storage file on HOST2 first, unless I had a good reason not to.
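The spool-to-file variant can be sketched in two steps (hostnames, database names, table/column names, and the spool path are placeholders). If the load into DB2 fails, you can retry it from the local file without re-reading DB1 over the network:

```shell
#!/bin/sh
# Step 1 (run on HOST2): dump the data from DB1 to a local file.
psql -h host1.example.com -U postgres -d db1 \
     -c "\copy (SELECT id, name FROM customers) TO STDOUT" \
     > /tmp/customers.tsv

# Step 2: load the local file into DB2. On failure, vacuum the
# target table and rerun only this step.
psql -h host2.example.com -U postgres -d db2 \
     -c "\copy customers_copy (id, name) FROM STDIN" \
     < /tmp/customers.tsv
```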

Most of the caveats you link to are about PostgreSQL interpreting the data format differently than Excel, Python, Perl, human beings, etc. might interpret it. Going PostgreSQL to PostgreSQL (either through a pipe or through an intermediate file) eliminates most of those caveats, assuming you choose the same options to \copy on each end and have the same table structure. The caveats about performance and foreign keys, however, remain.