It could be normal if there are newlines in certain text fields. In CSV, newlines are allowed inside a value when the field is enclosed in double quotes, and each embedded newline makes the number of lines in the file greater than the number of records.
Example:
$ cat file.csv
1,"ab
cd",2
3,"efgh",4
$ wc -l file.csv
3 file.csv
=> create table csvtest(a int, b text, c int);
=> \copy csvtest from 'file.csv' with csv
=> select * from csvtest;
 a |   b   | c
---+-------+---
 1 | ab    | 2
   : cd
 3 | efgh  | 4
(2 rows)
Pre-9.0 VACUUM FULL
You're on PostgreSQL 8.4 or older, where VACUUM FULL tends to bloat indexes. See this wiki page for details.
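If you do have to run VACUUM FULL on one of those versions, following it with a REINDEX rebuilds the bloated indexes. A minimal sketch, using a hypothetical table name mytable:
=> VACUUM FULL mytable;   -- rewrites the table; on 8.4 and older this bloats its indexes
=> REINDEX TABLE mytable; -- rebuild all of the table's indexes from scratch afterwards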
Don't run VACUUM FULL as a periodic maintenance task. It's unnecessary and inefficient. This remains true on current versions; it's just not as bad on 9.0 and above. If you feel the need to run VACUUM FULL regularly then you probably don't have autovacuum turned up far enough and are having table bloat issues. In fact, unless you've changed the FILLFACTOR on the table from its default of 100, a VACUUM FULL is quite counter-productive: it'll compact away all the free space in the table, so the UPDATEs that follow will have to extend the table.
Table extensions are currently one of the poorer-performing operations in PostgreSQL, as they're controlled by a single global lock. So if you have tables that fluctuate in size, you really want to avoid constantly compacting and truncating them only to extend them again.
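To see whether a table really is fluctuating in size, you can watch its on-disk size over time. A quick check, using a hypothetical table mytable (these functions are available on 8.1 and later):
=> SELECT pg_size_pretty(pg_relation_size('mytable'));       -- heap only, no indexes
=> SELECT pg_size_pretty(pg_total_relation_size('mytable')); -- heap plus indexes and TOAST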
On some unusual workloads it can be worth running a periodic CLUSTER, which orders the table based on an index and effectively REINDEXes it as a side effect. If you do many UPDATEs on the table, you should also set a lower FILLFACTOR for efficiency.
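A sketch of that combination, with hypothetical names mytable and mytable_pkey. Mind the version caveats: the fillfactor storage parameter needs 8.2+, and the CLUSTER ... USING syntax needs 8.3+ (older versions spell it CLUSTER indexname ON tablename):
=> ALTER TABLE mytable SET (fillfactor = 90); -- leave 10% of each page free for UPDATEs
=> CLUSTER mytable USING mytable_pkey;        -- rewrite in index order; applies the new fillfactor and rebuilds indexes
=> ANALYZE mytable;                           -- refresh planner statistics after the rewrite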
If this table is being emptied and re-populated regularly, you should generally use TRUNCATE followed by COPY to fill it back up. If it's big, drop the indexes before the COPY and re-create them afterwards; that speeds up the data load and produces indexes that are more compact and faster.
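A sketch of that reload pattern in psql, with hypothetical names (mytable, an index mytable_b_idx on its b column, and data.csv):
=> TRUNCATE mytable;          -- empty the table instantly, with no dead-row bloat
=> DROP INDEX mytable_b_idx;  -- don't maintain indexes row by row during the load
=> \copy mytable from 'data.csv' with csv
=> CREATE INDEX mytable_b_idx ON mytable (b); -- built in one pass: compact and fast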
For one-off mitigation, CLUSTER the table or REINDEX it.
8.1?!?!
After the edit added the version: Holy bleepazoids, Batman. 8.1.18? Forget what I said about autovacuum; autovacuum in 8.1 was way too ineffective. Upgrade to a sane version ASAP. You're not even on the current point release of 8.1, which is 8.1.23 from December 2010; 8.1.18 was released in September 2009! You need to begin your upgrade planning ... well, about two years ago, preferably. Read the release notes for every .0 version between 8.1 and the current release, focusing on the upgrade notes and compatibility notes. Then plan and execute your upgrade. If you don't feel up to managing that on your own, there are people who'll help you with it (I work for one of them), but honestly, the release notes and docs are quite sufficient for most people to do an upgrade themselves without undue pain.
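When you get to the dump-and-reload itself, run the new version's pg_dump against the old server, since a newer pg_dump produces output the newer server will accept. A hedged sketch, with hypothetical host names:
$ pg_dump -h old-host mydb | psql -h new-host mydb  # use the NEW version's pg_dump binary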
Moving from 8.1 to 8.3 or newer will be your biggest pain point, as PostgreSQL 8.3 dropped a whole bunch of implicit casts that lots of potentially buggy SQL relied on; see the example after the list below. You'll need to test your application carefully on the newer version. Other changes to be aware of are:
- The removal of implicit FROM, and in later versions the removal of the backwards-compatibility parameter for it;
- UTF-8 validation improvements in newer versions that can cause older dumps to fail to load until the data is corrected;
- The change of standard_conforming_strings to on by default;
- The change of bytea_output to hex by default.
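To illustrate the implicit-cast removal, here is the example from the 8.3 release notes; code that relied on an automatic cast to text breaks and needs the cast made explicit:
=> SELECT substr(current_date, 1, 4);       -- worked on 8.2; fails on 8.3+ (function does not exist)
=> SELECT substr(current_date::text, 1, 4); -- the explicit cast that 8.3+ requires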
Best Answer
postgres_fdw is certainly not as optimized for bulk transfer as COPY TO, COPY FROM, and scp are. After all, bulk transfer is the main reason those tools exist. But that doesn't mean there is nothing you can do. If you were running 9.6 on the local server, you could try increasing fetch_size.
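fetch_size is a postgres_fdw option (available when the local server is 9.6+) that sets how many rows each remote fetch retrieves at a time; the default is 100, so raising it can cut round trips considerably on bulk reads. It can be set per foreign server or per foreign table. A sketch, with hypothetical object names:
=> ALTER SERVER remote_server OPTIONS (ADD fetch_size '10000');
=> ALTER FOREIGN TABLE remote_tbl OPTIONS (ADD fetch_size '10000'); -- per-table override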