PostgreSQL – Moving a large table from localhost to a remote server

Tags: datagrip, pg-dump, pgadmin, postgresql, psql

Postgres 11, Windows 10, pgAdmin 4

I have a large PostGIS-enabled table (29 million rows, 20 GB plus a 5 GB spatial index) stored on my local machine (localhost).

I want to move this table to an AWS-hosted server instance (also running Postgres). The table has two columns: an id column and a geometry column.
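
The table layout is roughly as follows (column names and types as in the dblink query below; the index name is just illustrative):

CREATE TABLE sourceschema.sourcetable (
    fid          varchar,   -- id column
    wkb_geometry geometry   -- PostGIS geometry column
);

-- the ~5 GB spatial index
CREATE INDEX sourcetable_wkb_geometry_idx
    ON sourceschema.sourcetable USING GIST (wkb_geometry);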

Attempt 1: Using dblink, the transfer doesn't work:

CREATE TABLE destschema.dest_table AS
SELECT fid, wkb_geometry FROM dblink('host=localhost
                                      user=postgres
                                      password=password
                                      dbname=sourcedb',
                                     'select fid, wkb_geometry
                                      from sourceschema.sourcetable')
    AS linktable(fid VARCHAR,
                 wkb_geometry geometry);

Error is: 'FATAL: password authentication failed for user "postgres"'

The localhost connection credentials are correct, so why does it throw this error? I have previously used dblink successfully to transfer tables from one remote cloud server to another; it just doesn't seem to like moving a table from localhost to a remote server.

Attempt 2: Using pg_dump to create a .backup file of the table (which works fine), then uploading it to the destination server using psql:

psql -U your_user_name your_db_name < your_dump_file

This method took 12 hours to insert 5 million rows, so I cancelled the process; I don't want to wait 72 hours to move this table…

Attempt 3: Using pgAdmin's interface to back up the table to a .backup file and then restore that file to the destination database. However, after waiting 6 hours it was still uploading… so I stopped the process.

Attempt 4: Eventually I got around the issue by purchasing DataGrip (by JetBrains) and using its GUI to export the 29 million rows to 3 CSV files, then importing the data into the destination db. The export phase took around 8 minutes in total and the upload took 2.5 hours. I then had to manually add a spatial index to the new table (shown below).
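
For reference, recreating the spatial index on the destination table amounts to something like this (the index name is illustrative; the table and column names are the ones used above):

-- rebuild the PostGIS spatial index on the newly imported table
CREATE INDEX dest_table_wkb_geometry_idx
    ON destschema.dest_table USING GIST (wkb_geometry);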

What is the quickest, simplest way of moving a large table from localhost to remote? Surely I've missed a trick or two…

Best Answer

CREATE TABLE destschema.dest_table AS
SELECT fid, wkb_geometry FROM dblink('host=localhost
                     user=postgres...

This statement, with host=localhost, could be used to transfer between databases on the same machine, but not across different machines. When you're logged in to the AWS database, localhost designates the AWS host itself, not your PC, which is why your PC's credentials don't work.
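
For the dblink route to work when the statement runs on the AWS side, the connection string would have to point at an address of your PC that the AWS host can actually reach (the hostname below is hypothetical), which is rarely the case for a local workstation:

-- my-pc.example.com stands for an address of your PC that is
-- reachable from the AWS host (with port 5432 open to it).
CREATE TABLE destschema.dest_table AS
SELECT fid, wkb_geometry
FROM dblink('host=my-pc.example.com port=5432 user=postgres password=password dbname=sourcedb',
            'select fid, wkb_geometry from sourceschema.sourcetable')
     AS linktable(fid VARCHAR, wkb_geometry geometry);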

Copying the output of pg_dump to the remote host as a file and replaying it there locally is probably the fastest way to achieve the transfer. Steps (a full command sequence is sketched after this list):

  • on the PC: pg_dump.exe -Fc [other options] dbname -f dumpfile
    The -Fc option selects the custom format, which has the advantage of compressing the dump.
  • transfer dumpfile to the remote AWS host, with a method of your choice. On Unix, scp would typically be used: scp dumpfile username@remotehostname.amazonaws.com:. On Windows, scp may also be available if an SSH client is installed; otherwise any SFTP client will work.
  • in a shell on the remote host, restore the dump: pg_restore dumpfile | psql [options] dbname.
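
Put together, the whole sequence looks roughly like this (the dump file name, user names, database names and host name are placeholders; lines starting with # are comments):

# 1) on the PC (Windows): dump with the compressed custom format
pg_dump.exe -Fc -U postgres -f sourcedb.dump sourcedb

# 2) copy the dump file to the AWS host (scp or any SFTP client)
scp sourcedb.dump username@remotehostname.amazonaws.com:

# 3) in a shell on the AWS host: replay the dump into the destination database
pg_restore sourcedb.dump | psql -U your_user_name destdb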

If you only need to dump one table (as opposed to an entire database), see the -t option of pg_dump.
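
For example, to dump just that one table (schema and table name as in the question; the dump file name is a placeholder):

pg_dump.exe -Fc -U postgres -t sourceschema.sourcetable -f sourcetable.dump sourcedb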

For the methods that execute the SQL remotely (that is, running psql.exe -h remotehostname.amazonaws.com [other options] on your PC, or the equivalent with pgAdmin connecting to the remote server), note that no compression is used unless it happens at the network layer, and with 20 GB to transfer you really want to benefit from compression.