PostgreSQL – Why is the connection failing while importing a 150 GB file into Amazon RDS?

amazon-ec2, amazon-rds, postgresql, postgresql-9.3, rails

I have a 150 GB PostgreSQL database dump file on an Amazon EC2 instance. While loading the data into RDS from the EC2 server, I get an error.

The output of the command is given below. It fails with two errors:

1. connection not open
2. connection to server was lost

These errors occur while the dump's COPY commands are running against RDS.

Command Output:
SET
SET
SET
SET
SET
SET
SET
SET
ALTER TABLE
ALTER TABLE
ALTER SEQUENCE
ALTER TABLE
psql:filename.sql:1396266: connection not open
psql:filename.sql:1396266: connection to server was lost

Application Configuration:

Ruby 1.9.3 
Rails 3
PostgreSQL 9.3

Why does it break while copying the data? The connection is established and the command runs for a while, then it suddenly fails during a COPY command.

The command I am using to load the data is below:

psql -h instance.id.region-2.rds.amazonaws.com -p 5432 -U username -W -d database_name -f filename.sql

Workaround I used to narrow down the issue:

I took the first 100 lines and the last 200 lines of the 150 GB file and combined them into a single 56 KB file (the commands are sketched below). When I run the same psql command against this small file, it imports successfully.

So the file size seems to be the problem: the same command works when loading a small file into RDS but fails with the full 150 GB dump.
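A minimal sketch of that workaround, assuming the dump is named filename.sql as in the command above (the line counts and the sample file name are illustrative):

# Take the first 100 lines and the last 200 lines of the full dump
# and combine them into a small test file (sample.sql is a placeholder name).
head -n 100 filename.sql  > sample.sql
tail -n 200 filename.sql >> sample.sql

# Run the same import command against the small sample file.
psql -h instance.id.region-2.rds.amazonaws.com -p 5432 -U username -W -d database_name -f sample.sql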

How can I resolve this issue?

Best Answer

It's the file size that's the problem.

Essentially, a timeout is expiring somewhere; the server logs will tell you where.

When the timeout expires, no more data is produced, and after a while with no data the connection gets closed as per normal.
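As a hedged example of where to look, you can pull the PostgreSQL log from RDS with the AWS CLI and check timeout-related settings from psql; the instance identifier and log file name below are placeholders:

# List the available PostgreSQL log files for the RDS instance (identifier is a placeholder).
aws rds describe-db-log-files --db-instance-identifier mydbinstance

# Download a portion of one log file to inspect locally (the file name is a placeholder).
aws rds download-db-log-file-portion --db-instance-identifier mydbinstance --log-file-name error/postgresql.log.2015-06-01-00 --output text > rds-postgres.log

# Check a timeout-related setting on the RDS side.
psql -h instance.id.region-2.rds.amazonaws.com -p 5432 -U username -W -d database_name -c "SHOW statement_timeout;"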

The last time I had to tackle something like this (which was a long time ago now), I wrote a small program that basically pulled the data out of my server in chunks using various SQL criteria.

This is absolutely NOT the best way of doing it, but when you have a table that's so big and takes so long that things start to time out, it's one of the only ways I found to make it work.
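A minimal sketch of that chunked approach, assuming a hypothetical table big_table with an integer id column; the table name, the id ranges, and the source host are all illustrative:

# Export and reload one id range at a time so no single COPY runs long enough to hit a timeout.
# big_table, the id ranges, and the source host are hypothetical.
for start in 0 1000000 2000000 3000000; do
  end=$((start + 1000000))
  psql -h localhost -U username -d database_name -c "\copy (SELECT * FROM big_table WHERE id >= $start AND id < $end) TO 'chunk_$start.csv' CSV"
  psql -h instance.id.region-2.rds.amazonaws.com -U username -d database_name -c "\copy big_table FROM 'chunk_$start.csv' CSV"
done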

The other approach is to use an ETL tool such as Pentaho and do a server-to-server transfer, again using a custom query to move a chunk at a time.

If you MUST do it using psql, then try to break it up. Instead of one massive 150 GB file, try doing the transfer a table at a time or similar; a per-table sketch follows.
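A hedged sketch of the per-table idea, assuming the source database is still reachable on the EC2 instance and that customers and orders are placeholder table names:

# Dump and load one table at a time instead of replaying one huge SQL file.
# The table names and the source host are placeholders.
for table in customers orders; do
  pg_dump -h localhost -U username -t "$table" database_name | psql -h instance.id.region-2.rds.amazonaws.com -U username -d database_name
done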

Also, think about local resources. You might get better traction if, for example, you can dump the file to a server where you have file system access inside Amazon's network, such as an AWS instance. If you have such a resource, set up an at job, dump the DB to that instance (optionally gzipping it in the process), then transfer the file to your PC using a web browser or FTP connection.
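A minimal sketch of the dump-and-compress step, assuming the source database lives on the EC2 instance; host names and file names are placeholders:

# Dump the database on the EC2 instance and gzip it in the same pipeline.
pg_dump -h localhost -U username database_name | gzip > database_name.sql.gz

# Later, restore it into RDS from inside Amazon's network by decompressing on the fly.
gunzip -c database_name.sql.gz | psql -h instance.id.region-2.rds.amazonaws.com -U username -d database_name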

Tackling the initial problem head on: you're not going to know which approach to take until you can figure out why it's timing out.