PostgreSQL – the best way to upload custom-parsed data into AWS Aurora PostgreSQL

aws, postgresql

I have a large (5-10 GB) binary file on AWS S3 that will require custom parsing, probably in Python. It is essentially a sequential set of millions of dataframes, all with the same structure. What is the best way to get this data into a serverless/hosted AWS Aurora PostgreSQL instance? So far I have thought of:
1. I could write to a CSV file and use COPY, but the size would be astronomical
2. I could send it over the wire in batches of rows (a rough sketch of this is shown after the list)
3. Use AWS Glue, though I'm still learning about it.
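
For illustration, a minimal sketch of option 2 using psycopg2 and batched INSERTs; the connection string, the measurements(ts, value) table, and the parse_dataframes() generator are hypothetical placeholders standing in for the real parser and schema:

    import psycopg2
    from psycopg2.extras import execute_values

    def parse_dataframes(path):
        # Placeholder for the custom binary parser: yield one tuple per row,
        # matching the columns of the target table.
        yield from []

    conn = psycopg2.connect("host=my-aurora-endpoint dbname=mydb user=me")
    with conn, conn.cursor() as cur:  # psycopg2 commits on clean exit from the block
        batch = []
        for row in parse_dataframes("downloaded_from_s3.bin"):
            batch.append(row)
            if len(batch) >= 10_000:
                execute_values(cur, "INSERT INTO measurements (ts, value) VALUES %s", batch)
                batch.clear()
        if batch:  # flush the final partial batch
            execute_values(cur, "INSERT INTO measurements (ts, value) VALUES %s", batch)
    conn.close()

Even batched, this pushes every row through INSERT statements, so COPY (option 1, and the answer below) is usually considerably faster at this volume.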

Best Answer

"I could write to a CSV file and use COPY, but the size would be astronomical"

You could write the CSV data stream to a pipe rather than a file:

generate_csv | psql -c '\copy tablename from stdin'

or

\copy tablename from program 'generate_csv'
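
For illustration, a minimal sketch of what such a generate_csv program could look like in Python, assuming a hypothetical parse_dataframes() generator that wraps the custom parser and yields one tuple per row; it streams CSV to stdout, so the full file is never written to disk:

    import csv
    import sys

    def parse_dataframes(path):
        # Placeholder for the custom binary parser: yield one tuple per row,
        # matching the columns of the target table.
        yield from []

    writer = csv.writer(sys.stdout)
    for row in parse_dataframes("downloaded_from_s3.bin"):
        writer.writerow(row)

It would then sit on the left-hand side of the pipe, e.g. python generate_csv.py | psql -c '\copy tablename from stdin with (format csv)'; the with (format csv) option tells COPY to expect CSV rather than its default tab-separated text format.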