PostgreSQL Import – How to Import Plain Files into a Table

copy, import, postgresql, psql

What is a good way to insert/copy plain files into a Postgres table, preferably using the psql command line?

In my case the files are a bunch of emails from Maildir archives, so I tried to use COPY:

psql -c "COPY emails (data) FROM '/tmp/emailfile' WITH (FORMAT text);" emails

which I would run in a shell for-loop (for file in dir/*; do psql ...; done).

However, I have trouble finding a "delimiter" that couldn't possibly occur in the file,
and I keep getting errors like: ERROR: extra data after last expected column.

So I considered using COPY ... WITH (FORMAT binary) into a BYTEA column (and then converting the column to TEXT inside the database),
but the binary format requires a file header and trailer, which I have no easy way of building on the fly.

Is there an easy way to do this from the command line, or do I need to write a Python script for this?

Best Answer

Server-side files

Assuming the account has the pg_read_server_files role (or is a superuser), the files are accessible on a filesystem mounted on the server, and their paths have already been collected into the table, this will pull in the contents efficiently:

UPDATE emails SET mail_data = pg_read_binary_file(emails.fullpath);
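If the paths have not been collected yet, pg_ls_dir() can enumerate a server-side directory first. A minimal sketch, assuming a table emails(fullpath text, mail_data bytea) and a directory /var/mail/archive (both hypothetical; the file-access functions need the same elevated privileges as above):

```sql
-- Collect the file paths (table and directory names are assumptions).
INSERT INTO emails (fullpath)
SELECT '/var/mail/archive/' || fname
FROM pg_ls_dir('/var/mail/archive') AS fname;

-- Then read each file's contents into the bytea column.
UPDATE emails SET mail_data = pg_read_binary_file(emails.fullpath);
```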

It is more efficient than using large objects as an intermediate storage area.

Client-side files

When the files are not accessible server-side, or the account lacks elevated permissions, a more generic solution with psql is, for each file:

\set clientpath '/path/to/file'
-- assume clean paths (without any character that would be special to the shell)
\set contents `base64 :clientpath`

insert into email_data([other columns...], mail_data)
 values ( [other columns values...], decode(:'contents','base64'));

A base64 intermediate representation is used because psql doesn't support parameters in binary form. The :'contents' syntax instructs psql to inject the variable into the query as a properly quoted string literal.
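The same base64 trick can also be wrapped in a small shell helper that only prints the INSERT statements, so a whole Maildir can be loaded through a single psql session instead of one connection per file. A minimal sketch, assuming a table emails(data bytea) (the emit_insert name, table, and paths are illustrative; adapt them to your schema):

```shell
#!/bin/sh
# Hypothetical helper: prints an INSERT statement for one file on stdout.
# The base64 alphabet (A-Za-z0-9+/=) contains no quotes, so the encoded
# text is safe to embed inside a SQL single-quoted literal.
emit_insert() {
    encoded=$(base64 < "$1" | tr -d '\n')   # strip line wrapping
    printf "INSERT INTO emails (data) VALUES (decode('%s', 'base64'));\n" "$encoded"
}

# Usage: one INSERT per message, all piped into a single psql session:
# for f in /path/to/maildir/cur/*; do emit_insert "$f"; done | psql emails
```

Batching the statements through one pipe avoids the per-file connection overhead of calling psql inside the loop.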