PostgreSQL – How to Import Multiple CSV Files Using SQL COPY Statement

postgresql

In the server hosting the PostgreSQL server, there is a very specific directory where a series of .csv files will be loaded regularly to update one of the databases. I want to make the process of uploading the data contained in these files as automatic as possible, and I have thus created a .sh script to do this. It is a simple for loop iterating through the set of .csv files in that directory and passing their names as parameters to a \COPY sentence.

Now, since the server administrator are being a little privy of their server, they would like to give us access only to the SQL server and not to the underlying unix server. So, here goes the question:

Is there a way to accomplish the task described above through a stored procedure executed from inside the database? Can you really read and access a path and its contents in that way from the database? The whole set of .csv files could vary so I don't think a hard-coded solution would work, plus it would look rather dirty (although, if that is the only way I can make it work, so be it).

My guess is that you cannot but… you never know.

Best Answer

There's a built-in pg_ls_dir function that is quite close to what you need.

There are two security-related caveats:

  • it's reserved to superusers.
  • absolute paths are not allowed, paths are relative to the PostgreSQL data directory.

Concerning the need to be superuser, any solution would have this requirement anyway, since it is a feature by design that a normal user has zero access to the filesystem.

A DBA (superuser) can grant access to an otherwise-forbidden specific functionality through a proxy function defined with SECURITY DEFINER access rights.

For instance:

create function pg_ls_dir2(text) as
  'SELECT pg_ls_dir($1)'
language sql SECURITY DEFINER;

-- optional (to give access to a specific role only)
revoke execute on function  pg_ls_dir2(text) from public;
grant execute on pg_ls_dir2(text) TO specific_role;

Concerning the second issue, a DBA can create a symlink from inside the $PGDATA directory to any directory, and pg_ls_dir will follow it, so the real upload directory can be anywhere on the file system.

If the system admin agrees to this setup, as a non-priviledged user you could eventually run a simple plpgsql function matching the functionality of the shell script:

 FOR filename IN select pg_ls_dir2('relative_path') LOOP
   IF (filename ~ '.csv$') THEN
      COPY '/fullpath/' || filename TO table...
   END IF;
 END LOOP;

As COPY FROM file itself requires to be superuser, this function will also need to be validated and owned and checked as SECURITY DEFINER by a superuser.

Related Question