Processing CSV Data on Import in PostgreSQL with PostGIS

importpostgispostgresql

I have an externally-produced CSV filled with geographic points and associated data, but I would like to run some functions on the columns before saving them. For example I need to digest longitude/latitude columns into points, run some numerical processing on other columns, and discarding a couple altogether.

What's the best practise here? Options I've considered:

  • Processing the data with a shell script before importing. Can be invoked directly from the COPY command with COPY FROM PROGRAM, but still leaves the problem of turning the lon/lat into a geographical type.
  • Having two tables, data and data_import, and updating data using functions on the columns of data_import.
  • Having a table where the imported data is as it appears in the CSV, working with it through a View that does the necessary processing.

Best Answer

The common way is to use a staging table (the table you called "data_import" which conventionally would be called "data_stg" or similar variation).
Preferably, no "updating" would be involved (maybe it was just a poor choice of words) but only a single insert into ... select ....