Postgresql – Load data into postgresql from file with trailing delimiter

postgresql

Context

I receive a number of data files that I'd like to load on a monthly basis into Postgresql (currently running 11.4). However, some of the files each month are formatted such that they have a trailing delimiter at the end of each line.

For example, if I have the following table:

CREATE TABLE t1 (
  id integer,
  make varchar(50),
  model varchar(50)
);

The corresponding file looks like:

1|Ford|Taurus|
2|Honda|Civic|
3|Toyota|Camry|

Note that while the | character is used as a delimiter, there is an "extra" one at the end of each line — which causes a load to fail:

=> BEGIN TRANSACTION;
BEGIN
=> TRUNCATE TABLE t1;
TRUNCATE TABLE
=> \copy t1 (id, make, model) FROM t1.dat WITH FREEZE NULL '' DELIMITER '|';
ERROR:  extra data after last expected column
CONTEXT:  COPY t1, line 1: "1|Ford|Taurus|"
=> COMMIT;
ROLLBACK

Solutions I've thought of so far:

  1. Load into a temp table that has an extra dummy column, then INSERT AS SELECT; however I believe that loses the speed/efficiency of COPY WITH FREEZE, plus you touch the data twice which I'm guessing is bad as the size of it grows.
  2. Add a dummy column onto each table; don't know a priori which files/tables need this.
  3. Ask the data provider not to do this; it comes from a branch of the US Government, so a very long shot…
  4. Pre-process the files to remove any / a single trailing delimiters; again don't know a priori which files will need this and it could change month to month.

Anyone have any other ideas or recommendations on how to solve this?

Best Answer

You could also pre-process the file as suggested by VĂ©race:

Or sed s'/.$//'

from How to shave off last character using sed? by codaddict on Stack Overflow.

Or sed -i ... (inplace - I tend to avoid this, because in case of a mistake, you've lost your original file) or sed .... > table.txt.