Postgresql – How to troubleshoot COPY FROM inserting less rows then expected

csvimportpostgresql

I have a CSV (tab-delimited in fact) file with 7,590,051 lines, according to wc -l, which I want to import using COPY FROM. I removed the table's primary key, so it has no constraints.

I ran COPY customer FROM '.../customer.dsv' WITH DELIMITER E'\t' CSV HEADER; and it reported 7,588,671 rows imported and no errors, so it's missing 1,379 rows (already discounting the header line).

Since PostgreSQL didn't report any errors, how do you suggest I troubleshoot which rows are missing (and why they are missing)?

Best Answer

It could be normal if there are newlines in certain text fields. Newlines are allowed when the value in the field is enclosed by double quotes. And obviously that makes the number of lines in the file greater than the number of records.

Example :

$ cat file.csv
1,"ab
cd",2
3,"efgh",4

$ wc -l file.csv
3 file.csv

=> create table csvtest(a int, b text, c int);
=> \copy csvtest from 'file.csv' with csv

=> select * from csvtest;
 a |   b   | c 
---+-------+---
 1 | ab    | 2
   : cd      
 3 |  efgh | 4
(2 rows)