It could be normal if there are newlines in certain text fields. In CSV, newlines are allowed inside a value when the field is enclosed in double quotes, and each embedded newline makes the number of lines in the file greater than the number of records.
Example:
$ cat file.csv
1,"ab
cd",2
3,"efgh",4
$ wc -l file.csv
3 file.csv
=> create table csvtest(a int, b text, c int);
=> \copy csvtest from 'file.csv' with csv
=> select * from csvtest;
 a |   b   | c
---+-------+---
 1 | ab    | 2
   : cd
 3 | efgh  | 4
(2 rows)
Pre-9.0 VACUUM FULL
You're on PostgreSQL 8.4 or older, where VACUUM FULL tends to bloat indexes. See this wiki page for details.
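If you do have to run VACUUM FULL on one of those versions, following it with a REINDEX rebuilds the bloated indexes. A minimal sketch, using a hypothetical table name mytable:
=> VACUUM FULL mytable;   -- rewrites the table; on 8.4 and older this bloats its indexes
=> REINDEX TABLE mytable; -- rebuild all of the table's indexes from scratch afterwards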
Don't run VACUUM FULL as a periodic maintenance task. It's unnecessary and inefficient. This remains true on current versions; it's just not as bad on 9.0 and above. If you feel the need to run VACUUM FULL regularly then you probably don't have autovacuum turned up far enough and are having table bloat issues. In fact, unless you've changed the FILLFACTOR on the table from its default of 100, a VACUUM FULL is quite counter-productive: it'll compact away all the free space in the table, so the UPDATEs that follow will have to extend the table.
Table extensions are currently one of the poorer-performing operations in PostgreSQL, as they're controlled by a single global lock. So if you have tables that fluctuate in size, you really want to avoid constantly compacting and truncating them only to extend them again.
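To see whether a table really is fluctuating in size, you can watch its on-disk size over time. A quick check, using a hypothetical table mytable (these functions are available on 8.1 and later):
=> SELECT pg_size_pretty(pg_relation_size('mytable'));       -- heap only, no indexes
=> SELECT pg_size_pretty(pg_total_relation_size('mytable')); -- heap plus indexes and TOAST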
On some unusual workloads it can be worth running a periodic CLUSTER, which orders the table based on an index and effectively REINDEXes it as a side effect. If you do many UPDATEs on the table, you should also set a lower FILLFACTOR for efficiency.
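A sketch of that combination, with hypothetical names mytable and mytable_pkey. Mind the version caveats: the fillfactor storage parameter needs 8.2+, and the CLUSTER ... USING syntax needs 8.3+ (older versions spell it CLUSTER indexname ON tablename):
=> ALTER TABLE mytable SET (fillfactor = 90); -- leave 10% of each page free for UPDATEs
=> CLUSTER mytable USING mytable_pkey;        -- rewrite in index order; applies the new fillfactor and rebuilds indexes
=> ANALYZE mytable;                           -- refresh planner statistics after the rewrite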
If this table is being emptied and re-populated regularly, you should generally use TRUNCATE followed by COPY to fill it back up. If it's big, drop the indexes before the COPY and re-create them afterwards; that speeds up the data load and produces indexes that are more compact and faster.
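A sketch of that reload pattern in psql, with hypothetical names (mytable, an index mytable_b_idx on its b column, and data.csv):
=> TRUNCATE mytable;          -- empty the table instantly, with no dead-row bloat
=> DROP INDEX mytable_b_idx;  -- don't maintain indexes row by row during the load
=> \copy mytable from 'data.csv' with csv
=> CREATE INDEX mytable_b_idx ON mytable (b); -- built in one pass: compact and fast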
For one-off mitigation, CLUSTER the table or REINDEX it.
8.1?!?!
After the edit added the version: Holy bleepazoids, Batman. 8.1.18? Forget what I said about autovacuum; autovacuum in 8.1 was way too ineffective. Upgrade to a sane version ASAP. You're not even on the current point release of 8.1, which is 8.1.23 from December 2010; 8.1.18 was released in September 2009! You need to begin your upgrade planning ... well, about two years ago, preferably. Read the release notes for every .0 version between 8.1 and the current release, focusing on the upgrade notes and compatibility notes. Then plan and execute your upgrade. If you don't feel up to managing that on your own, there are people who'll help you with it (I work for one of them), but honestly, the release notes and docs are quite sufficient for most people to do an upgrade themselves without undue pain.
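When you get to the dump-and-reload itself, run the new version's pg_dump against the old server, since a newer pg_dump produces output the newer server will accept. A hedged sketch, with hypothetical host names:
$ pg_dump -h old-host mydb | psql -h new-host mydb  # use the NEW version's pg_dump binary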
Moving from 8.1 to 8.3 or newer will be your biggest pain point, as PostgreSQL 8.3 dropped a whole bunch of implicit casts that lots of potentially buggy SQL relied on; see the example after the list below. You'll need to test your application carefully on the newer version. Other changes to be aware of are:
- The removal of implicit FROM, and in later versions the removal of the backwards-compatibility parameter for it;
- UTF-8 validation improvements in newer versions that can cause older dumps to fail to load until the data is corrected;
- The change of standard_conforming_strings to on by default;
- The change of bytea_output to hex by default.
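To illustrate the implicit-cast removal, here is the example from the 8.3 release notes; code that relied on an automatic cast to text breaks and needs the cast made explicit:
=> SELECT substr(current_date, 1, 4);       -- worked on 8.2; fails on 8.3+ (function does not exist)
=> SELECT substr(current_date::text, 1, 4); -- the explicit cast that 8.3+ requires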
Best Answer
postgres_fdw is certainly not as optimized for bulk transfer as COPY TO, COPY FROM, and scp are. After all, bulk transfer is the main reason those tools exist. But that doesn't mean there is nothing you can do. If you were running 9.6 on the local server, you could try increasing fetch_size.
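fetch_size is a postgres_fdw option (available when the local server is 9.6+) that sets how many rows each remote fetch retrieves at a time; the default is 100, so raising it can cut round trips considerably on bulk reads. It can be set per foreign server or per foreign table. A sketch, with hypothetical object names:
=> ALTER SERVER remote_server OPTIONS (ADD fetch_size '10000');
=> ALTER FOREIGN TABLE remote_tbl OPTIONS (ADD fetch_size '10000'); -- per-table override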