PostgreSQL COPY: is it always faster to delete and recreate indexes

postgresql

In this answer, Erwin Brandstetter recommends

It is also substantially faster to delete indexes before a huge bulk INSERT / COPY and recreate them afterwards as it is much more costly to incrementally adjust indexes for every inserted row than to create them at once.

1) Is this true even when the table is already large? If not, is there an easy way to estimate the point of diminishing return? For example, I'm doing COPY of 30,000 records at a time into a table which already has say 100 million rows. Each batch is relatively small, but on the other hand the total batches on hand to import at the moment will more than double the size of the table.

2) If I were to delete the index before COPY, would recreating it be faster if the COPY is done with records already in index order?

There is only one index on the table, which is a 5-column primary key.

I'm doing this on a non-production machine, and I've already disabled fsync, syncronous_commit, autovacuum, etc. I'm executing the COPY operation from four separate processes concurrently on an 8-core machine. I've got 12GB of memory, so I could set maintenance_work_mem to 1GB or more if it would help.

Best Answer

My guess would be going with some benchmarking, measuring the time it's taking you to recreate the full index, the time it takes to insert 10k rows without index, with index. then I'd try with 1000 rows, 5000 rows, 10000 rows, to try to figure out a trend.

Because even if you try to reduce the amount of factors that could affect the operation time (as Auto-Vacuum) it still depends on your hardware and your whole postgres.conf

BONUS: if you share with us some benchmarks, we could gladly help you to figure out the trend.