PostgreSQL Performance – Single Statement vs Multiple Row Inserts

bulk-insert, performance, postgresql

I'm currently obsessing over bulk inserts via prepared statements with libpqxx.

I am far from a database expert, and even less of a Postgres expert, but thinking about it superficially, it doesn't seem like it should make a difference.

In my MySQL days it made a huge difference, but I suspect MySQL's clustered index was the bottleneck there: it was better to hand the engine many rows at once than one at a time.

Since Postgres doesn't use a clustered index (happily, I might add), is it correct that the performance boost from bulk inserts would be near negligible? Please explain why or why not.

Best Answer

There is a big difference. Insert as many rows as possible at once. Best in a single INSERT statement, but at least in a single transaction. Inserting individual rows takes much longer. Quoting the chapter Populating a Database:

If you allow each insertion to be committed separately, PostgreSQL is doing a lot of work for each row that is added. An additional benefit of doing all insertions in one transaction is that if the insertion of one row were to fail then the insertion of all rows inserted up to that point would be rolled back, so you won't be stuck with partially loaded data.
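To make the multi-row form concrete, here is a minimal sketch of building one `INSERT ... VALUES (...), (...), ...` statement instead of issuing N single-row statements. The table and column names (`measurements`, `id`, `val`) are hypothetical; with libpqxx you would send the resulting string through something like `pqxx::work::exec`, so the whole batch runs in a single transaction.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Build one multi-row INSERT statement from a batch of (id, val) rows.
// One statement = one round trip and (inside a pqxx::work) one transaction,
// instead of per-row commits.
std::string multi_row_insert(const std::vector<std::pair<int, int>>& rows) {
    std::string sql = "INSERT INTO measurements (id, val) VALUES ";
    for (std::size_t i = 0; i < rows.size(); ++i) {
        if (i) sql += ", ";
        sql += "(" + std::to_string(rows[i].first) + ", "
                   + std::to_string(rows[i].second) + ")";
    }
    return sql;
}
```

In real code you would of course escape or parameterize the values rather than concatenate them; the point here is only the statement shape.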

If you insert a large number of rows, consider COPY, which is faster still. And read that chapter in the manual; it has all you need.
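As a rough illustration of why COPY is cheap, its default text format is just one tab-separated line per row. Here is a sketch (table name and columns again hypothetical) that formats a batch into the payload `COPY measurements (id, name) FROM STDIN` would consume; libpqxx exposes a streaming interface for this (`pqxx::stream_to` in recent versions), which saves you from parsing SQL per row at all.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Format rows as the tab-separated, newline-terminated text that
// COPY ... FROM STDIN expects in its default text mode.
std::string copy_payload(const std::vector<std::pair<int, std::string>>& rows) {
    std::string out;
    for (const auto& r : rows) {
        out += std::to_string(r.first) + "\t" + r.second + "\n";
    }
    return out;
}
```

Note this sketch ignores escaping (tabs, backslashes, newlines inside values), which a real loader must handle; the streaming APIs do that for you.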