Postgresql – Postgres 9.6: tune for one time import

importpostgresql

I'm using pgloader to perform a one-time migration from MySQL to Postgres.

For that purpose, I want to temporarily configure Postgres specifically for that workload.

  • reduce WAL as much as possible: wal_level = minimal
  • archive_mode = off
  • Should I autovacuum = off ?

What else can I do to improve things without severely putting my data at risk?

Ee.g. I don't want to fsync = off. Although I can re-do this migration, this additional time would still be unwanted.

Additional setting after Feedback

  • synchronous_commit=off (thanks @a_horse_with_no_name )

Best Answer

I found that the following changes to postgres.conf yielded the best results so far:

archive_mode            = off       # old: on
autovacuum              = off       # old: on
commit_delay            = 100000    # old: 0
maintenance_work_mem    = 1GB       # old: 512MB
max_wal_size            = 10GB      # old: 1GB
synchronous_commit      = off       # old: on
wal_level               = minimal   # old: archive
work_mem                = 1GB       # old: 8MB

Also I had quite trouble getting the most out of a single pgloader invocation to migrate all the table. This would have been the most desired way, because pgloader that way also re-creates all the FK between the imported tables.

But the problems I was facing (pgloader hanging, not achieving the performance I was aiming for in terms of total time) I settled with this approach:

  • run a separate pgloader instance for every individual table, in parallel, inside tmux
  • using tmux wait-for the longest import migration
  • run a separate *.sql script for (almost each) table for post-processing (add FKs, add triggers), again in parallel inside a tmux
  • run vacuumdb --jobs=24 --analyze

To fit this purpose, the machine in question had been pimped:

  • 14 cores (instead of 8)
  • 64GB RAM (instead of 32GB; but they will eventually stay)

Also, for each individual pgloader invocation I had a seperate loader config with individual batch size configuration. The default batch size of 25000 wouldn't work for the bigger tables.

In the end, for my purpose and workload, this cut down the time

  • from 4h38m (1h11m importing, 1h41m adding FKs, 1h22 vacuuming)
  • to 2h8m6s (74m30s importing, 25m50s adding FKs, 27m44s vacuuming)

(Actually, the latest numbers are slightly below 2h even, but I don't have more detailed stats for them)