The single best thing you can do to reduce disk I/O load is to ensure that your database fits comfortably in RAM after allocating RAM to everything else running on the system. If this is the case, your disk I/O will be primarily WAL flushes on commit, which is mostly sequential I/O.
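To check whether the data set can actually fit in RAM, a quick sketch (the database name is a placeholder; the psql call is left commented so nothing runs by accident):

```shell
# Compare the database size against available RAM. "mydb" is a placeholder;
# assumes psql is on PATH and you can connect to the server.
DB=mydb
SIZE_SQL="SELECT pg_size_pretty(pg_database_size('$DB'));"
# Uncomment to run against a live server:
# psql -d "$DB" -Atc "$SIZE_SQL"
# free -h    # compare the reported size against total/available RAM
echo "$SIZE_SQL"
```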
The second thing you can do is address the shared_buffers and effective_cache_size settings. If the system is exclusively running PostgreSQL, I like to set shared_buffers low and effective_cache_size high, then adjust from there. If the system is running other programs too, shared_buffers should be set much higher, because it reserves RAM exclusively for PostgreSQL. Note that the OS disk cache is often faster than the PostgreSQL cache, but the PostgreSQL cache is more full-featured, so there is a real balance to strike. Measure, test, repeat.
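As a sketch of how to inspect and adjust these (the 24GB figure is purely illustrative, not a recommendation for your hardware):

```shell
# Inspect the current values, then persist a change with ALTER SYSTEM.
# The size below is an example only; tune it for your machine.
SHOW_SQL="SHOW shared_buffers; SHOW effective_cache_size;"
SET_SQL="ALTER SYSTEM SET effective_cache_size = '24GB';"
# psql -c "$SHOW_SQL"
# psql -c "$SET_SQL"
# pg_ctl reload   # effective_cache_size takes effect on reload;
#                 # changing shared_buffers needs a full server restart
echo "$SET_SQL"
```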
A third setting to be aware of is work_mem.
This setting should be adjusted only with care, because it applies per sort or hash operation, so a single query can consume it several times over. If you set it too low, joins and sorts will spill to disk. If you set it too high, it will squeeze out memory that could be used for caching data. Again, watch query performance, adjust, and repeat.
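One way to spot a work_mem problem is EXPLAIN (ANALYZE): an on-disk sort shows up as "Sort Method: external merge  Disk: ...". A sketch, using a hypothetical table name and a session-local override so the global setting is untouched:

```shell
# The "orders" table and the 64MB value are hypothetical examples.
TUNE_SQL="SET work_mem = '64MB';"
CHECK_SQL="EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders ORDER BY created_at;"
# Run both in one session; look for "Sort Method: external merge" in the
# plan output before the override, and "quicksort" after:
# psql -c "$TUNE_SQL" -c "$CHECK_SQL"
echo "$TUNE_SQL"
```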
You seem to be trying to replicate from one server to another that wasn't set up using a copy of the original server. That's why:
    database system identifier differs between the primary and standby. The primary's identifier is 6022019027749040119, the standby's identifier is 6033562405193904122.
That's because each newly initdb'd PostgreSQL instance gets a new random system identifier. When you copy an existing PostgreSQL install, it keeps the same system identifier. That's how PostgreSQL keeps track of whether one server can replay WAL from another.
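You can see both identifiers yourself with pg_controldata; a sketch, assuming the placeholder data-directory paths below:

```shell
# Data-directory paths are examples; point them at your real clusters.
PRIMARY_PGDATA=/var/lib/postgresql/data-primary
STANDBY_PGDATA=/var/lib/postgresql/data-standby
CMD=pg_controldata
# If the identifiers printed here differ, the standby was never a copy
# of the primary and physical replication cannot work:
# "$CMD" "$PRIMARY_PGDATA" | grep 'system identifier'
# "$CMD" "$STANDBY_PGDATA" | grep 'system identifier'
echo "$CMD $PRIMARY_PGDATA"
```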
You can only use physical replication if the replica is a copy (a file-system-level backup, e.g. one taken with pg_basebackup) of the master. See the manual's detailed coverage of replication for more information.
Update:
The instructions shown above should be fine, but they're not as clear as they could be.
The standby server's data directory is supposed to be replaced by the base backup you create at step 8, if it exists in the first place.
You can't make an existing PostgreSQL instance into a standby for another without replacing its data directory. You need a copy of the master's data directory to run a standby. A common way to set that up is to take an existing standby, delete its data directory, replace it with a copy of the master's data directory, and then configure it as a replication slave. That's what I think step 8 is supposed to be doing.
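A minimal sketch of that rebuild, assuming PostgreSQL 12+ (where -R writes primary_conninfo and creates standby.signal) and placeholder host, user, and paths; the destructive commands are left commented so nothing runs by accident:

```shell
# Host, replication user, and data directory are all placeholders.
PRIMARY=primary.example.com
STANDBY_PGDATA=/var/lib/postgresql/data
BACKUP_CMD="pg_basebackup -h $PRIMARY -U replicator -D $STANDBY_PGDATA -R -P --checkpoint=fast"
# systemctl stop postgresql        # stop the standby
# rm -rf "$STANDBY_PGDATA"         # discard the old, unrelated data dir
# $BACKUP_CMD                      # take a fresh copy of the primary
# systemctl start postgresql       # comes up in standby mode
echo "$BACKUP_CMD"
```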
Instead of doing that, I think you probably used an existing data directory for the slave and tried to start it up as a replica of the master. That will not work, and will produce exactly the errors you showed.
The main PostgreSQL documentation on replication is the primary resource here; I suggest starting there.
You might also want to check out repmgr, which helps automate replication and failover tasks.
Best Answer
I found that the following changes to postgresql.conf yielded the best results so far:
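The exact values aren't shown above; purely as an illustration (these are NOT the author's settings), bulk-load tuning in postgresql.conf often touches knobs like these:

```
# Illustrative bulk-load settings; size them for your own hardware.
maintenance_work_mem = 1GB      # faster index builds and FK validation
max_wal_size = 8GB              # fewer forced checkpoints during the load
checkpoint_timeout = 30min
synchronous_commit = off        # acceptable for a re-runnable migration
wal_level = minimal             # only if no replication/archiving is needed
max_wal_senders = 0             # required when wal_level = minimal
```

Remember to revert load-time settings like synchronous_commit once the migration is done.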
I also had quite some trouble getting a single pgloader invocation to migrate all the tables at once. That would have been the most desirable way, because pgloader then also re-creates all the FKs between the imported tables. But given the problems I was facing (pgloader hanging, and not achieving the performance I was aiming for in terms of total time), I settled on this approach:
- one pgloader invocation per table, run in parallel inside tmux sessions
- tmux wait-for to block until the longest import had finished
- a migration*.sql script for (almost) each table for post-processing (adding FKs, adding triggers), again run in parallel inside a tmux session
- finally, vacuumdb --jobs=24 --analyze
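A rough sketch of that parallel scheme (table names, tmux session names, and the pgloader config paths are all hypothetical; the real commands are left commented):

```shell
# Launch one detached tmux session per table; each signals a per-table
# channel via `tmux wait-for -S` when its import finishes.
TABLES="orders customers invoices"
for t in $TABLES; do
  CMD="pgloader /etc/pgloader/$t.load; tmux wait-for -S done-$t"
  echo "tmux new-session -d -s load-$t \"$CMD\""
  # tmux new-session -d -s "load-$t" "$CMD"
done
# Block until every import has signalled, then post-process:
# for t in $TABLES; do tmux wait-for "done-$t"; done
# ...run the migration*.sql scripts here...
LAST_CMD="vacuumdb --jobs=24 --analyze"
echo "$LAST_CMD"
```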
To fit this purpose, the machine in question had been pimped:
Also, each individual pgloader invocation used a separate loader config with its own batch size configuration; the default batch size of 25000 wouldn't work for the bigger tables.

In the end, for my purpose and workload, this cut down the total time. (Actually, the latest numbers are slightly below 2 h even, but I don't have more detailed stats for them.)
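The per-table batch tuning could look like this in a pgloader command file; the connection strings and values below are placeholders, not the actual configuration used:

```
-- Illustrative pgloader command file; URIs and numbers are placeholders.
-- "batch rows" lowers the default of 25000 for very wide tables.
LOAD DATABASE
     FROM mysql://user:pass@mysql-host/sourcedb
     INTO postgresql://user:pass@localhost/targetdb
WITH batch rows = 5000,
     prefetch rows = 5000,
     workers = 4, concurrency = 1;
```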