PostgreSQL Replication – Write Only Primary Server Bottleneck

postgresql

I'm wondering about whether CPU/RAM plays an important role for the primary server for a Postgresql Streaming Replication setup where you have one primary server that receives all the write operations and hot standby servers which are used for read operations.

Intuitively the disk io is probably the most important factor but what about other factors like:

  • Number of CPUs
  • CPU power
  • RAM

Could you get away with having a fast SSD and skimping on the CPU/RAM? Or will something else bottleneck the write operations for the primary server?

Best Answer

This depends almost entirely on the workload - and has little or nothing to do with whether it's a standalone DB server or a streaming replication master.

Ideally you want your DB to fit within RAM. If it doesn't, add RAM until it does or you run out of money.

If you can't sensibly make your DB fit within RAM, make sure there's enough RAM to hold the heavily used indexes and any really hot tables if possible. Then start spending money on fast disk I/O instead.

If you do more than a tiny bit of writing, especially if it's not just inserts, ensure that your storage has fast disk flushes. You need this even if the database fits entirely in RAM. That means battery-backed write-back cache on a RAID controller, or good quality SSDs with proper, reliable power-loss protection.

For CPU, again, it depends on the workload. If you're I/O bound, lots of CPU is generally unnecessary. If you're not I/O bound (i.e. the DB mostly fits in RAM) then lots of CPU will benefit you more. Because PostgreSQL can't use more than one core per query (grr), workloads with few big queries want few fast cores, and workloads with lots of simple queries want lots of small cores.

Does this sound like a long and roundabout way of saying "it depends, benchmark against your workload"? Yep!

(BTW, read http://wiki.postgresql.org/wiki/Number_Of_Database_Connections and consider a connection pooler, no matter what your workload).