PostgreSQL – Tuning PostgreSQL for large amounts of RAM

memory, postgresql

I have two identical servers (in terms of hardware). Both are standard installations of Windows Server 2008 R2 with minimal software installed (basically my code and required stuff like the JVM).

On one server I am running SQL Server 2005; on the other, PostgreSQL 9.1. The difference in performance between these two servers is staggering. It's so bad on PostgreSQL that I'm regretting my initial "let's use PostgreSQL instead of paying for the SQL Server license" speech to my boss. We're talking differences of 30 seconds vs. 15 minutes for the same command, and it's not just this one command, it's any query or command I throw at it. They both have pretty much the same data (records were inserted in a different order), and both databases have the exact same structure, indexes, etc.

But I'm hoping it's just a matter of performance tuning. The thing is, SQL Server is pretty much using all 32 GB of RAM on the server, whereas PostgreSQL is using almost nothing, definitely less than a gigabyte, though I haven't measured it in fine detail.

How do I get PostgreSQL to use 20+ GB of RAM? These servers were built specifically for this database work, so any RAM not in use by the database and supporting processes is wasted in my opinion.

Best Answer

There are many tweakable settings, initialised via postgresql.conf. The most important ones are listed below (an illustrative configuration sketch follows the list):

  • max_connections: the maximum number of concurrent sessions
  • work_mem: the maximum amount of memory to be used for intermediate results, such as hash tables, and for sorting
  • shared_buffers: the amount of memory dedicated to 'pinned' buffer space
  • effective_cache_size: the amount of memory assumed to be used by the OS's LRU buffers
  • random_page_cost: an estimate of the relative cost of disk seeks
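As a very rough starting point, and purely as an illustration (the exact numbers depend on the workload, and this assumes the 32 GB box is dedicated to PostgreSQL), a postgresql.conf sketch could look like this:

    # postgresql.conf -- illustrative starting values for a dedicated 32 GB server
    max_connections      = 20       # keep low; see the rule of thumb below
    work_mem             = 64MB     # per sort/hash step, per session -- see the worst-case maths below
    shared_buffers       = 2GB      # deliberately modest, see below
    effective_cache_size = 28GB     # what the planner may assume is cached by the OS
    random_page_cost     = 2.0      # lower than the default 4 on decent storage

Note that effective_cache_size does not allocate anything; it only tells the planner how much caching it may assume, so most of the "missing" 20+ GB should end up in the OS file cache rather than inside the postgres processes.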

max_connections should not be set higher than needed: connections cost resources even when idle, and in most cases a connection spends more time waiting inside the database than waiting outside of it, so keeping the setting low (at the price of some concurrency) is usually a win. A nice rule-of-thumb formula is "number of spindles + number of processors + X".
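For example, on a hypothetical box with 8 processor cores and a 4-disk array, the rule of thumb would land somewhere around:

    # 4 spindles + 8 processors + X (a small margin for admin/monitoring sessions)
    max_connections = 20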

work_mem is tricky: it can be applied to every sort or hash step in a query, so a query with 5 hash joins might cost 5*work_mem. And for worst-case scenarios, you should also think of multiple sessions consuming this amount concurrently (again a reason to keep max_connections low).
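To make that worst case concrete (the figures are only illustrative):

    # worst case ~= max_connections * hash/sort steps per query * work_mem
    #   100 sessions * 5 hash joins * 64MB =~ 32GB   -- the whole machine
    #    20 sessions * 5 hash joins * 64MB =~ 6.4GB  -- much more comfortable
    work_mem = 64MB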

shared_buffers is (IMHO) overrated. Normally it is advised to set it to about 1/4 to 1/2 of all available "free" memory, but I tend to keep it low and set effective_cache_size to all available "free" memory instead.
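On this 32 GB box, the two approaches could look roughly like this (illustrative values, leaving a few GB for the OS and other processes):

    # common advice: a large private buffer cache (about 1/4 of RAM)
    #shared_buffers       = 8GB
    #effective_cache_size = 24GB

    # my preference: keep shared_buffers small and let the OS cache do the work
    shared_buffers       = 2GB
    effective_cache_size = 28GB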

random_page_cost is the cost of a seek+read on the disk. It is relative to seq_page_cost, which is 1. The default of 4 for random_page_cost is too high for modern machines and network storage; normally it can be lowered to somewhere between 2 and 1.x. For SSDs you could even set it to 1.0, since seeking is almost free there.
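As a rough guide (illustrative rather than measured values):

    # seq_page_cost = 1.0        # the baseline (default)
    random_page_cost = 2.0       # decent spinning disks / RAID
    #random_page_cost = 1.1      # fast network storage
    #random_page_cost = 1.0      # SSD: seeks are nearly free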