Postgresql – Profiling PostgreSQL

performanceperformance-tuningpostgresqlpostgresql-9.1tuning

We have a system built with Python 2.7, Django, PostgreSQL 9.1 and PostGIS 1.5. In this system there is some extensive validation processes that are, well, intensive.

Our machine:

SO: CentOS 6
HW: 8 cores, 32GB RAM, 4Tb hard-drive;

We, for development and testing, are using the default PostgreSQL configuration, what I think, is somewhat wrong. I want to take advantage of the full capacities that our hardware provides, and so far we didnt got a lot of traffic. When this is release to the public, there will be a lot of users concurrently.

Our application has some queues and async strategies to handle the load, but, the main question is: how can I safely tune PostgreSQL, run some tests, and assure that the tuning was somewhat effective?

Best Answer

The default Pg config is almost certainly sub-optimal, it's a conservative default intended for "will run anywhere" not "will be fast".

There's lots to be said on that; see http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server, http://wiki.postgresql.org/wiki/Performance_Optimization, and Greg Smith's "PostgreSQL 9.0 High Performance". (By way of disclaimer, I now work with Greg at 2ndQuadrant, but I was recommending his book before I moved to 2ndQuadrant).

I won't repeat what's already covered in those references.

I advise that you run pg_test_fsync to see whether fsync() performance is acceptable on your I/O subsystem, as this is critical for decent transaciton-processing throughput. It doesn't matter as much for read-mostly workloads. If it's poor, consider getting a couple of high quality SSDs with proper built-in power-fail protection or a RAID controller with a battery-backed write cache. By way of comparison, my single cheap SSD laptop tends to hit about 5000 ops/sec in pg_test_fsync.

Also use sysbench to get some general idea of what disk seq and random throughput are like. This will be useful when you're adjusting random_page_size and seq_page_size.

Most importantly of all, use a connection pooler like PgBouncer instead of raising max_connections above a couple of hundred.