We had a question over on Server Fault that raised an interesting question:
In light of the scalability improvements in Postgres 9.2 scalability improvements, is it better to use connection pooling mechanisms to avoid the overhead of making extra connections to the database, or is the connection overhead worth the improved read performance?
Relating it specifically to my environment: We have a web application that is database backed and read-centric, and we're currently running on Postgres 8.4.
Our re-implementation will be launching next year, along with an upgrade to 9.2, and gives each Apache worker process its own connection to the database (and thus its own Postgres backend which is preserved for the life of the Apache worker).
Based on what we've seen this seems to be a good balance between the overhead of connecting to the database and having more workers handling the read load, though we haven't yet done any substantial benchmarking of our own to confirm this.
Does that implementation seem like a reasonable one, and are there other options / connection pooling mechanisms which we should be considering in light of the recent scalability improvements?
Best Answer
I think you see a false dichotomy that does not exist.
It can be useful to have connection pooling in place even if you expect a 1:1 mapping of clients to back-ends. If your connections are long-lived, you won't benefit from reducing backend setup/teardown overhead, as it's small and amortized across a long period. A pool like PgBouncer may remain useful for other reasons:
max_connections exceeded
error;max_connections
, so you can still make reporting and maintenance connections as a non-superuser.Additionally, if suitable for your application you can use transaction-level pooling to greatly increase the number of clients that can be served by your server.
I would not try to keep a strict 1:1 mapping from Apache workers to PostgreSQL workers, personally. If you've got (say) 16 cores and good I/O on your PostgreSQL box you might want something like 16-20 active PostgreSQL workers for optimal performance. You're almost certain to want more Apache workers than that, since they'll be kept busy by things like:
If possible, consider a transaction-pooling design with short-lived transactions instead.