Understanding PostgreSQL Connection Pool Recommendations vs Support for Hundreds of Connections

connection-poolingmax-connectionspostgresql

From PostgreSQL docs:

Number of Database Connections -> How to Find the Optimal Database Connection Pool Size

for optimal throughput the number of active connections should be
somewhere near ((core_count * 2) + effective_spindle_count)
Tuning Your PostgreSQL Server -> max_connections

Generally, PostgreSQL on good hardware can support a few hundred
connections.

For me – not an experienced DBA – there's a discrepancy somewhere in here, especially looking at the offerings of some DB-as-a-Service providers.

For example, at this time Amazon RDS's largest machine (db.r3.8xlarge) has 32 vCPUs, which according to the first formula would perhaps manage to run optimally with 100 connections in the pool, given many disks. Wouldn't it though run very badly with the "few hundred connections" from the second formula?

Even more extreme is the discrepancy for another DBaaS provider, who proposes a 2 core server with 500 concurrent connections. How could this possibly work well?

If I'm misunderstanding something, please let me know. Many thanks!

Best Answer

"Can support" != "optimal throughput".

You can use lots of connections, but it's slower.

If you use fewer connections and queue work, you get the same amount of work done in a smaller time.

Even more extreme is the discrepancy for another DBaaS provider, who proposes a 2 core server with 500 concurrent connections. How could this possibly work well?

Either they're using a connection pooling frontend like PgBouncer in transaction pooling mode, or it won't work well.

People like big numbers though, so they'll give you big numbers.

They're actually hurting performance by doing so. PostgreSQL has some costs that scale linearly with max_connections, so even if the connections are not used it still has a performance impact.

Additionally, even idle connections have some further housekeeping costs.

If the connections are actively working, then you also have contention on system resources and on internal locks.

I routinely run into people who're having PostgreSQL performance issues - and who try to solve them by adding more connections, more workers in their application, etc. Especially people running queuing systems. It's surprisingly hard to persuade them that lowering the number of workers will make the system go faster, and that their original performance issues stemmed from having too many in the first place.

Best Answer

Related Solutions

How to Determine the Sweet Spot Between Pool Size and Database Connections in PostgreSQL

SQL Server – Connection Pooling and Retained Isolation Level

Related Question