MySQL vs PostgreSQL – Is MySQL More Scalable Due to Connection Handling?

concurrency · MySQL · PostgreSQL

I'm trying to decide whether MySQL or PostgreSQL would be more suitable for an application that may be hit by thousands of simultaneous requests.

During research, one fact that stands out is that PostgreSQL forks a new process for each connection, whereas MySQL creates a new thread to handle each connection.

  • Does this mean that MySQL is more efficient than PostgreSQL at handling many concurrent connections?

  • How much of an impact does this difference have on how well both systems scale? Is it something that I should worry about to begin with?

Best Answer

I'll use a dated example. Suppose you have a webserver serving some static pages and PHP pages with a database backend. Back in the day, the default config was:

1- Web client connects and an Apache process/thread is created

2- PHP interpreter is instantiated

3- Database connection is opened

4- Queries are run; some HTML is generated and streamed to the client at the same time

5- Wait for all data to be transferred to the client

6- Release the PHP interpreter, the database connection, and the Apache process/thread

This didn't work back then, and it still doesn't work now, because step 5 can take a while. Suppose you need 2 ms of SQL queries and 10 ms of PHP to generate the response, and 500 ms for the HTTP transfer to complete.
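To make that concrete, here is a minimal Python sketch of the naive flow, using psycopg2 for the database part; the handler, the DSN, the query, and the socket-like client object are all made up for illustration:

```python
import psycopg2

def render_page(rows):
    # stand-in for the ~10 ms of PHP/template work
    return "<ul>" + "".join(f"<li>{title}</li>" for title, _body in rows) + "</ul>"

def handle_request(client, article_id):
    # Step 3: one dedicated database connection per incoming request
    conn = psycopg2.connect("dbname=app user=web")    # hypothetical DSN
    try:
        # Step 4: roughly 2 ms of actual query work
        with conn.cursor() as cur:
            cur.execute("SELECT title, body FROM articles WHERE id = %s", (article_id,))
            rows = cur.fetchall()
        html = render_page(rows)
        # Step 5: the connection is still checked out while the client
        # slowly downloads the response, possibly for 500 ms on a bad link
        client.sendall(html.encode())                  # 'client' is a hypothetical socket-like object
    finally:
        # Step 6: only now is the connection released
        conn.close()
```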

The database connection is opened and lasts 500 ms, tying up valuable RAM and other resources on the database server for the whole 500 ms, even though it is actually used for only 2 ms. That's an efficiency of 0.4%. Put another way: if the database connection were used only for the 2 ms of queries and then released for other clients, it could perform 250 times more queries per second with the same amount of RAM on your database server. It also means the naive config that doesn't scale uses hundreds of times more RAM precisely on the server where RAM is the most precious resource of all: the database. In practice, that means it will come down in flames.

The webserver process and PHP interpreter do useful work for only 10 ms, but they occupy valuable RAM on the webserver for 500 ms. That's an efficiency of 2%. So, for the same reason as above, this setup needs 50x more webserver RAM than the smart setup.
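The arithmetic behind those percentages, spelled out with the numbers from the example:

```python
query_ms, php_ms, transfer_ms = 2, 10, 500

db_efficiency  = query_ms / transfer_ms   # 0.004 -> the connection does useful work 0.4% of its lifetime
web_efficiency = php_ms / transfer_ms     # 0.02  -> the webserver worker is useful 2% of its lifetime

# If the connection were released right after the 2 ms of queries, the same
# connection could serve this many requests in the time the naive setup serves one:
db_capacity_factor  = transfer_ms / query_ms   # 250x more queries per second for the same RAM
web_capacity_factor = transfer_ms / php_ms     # 50x less webserver RAM for the same load
```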

The smart setup is very simple: it's basically a funnel.

On the wide side of the funnel, you have many slow incoming client connections.

On the narrow side of the funnel, you have a small number of webserver processes or database connections.

The funnel is software like g-wan or lighttpd, or a connection pool, which:

  • gathers input data while using minimal resources, and in particular does NOT use one thread per client

  • when enough input data is available to generate a response, uses a pre-spawned process from a pool to process it, buffers the response, and releases the process as fast as possible

  • transfers the data back to the client while using minimal resources

With this setup, your valuable resources are always working instead of waiting on clients, so you need far fewer of them to produce the same amount of output. With only a few database connections open, ideally a few per core, there is much less contention, locking, and context switching on the database server, and in particular locks are held for a much shorter time. All of this makes the database server much faster.
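Here is a rough sketch of the database side of that funnel, using psycopg2's built-in connection pool; the pool sizes, DSN, and query are placeholders, not recommendations:

```python
from psycopg2.pool import ThreadedConnectionPool

# A small, fixed pool: roughly a few connections per database core,
# no matter how many thousands of clients are connected upstream.
pool = ThreadedConnectionPool(minconn=4, maxconn=16,
                              dsn="dbname=app user=web")   # hypothetical sizes and DSN

def handle_request(article_id):
    conn = pool.getconn()           # borrow a connection from the narrow side of the funnel
    try:
        with conn.cursor() as cur:  # roughly 2 ms of query work
            cur.execute("SELECT title, body FROM articles WHERE id = %s", (article_id,))
            rows = cur.fetchall()
    finally:
        pool.putconn(conn)          # return it immediately, before rendering or transfer

    # rendering and the slow 500 ms transfer to the client happen with no connection held
    return "<ul>" + "".join(f"<li>{title}</li>" for title, _body in rows) + "</ul>"
```

The webserver side works the same way: an event-driven server such as lighttpd or g-wan soaks up the slow client I/O and only hands complete, buffered requests to a small pool of workers.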

So you are asking the wrong question. It is not about how many concurrent connections your database server can handle before it falls apart; it is about using tools that let you keep the ideal number of database connections, the number at which your database delivers maximum throughput, no matter how many client connections sit on the other side of the funnel.

That said, PostgreSQL does tend to scale better, especially when you have reads, writes, and long, complicated queries running at the same time.