PostgreSQL Performance – Speedup and Slowdown with Concurrent Queries

Tags: concurrency, performance, postgresql, query-performance

I approach you all humbly as one who is NOT a DBA, and I'm sure that my question is fraught with conceptual shortcomings and "it depends" land mines. I'm also pretty sure that everyone who chooses to answer will want a lot more specifics than I can currently deliver.

That said, I'm curious about the following scenario in general:

  • Say that I have two non-trivial queries.
  • Query 1 requires 2 minutes to complete on average.
  • Query 2 requires 5 minutes to complete on average.

If I run them serially, one right after the other, I'm expecting it will require 7 minutes to complete on average. Is this reasonable?

More than that, however, what if I run the two queries concurrently? Two separate connections at the same time.

  • Under what conditions would I expect to see a speedup? (Total time < 7 minutes)
  • Under what conditions would I expect to see a slowdown? (Total time > 7 minutes)
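To make the "speedup" case concrete, here is a toy sketch in Python (nothing PostgreSQL-specific; the sleeps are hypothetical stand-ins for I/O waits). If both queries spend essentially all their time waiting on I/O and don't contend for the same resource, running them concurrently takes roughly as long as the slower one alone, not the sum:

```python
import threading
import time

# Stand-ins for the two queries: pure waits, scaled down from
# 2 and 5 minutes to 0.2 s and 0.5 s.
def query_1():
    time.sleep(0.2)

def query_2():
    time.sleep(0.5)

# Serial: total time is roughly the sum of the two durations.
start = time.perf_counter()
query_1()
query_2()
serial = time.perf_counter() - start

# Concurrent: if the waits overlap perfectly, total time is roughly
# the duration of the slower "query" alone.
start = time.perf_counter()
t1 = threading.Thread(target=query_1)
t2 = threading.Thread(target=query_2)
t1.start(); t2.start()
t1.join(); t2.join()
concurrent = time.perf_counter() - start

print(f"serial: {serial:.2f}s, concurrent: {concurrent:.2f}s")
```

The slowdown case is the mirror image: if the two tasks contend for one saturated resource (a single disk, a single CPU core), the overlap disappears and the concurrent total creeps back toward, or past, the serial total.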

Now, if I had 1,000 non-trivial queries running concurrently, I have a hunch that it would result in an overall slowdown. In that case, where would the bottleneck likely be? Processor? RAM? Drives?

Again, I know it's probably impossible to answer the question precisely without knowing specifics (which I don't have). I'm looking for some general guidelines to think about when asking the following questions:

  • Under what circumstances do concurrent queries result in an overall speedup?
  • Under what circumstances do concurrent queries result in an overall slowdown?

Best Answer

If I run them serially, one right after the other, I'm expecting it will require 7 minutes to complete on average. Is this reasonable?

If they use unrelated data sets, then yes.

If they share a data set, the cache is cold for the first query, and that query is mostly I/O bound, then the second one might complete in moments. Caching effects need to be considered in any performance analysis or query timing work.
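That cold-versus-warm effect is easy to demonstrate outside the database. A sketch in Python, using an in-process memo as a hypothetical stand-in for the buffer cache (the sleep simulates a slow disk read):

```python
import functools
import time

# Hypothetical stand-in for the buffer cache: once a block has been
# "read from disk" (simulated by a sleep), it is served from memory.
@functools.lru_cache(maxsize=None)
def read_block(block_id):
    time.sleep(0.02)  # simulate a slow disk read
    return block_id

start = time.perf_counter()
cold_result = [read_block(b) for b in range(20)]  # every read goes to "disk"
cold = time.perf_counter() - start

start = time.perf_counter()
warm_result = [read_block(b) for b in range(20)]  # every read is a cache hit
warm = time.perf_counter() - start

print(f"cold scan: {cold:.2f}s, warm scan: {warm:.4f}s")
```

The same shape shows up when timing real queries: the first run pays for the disk reads, and a repeat run over the same data can be orders of magnitude faster.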

More than that, however, what if I run the two queries concurrently? Two separate connections at the same time.

"It depends".

If they were both using sequential scans of the same table, then in PostgreSQL it'd be a huge performance win because of its support for synchronized sequential scans (the synchronize_seqscans setting, on by default), which lets the second scan piggyback on the first one's reads.

If they shared the same indexes, then they'd likely benefit from each other's reads into cache.

If they're independent and touch different data then they might compete for I/O bandwidth, in which case they might take the same amount of time as running sequentially. If the I/O subsystem benefits from concurrency (higher net throughput with more clients) then the total time might be less. If the I/O subsystem handles concurrency poorly then they might take longer than running them sequentially. Or they might not be I/O bound at all, in which case if there's a free CPU for each they could well execute as if the other wasn't running at all.

It depends a great deal on the hardware and system configuration, the data set, and on the queries themselves.

Now, if I had 1,000 non-trivial queries running concurrently, I have a hunch that it would result in an overall slowdown. In that case, where would the bottleneck likely be? Processor? RAM? Drives?

Yes, that'd very likely slow things down for a number of reasons.

  • PostgreSQL's own overheads in inter-process coordination, transaction and lock management, buffer management, etc. This can be quite a big cost, and PostgreSQL isn't really designed for high client counts - it works better if you queue work through a smaller number of connections.

  • Competition for working memory, cache, etc.

  • OS scheduling overhead as it juggles 1,000 competing processes all wanting time slices. That's pretty minor these days; modern OSes have fast schedulers.

  • I/O thrashing. Most I/O systems have a client count at which performance peaks. Sometimes it's 1, i.e. the system is best with only one client, but it's often higher. Above that threshold, performance sometimes decreases again, and sometimes it just reaches a plateau.
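The "queue work" advice in the first bullet can be sketched in Python: instead of 1,000 concurrent clients, push 1,000 jobs through a small worker pool. The pool size of 8 here is an arbitrary illustration; in a real deployment a connection pooler such as PgBouncer typically plays this role in front of PostgreSQL:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

lock = threading.Lock()
in_flight = 0
peak = 0

# Hypothetical stand-in for a non-trivial query.
def run_query(i):
    global in_flight, peak
    with lock:
        in_flight += 1
        peak = max(peak, in_flight)
    time.sleep(0.001)  # simulate the query doing work
    with lock:
        in_flight -= 1
    return i

# 1,000 jobs queued through a pool of 8 workers: every job completes,
# but no more than 8 ever run at once, so the server never sees the
# coordination and scheduling overhead of 1,000 simultaneous clients.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(run_query, range(1000)))

print(f"completed {len(results)} queries, peak concurrency {peak}")
```

Total throughput is usually higher this way too, because the pool size can be tuned to sit near the I/O subsystem's peak-performance client count rather than far past it.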