PostgreSQL – How long does query optimization take relative to the execution time?

optimization, postgresql

Could anyone please point me to a benchmark comparing the time spent by the query optimiser relative to the overall query execution time (any relational DBMS will do, but PostgreSQL would be preferable)?

I'm curious roughly how significant the time spent optimizing a query is relative to the overall execution time.

In particular, while reading Bao: Learning to Steer Query Optimizers by Marcus et al. (2020), I was surprised that the DBMS's query optimizer was run 48 times per query (even at inference time), yet the model still outperformed default PostgreSQL query performance.

Best Answer

There is no absolute number for the percentage of database time spent planning queries; it depends on your workload.

  • If the workload consists of analytical queries, planning time is usually insignificant in comparison to query execution time.

  • If the workload consists of simple OLTP queries, planning time can exceed query execution time.

I didn't read the article you cite, but the PostgreSQL optimizer is invoked once per executed SQL statement. Of course, it considers several candidate plans to find the best one.
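You can observe planning and execution time for an individual statement with EXPLAIN ANALYZE, which reports both separately. A minimal sketch (the table comes from a pgbench database; the timings shown are placeholders, not real measurements):

```sql
-- EXPLAIN ANALYZE runs the query and reports planning and execution time.
EXPLAIN ANALYZE
SELECT abalance FROM pgbench_accounts WHERE aid = 42;

-- The output ends with two lines of this shape (actual values vary):
--   Planning Time: ... ms
--   Execution Time: ... ms
```

For a trivial indexed lookup like this, planning time can rival or exceed execution time, which is exactly the OLTP situation described above.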

For a simple benchmark, you can use the built-in PostgreSQL benchmark tool pgbench. While it is not a full-fledged workload simulator, it is surprisingly versatile. I'll use the default workload, which is a simple OLTP workload.

First, create a sample database with a scale factor appropriate for what you want to test (in this case, 10 million rows in the largest table):

pgbench -i -s 100 mydb

Then run the benchmark with the “simple query protocol” (one-shot queries that are planned each time they are executed):

pgbench -c 4 -T 60 mydb
[...]
tps = 3922.296560 (excluding connections establishing)

Repeat the benchmark using the extended query protocol with prepared statements, so that query plans are reused (this roughly corresponds to query execution time without planning):

pgbench -c 4 -M prepared -T 60 mydb
[...]
tps = 5022.185127 (excluding connections establishing)

So in this simple scenario, we completed more than 25% more transactions by excluding planning time. For more reliable numbers, run the test for longer than 60 seconds.

Of course this number doesn't accurately measure query planning time versus query execution time, because there is still the latency of the round trip between client and server to consider. You could get better numbers by writing your own PL/pgSQL code that runs static or dynamic SQL statements in a tight loop. Since the plans of static SQL statements in PL/pgSQL are cached, their runtime will exclude query planning time.
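A minimal sketch of that idea, assuming a pgbench database and an arbitrary loop count (time each block with psql's \timing; static SQL reuses a cached plan, while EXECUTE replans on every iteration):

```sql
-- Static SQL: the plan is cached inside the DO block after the first execution.
DO $$
DECLARE
    b integer;
BEGIN
    FOR i IN 1..100000 LOOP
        SELECT abalance INTO b FROM pgbench_accounts WHERE aid = 1;
    END LOOP;
END;
$$;

-- Dynamic SQL: EXECUTE forces the statement to be planned on every iteration.
DO $$
DECLARE
    b integer;
BEGIN
    FOR i IN 1..100000 LOOP
        EXECUTE 'SELECT abalance FROM pgbench_accounts WHERE aid = 1' INTO b;
    END LOOP;
END;
$$;
```

The difference between the two runtimes approximates the cumulative planning cost, with no client-server round trips in the loop.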

You can also write your own benchmarks with pgbench; it allows you to write custom scripts and even offers a simple scripting language.
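As a sketch of a custom script (the file name and query are illustrative; pgbench sets the :scale variable automatically from the database), you could isolate a single-row lookup and run it with and without prepared statements:

```shell
# Hypothetical custom script: one indexed lookup with a random key.
cat > lookup.sql <<'EOF'
\set aid random(1, 100000 * :scale)
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
EOF

# Planned on every execution vs. plan reused via prepared statements:
pgbench -c 4 -T 60 -f lookup.sql mydb
pgbench -c 4 -M prepared -T 60 -f lookup.sql mydb
```

Comparing the two tps figures gives you the same planning-overhead estimate as above, but for a query of your choosing.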