Postgresql – How long does query optimisation take relative to the execution time

optimizationpostgresql

Could anyone please point me to a benchmark comparing the time spent by the query optimiser relative to the overall query execution time (any relational DBMS will do, but PostgreSQL would be preferable)?

I'm curious about roughly how significant is the time spent on optimizing the query relative to the overall execution time.

In particular, whilst reading Bao: Learning to Steer Query Optimizers by Marcus et al. 2020 I was surprised by the fact that the DBMS' query optimizer was run 48 times for each query (required even at inference), whilst the model outperformed the default PostgreSQL query performance regardless.

Best Answer

There is no absolute number for the percentage of database time spent planning queries, it depends on your workload.

If the workload consists of analytical queries, planning time is usually insignificant in comparison to query execution time.
If the workload consists of simple OLTP queries, planning time can exceed query execution time.

I didn't read the article you quote, but the PostgreSQL optimizer is invoked once per SQL statement executed. Of course it considers several plans to find the best one.

For a simple benchmark, you can use the built-in PostgreSQL benchmark tool pgbench. While it is not a full-fledged workload simulator, it is surprisingly volatile. I'll use the default workload, which is a simple OLTP workload.

First, create a sample database with a scale factor appropriate for what you want to test (in this case, 10 million rows in the largest table):

pgbench -i -s 100 mydb

Then run the benchmark with the “simple query protocol” (one-shot queries that are planned each time they are executed):

pgbench -c 4 -T 60 mydb
[...]
tps = 3922.296560 (excluding connections establishing)

Repeat the benchmark using the extended query protocol with prepared statements, so that query plans are reused (this roughly corresponds to query execution time without planning):

pgbench -c 4 -M prepared -T 60 test
[...]
tps = 5022.185127 (excluding connections establishing)

So in this simple scenario, we got more than 25% more transactions done by excluding planning time. For more reliable values, run the test for more than 60 seconds.

Of course this number doesn't accurately measure query planning time versus query execution time, because there is still the latency of the round trip between client and server to consider. You could get better numbers by writing your own PL/pgSQL code that runs static or dynamic SQL statements in a tight loop. Since the plans of static SQL statements in PL/pgSQL statements are cached, their runtime will exclude query planning time.

You can also write your own benchmarks with pgbench; it allows you to write custom scripts and even offers a simple scripting language.

Related Solutions

SQL:2008 Standard – Is CTE Optimization Fence Behavior Specified?

I think it's an implementation detail.

A conforming implementation is not required to perform the exact sequence of actions defined in the General Rules, provided its effect on SQL-data and schemas, on host parameters and host variable, and on SQL parameters and SQL variables is identical to the effect of that sequence. The term effectively is used to emphasize actions whose effect might be achieved in other ways by an implementation.¹

I think an implementer could evaluate a common table expression 20 times, even in 20 different ways, and still have a conforming implementation. The only relevant issue is whether "its effect . . . is identical to the effect" of the sequence of actions defined in the General rules.

[1]. Section 6.3.3.3, "Rule evaluation order", in a draft of the SQL 2008 standard, having the local filename 5CD2-01-Framework-2006-01.pdf, p. 41 I have no idea where I got it. Google might know.

MySQL Performance – How to Improve Query Execution Time

There are at least three parts of your query that will or may benefit from rewriting:

These five correlated subqueries in the SELECT clause are retrieving data from the same table based on the same condition:

(select websitelink from company_profile  where z_companyid_fk=a.z_companyid_pk) "Website",
(select company_email from company_profile  where z_companyid_fk=a.z_companyid_pk) "Primary_Contact_Email",
(select telephone from company_profile  where z_companyid_fk=a.z_companyid_pk) "company_telephone",
(select facebooklink from company_profile  where z_companyid_fk=a.z_companyid_pk) "facebook", 
(select twitterlink from company_profile  where z_companyid_fk=a.z_companyid_pk) "twitter",

You could rewrite them by introducing a left join to company_profile in your main FROM clause and referencing just the columns in the main SELECT. But wait, you already have such a join:

LEFT OUTER JOIN company_profile c on a.z_companyid_pk=c.z_companyid_fk

For some reason you are not using it. You can use it by rewriting the above five columns like this:

c.websitelink   AS "Website",
c.company_email AS "Primary_Contact_Email",
c.telephone     AS "company_telephone",
c.facebooklink  AS "facebook", 
c.twitterlink   AS "twitter",

(The AS keyword is unnecessary. I use it because in my opinion it improves readability, but you are free to omit it.)

This series of correlated subqueries is pulling data from the same rowset as well, but each subquery has an additional condition, which makes this a different case from the previous one:

(SELECT  bizurl FROM user_settings us  INNER JOIN asset_mst am ON (us.userid=am.userid)  where  us.isiteid=1 and am.z_companyid_fk=a.z_companyid_pk ) "Yelp",
(SELECT  bizurl FROM user_settings us  INNER JOIN asset_mst am ON (us.userid=am.userid)  where  us.isiteid=2 and am.z_companyid_fk=a.z_companyid_pk ) "Urbanspoon",
(SELECT  bizurl FROM user_settings us  INNER JOIN asset_mst am ON (us.userid=am.userid)  where  us.isiteid=3 and am.z_companyid_fk=a.z_companyid_pk ) "Trip_Advisor",
(SELECT  bizurl FROM user_settings us  INNER JOIN asset_mst am ON (us.userid=am.userid)  where  us.isiteid=4 and am.z_companyid_fk=a.z_companyid_pk ) "Citysearch",
(SELECT  bizurl FROM user_settings us  INNER JOIN asset_mst am ON (us.userid=am.userid)  where  us.isiteid=5 and am.z_companyid_fk=a.z_companyid_pk ) "Open_Table",
(SELECT  bizurl FROM user_settings us  INNER JOIN asset_mst am ON (us.userid=am.userid)  where  us.isiteid=6 and am.z_companyid_fk=a.z_companyid_pk ) "ZAGAT",
(SELECT  bizurl FROM user_settings us  INNER JOIN asset_mst am ON (us.userid=am.userid)  where  us.isiteid=32 and am.z_companyid_fk=a.z_companyid_pk ) "Zomato",

Rewriting these will require more than just and outer join in order to preserve the same output format.

First of all, this join

user_settings us  INNER JOIN asset_mst am ON (us.userid=am.userid)

probably returns data like this:

...  am.z_companyid_fk  us.isiteid  us.bizurl                ...
---  --------------     ----------  -----------------------  ---
...  some_comp_id       1           some-Yelp-URL            ...
...  some_comp_id       2           some-Urbanspoon-URL-URL  ...
...  some_comp_id       3           some-Trip_Advisor-URL    ...
...  some_comp_id       4           some-Citysearch-URL      ...
...  some_comp_id       5           some-Open_Table-URL      ...
...  some_comp_id       6           some-ZAGAT-URL           ...
...  some_comp_id       32          some-Zomato-URL          ...
...  ...                ...         ...                      ...

But you want them in a single row, like this:

...  am.z_companyid_fk  Yelp           Urbanspoon               Trip_Advisor           Citysearch           Open_Table          ZAGAT           Zomato
---  -----------------  -------------  -----------------------  ---------------------  -------------------  ------------------  --------------  ---------------
...  some_comp_id       some-Yelp-URL  some-Urbanspoon-URL-URL  some-Trip_Advisor-URL  some-Citysearch-URL  some-Open_Table-URL some-ZAGAT-URL  some-Zomato-URL
...  ...                ...            ...                      ...                    ...                  ...                 ...             ...

The process of turning the rows like this is called pivoting. So, you

pivot the results of the join (in MySQL you can do that using conditional aggregation):

SELECT
  am.z_companyid_fk,
  MAX( CASE WHEN us.isiteid= 1 THEN bizurl END ) AS "Yelp",
  MAX( CASE WHEN us.isiteid= 2 THEN bizurl END ) AS "Urbanspoon",
  MAX( CASE WHEN us.isiteid= 3 THEN bizurl END ) AS "Trip_Advisor",
  MAX( CASE WHEN us.isiteid= 4 THEN bizurl END ) AS "Citysearch",
  MAX( CASE WHEN us.isiteid= 5 THEN bizurl END ) AS "Open_Table",
  MAX( CASE WHEN us.isiteid= 6 THEN bizurl END ) AS "ZAGAT",
  MAX( CASE WHEN us.isiteid=32 THEN bizurl END ) AS "Zomato"
FROM
  user_settings us
  INNER JOIN asset_mst am ON (us.userid=am.userid)
GROUP BY
  am.z_companyid_fk

outer-join to them in the main FROM clause:

LEFT JOIN
(
  SELECT
    am.z_companyid_fk,
  …
  …
  …
  GROUP BY
    am.z_companyid_fk
) AS url ON url.z_companyid_fk=a.z_companyid_pk

and finally just reference the pivoted column names in the main SELECT clause instead of those correlated subqueries:
```
url."Yelp",
url."Urbanspoon",
url."Trip_Advisor",
url."Citysearch",
url."Open_Table",
url."ZAGAT",
url."Zomato",
```

Finally, the correlated subqueries in this group:

(select if(count(*)>=1,"YES","NO") from webprofile where z_boxid_fk in(select z_boxid_pk from box_mst where z_companyid_fk=a.z_companyid_pk)) "V1_Website",
(select if(count(*)>=1,"YES","NO") from webapp_mst where z_companyid_fk=a.z_companyid_pk) "V2_Website",
(select if(count(*)>0,"YES","NO") from widget_mst where widget_status=1 and type in("templated-menu","menus") and z_companyid_fk=a.z_companyid_pk) "Widgets_with_my_menu",
(select if(count(imagehtml)>0,"YES","NO") from image_mst where  length(imagehtml)>0 and  z_companyid_fk=a.z_companyid_pk)"HTML_menus",

all have the same pattern: they all use COUNT(*) to determine whether to display YES or NO. Depending on the DB engine you are using, that may be inefficient. It might be a better idea to use EXISTS instead. So, you could rewrite the above like this:

IF(EXISTS (select * from webprofile where z_boxid_fk in(select z_boxid_pk from box_mst where z_companyid_fk=a.z_companyid_pk)),       'YES', 'NO') AS "V1_Website",
IF(EXISTS (select * from webapp_mst where z_companyid_fk=a.z_companyid_pk),                                                           'YES', 'NO') AS "V2_Website",
IF(EXISTS (select * from widget_mst where widget_status=1 and type in('templated-menu','menus') and z_companyid_fk=a.z_companyid_pk), 'YES', 'NO') AS "Widgets_with_my_menu",
IF(EXISTS (select * from image_mst  where length(imagehtml)>0 and z_companyid_fk=a.z_companyid_pk),                                   'YES', 'NO') AS "HTML_menus",

Alternatively you could be returning just the word YES and just a single row (using LIMIT 1) in each case, substituting a NO for a possible null result with IFNULL or COALESCE:

IFNULL( (select 'YES' from webprofile where z_boxid_fk in(select z_boxid_pk from box_mst where z_companyid_fk=a.z_companyid_pk) LIMIT 1),       'NO') AS "V1_Website",
IFNULL( (select 'YES' from webapp_mst where z_companyid_fk=a.z_companyid_pk LIMIT 1),                                                           'NO') AS "V2_Website",
IFNULL( (select 'YES' from widget_mst where widget_status=1 and type in('templated-menu','menus') and z_companyid_fk=a.z_companyid_pk LIMIT 1), 'NO') AS "Widgets_with_my_menu",
IFNULL( (select 'YES' from image_mst where length(imagehtml)>0 and z_companyid_fk=a.z_companyid_pk LIMIT 1),                                    'NO') AS "HTML_menus",

There may be other parts worth attention but these three were ones that stood out for me.

In conclusion, I would just like to add that the rewrite in the first bullet point is definitely going to be an improvement to your query as posted in your question. The other two I am less certain about, so I would advise you to try/test them independently.

Best Answer

Related Solutions

SQL:2008 Standard – Is CTE Optimization Fence Behavior Specified?

MySQL Performance – How to Improve Query Execution Time

Related Question