There are at least three parts of your query that will or may benefit from rewriting:
These five correlated subqueries in the SELECT clause are retrieving data from the same table based on the same condition:
(select websitelink from company_profile where z_companyid_fk=a.z_companyid_pk) "Website",
(select company_email from company_profile where z_companyid_fk=a.z_companyid_pk) "Primary_Contact_Email",
(select telephone from company_profile where z_companyid_fk=a.z_companyid_pk) "company_telephone",
(select facebooklink from company_profile where z_companyid_fk=a.z_companyid_pk) "facebook",
(select twitterlink from company_profile where z_companyid_fk=a.z_companyid_pk) "twitter",
You could rewrite them by introducing a left join to company_profile
in your main FROM clause and referencing just the columns in the main SELECT. But wait, you already have such a join:
LEFT OUTER JOIN company_profile c on a.z_companyid_pk=c.z_companyid_fk
For some reason you are not using it. You can use it by rewriting the above five columns like this:
c.websitelink AS "Website",
c.company_email AS "Primary_Contact_Email",
c.telephone AS "company_telephone",
c.facebooklink AS "facebook",
c.twitterlink AS "twitter",
(The AS keyword is unnecessary. I use it because, in my opinion, it improves readability, but you are free to omit it.)
This series of correlated subqueries is pulling data from the same rowset as well, but each subquery has an additional condition, which makes this a different case from the previous one:
(SELECT bizurl FROM user_settings us INNER JOIN asset_mst am ON (us.userid=am.userid) where us.isiteid=1 and am.z_companyid_fk=a.z_companyid_pk ) "Yelp",
(SELECT bizurl FROM user_settings us INNER JOIN asset_mst am ON (us.userid=am.userid) where us.isiteid=2 and am.z_companyid_fk=a.z_companyid_pk ) "Urbanspoon",
(SELECT bizurl FROM user_settings us INNER JOIN asset_mst am ON (us.userid=am.userid) where us.isiteid=3 and am.z_companyid_fk=a.z_companyid_pk ) "Trip_Advisor",
(SELECT bizurl FROM user_settings us INNER JOIN asset_mst am ON (us.userid=am.userid) where us.isiteid=4 and am.z_companyid_fk=a.z_companyid_pk ) "Citysearch",
(SELECT bizurl FROM user_settings us INNER JOIN asset_mst am ON (us.userid=am.userid) where us.isiteid=5 and am.z_companyid_fk=a.z_companyid_pk ) "Open_Table",
(SELECT bizurl FROM user_settings us INNER JOIN asset_mst am ON (us.userid=am.userid) where us.isiteid=6 and am.z_companyid_fk=a.z_companyid_pk ) "ZAGAT",
(SELECT bizurl FROM user_settings us INNER JOIN asset_mst am ON (us.userid=am.userid) where us.isiteid=32 and am.z_companyid_fk=a.z_companyid_pk ) "Zomato",
Rewriting these will require more than just an outer join in order to preserve the same output format.
First of all, this join
user_settings us INNER JOIN asset_mst am ON (us.userid=am.userid)
probably returns data like this:
... am.z_companyid_fk  us.isiteid  us.bizurl              ...
--- -----------------  ----------  ---------------------  ---
... some_comp_id                1  some-Yelp-URL          ...
... some_comp_id                2  some-Urbanspoon-URL    ...
... some_comp_id                3  some-Trip_Advisor-URL  ...
... some_comp_id                4  some-Citysearch-URL    ...
... some_comp_id                5  some-Open_Table-URL    ...
... some_comp_id                6  some-ZAGAT-URL         ...
... some_comp_id               32  some-Zomato-URL        ...
... ...                       ...  ...                    ...
But you want them in a single row, like this:
... am.z_companyid_fk  Yelp           Urbanspoon           Trip_Advisor           Citysearch           Open_Table           ZAGAT           Zomato
--- -----------------  -------------  -------------------  ---------------------  -------------------  -------------------  --------------  ---------------
... some_comp_id       some-Yelp-URL  some-Urbanspoon-URL  some-Trip_Advisor-URL  some-Citysearch-URL  some-Open_Table-URL  some-ZAGAT-URL  some-Zomato-URL
... ...                ...            ...                  ...                    ...                  ...                  ...             ...
The process of turning rows into columns like this is called pivoting. So, you first pivot the results of the join (in MySQL you can do that using conditional aggregation):
SELECT
am.z_companyid_fk,
MAX( CASE WHEN us.isiteid= 1 THEN bizurl END ) AS "Yelp",
MAX( CASE WHEN us.isiteid= 2 THEN bizurl END ) AS "Urbanspoon",
MAX( CASE WHEN us.isiteid= 3 THEN bizurl END ) AS "Trip_Advisor",
MAX( CASE WHEN us.isiteid= 4 THEN bizurl END ) AS "Citysearch",
MAX( CASE WHEN us.isiteid= 5 THEN bizurl END ) AS "Open_Table",
MAX( CASE WHEN us.isiteid= 6 THEN bizurl END ) AS "ZAGAT",
MAX( CASE WHEN us.isiteid=32 THEN bizurl END ) AS "Zomato"
FROM
user_settings us
INNER JOIN asset_mst am ON (us.userid=am.userid)
GROUP BY
am.z_companyid_fk
then outer-join to that pivoted result set in the main FROM clause:
LEFT JOIN
(
SELECT
am.z_companyid_fk,
…
…
…
GROUP BY
am.z_companyid_fk
) AS url ON url.z_companyid_fk=a.z_companyid_pk
and finally just reference the pivoted column names in the main SELECT clause instead of those correlated subqueries:
url."Yelp",
url."Urbanspoon",
url."Trip_Advisor",
url."Citysearch",
url."Open_Table",
url."ZAGAT",
url."Zomato",
Finally, the correlated subqueries in this group:
(select if(count(*)>=1,"YES","NO") from webprofile where z_boxid_fk in(select z_boxid_pk from box_mst where z_companyid_fk=a.z_companyid_pk)) "V1_Website",
(select if(count(*)>=1,"YES","NO") from webapp_mst where z_companyid_fk=a.z_companyid_pk) "V2_Website",
(select if(count(*)>0,"YES","NO") from widget_mst where widget_status=1 and type in("templated-menu","menus") and z_companyid_fk=a.z_companyid_pk) "Widgets_with_my_menu",
(select if(count(imagehtml)>0,"YES","NO") from image_mst where length(imagehtml)>0 and z_companyid_fk=a.z_companyid_pk)"HTML_menus",
all have the same pattern: they all use COUNT(*) to determine whether to display YES or NO. Depending on the DB engine you are using, that may be inefficient, since counting every matching row is unnecessary work when all you need to know is whether at least one row exists. It might be a better idea to use EXISTS instead. So, you could rewrite the above like this:
IF(EXISTS (select * from webprofile where z_boxid_fk in(select z_boxid_pk from box_mst where z_companyid_fk=a.z_companyid_pk)), 'YES', 'NO') AS "V1_Website",
IF(EXISTS (select * from webapp_mst where z_companyid_fk=a.z_companyid_pk), 'YES', 'NO') AS "V2_Website",
IF(EXISTS (select * from widget_mst where widget_status=1 and type in('templated-menu','menus') and z_companyid_fk=a.z_companyid_pk), 'YES', 'NO') AS "Widgets_with_my_menu",
IF(EXISTS (select * from image_mst where length(imagehtml)>0 and z_companyid_fk=a.z_companyid_pk), 'YES', 'NO') AS "HTML_menus",
Alternatively, you could return just the word YES and just a single row (using LIMIT 1) in each case, substituting a NO for a possible null result with IFNULL or COALESCE:
IFNULL( (select 'YES' from webprofile where z_boxid_fk in(select z_boxid_pk from box_mst where z_companyid_fk=a.z_companyid_pk) LIMIT 1), 'NO') AS "V1_Website",
IFNULL( (select 'YES' from webapp_mst where z_companyid_fk=a.z_companyid_pk LIMIT 1), 'NO') AS "V2_Website",
IFNULL( (select 'YES' from widget_mst where widget_status=1 and type in('templated-menu','menus') and z_companyid_fk=a.z_companyid_pk LIMIT 1), 'NO') AS "Widgets_with_my_menu",
IFNULL( (select 'YES' from image_mst where length(imagehtml)>0 and z_companyid_fk=a.z_companyid_pk LIMIT 1), 'NO') AS "HTML_menus",
There may be other parts worth attention but these three were ones that stood out for me.
In conclusion, I would just like to add that the rewrite in the first bullet point is definitely going to be an improvement to your query as posted in your question. The other two I am less certain about, so I would advise you to try/test them independently.
Best Answer
There is no absolute number for the percentage of database time spent planning queries, it depends on your workload.
If the workload consists of analytical queries, planning time is usually insignificant in comparison to query execution time.
If the workload consists of simple OLTP queries, planning time can exceed query execution time.
I didn't read the article you quote, but the PostgreSQL optimizer is invoked once per SQL statement executed. Of course it considers several plans to find the best one.
For a simple benchmark, you can use the built-in PostgreSQL benchmark tool pgbench. While it is not a full-fledged workload simulator, it is surprisingly versatile. I'll use the default workload, which is a simple OLTP workload.
First, create a sample database with a scale factor appropriate for what you want to test (in this case, 10 million rows in the largest table):
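The original command is not shown here; a plausible invocation would look like this (the database name mydb is a placeholder, not from the original answer). In pgbench, one scale-factor unit corresponds to 100,000 rows in pgbench_accounts, so scale 100 gives 10 million rows:

```shell
# Initialize (-i) a pgbench sample database at scale factor 100,
# which creates 10,000,000 rows in pgbench_accounts.
# "mydb" is a placeholder database name.
pgbench -i -s 100 mydb
```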
Then run the benchmark with the “simple query protocol” (one-shot queries that are planned each time they are executed):
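A sketch of such a run (client and thread counts are illustrative assumptions, not values from the original answer):

```shell
# Run for 60 seconds (-T 60) with the simple query protocol
# (-M simple, which is also the default), so every statement
# is parsed and planned each time it is executed.
pgbench -c 10 -j 2 -T 60 -M simple mydb
```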
Repeat the benchmark using the extended query protocol with prepared statements, so that query plans are reused (this roughly corresponds to query execution time without planning):
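The corresponding prepared-statement run would look like this (same assumed client and thread counts, so the two runs are comparable):

```shell
# Same benchmark, but with prepared statements (-M prepared):
# each statement is planned once per session and the cached
# plan is reused, largely removing planning time from the loop.
pgbench -c 10 -j 2 -T 60 -M prepared mydb
```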
So in this simple scenario, we got more than 25% more transactions done by excluding planning time. For more reliable values, run the test for more than 60 seconds.
Of course this number doesn't accurately measure query planning time versus query execution time, because there is still the latency of the round trip between client and server to consider. You could get better numbers by writing your own PL/pgSQL code that runs static or dynamic SQL statements in a tight loop. Since the plans of static SQL statements in PL/pgSQL statements are cached, their runtime will exclude query planning time.
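A minimal sketch of that PL/pgSQL approach, assuming a pgbench-initialized database (static SQL inside PL/pgSQL has its plan cached, while EXECUTE re-plans the statement on every iteration):

```sql
DO $$
DECLARE
    i  integer;
    r  integer;
    t0 timestamptz;
BEGIN
    -- Static SQL: planned once, plan cached for later iterations.
    t0 := clock_timestamp();
    FOR i IN 1..100000 LOOP
        SELECT abalance INTO r FROM pgbench_accounts WHERE aid = 1;
    END LOOP;
    RAISE NOTICE 'static SQL:  %', clock_timestamp() - t0;

    -- Dynamic SQL via EXECUTE: parsed and planned on every iteration.
    t0 := clock_timestamp();
    FOR i IN 1..100000 LOOP
        EXECUTE 'SELECT abalance FROM pgbench_accounts WHERE aid = 1' INTO r;
    END LOOP;
    RAISE NOTICE 'dynamic SQL: %', clock_timestamp() - t0;
END;
$$;
```

The difference between the two reported times approximates the per-statement parse/plan overhead, with client-server round trips taken out of the picture.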
You can also write your own benchmarks with pgbench; it allows you to write custom scripts and even offers a simple scripting language.