I have a modified query from JOB benchmark:
SELECT *
FROM company_type AS ct,
info_type AS it,
movie_companies AS mc,
movie_info_idx AS mi_idx,
title AS t
WHERE ct.ct_kind = 'production companies'
AND it.it_info = 'top 250 rank'
AND t.t_production_year >2010
AND ct.ct_id = mc.mc_company_type_id
AND t.t_id = mc.mc_movie_id
AND t.t_id = mi_idx.mii_movie_id
AND it.it_id = mi_idx.mii_info_type_id;
Now optimizer is introducing condition
mc.mc_movie_id = mi_idx.mii_movie_id
in plan.
Here is the plan
Hash Join (cost=127405.50..221937.06 rows=150 width=275)
Hash Cond: (t.t_id = mc.mc_movie_id)
-> Seq Scan on title t (cost=0.00..91054.20 rows=926896 width=94)
Filter: (t_production_year > 2010)
-> Hash (cost=127398.60..127398.60 rows=552 width=181)
-> Hash Join (cost=37908.32..127398.60 rows=552 width=181)
Hash Cond: (mc.mc_movie_id = mi_idx.mii_movie_id) // HERE
-> Hash Join (cost=18.93..89399.66 rows=20806 width=114)
Hash Cond: (mc.mc_company_type_id = ct.ct_id)
-> Seq Scan on movie_companies mc (cost=0.00..75323.67 rows=3693067 width=28)
-> Hash (cost=18.88..18.88 rows=4 width=86)
-> Seq Scan on company_type ct (cost=0.00..18.88 rows=4 width=86)
Filter: ((ct_kind)::text = 'production companies'::text)
-> Hash (cost=37676.66..37676.66 rows=17019 width=67)
-> Hash Join (cost=2.43..37676.66 rows=17019 width=67)
Hash Cond: (mi_idx.mii_info_type_id = it.it_id)
-> Seq Scan on movie_info_idx mi_idx (cost=0.00..30292.03 rows=1923203 width=49)
-> Hash (cost=2.41..2.41 rows=1 width=18)
-> Seq Scan on info_type it (cost=0.00..2.41 rows=1 width=18)
Filter: ((it_info)::text = 'top 250 rank'::text)
This condition
Hash Cond: (mc.mc_movie_id = mi_idx.mii_movie_id)
is not present in original sql query.
My question is, how can I prevent postgres from introducing this condition in plan?
The application is written in our lab as a proof of concept which assumes that there are only pk-fk joins in the database that's why I need it this way. The input to the application is the query plan.
I know it's better this way but I'm testing an application which only supports pk-fk joins. None of mc_movie_id
or mii_movie_id
is a pk.
Note: Using explicit JOIN
syntax is not helping – the same plan is obtained.
Best Answer
Workarounds by changing the query
By changing one part to be a sub query and using OFFSET 0
OFFSET 0 won't allow optimizer to pull subquery up in plan tree.
One more workaround is to change query such that joins in query are in order which we want and setting join_collapse_limit = 1 in PostgreSQL configuration file which essentially tells optimizer not to change join order. More about join_collapse_limit here
Note : Explicit JOIN syntax should be used
So new query with explicit join order will look like this
Corresponding plan