Postgresql – Postgres : prevent altering join condition in plan

join;postgresql

I have a modified query from JOB benchmark:

SELECT * 
FROM   company_type AS ct, 
       info_type AS it, 
       movie_companies AS mc, 
       movie_info_idx AS mi_idx, 
       title AS t 
WHERE  ct.ct_kind = 'production companies' 
AND    it.it_info = 'top 250 rank' 
AND    t.t_production_year >2010 
AND    ct.ct_id = mc.mc_company_type_id 
AND    t.t_id = mc.mc_movie_id 
AND    t.t_id = mi_idx.mii_movie_id 
AND    it.it_id = mi_idx.mii_info_type_id;

Now optimizer is introducing condition

mc.mc_movie_id = mi_idx.mii_movie_id

in plan.

Here is the plan

Hash Join  (cost=127405.50..221937.06 rows=150 width=275)
    Hash Cond: (t.t_id = mc.mc_movie_id)
    ->  Seq Scan on title t  (cost=0.00..91054.20 rows=926896 width=94)
            Filter: (t_production_year > 2010)
    ->  Hash  (cost=127398.60..127398.60 rows=552 width=181)
            ->  Hash Join  (cost=37908.32..127398.60 rows=552 width=181)
                    Hash Cond: (mc.mc_movie_id = mi_idx.mii_movie_id)  // HERE
                    ->  Hash Join  (cost=18.93..89399.66 rows=20806 width=114)
                            Hash Cond: (mc.mc_company_type_id = ct.ct_id)
                            ->  Seq Scan on movie_companies mc  (cost=0.00..75323.67 rows=3693067 width=28)
                            ->  Hash  (cost=18.88..18.88 rows=4 width=86)
                                    ->  Seq Scan on company_type ct  (cost=0.00..18.88 rows=4 width=86)
                                            Filter: ((ct_kind)::text = 'production companies'::text)
                    ->  Hash  (cost=37676.66..37676.66 rows=17019 width=67)
                            ->  Hash Join  (cost=2.43..37676.66 rows=17019 width=67)
                                    Hash Cond: (mi_idx.mii_info_type_id = it.it_id)
                                    ->  Seq Scan on movie_info_idx mi_idx  (cost=0.00..30292.03 rows=1923203 width=49)
                                    ->  Hash  (cost=2.41..2.41 rows=1 width=18)
                                            ->  Seq Scan on info_type it  (cost=0.00..2.41 rows=1 width=18)
                                                    Filter: ((it_info)::text = 'top 250 rank'::text)

This condition

Hash Cond: (mc.mc_movie_id = mi_idx.mii_movie_id)

is not present in original sql query.

My question is, how can I prevent postgres from introducing this condition in plan?

The application is written in our lab as a proof of concept which assumes that there are only pk-fk joins in the database that's why I need it this way. The input to the application is the query plan.

I know it's better this way but I'm testing an application which only supports pk-fk joins. None of mc_movie_id or mii_movie_id is a pk.

Note: Using explicit JOIN syntax is not helping – the same plan is obtained.

Best Answer

Workarounds by changing the query

  1. By changing one part to be a sub query and using OFFSET 0

    SELECT *
    FROM   company_type AS ct,
           movie_companies AS mc, (
               SELECT t_id 
               FROM   info_type AS it, 
                      movie_info_idx AS mi_idx, 
                      title AS t 
               WHERE  it.it_info = 'top 250 rank' 
               AND    t.t_id = mi_idx.mii_movie_id 
               AND    it.it_id = mi_idx.mii_info_type_id 
               AND    t.t_production_year >2010 OFFSET 0
               ) AS var 
    WHERE  var.t_id = mc.mc_movie_id 
    AND    ct.ct_kind = 'production companies' 
    AND    ct.ct_id = mc.mc_company_type_id;
    

    OFFSET 0 won't allow optimizer to pull subquery up in plan tree.

  2. One more workaround is to change query such that joins in query are in order which we want and setting join_collapse_limit = 1 in PostgreSQL configuration file which essentially tells optimizer not to change join order. More about join_collapse_limit here
    Note : Explicit JOIN syntax should be used

    So new query with explicit join order will look like this

    SELECT *
    FROM   title AS t
    JOIN   movie_info_idx AS mi_idx ON t.t_id = mi_idx.mii_movie_id
    JOIN   info_type AS it ON it.it_id = mi_idx.mii_info_type_id
    JOIN   movie_companies AS mc ON t.t_id = mc.mc_movie_id
    JOIN   company_type AS ct ON ct.ct_id = mc.mc_company_type_id
    WHERE  ct.ct_kind = 'production companies'
    AND    it.it_info = 'top 250 rank'
    AND    mc.mc_note NOT LIKE '%(as Metro-Goldwyn-Mayer Pictures)%'
    AND    (
         mc.mc_note LIKE '%(co-production)%' 
      OR mc.mc_note LIKE '%(presents)%');  
    

    Corresponding plan

    Hash Join  (cost=331522.33..438739.98 rows=1 width=281)
      Hash Cond: (mc.mc_company_type_id = ct.ct_id)
      ->  Hash Join  (cost=331503.40..438720.17 rows=234 width=195)
            Hash Cond: (mc.mc_movie_id = t.t_id)
            ->  Seq Scan on movie_companies mc  (cost=0.00..107038.63 rows=46879 width=34)
                  Filter: ((mc_note !~~ '%(as Metro-Goldwyn-Mayer Pictures)%'::text) AND ((mc_note ~~ '%(co-production)%'::text) OR (mc_note ~~ '%(presents)%'::text)))
            ->  Hash  (cost=331290.67..331290.67 rows=17019 width=161)
                  ->  Hash Join  (cost=174945.39..331290.67 rows=17019 width=161)
                        Hash Cond: (mi_idx.mii_info_type_id = it.it_id)
                        ->  Hash Join  (cost=174942.96..323906.04 rows=1923203 width=143)
                              Hash Cond: (mi_idx.mii_movie_id = t.t_id)
                              ->  Seq Scan on movie_info_idx mi_idx  (cost=0.00..30292.03 rows=1923203 width=49)
                              ->  Hash  (cost=82545.76..82545.76 rows=3403376 width=94)
                                    ->  Seq Scan on title t  (cost=0.00..82545.76 rows=3403376 width=94)
                        ->  Hash  (cost=2.41..2.41 rows=1 width=18)
                              ->  Seq Scan on info_type it  (cost=0.00..2.41 rows=1 width=18)
                                    Filter: ((it_info)::text = 'top 250 rank'::text)
      ->  Hash  (cost=18.88..18.88 rows=4 width=86)
            ->  Seq Scan on company_type ct  (cost=0.00..18.88 rows=4 width=86)
                  Filter: ((ct_kind)::text = 'production companies'::text)