Postgresql – How to get a parallel nested loop in postgres 12

execution-planparallelismpostgresql

I have a lateral join, and I would like each subselect to execute in parallel. According to the docs (https://www.postgresql.org/docs/12/parallel-plans.html), nested loops can be parallelized:

Just as in a non-parallel plan, the driving table may be joined to one or more other tables using a nested loop, hash join, or merge join. The inner side of the join may be any kind of non-parallel plan that is otherwise supported by the planner provided that it is safe to run within a parallel worker. Depending on the join type, the inner side may also be a parallel plan.

In a nested loop join, the inner side is always non-parallel. Although it is executed in full, this is efficient if the inner side is an index scan, because the outer tuples and thus the loops that look up values in the index are divided over the cooperating processes.

However, I can't get a nested loop – or any join – to execute in parallel. Here's a (contrived) example:

=> create table foo as
  select round(random()*100) as pid, val
  from generate_series(1, 10000000) v(val);
=> set force_parallel_mode = on;
=> explain
  SELECT pid, sum
  FROM (
    select distinct pid
    from foo
  ) x
  JOIN LATERAL (
    select sum(val)
    from foo y
    where x.pid = y.pid
  ) y
    ON TRUE;
                                  QUERY PLAN                                   
-------------------------------------------------------------------------------
 Nested Loop  (cost=358361.91..18288838.71 rows=101 width=16)
   ->  HashAggregate  (cost=179057.19..179058.20 rows=101 width=8)
         Group Key: foo.pid
         ->  Seq Scan on foo  (cost=0.00..154056.75 rows=10000175 width=8)
   ->  Aggregate  (cost=179304.72..179304.73 rows=1 width=8)
         ->  Seq Scan on foo y  (cost=0.00..179057.19 rows=99012 width=4)
               Filter: (foo.pid = pid)
 JIT:
   Functions: 12
   Options: Inlining true, Optimization true, Expressions true, Deforming true
(10 rows)

I know that parallel plans are sometimes generated on this database, here is an example:

=> explain select pid, sum(val) from foo group by 1;
                                          QUERY PLAN                                           
-----------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=92560.09..92577.55 rows=101 width=16)
   Group Key: pid
   ->  Gather Merge  (cost=92560.09..92574.52 rows=404 width=16)
         Workers Planned: 4
         ->  Sort  (cost=91560.03..91560.28 rows=101 width=16)
               Sort Key: pid
               ->  Partial HashAggregate  (cost=91555.66..91556.67 rows=101 width=16)
                     Group Key: pid
                     ->  Parallel Seq Scan on foo  (cost=0.00..79055.44 rows=2500044 width=12)
(9 rows)

postgres server version: 12.4

linux kernel version: 5.4

Best Answer

I think that lateral join (other than for function calls) just doesn't support parallel execution. Not everything that conceivably could support it has actually been implemented yet (and probably never will be).

Also, the DISTINCT feature also doesn't support parallel execution currently. But changing your example to get rid of that still does not show parallel execution.