Postgresql – A multi-table JOIN query as subquery to a query that JOINS the same tables. Good or bad

optimizationperformancepostgresqlquery-performancesubquery

I have the following tables in my postgres database

- power_relations        - (id, properties)
- power_relation_members - (id, power_relation_id, member_id, member_type)
- power_lines            - (id, geom)
- power_nodes            - (id, geom)

A power_relation can have multiple members which can either be power_nodes or power_lines. I use power_relation_members to track this polymorphic many-to-many relation.

The query I need to write has to return all the power_lines that belong to power_relations that contains power_lines matching a condition.

For example, if I have a

powerlines:

id, geom
----------
1, abc
2, d
3, g
4, j
5, m
6, abc

power_relation_members:

member_id, power_relation_id
----------------------------
1, r_1
2, r_1
3, r_1
4, r_2
5, r_2  

power_relations:

id, properties
--------------
r_1, "relation1"
r_2, "relation2"

In the above data, line 1 and 6 have the geom = 'abc'. Lines 1 – 3 are related to relation 1, lines 4-5 to relation 2 and no relation for line 6.
I would want to get all the power_lines that are related (via power_relation to the power_line whose geom is abc. That means, I need lines 1, 2, 3 as results from my query.

I could achieve my result in two steps:

  1. Get the distinct id's of power_relations that have power_lines that match my condition
  2. Get all the power_lines that are related to the power_relations whose id is in the ids fetched from step 1.

Instead, I wanted to do this all in one query by doing the step 1 in a subquery. This includes joining the same tables in the subquery and its outer query. This is how my query looks and it gives me the results as I expected:

SELECT DISTINCT l.id FROM powerline l
JOIN power_relation_members m
ON (l.id = m.member_id AND m.member_type = 'way')
JOIN power_relations r
ON (m.power_relation_id = r.id
    AND r.id in (
        SELECT DISTINCT s_r.id FROM powerline s_l
        JOIN power_relation_members s_m
        ON (s_l.id = s_m.member_id AND s_m.member_type = 'way')
        JOIN power_relations s_r
        on (s_m.power_relation_id = s_r.id)
        WHERE s_l.geom = 'somestring'
    )
);

You may see the simulation in the fiddle here http://sqlfiddle.com/#!15/2493b

I'm not very familiar with how the database's perform and I would like to know if this single query is optimal compared to the two step individual queries or if there is a better and optimal way to achieve the same.

Thank you.

Best Answer

I'm not sure why you need to join all three tables every time. For your specific example, what about the following query:

WITH rel AS (
  SELECT prm.power_relation_id
  FROM power_relation_members prm
  JOIN power_lines pl ON prm.member_id = pl.id
  WHERE pl.geom = :BIND_VAR_HERE -- in this case, 'abc'
  GROUP BY prm.power_relation_id
) SELECT
  pl.id, pl.geom
FROM power_relation_members prm
JOIN rel r ON prm.power_relation_id = r.power_relation_id
JOIN power_lines pl on prm.member_id = pl.id;

You reference member_type and power_relation_members.id in your SQL, but nowhere are they mentioned in your post apart from the DDL.. Is this important logically, or just a filter?