I've a trouble with PostgreSQL 9.6 query planning. My query looks like this:
SET role plain_user;
SELECT properties.*
FROM properties
JOIN entries_properties
ON properties.id = entries_properties.property_id
JOIN structures
ON structures.id = entries_properties.entry_id
WHERE structures."STRUKTURBERICHT" != ''
AND properties."COMPOSITION" LIKE 'Mo%'
AND (
properties."NAME" LIKE '%VASP-ase-preopt%'
OR properties."CALCULATOR_ID" IN (7,22,25)
)
AND properties."TYPE_ID" IN (6)
I have Row-Level Security enabled for the above-used tables.
-
with
set enable_nestloop = True
, the query planner runs
Nested Loop joining with a total running time of about 37 seconds: https://explain.depesz.com/s/59BR -
with
set enable_nestloop = False
, the Hash Join method is used and the query time is about 0.3 sec: https://explain.depesz.com/s/PG8E
I did VACUUM ANALYZE
before running the queries, but it didn't help.
I know that it is not a good practice to set enable_nestloop = False
, and any other similar options for the planner. But how could I "convince" the planner to use hash joins without disabling nested loops?
Rewriting the query is an option.
If I run the same query under a role that bypasses RLS, then it is executed very fast. The row-level security policy looks like this:
CREATE POLICY properties_select
ON properties
FOR SELECT
USING (
(
properties.ouid = get_current_user_id()
AND properties.ur
)
OR (
properties.ogid in (select get_current_groups_id())
AND properties.gr
)
OR properties.ar
);
Any ideas or suggestions would be greatly appreciated.
Best Answer
What's happening here is the Nested Loop is way off on one side. Nested Loops work really well when one side is very small, such as returning one row. In your query, the planner fumbles here and estimates that a Hash Join will return just one row. Instead, that Hash Join (property_id = id) returns 1,338 rows. This forces 1,338 loops to run on the other side of the Nested Loop which already has 3,444 rows. That's a hella-lot when you're only expecting one (which isn't even much of a "loop"). Anywayy..
Further examination as we move down shows that the Hash Join is really borked by the estimations arising from this,
PostgreSQL expects that to return one row. But it doesn't. And, that's really your problem. So some options here, that don't involve taking out a sledge hammer and disabling
nested_loop
You can add an index or two to
properties
to help it potentially skip the seq scan entirely, or better estimate the return.Alternatively, you can move the properties stuff to a CTE or subselect with
OFFSET 0
which creates a fence.