PostgreSQL – Slow Recursive Query with Low Load and Minimal Data

index-tuning, performance, postgresql, postgresql-9.6, recursive

I'm building a simple ACL system and the datastore is Postgres 9.6.
Permissions are assigned to groups, which can have subgroups, and people are assigned to groups.

I have a poorly performing query responsible for retrieving all permissions related to a user, and I need your help to optimize or rewrite it.

The simplified (but relevant) data model involved in the slow query is as follows:

CREATE TABLE IF NOT EXISTS "acl_group" (id BIGSERIAL PRIMARY KEY, parent_id BIGINT, name TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS "acl_group_membership" (group_id BIGINT NOT NULL, subject_id TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS "acl" (id BIGSERIAL PRIMARY KEY, group_id BIGINT NOT NULL, service TEXT NOT NULL, action TEXT NOT NULL, resources TEXT NOT NULL);

Foreign keys and unique constraints are there, plus two additional indexes:

CREATE INDEX ON "acl_group_membership" (subject_id);
CREATE INDEX ON "acl" (group_id);

The query:

WITH RECURSIVE group_tree AS (
    SELECT acl_group.id, acl_group.parent_id, 1 AS level
    FROM acl_group
    INNER JOIN acl_group_membership AS agm ON agm.group_id = acl_group.id
    WHERE agm.subject_id = $1
    UNION ALL
    SELECT c.id, c.parent_id, p.level + 1
    FROM acl_group c
    INNER JOIN group_tree p on c.id = p.parent_id
)
SELECT acl.*
FROM acl
INNER JOIN group_tree on group_tree.id = acl.group_id
ORDER BY group_tree.level asc, acl.id asc;

During my very moderate stress tests (an average of 50 req/s), run on localhost against Postgres in a Linux Docker container, executing ONLY this query on the DB (so no writes), with very little data (fewer than 100 rows across all tables), I get the following results:

  1. around 10% of requests run very slowly, clocking in at around 200 ms or more
  2. a lot more clock in at around 60 ms
  3. only 50% clock in at under 10 ms.

Fast execution query plan (https://explain.depesz.com/s/M5tO):

Sort  (cost=355.67..356.08 rows=161 width=50) (actual time=0.988..1.065 rows=26 loops=1)
  Sort Key: group_tree.level, acl.id
  Sort Method: quicksort  Memory: 27kB
  CTE group_tree
    ->  Recursive Union  (cost=13.74..324.76 rows=606 width=20) (actual time=0.110..0.412 rows=3 loops=1)
          ->  Hash Join  (cost=13.74..38.51 rows=6 width=20) (actual time=0.103..0.138 rows=1 loops=1)
                Hash Cond: (acl_group.id = agm.group_id)
                ->  Seq Scan on acl_group  (cost=0.00..20.70 rows=1070 width=16) (actual time=0.007..0.040 rows=11 loops=1)
                ->  Hash  (cost=13.67..13.67 rows=6 width=8) (actual time=0.043..0.043 rows=1 loops=1)
                      Buckets: 1024  Batches: 1  Memory Usage: 9kB
                      ->  Bitmap Heap Scan on acl_group_membership agm  (cost=4.20..13.67 rows=6 width=8) (actual time=0.021..0.024 rows=1 loops=1)
                            Recheck Cond: (subject_id = 'team_df'::text)
                            Heap Blocks: exact=1
                            ->  Bitmap Index Scan on acl_group_membership_subject_id_idx  (cost=0.00..4.20 rows=6 width=0) (actual time=0.011..0.011 rows=1 loops=1)
                                  Index Cond: (subject_id = 'team_df'::text)
          ->  Hash Join  (cost=1.95..27.41 rows=60 width=20) (actual time=0.036..0.078 rows=1 loops=3)
                Hash Cond: (c.id = p.parent_id)
                ->  Seq Scan on acl_group c  (cost=0.00..20.70 rows=1070 width=16) (actual time=0.005..0.038 rows=11 loops=2)
                ->  Hash  (cost=1.20..1.20 rows=60 width=12) (actual time=0.017..0.017 rows=1 loops=3)
                      Buckets: 1024  Batches: 1  Memory Usage: 8kB
                      ->  WorkTable Scan on group_tree p  (cost=0.00..1.20 rows=60 width=12) (actual time=0.003..0.006 rows=1 loops=3)
  ->  Hash Join  (cost=2.19..25.01 rows=161 width=50) (actual time=0.469..0.881 rows=26 loops=1)
        Hash Cond: (group_tree.id = acl.group_id)
        ->  CTE Scan on group_tree  (cost=0.00..12.12 rows=606 width=12) (actual time=0.118..0.440 rows=3 loops=1)
        ->  Hash  (cost=1.53..1.53 rows=53 width=46) (actual time=0.333..0.333 rows=53 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 13kB
              ->  Seq Scan on acl  (cost=0.00..1.53 rows=53 width=46) (actual time=0.004..0.158 rows=53 loops=1)
Planning time: 0.454 ms
Execution time: 1.216 ms

Slow execution query plan, 182 ms (https://explain.depesz.com/s/pZ4e):

Sort  (cost=355.67..356.08 rows=161 width=50) (actual time=92.671..182.171 rows=26 loops=1)
      Sort Key: group_tree.level, acl.id
      Sort Method: quicksort  Memory: 27kB
      CTE group_tree
        ->  Recursive Union  (cost=13.74..324.76 rows=606 width=20) (actual time=0.225..0.491 rows=3 loops=1)
              ->  Hash Join  (cost=13.74..38.51 rows=6 width=20) (actual time=0.216..0.252 rows=1 loops=1)
                    Hash Cond: (acl_group.id = agm.group_id)
                    ->  Seq Scan on acl_group  (cost=0.00..20.70 rows=1070 width=16) (actual time=0.030..0.092 rows=11 loops=1)
                    ->  Hash  (cost=13.67..13.67 rows=6 width=8) (actual time=0.094..0.094 rows=1 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 9kB
                          ->  Bitmap Heap Scan on acl_group_membership agm  (cost=4.20..13.67 rows=6 width=8) (actual time=0.061..0.066 rows=1 loops=1)
                                Recheck Cond: (subject_id = 'team_df'::text)
                                Heap Blocks: exact=1
                                ->  Bitmap Index Scan on acl_group_membership_subject_id_idx  (cost=0.00..4.20 rows=6 width=0) (actual time=0.016..0.016 rows=1 loops=1)
                                      Index Cond: (subject_id = 'team_df'::text)
              ->  Hash Join  (cost=1.95..27.41 rows=60 width=20) (actual time=0.031..0.067 rows=1 loops=3)
                    Hash Cond: (c.id = p.parent_id)
                    ->  Seq Scan on acl_group c  (cost=0.00..20.70 rows=1070 width=16) (actual time=0.005..0.032 rows=11 loops=2)
                    ->  Hash  (cost=1.20..1.20 rows=60 width=12) (actual time=0.014..0.014 rows=1 loops=3)
                          Buckets: 1024  Batches: 1  Memory Usage: 8kB
                          ->  WorkTable Scan on group_tree p  (cost=0.00..1.20 rows=60 width=12) (actual time=0.003..0.006 rows=1 loops=3)
      ->  Hash Join  (cost=2.19..25.01 rows=161 width=50) (actual time=92.140..92.506 rows=26 loops=1)
            Hash Cond: (group_tree.id = acl.group_id)
            ->  CTE Scan on group_tree  (cost=0.00..12.12 rows=606 width=12) (actual time=0.326..0.610 rows=3 loops=1)
            ->  Hash  (cost=1.53..1.53 rows=53 width=46) (actual time=91.781..91.781 rows=53 loops=1)
                  Buckets: 1024  Batches: 1  Memory Usage: 13kB
                  ->  Seq Scan on acl  (cost=0.00..1.53 rows=53 width=46) (actual time=0.012..10.226 rows=53 loops=1)

I'm new to the world of Postgres, but I don't think such a simple query should run this slowly with almost no data in the tables and only read operations going on, so I think there is some macroscopic error I cannot see.
Do you see a way to optimize/rewrite this query?

This is a Linux Docker container running on a physical MacBook Pro, but we see similar results on an Amazon RDS instance.

Best Answer

The main issue may be something else going on on your server behind the scenes, as dezso commented. A couple of observations, though:

Your query looks good to me. This one is a bit simpler and might perform faster; I'm not sure, so test it:

WITH RECURSIVE group_tree AS (
   SELECT group_id, 1 AS level
   FROM   acl_group_membership
   WHERE  subject_id = $1

   UNION  ALL
   SELECT ag.parent_id, gt.level + 1
   FROM   group_tree gt
   JOIN   acl_group  ag ON ag.id = gt.group_id
)
SELECT acl.*
FROM   group_tree gt
JOIN   acl USING (group_id)
ORDER  BY gt.level, acl.id;
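
The difference: the recursion starts directly from acl_group_membership, so the initial join against acl_group is gone, and each recursive step only carries a group id and a level while walking up the tree.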

If you need to optimize read performance further and don't have many writes (you only mention read-only), a materialized view could be an option.
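
For example, a minimal sketch of that idea, assuming a hypothetical acl_flat view that pre-flattens subject-to-permission mappings (the view name and exact shape are my choice, not part of your schema):

CREATE MATERIALIZED VIEW acl_flat AS
WITH RECURSIVE group_tree AS (
   SELECT agm.subject_id, agm.group_id, 1 AS level
   FROM   acl_group_membership agm

   UNION  ALL
   SELECT gt.subject_id, ag.parent_id, gt.level + 1
   FROM   group_tree gt
   JOIN   acl_group  ag ON ag.id = gt.group_id
   WHERE  ag.parent_id IS NOT NULL   -- stop at root groups
)
SELECT gt.subject_id, gt.level, acl.*
FROM   group_tree gt
JOIN   acl USING (group_id);

CREATE INDEX ON acl_flat (subject_id);

The per-request query then shrinks to SELECT * FROM acl_flat WHERE subject_id = $1 ORDER BY level, id, at the price of running REFRESH MATERIALIZED VIEW acl_flat after every change to groups, memberships or permissions.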

If you really have fewer than 100 rows across all tables, we should certainly not be seeing any index or bitmap index scans; we should see sequential scans exclusively. Either some configuration settings are not right (prime suspect: cost settings) or table statistics are misleading.
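
To check those two suspects, something along these lines should do; the settings listed are just the usual candidates (defaults in the comments):

SHOW seq_page_cost;       -- default 1
SHOW random_page_cost;    -- default 4
SHOW cpu_tuple_cost;      -- default 0.01

-- refresh statistics so the planner knows how tiny the tables really are
ANALYZE acl_group;
ANALYZE acl_group_membership;
ANALYZE acl;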

Once you have fixed your server configuration and/or autovacuum settings and populated the tables with a serious number of rows (thousands? millions?), it will help to have these indexes for a read-only load like the one you demonstrate (same for either query):

CREATE INDEX ON acl_group_membership (subject_id, group_id);
CREATE INDEX ON acl_group            (id, parent_id);
CREATE INDEX ON acl                  (group_id);

These aim for index-only scans on acl_group_membership and acl_group.
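
Once the tables hold real data you can check whether you actually get them; index-only scans also need a reasonably fresh visibility map, hence the VACUUM (the literal subject_id is taken from your plans):

VACUUM ANALYZE acl_group_membership;
VACUUM ANALYZE acl_group;

EXPLAIN (ANALYZE, BUFFERS)
SELECT group_id
FROM   acl_group_membership
WHERE  subject_id = 'team_df';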

My remark "thousands? millions?" is inspired by the bigint columns in your schema. Typically, a plain integer is big enough - and smaller and faster. Smaller and faster indexes, too.

Hypothetically, if you create a new row every second, you need about 68 years to exhaust the positive half of a plain integer (int4) column:
2^31 / (3600*24*365) ≈ 68.1

I doubt your design has nearly as many writes.
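
If you do settle on plain integer, the switch might look roughly like this; it assumes nothing outside these three tables references the columns, and each statement rewrites its table, so it's best done while the tables are still small:

-- sketch only: shrink the bigint key columns to int4
ALTER TABLE acl_group            ALTER COLUMN id        TYPE integer,
                                 ALTER COLUMN parent_id TYPE integer;
ALTER TABLE acl_group_membership ALTER COLUMN group_id  TYPE integer;
ALTER TABLE acl                  ALTER COLUMN id        TYPE integer,
                                 ALTER COLUMN group_id  TYPE integer;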