PostgreSQL – Why does a list of 10,000 IDs perform better in a complicated query with multiple CTEs than the equivalent SQL to select them?

performance, postgresql, postgresql-performance, query-performance

I have a Rails application with a legacy query that I'd like to refurbish. The current implementation performs two SQL queries: one to get a large number of IDs and a second query that uses those IDs and applies some additional joins and filters to get the desired result.

I am trying to replace this with a single query that avoids the round trip, but doing so has incurred a large performance degradation in my local testing environment (which is a copy of the full production dataset). It appears that an index is not being used in the new query, leading to a full table scan. I had hoped the single query would keep the same performance as the original code, ideally improving on it due to not needing to send all the IDs around.

I've asked about a simplified version of this problem in Why does a list of 10,000 IDs perform better than using the equivalent SQL to select them?, but applying the answers there to my larger code still produced slow queries.

Current Query

There's a query that takes ~9.5 seconds to calculate a list of 10,000+ IDs. You can see that query as the visible_projects CTE in the "Proposed Query" section below. Those IDs are then fed into this query:

EXPLAIN (ANALYZE, BUFFERS)
WITH visible_projects AS (
    SELECT
        id AS project_id
    FROM
        projects
    WHERE
        projects.id IN ( /* elided 10K IDs */ )
),
available_tasks AS (
    SELECT
        tasks.duedate AS task_due_date,
        tasks.predicted_duration AS predicted_duration
    FROM
        visible_projects
        INNER JOIN tasks ON tasks.project_id = visible_projects.project_id
    WHERE
        tasks.done_at IS NULL
        AND (tasks.dueafter IS NULL
            OR tasks.dueafter < 0)
        AND (tasks.assignable_type = 'Staff'
            AND tasks.assignable_id IS NOT NULL)
    GROUP BY
        tasks.id
),
available_jobs AS (
    SELECT
        projects.cached_effective_due_date AS project_due_date,
        projects.predicted_duration AS predicted_duration
    FROM
        visible_projects
        INNER JOIN projects ON projects.id = visible_projects.project_id
        LEFT OUTER JOIN tasks ON tasks.project_id = visible_projects.project_id
    WHERE
        projects.manager_id IS NOT NULL
        OR (tasks.assignable_type = 'Staff'
            AND tasks.assignable_id IS NOT NULL)
    GROUP BY
        projects.id
),
projects_overdue AS (
    SELECT
        count(1),
        COALESCE(ROUND(SUM(predicted_duration) / 3600.0, 1), 0) AS hours
    FROM
        available_jobs
    WHERE
        project_due_date < '2020-12-22 18:03:25.364774'
),
tasks_overdue AS (
    SELECT
        count(1),
        COALESCE(ROUND(SUM(predicted_duration) / 3600.0, 1), 0) AS hours
    FROM
        available_tasks
    WHERE
        task_due_date < '2020-12-22 18:03:25.364774'
)
SELECT
    projects_overdue.count AS projects_overdue,
    projects_overdue.hours AS projects_overdue_hours,
    tasks_overdue.count AS tasks_overdue,
    tasks_overdue.hours AS tasks_overdue_hours
FROM
    projects_overdue,
    tasks_overdue;

Query plan (depesz)

Nested Loop  (cost=5447822.09..5447822.15 rows=1 width=80) (actual time=2443.084..2443.089 rows=1 loops=1)
   Buffers: shared hit=115035 read=21033
   CTE visible_projects
     ->  Index Only Scan using projects_pkey on projects projects_1  (cost=0.43..42067.76 rows=10563 width=4) (actual time=2.928..165.328 rows=10563 loops=1)
           Index Cond: (id = ANY ('{}'::integer[]))
           Heap Fetches: 0
           Buffers: shared hit=34250 read=919
   ->  Aggregate  (cost=2821537.60..2821537.62 rows=1 width=40) (actual time=1704.568..1704.570 rows=1 loops=1)
         Buffers: shared hit=71584 read=9556
         ->  HashAggregate  (cost=2765553.31..2795944.78 rows=1706188 width=16) (actual time=1704.390..1704.521 rows=596 loops=1)
               Group Key: projects.id
               Planned Partitions: 64  Batches: 1  Memory Usage: 817kB
               Buffers: shared hit=71584 read=9556
               ->  Nested Loop Left Join  (cost=1.00..2673845.70 rows=1706188 width=8) (actual time=3.818..1702.845 rows=6491 loops=1)
                     Filter: ((projects.manager_id IS NOT NULL) OR (((tasks.assignable_type)::text = 'Staff'::text) AND (tasks.assignable_id IS NOT NULL)))
                     Rows Removed by Filter: 113
                     Buffers: shared hit=71584 read=9556
                     ->  Nested Loop  (cost=0.43..84745.43 rows=9292 width=16) (actual time=3.804..1690.635 rows=607 loops=1)
                           Buffers: shared hit=67952 read=9469
                           ->  CTE Scan on visible_projects  (cost=0.00..211.26 rows=10563 width=4) (actual time=2.930..170.504 rows=10563 loops=1)
                                 Buffers: shared hit=34250 read=919
                           ->  Index Scan using projects_pkey on projects  (cost=0.43..8.00 rows=1 width=12) (actual time=0.143..0.143 rows=0 loops=10563)
                                 Index Cond: (id = visible_projects.project_id)
                                 Filter: (cached_effective_due_date < '2020-12-22 18:03:25.364774'::timestamp without time zone)
                                 Rows Removed by Filter: 1
                                 Buffers: shared hit=33702 read=8550
                     ->  Index Scan using index_tasks_on_project_id on tasks  (cost=0.57..276.16 rows=198 width=14) (actual time=0.010..0.016 rows=11 loops=607)
                           Index Cond: (project_id = visible_projects.project_id)
                           Buffers: shared hit=3632 read=87
   ->  Aggregate  (cost=2584216.73..2584216.75 rows=1 width=40) (actual time=738.513..738.515 rows=1 loops=1)
         Buffers: shared hit=43451 read=11477
         ->  Group  (cost=2576823.33..2578671.68 rows=369670 width=16) (actual time=736.748..738.092 rows=6205 loops=1)
               Group Key: tasks_1.id
               Buffers: shared hit=43451 read=11477
               ->  Sort  (cost=2576823.33..2577747.51 rows=369670 width=8) (actual time=736.746..737.118 rows=6205 loops=1)
                     Sort Key: tasks_1.id
                     Sort Method: quicksort  Memory: 483kB
                     Buffers: shared hit=43451 read=11477
                     ->  Nested Loop  (cost=0.57..2537578.97 rows=369670 width=8) (actual time=0.564..735.562 rows=6205 loops=1)
                           Buffers: shared hit=43451 read=11477
                           ->  CTE Scan on visible_projects visible_projects_1  (cost=0.00..211.26 rows=10563 width=4) (actual time=0.000..1.680 rows=10563 loops=1)
                           ->  Index Scan using index_tasks_on_project_id on tasks tasks_1  (cost=0.57..239.86 rows=35 width=12) (actual time=0.067..0.069 rows=1 loops=10563)
                                 Index Cond: (project_id = visible_projects_1.project_id)
                                 Filter: ((done_at IS NULL) AND (assignable_id IS NOT NULL) AND ((dueafter IS NULL) OR (dueafter < 0)) AND (duedate < '2020-12-22 18:03:25.364774'::timestamp without time zone) AND ((assignable_type)::text = 'Staff'::text))
                                 Rows Removed by Filter: 4
                                 Buffers: shared hit=43451 read=11477
 Planning Time: 6.109 ms
 Execution Time: 2443.244 ms

Proposed Query

This is the same query structure, but instead of inserting the 10,000+ IDs directly into the visible_projects CTE, I've embedded the SQL that finds those IDs.

EXPLAIN (ANALYZE, BUFFERS)
WITH visible_projects AS (
    SELECT
        id AS project_id
    FROM
        "projects"
    WHERE
        "projects"."company_id" = 11171
        AND "projects"."state" < 6
        AND "projects"."is_template" = FALSE
),
available_tasks AS (
    SELECT
        tasks.duedate AS task_due_date,
        tasks.predicted_duration AS predicted_duration
    FROM
        visible_projects
        INNER JOIN tasks ON tasks.project_id = visible_projects.project_id
    WHERE
        tasks.done_at IS NULL
        AND (tasks.dueafter IS NULL
            OR tasks.dueafter < 0)
        AND (tasks.assignable_type = 'Staff'
            AND tasks.assignable_id IS NOT NULL)
    GROUP BY
        tasks.id
),
available_jobs AS (
    SELECT
        projects.cached_effective_due_date AS project_due_date,
        projects.predicted_duration AS predicted_duration
    FROM
        visible_projects
        INNER JOIN projects ON projects.id = visible_projects.project_id
        LEFT OUTER JOIN tasks ON tasks.project_id = visible_projects.project_id
    WHERE
        projects.manager_id IS NOT NULL
        OR (tasks.assignable_type = 'Staff'
            AND tasks.assignable_id IS NOT NULL)
    GROUP BY
        projects.id
),
projects_overdue AS (
    SELECT
        count(1),
        COALESCE(ROUND(SUM(predicted_duration) / 3600.0, 1), 0) AS hours
    FROM
        available_jobs
    WHERE
        project_due_date < '2020-12-22 18:03:25.364774'
),
tasks_overdue AS (
    SELECT
        count(1),
        COALESCE(ROUND(SUM(predicted_duration) / 3600.0, 1), 0) AS hours
    FROM
        available_tasks
    WHERE
        task_due_date < '2020-12-22 18:03:25.364774'
)
SELECT
    projects_overdue.count AS projects_overdue,
    projects_overdue.hours AS projects_overdue_hours,
    tasks_overdue.count AS tasks_overdue,
    tasks_overdue.hours AS tasks_overdue_hours
FROM
    projects_overdue,
    tasks_overdue;

Query plan (depesz)

 Nested Loop  (cost=8586055.43..8586055.50 rows=1 width=80) (actual time=130790.034..130790.040 rows=1 loops=1)
   Buffers: shared hit=50640 read=3110004, temp read=116390 written=351693
   CTE visible_projects
     ->  Index Scan using index_projects_on_company_id on projects projects_1  (cost=0.43..332273.45 rows=36968 width=4) (actual time=2.027..10604.594 rows=10563 loops=1)
           Index Cond: (company_id = 11171)
           Filter: ((NOT is_template) AND (state < 6))
           Rows Removed by Filter: 65830
           Buffers: shared hit=68 read=61114
   ->  Aggregate  (cost=4926932.94..4926932.95 rows=1 width=40) (actual time=102177.365..102177.368 rows=1 loops=1)
         Buffers: shared hit=50640 read=1581399, temp read=83815 written=319118
         ->  HashAggregate  (cost=4730993.91..4837360.81 rows=5971475 width=16) (actual time=102177.120..102177.304 rows=596 loops=1)
               Group Key: projects.id
               Planned Partitions: 128  Batches: 1  Memory Usage: 817kB
               Buffers: shared hit=50640 read=1581399, temp read=83815 written=319118
               ->  Hash Left Join  (cost=3652962.46..4410027.13 rows=5971475 width=8) (actual time=95722.379..102173.748 rows=6491 loops=1)
                     Hash Cond: (visible_projects.project_id = tasks.project_id)
                     Filter: ((projects.manager_id IS NOT NULL) OR (((tasks.assignable_type)::text = 'Staff'::text) AND (tasks.assignable_id IS NOT NULL)))
                     Rows Removed by Filter: 113
                     Buffers: shared hit=50640 read=1581399, temp read=83815 written=319118
                     ->  Nested Loop  (cost=0.43..289339.80 rows=32521 width=16) (actual time=2.042..11059.372 rows=607 loops=1)
                           Buffers: shared hit=40333 read=63101
                           ->  CTE Scan on visible_projects  (cost=0.00..739.36 rows=36968 width=4) (actual time=2.029..10615.172 rows=10563 loops=1)
                                 Buffers: shared hit=68 read=61114
                           ->  Index Scan using projects_pkey on projects  (cost=0.43..7.81 rows=1 width=12) (actual time=0.041..0.041 rows=0 loops=10563)
                                 Index Cond: (id = visible_projects.project_id)
                                 Filter: (cached_effective_due_date < '2020-12-22 18:03:25.364774'::timestamp without time zone)
                                 Rows Removed by Filter: 1
                                 Buffers: shared hit=40265 read=1987
                     ->  Hash  (cost=2304404.12..2304404.12 rows=77579912 width=14) (actual time=84552.955..84552.956 rows=77579914 loops=1)
                           Buckets: 131072  Batches: 2048  Memory Usage: 2761kB
                           Buffers: shared hit=10307 read=1518298, temp written=316536
                           ->  Seq Scan on tasks  (cost=0.00..2304404.12 rows=77579912 width=14) (actual time=0.332..29819.151 rows=77579914 loops=1)
                                 Buffers: shared hit=10307 read=1518298
   ->  Aggregate  (cost=3326849.04..3326849.06 rows=1 width=40) (actual time=28612.665..28612.667 rows=1 loops=1)
         Buffers: shared read=1528605, temp read=32575 written=32575
         ->  HashAggregate  (cost=3284397.60..3307442.67 rows=1293758 width=16) (actual time=28611.143..28612.155 rows=6205 loops=1)
               Group Key: tasks_1.id
               Planned Partitions: 32  Batches: 1  Memory Usage: 2065kB
               Buffers: shared read=1528605, temp read=32575 written=32575
               ->  Hash Join  (cost=3124676.17..3214858.11 rows=1293758 width=8) (actual time=27123.321..28608.140 rows=6205 loops=1)
                     Hash Cond: (visible_projects_1.project_id = tasks_1.project_id)
                     Buffers: shared read=1528605, temp read=32575 written=32575
                     ->  CTE Scan on visible_projects visible_projects_1  (cost=0.00..739.36 rows=36968 width=4) (actual time=0.001..1.352 rows=10563 loops=1)
                     ->  Hash  (cost=2886253.46..2886253.46 rows=13715977 width=12) (actual time=27090.174..27090.175 rows=9382872 loops=1)
                           Buckets: 131072  Batches: 256  Memory Usage: 2469kB
                           Buffers: shared read=1528605, temp written=32065
                           ->  Seq Scan on tasks tasks_1  (cost=0.00..2886253.46 rows=13715977 width=12) (actual time=1.809..21827.176 rows=9382872 loops=1)
                                 Filter: ((done_at IS NULL) AND (assignable_id IS NOT NULL) AND ((dueafter IS NULL) OR (dueafter < 0)) AND (duedate < '2020-12-22 18:03:25.364774'::timestamp without time zone) AND ((assignable_type)::text = 'Staff'::text))
                                 Rows Removed by Filter: 68197042
                                 Buffers: shared read=1528605
 Planning Time: 0.558 ms
 Execution Time: 130791.082 ms

Even accounting for the combined runtime of the two original queries (~12 s), the new single query takes more than 10x as long as the current implementation.

I can see that the planner has chosen (multiple!) Seq Scans on tasks and a Hash Left Join, which account for most of the runtime. What I don't understand is why they were chosen, and what I should do to get back to using the indexes.

Through research, I've learned that Postgres doesn't offer query hints to force the use of an index, so I assume a good solution will involve convincing the query planner that using the index would be beneficial.
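
While there are no index hints, the planner's node choices can be toggled per session for diagnosis. Here is a minimal sketch (enable_seqscan is a standard planner setting; wrapping it in a transaction with SET LOCAL is just one way to keep the change from leaking into other sessions):

BEGIN;
-- Make sequential scans look prohibitively expensive for this transaction only,
-- so EXPLAIN reveals the best index-based plan the planner can construct and
-- how its estimated cost compares to the seq-scan plan.
SET LOCAL enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM tasks
WHERE project_id = 1;  -- stand-in query; the full proposed query can be run here instead
ROLLBACK;              -- discards the setting

If the forced plan is fast, the problem is cost estimation rather than a missing index.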

Schema

Generated from the active database via pg_dump -t 'table-name' --schema-only.

CREATE TABLE public.projects (
    id integer NOT NULL,
    name character varying(255),
    description text,
    due timestamp without time zone,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL,
    client_id integer,
    company_id integer,
    repeat boolean DEFAULT true NOT NULL,
    end_date timestamp without time zone,
    prev_id integer,
    next_id integer,
    completed_tasks_count integer DEFAULT 0 NOT NULL,
    tasks_count integer DEFAULT 0 NOT NULL,
    done_at timestamp without time zone,
    state integer,
    schedule text,
    start_date timestamp without time zone,
    manager_id integer,
    partner_id integer,
    exschedule text,
    extdue timestamp without time zone,
    is_template boolean DEFAULT false NOT NULL,
    predicted_duration integer DEFAULT 0,
    budget integer DEFAULT 0,
    cached_effective_due_date timestamp without time zone,
    cached_manager_fullname character varying(255) DEFAULT ''::character varying,
    cached_partner_fullname character varying(255) DEFAULT ''::character varying,
    cached_staffs_fullnames text DEFAULT ''::text,
    cached_staffs_ids text DEFAULT ''::text,
    cached_label_ids character varying(255) DEFAULT ''::character varying,
    shared boolean DEFAULT false,
    date_in timestamp without time zone,
    cached_label_sum integer DEFAULT 0,
    date_out timestamp without time zone,
    turn_around_time integer DEFAULT 0,
    dues_calculated_at timestamp without time zone,
    dues timestamp without time zone[],
    dues_rewind integer[],
    quickbooks_item_id integer,
    perform_final_review boolean DEFAULT false NOT NULL,
    quickbooks_desktop_item_id integer,
    billing_model_type character varying DEFAULT 'staff'::character varying NOT NULL,
    series_id integer
)
WITH (autovacuum_vacuum_scale_factor='0.05');
CREATE SEQUENCE public.projects_id_seq
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;
ALTER SEQUENCE public.projects_id_seq OWNED BY public.projects.id;
ALTER TABLE ONLY public.projects ALTER COLUMN id SET DEFAULT nextval('public.projects_id_seq'::regclass);
ALTER TABLE ONLY public.projects
    ADD CONSTRAINT projects_pkey PRIMARY KEY (id);
CREATE INDEX index_projects_on_cached_effective_due_date ON public.projects USING btree (cached_effective_due_date);
CREATE INDEX index_projects_on_client_id ON public.projects USING btree (client_id);
CREATE INDEX index_projects_on_company_id ON public.projects USING btree (company_id);
CREATE INDEX index_projects_on_manager_id ON public.projects USING btree (manager_id);
CREATE INDEX index_projects_on_next_id ON public.projects USING btree (next_id);
CREATE INDEX index_projects_on_partner_id ON public.projects USING btree (partner_id);
CREATE INDEX index_projects_on_series_id ON public.projects USING btree (series_id);
CREATE INDEX index_projects_on_shared_and_is_template ON public.projects USING btree (shared, is_template) WHERE ((shared = true) AND (is_template = true));
ALTER TABLE ONLY public.projects
    ADD CONSTRAINT fk_rails_243d23cb48 FOREIGN KEY (quickbooks_desktop_item_id) REFERENCES public.quickbooks_desktop_items(id);
ALTER TABLE ONLY public.projects
    ADD CONSTRAINT fk_rails_33ba8711de FOREIGN KEY (quickbooks_item_id) REFERENCES public.quickbooks_items(id);
ALTER TABLE ONLY public.projects
    ADD CONSTRAINT fk_rails_fcf0ca7614 FOREIGN KEY (series_id) REFERENCES public.series(id) NOT VALID;

The projects table has 14,461,870 rows.

CREATE TABLE public.tasks (
    id integer NOT NULL,
    name character varying(255),
    description text,
    duedate timestamp without time zone,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL,
    project_id integer NOT NULL,
    done boolean DEFAULT false NOT NULL,
    "position" integer,
    done_at timestamp without time zone,
    dueafter integer,
    done_by_user_id integer,
    predicted_duration integer,
    auto_predicted_duration integer DEFAULT 0,
    assignable_id integer,
    assignable_type character varying,
    will_assign_to_client boolean DEFAULT false NOT NULL
)
WITH (autovacuum_vacuum_scale_factor='0.05');
CREATE SEQUENCE public.tasks_id_seq
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;
ALTER SEQUENCE public.tasks_id_seq OWNED BY public.tasks.id;
ALTER TABLE ONLY public.tasks ALTER COLUMN id SET DEFAULT nextval('public.tasks_id_seq'::regclass);
ALTER TABLE ONLY public.tasks
    ADD CONSTRAINT tasks_pkey PRIMARY KEY (id);
CREATE INDEX index_tasks_on_assignable_type_and_assignable_id ON public.tasks USING btree (assignable_type, assignable_id);
CREATE INDEX index_tasks_on_done_by_user_id ON public.tasks USING btree (done_by_user_id);
CREATE INDEX index_tasks_on_duedate ON public.tasks USING btree (duedate);
CREATE INDEX index_tasks_on_project_id ON public.tasks USING btree (project_id);
ALTER TABLE ONLY public.tasks
    ADD CONSTRAINT fk_rails_02e851e3b7 FOREIGN KEY (project_id) REFERENCES public.projects(id);

The tasks table has 77,579,914 rows.

Indexing attempts

Using the feedback from Why does a list of 10,000 IDs perform better than using the equivalent SQL to select them?, I've also applied two additional indexes:

CREATE INDEX erwin_answer_1 ON projects (company_id, state);
CREATE INDEX erwin_answer_2 ON tasks (project_id, id);

Current

This does not utilize the new indexes. I've omitted the query plan due to post size limitations, but you can see it at depesz.
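
One way to confirm that the new indexes are never touched (a sketch; pg_stat_user_indexes is PostgreSQL's cumulative statistics view for indexes):

-- idx_scan counts how many times each index has been scanned since the
-- statistics were last reset; a value of 0 for erwin_answer_1/erwin_answer_2
-- after running the queries means the planner never used them.
SELECT indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE relname IN ('projects', 'tasks')
ORDER BY idx_scan DESC;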

Proposed

Query plan (depesz)

 Nested Loop  (cost=8385425.32..8385425.38 rows=1 width=80) (actual time=132345.928..132345.934 rows=1 loops=1)
   Buffers: shared hit=40382 read=3068064, temp read=116390 written=351693
   CTE visible_projects
     ->  Bitmap Heap Scan on projects projects_1  (cost=514.76..131643.34 rows=36968 width=4) (actual time=4.011..2199.219 rows=10563 loops=1)
           Recheck Cond: ((company_id = 11171) AND (state < 6))
           Filter: (NOT is_template)
           Rows Removed by Filter: 38
           Heap Blocks: exact=8972
           Buffers: shared read=8984
           ->  Bitmap Index Scan on erwin_answer_1  (cost=0.00..505.51 rows=37308 width=0) (actual time=2.745..2.746 rows=10601 loops=1)
                 Index Cond: ((company_id = 11171) AND (state < 6))
                 Buffers: shared read=12
   ->  Aggregate  (cost=4926932.94..4926932.95 rows=1 width=40) (actual time=105765.113..105765.116 rows=1 loops=1)
         Buffers: shared hit=40286 read=1539555, temp read=83815 written=319118
         ->  HashAggregate  (cost=4730993.91..4837360.81 rows=5971475 width=16) (actual time=105764.928..105765.067 rows=596 loops=1)
               Group Key: projects.id
               Planned Partitions: 128  Batches: 1  Memory Usage: 817kB
               Buffers: shared hit=40286 read=1539555, temp read=83815 written=319118
               ->  Hash Left Join  (cost=3652962.46..4410027.13 rows=5971475 width=8) (actual time=97277.541..105761.492 rows=6491 loops=1)
                     Hash Cond: (visible_projects.project_id = tasks.project_id)
                     Filter: ((projects.manager_id IS NOT NULL) OR (((tasks.assignable_type)::text = 'Staff'::text) AND (tasks.assignable_id IS NOT NULL)))
                     Rows Removed by Filter: 113
                     Buffers: shared hit=40286 read=1539555, temp read=83815 written=319118
                     ->  Nested Loop  (cost=0.43..289339.80 rows=32521 width=16) (actual time=5.730..2766.446 rows=607 loops=1)
                           Buffers: shared hit=40222 read=11014
                           ->  CTE Scan on visible_projects  (cost=0.00..739.36 rows=36968 width=4) (actual time=4.014..2209.098 rows=10563 loops=1)
                                 Buffers: shared read=8984
                           ->  Index Scan using projects_pkey on projects  (cost=0.43..7.81 rows=1 width=12) (actual time=0.052..0.052 rows=0 loops=10563)
                                 Index Cond: (id = visible_projects.project_id)
                                 Filter: (cached_effective_due_date < '2020-12-22 18:03:25.364774'::timestamp without time zone)
                                 Rows Removed by Filter: 1
                                 Buffers: shared hit=40222 read=2030
                     ->  Hash  (cost=2304404.12..2304404.12 rows=77579912 width=14) (actual time=94419.982..94419.983 rows=77579914 loops=1)
                           Buckets: 131072  Batches: 2048  Memory Usage: 2761kB
                           Buffers: shared hit=64 read=1528541, temp written=316536
                           ->  Seq Scan on tasks  (cost=0.00..2304404.12 rows=77579912 width=14) (actual time=1.620..25256.011 rows=77579914 loops=1)
                                 Buffers: shared hit=64 read=1528541
   ->  Aggregate  (cost=3326849.04..3326849.06 rows=1 width=40) (actual time=26580.812..26580.813 rows=1 loops=1)
         Buffers: shared hit=96 read=1528509, temp read=32575 written=32575
         ->  HashAggregate  (cost=3284397.60..3307442.67 rows=1293758 width=16) (actual time=26579.236..26580.334 rows=6205 loops=1)
               Group Key: tasks_1.id
               Planned Partitions: 32  Batches: 1  Memory Usage: 2065kB
               Buffers: shared hit=96 read=1528509, temp read=32575 written=32575
               ->  Hash Join  (cost=3124676.17..3214858.11 rows=1293758 width=8) (actual time=25176.866..26576.600 rows=6205 loops=1)
                     Hash Cond: (visible_projects_1.project_id = tasks_1.project_id)
                     Buffers: shared hit=96 read=1528509, temp read=32575 written=32575
                     ->  CTE Scan on visible_projects visible_projects_1  (cost=0.00..739.36 rows=36968 width=4) (actual time=0.001..1.062 rows=10563 loops=1)
                     ->  Hash  (cost=2886253.46..2886253.46 rows=13715977 width=12) (actual time=25150.653..25150.653 rows=9382872 loops=1)
                           Buckets: 131072  Batches: 256  Memory Usage: 2469kB
                           Buffers: shared hit=96 read=1528509, temp written=32065
                           ->  Seq Scan on tasks tasks_1  (cost=0.00..2886253.46 rows=13715977 width=12) (actual time=1.390..20281.252 rows=9382872 loops=1)
                                 Filter: ((done_at IS NULL) AND (assignable_id IS NOT NULL) AND ((dueafter IS NULL) OR (dueafter < 0)) AND (duedate < '2020-12-22 18:03:25.364774'::timestamp without time zone) AND ((assignable_type)::text = 'Staff'::text))
                                 Rows Removed by Filter: 68197042
                                 Buffers: shared hit=96 read=1528509
 Planning:
   Buffers: shared hit=43 read=4 dirtied=2
 Planning Time: 2.886 ms
 Execution Time: 132346.748 ms

System specifications

  • PostgreSQL 13.1
  • 2.9 GHz 6-Core Intel Core i9
  • 32 GB RAM
  • macOS 11.1

Best Answer

The reason is clear: in the query with the list of IDs, the estimate for the row count of the CTE is accurate (rows=10563 both estimated and actual), while with the subquery the optimizer overestimates it by a factor of 3.5 (rows=36968 estimated vs. 10563 actual). As a consequence, it decides that a hash join would be cheaper than a nested loop join, which turns out not to be true.
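
You can watch that estimate in isolation by explaining just the CTE's subquery; a sketch using the predicate from the question:

-- The plans above show rows=36968 estimated vs. rows=10563 actual
-- for this predicate.
EXPLAIN
SELECT id
FROM projects
WHERE company_id = 11171
  AND state < 6
  AND is_template = FALSE;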

There are two approaches to this:

  1. Try to get PostgreSQL to produce a better estimate for the row count of the CTE.

    • Perhaps a simple ANALYZE on projects will do the trick.

    • Perhaps you first need to make sure that more detailed statistics are collected:

      ALTER TABLE projects ALTER company_id SET STATISTICS 1000;
      
    • But most likely there is a correlation between company_id and state that distorts the estimate. You could try extended statistics:

      CREATE STATISTICS projects_ext (dependencies)
         ON company_id, state FROM projects;
      

      Then ANALYZE the table again and see if the estimate improves (see the sketch after this list).

  2. Make PostgreSQL more willing to choose the nested loop join by telling it that the index scan will be cheaper than it thinks:

    • If you are using storage (such as SSDs) where random reads are not more expensive than sequential reads, you should reduce the parameter random_page_cost to 1.1 or 1.0. That will make PostgreSQL (correctly) estimate index scans to be cheaper.

    • Set effective_cache_size to a high value to tell the optimizer that indexes are likely to be cached in RAM. That will also reduce the estimated cost of an index scan. A combined sketch of both approaches follows below.
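
Putting both approaches together, a minimal sketch combining the statements above with a verification step (the statistics target and the parameter values are illustrative assumptions for this machine, not measured recommendations):

-- Approach 1: improve the row-count estimate.
ALTER TABLE projects ALTER company_id SET STATISTICS 1000;  -- more detailed per-column stats
CREATE STATISTICS projects_ext (dependencies)
    ON company_id, state FROM projects;                     -- capture the company_id/state correlation
ANALYZE projects;

-- Verify: the estimated row count should now be close to the actual 10,563.
EXPLAIN
SELECT id FROM projects
WHERE company_id = 11171 AND state < 6 AND is_template = FALSE;

-- Approach 2: make index scans look cheaper. SET is session-scoped, which is
-- convenient for testing; ALTER SYSTEM would persist the values.
SET random_page_cost = 1.1;         -- assumes SSD storage
SET effective_cache_size = '24GB';  -- assumes most of the 32 GB RAM is available for caching

After these changes, re-running EXPLAIN (ANALYZE, BUFFERS) on the proposed query shows whether the planner switches back to the nested loop join.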