PostgreSQL – Unstable Execution Time Exceeding Planning Time

cacheexecution-planpostgresqlpostgresql-10

PG version: 10.9

Schema:

create table core_bundle
(
    id serial not null constraint core_bundle_pkey primary key,
    dt_add timestamp with time zone not null,
    dt_upd timestamp with time zone not null,
    aid integer not null constraint core_bundle_aid_check check (aid >= 0),
    prc_after timestamp with time zone,
    prc_grp uuid,
    status integer not null constraint core_bundle_status_check check (status >= 0)
);

create index core_bundle_aid_2d875f86 on core_bundle (aid);
create index core_bundle_prc_grp_a0b838ca on core_bundle (prc_grp);
create index core_bundle_status_5cfd1324 on core_bundle (status);

From time to time execution time for the following query dramatically varies.

explain analyze
    SELECT "core_bundle"."id", "core_bundle"."aid"
    FROM "core_bundle"
    WHERE (
            "core_bundle"."prc_after" <= '2019-07-11T14:29:11.159787'::timestamp
            AND "core_bundle"."status" = 13
        )
    ORDER BY "core_bundle"."id" ASC
    LIMIT 1;

Nice case:

Limit  (cost=266.19..266.19 rows=1 width=8) (actual time=0.288..0.289 rows=1 loops=1)
  ->  Sort  (cost=266.19..266.74 rows=221 width=8) (actual time=0.287..0.287 rows=1 loops=1)
        Sort Key: id
        Sort Method: top-N heapsort  Memory: 25kB
        ->  Index Scan using core_bundle_status_5cfd1324 on core_bundle  (cost=0.43..265.08 rows=221 width=8) (actual time=0.029..0.257 rows=167 loops=1)
              Index Cond: (status = 13)
              Filter: (prc_after <= '2019-07-11 14:29:11.159787'::timestamp without time zone)
Planning time: 0.138 ms
Execution time: 0.319 ms

Bad case:

Limit  (cost=0.43..127.12 rows=1 width=530) (actual time=2208.373..2208.375 rows=1 loops=1)
  ->  Index Scan using core_bundle_pkey on core_bundle  (cost=0.43..391587.09 rows=3091 width=530) (actual time=2208.372..2208.372 rows=1 loops=1)
        Filter: ((prc_after <= '2019-07-11 14:29:11.159787'::timestamp without time zone) AND (status = 13))
        Rows Removed by Filter: 4179221
Planning time: 0.218 ms
Execution time: 2208.419 ms

Shared hit is low for nice case (around 67) and about 1030528 in a bad one.

In a bad case there's neither sort nor status index. I wonder what might be the reasons behind that, and what can be done to get rid of bad cases?

Unfortunately adding an index pinning status to a certain value is not really an option, since value varies in where clause.

May default_statistics_target tuning be of help here, what value should I prefer then? Or would an index on prc_after will be efficient and enough to improve things?


Maybe related:

Best Answer

The problem must come from varying statistics; perhaps the tables change a lot.

The bad plan comes when PostgreSQL believes that there are so many rows that satisfy the condition that it will be cheaper to scan the rows in ORDER BY order using the primary key index until it hits the first match. Either that assumption is wrong, or the correlation between these conditions is unfortunate.

Anyway, if you know that the first plan is always the correct one, you can tell PostgreSQL not to use the primary key index by modifying the ORDER BY clause like this:

ORDER BY core_bundle.id + 0