PostgreSQL – Slow index scan

Tags: postgresql, postgresql-performance

We have the following table in production; it currently contains 25 million rows and is growing:

create table "Common"."Event"
(
    "Id" bigserial not null
        constraint "PK_Event"
            primary key,
    "AggregateId" citext not null,
    "AggregateType" integer not null,
    "CreatedBy" citext not null,
    "CreatedUtc" timestamp not null,
    "Data" jsonb,
    "EventId" uuid not null,
    "PropertyId" citext,
    "Type" citext not null,
    "UpdatedBy" citext not null,
    "UpdatedUtc" timestamp not null,
    "AccountCode" text not null,
    "TrackingId" citext not null,
    "OriginClientId" citext not null,
    "OriginSubjectId" citext
);

create index "IX_Event_AccountCode_AggregateType_Type_PropertyId_TrackingId"
    on "Common"."Event" ("AccountCode", "AggregateType", "Type", "PropertyId", "TrackingId");

create index "IX_Event_AccountCode_AggregateType_Type_PropertyId_AggregateId"
    on "Common"."Event" ("AccountCode", "AggregateType", "Type", "PropertyId", "AggregateId");

Executing the following query takes a long time:

SELECT e."Id", e."AggregateId", e."CreatedUtc", e."OriginClientId", e."OriginSubjectId", e."PropertyId", e."Type"
FROM "Common"."Event" AS e
WHERE e."AccountCode" = 'XXXX'
  AND e."AggregateType" = 1
  AND e."Type" IN ('type-1', 'type-2', 'type-3', 'type-4')
  AND e."PropertyId" = 'YYYY'
ORDER BY e."Id" DESC
LIMIT 100 OFFSET 0;

And produces the following execution plan:

[execution plan screenshot]

In some cases the query is much slower than this. Executing it again is very fast, even when the offset changes.
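A quick way to check whether the slow first run is dominated by disk reads is EXPLAIN with the BUFFERS option, which reports how many pages were served from the shared buffer cache ("hit") versus read from disk ("read"):

EXPLAIN (ANALYZE, BUFFERS)
SELECT e."Id", e."AggregateId", e."CreatedUtc", e."OriginClientId", e."OriginSubjectId", e."PropertyId", e."Type"
FROM "Common"."Event" AS e
WHERE e."AccountCode" = 'XXXX'
  AND e."AggregateType" = 1
  AND e."Type" IN ('type-1', 'type-2', 'type-3', 'type-4')
  AND e."PropertyId" = 'YYYY'
ORDER BY e."Id" DESC
LIMIT 100 OFFSET 0;

A large "read" count on the first execution that turns into "hit" on repeats would confirm that the repeats are fast only because the pages are already cached.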

Looking at the Index scan output:

[index scan node screenshot]

we noticed that we could benefit from a covering index, so that Postgres can satisfy the query with an index-only scan. We then created the following covering index:

create index "IX_Event_Test_Covering_Index"
    on "Common"."Event" ("AccountCode", "AggregateType", "Type", "PropertyId", "AggregateId", "OriginClientId", "OriginSubjectId", "CreatedUtc", "Id");

Which produces the following execution plan:

[execution plan screenshot with the covering index]

Which is pretty cool.
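As a possible refinement: on PostgreSQL 11 or later, the columns that are only fetched (rather than filtered on) can be moved out of the index key into an INCLUDE clause, which keeps the key entries smaller while still allowing an index-only scan. A sketch (the index name is made up for illustration):

create index "IX_Event_Test_Covering_Include"
    on "Common"."Event" ("AccountCode", "AggregateType", "Type", "PropertyId")
    include ("AggregateId", "CreatedUtc", "OriginClientId", "OriginSubjectId", "Id");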

We also ran VACUUM ANALYZE, and we tried creating extended statistics on multiple columns so that Postgres could better estimate the number of rows, but nothing helped apart from the covering index.
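For reference, the extended-statistics attempt looked roughly like this (the statistics name and column list here are illustrative):

create statistics "ST_Event_Filter_Columns" (dependencies)
    on "AccountCode", "AggregateType", "Type", "PropertyId"
    from "Common"."Event";

-- Extended statistics only take effect after re-analyzing the table.
analyze "Common"."Event";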

So the question is whether you think this is the correct approach to tackle the issue, or whether we are missing something and there is a better solution.

I'm not sure that the amount of data we have justifies partitioning the table yet.

Best Answer

An index-only scan is faster than anything I can recommend, particularly right after vacuuming the table.

But such a large index, containing more than half of the table's columns, is probably a bit over the top.

If you want something cheaper, try a B-tree index like

CREATE INDEX ON "Common"."Event" ("AccountCode", "AggregateType", "PropertyId", "Type");

If any of these conditions is not selective (that is, if it does not filter out many rows), omit that column from the index; a sketch for gauging this follows below.

It is important that "Type" comes last in the index column list: the conditions on the other columns are plain equality comparisons, while the IN condition on "Type" matches several values, and equality columns should come first so that each index probe descends directly to the matching entries.
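One rough way to gauge the selectivity mentioned above is to count how many rows each condition matches on its own (using the placeholder literals from the question; note that this scans the whole table, so it is itself slow on 25 million rows):

SELECT count(*) FILTER (WHERE "AccountCode" = 'XXXX')   AS by_account,
       count(*) FILTER (WHERE "AggregateType" = 1)      AS by_aggregate_type,
       count(*) FILTER (WHERE "PropertyId" = 'YYYY')    AS by_property,
       count(*) FILTER (WHERE "Type" IN ('type-1', 'type-2', 'type-3', 'type-4')) AS by_type,
       count(*)                                         AS total
FROM "Common"."Event";

A condition whose count is close to the total filters out few rows and is a candidate for omission from the index.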