PostgreSQL – How to Quickly Cascade Rare Delete to Large Table

Tags: index, performance, postgresql

I have a schema that looks like this:

create table items (
    id serial primary key
);

create table revisions (
    id serial primary key,
    item_id int not null references items(id),
    name text not null,
    property_a text,
    property_b text,
    property_c text
);

create table deltas (
  id serial primary key,
  item_id int not null references items(id) on delete cascade,
  old_revision_id int not null references revisions(id) on delete cascade,
  new_revision_id int not null references revisions(id) on delete cascade
);

This schema can't be changed.

We need to support the rare operation of deleting from items or revisions, with the delete cascading to the deltas table. These deletes are rare compared to inserts/selects on the deltas table, but they still happen at least ~20-30 times a day.

A major problem is that the deltas table is very big, so these cascades were very slow (taking > 30 seconds). So we added separate indexes on item_id, old_revision_id, and new_revision_id, which turned the cascading delete into a sub-millisecond operation. Side note: our actual table has other columns and other indexes to support the application's normal query patterns.
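For reference, the per-column B-tree indexes we added look like this (the index names here are illustrative, not our real ones):

create index deltas_item_id_idx on deltas (item_id);
create index deltas_old_revision_id_idx on deltas (old_revision_id);
create index deltas_new_revision_id_idx on deltas (new_revision_id);

Each one lets the corresponding foreign-key check or cascade find matching deltas rows without a sequential scan.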

This introduced a new problem: the combined size of these indexes for the deltas table is now ~7-8 times the size of the actual table and the rate of growth is pretty high since the deltas table has a lot of writes to it.

It seems silly to use many large indexes to support such a rare operation. We have a couple of options to solve this:

  1. Leave it with multiple indexes and deal with the disk size as a separate problem.
  2. Re-architect the application to avoid doing any deletes for these tables to prevent the cascades from happening in the first place.
  3. Change our indexes.

For #3, I read in the Postgres docs that we might be able to use a multi-column GIN index on (item_id, old_revision_id, new_revision_id), which could support the cascading delete, e.g. delete from deltas where old_revision_id = <deleted_revision_id>:

A multicolumn GIN index can be used with query conditions that involve any subset of the index's columns. Unlike B-tree or GiST, index search effectiveness is the same regardless of which index column(s) the query conditions use.

Am I correct in understanding this, and will it make a difference compared to 3 separate indexes, or should we take a different approach?

Best Answer

The GIN index certainly can be used for the cascading delete (you will need to create extension btree_gin before creating the index).
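A minimal sketch, assuming the deltas schema from the question (the index name is arbitrary). btree_gin supplies GIN operator classes for scalar types like int, which plain GIN does not have:

create extension if not exists btree_gin;

create index deltas_ids_gin on deltas
    using gin (item_id, old_revision_id, new_revision_id);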

How much space it will save you is hard to predict, it would depend on how much duplication there is in the values of those columns.
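Rather than predicting, you can measure directly once the candidate indexes exist, for example via the statistics views (assuming the table is named deltas):

select indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) as index_size
from pg_stat_user_indexes
where relname = 'deltas';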

Having 3 separate GIN indexes will also work, and their combined size is likely slightly smaller than the one multicolumn index. Each value in a multicolumn GIN index has to "remember" which column it came from, which takes up extra space.
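That variant would look like this (again assuming btree_gin is installed; index names are illustrative):

create index deltas_item_id_gin on deltas using gin (item_id);
create index deltas_old_revision_id_gin on deltas using gin (old_revision_id);
create index deltas_new_revision_id_gin on deltas using gin (new_revision_id);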

If the btree indexes are bloated, whatever is causing that might cause the gin index to get bloated as well.