Postgresql – Can Postgres use an index-only scan for this query with joined tables

indexindex-tuningoptimizationpostgresql

This is a follow-up to: Do covering indexes in PostgreSQL help JOIN columns?

Consider the inverse of the schema in the other question where you filter in the joined-on table:

CREATE TABLE thing_types(
   id              INTEGER PRIMARY KEY
 , first_lvl_type  TEXT
 , second_lvl_type TEXT
);

CREATE TABLE things(
   id         INTEGER PRIMARY KEY
 , thing_type INTEGER REFERENCES thing_types(id)
 , t1c1       INTEGER
);

And a query like so:

SELECT things.t1c1
FROM   things
JOIN   thing_types ON things.thing_type = thing_types.id
WHERE  thing_types.first_lvl_type = 'Book'
AND    thing_types.second_lvl_type = 'Biography';

Is it madness to have an index like:

CREATE INDEX ON thing_types(first_lvl_type, second_lvl_type, id);

which covers the primary key for use in that join? Will the index be used as a covering index to help the JOIN in the above query? Should I change my indexing strategy to cover the primary key more often when I know the table is going to be JOINed on like this?

Best Answer

If additional preconditions for an index-only scan are met, it makes perfect sense to append the column id as trailing column to the index (not as leading column):

CREATE INDEX ON thing_types(first_lvl_type, second_lvl_type, id);

Postgres 11 introduces actual covering indexes with the INCLUDE keyword.

CREATE INDEX ON thing_types(first_lvl_type, second_lvl_type) INCLUDE (id);

Only a small benefit for your case, but it's a great option to add columns to a UNIQUE or PK index or constraint.

About index-only scans:

The most important precondition: The visibility map of table thing_types has to show most or all pages as "visible" to all transactions. I.e. the table is either read-only, or your autovacuum settings are aggressive enough to continuously clean up after writes to the table.

Every additional index adds costs. Mostly to write performance. But also side effects, like exhausted cache capacities. (Multiple queries using the same indexes have a better chance for them to reside in cache.) So it's also a question of size. id is typically a very small column integer or bigint. Makes it a good candidate for the use case.

In particular, adding a column to an index disables the option for H.O.T. updates involving the column. But since id is indexed anyway and typically not updated (being the PK) this is not a problem in this case. Related:

If you actually get index-only scans out of these indexes most of the time, it typically makes sense to use them. Test with EXPLAIN.

There were limitations for partial indexes in older versions. Quoting the release notes of Postgres 9.6:

  • Allow use of an index-only scan on a partial index when the index's WHERE clause references columns that are not indexed (Tomas Vondra, Kyotaro Horiguchi)

    For example, an index defined by CREATE INDEX tidx_partial ON t(b) WHERE a > 0 can now be used for an index-only scan by a query that specifies WHERE a > 0 and does not otherwise use a. Previously this was disallowed because a is not listed as an index column.