PostgreSQL – Does Query with Primary Key and Foreign Keys Run Faster Than with Just Primary Keys?

optimizationperformancepostgresqlquery-performance

SELECT something FROM table WHERE primary_key = ?

vs.

SELECT something FROM table WHERE primary_key = ? AND other_key = ?

Say that this is a scenario where the inclusion of other_key does NOT change the resultset. Is the second query faster in practice? Or do databases just use a single best key if several are provided?

Best Answer

Query

SELECT something FROM table WHERE primary_key = ?

This the fastest possible form. Adding any other predicate can only make it slower. Theoretically.

Exotic exceptions apply, like when the PK index is bloated for some reason, or the PK column is relatively big, or a multi-column PK, resulting in a much larger index, while the index for the added predicate on other_key is smaller. Then Postgres may decide to use the index for the added predicate, access the heap and filter on primary_key = ?. Unlikely, but possible.

If the added predicate evaluates to anything but TRUE, you get no row - a different result, so not a fair comparison - but that's not your case as you asserted.

A FOREIGN KEY constraint has no direct impact on read performance. The referencing column does not even have to be indexed (as opposed to the referenced column).

Covering index for top read performance

With tables of non-trivial size and not too much write activity, consider adding a multicolumn index on (primary_key, something) to allow index-only scans. In Postgres 10 or older that results in at least two indexes (imposing additional write / maintenance / space costs):

  1. the PK index on (primary_key), obviously.
  2. a plain (or, redundantly, UNIQUE) index on (primary_key, something).

Postgres 11 added true covering indexes using the INCLUDE clause, which conveniently allows to piggyback the non-key column something on the PRIMARY KEY:

CREATE TABLE tbl (
   primary_key bigint GENERATED ALWAYS AS IDENTITY
 , other_key   integer NOT NULL REFERENCES other_tbl
 , something   text
 , PRIMARY KEY (primary_key) INCLUDE (something)  -- here's the magic
);

If primary_key happens to be a much wider column than other_key you mentioned (bigint vs. int like in the example would not qualify), you can also piggyback something onto an index on other_key:

CREATE INDEX other_idx ON tbl(other_key) INCLUDE (something);

While either solution can optimize read performance for the given query, other queries not retrieving something then have to work with a bigger index. So weigh benefits and costs (like always when creating indexes).

Related blog entry with details from Michael Paquier: