PostgreSQL – Does Query with Primary Key and Foreign Keys Run Faster Than with Just Primary Keys?

Tags: optimization, performance, postgresql, query-performance

SELECT something FROM table WHERE primary_key = ?

vs.

SELECT something FROM table WHERE primary_key = ? AND other_key = ?

Say that this is a scenario where the inclusion of other_key does NOT change the result set. Is the second query faster in practice? Or do databases just use the single best key if several are provided?

Best Answer

Query

SELECT something FROM table WHERE primary_key = ?

This is the fastest possible form. Adding any other predicate can only make it slower, at least in theory.

Exotic exceptions apply: for example, when the PK index is bloated for some reason, or the PK column is relatively big, or the PK spans multiple columns, so that the PK index is much larger than the index on other_key. Then Postgres may decide to use the index on other_key, access the heap, and filter on primary_key = ?. Unlikely, but possible.
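To see which index the planner actually picks on your data, you can compare the plans of both queries. This is only a sketch: it assumes a table named tbl like the one defined further down in this answer, and the literal values 1 and 2 are placeholders for your own keys.

```sql
-- Plan for the query with the PK predicate only.
EXPLAIN (ANALYZE, BUFFERS)
SELECT something FROM tbl WHERE primary_key = 1;

-- Plan for the query with the additional predicate.
-- Compare which index is named in the scan node and whether
-- other_key shows up as "Index Cond" or merely as "Filter".
EXPLAIN (ANALYZE, BUFFERS)
SELECT something FROM tbl WHERE primary_key = 1 AND other_key = 2;
```

In the common case you would expect both plans to use the PK index, with other_key applied as a filter on the single row found. Note that EXPLAIN ANALYZE really executes the statement, and timings for single-row lookups are noisy, so repeat the runs before drawing conclusions.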

If the added predicate evaluates to anything but TRUE, you get no row - a different result, so not a fair comparison - but that's not your case as you asserted.

A FOREIGN KEY constraint has no direct impact on read performance. The referencing column does not even have to be indexed (as opposed to the referenced column).
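You can check for yourself that declaring a foreign key does not create an index on the referencing column. A minimal sketch, assuming the table is named tbl as in the example further down:

```sql
-- List all indexes on the table. A FOREIGN KEY constraint on other_key
-- adds no entry here; only the PRIMARY KEY (and any UNIQUE constraints
-- or explicitly created indexes) show up.
SELECT indexname, indexdef
FROM   pg_indexes
WHERE  tablename = 'tbl';
```

If the same table name exists in several schemas, add a condition on schemaname as well.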

Covering index for top read performance

With tables of non-trivial size and not too much write activity, consider adding a multicolumn index on (primary_key, something) to allow index-only scans. In Postgres 10 or older that results in at least two indexes (imposing additional write / maintenance / space costs):

the PK index on (primary_key), obviously.
a plain (or, redundantly, UNIQUE) index on (primary_key, something).
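For Postgres 10 or older, the second index could be created like this. The index name tbl_pk_something_idx is made up for illustration, and the sketch assumes a table tbl with the columns used in the question:

```sql
-- Additional multicolumn index next to the PK index.
-- Queries filtering on primary_key and returning only something
-- can then be answered from this index alone (index-only scan).
CREATE INDEX tbl_pk_something_idx ON tbl (primary_key, something);
```

Index-only scans also depend on the visibility map being reasonably current, which is why this pays off mostly for tables without too much write activity.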

Postgres 11 added true covering indexes using the INCLUDE clause, which conveniently allows piggybacking the non-key column something on the PRIMARY KEY:

CREATE TABLE tbl (
   primary_key bigint GENERATED ALWAYS AS IDENTITY
 , other_key   integer NOT NULL REFERENCES other_tbl
 , something   text
 , PRIMARY KEY (primary_key) INCLUDE (something)  -- here's the magic
);
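To check whether the covering PK index really yields an index-only scan, something along these lines should do. It assumes tbl already holds a realistic amount of data, and the value 1 is a placeholder:

```sql
-- Update planner statistics and the visibility map first;
-- index-only scans need pages marked all-visible to skip the heap.
VACUUM (ANALYZE) tbl;

-- Look for a node like "Index Only Scan using tbl_pkey" and a low
-- "Heap Fetches" count in the output.
EXPLAIN (ANALYZE, BUFFERS)
SELECT something FROM tbl WHERE primary_key = 1;
```

tbl_pkey is the name Postgres assigns to the PK index by default; yours may differ if the constraint was named explicitly. On a very small table the planner may still prefer a sequential scan, which says nothing about behavior at scale.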

If primary_key happens to be a much wider column than the other_key you mentioned (bigint vs. integer, as in the example, would not qualify), you can instead piggyback something onto an index on other_key:

CREATE INDEX other_idx ON tbl(other_key) INCLUDE (something);

While either solution can optimize read performance for the given query, other queries that do not retrieve something then have to work with a bigger index. So weigh benefits and costs (as always when creating indexes).
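One way to put numbers on that trade-off is to compare index sizes and usage counts before and after adding the included column. A sketch, again assuming the table is named tbl:

```sql
-- Size and scan count per index on the table, largest first.
-- idx_scan counts index scans since statistics were last reset.
SELECT indexrelname                                  AS index_name
     , pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
     , idx_scan
FROM   pg_stat_user_indexes
WHERE  relname = 'tbl'
ORDER  BY pg_relation_size(indexrelid) DESC;
```

A large index that is rarely scanned is a candidate for removal; a covering index that grew a lot but serves the hot query may still be worth it.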

The manual on CREATE INDEX.

Related blog entry with details from Michael Paquier:

Postgres 11 highlight - Covering Indexes
