Postgresql – Automatic index creation for primary vs. foreign keys in Postgresql

foreign keyindexpostgresqlprimary-key

For PostgreSQL 9.4 (http://www.postgresql.org/docs/9.4/static/ddl-constraints.html) it is stated that

A foreign key must reference columns that either are a primary key or form a unique constraint. This means that the referenced columns always have an index (the one underlying the primary key or unique constraint); so checks on whether a referencing row has a match will be efficient. Since a DELETE of a row from the referenced table or an UPDATE of a referenced column will require a scan of the referencing table for rows matching the old value, it is often a good idea to index the referencing columns too. Because this is not always needed, and there are many choices available on how to index, declaration of a foreign key constraint does not automatically create an index on the referencing columns.

and I wondered whether the author(s) of these lines were overlooking one major point: When I have a parent-child relation (e.g. a cart with cart positions) where the child (cart position) marks its relation to the parent (cart) using a foreign key to the parents (cart) primary key when I want to get all children for a given parent I use the non-indexed foreign key column.

So, does the overhead of not generating an index for foreign key columns by default (in contrast to primary key columns) and letting the user experience suboptimal performance on one hand overweigh the overhead of maintenancing an index which, at smaller table sizes, do only minor improvements of query performance on the other hand?

Perhaps someone has more insights in database implementation details and reasonings leading to this decision and/or can point me to reasons why a default index generation for foreign key columns (in Postgresql) could lead to unnoticed serious problems.

Best Answer

I don't think that - a default index generation for foreign key columns - would lead to serious problems.

It was just a decision taken from the PostgreSQL developers, to leave this choice to each database designer / administrator. We have the choice to either add an index when creating a foreign key or not.

If they had taken the opposite decision, then we'd be restricted. There would be an index for each and every foreign key constraint. If someone wanted to have some foreign key columns not indexed, they'd be limited or forced.

Now there is only the additional burden of having to add indexes to match the foreign key constraints - for those who want to.

Personally, I almost always add respective indexes for a foreign key constraint. But often, the index does not match exactly the constraint. For example the Fk may be TABLE c (..., FOREIGN KEY (a_id) REFERENCES a (a_id) ) and the index might be single (a_id) or composite (a_id, b_id) because it helps other queries as well, not only those that use a_id.

Or I might have (more rare cases of) 2 foreign key constraints that can be covered but 1 index - eg.: FK (a_id) REFERENCES a (a_id), ... FK (a_id, b_id) REFERENCES b (a_id, b_id).

When creating an index, there are also many options, like the type of index (btree, hash, Gin, Gist and other exotic types), so we're free to choose what is best for our situation.

Also note that Oracle, SQL Server, DB2 and Firebird also do not automatically create indexes for FOREIGN KEY columns, so it's not like that PostgreSQL developers were alone in this. And I'm sure there were many discussions on the matter and they considered both options, I'm only speculating on whey they came to the decision. Looking through the old Postgres discussion lists may reveal more about the actual reasons they considered.