PostgreSQL – Do Foreign Key Constraints Automatically Create Indexes?

foreign keyindexpgadminpostgresql

Per the PGSQL 9.3 docs:

Because this is not always needed, and there are many choices available on how to index, declaration of a foreign key constraint does not automatically create an index on the referencing columns.

Yet after I create such a constraint, when I run \d <table> I see that an index does in fact get created:

Indexes: "table_primary_key" PRIMARY KEY, btree (id) "fki_table_product_foreign_key" btree (product)

What could explain why this gets created? Do I still need to create another index on my product column to improve performance of queries on it?

Best Answer

I discovered that pgAdmin III does this by default. When you create a new foreign key constraint using the GUI, there is a checkbox "Auto FK Index" on the Definition tab. I hadn't noticed this until just now so it explains why the index was being created.

Related Solutions

PostgreSQL – Working of Indexes

This answer is about (default) B-tree indexes. See this later, related answer about GiST, GIN etc.:

Why is a GiST index used for filtering on non-leading column?

Here are the results of querying a table on the second column of a multicolumn index.
The effects are easy to reproduce for anybody. Try it at home.

I tested with PostgreSQL 9.0.5 on Debian using a medium sized table of a real-life database with 23322 rows. It implements the n:m relationship between the tables adr (address) and att (attribute), but that's not relevant here. Simplified schema:

CREATE TABLE adratt (
  adratt_id serial PRIMARY KEY
, adr_id    integer NOT NULL
, att_id    integer NOT NULL
, log_up    timestamp NOT NULL DEFAULT (now()::timestamp)
, CONSTRAINT adratt_uni UNIQUE (adr_id, att_id)
);

The UNIQUE constraint effectively implements a unique index. I repeated the test with a plain index to be sure and got identical results as expected.

CREATE INDEX adratt_idx ON adratt(adr_id, att_id);

The table is clustered on the adratt_uni index and before the test I ran:

CLUSTER adratt;
ANALYZE adratt;

Sequential scans for queries on (adr_id, att_id) are as fast as they can possibly be. The multicolumn index can still be used for a query condition on the second index column alone.

I ran the queries a couple of times to populate the cache and the picked the best of ten runs to get comparable results.

1. Query using both columns

SELECT *
FROM   adratt
WHERE  att_id = 90
AND    adr_id = 10;

 adratt_id | adr_id | att_id |       log_up
-----------+--------+--------+---------------------
       123 |     10 |     90 | 2008-07-29 09:35:54
(1 row)

Output of EXPLAIN ANALYZE:

Index Scan using adratt_uni on adratt  (cost=0.00..3.48 rows=1 width=20) (actual time=0.022..0.025 rows=1 loops=1)
  Index Cond: ((adr_id = 10) AND (att_id = 90))
Total runtime: 0.067 ms

2. Query using first column

SELECT * FROM adratt WHERE adr_id = 10;

 adratt_id | adr_id | att_id |       log_up
-----------+--------+--------+---------------------
       126 |     10 |     10 | 2008-07-29 09:35:54
       125 |     10 |     13 | 2008-07-29 09:35:54
      4711 |     10 |     21 | 2008-07-29 09:35:54
     29322 |     10 |     22 | 2011-06-06 15:50:38
     29321 |     10 |     30 | 2011-06-06 15:47:17
       124 |     10 |     62 | 2008-07-29 09:35:54
     21913 |     10 |     78 | 2008-07-29 09:35:54
       123 |     10 |     90 | 2008-07-29 09:35:54
     28352 |     10 |    106 | 2010-11-22 12:37:50
(9 rows)

Output of EXPLAIN ANALYZE:

Index Scan using adratt_uni on adratt  (cost=0.00..8.23 rows=9 width=20) (actual time=0.007..0.023 rows=9 loops=1)
  Index Cond: (adr_id = 10)
Total runtime: 0.058 ms

3. Query using second column

SELECT * FROM adratt WHERE att_id = 90;

 adratt_id | adr_id | att_id |       log_up
-----------+--------+--------+---------------------
       123 |     10 |     90 | 2008-07-29 09:35:54
       180 |     39 |     90 | 2008-08-29 15:46:07
...
(83 rows)

Output of EXPLAIN ANALYZE:

Index Scan using adratt_uni on adratt  (cost=0.00..818.51 rows=83 width=20) (actual time=0.014..0.694 rows=83 loops=1)
  Index Cond: (att_id = 90)
Total runtime: 0.849 ms

4. Disable indexscan & bitmapscan

SET enable_indexscan = off;
SELECT * FROM adratt WHERE att_id = 90;

Output of EXPLAIN ANALYZE:

Bitmap Heap Scan on adratt  (cost=779.94..854.74 rows=83 width=20) (actual time=0.558..0.743 rows=83 loops=1)
  Recheck Cond: (att_id = 90)
  ->  Bitmap Index Scan on adratt_uni  (cost=0.00..779.86 rows=83 width=0) (actual time=0.544..0.544 rows=83 loops=1)
        Index Cond: (att_id = 90)
Total runtime: 0.894 ms

SET enable_bitmapscan = off;
SELECT * FROM adratt WHERE att_id = 90;

Output of EXPLAIN ANALYZE:

Seq Scan on adratt  (cost=0.00..1323.10 rows=83 width=20) (actual time=0.009..2.429 rows=83 loops=1)
  Filter: (att_id = 90)
Total runtime: 2.680 ms

Conclusion

As expected, the multi-column index is used for a query on the second column alone.
As expected, it is less efficient, but the query is still 3x faster than without the index.
After disabling index scans, the query planner chooses a bitmap heap scan, which performs almost as fast. Only after disabling that, too, it falls back to a sequential scan.

See other answer for the original quote from the manual.

Updates since Postgres 9.0

Everything still basically true in Postgres 13. Most notable changes:

index-only scans in Postgres 9.2
True covering indexes with the INCLUDE keyword in Postgres 11
Multiple performance and space improvements (especially for multicolumn indexes) in Postgres 12
Deduplication in Postgres 13.

All in favor of index performance. (Sequential scans got faster, too, though.)

SQL Server Index Key – Choosing Index Key for Foreign Key Reference

The (lack of) documentation suggests that this behaviour is an implementation detail, and is therefore undefined and subject to change at any time.

This is in stark contrast to CREATE FULLTEXT INDEX, where you have to specify the name of an index to attach to -- AFAIK, there is no undocumented FOREIGN KEY syntax to do the equivalent (though theoretically, there could be in the future).

As mentioned, it does make sense that SQL Server chooses the smallest physical index with which to associate the foreign key. If you change the script to create the unique constraint as CLUSTERED, the script "works" on 2008 R2. But that behaviour is still undefined and should not be relied upon.

As with most legacy applications, you'll just have to get down to the nitty-gritty and clean things up.

Best Answer

Related Solutions

PostgreSQL – Working of Indexes

1. Query using both columns

2. Query using first column

3. Query using second column

4. Disable indexscan & bitmapscan

Conclusion

Updates since Postgres 9.0

SQL Server Index Key – Choosing Index Key for Foreign Key Reference

Related Question