Database Design – Are FOREIGN KEYs Always Necessary for Joins on Non-PK Columns?

database-designforeign keyperformancetable

I've been wondering if following approach is bad or really very bad:).

Question#1: can we use indicator_id to combine data from indicator_values and indicator_details tables (without using FOREIGN KEY) [we know we can, because we do this, but is it acceptable:)]?

Question#2: if we decide to use indicator_id (which in indicator_details table is not PK) would it have an important impact on performance?

Understanding SQL approach correctly, the best option would be to use iid (iid as FOREIGN KEY) in indicator_values table instead of indicator_id, but as always project has some limitation (like the way how users insert data – more below), thus we trying to find a compromise solution.

I aware of that this could cause data integrity issues, but solution without FK would be much user-friendly (since users don't have to worry about inserting into indicator_values table correct rows – rows with correct iid), especially that in our case data integrity is not crucial. Moreover if the only way (in our case) of inserting data by users is that is
preceded by removing all rows from table, approach with FKs would cause deleting all rows (caused by ON DELETE …) from indicator_values during indicator_details table update (again – update means: remove all rows then insert new data), so this would be time consuming.

DB:

indicator_values table

+----------+--------------+------+-------+
| vid (PK) | indicator_id | year | value |
+----------+--------------+------+-------+
|     1    | AACA         | 2001 |    10 |
|     2    | bbb          | 2001 |   100 |
|     3    | ccc          | 2001 |    10 |
|     4    | AACA         | 2002 |    20 |
|     5    | bbb          | 2002 |   200 |
|     6    | ccc          | 2002 |   300 |

indicator_details table

+----------+--------------+------------------+-----------------------------+
| iid (PK) | indicator_id |     desc         |          full_text          |
+----------+--------------+------------------+-----------------------------+
|  1       | AACA         | AACA is super    | Lorem ipsum dolor sit amet  |
|  2       | bbb          | bbb is extra     | Maecenas sagittis ultricies |
|  3       | ccc          | ccc is superb    | Praesent tincidunt nec orci |

Best Answer

There is absolutely nothing wrong with joining on columns that are not PKs/FKs. If you are concerned about efficiency then the key is to have appropriate indexes defined to support the join operations you are using. Also, don't assume that the existence of a foreign key implies the existence of an index - some databases automatically create such an index but many do not as it is not required and needlessly takes up space if it would never actually help with the queries you run.

Primary and foreign keys (and unique constrains and other database defined limits) exist for data integrity purposes.

Regarding FKs with ON DELETE CASCADE and people deleting parent rows: it is safer to no define ON DELETE CASCADE and let it error when people try rather than not defining FKs and other data integrity controls due to fear of human error because that is just saving yourself a nasty headache for later. Rows should not be updated by delete-then-reinsert, which is the behaviour that causes this concern, people should instead either perform a check-then-insert-or-update-as-needed or some form of UPSERT operation if the database supports on (MERGE in some like MS SQL Server, INSERT ... ON CONFLICT ... or similar in others, and so forth).