Database Design – Alternatives for Multiple Foreign Keys


Say there are entities called singulars, and entities called relationships.

It takes exactly two singulars to make up a relationships entity.

That pair of singulars can't be repeated elsewhere in relationships, in any order.

One way to model it is this:

+----------------+
|relationships   |         +----------+
+----------------+         |singulars |
|id              |         +----------+
|singular_id_1   <---------+id        |
|singular_id_2   <---+     |attribute1|
|pair_description|         |attribute2|
|pair_date       |         |          |
|                |         +----------+
+----------------+

With this pattern, finding the pair for a given singular means checking both foreign key columns in relationships, because the singular could be on either side. The order doesn't matter, yet the schema defines one, so the queries end up with a number of AND/OR groups and cases.
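To make the query problem concrete, here is a minimal sketch using Python's built-in sqlite3 module (SQLite is an assumption; the question names no database). The helper partner_of is hypothetical. It shows the OR in the WHERE clause and the CASE in the SELECT list that the two-column design forces on every lookup.

```python
import sqlite3

# Hypothetical minimal schema mirroring the first diagram (attribute columns omitted).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE singulars (
    id INTEGER PRIMARY KEY
);
CREATE TABLE relationships (
    id               INTEGER PRIMARY KEY,
    singular_id_1    INTEGER NOT NULL REFERENCES singulars(id),
    singular_id_2    INTEGER NOT NULL REFERENCES singulars(id),
    pair_description TEXT,
    pair_date        TEXT
);
INSERT INTO singulars (id) VALUES (1), (2), (3), (4);
-- The second pair is deliberately stored "backwards" as (4, 3).
INSERT INTO relationships (singular_id_1, singular_id_2, pair_description)
VALUES (1, 2, 'Fizz buzz blitz'), (4, 3, 'Blitz buzz fizz');
""")

def partner_of(conn, singular_id):
    """Return the other member of the pair containing singular_id, or None.

    The singular may sit in either column, so the WHERE clause must test both
    (the OR), and the SELECT must pick whichever column is NOT the input (the CASE).
    """
    row = conn.execute(
        """
        SELECT CASE WHEN singular_id_1 = :s THEN singular_id_2
                    ELSE singular_id_1 END
        FROM relationships
        WHERE singular_id_1 = :s OR singular_id_2 = :s
        """,
        {"s": singular_id},
    ).fetchone()
    return row[0] if row else None

print(partner_of(conn, 3))  # 4
```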

An extension of that approach would be to store two rows for every pair, with singular_id_1 and singular_id_2 swapped in the second row. While that removes some of the query complexity, it introduces enough other complexity (every pair has to be written, updated, and deleted as two mirrored rows) to make it infeasible.
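A minimal sketch of the mirrored-rows variant, again in SQLite via Python's sqlite3 (an assumption). The table relationships_mirrored, its composite primary key, and the add_pair and partner_of helpers are hypothetical. Lookups become a single-column predicate, but every pair costs two rows, pair_description is stored twice, and every write has to touch both rows in one transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE relationships_mirrored (
    singular_id_1    INTEGER NOT NULL,
    singular_id_2    INTEGER NOT NULL,
    pair_description TEXT,
    PRIMARY KEY (singular_id_1, singular_id_2)
);
""")

def add_pair(conn, a, b, description):
    # Both orientations are written together; `with conn` commits both rows,
    # or rolls both back if either insert fails.
    with conn:
        conn.executemany(
            "INSERT INTO relationships_mirrored VALUES (?, ?, ?)",
            [(a, b, description), (b, a, description)],
        )

def partner_of(conn, s):
    # Lookup is now a single-column predicate: no OR and no CASE needed.
    row = conn.execute(
        "SELECT singular_id_2 FROM relationships_mirrored WHERE singular_id_1 = ?",
        (s,),
    ).fetchone()
    return row[0] if row else None

add_pair(conn, 1, 2, "Fizz buzz blitz")
print(partner_of(conn, 2))  # 1
print(conn.execute("SELECT COUNT(*) FROM relationships_mirrored").fetchone()[0])  # 2
```

Updates and deletes need the same two-row discipline, which is where the extra complexity accumulates.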

Using an intermediate table seems like one potential solution:

+----------------+       +-----------------------+
|relationships   |       |singulars_relationships|        +----------+
+----------------+       +-----------------------+        |singulars |
|id              <-------+relationship_id        |        +----------+
|pair_description|       |singular_id            +-------->id        |
|pair_date       |       |                       |        |attribute1|
|                |       +-----------------------+        |attribute2|
+----------------+                                        |          |
                                                          +----------+

The records might then look something like this:

+----------------------------------+
|relationships                     |
+----------------------------------+
|id   pair_description   pair_date |
+----------------------------------+
|1    Fizz buzz blitz    2022-02-20|
|2    Blitz buzz fizz    2022-02-22|
+----------------------------------+

+----------------------------------+
|singulars_relationships           |
+----------------------------------+
|relationship_id   singular_id     |
+----------------------------------+
|1                 1               |
|1                 2               |
|2                 3               |
|2                 4               |
+----------------------------------+

+-----------------------------+
|singulars                    |
+-----------------------------+
|id   attribute1   attribute2 |
+-----------------------------+
|1    Fizz         Blitz      |
|2    Buzz         Foo        |
|3    Bar          World      |
|4    Blorg        Hello      |
+-----------------------------+

There, singulars_relationships is where the pairs are defined: if a singular_id exists in it, that singular is already in a pair. One problem with this pattern is that three or more singular_ids could end up associated with a single relationship_id, and the "exactly n" constraint would then be broken.
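One common mitigation, not part of the original question, is to add a position column to the junction table and make it part of the key. In this SQLite sketch (via Python's sqlite3) the slot column is hypothetical: CHECK (slot IN (1, 2)) plus PRIMARY KEY (relationship_id, slot) caps each relationship at two members, and UNIQUE (relationship_id, singular_id) stops one singular from filling both slots.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE singulars (id INTEGER PRIMARY KEY);
CREATE TABLE relationships (
    id               INTEGER PRIMARY KEY,
    pair_description TEXT,
    pair_date        TEXT
);
CREATE TABLE singulars_relationships (
    relationship_id INTEGER NOT NULL REFERENCES relationships(id),
    slot            INTEGER NOT NULL CHECK (slot IN (1, 2)),  -- hypothetical column
    singular_id     INTEGER NOT NULL REFERENCES singulars(id),
    PRIMARY KEY (relationship_id, slot),   -- at most two members per relationship
    UNIQUE (relationship_id, singular_id)  -- a singular can't be paired with itself
);
INSERT INTO singulars (id) VALUES (1), (2), (3);
INSERT INTO relationships (id, pair_description) VALUES (1, 'Fizz buzz blitz');
INSERT INTO singulars_relationships VALUES (1, 1, 1), (1, 2, 2);
""")

try:
    # A third member has no free slot: slot 3 fails the CHECK, slots 1 and 2 the PK.
    conn.execute("INSERT INTO singulars_relationships VALUES (1, 3, 3)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

This only enforces "at most two". It does not force every relationship to have exactly two members, nor does it stop the same pair from appearing under a second relationship_id; those rules would need triggers or deferred checks.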

Are there official terms for this type of scenario? And is there other relevant theory, or are there other alternatives?

Best Answer

Are there official terms for this type of scenario?

Yes. This is a symmetric relation, and "relation" here has the same meaning as in "relational database": an RDBMS is a database management system designed around storing relations. However, RDBMSs don't have a native way to store symmetric relations. You either have to store both tuples, e.g. (a, b) and (b, a), as separate rows, or use some convention to store only one tuple. A common approach is to use a check constraint on the FKs:

check (singular_id_1 < singular_id_2)

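This assumes the relation is anti-reflexive (also called irreflexive), meaning no singular is ever paired with itself. Combined with a unique constraint on (singular_id_1, singular_id_2), the check guarantees each unordered pair is stored at most once, in a single canonical order.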
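Here is a sketch of the check-constraint approach in SQLite via Python's sqlite3 (the database choice is an assumption). It pairs the CHECK with a UNIQUE constraint, normalizes the order in a hypothetical add_pair helper, and adds a hypothetical view, relationship_pairs, that exposes both orientations so lookups need only one predicate instead of AND/OR groups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE singulars (id INTEGER PRIMARY KEY);
CREATE TABLE relationships (
    id               INTEGER PRIMARY KEY,
    singular_id_1    INTEGER NOT NULL REFERENCES singulars(id),
    singular_id_2    INTEGER NOT NULL REFERENCES singulars(id),
    pair_description TEXT,
    pair_date        TEXT,
    CHECK (singular_id_1 < singular_id_2),  -- canonical order; also rules out (a, a)
    UNIQUE (singular_id_1, singular_id_2)   -- each unordered pair stored once
);
-- Hypothetical view exposing both orientations of every stored pair.
CREATE VIEW relationship_pairs AS
    SELECT id, singular_id_1 AS singular_id, singular_id_2 AS partner_id
    FROM relationships
    UNION ALL
    SELECT id, singular_id_2, singular_id_1
    FROM relationships;
INSERT INTO singulars (id) VALUES (1), (2), (3), (4);
""")

def add_pair(conn, a, b, description):
    # Normalize the order in application code so the CHECK is satisfied.
    lo, hi = min(a, b), max(a, b)
    with conn:
        conn.execute(
            "INSERT INTO relationships (singular_id_1, singular_id_2, pair_description) "
            "VALUES (?, ?, ?)",
            (lo, hi, description),
        )

add_pair(conn, 4, 3, "Blitz buzz fizz")  # stored as (3, 4)
print(conn.execute(
    "SELECT partner_id FROM relationship_pairs WHERE singular_id = ?", (3,)
).fetchall())  # [(4,)]
```

The single stored row keeps pair_description and pair_date in one place, while the view gives queries the convenience of the mirrored-rows design without duplicating data.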

Related Solutions

Sql-server – What alternatives exist when a table requires too many foreign keys

If there's any way to group parts, you might be able to introduce intermediate tables as a workaround. A flat structure like this won't work:

Parts
+ Table 1
+ Table 2
+ ...
+ Table 400

But something along these lines might.

Parts
+ RedOrangeYellow parts
  + Table 1
  + Table 2
  + ...
  + Table 200

+ GreenBlueIndigoViolet parts
  + Table 201
  + Table 202
  + ...
  + Table 400

I'd want to take a hard look at your DDL before recommending this, though. And if you do it, don't start scattering new ID numbers all over the place: you ought to be able to join "Table 400" directly to "Parts" without going through "GreenBlueIndigoViolet parts".
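A sketch of that last point in SQLite via Python's sqlite3, with hypothetical table and column names (parts, green_blue_indigo_violet_parts, table_400, part_id). The grouping table reuses the part's key instead of minting its own ID, so the leaf table carries part_id all the way down and can join to parts directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE parts (
    part_id   INTEGER PRIMARY KEY,
    part_name TEXT NOT NULL
);
-- Intermediate grouping table: its key IS the part's key, not a new ID.
CREATE TABLE green_blue_indigo_violet_parts (
    part_id INTEGER PRIMARY KEY REFERENCES parts(part_id)
);
-- Leaf table references the grouping table but keeps the same part_id.
CREATE TABLE table_400 (
    part_id INTEGER PRIMARY KEY
        REFERENCES green_blue_indigo_violet_parts(part_id),
    detail  TEXT
);
INSERT INTO parts VALUES (400, 'Widget');
INSERT INTO green_blue_indigo_violet_parts VALUES (400);
INSERT INTO table_400 VALUES (400, 'violet trim');
""")

# Because part_id flows through unchanged, table_400 joins straight to parts.
rows = conn.execute("""
    SELECT p.part_name, t.detail
    FROM table_400 AS t
    JOIN parts AS p ON p.part_id = t.part_id
""").fetchall()
print(rows)  # [('Widget', 'violet trim')]
```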

Multiple foreign keys with shared columns for weak entities

Both solutions, T and T-alternate, are considered denormalizations. Denormalization takes a data model that would normally be in third normal form and restructures it to make select queries more convenient. The correct data already exists in CC and CD; T and T-alternate duplicate it. With duplicated (denormalized) data, you have to make sure the copies cannot get out of sync with their parent.

I think the first option is the better one. Its constraints are simple and straightforward, and it needs no check constraint spanning multiple nullable columns. You do need to make sure that any insert into CC or CD also inserts into T. After-insert triggers on CC and CD could do this automatically, inserting (A, B, C, NULL) or (A, B, NULL, D) into T respectively.
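A sketch of those after-insert triggers in SQLite via Python's sqlite3. The column layout is an assumption, since the original DDL isn't shown here: P is keyed by (a, b), CC adds c, CD adds d, and T holds (a, b, c, d).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE P  (a INTEGER, b INTEGER, PRIMARY KEY (a, b));
CREATE TABLE CC (a INTEGER, b INTEGER, c TEXT, FOREIGN KEY (a, b) REFERENCES P (a, b));
CREATE TABLE CD (a INTEGER, b INTEGER, d TEXT, FOREIGN KEY (a, b) REFERENCES P (a, b));
CREATE TABLE T  (a INTEGER, b INTEGER, c TEXT, d TEXT);

-- Copy each new CC row into T as (A, B, C, NULL).
CREATE TRIGGER cc_after_insert AFTER INSERT ON CC
BEGIN
    INSERT INTO T (a, b, c, d) VALUES (NEW.a, NEW.b, NEW.c, NULL);
END;

-- Copy each new CD row into T as (A, B, NULL, D).
CREATE TRIGGER cd_after_insert AFTER INSERT ON CD
BEGIN
    INSERT INTO T (a, b, c, d) VALUES (NEW.a, NEW.b, NULL, NEW.d);
END;

INSERT INTO P  VALUES (1, 1);
INSERT INTO CC VALUES (1, 1, 'c-value');
INSERT INTO CD VALUES (1, 1, 'd-value');
""")

print(conn.execute("SELECT * FROM T ORDER BY c IS NULL").fetchall())
# [(1, 1, 'c-value', None), (1, 1, None, 'd-value')]
```

These triggers only cover inserts; updates and deletes on CC and CD would need matching triggers to keep T in sync.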

One thing to plan for is that a parent has two child tables, each of which can have many rows per parent. Let's assume a row in P has 3 rows in CC and 0 in CD. A query over P that inner joins to both CC and CD returns zero rows. A left outer join (parent on the left) returns 3 rows.

Now assume that parent has 3 rows in CC and 4 rows in CD. Both inner and outer join queries return 3 * 4 = 12 rows, because every CC row is paired with every CD row.
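The row counts above can be reproduced with a small SQLite sketch (via Python's sqlite3). The P, CC, and CD layouts here are simplified assumptions, with a single parent id column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE P  (id INTEGER PRIMARY KEY);
CREATE TABLE CC (p_id INTEGER REFERENCES P(id), c TEXT);
CREATE TABLE CD (p_id INTEGER REFERENCES P(id), d TEXT);
INSERT INTO P VALUES (1);
INSERT INTO CC VALUES (1, 'c1'), (1, 'c2'), (1, 'c3');
""")

def count(sql):
    return conn.execute(sql).fetchone()[0]

INNER = "SELECT COUNT(*) FROM P JOIN CC ON CC.p_id = P.id JOIN CD ON CD.p_id = P.id"
OUTER = ("SELECT COUNT(*) FROM P LEFT JOIN CC ON CC.p_id = P.id "
         "LEFT JOIN CD ON CD.p_id = P.id")

# Scenario 1: 3 CC rows, 0 CD rows.
print(count(INNER), count(OUTER))  # 0 3

# Scenario 2: add 4 CD rows; CC and CD rows now multiply against each other.
conn.executemany("INSERT INTO CD VALUES (1, ?)", [("d1",), ("d2",), ("d3",), ("d4",)])
print(count(INNER), count(OUTER))  # 12 12
```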

Consider a third data set. CC is the main child table, with 20 rows per P. CD holds a descriptive text field, with 3 rows per P. Instead of 20 * 3 = 60 rows, you could reduce the result to 20 rows per P by applying an aggregate function such as Oracle 11g's LISTAGG to CD. This combines the D descriptive field into a comma-separated list such as "Planes, Trains, Automobiles".
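SQLite has no LISTAGG, so this sketch (via Python's sqlite3) swaps in SQLite's group_concat(X, Y) aggregate, which plays a similar role. The table layouts are simplified assumptions. CD is aggregated in a derived table before the join, so its rows collapse to one per parent and don't multiply against CC. Unlike LISTAGG ... WITHIN GROUP (ORDER BY ...), group_concat as used here makes no ordering guarantee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE P  (id INTEGER PRIMARY KEY);
CREATE TABLE CC (p_id INTEGER REFERENCES P(id), c INTEGER);
CREATE TABLE CD (p_id INTEGER REFERENCES P(id), d TEXT);
INSERT INTO P VALUES (1);
INSERT INTO CD VALUES (1, 'Planes'), (1, 'Trains'), (1, 'Automobiles');
""")
conn.executemany("INSERT INTO CC VALUES (1, ?)", [(i,) for i in range(20)])

# Collapse CD to one row per parent first, then join: 20 rows instead of 60.
rows = conn.execute("""
    SELECT P.id, CC.c, cd_list.descriptions
    FROM P
    JOIN CC ON CC.p_id = P.id
    LEFT JOIN (
        SELECT p_id, group_concat(d, ', ') AS descriptions
        FROM CD
        GROUP BY p_id
    ) AS cd_list ON cd_list.p_id = P.id
""").fetchall()

print(len(rows))   # 20
print(rows[0][2])  # a comma-separated list of the three CD values
```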

Having both CCA and CDA fields in T-alternate increases the chance of the data getting out of sync, since CCA might not equal CDA when something goes wrong in the code. T-alternate is not the standard way of approaching this problem. The first option does have some challenges, but it is simpler, more standard, and more likely to hold up in production systems.
