What are the pros/cons of between these 2 database design options where the join table can have 2 or 3 foreign keys

database-design

I am trying to model the following scenario, but I have come up with 2 options and I cannot figure out how to decide between the 2 options. Both options seem to work, but I worry that I am missing some edge case that suggests one over the other. Any recommendations or suggestions on where one option might be better than the other option?

Thank you for your time 🙂

The model:

A user has many labels
A property has many labels
A label has many properties

Option 1 (join table with 2 foreign keys):

user
----
id
name

property
--------
id
name

label
-----
id
name
user_id (foreign key for user.id)

property_label
--------------
property_id (foreign key for property.id)
label_id (foreign key for label.id)

Option 2 (join table with 3 foreign keys):

user
----
id
name

property
--------
id
name

label
-----
id
name

user_property_label
-------------------
property_id (foreign key for property.id)
label_id (foreign key for label.id)
user_id (foreign key for user.id)

Potential Queries:

Get all labels associated with a user
Get all properties associated with a label of a user

Example data:

User1 has labels "old" and "new" for property "bicycle".
User2 has labels "1 year", "2 years", and "3+ years" for property "bicycle".
User 3 has labels "<3 years" and "3+ years" for property "car"
User 4 has labels "red", "blue", and "black" for property "car"
User 5 has labels "daily beater" and "weekend fun" for property "car" and labels "<3 years" and "3+ years" for property "bicycle"
User 6 has labels "1 year", "2 years", and "3+ years" for properties "bicycle" and "car"

The list of properties is large, but finite. Users will reference the same property and cannot have their own variation in its spelling. So for example, "bicycle" will always be "bicycle" for every user. A user cannot have "bike" or "bi-cycle" or some other spelling of the property.

The labels are all user defined so it can be infinite depending on how each individual user wants to label their properties.

Update:
While musing on Jim's comments, the solution came.

The join table with 2 foreign keys is better because it is further normalizing the data.

For instance, this scenario:

User1 has labels "old" and "new" for property "bicycle"
User1 has labels "old" for property "car"

will produce this data using the option with 3 foreign keys:

user (id, name)
---------------
1, User1

property (id, name)
-------------------
10, bicycle
11, car

label (id, name)
----------------
100, old
101, new

user_property_label (user_id, property_id, label_id)
----------------------------------------------------
1, 10, 100
1, 10, 101
1, 11, 100

The user_id foreign key is redundant information because user_id:1 will always be paired to label_id:100. Hence, the data can be further normalized by using the 2 foreign key option. And will look like this instead:

user (id, name)
---------------
1, User1

property (id, name)
-------------------
10, bicycle
11, car

label (id, name, user_id)
-------------------------
100, old, 1
101, new, 1

property_label (property_id, label_id)
--------------------------------------
10, 100
10, 101
11, 100

Best Answer

It's hard to answer that question without knowing what you are trying to model and the queries that are relevant to your problem domain. I recommend that you trace out some queries and see how that design works. Absent any other information, I would tend towards the 3 column option because the interpretation of a label depends on the property and you probably will do the query "give me all the labels associated with a user."

I think the most important question you should ask yourself are all the labels unrelated. If you are storing user preferences for an application, I can see the utility of your data model. On the other hand, if you want to store postal addresses, email address, access permissions, etc, then a different design is appropriate. Generic designs like this often shift data management and manipulation out of the DBMS into your application. Tables are cheap and I would recommend having tables of related facts/attributes (e.g. POSTAL_ADDRESS).

Revised answer based on edits:

First, never store an age in a database. Age depends on the current date. You should store a date and calculate the age (you can use a virtual column to do that).

Second, I suggest you identify the types of things you want to store in the database. Bicycles and car are types of vehicles. I would recommend a VEHICLE table that contains the facts (attributes, property). You can then store properties relevant to bicycles in a BICYCLE_ATTRIBUTE (or whatever name you prefer) table. Do this for all the types of data you want to store and try to make groups that related. Postal address might be a group and education might be another.

You should try to avoid a completely generic database because you will end up creating a DBMS in your application and the underlying DBMS has been relegated to a fancy file system.