Seriously look at third normal form. I would use surrogate keys with the natural keys implemented as unique keys. You will likely find that author belongs in its own authors table. You may find you have a few tables which are quite similar such as user_content_faves, user_author_faves, user_author_shares. This is normal.
Having a single content table with a content_type column may be appropriate. The content column would need to be capable of storing all the content types.
EDIT: For relationship tables I usually name the join table by concatenating the names of the joined tables, abbreviating as necessary. If there multiple relationships between the two tables, I use one of two options:
- Append the purpose of the relationship (as I did above); or
- Add a type/reason code to the relationship (in which case the type is not needed in the relationship name).
I made the assumption that you would want to track who favorited or shared things. It appears that both the producer (author or user) as well as the content items. Therefore you have users favoriting a producer (user_author_faves now user_user_faves), or a product (user_content_faves). Depending on how you do sharing,
- it could be an attribute on the content, or
- a relationship like user_content_shares, where content is share with a particular user. Re-sharing could be problematic, if you track who shared things, and multiple users share the same content to the same user. Un-sharing re-shared items is problematic if you don't track who did the sharing.
You may want to consider (and set policies for):
- tracking/audit information like when something was done (added, favorited, shared, etc).
- whether to do physical or logical deletes.
- if you do logical deletes how to handle refavoriing or sharing something after a logical delete.
When indexing relationship tables I generally have the primary key consisting of the primary keys of the two tables being joined. A second index with the primary keys reversed, or just the primary key which is the second column in the primary key is usually required. If the relationship between two rows can occur more than once, the column(s) used to differentiate the reason/type and/or timing (since date) of the relationship needs to added to the primary key.
Second normal form
If I understand you correctly, the combination of {date, compound_type, location, method} uniquely identifies {value, units}, and all four are needed in order to identify a unique sample ({date, location, method} isn't enough by itself, for example).
I'm going to write this as if I hadn't received an answer on my question about functional dependencies, since other people might be interested in an explanation of both possibilities.
If there are no partial dependencies
1) Assuming none of the non-prime attributes {value, unit} depend on part of the candidate keys {id} or {date, compound_type, location, method}, your table is in 2NF since, as Wikipedia puts it, "every non-prime attribute of the table is either dependent on the whole of a candidate key, or on another non-prime attribute."
If there are partial dependencies
2) One or both of the non-prime attributes {value, unit} depend on only parts of the candidate key {date, compound_type, location, method}. You've confirmed this is the case with {compund}->{unit}, so your table is not in 2NF.
In order to fix the violation of 2NF, I would suggest moving {unit} to the compound table, which I'm guessing would end up looking something like this: {id, name, unit}. Here, the candidate keys are {id} and {name}. Since there are no composite candidate keys, the table is automatically 2NF. It's also 3NF since there are no transitive dependencies, I.E. there's no attribute that's dependent on unit.
Third normal form
OK, that leaves us with the samples table looking like this: {id, date, compund_type, location, method, value}. The two candidate keys are {id} and {date, compund_type, location, method}, which leaves {value} as the single non-prime attribute. Assuming that there are no more 2NF violations (you can't use a subset of {date, compund_type, location, method} to uniquely determine value), we can check the table for violations against 3NF.
3NF states that every non-prime attribute (attributes that don't belong to a candidate key) must be directly dependent on every superkey. Since we only have one non-prime attribute, {value}, it's impossible for the table to violate 3NF, since there's no non-prime attribute for {value} to be dependent on, and no non-prime attribute that can depend on {value}.
I'm going to leave discussions about BCNF out of this for simplicity.
Surrogate key vs. natural key
As for your other questions: "is using an id(PK) column like I have above the best way to go with all of the repeating dates?"
I think so. Semantically, the surrogate key id isn't necessary, but it does help keep things simple. I'm not sure how MySQL works underneath the hood, but in other DBMSs composite primary keys with non-integer data types can lead to unnecessary overhead for example when indexing. Another problem with composite keys is that it gets annoying to query them.
Imagine that you need to add information about which labs each sample was sent to. A sample can be sent to several labs and each lab can receive several samples, so you create a table to connect the two tables. Would you rather write this
SELECT *
FROM samples s
JOIN labs_samples ls ON
s.date = sl.date,
s.compund_type = sl.compund_type,
s.location = sl.location,
s.method = sl.method
or this
SELECT *
FROM samples s
JOIN labs_samples ls ON s.id = ls.id
?
Best Answer
db fiddle
If you have more then one record for the same id, object_id, and type, this will take the max, which may or may not be what you need.