Two tables, one foreign key, and two primary keys

database-designforeign keyprimary-key

Say there are two tables:
Student:
StudentID(Primary Key)
Name

StudentTeams:
StudentID(Primary Key, Foreign Key)
TeamName

StudentID in StudentTeams table references Student table primary key. What would be the pros and cons of such a structure in terms of relational integrity?

Best Answer

Just to clarify: Every student can have either 0 or 1 associated TeamName value, with this setup.

Having StudentTeams without StudentID as a foreign key would allow records with invalid student IDs, so that is presumably a pro. Similarly, if StudentID were not the primary key of StudentTeams, then a student could be attached to multiple team names, also presumably a pro (the assumption being that those are your business rules).

However, if this is literally the table structure, then it would actually be better for TeamName to be a field in the Student table, with NULL values allowed. Each student could have 0 or 1 TeamName values, and no separate relationship (or table) is required. So, that's a con.

In my experience, I've only seen a couple of situations where it made sense to have TableA with a primary key of A_ID, and then to have TableB which uses A_ID as both a primary and a foreign key:

  • Performance: If TableA is used in many places, by many parts of the application, and the data in TableB is only used in a isolated part of the application (a set of commands that only 2% of the user base have access to, or that are only used at year-end), and each row in TableB is large (for instance, if Student.Name is, on average, 40 characters long, and StudentTeams.TeamName is, on average 4000 characters long), then queries that don't need TeamName might perform much more quickly if it's not in the Student table.

  • Parallel table: If TableA's structure is not under your control, so you cannot add TeamName to it, then maintaining TableB as a "parallel table" (my own term; a separate table that will only have a row if there's a matching row in TableA) might be the only way to do things. I once set up a number of these tables, when I needed to built an application that used data from a third-party vendor's application, but where our app needed to supply additional data that didn't exist in the vendor's application. The foreign key is necessary, as it can keep your information properly tied to the vendor's data (cascading updates and deletes), without modifying the vendor's application.

  • Logical Separation: If the data in TableA and TableB are related, but are sharply divided in purpose, then it may make sense to store them in separate tables. As with the previous example, Making TableA's primary key be primary key and foreign key in TableB guarantees that you'll only have a TableB entry if the TableA entry exists, and you have to eliminate the TableB row to delete the TableA row. I've worked with a system with a Task table (with a list of work to be done), and a separate TaskHistory table (with information about the conditions under which the task was done, consumables used by the task, etc.). This can simplify business rules; a field could be NOT NULL in TaskHistory, even if it could not possibly be filled in until the task was done; if the field were in the Task table, then a special business rule would have to exist, guaranteeing that certain fields were not NULL once the task had reached completed status. Note that this will often be a mix of logic and performance: there were a number of queries in the application that only had to hit the Task table.

  • Further Relational Requirements: As noted by ypercubeᵀᴹ in the comments, one could use this to allow a third table to use the StudentID from the StudentTeams table as a foreign key, guaranteeing that the third table only linked back to students who were on a team. Personally, I'm not sure that I would implement a structure this way. This could lead to confusion as to the meaning of StudentID; it would always link back to a Student record, but it would not always link back to a StudentTeam record (although, admittedly, in the context of this third table, it would always link back to both a Student and a StudentTeam). I would be inclined to create a StudentTeamID value in StudentTeams, and link to that, as its meaning would be immediately clear (it would only link to students on a team).