In the following example, we have a Comments
table with a composite primary key (PK)
CREATE TABLE Comments (
video_id INT NOT NULL,
user_id INT NOT NULL,
comment_dt DATETIME NOT NULL,
PRIMARY KEY (video_id, user_id, comment_dt),
FOREIGN KEY (video_id) REFERENCES Videos(video_id),
FOREIGN KEY (user_id) REFERENCES Users(user_id)
);
There appears to be an identifying relationship, as the primary key of Comments
includes its foreign keys.
However, watch what happens when you give this Comments
table a surrogate primary key of its own:
CREATE TABLE Comments (
comment_id SERIAL PRIMARY KEY,
video_id INT NOT NULL,
user_id INT NOT NULL,
comment_dt DATETIME NOT NULL,
FOREIGN KEY (video_id) REFERENCES Videos(video_id),
FOREIGN KEY (user_id) REFERENCES Users(user_id)
);
Does that get rid of the identifying relationship? Logically, it still seems to be there, so does this really come down to the physical model and whether or not you want to include a composite key? Are identifying relationships even necessary?
Best Answer
For reasons of accuracy, it is indispensable to know the informational characteristics of the business environment under consideration thoroughly so as to supply a proper database design answer but, in order to answer the main inquiry and the other subquestions, I will work on some points of this post in terms of a several assumptions based on the information you did disclose.
Note: An important omission is that the question does not made known the property or properties whose values uniquely identify the entities of type Video in your particular business domain —provided that (i)
Videos.video_id
is an additional column set up to contain system-controlled surrogates and (ii) there is one or more conceptual properties whose values uniquely individuate each Video instance, which would entail a natural KEY in the respective logical-level table—.Of course, supplying a detailed description of the entity types User, Video and Comment would have been of immense assistance in understanding the case at hand in a fitting manner and elaborating a more rounded answer.
First of all, identifying relationship types, entity types, properties, entities (i.e. instances of an entity type), etc., are aspects that are part of the conceptual level of abstraction of a given database. In order to create a suitable conceptual schema (i.e., one that mirrors the real world faithfully), the database modeler/designer must work very closely with business experts to unveil and define these constructs with high precision, due to the fact that it must be the foundation of the database structure and constraints.
Once the modeler/designer has developed a stable conceptual schema, it has to be represented with exactitude in the respective logical SQL-DDL design. As for your example, you have added an extra column,
Comments.comment_id
, for system-controlled surrogates to theComments
table, which I assume stands for an entity type called Comment. That kind of column is an artifact that is not part of the significant conceptual schema, you added it and declared it as the PRIMARY KEY of said table, but having proceeded so does not mean that the conceptual schema disappears or that it loses importance.So, assuming that
(a) the addition —and the declaration as the PRIMARY KEY— of the
Comments.comment_id
column is actually beneficial;(b) in the real world, an instance of the Comment entity type is in fact uniquely identified by the values of the combination of the properties VideoId, UserId and CommentDateTime; and
(c) the remaining columns [and constraints] of your
Comments
table mirror the properties [and business rules] of the Comment entity type correctly……then you also have to
(Comments.video_id, Comments.user_id, Comments.comment_dt)
as a composite ALTERNATE KEY, by means of one multi-column UNIQUE constraint accompanied by three NOT NULL counterparts.In this way, you would not be “getting rid of” the identifying relationship type, and one should not get rid of any conceptual representation at the logical level unless the conceptual schema has undergone a modification produced by a real-world informational change.
Conceptually speaking, an identifying relationship type —if and only if discovered and delimited properly— is there, it does not matter if you represented it or not in a logical-level design. The problem with not representing it properly is that (1) the logical design does not reflect (2) the conceptual schema with (3) accuracy.
On the other hand, if a SQL table is carrying the meaning of a certain entity type whose instances are uniquely identified by the values of two or more properties, then the database modeler/designer must declare the applicable multi-column key (be it PRIMARY or ALTERNATE), this is not a matter of wanting or not to include such a constraint. If the multi-column key is not declared, then the rows of the table will eventually become inconsistent with the entity type instances of the real world that they are supposed to represent (a row in a relational table is a fact asserted about a certain entity of an exact type).
In addition, composite keys help, e.g., in:
TableA
toTableC
without passing first throughTableC
, which speeds up data retrieval.In the same way, attaching extra columns for system-controlled surrogates should not depend on what the modeler/designer wants or not; to express it differently, it is not a matter of personal preference. Appending this kind of column to a table ought to be well-justified (because, e.g., it will very likely require another supplementary index), and should be decided in terms of the overall characteristics of the database at hand, although, yes, that is a different —yet relevant— topic.
For its part, the physical level of abstraction has to do with indexes, pages, extents, records, partitions, the hard drive itself, other hardware devices, or whatever physical structure the database management system of choice employs to support the logical-level declarations of a database, so the conceptual and logical level definitions must remain independent with respect to physical constructs. The tools of the physical tier serve as well to optimize the performance of a database.
As pointed out above, an identifying relationship type is a conceptual construct, and it comes about in terms of the informational characteristics of the organization of interest. If a certain business environment presents identifying relationship types, then it involves several requirements: (i) to discover them, (ii) to capture them in the applicable conceptual schema and (iii) to convey all their aspects in the respective logical design.
With regards to identifying relationship types —just like with other conceptual factors—, a database modeler/designer should not
It is the responsibility of a database/modeler designer to (1) find and (2) define each identifying relationship type of interest in order to be able to (3) represent it accurately in the corresponding logical layout. It is carrying out this set of tasks properly what is, in actual fact, necessary.