Does adding a surrogate key get rid of an identifying relationship between two entities

database-designerdsurrogate-key

In the following example, we have a Comments table with a composite primary key (PK)

CREATE TABLE Comments (
  video_id INT NOT NULL,
  user_id INT NOT NULL,
  comment_dt DATETIME NOT NULL,
  PRIMARY KEY (video_id, user_id, comment_dt),
  FOREIGN KEY (video_id) REFERENCES Videos(video_id),
  FOREIGN KEY (user_id) REFERENCES Users(user_id)
);

There appears to be an identifying relationship, as the primary key of Comments includes its foreign keys.

However, watch what happens when you give this Comments table a surrogate primary key of its own:

CREATE TABLE Comments (
comment_id SERIAL PRIMARY KEY,
video_id INT NOT NULL,
user_id INT NOT NULL,
comment_dt DATETIME NOT NULL,
FOREIGN KEY (video_id) REFERENCES Videos(video_id),
FOREIGN KEY (user_id) REFERENCES Users(user_id)
);

Does that get rid of the identifying relationship? Logically, it still seems to be there, so does this really come down to the physical model and whether or not you want to include a composite key? Are identifying relationships even necessary?

Best Answer

For reasons of accuracy, it is indispensable to know the informational characteristics of the business environment under consideration thoroughly so as to supply a proper database design answer but, in order to answer the main inquiry and the other subquestions, I will work on some points of this post in terms of a several assumptions based on the information you did disclose.


Note: An important omission is that the question does not made known the property or properties whose values uniquely identify the entities of type Video in your particular business domain —provided that (i) Videos.video_id is an additional column set up to contain system-controlled surrogates and (ii) there is one or more conceptual properties whose values uniquely individuate each Video instance, which would entail a natural KEY in the respective logical-level table—.

Of course, supplying a detailed description of the entity types User, Video and Comment would have been of immense assistance in understanding the case at hand in a fitting manner and elaborating a more rounded answer.


Does adding a surrogate key get rid of an identifying relationship between two entities?

First of all, identifying relationship types, entity types, properties, entities (i.e. instances of an entity type), etc., are aspects that are part of the conceptual level of abstraction of a given database. In order to create a suitable conceptual schema (i.e., one that mirrors the real world faithfully), the database modeler/designer must work very closely with business experts to unveil and define these constructs with high precision, due to the fact that it must be the foundation of the database structure and constraints.

Once the modeler/designer has developed a stable conceptual schema, it has to be represented with exactitude in the respective logical SQL-DDL design. As for your example, you have added an extra column, Comments.comment_id, for system-controlled surrogates to the Comments table, which I assume stands for an entity type called Comment. That kind of column is an artifact that is not part of the significant conceptual schema, you added it and declared it as the PRIMARY KEY of said table, but having proceeded so does not mean that the conceptual schema disappears or that it loses importance.

So, assuming that

  • (a) the addition —and the declaration as the PRIMARY KEY— of the Comments.comment_id column is actually beneficial;

  • (b) in the real world, an instance of the Comment entity type is in fact uniquely identified by the values of the combination of the properties VideoId, UserId and CommentDateTime; and

  • (c) the remaining columns [and constraints] of your Comments table mirror the properties [and business rules] of the Comment entity type correctly…

…then you also have to

  • (d) declare the combination of (Comments.video_id, Comments.user_id, Comments.comment_dt) as a composite ALTERNATE KEY, by means of one multi-column UNIQUE constraint accompanied by three NOT NULL counterparts.

In this way, you would not be “getting rid of” the identifying relationship type, and one should not get rid of any conceptual representation at the logical level unless the conceptual schema has undergone a modification produced by a real-world informational change.

Logically, it still seems to be there, so does this really come down to the physical model and whether or not you want to include a composite key?

Conceptually speaking, an identifying relationship type —if and only if discovered and delimited properly— is there, it does not matter if you represented it or not in a logical-level design. The problem with not representing it properly is that (1) the logical design does not reflect (2) the conceptual schema with (3) accuracy.

On the other hand, if a SQL table is carrying the meaning of a certain entity type whose instances are uniquely identified by the values of two or more properties, then the database modeler/designer must declare the applicable multi-column key (be it PRIMARY or ALTERNATE), this is not a matter of wanting or not to include such a constraint. If the multi-column key is not declared, then the rows of the table will eventually become inconsistent with the entity type instances of the real world that they are supposed to represent (a row in a relational table is a fact asserted about a certain entity of an exact type).

In addition, composite keys help, e.g., in:

  • Ensuring that the connections between the rows values remain consistent with the cardinality ratios that come about between the entity types of significance.
  • Making the queries meaningful (in contrast with columns enclosing system-controlled surrogate values that are meaningless).
  • Easing data browsing since they allow, e.g., going from TableA to TableC without passing first through TableC, which speeds up data retrieval.

In the same way, attaching extra columns for system-controlled surrogates should not depend on what the modeler/designer wants or not; to express it differently, it is not a matter of personal preference. Appending this kind of column to a table ought to be well-justified (because, e.g., it will very likely require another supplementary index), and should be decided in terms of the overall characteristics of the database at hand, although, yes, that is a different —yet relevant— topic.

For its part, the physical level of abstraction has to do with indexes, pages, extents, records, partitions, the hard drive itself, other hardware devices, or whatever physical structure the database management system of choice employs to support the logical-level declarations of a database, so the conceptual and logical level definitions must remain independent with respect to physical constructs. The tools of the physical tier serve as well to optimize the performance of a database.

Are identifying relationships even necessary?

As pointed out above, an identifying relationship type is a conceptual construct, and it comes about in terms of the informational characteristics of the organization of interest. If a certain business environment presents identifying relationship types, then it involves several requirements: (i) to discover them, (ii) to capture them in the applicable conceptual schema and (iii) to convey all their aspects in the respective logical design.

With regards to identifying relationship types —just like with other conceptual factors—, a database modeler/designer should not

  • (a) invent them;
  • (b) overlook them;
  • (c) omit them; or
  • (d) misrepresent them.

It is the responsibility of a database/modeler designer to (1) find and (2) define each identifying relationship type of interest in order to be able to (3) represent it accurately in the corresponding logical layout. It is carrying out this set of tasks properly what is, in actual fact, necessary.