Best way to define related resources db model

database-design

We have different type of resources (docs, events, media,…) each one has each own table to store related information for each resource, having their respective PK-id.

We have the need to manage the relation between different resources, it is, a document could have, for example, two events and one media collection related to it.

Our first idea is have the following schema:

resource_id | resource_type | related_id | related_type

We need to know the type of the resource because different type of resources could have the same id.

A tuple example of the approach:

resource_id | resource_type | related_id | related_type
-------------------------------------------------------
1           | document      | 1          | event         
1           | document      | 5          | event
1           | document      | 33         | media

The relation could be bidirectional or directional. Then, for example this record could also be valid

33          | media         | 1          | document

Then,
– when displaying the collection with id 33, we will find the document related (doc_id:1)
– when displaying the document with id 1, we will find the related collection and other two events (media_id:33,event_id:1,event_id:5)

To know the details of the related resources (title of the media collection, for example, we will query the table of media collections)

I think that schema could cover our need, but since I am not a DBA guru, I would appreciate to know if this approach is good enough or there are other ideas which cover this need better in terms of performance, flexibility, …

EDIT: Adding the schema of the mentioned approach

CREATE TABLE `related_resources` (
  `fk_resource_id` int(11) NOT NULL,
  `resource_type` varchar(64) NOT NULL,
  `fk_related_id` int(11) NOT NULL,
  `related_type` varchar(64) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

--
-- Indexes for dumped tables
--

--
-- Indexes for table `related_resources`
--
ALTER TABLE `related_resources`
  ADD PRIMARY KEY (`fk_resource_id`,`resource_type`,`fk_related_id`,`related_type`);

Best Answer

This is definitely a workable solution. The one thing that this prevents is the use of built-in referential integrity options that your DBMS may provide.

As an example, resource_id and related_id cannot be foreign keys to another table - because both could reference multiple other tables.

So, maintenance of the relationship between a "document" resource_id and the document table must be handled in your code. You need to be sure that (for example) when you delete a document row, you remove rows from your resource_link table with that "document" resource_id or related_id.

In general, it's easiest to allow the database to maintain such information appropriately by itself. It's much less likely that you'll inadvertently allow a document row to be deleted without cleaning up connections elsewhere.

To avoid this, you'd need a solution where either:

  • you have 2 bridge tables for each possible pair of resource types (document_media, document_event, media_document, media_event, event_document, event_media, etc.; possibly even document_document, media_media and event_event); or
  • each resource type would need a bridge table with a column for each possible related resource (e.g., document_bridge would have the columns document_id, media_id, event_id, related_document_id, etc.), with a constraint to ensure one and only one of the related IDs is set.

Each of your options has its pros and cons. If you're confident that you'll be able to enforce data referential integrity "manually", your solution may be the cleanest, and the easiest for another person to follow. If the available resource type are expected to grow dramatically, then the other solutions (both of which would require adding or modifying multiple tables) will become more and more awkward to work with over time.

On the other hand, if the resource types are expected to be static, and there aren't more than a couple more, using one of the other solutions would ensure referential integrity (if properly set up with foreign keys). Also note that, with either of the other two solutions, it would be possible to create a view matching your proposed single table; if simply having a single place to get a simple list of what's related to something is considered a plus, you can have that with any option.

The queries to retrieve all related resources will be a little messy regardless of which option you take.

In terms of performance, the more indirection you have, the more work the DB engine will have to do - technically, at least. In practice, I suspect the performance difference between what you propose and (say) the "resource X resource" version (specific tables for each possible relationship, for both possible directions) would be small enough to be ignorable.

So, I don't think there is a single "best" way to do this. Ultimately, the comfort level of the development and support teams with the solution is likely to be the most critical determining factor. If your development team has experience with manually maintaining data referential integrity, your solution is perfectly reasonable, and might well be the easiest for someone else to follow.