Entity ID field that can refer to multiple tables

database-design

In designing both audit tables and notification tables, I've run into an issue where I need a row to refer to an entity in another table, but the table that entity lives in isn't always the same. For example:

Each time a user makes a post or uploads a new photo to their album, a notification is created. If the notification is for a post, the notification entry must refer to the posts table. If it is for a photo, the notification entry must refer to the photos table.
Moderators may edit posts, delete posts, or block users. The audit_actions entry must refer to the entity that was acted upon in the corresponding table.

My original approach looked like this (this is heavily simplified, not the actual table structure):

TABLE notification_types:
     id         INT
     type_id    INT
     entity_id  INT

If type_id was 1, entity_id would refer to an entry in the posts table
If type_id was 2, entity_id would refer to an entry in the photos table
etc

I was starting to use a similar scheme for audit tables. My plan was to make an audit_history table for each auditable table that mirrored the original table (e.g. posts and post_audit_history), and maintain a single audit_actions table that linked an actor and action to each history item.

TABLE audit_actions
     admin_id        INT
     action_id       INT
     entity_id       INT
     audit_entry_id  INT

If action_id was 1 ("edited post"), entity_id would refer to an entry in posts, audit_entry_id would refer to an entry in posts_audit_history, etc

Of course, this approach is terrible. entity_id can refer to multiple tables, which is confusing and prevents it from being a foreign key. There is no way to enforce integrity without complicated triggers.

I've been researching and trying to find solutions, but I am stuck. Some approaches I've considered:

Maintain an entity_ids table. Each entity in every table then has a foreign key to an entry in the entity_ids table; in other words, every single entity in any table has a globally unique ID. This maintains integrity, but when we look up an item by its entity_id we'd have to search multiple tables to find the entity with that ID.
Maintain separate notification tables for each type of notification, and separate audit tables for each type of auditable table. E.g. post_notifications, photo_notifications, post_audit_actions, profile_audit_actions, etc. This would require running messy and expensive joins for what were formerly simple queries like 'find all notifications for user X' or 'get all audit records for user Y' or 'get the 100 most recent audit records'.
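
To make the cost of the second option concrete, here is a minimal sketch of what 'find all notifications for user X' might look like with one notification table per type. It assumes SQLite through Python's sqlite3 module; the column names (user_id, created_at, post_id, photo_id) and the sample rows are made up for illustration and are not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE posts  (id INTEGER PRIMARY KEY);
CREATE TABLE photos (id INTEGER PRIMARY KEY);

-- One notification table per entity type, each with a real foreign key.
CREATE TABLE post_notifications (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL,
    created_at TEXT NOT NULL,
    post_id    INTEGER NOT NULL REFERENCES posts(id)
);
CREATE TABLE photo_notifications (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL,
    created_at TEXT NOT NULL,
    photo_id   INTEGER NOT NULL REFERENCES photos(id)
);

-- Hypothetical sample data.
INSERT INTO posts  (id) VALUES (1);
INSERT INTO photos (id) VALUES (7);
INSERT INTO post_notifications  (user_id, created_at, post_id)  VALUES (5, '2024-01-01', 1);
INSERT INTO photo_notifications (user_id, created_at, photo_id) VALUES (5, '2024-01-02', 7);
""")

# The formerly simple query now needs one UNION ALL branch per notification type.
ALL_FOR_USER = """
SELECT 'post' AS kind, post_id AS entity_id, created_at
  FROM post_notifications WHERE user_id = :user_id
UNION ALL
SELECT 'photo', photo_id, created_at
  FROM photo_notifications WHERE user_id = :user_id
ORDER BY created_at DESC
"""

rows = conn.execute(ALL_FOR_USER, {"user_id": 5}).fetchall()
```

Every new notification type would add another table and another branch to this query, which is the maintenance burden described above.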

Which approach is best? Is there a better strategy that I haven't thought of here?

Best Answer

One idea comes from extending the analogy of interfaces in object-oriented programming. You can have an IGeneratesNotification "interface" entity that the Posts and Photos entities "implement", as well as an IAuditable "interface" entity that the Posts and Users entities "implement". It would be possible for an entity to implement multiple interface entities, such as in the case of Posts. Here's a diagram of how this can be done.

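[Diagram: Posts and Photos each hold a reference to IGeneratesNotification; Posts and Users each hold a reference to IAuditable.]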

Note that the fields containing references to the interface entities would be unique.
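
The structure described above could be sketched roughly as follows. This is only an illustration, assuming SQLite through Python's sqlite3 module: the table name notification_sources stands in for the IGeneratesNotification "interface" entity, and all column names and the create_post helper are hypothetical. Only the notification side is shown; IAuditable would follow the same shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

conn.executescript("""
-- "Interface" entity: one row per thing that can generate a notification.
CREATE TABLE notification_sources (
    id INTEGER PRIMARY KEY
);

-- "Implementing" entities: each row points at its own interface row.
CREATE TABLE posts (
    id   INTEGER PRIMARY KEY,
    body TEXT NOT NULL,
    notification_source_id INTEGER NOT NULL UNIQUE
        REFERENCES notification_sources(id)
);
CREATE TABLE photos (
    id  INTEGER PRIMARY KEY,
    url TEXT NOT NULL,
    notification_source_id INTEGER NOT NULL UNIQUE
        REFERENCES notification_sources(id)
);

-- A real foreign key replaces the old type_id + entity_id pair.
CREATE TABLE notifications (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    notification_source_id INTEGER NOT NULL
        REFERENCES notification_sources(id)
);
""")

def create_post(body):
    """Insert the interface row first, then the post that implements it."""
    cur = conn.execute("INSERT INTO notification_sources DEFAULT VALUES")
    source_id = cur.lastrowid
    conn.execute(
        "INSERT INTO posts (body, notification_source_id) VALUES (?, ?)",
        (body, source_id),
    )
    return source_id

source_id = create_post("hello")
conn.execute(
    "INSERT INTO notifications (user_id, notification_source_id) VALUES (?, ?)",
    (42, source_id),
)

# A notification pointing at a source that does not exist is rejected.
try:
    conn.execute(
        "INSERT INTO notifications (user_id, notification_source_id) VALUES (?, ?)",
        (42, 9999),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The UNIQUE constraint is per table, so it stops two posts from sharing one source row but does not stop a post and a photo from doing so.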

Advantages:

You can enforce referential integrity to the interface entity that can be implemented by one of several entities.
If there are fields common to all implementing entities, they can be abstracted out to the interface entity.

Disadvantages:

Nothing in the schema prevents the id of an interface entity from appearing in zero implementing entities, or in more than one.
The service layer would be required to determine which entity the notification or audit action applies to before knowing which table to query.
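
Both disadvantages can be seen in a small sketch of the service-layer lookup. As before, this assumes SQLite through Python's sqlite3 module; notification_sources stands in for the interface entity, and the resolve_source helper, column names, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE notification_sources (id INTEGER PRIMARY KEY);
CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    notification_source_id INTEGER UNIQUE REFERENCES notification_sources(id)
);
CREATE TABLE photos (
    id INTEGER PRIMARY KEY,
    notification_source_id INTEGER UNIQUE REFERENCES notification_sources(id)
);

INSERT INTO notification_sources (id) VALUES (1), (2), (3);
INSERT INTO posts  (id, notification_source_id) VALUES (10, 1);
INSERT INTO photos (id, notification_source_id) VALUES (20, 2);
-- Source 3 is an orphan: no table implements it, and the schema allows that.
""")

# Table names are fixed in code and never taken from user input.
IMPLEMENTING_TABLES = ("posts", "photos")

def resolve_source(source_id):
    """Return every (table, row id) pair that implements the given source."""
    matches = []
    for table in IMPLEMENTING_TABLES:
        row = conn.execute(
            f"SELECT id FROM {table} WHERE notification_source_id = ?",
            (source_id,),
        ).fetchone()
        if row is not None:
            matches.append((table, row[0]))
    return matches
```

A caller would normally expect exactly one match; an empty list or more than one entry signals the integrity gap that the database itself cannot catch. Storing a type discriminator on the interface row is one way to avoid probing each table, at the cost of another column to keep consistent.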

Related Solutions

Design Tagging module in the database

What you describe in your second paragraph sounds like the Entity Attribute Value pattern.

This is a known pattern for dealing with the type of data you are talking about. It fits when you need to be extensible, drive the types from the data, and start tracking new types as you grow. It lets you stay flexible and agile and be a little more forgiving and generic in your database design and schema.

This pattern works, even in relational databases, but it takes more work to get good performance, and it can cause headaches down the line depending on how many rows you have, how many inserts you are doing, and how frequently you query it. This SO question covers some of the pitfalls nicely in the first answer listed, which has 20 votes. I have seen this model work well, especially in environments where new types and entities really do keep coming in and we want that to be more application driven or user driven. We fought with performance more often in that model than in a traditional relational database model, but we needed the flexibility and we made it work. Definitely look at that SO question, and do some more research on EAV and whatever DBMS you are using to find examples from others who have tried the same.
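
For readers who have not met EAV before, a bare-bones version might look like the sketch below. It assumes SQLite through Python's sqlite3 module; the table names, column names, and helper functions are illustrative and not taken from any particular system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entities (
    id          INTEGER PRIMARY KEY,
    entity_type TEXT NOT NULL
);
-- One row per (entity, attribute) pair instead of one column per attribute.
CREATE TABLE entity_attributes (
    entity_id INTEGER NOT NULL REFERENCES entities(id),
    attribute TEXT NOT NULL,
    value     TEXT,
    PRIMARY KEY (entity_id, attribute)
);
""")

def save_entity(entity_type, attributes):
    """Store an entity and its attributes; new attribute names need no schema change."""
    cur = conn.execute(
        "INSERT INTO entities (entity_type) VALUES (?)", (entity_type,)
    )
    entity_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO entity_attributes (entity_id, attribute, value) VALUES (?, ?, ?)",
        [(entity_id, name, str(value)) for name, value in attributes.items()],
    )
    return entity_id

def load_entity(entity_id):
    """Reassemble the attribute rows into a dict; every value comes back as text."""
    rows = conn.execute(
        "SELECT attribute, value FROM entity_attributes WHERE entity_id = ?",
        (entity_id,),
    ).fetchall()
    return dict(rows)

tag_id = save_entity("tag", {"name": "sql", "uses": 3})
tag = load_entity(tag_id)
```

Note that the integer 3 comes back as the string "3": losing column types and constraints is one of the trade-offs of this pattern, alongside the performance cost of reassembling rows.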

Sql-server – SQL Server – Database per company. How to query across databases

I don't know if this will help, and this may be a little much for your situation, but here is the solution we use today.

All users are grouped into logical entities under a domain (something.com) umbrella. Our particular scenario required an additional layer of Domain->Company->Group->User breakout. Not sure if you have ever worked with Active Directory or domain trees, but it follows that logic.

Using the domain model, the server itself is at the top; presumably the client of your services would be the administrator. Each company flows like a branch from the server itself, then into user-defined groups, which contain users. Each company in this scenario would have a Domain Admin who can administer the xyz.com, abc.com, etc. domains, but can't access any other domain.

Each of the service containers (the databases) uses a full trust model with the security database to provide services to the users contained in this security database (think OpenID for perspective). We use a home-brewed C++ compiled module for Apache 2.x to provide an application firewall and session security host.

On each run, the module "asks" the security database to record the page hit, produce a random cookie (a 64-character string) and a session cookie (two cookies in total), and authenticate the session. A new session or the same session is returned depending on whether the cookies match. The session key and anything else relevant is provided to the end-user UI application code over HTTP headers (so the firewalls could sit in front of Google Apps or AppEngine).
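
The cookie check described above might be sketched like this. It is not our C++ module, only a Python illustration of the idea: the in-memory dict stands in for the SESSIONS table, and the authenticate function and its parameters are hypothetical.

```python
import hmac
import secrets

# In-memory stand-in for the SESSIONS table: session id -> random cookie value.
sessions = {}

def authenticate(session_id=None, token=None):
    """Return (session_id, token), reusing the session only if both cookies match."""
    stored = sessions.get(session_id)
    if (
        stored is not None
        and token is not None
        # Constant-time comparison so timing does not leak how much of the token matched.
        and hmac.compare_digest(stored.encode(), token.encode())
    ):
        return session_id, token

    # Unknown or mismatched cookies: issue a fresh session.
    new_session_id = secrets.token_hex(16)
    new_token = secrets.token_hex(32)  # 32 random bytes -> 64 hex characters
    sessions[new_session_id] = new_token
    return new_session_id, new_token

sid, tok = authenticate()                  # first visit: no cookies yet
same = authenticate(sid, tok)              # matching cookies: same session
other = authenticate(sid, "not-the-token") # mismatch: new session issued
```

A real implementation would also record the page hit, expire old sessions, and keep this state in the security database rather than in process memory.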

Once the UI has the code, the UI can act on behalf of the user, and the service database accepts this token and provides direct permissions to the user based on the token. Our application provides the username as well, to match a unique user within the service database and provide extended permissions. Since we also use per-transaction random keys, the state cannot be cached.

This model supports isolation (UI developers are unable to eavesdrop), allows central security with a delegation aspect, and makes it possible to provide centralized services, such as a central forum, centrally stored trending (statistically compiled) data (which may be restricted to paying clients), or even weblogs, which we provide to each of our service databases to help gauge whether a checkout request is fraudulent.

    SERVERS TABLE Minimum of id (unique key), domain, adminuser
                |
                | -------> OPTIONAL Access Control List
                |
    SERVER USER CONTROL LIST Minimum of id (unique key), serverid, actual url you wish to protect. 
                OPTIONAL restrictions on AdminOnly, NoSearchEngines, Restricted (Authentication Required)
                |
                |
                | --------> OPTIONAL ENTITY (Sub-Company) Abstraction
                |
                | --------> OPTIONAL GROUPS
                |
                |
    USERS   Minimum of id (unique key), username, serverid (unless you use the entity container which are already tied to the server)
                |
                |
                |
    PERMISSIONS Minimum of id (unique key), server_ucl_id, groupid/entityid/userid (Depending on preference)

    SESSIONS    Minimum of id (unique key), serverid, userid if authenticated

    Provide the sessionid and either userid or username to tamper-resistant proxy code that will provide it to the end-user UI.

We have a lot of bells and whistles in our app to provide a more robust multi-tenant solution, but this is the basics of what we are doing.

The objective is to totally isolate each company into its own container. Proxy code of some sort should help. C++ is not required; you just need code to sit in front of the web application. Also, if you want to provide security in one database and shared services in another database, your shared database, which presumably will also be under your client's control, would be allowed to query the security database directly to see if the user is logged in based on the tokens provided to it, such as when the user posts.

No error messages are shown to humans in this model. If anything, errors would be pushed via some type of message queue, email, or log files. The authentication layer is between the proxy code and the security database. User management is within an application that you create for the end-user companies.

This model also scales well with SQL Azure in the middle if needed, as it does not need anything like CLR, FT, etc. For the security enthusiast, the only true weak point will always be the proxy code. Physical security and limited user access should lock that down.

Hope I wasn't too verbose and that this helps!
