Attribute and Value tables on SQL – Is this a good practice


I'm not a SQL expert and I'd like to ask for the best practices on the following schema.

On my current project, we're using Attribute and AttributeValue tables quite often. I'm thinking this might not be a good practice since it involves joins when we need that data. Example:

  • Given the Person table, there are two more tables that basically contain "extra" data about a person: PersonAttribute and PersonAttributeValue.

I don't have a strong argument to prevent this practice on my team, but I'd like to hear what SQL experts have to say.

Best Answer

When you start using a relational database as a key/value store, that's normally a good indicator that you are dealing with schemaless or semi-structured data. SQL databases aren't best suited for that task, and there are alternatives such as NoSQL databases that can offer real advantages over traditional relational databases for this kind of data, such as simplicity of design and performance gains.

Note: Schemaless data isn't the only reason that makes NoSQL databases attractive, for instance they also greatly simplify the tedious Object-relational mapping problem.

It's quite common these days to use multiple different storage technologies and the reason is simple: it's better to use the right tool to solve a problem elegantly than trying to adapt one to solve them all.

However, sometimes choosing your persistence layer isn't an option, for various reasons, and you will have to find a way to deal with the problem with the tools you have.

The first thing I would do in your case is check whether the data you are trying to store really needs that level of schema flexibility, and why.

Here are a few valid use cases (in my opinion):

  1. You need to allow the creation of new entities at runtime and be able to define and change their schema dynamically.

  2. You need to store lots of entities of the same kind where each of them could have arbitrary properties (think of various activities in an activity feed). I emphasize lots because otherwise you can generally solve the problem using single table inheritance, concrete table inheritance or class table inheritance, as long as it doesn't lead to an explosion of tables.

Now, both of these could probably still be solved using a standard relational approach, but the solution would feel wrong. For instance, to solve #1, nothing prevents you from using a solution where tables are dynamically created and altered, and for #2, nothing prevents you from manually managing an explosion of tables, but I think there are more maintainable alternatives.
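As a rough illustration of the standard approach mentioned for #2, here is a minimal single table inheritance sketch. The `activity` table, its columns and the two activity types are hypothetical, and Python's built-in `sqlite3` module is used only so the example runs on its own; it is not the asker's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one table holds every kind of activity. The "type"
# column discriminates, and subtype-specific columns stay NULL for the
# rows they do not apply to.
conn.execute("""
    CREATE TABLE activity (
        id           INTEGER PRIMARY KEY,
        type         TEXT NOT NULL CHECK (type IN ('comment', 'upload')),
        actor_id     INTEGER NOT NULL,
        comment_text TEXT,   -- only meaningful when type = 'comment'
        file_name    TEXT    -- only meaningful when type = 'upload'
    )
""")

conn.executemany(
    "INSERT INTO activity (type, actor_id, comment_text, file_name) "
    "VALUES (?, ?, ?, ?)",
    [("comment", 1, "Nice work", None), ("upload", 2, None, "report.pdf")],
)


def activities_of_type(conn, activity_type):
    """Return (actor_id, comment_text, file_name) rows for one activity type."""
    rows = conn.execute(
        "SELECT actor_id, comment_text, file_name "
        "FROM activity WHERE type = ? ORDER BY id",
        (activity_type,),
    )
    return rows.fetchall()
```

This stays manageable with a handful of types; every new type adds more nullable columns, which is where the "lots" caveat starts to bite.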

If your reason isn't listed above, I strongly advise you to validate that you actually need this kind of flexibility by reviewing what your solution would look like if you used a standard approach. If you only need to create a few additional tables, that's generally the best path, since the alternatives I propose below all have significant drawbacks.

Practical solutions (not perfect) to store semi-structured data:

First of all, if applicable, carefully identify the properties common to all the different types of entities and create a table to hold that structure. That will be the structured part of our semi-structured data.

Then:

  1. Add a column, let it be called "data", where you will store a serialized representation of your schemaless data in your data-interchange format of choice, such as JSON or XML.

    ADVANTAGES:

    • Reads from the DB are very fast since you do not have to perform any joins.

    • You can store any kind of complex object hierarchy, as long as the serialization format supports it.

    DOWNSIDES:

    • The data has to be deserialized to be useful (it's generally fast enough not to cause any performance issues).

    • Searches are more complex and would rely on full-text search (I've never used this solution when the data had to be searchable and have no idea how it would scale).

    • The data is denormalized and would have to be kept in sync, if applicable. If you are too concerned about having to sync your data since it's denormalized, an approach I've experimented with is to store computed properties instead of the property value. For instance, instead of storing the value "John Doe", you would store metadata that allows you to fetch the data dynamically at read time, such as: id, table_name, field_name. You can implement memoization to make the reads faster. However, I do not recommend this approach if you need a scalable solution.

  2. Create another table, let it be called "EntityAttribute", that will look like this (I believe that's basically what you already have):

    CREATE TABLE EntityAttribute (
        id int NOT NULL IDENTITY(1, 1) PRIMARY KEY,
        entity_id int NOT NULL,
        [key] varchar(50) NOT NULL,
        [value] sql_variant NOT NULL
    );
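To make the second option more concrete, here is a minimal sketch of reading a person together with its extra attributes. It uses Python's built-in `sqlite3` module so it runs on its own, which means SQLite syntax stands in for the SQL Server definition above: an untyped `"value"` column only roughly plays the role of `sql_variant`. The `load_person` helper and the sample attributes are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- SQLite stand-in for EntityAttribute: "value" has no declared type,
    -- so it stores whatever type it is given (assumption: close enough
    -- to sql_variant for illustration).
    CREATE TABLE EntityAttribute (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        entity_id INTEGER NOT NULL REFERENCES Person(id),
        "key"     TEXT NOT NULL,
        "value"   NOT NULL
    );
""")

conn.execute("INSERT INTO Person (id, name) VALUES (1, 'John Doe')")
conn.executemany(
    'INSERT INTO EntityAttribute (entity_id, "key", "value") VALUES (?, ?, ?)',
    [(1, "nickname", "Johnny"), (1, "shoe_size", 43)],
)


def load_person(conn, person_id):
    """Return the fixed columns plus all extra attributes as one dict."""
    rows = conn.execute(
        '''SELECT p.name, a."key", a."value"
           FROM Person AS p
           LEFT JOIN EntityAttribute AS a ON a.entity_id = p.id
           WHERE p.id = ?''',
        (person_id,),
    ).fetchall()
    if not rows:
        return None  # no such person
    person = {"name": rows[0][0]}
    for _, key, value in rows:
        if key is not None:  # LEFT JOIN yields a NULL key when there are no attributes
            person[key] = value
    return person
```

The join is the cost the question worries about: every read of the "extra" data goes through it, and the application has to pivot the rows back into one object.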

Please note that both solutions also have the disadvantage of being unable to rely on database constraints to enforce data integrity.
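A minimal sketch of the first option (the serialized "data" column), which also illustrates the integrity caveat: the database stores whatever the application serializes. JSON is assumed as the format, Python's built-in `sqlite3` and `json` modules are used only to keep the example runnable, and the `save_person` and `load_person` helpers are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Person (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,              -- common, structured part
        data TEXT NOT NULL DEFAULT '{}'  -- serialized schemaless part
    )
""")


def save_person(conn, name, extra):
    """Insert a person; `extra` is any JSON-serializable dict."""
    cur = conn.execute(
        "INSERT INTO Person (name, data) VALUES (?, ?)",
        (name, json.dumps(extra)),
    )
    return cur.lastrowid


def load_person(conn, person_id):
    """Read one row (no join) and merge the deserialized extras into it."""
    row = conn.execute(
        "SELECT name, data FROM Person WHERE id = ?", (person_id,)
    ).fetchone()
    if row is None:
        return None
    name, data = row
    return {"name": name, **json.loads(data)}


john_id = save_person(conn, "John Doe", {"nickname": "Johnny", "pets": ["cat", "dog"]})

# Nothing in the schema rejects a nonsensical value inside "data";
# validating it is entirely up to the application.
odd_id = save_person(conn, "Jane Doe", {"shoe_size": "not a number"})
```

Reads are a single-row lookup with no join, and nested values such as the `pets` list survive the round trip, but the second insert shows that no constraint guards what goes into the column.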

In conclusion, make sure to clearly identify the problem and your needs first. Check whether a solution can be efficiently implemented using the tools you have. If it cannot, check whether there are other tools you could use that would suit your needs. Also, have a look at alternative ways of solving your issue with the technologies you have. I haven't highlighted all the advantages and drawbacks of the alternative solutions, and there are probably other alternatives as well, so be sure to look up other sources before making any choice.