SQL Server – Applying user-defined fields to arbitrary entities

database-design, relational-theory, sql-server, sql-server-2008-r2

Currently we have an old (rather crude) system that has user-defined fields, which are mapped against rows in arbitrary tables. This was an after-the-fact modification based on a customer request, and it wasn't really designed to scale well. Our system has around 60 different types of entities, which makes things even more complicated. Essentially the implementation looks like this:

USER_DEFINED_FIELDS:

UDF_ID         int
ENTITY_TYPE    int
UDF_NAME       nvarchar(64)
UDF_DATA_TYPE  int
UDF_FORMAT     nvarchar(16)
UDF_MASK       nvarchar(24)
UDF_FLAGS      int

UDF_VALUES_NUMBER:

UDF_ID         int
ENTITY_ID      int
VALUE          int
MODIFIED       datetime

UDF_VALUES_TEXT:

UDF_ID         int
ENTITY_ID      int
VALUE          nvarchar(255)
MODIFIED       datetime

etc…

This gets nice and fun when we generate our own ways to index compound primary keys, but that's another DailyWTF-worthy story.

Obviously this is pretty hideous, and leads to some spectacularly horrific queries being generated, but it's worked alright for now because we limit each entity to a maximum of 5 user-defined fields. As a quick disclaimer, I wasn't with the company when this design decision was made! 😉
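
To give a flavour of the problem, just fetching an entity together with its UDFs means pivoting each field out of the relevant value table. A simplified sketch of the kind of query that gets generated (the CUSTOMERS table and the UDF_ID values are made up for illustration):

SELECT c.CUSTOMER_ID,
       c.CUSTOMER_NAME,
       txt.VALUE AS ACCOUNT_MANAGER,    -- text UDF
       num.VALUE AS CREDIT_RATING       -- numeric UDF
FROM   CUSTOMERS c
       LEFT JOIN UDF_VALUES_TEXT txt
              ON txt.ENTITY_ID = c.CUSTOMER_ID
             AND txt.UDF_ID = 101       -- 'Account Manager' row in USER_DEFINED_FIELDS
       LEFT JOIN UDF_VALUES_NUMBER num
              ON num.ENTITY_ID = c.CUSTOMER_ID
             AND num.UDF_ID = 102;      -- 'Credit Rating' row in USER_DEFINED_FIELDS
-- ...and so on: one extra LEFT JOIN per UDF, per value table.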

Anyway, we're about to start a shiny new project and will inevitably need a better way of doing this, with no restrictions on the number of UDFs we can apply to entities, increased performance, and less horror in the generated query department. If we could run a NoSQL solution like Mongo or Redis I'd be happy and wouldn't need to bother you all, but sadly that's not an option. Instead, I need a way to do this from within SQL Server 2008 R2.

So far, I've come up with the following options:

  • Individual UDF table for each entity type, with identical structures (roughly sketched after this list).
    • Benefits: Queries are faster and simpler, solves the compound index problem, can search by UDF.
    • Downsides: Table versioning is more cumbersome, lots of extra tables, difficult to index across similar entity types (e.g. purchase orders and sales orders)
  • Binary blob UDF column in every entity's table.
    • Benefits: Really fast: no fetch overhead, no extra JOINs.
    • Downsides: Can't search by UDF, extra proprietary code required.
  • XML column in every entity's table.
    • Benefits: Fast like binary blobs, seems to be indexable.
    • Downsides: No idea how they work – looks complicated!
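
To make the first option concrete, here's roughly what one of those per-entity-type tables could look like. All names are illustrative, and each entity type would get identically shaped tables:

-- Numeric UDF values for sales orders; matching _TEXT etc. tables would exist
-- per entity type in the same way.
CREATE TABLE SALES_ORDER_UDF_VALUES_NUMBER
(
    UDF_ID    int      NOT NULL,   -- FK to USER_DEFINED_FIELDS
    ENTITY_ID int      NOT NULL,   -- FK to SALES_ORDERS
    VALUE     int      NOT NULL,
    MODIFIED  datetime NOT NULL,
    CONSTRAINT PK_SALES_ORDER_UDF_VALUES_NUMBER
        PRIMARY KEY (ENTITY_ID, UDF_ID)   -- a real compound key, no home-grown indexing
);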

So, do any of these ideas have merit? Is there a better alternative?

Best Answer

The immediate question is: what do you mean by "user-defined" fields? Who defines these? For the purposes of this answer, I am assuming that the fields are fixed for each table type by a small number of users (administrators) who have special access.

One thing I have done in the past is to have separate UDF tables for each entity type, but instead of storing separate rows per value, to allow stored procedures to alter the columns on those UDF tables, so you can essentially join straight across, 1:1. I call this approach "semi-EAV." I also keep catalogs like your user_defined_fields table to help the app generate queries.
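
A minimal sketch of that idea for one entity type, assuming SQL Server 2008 R2 (all table, column, and procedure names here are illustrative, not from any particular implementation):

-- The extension table starts with just the key and gains one real column per
-- user-defined field, so reads join straight across 1:1.
CREATE TABLE SALES_ORDER_UDF
(
    ENTITY_ID int NOT NULL
        CONSTRAINT PK_SALES_ORDER_UDF PRIMARY KEY
        -- plus a FOREIGN KEY back to SALES_ORDERS
);
GO

-- Adding a UDF = adding a column. Only trusted admins should be able to call
-- this, and @ColumnName / @SqlType ought to be validated before use.
CREATE PROCEDURE dbo.ADD_UDF_COLUMN
    @TableName  sysname,
    @ColumnName sysname,
    @SqlType    nvarchar(128)       -- e.g. N'nvarchar(255)' or N'int'
AS
BEGIN
    DECLARE @sql nvarchar(max) =
        N'ALTER TABLE ' + QUOTENAME(@TableName) +
        N' ADD ' + QUOTENAME(@ColumnName) + N' ' + @SqlType + N' NULL;';
    EXEC sys.sp_executesql @sql;
    -- ...then record the new field in a catalog table (like USER_DEFINED_FIELDS)
    -- so the application knows which columns exist and how to build its queries.
END;
GO

-- A read is then a plain 1:1 join:
-- SELECT so.*, u.DELIVERY_NOTES, u.PRIORITY
-- FROM   SALES_ORDERS so
--        LEFT JOIN SALES_ORDER_UDF u ON u.ENTITY_ID = so.SALES_ORDER_ID;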

Of the methods you are discussing, I would suggest using XML if my semi-EAV approach is not sufficient. The tradeoff here is that semi-EAV allows you to do more rigorous schema validation using the tools you are familiar with in SQL. The advantage of XML is that you can have a great deal of additional flexibility that this approach doesn't allow.
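
If you do go the XML route, it is less exotic than it looks. A rough sketch (illustrative names again; this assumes SALES_ORDERS already has a clustered primary key, which the XML index requires):

-- Store all UDFs for a row in one xml column. It can optionally be typed
-- against an XML SCHEMA COLLECTION if you want stricter validation.
ALTER TABLE SALES_ORDERS ADD UDF_DATA xml NULL;

-- A primary XML index makes the XQuery methods below considerably cheaper.
CREATE PRIMARY XML INDEX PXML_SALES_ORDERS_UDF_DATA
    ON SALES_ORDERS (UDF_DATA);

-- With values stored as <udfs><udf name="Priority">High</udf></udfs>,
-- you can both search by a UDF and pull it out as a column:
SELECT SALES_ORDER_ID,
       UDF_DATA.value('(/udfs/udf[@name="Priority"])[1]', 'nvarchar(255)') AS PRIORITY
FROM   SALES_ORDERS
WHERE  UDF_DATA.exist('/udfs/udf[@name="Priority"][. = "High"]') = 1;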

As a note, with LedgerSMB, we are talking about moving from semi-EAV for custom values to JSON, but we are on PostgreSQL. XML might have been an option as well.