The best database design to keep track of fields with multiple values

database-design

I am trying to dynamically build a form with a label (field) and a user entered value (fieldValue) but the form can have several different control, like a combobox, textbox, listbox, checkbox, etc….

Here is my schema right now

Page
* PageId

Field
* FieldId
* FiledTypeId
* FieldName

FieldType
* FieldTypeId
* FieldTypeName (Combobox, Textbox, etc…)

FieldValue
* FieldValueId
* FieldId
* PageId
* FieldValueValue
* CurrentValue (This is a bit because a combobox could have multiple values but I wanna show which one is selected. Similarly, a listbox could have multiple values selected, a textbox would not have any values selected.)

The problem with my design is that every time I have a form using the same fields, I have to add all of the items in the combobox again as it is a new field, specific to a new page AND when I create a new page how will it know that "these" values belong to a combobox, for example. I've thought of having a 0 to many table to keep track of comboboxes and things with multiple values and then my field value table will only hold one value per field and the CurrentValue column will be gone but I'm not 100% sure if this is correct.

Any suggestions?

Best Answer

Your new design constraint is such that there's a many-to-many relationship between pages and fields, and a many-to-many relationship between page-fields and field values. That means you'd have to do this:

Page
* Page_Id

Field
* Field_Id
* Field_Type_Id
* Field_Name

Page_Field
* Page_Field_ID
* Page_Id
* Field_Id

Page_Field_FieldValue
* Page_Field_FieldValue_ID
* Page_Field_ID
* FieldValue_Id

FieldValue
* FieldValue_Id
* FieldValue_Value
* Current_Value

"That's really gross" you're thinking. Yes, it is. I'd strongly suggest creating views (indexed or materialized if possible) to make retrieval easier. It doesn't really solve your problem of having to define everything for each new page, but at least you won't be storing the same data repeatedly.

The Field_Type table doesn't change at all. It's relation to Field is the same. I did not list it in order to reduce confusion.

I'd also caution about using the FieldValue.Current_Value field as you seem to be suggesting, as that will not really work if your application allows multiple users. Application state needs to be stored separately from application data, typically.

Related Solutions

Sql-server – Table structure for many custom profiles

It always pains me to suggest it but theres a reasonable case for making use of an XML column here.

Create appropriate tables to handle commonly used fields i.e. name, address, email, age, gender etc. I'd be tempted to store the custom fieldsets as an XML column. Once the number of subscribers they are emailing reaches a level where performance may become a problem, trigger a notification to the DBA so they can review the custom fieldsets created by your users and see if there is justification for moving them out of XML columns to defined sub-type tables.

Obviously you end up with two code paths for handling XML and sub-type definitions but you retain flexibility while having a process for dealing with the potential performance implications.

If SQL2008 was an option, I'd consider making use of sparse columns for the custom fields (generic definitions, along the lines you mentioned) and combine with filtered indexes for the larger customers.

Field types database design best practice

Not discounting @gbn's suggestion (in a comment on the question) to possibly store this data somewhere other than in a RDBMS, I will say that if you do decide to go the RDBMS route, you are better off using a single table with a column per each "type" such that they can be strongly-typed. Or, you could use a single table with a single string field as all data can be converted/serialized to a string, but then you have to be careful to do the conversion properly in both directions for all operations, and that might prove problematic if some uses of that table either incorrectly convert or forget to convert altogether.

The reason for choosing a single table is mostly logistics: it is far easier to dynamically select a different column based on a COALESCE or CASE (or CASE-like) clause than it is to dynamically pull from different tables in a query based on the TypeID of the row (at least not without having to LEFT JOIN all "type" tables in all queries (yikes!).

Depending on which RDBMS you are using, there might even be vendor-specific features to assist in this model. For example, SQL Server, starting in version 2008, I believe, has a SPARSE option that can be added to NULLable columns. This allows for a NULL value to take up 0 bytes whereas normally, for fixed-length fields (i.e. pretty much all of them that aren't [N]VAR{something} or {something} VARYING or XML or generically text or SQL_VARIANT), they would always take up their specified number of bytes. The downside is that for non-NULL values, they now take up 4 extra bytes per row. But it can be a huge savings when the column is 75% or more NULL across all rows. This feature was designed specifically to address this type of model.

Just make sure to have a CHECK CONSTRAINT to ensure that for all of the value_ fields, either none, or only one, of them is NOT NULL.

Then you can access via something like:

COALESCE(value_int, value_string, value_datetime, etc.)

And again, if you are using SQL Server (and maybe one or more other vendors have this feature), you can make use of "filtered indexes" to index each specific value_ field where that value is not null:

CREATE NONCLUSTERED INDEX [IX_FieldValues_ValueInt]
  ON dbo.[FieldValues] ([ValueInt])
  WHERE [ValueInt] IS NOT NULL;

This will allow you to target rows that do have values, although probably not when using the generic COALESCE access method, but when targeting a specific Value% field.

AND, looking again at the two options, I would actually go with Option 3, which is a single table. In Option 2 it does not seem like there is much purpose in breaking field_type_id out as that data is not really a parent to the field_value data. A field_id can only have a single value, right? So just put field_type_id in the field_value table. And make use of it in the CHECK CONSTRAINT that ensures a single field at most is NOT NULL, to ensure that the one field, if any, that is NOT NULL matches the field_type_id.

Best Answer

Related Solutions

Sql-server – Table structure for many custom profiles

Field types database design best practice

Related Question