Mandatory and non mandatory attributes for products

database-design

I am trying to come up with a simple way of implementing the following design, but I always end up with something overly complicated.

I am trying to design a database where companies can store products. Each product belongs to exactly one category, and depending on the category, they share some common attributes that I want to be mandatory. They can also have different attributes that are up to the company to define.

As of now, and based on this question I asked last year, my idea is to have the following:

  • Company table to store everything related to the company (one company can have several products)
  • Product table to store everything related to the product (one product can be in one category, and can have several features)
  • Category table just to save the name of the category
  • Attribute table attributes that can be defined by the company for that product
  • Product Feature table joint table

With these tables, companies will be able to add their products to a category and add as many attributes as they want.

Now, dependant on the category, I want to make mandatory to define a set of attributes. For example, if we are talking about groceries, and a category would be 'Fruits', I might want the company to always provide the following attributes: color, country of origin. If instead of fruits, the category is 'Cereals' I might want to be always provided with the following: allergens, expire date

I have thought of two different solutions, which are:

  • Create a new table for each category that needs these mandatory attributes. This, of course, will be a problem as if a new mandatory attribute is added to an existing category, all previous ones will have no value. Also, every time I add a new category I should define a new table for the attributes.
  • The other option I have thought of is to simply manage this in the frontend of the application. Once a company adds a product belonging to one of those categories, prompt them to input the mandatory attributes and simply save them as a 'normal' attribute.

I am more inclined towards using the second solution. Using the first one I could end up with a ddbb with dozens of tables, but I would like some input to see if anyone can come up with a better solution.

Best Answer

I think further information is still required before a design can be decided; namely, whether it is important to validate the attribute data in any way. This may affect the data type you choose for storing attribute data.

For example, you will probably want to store your "expire date" attribute using a date-based data type. There are many reasons for this, but consider one: if you accept just strings in which people type information representing dates, you may get a variety of formats for those dates, eg) "1/2/20" for first of February, 2020, in Australia, "1/2/20" for second of January, 2020, in USA, "Feb 1st 2020", "1 Feb 20" and "1 February 20" - all representing the same date. Yuck.

You could mitigate that issue by introducing rules in your application logic - for example, use a date picker which always outputs dates in a standard format ... but now the robustness of your data model depends on application developers doing what you ask them to.

However, even without a decision on data types, I can propose some guidelines that have their caveats (relating to the questions of validation).

I would have a table called product. If there is data that every product has, regardless of category, then I would store it here - think name, code, activeFlag, dateCreated, createdBy, dateLastModified, lastModifiedBy, etc.

I would have a table called category. This will probably have a name field. (If you are building a category tree then you will need fields to allow you to put categories into a hierarchy. For now I will assume the category list is a straight list of categories - not a hierarchy).

I would have a table called attribute. This will have a field named code. The code is a unique, non-null value that, once set, should never be modified. This means if you need to introduce business rule logic into your application that relates to a given attribute, your application can reference the attribute by its code value - I am talking about hard-coding the data value in application code - and not have to worry about the logic breaking in future (because that data value will never change). A concrete example: Say "colour" is an attribute - let's put a field on the table called name and put the word "Colour" in that field. Now, in the code field we might put a value like "COL" (short for "colour"). In your application you might have a business rule saying that managers can update the colour value for products, but other staff cannot, so your application code will have a line of code something like if userIsManager===1 and fieldCode==='COL' then renderFieldAsEditable().

All good so far. Now I would create a table categoryAttribute which contains the PK columns from category and attribute (probably autoincrement integer values - I didn't bother mentioning them before). This table essentially says "this category has this attribute". Now I would put further fields on this table, such as mandatory - perhaps as a bit data type where 1 means this attribute is mandatory in this category, and 0 means it is optional. I might include sortOrder of some kind - so you can dynamically define the order in which these fields should be rendered by the application code - which, if you haven't guessed, should just be looping through all attributes for a given category when it generates the data entry form in the application. If you want to get really clever, maybe have a field for validationMask which will contain a regular expression - in the language of your choice, to be applied by the application at data entry time: user data fails to match the regular expression? Too bad - you can't update. Oh, then why not have validationMessage too, to show them why we can't save that bad data (like "Dates must be in the format DD/MM/YYYY").

Coming back to that question of data validation - all well and good if you can validate with a singular regular expression (and the mandatory flag), but if you need to validate on the database server prior to updating/modifying data, then consider the following additions. You might like to put another column on your attribute table called something like type or attributeType. It is a pseudo-representation of data type. It might contain string values from a fixed set that you define, like "date", "int", "string". You might then write a few database functions along the lines of validateIntAttribute, validateStringAttribute and validateDateAttribute. They might make use of native functions like SQL Server's try_cast to check that the data value received from the application (ie. received as string input) can be cast to the data type we want.

Lastly, we come to saving user-contributed data. For this we have a table for data. It relates to product and attribute. This means for a given product-attribute combination, here we are saving the data value - probably in a field like value or data. This means literally every single data value becomes a separate row on this table - regardless of which product relates to or how many attributes are saved for a given product. If you wanted to, you could actually have a few tables like this - for the different data types (I mentioned date, int and string before) - and then you could actually cast your data to the relevant data type, but splitting across tables means that when you retrieve data you will have lots of "if the attribute is of type 'int' then join to the intData table" - and that's really bad for performance and legibility - I don't recommend it.

So... the above works fine, if you are ok with the caveat of all your data being saved in a string-based type (and therefore building whatever you need to, to ensure consistency in data representation).

You might also want to think about another type of attribute: the dropdown list. I'll leave that modeling to you.

I have built all the above, and it works.

It takes some people time to wrap their heads around the notion of fields being records (rows) on a table (here, attribute) and the data values for a single "record" (like your products) being split across multiple rows (in your data table). It also takes time to think about how you're going to query your data and edit or delete it, but it can be done. The result is a very dynamic and incredibly extensible approach to storing attribute data which has a good measure of future-proofing. In fact, during development of the system I've designed, the business encountered an excellent opportunity - but needed to be able to store two new kinds of information against their records. Two new fields were defined and added to the system in less than five minutes and the application rendered them - in correct sort order, with validation rules implemented - automatically; no coding/deploy/release required.

Oh! One last comment. You mentioned you are looking for a simple solution and always come up with something overly complicated. I wouldn't call the above design simple, but what I will say is that any time you need new fields for storing data in future, you simply define them dynamically (in data) in minutes without the need for any application deployment. Secondly, the system I built was geared around contact information - I found at least seven different areas in the application where contact information had been modeled in different ways: that's seven iterations of design, code, test, deploy - each with a different data model, each with its own issues, none of them extensible. This one design will, in due course, replace all seven of those - making that information stored in a consistent way across the entire application. So that's your second trade-off - building something extensible is more work now for trivial work in future. Building something "simple" now means less work now and ongoing work - and reduced business responsiveness - in future.