Mysql – Best practice for when the ideal number of columns is unknown

database-designMySQL

I'm currently making a MySQL database to list products for an e-commerce site. Each product will have several sizes available which will each have their own dimensions and measurements (e.g. length, width, thickness) and pricing. However, different products, even within the same category, have different numbers of dimensions (some will have 2 widths at different sections, etc.) and there isn't necessarily a maximum number of measurements required to uniquely ID a product size. The options I can see are:

  • Create a table every time a product is created to account for how many size measurements it needs. Obviously a poor way to handle this.
  • Create a table with an arbitrarily large number of "measurement" columns. Leaves a lot of nulls if it's too large, and requires that I manually edit the table structure later if it's too small.
  • Create a table with two CSV varchars: one to identify pseudo-column names, the other for values. Sorting and searching by any value but the first would be largely impossible to my knowledge, but the information is all there.
  • Create two tables: one for the individual product size code (linking to the product of course), and another with some kind of "key-value" setup: one column for the product code, another for the name of the measurement (e.g. length), and a third for the value of that measurement. Necessitates more joins to get the same information as the other solutions, plus an extra column for an "index" if I want to keep sizing attributes in a particular order (or the size key could be an enum, I suppose).

The last solution seems most reasonable, but is there a better way to go about this?

Best Answer

Will you ever search by those parameters accross different products? It does not seem very probable to me.

If you do not need to manipulate those in sql, only retrieve them for any given product, then store it as xml/json/anything in one blob column (or dynamic column on MariaDB). You then can get one product with all variants and maybe "filter" them by user input in application code (there won't probably be hundreds of them).

But if you have to search/aggregate etc, then do the second table for them and use join or another query to retrieve them as needed. Join is fast, it is a core function of relational DB.