How to model when you could have many (thousands of) attributes to a table (product)

database-design

We have a main table, let's say, Product. It has several general attributes like 'Product Name', Manufacturer ID, etc. As of now, all these attributes are implemented in the data model as columns to the Product.

Now we have a new requirement where there could be thousands of attributes. Color, size, weight, dimensions, style, etc. Each of these attributes have pre-defined values. Like color can only be ('green', 'yellow', 'red'). The value set is defined for each attribute. But there could be thousands of attributes and each attribute could have thousands of pre-defined values.

So, for this new feature, we defined these attributes as 'row values' instead of 'columns'.

Something like this:


Product Table

ProductID   Name    Date_Created
1           Candy     some date
2           Shirt     some date 
3           Pants     some date     


Attribute Table

AttributeID   Name
1             Color
2             Dimension
3             Weight


AttributeValue Table

AttributeValueID   AttributeID    Value
1                  1              Green
2                  1              Yellow
3                  1              Red
4                  2              11x11
5                  2              12x12
6                  2              13x13
7                  3              1 lb
8                  3              2 lb
9                  3              3 lb

Now to create association between Product Table and Attribute table, we have an association table called 'ProductAttributes'


ProductAttributes Table

ProductAttributeId      ProductID      AttributeID      AttributeValueID
1                           1              1                   1
2                           1              2                   4
3                           1              3                   9
4                           2              1                   2
5                           2              3                   8
6                           3              3                   8

Does this model make sense? Could you suggest any improvements? As of now this ProductAttributes Table is expected to grow to 200 Million as there are a more than 100K attributes.

Other points:
1. A product could have more than one attribute
2. A product could have more than one of the same attribute. A shirt is both 'brown' and 'green'
3. All Attributes and its Values have to be pre-defined (loaded from text file)
4. Once the data is loaded (once every few months), there is heavy SELECTing. So the model needs to be read-optimized.

Best Answer

What's the question? Does this make sense? Yea, sure.

1. A product could have more than one attribute Fine
2. A product could have more than one of the same attribute. A shirt is both 'brown' and 'green' Um. If you say so. Sounds a bit strange but that's up to you, I guess.
3. All Attributes and its Values have to be pre-defined (loaded from text file) Again, sounds a bit strange that numerical fields like weight and length have to have pre-defined values. I personally go for a table of pre-defined values and storing non-predefined values separately, but it makes the structure a LOT more complex. Up to you if you want (or need) to do it this way, really.
4. Once the data is loaded (once every few months), there is heavy SELECTing. So the model needs to be read-optimized. If it's properly indexed then the fact that you've got a squillion rows shouldn't impact on performance so much.

Example query to get the product name and attributes for a product:

SELECT
Product.ProductID, Product.Name,
Attribute.AttributeID, Attribute.Name,
ProductAttributes.AttributeValueID,
AttributeValue.AttributeValueID, AttributeValue.Value,

FROM
Product
JOIN ProductAttributes ON Product.ProductID = ProductAttributes.productID
JOIN AttributeValue ON ProductAttributes.AttributeValueID = AttributeValue.AttributeValueID
JOIN Attribute ON AttributeValue.AttributeID = Attribute.AttributeID

WHERE Product.ProductID = 1234

Index-wise, make sure you've got the usual primary key indexes. You'll also need an index on ProductAttributes.productID. You'll need an index on Attribute.name if you're planning on referring to this field in a query condition (eg get products which have a value for attribute 'weight')

You don't really need the AttributeID in the ProductAttributes table, as you've got the AttributeValueID, whose table then links to ProductAttributes. But having it there could conceivably act as a short cut if you were frequently getting a list of Attributes for a product without their values:

SELECT
Product.ProductID, Product.Name,
Attribute.AttributeID, Attribute.Name
FROM
Product
JOIN ProductAttributes ON Product.ProductID = ProductAttributes.productID
--THIS JOIN IS NOW NOT REQUIRED...  JOIN AttributeValue ON ProductAttributes.AttributeValueID = AttributeValue.AttributeValueID
--THIS JOIN CAN BE REPLACED BY THE ONE BELOW...  JOIN Attribute ON AttributeValue.AttributeID = Attribute.AttributeID
JOIN Attribute on ProductAttributes.AttributeID = Attribute.AttributeID
WHERE Product.ProductID = 1234

@Colin'tHart - you may be right about it not being a good design, but you'll need to explain why for your argument to have any weight.