We have a main table, let's say, Product. It has several general attributes like 'Product Name', Manufacturer ID, etc. As of now, all these attributes are implemented in the data model as columns to the Product.
Now we have a new requirement where there could be thousands of attributes. Color, size, weight, dimensions, style, etc. Each of these attributes have pre-defined values. Like color can only be ('green', 'yellow', 'red'). The value set is defined for each attribute. But there could be thousands of attributes and each attribute could have thousands of pre-defined values.
So, for this new feature, we defined these attributes as 'row values' instead of 'columns'.
Something like this:
Product Table
ProductID Name Date_Created 1 Candy some date 2 Shirt some date 3 Pants some date
Attribute Table
AttributeID Name 1 Color 2 Dimension 3 Weight
AttributeValue Table
AttributeValueID AttributeID Value 1 1 Green 2 1 Yellow 3 1 Red 4 2 11x11 5 2 12x12 6 2 13x13 7 3 1 lb 8 3 2 lb 9 3 3 lb
Now to create association between Product Table and Attribute table, we have an association table called 'ProductAttributes'
ProductAttributes Table
ProductAttributeId ProductID AttributeID AttributeValueID 1 1 1 1 2 1 2 4 3 1 3 9 4 2 1 2 5 2 3 8 6 3 3 8
Does this model make sense? Could you suggest any improvements? As of now this ProductAttributes Table is expected to grow to 200 Million as there are a more than 100K attributes.
Other points:
1. A product could have more than one attribute
2. A product could have more than one of the same attribute. A shirt is both 'brown' and 'green'
3. All Attributes and its Values have to be pre-defined (loaded from text file)
4. Once the data is loaded (once every few months), there is heavy SELECTing. So the model needs to be read-optimized.
Best Answer
What's the question? Does this make sense? Yea, sure.
1. A product could have more than one attribute Fine
2. A product could have more than one of the same attribute. A shirt is both 'brown' and 'green' Um. If you say so. Sounds a bit strange but that's up to you, I guess.
3. All Attributes and its Values have to be pre-defined (loaded from text file) Again, sounds a bit strange that numerical fields like weight and length have to have pre-defined values. I personally go for a table of pre-defined values and storing non-predefined values separately, but it makes the structure a LOT more complex. Up to you if you want (or need) to do it this way, really.
4. Once the data is loaded (once every few months), there is heavy SELECTing. So the model needs to be read-optimized. If it's properly indexed then the fact that you've got a squillion rows shouldn't impact on performance so much.
Example query to get the product name and attributes for a product:
Index-wise, make sure you've got the usual primary key indexes. You'll also need an index on ProductAttributes.productID. You'll need an index on Attribute.name if you're planning on referring to this field in a query condition (eg get products which have a value for attribute 'weight')
You don't really need the AttributeID in the ProductAttributes table, as you've got the AttributeValueID, whose table then links to ProductAttributes. But having it there could conceivably act as a short cut if you were frequently getting a list of Attributes for a product without their values:
@Colin'tHart - you may be right about it not being a good design, but you'll need to explain why for your argument to have any weight.