How to model a table (Product) that could have many (thousands of) attributes


We have a main table, let's say Product. It has several general attributes such as 'Product Name', 'Manufacturer ID', etc. As of now, all these attributes are implemented in the data model as columns of the Product table.

Now we have a new requirement where there could be thousands of attributes: color, size, weight, dimensions, style, etc. Each of these attributes has pre-defined values; for example, color can only be one of ('green', 'yellow', 'red'). The value set is defined per attribute, but there could be thousands of attributes, and each attribute could have thousands of pre-defined values.

So, for this new feature, we defined these attributes as 'row values' instead of 'columns'.

Something like this:

Product Table

ProductID   Name    Date_Created
1           Candy     some date
2           Shirt     some date 
3           Pants     some date

Attribute Table

AttributeID   Name
1             Color
2             Dimension
3             Weight

AttributeValue Table

AttributeValueID   AttributeID    Value
1                  1              Green
2                  1              Yellow
3                  1              Red
4                  2              11x11
5                  2              12x12
6                  2              13x13
7                  3              1 lb
8                  3              2 lb
9                  3              3 lb

To associate the Product table with the Attribute table, we have an association table called 'ProductAttributes':

ProductAttributes Table

ProductAttributeId      ProductID      AttributeID      AttributeValueID
1                           1              1                   1
2                           1              2                   4
3                           1              3                   9
4                           2              1                   2
5                           2              3                   8
6                           3              3                   8

Does this model make sense? Could you suggest any improvements? As of now, the ProductAttributes table is expected to grow to 200 million rows, as there are more than 100K attributes.

Other points:
1. A product could have more than one attribute
2. A product could have more than one of the same attribute. A shirt is both 'brown' and 'green'
3. All Attributes and their Values have to be pre-defined (loaded from a text file)
4. Once the data is loaded (once every few months), there is heavy SELECTing. So the model needs to be read-optimized.
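A minimal sketch of the four tables above, assuming SQLite (through Python's built-in sqlite3 module) as a stand-in for whatever database is actually used. The constraints are additions, not part of the original design. The foreign keys let the database itself enforce point 3 (only pre-defined values). The UNIQUE constraint on (ProductID, AttributeValueID) still allows point 2 (several values of one attribute) while rejecting the same value twice. The extra row giving the Shirt a second color and the helper name values_of are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for the real RDBMS; the constraints are additions.
SCHEMA = """
CREATE TABLE Product (
    ProductID    INTEGER PRIMARY KEY,
    Name         TEXT NOT NULL,
    Date_Created TEXT
);
CREATE TABLE Attribute (
    AttributeID INTEGER PRIMARY KEY,
    Name        TEXT NOT NULL UNIQUE
);
CREATE TABLE AttributeValue (
    AttributeValueID INTEGER PRIMARY KEY,
    AttributeID      INTEGER NOT NULL REFERENCES Attribute (AttributeID),
    Value            TEXT NOT NULL,
    UNIQUE (AttributeID, Value)           -- each value listed once per attribute
);
CREATE TABLE ProductAttributes (
    ProductAttributeId INTEGER PRIMARY KEY,
    ProductID          INTEGER NOT NULL REFERENCES Product (ProductID),
    AttributeID        INTEGER NOT NULL REFERENCES Attribute (AttributeID),
    AttributeValueID   INTEGER NOT NULL REFERENCES AttributeValue (AttributeValueID),
    UNIQUE (ProductID, AttributeValueID)  -- same attribute twice is fine, same value is not
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(SCHEMA)

# Sample rows from the tables above (Date_Created left NULL).
conn.executemany("INSERT INTO Product VALUES (?, ?, NULL)",
                 [(1, "Candy"), (2, "Shirt"), (3, "Pants")])
conn.executemany("INSERT INTO Attribute VALUES (?, ?)",
                 [(1, "Color"), (2, "Dimension"), (3, "Weight")])
conn.executemany("INSERT INTO AttributeValue VALUES (?, ?, ?)", [
    (1, 1, "Green"), (2, 1, "Yellow"), (3, 1, "Red"),
    (4, 2, "11x11"), (5, 2, "12x12"), (6, 2, "13x13"),
    (7, 3, "1 lb"), (8, 3, "2 lb"), (9, 3, "3 lb"),
])
conn.executemany("INSERT INTO ProductAttributes VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 1), (2, 1, 2, 4), (3, 1, 3, 9),
    (4, 2, 1, 2), (5, 2, 3, 8), (6, 3, 3, 8),
    (7, 2, 1, 3),  # hypothetical extra row: the Shirt is also Red (point 2)
])
conn.commit()

def values_of(conn, product_id, attribute_id):
    """All values a product has for one attribute, alphabetically (hypothetical helper)."""
    rows = conn.execute(
        """SELECT av.Value
           FROM ProductAttributes pa
           JOIN AttributeValue av ON av.AttributeValueID = pa.AttributeValueID
           WHERE pa.ProductID = ? AND pa.AttributeID = ?
           ORDER BY av.Value""",
        (product_id, attribute_id),
    ).fetchall()
    return [value for (value,) in rows]

print(values_of(conn, 2, 1))  # ['Red', 'Yellow']
```

An insert that points at an AttributeValueID that was never loaded fails with sqlite3.IntegrityError instead of silently creating an undefined value.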

Best Answer

What's the question? Does this make sense? Yea, sure.

1. A product could have more than one attribute. Fine.
2. A product could have more than one of the same attribute (a shirt is both 'brown' and 'green'). Um. If you say so. It sounds a bit strange, but that's up to you, I guess.
3. All Attributes and their Values have to be pre-defined (loaded from a text file). Again, it sounds a bit strange that numerical fields like weight and length have to have pre-defined values. Personally, I'd go for a table of pre-defined values and store non-predefined values separately, but that makes the structure a LOT more complex. It's up to you whether you want (or need) to do it this way, really.
4. Once the data is loaded (once every few months), there is heavy SELECTing, so the model needs to be read-optimized. If it's properly indexed, the fact that you've got a squillion rows shouldn't impact performance that much.
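For point 4, a typical read against this model is "which products have all of these values?", for example a shirt that is two colors at once (point 2). One common way to express that is to filter on the value IDs, group by product, and count. The sketch below assumes SQLite via Python's sqlite3 and uses the question's ProductAttributes sample rows plus one hypothetical extra row (the Shirt is also Red). The helper name products_with_all_values and the index name are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProductAttributes (
    ProductAttributeId INTEGER PRIMARY KEY,
    ProductID          INTEGER NOT NULL,
    AttributeID        INTEGER NOT NULL,
    AttributeValueID   INTEGER NOT NULL
);
-- Sample rows from the question, plus row 7 (hypothetical): the Shirt is also Red.
INSERT INTO ProductAttributes VALUES
  (1, 1, 1, 1), (2, 1, 2, 4), (3, 1, 3, 9),
  (4, 2, 1, 2), (5, 2, 3, 8), (6, 3, 3, 8),
  (7, 2, 1, 3);
-- Covers the filter below: both columns it reads are in the index.
CREATE INDEX ix_pa_value_product ON ProductAttributes (AttributeValueID, ProductID);
""")

def products_with_all_values(conn, value_ids):
    """ProductIDs that have every AttributeValueID in value_ids (hypothetical helper)."""
    ids = sorted(set(value_ids))
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)
    sql = f"""
        SELECT ProductID
        FROM ProductAttributes
        WHERE AttributeValueID IN ({placeholders})
        GROUP BY ProductID
        HAVING COUNT(DISTINCT AttributeValueID) = ?
        ORDER BY ProductID
    """
    rows = conn.execute(sql, (*ids, len(ids))).fetchall()
    return [product_id for (product_id,) in rows]

print(products_with_all_values(conn, [2, 3]))  # [2]  (Yellow and Red)
print(products_with_all_values(conn, [8]))     # [2, 3]  (2 lb)
```

The HAVING clause keeps only products that matched every requested value, so the query stays a single indexed scan however many values are asked for.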

Example query to get the product name and attributes for a product:

SELECT
Product.ProductID, Product.Name AS ProductName,
Attribute.AttributeID, Attribute.Name AS AttributeName,
AttributeValue.AttributeValueID, AttributeValue.Value

FROM
Product
JOIN ProductAttributes ON Product.ProductID = ProductAttributes.ProductID
JOIN AttributeValue ON ProductAttributes.AttributeValueID = AttributeValue.AttributeValueID
JOIN Attribute ON AttributeValue.AttributeID = Attribute.AttributeID

WHERE Product.ProductID = 1234
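To check the shape of the result, here is the example query run against the question's sample rows, assuming SQLite via Python's sqlite3. It differs from the query above in three small ways: ProductID 1 instead of 1234 (the sample data only has IDs 1 to 3), a bound parameter instead of a literal, and an ORDER BY added for stable output. The helper name product_attributes is hypothetical.

```python
import sqlite3

# Trimmed copy of the question's tables and sample rows (no constraints).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Attribute (AttributeID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE AttributeValue (AttributeValueID INTEGER PRIMARY KEY,
                             AttributeID INTEGER, Value TEXT);
CREATE TABLE ProductAttributes (ProductAttributeId INTEGER PRIMARY KEY,
                                ProductID INTEGER, AttributeID INTEGER,
                                AttributeValueID INTEGER);
INSERT INTO Product VALUES (1, 'Candy'), (2, 'Shirt'), (3, 'Pants');
INSERT INTO Attribute VALUES (1, 'Color'), (2, 'Dimension'), (3, 'Weight');
INSERT INTO AttributeValue VALUES
  (1, 1, 'Green'), (2, 1, 'Yellow'), (3, 1, 'Red'),
  (4, 2, '11x11'), (5, 2, '12x12'), (6, 2, '13x13'),
  (7, 3, '1 lb'), (8, 3, '2 lb'), (9, 3, '3 lb');
INSERT INTO ProductAttributes VALUES
  (1, 1, 1, 1), (2, 1, 2, 4), (3, 1, 3, 9),
  (4, 2, 1, 2), (5, 2, 3, 8), (6, 3, 3, 8);
""")

def product_attributes(conn, product_id):
    """(product name, attribute name, value) rows for one product."""
    sql = """
        SELECT Product.Name, Attribute.Name, AttributeValue.Value
        FROM Product
        JOIN ProductAttributes ON Product.ProductID = ProductAttributes.ProductID
        JOIN AttributeValue ON ProductAttributes.AttributeValueID = AttributeValue.AttributeValueID
        JOIN Attribute ON AttributeValue.AttributeID = Attribute.AttributeID
        WHERE Product.ProductID = ?
        ORDER BY Attribute.AttributeID
    """
    return conn.execute(sql, (product_id,)).fetchall()

for row in product_attributes(conn, 1):
    print(row)
# ('Candy', 'Color', 'Green')
# ('Candy', 'Dimension', '11x11')
# ('Candy', 'Weight', '3 lb')
```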

Index-wise, make sure you've got the usual primary key indexes. You'll also need an index on ProductAttributes.ProductID, and an index on Attribute.Name if you're planning to refer to that field in a query condition (e.g. get the products which have a value for the attribute 'weight').
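The indexes described above could look like this in SQLite (an assumption; the DDL syntax is similar in most databases). The index names are hypothetical, and the index on ProductAttributes.AttributeID is an extra beyond what the answer mentions, added because the "which products have a Weight" query below goes from Attribute into ProductAttributes through that column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Attribute (AttributeID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE ProductAttributes (
    ProductAttributeId INTEGER PRIMARY KEY,
    ProductID INTEGER NOT NULL, AttributeID INTEGER NOT NULL,
    AttributeValueID INTEGER NOT NULL
);
INSERT INTO Product VALUES (1, 'Candy'), (2, 'Shirt'), (3, 'Pants');
INSERT INTO Attribute VALUES (1, 'Color'), (2, 'Dimension'), (3, 'Weight');
INSERT INTO ProductAttributes VALUES
  (1, 1, 1, 1), (2, 1, 2, 4), (3, 1, 3, 9),
  (4, 2, 1, 2), (5, 2, 3, 8), (6, 3, 3, 8);

-- INTEGER PRIMARY KEY columns are the rowid, so they are already indexed.
-- Lookups by product (the example query above):
CREATE INDEX ix_pa_product ON ProductAttributes (ProductID);
-- Conditions on the attribute name:
CREATE INDEX ix_attribute_name ON Attribute (Name);
-- Extra (not in the answer): reaching ProductAttributes from an attribute.
CREATE INDEX ix_pa_attribute ON ProductAttributes (AttributeID);
""")

def products_with_attribute(conn, attribute_name):
    """Names of products that have any value for the named attribute."""
    rows = conn.execute(
        """SELECT DISTINCT Product.Name
           FROM Attribute
           JOIN ProductAttributes ON ProductAttributes.AttributeID = Attribute.AttributeID
           JOIN Product ON Product.ProductID = ProductAttributes.ProductID
           WHERE Attribute.Name = ?
           ORDER BY Product.Name""",
        (attribute_name,),
    ).fetchall()
    return [name for (name,) in rows]

print(products_with_attribute(conn, "Dimension"))  # ['Candy']
print(products_with_attribute(conn, "Weight"))     # ['Candy', 'Pants', 'Shirt']
```

Running EXPLAIN QUERY PLAN in front of the SELECT is a quick way to see whether SQLite actually picks these indexes on your own data.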

You don't really need the AttributeID in the ProductAttributes table, as you've got the AttributeValueID, and the AttributeValue table already links each value to its Attribute. But having it there could act as a shortcut if you frequently get a list of Attributes for a product without their values:

SELECT
Product.ProductID, Product.Name,
Attribute.AttributeID, Attribute.Name
FROM
Product
JOIN ProductAttributes ON Product.ProductID = ProductAttributes.productID
--THIS JOIN IS NOW NOT REQUIRED...  JOIN AttributeValue ON ProductAttributes.AttributeValueID = AttributeValue.AttributeValueID
--THIS JOIN CAN BE REPLACED BY THE ONE BELOW...  JOIN Attribute ON AttributeValue.AttributeID = Attribute.AttributeID
JOIN Attribute on ProductAttributes.AttributeID = Attribute.AttributeID
WHERE Product.ProductID = 1234
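If you keep AttributeID in ProductAttributes as a shortcut, it is a copy of AttributeValue.AttributeID and can drift out of sync. One standard way to rule that out (an addition, not part of the answer) is a composite foreign key on (AttributeValueID, AttributeID) pointing at a matching UNIQUE constraint in AttributeValue. The sketch assumes SQLite via Python's sqlite3; the Product table is left out for brevity, and the helper name link is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Attribute (AttributeID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE AttributeValue (
    AttributeValueID INTEGER PRIMARY KEY,
    AttributeID      INTEGER NOT NULL REFERENCES Attribute (AttributeID),
    Value            TEXT NOT NULL,
    UNIQUE (AttributeValueID, AttributeID)   -- target for the composite FK below
);
CREATE TABLE ProductAttributes (
    ProductAttributeId INTEGER PRIMARY KEY,
    ProductID          INTEGER NOT NULL,
    AttributeID        INTEGER NOT NULL,
    AttributeValueID   INTEGER NOT NULL,
    -- The copied AttributeID must be the attribute this value belongs to.
    FOREIGN KEY (AttributeValueID, AttributeID)
        REFERENCES AttributeValue (AttributeValueID, AttributeID)
);
INSERT INTO Attribute VALUES (1, 'Color'), (2, 'Dimension');
INSERT INTO AttributeValue VALUES (1, 1, 'Green'), (4, 2, '11x11');
""")

def link(conn, row_id, product_id, attribute_id, value_id):
    """Insert one ProductAttributes row; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO ProductAttributes VALUES (?, ?, ?, ?)",
                         (row_id, product_id, attribute_id, value_id))
        return True
    except sqlite3.IntegrityError:
        return False

print(link(conn, 1, 1, 1, 1))  # True: Green really is a Color
print(link(conn, 2, 1, 2, 1))  # False: Green is not a Dimension
```

The cost is one extra unique index on AttributeValue, which is small next to a 200-million-row ProductAttributes table.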

@Colin'tHart - you may be right about it not being a good design, but you'll need to explain why for your argument to have any weight.

Related Solutions

Database schema for a product with multiple categories and hierarchical categories

What you are proposing is a good solution for your requirements: an M:N relationship between products and categories, plus hierarchical categories.

To avoid exposing yourself to numerous updates in your intersection table, you need to do two things.

First, you need to be sure that your categories have a stable, persistent primary key.

Second, you need to link food items to leaf categories. Don't join Cherry to Red, Healthy, Fruit and Food; just join it to Red and Healthy. Your nested sets take care of all of the secondary (and higher-level) associations.
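Here is a sketch of how nested sets recover the higher-level associations from leaf-only links, assuming SQLite via Python's sqlite3. The category tree (Food > Fruit > Red and Healthy, plus Food > Vegetable), the Carrot item, and all table, column, and helper names are hypothetical; only Cherry, Red, Healthy, Fruit and Food come from the answer. A category's descendants are exactly the rows whose lft falls inside its (lft, rgt) interval, so "everything under Fruit" and "every category of Cherry" are both range joins, and the intersection table never has to change when the tree above a leaf does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (
    id   INTEGER PRIMARY KEY,   -- stable key: the intersection table points at it
    name TEXT NOT NULL,
    lft  INTEGER NOT NULL,      -- nested-set bounds: a node's descendants have
    rgt  INTEGER NOT NULL       -- lft/rgt strictly inside its own interval
);
CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE item_category (
    item_id     INTEGER NOT NULL REFERENCES item (id),
    category_id INTEGER NOT NULL REFERENCES category (id),
    PRIMARY KEY (item_id, category_id)
);
-- Hypothetical tree: Food > Fruit > {Red, Healthy}; Food > Vegetable
INSERT INTO category VALUES
  (1, 'Food', 1, 10), (2, 'Fruit', 2, 7), (3, 'Red', 3, 4),
  (4, 'Healthy', 5, 6), (5, 'Vegetable', 8, 9);
INSERT INTO item VALUES (1, 'Cherry'), (2, 'Carrot');
-- Items are linked to leaf categories only.
INSERT INTO item_category VALUES (1, 3), (1, 4), (2, 5);
""")

def items_in(conn, category_name):
    """Items filed anywhere under category_name (including the category itself)."""
    rows = conn.execute(
        """SELECT DISTINCT i.name
           FROM category anc
           JOIN category leaf ON leaf.lft BETWEEN anc.lft AND anc.rgt
           JOIN item_category ic ON ic.category_id = leaf.id
           JOIN item i ON i.id = ic.item_id
           WHERE anc.name = ?
           ORDER BY i.name""",
        (category_name,),
    ).fetchall()
    return [name for (name,) in rows]

def categories_of(conn, item_name):
    """Every category an item belongs to, derived from its leaf links, top-down."""
    rows = conn.execute(
        """SELECT DISTINCT anc.lft, anc.name
           FROM item i
           JOIN item_category ic ON ic.item_id = i.id
           JOIN category leaf ON leaf.id = ic.category_id
           JOIN category anc ON leaf.lft BETWEEN anc.lft AND anc.rgt
           WHERE i.name = ?
           ORDER BY anc.lft""",
        (item_name,),
    ).fetchall()
    return [name for (_, name) in rows]

print(items_in(conn, "Fruit"))        # ['Cherry']
print(categories_of(conn, "Cherry"))  # ['Food', 'Fruit', 'Red', 'Healthy']
```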

Product Attribute List Design Pattern in MySQL

I personally would use a model similar to the following:

The product table would be pretty basic, holding your main product details:

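create table product
(
  part_number int primary key,
  name varchar(10),
  price int  -- whole units only: 95.99 below is stored as 96, as the output shows; use decimal(10,2) to keep cents
);
insert into product values
(1, 'product1', 50),
(2, 'product2', 95.99);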

Second, the attribute table stores each of the different attributes, one row per name/value pair:

create table attribute
(
  attributeid int primary key,
  attribute_name varchar(10),
  attribute_value varchar(50)
);
insert into attribute values
(1, 'color', 'red'),
(2, 'color', 'blue'),
(3, 'material', 'chrome'),
(4, 'material', 'plastic'),
(5, 'color', 'yellow'),
(6, 'size', 'x-large');

Finally, create the product_attribute table as the join table between each product and the attributes associated with it:

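create table product_attribute
(
  part_number int,
  attributeid int,
  foreign key (part_number) references product (part_number),
  foreign key (attributeid) references attribute (attributeid)
);
insert into product_attribute values
(1,  1),
(1,  3),
(2,  6),
(2,  2),
(2,  6);  -- (2, 6) twice: why x-large appears twice below; primary key (part_number, attributeid) would reject it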

Depending on how you want to use the data, you are looking at two joins. The simplest form returns one row per product attribute:

select *
from product p
left join product_attribute t
  on p.part_number = t.part_number
left join attribute a
  on t.attributeid = a.attributeid;

See SQL Fiddle with Demo. This returns data in the format:

PART_NUMBER | NAME       | PRICE | ATTRIBUTEID | ATTRIBUTE_NAME | ATTRIBUTE_VALUE
___________________________________________________________________________
1           | product1   | 50    | 1           | color          | red
1           | product1   | 50    | 3           | material       | chrome
2           | product2   | 96    | 6           | size           | x-large
2           | product2   | 96    | 2           | color          | blue
2           | product2   | 96    | 6           | size           | x-large

But if you want to return the data in a PIVOT format, with one row per product and the attributes as columns, you can wrap a conditional expression in an aggregate (MySQL's IF() is used below; a standard CASE expression works the same way):

SELECT p.part_number,
  p.name,
  p.price,
  MAX(IF(a.ATTRIBUTE_NAME = 'color', a.ATTRIBUTE_VALUE, null)) as color,
  MAX(IF(a.ATTRIBUTE_NAME = 'material', a.ATTRIBUTE_VALUE, null)) as material,
  MAX(IF(a.ATTRIBUTE_NAME = 'size', a.ATTRIBUTE_VALUE, null)) as size
from product p
left join product_attribute t
  on p.part_number = t.part_number
left join attribute a
  on t.attributeid = a.attributeid
group by p.part_number, p.name, p.price;

See SQL Fiddle with Demo. Data is returned in the format:

PART_NUMBER | NAME       | PRICE | COLOR | MATERIAL | SIZE
_________________________________________________________________
1           | product1   | 50    | red   | chrome   | null
2           | product2   | 96    | blue  | null     | x-large
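The same pivot written with standard CASE expressions instead of MySQL's IF(), run in SQLite via Python's sqlite3 as an assumed stand-in for MySQL. The sample rows are the answer's, minus the price column and the duplicate (2, 6) link, neither of which changes the pivoted values. Each CASE returns the value only on rows for its attribute and NULL elsewhere, and MAX (which ignores NULLs) collapses each product's rows to one. If a product had two values for the same attribute, only the largest would survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (part_number INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE attribute (attributeid INTEGER PRIMARY KEY,
                        attribute_name TEXT, attribute_value TEXT);
CREATE TABLE product_attribute (part_number INTEGER, attributeid INTEGER);
INSERT INTO product VALUES (1, 'product1'), (2, 'product2');
INSERT INTO attribute VALUES
  (1, 'color', 'red'), (2, 'color', 'blue'), (3, 'material', 'chrome'),
  (4, 'material', 'plastic'), (5, 'color', 'yellow'), (6, 'size', 'x-large');
INSERT INTO product_attribute VALUES (1, 1), (1, 3), (2, 6), (2, 2);
""")

PIVOT_SQL = """
SELECT p.part_number,
       p.name,
       -- CASE yields NULL on other attributes' rows; MAX ignores NULLs
       MAX(CASE WHEN a.attribute_name = 'color'    THEN a.attribute_value END) AS color,
       MAX(CASE WHEN a.attribute_name = 'material' THEN a.attribute_value END) AS material,
       MAX(CASE WHEN a.attribute_name = 'size'     THEN a.attribute_value END) AS size
FROM product p
LEFT JOIN product_attribute t ON p.part_number = t.part_number
LEFT JOIN attribute a ON t.attributeid = a.attributeid
GROUP BY p.part_number, p.name
ORDER BY p.part_number
"""

for row in conn.execute(PIVOT_SQL):
    print(row)
# (1, 'product1', 'red', 'chrome', None)
# (2, 'product2', 'blue', None, 'x-large')
```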

As you can see, this format might suit you better, but if you have an unknown number of attributes, hard-coding the attribute names quickly becomes untenable. In MySQL you can instead use a prepared statement to build the pivot dynamically. Your code would be as follows (See SQL Fiddle With Demo):

SET SESSION group_concat_max_len = 1000000;  -- the 1024-byte default silently truncates long column lists
SET @sql = NULL;
SELECT
  GROUP_CONCAT(DISTINCT
    CONCAT(
      'MAX(IF(a.attribute_name = ''',
      attribute_name,
      ''', a.attribute_value, NULL)) AS `',
      attribute_name, '`'  -- backticks allow names with spaces or reserved words
    )
  ) INTO @sql
FROM attribute;

SET @sql = CONCAT('SELECT p.part_number
                    , p.name
                    , p.price
                    , ', @sql, ' 
                   from product p
                   left join product_attribute t
                     on p.part_number = t.part_number
                   left join attribute a
                     on t.attributeid = a.attributeid
                   GROUP BY p.part_number
                    , p.name
                    , p.price');

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

This generates the same result as the second version without hard-coding any attribute names. While there are many ways to model this, I think this database design is the most flexible.
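If the pivot is built in application code rather than with a MySQL prepared statement, the idea is the same: read the distinct attribute names first, then generate one aggregate column per name. The sketch below is that swapped-in, application-side variant, assuming SQLite via Python's sqlite3 and the answer's sample rows (without price). The helper names quote_ident and dynamic_pivot are hypothetical. Attribute names are compared through bound parameters, and only the column aliases are spliced into the SQL, quoted by doubling any embedded double quotes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (part_number INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE attribute (attributeid INTEGER PRIMARY KEY,
                        attribute_name TEXT, attribute_value TEXT);
CREATE TABLE product_attribute (part_number INTEGER, attributeid INTEGER);
INSERT INTO product VALUES (1, 'product1'), (2, 'product2');
INSERT INTO attribute VALUES
  (1, 'color', 'red'), (2, 'color', 'blue'), (3, 'material', 'chrome'),
  (4, 'material', 'plastic'), (5, 'color', 'yellow'), (6, 'size', 'x-large');
INSERT INTO product_attribute VALUES (1, 1), (1, 3), (2, 6), (2, 2);
""")

def quote_ident(name):
    """Quote an SQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'

def dynamic_pivot(conn):
    """Return (column names, rows) with one column per distinct attribute name."""
    names = [n for (n,) in conn.execute(
        "SELECT DISTINCT attribute_name FROM attribute ORDER BY attribute_name")]
    columns = ["p.part_number AS part_number", "p.name AS name"]
    # One aggregate per attribute; the name itself is a bound parameter.
    columns += [
        f"MAX(CASE WHEN a.attribute_name = ? THEN a.attribute_value END) AS {quote_ident(n)}"
        for n in names
    ]
    sql = (
        "SELECT " + ", ".join(columns) +
        " FROM product p"
        " LEFT JOIN product_attribute t ON p.part_number = t.part_number"
        " LEFT JOIN attribute a ON t.attributeid = a.attributeid"
        " GROUP BY p.part_number, p.name"
        " ORDER BY p.part_number"
    )
    cur = conn.execute(sql, names)  # parameters line up with the CASE columns
    return [d[0] for d in cur.description], cur.fetchall()

header, rows = dynamic_pivot(conn)
print(header)  # ['part_number', 'name', 'color', 'material', 'size']
for row in rows:
    print(row)
```

With thousands of attributes, a result this wide may hit column limits (SQLite's default is 2000 columns), so in practice you would usually pivot only the attributes a given screen or report needs.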