Mysql – Product Attribute List Design Pattern

database-designMySQL

I am working on updating our website's product database. It’s built in MySQL but this is more of a general database design pattern question.

I’m planning on switching to a Supertype/Subtype pattern. Our current/previous database is mainly a single table that has data on a single type of product. We’re looking at expanding our product offering to include dissimilar products.

This new draft design is like this:

Product             product_[type]          product_attribute_[name]
----------------    ----------------        ----------------------------
part_number (PK)    part_number (FK)        attributeId (PK)
UPC                 specific_attr1 (FK)     attribute_name
price               specific_attr2 (FK)
...                 ...

I have a question regarding the product attributes tables. The idea here is a product can have a list of given attributes such as color: red, green, blue, or material: plastic, wood, chrome, aluminum, etc.

This list would be stored in a table and the primary key (PK) for that attribute item will be used in the specific product table as a foreign key (FK).

(Martin Fowler's book Patterns of Enterprise Application Architecture call this "Foreign Key Mapping")

This allows a website interface to pull the list of attributes for a given attribute type and spit it out in a drop-down select menu or some other UI element. This list can be considered an "authorized" list of attribute values.

The number of joins that ends up happening when pulling a specific product appears excessive to me. You must join every product attribute table to the product so you can get that attribute's fields. Commonly, that field might simply be nothing more than a string (varchar) for its name.

This design pattern ends up creating a large number of tables as well as you end up with a table for each attribute. One idea to counteract this would be to create something more of a “grab bag” table for all product attributes. Something like this:

product_attribute
----------------
attributeId (PK) 
name
field_name

This way, your table might look like this:

1  red     color
2  blue    color
3  chrome  material
4  plastic material
5  yellow  color
6  x-large size

This could help reduce table creep but it doesn’t cut down the number of joins and it feels a little wrong combining so many different types into a single table. But you would be able to get all the available “color” attributes pretty easily.

However, there might be an attribute that has more fields than just "name" such as a color's RGB value. This would require that specific attribute to possibly have another table or to have a single field for name:value pair (which has it's own drawbacks).

The last design pattern I can think of is storing the actual attribute value in the specific product table and not have an “attribute table” at all. Something like this:

Product             product_[type] 
----------------    ----------------
part_number (PK)    part_number (FK) 
UPC                 specific_attr1 
price               specific_attr2 
...                 ...

Instead of a Foreign Key to another table, it would contain the actual value such as:

part_number    color    material
-----------    -----    --------
1234           red      plastic

This would eliminate joins and prevent table creep (maybe?). However, this prevents having an “authorized list” of attributes. You could return all the currently entered values for a given field (ie: color) but this also eliminates the idea of having an “authorized list” of values for a given attribute.

To have that list, you would have to still create a “grab bag” attribute table or have multiple tables (table creep) for each attribute.

This creates the bigger drawback (and why I’ve never used this approach) of now having the product name in multiple locations.

If you have the color value of “red” in the “master attribute table” and also store it in the “product_[type]” table, an update to the “master” table will cause a potential data integrity issue if the application doesn’t update all the records with the old value in the “product_type” table as well.

So, after my long winded explanation and analysis of this scenario, my realization is that this can’t be an uncommon scenario and there might even a name for this type of situation.

Are there generally accepted solutions to this design challenge? Is the potentially large number of joins acceptable if the tables are relatively small? Is storing the attribute name, instead of a attribute PK acceptable under some situation? Is there another solution I’m not thinking about?

A few notes about this product database / application:

Products aren’t frequently updated/added/removed
Attributes aren’t frequently updated/added/removed
The table is most frequently queried for reading / returning information
Server side caching is enabled to cache the result of a given query/result
I plan on starting with just one product type and extending/adding others over time and will have potentially 10+ different types

Best Answer

I personally would use a model similar to the following:

The product table would be pretty basic, your main product details:

create table product
(
  part_number int, (PK)
  name varchar(10),
  price int
);
insert into product values
(1, 'product1', 50),
(2, 'product2', 95.99);

Second the attribute table to store the each of the different attributes.

create table attribute
(
  attributeid int, (PK)
  attribute_name varchar(10),
  attribute_value varchar(50)
);
insert into attribute values
(1, 'color', 'red'),
(2, 'color', 'blue'),
(3, 'material', 'chrome'),
(4, 'material', 'plastic'),
(5, 'color', 'yellow'),
(6, 'size', 'x-large');

Finally create the product_attribute table as the JOIN table between each product and its attributes associated with it.

create table product_attribute
(
  part_number int, (FK)
  attributeid int  (FK) 
);
insert into product_attribute values
(1,  1),
(1,  3),
(2,  6),
(2,  2),
(2,  6);

Depending on how you want to use the data you are looking at two joins:

select *
from product p
left join product_attribute t
  on p.part_number = t.part_number
left join attribute a
  on t.attributeid = a.attributeid;

See SQL Fiddle with Demo. This returns data in the format:

PART_NUMBER | NAME       | PRICE | ATTRIBUTEID | ATTRIBUTE_NAME | ATTRIBUTE_VALUE
___________________________________________________________________________
1           | product1   | 50    | 1           | color          | red
1           | product1   | 50    | 3           | material       | chrome
2           | product2   | 96    | 6           | size           | x-large
2           | product2   | 96    | 2           | color          | blue
2           | product2   | 96    | 6           | size           | x-large

But if you want to return the data in a PIVOT format where you have one row with all of the attributes as columns, you can use CASE statements with an aggregate:

SELECT p.part_number,
  p.name,
  p.price,
  MAX(IF(a.ATTRIBUTE_NAME = 'color', a.ATTRIBUTE_VALUE, null)) as color,
  MAX(IF(a.ATTRIBUTE_NAME = 'material', a.ATTRIBUTE_VALUE, null)) as material,
  MAX(IF(a.ATTRIBUTE_NAME = 'size', a.ATTRIBUTE_VALUE, null)) as size
from product p
left join product_attribute t
  on p.part_number = t.part_number
left join attribute a
  on t.attributeid = a.attributeid
group by p.part_number, p.name, p.price;

See SQL Fiddle with Demo. Data is returned in the format:

PART_NUMBER | NAME       | PRICE | COLOR | MATERIAL | SIZE
_________________________________________________________________
1           | product1   | 50    | red   | chrome   | null
2           | product2   | 96    | blue  | null     | x-large

As you case see the data might be in a better format for you, but if you have an unknown number of attributes, it will easily become untenable due to hard-coding attribute names, so in MySQL you can use prepared statements to create dynamic pivots. Your code would be as follows (See SQL Fiddle With Demo):

SET @sql = NULL;
SELECT
  GROUP_CONCAT(DISTINCT
    CONCAT(
      'MAX(IF(a.attribute_name = ''',
      attribute_name,
      ''', a.attribute_value, NULL)) AS ',
      attribute_name
    )
  ) INTO @sql
FROM attribute;

SET @sql = CONCAT('SELECT p.part_number
                    , p.name
                    , ', @sql, ' 
                   from product p
                   left join product_attribute t
                     on p.part_number = t.part_number
                   left join attribute a
                     on t.attributeid = a.attributeid
                   GROUP BY p.part_number
                    , p.name');

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

This generates the same result as the second version with no need to hard-code anything. While there are many ways to model this I think this database design is the most flexible.

Related Solutions

Optimum design / structure for product data database

All you need is two tables. 1 for Products and 1 for Attributes. You would then have a one to many relationship from the Products table to the Attributes table.

This means that each product can have as many attributes as it needs.

Table structure would look something like this.

Products
Product_Id INT PK Auto Increment
ProductCode VARCHAR(255)

Attriutes
Attribute_Id INT PK Auto Increment
Product_Id INT FK
AttributeName VARCHAR(255)

AttributeValues
AttributeValue_Id INT PK
Attribute_Id INT FK
AttributeValue VARCHAR(255)

How to model when you could have many (thousands of) attributes to a table (product)

What's the question? Does this make sense? Yea, sure.

1. A product could have more than one attribute Fine
2. A product could have more than one of the same attribute. A shirt is both 'brown' and 'green' Um. If you say so. Sounds a bit strange but that's up to you, I guess.
3. All Attributes and its Values have to be pre-defined (loaded from text file) Again, sounds a bit strange that numerical fields like weight and length have to have pre-defined values. I personally go for a table of pre-defined values and storing non-predefined values separately, but it makes the structure a LOT more complex. Up to you if you want (or need) to do it this way, really.
4. Once the data is loaded (once every few months), there is heavy SELECTing. So the model needs to be read-optimized. If it's properly indexed then the fact that you've got a squillion rows shouldn't impact on performance so much.

Example query to get the product name and attributes for a product:

SELECT
Product.ProductID, Product.Name,
Attribute.AttributeID, Attribute.Name,
ProductAttributes.AttributeValueID,
AttributeValue.AttributeValueID, AttributeValue.Value,

FROM
Product
JOIN ProductAttributes ON Product.ProductID = ProductAttributes.productID
JOIN AttributeValue ON ProductAttributes.AttributeValueID = AttributeValue.AttributeValueID
JOIN Attribute ON AttributeValue.AttributeID = Attribute.AttributeID

WHERE Product.ProductID = 1234

Index-wise, make sure you've got the usual primary key indexes. You'll also need an index on ProductAttributes.productID. You'll need an index on Attribute.name if you're planning on referring to this field in a query condition (eg get products which have a value for attribute 'weight')

You don't really need the AttributeID in the ProductAttributes table, as you've got the AttributeValueID, whose table then links to ProductAttributes. But having it there could conceivably act as a short cut if you were frequently getting a list of Attributes for a product without their values:

SELECT
Product.ProductID, Product.Name,
Attribute.AttributeID, Attribute.Name
FROM
Product
JOIN ProductAttributes ON Product.ProductID = ProductAttributes.productID
--THIS JOIN IS NOW NOT REQUIRED...  JOIN AttributeValue ON ProductAttributes.AttributeValueID = AttributeValue.AttributeValueID
--THIS JOIN CAN BE REPLACED BY THE ONE BELOW...  JOIN Attribute ON AttributeValue.AttributeID = Attribute.AttributeID
JOIN Attribute on ProductAttributes.AttributeID = Attribute.AttributeID
WHERE Product.ProductID = 1234

@Colin'tHart - you may be right about it not being a good design, but you'll need to explain why for your argument to have any weight.

Best Answer

Related Solutions

Optimum design / structure for product data database

How to model when you could have many (thousands of) attributes to a table (product)

Related Question