Mysql – Product Attribute List Design Pattern

database-designMySQL

I am working on updating our website's product database. It’s built in MySQL but this is more of a general database design pattern question.

I’m planning on switching to a Supertype/Subtype pattern. Our current/previous database is mainly a single table that has data on a single type of product. We’re looking at expanding our product offering to include dissimilar products.

This new draft design is like this:

Product             product_[type]          product_attribute_[name]
----------------    ----------------        ----------------------------
part_number (PK)    part_number (FK)        attributeId (PK)
UPC                 specific_attr1 (FK)     attribute_name
price               specific_attr2 (FK)
...                 ...

I have a question regarding the product attributes tables. The idea here is a product can have a list of given attributes such as color: red, green, blue, or material: plastic, wood, chrome, aluminum, etc.

This list would be stored in a table and the primary key (PK) for that attribute item will be used in the specific product table as a foreign key (FK).

(Martin Fowler's book Patterns of Enterprise Application Architecture call this "Foreign Key Mapping")

This allows a website interface to pull the list of attributes for a given attribute type and spit it out in a drop-down select menu or some other UI element. This list can be considered an "authorized" list of attribute values.

The number of joins that ends up happening when pulling a specific product appears excessive to me. You must join every product attribute table to the product so you can get that attribute's fields. Commonly, that field might simply be nothing more than a string (varchar) for its name.

This design pattern ends up creating a large number of tables as well as you end up with a table for each attribute. One idea to counteract this would be to create something more of a “grab bag” table for all product attributes. Something like this:

product_attribute
----------------
attributeId (PK) 
name
field_name

This way, your table might look like this:

1  red     color
2  blue    color
3  chrome  material
4  plastic material
5  yellow  color
6  x-large size

This could help reduce table creep but it doesn’t cut down the number of joins and it feels a little wrong combining so many different types into a single table. But you would be able to get all the available “color” attributes pretty easily.

However, there might be an attribute that has more fields than just "name" such as a color's RGB value. This would require that specific attribute to possibly have another table or to have a single field for name:value pair (which has it's own drawbacks).

The last design pattern I can think of is storing the actual attribute value in the specific product table and not have an “attribute table” at all. Something like this:

Product             product_[type] 
----------------    ----------------
part_number (PK)    part_number (FK) 
UPC                 specific_attr1 
price               specific_attr2 
...                 ...

Instead of a Foreign Key to another table, it would contain the actual value such as:

part_number    color    material
-----------    -----    --------
1234           red      plastic

This would eliminate joins and prevent table creep (maybe?). However, this prevents having an “authorized list” of attributes. You could return all the currently entered values for a given field (ie: color) but this also eliminates the idea of having an “authorized list” of values for a given attribute.

To have that list, you would have to still create a “grab bag” attribute table or have multiple tables (table creep) for each attribute.

This creates the bigger drawback (and why I’ve never used this approach) of now having the product name in multiple locations.

If you have the color value of “red” in the “master attribute table” and also store it in the “product_[type]” table, an update to the “master” table will cause a potential data integrity issue if the application doesn’t update all the records with the old value in the “product_type” table as well.

So, after my long winded explanation and analysis of this scenario, my realization is that this can’t be an uncommon scenario and there might even a name for this type of situation.

Are there generally accepted solutions to this design challenge? Is the potentially large number of joins acceptable if the tables are relatively small? Is storing the attribute name, instead of a attribute PK acceptable under some situation? Is there another solution I’m not thinking about?

A few notes about this product database / application:

  • Products aren’t frequently updated/added/removed
  • Attributes aren’t frequently updated/added/removed
  • The table is most frequently queried for reading / returning information
  • Server side caching is enabled to cache the result of a given query/result
  • I plan on starting with just one product type and extending/adding others over time and will have potentially 10+ different types

Best Answer

I personally would use a model similar to the following:

The product table would be pretty basic, your main product details:

create table product
(
  part_number int, (PK)
  name varchar(10),
  price int
);
insert into product values
(1, 'product1', 50),
(2, 'product2', 95.99);

Second the attribute table to store the each of the different attributes.

create table attribute
(
  attributeid int, (PK)
  attribute_name varchar(10),
  attribute_value varchar(50)
);
insert into attribute values
(1, 'color', 'red'),
(2, 'color', 'blue'),
(3, 'material', 'chrome'),
(4, 'material', 'plastic'),
(5, 'color', 'yellow'),
(6, 'size', 'x-large');

Finally create the product_attribute table as the JOIN table between each product and its attributes associated with it.

create table product_attribute
(
  part_number int, (FK)
  attributeid int  (FK) 
);
insert into product_attribute values
(1,  1),
(1,  3),
(2,  6),
(2,  2),
(2,  6);

Depending on how you want to use the data you are looking at two joins:

select *
from product p
left join product_attribute t
  on p.part_number = t.part_number
left join attribute a
  on t.attributeid = a.attributeid;

See SQL Fiddle with Demo. This returns data in the format:

PART_NUMBER | NAME       | PRICE | ATTRIBUTEID | ATTRIBUTE_NAME | ATTRIBUTE_VALUE
___________________________________________________________________________
1           | product1   | 50    | 1           | color          | red
1           | product1   | 50    | 3           | material       | chrome
2           | product2   | 96    | 6           | size           | x-large
2           | product2   | 96    | 2           | color          | blue
2           | product2   | 96    | 6           | size           | x-large

But if you want to return the data in a PIVOT format where you have one row with all of the attributes as columns, you can use CASE statements with an aggregate:

SELECT p.part_number,
  p.name,
  p.price,
  MAX(IF(a.ATTRIBUTE_NAME = 'color', a.ATTRIBUTE_VALUE, null)) as color,
  MAX(IF(a.ATTRIBUTE_NAME = 'material', a.ATTRIBUTE_VALUE, null)) as material,
  MAX(IF(a.ATTRIBUTE_NAME = 'size', a.ATTRIBUTE_VALUE, null)) as size
from product p
left join product_attribute t
  on p.part_number = t.part_number
left join attribute a
  on t.attributeid = a.attributeid
group by p.part_number, p.name, p.price;

See SQL Fiddle with Demo. Data is returned in the format:

PART_NUMBER | NAME       | PRICE | COLOR | MATERIAL | SIZE
_________________________________________________________________
1           | product1   | 50    | red   | chrome   | null
2           | product2   | 96    | blue  | null     | x-large

As you case see the data might be in a better format for you, but if you have an unknown number of attributes, it will easily become untenable due to hard-coding attribute names, so in MySQL you can use prepared statements to create dynamic pivots. Your code would be as follows (See SQL Fiddle With Demo):

SET @sql = NULL;
SELECT
  GROUP_CONCAT(DISTINCT
    CONCAT(
      'MAX(IF(a.attribute_name = ''',
      attribute_name,
      ''', a.attribute_value, NULL)) AS ',
      attribute_name
    )
  ) INTO @sql
FROM attribute;

SET @sql = CONCAT('SELECT p.part_number
                    , p.name
                    , ', @sql, ' 
                   from product p
                   left join product_attribute t
                     on p.part_number = t.part_number
                   left join attribute a
                     on t.attributeid = a.attributeid
                   GROUP BY p.part_number
                    , p.name');

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

This generates the same result as the second version with no need to hard-code anything. While there are many ways to model this I think this database design is the most flexible.