Mysql – Product attribute schema design, mixing solutions

database-designMySQL

I am working on a product database in MySQL and the products can be multiple types with different attributes. The number of attributes and there datatype vary. I read about some design patterns (post) used in this situation, but none of them looks promising. I am thinking about mixing them somehow.

Lets say, I have these product types (D – dimension, but could be attribute or entity…):

Product_type1: D1 – int, D2 – int, D3 – string, D4 – date, D5 – string

Product_type2: D1 – int, D2 – string, D3 – date, D4 – int, D5 – string, D6 – int

Solution 1:

I store a fix number of attributes of every datatype for a product, let's say 3 int field, 2 date, 2 string. The most important attributes stored here (important regarding indexing and searching). The rest of the attributes are stored in a JSON field or an array format.

Solution 2:

Store everything in indexed JSON field and search in the table with 'LIKE' queries.

How fast is a Like query in MySQL when you have a big product database (20 000 – 30 000 products). Is solution 1 better or not?

Best Answer

Put common things (price, SKU) in MySQL columns; index them if needed.

Put the rest in a JSON column. Do not try to use LIKE to search that.

Instead, first filter on what you can from the other columns, then use your application language to finish the filtering.

To save on space and bandwidth, compress (in the client) the JSON string and put it into a BLOB column.

Write your user-interface to encourage them to filter on the real columns, only reluctantly let them specify the other things.

See also MariaDB's "Dynamic Columns; it's been available for several releases. In the future, 5.7 promises to have a bunch of JSON tools.

Related Solutions

Inventory Database Design Issue

I agree with TomTom on the merits of the Datamodel resource book - it will give you more insight than StackOverflow, and if memory serves, there's a fairly detailed discussion of exactly this scenario.

The design you've come up with is similar to what I've seen for other systems - what you call ProductVariation is often called "variant".

I'm a little confused about the role of SKU in your design - the acronym stands for "Stock Keeping Unit". As it's supposed to be unique and never change, I'd make that the primary key of your ProductVariation table, rather than ID.

If you're only ever going to sell your current product range, or are happy to refactor in future, you can leave it as is; alternatively, you could consider a design with "table per subclass". This would give you a schema along the lines of:

Product (id, name, categoryID, description)
- SizeVariant (SKU, productID, finish, width, thickness, Size, location, quantity)
- ColourVariant (SKU, productID, Colour, finish, location, quantity)
- NoVariant (SKU, productID, location, quantity)

Note how I've also put some of the attributes you've currently assigned to "Product" into the child tables.

This keeps the schema more "self describing", but adds a lot of work - only do that if you think the product catalogue will grow in the future, and you want to avoid EAV.

Mysql – Product Attribute List Design Pattern

I personally would use a model similar to the following:

The product table would be pretty basic, your main product details:

create table product
(
  part_number int, (PK)
  name varchar(10),
  price int
);
insert into product values
(1, 'product1', 50),
(2, 'product2', 95.99);

Second the attribute table to store the each of the different attributes.

create table attribute
(
  attributeid int, (PK)
  attribute_name varchar(10),
  attribute_value varchar(50)
);
insert into attribute values
(1, 'color', 'red'),
(2, 'color', 'blue'),
(3, 'material', 'chrome'),
(4, 'material', 'plastic'),
(5, 'color', 'yellow'),
(6, 'size', 'x-large');

Finally create the product_attribute table as the JOIN table between each product and its attributes associated with it.

create table product_attribute
(
  part_number int, (FK)
  attributeid int  (FK) 
);
insert into product_attribute values
(1,  1),
(1,  3),
(2,  6),
(2,  2),
(2,  6);

Depending on how you want to use the data you are looking at two joins:

select *
from product p
left join product_attribute t
  on p.part_number = t.part_number
left join attribute a
  on t.attributeid = a.attributeid;

See SQL Fiddle with Demo. This returns data in the format:

PART_NUMBER | NAME       | PRICE | ATTRIBUTEID | ATTRIBUTE_NAME | ATTRIBUTE_VALUE
___________________________________________________________________________
1           | product1   | 50    | 1           | color          | red
1           | product1   | 50    | 3           | material       | chrome
2           | product2   | 96    | 6           | size           | x-large
2           | product2   | 96    | 2           | color          | blue
2           | product2   | 96    | 6           | size           | x-large

But if you want to return the data in a PIVOT format where you have one row with all of the attributes as columns, you can use CASE statements with an aggregate:

SELECT p.part_number,
  p.name,
  p.price,
  MAX(IF(a.ATTRIBUTE_NAME = 'color', a.ATTRIBUTE_VALUE, null)) as color,
  MAX(IF(a.ATTRIBUTE_NAME = 'material', a.ATTRIBUTE_VALUE, null)) as material,
  MAX(IF(a.ATTRIBUTE_NAME = 'size', a.ATTRIBUTE_VALUE, null)) as size
from product p
left join product_attribute t
  on p.part_number = t.part_number
left join attribute a
  on t.attributeid = a.attributeid
group by p.part_number, p.name, p.price;

See SQL Fiddle with Demo. Data is returned in the format:

PART_NUMBER | NAME       | PRICE | COLOR | MATERIAL | SIZE
_________________________________________________________________
1           | product1   | 50    | red   | chrome   | null
2           | product2   | 96    | blue  | null     | x-large

As you case see the data might be in a better format for you, but if you have an unknown number of attributes, it will easily become untenable due to hard-coding attribute names, so in MySQL you can use prepared statements to create dynamic pivots. Your code would be as follows (See SQL Fiddle With Demo):

SET @sql = NULL;
SELECT
  GROUP_CONCAT(DISTINCT
    CONCAT(
      'MAX(IF(a.attribute_name = ''',
      attribute_name,
      ''', a.attribute_value, NULL)) AS ',
      attribute_name
    )
  ) INTO @sql
FROM attribute;

SET @sql = CONCAT('SELECT p.part_number
                    , p.name
                    , ', @sql, ' 
                   from product p
                   left join product_attribute t
                     on p.part_number = t.part_number
                   left join attribute a
                     on t.attributeid = a.attributeid
                   GROUP BY p.part_number
                    , p.name');

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

This generates the same result as the second version with no need to hard-code anything. While there are many ways to model this I think this database design is the most flexible.

Best Answer

Related Solutions

Inventory Database Design Issue

Mysql – Product Attribute List Design Pattern

Related Question