Mysql – Eav model sql select only products with all 3 values

Tags: eav, mysql, performance, query-performance

I have an EAV (Entity Attribute Value) based table for smartphones like the following:

Image of table data

What if the user wants to see all phones with
– red color
– steel material

The query is like:

WHERE (attribute_id = 'color' AND value = 'red')
OR
(attribute_id = 'material' AND value = 'steel')

In this case the search result would also include product 2, because its color is red. I don't want that, because its material is plastic instead of steel.

You could say: group the results by the product id column and keep only the products having a count of 2 matching rows, one for each requested value.
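As a rough sketch of that grouping idea: the snippet below uses SQLite through Python as a stand-in for MySQL. The table name `attribute`, the column `content_id`, and the sample rows are assumptions, since the original table image is not available.

```python
import sqlite3

# Hypothetical sample data; the original table image is not available.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attribute (content_id INTEGER, attribute_id TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO attribute VALUES (?, ?, ?)",
    [
        (1, "color", "red"), (1, "material", "steel"),
        (2, "color", "red"), (2, "material", "plastic"),
        (3, "color", "blue"), (3, "material", "steel"),
    ],
)

# Keep rows that match either filter, then require one match per filter.
query = """
    SELECT content_id
    FROM attribute
    WHERE (attribute_id = 'color' AND value = 'red')
       OR (attribute_id = 'material' AND value = 'steel')
    GROUP BY content_id
    HAVING COUNT(DISTINCT attribute_id) = 2
    ORDER BY content_id
"""
matching_ids = [row[0] for row in conn.execute(query)]
print(matching_ids)  # [1]
```

Counting distinct attributes rather than rows keeps a product from qualifying just because one attribute was stored twice.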

But given EAV's flexibility, what if material is a new option that gets added to a product a million rows after that product's earlier values?

I mean, the specs of a product are not necessarily stored one after another, right? Isn't that going to slow down the search?

So I need only the products that have both red and steel (a product with just one of them, like product 2 in the table above, must not be accepted). It shouldn't matter if millions of rows lie between those values; the query should still find all the given values for a product.

Is there a solution?

Best Answer

To select more than one attribute:

SELECT color.content_id
FROM attribute AS color
JOIN attribute AS material
  ON color.content_id = material.content_id
 AND color.attribute_id = 'color' AND color.value = 'red'
 AND material.attribute_id = 'material' AND material.value = 'steel'
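
The self-join can be tried on a few rows. This is only a sketch: SQLite (via Python) stands in for MySQL, and the sample rows are made up to mirror the question, with product 2 being red but plastic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attribute (content_id INTEGER, attribute_id TEXT, value TEXT)"
)
# Hypothetical rows, deliberately interleaved so that a product's
# attributes are not stored next to each other.
conn.executemany(
    "INSERT INTO attribute VALUES (?, ?, ?)",
    [
        (1, "color", "red"),
        (2, "color", "red"),
        (3, "color", "blue"),
        (2, "material", "plastic"),
        (3, "material", "steel"),
        (1, "material", "steel"),
    ],
)

# One alias of the same table per requested attribute.
query = """
    SELECT color.content_id
    FROM attribute AS color
    JOIN attribute AS material
      ON color.content_id = material.content_id
     AND color.attribute_id = 'color' AND color.value = 'red'
     AND material.attribute_id = 'material' AND material.value = 'steel'
    ORDER BY color.content_id
"""
red_and_steel = [row[0] for row in conn.execute(query)]
print(red_and_steel)  # [1]
```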

So much flexibility that you can strangle yourself on the long SQL queries it generates :-)
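
Since the query grows by one join per filter, it is usually generated rather than written by hand. A minimal sketch of such a generator follows. The helper `build_eav_query` is hypothetical, the table and column names follow the snippet above, and SQLite is used only to make the example runnable.

```python
import sqlite3


def build_eav_query(filters):
    """Return (sql, params) selecting products that match every filter.

    `filters` is a list of (attribute_id, value) pairs. Values are passed
    as bound parameters, never interpolated into the SQL text.
    """
    if not filters:
        raise ValueError("at least one filter is required")
    lines = ["SELECT a0.content_id", "FROM attribute AS a0"]
    conditions, params = [], []
    for i, (attribute_id, value) in enumerate(filters):
        if i > 0:
            # One extra self-join per additional filter.
            lines.append(
                f"JOIN attribute AS a{i} ON a{i}.content_id = a0.content_id"
            )
        conditions.append(f"(a{i}.attribute_id = ? AND a{i}.value = ?)")
        params.extend([attribute_id, value])
    lines.append("WHERE " + " AND ".join(conditions))
    lines.append("ORDER BY a0.content_id")
    return "\n".join(lines), params


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attribute (content_id INTEGER, attribute_id TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO attribute VALUES (?, ?, ?)",
    [
        (1, "color", "red"), (1, "material", "steel"),
        (2, "color", "red"), (2, "material", "plastic"),
    ],
)

sql, params = build_eav_query([("color", "red"), ("material", "steel")])
both = [row[0] for row in conn.execute(sql, params)]

sql_one, params_one = build_eav_query([("color", "red")])
red_only = [row[0] for row in conn.execute(sql_one, params_one)]
```

With the sample rows, `both` contains only product 1, while `red_only` contains products 1 and 2.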

For other solutions, see the eav tag.
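
On the worry that a product's rows may lie millions of rows apart: with a suitable index, rows are found through the index rather than by their insertion order. The sketch below assumes a composite index on (attribute_id, value, content_id). The index name is made up, and SQLite's planner stands in for MySQL's here, so treat the plan output as illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attribute (content_id INTEGER, attribute_id TEXT, value TEXT)"
)
# Assumed index: the filter columns first, then the join column.
conn.execute(
    "CREATE INDEX idx_attr_value_content "
    "ON attribute (attribute_id, value, content_id)"
)

# Product 1 gets its color first, then many unrelated rows are added,
# and only afterwards its material.
conn.execute("INSERT INTO attribute VALUES (1, 'color', 'red')")
conn.executemany(
    "INSERT INTO attribute VALUES (?, 'color', 'blue')",
    [(i,) for i in range(2, 10_000)],
)
conn.execute("INSERT INTO attribute VALUES (1, 'material', 'steel')")

query = """
    SELECT color.content_id
    FROM attribute AS color
    JOIN attribute AS material
      ON color.content_id = material.content_id
     AND color.attribute_id = 'color' AND color.value = 'red'
     AND material.attribute_id = 'material' AND material.value = 'steel'
"""
found = [row[0] for row in conn.execute(query)]

# The last column of each plan row is a human-readable description.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for step in plan:
    print(step)
```

In MySQL, the equivalent check would be to run EXPLAIN on the query and confirm that an index is used for both aliases.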

Related Solutions

Mysql – Product Attribute List Design Pattern

I personally would use a model similar to the following:

The product table would be pretty basic, your main product details:

create table product
(
  part_number int primary key,
  name varchar(10),
  price int
);
-- price is an int column, so 95.99 is stored as 96 (as the output below shows)
insert into product values
(1, 'product1', 50),
(2, 'product2', 95.99);

Second, the attribute table stores each of the different attributes.

create table attribute
(
  attributeid int primary key,
  attribute_name varchar(10),
  attribute_value varchar(50)
);
insert into attribute values
(1, 'color', 'red'),
(2, 'color', 'blue'),
(3, 'material', 'chrome'),
(4, 'material', 'plastic'),
(5, 'color', 'yellow'),
(6, 'size', 'x-large');

Finally, create the product_attribute table as the join table between each product and its associated attributes.

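create table product_attribute
(
  part_number int,
  attributeid int,
  foreign key (part_number) references product (part_number),
  foreign key (attributeid) references attribute (attributeid)
);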
insert into product_attribute values
(1,  1),
(1,  3),
(2,  6),
(2,  2),
(2,  6);

Depending on how you want to use the data you are looking at two joins:

select *
from product p
left join product_attribute t
  on p.part_number = t.part_number
left join attribute a
  on t.attributeid = a.attributeid;

See SQL Fiddle with Demo. This returns data in the format:

PART_NUMBER | NAME       | PRICE | ATTRIBUTEID | ATTRIBUTE_NAME | ATTRIBUTE_VALUE
___________________________________________________________________________
1           | product1   | 50    | 1           | color          | red
1           | product1   | 50    | 3           | material       | chrome
2           | product2   | 96    | 6           | size           | x-large
2           | product2   | 96    | 2           | color          | blue
2           | product2   | 96    | 6           | size           | x-large

But if you want to return the data in a PIVOT format, where you have one row with all of the attributes as columns, you can use conditional expressions (IF below, or equivalently CASE) with an aggregate:

SELECT p.part_number,
  p.name,
  p.price,
  MAX(IF(a.ATTRIBUTE_NAME = 'color', a.ATTRIBUTE_VALUE, null)) as color,
  MAX(IF(a.ATTRIBUTE_NAME = 'material', a.ATTRIBUTE_VALUE, null)) as material,
  MAX(IF(a.ATTRIBUTE_NAME = 'size', a.ATTRIBUTE_VALUE, null)) as size
from product p
left join product_attribute t
  on p.part_number = t.part_number
left join attribute a
  on t.attributeid = a.attributeid
group by p.part_number, p.name, p.price;

See SQL Fiddle with Demo. Data is returned in the format:

PART_NUMBER | NAME       | PRICE | COLOR | MATERIAL | SIZE
_________________________________________________________________
1           | product1   | 50    | red   | chrome   | null
2           | product2   | 96    | blue  | null     | x-large
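
For comparison, the same pivot can be written with CASE expressions instead of IF. The sketch below runs it on SQLite through Python as a stand-in for MySQL, using the sample data from this answer. The price of product2 is inserted as 96 directly, because SQLite would not round 95.99 the way MySQL's int column does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (part_number INTEGER PRIMARY KEY, name TEXT, price INTEGER);
    CREATE TABLE attribute (attributeid INTEGER PRIMARY KEY,
                            attribute_name TEXT, attribute_value TEXT);
    CREATE TABLE product_attribute (part_number INTEGER, attributeid INTEGER);

    INSERT INTO product VALUES (1, 'product1', 50), (2, 'product2', 96);
    INSERT INTO attribute VALUES
        (1, 'color', 'red'), (2, 'color', 'blue'), (3, 'material', 'chrome'),
        (4, 'material', 'plastic'), (5, 'color', 'yellow'), (6, 'size', 'x-large');
    INSERT INTO product_attribute VALUES (1, 1), (1, 3), (2, 6), (2, 2), (2, 6);
""")

# A CASE without ELSE yields NULL, which MAX ignores, so each column
# keeps only the value of its own attribute.
pivot_query = """
    SELECT p.part_number, p.name, p.price,
      MAX(CASE WHEN a.attribute_name = 'color' THEN a.attribute_value END) AS color,
      MAX(CASE WHEN a.attribute_name = 'material' THEN a.attribute_value END) AS material,
      MAX(CASE WHEN a.attribute_name = 'size' THEN a.attribute_value END) AS size
    FROM product p
    LEFT JOIN product_attribute t ON p.part_number = t.part_number
    LEFT JOIN attribute a ON t.attributeid = a.attributeid
    GROUP BY p.part_number, p.name, p.price
    ORDER BY p.part_number
"""
pivot_rows = conn.execute(pivot_query).fetchall()
for row in pivot_rows:
    print(row)
```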

As you can see, the data might be in a better format for you. But with an unknown number of attributes, hard-coding attribute names quickly becomes untenable, so in MySQL you can use prepared statements to create dynamic pivots. Your code would be as follows (see SQL Fiddle with Demo):

SET @sql = NULL;
SELECT
  GROUP_CONCAT(DISTINCT
    CONCAT(
      'MAX(IF(a.attribute_name = ''',
      attribute_name,
      ''', a.attribute_value, NULL)) AS ',
      attribute_name
    )
  ) INTO @sql
FROM attribute;

SET @sql = CONCAT('SELECT p.part_number
                    , p.name
                    , p.price
                    , ', @sql, ' 
                   from product p
                   left join product_attribute t
                     on p.part_number = t.part_number
                   left join attribute a
                     on t.attributeid = a.attributeid
                   GROUP BY p.part_number
                    , p.name
                    , p.price');

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

This generates the same result as the second version with no need to hard-code anything. While there are many ways to model this I think this database design is the most flexible.
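
If the pivot is assembled in application code rather than with a prepared statement, the same idea might look like the sketch below. It is an illustration only: SQLite through Python stands in for MySQL, and the helper `quote_identifier` is hypothetical. Attribute names are bound as parameters, and column aliases are quoted because they come from data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (part_number INTEGER PRIMARY KEY, name TEXT, price INTEGER);
    CREATE TABLE attribute (attributeid INTEGER PRIMARY KEY,
                            attribute_name TEXT, attribute_value TEXT);
    CREATE TABLE product_attribute (part_number INTEGER, attributeid INTEGER);

    INSERT INTO product VALUES (1, 'product1', 50), (2, 'product2', 96);
    INSERT INTO attribute VALUES
        (1, 'color', 'red'), (2, 'color', 'blue'), (3, 'material', 'chrome'),
        (4, 'material', 'plastic'), (5, 'color', 'yellow'), (6, 'size', 'x-large');
    INSERT INTO product_attribute VALUES (1, 1), (1, 3), (2, 6), (2, 2), (2, 6);
""")


def quote_identifier(name):
    """Quote a column alias taken from data (hypothetical helper)."""
    return '"' + name.replace('"', '""') + '"'


# Discover the attribute names at run time instead of hard-coding them.
names = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT attribute_name FROM attribute ORDER BY attribute_name"
    )
]

pivot_columns = ", ".join(
    "MAX(CASE WHEN a.attribute_name = ? THEN a.attribute_value END) AS "
    + quote_identifier(name)
    for name in names
)
sql = f"""
    SELECT p.part_number AS part_number, p.name AS name, {pivot_columns}
    FROM product p
    LEFT JOIN product_attribute t ON p.part_number = t.part_number
    LEFT JOIN attribute a ON t.attributeid = a.attributeid
    GROUP BY p.part_number, p.name
    ORDER BY p.part_number
"""
cursor = conn.execute(sql, names)
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
print(columns)
for row in rows:
    print(row)
```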

Sql-server – Efficient query for an advanced EAV model

First of all, what you are about to design is probably a VERY bad idea. A much better solution would be to have a dynamic schema where you add new tables and have the application understand how to query those tables (you could place them in a schema). This largely avoids all the locking and query plan issues you are bound to run into with this model. There is nothing wrong with applications running CREATE TABLE now and again.

Second, I am not sure I understand why you have normalised Parameter into its own table. Why not put that directly into the ManufacturerParameter table?

Third, if you insist on proceeding with your current model, there are ways to achieve what you want (at least if I am interpreting your requirement correctly). You can write your query so that it counts the search arguments that match, and then use HAVING to keep only the products for which all of them match. I am assuming that only one of the fields Text, Boolean, Datum, etc. is populated per ProductParameter record (you probably want to enforce this with a constraint).
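
One way such a constraint could look is sketched below with SQLite through Python. The column list of ProductParameter is an assumption, and the CHECK relies on SQLite evaluating `IS NOT NULL` to 0 or 1. In SQL Server the same rule would need CASE expressions instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed columns; only Text, Boolean and Datum are named in the answer.
conn.execute("""
    CREATE TABLE ProductParameter (
        ProductID   INTEGER NOT NULL,
        ParameterID INTEGER NOT NULL,
        "Text"      TEXT,
        "Boolean"   INTEGER,
        Datum       TEXT,
        -- Exactly one of the typed value columns must be populated.
        CHECK (
            ("Text" IS NOT NULL) + ("Boolean" IS NOT NULL) + (Datum IS NOT NULL) = 1
        )
    )
""")

# One populated value column: accepted.
conn.execute("INSERT INTO ProductParameter VALUES (1, 10, NULL, 1, NULL)")

# Two populated value columns: rejected by the CHECK constraint.
try:
    conn.execute("INSERT INTO ProductParameter VALUES (1, 11, 'abc', 1, NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored = conn.execute("SELECT COUNT(*) FROM ProductParameter").fetchone()[0]
print(rejected, stored)
```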

For example, to search for all products that have boolean = true for one parameter AND text = 'abc' for some other parameter, you can do:

SELECT P.Name
FROM Product P
JOIN ProductParameter PP
  ON PP.ProductID = P.ID  /* join column name is assumed */
WHERE P.ID = Foo
  AND (PP.Boolean = 1 OR PP.Text = 'abc' /* ... one OR term for each filter */)
GROUP BY P.Name /* And any other things you want out of product */
HAVING COUNT(*) >= [Number of where clauses]

If you need to list all the parameters of this product, you can use the above query template as a nested query and join back to ProductParameter.
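
A small end-to-end sketch of that template is shown below, using SQLite through Python as a stand-in for SQL Server. The column names (ProductID, ParameterID), the parameter ids, and the sample rows are assumptions. Each filter is also tied to a specific parameter, which the template above leaves open.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE ProductParameter (
        ProductID INTEGER, ParameterID INTEGER, "Text" TEXT, "Boolean" INTEGER
    );
    INSERT INTO Product VALUES (1, 'Phone A'), (2, 'Phone B');
    -- Hypothetical parameters: 10 is a flag, 11 and 12 are text values.
    INSERT INTO ProductParameter VALUES
        (1, 10, NULL, 1), (1, 11, 'abc', NULL), (1, 12, 'OLED', NULL),
        (2, 10, NULL, 1), (2, 11, 'xyz', NULL);
""")

# Inner query: count matching filters per product and keep full matches.
# Outer query: join back to list every parameter of those products.
query = """
    SELECT M.Name, PP.ParameterID, PP."Text", PP."Boolean"
    FROM (
        SELECT P.ID, P.Name
        FROM Product P
        JOIN ProductParameter PP ON PP.ProductID = P.ID
        WHERE (PP.ParameterID = 10 AND PP."Boolean" = 1)
           OR (PP.ParameterID = 11 AND PP."Text" = 'abc')
        GROUP BY P.ID, P.Name
        HAVING COUNT(*) >= 2
    ) AS M
    JOIN ProductParameter PP ON PP.ProductID = M.ID
    ORDER BY PP.ParameterID
"""
parameter_rows = conn.execute(query).fetchall()
for row in parameter_rows:
    print(row)
```

Only Phone A satisfies both filters, so all three of its parameters are listed and Phone B does not appear.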

The above query CAN be optimised by maintaining a computed column in ProductParameter that holds a string representation of the different data types in that table. That way, the above OR conditions can be rewritten as an IN list (which you will want to pass as a table-valued parameter).

I would like to repeat that what you are doing is probably very wrong. If you do it, you will most likely need to hand-tune most of your query plans; the optimiser will not help you anymore. And that is assuming you don't have too many query variants, which would fill up your plan cache.
