I personally would use a model similar to the following:
The product table would be pretty basic, your main product details:
create table product
(
part_number int, (PK)
name varchar(10),
price int
);
insert into product values
(1, 'product1', 50),
(2, 'product2', 95.99);
Second the attribute table to store the each of the different attributes.
create table attribute
(
attributeid int, (PK)
attribute_name varchar(10),
attribute_value varchar(50)
);
insert into attribute values
(1, 'color', 'red'),
(2, 'color', 'blue'),
(3, 'material', 'chrome'),
(4, 'material', 'plastic'),
(5, 'color', 'yellow'),
(6, 'size', 'x-large');
Finally create the product_attribute table as the JOIN table between each product and its attributes associated with it.
create table product_attribute
(
part_number int, (FK)
attributeid int (FK)
);
insert into product_attribute values
(1, 1),
(1, 3),
(2, 6),
(2, 2),
(2, 6);
Depending on how you want to use the data you are looking at two joins:
select *
from product p
left join product_attribute t
on p.part_number = t.part_number
left join attribute a
on t.attributeid = a.attributeid;
See SQL Fiddle with Demo. This returns data in the format:
PART_NUMBER | NAME | PRICE | ATTRIBUTEID | ATTRIBUTE_NAME | ATTRIBUTE_VALUE
___________________________________________________________________________
1 | product1 | 50 | 1 | color | red
1 | product1 | 50 | 3 | material | chrome
2 | product2 | 96 | 6 | size | x-large
2 | product2 | 96 | 2 | color | blue
2 | product2 | 96 | 6 | size | x-large
But if you want to return the data in a PIVOT
format where you have one row with all of the attributes as columns, you can use CASE
statements with an aggregate:
SELECT p.part_number,
p.name,
p.price,
MAX(IF(a.ATTRIBUTE_NAME = 'color', a.ATTRIBUTE_VALUE, null)) as color,
MAX(IF(a.ATTRIBUTE_NAME = 'material', a.ATTRIBUTE_VALUE, null)) as material,
MAX(IF(a.ATTRIBUTE_NAME = 'size', a.ATTRIBUTE_VALUE, null)) as size
from product p
left join product_attribute t
on p.part_number = t.part_number
left join attribute a
on t.attributeid = a.attributeid
group by p.part_number, p.name, p.price;
See SQL Fiddle with Demo. Data is returned in the format:
PART_NUMBER | NAME | PRICE | COLOR | MATERIAL | SIZE
_________________________________________________________________
1 | product1 | 50 | red | chrome | null
2 | product2 | 96 | blue | null | x-large
As you case see the data might be in a better format for you, but if you have an unknown number of attributes, it will easily become untenable due to hard-coding attribute names, so in MySQL you can use prepared statements to create dynamic pivots. Your code would be as follows (See SQL Fiddle With Demo):
SET @sql = NULL;
SELECT
GROUP_CONCAT(DISTINCT
CONCAT(
'MAX(IF(a.attribute_name = ''',
attribute_name,
''', a.attribute_value, NULL)) AS ',
attribute_name
)
) INTO @sql
FROM attribute;
SET @sql = CONCAT('SELECT p.part_number
, p.name
, ', @sql, '
from product p
left join product_attribute t
on p.part_number = t.part_number
left join attribute a
on t.attributeid = a.attributeid
GROUP BY p.part_number
, p.name');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
This generates the same result as the second version with no need to hard-code anything. While there are many ways to model this I think this database design is the most flexible.
Your sortpath isn't technically being "changed" by the ORDER BY. The correct values are all there, just not coming up in the order you anticipated. In SQL, if you don't explicitly sort a set of data, the order in which it comes back to you is undefined... and the fact that it originally appeared to be in the anticipated order was more along the lines of coincidental.
I tweaked your fiddle (the "sorted" query):
GROUP_CONCAT(s.SortOrder ORDER BY a.PathLength DESC separator '') AS SortPath
Now, it looks like the results are being explicitly sorted in the order you expect, which is the order in which they were implicitly sorted before.
So why did that order change? If you do an EXPLAIN SELECT on the "sorted" query with and without the ORDER BY clause, you'll notice that the ordering of the tables changes in the output of EXPLAIN depending on whether you specify ORDER BY on the entire query. This is because the query optimizer re-orders the join-order of the tables to what it thinks is the most cost-effective sequence.
With the ORDER BY, the tables are joined from left to right, n-d-a-s. Without the ORDER BY, the tables are joined from left to right, d-n-a-s. Both are logically valid, but this causes the rows to be read from the tables in a different order, and your unsorted GROUP CONCAT's output order changes as a result of the change to the query plan.
Best Answer
Short and dirty answer. Not efficient, it will have to scan the whole table. Will give only 1 row as result, if there are more than one rows with same (maximal) number of matches ("wrong" data in your terminology), the choice is arbitrary:
Slightly better than the above but not more efficient. If there are more than one rows with same (maximal) number of matches, the choice is not arbitrary but based on the assumption that a row with "item and type" is preferred over a row with "item and colour" which is preferred over a row with "type and colour":