What's the question? Does this make sense? Yea, sure.
1. A product could have more than one attribute Fine
2. A product could have more than one of the same attribute. A shirt is both 'brown' and 'green' Um. If you say so. Sounds a bit strange but that's up to you, I guess.
3. All Attributes and its Values have to be pre-defined (loaded from text file) Again, sounds a bit strange that numerical fields like weight and length have to have pre-defined values. I personally go for a table of pre-defined values and storing non-predefined values separately, but it makes the structure a LOT more complex. Up to you if you want (or need) to do it this way, really.
4. Once the data is loaded (once every few months), there is heavy SELECTing. So the model needs to be read-optimized. If it's properly indexed then the fact that you've got a squillion rows shouldn't impact on performance so much.
Example query to get the product name and attributes for a product:
SELECT
Product.ProductID, Product.Name,
Attribute.AttributeID, Attribute.Name,
ProductAttributes.AttributeValueID,
AttributeValue.AttributeValueID, AttributeValue.Value,
FROM
Product
JOIN ProductAttributes ON Product.ProductID = ProductAttributes.productID
JOIN AttributeValue ON ProductAttributes.AttributeValueID = AttributeValue.AttributeValueID
JOIN Attribute ON AttributeValue.AttributeID = Attribute.AttributeID
WHERE Product.ProductID = 1234
Index-wise, make sure you've got the usual primary key indexes. You'll also need an index on ProductAttributes.productID. You'll need an index on Attribute.name if you're planning on referring to this field in a query condition (eg get products which have a value for attribute 'weight')
You don't really need the AttributeID in the ProductAttributes table, as you've got the AttributeValueID, whose table then links to ProductAttributes. But having it there could conceivably act as a short cut if you were frequently getting a list of Attributes for a product without their values:
SELECT
Product.ProductID, Product.Name,
Attribute.AttributeID, Attribute.Name
FROM
Product
JOIN ProductAttributes ON Product.ProductID = ProductAttributes.productID
--THIS JOIN IS NOW NOT REQUIRED... JOIN AttributeValue ON ProductAttributes.AttributeValueID = AttributeValue.AttributeValueID
--THIS JOIN CAN BE REPLACED BY THE ONE BELOW... JOIN Attribute ON AttributeValue.AttributeID = Attribute.AttributeID
JOIN Attribute on ProductAttributes.AttributeID = Attribute.AttributeID
WHERE Product.ProductID = 1234
@Colin'tHart - you may be right about it not being a good design, but you'll need to explain why for your argument to have any weight.
Your answer
more than one attribute with same name (Color for example),
So then create a different attribute with more specificity.
I am also required to store the image photo url.
This is just another attribute...
Product Attribute
-----------------
2 | photo_exterior
Product Attribute Value
-----------------------
id: 1
attribute_id: 2
value: 01/a0/01a0sdadasfsdfsdfasdfasfdfsafs.jpg
All of that said, I wouldn't be doing this unless I had to. I would be using JSONB.
What is the best way to fetch products with all attributes?
I'm unsure what kind of resultset you want. Are you asking how do you query an EAV? If you can make it, you can certainly do the joins to get the result set out of it.
Is there any way to fetch each product with all attributes with one query (one product -> one row)?
Yes, if you don't know how many attributes you have, you'll likely end up using hstore
or jsonb
though. See jsonb_agg()
X-Y problem?
Examining speed in order to get you to reasses JSONB and save your life. Create a million products, rename the attribute using jsonb.
CREATE TABLE products(id,name,data)
AS
SELECT x, 'foo'::text, jsonb_build_object('color', 'blue') AS data
FROM generate_series(1,1e6) AS gs(x);
UPDATE products
SET data = data-'color' || jsonb_build_object('colour', data->'color');
Time: 5063.010 ms
On my x230 laptop, with an SSD. That's 197 rows/ms? You recreating Amazon that you have to update faster than that?
Best Answer
I would design a meta-data/value based model, as:
Or, where formality is enforced,
Being
AttributeCode
andValueCode
unique, textual, human-readable identifiers, alongside the correspondingAttributeId
andValueId
.For instance, the hypothetical meta-attribute length, physically stored with
AttributeId = 1
, would (should!) be referenced in specific queries asAttributeCode = 'LENGTH'
.(of course,
AttributeCode
andValueCode
could compose the PKs... but I personally prefer PKs columns to be integer-only)