I personally don't like to use a multi-table schema for this purpose.
- It's hard to ensure integrity.
- It's hard to maintain.
- It's difficult to filter results.
I've set a dbfiddle sample.
My proposed table schema:
CREATE TABLE #Brands
(
BrandId int NOT NULL PRIMARY KEY,
BrandName nvarchar(100) NOT NULL
);
CREATE TABLE #Clothes
(
ClothesId int NOT NULL PRIMARY KEY,
ClothesName nvarchar(100) NOT NULL
);
-- Lookup table for known attributes
--
CREATE TABLE #Attributes
(
AttrId int NOT NULL PRIMARY KEY,
AttrName nvarchar(100) NOT NULL
);
-- holds common propeties, url, price, etc.
--
CREATE TABLE #BrandsClothes
(
BrandId int NOT NULL REFERENCES #Brands(BrandId),
ClothesId int NOT NULL REFERENCES #Clothes(ClothesId),
VievingUrl nvarchar(300) NOT NULL,
Price money NOT NULL,
PRIMARY KEY CLUSTERED (BrandId, ClothesId),
INDEX IX_BrandsClothes NONCLUSTERED (ClothesId, BrandId)
);
-- holds specific and unlimited attributes
--
CREATE TABLE #BCAttributes
(
BrandId int NOT NULL REFERENCES #Brands(BrandId),
ClothesId int NOT NULL REFERENCES #Clothes(ClothesId),
AttrId int NOT NULL REFERENCES #Attributes(AttrId),
AttrValue nvarchar(300) NOT NULL,
PRIMARY KEY CLUSTERED (BrandId, ClothesId, AttrId),
INDEX IX_BCAttributes NONCLUSTERED (ClothesId, BrandId, AttrId)
);
Let me insert some data:
INSERT INTO #Brands VALUES
(1, 'Brand1'), (2, 'Brand2');
INSERT INTO #Clothes VALUES
(1, 'Pants'), (2, 'T-Shirt');
INSERT INTO #Attributes VALUES
(1, 'Color'), (2, 'Size'), (3, 'Shape'), (4, 'Provider'), (0, 'Custom');
INSERT INTO #BrandsClothes VALUES
(1, 1, 'http://mysite.com?B=1&C=1', 123.99),
(1, 2, 'http://mysite.com?B=1&C=2', 110.99),
(2, 1, 'http://mysite.com?B=2&C=1', 75.99),
(2, 2, 'http://mysite.com?B=2&C=2', 85.99);
INSERT INTO #BCAttributes VALUES
(1, 1, 1, 'Blue, Red, White'),
(1, 1, 2, '32, 33, 34'),
(1, 2, 1, 'Pearl, Black widow'),
(1, 2, 2, 'M, L, XL'),
(2, 1, 4, 'Levis, G-Star, Armani'),
(2, 1, 3, 'Slim fit, Regular fit, Custom fit'),
(2, 2, 4, 'G-Star, Armani'),
(2, 2, 3, 'Slim fit, Regular fit'),
(2, 2, 0, '15% Discount');
If you need to fetch common attributes:
SELECT b.BrandName, c.ClothesName, bc.VievingUrl, bc.Price
FROM #BrandsClothes bc
INNER JOIN #Brands b
ON b.BrandId = bc.BrandId
INNER JOIN #Clothes c
ON c.ClothesId = bc.ClothesId
ORDER BY bc.BrandId, bc.ClothesId;
BrandName ClothesName VievingUrl Price
--------- ----------- ------------------------- ------
Brand1 Pants http://mysite.com?B=1&C=1 123.99
Brand1 T-Shirt http://mysite.com?B=1&C=2 110.99
Brand2 Pants http://mysite.com?B=2&C=1 75.99
Brand2 T-Shirt http://mysite.com?B=2&C=2 85.99
Or you can easily get Clothes by Brand:
Give me all clothes of Brand2
SELECT c.ClothesName, b.BrandName, a.AttrName, bca.AttrValue
FROM #BCAttributes bca
INNER JOIN #BrandsClothes bc
ON bc.BrandId = bca.BrandId
AND bc.ClothesId = bca.ClothesId
INNER JOIN #Brands b
ON b.BrandId = bc.BrandId
INNER JOIN #Clothes c
ON c.ClothesId = bc.ClothesId
INNER JOIN #Attributes a
ON a.AttrId = bca.AttrId
WHERE bca.ClothesId = 2
ORDER BY bca.ClothesId, bca.BrandId, bca.AttrId;
ClothesName BrandName AttrName AttrValue
----------- --------- -------- ---------------------
T-Shirt Brand1 Color Pearl, Black widow
T-Shirt Brand1 Size M, L, XL
T-Shirt Brand2 Custom 15% Discount
T-Shirt Brand2 Shape Slim fit, Regular fit
T-Shirt Brand2 Provider G-Star, Armani
But for me, one of the best of this schema is that you can filter by Attibutes:
Give me all Clothes that has the attribute: Size
SELECT c.ClothesName, b.BrandName, a.AttrName, bca.AttrValue
FROM #BCAttributes bca
INNER JOIN #BrandsClothes bc
ON bc.BrandId = bca.BrandId
AND bc.ClothesId = bca.ClothesId
INNER JOIN #Brands b
ON b.BrandId = bc.BrandId
INNER JOIN #Clothes c
ON c.ClothesId = bc.ClothesId
INNER JOIN #Attributes a
ON a.AttrId = bca.AttrId
WHERE bca.AttrId = 2
ORDER BY bca.ClothesId, bca.BrandId, bca.AttrId;
ClothesName BrandName AttrName AttrValue
----------- --------- -------- ----------
Pants Brand1 Size 32, 33, 34
T-Shirt Brand1 Size M, L, XL
Using a multi-table schema whatever of the previous queries will require to deal with an unlimited number of tables, or with XML or JSON fields.
Another option with this schema, is that you can define templates, for example, you could add a new table BrandAttrTemplates. Every time you add a new record you can use a trigger or a SP to generate a set of a predefined attributes for this Branch.
I'm sorry, I'd like to extend my explanations by I think it is more clear than my English.
Update
My current answer should works on no matter which RDBMS. According to your comments, if you need to filter attributes values I'd suggest small changes.
As far as MS-Sql doesn't allow arrays, I've set up a new sample mantaining same table schema, but changing AttrValue to an ARRAY field type.
In fact, using POSTGRES, you can take advantatge of this array using a GIN index.
(Let me say that @EvanCarrol has a good knowledge about Postgres, certainly better than me. But let me add my bit.)
CREATE TABLE BCAttributes
(
BrandId int NOT NULL REFERENCES Brands(BrandId),
ClothesId int NOT NULL REFERENCES Clothes(ClothesId),
AttrId int NOT NULL REFERENCES Attrib(AttrId),
AttrValue text[],
PRIMARY KEY (BrandId, ClothesId, AttrId)
);
CREATE INDEX ix_attributes on BCAttributes(ClothesId, BrandId, AttrId);
CREATE INDEX ix_gin_attributes on BCAttributes using GIN (AttrValue);
INSERT INTO BCAttributes VALUES
(1, 1, 1, '{Blue, Red, White}'),
(1, 1, 2, '{32, 33, 34}'),
(1, 2, 1, '{Pearl, Black widow}'),
(1, 2, 2, '{M, L, XL}'),
(2, 1, 4, '{Levis, G-Star, Armani}'),
(2, 1, 3, '{Slim fit, Regular fit, Custom fit}'),
(2, 2, 4, '{G-Star, Armani}'),
(2, 2, 3, '{Slim fit, Regular fit}'),
(2, 2, 0, '{15% Discount}');
Now, you can additionally query using individual attributes values like:
Give me a list of all pants Size:33
AttribId = 2 AND ARRAY['33'] && bca.AttrValue
SELECT c.ClothesName, b.BrandName, a.AttrName, array_to_string(bca.AttrValue, ', ')
FROM BCAttributes bca
INNER JOIN BrandsClothes bc
ON bc.BrandId = bca.BrandId
AND bc.ClothesId = bca.ClothesId
INNER JOIN Brands b
ON b.BrandId = bc.BrandId
INNER JOIN Clothes c
ON c.ClothesId = bc.ClothesId
INNER JOIN Attrib a
ON a.AttrId = bca.AttrId
WHERE bca.AttrId = 2
AND ARRAY['33'] && bca.AttrValue
ORDER BY bca.ClothesId, bca.BrandId, bca.AttrId;
This is the result:
clothes name | brand name | attribute | values
------------- ------------ ---------- ----------------
Pants Brand1 Size 32, 33, 34
Best Answer
You may want to try the retrieval (re trie val) tree, as known as the TRIE. Some refer to this as a Radix Tree.
The idea is to create tree node structures that contain branches for every character your data field can possibly contain.
Let's use a simple case, a numeric field. Obviously, the character range is 0-9. Each tree node would contain ten(10) branches. Let's take the worst case for a 4 byte unsigned integer, 2^32 - 1, which is 4294967295. What is its length? Just compute the length by take the integer of the log base 10 of 4294967295 and adding 1.
So, you would have a TRIE with a maximum height of 10. Starting at the root of the TRIE, if you have the number 4294967295, you traverse branches 4,2,9,4,9,6,7,2,9, and 5. At each branch, you would perform an array-styled binary search.
If the branch located at that TRIE node is an exact match, you can assign a percentage to that level, and recursively walk down that branch to check for the next digit and return percentages from deeper TRIE nodes to add to the percentage you have at the currently searched TRIE node.
If the branch located at that TRIE node is NOT an exact match, you stop your recursive search there and return either 0 or some other percentage you may want to designate.
Given the sum of return values from all searched TRIE nodes, you may want to sum up the percentages and divide that answer by the length of the string. In other words,
Pct per Node = (1 / (Number of TRIE nodes that need to be searched)) or Zero(0).
Sum(Pct) = (Number of TRIE nodes exactly matched) / (Number of TRIE nodes that need to be searched [length of the string being searched]).
Given the length of the numberic field you store, you have O(log n) due to field length. For each TRIE node, you have O(log n) for searching for the proper branch. Overall, your search should have O(log (log n)) search time.
This performance stands out ever more if the field is alphanumeric. Assuming using only ASCII, each TRIE node would have 256 branches. The height of the TRIE would depend on the length of the character field. Representing this TRIE for variable-length strings would produce TRIE nodes that would be very sparse, but quickly searchable nonetheless.
Regardless what database you use, carefuly plan the data types you will be using to represent the TRIE node. You may also want to partition the table so that strings of length n terminate in partition n. Thus, you will have O(log n) search time at each partition.
http://en.wikipedia.org/wiki/Trie
http://www.eecs.harvard.edu/~ellard/Q-97/HTML/root/node24.html
http://www.webreference.com/js/tips/000318.html
http://en.wikipedia.org/wiki/Radix_tree