How to normalize a table that is indirectly dependent on a has-many relationship

normalization

I currently have the following database design:

Product: contains various description attributes
- Belongs to many ProductSets
ProductSet:
- Has many Products
- Belongs to many Stores
Store:
- Has one ProductSet
- Must, through some set of relationships, specify the price for each product in its product set

I'm specifically struggling with the best way to incorporate the final bullet in a normalized way. So far, the only solution I've come up with is to have a join table between Store and Product, but this doesn't guarantee that the store has prices for all of (and only) the products in its ProductSet. What (if any) is the best way to accomplish this?

Best Answer

You can create a table ProductPrice:

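CREATE TABLE ProductPrice (
  ProductID int NOT NULL,
  ProductSetID int NOT NULL,
  StoreID int NOT NULL,
  Price numeric(10,2) NOT NULL,
  PRIMARY KEY (ProductID, ProductSetID, StoreID),
  -- the product must be a member of the product set
  FOREIGN KEY (ProductID, ProductSetID) REFERENCES ProductProductSet (ProductID, ProductSetID),
  -- the store must use that same product set;
  -- this requires a UNIQUE (StoreID, ProductSetID) constraint on Store
  FOREIGN KEY (StoreID, ProductSetID) REFERENCES Store (StoreID, ProductSetID)
);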

Here, ProductProductSet is the join table that implements the many-to-many relationship between Product and ProductSet.

Together, these two foreign keys ensure that a price can only be recorded for a product that belongs to the store's ProductSet. They cannot guarantee that every such product actually has a price, so you can use a query to find out which prices are missing:

SELECT s.*, p.*
FROM Product p
JOIN ProductProductSet pps ON (pps.ProductID = p.ProductID)
JOIN Store s ON (s.ProductSetID = pps.ProductSetID)
LEFT JOIN ProductPrice pp ON (pp.ProductID = p.ProductID AND pp.StoreID = s.StoreID)
WHERE pp.ProductID IS NULL
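
As a rough check of the approach above, the following sketch builds the schema in an in-memory SQLite database from Python and tries a few inserts. It assumes SQLite rather than whatever database you actually use, adds the UNIQUE (StoreID, ProductSetID) constraint on Store that the second foreign key needs, and uses made-up IDs and prices. The helper names open_demo_db, try_insert_price, and missing_prices are hypothetical.

```python
import sqlite3

# Assumed minimal schema; the real Product table would carry more attributes.
SCHEMA = """
CREATE TABLE Product (ProductID INTEGER PRIMARY KEY);
CREATE TABLE ProductSet (ProductSetID INTEGER PRIMARY KEY);
CREATE TABLE ProductProductSet (
  ProductID INTEGER NOT NULL REFERENCES Product (ProductID),
  ProductSetID INTEGER NOT NULL REFERENCES ProductSet (ProductSetID),
  PRIMARY KEY (ProductID, ProductSetID)
);
CREATE TABLE Store (
  StoreID INTEGER PRIMARY KEY,
  ProductSetID INTEGER NOT NULL REFERENCES ProductSet (ProductSetID),
  UNIQUE (StoreID, ProductSetID)  -- target of the composite foreign key
);
CREATE TABLE ProductPrice (
  ProductID INTEGER NOT NULL,
  ProductSetID INTEGER NOT NULL,
  StoreID INTEGER NOT NULL,
  Price NUMERIC NOT NULL,
  PRIMARY KEY (ProductID, ProductSetID, StoreID),
  FOREIGN KEY (ProductID, ProductSetID)
    REFERENCES ProductProductSet (ProductID, ProductSetID),
  FOREIGN KEY (StoreID, ProductSetID)
    REFERENCES Store (StoreID, ProductSetID)
);
"""

# Made-up sample data: set 10 holds products 1 and 2, set 20 holds product 3,
# and store 100 uses set 10.
SAMPLE_DATA = """
INSERT INTO Product VALUES (1), (2), (3);
INSERT INTO ProductSet VALUES (10), (20);
INSERT INTO ProductProductSet VALUES (1, 10), (2, 10), (3, 20);
INSERT INTO Store VALUES (100, 10);
"""

MISSING_PRICES = """
SELECT s.StoreID, p.ProductID
FROM Product p
JOIN ProductProductSet pps ON pps.ProductID = p.ProductID
JOIN Store s ON s.ProductSetID = pps.ProductSetID
LEFT JOIN ProductPrice pp
  ON pp.ProductID = p.ProductID AND pp.StoreID = s.StoreID
WHERE pp.ProductID IS NULL
ORDER BY s.StoreID, p.ProductID
"""


def open_demo_db():
    """Create the in-memory database with foreign key enforcement enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    conn.executescript(SAMPLE_DATA)
    return conn


def try_insert_price(conn, product_id, product_set_id, store_id, price):
    """Return True if the price row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO ProductPrice VALUES (?, ?, ?, ?)",
                (product_id, product_set_id, store_id, price),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def missing_prices(conn):
    """List (StoreID, ProductID) pairs that should have a price but do not."""
    return conn.execute(MISSING_PRICES).fetchall()


if __name__ == "__main__":
    conn = open_demo_db()
    print(try_insert_price(conn, 1, 10, 100, 9.99))  # product 1 is in set 10
    print(try_insert_price(conn, 3, 20, 100, 4.50))  # store 100 does not use set 20
    print(try_insert_price(conn, 3, 10, 100, 4.50))  # product 3 is not in set 10
    print(missing_prices(conn))
```

With this sample data, the first insert is accepted and the other two are rejected by the foreign keys, while the missing-price query still reports product 2 for store 100. The "only" half of the requirement is enforced by the schema; the "all of" half has to be checked by a query like this one.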

Related Solutions

Loading multiValued fields into Solr via flat file, and possibly value position preservation in those fields

Getting multiValued data into Solr via CSV:

The Solr documentation describes a "split" option in UpdateCSV. Essentially, it parses a field's value using a second CSV parser, so a single column can yield multiple values. See the "split" section of the Solr UpdateCSV documentation. The parameters look like this (adjust the field name, separator, and encapsulator as necessary):

f.fieldA.split=true&f.fieldA.separator=%2C&f.fieldA.encapsulator='
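
If you build the request from code, a small helper can generate these parameters with correct URL encoding. This is only a sketch: split_params is a hypothetical helper name, and the defaults simply mirror the example above.

```python
from urllib.parse import urlencode


def split_params(field, separator=",", encapsulator="'"):
    """Build the per-field split parameters as a URL-encoded query string."""
    return urlencode({
        f"f.{field}.split": "true",
        f"f.{field}.separator": separator,
        f"f.{field}.encapsulator": encapsulator,
    })


print(split_params("fieldA"))
# f.fieldA.split=true&f.fieldA.separator=%2C&f.fieldA.encapsulator=%27
```

The %27 in the output is just the URL-encoded form of the single quote used as the encapsulator.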

Getting multiValued data from separate fields to maintain position:

Since asking this question, I've done some reading about dimensional models. It seems that what I was trying to do is poor design, because it places too many expectations on the application, too much complexity in the warehouse, or both.

When trying to preserve the relationships between two field values on a single record, it's better to store them separately as well as together. Here's a comparison of my former input to the new input:

Former CSV input:

name|licenseState|licenseType
Josh|MA,CA|123,456
Fred|MD,OH|789,123

Transformed CSV input:

name|licenseState|licenseType|licenseStateType
Josh|MA,CA|123,456|MA123,CA456
Fred|MD,OH|789,123|MD789,OH123

This way your application can use the licenseState and licenseType dimension values independently, or it can use the licenseStateType dimension values, all without requiring complicated app or warehouse logic.
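
One way to produce the transformed input is a small preprocessing step before loading. The sketch below assumes the pipe-delimited layout shown above and that the two multi-valued columns always have the same number of entries; add_combined_field is a hypothetical helper name.

```python
import csv
import io

FORMER_INPUT = """name|licenseState|licenseType
Josh|MA,CA|123,456
Fred|MD,OH|789,123
"""


def add_combined_field(text):
    """Append a licenseStateType column that pairs states and types by position."""
    reader = csv.DictReader(io.StringIO(text), delimiter="|")
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=reader.fieldnames + ["licenseStateType"],
        delimiter="|",
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        states = row["licenseState"].split(",")
        types = row["licenseType"].split(",")
        if len(states) != len(types):
            # Positions cannot be paired reliably, so fail loudly.
            raise ValueError(f"mismatched license values for {row['name']}")
        row["licenseStateType"] = ",".join(s + t for s, t in zip(states, types))
        writer.writerow(row)
    return out.getvalue()


print(add_combined_field(FORMER_INPUT), end="")
```

Run on the former input, this prints the transformed input shown above.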

Mysql – Is normalization advisable on transactional data

You can definitely keep all your dimensions and measures in one fact table and not use any dimension tables. Make sure your OLAP tool supports this though.

Normalizing out your dimensions into other tables is done mostly to minimize the size of the fact table, which can get large fast.

With no dimension tables you're looking at about 336 MB per year (not counting indexes), which isn't so bad.

With dimension tables, you're looking at about 34 MB per year, plus a couple dozen MB for storing dimension details. Indexes will be smaller too.

You'll want to expand your date column into something more analyzable (year, month, quarter, etc), which will add to the size.
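
As an illustration of that expansion, here is a minimal sketch that derives a few analyzable attributes from a single date. The column names and the choice of attributes are assumptions; pick whatever your reports actually group by.

```python
from datetime import date


def expand_date(d):
    """Derive year, quarter, month, and day columns from one date value."""
    return {
        "date": d.isoformat(),
        "year": d.year,
        "quarter": (d.month - 1) // 3 + 1,  # months 1-3 -> Q1, 4-6 -> Q2, ...
        "month": d.month,
        "day": d.day,
    }


print(expand_date(date(2013, 8, 15)))
# {'date': '2013-08-15', 'year': 2013, 'quarter': 3, 'month': 8, 'day': 15}
```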

You'll want to index all fields. Drop indexes before insert, add them after.
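
The drop, load, re-create pattern might look like the sketch below. It uses an in-memory SQLite database purely so the example is runnable; the sales table, its columns, and the index name are hypothetical, and MySQL's DROP INDEX and CREATE INDEX syntax differs slightly.

```python
import sqlite3


def bulk_load(conn, rows):
    """Drop the index, insert all rows in one transaction, then rebuild the index."""
    conn.execute("DROP INDEX IF EXISTS idx_sales_store")
    with conn:  # one transaction for the whole batch
        conn.executemany(
            "INSERT INTO sales (store_id, amount) VALUES (?, ?)", rows
        )
    conn.execute("CREATE INDEX idx_sales_store ON sales (store_id)")


def index_names(conn):
    """Return the names of the indexes currently defined on the sales table."""
    cursor = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'sales'"
    )
    return [name for (name,) in cursor]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store_id INTEGER, amount REAL)")
conn.execute("CREATE INDEX idx_sales_store ON sales (store_id)")
bulk_load(conn, [(store_id % 5, 1.0) for store_id in range(1000)])
```

The index is rebuilt once after the load instead of being maintained row by row during the inserts, which is the point of the advice above.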

You can use a tool like Pentaho Aggregation Designer to find useful aggregates and generate them for you.
