Let's say I am collecting information on products over time from different retailers. I would want to query the product's pricing over time from all the retailers as well as the retailer's price for the individual product over time, among other things. How would the design below stand up to the task?
(Postgres)
Product
-------
id = PK(int)
upc = VarChar(20, unique)
prices = FK(Price(int))
retailers = M2M(Retailer(int))
Retailer
--------
id = PK(int)
name = VarChar(124, unique)
products = M2M(Product(int))
prices = FK(Price(int))
Price
-----
id = PK(int)
timestamp = DateTime()
product = FK(Product(int))
retailer = FK(Retailer(int))
product_price = Decimal()
shipping_price = Decimal()
I am a little concerned with what seems like circular relationships, for example: do I need the prices relationship to also be present in the Retailer table? But it makes sense in terms of the queries I envision. Any suggestions? Also, how would you go about tracking whether it was in stock seeing as that varies depending on retailer?
Best Answer
Your product table should have one row per UPC (assuming that uniquely identifies a product). It shouldn't reference prices or retailers.
Your retailer table should have one row per store. It shouldn't reference products or prices.
You can then have an intersection table that tells you which retailer sells which products
If you want to be able to track that a particular retailer used to sell a particular product but no longer does so, you could add dates to indicate when they started and stopped selling the product. If you want to track that and allow the retailer to stop selling the product for a while and then start selling it again, you'd need to tweak the primary key.
Normally, for pricing, you'd store something like
This lets me indicate that a particular retailer offered a particular product for sale at a particular price for the period between the two timestamps. If the product is sold at the same price for months at a time, I only need one row in the table for that. If I want to change the price of the product on a minute-by-minute basis, I can do that as well.
If you are capturing prices rather than controlling price changes, it might make sense to write 1 row to the pricing table for every product for every retailer for every day (or whatever frequency you are polling for pricing on). If you were building a price comparison site, you'll likely want a different data model than if you are building an Amazon site that allows different retailers to sell goods and manage their own pricing. If you're intending to poll the actual source of truth once a day and want to have a row for every product for every retailer for every day, something like this would be reasonable.
Depending on how frequently prices change or go out of stock, how many products and retailers you have, and how long you intend to retain this information, you might want to combine multiple otherwise identical rows