Postgresql – acceptable design

database-designpostgresql

Let's say I am collecting information on products over time from different retailers. I would want to query the product's pricing over time from all the retailers as well as the retailer's price for the individual product over time, among other things. How would the design below stand up to the task?
(Postgres)

Product
-------
id = PK(int)
upc = VarChar(20, unique)
prices = FK(Price(int))
retailers = M2M(Retailer(int))

Retailer
--------
id = PK(int)
name = VarChar(124, unique)
products = M2M(Product(int))
prices = FK(Price(int))

Price
-----
id = PK(int)
timestamp = DateTime()
product = FK(Product(int))
retailer = FK(Retailer(int))
product_price = Decimal()
shipping_price = Decimal()

I am a little concerned with what seems like circular relationships, for example: do I need the prices relationship to also be present in the Retailer table? But it makes sense in terms of the queries I envision. Any suggestions? Also, how would you go about tracking whether it was in stock seeing as that varies depending on retailer?

Best Answer

Your product table should have one row per UPC (assuming that uniquely identifies a product). It shouldn't reference prices or retailers.

Your retailer table should have one row per store. It shouldn't reference products or prices.

Product
  product_id pk integer,
  upc varchar(20) unique

Retailer
  retailer_id pk integer,
  name varchar(128) unique

You can then have an intersection table that tells you which retailer sells which products

Retailer_Product
  retailer_id integer references retailer( retailer_id )
  product_id  integer references product( product_id )
  pk( retailer_id, product_id )

If you want to be able to track that a particular retailer used to sell a particular product but no longer does so, you could add dates to indicate when they started and stopped selling the product. If you want to track that and allow the retailer to stop selling the product for a while and then start selling it again, you'd need to tweak the primary key.

Normally, for pricing, you'd store something like

Product_Pricing
  product_pricing_id integer pk,
  retailer_id integer references retailer(retailer_id),
  product_id  integer references product(product_id),
  price       integer, -- I'm following your example, I wouldn't store prices as integers
  from_ts     timestamp,
  to_ts       timestamp

This lets me indicate that a particular retailer offered a particular product for sale at a particular price for the period between the two timestamps. If the product is sold at the same price for months at a time, I only need one row in the table for that. If I want to change the price of the product on a minute-by-minute basis, I can do that as well.

If you are capturing prices rather than controlling price changes, it might make sense to write 1 row to the pricing table for every product for every retailer for every day (or whatever frequency you are polling for pricing on). If you were building a price comparison site, you'll likely want a different data model than if you are building an Amazon site that allows different retailers to sell goods and manage their own pricing. If you're intending to poll the actual source of truth once a day and want to have a row for every product for every retailer for every day, something like this would be reasonable.

Product_Pricing
  product_pricing_id integer pk,
  poll_dt     date,
  retailer_id integer references retailer(retailer_id),
  product_id  integer references product(product_id),
  price       integer, -- I'm following your example, I wouldn't store prices as integers
  in_stock    boolean

Depending on how frequently prices change or go out of stock, how many products and retailers you have, and how long you intend to retain this information, you might want to combine multiple otherwise identical rows

Product_Pricing
  product_pricing_id integer pk,
  first_poll_dt date,
  last_poll_dt  date,
  retailer_id integer references retailer(retailer_id),
  product_id  integer references product(product_id),
  price       integer, -- I'm following your example, I wouldn't store prices as integers
  in_stock    boolean,