Postgresql – Storing data in PostgreSQL: One table or two

postgresqlpsqlrdbms

I've just started using PostgreSQL 9.2 and my data consists of product prices at various points in time, usually a different price every month.

Question: Because every product can have different prices over time, I am thinking of storing all the data in a single table, each row having full details of the product, its price at that time, and the time.

id  | name           | color  | manufacturer | date      | price
123   Socks 123        black    nikki          12-12-2012   123.70
124   Socks 123        black    nikki          12-01-2013   145.67
125   Graphic Tee xXx  red      departed       13-01-2013   250.80

My reason for storing everything in one table and have duplicate data in certain columns like name, color, manufacturer is to facilitate doing queries like fetching all products' prices in a particular month (can return 10k rows) will avoid costly table joins if I have a table with the product details and a foreign key in another table with just the time-cost data.

Is this a good reason for doing what I've described? Will using hstore make it easier to do such queries? Or did I get it all wrong?

Thanks!!

Best Answer

If you mix the history data in with the current like that in order to speed up queries over a time period, you do so at the expense of slowing down queries for current data. You can add an extra column to explicitly mark the relevant rows as the current prices (and have id+flag as the PK) but that adds extra work to your business logic to both keep it maintained and respect it in all reports.

Of course if you move the price completely out of the main table you have a similar problem in finding the latest price becoming more expensive unless you have a "latest" flag or do-normalise slightly and keep a copy of the current price in the main table as well as the price history table. Personally I would do the latter, and use a trigger on that product table to automatically update the price history table when a new product is added or the price updated (I'm assuming the products tables does no see massive write activity most of the time so the performance impact of using a trigger here should be minimal) in order to remove that auditing task from your other logic to avoid bugs caused by new code turning up that forgets to update the history/audit.

Caveat: I'm answering this from a general PoV as I've not used postgres much in anger, so do some benchmarks before taking anything I've said regarding performance as fact in that system!