Why not just use distinct on
like this:
INSERT INTO entries(username, date, time, x0, x1, x2, x3, x4, x5, x6)
SELECT DISTINCT ON (username, date, time, x0, x1, x2) username, date, time, x0, x1, x2, x3, x4, x5, x6
FROM mytesttable;
?
But beware that this will retain the first row for every distinct combination -- which might not be the right thing without an order by
clause this. But since you don't order in your question I presume you either don't care or haven't thought that far yet.
Your product table should have one row per UPC (assuming that uniquely identifies a product). It shouldn't reference prices or retailers.
Your retailer table should have one row per store. It shouldn't reference products or prices.
Product
product_id pk integer,
upc varchar(20) unique
Retailer
retailer_id pk integer,
name varchar(128) unique
You can then have an intersection table that tells you which retailer sells which products
Retailer_Product
retailer_id integer references retailer( retailer_id )
product_id integer references product( product_id )
pk( retailer_id, product_id )
If you want to be able to track that a particular retailer used to sell a particular product but no longer does so, you could add dates to indicate when they started and stopped selling the product. If you want to track that and allow the retailer to stop selling the product for a while and then start selling it again, you'd need to tweak the primary key.
Normally, for pricing, you'd store something like
Product_Pricing
product_pricing_id integer pk,
retailer_id integer references retailer(retailer_id),
product_id integer references product(product_id),
price integer, -- I'm following your example, I wouldn't store prices as integers
from_ts timestamp,
to_ts timestamp
This lets me indicate that a particular retailer offered a particular product for sale at a particular price for the period between the two timestamps. If the product is sold at the same price for months at a time, I only need one row in the table for that. If I want to change the price of the product on a minute-by-minute basis, I can do that as well.
If you are capturing prices rather than controlling price changes, it might make sense to write 1 row to the pricing table for every product for every retailer for every day (or whatever frequency you are polling for pricing on). If you were building a price comparison site, you'll likely want a different data model than if you are building an Amazon site that allows different retailers to sell goods and manage their own pricing. If you're intending to poll the actual source of truth once a day and want to have a row for every product for every retailer for every day, something like this would be reasonable.
Product_Pricing
product_pricing_id integer pk,
poll_dt date,
retailer_id integer references retailer(retailer_id),
product_id integer references product(product_id),
price integer, -- I'm following your example, I wouldn't store prices as integers
in_stock boolean
Depending on how frequently prices change or go out of stock, how many products and retailers you have, and how long you intend to retain this information, you might want to combine multiple otherwise identical rows
Product_Pricing
product_pricing_id integer pk,
first_poll_dt date,
last_poll_dt date,
retailer_id integer references retailer(retailer_id),
product_id integer references product(product_id),
price integer, -- I'm following your example, I wouldn't store prices as integers
in_stock boolean,
Best Answer
It means with the same statement. You can truncate more than one tables:
More details in Postgres docs:
TRUNCATE
.