I have a rather large MySQL database that tracks price changes for a large number of products. Each entry has a SKU, a datetime, and a price. Sometimes the price does not change from one time period to the next.
I am trying to find a more efficient way to store this data. I would like to delete every entry where the price didn't change. For example, if a product was $10 on March 10th, 13th, and 16th, and then went to $12 on the 19th, there is no need to store the duplicate prices for the 13th and 16th. We can just delete those entries, and when presenting the data, we will assume that no entry for a date means no change in price.
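The rule can be stated precisely: for each SKU, ordered by date, keep an entry only if its price differs from the previous entry's price (the first entry always stays). Reading the data back then means taking the latest kept entry at or before the requested date. A minimal Python sketch of both halves, assuming the entries for one SKU are already sorted by date; `compress` and `price_on` are hypothetical names, and the year 2024 is made up for the example:

```python
from bisect import bisect_right
from datetime import date

def compress(history):
    """Drop entries whose price equals the previous entry's price.

    `history` is a list of (date, price) tuples for ONE sku, sorted by date.
    """
    kept = []
    for day, price in history:
        # Keep the first entry, and any entry where the price changed.
        if not kept or kept[-1][1] != price:
            kept.append((day, price))
    return kept

def price_on(kept, day):
    """Price in effect on `day`: the latest kept entry at or before it."""
    dates = [d for d, _ in kept]
    i = bisect_right(dates, day)
    if i == 0:
        return None  # no price recorded yet on that date
    return kept[i - 1][1]

history = [
    (date(2024, 3, 10), 10),
    (date(2024, 3, 13), 10),
    (date(2024, 3, 16), 10),
    (date(2024, 3, 19), 12),
]
kept = compress(history)
print(price_on(kept, date(2024, 3, 16)))  # -> 10
```

Only the March 10th and March 19th entries survive, yet the price for the 16th is still recoverable. Comparing prices with `!=` is only safe for exact types (integer cents or DECIMAL), not floats.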
I am thinking the best way to do this is to run a script every day or so that cleans up the table and removes these duplicates. Is there a MySQL query that can do this? Otherwise, I guess I'll need to write a script that cycles through the whole table and does it with multiple queries.
Edit: more detail
Here is my table:
+----+-----+------+-------+
| id | sku | date | price |
+----+-----+------+-------+
id is the auto-increment primary key.
Best Answer
For the table structure given in the question:
You can run the following query to search for unchanged prices on consecutive records for the same sku:
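As a hedged illustration only (not necessarily the answer's original query), one way to express "same price as the immediately preceding record for the same sku" is a correlated subquery that fetches the previous row's price. It runs here against an in-memory SQLite database through Python's `sqlite3` module so it can be tried without a MySQL server. The table name `prices` and the sample rows are hypothetical, prices are assumed to be integer cents, and ties on `date` are broken by `id`:

```python
import sqlite3

# Hypothetical sample data: (id, sku, date, price in cents).
ROWS = [
    (1, "A", "2024-03-10", 1000),
    (2, "A", "2024-03-13", 1000),
    (3, "A", "2024-03-16", 1000),
    (4, "A", "2024-03-19", 1200),
    (5, "B", "2024-03-10", 500),
    (6, "B", "2024-03-13", 600),
    (7, "B", "2024-03-16", 500),
]

FIND_UNCHANGED = """
SELECT cur.id
FROM prices AS cur
WHERE cur.price = (
    -- price of the immediately preceding record for the same sku
    SELECT prev.price
    FROM prices AS prev
    WHERE prev.sku = cur.sku
      AND (prev.date < cur.date
           OR (prev.date = cur.date AND prev.id < cur.id))
    ORDER BY prev.date DESC, prev.id DESC
    LIMIT 1
)
ORDER BY cur.id
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices (id INTEGER PRIMARY KEY, sku TEXT, date TEXT, price INTEGER)"
)
conn.executemany("INSERT INTO prices VALUES (?, ?, ?, ?)", ROWS)

unchanged = [row[0] for row in conn.execute(FIND_UNCHANGED)]
print(unchanged)  # -> [2, 3]
```

The first row of each sku has no predecessor, so the subquery yields NULL and `cur.price = NULL` never matches; that row is always kept. Row 7 is also kept, because its predecessor (row 6) has a different price.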
Those are the records to be deleted. So:
If you have duplicated records (same sku, price, and date), it deletes the ones with the higher id. This assumes that a record with a higher id always has the same or a later date (which normally holds with auto_increment, as long as rows are never updated).
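A sketch of the corresponding DELETE, again on SQLite via `sqlite3` with a hypothetical `prices` table, including an exact duplicate row (ids 4 and 5) to show that the higher id is the one removed. MySQL refuses a DELETE whose subquery reads the table being deleted from (error 1093), and wrapping the id list in a derived table is the common workaround; I have not verified that workaround on every MySQL version, since the optimizer may merge the derived table. SQLite does not need the wrapper but accepts it:

```python
import sqlite3

# Hypothetical sample data: (id, sku, date, price in cents); id 5 duplicates id 4.
ROWS = [
    (1, "A", "2024-03-10", 1000),
    (2, "A", "2024-03-13", 1000),
    (3, "A", "2024-03-16", 1000),
    (4, "A", "2024-03-19", 1200),
    (5, "A", "2024-03-19", 1200),
    (6, "B", "2024-03-10", 500),
    (7, "B", "2024-03-13", 600),
]

DELETE_UNCHANGED = """
DELETE FROM prices
WHERE id IN (
    -- derived table: the usual MySQL workaround for error 1093
    SELECT id FROM (
        SELECT cur.id
        FROM prices AS cur
        WHERE cur.price = (
            SELECT prev.price
            FROM prices AS prev
            WHERE prev.sku = cur.sku
              AND (prev.date < cur.date
                   OR (prev.date = cur.date AND prev.id < cur.id))
            ORDER BY prev.date DESC, prev.id DESC
            LIMIT 1
        )
    ) AS doomed
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices (id INTEGER PRIMARY KEY, sku TEXT, date TEXT, price INTEGER)"
)
conn.executemany("INSERT INTO prices VALUES (?, ?, ?, ?)", ROWS)

with conn:  # commits the DELETE
    deleted = conn.execute(DELETE_UNCHANGED).rowcount
remaining = [row[0] for row in conn.execute("SELECT id FROM prices ORDER BY id")]
print(deleted, remaining)  # -> 3 [1, 4, 6, 7]
```

Row 5 is removed because, with the `id` tie-break, its predecessor is row 4, which has the same price.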
Please note that this is a correlated subquery, which costs O(n^2) row operations, where n is the number of records. You can solve this in O(n) programmatically, with a single table scan, if you can afford O(n) memory.
EDIT: Changed the query for the proposed structure, handling duplicated values.
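The programmatic single-scan approach could look roughly like this. If the rows are read ordered by (sku, date, id), the scan only has to remember the previous row's sku and price; the memory cost comes from the list of ids to delete, which can grow to O(n). The sketch uses `sqlite3` as a stand-in for a MySQL connection; the table name `prices`, the helper names, and the batch size of 500 (kept below SQLite's old 999-parameter default) are assumptions:

```python
import sqlite3

def find_redundant_ids(conn):
    """One ordered scan: ids whose price equals the previous row of the same sku."""
    redundant = []
    prev_sku, prev_price = None, None
    for row_id, sku, price in conn.execute(
        "SELECT id, sku, price FROM prices ORDER BY sku, date, id"
    ):
        if sku == prev_sku and price == prev_price:
            redundant.append(row_id)
        prev_sku, prev_price = sku, price
    return redundant

def delete_ids(conn, ids, batch_size=500):
    """Delete the collected ids in batches inside one transaction."""
    with conn:
        for start in range(0, len(ids), batch_size):
            batch = ids[start:start + batch_size]
            placeholders = ",".join("?" * len(batch))
            conn.execute(f"DELETE FROM prices WHERE id IN ({placeholders})", batch)

# Hypothetical sample data: (id, sku, date, price in cents); id 5 duplicates id 4.
ROWS = [
    (1, "A", "2024-03-10", 1000),
    (2, "A", "2024-03-13", 1000),
    (3, "A", "2024-03-16", 1000),
    (4, "A", "2024-03-19", 1200),
    (5, "A", "2024-03-19", 1200),
    (6, "B", "2024-03-10", 500),
    (7, "B", "2024-03-13", 600),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices (id INTEGER PRIMARY KEY, sku TEXT, date TEXT, price INTEGER)"
)
conn.executemany("INSERT INTO prices VALUES (?, ?, ?, ?)", ROWS)

ids = find_redundant_ids(conn)
delete_ids(conn, ids)
remaining = [row[0] for row in conn.execute("SELECT id FROM prices ORDER BY id")]
print(ids, remaining)  # -> [2, 3, 5] [1, 4, 6, 7]
```

The ids are collected completely before any DELETE runs, so the table is never modified while the scan cursor is still reading it. An index on (sku, date, id) would let the database serve the ORDER BY without a full sort.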