In a previous question, "How to merge data sets without including redundant rows?", I asked about filtering redundant historical data during import, but @DavidSpillett correctly replied that I couldn't do what I was trying to do.
Instead of filtering the table during import, I now want to create a view on the table that returns only records where the price has changed.
Here's the original scenario rephrased to suit this question:
We have a table of historical prices for items. The table contains rows where the same price is recorded for multiple dates. I want to create a view on this data which only shows price changes over time, so if a price changes from A to B I want to see it, but if it "changes" from B to B then I don't want to see it.
Example: if the price yesterday was $1, and the price today is $1, and there were no other price changes, then the price today can be inferred from the price yesterday so I only need the record from yesterday.
Example (http://sqlfiddle.com/#!3/c95ff/1):
Table data:
Effective            Product  Kind  Price
2013-04-23T00:23:00  1234     1     1.00
2013-04-24T00:24:00  1234     1     1.00   -- redundant, implied by record 1
2013-04-25T00:25:00  1234     1     1.50
2013-04-26T00:26:00  1234     1     2.00
2013-04-27T00:27:00  1234     1     2.00   -- redundant, implied by record 4
2013-04-28T00:28:00  1234     1     1.00   -- not redundant, price changed back to 1.00
Expected view data:
Effective            Product  Kind  Price
2013-04-23T00:23:00  1234     1     1.00
2013-04-25T00:25:00  1234     1     1.50
2013-04-26T00:26:00  1234     1     2.00
2013-04-28T00:28:00  1234     1     1.00
My initial attempt used ROW_NUMBER:
SELECT
    Effective,
    Product,
    Kind,
    Price
FROM
(
    SELECT
        History.*,
        ROW_NUMBER() OVER
        (
            PARTITION BY
                Product,
                Kind,
                Price
            ORDER BY
                Effective ASC
        ) AS RowNumber
    FROM History
) H
WHERE RowNumber = 1
ORDER BY Effective
Which returned:
Effective            Product  Kind  Price
2013-04-23T00:23:00  1234     1     1.00
-- not 2013-04-24, good
2013-04-25T00:25:00  1234     1     1.50
2013-04-26T00:26:00  1234     1     2.00
-- not 2013-04-27, good
-- not 2013-04-28, bad
I tried searching for a similar question or answer, but it's hard to work out how to phrase the search; an example is worth a lot of words.
Any suggestions appreciated. Thanks
Best Answer
SQLfiddle
This is a good execution plan for the small number of rows given in the question. For a larger table, the ideal index for this query is:
That index is essentially the clustered index keys in a more helpful order. Depending on how the table is used for other queries, it might be better to replace the clustered index instead of creating this new index.
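As an illustration only, an index that lets the engine seek to a (Product, Kind) pair and then read that pair's rows in Effective order could be sketched as below. The index name, the key order, and the included column are assumptions based on the column names in the question, and may differ from the index this answer has in mind.

```sql
-- Hypothetical index: the name, key order, and INCLUDE list are assumptions.
-- Keys group rows by (Product, Kind) and order them by Effective;
-- Price is included so the query never has to touch the base table.
CREATE NONCLUSTERED INDEX IX_History_Product_Kind_Effective
ON History (Product, Kind, Effective)
INCLUDE (Price);
```

If the table's clustered index already has these three columns as its keys, recreating it in this order would serve the same purpose without a second copy of the data.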
Do not use batch estimated cost percentages to compare different queries. This is not a valid comparison in general. The costs are always optimizer estimates, and not intended to be used this way. Check actual performance metrics (elapsed time, I/Os, CPU usage, memory usage) by all means, but do not put your faith in the percentages.
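One way to collect some of those actual metrics is to switch on the session statistics options before running each candidate query. A minimal sketch, assuming the History table from the question; the SELECT is only a placeholder for whichever query is being measured:

```sql
SET STATISTICS IO ON;    -- report logical and physical reads per table
SET STATISTICS TIME ON;  -- report CPU time and elapsed time per statement

-- Placeholder: substitute the query under test here.
SELECT Effective, Product, Kind, Price
FROM History;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

In SQL Server Management Studio the figures appear on the Messages tab. Run each candidate several times against a realistically sized table and compare the reported reads and times rather than the plan's cost percentages.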
Whether this query or the one based on ROW_NUMBER is more efficient depends on the distribution of the data, and other factors. Each has its advantages and disadvantages. This query has advantages when there are many rows for each (product, kind) combination.
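For something concrete to experiment with, here are two sketches of the general shapes such a query can take: a correlated lookup of the previous row, and a ROW_NUMBER-based self-join. Neither is quoted from any answer. Both assume the History table and columns from the question, that Price is not nullable, and that (Product, Kind, Effective) is unique.

```sql
-- Sketch 1: for each row, look up the immediately preceding row for the
-- same (Product, Kind) and keep the row only if the price differs.
SELECT
    H.Effective,
    H.Product,
    H.Kind,
    H.Price
FROM History AS H
OUTER APPLY
(
    SELECT TOP (1) P.Price
    FROM History AS P
    WHERE P.Product = H.Product
      AND P.Kind = H.Kind
      AND P.Effective < H.Effective
    ORDER BY P.Effective DESC
) AS Prev
WHERE Prev.Price IS NULL          -- first row for this (Product, Kind)
   OR Prev.Price <> H.Price       -- price actually changed
ORDER BY H.Effective;

-- Sketch 2: number the rows per (Product, Kind) in date order, then join
-- each row to the one numbered just before it and compare prices.
WITH Numbered AS
(
    SELECT
        Effective,
        Product,
        Kind,
        Price,
        ROW_NUMBER() OVER
        (
            PARTITION BY Product, Kind   -- note: Price is NOT in the partition
            ORDER BY Effective
        ) AS RowNumber
    FROM History
)
SELECT
    C.Effective,
    C.Product,
    C.Kind,
    C.Price
FROM Numbered AS C
LEFT JOIN Numbered AS P
    ON  P.Product = C.Product
    AND P.Kind = C.Kind
    AND P.RowNumber = C.RowNumber - 1
WHERE P.RowNumber IS NULL         -- first row for this (Product, Kind)
   OR P.Price <> C.Price          -- price actually changed
ORDER BY C.Effective;
```

On the six sample rows from the question, both return the rows dated 2013-04-23, 2013-04-25, 2013-04-26, and 2013-04-28. The second sketch also shows why the attempt in the question lost the last row: putting Price in the PARTITION BY groups every 1.00 row together regardless of what came between them. To use either query as a view body, drop the final ORDER BY, since SQL Server does not allow it in a view without TOP.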