I'm attempting to select data from a table that's already been made containing records of file size as taken every twelve hours. The table looks something like this:
surrogate_id | partition_id | data_wanted | collection_ts
1 | 1 | 100 | 2015-09-30 17:00:00
2 | 1 | 100 | 2015-09-30 17:00:00
3 | 1 | 150 | 2015-09-30 17:00:00
4 | 2 | 120 | 2015-09-30 17:00:00
5 | 2 | 130 | 2015-09-30 17:00:00
6 | 3 | 100 | 2015-09-30 17:00:00
7 | 3 | 100 | 2015-09-30 17:00:00
8 | 3 | 100 | 2015-09-30 17:00:00
9 | 3 | 100 | 2015-09-30 17:00:00
10 | 3 | 100 | 2015-09-30 17:00:00
11 | 3 | 100 | 2015-09-30 17:00:00
1 | 1 | 100 | 2015-09-30 18:00:00
2 | 1 | 100 | 2015-09-30 18:00:00
3 | 1 | 150 | 2015-09-30 18:00:00
4 | 2 | 120 | 2015-09-30 18:00:00
5 | 2 | 130 | 2015-09-30 18:00:00
6 | 3 | 100 | 2015-09-30 18:00:00
7 | 3 | 100 | 2015-09-30 18:00:00
8 | 3 | 100 | 2015-09-30 18:00:00
9 | 3 | 100 | 2015-09-30 18:00:00
10 | 3 | 100 | 2015-09-30 18:00:00
11 | 3 | 100 | 2015-09-30 18:00:00
Primary key here would be (surrogate_id, collection_ts).
What I need to do is select partitions that do not have uniform values in the data_wanted
column based on the most recent timestamp
. So in the above example I'd want to pick partitions 1 and 2 out. I understand that in order to get the proper timestamp selected a window function is appropriate, which I've used to generate other reports fine. This is the CTE I'm working with that implements a window function:
;WITH cte
AS
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY surrogate_id
ORDER BY collection_ts DESC) RowNumber
FROM example_table
)
I've been attempting to do things with averages but I can't seem to find a way to do it in particular that works out well. It's apparent to me at least that if a partition's given data is different from the average value of that partition in that timestamp then we need to select that partition. I just can't figure out how to individually compare each piece of data in a partition to just the average of that one partition and still allow myself to return the partition_id while also using the CTE I defined.
Best Answer
You're on your way and understand CTEs and Window Functions so here's an option:
CTE AvgSize establishes the average for the partition ID.
CTE LatestValue is your provided CTE restricted to necessary columns to establish the latest row.
The final step is to bring them all together. The CTEs restrict the query to the latest row and provide the partition average. The only thing left to do is compare the latest row partition size to the average partition size and you're left with your result set.
You could get a little more advanced with your variance check with something like: