My table looks like this:
CREATE TABLE [dbo].[TimeSeries](
[LOCID] [int] NOT NULL,
[Date] DateTime,
[YEAR] int,
[MONTH] tinyint,
[RTT] [int]
PRIMARY KEY ( [LOCID] ASC,[Date] ASC)
)
It contains 12 rows(or MONTHS) per LOCID, per YEAR.
I have the following query:
SELECT
LOCID,
[Year],
[Date],
AVG(cast(RTT as float)) OVER (
PARTITION BY LOCID ORDER BY Date
ROWS BETWEEN CURRENT ROW AND 4 FOLLOWING ) AS avgRTT
FROM TimeSeries
example data:
LOCID Date Year Month RTT
1 01/01/1982 1982 1 58
1 01/02/1982 1982 2 63
1 01/03/1982 1982 3 34
1 01/04/1982 1982 4 27
1 01/05/1982 1982 5 6
1 01/06/1982 1982 6 4
1 01/07/1982 1982 7 3
1 01/08/1982 1982 8 14
1 01/09/1982 1982 9 22
1 01/10/1982 1982 10 16
1 01/11/1982 1982 11 17
1 01/12/1982 1982 12 44
1 01/01/1983 1983 1 58
1 01/02/1983 1983 2 63
1 01/03/1983 1983 3 33
1 01/04/1983 1983 4 27
1 01/05/1983 1983 5 9
1 01/06/1983 1983 6 0
1 01/07/1983 1983 7 3
1 01/08/1983 1983 8 0
1 01/09/1983 1983 9 6
1 01/10/1983 1983 10 27
1 01/11/1983 1983 11 11
1 01/12/1983 1983 12 48
Now I am trying to execute the above query for a specific MONTH in every year only, calculating the average for the 5 following dates/months/rows. But a WHERE clause to specify the MONTH value will not work because that affects which numbers are averaged.
I need to do this for tens of thousands of LOCID's and millions of rows.
My question is if there is a way to execute the correct moving average for only a specific value for 'MONTH', thus hopefully reducing the processing time a number of times…
example desired output:
LOCID Date Year Month avgRTT
1 29952 1982 1 37.6
1 30317 1983 1 38
Any pointers will be greatly appreciated.
Best Answer
You can apply your final filter on
@dekade
after computing the running averages.In order to reduce the number of rows that need to be processed for the running averages, you can apply an earlier filter on
[dekade] IN (@dekade, (@dekade+1)%36, (@dekade+2)%36)
to ensure that you are processing the minimal amount of rows but still including all the rows that are necessary to including the following 11 rows in the running average. (The only reason for the% 36
is to handle values at@dekade
that fall at the end of the year.)This will still result in a table scan given your current table structure, but at least the rows can be filtered out earlier on in the query plan.
It would be helpful to post some sample data so that we can actually run the query though.