Sql-server – Calculating Running Average using Over clause in SQL server

aggregatesql serverwindow functions

My table looks like this:

CREATE TABLE [dbo].[TimeSeries](
[LOCID] [int] NOT NULL,
[Date] DateTime,
[YEAR] int,
[MONTH] tinyint, 
[RTT] [int]
PRIMARY KEY ( [LOCID] ASC,[Date] ASC)
 )

It contains 12 rows(or MONTHS) per LOCID, per YEAR.

I have the following query:

SELECT 
    LOCID,
    [Year],
    [Date],
    AVG(cast(RTT as float)) OVER (
         PARTITION BY LOCID ORDER BY Date 
             ROWS BETWEEN CURRENT ROW AND 4 FOLLOWING ) AS avgRTT
FROM TimeSeries

example data:

LOCID   Date    Year    Month   RTT
1   01/01/1982  1982    1   58
1   01/02/1982  1982    2   63
1   01/03/1982  1982    3   34
1   01/04/1982  1982    4   27
1   01/05/1982  1982    5   6
1   01/06/1982  1982    6   4
1   01/07/1982  1982    7   3
1   01/08/1982  1982    8   14
1   01/09/1982  1982    9   22
1   01/10/1982  1982    10  16
1   01/11/1982  1982    11  17
1   01/12/1982  1982    12  44
1   01/01/1983  1983    1   58
1   01/02/1983  1983    2   63
1   01/03/1983  1983    3   33
1   01/04/1983  1983    4   27
1   01/05/1983  1983    5   9
1   01/06/1983  1983    6   0
1   01/07/1983  1983    7   3
1   01/08/1983  1983    8   0
1   01/09/1983  1983    9   6
1   01/10/1983  1983    10  27
1   01/11/1983  1983    11  11
1   01/12/1983  1983    12  48

Now I am trying to execute the above query for a specific MONTH in every year only, calculating the average for the 5 following dates/months/rows. But a WHERE clause to specify the MONTH value will not work because that affects which numbers are averaged.

I need to do this for tens of thousands of LOCID's and millions of rows.

My question is if there is a way to execute the correct moving average for only a specific value for 'MONTH', thus hopefully reducing the processing time a number of times…

example desired output:

LOCID   Date    Year    Month   avgRTT
1   29952   1982    1   37.6
1   30317   1983    1   38

Any pointers will be greatly appreciated.

Best Answer

You can apply your final filter on @dekade after computing the running averages.

In order to reduce the number of rows that need to be processed for the running averages, you can apply an earlier filter on [dekade] IN (@dekade, (@dekade+1)%36, (@dekade+2)%36) to ensure that you are processing the minimal amount of rows but still including all the rows that are necessary to including the following 11 rows in the running average. (The only reason for the % 36 is to handle values at @dekade that fall at the end of the year.)

This will still result in a table scan given your current table structure, but at least the rows can be filtered out earlier on in the query plan.

DECLARE @dekade TINYINT = 1
SELECT *
FROM (
    SELECT 
        LOCID,
        [Year],
        [Date],
        [dekade],
        AVG(cast(RTT as float)) OVER 
            (PARTITION BY LOCID ORDER BY Date
                ROWS BETWEEN CURRENT ROW AND 11 FOLLOWING) AS avgRTT
    FROM TimeSeries
    -- If you want to limit the rows that are use when computing the running average,
    -- you can make sure that only the desired @dekade plus the following two @dekades
    -- (which may be needed to get the following 11 rows) are used for each year
    WHERE [dekade] (@dekade, (@dekade+1)%36, (@dekade+2)%36)
) x
-- Filter your results be @dekade after computing the running average
WHERE x.[dekade] = @dekade

It would be helpful to post some sample data so that we can actually run the query though.