Sql-server – Calculating Running Average using Over clause in SQL server

aggregatesql serverwindow functions

My table looks like this:

CREATE TABLE [dbo].[TimeSeries](
[LOCID] [int] NOT NULL,
[Date] DateTime,
[YEAR] int,
[MONTH] tinyint, 
[RTT] [int]
PRIMARY KEY ( [LOCID] ASC,[Date] ASC)
 )

It contains 12 rows(or MONTHS) per LOCID, per YEAR.

I have the following query:

SELECT 
    LOCID,
    [Year],
    [Date],
    AVG(cast(RTT as float)) OVER (
         PARTITION BY LOCID ORDER BY Date 
             ROWS BETWEEN CURRENT ROW AND 4 FOLLOWING ) AS avgRTT
FROM TimeSeries

example data:

LOCID   Date    Year    Month   RTT
1   01/01/1982  1982    1   58
1   01/02/1982  1982    2   63
1   01/03/1982  1982    3   34
1   01/04/1982  1982    4   27
1   01/05/1982  1982    5   6
1   01/06/1982  1982    6   4
1   01/07/1982  1982    7   3
1   01/08/1982  1982    8   14
1   01/09/1982  1982    9   22
1   01/10/1982  1982    10  16
1   01/11/1982  1982    11  17
1   01/12/1982  1982    12  44
1   01/01/1983  1983    1   58
1   01/02/1983  1983    2   63
1   01/03/1983  1983    3   33
1   01/04/1983  1983    4   27
1   01/05/1983  1983    5   9
1   01/06/1983  1983    6   0
1   01/07/1983  1983    7   3
1   01/08/1983  1983    8   0
1   01/09/1983  1983    9   6
1   01/10/1983  1983    10  27
1   01/11/1983  1983    11  11
1   01/12/1983  1983    12  48

Now I am trying to execute the above query for a specific MONTH in every year only, calculating the average for the 5 following dates/months/rows. But a WHERE clause to specify the MONTH value will not work because that affects which numbers are averaged.

I need to do this for tens of thousands of LOCID's and millions of rows.

My question is if there is a way to execute the correct moving average for only a specific value for 'MONTH', thus hopefully reducing the processing time a number of times…

example desired output:

LOCID   Date    Year    Month   avgRTT
1   29952   1982    1   37.6
1   30317   1983    1   38

Any pointers will be greatly appreciated.

Best Answer

You can apply your final filter on @dekade after computing the running averages.

In order to reduce the number of rows that need to be processed for the running averages, you can apply an earlier filter on [dekade] IN (@dekade, (@dekade+1)%36, (@dekade+2)%36) to ensure that you are processing the minimal amount of rows but still including all the rows that are necessary to including the following 11 rows in the running average. (The only reason for the % 36 is to handle values at @dekade that fall at the end of the year.)

This will still result in a table scan given your current table structure, but at least the rows can be filtered out earlier on in the query plan.

DECLARE @dekade TINYINT = 1
SELECT *
FROM (
    SELECT 
        LOCID,
        [Year],
        [Date],
        [dekade],
        AVG(cast(RTT as float)) OVER 
            (PARTITION BY LOCID ORDER BY Date
                ROWS BETWEEN CURRENT ROW AND 11 FOLLOWING) AS avgRTT
    FROM TimeSeries
    -- If you want to limit the rows that are use when computing the running average,
    -- you can make sure that only the desired @dekade plus the following two @dekades
    -- (which may be needed to get the following 11 rows) are used for each year
    WHERE [dekade] (@dekade, (@dekade+1)%36, (@dekade+2)%36)
) x
-- Filter your results be @dekade after computing the running average
WHERE x.[dekade] = @dekade

It would be helpful to post some sample data so that we can actually run the query though.

Related Solutions

Sql-server – Searching from the latest date in SQL Server

For starters, don't use @DealClauseString - just have your query.

Also, remove ORDER BY D ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING - when you're using the whole partition, there's no reason to have the ORDER BY clause at all.

Anyway - for your particular question, if I understand it correctly, you want it to be like your #t table only has records that are at least a month old. So you should put a WHERE clause in the filter CTE. So it becomes something like:

FROM #t
WHERE D <= DATEADD(MONTH,-1,SYSDATETIME())

Postgresql – Postgres – Querying Min, Max and getting associated columns data specific to those result of those rows

I would recommend you do something like this - and don't go down the route of having a row per month.

What happens when you have 5 (10...15... x) years' data?

This is superior IMHO. You can always use the CROSSTAB table function, but I would recommend against it.

SELECT
  esite,
  MIN(evalue), 
  MAX(evalue),
  AVG(evalue)::INT,
  EXTRACT(MONTH FROM edate) AS emonth,
  CASE
    WHEN EXTRACT(MONTH FROM edate) = 1 THEN 'January'
    WHEN EXTRACT(MONTH FROM edate) = 2 THEN 'February'
    -- <... fill in rest of months here ...>
  END AS litMonth
FROM events
GROUP BY esite, emonth, litMonth
ORDER BY esite, emonth -- , min, max, avg;

This gives the result:

esite;     min;   max;   avg;    emonth;    litmonth
----------------------------------------------------
    1;       7;    66;    24;     1;         January
    1;      15;    61;    37;     2;        February
    2;      40;    90;    61;     1;         January
    2;      23;    70;    50;     2;        February

Note, that I have CAST the AVG as an INTEGER - you can of course ROUND this off to whatever you like. You can add EXTRACT(YEAR FROM edate) AS eyear, after the EXTRACT(MONTH... for more clarity and a more elegant result.

Best Answer

Related Solutions

Sql-server – Searching from the latest date in SQL Server

Postgresql – Postgres – Querying Min, Max and getting associated columns data specific to those result of those rows

Related Question