SQL Server Window Functions – Understanding Aggregate Window Functions

learningsql serverwindow functions

Consider the following table:

CREATE TABLE T1
(
  keycol INT         NOT NULL CONSTRAINT PK_T1 PRIMARY KEY,
  col1   VARCHAR(10) NOT NULL
);

INSERT INTO T1 VALUES
  (2, 'A'),(3, 'A'),
  (5, 'B'),(7, 'B'),(11, 'B'),
  (13, 'C'),(17, 'C'),(19, 'C'),(23, 'C');

Currently, I am looking into window functions and am trying out aggregate window functions. Although I feel I understand how the windows are defined with the OVER and PARITION clauses, I am unsure how the aggregate window functions are being calculated, such as AVG() OVER ().

I am looking to understand the following three queries.

-- Query 1
SELECT keycol, col1, AVG(keycol) OVER (PARTITION BY col1) AS RowAvg 
FROM T1 
keycol | col1 | RowAvg
-----: | :--- | -----:
     2 | A    |      2
     3 | A    |      2
     5 | B    |      7
     7 | B    |      7
    11 | B    |      7
    13 | C    |     18
    17 | C    |     18
    19 | C    |     18
    23 | C    |     18
-- Query 2
SELECT keycol, col1, AVG(keycol) OVER (ORDER BY keycol) AS RowAvg
FROM T1 
keycol | col1 | RowAvg
-----: | :--- | -----:
     2 | A    |      2
     3 | A    |      2
     5 | B    |      3
     7 | B    |      4
    11 | B    |      5
    13 | C    |      6
    17 | C    |      8
    19 | C    |      9
    23 | C    |     11
-- Query 3
SELECT keycol, col1, AVG(keycol) OVER (PARTITION BY col1 ORDER BY keycol) AS RowAvg
FROM T1 
keycol | col1 | RowAvg
-----: | :--- | -----:
     2 | A    |      2
     3 | A    |      2
     5 | B    |      5
     7 | B    |      6
    11 | B    |      7
    13 | C    |     13
    17 | C    |     15
    19 | C    |     16
    23 | C    |     18

Query 1: I believe RowAvg should be the average of the rows for each col1 level. Are the numbers 2 and 7 the FLOOR of the average or is my understanding incorrect?

Query 2: I am not too sure what is being done to produce RowAvg here. As no PARTITION or framing is used here, I believe the window should be the entire table, is this correct? Also, how is the RowAvg being found?

Query 3: Is this finding the (FLOOR) average for each partition however doing this incrementally? That is, for row 1 of the first partition ('A'), we find the average of that row. Then, for row 2 of the first partition, we find the average of the first 2 rows.

General question: Does introducing ORDER BY into the aggregate window function perform the aggregate function 'consecutively' such as in queries 1 and 2? It is interesting to see that in query 1, AVG is performed to each partition as a whole, whereas in queries 1 and 2 the RowAvg's are almost different for each row.

Best Answer

I suggest you add a sum to understand what is going on:

SELECT col1
     , keycol
     , SUM(keycol)  OVER (PARTITION BY col1) AS mysum
     , AVG(keycol) OVER (PARTITION BY col1) AS RowAvg 
FROM T1
order by col1;

Since you don't have an order by in your window, the aggregate applies to the whole partition:

col1    keycol  mysum   RowAvg
A       2       5       2
A       3       5       2
B       5       23      7
B       7       23      7
B       11      23      7
C       13      72      18
C       17      72      18
C       19      72      18
C       23      72      18

I.e. for partition A, mysum = 2+3 for every row in the partition

If you use an ORDER by clause the aggregate is applied from the beginning to the current row:

SELECT keycol
, col1
, SUM(keycol)  OVER (ORDER BY keycol) AS mysum
, AVG(keycol) OVER (ORDER BY keycol) AS RowAvg
FROM T1
order by col1;

since you don't have a partition, the whole result set is treated as 1 partition:

keycol  col1    mysum   RowAvg
2       A       2       2
3       A       5       2
5       B       10      3
7       B       17      4
11      B       28      5
...

For first row (according to order by) mysum = 2, rowavg = 2/1 , second row mysum = 2+3, rowavg = 5/2 , third row mysum = 2+3+5 rowavg = 10/3 ...

As you can see the sum(...) becomes a cumulative sum

With both a partition and order, the aggregate applies to each partition, but with the behaviour described above:

SELECT keycol
, col1
, SUM(keycol)  OVER (PARTITION BY col1 ORDER BY keycol) AS mysum
, AVG(keycol) OVER (PARTITION BY col1 ORDER BY keycol) AS RowAvg
FROM T1
order by col1;


keycol  col1    mysum   RowAvg
2       A       2       2
3       A       5       2
5       B       5       5
7       B       12      6
...

For A you get mysum 2, 2+3. For B it restarts so it becomes 5, 5+7,

In addition you can override the default behavior which is:

SELECT keycol
, col1
, SUM(keycol)  OVER (PARTITION BY col1 
                     ORDER BY keycol
                     RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as ...
FROM T1
order by col1;

by declaring your own, for example if you want a sliding average over 3 rows:

SELECT keycol
, col1
, AVG(keycol)  OVER (PARTITION BY col1 
                     ORDER BY keycol
                     ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) as ... 

...
13  C   15
17  C   16
19  C   19
23  C   21

The first row gets an average of (13+17)/2 (since there is no preceding row), the second (13+17+19)/3, ... and the fourth row becomes (19+23)/2 (no rows following)

Fiddle with examples