SQL Server Window Function – How ORDER BY Affects MAX() OVER() Result

querysql serversql-server-2016t-sqlwindow functions

I have a table with below structure and it's data :

create table test_table
(
Item_index   int,
Item_name    varchar(50)
)

insert into test_table (Item_index,Item_name) values (0,'A')
insert into test_table (Item_index,Item_name) values (1,'B')
insert into test_table (Item_index,Item_name) values (0,'C')
insert into test_table (Item_index,Item_name) values (1,'D')
insert into test_table (Item_index,Item_name) values (0,'E')

I want to know why changing the column in order by section of the query , changes the result? In QUERY-1 , I used item_index and in the QUERY-2 I used item_name column in the order by section. I thought that both queries must generate the same result because I used item_index in both queries for partitioning! I'm completely confused now ! why the order by column should affect the final result?

QUERY-1:

select t.*,
       max(t.Item_name)over(partition by t.item_index order by item_index) new_column
from test_table t;

RESULT:

Item_index  Item_name     new_column
----------- --------------------------
0           A                E
0           C                E
0           E                E
1           D                D
1           B                D

QUERY-2:

select t.*,
       max(t.Item_name)over(partition by t.item_index order by item_name) new_column
from test_table t;

RESULT:

Item_index  Item_name  new_column
----------- -----------------------
0           A             A
0           C             C
0           E             E
1           B             B
1           D             D

Can anybody explain how exactly these two queries are being executed and why each of them generates different result?

Thanks in advance

Best Answer

max(t.Item_name)over(partition by t.item_index order by item_index) new_column

Let's take a group where t.item_index = 0. It is

Item_index Item_name
0 A
0 C
0 E

When order by item_index is applied then all rows have the same value, hence all of them are included into the frame, and all rows values are used for MAX() selection. So the value 'E' is returned for all rows.


max(t.Item_name)over(partition by t.item_index order by item_name)

Let's take the same group.

Item_index Item_name
0 A
0 C
0 E

Now the sorting key differs, and when the window RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is applied then different rows are included into the frame to be investigated.

For 1st row only this row is included into the frame, and 'A' is the only value in the frame, so it is returned.

For 2nd row first 2 rows are included into the frame, the values 'A' and 'C' are compared, and 'C' is returned as maximal value in the frame.

For 3rd row all 3 rows are included into the frame, the values 'A', 'C' and 'E' are compared, and 'E' is returned as maximal value in the frame.