SQL Server – Sum by Implicit Group Using Window Functions

order-bysql serverwindow functions

I'm sure I'm missing an obvious solution on this, but I'm trying to summarize values for groups that are defined by an explicit group number and an implicit ordering. I'm sure that doesn't make this much clearer, so let's say I have this example source heap table:

GroupID     Value
----------- -----------
1           5
1           5
1           3
2           4
2           1
1           4
2           3
2           5
2           2
1           1

I would like a query that provides me with the following results:

GroupID     Values
----------- -----------
1           13
2           5
1           4
2           10
1           1

The implied ordering is the challenge that I've just not found a way to get around… yet. Any help would be appreciated.

I was hoping I could create a deterministic row ordering with a query similar to the following:

SELECT    * 
        , ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RowNum  
FROM Table WITH(TABLOCK)
OPTION (MAXDOP 1)

I am hoping this forces an allocation order scan which would then give me a deterministic row order.

Sadly, I'm stuck with the data as is. I've got no other indicators here such as date, etc. that would provide any set order. My hope was the trick outlined above would suffice, but I'm not entirely certain it will.

EDIT: Just to close this up as I'm know there were questions around why I was asking this, I had a series of heap tables, named by month/year, that contained line-item amounts that the business wanted to be summed up by day (which they correlated to the implied groups in my question). Because it doesn't look feasible to do this effectively, we've settled on aggregating at the month (e.g. table) level, so this post helped me justify the alteration to the business requirements. Thanks for everyone's input!

Best Answer

The "implicit" group you mention seems to be based on the row order. Unlike a spreadsheet or text file, a relational table is logically an unordered set of rows regardless of whether it is stored as a heap or has a clustered index. It will not be possible to write a query to provide the desired results unless you have another column to facilitate the grouping.

I was hoping I could create a deterministic row ordering with a query similar to the following:

SELECT      * 
        , ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RowNum  
FROM Table WITH(TABLOCK)
OPTION (MAXDOP 1)

Sorry to be the bearer of bad news, but ordering by a literal will not guarantee deterministic ordering (and even if it did, you'd need an ORDER BY clause there). If it appears to do so, it is only by happenstance. Any incremental column will work, though.

I am hoping this forces an allocation order scan which would then give me a deterministic row order.

An allocation-ordered scan is no more deterministic than any other implementation; you're just (unsafely) relying on a different observed behaviour.