SQL Server 2008 R2 – Partition By Column with Non-Continuous Values

sql serversql-server-2008-r2

How can I Partition By the PreviousScanIndex column when the value isn't continuous from one record to the next (ordered by the RowIndex column), so that I can rank each continuous group of records with the same ID?

E.g. I want to give the group of rows in red the same ID, and then the group of rows in green to have the next ID, and then the group of rows in blue the next ID after that, etc.

Result Set

Best Answer

Answer

Try this column definition:

, rank() over (partition by RowIndex - PreviousScanIndex order by example.RowIndex) as Testing

Explanation

This assumes - as you've stated in your question - that the rowset is ordered by RowIndex.

The problem breaks down to: how can you get a value that is constant from row to row (so we can partition by that constant) unless the difference in PreviousScanIndex row value is more than 1.

After playing around with windowed functions, partition clauses, lead and lag, it became clear it's not so easy to reference a prior row - especially with the error "Windowed functions cannot be used in the context of another windowed function or aggregate."

... but then the answer became quite simple - if you accept a couple of assumptions.

The key assumption that makes this work is that RowIndex will always increment by 1 and PreviousScanIndex will always increment by 1 also, within the partition you are attempting to define. If both values always increment by 1 within a partition, then one value minus the other will be a constant - until either value increments by some number other than 1 which, in this case, is your PreviousScanIndex jumping up by two.

The other assumption is RowIndex always increments by 1.

In fact, your PreviousScanIndex column could jump up by any number other than 1. Say it jumps up by 3, but then after this it continues to increment by 1, per row - then that is fine. Your jump by 3 simply yields a different answer to the equation RowIndex - PreviousScanIndex, but after that, the result is again the same, row by row, so long as both columns increment only by 1, from row to row.

Full query including your test data

Here is the full SQL with your test data:

select example.PreviousScanIndex, example.OperatorPartitionIndex, example.RowIndex

, rank() over (partition by RowIndex - PreviousScanIndex order by example.RowIndex) as Testing

from (values (1, 2, 1),(2, 3, 2),(3,4,3),(4,5,4),(6,7,5),(7,8,6),(8,9,7),(9,10,8),(11,12,9),(12,13,10),(13,14,11),(14,15,12),(16,17,13),(17,18,14),(18,19,15),(19,20,16),(25,26,17),(26,27,18),(28,29,19),(29,30,20)) as example(PreviousScanIndex, OperatorPartitionIndex, RowIndex) order by RowIndex

Postscript:

Extending the problem:

Order by a column where the value increments unevenly from row to row

We can actually extend this solution.

What if your RowIndex was not guaranteed to increment by exactly 1 from row to row?

Put another way - what if you wanted to sort your rows on some column, like OperatorPartitionIndex, that does not have the same incremental value from row to row?

Well actually you can do it. You simply select your source data in a CTE, then, in a second CTE you select all your source data and rank it based on the column of interest. This creates a new ranking column which does increment by 1, and we are back to the original problem, solved above.

The final step, then, is to select from this second CTE and apply the partition as described above.

Below is a full example. In this case the results will look the same as with the original question - because your OperatorPartitionIndex column is in exactly the same sequence as your RowIndex column. The important thing to note is that in the below solution we are sorting rows on OperatorPartitionIndex - and the values in this column are not evenly incremental - but we can still partition the results as required.

with SourceData as (
    select example.PreviousScanIndex as PreviousScanIndex, example.OperatorPartitionIndex as OperatorPartitionIndex, example.RowIndex as RowIndex
from (values (1, 2, 1),(2, 3, 2),(3,4,3),(4,5,4),(6,7,5),(7,8,6),(8,9,7),(9,10,8),(11,12,9),(12,13,10),(13,14,11),(14,15,12),(16,17,13),(17,18,14),(18,19,15),(19,20,16),(25,26,17),(26,27,18),(28,29,19),(29,30,20)) as example(PreviousScanIndex, OperatorPartitionIndex, RowIndex) 
),
ResortedSourceData as (
select 
    SourceData.PreviousScanIndex, SourceData.OperatorPartitionIndex, SourceData.RowIndex
    , rank() over (order by OperatorPartitionIndex) as NewRowIndex
from SourceData
)
select ResortedSourceData.PreviousScanIndex, ResortedSourceData.OperatorPartitionIndex, ResortedSourceData.RowIndex, ResortedSourceData.NewRowIndex

, rank() over (partition by ResortedSourceData.NewRowIndex - ResortedSourceData.PreviousScanIndex order by ResortedSourceData.NewRowIndex) as Testing2

from ResortedSourceData order by NewRowIndex

Note: Rows are ordered by NewRowIndex which is defined in the second CTE as:

, rank() over (order by OperatorPartitionIndex) as NewRowIndex

and here you see we've ordered rows by OperatorPartitionIndex, QED.