Sql-server – Any way to calculate frequency of a value without a cursor

gaps-and-islandssql serversql-server-2008-r2

Given I have a column that has an increment value and I know that value is going to climb to X (X being arbitrary) I would like to count what the longest repetitive streak is without the last count (because it may be going up at he end and we have just not made it there yet)

They are keyed by datetime, so arranging them in order to get the proper sequence is trivial. As is excluding everything forward of the last occurring zero

Example:

0,1,2,3, 0,1,2, 0, 0,1,2,3,4, 0,1,2,3,4, 0,1,2,3,4,5, 0,1

Would be "The longest repetitive streak is 4 and it happened twice in a row."

This will eventually become a loop, where I start with the highest value in the set, yielding 1@>=5. 2@>=4 3@>=3, 3@>=2

I can explain in more detail, just a concept right now, so forgive me if vague, just ask where I need to elaborate more.

Since I know I will have to use a procedural loop to get the final value, Id est Cursor… And, since I really do not want to nest them, I could live with 5 or ten iterations of max value to 0, but do not want the thousands of potential iterations inside that to repeat however many times that may be…. I would like to let SQL do as much of the work as possible, but not feed it something I know is going to cripple it in the end.

So… Any speculations on how that could be achieved in MSSQL 2008R2+ in a set based query, without the use of a cursor?

Or should I just quit while I am ahead, pull this into C# and do it there.

Edit:
As best I can type this on my phone, this is the intended objective if I were doing it out of SQL in psudocode.

Define datetime for start and end point
Select data that is between the first zero after start, and the last zero prior to end.
Define current, last, streak, count
Loop through rows returned.
If (current != 0) { 
last = current
 }
Else
{ //we have reset, how high did we get before, and is it the same as the last time we reset?
    If (last == streak) {
        //we have made another streak of same value as last
        Count +=1
(update the table keeping count that the sequence of (last) has now occurred (count) times at this point) 
    }
    Else
    {
        //we set a different max value, therefore a *different* streak
        //so we record it as the new streak to see if *it* repeats further
        Streak = last
Count = 1
    }
Last = 0
}

I think I will go back tonight and code a functional example in C# and a graphic of the pattern and see if that assists in saying "How would I or can I even do the same in SQL?"

Clarifications: If the values were:

0,1,2,3,4, 0,1,2, 0,1,2, 0,1,2, 0,1,2,3,4, 0,1 

should the result be the two 4s or the three 2s?

In that example there would only be one repetitive streak, the thee consecutive sequences of 0,1,2 as indicated by it going back to zero afterward. Repetitive is the key, though the same sequence will happen throughout the data, I want to know how many times it happens in direct repetition, not that it went 0,1,2 three time in the whole set, that it did it 3 times in a row. Logically the same pattern will repeat often through my data, I am concerned where it does it in a pattern, and then I will look for consistent patterns such as 0-4 three times, then random, then back to 0-4 three times.

If it helps the data is a measure of vibration intensity. By detecting the repetitive pattern it exposes harmonics that are not directly part of the source. I want to know what is essentially noise, and what is feedback sort of like when you are driving a car with an imbalance and it only does that shake between certain speeds.

Edit #2 (Better explanation, and functional cursor example)
Given the data presented here (actual sample) and eliminating what seems to be ancillary to the problem which is simple filtering I can handle, row number vs datetime, etc…

Sample data:

Sample Data

With that data loaded into a sample table

CREATE TABLE [dbo].[Table_1](
    [row] [int] NOT NULL,
    [streak] [int] NOT NULL,
    [rStreak] [varchar](100) NULL
) ON [PRIMARY]

A functional example that achieves the calculation…

DECLARE @row INT
DECLARE @current INT
DECLARE @last INT
DECLARE @streak INT
DECLARE @streakCount INT
DECLARE @parser CURSOR

SET @streak = 0
--SET @streakCount = 1
SET @parser = CURSOR
FOR SELECT [row],
           [streak]
    FROM   [Table_1] --For the sake of brievity I am not filtering the *between* zeros and date range here, but I have that part covered.

OPEN @parser

FETCH next FROM @parser INTO @row, @current

WHILE @@FETCH_STATUS = 0
  BEGIN
      PRINT 'Row:' + Cast(@row AS VARCHAR) + ' Value:'
            + Cast(@current AS VARCHAR)

      IF @current = 0
        BEGIN
            PRINT 'Resetting streak of:'
                  + Cast(@streak AS VARCHAR)

            IF @streak = @last
               AND NOT @streak = 0
              BEGIN
                  SET @streakCount = @streakCount + 1

                  UPDATE [Table_1]
                  SET    [rStreak] = 'Concurrent Streak:'
                                     + Cast(@streakCount AS VARCHAR) + '@'
                                     + Cast(@streak AS VARCHAR)
                  WHERE  [row] = @row
              END
            ELSE
              SET @streakCount = 1

            SET @last = @streak
            SET @streak = 0
        END
      ELSE
        BEGIN
            SET @streak = @current

            PRINT 'Calculating streak length:'
                  + Cast(@streak AS VARCHAR)

            UPDATE [Table_1]
            SET    [rStreak] = 'Streak:' + Cast(@streak AS VARCHAR)
            WHERE  [row] = @row
        END

      FETCH next FROM @parser INTO @row, @current
  END

CLOSE @parser

DEALLOCATE @parser 

ouputs:

Row:1 Value:0
Resetting streak of:0
Row:2 Value:1
Calculating streak length:1

(1 row(s) affected)
Row:3 Value:2
Calculating streak length:2

(1 row(s) affected)
Row:4 Value:3
Calculating streak length:3

(1 row(s) affected)
Row:5 Value:0
Resetting streak of:3
Row:6 Value:1
Calculating streak length:1

(1 row(s) affected)
Row:7 Value:2
Calculating streak length:2

(1 row(s) affected)
Row:8 Value:3
Calculating streak length:3

(1 row(s) affected)
Row:9 Value:0
Resetting streak of:3

(1 row(s) affected)
Row:10 Value:1
Calculating streak length:1

(1 row(s) affected)
Row:11 Value:2
Calculating streak length:2

(1 row(s) affected)
Row:12 Value:0
Resetting streak of:2
Row:13 Value:0
Resetting streak of:0
Row:14 Value:0
Resetting streak of:0
Row:15 Value:0
Resetting streak of:0
Row:16 Value:1
Calculating streak length:1

(1 row(s) affected)
Row:17 Value:2
Calculating streak length:2

(1 row(s) affected)
Row:18 Value:3
Calculating streak length:3

(1 row(s) affected)
Row:19 Value:0
Resetting streak of:3
Row:20 Value:1
Calculating streak length:1

(1 row(s) affected)
Row:21 Value:2
Calculating streak length:2

(1 row(s) affected)
Row:22 Value:3
Calculating streak length:3

(1 row(s) affected)
Row:23 Value:0
Resetting streak of:3

(1 row(s) affected)
Row:24 Value:1
Calculating streak length:1

(1 row(s) affected)
Row:25 Value:2
Calculating streak length:2

(1 row(s) affected)
Row:26 Value:0
Resetting streak of:2
Row:27 Value:1
Calculating streak length:1

(1 row(s) affected)
Row:28 Value:2
Calculating streak length:2

(1 row(s) affected)
Row:29 Value:0
Resetting streak of:2

(1 row(s) affected)

and yields the following data (correctly)

row streak  rStreak
1   0   NULL
2   1   Streak:1
3   2   Streak:2
4   3   Streak:3
5   0   NULL
6   1   Streak:1
7   2   Streak:2
8   3   Streak:3
9   0   Concurrent Streak:2@3
10  1   Streak:1
11  2   Streak:2
12  0   NULL
13  0   NULL
14  0   NULL
15  0   NULL
16  1   Streak:1
17  2   Streak:2
18  3   Streak:3
19  0   NULL
20  1   Streak:1
21  2   Streak:2
22  3   Streak:3
23  0   Concurrent Streak:2@3
24  1   Streak:1
25  2   Streak:2
26  0   NULL
27  1   Streak:1
28  2   Streak:2
29  0   Concurrent Streak:2@2

So hopefully that clarifies the question as a whole, I want to know can this be achieved without the cursor, requiring that I modify the original table, or use an intermediate temp table/table variable?

Essentially this model works, but it does not scale well, whereas this is simple on small data sets, when I go RBAR it is going to get very inefficient when I start processing hundreds of thousands or rows.

And thank you all for your continued input and patience. 🙂

Best Answer

Assuming that all your sequences consist of numbers monotously growing from 0 to X without gaps you can use smth like this:

declare @t table (id int identity, col int);
insert into @t(col) values
(0),(1),(2),(3),(0),(1),(2),(0),(0),(1),(2),(3),(4),(0),(1),(2),(3),(4),(0),(1),(2),(3),(4),(5),(0),(1);

with cte as
(
select id, row_number() over(order by id) as rn
from @t
where col = 0
)

,cte1 as
(
select c1.id as id1,
       c2.id as id2,
       c2.id - c1.id as len_
from cte as c1 
     join cte as c2
        on c1.rn + 1 = c2.rn
)

,cte2 as
(
select len_, count(*) as cnt
from cte1
group by len_
having  count(*) > 1
)

select top 1 *
from cte2
order by len_ desc;

Here I use the identity column to imitate your

They are keyed by datetime, so arranging them in order to get the proper sequence is trivial

So I just count the number of members contained between 0s and assume that if the count is the same, the numbers are also the same, and then filter only those that has more than 1 occurence.

In the result the len_ is the length of the streak (maybe you need to substract 1 from it) and cnt is respective count.

If your table is big enough of course you should not use cte but save the first resultset(cte) into #tmp table