Sql-server – Query (duration) different between data value change

sql serversql-server-2008

I am trying to write a query that would get me the duration of a status in a table. This query needs to work in SQL Server 2008.

Say I have the following table:

Key Value   RecordDate
1   1   2012-01-01
2   1   2012-01-02
3   1   2012-01-03
4   5   2012-01-05
5   5   2012-01-05 12:00:00
6   12  2012-01-06
7   1   2012-01-07
8   1   2012-01-08

I would like to get the following result

Value StartDate   EndDate     Duration
1     2012-01-01  2012-01-05  4 days
5     2012-01-05  2012-01-06  1 days
12    2012-01-06  2012-01-07  1 days
1     2012-01-07  NULL        NULL

Basically I would like the get the duration when of the value before it changes.

I am getting somewhere close, but still can't figure it out:

SELECT [Key], [Value],  
       MIN(RecordDate) OVER(PARTITION BY [Value]) as 'StarDate',
       MAX(RecordDate) OVER(PARTITION BY [Value]) as 'EndDate',
       DATEDIFF(day, (MIN(RecordDate) OVER(PARTITION BY [Value])), 
                     (MAX(RecordDate) OVER(PARTITION BY [Value])))
FROM [RateTable]
Order by RecordDate

I know that SQL Server 2012 has LAG and LEAD function, but since I am deal with SQL Server 2008, I can't use it.

Please advise

Here is the SQL statement that generate the sample data

CREATE TABLE RateTable(
    [Key] [int] IDENTITY(1,1) NOT NULL,
    [Value] [int] NULL,
    [RecordDate] [DateTime] NULL
    )
GO

INSERT INTO [RateTable] VALUES (1, '2012-01-01');
INSERT INTO [RateTable] VALUES (1, '2012-01-02');
INSERT INTO [RateTable] VALUES (1, '2012-01-03');
INSERT INTO [RateTable] VALUES (5, '2012-01-04');
INSERT INTO [RateTable] VALUES (5, '2012-01-05 12:00:00');
INSERT INTO [RateTable] VALUES (12, '2012-01-06');
INSERT INTO [RateTable] VALUES (1, '2012-01-07');
INSERT INTO [RateTable] VALUES (1, '2012-01-08');
GO

[Update] Thanks for everyone's inputs. I like to clarify a number of questions here, b/c I simplified many things here so that I won't add extra complexity to the problem that I am working on. The background is that I have a table that logs a tons of vehicle's malfunction light indicate (MIL) status. This MIL status information is logged irregularly based on many factors. What I would like to get is the duration of when the vehicle MIL status is turned on (value of 1). The value can be -1 (undefined) or 0 (off).

  1. The RecordDate is pretty much guarantee to be unique and it should be type DateTime
    (sorry I didn't have the data type correct the first time I posted
    this question)
  2. Since it is supposed to be datetime, you won't have to worry about
    multiple value per day
  3. RecordDate and Value are not Nullable
  4. The duration of the value change is the challenge. Can't really use
    Group By clause as some of you mentioned, b/c I need to know when
    the status of the MIL change, not the last record date of the
    grouped value.
  5. The table has mover 16 millions of record, b/c it is a log table.

Originally when I posted this question I was thought that there is simple sql statement solution, instead of using T-SQL function with Cursor… There are a number of advantages if I can use sql statement b/c it integrate with other third party report app better. It looks like this is rather difficult to do so without SQL 2012 analytic functions. Even if there is a SQL statement solution, I am now worry it can be too complicate to understand and maintain.

[Update 2] I hope that someone could still provide a solution based on sql query. However, due to time constraint for my project, I wrote a function to solve this issue for now.

CREATE Function GetRateReport (@StartDate DateTime,  @EndDate DateTime)
RETURNS @ResultSet TABLE
(
    [RecordDate] DateTime,
    [Value] int,
    [Duration] DateTime
) AS
BEGIN

    DECLARE @PreviousValue INT;
    DECLARE @EarliestRecordDate DateTime;
    DECLARE @CurrentValue INT;
    DECLARE @CurrentRecordDate DateTime;

    --Open a cursor to get source data
    DECLARE sourceCursor CURSOR LOCAL STATIC READ_ONLY FORWARD_ONLY
    FOR
    SELECT RecordDate, Value
    FROM   RateTable 
    WHERE RecordDate BETWEEN @StartDate AND @EndDate
    ORDER BY RecordDate;

    --Get the first record
    OPEN sourceCursor 
    FETCH NEXT FROM sourceCursor INTO @CurrentRecordDate, @CurrentValue;

    --Initize value
    SET @EarliestRecordDate = @CurrentRecordDate;
    SET @PreviousValue = @CurrentValue;

    WHILE(@@Fetch_Status = 0) -- check for more row
    BEGIN

        -- Insert result when value changed
        IF @CurrentValue <> @PreviousValue
        BEGIN
            INSERT INTO @ResultSet(RecordDate, Value, Duration)
            VALUES (@EarliestRecordDate, @PreviousValue, 
                      @CurrentRecordDate - @EarliestRecordDate);

            SET @EarliestRecordDate = @CurrentRecordDate;
            SET @PreviousValue = @CurrentValue;
        END

    FETCH NEXT FROM sourceCursor INTO @CurrentRecordDate, @CurrentValue;
    END

    --Edge case, retrieve the last value with duration up to current date
    INSERT INTO @ResultSet(RecordDate, Value, Duration)
    VALUES (@CurrentRecordDate, @CurrentValue, GETDATE() - @CurrentRecordDate);

    RETURN
END
GO

--Example
Select [RecordDate], [Value], 
  DateDiff(day, '1900-01-01', [Duration]) as 'Day Duration' 
  from GetRateReport('2012-01-01', '2012-01-31')
GO

Best Answer

This solution assumes there is only a single Value for each RecordDate. Performance will be better with tuned indexes, and probably breaking up the process using a temporary table or table variable. I used your script for test data. The output may not be exactly how you want it, but the process to get you close is what's important -- it works by filtering the rows to only the rows that start an interval, and then fabricates the end date by offsetting the results from the previous step.

WITH a AS
(
    SELECT
        Value,
        RecordDate,
        ROW_NUMBER() OVER(ORDER BY RecordDate) AS RN
        FROM [dbo].[RateTable] rt
),
b AS
(
    SELECT
        a1.Value,
        a1.RecordDate,
        ROW_NUMBER() OVER(ORDER BY a1.RecordDate) AS RN
        FROM a a1
        LEFT OUTER JOIN a a2 ON a2.RN = a1.RN - 1
        WHERE
            (a1.Value != a2.Value) OR
            (a2.RN IS NULL)
)
SELECT
    b1.Value,
    b1.RecordDate AS StartDate,
    b2.RecordDate AS EndDate,
    DATEDIFF(DAY, b1.RecordDate, b2.RecordDate) + 1 AS Duration /* Fixme? */
    FROM b b1
    LEFT OUTER JOIN b b2 ON b2.RN = b1.RN + 1
    ORDER BY b1.RecordDate;

Output:

Value   StartDate                EndDate                  Duration
1       2012-01-01 00:00:00.000  2012-01-04 00:00:00.000  4
5       2012-01-04 00:00:00.000  2012-01-06 00:00:00.000  3
12      2012-01-06 00:00:00.000  2012-01-07 00:00:00.000  2
1       2012-01-07 00:00:00.000  NULL                     NULL