A view cannot help you produce an aggregate based on parameters that are only known at query time (business_id, start_date and end_date). A view is nothing more than a given query, stored permanently in the database for later reuse. (Well, the implementation is more complicated, but that does not affect how views are used.)
Considering your query, the furthest you can get with a view is
CREATE VIEW staff_commission AS
SELECT user_id AS staff_id,
       business_id,
       amount,
       start_date,
       end_date
FROM transaction
WHERE type = 'staff'
  AND kind = 'commission';
Everything that is known beforehand is there, plus the columns you need for producing the desired output.
To obtain the latter, you have to create a function in any case (optionally, building on the view). Basing it on the view makes sense when you have several queries that filter the data the same way.
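As a rough sketch, such a function could look like this in PostgreSQL. The function name, the decision to sum per staff member, and the column types are my assumptions, since the desired output was not shown:

```sql
CREATE FUNCTION staff_commission_report(
    p_business_id integer,
    p_start_date  date,
    p_end_date    date)
RETURNS TABLE (staff_id integer, total_commission numeric)
LANGUAGE sql STABLE
AS $$
    -- Filter the pre-defined view by the runtime parameters
    -- and aggregate per staff member.
    SELECT v.staff_id, sum(v.amount)
    FROM staff_commission v
    WHERE v.business_id = p_business_id
      AND v.start_date >= p_start_date
      AND v.end_date   <= p_end_date
    GROUP BY v.staff_id;
$$;
```

It can then be called like any set-returning function, e.g. `SELECT * FROM staff_commission_report(1, '2017-01-01', '2017-03-31');`.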
The performance of all these options will be exactly the same. Whether the indexes make sense depends very much on the actual data. (I am pretty sure you don't need all of them, though.) Without knowing anything about the data, it is very hard to guess which improvements are needed. Try what you have, check the EXPLAIN ANALYZE output and see if anything is missing.
Finally, about caching: PostgreSQL is very smart about this. Going into the details would fill a chapter or two in a book, but I would not worry about it unless I saw too many disk reads (which can be spotted in the EXPLAIN (ANALYZE, BUFFERS) output).
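For reference, the buffer statistics are requested like this (using the view defined above; the literal value is just a placeholder):

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT staff_id, sum(amount)
FROM staff_commission
WHERE business_id = 42
GROUP BY staff_id;
```

In the output, `shared hit=N` counts blocks served from PostgreSQL's own buffer cache, while `shared read=N` counts blocks that had to be fetched from the operating system (often, though not necessarily, from disk).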
One option is a recursive CTE:
DECLARE @StartDate datetime = '2017-03-05',
        @EndDate   datetime = '2017-04-11';

WITH theDates AS
(
    SELECT @StartDate AS theDate
    UNION ALL
    SELECT DATEADD(day, 1, theDate)
    FROM theDates
    WHERE DATEADD(day, 1, theDate) <= @EndDate
)
SELECT theDate, 1 AS theValue
FROM theDates
OPTION (MAXRECURSION 0);
(The MAXRECURSION hint was added thanks to Scott Hodgin's comment, below. SQL Server limits recursion to 100 levels by default, so a range longer than about 100 days would fail without MAXRECURSION 0, which removes the limit.)
Best Answer
Assuming there are no gaps in the range of dates for a particular code, a simple aggregation query would solve this:
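A query along these lines would do it (the table name mytable and the columns code and [date] are assumptions, since the original definitions are not shown):

```sql
-- With no gaps, each code forms exactly one date range,
-- so the range is simply the minimum and maximum date per code.
SELECT code,
       MIN([date]) AS [From],
       MAX([date]) AS [To]
FROM mytable
GROUP BY code
ORDER BY code;
```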
If the dates for a particular 'code' in your example are not consecutive, then you have what is known as a 'Gaps and Islands' problem. I took this example and tweaked it for your situation, adding a new row for code 'F' with a date of 1/9/2017 so you can see how the same code can have multiple date ranges.
The ROW_NUMBER() window function in this example creates 'partitions' for each code and assigns a row number to each row within its partition; a new 'code' resets the row number back to 1 for that partition.
The DATEADD logic takes the source date from your table, subtracts that row's number as days, and creates a 'grp' column that we can GROUP BY when we select from mycte. To get a better understanding of what the ROW_NUMBER() window function is doing, you can uncomment the line that says 'uncomment to see mycte' and run just that SELECT statement. Notice that the unbroken date sequences for each 'code' have exactly the same 'grp' value.
Now it is just a matter of selecting the MIN as [From] and the MAX as [To] for each 'grp', ordered appropriately, to give you the final result.
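Putting it together, the whole query could be sketched like this (again assuming a table mytable with columns code and [date]):

```sql
WITH mycte AS
(
    SELECT code,
           [date],
           -- Subtracting the row number from the date maps every
           -- consecutive run of dates per code to one constant value.
           DATEADD(day,
                   -ROW_NUMBER() OVER (PARTITION BY code ORDER BY [date]),
                   [date]) AS grp
    FROM mytable
)
-- SELECT * FROM mycte;   -- uncomment to see mycte
SELECT code,
       MIN([date]) AS [From],
       MAX([date]) AS [To]
FROM mycte
GROUP BY code, grp
ORDER BY code, [From];
```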