Sql-server – Grouping Subsets of Rows with Null Values within an Ordered Set

gaps-and-islandsgroup bysql serversql-server-2008-r2

Lets say we have an table where each row is a day, and it is ordered by this day column. Then we have left joined a membership data set showing which day the members were active (and not).

Lets say our current data set looks like this… Membership was active from day 3-5, inactive from 5-8, and active from day 9 onward etc.

DAY     DATE        MEMBER  ACTIVE
 1      2017-01-01  123     null
 2      2017-01-02  123     null
 3      2017-01-03  123     2017-01-03
 4      2017-01-04  123     2017-01-04
 5      2017-01-05  123     2017-01-05
 6      2017-01-06  123     null
 7      2017-01-07  123     null
 8      2017-01-08  123     null
 9      2017-01-09  123     2017-01-09
10      2017-01-10  123     2017-01-10

…so ACTIVE=null means membership was not active on those days.

With this data structure, I would like to get to a "collapsed" set, showing "spans" of time inactive/active:

MEMBER  MIN(DATE)   MAX(DATE)   STATUS
123,    2017-01-01, 2017-01-02  INACTIVE
123,    2017-01-03, 2017-01-05  ACTIVE
123,    2017-01-06, 2017-01-08  INACTIVE
123,    2017-01-09, 2017-01-10  ACTIVE

I have tried using row_number() to somehow partition out the subsets of a certain status, but in this case, using min()/max() over the rows where ACTIVE is null, treats those as a single group, when in reality, there are several distinct spans of "inactive membership".

How can I distinguish the spans of inactive membership from one-another for grouping purposes?
What technique can I use to achieve that output above?

Here is the script to generate the dummy source data:

CREATE TABLE ##SRC (ID INT, D DATE, MEMBER INT, ACTIVE DATE);

INSERT INTO ##SRC (ID, D, MEMBER, ACTIVE)
SELECT 1, '2017-01-01', 123, NULL UNION 
SELECT 2, '2017-01-02', 123, NULL UNION 
SELECT 3, '2017-01-03', 123, '2017-01-03' UNION 
SELECT 4, '2017-01-04', 123, '2017-01-04' UNION 
SELECT 5, '2017-01-05', 123, '2017-01-05' UNION 
SELECT 6, '2017-01-06', 123, NULL UNION 
SELECT 7, '2017-01-07', 123, NULL UNION 
SELECT 8, '2017-01-08', 123, NULL UNION 
SELECT 9, '2017-01-09', 123, '2017-01-09' UNION 
SELECT 10, '2017-01-10',    123, '2017-01-10' 
;

Best Answer

Your sample data does not match your description and confused me at first. As sp_BlitzErik points out this is an island-and-gap problem. The solution is pretty straightforward if you have access to window functions. First, we can enumerate the table per member alone, let's call this full_order (this happens to be the same as day, but I'll ad it for generality). Second, we can enumerate the table per member and whether they were active on that day, let's call this partial_order

 select day, active, date, member
      , row_number() over (partition by member 
                           order by day) as fullorder
      , row_number() over (partition by member
                          ,case when active is null then 0 else 1 end
                          order by day) as partialorder
 from src

DAY         ACTIVE     MEMBER      FULLORDER            PARTIALORDER        

----------- ---------- ----------- -------------------- --------------------
      1 -                  123                    1                    1
      2 -                  123                    2                    2
      3 01/03/2017         123                    3                    1
      4 01/04/2017         123                    4                    2
      5 01/05/2017         123                    5                    3
      6 -                  123                    6                    3
      7 -                  123                    7                    4
      8 -                  123                    8                    5
      9 01/09/2017         123                    9                    4
     10 01/10/2017         123                   10                    5

If the difference between full_order and partial_order changes, it means that active has changed from null to a value, or vice versa. Therefore we can form a group with this difference. Within each such group we can pick the min(active) and max(active) to form an interval:

select member, grp, min(date), max(active) 
from (
    select day, active, date, member
         , row_number() over (partition by member order by day) 
         - row_number() over (partition by member
                             ,case when active is null then 0 else 1 end 
                              order by day) as grp  
    from src
) 
group by member, grp

MEMBER      GRP                  3          4         
----------- -------------------- ---------- ----------
    123                    0 01/01/2017 -         
    123                    2 01/03/2017 01/05/2017
    123                    3 01/05/2017 -         
    123                    5 01/08/2017 01/10/2017

It's probably easiest to add another level of nesting to get the desired result:

select member, min_active
     , coalesce(max_active, min_active) as max_active
     , case when max_active is null then 'INACTIVE' else 'ACTIVE' end as status 
from (
    select member, grp, min(date) as min_active, max(active) as max_active 
    from (
        select day, active, date, member
             , row_number() over (partition by member order by day) 
             - row_number() over (partition by member
                                 ,case when active is null then 0 else 1 end 
                                 order by day) as grp  
        from src
    ) 
    group by member, grp)

MEMBER      MIN_ACTIVE MAX_ACTIVE STATUS  
----------- ---------- ---------- --------
    123 01/01/2017 01/01/2017 INACTIVE
    123 01/03/2017 01/05/2017 ACTIVE  
    123 01/05/2017 01/05/2017 INACTIVE
    123 01/08/2017 01/10/2017 ACTIVE