Recursive Join to get Spans

gaps-and-islandsoracle

I am pretty new to working in SQL at and have a gaps and islands problem in which I am trying to get a series of consecutive months of membership. Any member may have multiple memberships and their membership may have gaps or overlap. My thought was to order memberships by member and plan and work backward within this group month by month until I find a gap, then assign the begin and end date of the membership.

Here's a sampling of my data:

Member  YYYYMM      Plan
1       201601      123
1       201602      123
2       201505      123
2       201506      123
2       201510      123
2       201511      123
2       201512      123
1       201510      456
1       201511      456
1       201512      456
1       201601      456

The above data would represent memberships for two separate members to same/different plans for different time spans. Now, I need to get it to look more like this:

Member  YYYYMM_Begin    YYYYMM_End    Plan
1       201601          201602        123
2       201505          201506        123
2       201510          201512        123
1       201510          201601        456

So far, I've got a recursive join that partitions based on Member and Plan and ranks them in descending order by YYYYMM, which seems to be working. But I'm stuck on how to figure out the begin and end dates for consecutive months of membership.

SELECT DISTINCT A.Member, A.Plan, A.YYYYMM, B.Rank
FROM Member_Hist A
 JOIN (SELECT 
    MH.Member,
    MH.Plan,
    MH.YYYYMM,
RANK() OVER (PARTITION BY Member, Plan ORDER BY YYYYMM DESC) AS Rank
FROM Member_Hist MH) B ON A.Member = B.Member AND A.Plan = B.Plan 
ORDER BY Member ASC, Plan ASC, YYYYMM DESC

Edit: I'm using Oracle, not MySQL; Also, I only have read permission.
Edit: Sorry, NOT using MySQL or SQL Server

Best Answer

This is a Gap & Island problem. Plenty of sites talk about it if you Google it. Here is just the first link I clicked among many others: Solving Gaps and Islands with Enhanced Window Functions

Here is how the query works:

  • YYYYMM is a varchar and does not give consecutive numbers. Therefore it first changes them to proper date format.
  • then ROW_NUMBER() will number them by group of Member and [Plan]
  • consecutive months will have consecutive rn and once rn is removed from the date, they will have identical values for DATEADD(month, rn, l1.dt)

    Member  Plan    dt          rn  => DATEADD(month, rn, l1.dt)
    2       123     2015-12-01  1   => 2016-01-01
    2       123     2015-11-01  2   => 2016-01-01
    2       123     2015-10-01  3   => 2016-01-01
    2       123     2015-06-01  4   => 2015-10-01
    2       123     2015-05-01  5   => 2015-10-01
    
  • DATEADD(month, rn, l1.dt) is finally used to group them along with Member and Plan

Query Oracle:

WITH dates AS(
    SELECT DISTINCT Member, "Plan"
        , dt = CAST(YYYYMM || '01' as date)
    FROM data
), list AS (
    SELECT *
        , rn = ROW_NUMBER() OVER(PARTITION BY Member, "Plan" ORDER BY dt DESC)
    FROM dates d
)
SELECT l1.Member
    , MIN(l1.dt) as YYYYMM_Begin
    , MAX(l1.dt) as YYYYMM_End
    , l1."Plan"
FROM list l1
GROUP BY Member, "Plan", ADD_MONTHS(l1.dt, rn);

Oracle [SQL Fiddle][]

Query SQL Server:

WITH dates AS(
    SELECT DISTINCT Member, [Plan]
        , dt = CAST(YYYYMM+'01' as date)
    FROM @data
), list AS (
    SELECT *
        , rn = ROW_NUMBER() OVER(PARTITION BY Member, [Plan] ORDER BY dt DESC)
    FROM dates d
)
SELECT l1.Member
    , MIN(l1.dt) as YYYYMM_Begin
    , MAX(l1.dt) as YYYYMM_End
    , l1.[Plan]
FROM list l1
GROUP BY Member, [Plan], DATEADD(month, rn, l1.dt);

SQL Server SQL Fiddle

Output:

Member  YYYYMM_Begin    YYYYMM_End  Plan
1       2016-01-01      2016-02-01  123
1       2015-10-01      2016-01-01  456
2       2015-05-01      2015-06-01  123
2       2015-10-01      2015-12-01  123