SQL Server 2012 – Finding Gaps on Intersection Dates

querysql serversql-server-2012t-sql

This is a problem I've been trying to work through on and off for the past month, and haven't made any progress. Essentially, I am working with a HR system that stores two major types of position for employees: Primary (P) and Acting (A). The position types aren't hugely relevant, but are important to know to understand the way the data is structured. An employee will always be in a Primary position, but may be put into an Acting position temporarily while it is vacant, or while the Primary person for that role is on leave.

The way that this is represented in the system is diagrammed below (as an example and not to scale):

P (2018-07-28 - 9999-99-99) <------------------------------------------------------------->
A (2018-09-04 - 2018-09-05)             <--->
A (2018-11-12 - 2018-11-13)                                          <--->
A (2018-11-19 - 2018-11-20)                                                  <--->

Due to a set of data I'm trying to join the position information with, I need it to be intersected to look like this, where the Acting position always takes precedence over the Primary position.

P 2018-07-28 - 2018-09-03 <---------->
A 2018-09-04 - 2018-09-05             <--->
P 2018-09-06 - 2018-11-11                  <----------------------->
A 2018-11-12 - 2018-11-13                                           <--->
P 2018-11-14 - 2018-11-18                                                <----->
A 2018-11-19 - 2018-11-20                                                  
<--->
P 2018-11-21 - 9999-99-99                                                  
     <----->

Below is a representation of the data directly from the database, to give an idea of what it looks like overall:

Employee    Position    Start Date  End Date    Position Type
1           30          2016-12-10  2016-12-11  P
1           30          2016-12-12  2017-12-08  P
1           20          2017-01-31  2017-02-02  A
1           20          2017-03-29  2017-03-31  A
1           20          2017-04-19  2017-04-21  A
1           20          2017-06-22  2017-06-23  A
1           20          2017-06-29  2017-06-30  A
1           20          2017-10-03  2017-10-06  A
1           30          2017-12-09  2018-07-26  P
1           20          2018-02-12  2018-03-02  A
1           20          2018-07-19  2018-07-20  A
1           30          2018-07-27  2018-07-27  P
1           30          2018-07-28  9999-99-99  P
1           20          2018-09-04  2018-09-05  A
1           20          2018-11-12  2018-11-13  A
1           20          2018-11-19  2018-11-20  A

As you can see, there are even multiple entries for the same position where the dates are contiguous. These need to be combined into a contiguous entry until they are intersected/overridden by Acting Positions. I've been scouring different sources on the internet for the past month and haven't even come close to a viable solution. The closest I've come is from SQL – Combining Overlapping Date Rows, which I've converted from Oracle but I don't have enough knowledge of T-SQL to take it all the way to what I need, original Oracle code below:

SELECT
  KEY,
  MIN(TD) MINTD,
  MAX(TD) MAXTD
FROM
  (SELECT
    KEY,
    TD,
    SUM(C) OVER (PARTITION BY KEY ORDER BY TD) SC
  FROM
    (SELECT
      KEY,
      D,
      TD,
      TPD,
      DECODE(SIGN(LTD-TPD),1,1,0) C
    FROM
      (SELECT
        KEY,
        D,
        TRUNC(D,'MM') TD,
        ADD_MONTHS(TRUNC(D,'MM'),-1) LTD,
        LAG(D,1) OVER (PARTITION BY KEY ORDER BY D) PD,
        LAG(TRUNC(D,'MM'),1) OVER (PARTITION BY KEY ORDER BY D) TPD
      FROM
        T1)))
GROUP BY
  KEY,
  SC
ORDER BY
  KEY,
  SC;

Any ideas on a SELECT query to group and intersect the dates as outlined above?

Best Answer

First, I think the link you had supplied doesn't fit for your question because it is about combining overlapped dates, and you need to generate (GAPs) new dates.

I would divide your data in two groups, those corresponding to Primary(s) and those that belong to Acting(s).

SELECT   t1.*,
         ROW_NUMBER() OVER (PARTITION BY Employee ORDER BY Employee, StartDate) P1
FROM     tbl t1
WHERE    t1.PosType = 'P'
ORDER BY t1.Employee, t1.StartDate
;
GO
Employee | Position | StartDate  | EndDate    | PosType | P1
-------: | -------: | :----------| :----------| :------ | :-
       1 |       30 | 10/12/2016 | 11/12/2016 | P       | 1 
       1 |       30 | 12/12/2016 | 08/12/2017 | P       | 2 
       1 |       30 | 09/12/2017 | 26/07/2018 | P       | 3 
       1 |       30 | 27/07/2018 | 27/07/2018 | P       | 4 
       1 |       30 | 28/07/2018 | 31/12/9999 | P       | 5 

Next Acting rows are subsumed into the second Principal row: 12/12/2016 | 08/12/2017

SELECT   t2.*,
         ROW_NUMBER() OVER (ORDER BY t2.StartDate) P2
FROM     tbl t2
WHERE    t2.PosType = 'A'
         AND t2.StartDate >= '20161212'
         AND t2.StartDate <= '20171208'
ORDER BY t2.Employee, t2.StartDate
;
GO
Employee | Position | StartDate  | EndDate    | PosType | P2
-------: | -------: | :----------| :----------| :------ | :--
       1 |       20 | 31/01/2017 | 02/02/2017 | A       | 1    
       1 |       20 | 29/03/2017 | 31/03/2017 | A       | 2    
       1 |       20 | 19/04/2017 | 21/04/2017 | A       | 3    
       1 |       20 | 22/06/2017 | 23/06/2017 | A       | 4    
       1 |       20 | 29/06/2017 | 30/06/2017 | A       | 5    
       1 |       20 | 03/10/2017 | 06/10/2017 | A       | 6    

Now let me generate a new dataset, for each Primary I'll add the corresponding Acting(s) using an OUTER JOIN, this outer join returns all Primary(s) even if they haven't subsumed Acting(s).

SELECT   t1.*, COALESCE(t3.P2, 0) P2, P, t3.PrevED, t3.SD, t3.ED, NextSD, t3.PT
INTO #mydata
FROM     (SELECT *,
                 ROW_NUMBER() OVER (PARTITION BY Employee ORDER BY Employee, StartDate) as P1
          FROM   tbl 
          WHERE  PosType = 'P'
          ) t1
OUTER APPLY (SELECT   ROW_NUMBER() OVER (ORDER BY t2.StartDate) as P2,
                      LAG(t2.EndDate) OVER (ORDER BY t2.StartDate) as PrevED,
                      LEAD(t2.StartDate) OVER (ORDER BY t2.StartDate) as NextSD,
                      t2.Position as P, t2.StartDate as SD, t2.EndDate as ED, t2.PosType as PT
             FROM     tbl t2
             WHERE    t2.Employee = t1.Employee
                      AND t2.PosType = 'A'
                      AND t2.StartDate >= t1.StartDate
                      AND t2.StartDate <= t1.EndDate
            ) t3
WHERE    t1.PosType = 'P'
ORDER BY t1.Employee, t1.StartDate, t3.SD
;

CREATE CLUSTERED INDEX PK_Mydata ON #mydata(Employee, P1, P2);

SELECT * FROM #mydata ORDER BY Employee, P1, P2;
GO

NOTE: I've used a temporary table plus and index, but you could try by using a CTE. You should check performance of both options.

This is the new scenario:

Employee | Position | StartDate  | EndDate    | PosType | P1 | P2 |    P | PrevED     | SD         | ED         | NextSD     | PT  
-------: | -------: | :----------| :----------| :------ | :- | :- | ---: | :----------| :----------| :----------| :----------| :---
       1 |       30 | 10/12/2016 | 11/12/2016 | P       | 1  | 0  | null | null       | null       | null       | null       | null
       1 |       30 | 12/12/2016 | 08/12/2017 | P       | 2  | 1  |   20 | null       | 31/01/2017 | 02/02/2017 | 29/03/2017 | A   
       1 |       30 | 12/12/2016 | 08/12/2017 | P       | 2  | 2  |   20 | 02/02/2017 | 29/03/2017 | 31/03/2017 | 19/04/2017 | A   
       1 |       30 | 12/12/2016 | 08/12/2017 | P       | 2  | 3  |   20 | 31/03/2017 | 19/04/2017 | 21/04/2017 | 22/06/2017 | A   
       1 |       30 | 12/12/2016 | 08/12/2017 | P       | 2  | 4  |   20 | 21/04/2017 | 22/06/2017 | 23/06/2017 | 29/06/2017 | A   
       1 |       30 | 12/12/2016 | 08/12/2017 | P       | 2  | 5  |   20 | 23/06/2017 | 29/06/2017 | 30/06/2017 | 03/10/2017 | A   
       1 |       30 | 12/12/2016 | 08/12/2017 | P       | 2  | 6  |   20 | 30/06/2017 | 03/10/2017 | 06/10/2017 | null       | A   
       1 |       30 | 09/12/2017 | 26/07/2018 | P       | 3  | 1  |   20 | null       | 12/02/2018 | 02/03/2018 | 19/07/2018 | A   
       1 |       30 | 09/12/2017 | 26/07/2018 | P       | 3  | 2  |   20 | 02/03/2018 | 19/07/2018 | 20/07/2018 | null       | A   
       1 |       30 | 27/07/2018 | 27/07/2018 | P       | 4  | 0  | null | null       | null       | null       | null       | null
       1 |       30 | 28/07/2018 | 31/12/9999 | P       | 5  | 1  |   20 | null       | 04/09/2018 | 05/09/2018 | 12/11/2018 | A   
       1 |       30 | 28/07/2018 | 31/12/9999 | P       | 5  | 2  |   20 | 05/09/2018 | 12/11/2018 | 13/11/2018 | 19/11/2018 | A   
       1 |       30 | 28/07/2018 | 31/12/9999 | P       | 5  | 3  |   20 | 13/11/2018 | 19/11/2018 | 20/11/2018 | null       | A   

I've added two partitions P1, P2 and I set up an index on them, these partitions will help me to filter information in this new scenario according to 4 types of rows:

|---------------------------------|   Type 1
|--P--|                               Type 2
       |--A--|                        Type 3
              |--P--|                 Type 2
                     |--A--|          Type 3
                            |--P--|   Type 4

Type 1:

Primary(s) that don't have any subsumed Acting.

  • P2 = 0

Type 2:

If exists, left gaps between Acting(s) plus between first Acting and Primary start date.

  • P2 <> 0
  • Actign start date - 1 > Primary start date/Acting end date

Type 3:

All Acting preriods.

  • P2 <> 0

Type 4:

If exists, right gaps between last Acting and Primary end date.

  • P2 <> 0
  • NextSD IS NULL (Last Acting)
  • Acting end date < Primary end date

Now the selection:

SELECT *
FROM
    (
    -- Primary(s) without Acting(s)
    SELECT Employee, Position, StartDate, EndDate, PosType
    FROM   #mydata
    WHERE  Employee = 1
           AND P2 = 0
    --
    -- plus all Acting(s)
    --
    UNION
    SELECT Employee, P, SD, ED, PT
    FROM   #mydata
    WHERE  Employee = 1
           AND P2 <> 0
    --
    -- plus all (left) Gaps
    --
    UNION
    SELECT Employee, 
           Position, 
           IIF(PrevED IS NULL, StartDate, DATEADD(day, 1, PrevED)), 
           DATEADD(day, -1, SD),
           PosType
    FROM   #mydata
    WHERE  Employee = 1
           AND P2 <> 0
           AND DATEDIFF(day, IIF(PrevED IS NULL, StartDate, DATEADD(day, 1, PrevED)), SD) > 0
    UNION
    --
    -- plus all (last right) Gap
    --
    SELECT Employee, 
           Position, 
           DATEADD(day, 1, ED),
           EndDate,
           PosType
    FROM   #mydata
    WHERE  Employee = 1
           AND P2 <> 0
           AND NextSD IS NULL
           AND DATEDIFF(day, ED, EndDate) > 0
    ) T
ORDER BY 1, 3;
GO

This is the final result:

Employee | Position | StartDate  | EndDate    | PosType
-------: | -------: | :----------| :----------| :------
       1 |       30 | 10/12/2016 | 11/12/2016 | P      
       1 |       30 | 12/12/2016 | 30/01/2017 | P      
       1 |       20 | 31/01/2017 | 02/02/2017 | A      
       1 |       30 | 03/02/2017 | 28/03/2017 | P      
       1 |       20 | 29/03/2017 | 31/03/2017 | A      
       1 |       30 | 01/04/2017 | 18/04/2017 | P      
       1 |       20 | 19/04/2017 | 21/04/2017 | A      
       1 |       30 | 22/04/2017 | 21/06/2017 | P      
       1 |       20 | 22/06/2017 | 23/06/2017 | A      
       1 |       30 | 24/06/2017 | 28/06/2017 | P      
       1 |       20 | 29/06/2017 | 30/06/2017 | A      
       1 |       30 | 01/07/2017 | 02/10/2017 | P      
       1 |       20 | 03/10/2017 | 06/10/2017 | A      
       1 |       30 | 07/10/2017 | 08/12/2017 | P      
       1 |       30 | 09/12/2017 | 11/02/2018 | P      
       1 |       20 | 12/02/2018 | 02/03/2018 | A      
       1 |       30 | 03/03/2018 | 18/07/2018 | P      
       1 |       20 | 19/07/2018 | 20/07/2018 | A      
       1 |       30 | 21/07/2018 | 26/07/2018 | P      
       1 |       30 | 27/07/2018 | 27/07/2018 | P      
       1 |       30 | 28/07/2018 | 03/09/2018 | P      
       1 |       20 | 04/09/2018 | 05/09/2018 | A      
       1 |       30 | 06/09/2018 | 11/11/2018 | P      
       1 |       20 | 12/11/2018 | 13/11/2018 | A      
       1 |       30 | 14/11/2018 | 18/11/2018 | P      
       1 |       20 | 19/11/2018 | 20/11/2018 | A      
       1 |       30 | 21/11/2018 | 31/12/9999 | P      

db<>fiddle here