Sql-server – Table join to match without duplicates

join;sql server

I have a situation where I need to match arrivals and departures of assets in a location. The problem is that asset movements are not always entered into the database chronologically, nor is there any current way of tying the arrival and departure together in the database.

Caveats:

  • Each movement (arrival or departure) consists of a single row with a unique id (moveID)
  • Each row will have an item ID (itemID) that is unique to that item. Each item, however, could have multiple rows (movements) in the table.
  • I want to match each arrival for the asset to the departure chronologically, ensuring that we only match each moveID to one or fewer other moves (e.g. we need to allow for an arrival and depature, an arrival only or a departure only).
  • Matches should be made based on location (e.g. location and shelf or location and subshelf on arrival matches the location and shelf or location and subshelf on the departure).
  • This is a MS SQL Server

The structure of the table is (similar) to the following:

  CREATE TABLE Movements
 (
      MoveID int IDENTITY (1,1),
      ItemID int,
      EventType CHAR,
      moveTime datetime,
      LocID int,
      ShelfID int,
      altShelfID int
)

and I have included test data:

SET IDENTITY_INSERT movements on

INSERT INTO movements
(MoveID,ItemID,EventType,moveTime,LocID,ShelfID,altShelfID)
VALUES(1,3,'A','2013-01-05 09:00',1,3,NULL)

INSERT INTO movements
(MoveID,ItemID,EventType,moveTime,LocID,ShelfID,altShelfID)
VALUES(2,3,'D','2013-01-06 13:00',1,3,NULL)

INSERT INTO movements
(MoveID,ItemID,EventType,moveTime,LocID,ShelfID,altShelfID)
VALUES(3,3,'A','2013-01-07 09:00',1,3,NULL)

INSERT INTO movements
(MoveID,ItemID,EventType,moveTime,LocID,ShelfID,altShelfID)
VALUES(4,3,'A','2013-01-15 09:00',1,3,NULL)

INSERT INTO movements
(MoveID,ItemID,EventType,moveTime,LocID,ShelfID,altShelfID)
VALUES(5,3,'D','2013-01-07 15:00',1,3,NULL)

INSERT INTO movements
(MoveID,ItemID,EventType,moveTime,LocID,ShelfID,altShelfID)
VALUES(6,3,'A','2013-01-16 09:00',2,NULL,7)

INSERT INTO movements
(MoveID,ItemID,EventType,moveTime,LocID,ShelfID,altShelfID)
VALUES(7,3,'D','2013-01-15 12:00',1,3,NULL)

SET IDENTITY_INSERT movements off

Query

SELECT    CASE WHEN sch_A.itemID IS NULL THEN Sch_D.itemID
                     ELSE Sch_A.itemID
                END AS ID ,
                Sch_A.moveTime AS Arrival_Time ,
                Sch_D.moveTime AS Departure_Time ,
                Sch_A.LocID ,
                Sch_A.ShelfID ,
                Sch_A.AltShelfID ,
                Sch_D.LocID ,
                Sch_D.ShelfID ,
                Sch_D.AltShelfID ,
                Sch_A.MoveID AS ArrivalMoveID ,
                Sch_D.MoveID AS DepartureMoveID
      FROM      Movements Sch_A
                FULL JOIN Movements Sch_D ON Sch_A.itemID = Sch_D.itemID
                                                AND Sch_A.LocID = Sch_D.LocID
                                                AND ISNULL(Sch_A.ShelfID,
                                                          0) = ISNULL(Sch_D.ShelfID,
                                                          0)
                                                AND ISNULL(Sch_A.AltShelfID,
                                                          0) = ISNULL(Sch_D.AltShelfID,
                                                          0)
                                                AND Sch_A.MoveID != Sch_D.MoveID
                                                AND Sch_A.moveTime <= Sch_D.moveTime
      WHERE     ( Sch_A.EventType != 'D'
                  OR Sch_A.EventType IS NULL
                )
                AND ( Sch_D.EventType != 'A'
                      OR Sch_D.EventType IS NULL
                    )

Results returned have matches for the same schedule multiple times:

3   2013-01-05 09:00:00.000 2013-01-06 13:00:00.000 1   3   NULL    1   3   NULL    1   2
3   2013-01-05 09:00:00.000 2013-01-07 15:00:00.000 1   3   NULL    1   3   NULL    1   5
3   2013-01-05 09:00:00.000 2013-01-15 12:00:00.000 1   3   NULL    1   3   NULL    1   7
3   2013-01-07 09:00:00.000 2013-01-07 15:00:00.000 1   3   NULL    1   3   NULL    3   5
3   2013-01-07 09:00:00.000 2013-01-15 12:00:00.000 1   3   NULL    1   3   NULL    3   7
3   2013-01-15 09:00:00.000 2013-01-15 12:00:00.000 1   3   NULL    1   3   NULL    4   7
3   2013-01-16 09:00:00.000 NULL    2   NULL    7   NULL    NULL    NULL    6   NULL

What I would like to see is the following:

3   2013-01-05 09:00:00.000 2013-01-06 13:00:00.000 1   3   NULL    1   3   NULL    1   2
3   2013-01-07 09:00:00.000 2013-01-07 15:00:00.000 1   3   NULL    1   3   NULL    3   5
3   2013-01-15 09:00:00.000 2013-01-15 12:00:00.000 1   3   NULL    1   3   NULL    4   7
3   2013-01-16 09:00:00.000 NULL    2   NULL    7   NULL    NULL    NULL    6   NULL

How do I reduce the duplicate matches for schedules? Thanks.

Update: As Paul suggested below, I could try and make a correlation column to match items on the fly. What are the perils of doing something like the following:

SELECT    CASE WHEN sch_A.itemID IS NULL THEN Sch_D.itemID
                 ELSE Sch_A.itemID
            END AS ID ,
            Sch_A.moveTime AS Arrival_Time ,
            Sch_D.moveTime AS Departure_Time ,
            Sch_A.LocID ,
            Sch_A.ShelfID ,
            Sch_A.AltShelfID ,
            Sch_D.LocID ,
            Sch_D.ShelfID ,
            Sch_D.AltShelfID ,
            Sch_A.MoveID AS ArrivalMoveID ,
            Sch_D.MoveID AS DepartureMoveID
  FROM      Movements Sch_A
            FULL JOIN Movements Sch_D 
                ON Sch_A.itemID = Sch_D.itemID
                    AND Sch_D.MoveID = (select top 1 MoveID from movements i 
                            where i.EventType = 'D' and Sch_A.movetime < i.movetime
                             AND Sch_A.itemID = i.itemID
                                            AND Sch_A.LocID = i.LocID
                                            AND ISNULL(Sch_A.ShelfID,
                                                      0) = ISNULL(i.ShelfID,
                                                      0)
                                            AND ISNULL(Sch_A.AltShelfID,
                                                      0) = ISNULL(i.AltShelfID,
                                                      0)
                                            AND Sch_A.MoveID != i.MoveID        
                                order by i.moveTime ASC)

  WHERE     ( Sch_A.EventType != 'D'
              OR Sch_A.EventType IS NULL
            )
            AND ( Sch_D.EventType != 'A'
                  OR Sch_D.EventType IS NULL
                )

Best Answer

A bit ugly but I could only do this with precalculation myself.

update movements 
set linkTime = (select top 1 movetime from movements i 
        where i.EventType = 'D' and movements.movetime < i.movetime
        and movements.locid = i.locid
        order by i.moveTime asc)
where movements.EventType = 'A'

select a.*, d.*
from (select * from movements
        where EventType = 'A') as a
left outer join (select * from movements
        where EventType = 'D') as d ON a.locid = d.locid AND a.linktime = d.movetime

Essentially I precalc the link you talk about in the UPDATE statement then use it to link as desired in the SELECT. I missed out some of the columns that had no relevance in the sample data (itemID etc.).

Would probably make more sense to insert into a #temporary table rather than permenantly add a new column to movements but that will depend on local knowledge ..

NB: linkTime is a new column of type datetime added to the table movements