SQL Server 2014: Find gaps in a sequence across 2 different fields, grouped by a third field

gaps-and-islands, sql-server

I need to detect the gaps in the order-number sequence for each seller, where the sequence is spread across 2 different columns. Only the supplied range counts: numbers below a seller's first record and numbers above its last record should be ignored.

I have a table with orders where relevant fields are:

    SellerID | OrderID |   OID   |
    seller1  | 123456  |         |
    seller1  | 123457  |         |
    seller1  | 123458  | 123460  |
    seller1  | 123459  | 123460  |
    seller2  | 234567  |         | (first record)
    seller1  | 123455  |         | 
    seller2  | 234568  | 234570  |
    seller2  | 234569  | 234570  | 
    seller1  | 123463  |         |
    seller1  | 123466  |         |
    seller1  | 123453  |         | (first record)
    seller2  | 234572  |         | (last record)
    seller1  | 123470  |         | (last record)

I expect a result like this:

    seller 1 | 123454
    seller 1 | 123461
    seller 1 | 123462
    seller 1 | 123464
    seller 1 | 123465
    seller 1 | 123467
    seller 1 | 123468
    seller 1 | 123469
    seller 2 | 234571

Finding gaps in a single column for a single seller is easy enough.
I use this query, which I repeat for every seller:

Declare @SellerID nvarchar(50)='seller1'
SELECT s1.OrderID
FROM Orders s1
LEFT JOIN Orders s2
ON s1.OrderID = s2.OrderID -1
WHERE s2.OrderID IS NULL
and s1.SellerID=@SellerID
order by s1.OrderID asc

But I do not know how to modify it to check the sequence across both columns.

Moreover, is there a way to achieve this in a single query instead of repeating it for each seller?

Best Answer

You can use CROSS APPLY to "unpivot" multiple columns. This query will give you a list of orders (unpivoted). I've added a third column, which is the LEADing (i.e. next) OrderID, if there is any:

SELECT o.SellerID, x.OrderID,
       LEAD(x.OrderID, 1) OVER (
           PARTITION BY o.SellerID
           ORDER BY x.OrderID) AS _nextOrderID
FROM Orders AS o
CROSS APPLY (
    SELECT o.OrderID
    UNION ALL
    SELECT o.OID AS OrderID WHERE o.OID IS NOT NULL
    ) AS x(OrderID);

This gives you:

SellerID   OrderID     _nextOrderID
---------- ----------- ------------
seller1    123453      123455
seller1    123455      123456
seller1    123456      123457
seller1    123457      123458
seller1    123458      123459
seller1    123459      123460
seller1    123460      123460
seller1    123460      123463
seller1    123463      123466
seller1    123466      123470
seller1    123470      NULL
seller2    234567      234568
seller2    234568      234569
seller2    234569      234570
seller2    234570      234570
seller2    234570      234572
seller2    234572      NULL
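
For comparison, the same unpivot can be sketched with a table value constructor instead of UNION ALL. This is only an illustration, not part of the original answer: the table variable @Orders stands in for the real Orders table, and the int column types are an assumption, since the question does not state them. The sample rows are copied from the question.

```sql
-- @Orders is a stand-in for the Orders table; column types are assumed.
DECLARE @Orders TABLE (
    SellerID nvarchar(50) NOT NULL,
    OrderID  int NOT NULL,
    OID      int NULL
);

INSERT INTO @Orders (SellerID, OrderID, OID)
VALUES (N'seller1', 123456, NULL),
       (N'seller1', 123457, NULL),
       (N'seller1', 123458, 123460),
       (N'seller1', 123459, 123460),
       (N'seller2', 234567, NULL),
       (N'seller1', 123455, NULL),
       (N'seller2', 234568, 234570),
       (N'seller2', 234569, 234570),
       (N'seller1', 123463, NULL),
       (N'seller1', 123466, NULL),
       (N'seller1', 123453, NULL),
       (N'seller2', 234572, NULL),
       (N'seller1', 123470, NULL);

-- Each source row yields two candidate values: OrderID and OID.
-- The WHERE clause runs before LEAD() is evaluated, so the NULL OIDs
-- are dropped before the window function sees them.
SELECT o.SellerID, x.OrderID,
       LEAD(x.OrderID, 1) OVER (
           PARTITION BY o.SellerID
           ORDER BY x.OrderID) AS _nextOrderID
FROM @Orders AS o
CROSS APPLY (VALUES (o.OrderID), (o.OID)) AS x(OrderID)
WHERE x.OrderID IS NOT NULL
ORDER BY o.SellerID, x.OrderID;
```

With the sample data this should return the same rows as the listing above. Which form reads better is a matter of taste; the UNION ALL version filters the NULLs inside the APPLY, the VALUES version filters them afterwards.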

Now, if we subtract the OrderID from the _nextOrderID, we get the gap, that is, the distance to the next order. A gap of n means that n-1 orders are "missing":

SELECT o.SellerID, x.OrderID,
       LEAD(x.OrderID, 1) OVER (
           PARTITION BY o.SellerID
           ORDER BY x.OrderID)-x.OrderID AS _gap
FROM Orders AS o
CROSS APPLY (
    SELECT o.OrderID
    UNION ALL
    SELECT o.OID AS OrderID WHERE o.OID IS NOT NULL
    ) AS x(OrderID);

...which looks something like this:

SellerID   OrderID     _gap
---------- ----------- -----------
seller1    123453      2
seller1    123455      1
seller1    123456      1
seller1    123457      1
seller1    123458      1
seller1    123459      1
seller1    123460      0    <- 0 because you have two OrderID=123460
seller1    123460      3
seller1    123463      3
seller1    123466      4
seller1    123470      NULL
seller2    234567      1
seller2    234568      1
seller2    234569      1
seller2    234570      0
seller2    234570      2
seller2    234572      NULL

So where _gap<=1, we don't want to return any rows. Where _gap=4, we want to insert 3 rows, from OrderID+1 to OrderID+3. There are a few ways to do this, but I'm going to stick with CROSS APPLY here as well.

I'm putting the query above in a subquery (called sub in my example), and for each row in that result, I'm going to CROSS APPLY any dummy table to generate the missing rows. The dummy table has to hold at least as many rows as the largest gap, so if you expect large gaps, you may want to create a separate table with a single IDENTITY column for this purpose. I'm just going to use the Orders table along with a ROW_NUMBER():

SELECT sub.SellerID, sub.OrderID+n.rownum AS OrderID
FROM (
    SELECT o.SellerID, x.OrderID,
           LEAD(x.OrderID, 1) OVER (
               PARTITION BY o.SellerID
               ORDER BY x.OrderID)-x.OrderID AS _gap
    FROM Orders AS o
    CROSS APPLY (
        SELECT o.OrderID
        UNION ALL
        SELECT o.OID AS OrderID WHERE o.OID IS NOT NULL
        ) AS x(OrderID)
    ) AS sub
CROSS APPLY (
    --- For each row in sub, where _gap>1, return
    --- (_gap-1) rows, numbered 1, 2, 3, ..., (_gap-1).
    SELECT TOP (sub._gap-1)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rownum
    FROM Orders    -- or any dummy table
    ) AS n
--- Only where there's actually a gap:
WHERE sub._gap>1;

Finally, at the top, we add n.rownum to sub.OrderID to get the missing OrderID.

Here's the final output:

SellerID   OrderID
---------- --------------------
seller1    123454
seller1    123461
seller1    123462
seller1    123464
seller1    123465
seller1    123467
seller1    123468
seller1    123469
seller2    234571
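
One caveat of using Orders as the dummy table is that TOP cannot return more rows than the table holds, so a gap larger than the row count would be reported only partially. A minimal sketch of one way around this, assuming gaps never exceed 1000, is to generate the numbers on the fly. The names digits and numbers and the two-row sample table are hypothetical and only there to make the example self-contained.

```sql
-- Hypothetical data: two orders with 8 missing numbers between them,
-- i.e. a gap larger than the number of rows in the table.
DECLARE @Orders TABLE (
    SellerID nvarchar(50) NOT NULL,
    OrderID  int NOT NULL,
    OID      int NULL
);

INSERT INTO @Orders (SellerID, OrderID, OID)
VALUES (N'seller1', 1, NULL),
       (N'seller1', 10, NULL);

-- digits x digits x digits gives the numbers 1..1000 without a physical table.
WITH digits AS (
    SELECT d FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS v(d)
),
numbers AS (
    SELECT a.d + 10 * b.d + 100 * c.d + 1 AS n
    FROM digits AS a
    CROSS JOIN digits AS b
    CROSS JOIN digits AS c
)
SELECT sub.SellerID, sub.OrderID + n.n AS OrderID
FROM (
    SELECT o.SellerID, x.OrderID,
           LEAD(x.OrderID, 1) OVER (
               PARTITION BY o.SellerID
               ORDER BY x.OrderID) - x.OrderID AS _gap
    FROM @Orders AS o
    CROSS APPLY (
        SELECT o.OrderID
        UNION ALL
        SELECT o.OID WHERE o.OID IS NOT NULL
        ) AS x(OrderID)
    ) AS sub
-- n < _gap keeps 1..(_gap-1); rows with a NULL, 0 or 1 gap match nothing.
INNER JOIN numbers AS n ON n.n < sub._gap
ORDER BY sub.SellerID, sub.OrderID + n.n;
```

For the two sample rows this lists 2 through 9 for seller1, whereas TOP over a two-row table could supply at most two of them. If gaps can be larger than 1000, add another CROSS JOIN or use a permanent numbers table.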

Related Solutions

Find and remove gaps in sequence across two different columns

You can store precalculated gaps, and use constraints to make sure that your precalculated data is always up to date.

Here is the table and the first interval:

CREATE TABLE dbo.IntegerSettings(SettingID INT NOT NULL,
  IntValue INT NOT NULL,
  StartedAt DATETIME NOT NULL,
  FinishedAt DATETIME NOT NULL,
  PreviousFinishedAt DATETIME NULL,
  CONSTRAINT PK_IntegerSettings_SettingID_FinishedAt PRIMARY KEY(SettingID, FinishedAt),
  CONSTRAINT UNQ_IntegerSettings_SettingID_PreviousFinishedAt UNIQUE(SettingID, PreviousFinishedAt),
  CONSTRAINT FK_IntegerSettings_SettingID_PreviousFinishedAt
    FOREIGN KEY(SettingID, PreviousFinishedAt)
    REFERENCES dbo.IntegerSettings(SettingID, FinishedAt),
  CONSTRAINT CHK_IntegerSettings_PreviousFinishedAt_NotAfter_StartedAt CHECK(PreviousFinishedAt <= StartedAt),
  CONSTRAINT CHK_IntegerSettings_StartedAt_Before_FinishedAt CHECK(StartedAt < FinishedAt)
);
GO

INSERT INTO dbo.IntegerSettings(SettingID, IntValue, StartedAt, FinishedAt, PreviousFinishedAt)
  VALUES(1, 1, '20070101', '20070103', NULL);

It has five constraints which work together to implement the business rule. Let me demonstrate how the more complex ones work. Of course, some constraints are simple and as such do not need any explanations.

There can be only one first interval for a setting

The constraint UNQ_IntegerSettings_SettingID_PreviousFinishedAt ensures exactly that. The first interval does not have a previous one, which means that PreviousFinishedAt IS NULL. The UNIQUE constraint guarantees that there can be only one such row per setting. See for yourself:

INSERT INTO dbo.IntegerSettings(SettingID, IntValue, StartedAt, FinishedAt, PreviousFinishedAt)
  VALUES(1, 1, '20070104', '20070105', NULL);

/*
Server: Msg 2627, Level 14, State 2, Line 1
Violation of UNIQUE KEY constraint 'UNQ_IntegerSettings_SettingID_PreviousFinishedAt'. Cannot insert duplicate key in object 'dbo.IntegerSettings'.
The statement has been terminated.
*/

Next window must begin after the end of the previous one.

The constraint CHK_IntegerSettings_PreviousFinishedAt_NotAfter_StartedAt guarantees exactly that. See for yourself:

INSERT INTO dbo.IntegerSettings(SettingID, IntValue, StartedAt, FinishedAt, PreviousFinishedAt)
  VALUES(1, 2, '20070104', '20070109', '20070105');

/*
Server: Msg 547, Level 16, State 1, Line 1
INSERT statement conflicted with TABLE CHECK constraint 'CHK_IntegerSettings_PreviousFinishedAt_NotAfter_StartedAt'. The conflict occurred in database 'RiskCenter', table 'IntegerSettings'.
The statement has been terminated.
*/

Two different windows cannot refer to one and the same window as their previous one.

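Again, the same constraint UNQ_IntegerSettings_SettingID_PreviousFinishedAt guarantees precisely that. Assuming a window whose PreviousFinishedAt is '20070103' has already been inserted for this setting, a second window referring to the same previous one is rejected, as demonstrated below: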

INSERT INTO dbo.IntegerSettings(SettingID, IntValue, StartedAt, FinishedAt, PreviousFinishedAt)
  VALUES(1, 3, '20070104', '20070115', '20070103');

/*
Msg 2627, Level 14, State 1, Line 1
Violation of UNIQUE KEY constraint 'UNQ_IntegerSettings_SettingID_PreviousFinishedAt'. Cannot insert duplicate key in object 'dbo.IntegerSettings'.
The statement has been terminated.
*/

This means that there can be no overlaps.

As you have seen, for every time window, there can be at most one preceding it, and at most one following it. The following interval cannot begin before the end of its previous one. Together these two statements mean that there can be no overlaps.

Working with gaps.

You can prohibit gaps altogether; just replace the following constraint:

  CONSTRAINT CHK_IntegerSettings_PreviousFinishedAt_NotAfter_StartedAt CHECK(PreviousFinishedAt <= StartedAt),

with a stricter one, as follows:

  CONSTRAINT CHK_IntegerSettings_PreviousFinishedAt_EqualTo_StartedAt CHECK(PreviousFinishedAt = StartedAt),

But if you allow gaps, the query to retrieve them is very simple and performant, as follows:

SELECT PreviousFinishedAt AS GapStart, StartedAt AS GapEnd
  FROM dbo.IntegerSettings
  WHERE StartedAt > PreviousFinishedAt;
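
To see what the gap query returns, here is a small worked example. It is a sketch under stated assumptions: the table variable @IntegerSettings is a stand-in that carries only the columns the query reads (none of the constraints), and the second and third windows are hypothetical rows added for illustration.

```sql
-- Stand-in for dbo.IntegerSettings with only the columns the gap query reads.
DECLARE @IntegerSettings TABLE (
    SettingID          INT NOT NULL,
    StartedAt          DATETIME NOT NULL,
    FinishedAt         DATETIME NOT NULL,
    PreviousFinishedAt DATETIME NULL
);

INSERT INTO @IntegerSettings (SettingID, StartedAt, FinishedAt, PreviousFinishedAt)
VALUES (1, '20070101', '20070103', NULL),        -- first window, no predecessor
       (1, '20070103', '20070105', '20070103'),  -- starts exactly where the previous one ended
       (1, '20070108', '20070110', '20070105');  -- starts three days after the previous one ended

-- Only the third window is returned: its StartedAt is later than the
-- FinishedAt of its predecessor.
SELECT SettingID, PreviousFinishedAt AS GapStart, StartedAt AS GapEnd
FROM @IntegerSettings
WHERE StartedAt > PreviousFinishedAt;
```

The query reports a single gap running from 2007-01-05 to 2007-01-08. The first window is never reported, because comparing StartedAt with a NULL PreviousFinishedAt does not evaluate to true.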

Sql-server – Prevent gaps while using Identity/sequence object in sql server

To create auto numbers without gaps, we can use a Service Broker QUEUE object in SQL Server.

The idea is that a queue supports both concurrency and transaction rollback. We use a SEQUENCE object to generate a continuous set of numbers and send them to the queue. Clients then receive numbers from the queue concurrently, and if one client's request fails, its auto number stays in the queue for the next available client.

This way we get concurrency and a unique stream of auto numbers without gaps.

The only potential issue is that the auto numbers will sometimes be handed out out of chronological order, but that should be acceptable.
