Sql-server – Sort by Start Date Desc if End Date is null, if not Sort by End Date


I have a SQL Server (RDBMS) table with a list of Activities. Each activity has a Start and End Date.

The requirement is to sort them so that rows with a NULL end date (which suggests "current") display first, ordered by start date descending, and rows with both a start and an end date follow the "current" rows, ordered by end date descending.

Lastly, rows where both the start and end dates are NULL must be at the end of the result set.

Rows with an end date but a NULL start date should be sorted in the second section, among the rows ordered by end date descending.

Best Answer

You can use a CASE expression in the ORDER BY:

ORDER BY CASE WHEN EndDate IS NULL THEN 0 ELSE 1 END ASC
    , EndDate DESC
    , StartDate DESC

Sort criteria, in order:

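Values with a NULL EndDate get 0 and all others get 1. This is the first sort criterion, so the "current" rows come first.
The second criterion is EndDate descending. It is NULL for every row of the first group, so it only orders the second group.
Finally, ties are broken by StartDate descending, which is what orders the "current" rows.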

You can adjust ASC/DESC and the column order if the overall order is not what you expect.
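To see how the composite key behaves, here is a minimal Python sketch (an illustration, not part of the original answer) that mimics the ORDER BY above with a tuple sort key. It assumes SQL Server's convention that NULL sorts before any value in ascending order, so NULLs land last under DESC; `None` stands in for NULL, and the helper names `desc_nulls_last` and `order_key` are hypothetical.

```python
from datetime import date
from typing import Optional

# (id, StartDate, EndDate) from the sample data; None stands in for NULL.
rows: list[tuple[int, Optional[date], Optional[date]]] = [
    (0, date(2016, 1, 1), date(2016, 1, 5)),
    (1, date(2016, 1, 1), None),
    (2, date(2016, 1, 1), date(2016, 1, 7)),
    (3, date(2016, 1, 2), date(2016, 1, 3)),
    (4, date(2016, 1, 2), date(2016, 1, 5)),
    (5, date(2016, 1, 2), None),
]

def desc_nulls_last(d: Optional[date]) -> tuple[int, int]:
    """Key for `col DESC`: later dates first, NULLs after every value (SQL Server behavior)."""
    return (1, 0) if d is None else (0, -d.toordinal())

def order_key(row: tuple[int, Optional[date], Optional[date]]):
    _id, start, end = row
    bucket = 0 if end is None else 1  # CASE WHEN EndDate IS NULL THEN 0 ELSE 1 END ASC
    return (bucket, desc_nulls_last(end), desc_nulls_last(start))

ordered_ids = [row[0] for row in sorted(rows, key=order_key)]
print(ordered_ids)  # [5, 1, 2, 4, 0, 3]
```

Reordering or flipping a component of the tuple is the Python counterpart of changing ASC/DESC or the column order in the ORDER BY.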

Sample data and query:

DECLARE @data TABLE(id int identity(0, 1), StartDate date, EndDate date)
INSERT INTO @data(StartDate, EndDate) VALUES
    ('20160101', '20160105')
    , ('20160101', null)
    , ('20160101', '20160107')
    , ('20160102', '20160103')
    , ('20160102', '20160105')
    , ('20160102', null)
    --, (null, '20160105')
    --, (null, null);

SELECT * 
FROM @data
ORDER BY CASE WHEN EndDate IS NULL THEN 0 ELSE 1 END ASC
    , EndDate DESC
    , StartDate DESC

Output:

id  StartDate   EndDate
5   2016-01-02  NULL
1   2016-01-01  NULL
2   2016-01-01  2016-01-07
4   2016-01-02  2016-01-05
0   2016-01-01  2016-01-05
3   2016-01-02  2016-01-03

With the two commented-out rows restored (ids 6 and 7), this ORDER BY clause puts rows where both StartDate and EndDate are NULL at the end:

ORDER BY CASE 
          WHEN EndDate IS NULL AND StartDate IS NULL THEN 2 
          WHEN EndDate IS NULL THEN 0 
          ELSE 1 END ASC
    , EndDate DESC
    , StartDate DESC;

Output:

id  StartDate   EndDate
5   2016-01-02  NULL
1   2016-01-01  NULL
2   2016-01-01  2016-01-07
4   2016-01-02  2016-01-05
0   2016-01-01  2016-01-05
6   NULL        2016-01-05
3   2016-01-02  2016-01-03
7   NULL        NULL
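As a quick cross-check of the three-bucket ORDER BY, the sketch below runs the same clause against an in-memory SQLite database from Python. SQLite is only a stand-in here, not SQL Server: like SQL Server it treats NULL as smaller than any value (so NULLs come last under DESC), and ISO date strings sort chronologically. The table name `data` is a hypothetical replacement for the `@data` table variable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, StartDate TEXT, EndDate TEXT)")
conn.executemany(
    "INSERT INTO data (id, StartDate, EndDate) VALUES (?, ?, ?)",
    [
        (0, "2016-01-01", "2016-01-05"),
        (1, "2016-01-01", None),
        (2, "2016-01-01", "2016-01-07"),
        (3, "2016-01-02", "2016-01-03"),
        (4, "2016-01-02", "2016-01-05"),
        (5, "2016-01-02", None),
        (6, None, "2016-01-05"),  # end date only: belongs to the EndDate DESC section
        (7, None, None),          # both dates missing: must be last
    ],
)

# Same ORDER BY as the answer; only the table name differs.
query = """
SELECT id, StartDate, EndDate
FROM data
ORDER BY CASE
          WHEN EndDate IS NULL AND StartDate IS NULL THEN 2
          WHEN EndDate IS NULL THEN 0
          ELSE 1 END ASC
    , EndDate DESC
    , StartDate DESC
"""
ordered_ids = [row[0] for row in conn.execute(query)]
print(ordered_ids)  # [5, 1, 2, 4, 0, 6, 3, 7]
```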

Related Solutions

Sql-server – Sort order specified in primary key, yet sorting is executed on SELECT

For a non-partitioned table, I get the following plan:

Plan 1

There is a single seek predicate on Seek Keys[1]: Prefix: DeviceId, SensorId = (3819, 53), Start: Date < 1339225010.

This means SQL Server can perform an equality seek on the first two columns and then a range seek starting at 1339225010, ordered FORWARD (because the index is defined with [Date] DESC).

The TOP operator will stop requesting more rows from the seek after the first row is emitted.
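A rough way to picture this plan is a sorted list of index keys with a binary search standing in for the seek. The Python sketch below is only an analogy, not how the storage engine works. It assumes the query is a `TOP (1)` for DeviceId 3819, SensorId 53 and Date < 1339225010 ordered by Date descending (the query itself is not shown here), reuses the sample rows inserted further down, and stores `-Date` so that an ascending list matches an index defined with `[Date] DESC`. The helper `top1_seek` is hypothetical.

```python
from bisect import bisect_left
from typing import Optional

# Index keys (DeviceId, SensorId, -Date): negating Date keeps a plain ascending list
# in the same order as an index defined with [Date] DESC.
keys = sorted(
    [(3819, 53, -d) for d in range(1, 501)]                    # match the predicate
    + [(3819, 52, -d) for d in range(1, 701)]                  # wrong SensorId
    + [(3819, 53, -(1339225011 + d)) for d in range(1, 1101)]  # Date too high
)

def top1_seek(device_id: int, sensor_id: int, date_upper: int) -> Optional[int]:
    """Equality seek on (DeviceId, SensorId), range seek on Date < date_upper, then TOP (1)."""
    # Date < date_upper  <=>  -Date >= -date_upper + 1 for integer dates.
    pos = bisect_left(keys, (device_id, sensor_id, -date_upper + 1))
    # Reading FORWARD from the seek position, the first key still inside the prefix is the answer,
    # so no further rows need to be read.
    if pos < len(keys) and keys[pos][:2] == (device_id, sensor_id):
        return -keys[pos][2]
    return None

print(top1_seek(3819, 53, 1339225010))  # 500
```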

When I create the partition function and scheme:

CREATE PARTITION FUNCTION PF (int)
AS RANGE LEFT FOR VALUES (1000, 1339225009 ,1339225010 , 1339225011);
GO
CREATE PARTITION SCHEME [MyPartitioningScheme]
AS PARTITION PF
ALL TO ([PRIMARY] );
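With `RANGE LEFT`, each boundary value belongs to the partition on its left, so four boundaries produce five partitions. The Python sketch below mimics what `$PARTITION.PF(value)` would return for this function using `bisect_left`; the helper name `partition_number` is hypothetical and only illustrates the rule.

```python
from bisect import bisect_left

# Boundary values of PF; RANGE LEFT means each boundary belongs to the partition on its left.
BOUNDARIES = [1000, 1339225009, 1339225010, 1339225011]

def partition_number(value: int) -> int:
    """1-based partition number, like $PARTITION.PF(value) for a RANGE LEFT function."""
    return bisect_left(BOUNDARIES, value) + 1

for v in (500, 1000, 1001, 1339225009, 1339225010, 1339225012):
    print(v, "->", partition_number(v))
```

Under this mapping the predicate Date < 1339225010 can only reach partitions 1 and 2, which lines up with the PtnId1000 range reported in the SQL Server 2008 plan.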

and populate the table with the following data:

INSERT INTO [dbo].[SensorValues]    
/*500 rows matching date and SensorId, DeviceId predicate*/
SELECT TOP (500) 3819,53,1, ROW_NUMBER() OVER (ORDER BY (SELECT 0))           
FROM master..spt_values
UNION ALL
/*700 rows matching date but not SensorId, DeviceId predicate*/
SELECT TOP (700) 3819,52,1, ROW_NUMBER() OVER (ORDER BY (SELECT 0))           
FROM master..spt_values
UNION ALL 
/*1100 rows matching SensorId, DeviceId predicate but not date */
SELECT TOP (1100) 3819,53,1, ROW_NUMBER() OVER (ORDER BY (SELECT 0)) + 1339225011      
FROM master..spt_values

The plan on SQL Server 2008 looks as follows.

Plan 2

The actual number of rows emitted from the seek is 500. The plan shows these seek predicates:

Seek Keys[1]: Start: PtnId1000 <= 2, End: PtnId1000 >= 1, 
Seek Keys[2]: Prefix: DeviceId, SensorId = (3819, 53), Start: Date < 1339225010

This indicates it is using the skip scan approach described in the following excerpt:

the query optimizer is extended so that a seek or scan operation with one condition can be done on PartitionID (as the logical leading column) and possibly other index key columns, and then a second-level seek, with a different condition, can be done on one or more additional columns, for each distinct value that meets the qualification for the first-level seek operation.

This is a serial plan, so for your specific query it seems that if SQL Server processed the partitions in descending order of date, the original plan with the TOP would still work: it could stop after finding the first matching row instead of continuing and outputting the remaining 499 matches.
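The idea can be sketched in Python as a toy model: visit the qualifying partitions from the highest date range down, seek within each one, and stop at the first match, instead of emitting every match and picking the top row afterwards. Partitions are modeled as in-memory lists, and the helpers `seek`, `skip_scan_then_top` and `descending_partitions_top` are hypothetical; this is not a description of SQL Server internals.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Iterator, Optional

BOUNDARIES = [1000, 1339225009, 1339225010, 1339225011]  # PF, RANGE LEFT

def partition_number(value: int) -> int:
    return bisect_left(BOUNDARIES, value) + 1

# Hypothetical in-memory copy of the sample rows: (DeviceId, SensorId, Date).
rows = (
    [(3819, 53, d) for d in range(1, 501)]
    + [(3819, 52, d) for d in range(1, 701)]
    + [(3819, 53, 1339225011 + d) for d in range(1, 1101)]
)

# Each partition holds its rows in index order (Date descending).
partitions: dict[int, list[tuple[int, int, int]]] = defaultdict(list)
for row in rows:
    partitions[partition_number(row[2])].append(row)
for part in partitions.values():
    part.sort(key=lambda r: r[2], reverse=True)

def seek(part_no: int, device_id: int, sensor_id: int, date_upper: int) -> Iterator[int]:
    """Second-level seek inside one partition: matching Dates in descending order."""
    for dev, sen, dt in partitions.get(part_no, []):
        if dev == device_id and sen == sensor_id and dt < date_upper:
            yield dt

def skip_scan_then_top(device_id: int, sensor_id: int, date_upper: int) -> tuple[Optional[int], int]:
    """Like the 2008 plan: emit every match from partitions 1..n, then keep the highest Date."""
    last = partition_number(date_upper - 1)  # highest partition the predicate can reach
    emitted = [dt for p in range(1, last + 1) for dt in seek(p, device_id, sensor_id, date_upper)]
    return (max(emitted) if emitted else None), len(emitted)

def descending_partitions_top(device_id: int, sensor_id: int, date_upper: int) -> tuple[Optional[int], int]:
    """Visit partitions from the highest date range down and stop at the first match."""
    for p in range(partition_number(date_upper - 1), 0, -1):
        first = next(seek(p, device_id, sensor_id, date_upper), None)
        if first is not None:
            return first, 1
    return None, 0

print(skip_scan_then_top(3819, 53, 1339225010))         # (500, 500)
print(descending_partitions_top(3819, 53, 1339225010))  # (500, 1)
```

Both approaches return the same Date; the difference is the number of matching rows emitted before the answer is known (500 versus 1).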

In fact, the plan on SQL Server 2005 appears to take that approach:

Plan on 2005

I'm not sure whether it is straightforward to get the same plan on 2008; it may require an OUTER APPLY over sys.partition_range_values to simulate it.

Sql-server – Updating table based on conditions (overlapping dates)

Condition 1:

WITH ord as (
    SELECT ID, CustomerID, CheckInDate, CheckOutDate
        , n = ROW_NUMBER() over(partition by [CustomerID] order by [CheckInDate], [CheckOutDate])
    FROM @data d1
), first as (
    SELECT o1.ID, o1.CustomerID, o1.CheckInDate, o1.CheckOutDate, o1.n
        , m = ROW_NUMBER() over(partition by o1.[CustomerID] order by o1.[CheckInDate], o1.[CheckOutDate])
    FROM ord o1
    INNER JOIN ord o2 ON o1.CustomerID = o2.CustomerID AND o2.n+1 = o1.n AND o1.CheckInDate > o2.CheckOutDate
), groups as (
    SELECT o.ID, o.CustomerID, nx = MIN(coalesce(f.n, 1)), n = MAX(o.n)
        , p = ROW_NUMBER() over(partition by o.CustomerID, MIN(coalesce(f.n, 1)) ORDER BY o.ID)
    FROM ord o
    LEFT JOIN first f ON o.CustomerID = f.CustomerID AND o.n < f.n
    GROUP BY o.ID, o.CustomerID
), dates as (
    SELECT g.CustomerID, g.nx, CheckInDate = MIN(o.CheckInDate)
        , CheckOutDate = CASE WHEN SUM(CASE WHEN o.CheckOutDate IS NULL THEN 1 END) IS NULL THEN MAX(o.CheckOutDate) END
    FROM groups g
    INNER JOIN ord o ON g.ID = o.ID
    GROUP BY g.nx, g.CustomerID
    HAVING COUNT(g.nx) > 1
)
SELECT o.ID, o.CustomerID
    , CheckInDate = CASE WHEN g.p = 1 THEN d.CheckInDate END
    , CheckOutDate = CASE WHEN g.p = 1 THEN d.CheckOutDate END
FROM groups g
INNER JOIN ord o ON g.ID = o.ID
INNER JOIN dates d on g.CustomerID = d.CustomerID AND g.nx = d.nx
ORDER BY ID

This query outputs the rows that must be updated:

ord = partition by CustomerID and order by CheckInDate (then CheckOutDate)
first = join consecutive rows from the previous CTE where the next CheckInDate does not overlap the previous row's CheckInDate/CheckOutDate range (CheckInDate > previous CheckOutDate); these rows mark group boundaries and are partitioned and numbered
groups = group by the previously computed partition number to know which group each row belongs to; p numbers the rows of each group by ID
dates = join with the original data to get the first and last dates for each group; groups with only one row are removed
main SELECT = output the merged dates for p = 1, and NULL otherwise
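The same gaps-and-islands logic can be traced in Python. This is a hedged sketch that mirrors the CTEs rather than reproducing the SQL exactly: rows are ordered per customer by CheckInDate, CheckOutDate (like `ord`), a new group starts when a row's CheckInDate is later than the previous row's CheckOutDate (like `first`; a comparison with NULL never starts a group), and only groups with more than one row are reported, with the merged dates on the lowest ID (like `dates` and the main SELECT). The function names `split_groups` and `merge_condition1` are hypothetical, and ISO date strings stand in for dates because they compare chronologically.

```python
from itertools import groupby
from typing import Iterator, Optional

Row = tuple[int, int, str, Optional[str]]  # (ID, CustomerID, CheckInDate, CheckOutDate)

DATA: list[Row] = [
    (1, 1, "2015-04-02", "2015-04-05"),
    (2, 2, "2015-03-04", "2015-05-02"),
    (3, 1, "2015-04-01", None),
    (4, 1, "2015-03-04", "2015-05-03"),
    (5, 1, "2015-01-03", "2015-02-03"),
]

def split_groups(rows: list[Row]) -> Iterator[list[Row]]:
    """Per customer, in `ord` order; a row starts a new group when its CheckInDate
    is later than the previous row's CheckOutDate (a NULL CheckOutDate never splits)."""
    ordered = sorted(rows, key=lambda r: (r[1], r[2], r[3] or ""))  # NULLs first, as in SQL Server
    for _customer, cust_rows in groupby(ordered, key=lambda r: r[1]):
        group: list[Row] = []
        for row in cust_rows:
            prev_out = group[-1][3] if group else None
            if prev_out is not None and row[2] > prev_out:
                yield group
                group = []
            group.append(row)
        yield group

def merge_condition1(rows: list[Row]) -> dict[int, tuple[Optional[str], Optional[str]]]:
    """{ID: (CheckInDate, CheckOutDate)} for rows in multi-row groups, as in Condition 1."""
    result: dict[int, tuple[Optional[str], Optional[str]]] = {}
    for group in split_groups(rows):
        if len(group) < 2:  # HAVING COUNT(g.nx) > 1
            continue
        check_in = min(r[2] for r in group)
        outs = [r[3] for r in group]
        check_out = None if None in outs else max(outs)  # any NULL CheckOutDate gives NULL
        first_id = min(r[0] for r in group)  # p = 1 is the lowest ID of the group
        for r in group:
            result[r[0]] = (check_in, check_out) if r[0] == first_id else (None, None)
    return result

print(merge_condition1(DATA))
```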

Output:

ID  CustomerID  CheckInDate CheckOutDate
1   1           2015-03-04  NULL
3   1           NULL        NULL
4   1           NULL        NULL

Condition 2:

WITH ord as (
    SELECT ID, CustomerID, CheckInDate, CheckOutDate
        , n = ROW_NUMBER() over(partition by [CustomerID] order by [CheckInDate], [CheckOutDate])
    FROM @data d1
), first as (
    SELECT o1.ID, o1.CustomerID, o1.CheckInDate, o1.CheckOutDate, o1.n
        , m = ROW_NUMBER() over(partition by o1.[CustomerID] order by o1.[CheckInDate], o1.[CheckOutDate])
    FROM ord o1
    INNER JOIN ord o2 ON o1.CustomerID = o2.CustomerID AND o2.n+1 = o1.n AND o1.CheckInDate > o2.CheckOutDate
), groups as (
    SELECT o.ID, o.CustomerID, nx = MIN(coalesce(f.n, 1)), n = MAX(o.n)
        , p = ROW_NUMBER() over(partition by o.CustomerID, MIN(coalesce(f.n, 1)) ORDER BY o.ID)
        , last = ROW_NUMBER() over(partition by o.CustomerID, MIN(coalesce(f.n, 1)) ORDER BY o.ID DESC)
    FROM ord o
    LEFT JOIN first f ON o.CustomerID = f.CustomerID AND o.n < f.n
    GROUP BY o.ID, o.CustomerID
), dates as (
    SELECT g.CustomerID, g.nx, CheckInDate = MIN(o.CheckInDate)
        , CheckOutDate = MAX(o2.CheckOutDate)
    FROM groups g
    INNER JOIN ord o ON g.ID = o.ID
    INNER JOIN (SELECT ID, CustomerID, nx FROM groups WHERE last = 1) l ON g.CustomerID = l.CustomerID AND g.nx = l.nx
    INNER JOIN ord o2 ON l.ID = o2.ID
    GROUP BY g.nx, g.CustomerID
    HAVING COUNT(g.nx) > 1
)
SELECT o.ID, o.CustomerID
    , CheckInDate = CASE WHEN g.p = 1 THEN d.CheckInDate END
    , CheckOutDate = CASE WHEN g.p = 1 THEN d.CheckOutDate END
FROM groups g
INNER JOIN ord o ON g.ID = o.ID
INNER JOIN dates d on g.CustomerID = d.CustomerID AND g.nx = d.nx
ORDER BY ID

Output:

ID  CustomerID  CheckInDate CheckOutDate
1   1           2015-03-04  2015-05-03
3   1           NULL        NULL
4   1           NULL        NULL
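Condition 2 differs only in how the merged CheckOutDate is chosen: it is taken from the row with the highest ID in the group (`last = 1`), so a NULL in another row of the group no longer forces NULL. Below is a compact, self-contained Python sketch of that variant. It assumes ISO date strings stand in for dates and that a comparison with NULL never starts a new group; the name `merge_condition2` is hypothetical.

```python
from itertools import groupby
from typing import Optional

Row = tuple[int, int, str, Optional[str]]  # (ID, CustomerID, CheckInDate, CheckOutDate)

DATA: list[Row] = [
    (1, 1, "2015-04-02", "2015-04-05"),
    (2, 2, "2015-03-04", "2015-05-02"),
    (3, 1, "2015-04-01", None),
    (4, 1, "2015-03-04", "2015-05-03"),
    (5, 1, "2015-01-03", "2015-02-03"),
]

def merge_condition2(rows: list[Row]) -> dict[int, tuple[Optional[str], Optional[str]]]:
    """Merge consecutive overlapping stays per customer; CheckOutDate comes from the highest ID."""
    result: dict[int, tuple[Optional[str], Optional[str]]] = {}
    ordered = sorted(rows, key=lambda r: (r[1], r[2], r[3] or ""))  # like `ord` (NULLs first)
    for _customer, cust_rows in groupby(ordered, key=lambda r: r[1]):
        groups: list[list[Row]] = []
        for row in cust_rows:
            prev_out = groups[-1][-1][3] if groups else None
            # New group when CheckInDate > previous CheckOutDate (NULL compares as UNKNOWN).
            if not groups or (prev_out is not None and row[2] > prev_out):
                groups.append([row])
            else:
                groups[-1].append(row)
        for group in groups:
            if len(group) < 2:  # HAVING COUNT(g.nx) > 1
                continue
            check_in = min(r[2] for r in group)
            last_row = max(group, key=lambda r: r[0])  # last = 1: highest ID
            first_id = min(r[0] for r in group)        # p = 1: lowest ID
            for r in group:
                result[r[0]] = (check_in, last_row[3]) if r[0] == first_id else (None, None)
    return result

print(merge_condition2(DATA))
```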

For the update, replace the final SELECT with an UPDATE that targets the ord CTE (alias o), which writes through to the underlying table:

UPDATE o SET 
    CheckInDate = CASE WHEN g.p = 1 THEN d.CheckInDate END
    , CheckOutDate = CASE WHEN g.p = 1 THEN d.CheckOutDate END
FROM ... -- same WITH ... CTEs in front, and same FROM/JOIN clauses as the final SELECT

Your data:

declare @data table([ID] int, [CustomerID] int, [CheckInDate] date, [CheckOutDate] date);
Insert into @data([ID], [CustomerID], [CheckInDate], [CheckOutDate])
VALUES
    (1, 1, '2015-04-02', '2015-04-05'),
    (2, 2, '2015-03-04', '2015-05-02'),
    (3, 1, '2015-04-01', NULL),
    (4, 1, '2015-03-04', '2015-05-03'),
    (5, 1, '2015-01-03', '2015-02-03')
;

It also works with this sample:

(1, 1, '2015-04-02', '2015-04-05'),
(2, 2, '2015-03-04', '2015-05-02'),
(3, 1, '2015-04-01', NULL),
(4, 1, '2015-03-04', '2015-05-03'),
(5, 1, '2015-01-03', '2015-02-03'),
(6, 1, '2015-01-02', '2015-02-03'),
(7, 1, '2015-03-04', '2015-03-06'),
(8, 1, '2015-03-04', '2015-05-06'),
(9, 1, '2014-04-02', '2014-04-05'),
(10, 1, '2014-03-04', '2014-05-02')

If it does not work with some of your data, update the Input and Output tables in your question with more relevant values.
