I would use a Calendar
table. This table simply has a list of dates for several decades.
CREATE TABLE [dbo].[Calendar](
[dt] [date] NOT NULL,
CONSTRAINT [PK_Calendar] PRIMARY KEY CLUSTERED
(
[dt] ASC
))
In my system it has few extra columns, such as [IsLastDayOfMonth]
, [IsLastDayOfQuarter]
, which are useful in some reports, but in your case you need just the date column. There are many ways to populate such table.
For example, 100K rows (~270 years) from 1900-01-01:
INSERT INTO dbo.Calendar (dt)
SELECT TOP (100000)
DATEADD(day, ROW_NUMBER() OVER (ORDER BY s1.[object_id])-1, '19000101') AS dt
FROM sys.all_objects AS s1 CROSS JOIN sys.all_objects AS s2
OPTION (MAXDOP 1);
Sample data
DECLARE @Stocks TABLE (
StockId int
, ProductId int
, AvailableFromDate date);
INSERT INTO @Stocks(StockId, ProductId, AvailableFromDate) VALUES
(1, 1, '2016-01-01'),
(2, 1, '2016-01-01');
DECLARE @RentStockOrders TABLE (
RentStockOrderId int
, StockId int
, BeginRentDate date
, EndRentDate date);
INSERT INTO @RentStockOrders (RentStockOrderId, StockId, BeginRentDate, EndRentDate) VALUES
(1, 1, '2016-01-15', '2016-02-14'),
(2, 2, '2016-01-30', '2016-02-20'),
(3, 2, '2016-02-26', '2016-03-07'),
(4, 1, '2016-02-29', '2016-03-14');
Parameters
DECLARE @ParamProductID int = 1;
DECLARE @ParamDays int = 14;
DECLARE @ParamStartDate date = '2015-12-01';
DECLARE @ParamEndDate date = '2017-01-01';
-- these dates define some reasonable limit
First variant
It turns out that window functions accept only literal constant as a size of a window, not a variable. Alas. Still, I'll show this query, because it illustrates the approach and shows how simple it would be if SQL Server supported variables as the size of the window. It also gives the correct answer that we can use to verify the second variant.
WITH
CTE_AllDays
-- list all days and stock IDs between @ParamStartDate and @ParamEndDate
-- for each day and stock indicate whether it is available based on AvailableFromDate
AS
(
SELECT
S.StockId
,dbo.Calendar.dt
,CASE WHEN dbo.Calendar.dt >= S.AvailableFromDate
THEN 1 ELSE 0 END AS AvailableStockDay
-- 1 - available
-- 0 - not available
FROM
@Stocks AS S
INNER JOIN dbo.Calendar ON
dbo.Calendar.dt >= @ParamStartDate
AND dbo.Calendar.dt <= @ParamEndDate
WHERE
S.ProductId = @ParamProductID
)
,CTE_BookedDays
-- list all booked (unavailable) days for each stock ID
AS
(
SELECT
S.StockId
,CA.dt
,0 AS AvailableStockDay
-- 0 - not available
FROM
@RentStockOrders AS R
INNER JOIN @Stocks AS S ON S.StockId = R.StockId
CROSS APPLY
(
SELECT dbo.Calendar.dt
FROM dbo.Calendar
WHERE
dbo.Calendar.dt >= @ParamStartDate
AND dbo.Calendar.dt <= @ParamEndDate
AND dbo.Calendar.dt >= R.BeginRentDate
AND dbo.Calendar.dt <= R.EndRentDate
) AS CA
WHERE
S.ProductId = @ParamProductID
)
,CTE_Daily
-- combine individual availability flags
-- first: multiply to get the final availability for a stock ID and day
-- second: group further by day and SUM flags
AS
(
SELECT
CTE_AllDays.dt
,CASE WHEN
SUM(
CTE_AllDays.AvailableStockDay * ISNULL(CTE_BookedDays.AvailableStockDay, 1)
) = 0
THEN 0 ELSE 1 END AS AvailableDay
-- day is available, if any stock is available
-- SUM=0 - not available
-- SUM>0 - available
FROM
CTE_AllDays
LEFT JOIN CTE_BookedDays ON
CTE_BookedDays.StockId = CTE_AllDays.StockId AND
CTE_BookedDays.dt = CTE_AllDays.dt
GROUP BY
CTE_AllDays.dt
)
,CTE_Sum
-- rolling sum of flags with 14 days window
AS
(
SELECT
dt
,SUM(CTE_Daily.AvailableDay) OVER (ORDER BY CTE_Daily.dt
ROWS BETWEEN CURRENT ROW AND 13 FOLLOWING) AS AvailableConsecutive
-- we can't put @ParamDays here instead of constant
FROM CTE_Daily
)
-- If a rolling sum = 14,
-- it means that all 14 consecutive days are available
-- If a rolling sum <> 14,
-- it means that at least one of the 14 consecutive days is not available
SELECT dt
FROM CTE_Sum
WHERE AvailableConsecutive <> 14
-- we can put @ParamDays here instead of 14, but not above
ORDER BY dt;
Result
Note, I started the range of dates from 2015-12-01
, before the general availability.
+------------+
| dt |
+------------+
| 2015-12-01 |
| 2015-12-02 |
| 2015-12-03 |
| 2015-12-04 |
| 2015-12-05 |
| 2015-12-06 |
| 2015-12-07 |
| 2015-12-08 |
| 2015-12-09 |
| 2015-12-10 |
| 2015-12-11 |
| 2015-12-12 |
| 2015-12-13 |
| 2015-12-14 |
| 2015-12-15 |
| 2015-12-16 |
| 2015-12-17 |
| 2015-12-18 |
| 2015-12-19 |
| 2015-12-20 |
| 2015-12-21 |
| 2015-12-22 |
| 2015-12-23 |
| 2015-12-24 |
| 2015-12-25 |
| 2015-12-26 |
| 2015-12-27 |
| 2015-12-28 |
| 2015-12-29 |
| 2015-12-30 |
| 2015-12-31 |
| 2016-01-17 |
| 2016-01-18 |
| 2016-01-19 |
| 2016-01-20 |
| 2016-01-21 |
| 2016-01-22 |
| 2016-01-23 |
| 2016-01-24 |
| 2016-01-25 |
| 2016-01-26 |
| 2016-01-27 |
| 2016-01-28 |
| 2016-01-29 |
| 2016-01-30 |
| 2016-01-31 |
| 2016-02-01 |
| 2016-02-02 |
| 2016-02-03 |
| 2016-02-04 |
| 2016-02-05 |
| 2016-02-06 |
| 2016-02-07 |
| 2016-02-08 |
| 2016-02-09 |
| 2016-02-10 |
| 2016-02-11 |
| 2016-02-12 |
| 2016-02-13 |
| 2016-02-14 |
| 2016-02-16 |
| 2016-02-17 |
| 2016-02-18 |
| 2016-02-19 |
| 2016-02-20 |
| 2016-02-21 |
| 2016-02-22 |
| 2016-02-23 |
| 2016-02-24 |
| 2016-02-25 |
| 2016-02-26 |
| 2016-02-27 |
| 2016-02-28 |
| 2016-02-29 |
| 2016-03-01 |
| 2016-03-02 |
| 2016-03-03 |
| 2016-03-04 |
| 2016-03-05 |
| 2016-03-06 |
| 2016-03-07 |
| 2016-12-20 |
| 2016-12-21 |
| 2016-12-22 |
| 2016-12-23 |
| 2016-12-24 |
| 2016-12-25 |
| 2016-12-26 |
| 2016-12-27 |
| 2016-12-28 |
| 2016-12-29 |
| 2016-12-30 |
| 2016-12-31 |
| 2017-01-01 |
+------------+
Second variant
The first part of the query until CTE_Daily
is the same. Then, I'll use a gaps-and-islands approach to find the islands of available dates and calculate their sizes.
WITH
CTE_AllDays
-- list all days and stock IDs between @ParamStartDate and @ParamEndDate
-- for each day and stock indicate whether it is available based on AvailableFromDate
AS
(
SELECT
S.StockId
,dbo.Calendar.dt
,CASE WHEN dbo.Calendar.dt >= S.AvailableFromDate
THEN 1 ELSE 0 END AS AvailableStockDay
-- 1 - available
-- 0 - not available
FROM
@Stocks AS S
INNER JOIN dbo.Calendar ON
dbo.Calendar.dt >= @ParamStartDate
AND dbo.Calendar.dt <= @ParamEndDate
WHERE
S.ProductId = @ParamProductID
)
,CTE_BookedDays
-- list all booked (unavailable) days for each stock ID
AS
(
SELECT
S.StockId
,CA.dt
,0 AS AvailableStockDay
-- 0 - not available
FROM
@RentStockOrders AS R
INNER JOIN @Stocks AS S ON S.StockId = R.StockId
CROSS APPLY
(
SELECT dbo.Calendar.dt
FROM dbo.Calendar
WHERE
dbo.Calendar.dt >= @ParamStartDate
AND dbo.Calendar.dt <= @ParamEndDate
AND dbo.Calendar.dt >= R.BeginRentDate
AND dbo.Calendar.dt <= R.EndRentDate
) AS CA
WHERE
S.ProductId = @ParamProductID
)
,CTE_Daily
-- combine individual availability flags
-- first: multiply to get the final availability for a stock ID and day
-- second: group further by day and SUM flags
AS
(
SELECT
CTE_AllDays.dt
,CASE WHEN
SUM(
CTE_AllDays.AvailableStockDay * ISNULL(CTE_BookedDays.AvailableStockDay, 1)
) = 0
THEN 0 ELSE 1 END AS AvailableDay
-- day is available, if any stock is available
-- SUM=0 - not available
-- SUM>0 - available
FROM
CTE_AllDays
LEFT JOIN CTE_BookedDays ON
CTE_BookedDays.StockId = CTE_AllDays.StockId AND
CTE_BookedDays.dt = CTE_AllDays.dt
GROUP BY
CTE_AllDays.dt
)
,CTE_RowNumbers
-- calculate two sets of row numbers to isolate consecutive rows with 0s and 1s
-- (gaps and islands)
AS
(
SELECT
dt
,AvailableDay
,ROW_NUMBER() OVER (ORDER BY dt) AS rn1
,ROW_NUMBER() OVER (PARTITION BY AvailableDay ORDER BY dt) AS rn2
FROM CTE_Daily
)
,CTE_Groups
-- each gaps and island will have the same GroupNumber
-- count the size of the Group
-- number the rows within each Group to find if @ParamDays rows fit into the Group
AS
(
SELECT
dt
,AvailableDay
,GroupNumber
,COUNT(*) OVER (PARTITION BY GroupNumber) AS GroupSize
,ROW_NUMBER() OVER (PARTITION BY GroupNumber ORDER BY dt) AS GroupRN
FROM
CTE_RowNumbers
CROSS APPLY (SELECT rn1 - rn2 AS GroupNumber) AS CA
)
SELECT
dt
--,CASE WHEN AvailableDay = 1 AND GroupSize - GroupRN + 1 >= @ParamDays
--THEN 1 ELSE 0 END AS AvailableConsecutive
FROM CTE_Groups
WHERE
CASE WHEN AvailableDay = 1 AND GroupSize - GroupRN + 1 >= @ParamDays
THEN 1 ELSE 0 END = 0
-- AvailableConsecutive = 0 to list all unavailable days
ORDER BY dt;
The result is the same as in the first variant, but this variant uses parameter @ParamDays
.
To understand how it works, run the query starting with the first CTE, examine the results, then add the next CTE, examine the results and so on.
Any idea why the IF EXISTS
would make it run so much longer and do so many more reads? I also changed the select statement to do SELECT TOP 1 [dlc].[id]
and I killed it after 2 minutes.
As I explained in my answer to this related question:
How (and why) does TOP impact an execution plan?
Using EXISTS
introduces a row goal, where the optimizer produces an execution plan aimed at locating the first row quickly. In doing this, it assumes that the data is uniformly distributed. For example, if statistics show there are 100 expected matches in 100,000 rows, it will assume it will have to read only 1,000 rows to find the first match.
This will result in longer than expected execution times if this assumption turns out to be faulty. For example, if SQL Server chooses an access method (e.g. unordered scan) that happens to locate the first matching value very late on in the search, it could result in an almost complete scan. On the other hand, if a matching row happens to be found amongst the first few rows, performance will be very good. This is the fundamental risk with row goals - inconsistent performance.
As a temporary fix I have changed it to do a count( * ) and assign that value to a variable
It is usually possible to reformulate the query such that a row goal is not assigned. Without the row goal, the query can still terminate when the first matching row is encountered (if written correctly), but the execution plan strategy is likely to be different (and hopefully, more effective). Obviously, count(*) will require reading all rows, so it is not a perfect alternative.
If you are running SQL Server 2008 R2 or later, you can also generally use documented and supported trace flag 4138 to get an execution plan without a row goal. This flag can also be specified using the supported hint OPTION (QUERYTRACEON 4138)
, though be aware it requires runtime sysadmin permission, unless used with a plan guide.
Unfortunately
None of the above is functional with an IF EXISTS
conditional statement. It only applies to regular DML. It will work with the alternate SELECT TOP (1)
formulation you tried. That may well be better than using COUNT(*)
, which has to count all qualified rows, as previously mentioned.
That said, there are any number of ways to express this requirement that will allow you to avoid or control the row goal, while terminating the search early. One last example:
DECLARE @Exists bit;
SELECT @Exists =
CASE
WHEN EXISTS
(
SELECT [dlc].[ID]
FROM TableDLC [dlc]
JOIN TableD [d]
ON [d].[ID] = [dlc].[ID]
JOIN TableC [c]
ON [c].[ID] = [d].[ID2]
WHERE [c].[Name] <> [dlc].[Name]
)
THEN CONVERT(bit, 1)
ELSE CONVERT(bit, 0)
END
OPTION (QUERYTRACEON 4138);
IF @Exists = 1
BEGIN
...
END;
Best Answer
Unfortunately I'm afraid by trying to prevent repeating code you might end up in writing more code and making the query less readable. But I believe this can be advantageous for you - take into account situations where in the test you would only have to search in a one or two tables, where in order to retrieve all the details you will have to join to several more. In those situations you can simplify the test query thus also making them quicker to execute.
And while we are at it, you might also consider changing your WHERE statement. Using functions in the predicate makes them not SARGable and prevent using indexes in those cases. A better approach in this example would be to check
as the optimizer will perform conversion to something along the lines of
which supports the use of indexes. So this would be Index Scan (using left()) versus Index Seek (using LIKE).