SQL Server – Using ‘If Exists’ to Prevent Repeating Code

performanceperformance-tuningsql servert-sql

We receive an order list, and validate it starts with certain prefixes, example if OrderNumber does not start with ABC, than display order list to user as error. We use exists, to search for first existence, don't want to consume much performance time. If none exist, then we can start conducting other task in code.

Is there anyway in sql to get rid of the repetitive code? We have Many checks like this on multiple tables, trying to make code more efficient.

if exists
(
    select *
    from dbo.OrdersImport
    where left(OrderNumber,3) <> 'ABC'
)
begin
    select OrderNumber as OrderErrorList
    from dbo.OrdersImport
    where left(OrderNumber,3) <> 'ABC'
end
else

Best Answer

Unfortunately I'm afraid by trying to prevent repeating code you might end up in writing more code and making the query less readable. But I believe this can be advantageous for you - take into account situations where in the test you would only have to search in a one or two tables, where in order to retrieve all the details you will have to join to several more. In those situations you can simplify the test query thus also making them quicker to execute.

And while we are at it, you might also consider changing your WHERE statement. Using functions in the predicate makes them not SARGable and prevent using indexes in those cases. A better approach in this example would be to check

WHERE OrderNumber LIKE 'ABC%'

as the optimizer will perform conversion to something along the lines of

OrderNumber >= 'ABC' AND OrderNumber < 'ABD'

which supports the use of indexes. So this would be Index Scan (using left()) versus Index Seek (using LIKE).

Related Solutions

SQL Server – Getting Unavailable Dates for Renting a Product with Stocks

I would use a Calendar table. This table simply has a list of dates for several decades.

CREATE TABLE [dbo].[Calendar](
    [dt] [date] NOT NULL,
 CONSTRAINT [PK_Calendar] PRIMARY KEY CLUSTERED 
(
    [dt] ASC
))

In my system it has few extra columns, such as [IsLastDayOfMonth], [IsLastDayOfQuarter], which are useful in some reports, but in your case you need just the date column. There are many ways to populate such table.

For example, 100K rows (~270 years) from 1900-01-01:

INSERT INTO dbo.Calendar (dt)
SELECT TOP (100000) 
    DATEADD(day, ROW_NUMBER() OVER (ORDER BY s1.[object_id])-1, '19000101') AS dt
FROM sys.all_objects AS s1 CROSS JOIN sys.all_objects AS s2
OPTION (MAXDOP 1);

Sample data

DECLARE @Stocks TABLE (
    StockId int
    , ProductId int
    , AvailableFromDate date);

INSERT INTO @Stocks(StockId, ProductId, AvailableFromDate) VALUES
(1, 1, '2016-01-01'),
(2, 1, '2016-01-01');

DECLARE @RentStockOrders TABLE (
    RentStockOrderId int
    , StockId int
    , BeginRentDate date
    , EndRentDate date);

INSERT INTO @RentStockOrders (RentStockOrderId, StockId, BeginRentDate, EndRentDate) VALUES
(1, 1, '2016-01-15', '2016-02-14'),
(2, 2, '2016-01-30', '2016-02-20'),
(3, 2, '2016-02-26', '2016-03-07'),
(4, 1, '2016-02-29', '2016-03-14');

Parameters

DECLARE @ParamProductID int = 1;
DECLARE @ParamDays int = 14;
DECLARE @ParamStartDate date = '2015-12-01';
DECLARE @ParamEndDate date = '2017-01-01';
-- these dates define some reasonable limit

First variant

It turns out that window functions accept only literal constant as a size of a window, not a variable. Alas. Still, I'll show this query, because it illustrates the approach and shows how simple it would be if SQL Server supported variables as the size of the window. It also gives the correct answer that we can use to verify the second variant.

WITH
CTE_AllDays
-- list all days and stock IDs between @ParamStartDate and @ParamEndDate
-- for each day and stock indicate whether it is available based on AvailableFromDate
AS
(
    SELECT
        S.StockId
        ,dbo.Calendar.dt
        ,CASE WHEN dbo.Calendar.dt >= S.AvailableFromDate 
            THEN 1 ELSE 0 END AS AvailableStockDay
        -- 1 - available
        -- 0 - not available
    FROM
        @Stocks AS S
        INNER JOIN dbo.Calendar ON
            dbo.Calendar.dt >= @ParamStartDate
            AND dbo.Calendar.dt <= @ParamEndDate
    WHERE
        S.ProductId = @ParamProductID
)
,CTE_BookedDays
-- list all booked (unavailable) days for each stock ID
AS
(
    SELECT
        S.StockId
        ,CA.dt
        ,0 AS AvailableStockDay
        -- 0 - not available
    FROM
        @RentStockOrders AS R
        INNER JOIN @Stocks AS S ON S.StockId = R.StockId
        CROSS APPLY
        (
            SELECT dbo.Calendar.dt
            FROM dbo.Calendar
            WHERE
                dbo.Calendar.dt >= @ParamStartDate
                AND dbo.Calendar.dt <= @ParamEndDate
                AND dbo.Calendar.dt >= R.BeginRentDate
                AND dbo.Calendar.dt <= R.EndRentDate
        ) AS CA
    WHERE
        S.ProductId = @ParamProductID
)
,CTE_Daily
-- combine individual availability flags
-- first: multiply to get the final availability for a stock ID and day
-- second: group further by day and SUM flags
AS
(
    SELECT
        CTE_AllDays.dt
        ,CASE WHEN 
            SUM(
            CTE_AllDays.AvailableStockDay * ISNULL(CTE_BookedDays.AvailableStockDay, 1)
            ) = 0 
        THEN 0 ELSE 1 END AS AvailableDay
        -- day is available, if any stock is available
        -- SUM=0 - not available
        -- SUM>0 - available
    FROM
        CTE_AllDays
        LEFT JOIN CTE_BookedDays ON
            CTE_BookedDays.StockId = CTE_AllDays.StockId AND
            CTE_BookedDays.dt = CTE_AllDays.dt
    GROUP BY
        CTE_AllDays.dt
)
,CTE_Sum
-- rolling sum of flags with 14 days window
AS
(
    SELECT
        dt
        ,SUM(CTE_Daily.AvailableDay) OVER (ORDER BY CTE_Daily.dt
            ROWS BETWEEN CURRENT ROW AND 13 FOLLOWING) AS AvailableConsecutive
    -- we can't put @ParamDays here instead of constant
    FROM CTE_Daily
)
-- If a rolling sum =  14, 
-- it means that all 14 consecutive days are available

-- If a rolling sum <> 14, 
-- it means that at least one of the 14 consecutive days is not available
SELECT dt
FROM CTE_Sum
WHERE AvailableConsecutive <> 14
-- we can put @ParamDays here instead of 14, but not above
ORDER BY dt;

Result

Note, I started the range of dates from 2015-12-01, before the general availability.

+------------+
|     dt     |
+------------+
| 2015-12-01 |
| 2015-12-02 |
| 2015-12-03 |
| 2015-12-04 |
| 2015-12-05 |
| 2015-12-06 |
| 2015-12-07 |
| 2015-12-08 |
| 2015-12-09 |
| 2015-12-10 |
| 2015-12-11 |
| 2015-12-12 |
| 2015-12-13 |
| 2015-12-14 |
| 2015-12-15 |
| 2015-12-16 |
| 2015-12-17 |
| 2015-12-18 |
| 2015-12-19 |
| 2015-12-20 |
| 2015-12-21 |
| 2015-12-22 |
| 2015-12-23 |
| 2015-12-24 |
| 2015-12-25 |
| 2015-12-26 |
| 2015-12-27 |
| 2015-12-28 |
| 2015-12-29 |
| 2015-12-30 |
| 2015-12-31 |
| 2016-01-17 |
| 2016-01-18 |
| 2016-01-19 |
| 2016-01-20 |
| 2016-01-21 |
| 2016-01-22 |
| 2016-01-23 |
| 2016-01-24 |
| 2016-01-25 |
| 2016-01-26 |
| 2016-01-27 |
| 2016-01-28 |
| 2016-01-29 |
| 2016-01-30 |
| 2016-01-31 |
| 2016-02-01 |
| 2016-02-02 |
| 2016-02-03 |
| 2016-02-04 |
| 2016-02-05 |
| 2016-02-06 |
| 2016-02-07 |
| 2016-02-08 |
| 2016-02-09 |
| 2016-02-10 |
| 2016-02-11 |
| 2016-02-12 |
| 2016-02-13 |
| 2016-02-14 |
| 2016-02-16 |
| 2016-02-17 |
| 2016-02-18 |
| 2016-02-19 |
| 2016-02-20 |
| 2016-02-21 |
| 2016-02-22 |
| 2016-02-23 |
| 2016-02-24 |
| 2016-02-25 |
| 2016-02-26 |
| 2016-02-27 |
| 2016-02-28 |
| 2016-02-29 |
| 2016-03-01 |
| 2016-03-02 |
| 2016-03-03 |
| 2016-03-04 |
| 2016-03-05 |
| 2016-03-06 |
| 2016-03-07 |
| 2016-12-20 |
| 2016-12-21 |
| 2016-12-22 |
| 2016-12-23 |
| 2016-12-24 |
| 2016-12-25 |
| 2016-12-26 |
| 2016-12-27 |
| 2016-12-28 |
| 2016-12-29 |
| 2016-12-30 |
| 2016-12-31 |
| 2017-01-01 |
+------------+

Second variant

The first part of the query until CTE_Daily is the same. Then, I'll use a gaps-and-islands approach to find the islands of available dates and calculate their sizes.

WITH
CTE_AllDays
-- list all days and stock IDs between @ParamStartDate and @ParamEndDate
-- for each day and stock indicate whether it is available based on AvailableFromDate
AS
(
    SELECT
        S.StockId
        ,dbo.Calendar.dt
        ,CASE WHEN dbo.Calendar.dt >= S.AvailableFromDate 
            THEN 1 ELSE 0 END AS AvailableStockDay
        -- 1 - available
        -- 0 - not available
    FROM
        @Stocks AS S
        INNER JOIN dbo.Calendar ON
            dbo.Calendar.dt >= @ParamStartDate
            AND dbo.Calendar.dt <= @ParamEndDate
    WHERE
        S.ProductId = @ParamProductID
)
,CTE_BookedDays
-- list all booked (unavailable) days for each stock ID
AS
(
    SELECT
        S.StockId
        ,CA.dt
        ,0 AS AvailableStockDay
        -- 0 - not available
    FROM
        @RentStockOrders AS R
        INNER JOIN @Stocks AS S ON S.StockId = R.StockId
        CROSS APPLY
        (
            SELECT dbo.Calendar.dt
            FROM dbo.Calendar
            WHERE
                dbo.Calendar.dt >= @ParamStartDate
                AND dbo.Calendar.dt <= @ParamEndDate
                AND dbo.Calendar.dt >= R.BeginRentDate
                AND dbo.Calendar.dt <= R.EndRentDate
        ) AS CA
    WHERE
        S.ProductId = @ParamProductID
)
,CTE_Daily
-- combine individual availability flags
-- first: multiply to get the final availability for a stock ID and day
-- second: group further by day and SUM flags
AS
(
    SELECT
        CTE_AllDays.dt
        ,CASE WHEN 
            SUM(
            CTE_AllDays.AvailableStockDay * ISNULL(CTE_BookedDays.AvailableStockDay, 1)
            ) = 0 
        THEN 0 ELSE 1 END AS AvailableDay
        -- day is available, if any stock is available
        -- SUM=0 - not available
        -- SUM>0 - available
    FROM
        CTE_AllDays
        LEFT JOIN CTE_BookedDays ON
            CTE_BookedDays.StockId = CTE_AllDays.StockId AND
            CTE_BookedDays.dt = CTE_AllDays.dt
    GROUP BY
        CTE_AllDays.dt
)
,CTE_RowNumbers
-- calculate two sets of row numbers to isolate consecutive rows with 0s and 1s 
-- (gaps and islands)
AS
(
    SELECT
        dt
        ,AvailableDay
        ,ROW_NUMBER() OVER (ORDER BY dt) AS rn1
        ,ROW_NUMBER() OVER (PARTITION BY AvailableDay ORDER BY dt) AS rn2
    FROM CTE_Daily
)
,CTE_Groups
-- each gaps and island will have the same GroupNumber
-- count the size of the Group
-- number the rows within each Group to find if @ParamDays rows fit into the Group
AS
(
    SELECT
        dt
        ,AvailableDay
        ,GroupNumber
        ,COUNT(*) OVER (PARTITION BY GroupNumber) AS GroupSize
        ,ROW_NUMBER() OVER (PARTITION BY GroupNumber ORDER BY dt) AS GroupRN
    FROM
        CTE_RowNumbers
        CROSS APPLY (SELECT rn1 - rn2 AS GroupNumber) AS CA
)
SELECT
    dt
    --,CASE WHEN AvailableDay = 1 AND GroupSize - GroupRN + 1 >= @ParamDays
    --THEN 1 ELSE 0 END AS AvailableConsecutive
FROM CTE_Groups
WHERE
    CASE WHEN AvailableDay = 1 AND GroupSize - GroupRN + 1 >= @ParamDays
    THEN 1 ELSE 0 END = 0
    -- AvailableConsecutive = 0 to list all unavailable days
ORDER BY dt;

The result is the same as in the first variant, but this variant uses parameter @ParamDays.

To understand how it works, run the query starting with the first CTE, examine the results, then add the next CTE, examine the results and so on.

SQL Server – IF EXISTS Taking Longer Than Embedded Select Statement

Any idea why the IF EXISTS would make it run so much longer and do so many more reads? I also changed the select statement to do SELECT TOP 1 [dlc].[id] and I killed it after 2 minutes.

As I explained in my answer to this related question:

How (and why) does TOP impact an execution plan?

Using EXISTS introduces a row goal, where the optimizer produces an execution plan aimed at locating the first row quickly. In doing this, it assumes that the data is uniformly distributed. For example, if statistics show there are 100 expected matches in 100,000 rows, it will assume it will have to read only 1,000 rows to find the first match.

This will result in longer than expected execution times if this assumption turns out to be faulty. For example, if SQL Server chooses an access method (e.g. unordered scan) that happens to locate the first matching value very late on in the search, it could result in an almost complete scan. On the other hand, if a matching row happens to be found amongst the first few rows, performance will be very good. This is the fundamental risk with row goals - inconsistent performance.

As a temporary fix I have changed it to do a count( * ) and assign that value to a variable

It is usually possible to reformulate the query such that a row goal is not assigned. Without the row goal, the query can still terminate when the first matching row is encountered (if written correctly), but the execution plan strategy is likely to be different (and hopefully, more effective). Obviously, count(*) will require reading all rows, so it is not a perfect alternative.

If you are running SQL Server 2008 R2 or later, you can also generally use documented and supported trace flag 4138 to get an execution plan without a row goal. This flag can also be specified using the supported hint OPTION (QUERYTRACEON 4138), though be aware it requires runtime sysadmin permission, unless used with a plan guide.

Unfortunately

None of the above is functional with an IF EXISTS conditional statement. It only applies to regular DML. It will work with the alternate SELECT TOP (1) formulation you tried. That may well be better than using COUNT(*), which has to count all qualified rows, as previously mentioned.

That said, there are any number of ways to express this requirement that will allow you to avoid or control the row goal, while terminating the search early. One last example:

DECLARE @Exists bit;

SELECT @Exists =
    CASE
        WHEN EXISTS
        (
            SELECT [dlc].[ID]
            FROM TableDLC [dlc]
            JOIN TableD [d]
            ON [d].[ID] = [dlc].[ID]
            JOIN TableC [c]
            ON [c].[ID] = [d].[ID2]
            WHERE [c].[Name] <> [dlc].[Name]
        )
        THEN CONVERT(bit, 1)
        ELSE CONVERT(bit, 0)
    END
OPTION (QUERYTRACEON 4138);

IF @Exists = 1
BEGIN
    ...
END;

Best Answer

Related Solutions

SQL Server – Getting Unavailable Dates for Renting a Product with Stocks

SQL Server – IF EXISTS Taking Longer Than Embedded Select Statement

Unfortunately

Related Question