SQL Server 2008 R2 – How to Optimize T-SQL Stored Procedure with CTE and CASE Statement

sql-server-2008-r2 stored-procedures t-sql

I have a user-defined stored procedure. When it is called without explicit values, I want it to operate on all Locations (nvarchar(50)), the primary-key column of the Monitor_Locations table (~850 rows).

One part of the SP is defined as follows (clipped).

ALTER PROCEDURE [dbo].[dev_Tech@Locs2b] ( --CREATE or ALTER
     @Locations as nvarchar(MAX) = NULL -- = 'GG1,BenBr14,BenBr00,YB_ToeDrain_Base'
    ,@rangeStart as DateTime = '1970-01-01'
    ,@rangeEnd as DateTime = '2099-12-31'
) AS BEGIN
SET NOCOUNT ON; --otherwise concrete5 chokes for multi-table returns.
DECLARE @loclist as TABLE (
    Location nvarchar(50) PRIMARY KEY
)
IF @Locations is NULL
    INSERT INTO @loclist(Location)
        SELECT Location from Monitor_Locations order by Location
ELSE --irrelevant for this question
    INSERT INTO @loclist(Location)
        SELECT
            ML.Location
        FROM Monitor_Locations as ML join
            tvf_splitstring(@Locations) as ss ON 
                ML.Location=ss.Item OR 
                ML.Location like ss.Item+'[_]%'
        ORDER BY ML.Location;
With Deploys as (
    SELECT
        D.Location,
        MIN(D.Start) as Start,
        MAX(D.[Stop]) as Stop
    FROM
        Deployments as D 
    WHERE 
        D.Stop is not NULL
    GROUP BY
        D.Location
)

…do a bunch of other stuff…
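For reference, the two call patterns at issue look like this (the CSV below is the commented-out example from the parameter list; the dates are arbitrary illustrations):

-- Full list: rely on the parameter defaults.
EXEC dbo.[dev_Tech@Locs2b];

-- Restricted list: pass a CSV of locations and, optionally, a date range.
EXEC dbo.[dev_Tech@Locs2b]
     @Locations = N'GG1,BenBr14,BenBr00,YB_ToeDrain_Base'
    ,@rangeStart = '2016-01-01'
    ,@rangeEnd = '2016-08-29';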

In order to improve the speed of the stored procedure when a restricted list of sites is passed in, I wanted to replace the WHERE clause with

WHERE 
    CASE
        WHEN D.Stop IS NULL THEN 0
        WHEN @Locations IS NULL THEN 1 -- full list, so binding to another list doesn't do us any good.
        WHEN EXISTS (SELECT 1 from (SELECT Location from @loclist as l where l.Location=D.Location) as ll) THEN 1 --ELSE NULL which is not 1
    END=1

but where the SP once took 6-8 seconds to execute, it now takes 2.5 minutes (when called without a restrictive list). I thought it would take roughly the same time either way for the full list, since the second branch of the CASE should fire quickly and the third branch should never be evaluated.
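As an aside, one way to make that short-circuit explicit, rather than hoping the CASE is evaluated lazily per row, is to branch in control flow. This is a sketch of the idea applied to the CTE's query, not code from the original procedure; in the real SP each branch would have to carry the full statement the CTE feeds:

IF @Locations IS NULL
    -- Full list: the plan never contains the @loclist probe at all.
    SELECT D.Location, MIN(D.Start) as Start, MAX(D.[Stop]) as Stop
    FROM Deployments as D
    WHERE D.Stop is not NULL
    GROUP BY D.Location;
ELSE
    -- Restricted list: probe the location list per row.
    SELECT D.Location, MIN(D.Start) as Start, MAX(D.[Stop]) as Stop
    FROM Deployments as D
    WHERE D.Stop is not NULL
      AND EXISTS (SELECT 1 FROM @loclist as l WHERE l.Location = D.Location)
    GROUP BY D.Location;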

So what's going on? This code:

WHERE 
    CASE
        WHEN D.Stop IS NULL THEN NULL
        WHEN @Locations IS NULL THEN 1 -- full list, so binding to another list doesn't do us any good.
        WHEN EXISTS (SELECT 1 from (SELECT Location from @loclist as l where l.Location=D.Location) as ll) THEN 1 --else null
    END is not null

Takes a ~10 minute run time with this plan:

(execution plan screenshot: CaseWhere)

To contrast, here's the plan for the plain WHERE D.Stop is not NULL (6s):

(execution plan screenshot: StandardWhere)

At one point this SP ran in 1 second with this version, but after changing the SP and then changing it back, it took 6 seconds again. As mentioned in the answers, this was likely due to parameter sniffing.

Run times

My goal time for execution is less than 2 seconds, as this will be a frequently executed SP on a web application that uses this to populate and restrict other user selections. Basically, I don't want this to be a noticeable bottleneck. Initial run times were on the order of 3 minutes, but after adding or altering some indexes, this dropped to the 6-8 second range.

On Monday (2016-08-29), prior to major alterations:

- Simple WHERE without input parameters: 5s
- Simple WHERE with rangeStart and rangeEnd: 4s
- Simple WHERE with @Locations set to a 7-element CSV variable
- CASEd WHERE: up to 10 minutes

After re-working the CLR function (see my answer below), Tuesday (2016-08-30):

- Simple or CASEd WHERE without input parameters, or with rangeStart and rangeEnd: 3s
- Simple or CASEd WHERE with 7-element @Locations: 0-1s

After migrating table variable @loclist to temp table #loclist:

- All tested WHEREs/parameters: 0-1s

Best Answer

Two big performance problems:

  1. Your CSV splitter function is a major performance killer. Swap it out for Jeff Moden's DelimitedSplit8k function (you can read all about it here), or better yet for a CLR function or a table-valued parameter if you are on 2008+ (see the TVP sketch after this list). Check out Aaron Bertrand's performance tests of the various CSV splitter functions; CLR is the overall winner.
  2. The table variable can be a big performance killer even if it only has one row. Switch it to a temporary table and add a clustered index to it (also sketched below).
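As a sketch of the table-valued-parameter route from point 1 (the type name, procedure name, and body here are hypothetical, not from the original post):

-- One-time setup: a table type matching the key you probe with.
CREATE TYPE dbo.LocationList AS TABLE (
    Location nvarchar(50) PRIMARY KEY
);
GO

-- The procedure takes the list directly; no string splitting at all.
CREATE PROCEDURE dbo.usp_TechLocs
    @Locations dbo.LocationList READONLY
AS
BEGIN
    SET NOCOUNT ON;
    SELECT ML.Location
    FROM Monitor_Locations AS ML
    JOIN @Locations AS ss
        ON ML.Location = ss.Location
        OR ML.Location LIKE ss.Location + '[_]%';
END

The caller then passes a structured parameter (e.g., a DataTable from ADO.NET) instead of a CSV string.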
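And the temp-table swap from point 2, which is exactly what the question's timing section ended up doing (@loclist to #loclist), is minimal:

-- Instead of:
-- DECLARE @loclist as TABLE (Location nvarchar(50) PRIMARY KEY)
-- use a temp table; the PRIMARY KEY gives it a clustered index, and
-- unlike a table variable it gets real statistics, so the optimizer
-- stops assuming it holds a single row.
CREATE TABLE #loclist (
    Location nvarchar(50) NOT NULL PRIMARY KEY CLUSTERED
);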

In the execution plans you've now provided, the function shows a 0% cost, but that is misleading: its real cost is higher. You won't see a function's actual cost in an execution plan unless it is an inline table-valued function.
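To illustrate that point, an inline splitter along these lines (a simplified sketch in the spirit of DelimitedSplit8k, using the undocumented master.dbo.spt_values table as a numbers source, which caps the input at roughly 2,000 characters) is expanded into the calling query, so its real cost is visible in the plan:

CREATE FUNCTION dbo.itvf_SplitString (@List nvarchar(2000))
RETURNS TABLE
AS
RETURN
    -- Each qualifying number marks a position just after a delimiter.
    SELECT Item = SUBSTRING(@List, v.number,
               CHARINDEX(',', @List + ',', v.number) - v.number)
    FROM master.dbo.spt_values AS v
    WHERE v.type = 'P'
      AND v.number BETWEEN 1 AND LEN(@List)
      AND SUBSTRING(',' + @List, v.number, 1) = ',';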

Unfortunately, although I once had a runtime of 1 sec, by changing the SP and then back again, it takes 6 seconds again.

That smells of parameter sniffing.
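A quick way to test that theory (an illustration, not something prescribed above) is to force a fresh plan for the expensive statement, trading a compile on every execution for a plan that fits the actual parameter values:

-- Inside the procedure, on the statement that goes slow:
SELECT D.Location, MIN(D.Start) as Start, MAX(D.[Stop]) as Stop
FROM Deployments as D
WHERE D.Stop is not NULL
GROUP BY D.Location
OPTION (RECOMPILE);

If the timing stabilizes with RECOMPILE in place, sniffing was the culprit.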