SQL Server Index Optimization – Getting a Scan Instead of a Seek

indexoptimizationsql serversql-server-2012

I need to optimize a SELECT statement but SQL Server always does an index scan instead of a seek. This is the query which, of course, is in a stored procedure:

CREATE PROCEDURE dbo.something
  @Status INT = NULL,
  @IsUserGotAnActiveDirectoryUser BIT = NULL    
AS

    SELECT [IdNumber], [Code], [Status], [Sex], 
           [FirstName], [LastName], [Profession], 
           [BirthDate], [HireDate], [ActiveDirectoryUser]
    FROM Employee
    WHERE (@Status IS NULL OR [Status] = @Status)
    AND 
    (
      @IsUserGotAnActiveDirectoryUser IS NULL 
      OR 
      (
        @IsUserGotAnActiveDirectoryUser IS NOT NULL AND       
        (
          @IsUserGotAnActiveDirectoryUser = 1 AND ActiveDirectoryUser <> ''
        )
        OR
        (
          @IsUserGotAnActiveDirectoryUser = 0 AND ActiveDirectoryUser = ''
        )
      )
    )

And this is the index:

CREATE INDEX not_relevent ON dbo.Employee
(
    [Status] DESC,
    [ActiveDirectoryUser] ASC
)
INCLUDE (...all the other columns in the table...); 

The plan:

Plan picture

Why did SQL Server choose a scan? How can I fix it?

Column definitions:

[Status] int NOT NULL
[ActiveDirectoryUser] VARCHAR(50) NOT NULL

Status parameters can be:

NULL: all status,
1: Status= 1 (Active employees)
2: Status = 2 (Inactive employees)

IsUserGotAnActiveDirectoryUser can be:

NULL: All employees
0: ActiveDirectoryUser is empty for that employee
1: ActiveDirectoryUser  got a valid value (not null and not empty)

Best Answer

I don't think the scan is caused by a search for an empty string (and while you could add a filtered index for that case, it will only help very specific variations of the query). You are more likely a victim of parameter sniffing and a single plan not optimized for all of the various combinations of parameters (and parameter values) that you will be providing to this query.

I call this the "kitchen sink" procedure, because you are expecting one query to provide all the things, including the kitchen sink.

I have videos about my solution to this here and here as well as a blog post about it, but essentially, the best experience I have for such queries is to:

  • Build the statement dynamically - this will allow you to leave out clauses mentioning columns for which no parameters were supplied, and ensures that you will have a plan that is optimized precisely for the actual parameters that were passed with values.
  • Use OPTION (RECOMPILE) - this prevents specific parameter values from forcing the wrong type of plan, especially helpful when you have data skew, bad statistics, or when the first execution of a statement uses an atypical value that will lead to a different plan than later and more frequent executions.
  • Use the server option optimize for ad hoc workloads - this prevents query variations that are only used once from polluting your plan cache.

Enable optimize for ad hoc workloads:

EXEC sys.sp_configure 'show advanced options', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO
EXEC sys.sp_configure 'optimize for ad hoc workloads', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO
EXEC sys.sp_configure 'show advanced options', 0;
GO
RECONFIGURE WITH OVERRIDE;

Change your procedure:

ALTER PROCEDURE dbo.Whatever
  @Status INT = NULL,
  @IsUserGotAnActiveDirectoryUser BIT = NULL
AS
BEGIN 
  SET NOCOUNT ON;
  DECLARE @sql NVARCHAR(MAX) = N'SELECT [IdNumber], [Code], [Status], 
     [Sex], [FirstName], [LastName], [Profession],
     [BirthDate], [HireDate], [ActiveDirectoryUser]
   FROM dbo.Employee -- please, ALWAYS schema prefix
   WHERE 1 = 1';

   IF @Status IS NOT NULL
     SET @sql += N' AND ([Status]=@Status)'

   IF @IsUserGotAnActiveDirectoryUser = 1
     SET @sql += N' AND ActiveDirectoryUser <> ''''';
   IF @IsUserGotAnActiveDirectoryUser = 0
     SET @sql += N' AND ActiveDirectoryUser = ''''';

   SET @sql += N' OPTION (RECOMPILE);';

   EXEC sys.sp_executesql @sql, N'@Status INT, @Status;
END
GO

Once you have a workload based on that set of queries that you can monitor, you can analyze the executions and see which ones would most benefit from additional or different indexes - you can do this from a variety of angles, from simple "which combination of parameters are provided most often?" to "which individual queries have the longest runtimes?" We can't answer those questions based just on your code, we can only suggest that any index will only be helpful for a subset of all of the possible parameter combinations you're attempting to support. For example, if @Status is NULL, then no seek against that non-clustered index is possible. So for those cases where users don't care about status, you're going to get a scan, unless you have an index that caters to the other clauses (but such an index won't be useful either, given your current query logic - either empty string or not empty string is not exactly selective).

In this case, depending on the set of possible Status values and how distributed those values are, the OPTION (RECOMPILE) might not be necessary. But if you do have some values that will yield 100 rows and some values that will yield hundreds of thousands, you might want it there (even at the CPU cost, which should be marginal given the complexity of this query), so that you can get seeks in as many cases as possible. If the range of values is finite enough, you could even do something tricky with the dynamic SQL, where you say "I have this very selective value for @Status, so when that specific value is passed, make this slight alteration to the query text so that this is considered a different query and optimized for that param value."