Sql-server – Indexing to support advanced search queries

index-tuningsql server

I have a Person table 2.5 million rows, typical demographic fields. Need to support an advanced search screen that allows up to 10 search params (f/l name, phone 1 through phone 6, email, nickname, etc.) all varchar fields. The 3 name fields are a real puzzle – all need to support partial string matches, front and back LIKE (% @LastName %) or something like that. They can supply just one param, or all ten, or any combination.

So, forgetting about sniffing for a minute, I just want to design the best possible index(es) to support these searched. Tried a number of things, what is working best is an index with all 10 fields (see below), leading with the one most likely to be filled in. It's "just ok" performance wise, and I'm wondering if I'm missing something.

CREATE NONCLUSTERED INDEX [IN_Person_Search] ON [dbo].[Person] (FnameLname, LnameFname, NickName, Phone1, Phone2, Phone3, Phone4, Phone5, Phone6, Email)

I realize the answer might be wrapped up in the SELECT statement, so a quick word on that – I'm using the "Email = @Email OR @Email IS NULL" method to deal with empty params.

Best Answer

You're trying to write one magical query that will handle all possible combinations of search criteria efficiently. I call this the kitchen sink but it's simply not achievable to do this for 10 parameters with one query and one index. It's just not possible.

The bad news: You're not going to be able to create an index (or even 100 indexes - see here to get an idea of how many combinations, not permutations, you would have to cover) that can satisfy all possible combinations of parameters. And if you write a query with a bunch of OR conditionals, the plan that gets compiled will be based on the first combination of parameters that are passed in, which may not be good for any subsequent query, never mind all subsequent queries.

The good news: You can handle this scenario in a way that lets you prioritize.

Here's how I do it - I create a dynamic SQL version of the query for each possible combination as I see them. I append only the clauses necessary to search based on the set of parameters provided. This way I don't use queries optimized for @p1 when @p1 isn't even passed in. I also set up some qualitative criteria to determine whether I need to recompile for parameter skew. When the parameter represents a unique index key, there is no reason to recompile, but if different values could produce vastly different plans (based on my knowledge of the data, selectivity, usage patterns, etc.), then it might be useful to do so.

CREATE PROCEDURE dbo.SinkMe
  @p1 int,
  @p2 varchar(32),
  @p3 datetime
AS
BEGIN
  SET NOCOUNT ON;

  DECLARE @sql nvarchar(max) = N'SELECT <cols> FROM dbo.table WHERE 1 = 1';

  SELECT @sql += CASE WHEN @p1 IS NOT NULL THEN N' AND c1 = @p1'    ELSE N'' END
              +  CASE WHEN @p2 IS NOT NULL THEN N' AND c2 LIKE @p2' ELSE N'' END
              +  CASE WHEN @p3 IS NOT NULL THEN N' AND c3 >= @p3'   ELSE N'' END;

  IF @p1 IS NULL
  BEGIN
    SET @sql += N' OPTION (RECOMPILE)';
  END

  SET @sql += N';';

  EXEC sys.sp_executesql @sql, 
                         N'@p1 int, @p2 varchar(32), @p3 datetime',
                         @p1, @p2, @p3;
END

You can periodically review the plan cache to see which combinations of parameters get used most often and/or lead to the worst-performing queries, and create additional supporting indexes based on those queries (rather than trying to create all potentially helpful indexes up front and maintaining them all even though some may rarely/never get used).