Sql-server – Indexing to support advanced search queries

index-tuningsql server

I have a Person table 2.5 million rows, typical demographic fields. Need to support an advanced search screen that allows up to 10 search params (f/l name, phone 1 through phone 6, email, nickname, etc.) all varchar fields. The 3 name fields are a real puzzle – all need to support partial string matches, front and back LIKE (% @LastName %) or something like that. They can supply just one param, or all ten, or any combination.

So, forgetting about sniffing for a minute, I just want to design the best possible index(es) to support these searched. Tried a number of things, what is working best is an index with all 10 fields (see below), leading with the one most likely to be filled in. It's "just ok" performance wise, and I'm wondering if I'm missing something.

CREATE NONCLUSTERED INDEX [IN_Person_Search] ON [dbo].[Person] (FnameLname, LnameFname, NickName, Phone1, Phone2, Phone3, Phone4, Phone5, Phone6, Email)

I realize the answer might be wrapped up in the SELECT statement, so a quick word on that – I'm using the "Email = @Email OR @Email IS NULL" method to deal with empty params.

Best Answer

You're trying to write one magical query that will handle all possible combinations of search criteria efficiently. I call this the kitchen sink but it's simply not achievable to do this for 10 parameters with one query and one index. It's just not possible.

The bad news: You're not going to be able to create an index (or even 100 indexes - see here to get an idea of how many combinations, not permutations, you would have to cover) that can satisfy all possible combinations of parameters. And if you write a query with a bunch of OR conditionals, the plan that gets compiled will be based on the first combination of parameters that are passed in, which may not be good for any subsequent query, never mind all subsequent queries.

The good news: You can handle this scenario in a way that lets you prioritize.

Here's how I do it - I create a dynamic SQL version of the query for each possible combination as I see them. I append only the clauses necessary to search based on the set of parameters provided. This way I don't use queries optimized for @p1 when @p1 isn't even passed in. I also set up some qualitative criteria to determine whether I need to recompile for parameter skew. When the parameter represents a unique index key, there is no reason to recompile, but if different values could produce vastly different plans (based on my knowledge of the data, selectivity, usage patterns, etc.), then it might be useful to do so.

CREATE PROCEDURE dbo.SinkMe
  @p1 int,
  @p2 varchar(32),
  @p3 datetime
AS
BEGIN
  SET NOCOUNT ON;

  DECLARE @sql nvarchar(max) = N'SELECT <cols> FROM dbo.table WHERE 1 = 1';

  SELECT @sql += CASE WHEN @p1 IS NOT NULL THEN N' AND c1 = @p1'    ELSE N'' END
              +  CASE WHEN @p2 IS NOT NULL THEN N' AND c2 LIKE @p2' ELSE N'' END
              +  CASE WHEN @p3 IS NOT NULL THEN N' AND c3 >= @p3'   ELSE N'' END;

  IF @p1 IS NULL
  BEGIN
    SET @sql += N' OPTION (RECOMPILE)';
  END

  SET @sql += N';';

  EXEC sys.sp_executesql @sql, 
                         N'@p1 int, @p2 varchar(32), @p3 datetime',
                         @p1, @p2, @p3;
END

You can periodically review the plan cache to see which combinations of parameters get used most often and/or lead to the worst-performing queries, and create additional supporting indexes based on those queries (rather than trying to create all potentially helpful indexes up front and maintaining them all even though some may rarely/never get used).

Related Solutions

Sql-server – Advanced search engine or server for relational database

Stack Overflow search is now 81% less crappy, thanks to Lucene. Whether this, or any of the other document oriented search servers, would be a good fit is difficult to tell from the detail you've posted.

Lucene/Solr and their ilk are built to search textual data. It sounds like you should be considering a graph database. High Scalability has an interesting article on Neo4j which discusses some of the concepts well.

Sql-server – How to improve this queries performance by indexing properly

Try this:

CHILDEVENTPROPERITIES - Make a nonclustered index:
- Key columns: Property, PropertyValue
- Included column: EventId
CHILDEVENTS - Make a nonclustered index:
- Key columns: AddDate, Type
- Included column: ChildID, EventID
CHILDINFORMATION - Make a nonclustered index:
- Key columns: ChildID

In cases 1 and 2, check what is the optimal key column order (optimal leading column). Also evaluate if you need to keep existing nonclustered indexes, depending on your workload. This specific query doesn't need them.

One additional thing to try is breaking the query into a union, to get rid of the OR in the WHERE clause - see how it affects the plan.

SELECT
   CHILDID
   ,SUM(PREVIOUSWEEKSACCIDENTS) AS PREVIOUSWEEKSACCIDENTS
FROM
   (SELECT
      CHINFO.CHILDID
     ,COUNT(*) AS PREVIOUSWEEKSACCIDENTS
   FROM
      BKA.CHILDINFORMATION CHINFO
      JOIN BKA.CHILDEVENTS CHE
         ON CHE.CHILDID = CHINFO.CHILDID
      JOIN BKA.CHILDEVENTPROPERITIES CHEP
         ON CHE.EVENTID = CHEP.EVENTID
   WHERE
      CHE.TYPE = 'ACCIDENT'
      AND CHE.ADDDATE BETWEEN @lastweekstart AND @lastweekend
   GROUP BY
      CHINFO.CHILDID

   UNION ALL

   SELECT DISTINCT
      CHINFO.CHILDID
     ,COUNT(*) AS PREVIOUSWEEKSACCIDENTS
   FROM
      BKA.CHILDINFORMATION CHINFO
      JOIN BKA.CHILDEVENTS CHE
         ON CHE.CHILDID = CHINFO.CHILDID
      JOIN BKA.CHILDEVENTPROPERITIES CHEP
         ON CHE.EVENTID = CHEP.EVENTID
   WHERE
      CHE.TYPE = 'POTTYBREAK'
      AND CHEP.PROPERTY = 'SUCCESS'
      AND CHEP.PROPERTYVALUE = 'FALSE'
      AND CHE.ADDDATE BETWEEN @lastweekstart AND @lastweekend
   GROUP BY
      CHINFO.CHILDID) AS t
GROUP BY
   CHILDID

Best Answer

Related Solutions

Sql-server – Advanced search engine or server for relational database

Sql-server – How to improve this queries performance by indexing properly

Related Question